Archive of posts with category 'incident management'

The Inevitable - Failures in Distributed Systems

Experiencing failure at scale is, as the popular Marvel character Thanos would say, “inevitable”. Memory leaks and software, hardware, or network I/O failures are just a few examples. It’s a problem...

Data Aggregation

TL;DR: Data aggregation is the process of collecting and organizing large sets of data from multiple sources in order to provide a comprehensive view of a particular situation or system....