Operating Systems Research Group
Operating Systems Research on Energy, Reliability and Autonomy
Errlog: Failure diagnosis report release
On this page you will find over 150 well-documented failure diagnosis reports. The failures are randomly sampled, real-world, user-reported failures used in our paper (please cite this paper if you use our failure set in your research):
Be Conservative: Enhancing Failure Diagnosis with Proactive Logging. Ding Yuan, Soyeon Park, Peng Huang, Yang Liu, Michael M. Lee, Xiaoming Tang, Yuanyuan Zhou, and Stefan Savage. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI'12), Hollywood, CA, October 2012. [PDF]
The failures are from five widely used open-source software projects: the Apache httpd web server, the PostgreSQL database, the Squid web cache, SVN (Apache Subversion), and GNU Coreutils. Each diagnosis report documents in detail the symptom and the root cause of the failure, as well as how the execution propagates from the root cause to the failure symptom. If we reproduced a failure, we also document the detailed procedure for reproducing it. We also describe how an error log message can help the diagnosis. It took four authors four months to diagnose these failures and write the diagnosis reports.
Failure set
Ding Yuan, yuan at eecg dot toronto dot edu
Peng Huang, huang at cs dot jhu dot edu