Skip to main content

Useful Links: Trade-offs Under Pressure

These two posts dive into John Allspaw's (previous Head of Engineering at Etsy) Masters Thesis on heuristics on decision making under pressure, specifically in the context of dealing with an outage to a software service: https://blog.acolyer.org/2020/01/22/trade-offs-under-pressure-part-1/ and https://blog.acolyer.org/2020/01/24/trade-offs-under-pressure-part-2/ There are two noteworthy aspects to this: firstly the subject matter itself is useful. It identifies heuristics that engineers use to make trade-offs during outages. The second noteworthy thing is the methodology used: it demonstrates both an excellent methodology for conducting incident reviews. The visualization and classification of the timeline is very informative.

Comments

Popular posts from this blog

First Post

Hello and welcome to the inane ramblings of an Irish software developer. The title of the blog comes from Lewis Carroll's, Through the Looking Glass . In the book, Alice goes running with the Red Queen, but they don't seem to make any progress. Alice remarks on this, saying, "Well in our country, you'd generally get to somewhere else - if you ran very fast for a long time as we've been doing." The Red Queen replies, "A slow sort of country. Now, here, you see, it takes all the running you can do, to stay in the same place." The Red Queen Effect is quite applicable to the software industry, and as I probably will be talking quite a bit about the software industry, I thought it would be a good name for a blog. I have a few objectives for my new blog. By writing here, I hope to learn how to write well. That is, I hope to learn how to write clearly and concisely, and be interesting at the same time. I also hope that this blog will become a good prof

Operational Metrics and Alerts for Distributed Software Systems

This post will be about operational metrics and alerts for distributed software systems. What do I mean by that? I mean the metrics and alerts that allow operations personel to detect failure of of a distributed software system and helps them to quickly diagnose what is wrong. Metrics The metrics are measurements of characteristics of the system collected at regular(ish) intervals and stored somewhere for processing - rendering into graphs, triggering alert notifications, etc. Metrics can be divided into 3 categories: input metrics, output metrics, and process metrics. Input metrics are measures of the inputs to the system, for example, the number of user requests, counts of particular characteristics of the requests - where they are from, how large the request data is, counts of particular features in the request (for example, which resources/items/products are being asked for). Output metrics are measures of the output of the system. Examples of these would include orders s

Repost: ANTLR Trinity

This post is a repost of an article I had on a previous incarnation of this blog. I hadn't intended to transfer it over, as the technology is old now (ANTLR is on version 4), but I recently came acros a slide deck online, where the post was referenced, so I am reposting in case anyone was looking for it. There are 3 components to a really useful software development technology: innovative features, clear and comprehensive documentation, and solid tools. The recent release of ANTLR v3.0 is a perfect example of this. This parser generator tool has all 3 components and each component is done superbly. ANTLR is a parser generator tool that is capable of targeting multiple output languages. Out of the box it will generate Java, Python, C, C#, or Ruby code for parsers. Other target languages are possible if the code generators are written. Amongst its cool features are: LL(*) parsing: This is an extension to the normal, top down with looka