
Posts

Showing posts from April, 2020

Useful Links: Going faster with continuous delivery

Just thought I would share a blog post on how Amazon does continuous deployment. The title of the article highlights a key goal: faster deployment of completed features. Deployment latency is a key metric for identifying high-performing teams. In their book, Accelerate: The Science of Lean Software and DevOps, Nicole Forsgren and her co-authors identified it as one of four highly predictive metrics for high-performing software teams. The section on risk management is especially worthwhile. The risk reduction strategies mentioned in the article can be implemented with AWS CodePipeline and/or Kubernetes Deployments. https://aws.amazon.com/builders-library/going-faster-with-continuous-delivery/

Useful Links: Deploys - It’s Not Actually About Fridays

This: https://charity.wtf/2019/10/28/deploys-its-not-actually-about-fridays/ Read. Contemplate. Incorporate. Seriously, there are four metrics that reliably indicate a high-functioning software organization (see Accelerate, by Forsgren et al., for details - https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations-ebook/dp/B07B9F83WM ): lead time for changes, deployment frequency, time to restore service, and change failure rate. This article focuses on change failure rate, which it proposes to lower by improving the first two with observability tooling.
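To make the four metrics concrete, here is a minimal sketch of how they could be computed from a list of deployment records. The `Deploy` record, its field names, and the `four_key_metrics` function are hypothetical, invented for illustration; they do not come from Accelerate or from the linked article, and a real team would pull these timestamps from its own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a failure in production?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def four_key_metrics(deploys: List[Deploy], period_days: int) -> dict:
    """Summarize the four metrics over a reporting period of period_days."""
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [
        _hours(d.restored_at - d.deployed_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "median_lead_time_hours": median(lead_times),
        "deploys_per_day": len(deploys) / period_days,
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
        "change_failure_rate": len(failures) / len(deploys),
    }


# Made-up sample data: four deploys over two days, one of which failed.
t0 = datetime(2020, 4, 1, 9, 0)
sample = [
    Deploy(t0, t0 + timedelta(hours=2)),
    Deploy(t0, t0 + timedelta(hours=4)),
    Deploy(t0, t0 + timedelta(hours=6), failed=True, restored_at=t0 + timedelta(hours=7)),
    Deploy(t0, t0 + timedelta(hours=8)),
]
report = four_key_metrics(sample, period_days=2)
```

Medians are used here rather than means so that a single unusually slow deploy or long outage does not dominate the summary; that is a design choice of this sketch, not a prescription from the book.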

Useful Links: Logs and Metrics

This article explains why storing log messages alone is insufficient for robust operation of a software service. Metrics also need to be gathered and stored. https://medium.com/@copyconst…/logs-and-metrics-6d34d3026e38 tl;dr - Log volume can spike dramatically when user activity increases, especially when things go wrong. This makes it possible for an alerting system based on logs to be swamped. For a metrics system, volume grows with the number of metrics collected, not with the amount of traffic. That volume is stable, so a metrics system is much less likely to fail or slow down during a crisis.
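The difference in how the two volumes grow can be shown with a toy simulation. This is only a sketch under simplifying assumptions: the `handle_requests` function is hypothetical, a plain list stands in for a log pipeline, and a `collections.Counter` stands in for a metrics client. It is not taken from the linked article.

```python
from collections import Counter
from typing import List, Tuple


def handle_requests(outcomes: List[str]) -> Tuple[List[str], Counter]:
    """Process requests, emitting one log line per request and
    incrementing one counter per distinct outcome."""
    log_lines = []       # stand-in for a log pipeline: grows with every event
    metrics = Counter()  # stand-in for a metrics client: one entry per series
    for outcome in outcomes:
        log_lines.append(f"request finished status={outcome}")
        metrics[outcome] += 1
    return log_lines, metrics


# A quiet period: 100 successful requests.
quiet_logs, quiet_metrics = handle_requests(["ok"] * 100)

# An incident: 100x the traffic, most of it failing.
incident_logs, incident_metrics = handle_requests(["ok"] * 2000 + ["error"] * 8000)
```

During the simulated incident the number of log lines grows in step with the traffic, while the metrics side only gains one extra series for the new `error` outcome. The counter values get larger, but the amount of data to ship and store per scrape stays about the same.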

Useful Links: AWS Cost Optimization 101

An interesting article on AWS cost optimization. I do not agree with all of it (re-architecting apps to minimize inter-AZ traffic and not using AWS endpoints, for example), but there are some good tips in there: https://cloudonaut.io/aws-cost-optimization-101/

Useful Links: Trade-offs Under Pressure

These two posts dive into the master's thesis of John Allspaw (previous Head of Engineering at Etsy) on heuristics for decision-making under pressure, specifically in the context of dealing with an outage to a software service: https://blog.acolyer.org/2020/01/22/trade-offs-under-pressure-part-1/ and https://blog.acolyer.org/2020/01/24/trade-offs-under-pressure-part-2/ There are two noteworthy aspects to this. The first is the subject matter itself: it identifies heuristics that engineers use to make trade-offs during outages. The second is the methodology, which is an excellent model for conducting incident reviews. The visualization and classification of the timeline is very informative.