The Fast Country

Posts

Useful Links: Going faster with continuous delivery

I thought I would share a blog post on how Amazon does continuous deployment. The title of the article highlights a key goal: faster deployment of completed features. Deployment latency is a key metric for identifying high-performing teams. In Accelerate: The Science of Lean Software and DevOps, Nicole Forsgren and her co-authors identify it as one of four highly predictive metrics for high-performing software teams. The section on risk management is especially worthwhile. The risk reduction strategies mentioned in the article can be implemented with AWS CodePipeline and/or Kubernetes Deployments. https://aws.amazon.com/builders-library/going-faster-with-continuous-delivery/

Useful Links: Deploys - It’s Not Actually About Fridays

This: https://charity.wtf/2019/10/28/deploys-its-not-actually-about-fridays/ Read. Contemplate. Incorporate. Seriously, there are four metrics that reliably indicate a high-functioning software organization (see Accelerate, by Forsgren, et al., for details - https://www.amazon.com/Accelerate-Software-Performing-Technology-Organizations-ebook/dp/B07B9F83WM ): lead time for changes, deployment frequency, time to restore service, and change failure rate. This article addresses change failure rate by improving the first two with observability tooling.
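To make the four metrics concrete, here is a minimal sketch of how they might be computed from a list of deployment records. The `Deployment` record, its field names, and the choice of medians are my own assumptions for illustration; they are not definitions taken from Accelerate or from the linked article.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                 # when the change was committed
    deployed_at: datetime                  # when it reached production
    failed: bool = False                   # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps, in hours."""
    return (end - start).total_seconds() / 3600


def four_key_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Summarise the four metrics over a reporting period (assumed definitions)."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")

    lead_times = [hours(d.committed_at, d.deployed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "median_lead_time_hours": median(lead_times),
        "deploys_per_day": len(deployments) / period_days,
        "change_failure_rate": len(failures) / len(deployments),
        # None when nothing failed (or nothing has been restored yet)
        "median_time_to_restore_hours": median(restore_times) if restore_times else None,
    }


# Example data: four deployments over four days, one of which failed.
example = [
    Deployment(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11)),
    Deployment(datetime(2024, 1, 2, 9), datetime(2024, 1, 2, 13),
               failed=True, restored_at=datetime(2024, 1, 2, 14)),
    Deployment(datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 15)),
    Deployment(datetime(2024, 1, 4, 9), datetime(2024, 1, 4, 17)),
]
summary = four_key_metrics(example, period_days=4)
```

In practice the timestamps would come from your version control and deployment tooling, and how you decide that a deployment "failed" is a judgment call for each team.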

Useful Links: Microservices Prerequisites

This is a great article describing the capabilities a team needs in order to run a system with a microservices architecture: https://martinfowler.com/bliki/MicroservicePrerequisites.html

Useful Links: Logs and Metrics

This article explains why storing log messages alone is insufficient for robust operation of a software service. Metrics also need to be gathered and stored. https://medium.com/@copyconst…/logs-and-metrics-6d34d3026e38 tl;dr - Log volume can spike dramatically when user activity increases, especially when things go wrong. An alerting system based on logs can therefore be swamped just when it is needed. For a metrics system, volume grows with the number of metrics collected rather than with traffic. That volume is stable, so a metrics system is much less likely to fail or slow down during a crisis.
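A toy sketch of that difference, under my own assumptions (the `handle_requests` function and the metric name are hypothetical, and a real service would use a logging library and a metrics client rather than in-memory structures): every request appends one log line, while the metrics side only increments a counter per status code.

```python
from collections import Counter
from typing import List, Tuple


def handle_requests(statuses: List[int]) -> Tuple[List[str], Counter]:
    """Simulate a service that both logs and counts each response."""
    log_lines: List[str] = []
    metrics: Counter = Counter()
    for request_id, status in enumerate(statuses):
        # Logs: one new line per request, so volume grows with traffic.
        log_lines.append(f"request={request_id} status={status}")
        # Metrics: one counter per (name, status) series, updated in place.
        metrics[("http_responses_total", status)] += 1
    return log_lines, metrics


# A quiet period: 10 requests, one of them failing.
quiet_logs, quiet_metrics = handle_requests([200] * 9 + [500])

# A bad day: 10,000 requests, half of them failing.
spike_logs, spike_metrics = handle_requests([200, 500] * 5000)

print(len(quiet_logs), len(quiet_metrics))
print(len(spike_logs), len(spike_metrics))
```

The number of log lines scales with the number of requests, while the number of metric series stays at two in both runs; only the counter values change.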

Useful Links: The Practice of Practice

This is a very interesting talk on practicing for operational events. The speaker draws parallels with musicians practicing for a performance: https://www.youtube.com/watch?v=87EhBrC2L1U

Useful Links: Logging Rules of Thumb

Some very useful advice in here for developers. https://engineering.hellofresh.com/logging-rules-of-thumb-f6c0f71a2351