Sunday, January 1, 2017

Architecting for Continuous Delivery - Jez Humble - Summary

I got a link to this video from Nont just before the Christmas weekend and watched it a few days ago.



Jez is the author of Continuous Delivery and Lean Enterprise. In this video, he talks about Architecting for Continuous Delivery at the DevOps Enterprise Summit on 20 Oct 2015.

He takes the DevOps topic to another level: DevOps is not just about how to do continuous integration and continuous delivery, it also involves changes in organizational culture and system architecture to support such a process.

Here is what I captured from the video:






He opened the talk with an interesting statement: no matter how much you pay for the DevOps fairy to wave her magic wand, if your architecture does not fundamentally support continuous delivery, you're going nowhere. Microservices definitely bring a lot of advantages here, but they are also hard and come with their own complexity.

Here is his definition of continuous delivery:


CD is also the ability to push changes to production during normal business hours, in a fully automated, push-button way, without waiting for the evening or the weekend. And anyone should be able to do that any time they want. This is an architectural concern.

In order to achieve this, there are 2 golden rules you have to follow:


Developers must break their problems down into smaller ones so that they can develop and check in their new features frequently. Doing this forces developers to talk to each other a lot more, even if it sounds a bit awkward to get developers to talk to people (instead of coding).

Once you have achieved this, there are 3 ingredients you have to care about:


  1. Configuration Management - It should be possible to provision new hardware into production in a fully automated way, including system configuration and installation of all required software. This is hard but achievable.
  2. Continuous Integration - This is the practice where everyone works on the same trunk of the software, with a comprehensive set of automated tests that run fast and give you rapid feedback, so that if you break something you can fix it immediately. This is super hard: it is not just about running Jenkins against your trunk, but about making sure your software is always in a working state.
  3. Automated Testing - To do CI, you need comprehensive test automation at multiple levels (a small example of the fast unit-test layer follows this list), and this is sometimes painful and quite expensive.
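
To make the "fast feedback" idea concrete, here is a minimal sketch (my own illustration, not from the talk) of the kind of unit test that belongs in this layer: pure business logic with no I/O, so hundreds of these can run in seconds on every check-in.

```python
# Illustrative only: a fast, I/O-free unit test (run with pytest).
import pytest

def apply_discount(price: float, percent: float) -> float:
    """Pure business logic: testable in milliseconds, no database or network."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 25) == 75.0

def test_apply_discount_rejects_bad_input():
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```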

Once we have these in place, we create the deployment pipeline:


Once code is checked in, the pipeline runs the automated unit tests. If they fail, the check-in is rejected and you have to fix it; if they pass, the pipeline triggers the automated acceptance tests.
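
As a rough sketch of that pipeline logic (the stage commands here are hypothetical placeholders, not anything shown in the talk): each stage gates the next, and a red stage stops everything until it is fixed.

```python
# Hypothetical deployment-pipeline skeleton: each stage must go green
# before the next one is triggered.
import subprocess
import sys

STAGES = [
    ("unit tests", ["pytest", "tests/unit", "-q"]),
    ("acceptance tests", ["pytest", "tests/acceptance", "-q"]),
    ("deploy to staging", ["./deploy.sh", "staging"]),  # placeholder script
]

def run_pipeline() -> None:
    for name, command in STAGES:
        print(f"stage: {name}")
        if subprocess.run(command).returncode != 0:
            # Stop the line: the build stays red until someone fixes it.
            sys.exit(f"stage failed: {name}")
    print("pipeline green: this build is a release candidate")

if __name__ == "__main__":
    run_pipeline()
```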

Automated tests will give you a high level of confidence that your software is working, so you can get people to focus more on the non-functional tests, such as exploratory testing, performance testing, and usability testing.

In order to satisfy these golden rules, there are architectural considerations that you need to follow:


  • The software must be testable. This is where technology like Docker can be used to replicate enough of your production environment on your development machine that you can actually do the testing, which gives you some level of confidence that the software is deployable (see the sketch after this list). If you have to buy or provision an expensive integration environment to gain confidence that your software is releasable, then that is an architectural problem you need to fix.
  • The software must be deployable as a push-button affair. You can't get there by replacing a complex, painful manual process with a complex, painful automated process. If your system architecture is tightly coupled and complex and you try to automate it, you will find it extremely painful and difficult, and it may not work at all.
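
One common way to get that testability (a sketch under my own assumptions; the talk only names Docker as one option) is to design components against abstract dependencies, so a developer machine or CI can substitute a cheap stand-in for the expensive production system:

```python
# Illustrative design-for-testability: the checkout logic depends on an
# abstract gateway, so CI can use an in-memory fake instead of a costly
# integration environment.
from typing import Protocol

class PaymentGateway(Protocol):
    def charge(self, account: str, cents: int) -> bool: ...

class FakeGateway:
    """In-memory stand-in used on a developer machine or in CI."""
    def __init__(self) -> None:
        self.charges: list[tuple[str, int]] = []

    def charge(self, account: str, cents: int) -> bool:
        self.charges.append((account, cents))
        return True

def checkout(gateway: PaymentGateway, account: str, cents: int) -> str:
    return "paid" if gateway.charge(account, cents) else "declined"

def test_checkout_charges_the_account():
    gateway = FakeGateway()
    assert checkout(gateway, "acct-42", 1999) == "paid"
    assert gateway.charges == [("acct-42", 1999)]
```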

Microservices are one way to achieve this goal. They are not the only way, but if you're interested in them, here is a good book to read:


This one sentence encapsulates 90% of everything you need to know to build a web-scale architecture:


In order to achieve this, we need to go back to the most fundamental principles of building software that we have known since the '70s: componentization, modularization, and services.


By decomposing the system into components and services, we make the system more maintainable through better encapsulation and lower coupling, and we make it easier to build and test against what has already been built to get fast feedback.

More importantly, it also enables collaboration at scale. To understand why architecture enables collaboration at scale, we need to take a look at Conway's law, which is probably the most important law in architecture:


If you have 4 teams building a compiler, you will end up with a 4-pass compiler. The essential point here was made by Rebecca Parsons:


Say you have a bunch of teams in different parts of the world. Within each team you have high communication bandwidth; between teams you have relatively low bandwidth. Adding a feature requires high-bandwidth communication, so how do you structure your organization so that the work lands inside a team, where that bandwidth exists, rather than depending on the low-bandwidth channels between teams?

At Amazon, they solved this by having one team per service. A service means an end-to-end implementation, including all the different layers, and different services can run on different internal platforms. If you have decomposed your system correctly, then when you want to add a new feature you tend to add it to just one service, and that team has very high communication bandwidth.


But the other teams that depend on that service do not need to know what's happening inside it. They just need to know what the API changes are. This is the key: you rely on low-bandwidth communication channels across teams, with the API as the boundary.
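
A tiny sketch of what that low-bandwidth contract can look like in code (the endpoint and response shape are entirely my own illustration): the consuming team writes against the published API only and never touches the owning team's internals.

```python
# Illustrative consumer of another team's service: all it knows is the
# published endpoint and its JSON shape, never the implementation.
import json
import urllib.request

def top_recommendations(base_url: str, user_id: str, limit: int = 5) -> list[dict]:
    url = f"{base_url}/users/{user_id}/recommendations?limit={limit}"
    with urllib.request.urlopen(url) as response:
        # The owning team is free to rewrite everything behind this URL,
        # as long as the response shape (the API contract) is preserved.
        return json.loads(response.read())
```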

You should avoid splitting teams by architectural layer (web team, middleware team, database team, etc.) or by function (Dev team, Test team, Operations team). Don't organize by function or architectural layer if you want to move fast.

There are 2 ways to bind those components and modules together. In reality, it is not just one end or the other, but rather a spectrum between the two ends, and you end up somewhere along it.

Bind Components at Run Time (Microservices)

At one end, you bind the components at run time, as with microservices, where each service can be deployed independently.

The "Death Star" in the slide below represents the services at Amazon.com and their relationships.


The key practice is that your team is responsible for making sure downstream teams don't break. It is your problem, not theirs. You can use techniques like API versioning, and you need to think about this from the beginning, because API versioning may not solve the problem for some technologies.
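
As a minimal illustration of API versioning (the field names and shapes are hypothetical), the owning team keeps serving the old contract while downstream teams migrate on their own schedule:

```python
# Illustrative API versioning: v1 consumers keep working while v2 rolls out.
def get_user(user_id: str, version: int) -> dict:
    name = "Ada Lovelace"  # stand-in for a real lookup
    if version == 1:
        # v1 contract: a single "name" field, kept alive so existing
        # downstream teams don't break when v2 ships.
        return {"id": user_id, "name": name}
    # v2 contract: split name fields; consumers opt in explicitly.
    first, _, last = name.partition(" ")
    return {"id": user_id, "first_name": first, "last_name": last}

assert get_user("u1", 1)["name"] == "Ada Lovelace"
assert get_user("u1", 2)["last_name"] == "Lovelace"
```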

You also have to think about monitoring. If you have a performance problem in the Death Star, note that it usually comes not from a single service but from the interactions between a bunch of them. In that case you can't investigate the problem just by tracing down the latency of each individual service. So there is also complexity that comes with this model (microservices).
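
One widely used counter-measure (my own sketch; the talk doesn't prescribe a specific tool) is to propagate a correlation ID through every hop, so that one slow user request can be stitched back together from the logs of all the services it touched:

```python
# Illustrative correlation-ID propagation across service calls.
import time
import uuid

def log(trace_id: str, service: str, seconds: float) -> None:
    print(f"trace={trace_id} service={service} latency={seconds * 1000:.1f}ms")

def call_downstream(service: str, trace_id: str) -> None:
    start = time.monotonic()
    time.sleep(0.01)  # stand-in for a real network call that forwards trace_id
    log(trace_id, service, time.monotonic() - start)

def handle_request(trace_id: str = "") -> None:
    trace_id = trace_id or str(uuid.uuid4())  # mint an ID at the edge
    start = time.monotonic()
    call_downstream("pricing", trace_id)
    call_downstream("inventory", trace_id)
    log(trace_id, "frontend", time.monotonic() - start)

if __name__ == "__main__":
    handle_request()
```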

Bind Components at Build Time (Monolithic)

At the other end, you bind the components at build time, where you build all components into one big binary and deploy that. This is the technique Facebook used several years ago.

If you decide to deploy a large binary, you should be able to catch any interaction problems between components as fast as possible through the CI process.

At Google, a couple of years ago, they had CI in place with 200K tests in the codebase running 10M times per day. This also requires a lot of upfront effort.


The practice is: if I break something, they can revert my change from version control.

Let's turn to the "Unreliable Platform" topic. This book tells you how to deal with an unreliable platform:


There are architectural strategies to maintain these architectural attributes. If you develop an application that is not fundamentally scalable, resilient, etc., you can't just pay any amount of DevOps money to get these attributes in afterwards.

These things must be thought about from the beginning, so you understand how production behaves. One way to do this is to deploy as early as you can, so you can run production-like tests against the system.

Most of us have to move from A to B, from monolith to microservices. So how do we do this?

In 2001, Amazon had a monolithic architecture with a single database that could not be scaled any further. That motivated them to re-architect their system from a monolith to a service-oriented architecture.

Jeff Bezos is a smart guy. He sent this out to his technical teams when he was re-architecting Amazon.com:


One thing to note specifically is point no. 5. The reason Amazon Web Services is so successful is that they built everything from the ground up to make money and to be something that customers want to use.

The other rule he loves to follow is:


The team that builds the software must run it in production. This is not the only way to do it; you can have a high-performing organization with functional silos where people collaborate for the greater interest of their customers.

However, if that does not happen, this is one of the countermeasures to force you in that direction. Having the team co-located is one way to follow Conway's law explicitly, and this is pretty much what they did.

This transition took 4 years at Amazon, from 2001 to 2005. Netflix did the same thing in just 2 years.

There is a pattern for changing the organization like this, shown in the picture:


There is a tree growing, and one day a bird comes along and drops a seed of a strangler fig onto it; the strangler grows up and kills the host tree. This is how we get rid of the horrible monolith.


When we build new functionality, we are not going to add it to the existing app. Instead, we build a new service around the app, using SOLID principles, componentization, good encapsulation, loose coupling, etc.
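
A minimal sketch of how a strangler-style routing layer can work (the paths and service names are invented for illustration): new functionality is served by new services, while everything else still falls through to the monolith, which shrinks as more routes are strangled.

```python
# Illustrative strangler routing: strangled paths go to new services,
# the rest falls through to the legacy monolith.
STRANGLED_ROUTES = {
    "/recommendations": "http://recommendations-service",
    "/search": "http://search-service",
}
MONOLITH = "http://legacy-monolith"

def route(path: str) -> str:
    for prefix, target in STRANGLED_ROUTES.items():
        if path.startswith(prefix):
            return target + path
    return MONOLITH + path  # not strangled yet

assert route("/search?q=cd") == "http://search-service/search?q=cd"
assert route("/checkout") == "http://legacy-monolith/checkout"
```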

Of course, we definitely don't go and rebuild everything in a new architecture up front, especially not without enough research on how customers will actually use it. Otherwise, you are rebuilding a system designed for the people of 10 years ago. So always do the research, build new things as the requirements come in, and in the meantime strangle the old things.

Another thing: there is no end state for this. The idea of reaching a perfect architecture is false! In reality, every stage of an organization's evolution requires a different architecture. The architecture is always evolving; there is no "To-Be" architecture diagram.

We should always follow our basic fundamental principles, i.e. componentization, and make sure the software is testable and deployable regardless of the architecture it uses.

