Credits : Josh Calabrese

Cloud-Native — A philosophy that powers a $ Trillion+ economy

If you are one of the curious minds in the technology space, it is hard to miss the term cloud-native. It is followed by 6.5 million developers around the globe, and that number is growing fast. According to IDC, over 500 million digital apps and services will be developed and deployed using cloud-native approaches by 2023.

In the middle of the current COVID-19 pandemic, the four largest public clouds continue to beat expectations and post impressive year-over-year (YoY) revenue growth: AWS 29%, Microsoft Azure 47%, Google Cloud 43%, and Alibaba 59%.

The market cap of the public cloud industry has grown 25 times in the last decade, reaching $1 Trillion in February 2020 with a 45% CAGR. And the story doesn’t end here: the market cap is expected to reach $2 Trillion by 2021.

Source : Bessemer Ventures

The top 100 private cloud companies were valued at $267 B in 2020, growing 60% YoY.

Source : Bessemer Ventures

So what is Cloud Native?

Cloud-native applications are “citizens of the cloud”, which means they are built from the ground up around the design principles, architecture, and technologies of the cloud.

To draw an analogy, movies transformed theatres in 1888. They were not merely filmed plays; they leveraged every aspect of film-making: cinematography, musical effects, editing, collaboration, and so on. Interestingly, Netflix was a pioneer of cloud-native. They disrupted their own business model by leveraging cloud-native principles as far back as 2008, after a major outage.

Cloud-native is also the shared philosophy of the Cloud Native Computing Foundation (CNCF), a large and growing open-source community founded in 2015. Its objective is to build the cloud-native ecosystem through projects, conferences, and more.

By leveraging cloud-native principles, companies can bring new ideas to market faster and respond sooner to customer demands. Importantly, cloud platforms give developers on-demand access to computing power along with modern data and application services.

Three concepts are critical to understanding cloud-native: microservices, containers, and DevOps.

In this long-form article, we will cover these topics :

  1. Deconstruct Microservices, Containers, and DevOps as poets would understand
  2. How Netflix pioneered the Cloud-native movement

Let’s dive deep

Traditionally, applications are built as a single codebase in which all the different functionalities, such as payments, photos, and messages, mesh together as a single unit. This is called a monolithic app. It is pretty much like a 1000 ton boulder: years of accumulated updates and redundant logic, thousands of lines of code written in a single, not-so-modern programming language, and outdated software architecture patterns and principles. A monolith has a complex architecture and runs on a single system that must satisfy all of its compute, memory, storage, and networking requirements. Scaling such a system is very difficult. Upgrades, patches, or migrations of a monolith result in significant downtime and require tightly scheduled maintenance windows to avoid service disruption to clients.

A 1000 ton boulder can be moved, when broken down into small pebbles

Enter microservices...

Microservices are loosely coupled pebbles, each capable of performing a specific business function. Combine all the pebbles and you get the equivalent of the boulder. The concept is not new; it borrows from event-driven and service-oriented architecture. Complex applications are composed of small independent processes that talk to each other via APIs.

This sort of API architecture calls for rethinking company structures: the organization is arranged around product teams that own microservices and interact with each other via APIs.

Jeff Bezos sent a memo to all Amazon employees, now known as the API manifesto, which can be summarized as: everyone at Amazon will have APIs to everyone else, and if you don’t comply you will be fired.
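The idea of small services that only talk through APIs can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not how any particular company builds its services. The service names (a "payments" service and an "orders" service), the /charge path, and the JSON fields are all hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PaymentHandler(BaseHTTPRequestHandler):
    """Hypothetical 'payments' microservice: one business function behind one API."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        body = json.dumps(
            {"order_id": request["order_id"], "status": "charged"}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_payment_service():
    """Run the payments service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), PaymentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Ignore any system proxy settings so the call stays on localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def place_order(payment_url, order_id, amount):
    """Hypothetical 'orders' service: it knows the payments API, not its code."""
    payload = json.dumps({"order_id": order_id, "amount": amount}).encode()
    req = urllib.request.Request(
        payment_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with _opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())


def demo():
    server = start_payment_service()
    try:
        url = f"http://127.0.0.1:{server.server_port}/charge"
        return place_order(url, order_id=42, amount=9.99)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The orders side never imports the payments code. As long as the API contract holds, either side could be rewritten, redeployed, or scaled without touching the other.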

Microservices present fundamental advantages:

  1. Separation of concerns (modularity and encapsulation):
  • Each microservice can be developed in the most suitable modern programming language
  • Data and code are neatly encapsulated together

  2. Scalability:
  • Each microservice can be scaled individually, either manually or automatically
  • Seamless upgrades and patches: upgrades are rolled out one service at a time, rather than re-compiling, re-building, and re-starting an entire monolithic application, which results in little to no downtime or service disruption to clients

  3. Virtualization and elasticity:
  • Each microservice can be deployed individually on separate servers that are provisioned on demand with fewer resources
  • Resources can be scaled up and down as required

Micro-Services Architecture

But a monolithic application cannot simply be run as microservices. The natural route is to refactor and then migrate. As always, there are two approaches:

  1. Big Bang: everything is rewritten at once. This is a high-risk proposition, since long delays can affect the business viability of the application
  2. Incremental: new features are built as microservices and migrated to the cloud in phases. The monolith gradually fades out as most of its functionality is modernized into microservices

Some of the critical decisions during the refactoring process are :

  • Which business components to separate from the monolith to become distributed microservices
  • How to decouple the databases from the application to separate data complexity from application logic
  • How to test the new microservices and their dependencies

It is one thing to decouple modules along with their application logic and databases; the challenge is to keep these modules resilient. Modules running side by side must not suffer runtime environment conflicts, such as clashes between different library versions, that lead to errors or failures.

A very common response to this problem was to run each module in its own virtual machine. However, this is an expensive proposition when it comes to server resources. Sometimes the guest OS would consume more resources than the module itself.

Containers

Containers provided a clean solution to this problem of resource allocation and utilization, giving developers and testers consistent software environments all the way from development to production. They provide an encapsulated, lightweight runtime for application modules: multiple applications are deployed on the same server, each running in its own execution environment isolated from the others, thus avoiding conflicts, errors, and failures.

Let me draw an analogy,

Shipping containers disrupted global trade by significantly slashing the cost of transportation. In 1956, an oil tanker carried 58 shipping containers from Newark to Houston. Prior to this, docks used to be flooded with swarms of workers who would load and unload shipments manually. The whole process was haphazard, wasteful, and expensive. Shipping containers don’t care which ship they’re on. As long as the ship is sturdy enough, the critical parameters for shipping goods, such as temperature and moisture, are taken care of by the container. Similarly, Docker containers are self-sufficient as long as they are hosted on the right OS. That’s their contract with the external environment.

Shipping containers transformed the economics of trade beyond cost advantages. New global consumption patterns were created, new ports gained prominence, and new industries sprang up. Similarly, containers transformed the cloud migration process, giving more power to developers and powering the DevOps philosophy.

Containers also provide higher server utilization, scalability of modules, interoperability, and ease of integration. One of the popular ways of setting up containers is Docker. One has to build the Docker configuration just once, and all developers can then start the system with a docker run command, largely irrespective of the underlying operating system.

The following diagram illustrates the difference between virtual machines and containers

Meet the orchestra conductor for containers: Kubernetes

We can manually maintain a couple of containers, or write scripts to manage the lifecycle of dozens of them. But it is very hard to manually maintain hundreds or thousands of containers running on a global infrastructure, and this is where orchestrators make things much easier for operators.

Kubernetes was originally created by Google engineers and released as an open-source project. Google donated Kubernetes to the Cloud Native Computing Foundation back in 2015. It is a container-orchestration system for automating deployment, scaling, and operations of application containers across clusters of hosts. It runs above the OS and abstracts the underlying infrastructure, so developers don’t need to know where their applications are running.

The main elements of a Kubernetes cluster are the Master Node (called the control plane in current Kubernetes documentation), Worker Nodes, and Pods. The components that make global decisions about the cluster, like the API server, are located on the Master Node.

Kubernetes Architecture

Kubernetes Master Node

The Master Node administers the Worker Nodes and assigns individual tasks to each. It is responsible for establishing and maintaining communication within the cluster and for distributing workloads across nodes.

  • API Server: The front end of the cluster; all other components communicate through it.
  • etcd cluster: A lightweight distributed key-value store that holds all cluster data.
  • Controller: Uses the API Server to monitor the state of the cluster and tries to move the actual state toward the desired state declared in the manifest file.
  • Scheduler: Assigns newly created pods to worker nodes. It filters out nodes that cannot run the pod and scores the rest, for example by available resources, to balance the workload.
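The controller and scheduler ideas above can be sketched in plain Python. This is a toy model under stated assumptions, not the Kubernetes API: the Cluster class, the pod naming, and the "most free CPU wins" scoring rule are all made up for illustration, and the real scheduler weighs many more factors.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for cluster state (hypothetical, not a Kubernetes object)."""
    pods: list = field(default_factory=list)
    next_id: int = 0


def reconcile(cluster, desired_replicas):
    """One pass of a control loop: compare actual vs desired state, then act."""
    actions = []
    while len(cluster.pods) < desired_replicas:
        name = f"pod-{cluster.next_id}"
        cluster.next_id += 1
        cluster.pods.append(name)
        actions.append(("create", name))
    while len(cluster.pods) > desired_replicas:
        actions.append(("delete", cluster.pods.pop()))
    return actions


def pick_node(free_cpu_by_node, cpu_request):
    """Filter out nodes that cannot fit the pod, then score the rest by free CPU."""
    candidates = {
        node: free for node, free in free_cpu_by_node.items() if free >= cpu_request
    }
    if not candidates:
        return None  # nothing fits; the pod would stay pending
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    cluster = Cluster()
    print(reconcile(cluster, 3))  # three creates
    print(reconcile(cluster, 3))  # nothing to do, state already matches
    print(reconcile(cluster, 1))  # two deletes
    print(pick_node({"node-a": 1.0, "node-b": 3.0}, cpu_request=2.0))
```

The key design point is that the operator only declares the desired state (three replicas); the loop works out the actions, and running it again when nothing has changed does nothing.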

Kubernetes Worker Node

Worker Nodes are the machines where the containerized workloads and storage volumes are deployed.

  • Kubelet: An agent that runs on each node and responds to the master’s requests to create, destroy, and monitor pods on that machine.
  • Kube-proxy: A network proxy that maintains network communication to your pods from inside or outside the cluster.
  • Pod: The smallest unit of scheduling in Kubernetes. It is a ‘wrapper’ around the container that holds the application code. To scale your app horizontally within a Kubernetes cluster, you add or remove pods. A node can host multiple pods.
  • Container Runtime Engine: Retrieves images from a container image registry and starts and stops containers. This is usually third-party software, such as Docker.

DevOps, the core philosophy

Microservices and Containerization are great technologies but any cloud-native strategy will be incomplete without DevOps.

According to AWS … “DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity. Under a DevOps model, development and operations teams are no longer “siloed.” Sometimes, these two teams are merged into a single team where the engineers work across the entire application lifecycle, from development and test to deployment to operations, and develop a range of skills not limited to a single function”

Implementing a DevOps approach requires a fundamental shift in organization design: the communication structures and the culture of the organization. Companies need to embrace open communication channels between departments.

There are clear benefits to adopting DevOps at your organization. As per the “State of DevOps report 2019”, comparing the elite group against the low performers, elite performers have …

  • Code deployment frequency: 208 times more frequent
  • Time to recover from incidents: 2,604 times faster
  • Lead time from commit to deploy: 106 times faster
  • Change failure rate: 1/7 as likely to fail

At the heart of the DevOps movement is the process of Continuous Integration & Continuous Deployment (CI/CD) :

  • Continuous integration (CI): To improve efficiency in application development, multiple developers work simultaneously on different features of the same app. Continuous integration is the practice of developers merging their code back into a shared repository, sometimes as frequently as every hour. Once changes are merged, they are validated during the build and automated test phases, so problems can be found and fixed faster.
  • Continuous delivery/deployment (CD):
      • Once the development pipeline is ready with CI, continuous delivery focuses on keeping the codebase ready for deployment to a production environment. The goal is to be able to deploy an app to production quickly and easily
      • Continuous deployment goes one step further and automates releasing the app to production

From a developer’s point of view, a code change to a cloud application can go live within hours of being made, provided it passes the automated tests. This shortens feedback loops significantly and makes each release less risky.
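The "go live only if the automated tests pass" idea boils down to a gated sequence of stages. Here is a minimal sketch in Python; the stage names and the run_pipeline helper are hypothetical and do not correspond to any particular CI/CD tool.

```python
def run_pipeline(stages):
    """Run stages in order and stop at the first failure.

    `stages` is a list of (name, step) pairs, where each step is a callable
    that returns True on success and False on failure.
    """
    completed = []
    for name, step in stages:
        if not step():
            # A failed gate blocks everything after it, including deployment.
            return {"deployed": False, "failed_stage": name, "completed": completed}
        completed.append(name)
    return {"deployed": True, "failed_stage": None, "completed": completed}


if __name__ == "__main__":
    # Placeholder steps; a real pipeline would compile, test, and release here.
    healthy = [
        ("build", lambda: True),
        ("automated tests", lambda: True),
        ("deploy to production", lambda: True),
    ]
    broken = [
        ("build", lambda: True),
        ("automated tests", lambda: False),
        ("deploy to production", lambda: True),
    ]
    print(run_pipeline(healthy))
    print(run_pipeline(broken))
```

In the second run the deploy step never executes, which is the whole point: a change that fails its tests cannot reach production.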

DevOps is the philosophy that powers cloud-native technologies, while CI/CD are the practices that enable such a shift.

That’s a nice pivot to explore the Netflix case study.

Netflix Case Study …

In August 2008, Netflix suffered a major database corruption and could not ship DVDs to members. At the time they were building out their streaming service, and they realized that a disruption like this could be catastrophic to the business, potentially costing at least $1.8 M of revenue per day. They decided to move away from vertically scaled single points of failure, like relational databases in the datacenter, towards highly reliable, horizontally scalable, distributed systems in the cloud. Netflix was a pioneer: it began adopting the cloud as early as 2008, whereas most of the Fortune 500 remained skeptical until the mid-2010s.

If you plot the cloud adoption curve for enterprises from 2010 all the way to 2020, this is what the graph would look like:

Netflix had to make a lot of difficult choices during the cloud migration journey. A lift and shift of the existing system to AWS would have resulted in a suboptimal migration, carrying all their existing problems and limitations into the cloud.

They chose the cloud-native approach instead, rebuilding almost all of their technology and fundamentally changing how they operated.

The monolithic application was broken up into hundreds of microservices. Coming from a world of budget approvals, centralized release coordination, and multi-week hardware provisioning cycles, they adopted DevOps principles to enable continuous integration and continuous delivery. Engineering teams made independent decisions to innovate and build new systems.

Netflix’s cloud migration timeline would look as follows :

From a metric standpoint, Netflix set three objectives :

  1. Scalability
  2. Performance, and
  3. Availability

Scalability

Netflix’s subscriber base grew exponentially, both in the United States and internationally. This kind of expansion would not have been possible with their own data centers. With the cloud, it takes a few minutes to spin up thousands of servers. Expansion into international markets became simpler, as they no longer had to plan data center capacity or outlay significant capital and resources. They could operate pretty much like a startup in each international market, beginning with a tiny footprint and minimal investment and scaling as customer traffic built up.

As you can see from the statistics below, an exponential scale journey couldn’t have been possible by playing safe in traditional environments

Performance

AWS provided both economies of scale and elasticity through pay-as-you-use pricing. As a result, the cost per streaming start fell by 87%, to roughly one-eighth of what it was.

The prime time for Netflix’s subscribers starts at 7 PM and viewing drops off sharply by midnight in each local region. Almost nobody watches Netflix from 2–7 AM on weekdays. So, by growing and shrinking the size of clusters as traffic grows and shrinks, they can be extremely cost-effective.
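To see why elastic clusters are cheaper, compare paying for peak capacity all day with paying only for what each hour needs. The sketch below uses made-up numbers: the hourly traffic profile and the per-server capacity are assumptions for illustration, not Netflix data.

```python
import math


def servers_needed(requests_per_second, capacity_per_server, min_servers=1):
    """Smallest cluster that can absorb the given load."""
    return max(min_servers, math.ceil(requests_per_second / capacity_per_server))


def elastic_server_hours(hourly_traffic, capacity_per_server):
    """Server-hours used when the cluster is resized every hour."""
    return sum(servers_needed(rps, capacity_per_server) for rps in hourly_traffic)


def fixed_server_hours(hourly_traffic, capacity_per_server):
    """Server-hours used when the cluster is sized for the peak all day."""
    peak = servers_needed(max(hourly_traffic), capacity_per_server)
    return peak * len(hourly_traffic)


if __name__ == "__main__":
    # Hypothetical day: quiet for 19 hours, then a 7 PM to midnight prime time.
    traffic = [50] * 19 + [1000] * 5   # requests per second, per hour of the day
    capacity = 100                     # requests per second one server can handle

    elastic = elastic_server_hours(traffic, capacity)
    fixed = fixed_server_hours(traffic, capacity)
    print(f"elastic: {elastic} server-hours, fixed: {fixed} server-hours")
    print(f"saving: {1 - elastic / fixed:.0%}")
```

With this invented profile the elastic cluster uses 69 server-hours against 240 for the fixed one. The exact saving depends entirely on how peaky the traffic is.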

Availability

When you want to build the world’s best streaming product, a lot of things have to fall into place: great UI personalization, a deep content catalog, and support for a lot of devices. But at the end of the day, the service has to work.

Netflix set a goal of four nines (99.99%) of service uptime, which means they cannot be down for more than about 60 seconds in any given week, even at peak capacity. Interestingly, they were able to achieve this as early as 2012–13 in their journey.
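The 60-second figure follows directly from the arithmetic: 0.01% of a week is a little over a minute. A quick check in Python (the function name is just for illustration):

```python
def downtime_budget_seconds(availability, period_seconds):
    """Maximum downtime allowed in a period for a given availability target."""
    return (1 - availability) * period_seconds


WEEK_SECONDS = 7 * 24 * 60 * 60      # 604,800 seconds
YEAR_SECONDS = 365 * 24 * 60 * 60    # 31,536,000 seconds

if __name__ == "__main__":
    per_week = downtime_budget_seconds(0.9999, WEEK_SECONDS)
    per_year = downtime_budget_seconds(0.9999, YEAR_SECONDS)
    print(f"four nines allows about {per_week:.1f} seconds of downtime per week")
    print(f"or about {per_year / 60:.1f} minutes per year")
```

Each extra nine shrinks the budget by a factor of ten, which is why going from three nines to four is a serious engineering commitment rather than a rounding exercise.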

By leveraging cloud architecture they could build in redundancy and fault tolerance. That means designing a system in which the failure of individual components cannot affect the availability of the entire system. They used techniques like graceful degradation and deliberately induced failures with tools such as Chaos Monkey, part of what they called the Simian Army. In a nutshell, they were designing antifragile systems.
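Graceful degradation and deliberate failure injection fit together naturally: you break a dependency on purpose and check that users still get something useful. The sketch below is an in-process illustration only. The real Chaos Monkey terminates instances in running environments; the function names, the fallback list, and the recommendation service here are all hypothetical.

```python
import random

# Hypothetical non-personalized fallback shown when personalization is down.
POPULAR_TITLES = ["popular-1", "popular-2", "popular-3"]


def personalized(user_id):
    """Stand-in for a call to a hypothetical recommendation microservice."""
    return [f"pick-for-{user_id}"]


def chaotic(func, failure_rate, rng):
    """Wrap a dependency so it fails at random, to rehearse real outages."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise RuntimeError("injected failure")
        return func(*args, **kwargs)
    return wrapper


def get_recommendations(user_id, fetch):
    """Degrade gracefully: if personalization fails, serve popular titles."""
    try:
        return fetch(user_id)
    except Exception:
        return POPULAR_TITLES


if __name__ == "__main__":
    rng = random.Random()
    flaky = chaotic(personalized, failure_rate=0.5, rng=rng)
    for _ in range(5):
        # Every call returns a usable list, whether or not the dependency failed.
        print(get_recommendations("u1", flaky))
```

The user-facing function never raises, so a failure in one component shows up as a slightly less personal home page instead of an outage.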

CNCF (Cloud Native Computing Foundation) nurturing the ecosystem

The Cloud Native Computing Foundation (CNCF) is a Linux Foundation project that was founded in 2015 with a charter to make cloud-native computing ubiquitous. CNCF contributes to this ecosystem by stewarding projects, organizing conferences, and promoting technologies and common technical standards. CNCF also lists a lot of interesting case studies on its website.

CNCF’s membership has grown rapidly in the last five years to more than 450 members.

Key takeaways

  1. Cloud-native is different from merely “operating on the cloud” via lift and shift. It refers to building applications from the ground up with the design principles and architecture of the cloud
  2. Through a complete philosophy that leverages microservices, containers, and DevOps, cloud-native offers better scalability, elasticity, and security, significantly lower costs, and much faster development and launch of applications
  3. This is a high-growth ecosystem nurtured by CNCF that attracts talent, enterprises, and investors. The market valuation of public cloud companies is $1 Trillion, growing at 45%, and that of the top 100 private cloud companies is $267 B, growing at 60%
  4. Netflix is one such interesting case study; for more, refer to the catalog of case studies on the CNCF website

Footnotes

https://aws.amazon.com/devops/what-is-devops/

https://github.com/cncf/toc/blob/master/DEFINITION.md

https://www.youtube.com/watch?v=CZ3wIuvmHeM&t=770s

https://netflixtechblog.com/the-netflix-simian-army-16e57fbab116

https://www.youtube.com/watch?v=QJ4fODH6DXI

https://www.youtube.com/watch?v=5U-6sxR5DaQ&t=133s

https://mael.substack.com/p/cloud-native-viewed-by-a-vc-investor

https://phoenixnap.com/kb/what-is-kubernetes

GOTO 2018 • Developing a Chaos Architecture Mindset • Adrian Cockcroft

https://services.google.com/fh/files/misc/state-of-devops-2019.pdf

https://hbr.org/sponsored/2020/10/dont-focus-on-digital-transformation-focus-on-quick-strategic-wins

Head — Strategy, Biz Ops, & Corp Dev@ Bristlecone. With Interest ranging from Business to poetry, I learn from multidisciplinary pursuits. Views are personal