The paradigm of changing state vs. replacing a whole block

I’m an engineer who worked in infrastructure as a systems engineer back when we didn’t know what Jenkins was because it was still called Hudson, and when NGINX had a logo that looked like a Russian jet.

the Jenkins logo at that time
my preferred NGINX logo

At that time nobody spoke about Infrastructure as Code or Immutable Infrastructure, which are fairly recent concepts and best practices.
We had the idea that we needed an agent inside our machines (bare metal) or virtual machines (a new technology back then) to automate our configuration and keep everything on track. A few years into that wave, some authors and specialists started sharing where this had brought us:

agents in a host

1) Working for the tools, not for what the tools can provide

Day to day, in real life, we spent hours creating scripts to install the agents for Zabbix, Nagios, Chef, Puppet, VMware, storage systems, and so on. After that work we had to monitor those agents and keep them updated. When we finished, we didn’t have better configuration management or a better monitoring system, or whatever else those agents could bring us; we just had our environment ready to use. What results could we deliver to the business at this stage? Nothing! We had only been working to satisfy the requirements of the tools we had decided to use.

when tools are more relevant than results…

2) Cloud as commodity & Network

In 2006 we saw Amazon launch AWS, and the cloud movement became more professional than anything we had seen before. We still had thousands of ISPs and hosting providers, but the first global cloud had appeared. Our strategy quickly shifted to adding proxies in those regions to connect our corporate headquarters (HQ) and MPLS/VPN networks with those hosting providers or with Amazon AWS. At that moment our agent issues expanded into proxy issues, and communication problems between agents and central servers became a constant in our backlog.
The network was a fairly static resource; people didn’t have software-defined networking (SDN) yet because it hadn’t been developed. As I mentioned, NGINX was an option growing in the market, Squid was another proxy, and the proprietary tools we had were expensive. We didn’t have load balancers, proxies, and CDNs the way we have them nowadays. That made it impossible to think about Immutable Infrastructure at the time, but we were making progress on Infrastructure as Code.

the global fiber network today

3) What you shake too much can break

In our vision, these agents would not only provision configuration to the hosts but also keep it updated, and here was our huge mistake. Every newly deployed host worked properly until it started receiving its first updates or changes; at that point we began getting kernel panics on our bare-metal servers, and on our virtual machines we got corruption in the system and in the image, and sometimes the VM simply froze completely. In a disaster-recovery scenario, we had to set everything up again, then run and wait for the agents to install all the packages/apps/services and configure everything back to the latest state we had in our backup. It was common in the market for a single incident to stretch on for days until things returned to normal.

making changes to an environment

What have we learned from then until now?

Anyone who works with build systems or software upgrades can understand why this strategy has many more chances of failing in the long term than we on the infra/systems side imagined. From build-system engineers we hear phrases like “clean build”, “avoid the cache”, and so on; they knew much more than us. Thank you, folks, for your years of work creating and sharing these practices with other engineers.
Other types of engineers gave us amazing insights too, such as release engineers. When they start a new release, they create it from scratch: building the packages, configuring the apps/packages, and deploying the new image to test. At that point we can ask: why don’t they install the previous image/release and upgrade it with agents to test a new Linux distribution release? They explained that doing so is not reproducible, trackable, or auditable; you cannot understand where it fails, find the root cause of an issue, and fix it. For them, the old state doesn’t matter; the new state they intend to publish is what matters.
They also say that library conflicts are not simple to solve, and sometimes capabilities break in some apps/packages/services: not only does the behavior change, but the way they are configured changes too. The last question is: how do they test upgrade releases? According to them, the upgrade process is tested from certain versions in certain states to specific targets, and they normally use a dedicated tool for that as well. In that case, upgrading is not the only path but an optional one: if you run into issues, you can always deploy a new release from scratch on a clean host or an empty VM.
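The release engineers’ argument can be sketched with a toy model (hypothetical code, not any real tool): an artifact built as a pure function of a declared spec is reproducible, while a host mutated in place depends on its entire history of changes, so two hosts can silently drift apart.

```python
# Toy model (hypothetical): immutable rebuild vs. in-place mutation.

def build_from_scratch(spec: dict) -> dict:
    """Immutable approach: the artifact is a pure function of the spec.
    Building twice from the same spec always yields the same result."""
    return {"packages": sorted(spec["packages"]), "config": dict(spec["config"])}

def upgrade_in_place(current: dict, change: dict) -> dict:
    """Mutable approach: the result depends on the host's previous state,
    so hosts with different change histories end up different."""
    current["packages"] = sorted(set(current["packages"]) | set(change.get("add", [])))
    current["config"].update(change.get("config", {}))
    return current

spec = {"packages": ["nginx", "openssl"], "config": {"worker_processes": "4"}}

# Rebuilding from the spec is deterministic and auditable:
assert build_from_scratch(spec) == build_from_scratch(spec)

# In-place upgrades drift: a host that missed one change differs forever.
host_a = build_from_scratch(spec)
host_b = build_from_scratch(spec)
upgrade_in_place(host_a, {"add": ["curl"]})           # host_a received this change
upgrade_in_place(host_a, {"config": {"gzip": "on"}})
upgrade_in_place(host_b, {"config": {"gzip": "on"}})  # host_b missed the first change
print(host_a == host_b)  # False: same spec, different histories, different state
```

The point is the same one the release engineers made: with the immutable approach, the spec alone explains the result, so failures are reproducible and the root cause can be found.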

The current state of the market

With all this information and shared knowledge, we might imagine that nowadays nobody uses that type of agent-based tool anymore, the kind that requires specialized knowledge such as a special language, has a long-tail adoption curve, and so on. Unfortunately, you would be wrong, my friend: based on the public statistics aggregated by Datanyze, these tools still hold some market share. They are losing share every year, but they remain in use at some companies.

What tools are leading the new principles of Infrastructure as Code & Immutable Infrastructure?

Ansible vs Microsoft System Center Configuration Manager

Ansible & Microsoft System Center Configuration Manager

Both tools have a similar architecture: agentless, easy to start with today and prove results on the same day, similar market share, and so on. Ansible focuses on the Linux/open-source world and SCCM on the Windows platform, and both share the same ideas that got them to that position.
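As a taste of the agentless model, here is a minimal Ansible playbook sketch; the inventory group `webservers` is an illustrative assumption. Ansible connects over plain SSH and pushes the declared state, so nothing has to be pre-installed on the target beyond Python.

```yaml
# Minimal Ansible playbook sketch: declare the desired state,
# let Ansible converge the hosts over SSH (no agent required).
- name: Ensure nginx is installed and running
  hosts: webservers        # illustrative inventory group
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Ensure nginx is enabled and started
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
```

You would run it with `ansible-playbook -i inventory site.yml`; `ansible.builtin.package` and `ansible.builtin.service` are part of Ansible’s built-in modules, which is what makes the “prove results on the same day” claim realistic.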

market share of these tools

Conclusions & Thoughts

Companies, regardless of their size, have to pay attention to the concepts, best practices, and principles behind their choices of cloud, tools, systems, and so on, because most companies are looking for a DevOps, SRE, or Infrastructure Engineer to accelerate their processes and improve their workflow, doing more with the same human resources; all engineers are limited by the current demand, which is very high.
These professionals have gradually become rarer, the requirements in job postings keep growing, and experience in this area is fundamental. Junior engineers tend to choose a calmer, easier path, studying one, two, maybe five things to become another type of engineer, rather than mastering a group of complex tools, cloud resources, cloud providers, Kubernetes, CI/CD, Python, Golang, Bash, YAML, service mesh, storage, networks, security, compliance, and so on.
The pressure on these professionals is huge, and in my personal opinion, nobody on an infra team is an amateur. Everyone wants to be productive, deliver results faster, receive bonuses, get stock options, and grow quickly in those companies.
We are living through a wave that demands simplicity and a focus on results; Infrastructure as Code and Immutable Infrastructure are definitely the present, and they are on a path to remain the future as well.

