Ansible: A tool to assist in automating system configuration. Red Hat tool – Scale IT Automation and Deliver Value Continuously.
See also Puppet, Chef, for loops & sysadmin grunt work.
Background
As a consultant, I am tasked with building systems for customers. Occasionally I have to build a shed load of these systems. While I can build one and then clone it, this approach will only get me so far before I am presented with an estate of machines where most of the systems do not meet the client’s specific requirements.
In addition, at some point in the future, the customer’s virtualisation estate will go south and they’ll lose half of their VMs (meaning half of the production estate has been erased from history).
Solution
In steps Ansible with its ability to define [m] systems that meet [x] specification, [n] systems that meet [y] specification and [o] systems that meet [z] specification.
Oh, and did I mention that halfway through the build process, the client (or their third-party software vendor) will create additional requirements [u], [v] & [w] that are mapped to some of [m], all of [n] and none of [o]? In addition, they have suddenly discovered that they need to resize the estate to include DR, which has the same requirements as production, but with a different naming convention (which is defined in [x], so now we have [x1] too).
With Ansible, I get to define the requirements for these systems in code, which means I can look for common recurring themes (such as a subset of common users, or installed packages), and reuse/reduce the amount of work I have to do.
After the virtualisation failure, in steps Ansible again, allowing me to reconfigure those systems with just a few commands.
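One way to picture the mapping of systems to specifications is as inventory groups. The following is a minimal Python sketch, assuming the JSON layout that Ansible dynamic inventory scripts return for `--list` (one entry per group with a `hosts` list, plus a `_meta.hostvars` entry). Every host name, group name and the `dr-` prefix is hypothetical and only illustrates the shape of the problem.

```python
import json

# Hypothetical hosts grouped by the specification they must meet ([x], [y], [z]).
SPEC_GROUPS = {
    "spec_x": ["app01", "app02", "app03"],
    "spec_y": ["db01", "db02"],
    "spec_z": ["web01"],
}

# Late requirement [u]: some of the [x] systems, all of the [y] systems, none of [z].
EXTRA_GROUPS = {
    "req_u": ["app01"] + SPEC_GROUPS["spec_y"],
}


def add_dr(groups, prefix="dr-"):
    """Give every group a DR twin whose hosts follow a different naming convention."""
    twins = {
        f"{name}_dr": [prefix + host for host in hosts]
        for name, hosts in groups.items()
    }
    return {**groups, **twins}


def build_inventory(spec_groups, extra_groups):
    """Return a dict shaped like dynamic-inventory `--list` output."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, hosts in {**spec_groups, **extra_groups}.items():
        inventory[name] = {"hosts": sorted(hosts)}
    return inventory


inventory = build_inventory(add_dr(SPEC_GROUPS), add_dr(EXTRA_GROUPS))
print(json.dumps(inventory, indent=2))
```

When the client adds requirement [v] next week, it becomes one more entry in `EXTRA_GROUPS` rather than a round of manual edits across the estate.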
What can it do for me?
- I can abstract the requirements for the systems from the Ansible code that configures those attributes, so that I can more easily adjust and re-use the configuration
- I can express requirements as lists wherever Ansible is suited to handling them that way (e.g. users, installed packages, …)
- I can rebuild systems later in the day if the current ones are corrupted/polluted/lost (part of a bare-metal recovery strategy)
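The idea of keeping requirements as plain lists, separate from the code that applies them, can be sketched in Python. This is only an illustration of spotting the common recurring themes, not how Ansible itself stores anything. The specification names, users and packages are all made up.

```python
# Hypothetical requirements per specification; only the structure matters here.
REQUIREMENTS = {
    "spec_x": {"users": {"alice", "deploy"}, "packages": {"chrony", "httpd"}},
    "spec_y": {"users": {"bob", "deploy"}, "packages": {"chrony", "postgresql"}},
    "spec_z": {"users": {"deploy"}, "packages": {"chrony", "haproxy"}},
}


def factor_common(requirements):
    """Split requirements into what every spec shares and what is spec-specific."""
    kinds = ("users", "packages")
    # Items present in every specification are candidates for a shared definition.
    common = {
        kind: set.intersection(*(spec[kind] for spec in requirements.values()))
        for kind in kinds
    }
    # Whatever is left over stays with the individual specification.
    specific = {
        name: {kind: spec[kind] - common[kind] for kind in kinds}
        for name, spec in requirements.items()
    }
    return common, specific


common, specific = factor_common(REQUIREMENTS)
print("common:", {kind: sorted(items) for kind, items in common.items()})
for name, spec in specific.items():
    print(name, {kind: sorted(items) for kind, items in spec.items()})
```

The shared part is written once and applied everywhere, so each specification only carries its own differences.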
Why use it?
- Speed (faster than for loops and a damn sight faster than a sysadmin logging into each system in turn)
- Reliability (does exactly the job that the code defines)
- Reproducibility (will do the same thing until you redefine the code)
Lessons learned
- Test before you go live
- Use modules: they have been well tested and will adjust for different OSes where applicable
- Test before you go live
- Read the module descriptions – some things don’t work as expected. For instance, creating a user on Red Hat by hand will automatically create a group of the same name by default, whereas in Ansible the create user task will throw an error if you try to map the user to a group that does not exist yet
- Test before you go live
- The first time you create an Ansible definition to build a customer’s systems for you it will take a considerable amount of time – mainly learning the painful way, plus some time to build a test infrastructure and time to redefine your code when you realise it doesn’t work or the client changes their mind.
- The second time will whizz past & you’ll actually have a smile on your face when you realise you’ve done it, especially if the second client has a lot in common with the first.
- In an ideal world, you’ll continue to use Ansible to make the ad-hoc changes from now on – meaning that the Ansible configuration will always be in sync with how the real system should be. That means when you rebuild/replace/augment the current estate, the target will be exactly what you need, with no unexplained differences cropping up. If you don’t use Ansible for this and find that operators and/or admins are logging in and making manual changes, then your reliance on Ansible to replace/augment the target estate after a disaster or when your capacity requirements change will be misplaced.
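The user and group gotcha above comes down to ordering: make sure the group exists before the user that is mapped to it. Here is a small Python sketch that builds the two tasks as data, in the shape of a playbook’s task list. It assumes the `ansible.builtin.group` module (parameters `name`, `state`) and the `ansible.builtin.user` module (parameters `name`, `group`, `state`). The `deploy` user is hypothetical, and the sketch only prints the structure without running anything against a target.

```python
import json


def user_tasks(username, group):
    """Build two task dicts: ensure the group exists, then create the user in it."""
    return [
        {
            "name": f"Ensure group {group} exists",
            "ansible.builtin.group": {"name": group, "state": "present"},
        },
        {
            "name": f"Ensure user {username} exists",
            "ansible.builtin.user": {
                "name": username,
                "group": group,  # primary group, created by the task above
                "state": "present",
            },
        },
    ]


tasks = user_tasks("deploy", "deploy")
print(json.dumps(tasks, indent=2))
```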
What you need before you start
- Somewhere to run Ansible from (in our case a system we call the control host – a jumpbox that we use specifically for running Ansible jobs)
- Targets to connect to – each target needs a base OS already in place, as Ansible doesn’t do the bare-metal builds/recovery (try Spacewalk, Cobbler, cloned sysprepped images, DRBL or Clonezilla for that)
- A way to connect to the target systems (usually SSH – meaning that the target needs to have at least a known address)
- Target name resolution (DNS or a hosts file is suitable) for SSH to work. If you’re part of a team working on this project then you really need to use DNS.
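Before pointing Ansible at anything, it can save time to confirm that every target name actually resolves from the control host. A minimal Python sketch of that check follows, using only the standard library. The host names and addresses are hypothetical, and the demo swaps in a stand-in resolver so it runs without a network.

```python
import socket


def check_resolution(hosts, resolve=socket.gethostbyname):
    """Map each host name to its resolved address, or None if the lookup fails."""
    results = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            results[host] = None
    return results


def unresolved(results):
    """Return the sorted names that did not resolve."""
    return sorted(host for host, address in results.items() if address is None)


# Stand-in resolver with made-up entries (addresses from a documentation range).
FAKE_DNS = {"app01": "192.0.2.11", "db01": "192.0.2.21"}


def fake_resolve(host):
    try:
        return FAKE_DNS[host]
    except KeyError:
        raise socket.gaierror(f"cannot resolve {host}") from None


results = check_resolution(["app01", "db01", "web01"], resolve=fake_resolve)
print("unresolved:", unresolved(results))
```

On a real control host, drop the `resolve=` argument so the system resolver (DNS or hosts file) is used instead.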
What you need for testing
- A test control host
- A test infrastructure that you can throw away & rebuild over and over again (VMs with snapshots, Vagrant scripted system deployments, …) including any necessary network access, DNS, etc.
- A means to confirm that what you are building is correct (Ansible has a check feature, but if you’ve defined the Ansible configuration wrong, the check will pass even though the target is wrong, capisce?)
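An independent check might look like the Python sketch below: compare what the client asked for against what is actually on the targets, without reusing the Ansible definitions as the source of truth. Both data sets are hypothetical. `expected` would be written from the client’s requirements and `observed` gathered from the targets by whatever separate means you trust.

```python
def find_drift(expected, observed):
    """Return {host: {key: (expected, observed)}} for every mismatch."""
    drift = {}
    for host, spec in expected.items():
        actual = observed.get(host, {})
        diffs = {
            key: (want, actual.get(key))
            for key, want in spec.items()
            if actual.get(key) != want
        }
        if diffs:
            drift[host] = diffs
    return drift


# Hypothetical data: `expected` from the requirements, `observed` from the targets.
expected = {
    "app01": {"users": ["alice", "deploy"], "packages": ["chrony", "httpd"]},
    "db01": {"users": ["bob", "deploy"], "packages": ["chrony", "postgresql"]},
}
observed = {
    "app01": {"users": ["alice", "deploy"], "packages": ["chrony", "httpd"]},
    "db01": {"users": ["deploy"], "packages": ["chrony", "postgresql"]},
}

for host, diffs in find_drift(expected, observed).items():
    for key, (want, got) in diffs.items():
        print(f"{host}: {key} expected {want}, found {got}")
```

The same comparison also flags manual changes made outside Ansible, since those show up as differences between the two sides.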
/me – riding off into the sunset
Then I remember I don’t know how to ride a horse – there’s an Ansible module for that, isn’t there?