DevOps

Some lessons from QCON

Posted by hossg on March 18, 2014 · 11 mins read

DevOps is a software development method that stresses communication, collaboration and integration between software developers and IT Operations (e.g. Support, Helpdesk, Infrastructure).

I view it as a breaking-down of the barriers between these groups and an end to the “silo” mentality, with its associated hand-offs that are essentially waste (in the Lean sense). That’s not to say that the assurance, compliance, etc, that are associated with IT Operations are not important, but really to emphasise that if these can be achieved without the waste of rigid silo-isation, then that can give us a much more efficient and effective organisation.

That was a thrust of a talk by Damon Edwards (@damonedwards) at QCon last week where he spoke about how to introduce DevOps to an organisation, with a focus on doing this from the development team. For a synopsis and copies of the slides, please see: this link.

I also went to a talk by Ola Ellnestam (@ellnestam) about DevOps at a small Scandinavian bank. See this link for a synopsis and slides. Ola’s talk was very interesting, though it felt very unreal to me as the entire staff of this bank is 12 – with only 3 people in IT – naturally giving a DevOps requirement. Of course necessity is the mother of invention, so given a need to operate in a DevOps style, it was very interesting to hear Ola’s experiences and to think how these could be translated to bigger organisations.

Ola said that working in such a small environment forced people to take on responsibility that might not be defined by their nominal role, and that it was essential to focus on Activities not Roles. Funnily enough despite having worked in some organisations several orders of magnitude larger than Ola’s, the same characteristic seems important to me. In big hierarchical organisations people have no way of navigating and understanding the complex structures, decision-making processes, etc, and instead look to the Role that people have as a proxy. Then they look inward and focus on their own Role, and the definition of that role in turn becomes a proxy for achievement, recognition, reward and self-worth. Promotion, rank, grade, job title all assume far greater apparent value than they deserve and occupy an unnatural amount of attention – and thus result in Waste.

Operating at such a small scale also forces Ola to focus on minimalism and pragmatism. There’s a note of caution here – if you only focus on these characteristics there’s a danger of only producing “duct-tape” solutions to problems that stand the test of neither time nor change in an organisation. Ola also espouses a need to look at Holism – the understanding of the whole, and when this is also brought into the equation it’s possible to look for really powerful but Lean solutions. Ola summed it up with a great slide: [image: DevOps]

Damon also brought in concepts of Holism when he spoke about “Seeing the System” and “Organisational Alignment”, but more on these later.

From a more practical standpoint, Ola talked about the need for reducing the time taken to roll-back, and he described this brilliantly as having a Ctrl-Z strategy. We always undo typos and mistakes when we’re working at a computer, so why should doing releases/deploys be any different? Being able to do this would certainly give more confidence in an ability to “release now” which was a central theme of the talk about deployment at Etsy. Interestingly at Etsy they cannot roll-back – their release cycles are so frequent, they adopt a roll-forward approach.

One of the rationales for this Ctrl-Z strategy is in order to have better control over the “error surface”, which is the impact of an error multiplied by its duration. I don’t think this is something that is easily quantified in most circumstances, but the point is this: it’s very difficult to know whether a given error (e.g. a typo) will result in a large impact or a small impact. One thing you can try and control however is the duration of that impact – and a key technique for this is the ability to easily and swiftly roll-back, i.e. Ctrl-Z.
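To make the arithmetic concrete, here is a minimal Python sketch of the error surface as impact multiplied by duration. The `ErrorEvent` type, its units (affected users, minutes) and the numbers are assumptions made up for illustration; none of them come from Ola's talk.

```python
from dataclasses import dataclass


@dataclass
class ErrorEvent:
    """Hypothetical record of a production error (illustrative only)."""
    impact: float            # assumed unit: number of affected users
    duration_minutes: float  # time until the error is fixed or rolled back


def error_surface(event: ErrorEvent) -> float:
    """Error surface = impact multiplied by duration."""
    return event.impact * event.duration_minutes


# The same error with the same impact: only the time taken to undo it differs.
slow_fix = ErrorEvent(impact=200, duration_minutes=90)
quick_undo = ErrorEvent(impact=200, duration_minutes=3)

print(error_surface(slow_fix), error_surface(quick_undo))
```

The impact is the same in both cases because it is the part you cannot predict; shortening the duration is the only lever the sketch changes.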

Ola achieves this Ctrl-Z approach by investing in a pure build – i.e. building and releasing from scratch rather than in an incremental fashion. Clearly an approach like this commoditizes the “running instances” and can be useful in other scenarios – e.g. failover to BCP locations, etc. If you always release by doing a “virgin” build, then it’s easy to have confidence in doing the same in an extreme, disaster-like situation.
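One way to picture this is sketched below, and it is not necessarily how Ola's team does it. Every release is built from scratch into its own directory, and a `current` symlink is repointed at whichever release should be live. The directory layout and the `build_release` and `activate` helpers are hypothetical, and the symlink swap assumes a POSIX filesystem.

```python
import os
import tempfile
from pathlib import Path


def build_release(root: Path, version: str, files: dict) -> Path:
    """Build a release from scratch into a fresh directory of its own."""
    release_dir = root / "releases" / version
    release_dir.mkdir(parents=True)  # raises if this version was already built
    for name, content in files.items():
        (release_dir / name).write_text(content)
    return release_dir


def activate(root: Path, release_dir: Path) -> None:
    """Repoint the 'current' symlink at release_dir via an atomic rename."""
    tmp_link = root / "current.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    tmp_link.symlink_to(release_dir, target_is_directory=True)
    os.replace(tmp_link, root / "current")


root = Path(tempfile.mkdtemp())
v1 = build_release(root, "v1", {"greeting.txt": "hello"})
activate(root, v1)
v2 = build_release(root, "v2", {"greeting.txt": "helo"})  # release with a typo
activate(root, v2)
activate(root, v1)  # Ctrl-Z: the previous build is still intact on disk
print((root / "current" / "greeting.txt").read_text())
```

Because old releases are never modified in place, rolling back is the same operation as releasing, just pointed at an earlier build.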

Other techniques Ola mentioned were to do with database changes. It’s imperative to make these easy to roll-back as well, so they need to be non-destructive where possible, e.g. by use of shadow-copies of DB tables, and by adding columns and renaming old ones, allowing the reverse operations in case of roll-back.
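As a rough sketch of the shadow-copy idea, the Python below uses the standard-library `sqlite3` module. The database engine, the `customers` table and the `full_name` column are all assumptions for illustration; the talk did not specify any of them.

```python
import sqlite3


def migrate_forward(conn: sqlite3.Connection) -> None:
    """Add a column, keeping a shadow copy of the table as it was."""
    with conn:  # commits on success, rolls back on an exception
        # Note: CREATE TABLE ... AS SELECT copies the data but not constraints.
        conn.execute("CREATE TABLE customers_shadow AS SELECT * FROM customers")
        conn.execute("ALTER TABLE customers ADD COLUMN full_name TEXT")
        conn.execute("UPDATE customers SET full_name = name")


def migrate_back(conn: sqlite3.Connection) -> None:
    """Reverse operation: restore the shadow copy under the original name."""
    with conn:
        # Caveat: rows written after migrate_forward are lost by this restore.
        conn.execute("DROP TABLE customers")
        conn.execute("ALTER TABLE customers_shadow RENAME TO customers")


def column_names(conn: sqlite3.Connection) -> list:
    """Return the column names of the customers table."""
    return [row[1] for row in conn.execute("PRAGMA table_info(customers)")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (name) VALUES ('Ada')")
migrate_forward(conn)
print(column_names(conn))
migrate_back(conn)
print(column_names(conn))
```

The forward step only adds things, so the old column and the shadow table remain available until you are confident the release will not be undone.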

Interestingly he moved his team from Subversion to Git and explained that the focus of development should be on conversations rather than complex-locking. In some ways there’s no surprise here, but the most interesting aspect to me is that he found value in the SCM switch (and the cultural, behavioural switch!) even for a team of three. I’m a big fan of Git, but it’s reassuring to hear about its benefits even in very small teams. The final notion that I took from Ola was a fantastic metaphor:

How do you eat an Elephant?

The normal approach is to cut it up into (small!) slices! Ola’s answer is instructive: better to shrink it into a small elephant! This is a great way of thinking about Agile – do the “whole of something” many times, but on a small scale, rather than trying to do “part of something” big and hoping you can get through it all!

How to make the transition to a DevOps model?

Back to Damon’s talk now. Damon presented a “recipe” for introducing DevOps to an organisation, from a developer’s perspective. I’ll summarise that recipe here, though it seems to me to require quite an intensive effort, and funnily enough to be fairly “non-agile” in its approach (i.e. the implementation, not the end-state). I say that because I think it requires some considerable blocked-out time for a team to analyse their current processes, etc, and agree a program of change. Damon himself suggests a workshop lasting a few days to kick things off, though he does say with practice this can be significantly reduced. In my own organisation, we’re only likely to do this once (we are pretty small) and hence this represents a pretty high fixed cost. Furthermore, I think we have other lower-hanging fruit to pick first, for example introducing some Agile development practices, and being more selective about what we build versus what we buy.

Anyway, all that being said here’s Damon’s DevOps Recipe for a Developer:

  1. Socialize the concepts and vocabulary
  2. Visualize the system:
    • value stream mapping
    • time analysis
    • waste
  3. Pick metrics that matter
  4. Identify projects, experiment, measure against the baseline, and repeat steps 2-4

Damon’s recipe does seem to me to involve some pretty heavy lifting in terms of the value stream mapping and the time analysis. I guess these things could be done quickly – but I suspect you need considerable experience to be able to do this, and some strong guidance and mentoring. He suggested doing this analysis by looking at two things:

  • the flow of artefacts
  • the flow of information

Waste is a somewhat easier concept for me, at least inasmuch as I am familiar with Lean, so I find the wastes more natural to look for and identify. The seven wastes of software development:

  • Waste #1 – Partially Done Work
  • Waste #2 – Extra Features
  • Waste #3 – Relearning
  • Waste #4 – Handoffs
  • Waste #5 – Delays
  • Waste #6 – Task Switching
  • Waste #7 – Defects

When it comes to metrics (clearly an important topic if you’re looking to iteratively improve something), Damon suggested:

  • cycle time
  • mean time to detect
  • mean time to repair
  • quality at the source (scrap) – how often does the problem escape the place it was created?

He also emphasised the need to tie the metrics to the individual, i.e. to be able to answer the question: “What can I do to improve the metrics?”
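To show how the first three metrics might be computed from timestamps, here is a small Python sketch. The `WorkItem` and `Incident` records are hypothetical, and the definitions are my assumptions: cycle time runs from starting work to release, time to detect from introduction to detection, and time to repair from detection to repair. Other definitions are in common use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class WorkItem:
    started: datetime   # when work on the change began
    released: datetime  # when the change reached production


@dataclass
class Incident:
    introduced: datetime  # when the problem entered production
    detected: datetime    # when somebody (or something) noticed it
    repaired: datetime    # when it was fixed or rolled back


def _mean_delta(deltas) -> timedelta:
    """Average a non-empty collection of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def cycle_time(items) -> timedelta:
    """Mean time from starting a work item to releasing it."""
    return _mean_delta([i.released - i.started for i in items])


def mean_time_to_detect(incidents) -> timedelta:
    """Mean time from a problem being introduced to being detected."""
    return _mean_delta([i.detected - i.introduced for i in incidents])


def mean_time_to_repair(incidents) -> timedelta:
    """Mean time from detection to repair (assumed definition)."""
    return _mean_delta([i.repaired - i.detected for i in incidents])
```

Measuring the same functions over successive periods gives the baseline that step 4 of the recipe compares against.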

When it comes to implementation, Damon suggested viewing the transition from the point of view of the challenges that an Ops group faces, for example a desire for Audit, Compliance, predictable workload etc, and then suggesting solutions that can specifically address those, but in an automated way. For example, address the “queuing” problem of hand-offs with Chef/Runbook deploys, self-service tooling and so on, but demonstrate how these can be Audited, Secure, etc.

Damon also suggested launching with a “burst of energy” which he proposed would be in the form of a multi-day workshop, along with a “brand name” to help rally the cause and establish a common goal, e.g. “Ticketless IT”. While I can see how that would work, I think the bigger gains to be had in my own organisation are probably to be found elsewhere (other than DevOps) so I think my own approach will be to look for some specific, obvious, examples of waste and to try and address these initially and then look to DevOps as a second or third stage transition.