Tuesday, 6 May 2014

We like good documentation, but why don't we like to write it?

'...The problem with Software A is that the documentation is really lacking. Look - there's no reference for these commands, and the online reference is outdated. I don't think the product is mature.'
'Yeah, it takes too long to work out how to do something. Have a look at Product B, the features are similar but they've got tutorials on common tasks and the online docs are much better.'

When we're evaluating a new piece of software, this type of conversation is common. But when we need to create documentation ourselves, the conversation is quite different.

'Mate, I've got a customer who wants to configure a new public-facing web server in a DMZ. Is there documentation for this?'
'Well, what you have to do is talk to Networks and get them to set it up. Actually no, they need information about customer VLANs, and the new server needs to be in the same IP range as their other stuff.'
'OK, where are the IP ranges documented? And how does it work behind the load balancer?'
'For the IP ranges, there's a document on the "S drive" but I can't remember exactly where it is. Not sure about the load balancer, check with John, he set it up. He might have some notes somewhere...'

Our attitude towards internal documentation and what we expect from third parties couldn't be more different. The key impacts of poor internal documentation are:
  • Incidents take longer to resolve on average - For example, if a website becomes unavailable when a load balancer is failed over, it's probably down to missing configuration somewhere. But if the engineer doesn't know or can't remember how the website was provisioned, she will have to work out from scratch how it was set up.
  • Changes take longer - Modifying or improving services takes longer because you have to repeatedly work out the specifics of how a service works.
  • Changes are more likely to cause incidents - In the real world, an engineer has a limited amount of time to determine the possible effects of a change. Without adequate documentation about the service, she is less likely to understand fully what the change will do and therefore more likely to cause unintended effects.
  • Engineers' time is wasted repeatedly explaining the same thing - In a team, one person may have to repeat an explanation to different engineers as they need it. People also forget, especially if they don't do that thing frequently, so they'll have to ask again, consuming time from both engineers and delaying work.

So why don't we make documentation better?
  1. Too much focus on resolving incidents instead of preventing them - Live incidents get a lot of attention and once they're resolved it's on to the next thing. But is this incident a one-off or has it happened before? Wouldn't writing the solution down be useful to the next engineer? Was the incident a result of a change that wasn't implemented correctly because the implementer didn't know exactly what to do? If we start trying to prevent incidents, the value of documentation becomes more obvious.
  2. A belief that memorizing details is the way to increase knowledge - Some are of the opinion that with more experience, an engineer should be able to remember enough about the environment to manage it effectively. This approach simply doesn't scale well. Once an engineer stops working on something for a while, the details begin to slip away. How many times have you come back to a script or service you built and struggled to remember how it worked?
  3. The cost of poor documentation is not obvious - The biggest effect is loss of productivity: less work gets done in more time. But the average team probably doesn't track how long tasks take. Without a weekly review of incidents and changes, it's easy to miss the fact that people are spending too much time on tasks that could be much quicker.
  4. Lack of skills to solve non-technical problems - Most engineers are geared towards analyzing technical issues. But there isn't much training or focus on how to identify and resolve non-technical challenges like knowledge sharing or incident prevention. Companies end up recruiting technically proficient teams but don't recruit people who can see the non-technical components.

IMHO, the fact that many engineers don't like to write docs is not the real problem. The real issue is that in the culture of IT Operations, there isn't a strong understanding of how capturing knowledge helps make us better engineers.
