Friday, 13 June 2014

A template for the team weekly roundup

While working in a previous position, I noticed that although the team discussed a lot, the right things weren't always coming up. That team supported live television services, so it was important that engineers were up to date on outstanding problems, workarounds and changes. I found that even when engineers sat next to each other, there was no guarantee that they would exchange the information each of them needed. So I initiated a roundup meeting that would occur on a Friday to solve this issue.

The focus of the weekly roundup was not to review everything that happened in the week, but to determine where problems were and communicate activity to the team. Action points were captured out of the meeting and assigned later. Minutes were sent out that day. Always write some kind of minutes for review meetings, otherwise decisions and knowledge are lost. The meeting was limited to 30 minutes.

Weekly Roundup Template

  • Changes: Any failed changes or changes that took longer than one hour to complete? (In that environment, most well-written changes could be performed in less than an hour. Any longer and there was probably something wrong.)
  • Incidents: Any repeated incidents?
  • Projects
    • What's been done this week
    • What's the next step
  • The Good - What went well
  • The Bad - What should we be doing better
  • Any Other Business - anything else people want to discuss
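
For teams that keep change and incident records in a tool, the first two agenda items can be prepared before the meeting. The following is only a minimal sketch in Python: the record fields (summary, duration, succeeded), the incident labels and the sample data are all hypothetical and do not come from the original team's tooling.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import timedelta

# Threshold taken from the template: changes over one hour deserve a closer look.
LONG_CHANGE = timedelta(hours=1)


@dataclass
class Change:
    """A hypothetical change record; the field names are assumptions."""
    summary: str
    duration: timedelta
    succeeded: bool


def changes_to_discuss(changes):
    """Return changes that failed or took longer than one hour."""
    return [c for c in changes if not c.succeeded or c.duration > LONG_CHANGE]


def repeated_incidents(incident_labels):
    """Return labels that occurred more than once, with their counts."""
    counts = Counter(incident_labels)
    return {label: n for label, n in counts.items() if n > 1}


# Made-up sample data for illustration only.
week_changes = [
    Change("Patch encoder firmware", timedelta(minutes=40), True),
    Change("Upgrade scheduling database", timedelta(minutes=95), True),
    Change("Rotate certificates", timedelta(minutes=20), False),
]
week_incidents = ["audio drift", "disk full", "audio drift"]

for change in changes_to_discuss(week_changes):
    print("Discuss change:", change.summary)
for label, count in repeated_incidents(week_incidents).items():
    print("Repeated incident:", label, "x", count)
```

The output is only a list of candidates for the agenda; deciding what was actually wrong with a slow change or a repeated incident is still a conversation for the team.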

How this helped:
  • The team had greater awareness of important activity.
  • Engineers had the opportunity to review problems as a team - engineers have differing levels of expertise and experience. Putting problems to the team enabled all strengths to be applied.
  • Better identification of problems - There were numerous occasions where an engineer had adopted a practice of successfully working around a problem, but the workaround still cost a significant amount of time. Specifically discussing repeated incidents helped bring underlying problems to light.
  • Improvement for team morale - It's frustrating to be firefighting much of the week only to know that next week will be the same. Being able to raise problems and track progress of solutions helped to improve morale.

Not everybody was a fan of the meeting. But I found that in those cases it reflected the engineer's approach to teamwork, rather than the meeting itself. It was especially useful to newer engineers, as they learned much about things that were going on that wouldn't normally be discussed with them because of their limited experience. Overall, it turned out to be a very useful 30 minutes out of the week.

Tuesday, 6 May 2014

We like good documentation, but why don't we like to write it?

'...The problem with Software A is that the documentation is really lacking. Look - there's no reference for these commands, and the online reference is outdated. I don't think the product is mature.'
'Yeah, it takes too long to work out how to do something. Have a look at Product B, the features are similar but they've got tutorials on common tasks and the online docs are much better.'

When we're evaluating a new piece of software, this type of conversation is common. But when we need to create documentation ourselves, the conversation is quite different.

'Mate, I've got a customer who wants to configure a new public-facing web server in a DMZ. Is there documentation for this?'
'Well, what you have to do is talk to Networks and get them to set it up. Actually no, they need information about customer VLANs and the new server needs to be in the same IP range as their other stuff.'
'Ok, where are the IP ranges documented? And how does it work behind the load balancer?'
'For the IP ranges, there's a document on the "S drive" but I can't remember exactly where it is. Not sure about the load balancer, check with John, he set it up. He might have some notes somewhere...'

Our attitude towards internal documentation and what we expect from third parties couldn't be more different. The key impacts of poor internal documentation are:
  • Incidents take longer to resolve on average - For example, if a website becomes unavailable when a load balancer is failed over, it's probably down to missing configuration somewhere. But if the engineer doesn't know or can't remember how the website was provisioned, she will have to work out from scratch how to provision a website.
  • Changes take longer - modifying or improving services takes longer because you have to repeatedly determine the specifics of how a service works
  • Changes are more likely to cause incidents - In the real world, an engineer has a limited amount of time to determine the possible effects of a change. Without adequate documentation about the service, she is less likely to understand fully what the change will do and therefore more likely to cause unintended effects.
  • Engineers' time is wasted repeatedly explaining the same thing - In a team, one person may have to repeat an explanation to different engineers as they require it. But then people often forget, especially if they don't do that thing frequently, so they'll have to ask again, consuming time from both engineers and delaying work.

So why don't we make documentation better?
  1. Too much focus on resolving incidents instead of preventing them - Live incidents get a lot of attention and once they're resolved it's on to the next thing. But is this incident a one-off or has it happened before? Wouldn't writing the solution down be useful to the next engineer? Was the incident a result of a change that wasn't implemented correctly because the implementer didn't know exactly what to do? If we start trying to prevent incidents, the value of documentation becomes more obvious.
  2. A belief that memorizing details is the way to increase knowledge - Some are of the opinion that with more experience, an engineer should be able to remember enough about the environment to manage it effectively. This approach simply doesn't scale well. Once an engineer stops working on something for a while, the details begin to slip away. How many times have you come back to a script or service you built and struggled to remember how it worked?
  3. The cost of poor documentation is not obvious - The biggest effect is loss of productivity - less work takes more time. But the average team probably doesn't track how long tasks take. The lack of a weekly review of incidents and changes tends to hide the fact that people are spending too much time on tasks that could be much quicker.
  4. Lack of skills to solve non-technical problems - Most engineers are geared to technically analyze issues. But there isn't much training or focus on how to identify and resolve non-technical challenges like knowledge sharing or incident prevention. Companies end up recruiting technically proficient teams but don't recruit people who can see the non-technical components.

IMHO, the fact that many engineers don't like to write docs is not the real problem. The real issue is that in the culture of IT Operations, there isn't a strong understanding of how capturing knowledge helps make us better engineers.

Sunday, 23 March 2014

A new leaf for Project Snowflake

I've been a bit disappointed with the current state of Project Snowflake. It was (and is) a much harder problem than I thought it would be, but I think I had determined the key parts of the solution at least three years ago. Coding is nowhere near as far along as I thought it would be.

Sometime in January my wife sent me a link to a video by Anthony Robbins - someone who I would generally describe as a life coach. He confessed that his 20-minute talk was actually intended to be a five-minute talk, and he was somewhat more verbose than needed, but he did have a good point: making a significant change in your life requires a change of habit. This is not a new concept, famously written about by Stephen Covey (The 7 Habits of Highly Effective People), but it does help to get a reminder every so often. Importantly, he stressed the fact that bad routines contributed to getting you where you are now. I wanted to change the way I was working, so I decided to have an honest look at my work to date and see where things might be going wrong...

  • I planned to work on Snowflake regularly, but didn't schedule other parts of my life. I was finding that when I wanted to code, other things often came up which needed to be resolved or attended to: tasks and repairs around the house, errands, shopping for a gift, social engagements, exercise, getting married (well worth it but the eight previous months were major stress!). The list goes on.
  • Not keeping a regular log of my work. I aimed to keep a weekly report for myself but often failed to do it regularly. As a result, I was less conscious of the time I was spending on a given task, particularly when solving problems.
  • Not taking enough regular time out. Obviously I wanted to see the project moving ahead as soon as possible, but I wasn't taking time out to relax regularly, which caused stress and made me less productive.
  • Too much reading, not enough doing - I had to do a lot of reading as there were many firsts on this project. But I feel there were many times where putting things into practice sooner would have advanced my understanding more quickly than reading more about the topic.
  • Imagining technical problems I would face in the mid to long term. Like too many programmers, I spent too much time thinking about scalability, the merits of different APIs, version control systems, etc. It's worth considering these things, but these future problems only become real once you have an application built, which I didn't. The real issue was learning how to build an application fit for purpose in my spare time.

So this year I decided to make some significant changes to get things done:

  1. Make a realistic schedule - One that includes making time for a personal life and relaxation. This is the single most useful thing that I've done.
  2. Keep that weekly log going - Keeping track of what I've done helps me see more quickly when I'm going off track.
  3. Stick to tried and tested technologies unless there's a really good reason. It's hard enough getting tried and tested stuff to do what you want without introducing other complications.
  4. Keep the goals small - Ambition is too often the enemy of progress. It's easier to complete small goals than to try to complete a big one. It's also better for morale.

Work has definitely been more productive since I made these changes and I feel positive about where I'll be six months from now. I've also decided to devote more time to this blog, which has been much neglected of late, so expect updates soon.