Sunday, 25 August 2013

If it ain't broke, don't forget to upgrade it

Stable systems are great. Day to day, they require little management and they keep doing their job. It makes sense to leave well enough alone and work on other stuff, right? It's a common approach and it kind of makes sense, until you need to update software on that system. That's when the pain begins.

Here are a few examples of straightforward tasks that have ended up being unreasonably time consuming, infeasible due to the number of changes required on production boxes, or near impossible due to the level of risk introduced when changing software that other components depended upon.

  • Installing Dell OpenManage utilities 
  • Updating Subversion
  • Updating Net-SNMP
  • Installing Puppet 
  • Administering EMC CX3 NAS
I've found that the requirement to update software often comes when improving management of the server. The software packages that provide the key services, like databases, application servers or web servers, are not upgraded that often and the update process is fairly straightforward, although updating between major versions tends to require a whole new set of dependencies that may be more challenging to satisfy.

Now, even though you're improving the system to make it more manageable or reliable, it doesn't mean that there won't be objections to upgrades from customers or from within your own team. Some typical objections:
  • It's been working well enough up until now, there's no need to change it.
  • An application developed by the in-house development team will have to be re-tested or changed. They don't have time for that.
  • We've got fires to put out, that's not even burning!
None of these objections is a valid reason not to upgrade something, but they are things you should take into account when planning your approach to upgrades. IT Operations should be adept at keeping things running optimally and proactive in doing so. Waiting for something to break is not a great strategy. The time spent investigating incidents that are actually caused by known bugs will easily outweigh the time taken to make small upgrades. If another application needs to be re-tested or changed, then you'll have to work with the development team to do so. This may be difficult because they won't have the same priorities that Operations has, but as long as the problem with not upgrading can be expressed in ways that affect the running of that application, you should be able to get some traction.

It won't always be easy to make upgrades, and you will have to compromise sometimes. But it's important to have a strategy of regular updates in order to maintain a stable environment.

Sunday, 18 August 2013

Make your ticket system work for you

If you don't have a proper ticket system for your team, get one. By proper ticket system I mean one that handles multiple teams, has email integration, customizable metadata for tickets, and notification of events. Anything else is not really worth wasting your time with. I've heard reasons why a 'good enough' ticket system can be improvised using Excel, SharePoint, a whiteboard, a Perl app knocked together in 27 seconds, etc. But, to be blunt, all the reasons are rubbish. I can't even be bothered to go into the reasons why they're rubbish (unless someone really wants me to). There are enough free and affordable systems for any company to get something fit for purpose.

But if you're one of those people who feel that logging every change you make is an overhead you could do without, or that you know so much you don't need to worry about tracking anything, here are a few of the core benefits for engineers.

1. Solve incidents faster

Many incidents are related to changes made by the team. If you've got a system where you can easily find out who changed what and why, resolving incidents will take a fraction of the time.
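
As a rough sketch of what "who changed what and why" can look like in practice, suppose the ticket system can export each logged change as a simple record. The Change fields, the changes_before_incident helper, the ticket IDs and the host names below are all made up for illustration; they are not the API of any particular ticket system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Change:
    """One logged change, as it might be exported from a ticket system (hypothetical fields)."""
    ticket_id: str
    author: str
    host: str
    made_at: datetime
    summary: str


def changes_before_incident(changes, host, incident_time, window=timedelta(hours=24)):
    """Return changes made to `host` within `window` before the incident, newest first."""
    start = incident_time - window
    hits = [c for c in changes if c.host == host and start <= c.made_at <= incident_time]
    return sorted(hits, key=lambda c: c.made_at, reverse=True)


# Illustrative data only.
changes = [
    Change("OPS-101", "alice", "web01", datetime(2013, 8, 16, 9, 30), "Raised worker limit"),
    Change("OPS-102", "bob", "db01", datetime(2013, 8, 17, 14, 0), "Rotated credentials"),
    Change("OPS-103", "carol", "web01", datetime(2013, 8, 17, 22, 15), "Updated TLS config"),
]

incident = datetime(2013, 8, 18, 3, 0)
for c in changes_before_incident(changes, "web01", incident):
    print(c.ticket_id, c.author, c.summary)
```

The point is not the code itself but that the question "what changed on this box in the last day?" becomes a quick lookup instead of a round of asking everyone on the team.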

2. Work more efficiently with your team

If you tried to keep abreast of everything that's going on in your team through conversations, you'd get very little done. Ticket systems allow you to keep up to date on what's going on without interrupting your colleagues, and you can choose what you want to know about. Many ticket systems allow you to subscribe to queues or selected tickets by email or RSS feed.

When you have a better idea of what's going on, you can proactively help out on issues you've had experience with, see potential problems on the horizon, and coordinate your work with others.

3. Make it easy to track important tasks

There's often a lot going on and it's annoying when something you know you should have done comes back to bite you later. Logging tasks enables you to review things regularly and prioritise as appropriate. It's much easier for computers to remember something than for you to try and keep it in your head, so don't waste the effort.

4. Remember how you fixed it

When you're working on a system regularly, it's pretty easy to recall the command line options to perform a task. But when you stop working on that system for a period of time, you begin to 'forget' how to achieve certain tasks or why something is configured in a particular way. I find this is particularly true when programming.

You haven't really forgotten, but the information has moved out of your 'cache' of quickly accessible facts into longer term memory. If you log notes about what you've done in a ticket, these notes, while not being complete details of the actions performed, act as 'pointers' to elements in your long term memory and make it easier to recall facts. Without these pointers, it will take much longer to remember exactly what you did to fix something, and you may never remember at all.

Remember, a well designed ticket system will work for you, allowing you to get more done, and get home on time.