Thursday, 21 April 2011

Change request - or a can of worms?

Every so often you get a change request which is not really a change request. It's a can of festering worms. Unless you like worms (I'm sure there are decent recipes somewhere), it's important to know how to turn the CR into something more digestible.

Once I received the following request:

"Please install an SNMP agent so we can monitor the performance our web application."

This request came from a developer for a production application. There wasn't much more detail in the request. They had discussed it with the development team and decided that this was what they needed to satisfy their requirements.

However, this looked a bit dodgy to me. SNMP is useful for monitoring, but by default it doesn't expose much that is appropriate for monitoring web-app performance. What kind of information did they want? How were they going to collect the data? I've learnt that such unknowns often mean more work for me to turn the CR into something that can actually be implemented. So I look at it this way. I'm the engineer. The developer is the client with the problem. I don't expect the client to know how to solve their own problems because that's my job. I've got three basic things to find out to solve this issue properly.

1. What is the client trying to do?
2. Why are they trying to do it? (Often the most important thing to understand)
3. Is this the best way to do it?

In this case the answers were as follows:

1. The client needed to know when the web server was responding slowly.
2. They wanted to diagnose what made the app slow.
3. Well, no... What they wanted to do could be achieved using the existing monitoring system (Nagios) and a few additional checks.
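
To give a feel for what "a few additional checks" might look like, here is a minimal sketch of a response-time check written as a Nagios-style plugin. It is not the check we actually deployed. The function names, the thresholds (1 and 3 seconds) and the script name are all made up for illustration; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the usual Nagios plugin convention. Nagios's stock check_http plugin can apply response-time thresholds too, so in practice you may not need to write anything.

```python
import time
import urllib.request

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(elapsed, warn, crit):
    """Map a response time in seconds to a Nagios status code."""
    if elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def check_response_time(url, warn=1.0, crit=3.0, timeout=10.0):
    """Fetch url once and return (status_code, message).

    The warn/crit defaults are illustrative assumptions, not recommendations.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return CRITICAL, f"CRITICAL - request failed: {exc}"
    elapsed = time.monotonic() - start
    status = classify(elapsed, warn, crit)
    label = ("OK", "WARNING", "CRITICAL")[status]
    # Text after the pipe is performance data that Nagios can graph.
    perfdata = f"time={elapsed:.3f}s;{warn};{crit}"
    return status, f"{label} - {elapsed:.3f}s response time | {perfdata}"


def main(argv):
    """Entry point: argv[1] is the URL to check. Returns the exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_web_time.py URL")
        return UNKNOWN
    status, message = check_response_time(argv[1])
    print(message)
    return status
```

To use it as a real plugin you would finish the script with `sys.exit(main(sys.argv))` (after `import sys`) so that Nagios sees the exit code. That answers question 1. Question 2, diagnosing why the app is slow, needs further checks of its own.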

If I hadn't asked the questions, I would have installed the SNMP agent, worked with the developer to write scripts to extract the right information, tested it, then found out that the required information could have been obtained with tools already in place. That would have wasted a lot of time that I could have spent playing Sudoku online.

If you want to spend more time fixing real problems (or max out your online gaming), don't be afraid to ask a few questions when you get a CR that doesn't seem right. It's better for both parties if you start solving the right problem from the beginning.

Tuesday, 5 April 2011

The five most important things I learnt about IT Operations

My first job was working for a small IT consultancy with no more than 15 people. I joined two years after the dot-com crash and times were still hard for IT. The company was self-funded and the owners worked hard to keep the company alive.

Here I learnt more about the business of providing IT services than in any other company since. Here are five things that really stuck with me.

1. The customer is king.
Without them, there is no business. It's a two-way relationship that you have to manage well. Be accommodating but don't go too far. Attempting to satisfy every request will put you out of business.

2. Do not break production systems
Be careful when executing commands; the wrong one can put you in a world of pain. Inevitably things will break, but try and make sure you're not at fault. The customer pays you to keep services up, and if you don't meet agreed service levels, they're going to want that money back.

3. Time is money - track where you spend it
This tells you who's bringing in the money and who's taking it out of the company. It also gives you great insight into how your team really works and helps you to identify problems and trends.

4. Solid network monitoring is crucial
If your systems break, you need to know about it. Many incidents can be averted by monitoring the basics: disks, load, memory, processes and logs. Your usefulness to the business will come into question if your customers can spot problems before you can.

5. "You have to kick ass!"
This is a quote from my first boss. Don't rest on your laurels; be proactive. The customer is demanding and, as a business, you're always in competition. If you get lazy you're going to be out of business one way or the other.
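
Going back to rule 4 for a moment: two of the basics, disks and load, can be watched with very little code. The sketch below is only an illustration. The function names and the limits (90% disk used, a load of 1.5 per CPU) are assumptions I picked for the example, `os.getloadavg()` is only available on Unix-like systems, and a proper monitoring system does all of this far better.

```python
import os
import shutil


def disk_used_percent(path="/"):
    """Return how full the filesystem holding path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def load_per_cpu():
    """Return the 1-minute load average divided by the CPU count (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def find_problems(disk_pct, load, disk_limit=90.0, load_limit=1.5):
    """Compare readings with the (illustrative) limits and list any problems."""
    problems = []
    if disk_pct >= disk_limit:
        problems.append(f"disk {disk_pct:.0f}% full (limit {disk_limit:.0f}%)")
    if load >= load_limit:
        problems.append(f"load per CPU {load:.2f} (limit {load_limit:.2f})")
    return problems
```

Calling `find_problems(disk_used_percent("/"), load_per_cpu())` from a scheduled job and alerting on a non-empty list is the crudest possible version of spotting problems before your customers do.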

These are my rules of engagement for the front line. What are yours?