This blog post highlights our discussion and some points I would like to stress.
Incident management is the process of getting something that is broken back into a working state quickly. So why do things break in the first place? Handling incidents is time wasted for everyone involved, so why not improve the way we design things and reduce the number of incidents?
To improve design processes we need the right skills, the right tools and, most importantly, the right people involved. Is DevOps the answer? DevOps is certainly a driver for change, as its goal is to get secure, high-quality products that users actually want into their hands as quickly as possible. Siloed design, where the PMO and Development build something and then throw it over the wall to Service Management (applications, operations, service desk or a vendor), is thankfully going away. We must work together and involve end users in creating the thing they want.
We do know what users do not want: any issue that stops them from using the product. If incident management is the process of fixing, then what is the process of stopping incidents from happening? Lean, DevOps, even ITIL, answer with problem management. The skills for a quick fix lie with whoever created the item; the skills for making things break less often should be in everyone's toolkit.
This is where people get confused. Problem management is not logging an unknown error (how do you log something unknown anyway?). It is not 'wait for a few incidents to happen and then let's get together to work out why'. Nor is it reserved for major incidents. Come on, when you personally call for support, is it not major (to you)?
These two processes are artificial from the user's perspective. Users do not experience incidents or problems; they experience a product that either works or does not. They do not call to report an incident or a problem; they call for help. What we do next is paramount to keeping that person satisfied, and the argument 'they are internal, so where can they go?' is short-sighted. If people do not like what you give them, there are enough cloud-based applications to let them do their job without you. Yes, this opens up a whole new set of issues, but that is the point.
So who allowed this to happen? Suppliers and tool vendors sell the best-known framework. Incident management is easy to build tools for and easy to sell: six steps. Problem management is not so easy, or is it? Lean has the A3, the Improvement Kata, and visual management of GOOD, with a stop-look-why-resolve toolset and mentality. Why can't we adopt these and adapt them to our ITIL world?
Is it because, in many organisations, service delivery is delegated? Is it because we manage our services through monthly (and, let's face it, tweaked) reports? Who allowed this to happen?
Leadership! If you are going to change attitudes towards performance, that change cannot be delegated or introduced through a certification programme. The organisation must adopt a way of working that creates quality products, and creates them quickly. Management must lead this, and must also visibly work this way themselves.
A simple first step: pick a target that means something across the whole organisation. My first CIO used a measure called MTBK: Mean Time Between Kicking. Not us kicking, but him being kicked! He measured the hours, then days, then weeks (as we improved) between occasions when a peer commented negatively on IT's performance. He then asked each member of his management team to create a KPI in their own area that mapped to his, and to keep cascading this down to the service desk and the server engineers. The result was a KPI tree: the performance branch had six metrics, and if any one of them failed, the whole branch turned RED. But RED was a good colour: it got us working together, top-down, to resolve the issue and try to ensure it never happened again. This is the true meaning of problem management.
Know what is wrong, decide whether we should address it and by when, and, if we do, let everyone learn from the experience. If we turn incident management into a learning methodology, reinforced by problem-solving techniques, then we create value. If all we do is solve things quickly, who cares? After all, that is our job.
So does incident management have value? Or is it bigger than this single process? Please let me know what you think.