What is a Problem? – A Problem is “an unknown, underlying cause of one or more Incidents”.
Did you know? 80% of incidents are caused by 20% of ICT infrastructure components!
Problem Management is responsible for minimizing the adverse effects on the business of Incidents and Problems caused by errors in the ICT infrastructure, and for proactively preventing such errors, Incidents and Problems from occurring.
Problem Management looks for the underlying causes of Incidents and Problems and provides long-term (permanent) resolutions. It functions both proactively and reactively:
- Proactively: by trying to prevent issues from occurring through analysis of problem trends and available statistics.
- Reactively: by identifying the underlying Problems that are causing Incidents and finding a permanent resolution or an immediate workaround.
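The proactive side can be illustrated with a small trend analysis. The sketch below is only an illustration: it assumes each Incident has been logged against a single Configuration Item, and the function name `top_contributors` and the sample data are hypothetical. It finds the smallest group of components that accounts for a given share of Incidents.

```python
from collections import Counter


def top_contributors(incident_cis, share=0.8):
    """Return the most frequent CIs that together cover `share` of all incidents."""
    counts = Counter(incident_cis)
    total = sum(counts.values())
    covered = 0
    result = []
    for ci, n in counts.most_common():
        # Stop once the CIs collected so far already cover the requested share.
        if covered / total >= share:
            break
        result.append((ci, n))
        covered += n
    return result


# Hypothetical incident log: one entry per Incident, naming the affected CI.
incidents = ["mail-server"] * 6 + ["vpn-gateway"] * 2 + ["printer-3", "laptop-17"]
hot_spots = top_contributors(incidents)
print(hot_spots)
```

In this made-up log, two of the four components account for eight of the ten Incidents, so those two would be the first candidates for proactive investigation.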
When Problem Management successfully identifies a Problem and a suitable resolution for it, the resolution is implemented through the Change Management process.
Problems are generally prioritized by the “Pain Factor (PF)”, which reflects the number of people affected by the Problem and the impact it is having on the business. The higher the PF, the higher the priority.
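As a rough illustration of ranking by Pain Factor, the sketch below multiplies the number of people affected by an impact score. The text does not define a formula, so the multiplication, the 1 to 5 impact scale, and all names and figures here are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    problem_id: str
    people_affected: int
    business_impact: int  # assumed scale: 1 (low) to 5 (critical)


def pain_factor(problem):
    # Assumed formula: people affected weighted by business impact.
    return problem.people_affected * problem.business_impact


problems = [
    Problem("PRB-1", people_affected=200, business_impact=2),
    Problem("PRB-2", people_affected=15, business_impact=5),
    Problem("PRB-3", people_affected=90, business_impact=5),
]

# Highest Pain Factor first.
queue = sorted(problems, key=pain_factor, reverse=True)
for p in queue:
    print(p.problem_id, pain_factor(p))
```

An organization would substitute its own weighting; the point is only that both factors feed a single score that orders the work queue.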
Responsibilities of the Problem Management team:
- Problem Control: Transform Problems into Known Errors by identifying the root cause of the Problem and providing a temporary workaround.
- Error Control: Resolve Known Errors under the control of Change Management as soon as possible, whenever doing so is financially justifiable.
- Proactive Prevention of Problems: Carry out trend analyses and provide support to the organization.
- Providing Management Information from Problem Data: Report on Problem and Known Error records (for example, trends and statistics) to support management decisions.
- Conducting Major Problem Reviews: This is done after a major problem has been resolved so that future problems can be prevented.
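The hand-off between Problem Control and Error Control can be pictured as a small state machine. This is a minimal sketch under assumed names (`ProblemRecord`, `Status`, and the ID formats are hypothetical), not a description of any particular service desk tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROBLEM = "problem"
    KNOWN_ERROR = "known error"
    RESOLVED = "resolved"


@dataclass
class ProblemRecord:
    problem_id: str
    status: Status = Status.PROBLEM
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    change_id: Optional[str] = None

    def problem_control(self, root_cause: str, workaround: str) -> None:
        """Problem Control: root cause plus workaround turns a Problem into a Known Error."""
        if self.status is not Status.PROBLEM:
            raise ValueError("only an open Problem can become a Known Error")
        self.root_cause = root_cause
        self.workaround = workaround
        self.status = Status.KNOWN_ERROR

    def error_control(self, change_id: str) -> None:
        """Error Control: a Known Error is resolved through a Change."""
        if self.status is not Status.KNOWN_ERROR:
            raise ValueError("only a Known Error can be resolved")
        self.change_id = change_id
        self.status = Status.RESOLVED


record = ProblemRecord("PRB-0001")
record.problem_control("disk controller firmware bug", "restart the array nightly")
record.error_control("CHG-0042")
print(record.status)
```

Recording the Change identifier on the record reflects the point that resolutions go through Change Management rather than being applied directly.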
The Problem Management process consists of the following stages:
- Identification: The first step is to identify a new Problem. If there are no matching records in the existing Problem or Known Errors database, then it is classified as a new Problem.
- Recording: A new record is created and a unique ID is assigned. All related Configuration Items are linked to it as well as all related Incidents/Known Errors.
- Classification: The Problem is classified appropriately and its impact on the Service Levels is determined so that relevant resources can be assigned to resolve it.
- Investigation: The Problem is investigated so that a resolution is identified and it can be classified as a Known Error.
- Diagnosis: Techniques such as Kepner-Tregoe analysis and Ishikawa (fishbone) diagrams are used. The end result is again the identification of a resolution or a temporary workaround, so that the Problem is converted into a Known Error.
- Review & Closure: After a Problem is resolved, it is thoroughly reviewed so that the following questions can be answered:
1. What was done right?
2. What was not done right?
3. What could have been done better?
4. How can we prevent it from happening again?
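The Identification and Recording stages above can be sketched as a lookup against the existing records followed by creation of a new one when nothing matches. This is a simplified illustration: the exact-match comparison on a symptom string, the dictionary layout, the `PRB-` ID format, and the function name `identify_or_record` are all assumptions made for the example.

```python
import itertools

# Hypothetical source of unique record IDs.
_ids = itertools.count(1)


def identify_or_record(symptom, config_items, incident_ids, database):
    """Return (record, is_new). Link incidents to a match, else record a new Problem."""
    # Identification: look for a matching Problem or Known Error.
    for record in database:
        if record["symptom"] == symptom:
            record["incidents"].extend(incident_ids)
            return record, False
    # Recording: create a new record with a unique ID and link related items.
    record = {
        "id": f"PRB-{next(_ids):04d}",
        "symptom": symptom,
        "config_items": list(config_items),
        "incidents": list(incident_ids),
        "status": "problem",
    }
    database.append(record)
    return record, True


db = []
first, first_is_new = identify_or_record("smtp timeout", ["mail-server"], ["INC-1"], db)
second, second_is_new = identify_or_record("smtp timeout", ["mail-server"], ["INC-2"], db)
print(first["id"], first["incidents"], second_is_new)
```

A real Known Error database would match on more than one free-text field, but the flow is the same: search first, and create a record only when the search comes back empty.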