Incident Management

What is an Incident?

An incident is any event that is not part of the standard operation of a service and that causes, or may cause, an interruption to, or a reduction in, the quality of that service.

The aim of Incident Management is to restore normal services as quickly as possible.

Some best practices:

  • All inquiries should be recorded as incidents.
  • Service Requests (requests for a standard operational item, e.g. a password reset) should be recorded as incidents.
  • A request for a new product or service should be recorded as a Request for Change (RFC).
  • Automatically detected events (such as hardware or network failures) should also be recorded as incidents.
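The routing rule in these best practices can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical set of category labels (`new_product`, `new_service`) for requests that should become an RFC; every other request is logged as an incident.

```python
from enum import Enum


class RecordType(Enum):
    INCIDENT = "incident"
    RFC = "request_for_change"


# Hypothetical category labels for requests that ask for something new.
RFC_CATEGORIES = {"new_product", "new_service"}


def record_type_for(category: str) -> RecordType:
    """Decide whether an incoming request is logged as an incident or an RFC."""
    if category in RFC_CATEGORIES:
        return RecordType.RFC
    # Inquiries, service requests (e.g. password resets) and automatically
    # detected failures are all recorded as incidents.
    return RecordType.INCIDENT


print(record_type_for("password_reset"))  # RecordType.INCIDENT
print(record_type_for("new_service"))     # RecordType.RFC
```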

The Incident Life-Cycle

DETECTION & RECORDING:

  • Provide a unique ID for each incident, even if it is a known issue.
  • Record how the incident was reported and which Services and Configuration Items (CIs) are affected.
  • Classify the incident, for example as Hardware, Software or Service Request.
  • Match the incident against previously reported incidents.
  • Assign a priority to the incident. (The priority of an incident is determined by its Impact and Urgency, the availability of resources, and any relevant parameters in the Service Level Agreement [SLA].)
  • Provide initial support or a workaround. If the IT Service Desk devises a new workaround, record it for future use.
  • If the incident cannot be resolved at this stage, escalate it functionally.
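The recording steps above can be sketched as a small data model. The `Incident` class, the `INC-` ID format, the in-memory workaround store and the matching rule (same category plus at least one shared Configuration Item) are all hypothetical choices for illustration; a real service desk tool defines its own schema and matching logic.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical ID generator: sequential numbers with an "INC-" prefix.
_id_counter = itertools.count(1)


def _new_id() -> str:
    return f"INC-{next(_id_counter):06d}"


@dataclass
class Incident:
    summary: str
    reported_via: str        # how it was reported, e.g. "phone", "monitoring"
    affected_cis: list[str]  # affected Configuration Items
    category: str            # e.g. "Hardware", "Software", "Service Request"
    incident_id: str = field(default_factory=_new_id)  # unique, even for known issues


# Hypothetical workaround store keyed by (category, Configuration Item).
known_workarounds: dict[tuple[str, str], str] = {}


def find_matches(new: Incident, history: list[Incident]) -> list[Incident]:
    """Return earlier incidents with the same category that share an affected CI."""
    return [old for old in history
            if old.category == new.category
            and set(old.affected_cis) & set(new.affected_cis)]


def record_workaround(incident: Incident, workaround: str) -> None:
    """Store a workaround devised by the Service Desk for future use."""
    for ci in incident.affected_cis:
        known_workarounds.setdefault((incident.category, ci), workaround)


def suggest_workaround(incident: Incident) -> Optional[str]:
    """Return a previously recorded workaround for this incident, if any."""
    for ci in incident.affected_cis:
        hit = known_workarounds.get((incident.category, ci))
        if hit is not None:
            return hit
    return None


first = Incident("Printer offline", "phone", ["PRN-01"], "Hardware")
record_workaround(first, "Power-cycle the printer")
second = Incident("Printer not responding", "email", ["PRN-01"], "Hardware")
print(second.incident_id)                    # INC-000002
print(find_matches(second, [first]) == [first])  # True
print(suggest_workaround(second))            # Power-cycle the printer
```

Matching on category plus shared CIs is deliberately crude; it only shows where the lookup against earlier incidents and recorded workarounds fits into the recording step.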

INVESTIGATION & DIAGNOSIS:

  • Investigation and diagnosis may resolve the incident right away, or the incident may be functionally escalated (to Level 2 support). If this takes too long, the incident may also be hierarchically escalated.
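The two kinds of escalation can be expressed as a simple decision function. This is a sketch under stated assumptions: the `target` resolution time is a hypothetical parameter (in practice it would be derived from the SLA), and the returned labels are illustrative strings, not any tool's API.

```python
from datetime import timedelta


def escalation_actions(resolved: bool, elapsed: timedelta,
                       target: timedelta) -> list[str]:
    """Return the escalations that apply to an incident under investigation.

    `target` is a hypothetical resolution-time threshold; in practice it
    would come from the SLA.
    """
    if resolved:
        return []  # resolved during investigation: no escalation needed
    # Not resolved at first line: hand over to Level 2 (functional escalation).
    actions = ["functional"]
    if elapsed > target:
        # Taking too long: involve management (hierarchical escalation).
        actions.append("hierarchical")
    return actions


print(escalation_actions(False, timedelta(hours=1), timedelta(hours=4)))
# ['functional']
print(escalation_actions(False, timedelta(hours=6), timedelta(hours=4)))
# ['functional', 'hierarchical']
```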

RESOLUTION & RECOVERY:

  • Resolution may require raising an RFC and having it implemented. Recovery simply means “restoring a service or an ICT component back to its previously working condition”.

INCIDENT CLOSURE:

  • The incident is closed once the user confirms that it has been resolved.
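The four stages of the life-cycle can be modelled as a small state machine. The allowed transitions are an assumption drawn from the text above: an incident may go straight from recording to resolution when initial support or a workaround fixes it, and closure requires the user's confirmation. The transition from resolution back to investigation (a fix that did not work) is a hypothetical addition for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    DETECTION_AND_RECORDING = auto()
    INVESTIGATION_AND_DIAGNOSIS = auto()
    RESOLUTION_AND_RECOVERY = auto()
    CLOSED = auto()


# Allowed transitions between life-cycle stages.
ALLOWED = {
    # Initial support or a workaround may resolve the incident immediately.
    Stage.DETECTION_AND_RECORDING: {Stage.INVESTIGATION_AND_DIAGNOSIS,
                                    Stage.RESOLUTION_AND_RECOVERY},
    Stage.INVESTIGATION_AND_DIAGNOSIS: {Stage.RESOLUTION_AND_RECOVERY},
    # Hypothetical: a failed fix sends the incident back for investigation.
    Stage.RESOLUTION_AND_RECOVERY: {Stage.CLOSED,
                                    Stage.INVESTIGATION_AND_DIAGNOSIS},
    Stage.CLOSED: set(),
}


def advance(current: Stage, target: Stage, user_confirmed: bool = False) -> Stage:
    """Move an incident to `target`, enforcing the allowed transitions."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    if target is Stage.CLOSED and not user_confirmed:
        raise ValueError("closure requires confirmation from the user")
    return target


stage = Stage.DETECTION_AND_RECORDING
stage = advance(stage, Stage.INVESTIGATION_AND_DIAGNOSIS)
stage = advance(stage, Stage.RESOLUTION_AND_RECOVERY)
stage = advance(stage, Stage.CLOSED, user_confirmed=True)
print(stage.name)  # CLOSED
```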

Note:

  • Impact measures the effect the incident has on the business, for example the number of users affected or the revenue lost because of the incident.
  • Urgency indicates the timescale within which the incident needs to be resolved.

For an incident to be considered High Priority, both its Impact and its Urgency should be high.
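One way to turn Impact and Urgency into a priority is a scoring matrix. The sketch below uses a hypothetical three-level scale and a sum-based score chosen so that High Priority results only when both values are high, as stated above. It ignores resource availability and SLA parameters, which a real priority scheme would also take into account.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> Level:
    """Hypothetical 3x3 priority matrix based on the sum of impact and urgency."""
    score = impact + urgency  # ranges from 2 (LOW, LOW) to 6 (HIGH, HIGH)
    if score == 6:            # only reached when both are HIGH
        return Level.HIGH
    if score >= 4:            # e.g. HIGH + LOW, MEDIUM + MEDIUM
        return Level.MEDIUM
    return Level.LOW


# Print the full matrix: rows are impact, columns are urgency (HIGH first).
for impact in reversed(Level):
    row = [priority(impact, urgency).name for urgency in reversed(Level)]
    print(impact.name.ljust(6), row)
```

A sum-based score keeps the matrix symmetric in Impact and Urgency; an organisation that weights one factor more heavily would replace it with an explicit lookup table.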