Availability is usually calculated based on a model involving the Availability Ratio and techniques such as Fault Tree Analysis, and includes the following elements:
- Serviceability – where a service is provided by a 3rd party organisation, this is the expected availability of a component.
- Reliability – the time for which a component can be expected to perform under specific conditions without failure.
- Recoverability – the time it should take to restore a component back to its operational state after a failure.
- Maintainability – the ease with which a component can be maintained, which can be both remedial and preventative.
- Resilience – the ability to withstand failure.
- Security – the ability of components to withstand breaches of security.
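As a rough illustration of the calculation mentioned above, the sketch below computes an Availability Ratio as the share of agreed service time during which a service was actually up, and evaluates a very small fault tree. The function names, the sample figures and the assumption that component failures are independent of each other are illustrative only and are not prescribed by this process description.

```python
from math import prod


def availability_ratio(agreed_service_hours: float, downtime_hours: float) -> float:
    """Return availability as a percentage of the agreed service time."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed_service_hours must be positive")
    if not 0 <= downtime_hours <= agreed_service_hours:
        raise ValueError("downtime_hours must lie between 0 and agreed_service_hours")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


def or_gate(failure_probabilities):
    """Fault tree OR gate: the event occurs if ANY input fails.

    Assumes the inputs fail independently (an assumption of this sketch).
    """
    return 1 - prod(1 - p for p in failure_probabilities)


def and_gate(failure_probabilities):
    """Fault tree AND gate: the event occurs only if ALL inputs fail.

    Models redundant components; again assumes independent failures.
    """
    return prod(failure_probabilities)


# Hypothetical service: two redundant servers behind a single network link.
servers_fail = and_gate([0.01, 0.01])          # both servers must fail
service_fails = or_gate([servers_fail, 0.02])  # or the network link fails
monthly_availability = availability_ratio(agreed_service_hours=720, downtime_hours=3.6)
```

With these made-up figures, 3.6 hours of downtime in a 720-hour month gives an availability of about 99.5%, and the single network link dominates the probability of service failure, which suggests where resilience measures would pay off first.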
Availability Management and IT Security
IT Security is an integral part of Availability Management, whose primary focus is ensuring that the IT infrastructure continues to be available for the provision of IT Services.
Some of the above elements are the outcome of a risk analysis, which identifies the resilience measures to be put in place, how reliable the components are, and how many problems have been caused by system failure.
The risk analysis also recommends controls to improve the availability of the IT infrastructure, such as development standards, testing, physical security, and having the right skills in the right place at the right time.
Mission Statement
Optimize the capability of the IT infrastructure, services and supporting organization to deliver a cost effective and sustained level of service availability that meets business requirements.
Process Goal
Achieve the process mission by implementing:
- ITIL-aligned Availability Management policies, processes and procedures
- Dedicated Availability Management Process Owner
- Holistic management of IT service availability versus independent technical silos
- Actions to ensure availability levels meet established service level targets
- Service Improvement Projects (SIPs) to address availability shortfalls and concerns
- Actions to proactively seek availability improvements where needed
- Actions to ensure appropriate levels of availability have been built into new IT solutions
Critical Success Factors (CSFs)
The Critical Success Factors (CSFs) are:
- Maintaining Availability and Reliability of IT Services
- Providing Availability Cost Effectively
- Proactively Addressing Availability Improvements Where Needed
Key Activities
The key activities for this process are:
- Determine availability requirements
- Compile availability plans
- Monitor availability
- Monitor maintenance obligations
- Provide management information about Incident management quality and operations
Examples of Key Process Performance Indicators (KPIs) are shown in the list below. Each one is mapped to a Critical Success Factor (CSF).
Maintaining Availability and Reliability of IT Services
- Number of incidents caused by hardware failures
- Number of incidents caused by maintenance failures
- Number of incidents caused by resilience failures
- Number of incidents caused by security failures
- Number of incidents caused by operational failures
- Number of incidents caused by application failures
- Number of incidents caused by data issues/problems
- Number of incidents caused by lack of support skills
- Number of incidents caused by customer actions
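The incident-count KPIs listed above could be derived from incident records along the following lines. This is a minimal sketch: the record layout (a dictionary with a `cause` field) and the cause labels are hypothetical and would need to match whatever categorisation the service desk tool actually uses.

```python
from collections import Counter

# Hypothetical cause labels, one per KPI in the list above.
CAUSES = (
    "hardware", "maintenance", "resilience", "security", "operational",
    "application", "data", "support_skills", "customer_action",
)


def incidents_by_cause(incidents):
    """Count incidents per known cause; causes with no incidents report 0.

    Records whose cause is not in CAUSES are left out of the result.
    """
    counts = Counter(incident["cause"] for incident in incidents)
    return {cause: counts.get(cause, 0) for cause in CAUSES}


# Made-up sample data for one reporting period.
sample_incidents = [
    {"id": 1, "cause": "hardware"},
    {"id": 2, "cause": "security"},
    {"id": 3, "cause": "hardware"},
    {"id": 4, "cause": "customer_action"},
]
kpi_report = incidents_by_cause(sample_incidents)
```

Reporting a zero for causes without incidents keeps the KPI table the same shape from one period to the next, which makes trends easier to compare.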
Providing Availability Cost Effectively
- Percentage of delivery cost per customer related to availability activities
- Percentage of delivery cost per customer related to resiliency measures implemented
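The two cost KPIs above are simple ratios. The sketch below shows one way to compute them, assuming that the delivery cost for a customer and the portion attributable to each category are already known; the function name and the figures are hypothetical.

```python
def cost_share_percent(category_cost: float, total_delivery_cost: float) -> float:
    """Return the share of a customer's delivery cost spent on one category, in percent."""
    if total_delivery_cost <= 0:
        raise ValueError("total_delivery_cost must be positive")
    if category_cost < 0:
        raise ValueError("category_cost must not be negative")
    return category_cost / total_delivery_cost * 100


# Made-up annual figures for a single customer.
total_cost = 120_000
availability_share = cost_share_percent(15_000, total_cost)  # availability activities
resilience_share = cost_share_percent(6_000, total_cost)     # resilience measures
```

Tracking both percentages per customer over time shows whether availability is being provided cost effectively or whether spending is rising faster than the service levels achieved.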