Availability Management

The goal of the Availability Management process is to ensure that the level of availability delivered in all services matches or exceeds the current and future agreed needs of the business, in a cost-effective manner. Availability Management is concerned with the design, implementation, measurement and management of IT Infrastructure Availability to ensure that the stated business requirements for Availability are consistently met (ITIL, 2003; ITIL, 2007).
Terminology:

  • Availability is the ability of an IT Service or component to perform its required function at a stated instant or over a stated period of time.
  • The reliability of an IT Service can be qualitatively stated as freedom from operational failure.
  • Maintainability relates to the ability of an IT Infrastructure component to be retained in, or restored to, an operational state.
  • Security is related to the Confidentiality, Integrity and Availability (CIA) of the data associated with a service; it is an aspect of overall Availability.
  • Serviceability describes the contractual arrangements made with Third Party IT Service Providers.

Participants: 1 to 4
Hours: 1
Participants are involved in:

  • Determining the availability requirements of IT Services
  • Restoring IT Services after an interruption
  • Improving the availability of IT Services

 

1) Mark with an “X” the activities associated with Availability Management that are defined in your organization:

  • Determining the Availability requirements from the business for a new or enhanced IT Service and formulating the Availability and recovery design criteria for the IT Infrastructure (  )
  • In conjunction with ITSCM1, determining the vital business functions and the impact arising from IT component failure and, where appropriate, reviewing the Availability design criteria to provide additional resilience to prevent or minimize impact to the business (  )
  • Defining the targets for Availability, reliability and maintainability for the IT Infrastructure components that underpin the IT service to enable these to be documented and agreed within SLAs, OLAs and contracts (  )
  • Establishing measures and reporting of Availability, Reliability and Maintainability that reflect the business, User and IT support organization perspectives (  )
  • Monitoring and trend analysis of the Availability, Reliability and Maintainability of IT components (  )
  • Reviewing IT Service and component availability and identifying unacceptable levels (  )
  • Investigating the underlying reasons for unacceptable Availability (  )
  • Producing and maintaining an Availability Plan which prioritizes and plans IT Availability improvements (  )

Based on the activities selected, which documents, roles and organizational units can be referenced?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Other activities defined in the organization:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

 

2) Mark with an “X” the steps for determining the Availability requirements that you recognize in your organization:

  • Determine the business impact caused by loss of service (  )
  • From the business requirements, specify the Availability, reliability, maintainability and serviceability requirements (  )
  • For IT Services and components provided externally, identify the serviceability requirements (  )
  • Estimate the costs involved in meeting the Availability, reliability, maintainability and serviceability requirements (  )
  • Determine with the business if the costs identified in meeting the Availability requirements are justified (  )
  • Determine from the business the costs likely to be incurred from loss or degradation of service (  )

Based on the steps selected for determining the Availability requirements, which documents, roles and organizational units can be referenced?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Other steps for determining the Availability requirements defined in the organization:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

 

3) Mark with an “X” the methods and techniques that you recognize for identifying availability improvement opportunities in your organization:

  • Component Failure Impact Assessment (CFIA2) (  )
  • Fault Tree Analysis (FTA3) (  )
  • CRAMM, which can be used to identify new risks and provide appropriate countermeasures associated with any change to the business availability requirement and revised IT Infrastructure design (  )
  • Systems Outage Analysis (SOA4) (  )
  • Technical Observation Post, whose purpose is to monitor events in real time as they occur, with the specific aim of identifying improvement opportunities or bottlenecks within the current IT Infrastructure (  )

Based on the methods and techniques selected, which documents, roles and organizational units can be referenced?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Other methods and techniques defined in the organization:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

4) Mark with an “X” the Availability Management tools defined in your organization:

  • IT component downtime data capture and recording (  )
  • Database repositories for the collection of appropriate Availability data and information (  )
  • Report generation and statistical analysis (  )
  • Availability Modeling, required to forecast availability and to assess the impact of Changes to the IT Infrastructure (  )

Based on the tools selected, which documents, roles and organizational units can be referenced?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Other Availability Management tools defined in the organization:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
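
As an illustration of the Availability Modeling tool listed in question 4, the sketch below forecasts end-to-end availability before and after a Change. It is not taken from ITIL; it assumes the standard series/parallel reliability model with independent component failures, and the component names and availability figures are hypothetical.

```python
from math import prod


def series_availability(availabilities):
    """All components are required: the service is up only if every one is up."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Redundant components: the service is down only if every one is down."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical component availabilities (fractions, not percentages).
server = 0.99
network = 0.98

# Current design: one server behind one network link.
current = series_availability([server, network])

# Proposed Change: add a second, redundant server.
redundant_servers = parallel_availability([server, server])
proposed = series_availability([redundant_servers, network])

print(f"current design:  {current:.4%}")
print(f"proposed design: {proposed:.4%}")
```

With these assumed figures the redundant server raises the forecast only slightly, because the single network link remains the weakest component; a model of this kind helps show where an improvement is worth its cost.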

 

5) Mark with an “X” the Performance Indicators of Availability Management defined in your organization:

  • Availability of IT services (  )

Availability: the ability of a service, component or CI to perform its agreed function when required. It is often measured and reported as a percentage:
Availability (%) = (Agreed Service Time (AST) - Downtime) / AST × 100

  • Reliability of IT services (  )

Reliability: a measure of how long a service, component or CI can perform its agreed function without interruption. It is often measured and reported as Mean Time Between Service Incidents (MTBSI) or Mean Time Between Failures (MTBF):

Reliability (MTBSI in hours) = Available time in hours / Number of breaks

Reliability (MTBF in hours) = (Available time in hours - Total downtime in hours) / Number of breaks

  • Maintainability of IT Services (  )

Maintainability: a measure of how quickly and effectively a service, component or CI can be restored to normal working after a failure. It is measured and reported as Mean Time to Restore Service (MTRS) and should be calculated using the following formula:

Maintainability (MTRS in hours) = Total downtime in hours / Number of service breaks
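
The four indicators above can be computed from three figures per reporting period. The following is a minimal sketch, assuming the formulas commonly given in ITIL Service Design (2007): availability as (agreed service time - downtime) / agreed service time, MTBSI as available time per break, MTBF as uptime per break, and MTRS as downtime per break. The class and function names and the sample figures are hypothetical, and the agreed service time is taken here as the "available time".

```python
from dataclasses import dataclass


@dataclass
class ReportingPeriod:
    """Hypothetical record of one service over one reporting period."""
    agreed_service_hours: float  # Agreed Service Time (AST)
    downtime_hours: float        # total downtime inside the AST
    service_breaks: int          # number of service interruptions


def availability_pct(p: ReportingPeriod) -> float:
    # (AST - downtime) / AST, expressed as a percentage
    return (p.agreed_service_hours - p.downtime_hours) / p.agreed_service_hours * 100


def _per_break(hours: float, p: ReportingPeriod) -> float:
    # The means below are undefined when there were no breaks at all.
    if p.service_breaks == 0:
        raise ValueError("no service breaks in the period; the mean is undefined")
    return hours / p.service_breaks


def mtbsi_hours(p: ReportingPeriod) -> float:
    # Mean Time Between Service Incidents: available time per break
    return _per_break(p.agreed_service_hours, p)


def mtbf_hours(p: ReportingPeriod) -> float:
    # Mean Time Between Failures: uptime per break
    return _per_break(p.agreed_service_hours - p.downtime_hours, p)


def mtrs_hours(p: ReportingPeriod) -> float:
    # Mean Time to Restore Service: downtime per break
    return _per_break(p.downtime_hours, p)


# Hypothetical figures: 5,020 agreed hours, two breaks totalling 20 hours.
period = ReportingPeriod(agreed_service_hours=5020, downtime_hours=20, service_breaks=2)
print(f"Availability: {availability_pct(period):.2f} %")  # 99.60 %
print(f"MTBSI: {mtbsi_hours(period)} h")                  # 2510.0 h
print(f"MTBF:  {mtbf_hours(period)} h")                   # 2500.0 h
print(f"MTRS:  {mtrs_hours(period)} h")                   # 10.0 h
```

Under these definitions MTBSI equals MTBF plus MTRS, which gives a quick consistency check on reported figures.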

Manage availability and reliability of IT service:

  • Percentage reduction in the unavailability of services and components (  )
  • Percentage increase in the reliability of services and components (  )
  • Effective review and follow-up of all SLA, OLA and underpinning contract breaches (  )
  • Percentage improvement in overall end-to-end availability of service (  )
  • Percentage reduction in the number and impact of service breaks (  )
  • Improvement in the MTBF (Mean Time Between Failures) (  )
  • Improvement in the MTBSI (Mean Time Between Service Incidents) (  )
  • Reduction in the MTRS (Mean Time to Restore Service) (  )

Satisfy business needs for access to IT services:

  • Percentage reduction in the unavailability of services (  )
  • Percentage reduction of the cost of business overtime due to unavailable IT (  )
  • Percentage reduction in critical time failures, e.g. specific business peak and priority availability needs are planned for (  )
  • Percentage improvement in business and user satisfaction with service (  )

Availability of IT infrastructure achieved at optimum costs:

  • Percentage reduction in the cost of unavailability (  )
  • Percentage improvement in the Service Delivery Costs (  )
  • Timely completion of regular Risk Analysis and system review (  )
  • Timely completion of regular cost-benefit analysis established for infrastructure Component Failure Impact Analysis (CFIA) (  )
  • Percentage reduction in failures of third-party performance on MTRS/MTBF against contract targets (  )
  • Reduced time taken to complete (or update) a Risk Analysis (  )
  • Reduced time taken to complete an Availability Plan (  )
  • Timely production of management reports (  )
  • Percentage reduction in the incidence of operational reviews uncovering security and reliability exposures in application designs (  )

Based on the performance indicators selected, which documents, roles and organizational units can be referenced?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Other performance indicators defined in the organization:
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
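
Several indicators in question 5 are stated as a percentage reduction or improvement between two reporting periods. The sketch below shows one plausible way to compute them, taking the earlier period as the baseline; the function names and sample figures are hypothetical and are not prescribed by ITIL.

```python
def percentage_reduction(previous: float, current: float) -> float:
    """Reduction relative to the earlier period, e.g. for downtime or MTRS."""
    if previous == 0:
        raise ValueError("baseline is zero; the percentage change is undefined")
    return (previous - current) / previous * 100


def percentage_improvement(previous: float, current: float) -> float:
    """Increase relative to the earlier period, e.g. for MTBF or MTBSI."""
    if previous == 0:
        raise ValueError("baseline is zero; the percentage change is undefined")
    return (current - previous) / previous * 100


# Hypothetical figures for two consecutive reporting periods.
downtime_reduction = percentage_reduction(previous=20, current=15)      # hours of downtime
mtbf_improvement = percentage_improvement(previous=2500, current=3000)  # MTBF in hours

print(f"Reduction in unavailability: {downtime_reduction:.1f} %")
print(f"Improvement in MTBF: {mtbf_improvement:.1f} %")
```

A negative result means the indicator moved in the wrong direction, for example downtime that grew between the two periods.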

 

 

Research Group at the National University of Engineering in Nicaragua

Authors: Johnny Flores, Leonel Plazaola

 

References

ITIL. 2003. Service Delivery. Second edition. s.l. : The Stationery Office, 2003. 0-11-330015-8.
ITIL. 2007. Service Design. s.l. : The Stationery Office, 2007. 978-11-331047-0.

 

1 ITSCM (IT Service Continuity Management) is focused on ensuring that IT technical and service facilities can be recovered within required, and agreed, business timescales.
2 CFIA can be used to predict and evaluate the impact on IT Service arising from component failures within the IT Infrastructure.
3 FTA is a technique that can be used to determine the chain of events that causes a disruption to IT Service.
4 SOA is a technique designed to provide a structured approach to identifying the underlying causes of service interruption to the User.