Opinion

Dominic Hill, Siemens Enterprise Communications Ltd: Business Continuity – or is it? Are we missing the point?

February 2008 by Dominic Hill, Consultant, Siemens Enterprise Communications Limited

There have been a number of papers and presentations recently looking at the nature of Business Continuity (BC) and tools used to deliver it – from the future of the BIA to the importance of building evacuations. With the imminent arrival of Part 2 of the British Standard for Business Continuity Management (BS 25999-2), there will be a defined management system – the BCMS – and a means of measuring performance of Business Continuity capabilities, should organisations choose to do so. But are we missing something? Have we created our own definition of continuity?

The Oxford English Dictionary (1999 edition) defines continuity as “the unbroken and consistent existence or operation of something over a period of time”.

In BS 25999-1:2006, business continuity is defined as “strategic and tactical capability of the organisation to plan for and respond to incidents and business disruptions in order to continue business operations at an acceptable pre-defined level”.

In this definition, the “unbroken and consistent existence” has been replaced with “plan for and respond to” and “continue”, words which imply reaction and recovery. If we look at the services offered within the BC/DR arena today, it is easy to see the focus on responding to incidents and recovering capabilities in:

• The provision of disaster recovery services;
• The provision of work area recovery services;
• The variety of software to generate, maintain and disseminate plans;
• A plethora of communications tools allowing call cascades and other abilities.

Many of these services, and the BC capabilities of the organisations that use them, are reaching levels of maturity never before seen, and are thus giving those organisations a degree of confidence in their ability to recover.

This is laudable, nay essential, as the BC manager’s maxim should be “Expect the unexpected”! But do these services really provide continuity for the business? It could be argued that this is really business recovery, although for some that term has its own distinct meaning. Are we missing something? Would it not be even better to avoid the incident or business interruption in the first place, leaving the recovery for when there is no other option?

Why have a disaster if you can avoid it?

Many organisations spend a significant amount of money and effort on recovery capabilities and the associated plans, but neglect to address the issues that would make the operation more resilient and less in need of recovery in the first place. Could that money be better spent on disaster avoidance? To a degree, the answer depends on the state of the organisation, its ability to change and the willingness of those in charge to accept risk.

A key tenet of BS 25999 is “embedding the BCM culture within the organisation” and this is probably the single most important factor in being proactive about disasters. When a system, whether business or IT, is designed and operated with continuity in mind, the subsequent need to mitigate risks with recovery capabilities can be reduced.

Resilience: The unbroken operation

In order for a system to have unbroken operation, the threats to that operation must be reduced or removed. When BCM is a recognised part of the daily processes, and not something that gets retrofitted in the later stages of the system lifecycle, it is easy to consider these potential threats at the start of that lifecycle. Typically the causes of threats include:

Location of the system – This has a wide scope and should consider location at all levels – both physically (geographically and within the campus and building) and logically (within the organisation). Taking a new IT system as an example, are there opportunities to implement it in a location discrete from the main user population and away from physical risks arising from location and environmental factors?

From the business viewpoint, the who and how should be considered. Does the system require input from certain members of staff whose roles make them unlikely to be available at the same time? Is specialist knowledge vested in a single individual, thus creating a potential single point of failure?

Access to the system – Again this works at both physical and logical levels. Taking an IT example once more, there is little point in implementing a new system and a corresponding recovery capability if the system is situated in a location that does not afford it appropriate protection – environmentally or from a physical security point of view. A classic technology example is siting critical equipment in an IT suite that is used by members of IT staff as a shortcut to other parts of the building. A large number of incidents arise from human error in some shape or form; accidents do happen.

Similarly from a business viewpoint – especially in these days of increased concerns over the safety of data – who has access to what, by what means and for what purpose must be considered. For example, are personnel records only available as paper copies? If so, where are they held, and is that location secure?

Design of the system – A single IT system can look cheaper than a design that addresses potential single points of failure with some sort of redundancy of functionality. On paper, that is. When the cost of the corresponding recovery capability is included, the picture may be very different. Similar arguments exist for non-IT tasks, where the ability of multiple teams (possibly on different sites) to carry out the same activity can address not only loss-of-site scenarios but also loss of staff – whether through pandemic or other cause.

Systems documentation (or the lack of it) – In today’s fast-moving world it is not uncommon for less than ideal documentation to be produced during the development phases, as the pressure to deploy the system increases. Limited documentation leads to a potential lack of understanding of how things work, which increases the threat of mistakes. Furthermore, it is very hard to maintain and protect the system if it is not clearly understood where the interdependencies lie and what the possible impacts are when changes occur around it.

Understanding the business is one of the four stages in BS 25999 and is as essential to the resilience aspects of BC as to the recovery aspects. Good systems documentation has a major part to play in this.

Control of changes to the system – Most systems will, after an initial period, operate in a steady state, until something changes! This is especially true in IT, which, due to the ever-developing nature of the technology, is probably subject to more change than most business processes – the changes occurring in the form of software patches, upgrades, hardware enhancements for capacity improvements and so on. The same can also be seen in the non-IT space, where changes to business process arise from mergers and acquisitions or the outsourcing of parts of the operation. By controlling the way change occurs – especially considering the impacts from all aspects – the threat from change can be minimised.

When these areas are considered throughout the whole lifecycle of a system and appropriate decisions made, the result will be a more resilient system that is fit for the purpose for which it was intended. As with anything in the BC space, this is not rocket science, just common sense, but it appears to be something that is often ignored in favour of cheaper or short-term solutions or because the challenges are too great.

Challenges associated with implementing resilience

Implementing resilience can have significant challenges associated with it, including:

• Cost;
• Outsourcing/Supply chain management;
• How to get there from here.

However, each of these challenges provides a means to its own solution, as each can be used to improve resilience.

Total Cost of Continuity

This is a variant of the well known “Total cost of ownership” concept and is proposed here as a means to understand exactly what costs are incurred in providing true continuity for an organisation.

Typically organisations look at their recovery contracts, sum the costs and label the result as the cost of BC. This is misleading as it takes no account of the cost involved in setting up and maintaining BC within the organisation. In particular it ignores the cost of resources required for the exercising (testing) of recovery plans, both IT and non-IT. These costs can be quite considerable when the effort required for preparing and carrying out exercises across the different departments is considered, but they are often lost within the operational costs of the departments involved. Also, the more specialist the recovery processes, the more resource is required, in addition to a potential for greater frequency of exercising (to ensure that all appropriate staff gain the necessary experience).

If a more realistic approach is taken and the resource and exercising costs (in particular) are included, the total cost of continuity may well look very different. This may provide sufficient justification for implementing a more robust design that negates the need for much recovery.
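The comparison can be sketched as simple arithmetic. The short Python example below is illustrative only: the cost categories follow the ones named above, but the class name, field names, the separate resilience spend and every figure are hypothetical assumptions, not data from any real organisation.

```python
from dataclasses import dataclass


@dataclass
class ContinuityCosts:
    """Annual cost figures for one design option (all values hypothetical)."""
    recovery_contracts: float         # DR and work area recovery contracts
    setup_and_maintenance: float      # setting up and maintaining BC in-house
    exercise_days: float              # staff days spent preparing and running exercises
    day_rate: float                   # assumed average cost of one staff day
    resilience_measures: float = 0.0  # assumed extra spend on a more robust design

    def headline_cost(self) -> float:
        """The figure often labelled 'the cost of BC': recovery contracts only."""
        return self.recovery_contracts

    def total_cost(self) -> float:
        """Contracts plus the costs usually lost in departmental budgets."""
        exercising = self.exercise_days * self.day_rate
        return (self.recovery_contracts + self.setup_and_maintenance
                + exercising + self.resilience_measures)


# Hypothetical option A: cheaper design, heavy reliance on recovery.
recovery_led = ContinuityCosts(100_000, 30_000, exercise_days=120, day_rate=400)

# Hypothetical option B: more robust design, less recovery to contract and exercise.
resilient = ContinuityCosts(40_000, 30_000, exercise_days=40, day_rate=400,
                            resilience_measures=50_000)

for name, option in (("recovery-led", recovery_led), ("resilient", resilient)):
    print(name, option.headline_cost(), option.total_cost())
```

With these invented figures the recovery-led option looks very different once exercising and in-house effort are counted alongside the contracts, and the resilient option comes out lower in total. Real figures could just as easily point the other way; the value lies in counting every category before comparing.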

Outsourcing

More and more, the outsourcing of discrete parts of the operation is seen as a cost-saving exercise. While this may be true, there may also be benefits in the form of decoupling those parts of the operation physically as well as logically. Resilience may be improved, but out of sight is out of mind, as the saying goes – so the emphasis shifts to one of supplier management, which must be supported by carefully prepared and suitably detailed legal contracts. This is an area of BC that is experiencing rapid growth as organisations mature in their own continuity capabilities and start to look more closely at those suppliers (outsourcers included) on which they depend.

Change as a mechanism for delivering resilience (and hence continuity)

Applying changes to an existing system in order to improve resilience is rarely easy – especially if it involves withdrawing previous access. It is easy to argue that things “have always been done that way” and that, since disasters have not occurred, change is unnecessary. The point can be illustrated with statistics, but not conclusively, for either side! The governing factor must be what is best for the unbroken operation of the business in a fit-for-purpose solution.

Fortunately, change can work in favour of these attempts to achieve resilience. In the area of technology (not exclusive to IT) the need to refresh equipment every three or four years provides an opportunity to implement measures to improve resilience. Similarly in the business space, changes in process, whether brought about by technology or changes in business practice, can be used to improve resilience here too.


Global Security Mag Copyright 2011