Bad things can and do happen, but rather like insurance, business continuity and disaster recovery (BC/DR) plans are things that most companies hope they will never need to use. Of course, this presupposes that the business has a plan of action and that everyone knows what to do when the balloon goes up.
When something does go wrong, it is already too late to start worrying about plans. Naturally, certain industries face specific risks that must be anticipated – a mining company, for example, clearly has to be prepared for an underground collapse, flooding or fire, not just as a preventative measure but also to ensure everyone knows what to do should the worst happen. But how do companies prepare for seemingly random yet potentially disastrous events?
Firstly, it is as well to acknowledge that there is a thin line between preparedness and paranoia when it comes to planning; in most geographies the risk of losing an entire organisation in a biblical flood is surely quite low! But what happens if a major event prevents most or all employees from getting to work? What happens if there is a major power outage or computer connections fail for an extended period? Who takes charge? How will key business partners and clients be informed in such an event? Does everyone affected know what to do and when to do it? It is not difficult to think of events in recent times that fit this description; suddenly a plan seems like a sensible proposition.
The kinds of catastrophe (potential or actual) that businesses may face are many and varied. They include single or multiple system failure (such as server, internet, telephone or power outages), the loss of office or workplace accessibility, or even total loss of real estate (perhaps as a result of a terrorist attack, fire or a natural disaster such as flooding, earthquake or storm). The loss of key staff in an air disaster or during a national or international pandemic is also possible. Mass industrial action by operational staff will shut down a business quickly too, but less obvious personnel disruptions may also have an impact; in 2013 not one of the ten employees of a UK recruitment agency based in Liverpool turned up for work the day after their syndicate’s £28m National Lottery win.
What is business continuity and disaster recovery?
A business continuity (BC) plan is a means of enabling companies to help themselves prepare for the worst and to recover and sustain operations during and after an event as quickly and cost-effectively as possible. Essentially, a BC plan is a fully-documented agreement between management and key personnel (with the buy-in of all staff) that is taken in advance and which covers the steps the organisation, and particular individuals, must take to ensure critical operations are protected.
At its most fundamental level, a BC plan may be the difference between survival and failure. But even in purely commercial terms, being prepared limits the possibility of having to call for assistance in a state of desperation (always very expensive) and goes a long way to maintaining or even enhancing client confidence.
For preparedness to be effective, the core of a BC plan must be sufficiently flexible to cover a multitude of unpredictable scenarios, as above, offering sufficient protection against internal impacts across multiple geographies from local to global, as appropriate. But it must also consider the external impact, including the likely effect on key business partners and the ramifications of their failure to operate on your own business and the impact of your failure on theirs. In essence, a BC plan needs to be a living, evolving and regularly tested strategy that will give a business the best chance of survival if the worst happens.
A key part of a BC plan is a series of functionally-specific disaster recovery (DR) plans. These are commonly IT-driven, focusing on recovery of software, hardware and data to at least allow resumption of critical business functions following an event. A BC/DR plan must also consider the effect on each function of the loss of key personnel by providing a contingency plan.
Key elements of BC/DR planning
According to DisasterRecovery.org, an independent organisation that provides guidance and information on disaster recovery, a plan must include the following stages:
A policy statement, stating the goal of the plan, the reasons for it and the resources required.
A risk assessment will identify the situations that are most likely to occur.
A business impact analysis, describing how a catastrophic event may impact the business practically, financially and in other ways. It should also try to identify any preventive steps that can be taken.
Recovery strategies must explain how and what needs to be recovered and with what priority/speed.
The plan development stage will require documentation of the plan and implementation of elements as required.
Plan buy-in and testing are essential to ensure everyone knows and understands what the BC/DR plan is, what to do and when.
Plan maintenance and testing are important to ensure the plan remains relevant and that it works.
Whilst third-party system vendors should be included in any BC/DR planning process to ensure they have the capacity to deliver when they are needed most, they should not be seen as a ‘get out of jail free’ card. Asking the right questions of them is an essential part of taking responsibility for DR/BC planning. Key points to raise (and include in any Service Level Agreement) would be:
How long will it take to recover operations following an event (referred to as the Recovery Time Objective)?
How much data could potentially be lost (Recovery Point Objective)?
How reliable is the platform (its proven up-time)?
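As a rough illustration of how these three figures might be compared against a company's own tolerances, the Python sketch below checks a vendor's stated commitments. The class, field names and thresholds are hypothetical and are not taken from any vendor's actual SLA; the sketch also assumes that the worst-case data loss equals the interval between data copies.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class VendorCommitment:
    """Hypothetical SLA figures supplied by a vendor (illustrative only)."""
    name: str
    recovery_time: timedelta    # promised Recovery Time Objective
    backup_interval: timedelta  # gap between data copies; bounds the RPO
    uptime_percent: float       # proven availability, e.g. 99.9


def worst_case_data_loss(v: VendorCommitment) -> timedelta:
    # Assumption: a failure just before the next copy loses up to one
    # full interval of data, so the interval is the worst case.
    return v.backup_interval


def annual_downtime(v: VendorCommitment) -> timedelta:
    # Convert an availability percentage into downtime per 365-day year.
    return timedelta(days=365) * (1 - v.uptime_percent / 100)


def meets_requirements(v: VendorCommitment,
                       max_rto: timedelta,
                       max_rpo: timedelta) -> bool:
    # The vendor passes only if both recovery time and potential data
    # loss fall within what the business says it can tolerate.
    return v.recovery_time <= max_rto and worst_case_data_loss(v) <= max_rpo


if __name__ == "__main__":
    daily = VendorCommitment("daily backup", timedelta(hours=4),
                             timedelta(hours=24), 99.9)
    print(meets_requirements(daily, timedelta(hours=8), timedelta(hours=1)))
    print(annual_downtime(daily))
```

In this sketch a vendor relying on a daily backup fails a one-hour data-loss tolerance even though its recovery time is acceptable, which is why the two objectives are worth asking about separately.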
It is one thing having your own house in order, but how do critical system providers such as banks and core technology vendors view the disaster management process?
The core technology vendor
Properly executed, these stages can provide a business with reassurance that it is prepared for the worst. However, a common problem, according to Phil Pettinato, Chief Technology Officer of TMS vendor, Reval, is that within a company there is often no clear ownership of DR. “A lot of business operations people – including treasury – think IT will take care of it,” he notes. Whilst this may be the case, those IT people may not always fully understand how critical each business operation is. This suggests a lack of co-ordination which, when creating a plan, is unhelpful at best. “Each business operation is responsible for ensuring it has a clear plan but that does not mean it can build and execute it on its own,” states Pettinato.
Of course, a SaaS-based TMS provider such as Reval should have a responsibility to its clients to provide DR as part of the deal, but it is the clients’ responsibility to know what to do in the event of a disaster. The same goes for the vendor in consideration of its own operations. Although Reval’s own IT function co-ordinates these plans, with guidance from an internal audit operation, ownership is very much accorded to each business unit. This ensures each is able to identify its own critical systems and operations and to put in place and test an effective plan so that everyone knows what to do and when in a co-ordinated manner.
Rather than isolating BC/DR processes, Reval tries wherever possible to bring them into its daily operations. By making them into “a second alternative to operating our business” and by actually using that alternative periodically they become ingrained into the collective consciousness of the staff, explains Pettinato. “Once a month or once a week we will operate using our DR platform; this is tied into the production platform to make sure it is operational.” He cites having seen companies build up “impressive DR and BC platforms, test them a couple of times and then forget about them”. But it is important to keep those platforms and procedures up to date and make them part of your operations. “If you are using it regularly you will know it works.”
The documentation of BC/DR must have a “lead of governance” in that it ensures the business does what it has to do, and that it is to the point in stating what to do in an emergency. “If you keep it simple you will be able to maintain it and operate from it on a regular basis,” notes Pettinato. Indeed, he urges companies to take a pragmatic approach to policy building. “If you over-document you will spend too much time managing that documentation.”
Reval’s practical BC plan for its own business operations (as distinct from its client operations) allows it to operate from a number of different offices and even virtually, with staff able to connect remotely if necessary. It has all of its core infrastructure and systems in professional co-location facilities that offer redundant power supplies, communications links and so on, and it also replicates all of its data in real-time using two different data centres connected but situated in geographically diverse locations.
With many companies globalising their operations, thinking in terms of operating from another location is not unusual. But Pettinato further urges companies embarking on expansion – especially through acquisition – to make sure that all operational locations are prepared for major events and, if possible, to try to leverage those locations in the BC plan context; having operations in the US, Europe and Asia for example gives fail-over options across a wide spread of time zones.
However DR is managed, simply backing up data every day and sending it over the internet to another location may have been okay a few years ago, but in a world of Big Data and complex analytics, losing a day’s worth of data is a big deal for many businesses. “Any company that believes it can get away with running a simple daily backup and restoring from that is clearly running a huge risk,” comments Pettinato.
Whilst best practices such as real-time replication are gathering momentum in this space, he has one further thought to share. “You need to think about what your backup for your backup plan is too,” he warns. “Once you have lost your primary system and you are likely to be on your DR platform for an extended period, you are on your own.”
When disaster strikes, ‘keep calm and carry on’ would be a suitable adage for treasurers, but it would be hoped that the banks would play their part in keeping the machine moving. Routine operations such as making payments and checking cash positions become a serious challenge when a host-to-host banking platform is unavailable following a major event.
Banks are cognisant of this fact and Cindy Murray, Head of Global Treasury Product Platforms and eChannels, Bank of America Merrill Lynch, says clients in this situation will be advised to use the bank’s online banking platform as a means of carrying on in the interim. “If a client cannot send a file to us, they can go online to initiate urgent payments, including payroll,” she says. “If a client receives its banking intra-day and prior-day statements host-to-host, we can put those statements, in the same format, online as part of a disaster recovery plan.”
Incorporating mobile solutions into DR/BC planning is sensible but requires preparation. Accessing online banking requires the right people to have the security credentials and tokens necessary to function but they also need to know how to execute transactions in an emergency. “We recommend our clients test the process at least annually so that they know how to release manual payments,” advises Murray. She adds that it is also essential to have a process in place to avoid duplication of manual payments that may be contained in the original files if those files eventually make it through to the bank via the normal channels.
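The duplicate-payment check Murray describes could take many forms. The Python sketch below shows one simple possibility: before a delayed file is finally sent, any payment already released manually online is held back. It assumes each payment carries a unique reference that is identical in the file and in the manual entry; the `Payment` type and its fields are hypothetical and do not represent any bank's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payment:
    """Hypothetical payment record (illustrative fields only)."""
    reference: str     # assumed unique and shared by file and manual entry
    beneficiary: str
    amount_minor: int  # amount in minor units, e.g. pence or cents
    currency: str


def split_delayed_file(file_payments, manually_released):
    """Separate a delayed file into payments still to send and payments
    to hold because they were already released manually."""
    released_refs = {p.reference for p in manually_released}
    to_send, held = [], []
    for payment in file_payments:
        if payment.reference in released_refs:
            held.append(payment)     # already paid online: do not resend
        else:
            to_send.append(payment)
    return to_send, held


if __name__ == "__main__":
    payroll = Payment("PAY-001", "Staff payroll", 2_500_000, "GBP")
    supplier = Payment("PAY-002", "Supplier A", 120_000, "GBP")
    to_send, held = split_delayed_file([payroll, supplier], [payroll])
    print([p.reference for p in to_send])
    print([p.reference for p in held])
```

Matching on a reference alone is a deliberate simplification. A real process would probably also compare amount, beneficiary and value date, and route held items to a person for review rather than dropping them silently.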
Whilst the inclusion of banking in a DR plan is crucial, corporates are curiously quiet when it comes to checking the preparedness of their key partners, notes Paul Taylor, Head of Sales, GTS EMEA, Bank of America Merrill Lynch. There is, he says, “an expectation nowadays that our products will conform to BIS (Bank for International Settlements) principles and stand up to any DR scenario”. Rules, such as the minimum acceptable distance between a bank’s data centres, exist “to give a level of common comfort” for clients, says Taylor. “But I think it is implicit in what we do that we could demonstrate our capacity to still operate as their bank, that we could still provide services and that we do have plans in place.”
The implicit assumption that tier one banks are prepared for the worst has substance. Murray confirms that Bank of America Merrill Lynch runs contingency production systems for each application from a different data centre. The system has the capability and capacity for ‘active-active’ operation for “mission-critical applications” such as reporting and payments. This means it can take transaction flows into multiple sites enabling real-time synching of databases. In the event of a disaster, there is minimal disruption because the data is already live.
“Banks exist to provide security and must demonstrate that we can function whatever happens,” comments Taylor. He believes there is increasing market interest in the sustainability of platforms, business models and processing capabilities. He also notes that the industry is seeing more co-sourcing of technology and more platform investment. “When you look at 9/11, at what happened in Japan in 2011, and at all the different security considerations around the world today, we could not predict these events, but we still have to be able to answer clients’ questions.”
Whoever is entrusted with BC/DR, internally and externally, it is essential for all concerned to have the required skills to be able to execute the plan in an emergency. As mentioned above, an assessment of suitable third-party providers must be carried out prior to engagement, but to ensure readiness of all concerned internally, the plan must become part of company culture. This is the advice of Paul Kirvan, an independent BC consultant, member of the BCI Global Membership Council and TechTarget network contributor. To bring this state about, Kirvan suggests frequent communication to all employees – including briefings to senior management – of current BC/DR activities. This process could be used to demonstrate how the company’s programme can add value to their activities and protect their safety, their wellbeing and even their jobs. BC/DR training should also be given to all new employees, with regular refreshers for existing employees. He further points out that acceptance by the company as a whole is more likely if senior management have bought into it, adding that formally enshrining BC/DR into policy – by stating that the policy is to be followed by all employees – will further drive acceptance.
However, Kirvan warns that BC/DR programmes may ultimately fail “because the organisation – at all levels – does not adopt and incorporate business continuity as part of its ongoing business processes and methods”. So, without across-the-board acceptance of a well thought out and managed BC/DR plan, it seems that companies should expect the worst when the worst happens.