Your system is dead?

When a core treasury system such as a TMS or ERP fails, either partially or catastrophically, you had better be prepared to take control. But how likely is this to happen? And even if it does, what can be done? Treasury Today looks at what happens and what to do if your TMS or ERP goes down.

You may believe that you have the most secure and dependable technology known to treasury-kind, but the fact is that there are no infallible systems on the market. This may sound like a gloomy prognosis, but system failures do occur, sometimes without any warning. For treasurers, the key questions are: ‘are you aware of the risk, and will you be ready to deal with the outcome?’

Catastrophic system failure resulting in a significant and sustained loss of access, loss of data, or damage to hardware or its total failure is extremely rare, notes Alex Ellison, an independent consultant (and a former Director of Treasury Solutions Business Development at SAP). In the event of a catastrophic or partial failure, she believes most “mission critical” systems within a large organisation will be (or should be) capable of ‘hot switching’ between mirrored data centres to allow almost immediate take up and continued use of live data.

Unfortunately, notes Bob Stark, VP Strategy at technology vendor Kyriba, most treasurers assume that their system is well supported and, as such, do not necessarily have a recovery programme that works without their treasury system being available. As a result, he believes that many will be severely tested when the worst happens.

Is it really likely?

But what are the chances of a modern ERP or TMS failing, either catastrophically or partially? Outside of substantial physical damage caused by, for example, a natural disaster or terrorist attack, the risk really is uncertain, says Tim de Knegt, Head of Strategic Finance and Treasury for the Port of Rotterdam. “With standardised processes, a good and integrated testing procedure, and knowledgeable people, there should be a limited likelihood of this happening,” he believes. “But real life experience shows us that system failure – including ERPs and TMSs – happens rather more than we might imagine in recent years.”

One reason that a core system has a reasonable chance of failing is its deployment model, notes Stark. Although such installations are diminishing in number, an internally installed TMS is completely dependent on the configuration chosen when it was deployed: if the in-house server goes down, it will take the TMS with it, and if IT has not built a fully redundant backup to take over, treasury will be without a system for the duration. Similarly, he says, hosted software that is not true cloud runs the same risk of a single point of failure. “The risk is a lot higher than people realise because the majority of systems on the market are not true cloud and do not have the capability of behind-the-scenes disaster recovery and business continuity.”

In practical terms, the risk comes down to whoever manages the IT component: if the system is installed on premises, it will be internal IT’s task to get it back. The majority of systems sold today are hosted and, states Stark, in that case the vendor has 100% responsibility for ensuring that its system is up and running and comes back online following disruption. “This is why many organisations are not choosing on premise any more – but then they must rely on the vendor having everything in order and having made all the right investments to be able to get you back up and running very quickly.” Service Level Agreements (SLAs) provide a commitment to delivery, but if business continuity and disaster recovery are not effective and downtime exceeds what is comfortable, Stark says it doesn’t matter whose responsibility it is: “ultimately treasury is made to suffer”.

The immediate effect

Indeed, should the worst happen, there will be a number of immediate concerns to tackle, and the absolute prerequisite is “to avoid, under every circumstance, panic and uncontrolled action”, warns Thomas Stahr, Managing Partner of Stahr Treasury Consulting and a senior treasurer of many years’ experience. This is where robust planning comes into play. In practical terms, the first task is to convene an emergency meeting with the most senior responsible personnel. “Designate an immediate task-force, ensure clear definition of tasks and responsibilities and enlarge it where appropriate and necessary,” he advises. This and all subsequent direction will form part of the business continuity plan (BCP), of which more later.

Back in the here and now, the sudden loss of a TMS or ERP will immediately hit the daily treasury process: payments, the reconciliation of bank statements and related cash flows on all external and internal accounts. “Still having ‘old’ permissions to get immediate access to the accounts with individual banking software would be an asset,” notes Stahr. “If not – and this is mostly the case – at least an up-to-date contact sheet for your bank account managers, with their phone numbers and emails, is a must.” In such a situation, he adds, re-learning how a fax machine works would also be beneficial. The same burden applies to all subsequent procedures, such as gathering information on exposures, maturities of trades, the need for new trades and so on.

De Knegt also immediately alights on the ability to make payments, and raises concern over how open futures positions can be closed without a trading platform. He further notes that it will be necessary to somehow secure accurate market data and bank statements, and to acquire cash forecasts from subsidiaries. The bad news, he warns, is that the most probable immediate remedial action will see treasurers resorting to a series of manual steps – typically calling upon spreadsheets and bank portals for essential activities – to make do until their workflows within the wider system are back online.

As events unfold in the minutes and hours following a major technical event, Stahr is adamant that it is essential not only to have treasury personnel with a solid grounding in all underlying treasury processes but also for a “good team-spirit” to pervade the entire operation. “It must be clear to everybody that nine-to-five days are suspended as long as the problem exists,” he cautions. As such, “serious but calm communication” with all stakeholders will be required of the senior treasurer, although the level of detail should be limited to what is necessary to avoid confusion. In the case of catastrophic system failure, all upstream process stakeholders – including accounting, controlling, the CFO and possibly even the Board – must be made aware of the situation without delay.

It is also important for certain downstream participants to be alerted: risk owners (business units bearing the underlying risk) should be clear that, for the duration, their hedging orders need to be submitted by e-mail or fax. Doing so by phone is not advisable, says Stahr. “The phone line will be glowing hot anyway; avoid making the situation any more hectic than it is already.” Conversely, he warns of the need to keep awareness of the problem “at the lowest possible level” as far as external relations are concerned, avoiding reputational damage and stakeholder panic.

After the fact

Once the worst has been contained there will be time for reflection. Post-event analysis should reveal any limitations. “Consider it as an opportunity to avoid it in the future by introducing a scenario-procedure guideline, keeping in mind that second time around it could be worse and have an ever harder impact,” notes Stahr. However, he advises treasurers not to wait for the worst to happen but to try instead to pre-empt worst-case scenarios and to present them alongside the usual risk-mitigation strategies to the risk committee.

To help mitigate the risk of system failure, both Stahr and De Knegt refer to a ‘scenario handbook’, which should be one of the first ports of call for any treasury system failure. Such a handbook can be built either from experience or by stress-testing various ‘what if’ situations for disruption of the different functions of a TMS or ERP.

When initiating steps to mitigate the risk of technical failure, Stahr urges participants never to be afraid of thinking “unpopular” thoughts. For a third-party system failure, this may demand serious questioning of the vendor’s capability under pressure. If its response was less than satisfactory, he says, reflect carefully and consider finding an alternative; this may be an unpopular decision, but a working and well-supported system is an essential tool. “The risk of a fail is too serious not to take full control”.

Having a good relationship with a core system vendor is thus vital, and it is a joint responsibility of the treasury team and internal IT not only to find the most suitable technology but, as Ellison notes, also to ensure that the vendor has continuity in terms of investment in R&D and its own financial resilience. The impact of ongoing consolidation in the TMS market and the issue of ownership must be considered in the context of product longevity.

It is your problem

“In essence, responsibility for uptime of systems lies with the IT department,” says De Knegt. “However, it is quite a bit easier for a CFO to place this responsibility with the treasurer to ensure they are on top of the game and that there is an understanding of the implications and which risks need to be mitigated.” For Stahr, responsibility begins with the treasury user, and he believes all users should be obliged to escalate issues so that problems are properly resolved, even though this may temporarily create more work. “Never try to resolve apparently simple problems with workarounds – that’s no solution at all, that’s just a risk-increasing action.”

Rewinding to a time before anything has actually happened, Stahr has a further, if unexpected, take on accountability. He believes that some treasurers may be storing up trouble for themselves should the worst happen. “Running a TMS is on the one hand increasing the efficiency and safety of treasury processes but on the other it can allow treasury to drift into standard processes where tight guidelines do not permit the treasury manager individual thoughts.” Some treasury staff, he feels, are “becoming the function of pressing the right button at the right time within a strict daily, weekly or monthly guideline”. The net result is that in the event of core system failure, treasury personnel who learned the business “mostly as a non-creative job” and who are suddenly thrust into an emergency situation “are often not able to think in an alternative way”.

There is another issue to consider here concerning the level of treasury understanding. “What can be difficult for some is that maybe before treasury invested in a TMS the requirements were less complex,” says Stark. When investing in such a system, he notes that treasuries are not necessarily trying to recreate what they did before; they naturally want to take it to the next stage. “As they adopt the capabilities of a treasury system, there will be an expected increase in complexity. Suddenly it becomes that much more difficult to replicate advanced processes for anything more than a couple of hours at a time.”

Regardless of the origin of such difficulties, Stahr asserts that in the event of a major system failure there may need to be a “quick and dirty education” on treasury fundamentals. “Assuming that there is anybody in the treasury department who has good practical experience working without a TMS, then only now will some staff learn those pure underlying treasury processes,” he comments.

Continuity planning

Of course, as a general guide to maintaining treasury operations under emergency conditions, a Business Continuity Plan (BCP) – which should also include a Disaster Recovery (DR) plan – is an essential tool for any business. It should be clearly documented, easily accessible and regularly tested. A BCP should cover likely emergency scenarios and provide the broad means of keeping critical business functions running following such an event, the emphasis being on ‘critical’, states Ellison. It will draw on input from multiple functions and cover the direction of people, locations and technology. DR is a subset of BCP: typically an IT-driven set of procedures focused on the recovery of software, hardware and data.

Michael Baum, Senior Manager, KPMG, sets out a number of key BCP elements in his December 2015 Insight piece in KPMG Corporate Treasury News. Two determining factors serve as the guiding principles for generating a treasury-specific BCP, he writes: availability and efficiency. The key question when devising an approach to availability is the maximum period for which any given process can tolerably be forgone. Importantly, identifying critical processes must primarily be the responsibility of treasury: all other steps, in particular technical IT steps, must be based on the outcome of this analysis. “Issues of possible threats, risk mitigation and security needs, particularly for time-sensitive treasury processes, logically lead to greater investment needs to protect availability,” suggests Baum. “This is where the second guiding factor comes into play: the efficiency of requirements needs to be ascertained to achieve the best possible balance between investment and risk tolerance.” In short, he contends that prioritisation is essential not just for operational reasons but for economic effectiveness too.
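As a rough illustration of the availability analysis Baum describes, the sketch below ranks treasury processes by the maximum period each can be forgone and splits them into a critical group and a deferrable group. The process names, hour figures and the 24-hour threshold are hypothetical assumptions chosen for illustration; they are not taken from KPMG or from any real BCP.

```python
from dataclasses import dataclass


@dataclass
class TreasuryProcess:
    name: str
    # Maximum period (hours) the process can be forgone; hypothetical figures.
    max_tolerated_hours: float


def prioritise(processes, critical_threshold_hours):
    """Rank processes by tolerated outage (shortest first) and split them
    into critical and deferrable groups around an assumed threshold."""
    ranked = sorted(processes, key=lambda p: p.max_tolerated_hours)
    critical = [p.name for p in ranked
                if p.max_tolerated_hours <= critical_threshold_hours]
    deferrable = [p.name for p in ranked
                  if p.max_tolerated_hours > critical_threshold_hours]
    return critical, deferrable


# Hypothetical inventory of treasury processes.
processes = [
    TreasuryProcess("bank statement reconciliation", 48),
    TreasuryProcess("outgoing payments", 4),
    TreasuryProcess("hedge execution", 8),
    TreasuryProcess("month-end reporting", 120),
]

critical, deferrable = prioritise(processes, critical_threshold_hours=24)
print(critical)    # ['outgoing payments', 'hedge execution']
print(deferrable)  # ['bank statement reconciliation', 'month-end reporting']
```

In this framing, IT recovery effort (and spend) would then be directed first at the critical group, which reflects Baum's point that technical steps should follow from treasury's own assessment of which processes matter most.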

Know your system

Of course, the best protection is never to let your system fail in the first place. If the right decision is made when selecting core technology, treasury dramatically reduces the chances of having to face a major disruption, says Stark. Indeed, under such circumstances, he feels total failure becomes “extremely unlikely”. If, however, treasury has made some incorrect assumptions around the capabilities of its technology, it could find itself in a situation where workflow cannot be brought back online as easily as needed. If service is not resumed within a comfortable timeframe then resorting to manual operations is almost inevitable.

To reduce the likelihood of a major technology failure, treasurers should therefore first ask their supplier (internal or external) what the backup plan is, should the system go down. Be sure to clarify what ‘backup’ actually means, warns Stark. Simply backing up a database to ensure data is not lost is one thing, but a treasury system is a vastly more complex platform requiring a whole new approach. Indeed, questioning and understanding whether a core treasury system is suitably protected may not sit within the skillset of the typical treasurer, and there is a wide range of details within this line of enquiry which must be dealt with satisfactorily. Unless fully conversant with the likes of SOC 2 Type 2 evaluation reports and the new Trust Services Principles, Stark urges “bringing your experts into the room to make sure you are asking the right questions and making the right decisions”.

Ultimately, treasury must not be left on an island when it comes to information security, let alone when it comes to meeting all its needs for hosting, disaster recovery and business continuity. For Stark, the reason for adopting this viewpoint is simple: “It is not a pretty situation to go back to a manual workflow”.

Published: Mar 2016