Cloud Adoption: A Deep Dive into the Swiss Cheese Model
Barry Smart was the CTO at Hymans Robertson, the largest independent firm of consultants and actuaries in the UK, from 2012 to 2019. He was responsible for leading the firm's technology strategy, modernising the 100-year-old organisation, and transforming it through the adoption of Microsoft's Azure Cloud platform so that it could innovate faster and produce new digital products and services.
Since 2013, endjin has been helping Hymans Robertson with their Azure adoption process. As a Financial Services organisation, they are regulated by the FCA, and as early adopters of Azure, were at the bleeding edge of storing Personally Identifiable Information (PII) in the cloud. This meant that they were ahead of the regulator, were breaking new ground, and needed an evidence-based approach to demonstrate to the regulator that they could identify and mitigate the risks of moving to the Cloud.
Since developing this approach, we (endjin) have used it for many different customers, from global enterprises to early start-ups, in many different sectors. It's a useful strategic tool to help you understand the gotchas of moving to the cloud, and how you can put people, process and technical barriers in place to prevent the worst outcomes from ever happening.
In this three-part series, Barry Smart describes the risk & mitigations analysis process we came up with, and explains how you can use the same process to understand the risk of your own cloud journey.
- Part 1 - Cloud Adoption: Risks & Mitigations Analysis
- Part 2 - A Deep Dive into the Swiss Cheese Model
- Part 3 - Benchmarking the Cloud against on-premise data centres
This is the second in the series of blogs about a risk driven approach to embracing Public Cloud technology.
Part 1 described the process I adopted to engage a wide range of stakeholders to bring them on board with adopting the Cloud. I used a risk analysis technique called the Swiss Cheese Model as a structured way of exploring the risks, mapping out the key controls and consolidating our due diligence. I then used the one page summary of this model as a key artefact when engaging with stakeholders.
In this second installment, I want to walk through the Swiss Cheese Model in more detail, in the order that we constructed it: we began with the end (undesired outcomes), then flipped to the start (threats) before completing the middle (barriers).
The first step we took was to identify the set of undesired outcomes on the right of the Swiss Cheese Model. There were three types of outcome we were seeking to avoid:
- Information Security - was our primary concern throughout and covered two key areas:
  - Uncontrolled Leakage Of Information – information that is hosted on the Cloud platform is accessed by or released to unauthorised parties;
  - Breach Of Data Protection Law – we fail to comply with our obligations under law or the law changes such that the use of Public Cloud becomes untenable.
- Performance - spanned a range of outcomes where the Cloud provider failed to provide the level of performance required to support our business:
  - Unplanned Outages – the Cloud platform suffers from a technical issue that takes it out of service;
  - Slow Performance – the Cloud platform does not perform as anticipated, resulting in a poor experience for end users;
  - Insufficient Capacity – the Cloud provider is unable to match supply with demand in terms of providing raw infrastructure resources such as network, storage and compute, or scale the services that run on top of this infrastructure.
- Commercials - captured any situation where unfavourable commercial impact would arise:
  - Commercial Overspend – the tenant does not understand or successfully manage its use of the Cloud, resulting in charges for Cloud services exceeding budgeted levels;
  - Vendor Lock In – the tenant is locked into the Cloud provider such that the option of moving back "on premise" or to an alternative Cloud provider is extremely difficult or impossible.
The relative priorities of these undesired outcomes will vary based on different business models. For us, Information Security was given top priority. We used these to drive the completion of the other parts of the model.
Second, we developed the threats on the left of the Swiss Cheese Model. We identified 21 of them. These were built up by considering what could trigger each of the undesired outcomes on the right. We based this analysis on direct experience operating our own "on premise" data centres and on research into historic incidents in the IT industry where major outages or information security breaches had occurred.
We identified threats that ranged from external threats such as hacking through to internal threats such as bugs in our own code:
- Force Majeure;
- Terrorism / Activists;
- Criminal Activity / Hacking;
- Utility Service Outage;
- Denial of Service Attacks;
- Snooping by Government Agencies;
- Requests for Data by Government Agencies;
- Regulatory / Legal / Legislative Change;
- Civil Unrest / Pandemic / Wide Scale Industrial Action;
- Data Is Intercepted in Transit;
- Data Centre Hardware Failures;
- Bug / Vulnerability in Infrastructure;
- Cloud Provider Goes Out of Business;
- Contract with Cloud Provider is Terminated;
- Strategic Shift by Cloud Provider;
- Bug / Vulnerability in Application Code;
- Uncontrolled Usage of Resources;
- Spike in Use of Services;
- Enforced Upgrades;
- Disgruntled Employee;
- Mistake by Employee.
The final element we put in place was the core of the model - the barriers. We identified 42 barriers and grouped these into four discrete categories:
- Physical / Technical - are "hard" barriers that are implemented as a combination of physical and technology based mechanisms. These are often the first line of defence. This type of barrier is often the focus of due diligence exercises covering things like physical security of the cloud data centre and technical features such as the ability to encrypt data at rest. We identified 16 barriers in this category;
- Process - barriers that are fulfilled by a specific process, such as a procedure that is carried out at frequent intervals or a pre-defined action that is triggered by a specific event occurring. These processes are often concerned with protecting the integrity of the physical / technical barriers described above, and therefore become an important mitigation in their own right. We focused on specific processes that we felt were important to managing the threats we had identified. One important example which we looked at in detail was a fully auditable process to manage the secure disposal of storage media. We identified 15 barriers in this category;
- People - "soft" barriers related to the people, their skills and the way they are organised and managed. These barriers are often the last line of defence. A good example was an organisational model that put in place the segregation of duties between key roles. We identified 11 barriers in this category;
- Containment / Recovery - the final set of barriers identified what we thought would be required should the physical / technical, process and people barriers above fail and an undesired outcome be triggered. These barriers were concerned with minimising the impact of the undesired outcome and speeding up the recovery process. A good example is the ability to trigger migration of services to another data centre location. We identified 6 barriers in this area.
We maintained a matrix which mapped each barrier to the threat that it contained. By mapping out this many-to-many relationship, we were able to demonstrate that each threat had multiple barriers in place to mitigate it. This is an important feature of the Swiss Cheese approach: major threats require multiple barriers in order to contain them successfully.
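The matrix described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling we used: the threat names come from the list earlier in this post and the barriers are examples mentioned in the text, but the specific pairings between them, the function names and the threshold of two barriers per threat are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical barrier -> threats matrix. The pairings are illustrative
# assumptions, not the real due diligence results.
BARRIER_TO_THREATS = {
    "Encryption of data at rest": {"Criminal Activity / Hacking", "Disgruntled Employee"},
    "Secure disposal of storage media": {"Criminal Activity / Hacking", "Mistake by Employee"},
    "Segregation of duties": {"Disgruntled Employee", "Mistake by Employee"},
    "Migration to another data centre": {"Force Majeure", "Utility Service Outage"},
    "Physical security of data centres": {"Terrorism / Activists", "Criminal Activity / Hacking"},
}

# A subset of the 21 threats, enough to exercise the check.
THREATS = [
    "Force Majeure",
    "Terrorism / Activists",
    "Criminal Activity / Hacking",
    "Utility Service Outage",
    "Disgruntled Employee",
    "Mistake by Employee",
]


def barriers_by_threat(barrier_to_threats):
    """Invert the matrix: threat -> set of barriers that help contain it."""
    inverted = defaultdict(set)
    for barrier, threats in barrier_to_threats.items():
        for threat in threats:
            inverted[threat].add(barrier)
    return dict(inverted)


def under_protected(barrier_to_threats, all_threats, minimum=2):
    """Return, sorted by name, the threats covered by fewer than `minimum` barriers."""
    coverage = barriers_by_threat(barrier_to_threats)
    return sorted(t for t in all_threats if len(coverage.get(t, ())) < minimum)


if __name__ == "__main__":
    for threat in under_protected(BARRIER_TO_THREATS, THREATS):
        print(f"Needs more barriers: {threat}")
```

Inverting the matrix makes the many-to-many relationship easy to inspect from either side: which threats a barrier addresses, and which barriers stand between a given threat and an undesired outcome.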
We examined each of these barriers in turn during our due diligence process, seeking out evidence about how the barrier would be put in place.
The 42 barriers we identified are set out in detail on the Swiss Cheese illustration:
Risk management is a shared responsibility
The process of finding evidence for each of the barriers above highlighted a key point: the overall management of the risks would be a shared responsibility between us (the tenant) and the Cloud provider.
Microsoft have recently published a paper on this concept of shared responsibility. It focuses quite heavily on physical and technical barriers; the Swiss Cheese Model expands on this by also considering process and people related barriers.
We categorised ownership of the barriers based on the following four categories:
- Cloud Provider - where responsibility for that barrier lies entirely with the Cloud provider. There can be some important nuances when considering IaaS versus PaaS versus SaaS, so you need to make some important decisions about your cloud architecture as part of completing this due diligence. A good example of a barrier that should always fall into this category is physical security of the Cloud data centres. However, as you move up the value chain from IaaS to PaaS, the responsibility for some barriers, such as regular patching of underlying infrastructure, shifts from the tenant to the Cloud provider. We had made a conscious decision to adopt PaaS, so we allocated the barriers on this basis;
- Shared Responsibility Implemented Jointly - this applied to a barrier that needed to be implemented jointly between the tenant and the Cloud provider. A good example of this was strict access control for tenant administrators - this was something the Cloud provider would be required to take an active interest in, for example during the onboarding process;
- Tenant - where the responsibility for a barrier rests wholly with the tenant. An important point here is that the Cloud does not release the tenant from all of their responsibilities; indeed, it can increase the focus on some areas or highlight weaknesses in existing capabilities that need to be developed in order to put in place adequate risk controls. A good example here is the responsibility for putting in place processes to regularly back up information held in the Cloud to support "point in time" recovery;
- Shared Responsibility Implemented Independently - was assigned to barriers that we would expect to be implemented by both the tenant and the Cloud provider, but completely independently. A good example here is the Vetting of Staff: you would expect both organisations to put in place standalone processes for doing this with no involvement from the other party.
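To make the four ownership categories concrete, here is a minimal Python sketch of how barriers might be tagged with an owner, and how that owner can change with the service model. The category names and example barriers are taken from the list above; the data structures, the function names and the decision to model only IaaS and PaaS are assumptions for illustration, not a definitive allocation.

```python
from enum import Enum


class Ownership(Enum):
    CLOUD_PROVIDER = "Cloud Provider"
    SHARED_JOINT = "Shared Responsibility Implemented Jointly"
    TENANT = "Tenant"
    SHARED_INDEPENDENT = "Shared Responsibility Implemented Independently"


# Example barriers from the text whose owner does not depend on the service model.
FIXED_OWNERSHIP = {
    "Physical security of data centres": Ownership.CLOUD_PROVIDER,
    "Access control for tenant administrators": Ownership.SHARED_JOINT,
    "Regular backups for point in time recovery": Ownership.TENANT,
    "Vetting of staff": Ownership.SHARED_INDEPENDENT,
}

# Barriers whose owner shifts as you move up the value chain.
# Only IaaS and PaaS are modelled here (an assumption of this sketch).
MODEL_DEPENDENT_OWNERSHIP = {
    "Patching of underlying infrastructure": {
        "IaaS": Ownership.TENANT,
        "PaaS": Ownership.CLOUD_PROVIDER,
    },
}


def owner_of(barrier, service_model):
    """Look up who owns a barrier under the given service model."""
    if barrier in MODEL_DEPENDENT_OWNERSHIP:
        return MODEL_DEPENDENT_OWNERSHIP[barrier][service_model]
    return FIXED_OWNERSHIP[barrier]


def tenant_involved(service_model):
    """Barriers the tenant must implement wholly or in part, sorted by name."""
    all_barriers = list(FIXED_OWNERSHIP) + list(MODEL_DEPENDENT_OWNERSHIP)
    return sorted(
        b for b in all_barriers
        if owner_of(b, service_model) is not Ownership.CLOUD_PROVIDER
    )


if __name__ == "__main__":
    for model in ("IaaS", "PaaS"):
        print(model, "->", len(tenant_involved(model)), "barriers involve the tenant")
```

Tagging each barrier this way makes it straightforward to list what remains on the tenant's side of the line for a chosen architecture, which is the question the due diligence has to answer.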
Integrated management systems
We also found that many of the barriers were already in place for our on-premise data centres, largely because we were already an ISO 27001 accredited organisation. We decided early on in the due diligence process that many of these on-premise barriers could be extended to or adapted for the cloud - a good example of this being the Change Management process. This enabled us to push ahead with Cloud adoption by leveraging existing policies, processes and people - in fact, in hindsight, I would be concerned if we had ended up with the Cloud operating in an organisational silo or as a "bolt on".
This ultimately led us to developing a single unified information security management system for our organisation by:
- Adapting existing on-premise policies and procedures for the cloud - we desperately wanted to avoid having two discrete sets that would be cumbersome to maintain and inefficient to implement;
- Where relevant, ensuring the person responsible for each on-premise control was given support to adapt it for application to the Cloud;
- Where possible, extending the partnerships that we had with existing technology providers, software development partners, independent security consultants and auditors to cover the cloud. This was going to be a team effort, and we wanted our trusted partners to support us.
Conclusion: a comprehensive and structured approach to due diligence
The Swiss Cheese Model provided a powerful framework for our due diligence with the Cloud providers. We used it to capture the evidence we obtained about how they implemented each barrier. It was interesting to observe how the Cloud providers were able to respond to the questions we raised. We got the sense that they were not used to being asked some of the questions, in particular where we were asking for greater insight into operational processes, or organisational / people factors. This was some time ago now, so I am sure they've been able to improve the level of information that is made available in these areas.
The risk-driven, holistic approach we adopted has been reinforced by the Financial Conduct Authority (FCA) in the UK, who have recently published guidance for firms outsourcing to the cloud as part of their wider strategy to encourage innovation through lowering the barriers to firms entering, or considering entering, the banking sector.
In the third and final part of the series, we benchmark the Cloud against on-premise data centres.