
ISO 27001:2022 Annex A  ·  Technological Control

A.8.14
Redundancy of information processing facilities

Last reviewed: June 12, 2026  ·  Authored by TÜV SÜD & BSI Certified Lead Auditors

Control Definition

Information processing facilities must be implemented with enough redundancy to meet the organization's availability requirements. That means identifying what availability each service actually needs, designing duplicated components, systems, or sites to deliver it, and verifying that failover to the redundant elements works.

Control Objective

To keep information processing facilities operating through component, system, or site failures, at the availability levels the business has committed to.

What This Really Means

Redundancy is the practice of buying away single points of failure — and the control's most important word is "requirements". You are not asked to duplicate everything; you are asked to duplicate enough that the availability promises you have made (in customer SLAs, in your business impact analysis, in regulatory commitments) survive the failure of any one part. The design conversation therefore starts with a number, not with hardware.

Redundancy comes in three layers, and cost typically rises steeply from one to the next. Component level: RAID arrays, dual power supplies, dual network cards, and UPS, which survive the failure of a part inside one machine. System level: clustered servers, load-balanced application tiers, and replicated databases, which survive the failure of a whole machine or instance. Site level: multi-availability-zone and multi-region cloud deployments, or a secondary data center, which survive the loss of an entire facility. A sensible architecture tiers its services and spends accordingly: the revenue-bearing platform may justify multi-region, while the internal wiki gets a nightly backup and a documented tolerance for a day of downtime.

Two disciplines make the difference between real redundancy and a diagram. First, independence: redundant elements must not share a failure domain. Two "redundant" network links that enter the building through the same conduit, two VMs that land on the same physical host, two power feeds from the same substation — these fail together, and finding such shared dependencies is the actual engineering work of this control. Second, failover testing: redundancy that has never been exercised is theoretical. Pull the component, drain the node, evacuate the zone — on a schedule, with the results written down.
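The cost of a shared failure domain can be made concrete with the standard reliability arithmetic for parallel and series elements. The sketch below is illustrative only: the function names and the availability figures are assumptions, and the parallel formula holds only if the redundant elements really do fail independently.

```python
def parallel_availability(element_availability: float, copies: int) -> float:
    """Availability of `copies` redundant elements, ASSUMING independent failures."""
    if not 0.0 <= element_availability <= 1.0:
        raise ValueError("element_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - element_availability) ** copies


def with_shared_dependency(group_availability: float, shared_availability: float) -> float:
    """A shared dependency sits in series with the group, so availabilities multiply."""
    return group_availability * shared_availability


if __name__ == "__main__":
    # Hypothetical figures: two links at 99% each, one shared conduit at 99.5%.
    independent = parallel_availability(0.99, 2)
    capped = with_shared_dependency(independent, 0.995)
    print(f"two independent links:      {independent:.4%}")
    print(f"same links, shared conduit: {capped:.4%}")
```

However many copies are added, the combined figure can never exceed the availability of the shared element, which is why mapping shared dependencies matters more than adding further duplicates.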

Keep the boundary with A.8.13 sharp, because auditors will. Redundancy keeps the service up when things break; backup gets the data back after it is lost or corrupted. Replication is not backup — it copies your deletions and your ransomware faithfully to every replica. The heart of the control at audit time: documented availability requirements per service, an architecture demonstrably matched to them, and evidence that failover has been tested rather than assumed.

Why It Matters

Availability is one third of the CIA triad, and it is the third that customers, regulators, and revenue notice first. Confidentiality failures surface in disclosure letters months later; availability failures surface on status pages within minutes. An organization that has committed to uptime in contracts — explicitly, or implicitly by being the system its users depend on — has already made redundancy promises; this control checks whether the architecture and the spend actually honor them.

The expensive failures here are rarely exotic. They are the single database instance behind a "highly available" application tier, the failover cluster that was never failed over until a real outage exposed a hardcoded IP, the two ISP links that shared a duct a backhoe found, and the DR environment three patch cycles behind production that could not take load when finally asked. Each of these is findable in advance — by mapping failure domains and by testing — which is exactly what the control requires.

Insufficient or untested redundancy exposes the organization to:

  • SLA breach and contractual penalties – committed availability percentages turn into service credits, penalty clauses, and renewal-time leverage for customers
  • Revenue and operations stoppage – for digital businesses, platform downtime is a direct revenue meter running backwards, plus the recovery cost on top
  • Hidden single points of failure – shared conduits, shared hosts, shared upstream providers make paper redundancy fail in pairs
  • Failover that fails when it matters – untested redundancy regularly collapses on small dependencies during real incidents, prolonging the outage it was meant to prevent
  • Regulatory exposure in critical sectors – financial and infrastructure regulators treat resilience as a supervised obligation, not an internal preference

Regional Compliance Context

Availability is a supervised outcome in Indian financial services: RBI master directions expect regulated entities to define recovery objectives for critical systems and prove them through periodic DR drills, and SEBI's CSCRF sets comparable resilience expectations for market intermediaries — so for BFSI workloads, site-level redundancy and its test records are inspection material, not just ISO evidence. Data-residency rules can also shape the design: where Saudi PDPL, the UAE federal PDPL, or sectoral Indian rules constrain cross-border data transfers, the failover region must satisfy the same residency conditions as the primary. A multi-region architecture that fails over into a non-compliant jurisdiction trades an availability problem for a legal one.

Implementation Guidance

1

Establish Availability Requirements per Service

Extract the real numbers from customer SLAs, the business impact analysis, and conversations with service owners: target uptime, maximum tolerable downtime, and recovery time objectives. Classify services into availability tiers (for example: critical / important / standard) and get the tiering signed off by the business — it is the basis for every spend decision that follows.
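An uptime percentage is easier to discuss with service owners once it is converted into a downtime budget. The following is a minimal sketch of that arithmetic; the tier names and targets in TIER_TARGETS are hypothetical placeholders, not recommended values.

```python
MINUTES_PER_DAY = 24 * 60


def downtime_budget_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Minutes of downtime permitted over the period at the given availability target."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1.0 - availability_pct / 100.0) * period_days * MINUTES_PER_DAY


# Hypothetical tier targets, for illustration only.
TIER_TARGETS = {"critical": 99.95, "important": 99.9, "standard": 99.0}

if __name__ == "__main__":
    for tier, target in TIER_TARGETS.items():
        per_year = downtime_budget_minutes(target)
        per_month = downtime_budget_minutes(target, period_days=30)
        print(f"{tier:<10} {target:>6}%  {per_year:8.1f} min/year  {per_month:6.1f} min/30 days")
```

A 99.9% target, for instance, allows roughly 525.6 minutes of downtime over a 365-day year, a figure the business can weigh against what a single unplanned failover or rebuild actually takes.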

2

Map Single Points of Failure Across Each Critical Path

Walk the full path of each top-tier service: power, network entry points, hardware, hypervisors, software instances, data stores, supporting services like DNS and identity, sites, and third parties (single ISP, single cloud region, single SaaS dependency). Record every single point of failure in the risk register with an owner and a decision — eliminate, mitigate, or accept.
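One simple way to start this mapping is to list, for each layer of a service's critical path, the distinct elements that provide it, and flag every layer with only one. The sketch below assumes that simplified model; the layer names, component names, and the RegisterEntry fields are hypothetical, and a real estate will need a richer dependency graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterEntry:
    """A candidate risk-register line for one single point of failure."""
    service: str
    layer: str
    component: str
    owner: str = "UNASSIGNED"
    decision: str = "UNDECIDED"  # to become: eliminate, mitigate, or accept


def find_single_points_of_failure(
    service: str, critical_path: dict[str, list[str]]
) -> list[RegisterEntry]:
    """Flag every layer that is provided by exactly one distinct component."""
    entries = []
    for layer, providers in critical_path.items():
        distinct = sorted(set(providers))
        if len(distinct) == 1:
            entries.append(RegisterEntry(service, layer, distinct[0]))
    return entries


# Hypothetical critical path for a top-tier service.
PAYMENTS_PATH = {
    "power": ["feed-a", "feed-b"],
    "isp": ["isp-1"],
    "dns": ["dns-provider-1"],
    "app": ["app-1", "app-2", "app-3"],
    "database": ["db-primary"],
}

if __name__ == "__main__":
    for entry in find_single_points_of_failure("payments", PAYMENTS_PATH):
        print(entry)
```

Every entry starts without an owner or a decision on purpose: the output is a worklist for the risk register, not a finished record.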

3

Select the Redundancy Level Each Tier Justifies

Match mechanism to tier: component redundancy (RAID, dual PSU/NIC, UPS) as a baseline for important hardware; system redundancy (clustering, load balancing, database replicas) for services that cannot wait for a rebuild; site redundancy (multi-AZ, multi-region, or a secondary facility) only where the availability requirement genuinely demands it. Document why each tier gets what it gets — proportionality is a feature, not a confession.

4

Engineer Independence Between Redundant Elements

Verify that redundant elements share no failure domain: separate availability zones, anti-affinity rules so instances never share a host, diverse network paths from different providers entering at different points, independent power feeds. Ask providers for diversity confirmation in writing where it matters — assumed independence is the classic way redundancy fails in pairs.
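The independence check can be partly mechanized once each redundant element is tagged with the failure domains it sits in. This is a sketch under that assumption; the domain types ("provider", "conduit", "substation") and all element names are hypothetical, and the tags are only as good as the physical verification behind them.

```python
from itertools import combinations


def shared_failure_domains(
    elements: dict[str, dict[str, str]]
) -> list[tuple[str, str, str, str]]:
    """Return (element_a, element_b, domain_type, value) for each pair sharing a domain."""
    findings = []
    for (name_a, dom_a), (name_b, dom_b) in combinations(sorted(elements.items()), 2):
        # Compare only the domain types recorded for both elements.
        for domain_type in sorted(dom_a.keys() & dom_b.keys()):
            if dom_a[domain_type] == dom_b[domain_type]:
                findings.append((name_a, name_b, domain_type, dom_a[domain_type]))
    return findings


# Hypothetical pair of "redundant" network links.
LINKS = {
    "link-a": {"provider": "isp-1", "conduit": "north-duct", "substation": "sub-1"},
    "link-b": {"provider": "isp-2", "conduit": "north-duct", "substation": "sub-2"},
}

if __name__ == "__main__":
    for a, b, domain_type, value in shared_failure_domains(LINKS):
        print(f"{a} and {b} share {domain_type} '{value}'")
```

A domain type that is missing from either element is skipped rather than treated as independent, so an empty result means "no shared domain recorded", not "independence proven".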

5

Implement Health Checks and Failover Mechanisms

Automate where the tier justifies it: load-balancer health checks, cluster quorum and automatic failover, DNS-based traffic steering, database replica promotion. Where failover is manual, write the runbook — trigger conditions, decision authority, exact steps, verification — and keep it current. Either way, alert the moment the system is running on its redundant path, because redundancy silently consumed is redundancy you no longer have.
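The logic behind threshold-based failover with an alert on the redundant path can be sketched in a few lines. This is an illustration of the idea, not any particular load balancer's or cluster manager's behavior; the class, its parameters, and the alert text are all hypothetical.

```python
class FailoverController:
    """Switch to the standby after N consecutive failed checks and record an alert."""

    def __init__(self, primary: str, standby: str, failure_threshold: int = 3) -> None:
        self.primary = primary
        self.standby = standby
        self.failure_threshold = failure_threshold
        self.active = primary
        self.consecutive_failures = 0
        self.alerts: list[str] = []

    def record_check(self, primary_healthy: bool) -> str:
        """Feed one health-check result for the primary; return the active target."""
        if primary_healthy:
            self.consecutive_failures = 0
            # No automatic failback in this sketch: returning to the primary
            # is left as a deliberate, separately decided step.
            return self.active
        self.consecutive_failures += 1
        if self.active == self.primary and self.consecutive_failures >= self.failure_threshold:
            self.active = self.standby
            self.alerts.append(
                f"FAILOVER: traffic moved from {self.primary} to {self.standby}; "
                "redundancy is now consumed"
            )
        return self.active

    @property
    def running_on_redundant_path(self) -> bool:
        return self.active != self.primary


if __name__ == "__main__":
    controller = FailoverController("db-a", "db-b")
    for healthy in [True, False, False, True, False, False, False]:
        controller.record_check(healthy)
    print(controller.active, controller.alerts)
```

Requiring several consecutive failures avoids flapping on a single dropped probe, and the alert fires exactly once at the moment of failover so that the loss of spare capacity is visible immediately.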

6

Test Failover on a Schedule and Record the Results

Exercise the redundancy deliberately: pull a component, drain a node, evacuate an availability zone, switch to the DR site. Start in maintenance windows with low-risk services and grow toward production-realistic drills. Measure achieved recovery against the tier's targets, record date, scope, result, and issues, and feed fixes back into the architecture. Coordinate larger exercises with continuity testing under A.5.30.
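A test record needs only a handful of fields to be useful as evidence: date, scope, the target, the measured recovery time, and the issues found. The structure below is one possible shape, assuming recovery is measured in minutes; the field names and the sample drill (service, date, figures, issue) are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FailoverTest:
    service: str
    test_date: date
    scope: str
    rto_minutes: float                 # the tier's recovery time objective
    measured_recovery_minutes: float   # what the drill actually achieved
    issues: tuple[str, ...] = ()

    @property
    def met_target(self) -> bool:
        return self.measured_recovery_minutes <= self.rto_minutes

    def summary(self) -> str:
        verdict = "met" if self.met_target else "MISSED"
        issues = "; ".join(self.issues) if self.issues else "none"
        return (
            f"{self.test_date.isoformat()} {self.service} [{self.scope}]: "
            f"recovered in {self.measured_recovery_minutes:g} min "
            f"(target {self.rto_minutes:g} min, {verdict}); issues: {issues}"
        )


if __name__ == "__main__":
    # Hypothetical drill result.
    drill = FailoverTest(
        service="payments-db",
        test_date=date(2025, 3, 14),
        scope="replica promotion",
        rto_minutes=15,
        measured_recovery_minutes=22,
        issues=("hardcoded IP in batch job",),
    )
    print(drill.summary())
```

A missed target is still a successful test in the sense that matters here: the defect was found in a drill rather than in an outage, and the record shows what has to be fixed before the next one.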

7

Monitor, Review, and Re-Verify After Change

Track availability per service against its target and review misses. Size redundant capacity with A.8.6 in mind: after a failure, the surviving elements must absorb the full load, which is what N+1 sizing exists to guarantee. Re-run the single-point-of-failure analysis after major architecture changes, and review the redundancy tiering annually against current commitments, because contracts change faster than infrastructure.
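The capacity side of this check is simple arithmetic: remove the largest unit (or units) and see whether what remains still covers peak load. A minimal sketch follows, assuming capacity and load are expressed in the same unit, such as requests per second; the function name and the figures are hypothetical.

```python
def survives_loss_of(unit_capacities: list[float], peak_load: float, failures: int = 1) -> bool:
    """True if the remaining units can carry peak_load after losing the largest units."""
    if failures < 0:
        raise ValueError("failures must not be negative")
    # Worst case: the units that fail are the ones with the most capacity.
    keep = max(len(unit_capacities) - failures, 0)
    remaining = sorted(unit_capacities)[:keep]
    return sum(remaining) >= peak_load


if __name__ == "__main__":
    # Hypothetical figures in requests per second.
    print(survives_loss_of([500, 500, 500], peak_load=900))  # three nodes, lose one
    print(survives_loss_of([500, 500], peak_load=700))       # two nodes, lose one
```

In the second case both nodes are healthy and the pair looks redundant on a diagram, yet neither can carry the peak alone, which is the situation capacity reviews are meant to catch.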

Audit Evidence

During your ISO 27001 certification audit, auditors will expect to see the following evidence to demonstrate compliance with A.8.14:

Documentation

  • Documented availability requirements and service tiering, traceable to SLAs or the business impact analysis
  • Architecture diagrams marking redundant components, failure domains, and site-level arrangements
  • Failover and DR test records with dates, scope, measured recovery times, and corrective actions
  • Availability monitoring reports comparing achieved uptime to committed targets
  • Failover runbooks for manual procedures, with version history showing they are maintained

Interviews

  • Infrastructure or platform lead on how redundancy levels were chosen and how failure domains were verified as independent
  • Service or business owner on what availability was committed to customers and whether the tiering reflects it
  • On-call engineer on what actually happens when a node or zone fails — and whether practice matches the runbook

Observations

  • Cloud console or cluster configuration showing multi-AZ placement, replicas, and anti-affinity rules in effect
  • Load-balancer health checks and the alerting that fires when a service is running on its redundant path
  • Artifacts of a recent failover exercise — drill logs, chaos test output, or a live demonstration on a low-risk service

Practitioner Insights

Surendra Pal Singh

I make a habit of cross-reading customer contracts against architecture diagrams, and the mismatch is a classic management-level failure: sales has committed 99.9% availability while the production database runs as a single instance in a single zone. Certification auditors do the same cross-reading, and so do customers' due-diligence teams. Either the architecture rises to the commitment or the commitment comes down to the architecture — and if leadership consciously accepts the gap, that acceptance belongs in the risk register with a signature, not in a corridor conversation.

Surendra Pal Singh · CISO, DPO, CISA, ISO 27001, 27701, 42001 Lead Auditor

Saundhi Chauhan

In the cloud, redundancy is mostly configuration you have to deliberately switch on and then pay for — multi-AZ flags, a minimum of two instances behind a load balancer, a replica in a second zone. The implementation mistake I see most is buying the redundancy and never pulling the plug: the first real failover then trips over some small dependency nobody noticed, like a hardcoded IP, a single NAT gateway, or a license server that lived in the dead zone. Kill an instance in a maintenance window, watch what actually happens, and keep a one-page record of it. That single exercise is worth more than any diagram.

Saundhi Chauhan · ISO 27001, 27701 Lead Auditor

Common Challenges & Solutions

Challenge

Nobody has defined what availability each service actually requires, so redundancy decisions are guesswork and budget arguments.

Solution

Run a lightweight business impact analysis: for each service, ask the owner what an hour, a day, and a week of downtime costs, and what has been promised externally. Convert the answers into three or four availability tiers with explicit targets, get management sign-off, and let the tiering drive both the architecture and the spend conversation.

Challenge

Redundant elements secretly share a failure domain — both links in one conduit, both VMs on one host, both feeds from one substation — and fail together.

Solution

Treat independence as a verification exercise, not an assumption. Map the physical and logical path of every redundant pair, apply anti-affinity and zone-separation rules in virtualized and cloud environments, source network diversity from genuinely different providers and entry points, and ask suppliers to confirm diversity in writing for the paths that matter most.

Challenge

Failover has never been tested, so the first test is a real outage — and it fails.

Solution

Schedule failover exercises like any other control activity: quarterly or semi-annual drills for top-tier services, starting with low-risk components in maintenance windows and maturing toward zone-evacuation or DR-switchover exercises. Document each test's measured recovery time and defects, and fix the defects before scaling the next drill up.

Challenge

Site-level redundancy for everything is unaffordable, and cost pressure threatens to delete redundancy where it is genuinely needed.

Solution

Spend by tier. Reserve multi-region or secondary-site arrangements for the services whose availability requirements prove the need; for lower tiers, accept measured downtime with a documented risk acceptance and rely on tested restores under A.8.13 instead. An explicit, signed decision to tolerate downtime on the wiki is good governance; an implicit single point of failure on the payment platform is the finding.

Challenge

Redundancy decays as the estate changes — new services launch single-instance, and yesterday's resilient architecture quietly grows new single points of failure.

Solution

Gate it through change management (A.8.32): every new service declares its availability tier and the redundancy that tier mandates before go-live, using reference architectures with redundancy defaults built in. Re-scan for single points of failure after major changes and at least annually, and alert on configurations that drift below tier requirements.
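A drift check of this kind can be as small as a policy table and a comparison against each service's declared tier and actual placement. The sketch below assumes a service is described by a name, a tier, and the zone of each running instance; the policy thresholds, tier names, and service names are hypothetical.

```python
# Hypothetical minimums per availability tier.
TIER_POLICY = {
    "critical": {"min_instances": 3, "min_zones": 2},
    "important": {"min_instances": 2, "min_zones": 2},
    "standard": {"min_instances": 1, "min_zones": 1},
}


def tier_violations(service: dict) -> list[str]:
    """Compare a service's actual placement with what its declared tier mandates."""
    name = service.get("name", "<unnamed>")
    tier = service.get("tier")
    if tier not in TIER_POLICY:
        return [f"{name}: no valid availability tier declared"]
    policy = TIER_POLICY[tier]
    zones = service.get("instance_zones", [])  # one entry per running instance
    problems = []
    if len(zones) < policy["min_instances"]:
        problems.append(
            f"{name}: {len(zones)} instance(s), tier '{tier}' requires {policy['min_instances']}"
        )
    if len(set(zones)) < policy["min_zones"]:
        problems.append(
            f"{name}: {len(set(zones))} zone(s), tier '{tier}' requires {policy['min_zones']}"
        )
    return problems


if __name__ == "__main__":
    estate = [
        {"name": "payments-api", "tier": "critical", "instance_zones": ["az-1", "az-1", "az-1"]},
        {"name": "new-reporting", "instance_zones": ["az-1"]},
        {"name": "wiki", "tier": "standard", "instance_zones": ["az-2"]},
    ]
    for svc in estate:
        for problem in tier_violations(svc):
            print(problem)
```

Treating a missing tier as a violation in its own right is the point of the gate: a service that never declared its availability requirement should not reach go-live silently.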

Frequently Asked Questions

What is the difference between A.8.13 (backup) and A.8.14 (redundancy)?
Redundancy keeps the service available through a failure — a second instance, zone, or site takes over so users barely notice. Backup recovers information after it is lost, deleted, or corrupted — a point-in-time copy you can roll back to. Replication is redundancy, not backup: it propagates deletions and ransomware to every replica within seconds. Most organizations need both, scoped by availability requirements on one side and recovery-point requirements on the other.
Does running in AWS or Azure automatically satisfy A.8.14?
No. Cloud providers make redundancy available; they do not make it automatic. A single VM in a single availability zone is a single point of failure regardless of how resilient the provider's data centers are. Satisfying the control means deliberately using the primitives — multiple instances, multi-AZ placement, replicas, health-checked load balancing — where your availability requirements demand them, and being able to show the configuration.
Do small companies need a DR site or multi-region deployment?
Only if their availability requirements say so — the control demands redundancy sufficient for your commitments, not maximal redundancy. Many SMBs legitimately conclude that multi-AZ within one region, plus tested backups and a documented tolerance for rare regional outages, matches what they have promised customers. What auditors challenge is not modest architecture; it is the absence of any documented availability requirement behind the architecture.
Is RAID enough to satisfy this control?
RAID addresses exactly one failure mode: a disk dying inside one machine. It does nothing for server failure, software failure, site loss, or accidental deletion. Whether it is "enough" depends on the documented availability requirement of the system in question — for a tolerant internal workload it may be, for a committed-SLA service it almost certainly is not. RAID is also not backup, a conflation auditors specifically probe.
How often should failover be tested?
The standard sets no frequency, so let your availability tiering set it. A common pattern is quarterly or semi-annual failover exercises for critical services and at least annual tests wherever else redundancy is claimed, plus re-testing after significant architecture changes. Automated failover still needs periodic deliberate exercise: health checks verify detection, not the system's ability to actually carry production load on the surviving path. Keep dated records with measured recovery times.
Does A.8.14 cover people and suppliers, or only hardware?
The control is aimed at information processing facilities — infrastructure, systems, and sites — but availability planning done well looks wider. Key-person dependencies, single-supplier dependencies, and utilities are addressed through related controls and continuity planning under A.5.30 rather than through this control's architecture work. In practice, your single-point-of-failure analysis should still record them, then route them to the right owner.

Written By Expert Auditors

Saundhi Chauhan
Lead Auditor
ISO 27001 Lead Auditor · ISO 27701 Lead Auditor

Surendra Pal Singh
Chief Information Security Officer & Data Protection Officer
CISO · DPO · CISA · MCSE · ITIL · ISO 27001 Lead Auditor · ISO 27701 Lead Auditor · ISO 42001 Lead Auditor

Last reviewed: June 2026 · Content verified by certified lead auditors
