CSA CCM DCS-14
Secure Utilities

Hey there! Let's talk about an important but often overlooked aspect of datacenter security: keeping the lights on and the servers humming. The Cloud Security Alliance's Cloud Controls Matrix covers this in control DCS-14, which calls for securing, monitoring, maintaining, and testing utility services like power, water, internet, and telecom at planned intervals so your datacenter keeps running when something upstream fails.

Where did this come from?

This little gem of wisdom comes straight from the CSA Cloud Controls Matrix v4.0.10 released on 2023-09-26. You can download the full document chock-full of other useful cloud security tips at https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4. The CSA folks are the real MVPs when it comes to practical guidance on securing cloud services.

Who should care?

This one matters most to datacenter managers and engineers responsible for keeping critical IT infrastructure operational. If you're the one who gets the 2am call when half the datacenter goes dark, this control is definitely relevant to your interests!

What is the risk?

A datacenter without reliable utilities is like a car without gas - it ain't going nowhere fast. Losing power, cooling, or connectivity can bring services grinding to a halt, potentially for extended periods. Besides the immediate downtime, utility failures can also physically damage sensitive equipment. The consequences can range from angry customers to catastrophic data loss. A robust utility security program can prevent many of these issues and help quickly restore service when the inevitable problems do occur.

What's the care factor?

On a scale from "meh" to "mega", utility security is definitely up there for anyone responsible for datacenter uptime and service delivery. While perhaps not as sexy as the latest zero-day, investing effort in this area has a major ROI in terms of risk reduction. Don't be the team that only realizes the diesel generator was out of fuel after an outage! A bit of proactive TLC for core utilities is well worth it.

When is it relevant?

Any datacenter providing infrastructure for critical workloads should take utility security seriously. This is especially vital for cloud and hosting providers where an outage impacts many downstream customers. That said, the effort should be commensurate with the importance and sensitivity of the supported services. A small dev/test environment can likely get away with less formality than a Tier IV facility hosting financial systems.

What are the trade offs?

Utility security and redundancy don't come free. Provisioning backup generators, redundant network links, and onsite spares all cost money. Monitoring and maintaining these systems also takes staff time and effort. At the extreme end, providing full 2N redundancy, where every critical utility path is duplicated so the entire datacenter can carry its full load on the alternate path, can massively increase build and operating costs. The key is striking the right balance of reliability vs expense for the anticipated workloads.
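
To put rough numbers on that balance, here's a minimal Python sketch of how adding parallel utility paths changes availability. It assumes the paths fail independently, which real feeds sharing a substation or a conduit often don't, so treat the result as an optimistic upper bound. The 99.9% figure and the function name are made up for illustration.

```python
def combined_availability(single_path: float, paths: int) -> float:
    """Availability of `paths` parallel feeds, assuming independent failures.

    The service is down only when every path is down at the same time,
    so the combined unavailability is (1 - single_path) ** paths.
    """
    if not 0.0 <= single_path <= 1.0:
        raise ValueError("single_path must be between 0 and 1")
    if paths < 1:
        raise ValueError("paths must be at least 1")
    return 1.0 - (1.0 - single_path) ** paths


if __name__ == "__main__":
    # Hypothetical example: each feed is available 99.9% of the time.
    for paths in (1, 2, 3):
        print(f"{paths} path(s): {combined_availability(0.999, paths):.9f}")
```

Each extra path buys a smaller absolute improvement than the one before it, while the cost of each path stays roughly the same. That diminishing return is usually where the reliability vs expense conversation ends up.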

How to make it happen?

At a high level, implementing DCS-14 involves:

  1. Identifying all critical datacenter utilities (power, cooling, WAN links, etc.).
  2. Determining the required redundancy and failover for each based on uptime targets.
  3. Deploying monitoring to track the health of each utility system. This could be anything from simple ping checks to integration with building management systems.
  4. Establishing maintenance procedures and schedules to keep spare components and redundant systems ready to go.
  5. Regularly testing failover between primary and backup systems. Document the results and fix any issues.
  6. Reviewing utility risks and planning in the context of wider business continuity activities.
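
The steps above can be sketched as a tiny utility register that flags missing redundancy and overdue failover tests. This is a rough illustration only: the `Utility` fields, the `findings` helper, and the 90-day interval are hypothetical choices, not anything prescribed by the CCM.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Utility:
    name: str                  # step 1: what the utility is
    required_paths: int        # step 2: redundancy needed for the uptime target
    available_paths: int       # what is actually deployed
    last_failover_test: date   # step 5: when failover was last exercised
    test_interval_days: int    # how often it should be exercised


def findings(utility: Utility, today: date) -> list[str]:
    """Return human-readable issues for one utility (an empty list means OK)."""
    issues = []
    if utility.available_paths < utility.required_paths:
        issues.append(
            f"{utility.name}: {utility.available_paths} path(s) deployed, "
            f"{utility.required_paths} required"
        )
    overdue_after = utility.last_failover_test + timedelta(days=utility.test_interval_days)
    if today > overdue_after:
        issues.append(f"{utility.name}: failover test overdue since {overdue_after}")
    return issues


if __name__ == "__main__":
    # Hypothetical inventory entries.
    register = [
        Utility("power", 2, 1, date(2024, 1, 1), 90),
        Utility("wan", 2, 2, date(2024, 5, 1), 90),
    ]
    for item in register:
        for issue in findings(item, date(2024, 6, 1)):
            print(issue)
```

In practice the register would be fed from your monitoring or building management system rather than hard-coded, but even a simple list like this makes it obvious which tests have slipped.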

What are some gotchas?

Some of the main items that trip up utility security efforts include:

  • Monitoring gaps - Forgetting to track key metrics like generator fuel levels or UPS battery health.
  • Untested failover - Redundant components that don't work when called upon due to lack of testing.
  • Circular dependencies - Backup systems that themselves depend on the primary utilities.
  • Outdated contacts - Escalation paths to utility providers that are no longer valid when an emergency hits.
  • Missing spares - Not stocking enough spare parts for equipment with long lead times.
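
Circular dependencies in particular are easy to check for once the dependencies are written down. Here's a minimal sketch that walks a dependency map and reports whether a backup system ultimately relies on the primary utility it is supposed to replace. The component names and the `depends_on` helper are hypothetical.

```python
def depends_on(component: str, target: str, deps: dict) -> bool:
    """Return True if `component` depends on `target`, directly or transitively.

    `deps` maps a component name to the names it directly depends on.
    """
    seen = set()
    stack = [component]
    while stack:
        current = stack.pop()
        for dep in deps.get(current, ()):
            if dep == target:
                return True
            if dep not in seen:  # guard against loops in the map itself
                seen.add(dep)
                stack.append(dep)
    return False


if __name__ == "__main__":
    # Hypothetical map: the generator's fuel pump is wired to building mains.
    deps = {
        "generator": ["generator_controller", "fuel_pump"],
        "generator_controller": ["controller_battery"],
        "fuel_pump": ["building_mains"],
        "ups": ["battery_string"],
    }
    for backup in ("generator", "ups"):
        if depends_on(backup, "building_mains", deps):
            print(f"{backup} depends on building_mains")
```

The hard part is not the traversal but building an honest dependency map, so it's worth validating the map during failover tests.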

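Be sure to verify the monitoring system has sufficient, but no broader than necessary, permissions to collect health info from all utility gear. For example, SNMP polling of network switches typically requires a read-only community string (SNMPv1/v2c) or read-only user credentials (SNMPv3) configured on each device.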

What are the alternatives?

For smaller deployments, relying on the datacenter provider's utility SLAs may be sufficient rather than implementing your own extensive monitoring and redundancy. However, be sure to dig into the details of what is actually guaranteed.
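
One quick way to dig into an SLA is to convert the headline percentage into the downtime it actually permits. The sketch below does that arithmetic. The function name and the 30-day period are assumptions for illustration, and real SLAs often define their own measurement windows and exclusions.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime an availability SLA permits over `period_days`."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    if period_days < 1:
        raise ValueError("period_days must be at least 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


if __name__ == "__main__":
    # Hypothetical SLA levels to compare.
    for sla in (99.0, 99.9, 99.99):
        print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes per 30 days")
```

Comparing that budget against how long your workloads can really tolerate being down is a decent first test of whether the provider's SLA is enough on its own.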

Another option is leveraging cloud hosting with multiple availability zones for inherent utility redundancy. However, this still requires configuration and testing to ensure seamless failover between zones. Major public clouds also still suffer widespread outages on occasion.

Explore further

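For more info on keeping the lights on, check out the full Cloud Controls Matrix at https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4.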

Hopefully this quick tour of utility security has sparked some ideas on how to harden your own datacenter services. Stay vigilant and keep those servers humming! And as always, remember: friends don't let friends skip redundant power supplies.
