CSA CCM IPY-02
Application Interface Availability

Cloud Service Providers should offer APIs that allow their customers to programmatically retrieve their data in a secure way. This enables interoperability between systems and portability of applications and data across environments. Proper documentation must be maintained and shared with customers as APIs are updated.

Where did this come from?

This control comes from the CSA Cloud Controls Matrix v4.0.10 released on 2023-09-26. The full CCM can be downloaded from https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4. The CCM provides a comprehensive set of baseline security controls to help assess cloud computing risk. This specific control falls under the Interoperability & Portability domain.

For more background, see the AWS Whitepaper on Migrating Applications to the Cloud, which discusses portability considerations.

Who should care?

  • Cloud architects designing multi-cloud solutions that need interoperability
  • DevOps engineers responsible for migrating applications between cloud providers
  • Security professionals assessing vendor lock-in risks
  • Compliance officers ensuring data can be retrieved if a provider is changed

What is the risk?

Without API-based access to data, organizations face:

  • Inability to migrate applications to a different provider resulting in vendor lock-in
  • Lack of interoperability preventing integration of best-of-breed services
  • Potential data loss if a provider shuts down and data can't be easily exported

These risks are not especially likely, but their impact is high. Loss of application portability can result in business continuity issues and large switching costs if a provider change is required.

What's the care factor?

For organizations with a multi-cloud strategy, the ability to programmatically retrieve data for interoperability is critical. It should be a key factor in provider selection.

However, for simple deployments using a single provider, it may be less of an immediate concern, though still important for long-term risk management. At a minimum, there should be a manual process to retrieve a complete copy of the data.

When is it relevant?

This control is most applicable when:

  • Deploying applications across multiple clouds
  • Integrating cloud services from different providers
  • Performing cloud migrations
  • Maintaining a cloud exit strategy

It's less of a concern for:

  • Monolithic apps deployed to a single cloud with no plans to move
  • Temporary applications that don't store any long-term data

What are the trade-offs?

Implementing data retrieval APIs has some costs:

  • Upfront and ongoing engineering effort to build and maintain the APIs
  • Potential performance impact from customers retrieving large data sets
  • Security risks from exposing data APIs that need to be mitigated
  • Increased complexity for customers learning and integrating the APIs

However, these costs are usually outweighed by the reduced risk of lock-in to a single provider. API-based data access is a table-stakes capability for enterprise cloud services.

How to make it happen?

The exact steps depend on the specific cloud provider and data stores being used, but at a high level:

  1. Identify the key data stores and services where customer data is persisted (databases, object storage, file systems, etc.)
  2. Design RESTful APIs to query and retrieve each type of data
    • Provide filtering capabilities to retrieve subsets of data
    • Support pagination for large datasets
    • Use industry-standard data formats like JSON and XML
  3. Implement authentication and access controls on the APIs
    • Each customer must only have access to their own data
    • Use short-lived access tokens rather than long-term credentials
  4. Provide SDKs in popular languages to simplify API integration
  5. Write comprehensive API documentation with code samples
  6. Implement a versioning strategy for evolving APIs over time
  7. Set up CI/CD to deploy and manage API endpoints
  8. Run security tests on APIs to catch vulnerabilities
  9. Set up usage monitoring and alerting to detect abuse
  10. Have a process to notify customers of API updates and deprecations
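
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration rather than a production design: the in-memory `RECORDS` list, the `list_records` function, and the continuation-token format are all hypothetical stand-ins for a real data store and API layer. The point is that the tenant filter is applied on every call, independently of any filter or token the caller supplies.

```python
import base64
import json
from typing import Optional

# Hypothetical in-memory data standing in for a real database.
RECORDS = [
    {"id": 1, "customer_id": "acme", "type": "invoice"},
    {"id": 2, "customer_id": "acme", "type": "order"},
    {"id": 3, "customer_id": "globex", "type": "invoice"},
    {"id": 4, "customer_id": "acme", "type": "invoice"},
    {"id": 5, "customer_id": "globex", "type": "order"},
]


def encode_token(last_id: int) -> str:
    """Pack the last returned id into an opaque, URL-safe string."""
    raw = json.dumps({"after": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: Optional[str]) -> int:
    """Return the id to resume after, or 0 when no token was supplied."""
    if token is None:
        return 0
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))["after"]


def list_records(customer_id: str, record_type: Optional[str] = None,
                 page_size: int = 2, token: Optional[str] = None) -> dict:
    """Return one page of a single customer's records.

    customer_id is assumed to come from the authenticated session,
    never from a caller-controlled request parameter.
    """
    after = decode_token(token)
    matches = sorted(
        (r for r in RECORDS
         if r["customer_id"] == customer_id          # tenant isolation
         and r["id"] > after                         # resume point
         and (record_type is None or r["type"] == record_type)),
        key=lambda r: r["id"],
    )
    page = matches[:page_size]
    has_more = len(matches) > page_size
    return {
        "items": page,
        "next_token": encode_token(page[-1]["id"]) if has_more else None,
    }
```

A caller keeps passing `next_token` back until it comes back as `None`. The token here is only encoded, not signed, so a real service would likely sign or encrypt it; tampering with it still cannot cross tenants because the `customer_id` check does not depend on the token.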

For example, to provide access to data in AWS S3, you could:

  • Use the S3 REST API to list objects and retrieve object contents
  • Integrate the AWS SDK to handle request signing, preferring temporary credentials from IAM roles over long-lived IAM access keys
  • Leverage S3 Select to filter the contents of objects and retrieve a subset of data
  • Use Bucket Policies and IAM Roles to control access at a granular level
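
As a rough sketch of the first two points, the helpers below list and fetch objects through a client object that is passed in. They assume the client exposes `list_objects_v2` and `get_object` with the request and response shapes used by boto3's S3 client. The helper names, the bucket name, and `StubS3Client` are hypothetical; the stub exists only so the example runs without AWS credentials.

```python
import io
from typing import Any, Dict, Iterator, Optional


def iter_object_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under a prefix, following continuation tokens."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def fetch_object(s3_client: Any, bucket: str, key: str) -> bytes:
    """Read one object's full contents into memory."""
    return s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()


class StubS3Client:
    """Local stand-in that mimics the S3 client's response shape (not real AWS)."""

    def __init__(self, objects: Dict[str, bytes], page_size: int = 2):
        self._objects = objects
        self._page_size = page_size

    def list_objects_v2(self, Bucket: str, Prefix: str = "",
                        ContinuationToken: Optional[str] = None) -> dict:
        keys = sorted(k for k in self._objects if k.startswith(Prefix))
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self._page_size
        response = {
            "Contents": [{"Key": k} for k in keys[start:end]],
            "IsTruncated": end < len(keys),
        }
        if response["IsTruncated"]:
            response["NextContinuationToken"] = str(end)
        return response

    def get_object(self, Bucket: str, Key: str) -> dict:
        return {"Body": io.BytesIO(self._objects[Key])}
```

Against real S3, the same helpers would be called with `boto3.client("s3")` in place of the stub, with the SDK resolving credentials from the environment or an attached IAM role.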

What are some gotchas?

There are a few things to watch out for when implementing data retrieval APIs:

  • Pagination is crucial to handle large datasets that can't be retrieved in a single call. Implement continuation tokens to page through results.
  • Throttle API calls to prevent abuse and denial of service. Enforce rate limits per customer.
  • Be mindful of compliance requirements like GDPR when providing access to personal data. Restrict access to sensitive fields.
  • API keys and access tokens must be carefully protected. Never send them in URLs. Use TLS encryption for API calls.
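
Per-customer rate limiting is often implemented with a token bucket. The sketch below is one possible shape, not a reference implementation: the `RateLimiter` class, its parameters, and the injectable `clock` are assumptions made for illustration, and a real multi-node API would need shared state (for example in a cache) rather than a local dictionary.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float       # requests currently available
    updated_at: float   # clock reading at the last refill


class RateLimiter:
    """Token bucket per customer: bursts up to `capacity`, then a steady refill."""

    def __init__(self, capacity: int = 5, refill_per_second: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.buckets: Dict[str, _Bucket] = {}

    def allow(self, customer_id: str) -> bool:
        """Return True and spend a token if the customer is under their limit."""
        now = self.clock()
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            # New customers start with a full bucket.
            bucket = self.buckets[customer_id] = _Bucket(self.capacity, now)
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.updated_at
        bucket.tokens = min(self.capacity,
                            bucket.tokens + elapsed * self.refill_per_second)
        bucket.updated_at = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

When `allow` returns `False`, an HTTP API would typically respond with status 429 so that well-behaved clients back off and retry later.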

Some specific AWS gotchas:

  • The EC2 instance calling the APIs must have an IAM Role with permissions like s3:GetObject and s3:ListBucket to access S3 objects
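  • S3 Select is not a bulk-export tool: AWS documents a 1 MB maximum record size for its input and results, and the feature is no longer available to new customers. To retrieve large objects in full, use GetObject with byte-range requests and download them in chunks.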
  • Objects in the S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes take minutes to hours to retrieve. Customers need to initiate a restore request first and poll for completion before getting the data.
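
The range requests mentioned above come down to splitting an object's size into inclusive byte spans. A minimal sketch, with `byte_ranges` as a hypothetical helper name:

```python
from typing import Iterator


def byte_ranges(total_size: int, chunk_size: int) -> Iterator[str]:
    """Yield HTTP Range header values that cover total_size bytes in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_size, chunk_size):
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + chunk_size, total_size) - 1
        yield f"bytes={start}-{end}"
```

For example, `list(byte_ranges(10, 4))` gives `["bytes=0-3", "bytes=4-7", "bytes=8-9"]`. With boto3, each value can be passed as the `Range` argument to `get_object`, and the object's size can be read from the `ContentLength` field returned by `head_object`.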

What are the alternatives?

If real-time API access isn't feasible, consider:

  • Providing an export tool to let customers download their data on-demand
  • Generating periodic data exports (e.g. daily) in a standard format that customers can download
  • Allowing customers to ship you an encrypted hard drive to load their data on for bulk export

However, these all introduce additional friction compared to API-based access.
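
If periodic exports are the chosen route, a standard format plus a checksum lets customers verify what they downloaded. The sketch below assumes JSON Lines as the export format and SHA-256 as the checksum; `build_export` and the manifest fields are hypothetical names, not part of any provider's API.

```python
import hashlib
import json
from typing import Iterable, Tuple


def build_export(records: Iterable[dict]) -> Tuple[bytes, dict]:
    """Serialize records as JSON Lines and describe the file in a manifest."""
    # sort_keys and compact separators keep the output byte-for-byte stable.
    lines = [json.dumps(r, sort_keys=True, separators=(",", ":")) for r in records]
    payload = ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
    manifest = {
        "format": "jsonl",
        "record_count": len(lines),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }
    return payload, manifest
```

The customer recomputes the SHA-256 of the downloaded file and compares it with the manifest before relying on the export.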

Explore further

Hopefully this gives you a solid foundation for working with cloud APIs to enable interoperability and portability!