CSA CCM SEF-06
Event Triage Processes

Security-related events need to be triaged swiftly and effectively. Well-defined processes, procedures and technical measures allow organizations to prioritize events based on severity and potential impact. The goal is to enable rapid analysis and to engage incident response when required.

Where did this come from?

This control comes from the CSA Cloud Controls Matrix v4.0.10 - 2023-09-26, which you can download at https://cloudsecurityalliance.org/artifacts/cloud-controls-matrix-v4. The CCM provides a framework of security controls to help guide cloud security efforts. For more background, check out the CSA's Security Guidance v4.0 and Enterprise Architecture resources.

Who should care?

  • Security analysts with responsibility for detecting and investigating security events
  • Security architects designing monitoring and incident response capabilities
  • Security leaders accountable for the organization's security posture and incident readiness
  • Risk managers who need to understand the organization's ability to identify and contain security incidents

What is the risk?

Failing to triage security events effectively can lead to:

  • Real security incidents going undetected amidst the noise
  • Inability to prioritize the most severe and impactful events
  • Delayed incident response, allowing incidents to escalate in scope and severity
  • Overwhelmed security staff unable to keep up with event volume

Well-implemented event triage processes help manage these risks by ensuring events are quickly evaluated, prioritized, and acted upon as needed. However, triage alone does not completely eliminate these risks - it must be part of a broader incident detection and response capability.

What's the care factor?

For security teams, event triage should be a top priority. It is the critical link between detection of suspicious events and activation of incident response. Triage enables security teams to cut through the noise of thousands or millions of events and focus their efforts on the most important items.

Security leaders should also care about effective triage, because it improves the organization's incident readiness and can limit the blast radius of real incidents by getting them in front of responders sooner. The ability to identify the needle in the haystack is crucial.

For other roles, event triage is somewhat less visible, but still has important implications:

  • Developers should be aware that triage may surface bugs or vulnerabilities in their code
  • Business stakeholders should understand that triage is key to minimizing business disruption from incidents

When is it relevant?

Event triage processes are relevant for any organization that:

  • Generates security event logs (e.g. from servers, network devices, security tools, cloud platforms)
  • Needs to detect and respond to potential security incidents
  • Has a team of security analysts responsible for monitoring and investigations

Essentially, this means most organizations beyond a very small scale. The only exceptions may be organizations with an extremely minimal IT footprint or those that fully outsource security operations to a service provider.

What are the trade-offs?

Implementing effective event triage does have some costs:

  • Upfront effort to define triage criteria, build workflows, and configure supporting tools
  • Ongoing time spent by analysts reviewing and prioritizing events
  • Potential for "false positives" where events are escalated but turn out to be benign
  • Possibility of "alert fatigue" if triage criteria are too broad

However, these costs tend to be far outweighed by the benefits of speed, focus and effectiveness in identifying real threats amongst the noise. Lack of triage is much more costly in the long run.
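
One way to keep the false-positive and alert-fatigue costs visible is to track, per detection rule, how often closed alerts turned out to be benign. The sketch below is a minimal illustration under stated assumptions: the function names, the rule names in the usage note, and the 90% / 20-alert cutoffs are all hypothetical and would need tuning for a real environment.

```python
from collections import Counter


def false_positive_rates(dispositions):
    """Return {rule_name: share of that rule's closed alerts judged benign}.

    `dispositions` is an iterable of (rule_name, was_real_threat) pairs,
    one per alert that an analyst has closed.
    """
    totals = Counter()
    benign = Counter()
    for rule_name, was_real_threat in dispositions:
        totals[rule_name] += 1
        if not was_real_threat:
            benign[rule_name] += 1
    return {rule: benign[rule] / totals[rule] for rule in totals}


def noisy_rules(dispositions, max_fp_rate=0.9, min_alerts=20):
    """List rules with enough volume and a false-positive rate above the cutoff.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    dispositions = list(dispositions)
    volumes = Counter(rule for rule, _ in dispositions)
    rates = false_positive_rates(dispositions)
    return sorted(
        rule for rule, rate in rates.items()
        if volumes[rule] >= min_alerts and rate > max_fp_rate
    )
```

For example, passing 19 benign and 1 real alert for a hypothetical "geo_login" rule would flag that rule for review, while a rule with only a handful of alerts would be left alone until more data accumulates.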

How to make it happen?

  1. Define criteria for event severity ratings, for example:
    • Critical - Confirmed breach or outage, large scale/impact
    • High - Likely breach or severe risk, smaller scale
    • Medium - Unusual activity or policy violation
    • Low - Not directly risky but worth noting
    • Informational - Routine or expected activity
  2. Create a decision tree or matrix mapping event types to ratings
  3. Establish SLAs for event review, e.g. critical events reviewed within 15 minutes
  4. Configure log collection from all critical systems and applications
  5. Aggregate and correlate logs to build context and identify patterns
  6. Implement automated event scoring and alerts based on the severity matrix
  7. Assign events to analysts' queues based on severity
  8. Provide analysts a case management UI to review, annotate and act on events
  9. Define clear escalation paths from triage to incident response processes
  10. Continuously measure and refine the triage criteria, models and workflow
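
Steps 1 to 3 and 7 above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the event type names, the choice to rate unknown event types as Medium, and every SLA other than the 15-minute critical review are assumptions made up for the example.

```python
import heapq
import itertools
from datetime import datetime, timedelta, timezone
from enum import IntEnum


class Severity(IntEnum):
    INFORMATIONAL = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Step 2: hypothetical matrix mapping event types to severity ratings.
SEVERITY_MATRIX = {
    "confirmed_data_exfiltration": Severity.CRITICAL,
    "malware_detected": Severity.HIGH,
    "policy_violation": Severity.MEDIUM,
    "failed_login": Severity.LOW,
    "scheduled_backup": Severity.INFORMATIONAL,
}

# Step 3: review SLAs. Only the 15-minute critical SLA comes from the text;
# the other values are placeholders. Informational events get no SLA.
REVIEW_SLA = {
    Severity.CRITICAL: timedelta(minutes=15),
    Severity.HIGH: timedelta(hours=1),
    Severity.MEDIUM: timedelta(hours=8),
    Severity.LOW: timedelta(hours=24),
}


def classify(event_type):
    # Assumption: unknown event types default to MEDIUM so a human looks at them.
    return SEVERITY_MATRIX.get(event_type, Severity.MEDIUM)


class TriageQueue:
    """Step 7: hand out events highest severity first, then oldest first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for identical keys

    def push(self, event_type, received_at):
        severity = classify(event_type)
        entry = (-severity, received_at, next(self._counter), event_type)
        heapq.heappush(self._heap, entry)
        return severity

    def pop(self):
        neg_severity, received_at, _, event_type = heapq.heappop(self._heap)
        severity = Severity(-neg_severity)
        sla = REVIEW_SLA.get(severity)
        return {
            "event_type": event_type,
            "severity": severity,
            "received_at": received_at,
            "review_due": received_at + sla if sla is not None else None,
        }
```

In practice this logic usually lives in SIEM or SOAR rule configuration rather than custom code, but writing the matrix down in an explicit, testable form like this makes it easier to review and refine over time (step 10).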

What are some gotchas?

  • Logs must be complete, accurate and tamper-evident for triage to be meaningful
  • Triage based on narrow signatures can miss novel threat patterns
  • Mapping events to accurate severity ratings is both art and science
  • Analysts need to understand applications and data context, not just infrastructure
  • Over-reliance on AI/ML triage risks missing edge cases those models weren't trained for
  • Integrating many disparate monitoring tools is a perennial challenge
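
The log integrity point in the list above can be illustrated with a simple hash chain, where each entry's hash covers the previous entry's hash, so editing or removing an earlier record breaks verification. This is a sketch of the general technique only: the function names and record fields are hypothetical, and production systems more commonly rely on write-once storage or the integrity features of their logging platform.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def chain_hash(prev_hash, record):
    # Serialize deterministically so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log, record):
    """Append a record along with a hash linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": chain_hash(prev_hash, record)})


def verify_chain(log):
    """Return the index of the first entry that fails verification, or None."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        if entry["hash"] != chain_hash(prev_hash, entry["record"]):
            return index
        prev_hash = entry["hash"]
    return None
```

Note that a chain like this does not detect truncation of the most recent entries on its own; the latest hash would also need to be stored somewhere the log writer cannot modify.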

Some key technical requirements and permissions:

  • Log sources must be configured to send the right data in a compatible format - work with vendors for guidance
  • SIEM tools need read access to all relevant logs - where possible, grant this through a dedicated read-only service account or API role rather than admin rights on logging hosts/APIs
  • SOAR tools orchestrating response need to be able to open tickets, send emails, and invoke security tools - check vendor docs for exact permissions
  • Analysts need at least read-only access to security tools' management UIs to verify alerts and get context
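
To make the "compatible format" requirement concrete, the sketch below normalizes two differently shaped log events into one common schema before they reach triage. Both input formats and all field names (`epoch_seconds`, `service`, `event_name`, `ts`, `host`) are invented for illustration and do not correspond to any particular vendor's format.

```python
import json
from datetime import datetime, timezone


def normalize_json_event(raw):
    """Normalize a hypothetical JSON audit event with an epoch timestamp."""
    data = json.loads(raw)
    ts = datetime.fromtimestamp(data["epoch_seconds"], tz=timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "source": data["service"],
        "actor": data.get("user", "unknown"),
        "action": data["event_name"],
    }


def normalize_kv_line(line):
    """Normalize a hypothetical key=value line with an ISO 8601 timestamp.

    Assumes values contain no spaces; real parsers need to handle quoting.
    """
    fields = dict(pair.split("=", 1) for pair in line.split())
    ts = datetime.fromisoformat(fields["ts"]).astimezone(timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "source": fields["host"],
        "actor": fields.get("user", "unknown"),
        "action": fields["action"],
    }
```

Converting every timestamp to UTC and every actor to a single field name is what lets the correlation step line up events from different sources on one timeline.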

What are the alternatives?

Some potential alternatives or complements to defined triage processes:

  • SIEM platforms with advanced correlation and anomaly detection for smarter prioritization
  • SOAR and XDR tools that automate parts of the triage and investigation workflow
  • Outsourced SOC services that handle front-line monitoring and triage
  • Threat hunting to proactively search for hidden threats

However, these options tend to be most effective when built on a foundation of well-designed triage processes rather than fully replacing that need.

Explore further

This control supports triage as part of a broader incident handling lifecycle mapped out by the other controls in the CCM's Security Incident Management, E-Discovery, & Cloud Forensics (SEF) domain.

