Tools for performing security chaos engineering testing
When Netflix launched its streaming service almost 15 years ago, performance issues were common. To build resilience as its customer base grew, the company developed chaos engineering, a discipline that tests systems to determine their fault tolerance under unstable conditions.
Today, this methodology is being adapted to a security context. While security chaos engineering is still in its infancy, many industry professionals are interested in its potential, and a few tools are already available on the market.
“Cybersecurity teams don’t always have the right situational awareness of how systems are inter-related internally. [Security chaos engineering] is incredibly valuable to security teams because it would give teams a better insight into their environment and what the tools are doing,” said Jeff Pollard, analyst at Forrester Research.
For businesses and security teams interested in security chaos engineering tools, here are a few options to consider.
Verica, co-founded by Aaron Rinehart and Casey Rosenthal in January 2019, is one of the few tools dedicated to security chaos engineering. Rosenthal led the Chaos Engineering team at Netflix, and Rinehart applied chaos engineering to security when he was Chief Security Architect at UnitedHealth Group (UHG).
Verica’s platform can be used in the cloud or on-premises. It uses “continuous verification” to manage experiments around uptime and security. The platform is based on Netflix’s Chaos Automation Platform (ChAP) and integrates with Kafka and Kubernetes.
ChaoSlingr was one of the first security chaos engineering tools available. Rinehart helped develop the tool when he was at UHG. The open source tool was deprecated after Rinehart left UHG, but its code remains available on GitHub for companies that want to write their own experiments.
ChaoSlingr is made up of the following four AWS Lambda functions written in Python:
- Generatr identifies what will be affected by the failure.
- Slingr injects failure.
- Trackr provides event logs.
- Experiment description provides information about the experiment.
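The division of labor among these functions can be illustrated with a minimal sketch. It does not reproduce ChaoSlingr's actual code and makes no AWS calls: the in-memory `SECURITY_GROUPS` list, the `opt_in` flag, the `EXPERIMENT` dictionary and the choice of "open an unexpected port" as the injected failure are all assumptions made for illustration.

```python
import random
from datetime import datetime, timezone

# Hypothetical experiment description: documents what the test does and expects.
EXPERIMENT = {
    "name": "unexpected-open-port",
    "hypothesis": "An unauthorized port change is detected and reverted.",
}

# Hypothetical in-memory stand-in for cloud security groups (no AWS calls).
SECURITY_GROUPS = [
    {"id": "sg-web", "opt_in": True, "open_ports": [443]},
    {"id": "sg-db", "opt_in": False, "open_ports": [5432]},
    {"id": "sg-batch", "opt_in": True, "open_ports": [22]},
]

EVENT_LOG = []


def trackr(stage, detail):
    """Record what each stage did so the experiment can be reviewed later."""
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "stage": stage,
        "detail": detail,
    }
    EVENT_LOG.append(event)
    return event


def generatr(groups, rng):
    """Pick one target, considering only groups that opted in to experiments."""
    candidates = [group for group in groups if group["opt_in"]]
    target = rng.choice(candidates)
    trackr("generatr", f"selected {target['id']}")
    return target


def slingr(target, port):
    """Inject the failure: open a port that should not be open on the target."""
    if port not in target["open_ports"]:
        target["open_ports"].append(port)
    trackr("slingr", f"opened port {port} on {target['id']}")
    return target


def run_experiment(groups, port=8080, seed=None):
    """Select a target, inject the failure, and return the changed target."""
    rng = random.Random(seed)
    target = generatr(groups, rng)
    return slingr(target, port)


if __name__ == "__main__":
    print(EXPERIMENT["hypothesis"])
    print(run_experiment(SECURITY_GROUPS, port=8080, seed=7))
    for event in EVENT_LOG:
        print(event)
```

After a run, the team would compare the event log against what its monitoring and response tooling actually reported, which is where the hypothesis is confirmed or refuted.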
Kelly Shortridge, Senior Director of Product Technology at Fastly, designed the Deciduous security decision tree generator, which supports the design phase of security chaos engineering. “Harness the scientific method, which is the very essence of chaos engineering, and come up with hypotheses,” Shortridge said.
Creating security decision trees enables security teams to threat model systems effectively. In a chaos engineering context, the trees help teams map how tools and systems are supposed to work versus how they actually work. Using Deciduous, security teams can graphically visualize the actions of potential attackers and the defenders’ mitigations.
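To make the idea concrete, the following sketch models a small decision tree as a plain Python dictionary and lists the attack paths that no mitigation covers. It does not use Deciduous or its input format; the node names, the `ATTACK_TREE` structure and both helper functions are hypothetical.

```python
# Hypothetical attack tree: each node lists the attacker's possible next
# steps and the mitigations defenders expect to block that step.
ATTACK_TREE = {
    "internet": {"next": ["phishing", "exposed_bucket"], "mitigations": []},
    "phishing": {"next": ["stolen_credentials"], "mitigations": ["mail filtering"]},
    "exposed_bucket": {"next": ["customer_data"], "mitigations": []},
    "stolen_credentials": {
        "next": ["customer_data"],
        "mitigations": ["multifactor authentication"],
    },
    "customer_data": {"next": [], "mitigations": []},
}


def attack_paths(tree, start, goal, path=None):
    """Return every path from start to goal using depth-first search."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for step in tree[start]["next"]:
        if step not in path:  # guard against cycles
            paths.extend(attack_paths(tree, step, goal, path))
    return paths


def unmitigated_paths(tree, start, goal):
    """Keep only the paths on which no node has any mitigation."""
    return [
        path
        for path in attack_paths(tree, start, goal)
        if not any(tree[node]["mitigations"] for node in path)
    ]


if __name__ == "__main__":
    for path in unmitigated_paths(ATTACK_TREE, "internet", "customer_data"):
        print(" -> ".join(path))
```

Each path with a mitigation is also a candidate hypothesis: an experiment can check whether the listed control really stops that step.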
Adapting chaos engineering tools for security
Businesses should also consider using existing chaos engineering tools in security scenarios.
“A lot of existing chaos engineering tools are conducting uptime experiments,” Shortridge said. “In theory, you can reuse some of them for security, for example to simulate a distributed denial of service attack or excessive traffic.”
“Production systems frequently experience varying levels of degradation and misconfiguration,” said Jim Scheibmeir, analyst at Gartner. Fortunately for businesses, he said, the majority of the chaos engineering tools providing this hypothesis-driven experimentation are open source.
For example, Netflix offers the following suite of tools that businesses can customize to suit their needs:
- Chaos Monkey is an open source tool that introduces random failures into applications. Netflix uses the tool to turn its servers on and off randomly in order to observe the resulting behavior.
- Chaos Kong takes Chaos Monkey to the next level. It simulates the outage of entire AWS Regions to help engineers discover and resolve systemic issues before any real failures occur.
- ChAP tests the system for failures at the microservice level.
Chaos Toolkit is another open source chaos engineering project that can be adapted for security. The extensible tool allows developers to create and automate experiments for their specific use cases. Developers can implement Chaos Toolkit experiments through Python functions, HTTP requests, or separate processes. With prewritten extensions, developers can connect to a variety of systems via Open API.
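As a rough illustration, the sketch below builds a security-flavored experiment definition and prints it as JSON. The top-level keys and the `http` and `process` provider types follow the Chaos Toolkit experiment format as commonly documented, but the URL, the script paths, the rule name and the scenario itself are hypothetical placeholders; check the field names against the current Chaos Toolkit documentation before relying on them.

```python
import json


def build_experiment(base_url):
    """Return an experiment definition as a plain dictionary."""
    return {
        "version": "1.0.0",
        "title": "Unauthenticated admin access stays blocked",
        "description": "Remove one firewall rule and verify the admin "
                       "endpoint still rejects anonymous requests.",
        # Steady state: the probe must keep returning HTTP 401.
        "steady-state-hypothesis": {
            "title": "Admin endpoint requires authentication",
            "probes": [
                {
                    "type": "probe",
                    "name": "admin-endpoint-rejects-anonymous",
                    "tolerance": 401,
                    "provider": {
                        "type": "http",
                        "url": f"{base_url}/admin",
                        "timeout": 5,
                    },
                }
            ],
        },
        # Method: the fault to inject. The script path is a hypothetical placeholder.
        "method": [
            {
                "type": "action",
                "name": "remove-firewall-rule",
                "provider": {
                    "type": "process",
                    "path": "./scripts/remove_rule.sh",
                    "arguments": "admin-allowlist",
                },
            }
        ],
        # Rollbacks: undo the change once the experiment has finished.
        "rollbacks": [
            {
                "type": "action",
                "name": "restore-firewall-rule",
                "provider": {
                    "type": "process",
                    "path": "./scripts/restore_rule.sh",
                    "arguments": "admin-allowlist",
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_experiment("https://staging.example.com"), indent=2))
```

Saved to a file such as `experiment.json`, a definition like this would typically be executed with `chaos run experiment.json`.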
Write custom Python or Bash scripts
If existing security chaos engineering tools aren’t suitable, another option is to create your own. Security teams can use Python and Bash to write custom scripts that introduce failures into specific systems, so they know exactly where the issues are occurring. Custom scripts also make it easier to restore the system after experiments.
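A custom script can be as simple as the sketch below, which injects a misconfiguration, checks a control while the fault is active, and always restores the original state. Everything in it is hypothetical: the `service.conf` file, the `audit_logging` setting and the check function stand in for whatever system and control a team actually wants to test.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def injected_fault(path, faulty_text):
    """Overwrite a config file with a faulty version, then always restore it."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(faulty_text)
    try:
        yield
    finally:
        # Rollback runs even if the checks inside the block raise an error.
        with open(path, "w", encoding="utf-8") as f:
            f.write(original)


def audit_logging_enabled(path):
    """Hypothetical control check: is audit logging switched on?"""
    with open(path, encoding="utf-8") as f:
        return "audit_logging=on" in f.read().splitlines()


def run_experiment(path):
    """Observe the control before, during, and after the injected fault."""
    before = audit_logging_enabled(path)
    with injected_fault(path, "audit_logging=off\n"):
        during = audit_logging_enabled(path)
    after = audit_logging_enabled(path)
    return {"before": before, "during": during, "after": after}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config = os.path.join(tmp, "service.conf")
        with open(config, "w", encoding="utf-8") as f:
            f.write("audit_logging=on\n")
        print(run_experiment(config))
        # prints {'before': True, 'during': False, 'after': True}
```

Putting the rollback in a `finally` block is the design choice that matters here: the system is restored even when the experiment itself fails partway through.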
The future of security chaos engineering tools
Because the concept is still in its infancy, there aren’t many security chaos engineering tools out there. Expect to see more as the subject takes off.
Rinehart said a forthcoming tool developed by software engineer Matas Kulkovas will run Kubernetes-specific experiments for security resiliency.
Researchers at the University of Potsdam in Germany published a 2020 article detailing CloudStrike, a tool designed to test the resilience of security in cloud infrastructure. It uses security chaos engineering techniques to help security teams detect configuration errors and availability issues in AWS and Google Cloud Platform. The tool has not yet been released.