Netflix lets Chaos Monkey run riot on AWS virtual machines

Netflix has released to the developer community the source code for Chaos Monkey, the tool it uses to test how resilient its systems are when running on its hosting provider, Amazon Web Services (AWS).

The source code is available to download for free from GitHub under an Apache license.

Netflix uses Chaos Monkey to see how its systems cope when some of its AWS-hosted virtual machines are randomly switched off. The tool runs only between 9am and 3pm, Monday to Friday.

In particular, the app terminates virtual machines that belong to Auto Scaling Groups, each of which should respond by automatically creating a new, identical instance to replace the one it lost.
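The behaviour described above can be illustrated with a short Python sketch. It is not Netflix's code and makes no AWS calls: the function names, the group mapping and the exact window boundaries (3pm treated as exclusive) are assumptions made for illustration only.

```python
import random
from datetime import datetime
from typing import Dict, List

# Assumed window, taken from the article: 9am to 3pm, Monday to Friday.
WINDOW_START_HOUR = 9
WINDOW_END_HOUR = 15  # treated as exclusive here, which is an assumption


def in_chaos_window(now: datetime) -> bool:
    """Return True if `now` is a weekday between 09:00 and 15:00."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WINDOW_START_HOUR <= now.hour < WINDOW_END_HOUR


def pick_victims(groups: Dict[str, List[str]], now: datetime,
                 rng: random.Random) -> Dict[str, str]:
    """Choose at most one instance ID per group, only inside the window.

    `groups` is a hypothetical mapping of group name to instance IDs.
    A real tool would fetch this from the cloud provider and then
    terminate the chosen instances; this sketch only selects them.
    """
    if not in_chaos_window(now):
        return {}
    # Skip empty groups: there is nothing to terminate in them.
    return {name: rng.choice(ids) for name, ids in groups.items() if ids}
```

Calling pick_victims with a weekday lunchtime timestamp returns one randomly chosen instance ID for each non-empty group, while a weekend or out-of-hours timestamp returns an empty mapping, so nothing would be terminated when engineers are unlikely to be at their desks.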

In a post on the Netflix Tech Blog announcing the move, the company explained: “In most cases, we have designed our applications to continue working when an instance [virtual machine] goes offline.

“But in those special cases that they don't, we want to make sure there are people around to resolve and learn from any problems. With this in mind, Chaos Monkey only runs within a limited set of hours with the intent that engineers will be alert and able to respond,” the blog post continued.

The firm said it was releasing the source code to help developers guard against system crashes.

“Failures inevitably happen when least desired or expected. If your application can't tolerate an instance failure, would you rather find out by being paged at 3am or when you're in the office and have had your morning coffee?” asked the blog.

“The best defence against major unexpected failures is to fail often. By frequently causing failures, we force our services to be built in a way that is more resilient,” it added.

Netflix said that, to date, Chaos Monkey has been used to terminate more than 65,000 instances running within its test and development environments.

“Most of the time nobody notices, but we continue to find surprises caused by Chaos Monkey which allows us to isolate and resolve them so they don't happen again,” the blog added.

Netflix is also planning to release another member of its so-called Simian Army, an app called Janitor Monkey, in due course.

The company has spoken at length in the past about its decision to embrace the open source developer community as part of its transition from DVD mail-order firm to online streaming provider.

ITPro