Design for failure

Everything fails at some point. We can’t stop that, but what we can do is think of the application as the sum of its parts. Knowing and understanding those parts will help us design for failure.

Thinking about failure at the beginning of a project makes recovery strategies part of the design process.

You need to understand where the single points of failure in your application are.

An example of a single point of failure in an application would be:
- Hosting your application and database on the same machine instance. If that machine instance fails, the entire application fails.
- A solution would be to run the application and the database on their own machine instances.

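An initial design would put the application and the database on the same machine instance.

But here we see the single point of failure: if that one instance fails, everything fails.

So we would rearchitect by moving the application and the database onto separate instances.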

Because we have deployed the application and database on their own machine instances, we can now bring in tools such as a load balancer and automatic scaling to help us scale the application as a whole.
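The following is a minimal Python sketch of a proportional scaling rule, which makes "tools to help us scale" more concrete. The function name `desired_capacity`, the CPU metric and the default bounds are all assumptions for illustration. They are not the configuration of any particular AWS service. `min_size` defaults to 2, so a second instance is always available to absorb a failure.

```python
import math

def desired_capacity(current: int, avg_cpu: float, target_cpu: float,
                     min_size: int = 2, max_size: int = 10) -> int:
    # Hypothetical rule: resize the fleet so average CPU moves toward target_cpu.
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be > 0")
    wanted = math.ceil(current * avg_cpu / target_cpu)
    # Clamp to the allowed range; min_size >= 2 keeps a spare instance running.
    return max(min_size, min(max_size, wanted))

print(desired_capacity(2, 90.0, 50.0))   # 4
print(desired_capacity(4, 20.0, 50.0))   # 2
print(desired_capacity(8, 100.0, 50.0))  # 10
```

The lower bound matters as much as the upper one. A fleet that is allowed to shrink to one instance brings the single point of failure back.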

Now the system can tolerate the failure of an application instance without there being a system-wide failure.
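The sketch below shows, under simplifying assumptions, why losing one application instance no longer takes the system down. A round-robin load balancer simply skips instances that are marked unhealthy. The `LoadBalancer` class and its methods are hypothetical. A real load balancer learns instance health from periodic health checks rather than from explicit `mark_down` calls.

```python
from itertools import cycle

class LoadBalancer:
    # Hypothetical round-robin balancer that routes only to healthy instances.
    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._ring = cycle(self.instances)

    def mark_down(self, instance):
        self.healthy.discard(instance)

    def mark_up(self, instance):
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        # Visit each instance at most once per request, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances: system-wide failure")

lb = LoadBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
routed = [lb.route() for _ in range(4)]
print(routed)  # ['app-1', 'app-3', 'app-1', 'app-3']
```

Requests keep flowing while `app-2` is down. An error is raised only when every application instance has failed.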

But we still have single points of failure: the database and the load balancer.
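One way to find the remaining single points of failure is to fail each host in turn. For each failure, check whether every role still has a live instance. This minimal sketch assumes that a design is a mapping from each role to the hosts running it, and that only one host fails at a time. All host names are hypothetical.

```python
def is_available(design: dict[str, list[str]], failed_host: str) -> bool:
    # The system is up only if every role still has an instance on a live host.
    return all(any(host != failed_host for host in hosts) for hosts in design.values())

def single_points_of_failure(design: dict[str, list[str]]) -> list[str]:
    # A host is a single point of failure if losing it alone takes the system down.
    hosts = sorted({h for role_hosts in design.values() for h in role_hosts})
    return [h for h in hosts if not is_available(design, h)]

# Hypothetical designs: role -> hosts running that role.
co_hosted = {"app": ["vm-1"], "db": ["vm-1"]}
scaled = {"lb": ["lb-1"], "app": ["app-1", "app-2"], "db": ["db-1"]}
redundant = {"lb": ["lb-1", "lb-2"], "app": ["app-1", "app-2"], "db": ["db-1", "db-2"]}

for name, design in [("co_hosted", co_hosted), ("scaled", scaled), ("redundant", redundant)]:
    print(name, single_points_of_failure(design))
# co_hosted ['vm-1']
# scaled ['db-1', 'lb-1']
# redundant []
```

On the scaled design, the check flags `db-1` and `lb-1`, which are the two single points of failure named above. The model ignores how a second database would be kept in sync with the first. It only counts whether some instance of each role survives.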

Using AWS services, we could rearchitect to make the application as a whole more robust.