Building a resilient Microservice? All you need is Eclipse MicroProfile Fault Tolerance!

Building a resilient microservice?

No need to ask why. Of course, everyone dreams about building a robust and resilient microservice, one that keeps running no matter where it is deployed or what goes wrong around it.

How? The obvious answer is to use Hystrix or Failsafe. However, using a third-party library clutters your business code, and you need to learn each library's API. If you need to change some Fault Tolerance policies, you have to repackage your microservice.

Anything better out there? MicroProfile Fault Tolerance is the new way to build a resilient microservice.

What is Eclipse MicroProfile Fault Tolerance?

Eclipse MicroProfile Fault Tolerance provides a simple and flexible solution for building a fault-tolerant microservice. It is easy to use and configurable. It offers the following Fault Tolerance policies:

  • Timeout: define a maximum duration for an execution.
  • Retry: define criteria for when to retry.
  • Fallback: provide an alternative solution for a failed execution.
  • Bulkhead: isolate failures in part of the system so that the rest of the system can still function.
  • CircuitBreaker: fail fast by automatically failing executions, which prevents system overload and indefinite waits or timeouts for clients.
  • Asynchronous: invoke the operation asynchronously.

The main design is to separate execution logic from execution. The execution can be configured with fault tolerance policies.

Eclipse MicroProfile Fault Tolerance introduces the following annotations for the corresponding Fault Tolerance policies:

  • Timeout
  • Retry
  • Fallback
  • Bulkhead
  • CircuitBreaker
  • Asynchronous

All you need to do is add these annotations to the methods or bean classes that you would like to make fault tolerant.

This project started in April 2017 with 10 amazing contributors. We have released MicroProfile Fault Tolerance 1.0.

MicroProfile Fault Tolerance does not contain an implementation itself but provides the specified API, TCK and documentation.

This is a list of current implementations of MicroProfile Fault Tolerance feature that are either underway or being planned:

How to use MicroProfile Fault Tolerance?

Apply the Fault Tolerance annotations, which live in the org.eclipse.microprofile.faulttolerance package, to CDI bean classes or methods. Below are some examples:

1. Retry

To recover from a brief network glitch, Retry can be used to invoke the same operation again. The @Retry annotation achieves this, and it can be applied at the class level or the method level.

/**
 * maxRetries is set to 90 but maxDuration is 1000ms.
 * Once the max duration is reached, no more retries are performed,
 * even though the max number of retries has not been reached.
 */
@Retry(maxRetries = 90, maxDuration = 1000)
public void serviceB() {
    writingService();
}

/**
 * There is a delay of 0-800ms between invocations
 * (delay is 400ms, jitter is between -400ms and 400ms).
 * There should be at least 4 retries but no more than 10 retries.
 */
@Retry(delay = 400, maxDuration = 3200, jitter = 400, maxRetries = 10)
public Connection serviceA() {
    return connectionService();
}

/**
 * Sets the retry condition: a retry is performed
 * when an IOException is thrown.
 */
@Retry(retryOn = {IOException.class})
public void serviceB() {
    writingService();
}
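To make the interplay between maxRetries, delay and maxDuration more concrete, here is a minimal plain-Java sketch of a retry loop. It is not how any MicroProfile implementation works internally: the class RetrySketch and its method callWithRetry are hypothetical names, and jitter and retryOn are left out for brevity.

```java
import java.util.concurrent.Callable;

/** Hypothetical illustration of maxRetries / delay / maxDuration; not the MicroProfile implementation. */
final class RetrySketch {

    static <T> T callWithRetry(Callable<T> action, int maxRetries,
                               long delayMillis, long maxDurationMillis) throws Exception {
        long deadline = System.nanoTime() + maxDurationMillis * 1_000_000L;
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                return action.call();
            } catch (Exception e) {
                // The first call is not a retry, so up to maxRetries + 1 attempts are allowed.
                boolean retriesLeft = attempts <= maxRetries;
                // Stop early if waiting for the next attempt would exceed the max duration.
                boolean timeLeft = System.nanoTime() + delayMillis * 1_000_000L < deadline;
                if (!retriesLeft || !timeLeft) {
                    throw e;
                }
                Thread.sleep(delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("transient glitch");
            }
            return "connected";
        }, 10, 10, 1000);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Whichever limit is hit first ends the loop, which is why the first example above can stop long before its 90 retries are used up.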

2. Timeout

Timeout prevents an execution from waiting forever. @Timeout specifies the timeout, and it can be used on methods or classes.

@Timeout(400) // timeout is 400ms
public Connection serviceA() {
    Connection conn = null;
    conn = connectionService();
    return conn;
}

When a timeout occurs, a TimeoutException will be thrown.
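For readers who want to see what a timeout policy does behind the annotation, the following plain-Java sketch runs a task on a worker thread and gives up after the specified time. The class TimeoutSketch is a hypothetical name, and the sketch throws java.util.concurrent.TimeoutException as a stand-in for the exception MicroProfile Fault Tolerance throws. Real implementations may work quite differently.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical illustration of a timeout policy; not the MicroProfile implementation. */
final class TimeoutSketch {

    static <T> T callWithTimeout(Callable<T> action, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(action);
        try {
            // Wait for the result, but no longer than the configured timeout.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not keep running
            throw e;
        } catch (ExecutionException e) {
            // Unwrap the exception thrown by the action itself.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```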

3. CircuitBreaker

Circuit Breaker prevents repeated timeouts, so that invocations of dysfunctional services or APIs fail fast. Annotate a method or class with @CircuitBreaker to apply the CircuitBreaker policy.

@CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 4, failureRatio = 0.75, delay = 1000)
public Connection serviceA() {
    Connection conn = null;
    conn = connectionService();
    return conn;
}

In the snippet above, the method serviceA applies the CircuitBreaker policy: the circuit opens once 3 (4 x 0.75) failures occur within the rolling window of 4 consecutive invocations. The circuit stays open for 1000ms and then moves to half-open. After 10 consecutive successful invocations, the circuit closes again. While the circuit is open, a CircuitBreakerOpenException is thrown.
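The state transitions described above can be sketched as a small state machine in plain Java. This is only an illustration of the closed, open and half-open cycle under the parameters named in the text. The class CircuitBreakerSketch, its methods and the injected clock are all hypothetical and do not reflect how a MicroProfile implementation is written.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

/** Hypothetical illustration of the circuit breaker states; not the MicroProfile implementation. */
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;
    private final double failureRatio;
    private final int successThreshold;
    private final long delayMillis;
    private final LongSupplier clock; // injected so the sketch can be exercised without sleeping

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed invocation
    private State state = State.CLOSED;
    private long openedAt;
    private int halfOpenSuccesses;

    CircuitBreakerSketch(int requestVolumeThreshold, double failureRatio,
                         int successThreshold, long delayMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.failureRatio = failureRatio;
        this.successThreshold = successThreshold;
        this.delayMillis = delayMillis;
        this.clock = clock;
    }

    /** Returns false while the circuit is open, meaning the caller should fail fast. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= delayMillis) {
            state = State.HALF_OPEN; // the delay has elapsed, let trial calls through
            halfOpenSuccesses = 0;
        }
        return state != State.OPEN;
    }

    /** Records the outcome of an invocation that was allowed through. */
    synchronized void record(boolean failed) {
        if (state == State.OPEN) {
            return;
        }
        if (state == State.HALF_OPEN) {
            if (failed) {
                open(); // a single failure while half-open reopens the circuit
            } else if (++halfOpenSuccesses >= successThreshold) {
                state = State.CLOSED;
                window.clear();
            }
            return;
        }
        window.addLast(failed);
        if (window.size() > requestVolumeThreshold) {
            window.removeFirst(); // keep only the most recent invocations
        }
        if (window.size() == requestVolumeThreshold) {
            long failures = window.stream().filter(Boolean::booleanValue).count();
            if ((double) failures / requestVolumeThreshold >= failureRatio) {
                open();
            }
        }
    }

    synchronized State state() {
        return state;
    }

    private void open() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

With the values from the snippet (4, 0.75, 10, 1000), one success followed by three failures fills the window with a failure ratio of 0.75, which is enough to open the circuit.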

4. Bulkhead

The Bulkhead pattern prevents faults in one part of the system from cascading to the rest of it, which might otherwise bring down the whole system. It is implemented by limiting the number of concurrent requests that access an instance.

There are two different approaches to the bulkhead: thread pool isolation and semaphore isolation.

Semaphore style Bulkhead

Annotating a method or a class with @Bulkhead applies a semaphore style bulkhead, which allows up to the specified number of concurrent requests.

@Bulkhead(5) // maximum 5 concurrent requests allowed
public Connection serviceA() {
    Connection conn = null;
    conn = connectionService();
    return conn;
}
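The idea behind the semaphore style can be illustrated with java.util.concurrent.Semaphore. The sketch below is only an approximation of the concept: the class SemaphoreBulkheadSketch is a hypothetical name, and it throws IllegalStateException where a real MicroProfile implementation would raise its own bulkhead exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

/** Hypothetical illustration of a semaphore style bulkhead; not the MicroProfile implementation. */
final class SemaphoreBulkheadSketch {
    private final Semaphore permits;

    SemaphoreBulkheadSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> T call(Callable<T> action) throws Exception {
        // Reject immediately instead of queueing when all permits are taken.
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("bulkhead is full");
        }
        try {
            return action.call();
        } finally {
            permits.release(); // always hand the permit back, even on failure
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The caller's own thread does the work here, so there is no waiting queue. A request either gets a permit or is rejected straight away.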

Thread pool style Bulkhead

When @Bulkhead is used with @Asynchronous, the thread pool isolation approach is used. @Asynchronous causes an invocation to be executed by a different thread. The thread pool approach lets you configure the maximum number of concurrent requests together with the waiting queue size, whereas the semaphore approach only lets you configure the number of concurrent requests.

// maximum 5 concurrent requests allowed, maximum 8 requests allowed in the waiting queue
@Asynchronous
@Bulkhead(value = 5, waitingTaskQueue = 8)
public Future<Connection> serviceA() {
    Connection conn = null;
    conn = connectionService();
    return CompletableFuture.completedFuture(conn);
}
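A rough plain-Java analogy for the thread pool style is a fixed-size ThreadPoolExecutor with a bounded queue. The sketch below is an assumption made for illustration, not a description of how any implementation maps @Bulkhead onto threads. The class ThreadPoolBulkheadSketch and its factory method newBulkhead are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a thread pool style bulkhead; not the MicroProfile implementation. */
final class ThreadPoolBulkheadSketch {

    /**
     * Builds a pool that runs at most maxConcurrentRequests tasks at once and
     * holds at most waitingTaskQueue further tasks. Anything beyond that is
     * rejected with a RejectedExecutionException.
     */
    static ThreadPoolExecutor newBulkhead(int maxConcurrentRequests, int waitingTaskQueue) {
        return new ThreadPoolExecutor(
                maxConcurrentRequests,                       // core pool size
                maxConcurrentRequests,                       // maximum pool size
                0L, TimeUnit.MILLISECONDS,                   // no keep-alive for extra threads
                new ArrayBlockingQueue<>(waitingTaskQueue),  // bounded waiting queue
                new ThreadPoolExecutor.AbortPolicy());       // reject when pool and queue are full
    }
}
```

With newBulkhead(5, 8), five tasks run, eight more wait in the queue, and a fourteenth task submitted while the others are still busy is rejected.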

5. Fallback

Most of the previous annotations increase the success rate of method invocations. However, they cannot completely eliminate exceptions, which still have to be dealt with. It is often useful to fall back to a different operation when an operation is dysfunctional. A method can be annotated with @Fallback, which means the Fallback policy is applied to that method.

@Retry(maxRetries = 1)
@Fallback(StringFallbackHandler.class)
public String serviceA() {
    return nameService();
}

In the above code snippet, when the method fails and the maximum number of retries has been reached, the fallback operation is performed: the method StringFallbackHandler.handle(ExecutionContext context) is invoked. The return type of StringFallbackHandler.handle(ExecutionContext context) must match the return type of serviceA().

If the fallback method is declared on the same class as the method annotated with @Fallback, specify the fallback as follows.

@Retry(maxRetries = 2)
@Fallback(fallbackMethod = "fallbackForServiceB")
public String serviceB() {
    counterForInvokingServiceB++;
    return nameService();
}

private String fallbackForServiceB() {
    return "myFallback";
}

In the above code snippet, when the method fails and the maximum number of retries has been reached, the method fallbackForServiceB is invoked. The return type of fallbackForServiceB must be String, and its argument list must be the same as that of serviceB.
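The order in which Retry and Fallback take effect in the two snippets above can be sketched in plain Java as follows. This is a simplified illustration and not the library's implementation: the class FallbackSketch and its method are hypothetical, and a non-negative maxRetries is assumed.

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

/** Hypothetical illustration of Retry followed by Fallback; not the MicroProfile implementation. */
final class FallbackSketch {

    static <T> T callWithRetryThenFallback(Callable<T> primary, int maxRetries,
                                           Function<Exception, T> fallback) {
        Exception lastFailure = null;
        // One initial attempt plus up to maxRetries retries.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return primary.call();
            } catch (Exception e) {
                lastFailure = e;
            }
        }
        // Only once every attempt has failed does the fallback produce the result.
        return fallback.apply(lastFailure);
    }
}
```

Because the fallback supplies the return value in place of the failed method, its result type has to match, which mirrors the return type rule stated above.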

The annotations declared by MicroProfile Fault Tolerance can be used in combination.

Having read this far, you might be thinking: these are all annotations, and annotations are static. Is it possible to configure them without repackaging the app?

Configure MicroProfile Fault Tolerance

All of the annotation parameters are configurable. This specification directly depends on MicroProfile Config to configure the parameters.

The annotation parameters can be overridden via config properties that follow the naming convention <classname>/<methodname>/<annotation>/<parameter>. For example, to override the maxDuration of the Retry on serviceA, set the config property

com.acme.test.MyClient/serviceA/Retry/maxDuration=3000

If the parameters for a particular annotation need to be configured with the same value for a particular class, use the config property <classname>/<annotation>/<parameter> for configuration.

For instance, use the following config property to override all maxRetries for Retry specified on the class MyClient to 100.

com.acme.test.MyClient/Retry/maxRetries=100

Sometimes, the parameters need to be configured with the same value for the whole microservice.

For instance, every Timeout might need to be set to 100ms, and overriding each occurrence of Timeout would be cumbersome. In this case, the config property <annotation>/<parameter> overrides the corresponding parameter value for the specified annotation across the microservice. For example, to override maxRetries for every Retry to be 30, specify the config property

Retry/maxRetries=30
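The three naming conventions above can be thought of as a lookup from the most specific key to the least specific one. The sketch below illustrates that idea with a plain Map standing in for MicroProfile Config. The class FaultToleranceConfigSketch is hypothetical, and the precedence order (method, then class, then global, then the value on the annotation) is an assumption made for this sketch. It also ignores the distinction between annotations placed on a class and on a method.

```java
import java.util.Map;

/** Hypothetical illustration of the config key convention; not the MicroProfile implementation. */
final class FaultToleranceConfigSketch {

    static String resolve(Map<String, String> config, String className, String methodName,
                          String annotation, String parameter, String annotationValue) {
        String[] keys = {
            className + "/" + methodName + "/" + annotation + "/" + parameter, // one method
            className + "/" + annotation + "/" + parameter,                    // one class
            annotation + "/" + parameter                                       // whole microservice
        };
        for (String key : keys) {
            String value = config.get(key);
            if (value != null) {
                return value; // assumed: the most specific key wins
            }
        }
        return annotationValue; // nothing configured, keep the value from the annotation
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(
                "com.acme.test.MyClient/serviceA/Retry/maxDuration", "3000",
                "Retry/maxRetries", "30");
        // serviceA is annotated with maxDuration = 3200 and maxRetries = 10 in the Retry example.
        System.out.println(resolve(config, "com.acme.test.MyClient", "serviceA", "Retry", "maxDuration", "3200")); // 3000
        System.out.println(resolve(config, "com.acme.test.MyClient", "serviceA", "Retry", "maxRetries", "10"));    // 30
    }
}
```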

Special Feature in MicroProfile Fault Tolerance

Have you ever wanted to turn off Fault Tolerance in certain situations? Can you achieve this with the current third-party Fault Tolerance libraries? Probably not.

MicroProfile Fault Tolerance offers a switch to turn off all annotations except Fallback via the config property MP_Fault_Tolerance_NonFallback_Enabled. This feature is particularly useful in service mesh architectures such as Istio. Istio is a robust service mesh for microservices. The project was started by teams from Google and IBM, in partnership with the Envoy team at Lyft.

Istio offers Fault Tolerance capabilities such as Retry and Circuit Breaker. Any microservice with its own Fault Tolerance integration will run into conflicts with Istio’s Fault Tolerance policies such as Retries and Timeout. For instance, if a microservice has maxRetries configured to be 3 and Istio is configured with 5 retries, 15 retries will be performed. This is not what you would expect. MicroProfile Fault Tolerance provides a way to solve this. Setting the property MP_Fault_Tolerance_NonFallback_Enabled to false turns off all Fault Tolerance policies apart from Fallback. The microservice with MicroProfile Fault Tolerance can then rely on Istio’s Fault Tolerance without any conflicts.

The API and TCK jars can be found on Maven Central.

The specification can be accessed from here.

Troubleshooting

The annotations @Asynchronous, @Bulkhead, @CircuitBreaker, @Fallback, @Retry and @Timeout are all interceptor bindings. All annotations apart from @Fallback can be bound at the class level or method level, whereas @Fallback can only be bound at the method level.

Since this specification depends on CDI and interceptors specifications, fault tolerance operations have the following restrictions:

  • Fault tolerance interceptor bindings must be applied to a bean class or a bean class method; otherwise they are ignored.
  • The invocation must be a business method invocation as defined in the CDI specification.
  • If a method and its containing class don’t have any fault tolerance interceptor binding, the method won’t be considered a fault tolerance operation.

Where to go next?

We released Fault Tolerance 1.0 in September 2017 and plan to work through the issues on the MicroProfile Fault Tolerance repo. If you would like to see new features in the next release, please log issues there. We hold a weekly hangout to discuss design issues. Please import the MicroProfile Calendar, which can be found on the MicroProfile wiki. Hope to meet you on the hangout!

About the Author

Emily Jiang

IBM