Modern applications process their work in multiple thread pools. Knowing where these pools are and how to configure them correctly is crucial for making sure applications behave well in production. We observe that stability and performance issues are often the result of mistakes in this area.

A typical server-side application processes work in multiple thread pools. Depending on your technology choices, things may start in an I/O pool (for reading from and writing to sockets) and a worker pool (for request handling), both managed by your web server. You may also introduce additional thread pools either directly by yourself or through popular tools like Hystrix or RxJava.

It’s important to be aware of all the thread pools on the request handling path, as well as which parts of the code are executed in which pools. Misconfigured thread pools can easily become performance bottlenecks or affect application stability.

Thread pools are most commonly implemented with a task queue in front of them, so that when all the threads are busy at the time a task is submitted, the task can wait for execution. If tasks arrive faster than they can be executed, this queue keeps growing.

Many commonly used thread pools, e.g. Executors.newFixedThreadPool() or those used by Spring’s ThreadPoolTaskExecutor, use a LinkedBlockingQueue as their task queue implementation. Such a queue is not limited in size. If tasks are consumed slowly, it can grow in size until it fills all the available memory and causes the application instance to crash with an OutOfMemoryError. This in turn can increase the load on the remaining application instances and make them crash even faster.
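To see the risk concretely, here's a minimal sketch (the class name and task counts are made up for illustration) showing how the queue behind Executors.newFixedThreadPool() silently absorbs an unbounded backlog when tasks arrive faster than the workers consume them:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class UnboundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // Simulate tasks stuck on a slow dependency: they block until released.
        CountDownLatch slowDependency = new CountDownLatch(1);
        for (int i = 0; i < 1_000; i++) {
            pool.submit(() -> {
                try {
                    slowDependency.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Two tasks occupy the workers; the other 998 sit in the
        // unbounded LinkedBlockingQueue, consuming memory.
        ThreadPoolExecutor executor = (ThreadPoolExecutor) pool;
        System.out.println("queued tasks: " + executor.getQueue().size());  // prints "queued tasks: 998"

        pool.shutdownNow();  // interrupts the blocked workers
    }
}
```

With real request objects instead of no-op lambdas, each queued task also pins its payload in memory, which is what eventually triggers the OutOfMemoryError.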

Queues should be limited in size. If a certain number of tasks are already waiting to be executed, any newly arriving tasks should be rejected quickly, e.g. with an error response served to the caller. This way an overloaded service can correctly process at least some of the requests, as opposed to crashing and processing none at all. Watch out for retries performed by the caller though! Clients shouldn’t retry on errors returned this way, as that would only increase the load on the already struggling service.

You can bound your queues, e.g. by using an ArrayBlockingQueue instead of the default LinkedBlockingQueue. Here’s an example using the raw JDK API:

ThreadPoolExecutor executorService =
    new ThreadPoolExecutor(..., ..., ..., ..., new ArrayBlockingQueue<>(100));


An attempt to submit a task to a full queue will by default cause a RejectedExecutionException to be thrown. You can override this behaviour by providing a custom RejectedExecutionHandler implementation when constructing a ThreadPoolExecutor.
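Here's a sketch of a custom handler (the counter is a stand-in for whatever your service does on rejection, e.g. serving an error response). A single-threaded pool with a queue of two can hold three tasks in flight; everything beyond that goes to the handler:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RejectionDemo {
    public static void main(String[] args) {
        AtomicInteger rejected = new AtomicInteger();

        // The last constructor argument is the RejectedExecutionHandler;
        // here it counts rejections instead of throwing.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                (task, executor) -> rejected.incrementAndGet());

        CountDownLatch release = new CountDownLatch(1);
        Runnable blockedTask = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        // 1 task runs, 2 wait in the queue, the remaining 3 are rejected.
        for (int i = 0; i < 6; i++) {
            pool.execute(blockedTask);
        }
        System.out.println("rejected: " + rejected.get());  // prints "rejected: 3"

        release.countDown();
        pool.shutdown();
    }
}
```

The JDK also ships ready-made policies as ThreadPoolExecutor nested classes, e.g. CallerRunsPolicy (execute on the submitting thread, which also applies backpressure) and DiscardPolicy (drop silently).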

Applications commonly provide multiple functionalities, some of which are more critical than others. Problems with one of the functionalities shouldn’t cause unavailability of others. It may happen though, if functionalities execute on shared thread pools.

Issues with one of the functionalities, e.g. increased processing time due to an increased latency of one of the dependencies, may cause the pools on which this functionality is operating to saturate. If some healthy functionalities need to execute on the same pools, they are denied access to resources and become unavailable as well. This way an issue with some less important functionality may cause an outage of a crucial one.

The solution is to keep resource pools separate, especially for the crucial functionalities. In practice, this means each asynchronous processing job should run in its own dedicated thread pool. If you use a technology which abstracts asynchronous processing away from you, make sure you know exactly where the thread pools are and whether they are shared.
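A minimal sketch of this bulkhead idea (the two functionalities here, "checkout" and "recommendations", are hypothetical): saturating one pool leaves the other fully operational.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BulkheadDemo {
    public static void main(String[] args) throws InterruptedException {
        // One dedicated pool per functionality.
        ExecutorService checkoutPool = Executors.newFixedThreadPool(2);
        ExecutorService recommendationsPool = Executors.newFixedThreadPool(2);

        // A slow dependency saturates the recommendations pool:
        // both of its threads block indefinitely.
        CountDownLatch stuckDependency = new CountDownLatch(1);
        for (int i = 0; i < 2; i++) {
            recommendationsPool.execute(() -> {
                try {
                    stuckDependency.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // The crucial checkout functionality is unaffected.
        CountDownLatch done = new CountDownLatch(1);
        checkoutPool.execute(done::countDown);
        boolean completed = done.await(1, TimeUnit.SECONDS);
        System.out.println("checkout still works: " + completed);  // prints "checkout still works: true"

        stuckDependency.countDown();
        checkoutPool.shutdown();
        recommendationsPool.shutdown();
    }
}
```

Had both functionalities shared a single two-thread pool, the checkout task would have queued behind the stuck recommendation tasks instead.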

Take Spring’s @Async annotation for example, which delegates the execution of a method to a task executor. If you annotate a method without specifying the task executor on which it should be executed, then the default one will be used – the same for all the methods annotated this way. Specifying a dedicated task executor for each method can mitigate this problem.
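As a configuration sketch of that fix (the bean, class, and method names here are made up; this assumes Spring's @EnableAsync support is on the classpath), passing the executor's bean name to @Async routes the method to a dedicated pool instead of the shared default:

```java
import java.util.concurrent.Executor;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {

    // A dedicated, bounded executor used only for report generation.
    @Bean(name = "reportExecutor")
    Executor reportExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);
        executor.setMaxPoolSize(4);
        executor.setQueueCapacity(100);
        return executor;
    }
}

@Service
class ReportService {

    // Runs on the dedicated "reportExecutor" pool, not the shared default one.
    @Async("reportExecutor")
    public void generateReport() {
        // ... the actual work
    }
}
```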

Default thread names in pools from the JDK make it hard to tell which pool a given thread belongs to. When you are in the middle of investigating an outage, e.g. viewing a thread dump to discover the state of an application, that’s a piece of information you really need.

Guava makes naming threads very easy by providing a ThreadFactoryBuilder:

ThreadFactory namedThreadsFactory =
    new ThreadFactoryBuilder().setNameFormat("my-pool-%d").build();


With threads configured this way it’s going to be easy to discover e.g. that all the threads from my-pool are blocked by some I/O operation.
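If Guava isn't on your classpath, a plain JDK ThreadFactory achieves the same (the pool name here is just an example):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadsDemo {
    public static void main(String[] args) throws Exception {
        AtomicInteger counter = new AtomicInteger();

        // Name each created thread with the pool name and an index.
        ThreadFactory namedThreadsFactory = runnable -> {
            Thread thread = new Thread(runnable);
            thread.setName("my-pool-" + counter.getAndIncrement());
            return thread;
        };

        ExecutorService pool = Executors.newFixedThreadPool(2, namedThreadsFactory);
        pool.submit(() -> System.out.println(Thread.currentThread().getName())).get();  // prints "my-pool-0"
        pool.shutdown();
    }
}
```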

It’s hard to reason about performance bottlenecks or to locate the causes of stability issues without proper metrics. You should measure at least the number of active threads in the pool and the task queue size. Here’s how you could do it:

MetricRegistry registry = ...;  // e.g. when using dropwizard-metrics
BlockingQueue<Runnable> q = new ArrayBlockingQueue<>(...);

// we can measure either queue utilization ...
registry.register("my-pool.queue.utilization",
    (Gauge<Double>) () -> q.size() / (double) (q.size() + q.remainingCapacity()));

// ... or queue size
registry.register("my-pool.queue.size", (Gauge<Integer>) q::size);
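Even without a metrics library, ThreadPoolExecutor itself exposes the raw numbers you would feed into such gauges: getActiveCount(), getPoolSize(), getCompletedTaskCount(), and the queue via getQueue(). A small sketch (pool sizes and task counts are illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolStatsDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(10));

        CountDownLatch release = new CountDownLatch(1);
        Runnable blockedTask = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        pool.execute(blockedTask);  // occupies the single worker
        pool.execute(blockedTask);  // waits in the queue
        pool.execute(blockedTask);  // waits in the queue

        // These are the values a gauge would report.
        System.out.println("queue size: " + pool.getQueue().size());  // prints "queue size: 2"

        release.countDown();
        pool.shutdown();
    }
}
```

Polling these getters from a scheduled reporter gives you the same visibility as the registered gauges above.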