Offload IO Intensive tasks to worker threads

In most of the modern web REST implementations, much of the task is spent waiting on IO, for example waiting for data to be fetched from database or through remote REST service.

When the application receives high traffic on a thread based application server, such as Tomcat, much of its threads are just waiting for data to come from database or remote REST service.

This is inefficient because, you cannot increase the number of threads vertically in a single compute. Theoretically you can, but its just pure inefficient because most of those threads will be just blocked.

Thus, in order to scale such applications, the better model is to offload this waiting to pool of worker threads. This will free up the Tomcat http worker threads(“http-nio-n”) to recieve more incoming requests.

This can be easily achieved using DeferredResult

A typical implementation in Java / Spring looks like this:

public DeferredResult<ResponseEntity<String>> getDeferredResult() {
        CompletableFuture<String> completableFuture = deferredResultService.getDelayedResult();
        DeferredResult<ResponseEntity<String>> deferredResult = new DeferredResult<>();
        completableFuture.thenAccept(result -> {
            if (deferredResult.hasResult()) {
            deferredResult.setResult(new ResponseEntity<>(result, HttpStatus.OK));
        completableFuture.exceptionally(e -> {
            return null;
        return deferredResult;

DeferredResultService instead of returning the actual result, just returns the future, whose actual implementation happens in a worker thread

If we run benchmark this API using ab by firing 100 requests concurrently, we see that it consumers only ~30 http-nio-n threads.

PS D:\apache\Apache24\bin> ./ab -n 100 -c 100 http://localhost:8080/deferred
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,

Benchmarking localhost (be patient).....done

Server Software:
Server Hostname:        localhost
Server Port:            8080

Document Path:          /deferred
Document Length:        8 bytes

Concurrency Level:      100
Time taken for tests:   20.030 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      14000 bytes
HTML transferred:       800 bytes
Requests per second:    4.99 [#/sec] (mean)
Time per request:       20029.525 [ms] (mean)
Time per request:       200.295 [ms] (mean, across all concurrent requests)
Transfer rate:          0.68 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.3      0       1
Processing: 10008 10014   3.9  10015   10022
Waiting:    10004 10014   4.0  10015   10022
Total:      10008 10014   3.9  10015   10022

Percentage of the requests served within a certain time (ms)
  50%  10015
  66%  10016
  75%  10017
  80%  10017
  90%  10020
  95%  10021
  98%  10022
  99%  10022
 100%  10022 (longest request)

Check out the demo project in my github repository