Spring WebFlux Performance Optimization

Problem Statement

Identify the causes of instance health check failures, high response times, and instability in a reactive service.

Understanding Spring WebFlux vs MVC

Spring MVC (Blocking)

Each request takes a thread and blocks it until a response is returned. This works well for traditional synchronous applications.

Spring WebFlux (Non-blocking)

Requests are handled on a small number of threads that never wait on I/O, so a thread is free to serve other requests while a response is pending. WebFlux follows the Reactive Streams publisher/subscriber model: Mono and Flux are publishers, and they emit their result to the subscriber as soon as processing completes.

Critical: Each event loop runs on a single thread and must never be blocked. A blocked event loop stalls every request assigned to it, which can destabilize the whole service.
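
To make the warning concrete, here is a minimal JDK-only sketch. A single-threaded executor stands in for one event loop, which is an assumption for illustration only, since Netty's real loop does much more. The class name EventLoopDemo and its methods are hypothetical. It runs one slow task and one fast task and reports which finished first.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// JDK-only stand-in for an event loop: one thread drains a task queue in order.
final class EventLoopDemo {

    // Submits a slow task, then a fast task, and returns the order in which they finished.
    // offload = false: the slow task sleeps on the loop thread itself.
    // offload = true:  the loop thread only hands the slow task to a separate pool.
    static List<String> completionOrder(boolean offload) {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        ExecutorService blockingPool = Executors.newFixedThreadPool(2);
        List<String> finished = new CopyOnWriteArrayList<>();

        Runnable slow = () -> {
            pause(300); // simulated blocking call
            finished.add("slow");
        };

        if (offload) {
            loop.execute(() -> blockingPool.execute(slow));
        } else {
            loop.execute(slow);
        }
        loop.execute(() -> finished.add("fast"));

        awaitShutdown(loop);          // all loop tasks are done after this
        awaitShutdown(blockingPool);  // already-submitted slow task still completes
        return List.copyOf(finished);
    }

    private static void awaitShutdown(ExecutorService pool) {
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking on the loop: " + completionOrder(false));
        System.out.println("offloaded:            " + completionOrder(true));
    }
}
```

When the slow task runs on the loop thread, the fast task has to wait behind it. Once the slow task is handed off, the fast task completes first.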

Observations

Thread Profiling

EpollEventLoop threads were taking excessive time in the profile, likely because there weren't enough threads for the work scheduled on them. The event loop was running with its default of 2 threads, so not all cores were being used to their full potential.

APM Tracing

WebClient calls were taking too long to complete, even though the downstream identity service responded in 10 ms (tp99), which suggests the time was lost inside the service rather than downstream. The solution: run all blocking I/O on the boundedElastic scheduler (a separate thread pool) so that it cannot block the worker threads.
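
The offloading idea can be sketched with JDK classes alone. In Reactor the usual form is Mono.fromCallable(...).subscribeOn(Schedulers.boundedElastic()). The version below swaps in a CompletableFuture and a fixed pool as stand-ins, and the names BlockingOffload, IO_POOL, and blockingLookup are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BlockingOffload {

    // Hypothetical dedicated pool for blocking I/O (a stand-in for boundedElastic).
    static final ExecutorService IO_POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "io-pool");
        thread.setDaemon(true); // do not keep the JVM alive for idle I/O threads
        return thread;
    });

    // Placeholder for a blocking call such as a JDBC query or a legacy SDK call.
    static String blockingLookup(String id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id + "@" + Thread.currentThread().getName();
    }

    // The caller gets a future immediately; the blocking work runs on IO_POOL.
    static CompletableFuture<String> lookupAsync(String id) {
        return CompletableFuture.supplyAsync(() -> blockingLookup(id), IO_POOL);
    }

    public static void main(String[] args) {
        // join() is used here only to print the result in a demo main method.
        System.out.println(lookupAsync("u1").join());
    }
}
```

The thread name in the result shows that the sleep happened on the I/O pool and not on the calling thread.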

Instance Choking

One of the earliest-started instances consistently choked. The load balancing strategy was changed to LOR (Least Outstanding Requests), which routes each request to the instance with the fewest pending requests.
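
A minimal sketch of the selection rule behind LOR follows. It is not the load balancer's actual implementation. The class LeastOutstandingBalancer and its methods are hypothetical, and ties are broken by list order as a simplifying assumption.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class LeastOutstandingBalancer {

    static final class Instance {
        final String name;
        final AtomicInteger outstanding = new AtomicInteger(); // in-flight requests
        Instance(String name) { this.name = name; }
    }

    private final List<Instance> instances;

    LeastOutstandingBalancer(List<Instance> instances) {
        this.instances = List.copyOf(instances);
    }

    // Picks the instance with the fewest in-flight requests and counts the new one.
    // The scan is not atomic, so concurrent callers may occasionally pick the same
    // instance; that is acceptable for a sketch.
    Instance acquire() {
        Instance best = instances.get(0);
        for (Instance candidate : instances) {
            if (candidate.outstanding.get() < best.outstanding.get()) {
                best = candidate;
            }
        }
        best.outstanding.incrementAndGet();
        return best;
    }

    // Call when the response (or error) for a request arrives.
    void release(Instance instance) {
        instance.outstanding.decrementAndGet();
    }
}
```

Unlike round robin, an instance that is slow to respond accumulates outstanding requests and so receives less new traffic until it catches up.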

Memory Leaks

URI builder in WebClient causes memory leaks:

// .get() is assumed here; use whichever HTTP method the call needs
// Bad - causes memory leak
webClient.get().uri(uriBuilder -> uriBuilder.path("/api").build())

// Good - prevents memory leak
webClient.get().uri("/api")
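
The source does not say why the builder form leaked. One mechanism often reported for this pattern, and only an assumption here, is that client metrics are tagged with the request URI: a template string yields a single tag value, while fully expanded URIs yield one value per distinct path. The toy registry below illustrates that growth. UriTagCardinality and the /users/{id} path are hypothetical and not taken from the service.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

final class UriTagCardinality {

    // Toy metrics registry: one counter is retained per distinct "uri" tag value.
    final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    void record(String uriTag) {
        counters.computeIfAbsent(uriTag, key -> new LongAdder()).increment();
    }

    // Simulates n requests for n different ids and returns how many counters remain.
    static int countersAfter(int n, boolean tagWithTemplate) {
        UriTagCardinality registry = new UriTagCardinality();
        for (int id = 0; id < n; id++) {
            String tag = tagWithTemplate ? "/users/{id}" : "/users/" + id;
            registry.record(tag);
        }
        return registry.counters.size();
    }

    public static void main(String[] args) {
        System.out.println("template tag:  " + countersAfter(1000, true));
        System.out.println("expanded tags: " + countersAfter(1000, false));
    }
}
```

If this is the cause, memory grows with the number of distinct URIs rather than with traffic volume, so a heap dump would show it as many retained meter objects.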

WebClient error handling - Response body should be released:

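.onStatus(HttpStatus::isError, clientResponse ->
    // bodyToMono reads the body to completion, which also releases it.
    // releaseBody() returns a Mono<Void> and does nothing unless it is subscribed;
    // when the body is not read, chain it instead:
    // clientResponse.releaseBody().then(Mono.error(new CustomException(...)))
    clientResponse
        .bodyToMono(String.class)
        .map(CustomException::new))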

Netty ByteBuf leaks - Upgraded the Netty version:

<netty.version>4.1.94.Final</netty.version>

ObjectMapper soft references - Disable buffer recycling:

@Bean(name = "customObjectMapper")
public ObjectMapper objectMapper() {
    JsonFactory jsonFactory = new JsonFactory()
        .configure(JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING, false);
    return new ObjectMapper(jsonFactory);
}

Optimizations Applied

Thread Configuration

# Increased worker thread count (default 4)
-Dreactor.netty.ioWorkerCount=32

# Increased event loop thread count (default 2)
-Dio.netty.eventLoopThreads=4

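# Parallel scheduler pool size (used for CPU-bound work, not part of boundedElastic)
-Dreactor.schedulers.defaultPoolSize=64

# BoundedElastic thread pool configuration (thread cap, then queued-task cap per thread)
-Dreactor.schedulers.defaultBoundedElasticSize=100
-Dreactor.schedulers.defaultBoundedElasticQueueSize=50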
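
To check which values a JVM would actually use, the sketch below resolves three of these properties with the fallbacks documented by Reactor Netty and Reactor Core: ioWorkerCount falls back to the available processors with a minimum of 4, the parallel pool to the available processors, and boundedElastic to 10 threads per processor. It mirrors the documented defaults and is not the libraries' own code. ThreadDefaults is a hypothetical helper. In a CPU-limited container the JVM typically reports the limit, not the host's core count.

```java
final class ThreadDefaults {

    // Reads an integer system property (set with -Dname=value), else the fallback.
    static int effective(String property, int fallback) {
        return Integer.getInteger(property, fallback);
    }

    static int ioWorkerCount(int cores) {
        return effective("reactor.netty.ioWorkerCount", Math.max(cores, 4));
    }

    static int parallelPoolSize(int cores) {
        return effective("reactor.schedulers.defaultPoolSize", cores);
    }

    static int boundedElasticSize(int cores) {
        return effective("reactor.schedulers.defaultBoundedElasticSize", 10 * cores);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);
        System.out.println("ioWorkerCount:        " + ioWorkerCount(cores));
        System.out.println("parallel pool size:   " + parallelPoolSize(cores));
        System.out.println("boundedElastic size:  " + boundedElasticSize(cores));
    }
}
```

Running this with the same -D flags as the service shows whether an override is picked up and how far it departs from the fallback.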

Additional Changes

Added Redis connection pooling with command timeout
Added HTTP connection timeout
Load balancing strategy → LOR (not released in initial iteration)
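
The point of the two timeouts is that no remote call may wait indefinitely. The sketch below shows that idea with a CompletableFuture and not with the Redis or HTTP client the service uses, whose configuration the source does not show. CommandTimeout, remoteCommand, and the PONG reply are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class CommandTimeout {

    // Simulated remote command that answers after delayMillis.
    static CompletableFuture<String> remoteCommand(long delayMillis) {
        return CompletableFuture.supplyAsync(
                () -> "PONG",
                CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS));
    }

    // Returns the reply, or "TIMEOUT" when the command exceeds timeoutMillis.
    static String callWithTimeout(long delayMillis, long timeoutMillis) {
        try {
            return remoteCommand(delayMillis)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
                    .join();
        } catch (CompletionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "TIMEOUT";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(10, 1000));  // fast reply
        System.out.println(callWithTimeout(2000, 50));  // slow reply, bounded wait
    }
}
```

Without such a bound, a stalled dependency holds its request open and the backlog spreads to unrelated requests.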

Test Configuration

Instance type: m5d.xlarge
Auto-scaling enabled: min 2, max 10

Results

Before Optimization

Max RPS: 60
Frequent spikes
Instance health check failures
Max instances reached: 10

After Optimization

Max RPS: 155 (158% improvement)
Very few spikes
No unhealthy instances
Achieved 155 RPS with only 8 instances
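
The headline figures can be checked with a few lines of arithmetic. The per-instance numbers assume that peak RPS was served by the stated instance counts (10 before, 8 after), which the source implies but does not state outright. ResultsMath is a hypothetical helper.

```java
final class ResultsMath {

    // Relative change from before to after, in percent.
    static double improvementPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    // Average throughput carried by each instance.
    static double rpsPerInstance(double rps, int instances) {
        return rps / instances;
    }

    public static void main(String[] args) {
        System.out.printf("improvement: %.1f%%%n", improvementPercent(60, 155));
        System.out.printf("before: %.3f RPS per instance%n", rpsPerInstance(60, 10));
        System.out.printf("after:  %.3f RPS per instance%n", rpsPerInstance(155, 8));
    }
}
```

Under that assumption each instance went from 6 RPS to about 19.4 RPS, so the per-instance gain is larger than the 158% change in total throughput.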

Key Takeaways

Don’t block the event loop - Keep event loop processing time minimal
More threads ≠ better performance - WebFlux is designed to use threads in a non-blocking manner
Write non-blocking code - Use BlockHound to detect blocking calls from non-blocking threads
Async and reactive are different - Understand the distinction
Task separation - CPU-intensive tasks should use worker threads; IO-bound tasks should use boundedElastic threads
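
BlockHound finds blocking calls by instrumenting the JVM as a Java agent. The toy guard below shows only the underlying idea, which is to fail fast when a method known to block runs on a thread that must not block. It is not how BlockHound is implemented, and BlockingGuard and NonBlockingThread are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BlockingGuard {

    // Hypothetical marker type: threads of this class must never run blocking code.
    static final class NonBlockingThread extends Thread {
        NonBlockingThread(Runnable task) {
            super(task, "event-loop");
            setDaemon(true);
        }
    }

    // Call at the top of any method that is known to block.
    static void checkBlockingAllowed(String what) {
        if (Thread.currentThread() instanceof NonBlockingThread) {
            throw new IllegalStateException("Blocking call on event loop: " + what);
        }
    }

    static String blockingRead() {
        checkBlockingAllowed("blockingRead");
        return "data";
    }

    // Runs blockingRead on the chosen kind of thread and reports the outcome.
    static String runOn(boolean eventLoop) {
        ExecutorService pool = eventLoop
                ? Executors.newSingleThreadExecutor(NonBlockingThread::new)
                : Executors.newSingleThreadExecutor();
        Callable<String> task = BlockingGuard::blockingRead;
        try {
            return pool.submit(task).get();
        } catch (ExecutionException e) {
            return "rejected: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOn(false));
        System.out.println(runOn(true));
    }
}
```

A guard like this covers only the methods that call it. An agent-based tool can also catch blocking calls hidden inside third-party libraries.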
