📚 Series Navigation:
← Previous: Part 4 - The Data Foundation
👉 You are here: Part 5 - When the World Breaks
Next: Part 6 - Cache Me If You Can →
📋 Introduction
You've built a beautiful microservice. Your architecture is layered. Your database is migrated. Your API documentation is pristine. Then Monday morning happens.
The external weather API goes down. Not completely — that would be easy to handle. No, it's doing something far worse: responding to 60% of requests normally, timing out on 30%, and returning garbage data on the remaining 10%. Your service is now hammering a sick API with retry storms, your thread pool is full of connections waiting for timeouts, and your users are getting a random grab bag of stale data, errors, and infinite loading spinners.
Welcome to distributed systems. The question isn't whether external dependencies will fail — it's when, and whether your service will survive it.
In this article, we'll explore how the Weather Microservice integrates with the WeatherAPI.com external service using Spring's modern RestClient, and protects itself using Resilience4j's three most powerful patterns: circuit breakers, retries, and rate limiters. ☕
🛡️ The SHIELD Framework: Six Pillars of Resilient Integration
Meet SHIELD — six principles for surviving external API failures:
| Letter | Principle | What It Means |
|---|---|---|
| S | Smart HTTP Client | Modern RestClient with virtual threads and timeouts |
| H | Handled Failures | Every error path has a defined behavior |
| I | Intelligent Retries | Exponential backoff, not retry storms |
| E | Event Recording | Failures are logged and metered for observability |
| L | Limited Throughput | Rate limiting prevents API quota exhaustion |
| D | Degraded Service | Circuit breakers provide fast failure when APIs are down |
🌐 The HTTP Client: Modern RestClient with Virtual Threads
The Weather Microservice uses Spring's RestClient (introduced in Spring 6.1) instead of the older RestTemplate:
@Configuration
public class RestClientConfig {
@Value("${weather.api.timeout:5000}")
private long timeout;
@Value("${weather.api.base-url}")
private String baseUrl;
@Bean
public RestClient weatherRestClient() {
HttpClient httpClient =
HttpClient.newBuilder()
.connectTimeout(Duration.ofMillis(timeout))
.executor(Executors.newVirtualThreadPerTaskExecutor())
.build();
JdkClientHttpRequestFactory requestFactory = new JdkClientHttpRequestFactory(httpClient);
requestFactory.setReadTimeout(Duration.ofMillis(timeout));
return RestClient.builder()
.baseUrl(baseUrl)
.requestFactory(requestFactory)
.defaultHeader("Accept", "application/json")
.build();
}
}
Three critical design decisions:
1. Virtual Thread Executor
.executor(Executors.newVirtualThreadPerTaskExecutor())
The HTTP client uses virtual threads for I/O operations. While a traditional thread pool would block platform threads during API calls (waiting for network responses), virtual threads yield when blocked on I/O. This means thousands of concurrent API calls without thread pool exhaustion.
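To make that concrete, here's a minimal sketch (not code from the service) that fans out many blocking calls, assuming a weatherApiClient bean and a hypothetical locations list. Each call parks its virtual thread while waiting on the network; no platform thread is tied up:
// Sketch: many concurrent blocking calls on virtual threads (Java 21+).
// 'weatherApiClient' and 'locations' are hypothetical stand-ins;
// checked exceptions are elided for brevity.
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<WeatherApiResponse>> futures = locations.stream()
        .map(loc -> executor.submit(() -> weatherApiClient.getCurrentWeather(loc)))
        .toList();
    for (Future<WeatherApiResponse> future : futures) {
        future.get(); // blocks the virtual thread, not a platform thread
    }
}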
2. Dual Timeouts
.connectTimeout(Duration.ofMillis(timeout)) // Connection establishment
requestFactory.setReadTimeout(Duration.ofMillis(timeout)); // Response reading
Two separate timeouts protect against different failure modes:
- Connect timeout — How long to wait for TCP connection establishment
- Read timeout — How long to wait for the response after connecting
Both default to 5 seconds. Without these, a hung API would block your threads indefinitely.
3. JDK HttpClient Instead of Apache
The Weather Microservice uses Java's built-in HttpClient (Java 11+) instead of Apache HttpClient. Benefits:
- Native virtual thread support
- No additional dependency
- HTTP/2 support built-in
- Modern, fluent API
🔌 The API Client: Three Resilience Patterns in One Class
The WeatherApiClient is where resilience patterns come together:
@Slf4j
@Component
public class WeatherApiClient {
private final RestClient restClient;
private final String apiKey;
public WeatherApiClient(
RestClient weatherRestClient,
@Value("${weather.api.key}") String apiKey) {
this.restClient = weatherRestClient;
this.apiKey = apiKey;
}
@CircuitBreaker(name = "weatherApiCurrent", fallbackMethod = "getCurrentWeatherFallback")
@Retry(name = "weatherApiCurrent")
@RateLimiter(name = "weatherApiCurrent")
public WeatherApiResponse getCurrentWeather(String location) {
log.debug("Fetching current weather for location: {}", location);
try {
WeatherApiResponse response = restClient.get()
.uri(uriBuilder -> uriBuilder
.path("/current.json")
.queryParam("key", apiKey)
.queryParam("q", location)
.queryParam("aqi", "no")
.build())
.retrieve()
.onStatus(HttpStatusCode::is4xxClientError, (request, clientResponse) -> {
throw new WeatherApiException("Invalid location or API request: " + location);
})
.onStatus(HttpStatusCode::is5xxServerError, (request, serverResponse) -> {
throw new WeatherApiException("Weather API server error");
})
.body(WeatherApiResponse.class);
if (response == null) {
throw new WeatherApiException("Failed to fetch weather data: empty response");
}
log.info("Successfully fetched current weather for: {}", location);
return response;
} catch (WeatherApiException e) {
throw e;
} catch (RestClientException e) {
throw new WeatherApiException("Failed to fetch weather data: " + e.getMessage(), e);
}
}
}
Annotation Stack: Order Matters
@CircuitBreaker(name = "weatherApiCurrent", fallbackMethod = "getCurrentWeatherFallback")
@Retry(name = "weatherApiCurrent")
@RateLimiter(name = "weatherApiCurrent")
These three annotations create a layered defense:
Request → Retry → CircuitBreaker → RateLimiter → Actual API Call
            ↑           ↑               ↑
      Up to 3      Check state     Check quota
      attempts     (open/closed)   (50/min)
With Resilience4j's default aspect order, Retry is the outermost decorator, so every retry attempt passes through the circuit breaker and rate limiter again. From innermost to outermost:
- Rate Limiter checks that we're within the API quota just before the actual call
- Circuit Breaker checks whether the API is healthy before each attempt
- Retry drives up to 3 attempts with exponential backoff
- Fallback activates when the circuit is open or all retries are exhausted
The sketch below writes this nesting out with the functional API.
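This is a minimal sketch, not the service's code: it assumes the three registries are injected as Spring beans (the Spring Boot starter auto-configures them) and uses a hypothetical callWeatherApi(location) as the stand-in for the HTTP call. With Resilience4j's Decorators helper (from the resilience4j-all module), the first decorator applied ends up innermost:
// io.github.resilience4j.decorators.Decorators: first applied = innermost
Supplier<WeatherApiResponse> decorated = Decorators
    .ofSupplier(() -> callWeatherApi(location))                             // actual call
    .withRateLimiter(rateLimiterRegistry.rateLimiter("weatherApiCurrent"))  // innermost guard
    .withCircuitBreaker(circuitBreakerRegistry.circuitBreaker("weatherApiCurrent"))
    .withRetry(retryRegistry.retry("weatherApiCurrent"))                    // outermost
    .decorate();
WeatherApiResponse response = decorated.get();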
Error Handling Strategy
.onStatus(HttpStatusCode::is4xxClientError, (request, clientResponse) -> {
throw new WeatherApiException("Invalid location or API request: " + location);
})
.onStatus(HttpStatusCode::is5xxServerError, (request, serverResponse) -> {
throw new WeatherApiException("Weather API server error");
})
4xx and 5xx errors are converted to WeatherApiException — the application's domain exception. This lets the circuit breaker track failures using a consistent exception type.
The catch block at the bottom handles connection failures, timeouts, and other transport-level errors:
catch (RestClientException e) {
throw new WeatherApiException("Failed to fetch weather data: " + e.getMessage(), e);
}
🔄 Circuit Breaker: The Intelligent Fuse
The circuit breaker pattern prevents your service from hammering a sick API. Think of it as an electrical fuse — when too many failures occur, it "trips" and stops sending requests.
Configuration
resilience4j:
circuitbreaker:
configs:
default:
failureRateThreshold: 50
minimumNumberOfCalls: 5
waitDurationInOpenState: 10s
permittedNumberOfCallsInHalfOpenState: 3
slidingWindowSize: 10
slidingWindowType: COUNT_BASED
slowCallRateThreshold: 60
slowCallDurationThreshold: 3s
recordExceptions:
- com.weatherspring.exception.WeatherApiException
- java.io.IOException
- java.util.concurrent.TimeoutException
instances:
weatherApiCurrent:
baseConfig: default
failureRateThreshold: 50
waitDurationInOpenState: 15s
minimumNumberOfCalls: 10
State Machine
+----------------------+
| CLOSED | Normal operation
| (requests pass | Tracking failures in sliding window
| through) |
+----------+-----------+
| Failure rate > 50%
| (after 10 minimum calls)
▼
+----------------------+
| OPEN | All requests fail immediately
| (fast failure) | No API calls made
+----------+-----------+
| After 15 seconds
▼
+----------------------+
| HALF-OPEN | 3 test requests allowed
| (testing recovery) | If they succeed → CLOSED
+----------+-----------+ If they fail → OPEN
Key settings explained:
| Setting | Value | Meaning |
|---|---|---|
| failureRateThreshold: 50 | 50% | Open circuit when half of calls fail |
| minimumNumberOfCalls: 10 | 10 | Need 10 calls before evaluating failure rate |
| waitDurationInOpenState: 15s | 15s | Stay open for 15 seconds before testing |
| permittedNumberOfCallsInHalfOpenState: 3 | 3 | Allow 3 test calls in half-open state |
| slidingWindowSize: 10 | 10 | Evaluate the last 10 calls |
| slowCallDurationThreshold: 3s | 3s | Calls over 3 seconds count as slow |
| slowCallRateThreshold: 60 | 60% | Open circuit when 60% of calls are slow |
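These state transitions are exactly what the E (Event Recording) pillar of SHIELD calls for surfacing. A minimal sketch, assuming the CircuitBreakerRegistry bean that the Spring Boot starter auto-configures, which logs every transition via Resilience4j's event publisher:
@Slf4j
@Component
public class CircuitBreakerEventLogger {
    public CircuitBreakerEventLogger(CircuitBreakerRegistry registry) {
        // Subscribe to state-transition events for the current-weather breaker
        registry.circuitBreaker("weatherApiCurrent")
            .getEventPublisher()
            .onStateTransition(event -> log.warn(
                "Circuit breaker '{}' transitioned {} → {}",
                event.getCircuitBreakerName(),
                event.getStateTransition().getFromState(),
                event.getStateTransition().getToState()));
    }
}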
Different Instances for Different Endpoints
instances:
weatherApiCurrent:
waitDurationInOpenState: 15s
minimumNumberOfCalls: 10
weatherApiForecast:
waitDurationInOpenState: 20s
minimumNumberOfCalls: 10
slowCallDurationThreshold: 5s
The forecast endpoint has a longer slowCallDurationThreshold (5s vs 3s) because forecast responses contain more data and naturally take longer. A separate circuit breaker means a failing current-weather endpoint doesn't prevent forecast requests from working.
The Fallback Method
private WeatherApiResponse getCurrentWeatherFallback(String location, Throwable throwable) {
log.error("Circuit breaker fallback triggered for getCurrentWeather. "
+ "Location: {}, Error: {}", location, throwable.getMessage());
throw new WeatherApiException(
"Weather service is currently unavailable. Please try again later. "
+ "Location: " + location, throwable);
}
When the circuit breaker is open or all retries are exhausted, the fallback method is called. Here it throws a WeatherApiException with a user-friendly message, which the GlobalExceptionHandler converts to a 503 Service Unavailable response.
🤔 Why not return cached data from the fallback? That's a valid pattern, but the Weather Microservice keeps the fallback simple — it reports the failure. The caching layer (Part 6) independently serves cached data if available. Separating caching and fallback concerns makes each easier to reason about.
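For comparison, here is a minimal sketch of that alternative (not the service's code), serving the last successful response from a hypothetical in-memory map before giving up:
// Hypothetical stale-on-error fallback: NOT what this service does.
private final Map<String, WeatherApiResponse> lastKnownGood = new ConcurrentHashMap<>();

private WeatherApiResponse getCurrentWeatherFallback(String location, Throwable throwable) {
    WeatherApiResponse stale = lastKnownGood.get(location);
    if (stale != null) {
        log.warn("Serving stale weather for {}: {}", location, throwable.getMessage());
        return stale;
    }
    throw new WeatherApiException(
        "Weather service is currently unavailable. Location: " + location, throwable);
}
The catch: getCurrentWeather would have to populate lastKnownGood on every success, pulling a caching concern into the client. Part 6 keeps that in a dedicated layer instead.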
🔁 Retry: The Persistence Pattern
When an API call fails, it might be a transient issue — a network blip, a temporary server overload. Retries give the API a chance to recover:
Configuration
resilience4j:
retry:
configs:
default:
maxAttempts: 3
waitDuration: 1s
enableExponentialBackoff: true
exponentialBackoffMultiplier: 2
retryExceptions:
- com.weatherspring.exception.WeatherApiException
- java.io.IOException
instances:
weatherApiCurrent:
maxAttempts: 3
waitDuration: 500ms
weatherApiForecast:
maxAttempts: 2
waitDuration: 1s
Exponential Backoff Timeline
Attempt 1: Immediate
↓ fail
Wait 500ms
Attempt 2:
↓ fail
Wait 1000ms (500ms × 2)
Attempt 3:
↓ fail
→ Fallback triggered
The wait duration doubles after each failure. This is critical because:
- Without backoff: 3 retries in 100ms = retry storm on an already struggling server
- With exponential backoff: Progressively longer waits give the server time to recover
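The YAML maps one-to-one onto the programmatic API. A minimal sketch of the equivalent RetryConfig for weatherApiCurrent, handy when you want the backoff schedule under unit test (the builder calls are standard Resilience4j; the values mirror the config above):
// Backoff schedule: 500ms, then 1000ms (×2 per failure), 3 attempts total
RetryConfig config = RetryConfig.custom()
    .maxAttempts(3)
    .intervalFunction(IntervalFunction.ofExponentialBackoff(Duration.ofMillis(500), 2))
    .retryExceptions(WeatherApiException.class, IOException.class)
    .build();
Retry retry = Retry.of("weatherApiCurrent", config);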
Forecast Gets Fewer Retries
weatherApiForecast:
maxAttempts: 2 # vs 3 for current weather
waitDuration: 1s # vs 500ms for current weather
Forecasts take longer, have larger payloads, and are less time-sensitive. Two attempts with a longer wait is more appropriate than three fast attempts.
🚦 Rate Limiter: The Quota Guard
The external WeatherAPI.com has usage quotas. The rate limiter prevents the Weather Microservice from exceeding them:
Configuration
resilience4j:
ratelimiter:
configs:
default:
limitForPeriod: 100
limitRefreshPeriod: 1m
timeoutDuration: 5s
instances:
weatherApi:
limitForPeriod: 50
limitRefreshPeriod: 1m
weatherApiCurrent:
limitForPeriod: 50
limitRefreshPeriod: 1m
weatherApiForecast:
limitForPeriod: 30
limitRefreshPeriod: 1m
How It Works
Minute 0:00 → 50 permits available (current weather)
Request 1-50: Permitted ✅
Request 51: Blocked (waits up to 5s for next minute)
Minute 1:00 → 50 permits refreshed
Request 52: Permitted ✅
Key settings:
| Setting | Meaning |
|---|---|
| limitForPeriod: 50 | 50 calls allowed per period |
| limitRefreshPeriod: 1m | Permits refresh every minute |
| timeoutDuration: 5s | Wait up to 5s if no permits are available |
When the rate limit is exceeded, RequestNotPermitted is thrown and the GlobalExceptionHandler returns a 429 Too Many Requests response.
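The GlobalExceptionHandler is referenced rather than shown here, but a minimal sketch of what that mapping could look like inside a @RestControllerAdvice class (handler name and response body are assumptions):
@ExceptionHandler(RequestNotPermitted.class)
public ResponseEntity<Map<String, String>> handleRateLimitExceeded(RequestNotPermitted ex) {
    // Map Resilience4j's rate-limit rejection to HTTP 429
    return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
        .body(Map.of("error", "Rate limit exceeded. Please try again later."));
}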
Split Quotas by Endpoint
The forecast endpoint gets only 30 calls/minute (vs. 50 for current weather). This ensures the cheaper current-weather calls always have quota available, even when forecast-heavy workloads are running.
🧩 Putting It All Together: The Defense Timeline
Here's what happens when a request hits the API client:
1. Retry Loop (outermost, up to 3 attempts)
   +- Each attempt runs through steps 2-4 below
2. Circuit Breaker Check
   +- CLOSED? → Continue
   +- OPEN? → Skip call → Fallback → 503 Service Unavailable
   +- HALF-OPEN? → Allow test call → Continue
3. Rate Limiter Check
   +- Permits available? → Continue to the actual call
   +- No permits? → Wait up to 5s → Timeout → 429 Too Many Requests
4. Actual API Call
   +- Success? → Return response
   +- Failure? → Circuit breaker records it → Retry waits (500ms, then 1000ms)
                 and re-enters step 2
5. Fallback (after the final failed attempt, or immediately while the circuit is open)
   +- Throw WeatherApiException → GlobalExceptionHandler → 503
What makes this design effective: each pattern handles a different failure mode:
- Rate limiter → Prevents quota exhaustion (proactive)
- Circuit breaker → Prevents hammering a dead API (reactive)
- Retry → Handles transient failures (optimistic)
- Fallback → Provides graceful degradation (last resort)
📊 The Resilience4j Configuration Hierarchy
resilience4j:
circuitbreaker:
configs:
default: # ← Template (shared defaults)
failureRateThreshold: 50
instances:
weatherApiCurrent: # ← Instance (inherits + overrides)
baseConfig: default
waitDurationInOpenState: 15s
This two-level configuration keeps things DRY:
- Default configs define the baseline behavior
- Instances inherit from defaults and override specific settings
- Instance names match the annotation names:
@CircuitBreaker(name = "weatherApiCurrent")
✅ Resilience Checklist
- [ ] RestClient over RestTemplate — Modern, fluent API with better error handling
- [ ] Virtual thread executor on HTTP client — Non-blocking I/O
- [ ] Connect and read timeouts configured — Never wait forever
- [ ] Circuit breaker with appropriate thresholds — 50% failure rate, 15s wait
- [ ] Retry with exponential backoff — 3 attempts, doubling wait
- [ ] Rate limiter per endpoint — Separate quotas for different call types
- [ ] Fallback methods for every circuit breaker — Graceful degradation
- [ ] Exception mapping — Transport errors wrapped in domain exceptions
- [ ] 4xx/5xx handling — Different responses for client vs. server errors
- [ ] Named instances — Each API endpoint has its own resilience config
🎓 Conclusion: Resilience Is Not Optional
External APIs will fail. The question is whether your service fails with them. Here's the defense strategy:
- The SHIELD framework (Smart HTTP client, Handled failures, Intelligent retries, Event recording, Limited throughput, Degraded service) provides a comprehensive defense strategy
- Spring's RestClient with JDK HttpClient and virtual threads provides modern, non-blocking HTTP communication
- Circuit breakers prevent cascading failures by stopping requests to unhealthy services
- Exponential backoff retries give transient failures time to resolve without creating retry storms
- Rate limiters protect API quotas and prevent overwhelming external services
- The three patterns compose: Retry → Circuit breaker → Rate limiter → Actual call
- Separate instances per endpoint allow fine-tuned behavior for different API characteristics
- Fallback methods provide graceful degradation when all else fails
In distributed systems, failure is the norm, not the exception. The Weather Microservice treats external APIs as fundamentally unreliable and designs accordingly. Your service should do the same.
Coming Next Week:
Part 6: Cache Me If You Can - Smart Caching Strategies for Microservices ⚡
📚 Series Progress
✅ Part 1: The Blueprint Before the Build
✅ Part 2: Spring Boot Alchemy
✅ Part 3: REST Assured
✅ Part 4: The Data Foundation
✅ Part 5: When the World Breaks ← You just finished this!
⬜ Part 6: Cache Me If You Can
⬜ Part 7: Guarding the Gates
⬜ Part 8: Fail Gracefully
⬜ Part 9: 10,000 Threads and a Dream
⬜ Part 10: Can You See Me Now?
⬜ Part 11: Trust, But Verify
⬜ Part 12: Ship It
⬜ Part 13: To Production and Beyond
Happy coding, and remember — hope is not a resilience strategy. ☕