📚 Series Navigation:
← Previous: Part 4 - The Data Foundation
👉 You are here: Part 5 - When the World Breaks
Next: Part 6 - Cache Me If You Can →
📋 Introduction
You've built a beautiful microservice. Your architecture is layered. Your database is migrated. Your API documentation is pristine. Then Monday morning happens.
The external weather API goes down. Not completely — that would be easy to handle. No, it's doing something far worse: responding to 60% of requests normally, timing out on 30%, and returning garbage data on the remaining 10%. Your service is now hammering a sick API with retry storms, your thread pool is full of connections waiting for timeouts, and your users are getting a random grab bag of stale data, errors, and infinite loading spinners.
Welcome to distributed systems. The question isn't whether external dependencies will fail — it's when, and whether your service will survive it.
In this article, we'll explore how the Weather Microservice integrates with the WeatherAPI.com external service using Spring's modern RestClient, and protects itself using Resilience4j's three most powerful patterns: circuit breakers, retries, and rate limiters. ☕
🛡️ The SHIELD Framework: Six Pillars of Resilient Integration
Meet SHIELD — six principles for surviving external API failures:
| Letter | Principle | What It Means |
|---|---|---|
| S | Smart HTTP Client | Modern RestClient with virtual threads and timeouts |
| H | Handled Failures | Every error path has a defined behavior |
| I | Intelligent Retries | Exponential backoff, not retry storms |
| E | Event Recording | Failures are logged and metered for observability |
| L | Limited Throughput | Rate limiting prevents API quota exhaustion |
| D | Degraded Service | Circuit breakers provide fast failure when APIs are down |
🌐 The HTTP Client: Modern RestClient with Virtual Threads
The Weather Microservice uses Spring's RestClient (introduced in Spring 6.1) instead of the older RestTemplate:
@Configuration
public class RestClientConfig {
@Value("${weather.api.timeout:5000}")
private long timeout;
@Value("${weather.api.base-url}")
private String baseUrl;
@Bean
public RestClient weatherRestClient() {
HttpClient httpClient =
HttpClient.newBuilder()
.connectTimeout(Duration.ofMillis(timeout))
.executor(Executors.newVirtualThreadPerTaskExecutor())
.build();
JdkClientHttpRequestFactory requestFactory = new JdkClientHttpRequestFactory(httpClient);
requestFactory.setReadTimeout(Duration.ofMillis(timeout));
return RestClient.builder()
.baseUrl(baseUrl)
.requestFactory(requestFactory)
.defaultHeader("Accept", "application/json")
.build();
}
}
Three critical design decisions:
1. Virtual Thread Executor
.executor(Executors.newVirtualThreadPerTaskExecutor())
The HTTP client uses virtual threads for I/O operations. While a traditional thread pool would block platform threads during API calls (waiting for network responses), virtual threads yield when blocked on I/O. This means thousands of concurrent API calls without thread pool exhaustion.
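To make that concrete, here's a minimal sketch (not code from the service) that fans out many blocking calls, assuming a weatherApiClient bean and a hypothetical locations list. Each call parks its virtual thread while waiting on the network; no platform thread is tied up:
// Sketch: many concurrent blocking calls on virtual threads (Java 21+).
// 'weatherApiClient' and 'locations' are hypothetical stand-ins;
// checked exceptions are elided for brevity.
try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<WeatherApiResponse>> futures = locations.stream()
        .map(loc -> executor.submit(() -> weatherApiClient.getCurrentWeather(loc)))
        .toList();
    for (Future<WeatherApiResponse> future : futures) {
        future.get(); // blocks the virtual thread, not a platform thread
    }
}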
2. Dual Timeouts
.connectTimeout(Duration.ofMillis(timeout)) // Connection establishment
requestFactory.setReadTimeout(Duration.ofMillis(timeout)); // Response reading
Two separate timeouts protect against different failure modes:
- Connect timeout — How long to wait for TCP connection establishment
- Read timeout — How long to wait for the response after connecting
Both default to 5 seconds. Without these, a hung API would block your threads indefinitely.
3. JDK HttpClient Instead of Apache
The Weather Microservice uses Java's built-in HttpClient (Java 11+) instead of Apache HttpClient. Benefits:
- Native virtual thread support
- No additional dependency
- HTTP/2 support built-in
- Modern, fluent API
🔌 The API Client: Three Resilience Patterns in One Class
The WeatherApiClient is where resilience patterns come together:
@Slf4j
@Component
public class WeatherApiClient {
private final RestClient restClient;
private final String apiKey;
public WeatherApiClient(
RestClient weatherRestClient,
@Value("${weather.api.key}") String apiKey) {
this.restClient = weatherRestClient;
this.apiKey = apiKey;
}
@CircuitBreaker(name = "weatherApiCurrent", fallbackMethod = "getCurrentWeatherFallback")
@Retry(name = "weatherApiCurrent")
@RateLimiter(name = "weatherApiCurrent")
public WeatherApiResponse getCurrentWeather(String location) {
log.debug("Fetching current weather for location: {}", location);
try {
WeatherApiResponse response = restClient.get()
.uri(uriBuilder -> uriBuilder
.path("/current.json")
.queryParam("key", apiKey)
.queryParam("q", location)
.queryParam("aqi", "no")
.build())
.retrieve()
.onStatus(HttpStatusCode::is4xxClientError, (request, clientResponse) -> {
throw new WeatherApiException("Invalid location or API request: " + location);
})
.onStatus(HttpStatusCode::is5xxServerError, (request, serverResponse) -> {
throw new WeatherApiException("Weather API server error");
})
.body(WeatherApiResponse.class);
if (response == null) {
throw new WeatherApiException("Failed to fetch weather data: empty response");
}
log.info("Successfully fetched current weather for: {}", location);
return response;
} catch (WeatherApiException e) {
throw e;
} catch (RestClientException e) {
throw new WeatherApiException("Failed to fetch weather data: " + e.getMessage(), e);
}
}
}
Annotation Stack: Order Matters
@CircuitBreaker(name = "weatherApiCurrent", fallbackMethod = "getCurrentWeatherFallback")
@Retry(name = "weatherApiCurrent")
@RateLimiter(name = "weatherApiCurrent")
These three annotations create a layered defense:
Request → Retry → CircuitBreaker → RateLimiter → Actual API Call
            ↑           ↑               ↑
      Up to 3      Check state     Check quota
      attempts     (open/closed)   (50/min)
With Resilience4j's default aspect order, Retry is the outermost decorator, so every retry attempt passes through the circuit breaker and rate limiter again. From innermost to outermost:
- Rate Limiter checks that we're within the API quota just before the actual call
- Circuit Breaker checks whether the API is healthy before each attempt
- Retry drives up to 3 attempts with exponential backoff
- Fallback activates when the circuit is open or all retries are exhausted
The sketch below writes this nesting out with the functional API.
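This is a minimal sketch, not the service's code: it assumes the three registries are injected as Spring beans (the Spring Boot starter auto-configures them) and uses a hypothetical callWeatherApi(location) as the stand-in for the HTTP call. With Resilience4j's Decorators helper (from the resilience4j-all module), the first decorator applied ends up innermost:
// io.github.resilience4j.decorators.Decorators: first applied = innermost
Supplier<WeatherApiResponse> decorated = Decorators
    .ofSupplier(() -> callWeatherApi(location))                             // actual call
    .withRateLimiter(rateLimiterRegistry.rateLimiter("weatherApiCurrent"))  // innermost guard
    .withCircuitBreaker(circuitBreakerRegistry.circuitBreaker("weatherApiCurrent"))
    .withRetry(retryRegistry.retry("weatherApiCurrent"))                    // outermost
    .decorate();
WeatherApiResponse response = decorated.get();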
Error Handling Strategy
.onStatus(HttpStatusCode::is4xxClientError, (request, clientResponse) -> {
throw new WeatherApiException("Invalid location or API request: " + location);
})
.onStatus(HttpStatusCode::is5xxServerError, (request, serverResponse) -> {
throw new WeatherApiException("Weather API server error");
})
4xx and 5xx errors are converted to WeatherApiException — the application's domain exception. This lets the circuit breaker track failures using a consistent exception type.
The catch block at the bottom handles connection failures, timeouts, and other transport-level errors:
catch (RestClientException e) {
throw new WeatherApiException("Failed to fetch weather data: " + e.getMessage(), e);
}
🔄 Circuit Breaker: The Intelligent Fuse
The circuit breaker pattern prevents your service from hammering a sick API. Think of it as an electrical fuse — when too many failures occur, it "trips" and stops sending requests.
Configuration
resilience4j:
circuitbreaker:
configs:
default:
failureRateThreshold: 50
minimumNumberOfCalls: 5
waitDurationInOpenState: 10s
permittedNumberOfCallsInHalfOpenState: 3
slidingWindowSize: 10
slidingWindowType: COUNT_BASED
slowCallRateThreshold: 60
slowCallDurationThreshold: 3s
recordExceptions:
- com.weatherspring.exception.WeatherApiException
- java.io.IOException
- java.util.concurrent.TimeoutException
instances:
weatherApiCurrent:
baseConfig: default
failureRateThreshold: 50
waitDurationInOpenState: 15s
minimumNumberOfCalls: 10
State Machine
+----------------------+
| CLOSED | Normal operation
| (requests pass | Tracking failures in sliding window
| through) |
+----------+-----------+
| Failure rate > 50%
| (after 10 minimum calls)
▼
+----------------------+
| OPEN | All requests fail immediately
| (fast failure) | No API calls made
+----------+-----------+
| After 15 seconds
▼
+----------------------+
| HALF-OPEN | 3 test requests allowed
| (testing recovery) | If they succeed → CLOSED
+----------+-----------+ If they fail → OPEN
Key settings explained:
| Setting | Value | Meaning |
|---|---|---|
| failureRateThreshold: 50 | 50% | Open circuit when half of calls fail |
| minimumNumberOfCalls: 10 | 10 | Need 10 calls before evaluating failure rate |
| waitDurationInOpenState: 15s | 15s | Stay open for 15 seconds before testing |
| permittedNumberOfCallsInHalfOpenState: 3 | 3 | Allow 3 test calls in half-open state |
| slidingWindowSize: 10 | 10 | Evaluate the last 10 calls |
| slowCallDurationThreshold: 3s | 3s | Calls over 3 seconds count as slow |
| slowCallRateThreshold: 60 | 60% | Open circuit when 60% of calls are slow |
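These state transitions are exactly what the E (Event Recording) pillar of SHIELD calls for surfacing. A minimal sketch, assuming the CircuitBreakerRegistry bean that the Spring Boot starter auto-configures, which logs every transition via Resilience4j's event publisher:
@Slf4j
@Component
public class CircuitBreakerEventLogger {
    public CircuitBreakerEventLogger(CircuitBreakerRegistry registry) {
        // Subscribe to state-transition events for the current-weather breaker
        registry.circuitBreaker("weatherApiCurrent")
            .getEventPublisher()
            .onStateTransition(event -> log.warn(
                "Circuit breaker '{}' transitioned {} → {}",
                event.getCircuitBreakerName(),
                event.getStateTransition().getFromState(),
                event.getStateTransition().getToState()));
    }
}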
Different Instances for Different Endpoints
instances:
weatherApiCurrent:
waitDurationInOpenState: 15s
minimumNumberOfCalls: 10
weatherApiForecast:
waitDurationInOpenState: 20s
minimumNumberOfCalls: 10
slowCallDurationThreshold: 5s
The forecast endpoint has a longer slowCallDurationThreshold (5s vs 3s) because forecast responses contain more data and naturally take longer. A separate circuit breaker means a failing current-weather endpoint doesn't prevent forecast requests from working.
The Fallback Method
private WeatherApiResponse getCurrentWeatherFallback(String location, Throwable throwable) {
log.error("Circuit breaker fallback triggered for getCurrentWeather. "
+ "Location: {}, Error: {}", location, throwable.getMessage());
throw new WeatherApiException(
"Weather service is currently unavailable. Please try again later. "
+ "Location: " + location, throwable);
}
When the circuit breaker is open or all retries are exhausted, the fallback method is called. Here it throws a WeatherApiException with a user-friendly message, which the GlobalExceptionHandler converts to a 503 Service Unavailable response.
🤔 Why not return cached data from the fallback? That's a valid pattern, but the Weather Microservice keeps the fallback simple — it reports the failure. The caching layer (Part 6) independently serves cached data if available. Separating caching and fallback concerns makes each easier to reason about.
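For comparison, here is a minimal sketch of that alternative (not the service's code), serving the last successful response from a hypothetical in-memory map before giving up:
// Hypothetical stale-on-error fallback: NOT what this service does.
private final Map<String, WeatherApiResponse> lastKnownGood = new ConcurrentHashMap<>();

private WeatherApiResponse getCurrentWeatherFallback(String location, Throwable throwable) {
    WeatherApiResponse stale = lastKnownGood.get(location);
    if (stale != null) {
        log.warn("Serving stale weather for {}: {}", location, throwable.getMessage());
        return stale;
    }
    throw new WeatherApiException(
        "Weather service is currently unavailable. Location: " + location, throwable);
}
The catch: getCurrentWeather would have to populate lastKnownGood on every success, pulling a caching concern into the client. Part 6 keeps that in a dedicated layer instead.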
🔁 Retry: The Persistence Pattern
When an API call fails, it might be a transient issue — a network blip, a temporary server overload. Retries give the API a chance to recover:
Configuration
resilience4j:
retry:
configs:
default:
maxAttempts: 3
waitDuration: 1s
enableExponentialBackoff: true
exponentialBackoffMultiplier: 2
retryExceptions:
- com.weatherspring.exception.WeatherApiException
- java.io.IOException
instances:
weatherApiCurrent:
maxAttempts: 3
waitDuration: 500ms
weatherApiForecast:
maxAttempts: 2
waitDuration: 1s
Exponential Backoff Timeline
Attempt 1: Immediate
↓ fail
Wait 500ms
Attempt 2:
↓ fail
Wait 1000ms (500ms × 2)
Attempt 3:
↓ fail
→ Fallback triggered
The wait duration doubles after each failure. This is critical because:
- Without backoff: 3 retries in 100ms = retry storm on an already struggling server
- With exponential backoff: Progressively longer waits give the server time to recover
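The YAML maps one-to-one onto the programmatic API. A minimal sketch of the equivalent RetryConfig for weatherApiCurrent, handy when you want the backoff schedule under unit test (the builder calls are standard Resilience4j; the values mirror the config above):
// Backoff schedule: 500ms, then 1000ms (×2 per failure), 3 attempts total
RetryConfig config = RetryConfig.custom()
    .maxAttempts(3)
    .intervalFunction(IntervalFunction.ofExponentialBackoff(Duration.ofMillis(500), 2))
    .retryExceptions(WeatherApiException.class, IOException.class)
    .build();
Retry retry = Retry.of("weatherApiCurrent", config);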
Forecast Gets Fewer Retries
weatherApiForecast:
maxAttempts: 2 # vs 3 for current weather
waitDuration: 1s # vs 500ms for current weather
Forecasts take longer, have larger payloads, and are less time-sensitive. Two attempts with a longer wait is more appropriate than three fast attempts.
🚦 Rate Limiter: The Quota Guard
The external WeatherAPI.com has usage quotas. The rate limiter prevents the Weather Microservice from exceeding them:
Configuration
resilience4j:
ratelimiter:
configs:
default:
limitForPeriod: 100
limitRefreshPeriod: 1m
timeoutDuration: 5s
instances:
weatherApi:
limitForPeriod: 50
limitRefreshPeriod: 1m
weatherApiCurrent:
limitForPeriod: 50
limitRefreshPeriod: 1m
weatherApiForecast:
limitForPeriod: 30
limitRefreshPeriod: 1m
How It Works
Minute 0:00 → 50 permits available (current weather)
Request 1-50: Permitted ✅
Request 51: Blocked (waits up to 5s for next minute)
Minute 1:00 → 50 permits refreshed
Request 52: Permitted ✅
Key settings:
| Setting | Meaning |
|---|---|
| limitForPeriod: 50 | 50 calls allowed per period |
| limitRefreshPeriod: 1m | Permits refresh every minute |
| timeoutDuration: 5s | Wait up to 5s if no permits are available |
When the rate limit is exceeded, RequestNotPermitted is thrown and the GlobalExceptionHandler returns a 429 Too Many Requests response.
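The GlobalExceptionHandler is referenced rather than shown here, but a minimal sketch of what that mapping could look like inside a @RestControllerAdvice class (handler name and response body are assumptions):
@ExceptionHandler(RequestNotPermitted.class)
public ResponseEntity<Map<String, String>> handleRateLimitExceeded(RequestNotPermitted ex) {
    // Map Resilience4j's rate-limit rejection to HTTP 429
    return ResponseEntity.status(HttpStatus.TOO_MANY_REQUESTS)
        .body(Map.of("error", "Rate limit exceeded. Please try again later."));
}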
Split Quotas by Endpoint
The forecast endpoint gets only 30 calls/minute (vs. 50 for current weather). This ensures the cheaper current-weather calls always have quota available, even when forecast-heavy workloads are running.
🧩 Putting It All Together: The Defense Timeline
Here's what happens when a request hits the API client:
1. Retry Loop (outermost, up to 3 attempts)
   +- Each attempt runs through steps 2-4 below
2. Circuit Breaker Check
   +- CLOSED? → Continue
   +- OPEN? → Skip call → Fallback → 503 Service Unavailable
   +- HALF-OPEN? → Allow test call → Continue
3. Rate Limiter Check
   +- Permits available? → Continue to the actual call
   +- No permits? → Wait up to 5s → Timeout → 429 Too Many Requests
4. Actual API Call
   +- Success? → Return response
   +- Failure? → Circuit breaker records it → Retry waits (500ms, then 1000ms)
                 and re-enters step 2
5. Fallback (after the final failed attempt, or immediately while the circuit is open)
   +- Throw WeatherApiException → GlobalExceptionHandler → 503
What makes this design effective: each pattern handles a different failure mode:
- Rate limiter → Prevents quota exhaustion (proactive)
- Circuit breaker → Prevents hammering a dead API (reactive)
- Retry → Handles transient failures (optimistic)
- Fallback → Provides graceful degradation (last resort)
📊 The Resilience4j Configuration Hierarchy
resilience4j:
circuitbreaker:
configs:
default: # ← Template (shared defaults)
failureRateThreshold: 50
instances:
weatherApiCurrent: # ← Instance (inherits + overrides)
baseConfig: default
waitDurationInOpenState: 15s
This two-level configuration keeps things DRY:
- Default configs define the baseline behavior
- Instances inherit from defaults and override specific settings
- Instance names match the annotation names:
@CircuitBreaker(name = "weatherApiCurrent")
✅ Resilience Checklist
- [ ] RestClient over RestTemplate — Modern, fluent API with better error handling
- [ ] Virtual thread executor on HTTP client — Non-blocking I/O
- [ ] Connect and read timeouts configured — Never wait forever
- [ ] Circuit breaker with appropriate thresholds — 50% failure rate, 15s wait
- [ ] Retry with exponential backoff — 3 attempts, doubling wait
- [ ] Rate limiter per endpoint — Separate quotas for different call types
- [ ] Fallback methods for every circuit breaker — Graceful degradation
- [ ] Exception mapping — Transport errors wrapped in domain exceptions
- [ ] 4xx/5xx handling — Different responses for client vs. server errors
- [ ] Named instances — Each API endpoint has its own resilience config
🎓 Conclusion: Resilience Is Not Optional
External APIs will fail. The question is whether your service fails with them. Here's the defense strategy:
- The SHIELD framework (Smart HTTP client, Handled failures, Intelligent retries, Event recording, Limited throughput, Degraded service) provides a comprehensive defense strategy
- Spring's RestClient with JDK HttpClient and virtual threads provides modern, non-blocking HTTP communication
- Circuit breakers prevent cascading failures by stopping requests to unhealthy services
- Exponential backoff retries give transient failures time to resolve without creating retry storms
- Rate limiters protect API quotas and prevent overwhelming external services
- The three patterns compose: Retry → Circuit breaker → Rate limiter → Actual call
- Separate instances per endpoint allow fine-tuned behavior for different API characteristics
- Fallback methods provide graceful degradation when all else fails
In distributed systems, failure is the norm, not the exception. The Weather Microservice treats external APIs as fundamentally unreliable and designs accordingly. Your service should do the same.
Coming Next Week:
Part 6: Cache Me If You Can - Smart Caching Strategies for Microservices ⚡
📚 Series Progress
✅ Part 1: The Blueprint Before the Build
✅ Part 2: Spring Boot Alchemy
✅ Part 3: REST Assured
✅ Part 4: The Data Foundation
✅ Part 5: When the World Breaks ← You just finished this!
⬜ Part 6: Cache Me If You Can
⬜ Part 7: Guarding the Gates
⬜ Part 8: Fail Gracefully
⬜ Part 9: 10,000 Threads and a Dream
⬜ Part 10: Can You See Me Now?
⬜ Part 11: Trust, But Verify
⬜ Part 12: Ship It
⬜ Part 13: To Production and Beyond
Happy coding, and remember — hope is not a resilience strategy. ☕