
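# Rate Limiting and Circuit Breaking in Distributed Systems: Protecting Microservice Stability in Practice

## 1. Core Concepts of Rate Limiting and Circuit Breaking

In a distributed system, services depend on one another in complex ways. When a downstream service misbehaves, the failure can cascade to its callers. Rate limiting and circuit breaking are the two core techniques for keeping the system stable.

### 1.1 Rate Limiting

Rate limiting caps the number of requests accepted per unit of time so that a service is not overwhelmed. Common strategies include:

- Counter algorithm: counts requests within a fixed time window
- Sliding window algorithm: counts over a moving window, which is more precise at window boundaries
- Leaky bucket algorithm: processes requests at a constant rate and smooths traffic
- Token bucket algorithm: allows a bounded amount of burst traffic

### 1.2 Circuit Breaker

The circuit breaker pattern borrows from electrical fuses: when the failure rate of calls to a service reaches a threshold, the call path is cut automatically to avoid cascading failure. A breaker has three states:

- Closed: calls pass through normally while the failure rate is tracked
- Open: the breaker has tripped and calls fail immediately
- Half-open: a limited number of trial calls probe whether the service has recovered

## 2. Rate Limiting with Sentinel

Sentinel is an open-source flow control component from Alibaba that offers a rich set of rate limiting strategies.

### 2.1 Adding the Dependencies

```xml
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-core</artifactId>
    <version>1.8.6</version>
</dependency>
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-annotation-aspectj</artifactId>
    <version>1.8.6</version>
</dependency>
```

### 2.2 Configuring the Flow Rules

```java
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

@Configuration // required so that Spring invokes the @PostConstruct method
public class SentinelConfig {
    @PostConstruct
    public void init() {
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule();
        rule.setResource("order-service");
        rule.setCount(100);
        // Limit by QPS. The literal 0 is FLOW_GRADE_THREAD, which would cap
        // concurrent threads at 100 rather than requests per second.
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setLimitApp("default");
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }
}
```

### 2.3 Rate Limiting with Annotations

`@SentinelResource` only takes effect when a `SentinelResourceAspect` bean is registered, either manually or by a starter that does so automatically.

```java
import com.alibaba.csp.sentinel.annotation.SentinelResource;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Service
public class OrderService {
    private static final Logger log = LoggerFactory.getLogger(OrderService.class);
    private final OrderRepository orderRepository;

    public OrderService(OrderRepository orderRepository) {
        this.orderRepository = orderRepository;
    }

    @SentinelResource(value = "order-service", blockHandler = "handleBlock")
    public Order createOrder(OrderRequest request) {
        // Business logic
        return orderRepository.save(request);
    }

    // Same signature as createOrder plus a trailing BlockException parameter.
    public Order handleBlock(OrderRequest request, BlockException ex) {
        // BlockException usually carries no message, so log the exception type.
        log.warn("Request blocked by Sentinel: {}", ex.getClass().getSimpleName());
        return Order.builder()
            .status("FAILED")
            .message("The system is busy, please try again later")
            .build();
    }
}
```

## 3. Circuit Breaking with Resilience4j

Resilience4j is a lightweight fault tolerance library that provides circuit breaking, rate limiting, retries, and related features.

### 3.1 Adding the Dependencies

```xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-circuitbreaker</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-retry</artifactId>
    <version>2.2.0</version>
</dependency>
```

### 3.2 Configuring the Circuit Breaker

The YAML below is bound by the Resilience4j Spring Boot starter (for example `resilience4j-spring-boot3`), which must be on the classpath in addition to the core modules above.

```yaml
resilience4j:
  circuitbreaker:
    instances:
      payment-service:
        registerHealthIndicator: true
        slidingWindowSize: 100
        minimumNumberOfCalls: 10
        permittedNumberOfCallsInHalfOpenState: 3
        automaticTransitionFromOpenToHalfOpenEnabled: true
        waitDurationInOpenState: 10s
        failureRateThreshold: 50
        eventConsumerBufferSize: 10
```

### 3.3 Using the Circuit Breaker Programmatically

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerRegistry;
import io.vavr.control.Try; // Vavr is a separate dependency; Resilience4j 2.x does not bundle it
import java.util.function.Supplier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Service
public class PaymentService {
    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);
    private final CircuitBreaker circuitBreaker;
    private final RestTemplate restTemplate;

    @Autowired
    public PaymentService(CircuitBreakerRegistry registry, RestTemplate restTemplate) {
        this.circuitBreaker = registry.circuitBreaker("payment-service");
        this.restTemplate = restTemplate;
    }

    public PaymentResponse processPayment(PaymentRequest request) {
        // Wrap the remote call so that its outcome is recorded by the breaker.
        Supplier<PaymentResponse> supplier = CircuitBreaker
            .decorateSupplier(circuitBreaker, () ->
                restTemplate.postForObject(
                    "http://payment-service/api/pay",
                    request,
                    PaymentResponse.class
                ));
        // Fall back when the call fails or the breaker rejects it while open.
        return Try.ofSupplier(supplier)
            .recover(ex -> {
                log.error("Payment call failed or circuit is open: {}", ex.getMessage());
                return PaymentResponse.failure("The payment service is temporarily unavailable");
            })
            .get();
    }
}
```

## 4. Rate Limiting at the Gateway

Rate limiting at the API gateway protects backend services before traffic reaches them.

### 4.1 Rate Limiting in Spring Cloud Gateway

```java
import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
import org.springframework.cloud.gateway.filter.ratelimit.RedisRateLimiter;
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import reactor.core.publisher.Mono;

@Configuration
public class GatewayConfig {
    @Bean
    public KeyResolver ipKeyResolver() {
        // Key requests by client IP; an absent remote address yields an empty key.
        return exchange -> Mono.justOrEmpty(exchange.getRequest().getRemoteAddress())
            .map(address -> address.getAddress().getHostAddress());
    }

    @Bean
    public RedisRateLimiter redisRateLimiter() {
        // replenishRate = 100 tokens per second, burstCapacity = 200 tokens
        return new RedisRateLimiter(100, 200);
    }

    @Bean
    public RouteLocator customRouteLocator(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("order-service", r -> r.path("/api/orders/**")
                .filters(f -> f.requestRateLimiter(c -> c
                    .setRateLimiter(redisRateLimiter())
                    .setKeyResolver(ipKeyResolver())))
                .uri("lb://order-service"))
            .build();
    }
}
```

### 4.2 Rate Limit Configuration

The same limits can instead be declared per route in configuration, where the limiter settings are passed as arguments of the `RequestRateLimiter` filter. `replenishRate` is the steady number of requests allowed per second, and `burstCapacity` is the most a client may issue in a single second.

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/api/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 100
                redis-rate-limiter.burstCapacity: 200
                key-resolver: "#{@ipKeyResolver}"
```

## 5. Distributed Rate Limiting

In a distributed deployment, per-instance rate limiting cannot enforce a global limit.

### 5.1 Distributed Rate Limiting with Redis and Lua

The script implements a fixed-window counter. Redis executes a script atomically, so the check and the increment cannot interleave across service instances.

```lua
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])

-- Reject once the counter for the current window has reached the limit.
local current = redis.call('GET', key)
if current and tonumber(current) >= limit then
    return 0
end

-- Count this request; the first request in a window starts the expiry timer.
current = redis.call('INCR', key)
if tonumber(current) == 1 then
    redis.call('EXPIRE', key, window)
end
return 1
```

### 5.2 Calling the Script from Java

```java
import java.util.Collections;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

@Component
public class RedisRateLimiter {
    private final StringRedisTemplate redisTemplate;

    @Autowired
    public RedisRateLimiter(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    public boolean tryAcquire(String key, int limit, int windowSeconds) {
        String luaScript = "..."; // Lua script content from section 5.1
        Long result = redisTemplate.execute(
            new DefaultRedisScript<>(luaScript, Long.class),
            Collections.singletonList(key),
            String.valueOf(limit),
            String.valueOf(windowSeconds)
        );
        // The script returns 1 when the request is allowed and 0 when it is rejected.
        return result != null && result == 1;
    }
}
```

## 6. Best Practices

### 6.1 Choosing a Rate Limiting Strategy

| Scenario | Recommended strategy | Notes |
| --- | --- | --- |
| API entry point | Token bucket | Allows burst traffic |
| Resource protection | Counter | Simple and efficient |
| Database access | Leaky bucket | Smooths traffic |

### 6.2 Suggested Circuit Breaker Settings

- `failureRateThreshold`: 50% is a reasonable starting point
- `waitDurationInOpenState`: 10 to 30 seconds
- `slidingWindowSize`: tune to the service's QPS, typically 100 to 1000

### 6.3 Layered Protection

```
┌─────────────────────────────────────────────┐
│     API Gateway (ingress rate limiting)     │
├─────────────────────────────────────────────┤
│  Service Mesh (circuit breaking, fallback)  │
├─────────────────────────────────────────────┤
│    Business layer (local rate limiting)     │
├─────────────────────────────────────────────┤
│      Database (connection pool limits)      │
└─────────────────────────────────────────────┘
```

Layered protection keeps a distributed system stable and prevents a single point of failure from triggering a cascading collapse.

## 7. Monitoring and Alerting

The effect of rate limiting and circuit breaking has to be evaluated through monitoring:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,prometheus
  metrics:
    export:
      prometheus:
        enabled: true  # Spring Boot 2.x path; 3.x uses management.prometheus.metrics.export.enabled
```

Configure a Prometheus alert rule:

```yaml
groups:
  - name: circuit_breaker_alerts
    rules:
      - alert: CircuitBreakerOpen
        # Resilience4j publishes the state label in lower case.
        expr: resilience4j_circuitbreaker_state{state="open"} == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Circuit breaker open: {{ $labels.name }}"
          description: "Circuit breaker {{ $labels.name }} is open; the service may be unavailable"
```

With well-chosen rate limiting and circuit breaking policies, backed by monitoring and alerting, a system can stay stable while still giving users a good experience.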
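
To make the token bucket strategy from sections 1.1 and 6.1 concrete, here is a minimal single-process sketch in Java. It is not how Sentinel or Spring Cloud Gateway implement their limiters; the class name `TokenBucket`, the `tryAcquire` method, and the injectable nanosecond clock are all illustrative assumptions. The bucket starts full, refills continuously at a fixed rate, and never holds more than its capacity, which is what permits a bounded burst.

```java
import java.util.function.LongSupplier;

/** Illustrative token bucket: continuous refill at a fixed rate, capped at capacity. */
final class TokenBucket {
    private final long capacity;           // maximum burst size, in tokens
    private final double refillPerSecond;  // steady-state refill rate
    private final LongSupplier clock;      // nanosecond time source (hypothetical hook for tests)
    private double tokens;                 // tokens currently available
    private long lastRefillNanos;          // time of the last refill

    public TokenBucket(long capacity, double refillPerSecond, LongSupplier clock) {
        if (capacity <= 0 || refillPerSecond <= 0) {
            throw new IllegalArgumentException("capacity and refillPerSecond must be positive");
        }
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.clock = clock;
        this.tokens = capacity;            // start full so an initial burst is allowed
        this.lastRefillNanos = clock.getAsLong();
    }

    public TokenBucket(long capacity, double refillPerSecond) {
        this(capacity, refillPerSecond, System::nanoTime);
    }

    /** Takes one token if available; returns false when the request should be rejected. */
    public synchronized boolean tryAcquire() {
        long now = clock.getAsLong();
        long elapsed = now - lastRefillNanos;
        if (elapsed > 0) {
            // Add the tokens earned since the last call, never exceeding capacity.
            double earned = elapsed * refillPerSecond / 1_000_000_000.0;
            tokens = Math.min(capacity, tokens + earned);
            lastRefillNanos = now;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Mirrors the gateway figures used earlier: burst of 200, refill of 100 per second.
        TokenBucket bucket = new TokenBucket(200, 100.0);
        int allowed = 0;
        for (int i = 0; i < 250; i++) {
            if (bucket.tryAcquire()) {
                allowed++;
            }
        }
        System.out.println("Allowed in the initial burst: " + allowed);
    }
}
```

Because the state lives in one JVM, this sketch only suits the local rate limiting layer; a global limit still needs shared state such as the Redis script in section 5.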