
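# Goodbye to Manual Certificate Management: Fully Automated TLS in Kubernetes with cert-manager

It's 3 a.m. and a service suddenly goes down. The cause turns out to be an expired certificate. Kubernetes operations teams know this scenario all too well. Managing certificates by hand is slow and labor-intensive, and it carries serious operational risk. This article shows how to build an automated certificate management system with cert-manager, so that requesting, renewing, and deploying TLS certificates runs unattended across the whole certificate lifecycle.

## 1. Why Automate Certificate Management

Managing TLS certificates by hand is like processing modern financial transactions on an abacus. It works in theory, but the effort and the risk are wildly out of proportion. Once a cluster runs more than about 10 services, administrators must track the expiry dates of dozens of certificates and repeat the issuance procedure by hand for each one. Worse, the overwhelming majority of certificate-related incidents are caused by human oversight: a certificate expires because nobody renewed it.

As a Kubernetes-native certificate management tool, cert-manager integrates with CAs such as Let's Encrypt and provides:

- Automatic renewal: requests a new certificate before the current one expires
- Centralized configuration: manages all certificate policies through CRDs
- Seamless integration: stores issued certificates in Secrets that Ingresses and Pods consume directly
- Multi-CA support: works with Let's Encrypt, Venafi, and other CAs at the same time

```bash
# Typical lifecycle of a manually managed certificate
# 1. Generate a private key and a certificate signing request (CSR)
openssl req -new -newkey rsa:2048 -nodes -keyout tls.key -out tls.csr
# 2. Submit tls.csr to the CA and save the signed certificate as tls.crt (manual step)
# 3. Store the key pair in a Kubernetes TLS Secret
kubectl create secret tls my-cert --cert=tls.crt --key=tls.key
# 4. Then check the expiry date by hand every month...
```

## 2. Setting Up cert-manager

### 2.1 Installing cert-manager

A cert-manager installation is built around three core components:

- CustomResourceDefinitions: extend the Kubernetes API with resources such as Certificate and Issuer
- Controller: the core of certificate lifecycle management
- Webhook: validates and mutates cert-manager resources

The official manifest also deploys a fourth component, cainjector, which injects CA bundles into webhook and API service configurations. Install a specific release (v1.11.0 here; check the project's releases page for the current stable version) with the official manifest:

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
```

Three checks confirm that the installation succeeded:

- All pods in the `cert-manager` namespace are `Running` (`kubectl get pods -n cert-manager`)
- CRDs in the `cert-manager.io` API group, such as `certificates.cert-manager.io`, are present (`kubectl get crds`)
- A ClusterIssuer resource can be created

Note: in production, install with Helm instead; it makes later upgrades and configuration management much easier.

### 2.2 Configuring a Let's Encrypt ClusterIssuer

cert-manager supports two Let's Encrypt validation methods:

- HTTP-01: proves control of the domain by serving a challenge token over plain HTTP at `http://<domain>/.well-known/acme-challenge/<token>`
- DNS-01: proves control by publishing a TXT record under `_acme-challenge.<domain>`; this is the only method that supports wildcard certificates

The following ClusterIssuer uses DNS-01 validation, with Cloudflare as the DNS provider:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - dns01:
        cloudflare:
          # Secret holding the Cloudflare API token. For a ClusterIssuer it must
          # live in the cluster resource namespace (cert-manager by default).
          apiTokenSecretRef:
            name: cloudflare-api-token
            key: token
```

Key parameters:

| Parameter | Description | Recommended production value |
|---|---|---|
| `server` | ACME server URL | The production endpoint (`acme-v02`), not staging |
| `email` | Contact address for urgent notices such as expiry warnings | A dedicated operations team mailbox |
| `privateKeySecretRef` | Where the ACME account private key is stored | A Secret encrypted at rest with KMS |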
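To make the two validation methods concrete, the sketch below computes the values a CA actually checks, following RFC 8555: the key authorization (challenge token plus the RFC 7638 thumbprint of the ACME account key), the HTTP-01 URL, and the DNS-01 TXT record. cert-manager does all of this internally; this is only an illustration. The helper names are hypothetical, the thumbprint helper covers only RSA and EC keys, and the token and key values are placeholders.

```python
"""Sketch of the ACME (RFC 8555) challenge values behind HTTP-01 and DNS-01.

cert-manager computes these itself; this only illustrates what is validated.
All helper names are hypothetical, and the sample token/JWK are placeholders.
"""
import base64
import hashlib
import json

# Required JWK members that go into an RFC 7638 thumbprint, per key type
_THUMBPRINT_MEMBERS = {
    "RSA": ("e", "kty", "n"),
    "EC": ("crv", "kty", "x", "y"),
}


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638: SHA-256 over the required members, sorted, no whitespace."""
    members = _THUMBPRINT_MEMBERS[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in members},
                           sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """keyAuthorization = token + '.' + thumbprint(account key)."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


def http01_url(domain: str, token: str) -> str:
    """URL the CA fetches for HTTP-01; the response body is the key authorization."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def dns01_record(domain: str, key_auth: str) -> tuple[str, str]:
    """TXT record name and value for DNS-01; a wildcard uses its base domain."""
    base = domain[2:] if domain.startswith("*.") else domain
    value = b64url(hashlib.sha256(key_auth.encode("ascii")).digest())
    return f"_acme-challenge.{base}", value


if __name__ == "__main__":
    # Placeholder EC account key and challenge token (not real credentials)
    jwk = {"kty": "EC", "crv": "P-256",
           "x": "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU",
           "y": "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"}
    token = "evaGxfADs6pSRb2LAv9IZf17Dt3juxGJ-PCt92wr-oA"
    ka = key_authorization(token, jwk)
    print(http01_url("app.example.com", token))
    print(dns01_record("*.example.com", ka))
```

For `*.example.com` the record lives under `_acme-challenge.example.com`, which is why DNS-01 can validate wildcard names while HTTP-01 cannot: there is no single host to serve the token from.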
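## 3. Certificate Issuance in Practice

### 3.1 Automatic Certificates for Ingress

The traditional approach requires creating a Secret by hand and then configuring the Ingress. With cert-manager, a single annotation does the whole job:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-cert
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-app
            port:
              number: 80
```

Commands to check the certificate status (cert-manager names the generated Certificate after the Ingress's `secretName`):

```bash
kubectl get certificate
kubectl describe certificate app-tls-cert
```

### 3.2 Issuing Certificates Directly for Workloads

Some services are not exposed through an Ingress but still need TLS. cert-manager can manage their certificates directly through a Certificate resource. Let's Encrypt only issues certificates for publicly resolvable domain names, so internal names such as `grpc.internal` or `*.svc.cluster.local` must be signed by an internal CA issuer; `internal-ca-issuer` below is a hypothetical name for such an issuer:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: grpc-server-cert
spec:
  secretName: grpc-tls
  duration: 2160h    # 90 days
  renewBefore: 720h  # renew 30 days before expiry
  commonName: grpc.internal
  dnsNames:
  - grpc.internal
  - grpc.prod.svc.cluster.local
  issuerRef:
    # Hypothetical internal CA ClusterIssuer (e.g. a CA or Vault issuer);
    # letsencrypt-prod would reject these non-public names
    name: internal-ca-issuer
    kind: ClusterIssuer
```

The certificate is written to the specified Secret, which the Pod mounts directly; the key pair then appears as `/etc/tls/tls.crt` and `/etc/tls/tls.key`:

```yaml
# Excerpt from the Pod spec
containers:
- name: grpc-server
  volumeMounts:
  - name: tls
    mountPath: /etc/tls
    readOnly: true
volumes:
- name: tls
  secret:
    secretName: grpc-tls
```

## 4. Advanced Configuration and Troubleshooting

### 4.1 Multi-CA Strategy

Production environments often need to mix CAs, for example an internal Vault CA for service-to-service traffic and Let's Encrypt for public endpoints. cert-manager has no issuer-pool resource and no issuer-selector annotation: a ClusterIssuer configures exactly one CA, and each Certificate names exactly one issuer in `issuerRef`. The strategy is therefore to define one issuer per CA and choose the issuer per Certificate (or per Ingress through its annotation). The Certificate and DNS names below are illustrative:

```yaml
# Internal traffic: signed by the namespaced Vault-backed Issuer
# (an Issuer only serves Certificates in its own namespace)
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-api-cert
spec:
  secretName: internal-api-tls
  dnsNames:
  - api.prod.svc.cluster.local
  issuerRef:
    name: vault-issuer
    kind: Issuer
---
# Public endpoint: signed by Let's Encrypt
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: public-web-cert
spec:
  secretName: public-web-tls
  dnsNames:
  - app.example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
```

### 4.2 Troubleshooting Guide

When certificate issuance fails, work through these steps in order:

```bash
# 1. Check the Issuer status
kubectl describe clusterissuer letsencrypt-prod

# 2. Inspect the CertificateRequest
kubectl get certificaterequest
kubectl describe certificaterequest <name>

# 3. Check the Order and Challenge resources (ACME issuers only)
kubectl get order
kubectl describe order <name>
kubectl get challenge

# 4. Read the cert-manager logs
kubectl logs -n cert-manager -l app.kubernetes.io/instance=cert-manager
```

Typical errors and fixes:

| Symptom | Likely cause | Fix |
|---|---|---|
| CertificateRequest stuck pending | CA server unreachable | Check NetworkPolicies and egress rules |
| Invalid domain | DNS misconfiguration | Verify domain resolution and DNS policy |
| Rate limit exceeded | Let's Encrypt rate limits | Temporarily switch to the staging endpoint |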
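Before walking through the steps above one resource at a time, it helps to know which Certificates are failing. The sketch below is a hypothetical helper, not part of cert-manager: it reads the output of `kubectl get certificate -A -o json` saved to a file and reports every Certificate whose `Ready` condition is not `True`, with the reason and message that `kubectl describe` would show. It assumes the cert-manager v1 status layout (`status.conditions`).

```python
"""Hypothetical helper: list Certificates whose Ready condition is not True.

Input is the JSON from `kubectl get certificate -A -o json`, so no Kubernetes
client library is needed. Assumes cert-manager v1 `status.conditions`.
"""
import json
import sys


def not_ready(cert_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace/name, reason, message) for each Certificate not Ready."""
    problems = []
    for item in cert_list.get("items") or []:
        meta = item.get("metadata") or {}
        ref = f'{meta.get("namespace", "default")}/{meta.get("name", "?")}'
        conditions = (item.get("status") or {}).get("conditions") or []
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None:
            # No Ready condition yet: the controller has not reconciled it
            problems.append((ref, "NoReadyCondition", ""))
        elif ready.get("status") != "True":
            problems.append((ref, ready.get("reason", ""), ready.get("message", "")))
    return problems


# Run only when given a file argument, e.g. `python3 cert_health.py certs.json`
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8") as fh:
        for ref, reason, message in not_ready(json.load(fh)):
            print(f"{ref}\t{reason}\t{message}")
```

Usage, with illustrative file names: `kubectl get certificate -A -o json > certs.json`, then `python3 cert_health.py certs.json`. Each reported name is a starting point for step 2 of the checklist above.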
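## 5. Production Best Practices

### 5.1 Certificate Monitoring and Alerting

Although cert-manager renews certificates automatically, you still need monitoring to catch renewals that fail silently:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cert-manager-alerts
spec:
  groups:
  - name: cert-manager
    rules:
    - alert: CertificateExpiringSoon
      # Fires when fewer than 30 days of validity remain
      expr: certmanager_certificate_expiration_timestamp_seconds - time() < 86400 * 30
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Certificate will expire soon ({{ $labels.namespace }}/{{ $labels.name }})"
        # $value is already the remaining lifetime in seconds
        description: "Certificate {{ $labels.name }} expires in {{ $value | humanizeDuration }}"
```

### 5.2 Security Hardening

Private key protection: generate a fresh private key on every renewal and use a stronger key:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: secured-cert
spec:
  # secretName, dnsNames and issuerRef omitted for brevity
  privateKey:
    rotationPolicy: Always   # new private key on every renewal
    algorithm: RSA
    size: 4096
```

Network policy restrictions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cert-manager-egress
  namespace: cert-manager
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: cert-manager
  # Restrict egress only. Without policyTypes, a policy with egress rules also
  # defaults to Ingress and would block API server calls to the webhook.
  policyTypes:
  - Egress
  egress:
  # HTTPS to ACME servers and DNS provider APIs, excluding the internal range.
  # Also allow the Kubernetes API server if it sits in that range or uses another port.
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 192.168.0.0/16
    ports:
    - protocol: TCP
      port: 443
  # DNS lookups, needed for ACME and DNS-01 propagation checks
  - ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
```

Backup strategy:

```bash
# Back up the key Issuer configuration
kubectl get clusterissuer -o yaml > clusterissuers-backup.yaml
# Back up all certificate Secrets, private keys included (store the file encrypted).
# cert-manager marks these Secrets with the cert-manager.io/certificate-name
# annotation, not a label, so filter with jq instead of a label selector.
kubectl get secrets -A -o json \
  | jq '.items |= map(select(.metadata.annotations["cert-manager.io/certificate-name"] != null))' \
  > certificates-backup.json
```

After two years of running cert-manager in clusters at a scale of tens of thousands, we have distilled three core lessons. First, always give DNS validation a dedicated account with narrowly scoped permissions instead of a global API key. Second, rotate the ACME account private key regularly; we recommend every six months. Third, when development environments use the staging endpoint, remember that staging certificates are not trusted by mainstream browsers.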
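The `renewBefore` setting from section 3.2 and the 30-day alert from section 5.1 interact, and a small model makes the timing visible. This sketch rests on stated assumptions: durations are Go-style strings limited to `h`/`m`/`s` units, the CA honors the requested `duration` (Let's Encrypt, for example, chooses the lifetime itself), and renewal becomes due at notAfter minus `renewBefore`. All function names are hypothetical.

```python
"""Sketch: when renewal is due versus when the 30-day expiry alert fires.

Assumptions: Go-style durations with h/m/s units only, a CA that honors the
requested duration, and renewal due at notAfter - renewBefore.
Function names are hypothetical.
"""
import re
from datetime import datetime, timedelta, timezone

_PART = re.compile(r"(\d+)([hms])")
_UNIT = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text: str) -> timedelta:
    """Parse a Go-style duration such as '2160h' or '1h30m' (h/m/s only)."""
    parts = _PART.findall(text)
    # Reject anything the regex did not consume completely (e.g. '500ms')
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum((timedelta(**{_UNIT[u]: int(n)}) for n, u in parts), timedelta())


def renewal_time(not_before: datetime, duration: str, renew_before: str) -> datetime:
    """Renewal due time: expiry (notBefore + duration) minus renewBefore."""
    not_after = not_before + parse_duration(duration)
    return not_after - parse_duration(renew_before)


def alert_fires(not_after: datetime, now: datetime, threshold_days: int = 30) -> bool:
    """Mirror of the PromQL expr: expiration - time() < 86400 * threshold_days."""
    return (not_after - now).total_seconds() < 86400 * threshold_days


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, tzinfo=timezone.utc)  # example issue date
    print(renewal_time(issued, "2160h", "720h").isoformat())
    # 2024-03-01T00:00:00+00:00
```

For a certificate issued on 2024-01-01 with the section 3.2 settings, renewal is due on 2024-03-01, exactly when 30 days of validity remain. The alert condition therefore turns true the moment renewal starts, and only the rule's `for: 5m` keeps a routine renewal from firing it. An alert threshold somewhat below `renewBefore` (21 days, say) gives the controller room to retry before anyone is paged.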