
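Hardware: 3 servers running CentOS 7.9, each with 8 cores, 64 GB RAM and a 1 TB disk.
Goal: build a production-ready, highly available, enterprise-standard K8s cluster.

The plan below is the simplest layout that can go straight into production: a 3-node all-in-one HA cluster in which master and worker roles share the same machines. No extra servers are needed; three nodes are enough for production-grade HA.

I. Overall architecture (enterprise standard)

Role assignment for the 3 nodes:

    Node      IP           Roles                   Purpose
    node-01   10.10.6.27   master, etcd, worker    control plane + workload Pods
    node-02   10.10.6.28   master, etcd, worker    control plane + workload Pods
    node-03   10.10.6.29   master, etcd, worker    control plane + workload Pods

Why this design:
- Three etcd members and three masters give high availability with no single point of failure.
- Resource utilization is as high as it can be, because every node also runs workloads.
- It is a common architecture for small and mid-size production environments.

II. Version choices

- K8s 1.28.15 (the last patch release of the 1.28 series; upstream Kubernetes has no long-term-support releases)
- containerd 1.7.x
- Calico network plugin
- metrics-server, Dashboard, ingress-nginx, local-path-provisioner

III. Prerequisites (run on all nodes)

1. Fix the CentOS 7 system repos first (this must be the first step)

(1) Back up the old repo files:

    mkdir -p /etc/yum.repos.d/backup
    mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/backup/

(2) Write a working CentOS 7 EOL repo that points at the Aliyun vault mirror:

    cat <<EOF > /etc/yum.repos.d/CentOS-Base.repo
    [base]
    name=CentOS-7-Base
    baseurl=https://mirrors.aliyun.com/centos-vault/7.9.2009/os/x86_64/
    gpgcheck=0
    enabled=1

    [extras]
    name=CentOS-7-Extras
    baseurl=https://mirrors.aliyun.com/centos-vault/7.9.2009/extras/x86_64/
    gpgcheck=0
    enabled=1

    [updates]
    name=CentOS-7-Updates
    baseurl=https://mirrors.aliyun.com/centos-vault/7.9.2009/updates/x86_64/
    gpgcheck=0
    enabled=1
    EOF

(3) Clear and rebuild the yum cache:

    yum clean all
    rm -rf /var/cache/yum
    yum makecache

2. Disable the firewall, SELinux and swap

    systemctl stop firewalld
    systemctl disable firewalld
    sed -i 's/enforcing/disabled/' /etc/selinux/config
    setenforce 0
    swapoff -a
    sed -i '/swap/s/^/#/' /etc/fstab

3. Hostnames and /etc/hosts resolution

    # /etc/hosts on every node (IPs must match the role table above)
    10.10.6.27 node-01
    10.10.6.28 node-02
    10.10.6.29 node-03

4. Kernel modules and kernel parameters (required)

Load the overlay and br_netfilter modules first; the net.bridge.* parameters only exist once br_netfilter is loaded.

    modprobe overlay
    modprobe br_netfilter
    cat <<EOF > /etc/modules-load.d/k8s.conf
    overlay
    br_netfilter
    EOF

    cat <<EOF > /etc/sysctl.d/k8s.conf
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward = 1
    EOF
    sysctl --system

5. 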
Install containerd

    yum install -y yum-utils device-mapper-persistent-data lvm2
    yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    yum install -y containerd.io
    containerd config default > /etc/containerd/config.toml
    sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
    systemctl enable containerd
    systemctl start containerd

Configure registry mirrors that are reachable from mainland China (important):

    # Back up the existing config
    cp /etc/containerd/config.toml /etc/containerd/config.toml.bak
    # Delete everything from the mirrors table to the end of the file,
    # then append the new mirror tables
    sed -i '/\[plugins."io.containerd.grpc.v1.cri".registry.mirrors\]/,$d' /etc/containerd/config.toml
    cat >> /etc/containerd/config.toml <<EOF
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
        endpoint = [
          "https://docker.mirrors.ustc.edu.cn",
          "https://hub-mirror.c.163.com",
          "https://mirror.ccs.tencentyun.com",
          "https://docker.xuanyuan.me"
        ]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
        endpoint = ["https://registry.aliyuncs.com/k8s_images"]
    EOF
    # Reload and restart containerd
    systemctl daemon-reload
    systemctl restart containerd

6. Add the K8s yum repo

Note: CentOS 7 is EOL, so the old base/updates repos return 404 and yum reports that no mirrors are available. That is why step 1 has to be done before this one.

    cat <<EOF > /etc/yum.repos.d/kubernetes.repo
    [kubernetes]
    name=Kubernetes
    baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.28/rpm/
    enabled=1
    gpgcheck=0
    exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
    EOF

7. 
Install kubeadm, kubelet and kubectl

    yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes --nogpgcheck
    systemctl enable kubelet
    systemctl start kubelet

- --disableexcludes=kubernetes: lifts the exclude= line of the kubernetes repo for this one command, so yum is allowed to install the K8s packages.
- --nogpgcheck: skips package signature verification, which avoids the signature errors seen with the Aliyun repo.

With the Aliyun kubernetes-new repo as configured above, both flags are needed. Without them the install is either blocked by the exclude list or fails with a GPG signature error.

8. Lock the versions and enable kubelet at boot

    yum install -y yum-plugin-versionlock
    yum versionlock add kubelet kubeadm kubectl
    systemctl enable kubelet

IV. Initialize the 3-node HA K8s cluster (core steps)

(1) Pre-pull all v1.28.15 images by hand (run on every node)

    # List the images this version needs
    kubeadm config images list --kubernetes-version=v1.28.15

Then pull them manually, using containerd's ctr (or docker, depending on your runtime). The tags below should match the list printed by the command above.

    # Pull from the Aliyun google_containers mirror and retag as registry.k8s.io
    # (coredns is pulled separately below because its path differs)
    IMAGES=(
      kube-apiserver:v1.28.15
      kube-controller-manager:v1.28.15
      kube-scheduler:v1.28.15
      kube-proxy:v1.28.15
      pause:3.9
      etcd:3.5.9-0
    )
    for img in "${IMAGES[@]}"; do
      ctr -n k8s.io images pull registry.cn-hangzhou.aliyuncs.com/google_containers/$img
      ctr -n k8s.io images tag registry.cn-hangzhou.aliyuncs.com/google_containers/$img registry.k8s.io/$img
    done

The path registry.cn-hangzhou.aliyuncs.com/google_containers/coredns/coredns:v1.10.1 does not exist on Aliyun: coredns sits directly under google_containers, not in a coredns/ subdirectory. The correct commands are:

    ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/coredns:v1.10.1
    ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/coredns:v1.10.1 registry.k8s.io/coredns/coredns:v1.10.1

On every node, point the containerd sandbox image at the same mirror:

    sed -i 's#sandbox_image = .*#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
    systemctl restart containerd

If your runtime is Docker, replace "ctr -n k8s.io images pull" with "docker pull", and do the same for tag.

A note on mirrors: registry.aliyuncs.com/k8sxio is a third-party sync maintained by an individual or small team; it is incomplete, slow to update, and misses some versions. registry.aliyuncs.com/google_containers is the Aliyun mirror; it is complete and stable and carries the commonly used versions.

(2) Run on node-03

    kubeadm init --control-plane-endpoint 10.10.6.29 \
      --apiserver-advertise-address=10.10.6.29 \
      --image-repository registry.aliyuncs.com/google_containers \
      --kubernetes-version v1.28.15 \
      --service-cidr=10.96.0.0/12 \
      --pod-network-cidr=10.244.0.0/16 \
      --ignore-preflight-errors=Swap \
      --upload-certs

On success, kubeadm prints two join commands: one for joining additional master (control-plane) nodes and one for joining worker nodes. Copy them for use on node-01 and node-02.

If init stalls with this output:

    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.

then kubelet is still not running properly, so the control plane cannot come up. The init command itself is not wrong; the problem lies in how containerd and kubelet work together. Run the following 4 steps on all 3 nodes.

(1) Reset kubeadm (required)

    kubeadm reset -f
    rm -rf /etc/kubernetes/
    rm -rf /var/lib/kubelet/
    rm -rf /var/lib/etcd/
    rm -rf ~/.kube/

(2) Fix the key containerd settings (the most important step)

    containerd config default > /etc/containerd/config.toml
    sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
    sed -i 's#registry.k8s.io/pause#registry.aliyuncs.com/google_containers/pause#g' /etc/containerd/config.toml

(3) Restart the services

    systemctl daemon-reload
    systemctl restart containerd
    systemctl enable kubelet
    systemctl restart kubelet

(4) Check the kubelet status again; it should now be starting normally

    systemctl status kubelet

As long as the status is not failed (shown in red), you can continue.

(3) Run on node-02 and node-01

    # Copy this command from the kubeadm init output on node-03
    kubeadm join 10.10.6.29:6443 --token xxxx \
      --discovery-token-ca-cert-hash sha256:yyyy \
      --control-plane --certificate-key zzzz

The token, hash and certificate key above are placeholders; the real values come from the log printed by kubeadm init on node-03.

V. Install the Calico network

1. 
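Install (run only on 10.10.6.29)

(1) Download the manifest locally first (the GitHub source is the recommended one)

    # Download the yaml file
    curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.5/manifests/calico.yaml -O
    # Or apply it directly (if you do this, skip the steps below)
    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.5/manifests/calico.yaml

v3.26.5 is a stable release that works with K8s 1.28.

(2) Set the Pod CIDR (it must match kubeadm init)

kubeadm init used --pod-network-cidr=10.244.0.0/16, so find these lines in calico.yaml:

    # - name: CALICO_IPV4POOL_CIDR
    #   value: "192.168.0.0/16"

and change them to:

    - name: CALICO_IPV4POOL_CIDR
      value: "10.244.0.0/16"

The two lines are commented out in the stock manifest, so they have to be uncommented as well as edited. Both can be done with sed:

    sed -i -e 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' \
           -e 's|#   value: "192.168.0.0/16"|  value: "10.244.0.0/16"|' calico.yaml

(3) Apply the local file

    kubectl apply -f calico.yaml

(4) Verify

    kubectl get nodes
    kubectl get pods -n kube-system | grep calico

The install has succeeded once every calico-xxx Pod is Running and the nodes are Ready.

(5) Troubleshooting: Calico does not start

From networks in mainland China the Calico images on quay.io / docker.io often cannot be pulled, so the Pods get stuck in Init:ImagePullBackOff. The steps below switch the images to a domestic mirror and redeploy.

a. Delete the failed Calico deployment

    kubectl delete -f calico.yaml
    # Confirm that all calico Pods are gone
    kubectl get pods -n kube-system | grep calico

b. Download the manifest again and switch to pullable images (v3.26.5)

    # Download the original upstream file
    wget https://raw.githubusercontent.com/projectcalico/calico/v3.26.5/manifests/calico.yaml
    # Key step: replace quay.io with the DaoCloud mirror
    sed -i 's#quay.io/calico/#docker.m.daocloud.io/calico/#g' calico.yaml
    # Also replace docker.io, which some versions use
    sed -i 's#docker.io/calico/#docker.m.daocloud.io/calico/#g' calico.yaml

docker.m.daocloud.io usually pulls quickly from inside China. Because this is a fresh download, repeat the Pod CIDR edit from step (2) before applying.

c. Apply the modified yaml

    kubectl apply -f calico.yaml

d. Watch the status (it normally settles within 2 to 3 minutes)

    kubectl get pods -n kube-system | grep calico

Expected result:

    calico-node-xxxx               1/1   Running
    calico-kube-controllers-xxxx   1/1   Running

e. Check metrics-server once all calico Pods are Running (applies only if metrics-server from section VII is already deployed)

    kubectl get pods -n kube-system | grep metrics-server

It moves from ContainerCreating to 1/1 Running on its own. After that:

    kubectl top nodes
    kubectl top pods -A

should show monitoring data.

2. 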
Sync the kubectl configuration

(1) Run on node-03

    # Copy to node-02
    scp /etc/kubernetes/admin.conf root@10.10.6.28:/etc/kubernetes/
    # Copy to node-01
    scp /etc/kubernetes/admin.conf root@10.10.6.27:/etc/kubernetes/

(2) Run on node-01 and node-02 (and on node-03 if kubectl is not configured there yet)

    mkdir -p $HOME/.kube
    cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    chown $(id -u):$(id -g) $HOME/.kube/config

(3) Done. Every node can now run:

    kubectl get nodes

VI. Allow masters to run workload Pods (common in this kind of setup)

    kubectl taint nodes --all node-role.kubernetes.io/control-plane-

VII. Essential add-on components

1. metrics-server (monitoring)

    # Apply the official yaml; the Pod is scheduled onto one of the masters.
    # Run this once, on one node only.
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

If GitHub cannot be reached (timeouts from networks in China), use the manifest below instead. It already uses a domestic image and skips kubelet certificate verification.

(1) Save the following as metrics-server.yaml

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: metrics-server
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:metrics-server
    rules:
    - apiGroups:
      - ""
      resources:
      - pods
      - nodes
      - nodes/stats
      - namespaces
      - configmaps
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - metrics.k8s.io
      resources:
      - pods
      - nodes
      verbs:
      - get
      - list
      - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:metrics-server
    subjects:
    - kind: ServiceAccount
      name: metrics-server
      namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: system:metrics-server
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: metrics-server
      namespace: kube-system
    spec:
      replicas: 1
      selector:
        matchLabels:
          k8s-app: metrics-server
      template:
        metadata:
          labels:
            k8s-app: metrics-server
        spec:
          serviceAccountName: metrics-server
          containers:
          - name: metrics-server
            # Aliyun mirror image, reachable from inside China
            image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.8.0
            args:
            - --cert-dir=/tmp
            - --secure-port=4443
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            # Key setting: skip kubelet
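            # certificate verification, which avoids x509 errors
            - --kubelet-insecure-tls
            ports:
            - containerPort: 4443
              name: https
              protocol: TCP
            resources:
              requests:
                cpu: 100m
                memory: 200Mi
              limits:
                cpu: 500m
                memory: 512Mi
            readinessProbe:
              httpGet:
                path: /healthz
                port: https
                scheme: HTTPS
              initialDelaySeconds: 30
              periodSeconds: 10
            livenessProbe:
              httpGet:
                path: /healthz
                port: https
                scheme: HTTPS
              initialDelaySeconds: 60
              periodSeconds: 30
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: metrics-server
      namespace: kube-system
      labels:
        k8s-app: metrics-server
    spec:
      selector:
        k8s-app: metrics-server
      ports:
      - port: 443
        targetPort: 4443

(2) Deploy it with one command

    kubectl apply -f metrics-server.yaml

(3) Make sure the nodes are schedulable and restart the Deployment

    kubectl uncordon node-01 node-02 node-03
    kubectl rollout restart deployment metrics-server -n kube-system

(4) Verify

    # Is the Pod Running?
    kubectl get pods -n kube-system | grep metrics-server
    # Node and Pod metrics
    kubectl top nodes
    kubectl top pods -A

If CPU and memory figures are shown, it works.

(5) Why these changes

- The image is switched to the Aliyun mirror, registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.8.0, so it can be pulled from inside China without timing out.
- --kubelet-insecure-tls skips verification of the kubelet's self-signed certificate and avoids x509 errors. This is acceptable for test or internal-network clusters.
- metrics-server is not installed per node. A single Deployment runs on one of the masters and that one instance collects metrics for all nodes.
- This trimmed manifest does not contain the v1beta1.metrics.k8s.io APIService or the auth-delegation RBAC objects that the official components.yaml ships. kubectl top depends on that APIService, so if no metrics appear, use the modified official manifest in step (6).

(6) If the above does not work, follow these steps

a. Delete the broken metrics-server

    kubectl delete -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    # Confirm that the metrics-server Pod is gone
    kubectl get pods -n kube-system | grep metrics-server

b. Download the official components.yaml and change two things (needed inside China)

    # Download
    wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.8.0/components.yaml
    # 1. Switch the image to the Aliyun mirror (fixes pull failures)
    sed -i 's|registry.k8s.io/metrics-server/metrics-server|registry.aliyuncs.com/google_containers/metrics-server|g' components.yaml
    # 2. Append --kubelet-insecure-tls (fixes the certificate error and CrashLoop)
    sed -i '/--kubelet-preferred-address-types/a\        - --kubelet-insecure-tls' components.yaml

c. Apply the modified manifest

    kubectl apply -f components.yaml

d. Wait 1 to 2 minutes and check the status

    kubectl get pods -n kube-system | grep metrics-server

It should become:

    metrics-server-xxxx   1/1   Running

e. 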
Verify (the key check)

    kubectl top nodes
    kubectl top pods -A

If CPU and memory figures are shown, metrics-server is fully working.

2. ingress-nginx (gateway)

(1) Clean up first: remove the Nginx app and all Ingress rules in the default namespace, then fully uninstall any existing ingress-nginx controller

a. Clean up the Nginx app in the default namespace (only if you created one earlier)

    # Delete the ingress routing rule
    kubectl delete ingress nginx-ingress
    # Delete the nginx Deployment and Service
    kubectl delete deploy nginx
    kubectl delete svc nginx
    # Check: the output must no longer list nginx
    kubectl get deploy,svc,ingress

b. Fully uninstall the ingress-nginx controller (covers both Helm and YAML installs; run in order)

    # 1. Check for a Helm release
    helm list -n ingress-nginx
    # Uninstall the release if there is one
    helm uninstall ingress-nginx -n ingress-nginx --ignore-not-found
    # 2. Delete the namespace (removes all pod/svc/cm/sa inside it)
    kubectl delete ns ingress-nginx --ignore-not-found
    # 3. Remove cluster-scoped leftovers (important, otherwise a reinstall conflicts)
    kubectl delete clusterrole ingress-nginx --ignore-not-found
    kubectl delete clusterrolebinding ingress-nginx --ignore-not-found
    kubectl delete validatingwebhookconfiguration ingress-nginx-admission --ignore-not-found
    # Remove the leftover ingressclass
    kubectl delete ingressclass nginx --ignore-not-found

c. Confirm the cleanup

    kubectl get ns | grep ingress
    kubectl get ingressclass
    # Cleanup is complete when no ingress-nginx or nginx resources are listed

(2) Install ingress-nginx

    # If foreign registries are reachable, apply the manifest directly:
    kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/baremetal/deploy.yaml

    # Otherwise (the usual case when foreign images cannot be pulled):
    # Download the baremetal manifest
    wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/baremetal/deploy.yaml
    # Replace the registry prefix with the DaoCloud mirror; versions and sha digests stay untouched
    sed -i 's|registry.k8s.io|m.daocloud.io/registry.k8s.io|g' deploy.yaml
    # Enable hostNetwork: true so the controller binds ports 80/443 on the host
    sed -i '/dnsPolicy: ClusterFirst/a\      hostNetwork: true' deploy.yaml
    kubectl apply -f deploy.yaml

Watch the status to confirm it works:

    kubectl get pod -n ingress-nginx -w

You should see something like:

    NAME                         READY   STATUS    RESTARTS   AGE
    ingress-nginx-xxxxxx-xxxxx   1/1     Running   0          10s

Normal sequence: the admission-create and admission-patch Pods go ImagePullBackOff → Pulling → Pulled → Completed, and the controller goes ContainerCreating → Running. After deploying, check the output of kubectl get pod -n ingress-nginx.

3. 
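Local storage class (automatic PVC provisioning)

    kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml
    kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

4. Dashboard (console)

(1) Remove any broken earlier install

    kubectl delete namespace kubernetes-dashboard --ignore-not-found

(2) Apply the official manifest through a CDN that is reachable from China

    kubectl apply -f https://cdn.jsdelivr.net/gh/kubernetes/dashboard@v2.7.0/aio/deploy/recommended.yaml

(3) Switch the images to a domestic mirror right away

    kubectl set image -n kubernetes-dashboard deployment/kubernetes-dashboard kubernetes-dashboard=registry.aliyuncs.com/google_containers/dashboard:v2.7.0
    kubectl set image -n kubernetes-dashboard deployment/dashboard-metrics-scraper dashboard-metrics-scraper=registry.aliyuncs.com/google_containers/metrics-scraper:v1.0.8

(4) Change the Service to NodePort so it is reachable from outside

    kubectl patch service kubernetes-dashboard -n kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'

(5) Check the status; both Pods should reach Running

    kubectl get pods -n kubernetes-dashboard -w

Expected:

    dashboard-metrics-scraper-xxx   1/1   Running
    kubernetes-dashboard-xxx        1/1   Running

(6) Look up the access port

    kubectl get svc -n kubernetes-dashboard

You will see something like 443:32632/TCP. Open https://10.10.6.29:32632 in a browser.

(7) Get a login token

    kubectl create serviceaccount admin -n kube-system
    kubectl create clusterrolebinding admin --clusterrole=cluster-admin --serviceaccount=kube-system:admin
    kubectl create token admin -n kube-system

Copy the token and use it to log in.

Hardware summary
- 3 nodes × 8 cores / 64 GB is enough for this kind of production use.
- The master components take roughly 2 cores and 4 GB per node, leaving about 6 cores and 60 GB per node for workloads.
- A 1 TB disk is enough for images, logs and data.

HA summary
- 3-node all-in-one HA K8s: highly available etcd and highly available masters.
- All 3 nodes run workloads and there is no single point of failure.

VIII. Shutting the K8s cluster down safely (standard order)

Overall order: workloads → data → control plane → nodes.
- Drain the nodes (evict Pods).
- Disable scheduling on the nodes.
- Shut down the worker/master nodes.
- Shut down the etcd nodes last (here all 3 nodes are etcd nodes).

1. Copy-and-paste commands for a safe shutdown

(1) On any master, drain all nodes (evicts Pods safely)

    kubectl get nodes | awk '/node-/ {print $1}' | xargs -I {} kubectl drain {} --force --ignore-daemonsets --delete-emptydir-data

(2) Disable scheduling so Pods do not move around during shutdown (drain already cordons the nodes, so this is a safety net)

    kubectl get nodes | awk '/node-/ {print $1}' | xargs -I {} kubectl cordon {}

2. 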
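Node shutdown order (very important)

Correct order for the 3 nodes: shut down node-02 first, then node-03, and node-01 (master1) last. The idea is to keep one fixed master up until the end so the sequence is predictable. Note that a 3-member etcd cluster loses quorum as soon as two members are down, so when powering back on, bring at least two nodes up before expecting the API server to respond.

3. Shutdown command on each machine (pick one)

    # Safe power-off
    systemctl poweroff
    # or
    shutdown -h now

4. Stopping only the K8s services without powering off (for maintenance or debugging)

Run on all nodes:

    systemctl stop kubelet
    systemctl stop containerd

To start them again:

    systemctl start containerd
    systemctl start kubelet

5. Destroying the cluster completely (for a reinstall)

Run on all nodes:

    kubeadm reset -f
    rm -rf /etc/kubernetes
    rm -rf /var/lib/kubelet
    rm -rf /var/lib/etcd
    rm -rf ~/.kube
    iptables -F
    iptables -t nat -F
    ip link delete cni0
    ip link delete flannel.1   # only exists if flannel was ever installed
    # Kill leftover processes
    pkill -f kube
    pkill -f etcd
    pkill -f apiserver
    pkill -f controller-manager
    pkill -f scheduler
    pkill -f containerd-shim
    pkill -f cni

Summary
- Safe shutdown: drain → disable scheduling → shut down nodes 2 and 3 → shut down node 1 last.
- Stop services: stop kubelet first, then containerd.
- Destroy the cluster: kubeadm reset -f → delete directories → clean up networking.

IX. Common problems

1. Swap is still on (the most common cause)

    free -m

If the Swap line is not 0, that is the problem. Fix:

    swapoff -a
    sed -i '/swap/s/^/#/' /etc/fstab

2. containerd is not using the systemd cgroup driver (an earlier edit may not have taken effect)

    grep SystemdCgroup /etc/containerd/config.toml

The output must be:

    SystemdCgroup = true

If it is false, run again:

    containerd config default > /etc/containerd/config.toml
    sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
    systemctl daemon-reload
    systemctl restart containerd

3. containerd itself is not running or reports errors

    systemctl status containerd
    journalctl -u containerd -n 20

kubelet depends on containerd: if containerd is down, kubelet cannot work either.

4. kubelet configuration files are missing or damaged

    ls -l /var/lib/kubelet/config.yaml
    ls -l /etc/kubernetes/kubelet.conf

Both files must exist, otherwise kubelet will not start.

5. Certificate problems (on a joined node)

    ls -l /etc/kubernetes/pki/

The node needs ca.crt and the kubelet certificates; without them it cannot connect to the master.

6. 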
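The Dashboard web page does not show the Pods in the kubernetes-dashboard namespace

Work through the checks below; in most cases the cause is a broken RBAC binding.

(1) Confirm on the command line that the Pods really exist

    # List all pods in the dashboard namespace
    kubectl get pods -n kubernetes-dashboard

If there is output, the Pods exist and this is purely a page permission problem. If there is no output, the dashboard components are not deployed.

(2) Check the permissions of the dashboard-admin account (the core check)

a. Test whether the account can list pods

    # Impersonate the ServiceAccount: "yes" means it has permission, "no" means the binding is broken
    kubectl auth can-i list pods -n kubernetes-dashboard \
      --as=system:serviceaccount:kubernetes-dashboard:dashboard-admin

b. If you earlier got an "already exists" error for dashboard-admin-binding, the existing binding most likely points at the wrong subject or is otherwise broken. Delete and recreate it:

    # Delete the old binding
    kubectl delete clusterrolebinding dashboard-admin-binding
    # Recreate it correctly
    kubectl create clusterrolebinding dashboard-admin-binding \
      --clusterrole=cluster-admin \
      --serviceaccount=kubernetes-dashboard:dashboard-admin

The binding format is namespace:ServiceAccount-name, here kubernetes-dashboard:dashboard-admin. When the old binding is broken, deleting and recreating it is the most reliable fix.

c. Log out of the web page and log in with a fresh token

    # Generate a token valid for 30 days
    kubectl create token dashboard-admin -n kubernetes-dashboard --duration=720h

Clear the browser cache or open the Dashboard in a private window, paste the new token, and select the kubernetes-dashboard namespace in the drop-down at the top. The Pods should now be visible.

7. Listing all K8s namespaces

(1) List all namespaces (the most common form)

    kubectl get ns

or, written out in full:

    kubectl get namespaces

The result looks like this:

    NAME                   STATUS   AGE
    default                Active   45d
    kube-node-lease        Active   45d
    kube-public            Active   45d
    kube-system            Active   45d
    kubernetes-dashboard   Active   45d

(2) What each namespace is for

- default: the default namespace; resources land here when no namespace is given.
- kube-system: K8s system components (kube-proxy, coredns, metrics-server, calico).
- kube-public: publicly readable information.
- kube-node-lease: node heartbeats.
- kubernetes-dashboard: the UI panel installed above.

(3) Listing the Pods in one namespace

For example, the Pods in kube-system:

    kubectl get pods -n kube-system

The Dashboard Pods:

    kubectl get pods -n kubernetes-dashboard

8. If the problem persists, read the logs

    journalctl -u kubelet -n 100 --no-pager

Work from the error messages in the log to decide what to fix.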
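
The node-level checks from problems 1 to 5 above can be bundled into one read-only script. The following is only a minimal sketch, not part of the original procedure: the helper names (swap_total_kb, cgroup_is_systemd, swap_is_off, report, run_checks) are hypothetical, and the file paths are the ones used earlier in this guide. It changes nothing on the node; it only prints OK or FAIL per check.

```shell
#!/bin/sh
# Read-only node checks for the common kubelet start-up problems listed above.
# All helper names are illustrative assumptions.

# Print the SwapTotal value (in kB) from a meminfo-style file.
swap_total_kb() {
  awk '/^SwapTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# Succeed only if swap is reported as 0 kB.
swap_is_off() {
  [ "$(swap_total_kb "${1:-/proc/meminfo}")" = "0" ]
}

# Succeed if the containerd config enables the systemd cgroup driver.
cgroup_is_systemd() {
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' \
    "${1:-/etc/containerd/config.toml}" 2>/dev/null
}

# Run a command quietly and print "OK   <label>" or "FAIL <label>".
report() {
  label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $label"
  else
    echo "FAIL $label"
  fi
}

run_checks() {
  report "swap disabled"                   swap_is_off
  report "containerd uses systemd cgroups" cgroup_is_systemd
  report "containerd service active"       systemctl is-active --quiet containerd
  report "kubelet config present"          test -f /var/lib/kubelet/config.yaml
  report "kubelet kubeconfig present"      test -f /etc/kubernetes/kubelet.conf
  report "cluster CA present"              test -f /etc/kubernetes/pki/ca.crt
}

run_checks
```

Run it as root on each node. Every FAIL line corresponds to one of the numbered problems above; the kubelet log command in problem 8 remains the next step when all checks pass but kubelet still does not start.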