Play with sunbeam again (by quqi99)

Posted: 2026/6/19 22:58:34

Author: Zhang Hua. Published 2026-06-05. Copyright notice: this article may be freely reproduced, but any reproduction must credit the original source and author with a hyperlink and include this copyright notice (http://blog.csdn.net/quqi99).

Problem

I played with sunbeam before (Using sunbeam to deploy openstack - https://zhhuabj.blog.csdn.net/article/details/133840856), but those commands are all outdated now. This time sunbeam is deployed on a company test machine. That machine sits entirely on an internal network and can only reach the Internet through a proxy.

The trick is that the juju controller IP must be covered by no_proxy. no_proxy has to be set in three places:

- In /etc/environment. After setting it, you must log in via ssh again for it to take effect.
- In snap:

sudo snap set system proxy.no-proxy="$NO_PROXY"
sudo snap get system proxy

- Above all, in juju:

juju model-config -m localhost-localhost:controller no-proxy=$NO_PROXY juju-no-proxy=$NO_PROXY apt-no-proxy=$NO_PROXY
juju model-config -m localhost-localhost:openstack-machines no-proxy=$NO_PROXY juju-no-proxy=$NO_PROXY apt-no-proxy=$NO_PROXY
juju model-config -m localhost-localhost:controller | grep -i no-proxy
juju model-config -m localhost-localhost:openstack-machines | grep -i no-proxy

Steps

Here are the steps:

# We don't have permission to create a flavor with root-disk=100G mem=16G cores=8, so we use a volume-backed root disk instead (root-disk-source=volume)
juju add-model sunbeam
juju add-machine --base ubuntu@24.04 --constraints "root-disk=100G mem=16G cores=8"

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
openstack keypair create --public-key ~/.ssh/id_rsa.pub mykey
openstack server create --image auto-sync/ubuntu-noble-24.04-amd64-server-20260518-disk1.img --flavor shared.xlarge --key-name mykey --network net_stg-reproducer-zhhuabj-ps7-psd-extra --boot-from-volume 100 sunbeam
SUNBEAM_VM_IP=10.159.26.62
ssh -i ~/.ssh/id_rsa ubuntu@$SUNBEAM_VM_IP -v
openstack volume create --size 10 --type Ceph_NVMe ceph1
openstack server add volume sunbeam ceph1
openstack volume create --size 10 --type Ceph_NVMe ceph2
openstack server add volume sunbeam ceph2
openstack volume create --size 10 --type Ceph_NVMe ceph3
openstack server add volume sunbeam ceph3

sudo parted /dev/sdb mklabel gpt
sudo parted /dev/sdb mkpart primary ext4 0% 100%
sudo mkfs.ext4 /dev/sdb1
UUID=$(sudo blkid -s UUID -o value /dev/sdb1)
sudo mkdir -p /data/sunbeam
echo "UUID=$UUID /data/sunbeam ext4 defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
# chown after mounting; before the mount it only changes the hidden mount-point directory
sudo chown -R $USER:$USER /data/sunbeam
df -h

juju ssh 0

# Reset the environment if necessary
sudo snap remove --purge openstack
sudo snap remove --purge juju
sudo snap remove --purge juju-db
sudo snap remove --purge kubectl
sudo /usr/sbin/remove-juju-services
sudo rm -rf /var/lib/juju
rm -rf ~/.local/share/juju
rm -rf ~/snap/juju/
rm -rf ~/snap/openstack
rm -rf ~/snap/openstack-hypervisor
rm -rf ~/snap/microstack/
rm -rf ~/snap/microk8s/
sudo snap remove --purge vault
sudo snap remove --purge microk8s
sudo snap remove --purge openstack-hypervisor
rm -rf ~/.local/share/openstack/deployments.yaml
# It's best to reboot; otherwise some leftover calico NICs and namespaces may become inaccessible
sudo init 6

# https://canonical-openstack.readthedocs-hosted.com/en/latest/tutorial/get-started-with-openstack/
# Configure passwordless sudo for the currently logged-in user
echo "$USER ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/nopasswd
sudo chmod 440 /etc/sudoers.d/nopasswd

# Setting no_proxy here is critical. Suppose the controller IP is 10.250.150.28/24: then no_proxy must include 10.250.150.0/24. Once it is set correctly, juju show-user --debug shows proxy=direct.
# The controller IP only exists after running "sunbeam prepare-node-script --bootstrap | bash -x" and "newgrp snap_daemon", so no_proxy would have to be updated afterwards. For simplicity, 10.250.150.0/24 was added from the start.
# Don't add arbitrary entries to NO_PROXY either. If cloud-images.ubuntu.com is added, you get "no matching image found".
NO_PROXY=localhost,127.0.0.1,::1,10.149.95.128/25,172.24.0.0/24,172.22.0.0/24,172.20.0.0/24,172.26.0.0/24,172.28.0.0/24,10.159.26.128/25,10.159.25.128/25,10.159.26.0/25,10.250.150.0/24;
export HTTP_PROXY=http://egress.ps7.internal:3128 HTTPS_PROXY=http://egress.ps7.internal:3128 NO_PROXY=$NO_PROXY http_proxy=http://egress.ps7.internal:3128 https_proxy=http://egress.ps7.internal:3128 no_proxy=$NO_PROXY JUJU_DATA=$HOME/.local/share/juju;
juju show-user --debug | sed -n 1,40p
# ERROR LOG: api dial attempt failed: url=wss://252.46.0.1:17070/api address=wss://252.46.0.1:17070 ip=252.46.0.1:17070 attempt=1 proxy=http://egress.ps7.internal:3128 elapsed=2ms err=Forbidden
# RIGHT LOG: api dial attempt succeeded: url=wss://10.159.26.151:17070/api ip=10.159.26.151:17070 attempt=1 proxy=direct elapsed=6ms

# Make sure you don't add cloud-images.ubuntu.com to no_proxy
curl https://cloud-images.ubuntu.com/releases/ -o /dev/null
sudo snap install openstack --channel 2024.1/stable
# First write the environment variables to /etc/environment
echo HTTP_PROXY=http://egress.ps7.internal:3128 | sudo tee -a /etc/environment
echo HTTPS_PROXY=http://egress.ps7.internal:3128 | sudo tee -a /etc/environment
echo NO_PROXY=$NO_PROXY | sudo tee -a /etc/environment
# Log out of ssh and log in again so they take effect
env | grep -i proxy
rm -rf ~/.local/share/juju/controllers.yaml
sudo remove-juju-services
sunbeam prepare-node-script --bootstrap | bash -x
newgrp snap_daemon
lxc exec juju-03be01-0 -- tail -f /var/log/syslog
# However, the juju controller created in the previous step has IP 10.9.136.162, which is not inside the 10.250.150.0/24 we put in NO_PROXY.
# So 10.9.136.0/24 has to be added, followed by another ssh login for it to take effect. After that, juju status no longer hangs.
juju status
juju show-user --debug | sed -n 1,40p
# Then run the bootstrap
newgrp snap_daemon
sunbeam cluster bootstrap --accept-defaults --role control,compute,storage
tail -f ~/snap/openstack/common/logs/sunbeam*
tail -f ~/snap/openstack/common/etc/*/deploy-sunbeam-machine/terraform-apply-*.log
juju machines

ubuntu@sunbeam:~$ juju machines
Machine  State    Address       Inst id        Base          AZ       Message
0        started  10.9.136.162  juju-03be01-0  ubuntu@24.04  sunbeam  Running

ubuntu@sunbeam:~$ juju models
Controller: localhost-localhost

Model               Cloud/Region         Type    Status     Machines  Cores  Units  Access  Last connection
controller*         localhost/localhost  lxd     available  1         -      1      admin   just now
openstack-machines  close-swine/default  manual  available  1         8      -      admin   2 minutes ago

# With /etc/environment set (and after logging in via ssh again), the bootstrap in the previous step sets the juju proxy config automatically.
# The snap proxy, however, is still missing no_proxy.
ubuntu@sunbeam:~$ juju model-config -m localhost-localhost:controller | grep -i no-proxy
apt-no-proxy   default
juju-no-proxy  controller  localhost,127.0.0.1,::1,10.149.95.128/25,172.24.0.0/24,172.22.0.0/24,172.20.0.0/24,172.26.0.0/24,172.28.0.0/24,10.159.26.128/25,10.159.25.128/25,10.159.26.0/25,10.250.150.0/24,10.159.26.62,10.9.136.1/24
no-proxy       controller  localhost,127.0.0.1,::1,10.149.95.128/25,172.24.0.0/24,172.22.0.0/24,172.20.0.0/24,172.26.0.0/24,172.28.0.0/24,10.159.26.128/25,10.159.25.128/25,10.159.26.0/25,10.250.150.0/24,10.159.26.62,10.9.136.1/24
ubuntu@sunbeam:~$ juju model-config -m localhost-localhost:openstack-machines | grep -i no-proxy
apt-no-proxy   default
juju-no-proxy  model    10.149.95.128/25,10.9.136.0/24,172.22.0.0/24,10.159.26.0/25,10.152.183.0/24,172.26.0.0/24,172.20.0.0/24,10.159.25.128/25,172.24.0.0/24,10.159.26.128/25,172.28.0.0/24,localhost,127.0.0.1,10.1.0.0/16,::1,.svc.cluster.local,.svc
no-proxy       default  127.0.0.1,localhost,::1
ubuntu@sunbeam:~$ sudo snap get system proxy
Key          Value
proxy.http   http://egress.ps7.internal:3128
proxy.https  http://egress.ps7.internal:3128
proxy.store

# The snap no_proxy must be set too. Otherwise bootstrap fails with: ERROR unable to contact api server after 0 attempts: unknown error in bootstrap api connect: unable to connect to API: Forbidden
sudo snap set system proxy.no-proxy="$NO_PROXY"
sudo snap get system proxy
sunbeam cluster bootstrap --accept-defaults --role control,compute,storage
tail -f /home/ubuntu/snap/openstack/common/logs/*
sudo k8s kubectl get pods --all-namespaces
alias kubectl='sudo /snap/k8s/current/bin/kubectl'
source <(kubectl completion bash)
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl
sudo /snap/k8s/current/bin/kubectl get pods --all-namespaces
sudo /snap/k8s/current/bin/ctr namespaces list
sudo /snap/k8s/current/bin/ctr -n k8s.io images ls
sudo /snap/k8s/current/bin/ctr version

# I then switched to simply using NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/16. That produced the error below, which went away after also setting the juju proxy config:
# subprocess.CalledProcessError: Command '['/snap/openstack/1005/juju/bin/juju', 'migrate', 'localhost-localhost:admin/openstack-machines', 'sunbeam-controller']' returned non-zero exit status 1.
juju model-config -m localhost-localhost:controller no-proxy=$NO_PROXY juju-no-proxy=$NO_PROXY apt-no-proxy=$NO_PROXY
juju model-config -m localhost-localhost:openstack-machines no-proxy=$NO_PROXY juju-no-proxy=$NO_PROXY apt-no-proxy=$NO_PROXY
juju model-config -m localhost-localhost:controller | grep -i no-proxy
juju model-config -m localhost-localhost:openstack-machines | grep -i no-proxy
sunbeam cluster bootstrap --accept-defaults --role control,compute,storage
sudo /snap/k8s/current/bin/ctr -n k8s.io images ls
sunbeam utils juju-login
sunbeam configure --accept-defaults --openrc demo-openrc
sunbeam launch ubuntu --name test
ssh -i /home/ubuntu/snap/openstack/1005/sunbeam ubuntu@172.16.2.44
sudo microceph.ceph status

# Add loop-device backed disks to microceph
for l in a b c; do
  loop_file="$(sudo mktemp -p /mnt XXXX.img)"
  sudo truncate -s 1G "${loop_file}"
  loop_dev="$(sudo losetup --show -f "${loop_file}")"
  # the block-devices plug doesn't allow accessing /dev/loopX
  # devices so we make those same devices available under alternate
  # names (/dev/sdiY)
  minor="${loop_dev##/dev/loop}"
  sudo mknod -m 0660 "/dev/sdi${l}" b 7 "${minor}"
  sudo microceph disk add --wipe "/dev/sdi${l}"
done
sudo microceph disk list

How to deploy sunbeam on machines in China

Because of local "characteristics", a home machine in China is, like the company machines abroad, effectively on an internal network, so the approach above applies equally. One thing to note: in China, don't experiment directly on the physical machine, because it has IPv6. An IPv4-only LXD container works better. It is tedious, so the details are skipped.

To check reachability of cloud-images.ubuntu.com over IPv4 and IPv6, both directly and through the proxy:

curl -4 -k -I --max-time 15 https://cloud-images.ubuntu.com/releases/streams/v1/index.json
curl -6 -k -I --max-time 15 https://cloud-images.ubuntu.com/releases/streams/v1/index.json
curl -4 -k -I -x http://192.168.99.179:3128 --max-time 15 https://cloud-images.ubuntu.com/releases/streams/v1/index.json
curl -6 -k -I -x http://[2409:8a00:7881:20c0:a236:bcff:fe58:2bff]:3128 --max-time 15 https://cloud-images.ubuntu.com/releases/streams/v1/index.json

sunbeam source code

The main entry point of sunbeam is snap-openstack/sunbeam-python/sunbeam/main.py. Installing sunbeam consists of two commands: 'sunbeam prepare-node-script --bootstrap' and 'sunbeam cluster bootstrap --accept-defaults --role control,compute,storage'.

1. Running bootstrap invokes the registered provider-specific commands. The providers are local and maas.
2. How does sunbeam turn high-level commands into charm deployments? It first initializes clusterd, then prepares the juju controller/model. Finally it deploys the MicroOVN/MicroCeph/OpenStack/hypervisor charms with several Terraform plans.
3. sunbeam bootstrap roughly does the following:
   1) runs preflight checks to confirm basic prerequisites such as juju, lxd, permissions and hostname;
   2) asks for or reads the management CIDR;
   3) initializes sunbeam's own cluster DB, i.e. clusterd;
   4) prepares the juju controller, model, spaces and k8s cloud;
   5) runs several terraform plans: microovn-plan, microceph-plan, cinder-volume-plan, openstack-plan, hypervisor-plan;
   6) finally records the bootstrapped state.
4. Sunbeam's own step executor is very simple: run_plan() runs each BaseStep in order and raises a ClickException on failure (see common.py:309-357). A sunbeam "high-level command" is therefore essentially many small steps chained together.
5. clusterd is sunbeam's local/cluster state service. Its Python client supports both a unix socket and HTTPS with mTLS. clusterd stores nodes, roles, Juju users, manifests and Terraform state/locks, among other things. The client API exposes /1.0/nodes, /1.0/config, /1.0/manifests and /1.0/terraformstate.
6. The Terraform layer is the key to how sunbeam deploys charms.
   - TerraformHelper.backend_config() injects clusterd as the Terraform HTTP backend, for example /1.0/terraformstate/{plan} and /1.0/terraformlock/{plan} (see terraform.py:111-143).
   - apply() then runs the Terraform binary shipped in the snap with terraform apply -auto-approve -json.
   - For the shape of a Terraform plan, look at the hypervisor example. It deploys openstack-hypervisor with juju_application, then wires up the AMQP, Keystone, CA, OVN, Nova and other relations with juju_integration.

Note: Restarting ovn-controller on computes to pick up the new certificate wipes the br-ex/eth1 port mapping (lp #2147582). Operators must either wait for the reapply-patches cron to restore it or manually run "ovs-vsctl add-port br-ex eth1" on every compute.

A bug

The self-signed-certificates charm is failing to auto-renew 90-day TLS leaf certificates for most applications before they expire. Some applications, such as traefik, renew successfully. Others, such as ovn-central, ovn-relay, neutron and openstack-hypervisor, fail and keep using their original certificates until those expire. This causes a data-plane outage, followed by crashes when these services later try to re-establish their TLS connections.
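The provider does not re-sign an unchanged CSR merely because its certificate is about to expire. So before applying any workaround, it helps to find out which applications are actually running on near-expiry leaf certificates.

Below is a minimal shell sketch. It assumes the PEM certificate has already been saved to a local file, for example copied out of the get-issued-certificates action output. It wraps `openssl x509 -checkend`, which exits non-zero when the certificate expires within the given number of seconds. The helper names are hypothetical and are not part of sunbeam or the charm.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): check how close a PEM leaf certificate is to expiry.
# Assumes the certificate was already copied to a local file.

# cert_expires_within FILE DAYS
#   exit 0 -> expires (or has already expired) within DAYS days
#   exit 1 -> still valid for at least DAYS days
#   exit 2 -> FILE is not a parsable certificate
cert_expires_within() {
  local cert_file=$1 days=$2
  # Refuse to answer if the file cannot be parsed as a certificate.
  openssl x509 -noout -in "$cert_file" 2>/dev/null || return 2
  # -checkend N exits 0 if the cert will NOT expire within the next N seconds.
  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert_file" >/dev/null; then
    return 1
  fi
  return 0
}

# print_expiry FILE: print the certificate's notAfter line for reference.
print_expiry() {
  openssl x509 -enddate -noout -in "$1"
}
```

Example: `cert_expires_within neutron.pem 30 && echo "neutron: leaf cert expires within 30 days"`. Both the file name and the 30-day threshold are arbitrary choices for illustration.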
One workaround is to recreate the certificates relation for each failed application (replace <app> with its name):

juju remove-relation certificate-authority:certificates <app>:certificates
juju integrate certificate-authority:certificates <app>:certificates
juju remove-relation -m admin/openstack-machines openstack-hypervisor:certificates certificate-authority
juju integrate -m admin/openstack-machines openstack-hypervisor:certificates controller00.maas/openstack.certificate-authority

The key code path is as follows:

1. self-signed-certificates, acting as the provider, only handles "outstanding CSRs". At the end of _configure() it calls _process_outstanding_certificate_requests() (see charm.py:301-320 and charm.py:414-426).
2. The provider lib defines "outstanding" as "this CSR does not yet have a corresponding issued certificate". Suppose an issued certificate's CSR is identical to the current CSR and the certificate matches that CSR. Then the CSR is not considered outstanding (see _tls_certificates.py:3407-3440).
3. So the provider will not proactively re-sign the same CSR just because the leaf cert is about to expire. The requirer side has to delete or replace the old CSR to produce a new pending CSR.
4. Sunbeam charms create TLSCertificatesRequiresV4 through TlsCertificatesHandler (see relation_handlers.py:1010-1084). This handler's update_relation_data() just calls self.certificates.sync().
5. The requirer lib vendored by the sunbeam charms behaves as follows:
   - Its _configure() generates the private key, cleans up CSRs, sends CSRs and looks up available certificates. It does not scan for certificates nearing expiry (see tls_certificates.py:1732-1750).
   - Its actual auto-renewal entry point is the Juju secret_expired event. _on_secret_expired() reads the CSR from the secret, then _renew_certificate_request(csr) deletes the old CSR and resends it (see tls_certificates.py:1770-1820).
6. Traefik's vendored lib is different. Its _configure() additionally calls _renew_expiring_certificates() at the end (see tls_certificates.py:1766-1784). That function proactively triggers renewal when a certificate is close to, but not yet past, its expiry (see tls_certificates.py:2267-2294).

The most likely failure chain is …

In a test environment, you can make some non-destructive observations:

juju run -m openstack certificate-authority/0 get-issued-certificates
juju debug-log -m openstack --include certificate-authority --include neutron --include ovn-central
juju debug-log -m openstack-machines --include openstack-hypervisor

20260618 - More concise steps

The key is the no_proxy setting. NO_PROXY needs the following:

- The initial IPs copied from your bastion: 127.0.0.1,localhost,::1,10.159.19.128/25,10.159.20.128/25,10.159.20.0/25
- The management CIDR and the MetalLB/load-balancer CIDR mentioned in …: 10.121.193.0/24,10.20.21.0/27
- The juju network (the controller's IP can be seen by running lxc list): 10.187.221.0/24
- The sunbeam management network: 172.16.1.0/24

1. We do not have permission to create a flavor with 'root-disk=100G mem=16G cores=8', so we use a volume-backed root disk instead.

openstack server create --image auto-sync/ubuntu-noble-24.04-amd64-server-20260518-disk1.img --flavor shared.xlarge --key-name mykey --network net_stg-reproducer-zhhuabj-ps7-psd-extra --boot-from-volume 100 sunbeam
for i in 1 2 3; do
  openstack volume create --size 10 --type Ceph_NVMe ceph$i
  openstack server add volume sunbeam ceph$i
done

2. Set the following environment variables, then log in again so they take effect.

echo HTTP_PROXY=http://egress.ps7.internal:3128 | sudo tee -a /etc/environment
echo HTTPS_PROXY=http://egress.ps7.internal:3128 | sudo tee -a /etc/environment
echo NO_PROXY=localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/16 | sudo tee -a /etc/environment

NOTE: I used localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/16. Munir used the following no_proxy. Both work.

NO_PROXY=10.121.193.0/24,10.1.0.0/16,10.159.19.128/25,10.159.20.0/25,10.187.221.0/24,localhost,::1,127.0.0.1,10.20.21.0/27,10.152.183.0/24,10.159.20.128/25,172.16.1.0/24

3. Follow the official guide to start the sunbeam installation - https://canonical-openstack.readthedocs-hosted.com/en/latest/tutorial/get-started-with-openstack/

sudo snap install openstack --channel 2024.1/stable
sunbeam prepare-node-script --bootstrap | bash -x
newgrp snap_daemon
sunbeam cluster bootstrap --accept-defaults --role control,compute,storage

4. Fill in the missing no_proxy in snap. Without it, bootstrap fails with "unable to connect to API: Forbidden".

ubuntu@sunbeam:~$ sudo snap get system proxy
Key          Value
proxy.http   http://egress.ps7.internal:3128
proxy.https  http://egress.ps7.internal:3128
proxy.store
ubuntu@sunbeam:~$ sudo snap set system proxy.no-proxy="$NO_PROXY"
ubuntu@sunbeam:~$ sudo snap get system proxy
Key             Value
proxy.http      http://egress.ps7.internal:3128
proxy.https     http://egress.ps7.internal:3128
proxy.no-proxy  localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/16
proxy.store

5. Bootstrap the cloud

sunbeam cluster bootstrap --accept-defaults --role control,compute,storage

6. Configure the cloud

sunbeam configure --accept-defaults --openrc demo-openrc
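The first attempt in this post failed because the controller came up at 10.9.136.162, and the 10.250.150.0/24 entry does not cover that address. Before bootstrapping, a small check can confirm that a controller or API address falls inside one of the NO_PROXY entries.

This is a minimal pure-bash sketch with hypothetical function names. It only understands IPv4 addresses and IPv4 CIDRs, and it skips host names and IPv6 entries such as ::1. It also assumes the client honours CIDR entries, as juju did above. Not every tool does: older curl releases, for instance, do not understand CIDR notation in no_proxy.

```bash
#!/usr/bin/env bash
# Sketch: does a NO_PROXY list cover a given IPv4 address?
# Only IPv4 literals and IPv4 CIDRs are evaluated; host names and IPv6
# entries (localhost, .svc, ::1, ...) are skipped.

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer
ip_to_int() {
  local IFS=.
  local -a octets
  local part
  read -r -a octets <<< "$1"
  (( ${#octets[@]} == 4 )) || return 1
  for part in "${octets[@]}"; do
    [[ $part =~ ^[0-9]{1,3}$ ]] && (( 10#$part <= 255 )) || return 1
  done
  echo "$(( (10#${octets[0]} << 24) | (10#${octets[1]} << 16) | (10#${octets[2]} << 8) | 10#${octets[3]} ))"
}

# ip_in_cidr IP NET[/BITS] -> exit 0 if IP lies inside NET/BITS (no /BITS means /32)
ip_in_cidr() {
  local ip=$1 cidr=$2 net bits=32 ip_i net_i mask=0
  net=${cidr%%/*}
  if [[ $cidr == */* ]]; then bits=${cidr#*/}; fi
  [[ $bits =~ ^[0-9]{1,2}$ ]] || return 1
  bits=$((10#$bits))
  (( bits <= 32 )) || return 1
  ip_i=$(ip_to_int "$ip") || return 1
  net_i=$(ip_to_int "$net") || return 1
  if (( bits > 0 )); then
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  fi
  (( (ip_i & mask) == (net_i & mask) ))
}

# no_proxy_covers IP "entry1,entry2,..." -> exit 0 if any IPv4 entry matches IP
no_proxy_covers() {
  local ip=$1 entry
  local re='^[0-9]+(\.[0-9]+){3}(/[0-9]+)?$'
  local -a entries
  IFS=, read -r -a entries <<< "$2"
  for entry in "${entries[@]}"; do
    entry=${entry//[[:space:]]/}
    if [[ $entry =~ $re ]] && ip_in_cidr "$ip" "$entry"; then
      return 0
    fi
  done
  return 1
}
```

Usage: `no_proxy_covers 10.9.136.162 "$NO_PROXY" || echo "controller IP not covered by NO_PROXY"`. The controller IP can be read from `juju machines` or `lxc list`. With the first list in this post the check fails, and with localhost,127.0.0.1,10.0.0.0/8,172.16.0.0/16 it passes.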
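To tell whether juju reaches its API directly or through the proxy, filter the output of `juju show-user --debug` for the dial lines shown in the ERROR LOG and RIGHT LOG comments earlier in this post. This is a minimal sketch with hypothetical function names. It assumes the key=value layout of those two sample lines (ip=..., proxy=...) stays the same in other juju versions.

```bash
#!/usr/bin/env bash
# Sketch: classify juju API dial attempts as direct or proxied.
# Assumes the key=value layout of the sample log lines (ip=..., proxy=...).

# log_field KEY LINE -> prints the value of KEY=value found in LINE
log_field() {
  local re="(^|[[:space:]])$1=([^[:space:]]+)"
  [[ $2 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[2]}"
}

# check_dial_log: reads log lines on stdin; prints one verdict per API dial
# and exits 1 if any dial went through a proxy.
check_dial_log() {
  local line target proxy rc=0
  while IFS= read -r line; do
    [[ $line == *"api dial attempt"* ]] || continue
    target=$(log_field ip "$line") || continue
    proxy=$(log_field proxy "$line") || continue
    if [[ $proxy == direct ]]; then
      printf 'DIRECT %s\n' "$target"
    else
      printf 'PROXIED %s via %s\n' "$target" "$proxy"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `juju show-user --debug 2>&1 | check_dial_log`. A PROXIED line for the controller address means that address is still missing from no_proxy.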
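This post ends up setting the same no_proxy value in three places: /etc/environment, snapd and the juju model configs. The helper below is a sketch that prints the snapd and juju commands used above for a given value, so they can be reviewed before running. Setting APPLY=1 executes them instead; APPLY is a hypothetical switch of this sketch, not a sunbeam option. The default model names are the ones from this deployment.

```bash
#!/usr/bin/env bash
# Sketch: print (default) or run (APPLY=1) the commands that push one
# NO_PROXY value into snapd and into the juju models used by sunbeam.

# run CMD...: execute when APPLY=1, otherwise only print the command line.
run() {
  if [[ ${APPLY:-0} == 1 ]]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# sync_no_proxy NO_PROXY_VALUE [MODEL...]
sync_no_proxy() {
  local value=$1
  shift
  local -a models=("$@")
  local m
  if [[ -z $value ]]; then
    echo "sync_no_proxy: empty NO_PROXY value" >&2
    return 1
  fi
  if (( ${#models[@]} == 0 )); then
    models=(localhost-localhost:controller localhost-localhost:openstack-machines)
  fi
  run sudo snap set system "proxy.no-proxy=$value" || return
  for m in "${models[@]}"; do
    run juju model-config -m "$m" \
      "no-proxy=$value" "juju-no-proxy=$value" "apt-no-proxy=$value" || return
  done
}
```

Usage: `sync_no_proxy "$NO_PROXY"` prints three commands. `APPLY=1 sync_no_proxy "$NO_PROXY"` runs them and stops at the first failing one.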
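When several applications hit the certificate-renewal bug, the remove-relation/integrate workaround has to be repeated per application. This sketch just prints the command pairs. It assumes the endpoint naming shown in the workaround: the <app>:certificates endpoint, plus either the local certificate-authority endpoint or, for the machine model, the cross-model offer URL. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the remove-relation/integrate pairs of the certificates
# workaround for every affected application in one model.

# print_cert_workaround MODEL REMOVE_TARGET INTEGRATE_TARGET APP...
print_cert_workaround() {
  if (( $# < 4 )); then
    echo "usage: print_cert_workaround MODEL REMOVE_TARGET INTEGRATE_TARGET APP..." >&2
    return 1
  fi
  local model=$1 remove_target=$2 integrate_target=$3 app
  shift 3
  for app in "$@"; do
    printf 'juju remove-relation -m %s %s:certificates %s\n' "$model" "$app" "$remove_target"
    printf 'juju integrate -m %s %s:certificates %s\n' "$model" "$app" "$integrate_target"
  done
}
```

Usage: `print_cert_workaround admin/openstack-machines certificate-authority controller00.maas/openstack.certificate-authority openstack-hypervisor` reproduces the hypervisor commands from the workaround.

- Juju removes relations asynchronously, so check `juju status --relations` before running each matching integrate.
- For openstack-hypervisor, keep in mind the note about ovn-controller restarts wiping the br-ex/eth1 port mapping.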
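The sunbeam source notes describe run_plan() as running each step in order and stopping at the first failure. That behaviour can be mimicked in a few lines of shell. This is only an analogy based on that description, not sunbeam's Python code. Each "step" here is a shell function, and a non-zero exit status stands in for a failed BaseStep/ClickException.

```bash
#!/usr/bin/env bash
# Sketch: a shell analogue of "run each step in order, stop at the first failure".
# Each step is a shell function; a non-zero status plays the role of a failed step.

# run_plan STEP...
run_plan() {
  local step
  for step in "$@"; do
    printf '==> %s\n' "$step"
    if ! "$step"; then
      printf 'ERROR: step %s failed, aborting plan\n' "$step" >&2
      return 1
    fi
  done
}
```

Usage: `run_plan preflight_checks init_state deploy_plans`, where those are your own (hypothetical) step functions. Steps after the first failing one never run, which is why one broken precondition stops a whole `sunbeam cluster bootstrap`.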
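Item 6 of the source notes says TerraformHelper.backend_config() points Terraform's HTTP backend at clusterd paths such as /1.0/terraformstate/{plan} and /1.0/terraformlock/{plan}. The sketch below prints an equivalent partial backend configuration for one plan. That file can be passed to `terraform init -backend-config=FILE` together with an empty `backend "http" {}` block.

Assumptions:

- The clusterd base URL is hypothetical.
- Reusing the lock path as unlock_address is a guess.
- The mTLS client-certificate settings that clusterd would require are omitted.

```bash
#!/usr/bin/env bash
# Sketch: print a partial Terraform "http" backend configuration for one plan,
# pointing state and lock at clusterd-style paths.
# Assumptions: BASE_URL is hypothetical; unlock_address reusing the lock path
# is a guess; clusterd's mTLS client-certificate settings are left out.

# backend_config_for_plan BASE_URL PLAN
backend_config_for_plan() {
  local base=${1%/} plan=$2
  if [[ -z $base || -z $plan ]]; then
    echo "usage: backend_config_for_plan BASE_URL PLAN" >&2
    return 1
  fi
  printf 'address = "%s/1.0/terraformstate/%s"\n' "$base" "$plan"
  printf 'lock_address = "%s/1.0/terraformlock/%s"\n' "$base" "$plan"
  printf 'unlock_address = "%s/1.0/terraformlock/%s"\n' "$base" "$plan"
}
```

Usage: `backend_config_for_plan https://clusterd.example openstack-plan > backend.hcl`, then `terraform init -backend-config=backend.hcl`. Keeping state and locks per plan name is what lets the microovn/microceph/openstack/hypervisor plans be applied independently.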
