Building a Kubernetes Cluster on Lightsail with ZeroTier

AWS Lightsail is a cloud platform well suited to personal use, with a simple interface and relatively low prices. Its supporting services are not as complete as EC2's, though, and I could not find a convenient AWS-provided way to network instances across regions. ZeroTier is a virtual LAN solution that lets devices on different networks communicate as if they were on the same LAN. Combining the two, we can easily build a Kubernetes cluster on Lightsail.

Setting up with kubeadm

kubeadm has somewhat higher memory and disk requirements than k3s; it cannot be deployed on nodes with less than 2 GB of RAM.

1. Create Lightsail instances

You can create a 2C2G (2 vCPU / 2 GB) instance in each AWS Lightsail region; they are covered by the 90-day free tier.

2. Join the ZeroTier network

Install ZeroTier on each instance with:

curl -s https://install.zerotier.com | sudo bash

Then join your ZeroTier network:

sudo zerotier-cli join <your_network_id>

In the ZeroTier console, authorize each instance to join the network.
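
You can also confirm the join from the command line with the standard zerotier-cli tool:

sudo zerotier-cli info           # should report ONLINE once the node is up
sudo zerotier-cli listnetworks   # shows the network status and the assigned managed IP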

You can use ifconfig to see the virtual interface and IP address ZeroTier created:

ifconfig

ztksexlnsa: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2800
        inet 10.147.18.13  netmask 255.255.255.0  broadcast 10.147.18.255
        inet6 fe80::acb6:3aff:fed5:96b6  prefixlen 64  scopeid 0x20<link>
        ether ae:b6:3a:d5:96:b6  txqueuelen 1000  (Ethernet)
        RX packets 9150  bytes 1161718 (1.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 9624  bytes 5133672 (4.8 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

Try pinging another instance's ZeroTier IP address:

admin@worker:~$ ping 10.147.18.13
PING 10.147.18.13 (10.147.18.13) 56(84) bytes of data.
64 bytes from 10.147.18.13: icmp_seq=1 ttl=64 time=139 ms
64 bytes from 10.147.18.13: icmp_seq=2 ttl=64 time=139 ms
64 bytes from 10.147.18.13: icmp_seq=3 ttl=64 time=138 ms
64 bytes from 10.147.18.13: icmp_seq=4 ttl=64 time=140 ms
^C
--- 10.147.18.13 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 138.482/139.006/140.268/0.737 ms

The ping succeeds, though latency between the US and Germany instances is fairly high.

3. Install Kubernetes

For convenience, I use kubeadm 1.23.3 to install the Kubernetes cluster, since that version is still compatible with Docker as the container runtime.

sudo -i
curl https://get.docker.com | bash

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward=1
EOF

sudo swapoff -a
sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab

nano /etc/hostname # set this node's hostname (master or worker)
nano /etc/hosts    # map the hostname, e.g. "127.0.0.1 master"

reboot
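
After the reboot, it is worth checking that the module and sysctls from the script above actually took effect:

lsmod | grep br_netfilter                  # module should be loaded
sysctl net.bridge.bridge-nf-call-iptables  # should print 1
sysctl net.ipv4.ip_forward                 # should print 1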

Install the Kubernetes packages:

curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
sudo apt update
sudo apt install -y kubeadm=1.23.3-00 kubelet=1.23.3-00 kubectl=1.23.3-00
sudo apt-mark hold kubeadm kubelet kubectl

4. Initialize the Kubernetes cluster

On the master node, initialize the cluster:

sudo kubeadm init --pod-network-cidr=10.10.0.0/16 --apiserver-advertise-address=<your_master_ip> # replace with the master node's ZeroTier IP

Follow the printed instructions to copy the kubectl config file.
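
The commands kubeadm prints for this step are the standard ones:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config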

On each worker node, join the cluster:

sudo kubeadm join <your_master_ip>:6443 --token <your_token> --discovery-token-ca-cert-hash sha256:<your_hash>
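
If you no longer have the token from the init output, you can regenerate the full join command on the master at any time:

sudo kubeadm token create --print-join-command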

Check the node status:

kubectl get nodes

The nodes show NotReady because no network (CNI) plugin is installed yet.

5. Install the network plugin

Install the Flannel network plugin:

wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
nano kube-flannel.yml

Two changes are needed (see the excerpt after the list):

  1. In net-conf.json, change Network to our pod-network-cidr, here 10.10.0.0/16
  2. In the kube-flannel container args, select the ZeroTier interface (ztksexlnsa above)
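For reference, the relevant pieces of kube-flannel.yml end up looking roughly like this; the surrounding layout varies between Flannel releases, so treat it as a sketch:

  net-conf.json: |
    {
      "Network": "10.10.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
  ...
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --iface=ztksexlnsa

Then apply it:
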
kubectl apply -f kube-flannel.yml

Check the node status again:

kubectl get nodes -o wide

If the nodes show Ready, the network plugin is working.

6. Change the node IPs

Looking at kubectl get nodes -o wide, the node IPs are not the ZeroTier addresses but the AWS NAT interface addresses, e.g. 172.26.2.153.

They need to be changed to the ZeroTier addresses, otherwise the master and workers cannot communicate.

sudo nano /etc/systemd/system/kubelet.service.d/10-kubeadm.conf 

This file contains the line EnvironmentFile=-/etc/default/kubelet, so edit that file:

sudo nano /etc/default/kubelet

In that file, specify your ZeroTier IP address:

KUBELET_EXTRA_ARGS="--node-ip=10.147.18.244"

Then restart the kubelet:

sudo systemctl daemon-reload
sudo systemctl restart kubelet

It is best to do this on both machines (each with its own ZeroTier IP), then redeploy the Flannel network plugin:

kubectl delete -f kube-flannel.yml
kubectl apply -f kube-flannel.yml
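
To watch the redeploy converge (recent Flannel manifests put the pods in the kube-flannel namespace; older ones used kube-system):

kubectl get pods -n kube-flannel -o wide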

7. Deploy nginx to test connectivity

You can use this manifest as a test:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ngx-conf
data:
  default.conf: |
    server {
      listen 80;
      location / {
        default_type text/plain;
        return 200 'srv : $server_addr:$server_port\nhost: $hostname\nuri : $request_method $host $request_uri\n';
      }
    }

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ngx-dep
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ngx-dep
  template:
    metadata:
      labels:
        app: ngx-dep
    spec:
      volumes:
      - name: ngx-conf-vol
        configMap:
          name: ngx-conf
      containers:
      - image: nginx:alpine
        name: nginx
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: /etc/nginx/conf.d
          name: ngx-conf-vol

---
apiVersion: v1
kind: Service
metadata:
  name: ngx-svc
spec:
  selector:
    app: ngx-dep
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP

kubectl apply -f nginx.yaml

Then check the pod and service IP addresses:

kubectl get pods -o wide
kubectl get svc -o wide

If you can reach the service or pod IPs with curl from both the master and worker nodes, the cluster network is configured correctly.
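
For example, substituting the addresses reported by the two commands above:

curl http://<service_ip>   # via the Service's ClusterIP
curl http://<pod_ip>       # a pod scheduled on the other node, to exercise cross-node traffic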


A lightweight setup with K3s

If your servers are low-spec (e.g. 2C1G), you can use K3s instead of full Kubernetes. K3s is a lightweight Kubernetes distribution developed by Rancher; its memory footprint is only about 512 MB, making it well suited to resource-constrained environments.

1. Create Lightsail instances

Create 2C1G instances, also included in the free tier.

2. Join the ZeroTier network

As above, install ZeroTier on each instance:

curl -s https://install.zerotier.com | sudo bash
sudo zerotier-cli join <your_network_id>

Authorize the nodes in the ZeroTier web console.

Record each node's ZeroTier IP address and interface name:

ip addr show | grep zt

Assume the master node's ZeroTier IP is 10.147.18.6 and the interface name is ztksexlnsa.

3. Install the K3s master node

On the master node, run:

curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-iface=ztksexlnsa \
  --node-ip=10.147.18.6 \
  --advertise-address=10.147.18.6 \
  --node-external-ip=10.147.18.6

Parameter notes:

  • --flannel-iface: make Flannel use the ZeroTier interface
  • --node-ip: set the node IP to the ZeroTier IP
  • --advertise-address: the address the API server advertises
  • --node-external-ip: the node's external IP

After installation completes, get the node token:

sudo cat /var/lib/rancher/k3s/server/node-token

4. Install the K3s worker nodes

Set the hostname and hosts file:

nano /etc/hostname
nano /etc/hosts

On each worker node, run (substituting your master IP, token, and interface name):

curl -sfL https://get.k3s.io | K3S_URL=https://10.147.18.6:6443 K3S_TOKEN=<your_token> sh -s - agent \
  --flannel-iface=ztksexlnsa \
  --node-ip=10.147.18.20 \
  --node-external-ip=10.147.18.20

If the master node runs a firewall, allow the workers' IPs to reach port 6443.
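
For example with ufw, assuming 10.147.18.0/24 is your ZeroTier subnet:

sudo ufw allow from 10.147.18.0/24 to any port 6443 proto tcp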

5. Verify the cluster status

On the master node, check the node status:

sudo kubectl get nodes -o wide

Example output:

NAME      STATUS   ROLES                  AGE   VERSION        INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                         KERNEL-VERSION         CONTAINER-RUNTIME
master    Ready    control-plane,master   12d   v1.33.6+k3s1   10.147.18.6     10.147.18.6     Debian GNU/Linux 12 (bookworm)   6.1.0-28-amd64         containerd://2.1.5-k3s1.33
worker1   Ready    <none>                 12d   v1.33.6+k3s1   10.147.18.253   10.147.18.253   Debian GNU/Linux 12 (bookworm)   6.1.0-39-amd64         containerd://2.1.5-k3s1.33
worker2   Ready    <none>                 12d   v1.33.6+k3s1   10.147.18.20    10.147.18.20    Debian GNU/Linux 12 (bookworm)   6.1.0-40-cloud-amd64   containerd://2.1.5-k3s1.33
worker3   Ready    <none>                 20h   v1.33.6+k3s1   10.147.18.243   10.147.18.243   Debian GNU/Linux 12 (bookworm)   6.1.0-41-cloud-amd64   containerd://2.1.5-k3s1.33
worker4   Ready    <none>                 20h   v1.33.6+k3s1   10.147.18.241   10.147.18.241   Debian GNU/Linux 12 (bookworm)   6.1.0-41-cloud-amd64   containerd://2.1.5-k3s1.33

The node IPs are now correctly shown as the ZeroTier addresses.

6. Configure kubectl (optional)

To use kubectl as a non-root user:

mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
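
Note that k3s.yaml points the server at https://127.0.0.1:6443, which works locally; to use the copied kubeconfig from another machine, switch the server address to the master's ZeroTier IP:

sed -i 's/127.0.0.1/10.147.18.6/' ~/.kube/config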

7. Deploy a test application

Use the same nginx test YAML as above to verify the cluster network:

kubectl apply -f nginx.yaml
kubectl get pods -o wide
kubectl get svc

Test access to the Service IP from each node to confirm cross-node connectivity.

Diagnosing the k3s traffic-amplification problem

After installation, Nezha monitoring showed intermittent bursts of upload traffic and CPU usage. The problem took a long time to track down, and I found no answers online; after consulting an AI and capturing packets, the likely cause turned out to be route duplication between ZeroTier and the k3s Flannel network plugin.

The fix

sudo nano /var/lib/zerotier-one/local.conf

Add:

{
  "settings": {
    "interfacePrefixBlacklist": ["cni", "flannel"]
  }
}

This configuration makes ZeroTier ignore interfaces whose names begin with cni or flannel, preventing a network loop.

Restart ZeroTier:

sudo systemctl restart zerotier-one
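
As a quick sanity check that the blacklist took effect, ZeroTier should no longer report peer paths through pod-network (10.42.x.x) addresses:

sudo zerotier-cli peers | grep 10.42 || echo "no paths via the pod network"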

Below is a script for diagnosing the network environment:

#!/bin/bash

echo "=== Node Info ==="
hostname
ip addr show ztksexlnsa | grep inet

echo -e "\n=== Route Table ==="
ip route show

echo -e "\n=== Pod Network Routes ==="
ip route show | grep 10.42

echo -e "\n=== ZeroTier Routes ==="
ip route show dev ztksexlnsa

echo -e "\n=== Flannel Routes ==="
ip route show dev flannel.1 2>/dev/null || echo "No flannel.1 interface"

echo -e "\n=== Flannel VXLAN Details ==="
ip -d link show flannel.1 2>/dev/null || echo "No flannel.1 interface"

echo -e "\n=== Flannel FDB ==="
bridge fdb show dev flannel.1 2>/dev/null || echo "No flannel.1 interface"

echo -e "\n=== Route Test to Other Node ==="
# replace with the IP of another node
OTHER_NODE_IP="10.147.18.6"
ip route get $OTHER_NODE_IP

echo -e "\n=== Route Test to Other Node Pod Network ==="
# 替换为另一个节点的 Pod 网段
OTHER_POD_CIDR="10.42.1.0"
ip route get $OTHER_POD_CIDR

echo -e "\n=== Policy Routing ==="
ip rule show

echo -e "\n=== VXLAN Packet Sample (5 seconds) ==="
timeout 5 tcpdump -i ztksexlnsa -nn -c 10 'udp port 8472' 2>/dev/null | head -20

Diagnostic output

The tail of the packet capture below is the telltale part: ZeroTier's own traffic on UDP 9993, between pod-network addresses (10.42.2.0 → 10.42.1.0), is being encapsulated in Flannel's VXLAN (UDP 8472) and sent back out over the ZeroTier interface. In other words, ZeroTier had attached to the Flannel/CNI interfaces and was tunneling its own traffic through itself, which explains the traffic amplification.

=== Node Info ===
worker4
inet 10.147.18.241/24 brd 10.147.18.255 scope global ztksexlnsa
inet6 fe80::ac54:26ff:fee9:de87/64 scope link

=== Route Table ===
default via 172.26.0.1 dev ens5 proto dhcp src 172.26.10.39 metric 100
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 dev cni0 proto kernel scope link src 10.42.2.1
10.42.3.0/24 via 10.42.3.0 dev flannel.1 onlink
10.42.5.0/24 via 10.42.5.0 dev flannel.1 onlink
10.147.18.0/24 dev ztksexlnsa proto kernel scope link src 10.147.18.241
172.26.0.0/20 dev ens5 proto kernel scope link src 172.26.10.39 metric 100
172.26.0.1 dev ens5 proto dhcp scope link src 172.26.10.39 metric 100
172.26.0.2 dev ens5 proto dhcp scope link src 172.26.10.39 metric 100

=== Pod Network Routes ===
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 dev cni0 proto kernel scope link src 10.42.2.1
10.42.3.0/24 via 10.42.3.0 dev flannel.1 onlink
10.42.5.0/24 via 10.42.5.0 dev flannel.1 onlink

=== ZeroTier Routes ===
10.147.18.0/24 proto kernel scope link src 10.147.18.241

=== Flannel Routes ===
10.42.0.0/24 via 10.42.0.0 onlink
10.42.1.0/24 via 10.42.1.0 onlink
10.42.3.0/24 via 10.42.3.0 onlink
10.42.5.0/24 via 10.42.5.0 onlink

=== Flannel VXLAN Details ===
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2750 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 96:33:7a:f1:c8:2c brd ff:ff:ff:ff:ff:ff promiscuity 0 allmulti 0 minmtu 68 maxmtu 65535
vxlan id 1 local 10.147.18.241 dev ztksexlnsa srcport 0 0 dstport 8472 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536

=== Flannel FDB ===
c6:99:11:2f:00:2d dst 10.147.18.253 self permanent
02:c0:d9:de:ba:f2 dst 10.147.18.20 self permanent
de:a3:ef:8a:73:15 dst 10.147.18.243 self permanent
aa:b6:cd:bb:f2:b3 dst 10.147.18.6 self permanent

=== Route Test to Other Node ===
10.147.18.6 dev ztksexlnsa src 10.147.18.241 uid 0
cache

=== Route Test to Other Node Pod Network ===
10.42.1.0 via 10.42.1.0 dev flannel.1 src 10.42.2.0 uid 0
cache expires 519sec mtu 1230

=== Policy Routing ===
0: from all lookup local
32766: from all lookup main
32767: from all lookup default

=== VXLAN Packet Sample (5 seconds) ===
17:39:17.012286 IP 10.147.18.241.49401 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 456
17:39:17.012298 IP 10.147.18.241.49401 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 524
17:39:17.012309 IP 10.147.18.241.45208 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 1316
17:39:17.012312 IP 10.147.18.241.45208 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0 > 10.42.1.0: ip-proto-17
17:39:17.012322 IP 10.147.18.241.49401 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 572
17:39:17.012333 IP 10.147.18.241.45208 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 1316
17:39:17.012336 IP 10.147.18.241.45208 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0 > 10.42.1.0: ip-proto-17
17:39:17.012346 IP 10.147.18.241.49401 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 176
17:39:17.012359 IP 10.147.18.241.45208 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0.9993 > 10.42.1.0.9993: UDP, length 1316
17:39:17.012362 IP 10.147.18.241.45208 > 10.147.18.243.8472: OTV, flags [I] (0x08), overlay 0, instance 1
IP 10.42.2.0 > 10.42.1.0: ip-proto-17