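I worked overtime until 10 last night... wrestling with this wretched thing.
Anyway, back to the point.
In the previous post, kubeadm init succeeded on the master, but installing the network add-on kept failing. Today I'm giving it another try.
To begin with, I followed this blog post for the installation: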
https://blog.csdn.net/weixin_43645454/article/details/124952184
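because installing from inside China is full of pitfalls, and the official site is basically unusable.
Following that blog post, I could not get calico installed. The reason:
kubeadm init is configured with serviceSubnet, which obviously means the subnet for Services.
Meanwhile calico.yml is configured with CALICO_IPV4POOL_CIDR, which is the pod IP pool.
The blog post says the two should be the same.
But in fact the Service subnet and the pod subnet are different ranges. The configuration that currently starts successfully for me is: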
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.21.0.0/16
  podSubnet: 172.22.0.0/16
And then in calico.yml:
- name: CALICO_IPV4POOL_CIDR
  value: "172.22.0.0/16"
With that, everything started successfully.
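A quick way to sanity-check ranges like these is to test whether two IPv4 CIDRs share any address. The following is only a minimal sketch in plain POSIX shell. The function names (ip_to_int, cidr_range, cidrs_overlap) are made up for illustration and are not part of kubeadm or calico, and 192.168.56.0/24 is assumed to be the network the VMs in this post sit on.

```shell
#!/bin/sh
# Hypothetical helpers: none of these names come from kubeadm or calico.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

# Print "start end" (as integers) for the address range a CIDR covers.
cidr_range() {
  addr=${1%/*}
  prefix=${1#*/}
  ip=$(ip_to_int "$addr")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( ip & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
}

# Exit status 0 if the two CIDRs share at least one address.
cidrs_overlap() {
  # After set --: $1=start1 $2=end1 $3=start2 $4=end2
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Check the three ranges used in this post against each other.
while read -r x y; do
  if cidrs_overlap "$x" "$y"; then
    echo "OVERLAP: $x and $y"
  else
    echo "ok: $x and $y"
  fi
done <<EOF
172.21.0.0/16 172.22.0.0/16
172.21.0.0/16 192.168.56.0/24
172.22.0.0/16 192.168.56.0/24
EOF
```

If any pair is reported as overlapping, that is a reason to pick different values for serviceSubnet, podSubnet or CALICO_IPV4POOL_CIDR before running kubeadm init. The pod subnet and CALICO_IPV4POOL_CIDR are the one pair that is meant to match.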
Now for the problems I ran into.
Error 1:
After kubectl apply -f calico.yaml, calico-node reports errors,
or after a worker node joins, calico-node fails to start (for example: CrashLoopBackOff).
9月 29 16:12:48 master kubelet[12272]: E0929 16:12:48.116920 12272 remote_runtime.go:222] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"71281bf7c6d991756cac784f7c9943e200a3e69fa49afe3299f98c6a5fd6b366\": plugin type=\"calico\" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"
9月 29 16:12:48 master kubelet[12272]: E0929 16:12:48.117002 12272 kuberuntime_sandbox.go:71] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox \"71281bf7c6d991756cac784f7c9943e200a3e69fa49afe3299f98c6a5fd6b366\": plugin type=\"calico\" failed (add): stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/" pod="kube-system/calico-kube-controllers-58dbc876ff-bc5dg"
9月 29 16:45:31 master kubelet[32709]: E0929 16:45:31.311990 32709 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"calico-kube-controllers-58dbc876ff-7lxsj_kube-system(1eec9a3f-6310-492d-b2c5-c6278356c48e)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"calico-kube-controllers-58dbc876ff-7lxsj_kube-system(1eec9a3f-6310-492d-b2c5-c6278356c48e)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"84dfe491af29e30551e124ac6c73bfcd2ffd089ab900192d745441868083f6dd\\\": plugin type=\\\"calico\\\" failed (add): error adding host side routes for interface: cali11848191ccc, error: route (Ifindex: 9, Dst: 10.0.0.1/32, Scope: link) already exists for an interface other than 'cali11848191ccc': route (Ifindex: 5, Dst: 10.0.0.1/32, Scope: link, Iface: cali13a7d337791)\"" pod="kube-system/calico-kube-controllers-58dbc876ff-7lxsj" podUID=1eec9a3f-6310-492d-b2c5-c6278356c48e
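This happens because I installed and uninstalled calico many times, and sometimes k delete -f calico.yaml did not remove the virtual interfaces or routes (that's what I'll call them for now; it exposes how weak my fundamentals are. I bought VBird's Linux book years ago and never read it, so that has to go on the schedule).
Solution:
# After every kubeadm reset,
# first delete the CNI network configuration; the reset output actually tells you to do this
rm -rf /etc/cni/net.d/*
# Then delete the leftover routes or interfaces
# (the ones shown as link/ipip or link/ether)
# List the interfaces and routes with any of:
ip addr     # short form: ip a
ip link
ip route
# or
ifconfig
# If you can see extra interfaces, for example mine looked like this: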
[root@master ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:2f:98:e0 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
valid_lft 65590sec preferred_lft 65590sec
inet6 fe80::be6e:ee2a:bcd9:e981/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bc:f7:2b brd ff:ff:ff:ff:ff:ff
inet 192.168.56.106/24 brd 192.168.56.255 scope global noprefixroute dynamic enp0s8
valid_lft 552sec preferred_lft 552sec
inet6 fe80::5753:6a6a:3f3:6f5b/64 scope link noprefixroute
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:cc:58:a5:6e brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
15: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 172.22.219.64/32 scope global tunl0
valid_lft forever preferred_lft forever
16: cali9035434f5df@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
17: cali4af7a3781d7@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
18: calib71dfeb1411@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
# The last three interfaces, whose names start with cali, and tunl0 all need to be deleted
modprobe -r ipip                  # removes tunl0
ip link delete cali23bcdbdbc8c    # delete each link whose name starts with cali
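When there are several leftovers, picking the names out by eye gets tedious. Below is a rough sketch of a filter that reads ip link or ip addr output on stdin and prints only tunl0 and the cali* names. The function name leftover_calico_links is made up for illustration, and the sample input is copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: prints interface names that look like calico leftovers.
leftover_calico_links() {
  # Field 2 of an interface line is "name@peer:" or "name:";
  # strip everything from the first "@" or ":" onward.
  awk '{
    name = $2
    sub(/[@:].*/, "", name)
    if (name ~ /^cali/ || name == "tunl0") print name
  }'
}

# Demo on sample text, so nothing on the machine is touched.
leftover_calico_links <<'EOF'
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
15: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN
16: cali9035434f5df@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP
EOF
```

On a real node the input would come from ip -o link show, and each cali* name it prints could then be passed to ip link delete after checking the list by hand. tunl0 is still removed with modprobe -r ipip as above.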
Error 2:
The error reported by calico-node after the worker joins:
9月 29 16:55:44 master kubelet[9251]: E0929 16:55:44.151178 9251 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"coredns-c676cc86f-dh5bn_kube-system(70d1a056-dd07-4162-b350-85d6be15276b)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"coredns-c676cc86f-dh5bn_kube-system(70d1a056-dd07-4162-b350-85d6be15276b)\\\": rpc error: code = Unknown desc = failed to setup network for sandbox \\\"b1bced40c96601e0c114392e6388991a6609fcfd81ac2f1c2a359840f272e997\\\": plugin type=\\\"calico\\\" failed (add): error getting ClusterInformation: Get \\\"https://10.0.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default\\\": dial tcp 10.0.0.1:443: connect: connection refused\"" pod="kube-system/coredns-c676cc86f-dh5bn" podUID=70d1a056-dd07-4162-b350-85d6be15276b
2022-09-29 13:22:02.617 [FATAL][1] cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post "https://10.244.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-node/token": dial tcp 10.244.0.1:443: i/o timeout
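Look at the URL:
https://10.0.6.1:443/api/
Clearly, the IP being requested is inside the CALICO_IPV4POOL_CIDR I had set at the time, and why would the port be 443? This is how I defined things in init.yaml: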
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.56.106
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.25.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 172.21.0.0/16
  podSubnet: 172.22.0.0/16
scheduler: {}
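The port bound there is 6443. (The 443 in the failing requests is the port of the in-cluster kubernetes Service, whose ClusterIP is the first address of serviceSubnet; that is a different address and port from the API server's own 192.168.56.106:6443.) So, as I said at the start, separating serviceSubnet from podSubnet made everything start successfully. Also pay attention to the ranges: my VMs are all on 192.x networks.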
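To see which address calico will dial for a given serviceSubnet, the first address of the range can be computed by hand. This is a small sketch in POSIX shell; first_service_ip is a made-up name, and the result is best confirmed on a running cluster with kubectl get svc kubernetes -n default.

```shell
#!/bin/sh
# Hypothetical helper: prints the first address after the network address
# of an IPv4 CIDR, which is where the kubernetes Service ClusterIP lands.
first_service_ip() {
  addr=${1%/*}
  prefix=${1#*/}
  IFS=. read -r o1 o2 o3 o4 <<EOF
$addr
EOF
  ip=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( (ip & mask) + 1 ))
  echo "$(( (first >> 24) & 255 )).$(( (first >> 16) & 255 )).$(( (first >> 8) & 255 )).$(( first & 255 ))"
}

# serviceSubnet from the init.yaml above
first_service_ip 172.21.0.0/16
```

For the working configuration this gives 172.21.0.1, which lies outside the pod pool 172.22.0.0/16. Addresses such as 10.0.0.1 and 10.244.0.1 in the logs above are consistent with earlier attempts that used other serviceSubnet values.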
I looked at a lot of fixes suggested on Baidu.
# Some say to add KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT to calico.yaml
- name: KUBERNETES_SERVICE_HOST
  value: "kube-apiserver" # address of the master's apiserver
- name: KUBERNETES_SERVICE_PORT
  value: "6443"
- name: KUBERNETES_SERVICE_PORT_HTTPS
  value: "6443"
# Some say to add IP_AUTODETECTION_METHOD
- name: IP_AUTODETECTION_METHOD
  value: "interface=enp.*"
# The official docs say this can be set with a ConfigMap, and I tried that too
https://projectcalico.docs.tigera.io/maintenance/ebpf/enabling-ebpf#configure-calico-to-talk-directly-to-the-api-server
kind: ConfigMap
apiVersion: v1
metadata:
  name: kubernetes-services-endpoint
  namespace: kube-system
data:
  KUBERNETES_SERVICE_HOST: "192.168.56.106"
  KUBERNETES_SERVICE_PORT: "6443"
  KUBERNETES_SERVICE_PORT_HTTPS: "6443"
In the end none of these worked. What finally worked was changing the subnet configuration, because the ranges were simply not the same.
Finally the worker node joined the cluster successfully:
[root@master ~]# kubectl get pods --all-namespaces -owide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system calico-kube-controllers-58dbc876ff-pvpft 1/1 Running 0 4h15m 172.22.219.66 master <none> <none>
kube-system calico-node-bd4vg 1/1 Running 0 4h15m 192.168.56.106 master <none> <none>
kube-system calico-node-p98gc 1/1 Running 0 4h12m 192.168.56.107 node01 <none> <none>
kube-system coredns-c676cc86f-lq4kx 1/1 Running 0 4h15m 172.22.219.65 master <none> <none>
kube-system coredns-c676cc86f-rjkp8 1/1 Running 0 4h15m 172.22.219.67 master <none> <none>
kube-system etcd-master 1/1 Running 9 4h15m 192.168.56.106 master <none> <none>
kube-system kube-apiserver-master 1/1 Running 0 4h15m 192.168.56.106 master <none> <none>
kube-system kube-controller-manager-master 1/1 Running 0 4h15m 192.168.56.106 master <none> <none>
kube-system kube-proxy-4k9rr 1/1 Running 0 4h12m 192.168.56.107 node01 <none> <none>
kube-system kube-proxy-mzp7q 1/1 Running 0 4h15m 192.168.56.106 master <none> <none>
kube-system kube-scheduler-master 1/1 Running 9 4h15m 192.168.56.106 master <none> <none>
[root@master ~]# k get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 4h15m v1.25.0
node01 Ready <none> 4h11m v1.25.0
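One last remark: the troubleshooting kubeadm page in the official k8s documentation can actually solve 99.99% of your problems. The remaining 0.01% are network environment problems.
But that 0.01% is really hard... because I don't know much about Linux networking.
For example, installing ipset and ipvsadm: what even are those...
Also, the order for searching a problem should be:
1. Look in the official docs
2. Look for related issues on GitHub
3. Only when nothing else works, Baidu
Finally, a summary of the commands for finding error logs. These matter a lot; without them you have nowhere to start.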
# View the kubeconfig
kubectl config view
# List contexts (the current one is marked with *)
kubectl config get-contexts
# Switch the namespace of the current context
kubectl config set-context --current --namespace=<namespace>
# List all pods
kubectl get pods --all-namespaces
# Same, with more detail
kubectl get pods --all-namespaces -owide
# Delete a pod or a node
kubectl delete pod -n kube-system coredns-6f4fd4bdf-8q7zp
kubectl delete nodes node01
# kubelet logs
journalctl -xefu kubelet
# Describe a pod and read the events carefully
kubectl describe pod -n kube-system pod_name
# Logs of one container inside a pod
kubectl logs -n kube-system calico-node-jx4k5 -c install-cni
# watch is quite handy
watch kubectl get pods --all-namespaces
# Check the service status
systemctl status kubelet
# Set a label on a node
kubectl label no node2 kubernetes.io/role=test-node
Basically it comes down to three tools: describe, logs and journalctl.
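Before reaching for describe and logs, it helps to know which pods to look at. The sketch below filters the output of kubectl get pods --all-namespaces down to the pods that are not Running. The function name not_running is made up for illustration, and the CrashLoopBackOff row in the sample is invented test data, not output from my cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints "namespace/name status" for every pod that is not healthy.
not_running() {
  # NR > 1 skips the header row; with --all-namespaces column 4 is STATUS.
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Demo on sample text (the second pod row is invented for the example).
not_running <<'EOF'
NAMESPACE     NAME                      READY   STATUS             RESTARTS   AGE
kube-system   calico-node-bd4vg         1/1     Running            0          4h15m
kube-system   calico-node-xxxxx         0/1     CrashLoopBackOff   5          10m
kube-system   coredns-c676cc86f-lq4kx   1/1     Running            0          4h15m
EOF
```

On a real cluster this would be used as kubectl get pods --all-namespaces | not_running, and each pod it prints is a candidate for kubectl describe pod and kubectl logs.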
OK, in the next post I'll start deploying something to try it out.
Hmm, that's not quite right, there is still a problem:
Warning Unhealthy 69s (x2 over 70s) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
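It is only a warning, but I don't know whether it will cause trouble later. For now everything looks normal and all pods are in the Running state.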