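I deployed a Kubernetes v1.24.1 cluster from binary packages on CentOS 7.9, with kubelet using containerd as the container runtime. kubelet failed to start; this post records the troubleshooting and the fix.
| Component | Version |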
| CentOS | 7.9 |
| Kernel | 5.4.195-1.el7.elrepo.x86_64 |
| Kubernetes | v1.24.1 |
| containerd | v1.6.4 |
[root @ machine5 ~]$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Fri 2022-06-10 21:56:47 CST; 304ms ago
[root @ machine5 ~]$ journalctl -xe -u kubelet
Jun 10 22:23:33 machine5 kubelet[11122]: I0610 22:23:33.098633 11122 remote_runtime.go:114] "Finding the CRI API runtime version"
Jun 10 22:23:33 machine5 kubelet[11122]: W0610 22:23:33.838519 11122 clientconn.go:1331] [core] grpc: addrConn.createTransport failed to connect to { <nil> 0 <nil>}. Err: connection error: desc = "transport: Error while dialing dial unix: missing address". Reconnecting...
Jun 10 22:23:33 machine5 kubelet[11122]: Error: failed to run Kubelet: unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix: missing address"
The error is "failed to run Kubelet: unable to determine runtime API version".
Judging by the message, kubelet cannot reach the interface served by containerd, yet the containerd service is already running:
[root @ machine5 ~]$ systemctl status containerd -l
● containerd.service - containerd container runtime
Loaded: loaded (/usr/lib/systemd/system/containerd.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-06-10 22:20:06 CST; 6s ago
Docs: https://containerd.io
Process: 9923 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
Main PID: 9925 (containerd)
Tasks: 9
Memory: 26.0M
CGroup: /system.slice/containerd.service
└─9925 /usr/bin/containerd
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.913907117+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.version\"..." type=io.containerd.grpc.v1
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.913913347+08:00" level=info msg="loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." type=io.containerd.tracing.processor.v1
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.913926772+08:00" level=info msg="skip loading plugin \"io.containerd.tracing.processor.v1.otlp\"..." error="no OpenTelemetry endpoint: skip plugin" type=io.containerd.tracing.processor.v1
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.913933530+08:00" level=info msg="loading plugin \"io.containerd.internal.v1.tracing\"..." type=io.containerd.internal.v1
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.913947444+08:00" level=error msg="failed to initialize a tracing processor \"otlp\"" error="no OpenTelemetry endpoint: skip plugin"
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.913983093+08:00" level=info msg="loading plugin \"io.containerd.grpc.v1.cri\"..." type=io.containerd.grpc.v1
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.914128628+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `systemd_cgroup` only works for runtime io.containerd.runtime.v1.linux"
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.914306062+08:00" level=info msg=serving... address=/run/containerd/containerd.sock.ttrpc
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.914331072+08:00" level=info msg=serving... address=/run/containerd/containerd.sock
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.914370232+08:00" level=info msg="containerd successfully booted in 0.022563s"
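The status output above shows that "active (running)" does not guarantee that every plugin loaded. A minimal sketch for spotting this, assuming the log wording matches the lines shown above; the function name `cri_plugin_load_errors` is hypothetical:

```shell
# Sketch: find CRI plugin load failures in containerd's log.
# Reads log lines on stdin and prints the matching ones.
# Exit status is non-zero when nothing matches (grep's usual behaviour).
cri_plugin_load_errors() {
  grep -F 'failed to load plugin io.containerd.grpc.v1.cri'
}

# Typical use on a node (not executed here):
#   journalctl -u containerd --no-pager | cri_plugin_load_errors
```

If this prints anything, containerd is running without its CRI service. Running `ctr plugins ls` and looking at the status of the cri plugin should give the same answer, though I did not capture that output for this post.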
A closer look at the kubelet startup log shows that container-runtime-endpoint is empty when kubelet starts, whereas the (deprecated) --containerd flag defaults to containerd's socket address.
Jun 10 22:23:26 machine5 kubelet[11122]: I0610 22:23:26.358097 11122 flags.go:64] FLAG: --container-runtime-endpoint=""
Jun 10 22:23:26 machine5 kubelet[11122]: I0610 22:23:26.358099 11122 flags.go:64] FLAG: --containerd="/run/containerd/containerd.sock"
... (omitted) ...
Jun 10 22:23:33 machine5 kubelet[11122]: --container-runtime-endpoint string The endpoint of remote runtime service. Unix Domain Sockets are supported on Linux, while npipe and tcp en
Jun 10 22:23:33 machine5 kubelet[11122]: --containerd string containerd endpoint (default "/run/containerd/containerd.sock")
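The empty flag is easy to miss among the other FLAG lines. A small sketch that checks for it, assuming kubelet logs its flags in the `FLAG: --name="value"` form shown above; the function name `runtime_endpoint_is_empty` is hypothetical:

```shell
# Sketch: succeed when the kubelet log (read on stdin) shows that
# --container-runtime-endpoint was left empty at startup.
runtime_endpoint_is_empty() {
  grep -qF -- 'FLAG: --container-runtime-endpoint=""'
}

# Typical use on a node (not executed here):
#   journalctl -u kubelet --no-pager | runtime_endpoint_is_empty && echo "endpoint not set"
```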
I then consulted the official documents "Changing the Container Runtime on a Node from Docker Engine to containerd" and "Component tools - Kubelet" for how to configure kubelet to use containerd as the container runtime, and for the description of the kubelet "--container-runtime-endpoint" flag.
Configuring kubelet to use containerd as the container runtime
--- from "Changing the Container Runtime on a Node from Docker Engine to containerd"
Configure the kubelet to use containerd as its container runtime
Edit the file /var/lib/kubelet/kubeadm-flags.env and add the containerd runtime to the flags. --container-runtime=remote and --container-runtime-endpoint=unix:///run/containerd/containerd.sock.
... (omitted) ...
Note that new CRI socket paths must be prefixed with unix:// ideally.
--container-runtime string Default: docker
The container runtime to use. Possible values: docker, remote.
--container-runtime-endpoint string Default: unix:///var/run/dockershim.sock
[Experimental] The endpoint of remote runtime service. Currently unix socket endpoint is supported on Linux, while npipe and tcp endpoints are supported on windows. Examples: unix:///var/run/dockershim.sock, npipe:./pipe/dockershim.
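The documentation and the kubelet flag descriptions show that, to use containerd as the container runtime, kubelet must be started with both "--container-runtime-endpoint" and "--container-runtime".
Because I manage the kubelet service with systemd, the kubelet startup flags have to be changed in kubelet.service, as follows: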
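Since the documentation asks for the `unix://` prefix on the socket path, a quick sanity check on the value before it goes into the unit file can catch a bare path or an empty string. This is only a sketch with a hypothetical function name, `is_unix_endpoint`; it checks the form of the string, not whether the socket exists:

```shell
# Sketch: accept only endpoints written with an explicit unix:// scheme
# followed by an absolute socket path.
is_unix_endpoint() {
  case "$1" in
    unix:///?*) return 0 ;;   # unix:// + absolute path with at least one character
    *)          return 1 ;;   # empty string, bare path, or another scheme
  esac
}

# Example (not executed here):
#   is_unix_endpoint "unix:///run/containerd/containerd.sock" && echo "looks fine"
```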
[root @ machine5 ~]$ vim /usr/lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=containerd.service
Requires=containerd.service
[Service]
WorkingDirectory=/data/kubernetes/kubelet
ExecStart=/usr/local/bin/kubelet \
... (omitted) ...
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock
... (omitted) ...
[root @ machine5 ~]$ systemctl daemon-reload
[root @ machine5 ~]$ systemctl start kubelet
[root @ machine5 ~]$ journalctl -xe -u kubelet
Jun 10 23:05:31 machine5 kubelet[25811]: I0610 23:05:31.877416 25811 kubelet.go:376] "Attempting to sync node with API server"
Jun 10 23:05:31 machine5 kubelet[25811]: I0610 23:05:31.877443 25811 kubelet.go:278] "Adding apiserver pod source"
Jun 10 23:05:31 machine5 kubelet[25811]: I0610 23:05:31.877457 25811 apiserver.go:42] "Waiting for node sync before watching apiserver pods"
Jun 10 23:05:31 machine5 kubelet[25811]: E0610 23:05:31.878947 25811 remote_runtime.go:168] "Version from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alph"
Jun 10 23:05:31 machine5 kubelet[25811]: E0610 23:05:31.878995 25811 kuberuntime_manager.go:225] "Get runtime version failed" err="get remote runtime typed version failed: rpc error: code = Unimplemented"
Jun 10 23:05:31 machine5 kubelet[25811]: Error: failed to run Kubelet: failed to create kubelet: get remote runtime typed version failed: rpc error: code = Unimplemented desc = unknown service runtime.v1alp
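Judging by the error, the problem is still on the containerd side: containerd 1.6.4 does not provide the runtime.v1alp... service that kubelet asks for (the name is truncated in the log).
Many analyses found online attribute this to the "cri" plugin being disabled in /etc/containerd/config.toml, and the suggested fix is to delete "/etc/containerd/config.toml" and restart containerd.
However, my containerd configuration does not disable the cri plugin and already contains the relevant settings. I also want containerd as the container runtime with systemd as the cgroup driver, so that fix does not suit my case. (See "container-runtimes: containerd".)
While going through the containerd log, I noticed a warning about "systemd_cgroup" in the startup output: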
Jun 10 22:20:06 machine5 containerd[9925]: time="2022-06-10T22:20:06.914128628+08:00" level=warning msg="failed to load plugin io.containerd.grpc.v1.cri" error="invalid plugin config: `systemd_cgroup` only works for runtime io.containerd.runtime.v1.linux"
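This says that in containerd v1.6.4, "systemd_cgroup" only works when the runtime type is "io.containerd.runtime.v1.linux". Because the CRI plugin failed to load, containerd was not serving the CRI API at all, which matches the Unimplemented error reported by kubelet.
So the containerd configuration is at fault.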
[root @ machine5 ~]$ vim /etc/containerd/config.toml
... (omitted) ...
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
... (omitted) ...
runtime_root = ""
runtime_type = "io.containerd.runtime.v1.linux"
[root @ machine5 ~]$ systemctl restart containerd
[root @ machine5 ~]$ systemctl status containerd
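After the restart, the containerd warning above is gone.
Somewhat awkwardly, a different warning shows up: level=warning msg="runtime v1 is deprecated since containerd v1.4, consider using runtime v2"
In other words, containerd has deprecated runtime v1 since 1.4. This refers to containerd's own runtime type "io.containerd.runtime.v1.linux" configured above, not to the CRI API version that kubelet speaks, so the setup works but depends on a deprecated runtime.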
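The Kubernetes container runtimes page describes another way to get the systemd cgroup driver without runtime v1: leave the CRI-level `systemd_cgroup` option off and set `SystemdCgroup = true` under `[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]`, with the runc runtime type `io.containerd.runc.v2`. I did not apply that change on this cluster, so treat it as a pointer rather than a tested fix. The sketch below only detects the legacy option in a config file; the function name `uses_legacy_systemd_cgroup` is hypothetical:

```shell
# Sketch: succeed when a containerd config (read on stdin) enables the
# legacy CRI-level `systemd_cgroup` option that caused the
# "only works for runtime io.containerd.runtime.v1.linux" warning.
# The runc-level `SystemdCgroup` option is deliberately not matched.
uses_legacy_systemd_cgroup() {
  grep -Eq '^[[:space:]]*systemd_cgroup[[:space:]]*=[[:space:]]*true'
}

# Typical use on a node (not executed here):
#   uses_legacy_systemd_cgroup < /etc/containerd/config.toml && echo "legacy option enabled"
```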
[root @ machine5 ~]$ systemctl start kubelet
[root @ machine5 ~]$ systemctl status kubelet
● kubelet.service - Kubernetes Kubelet
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2022-06-10 23:37:42 CST; 4min 19s ago
#################### divider ####################
[root @ machine1 ~]$ kubectl get node
NAME STATUS ROLES AGE VERSION
machine1 Ready <none> 20h v1.24.1
machine2 Ready <none> 20h v1.24.1
machine3 Ready <none> 20h v1.24.1
machine4 Ready <none> 20h v1.24.1
machine5 Ready <none> 5m13s v1.24.1
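With that, the problem is solved.
A solution from someone who works in operations and has also been an operations product manager 🐕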