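Pretty nasty: the CA of the k8s cluster I had deployed earlier expired.
The whole cluster stopped working.
I tried changing the certificate validity dates, but every fix broke something else, so in the end I had to reset the cluster.
coredns failed to start
calico did not start
kube-apiserver could not come up on port 6443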
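Before resorting to a full reset, it may be worth confirming which certificate actually expired: by default kubeadm issues a 10-year CA and 1-year component certificates, so a dead apiserver can be caused by a leaf certificate rather than the CA. On 1.18, `kubeadm alpha certs check-expiration` reports this (later versions moved it to `kubeadm certs check-expiration`). The sketch below is a hypothetical helper, `cert_days_left`, that turns the `notAfter=` line printed by `openssl x509 -enddate -noout` into days remaining; it assumes GNU `date`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of kubeadm): days until a certificate expires.
# $1: the line printed by `openssl x509 -enddate -noout -in <cert>`,
#     e.g. "notAfter=Jan  1 00:00:00 2030 GMT"
# $2: optional "now" as a Unix timestamp; defaults to the current time.
# A result of 0 or less means it expires within a day or has already expired.
# Assumes GNU date, which understands OpenSSL's date format via `date -d`.
cert_days_left() {
  local end_line=$1
  local now=${2:-$(date -u +%s)}
  local end
  end=$(date -u -d "${end_line#notAfter=}" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}

# Usage on a kubeadm master (certificate paths are the kubeadm defaults):
# for crt in /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/apiserver.crt; do
#   echo "$crt: $(cert_days_left "$(openssl x509 -enddate -noout -in "$crt")") days left"
# done
```

Taking "now" as an optional argument keeps the function deterministic, which makes it easy to test without a real certificate.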
1. Reset commands
Run on every node:
kubeadm reset -f
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks
[reset] Removing info for node "k8snode1" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
This deletes some cached files, but other files still have to be removed by hand, for example Kuboard's etcd data cache; otherwise installing Kuboard later will run into problems.
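The manual steps that `kubeadm reset` leaves to the operator (CNI config, iptables/IPVS rules, kubeconfig), plus the Kuboard data mentioned above, can be collected into one script. This is a hypothetical sketch, `cleanup_after_reset`, not part of kubeadm. It only prints the commands unless `DRY_RUN=0` is set, and it assumes Kuboard v3 keeps its etcd data under `/usr/share/kuboard` on the host; verify that path against your own Kuboard manifest before deleting anything.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup for what `kubeadm reset` explicitly does not remove.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
# Run as root on every node, after `kubeadm reset -f`.
# ASSUMPTION: /usr/share/kuboard is where Kuboard v3 stores its etcd data.
cleanup_after_reset() {
  local cmds=(
    "rm -rf /etc/cni/net.d"
    "iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X"
    "command -v ipvsadm >/dev/null && ipvsadm --clear || true"
    "rm -f \$HOME/.kube/config"
    "rm -rf /usr/share/kuboard"
  )
  local c
  for c in "${cmds[@]}"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$c"
    else
      # Keep going if one step fails, but report it.
      bash -c "$c" || echo "warning: failed: $c" >&2
    fi
  done
}
```

Run `cleanup_after_reset` first to review the list, then `DRY_RUN=0 cleanup_after_reset` to execute it.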
2. Install or reconfigure the master node
I simply configured the master node again from scratch, as described below.
2.1 Installation
curl -sSL https://kuboard.cn/install-script/v1.18.x/install_kubelet.sh | sh -s 1.18.0
For reference, here is the install script install_kubelet.sh:
#!/bin/bash
# Run on both master and worker nodes
# Install docker
# References:
# https://docs.docker.com/install/linux/docker-ce/centos/
# https://docs.docker.com/install/linux/linux-postinstall/
# Remove old versions
yum remove -y docker \
  docker-client \
  docker-client-latest \
  docker-ce-cli \
  docker-common \
  docker-latest \
  docker-latest-logrotate \
  docker-logrotate \
  docker-selinux \
  docker-engine-selinux \
  docker-engine
# Set up the yum repository
yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install and start docker
yum install -y docker-ce-19.03.8 docker-ce-cli-19.03.8 containerd.io
systemctl enable docker
systemctl start docker
# Install nfs-utils
# nfs-utils must be installed before NFS network storage can be mounted
yum install -y nfs-utils
yum install -y wget
# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
setenforce 0
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config
# Disable swap
swapoff -a
yes | cp /etc/fstab /etc/fstab_bak
cat /etc/fstab_bak |grep -v swap > /etc/fstab
# Edit /etc/sysctl.conf
# If a setting already exists, modify it
sed -i "s#^net.ipv4.ip_forward.*#net.ipv4.ip_forward=1#g" /etc/sysctl.conf
sed -i "s#^net.bridge.bridge-nf-call-ip6tables.*#net.bridge.bridge-nf-call-ip6tables=1#g" /etc/sysctl.conf
sed -i "s#^net.bridge.bridge-nf-call-iptables.*#net.bridge.bridge-nf-call-iptables=1#g" /etc/sysctl.conf
sed -i "s#^net.ipv6.conf.all.disable_ipv6.*#net.ipv6.conf.all.disable_ipv6=1#g" /etc/sysctl.conf
sed -i "s#^net.ipv6.conf.default.disable_ipv6.*#net.ipv6.conf.default.disable_ipv6=1#g" /etc/sysctl.conf
sed -i "s#^net.ipv6.conf.lo.disable_ipv6.*#net.ipv6.conf.lo.disable_ipv6=1#g" /etc/sysctl.conf
sed -i "s#^net.ipv6.conf.all.forwarding.*#net.ipv6.conf.all.forwarding=1#g" /etc/sysctl.conf
# The settings may be missing, so append them
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-ip6tables = 1" >> /etc/sysctl.conf
echo "net.bridge.bridge-nf-call-iptables = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.lo.disable_ipv6 = 1" >> /etc/sysctl.conf
echo "net.ipv6.conf.all.forwarding = 1" >> /etc/sysctl.conf
# Apply the settings
sysctl -p
# Configure the Kubernetes yum repository
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Remove old versions
yum remove -y kubelet kubeadm kubectl
# Install kubelet, kubeadm and kubectl
# ${1} is the Kubernetes version, e.g. 1.17.2
yum install -y kubelet-${1} kubeadm-${1} kubectl-${1}
# Change the docker cgroup driver to systemd
# # In /usr/lib/systemd/system/docker.service, change the line ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
# # to ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd
# Without this change, adding a worker node may run into the following error
# [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd".
# Please follow the guide at https://kubernetes.io/docs/setup/cri/
sed -i "s#^ExecStart=/usr/bin/dockerd.*#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=systemd#g" /usr/lib/systemd/system/docker.service
# Set a docker registry mirror to make image downloads faster and more stable
# If your access to https://hub.docker.io is already fast and stable, you can skip this step
curl -sSL https://kuboard.cn/install-script/set_mirror.sh | sh -s ${REGISTRY_MIRROR}
# Restart docker and start kubelet
systemctl daemon-reload
systemctl restart docker
systemctl enable kubelet && systemctl start kubelet
docker version
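The script above first rewrites the sysctl keys with `sed` and then appends them unconditionally with `echo`, so every rerun (for example after a reset like this one) adds another copy of each line. It still works because `sysctl -p` applies lines in order and the last value wins, but the file keeps growing. A minimal idempotent alternative, sketched as a hypothetical `set_sysctl` helper that takes the target file as an argument:

```bash
#!/usr/bin/env bash
# Hypothetical helper: make "<key> = <value>" appear exactly once in a
# sysctl-style file. Existing assignments of the key are removed (spaces
# around "=" allowed), commented-out lines are left alone, then one line is appended.
set_sysctl() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp)
  touch "$file"
  awk -v k="$key" '{
    line = $0
    sub(/[ \t]*=.*/, "", line)   # strip "= value"
    sub(/^[ \t]+/, "", line)     # strip leading blanks
    if (line != k) print
  }' "$file" > "$tmp"
  echo "${key} = ${value}" >> "$tmp"
  # Write back through the original file to keep its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_sysctl /etc/sysctl.d/k8s.conf net.bridge.bridge-nf-call-iptables 1` followed by `sysctl --system`; the bridge keys only exist once the `br_netfilter` kernel module is loaded.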
Note: wait a while, then check the pod status.
kubectl get pod -n kube-system
Only install Kuboard once all pods are Running, as below:
calico-kube-controllers-5b8b769fcd-qxkxs 1/1 Running 0 3h1m
calico-node-96s4v 1/1 Running 0 3h1m
calico-node-f6hqx 1/1 Running 0 173m
calico-node-z2wcj 1/1 Running 0 179m
coredns-66db54ff7f-8vhsn 1/1 Running 0 3h1m
coredns-66db54ff7f-qd85x 1/1 Running 0 3h1m
etcd-k8smaster 1/1 Running 0 3h1m
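Instead of re-running the command by hand, the "wait until everything is Running" check can be scripted. This is a hypothetical sketch: `not_ready_pods` reads `kubectl get pod` output on stdin and prints pods whose STATUS is not Running or whose READY column is not n/n, and `wait_kube_system_ready` polls it. The 10-second interval and 10-minute limit are arbitrary choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print names of pods that are not fully ready.
# Input: `kubectl get pod` output (with or without the header line).
# Note: pods in the Completed state (finished Jobs) are reported as well.
not_ready_pods() {
  awk '$1 != "NAME" && NF >= 3 {
    split($2, r, "/")                      # READY column, e.g. "1/1"
    if ($3 != "Running" || r[1] != r[2]) print $1
  }'
}

# Hypothetical wait loop: poll every 10s, give up after about 10 minutes.
wait_kube_system_ready() {
  local i pending
  for i in $(seq 1 60); do
    pending=$(kubectl get pod -n kube-system --no-headers | not_ready_pods)
    if [ -z "$pending" ]; then
      echo "all kube-system pods are ready"
      return 0
    fi
    sleep 10
  done
  echo "still not ready: $pending" >&2
  return 1
}
```

Parsing text from stdin keeps the check testable against saved output such as the listing above.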
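2.2 Initialize the master
Configure the following beforehand:
hosts: the apiserver name was already written to the hosts file earlier.
The two variables below are set with export and only apply to the current shell session, so make sure they are set in the shell you run the script from:
export POD_SUBNET=
export APISERVER_NAME=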
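POD_SUBNET should not overlap the service subnet that init_master.sh hard-codes (10.96.0.0/16) or the nodes' own network (192.168.0.x in this setup). A hypothetical pre-flight check, `cidr_overlap`, relies on the fact that two IPv4 ranges overlap exactly when their addresses agree on the first min(prefix_a, prefix_b) bits:

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o=($1)
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Hypothetical helper: print "overlap" if two IPv4 CIDRs share any address,
# otherwise "ok".
cidr_overlap() {
  local a b len mask
  a=$(ip_to_int "${1%/*}")
  b=$(ip_to_int "${2%/*}")
  len=$(( ${1#*/} < ${2#*/} ? ${1#*/} : ${2#*/} ))   # shorter prefix
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  if (( (a & mask) == (b & mask) )); then echo overlap; else echo ok; fi
}

# Example (the POD_SUBNET value is illustrative only):
# export POD_SUBNET=10.100.0.0/16
# [ "$(cidr_overlap "$POD_SUBNET" 10.96.0.0/16)" = ok ] || echo "POD_SUBNET overlaps the service subnet"
```

Comparing only the shorter prefix is enough: if the two networks agree on those bits, the smaller range lies entirely inside the larger one.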
curl -sSL https://kuboard.cn/install-script/v1.18.x/init_master.sh | sh -s 1.18.0
The script init_master.sh:
#!/bin/bash
# Run on the master node only
# Abort as soon as any command fails
set -e
if [ ${#POD_SUBNET} -eq 0 ] || [ ${#APISERVER_NAME} -eq 0 ]; then
  echo -e "\033[31;1mPlease make sure the environment variables POD_SUBNET and APISERVER_NAME are set \033[0m"
  echo Current POD_SUBNET=$POD_SUBNET
  echo Current APISERVER_NAME=$APISERVER_NAME
  exit 1
fi
# Full list of configuration options: https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta2
rm -f ./kubeadm-config.yaml
cat <<EOF > ./kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v${1}
imageRepository: registry.aliyuncs.com/k8sxio
controlPlaneEndpoint: "${APISERVER_NAME}:6443"
networking:
  serviceSubnet: "10.96.0.0/16"
  podSubnet: "${POD_SUBNET}"
  dnsDomain: "cluster.local"
EOF
# kubeadm init
# Depending on your server's network speed, this takes 3 to 10 minutes
kubeadm init --config=kubeadm-config.yaml --upload-certs
# Configure kubectl
rm -rf /root/.kube/
mkdir /root/.kube/
cp -i /etc/kubernetes/admin.conf /root/.kube/config
# Install the calico network plugin
# Reference: https://docs.projectcalico.org/v3.13/getting-started/kubernetes/self-managed-onprem/onpremises
echo "Installing calico-3.13.1"
rm -f calico-3.13.1.yaml
wget https://kuboard.cn/install-script/calico/calico-3.13.1.yaml
kubectl apply -f calico-3.13.1.yaml
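init_master.sh writes kubeadm-config.yaml and runs `kubeadm init` immediately, so a wrong variable only shows up once init fails. A hypothetical helper, `render_kubeadm_config`, prints the same configuration to stdout so it can be reviewed first. It mirrors the heredoc in the script and is not part of Kuboard or kubeadm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the kubeadm-config.yaml that init_master.sh writes.
# Reads POD_SUBNET and APISERVER_NAME from the environment.
# $1: Kubernetes version without the leading "v", e.g. 1.18.0
render_kubeadm_config() {
  local version=$1
  if [ -z "$version" ] || [ -z "$POD_SUBNET" ] || [ -z "$APISERVER_NAME" ]; then
    echo "usage: POD_SUBNET=... APISERVER_NAME=... render_kubeadm_config <version>" >&2
    return 1
  fi
  cat <<EOF
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: v${version}
imageRepository: registry.aliyuncs.com/k8sxio
controlPlaneEndpoint: "${APISERVER_NAME}:6443"
networking:
  serviceSubnet: "10.96.0.0/16"
  podSubnet: "${POD_SUBNET}"
  dnsDomain: "cluster.local"
EOF
}
```

After `render_kubeadm_config 1.18.0 > kubeadm-config.yaml`, `kubeadm init --config=kubeadm-config.yaml --dry-run` can be used to preview what init would do without changing the node.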
3. Add worker nodes to the cluster
Get the join command
# Run on the master node only
kubeadm token create --print-join-command
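This prints the kubeadm join command and its parameters, as shown below (the token is valid for 24 hours by default; the --ttl option of kubeadm token create changes this):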
kubeadm join apiserver.zkl:6443 --token zkmlzr.yah4yxaplhibdpsi --discovery-token-ca-cert-hash sha256:1ca3b3b05ad43727194e173edb2ca59ae81dabbe4372bee235d22e242014aea6
This gives the parameters needed to initialize a worker.
Initialize the worker
# Run on worker nodes only
# Replace x.x.x.x with the master node's internal IP
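Because the join line is copied by hand from the master to each worker, a truncated token or hash is an easy mistake. A hypothetical helper, `parse_join_command`, splits the line into endpoint, token and CA hash and checks their formats. Bootstrap tokens have the form `[a-z0-9]{6}.[a-z0-9]{16}`, and the hash is `sha256:` followed by 64 hex characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split and sanity-check a
# `kubeadm token create --print-join-command` line.
# The expected CA hash can be recomputed on the master (pipeline from the
# kubeadm join documentation):
#   openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
#     openssl dgst -sha256 -hex | sed 's/^.* //'
parse_join_command() {
  local line=$1 endpoint="" token="" hash=""
  [[ $line =~ join[[:space:]]+([^[:space:]]+) ]] && endpoint=${BASH_REMATCH[1]}
  [[ $line =~ --token[[:space:]]+([^[:space:]]+) ]] && token=${BASH_REMATCH[1]}
  [[ $line =~ --discovery-token-ca-cert-hash[[:space:]]+([^[:space:]]+) ]] && hash=${BASH_REMATCH[1]}
  if ! [[ $token =~ ^[a-z0-9]{6}\.[a-z0-9]{16}$ ]]; then
    echo "unexpected token format: $token" >&2
    return 1
  fi
  if ! [[ $hash =~ ^sha256:[0-9a-f]{64}$ ]]; then
    echo "unexpected CA hash format: $hash" >&2
    return 1
  fi
  printf 'endpoint=%s\ntoken=%s\nhash=%s\n' "$endpoint" "$token" "$hash"
}
```

A non-zero exit status means the pasted command is incomplete, which is cheaper to catch here than through a failed `kubeadm join`.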
export MASTER_IP=192.168.0.81
# Replace apiserver.demo with the APISERVER_NAME used when initializing the master node
export APISERVER_NAME=apiserver.zkl
echo "${MASTER_IP} ${APISERVER_NAME}" >> /etc/hosts
# Replace with the output of kubeadm token create on the master node
kubeadm join apiserver.zkl:6443 --token zkmlzr.yah4yxaplhibdpsi --discovery-token-ca-cert-hash sha256:1ca3b3b05ad43727194e173edb2ca59ae81dabbe4372bee235d22e242014aea6
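After a reset, the worker steps above tend to be repeated, and each `echo ... >> /etc/hosts` adds another line for the same name, possibly pointing at an old IP. A hypothetical helper, `ensure_hosts_entry`, keeps exactly one entry per name. The hosts file path is a parameter (defaulting to /etc/hosts) so the function can be tried on a scratch file first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ensure "<ip> <name>" appears in a hosts file exactly once.
# $1: IP, $2: host name, $3: hosts file (defaults to /etc/hosts)
ensure_hosts_entry() {
  local ip=$1 name=$2 file=${3:-/etc/hosts} tmp
  tmp=$(mktemp)
  # Keep comment lines; drop any other line that lists this name
  # (including other aliases on that same line); keep everything else.
  awk -v n="$name" '
    /^[ \t]*#/ { print; next }
    { keep = 1; for (i = 2; i <= NF; i++) if ($i == n) keep = 0; if (keep) print }
  ' "$file" > "$tmp"
  echo "${ip} ${name}" >> "$tmp"
  # Write back through the original file to keep its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

On a worker, running `ensure_hosts_entry "$MASTER_IP" "$APISERVER_NAME"` as root replaces the `echo` line and is safe to rerun.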
4. Install nginx-ingress
# Run on the master node only
kubectl apply -f https://kuboard.cn/install-script/v1.18.x/nginx-ingress.yaml
5. Install Kuboard
Official site:
Command:
kubectl apply -f
Check the installation result:
[root@k8smaster home]# kubectl get pod -n kuboard
NAME READY STATUS RESTARTS AGE
kuboard-agent-2-5b87774856-58d8r 1/1 Running 0 164m
kuboard-agent-85bdc64c56-7j85z 1/1 Running 0 164m
kuboard-etcd-b6fn5 1/1 Running 0 165m
kuboard-etcd-crnzj 1/1 Running 0 165m
kuboard-etcd-l9kg8 1/1 Running 0 165m
kuboard-v3-695f6bd686-44xtt 1/1 Running 0 165m
Check Kuboard's ports:
netstat -ntpl
tcp 0 0 0.0.0.0:30080 0.0.0.0:* LISTEN 107308/kube-proxy
tcp 0 0 0.0.0.0:30081 0.0.0.0:* LISTEN 107308/kube-proxy
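As the netstat output shows, the NodePorts (30080/30081) are held by kube-proxy rather than by a Kuboard process. A hypothetical filter, `kube_proxy_ports`, pulls just those port numbers out of `netstat -ntpl` output read from stdin. `kubectl get svc -n kuboard` shows the same NodePort mapping directly from the cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the TCP ports kube-proxy is listening on,
# reading `netstat -ntpl` output on stdin (tcp and tcp6 lines).
kube_proxy_ports() {
  awk '$1 ~ /^tcp/ && $NF ~ /kube-proxy/ {
    n = split($4, a, ":")   # local address, e.g. 0.0.0.0:30080 or :::30080
    print a[n]
  }' | sort -n | uniq
}
```

For example, `netstat -ntpl | kube_proxy_ports` should list 30080 and 30081 once Kuboard's service is up.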
6. Log in to and configure Kuboard
URL: http://ip:30080/
Enter the initial username and password, then log in
Username:
admin
Password:
Kuboard123
Enter the default cluster
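7. metrics-server issues
At first, CPU and memory monitoring shows nothing; the metrics components have to be installed.
metrics-scraper installed without any problems by following the prompts.
metrics-server, however, stayed NotReady the whole time.
Inspecting it showed that the image in use was 0.6.2.
The --kubelet-insecure-tls flag was already set, yet errors were still reported; the problem turned out to be a permissions issue.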
Delete the deployment
kubectl get deployment -n kube-system   # list the existing deployments
kubectl delete deployment metrics-server -n kube-system   # delete it (use the name shown by the previous command)
Manually download version 0.5.0:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.0/components.yaml
Edit the container configuration:
containers:
- args:
  - --cert-dir=/tmp
  - --secure-port=443
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  - --kubelet-insecure-tls # add this startup flag
  image: ccr.ccs.tencentyun.com/mirrors/metrics-server:v0.5.0 # for clusters in mainland China, replace the image with this one
Apply it to create and start metrics-server:
kubectl apply -f components.yaml
Check the logs:
kubectl logs metrics-server-7f6b85b597-9f26g -n kube-system
Check the pod:
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
metrics-server-7f6b85b597-9f26g 1/1 Running 0 61s
Verify by checking node resource usage:
kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8smaster 342m 17% 2164Mi 37%
k8snode1 134m 6% 1000Mi 17%
k8snode2 131m 6% 1143Mi 19%
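Once `kubectl top nodes` works, its output can also feed simple checks. A hypothetical filter, `busy_nodes`, reads that output on stdin and prints nodes whose CPU% or MEMORY% reaches a threshold. The default of 80 is an arbitrary choice for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<node> <cpu%> <mem%>" for nodes at or above
# a usage threshold, reading `kubectl top nodes` output on stdin.
# $1: threshold in percent (default 80)
busy_nodes() {
  local limit=${1:-80}
  awk -v limit="$limit" '$1 != "NAME" && NF >= 5 {
    cpu = $3; mem = $5
    sub(/%/, "", cpu); sub(/%/, "", mem)
    if (cpu + 0 >= limit + 0 || mem + 0 >= limit + 0) print $1, $3, $5
  }'
}
```

For example, `kubectl top nodes | busy_nodes 80` prints nothing for the cluster above, while a threshold of 30 would flag k8smaster because of its 37% memory usage.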
Done.
Lexiang: accumulate knowledge, enjoy endless happiness.