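etcd is an open-source project started by the CoreOS team in June 2013. Its goal is to build a highly available, distributed key-value database. Internally, etcd uses the Raft protocol as its consensus algorithm, and it is implemented in Go.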
Official website:
https://etcd.io
GitHub:
https://github.com/etcd-io/etcd
Official hardware recommendations:
https://etcd.io/docs/v3.5/op-guide/hardware

I. Hardware configuration:

  1. CPU

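Few etcd deployments need a lot of CPU capacity. A typical cluster needs two to four cores to run smoothly. Heavily loaded deployments, serving thousands of clients or tens of thousands of requests per second, tend to be CPU-bound because etcd can serve requests from memory. Such heavy deployments usually need eight to sixteen dedicated cores.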

  2. Memory

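etcd has a relatively small memory footprint, but its performance still depends on having enough memory. An etcd server aggressively caches key-value data and spends most of the remaining memory tracking watchers. Typically 8 GB is enough. For heavy deployments with thousands of watchers and millions of keys, allocate 16 GB to 64 GB accordingly.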

  3. Disk

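Fast disks are the most critical factor for the performance and stability of an etcd deployment.

A slow disk increases etcd request latency and can hurt cluster stability. Because etcd's consensus protocol depends on persistently storing metadata in a log, a majority of the cluster members must write every request to disk. In addition, etcd incrementally checkpoints its state to disk so that it can truncate this log. If these writes take too long, heartbeats may time out and trigger an election, undermining the stability of the cluster. In general, a benchmarking tool such as fio can tell you whether a disk is fast enough for etcd; the official hardware guide links to an example.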

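etcd is very sensitive to disk write latency. Typically 50 sequential IOPS (e.g., a 7200 RPM disk) are required. For heavily loaded clusters, 500 sequential IOPS (e.g., a typical local SSD or a high-performance virtualized block device) are recommended. Note that most cloud providers publish concurrent IOPS rather than sequential IOPS, and the published concurrent figure can be 10x the sequential one. To measure actual sequential IOPS, use a disk benchmarking tool such as diskbench or fio.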
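A fio test for etcd usually measures fdatasync latency, since etcd's WAL persists entries with fdatasync. The sketch below assumes fio is installed and that the directory you pass lives on the same disk as the etcd data directory. The helper names and the scratch directory in the usage line are hypothetical. The fio parameters are modeled on commonly published etcd disk tests. The 10 ms threshold in `fsync_p99_ok` encodes the widely cited guideline that the 99th percentile of fdatasync latency should stay below 10 ms. You read that percentile from the fio report yourself and pass it in microseconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimic etcd's WAL pattern (small sequential writes,
# each followed by fdatasync) with fio in a scratch directory.
run_etcd_fio() {
  local dir=$1
  [ -n "$dir" ] || { echo "usage: run_etcd_fio DIR" >&2; return 2; }
  mkdir -p "$dir" || return 1
  fio --rw=write --ioengine=sync --fdatasync=1 \
      --directory="$dir" --size=22m --bs=2300 --name=etcd-disk-check
  local rc=$?          # keep fio's exit status
  rm -rf "$dir"        # remove the scratch data
  return "$rc"
}

# Compare the 99th-percentile fdatasync latency (microseconds, read from
# the fio report) with the 10 ms guideline. Returns 0 if fast enough.
fsync_p99_ok() {
  local p99_usec=$1
  [ "$p99_usec" -lt 10000 ]
}

# Usage:
#   run_etcd_fio /var/lib/etcd-fio-test
#   fsync_p99_ok 4080 && echo "fdatasync p99 is within the etcd guideline"
```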

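etcd needs only modest disk bandwidth, but more bandwidth buys faster recovery when a failed member has to catch up with the cluster. Typically 10 MB/s recovers 100 MB of data within 15 seconds. For large clusters, 100 MB/s or more is recommended so that 1 GB of data can be recovered within 15 seconds.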
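The bandwidth guidance comes down to simple division. This sketch uses a hypothetical helper name and ignores network transfer and apply overhead. It rounds up to whole seconds and shows that both cases quoted above fit in the 15-second budget.

```shell
#!/usr/bin/env bash
# Seconds needed to copy SIZE_MB at BW_MB_PER_S, rounded up (integer ceiling).
recovery_seconds() {
  local size_mb=$1 bw_mb_per_s=$2
  echo $(( (size_mb + bw_mb_per_s - 1) / bw_mb_per_s ))
}

echo "100 MB at 10 MB/s: $(recovery_seconds 100 10) s"    # 10 s
echo "1 GB at 100 MB/s: $(recovery_seconds 1024 100) s"   # 11 s
```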

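When possible, back etcd's storage with an SSD. SSDs usually provide lower write latency, with less variance, than spinning disks, which improves etcd's stability and reliability. If you use spinning disks, get the fastest available (15,000 RPM). For both spinning disks and SSDs, RAID 0 is also an effective way to increase disk speed. With at least three cluster members, mirroring and/or parity RAID variants are unnecessary, because etcd's consistent replication already provides high availability.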

  4. Network

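Multi-member etcd deployments benefit from a fast and reliable network. Because etcd is both consistent and partition tolerant, an unreliable network with partition outages leads to poor availability. Low latency lets etcd members communicate quickly, and high bandwidth reduces the time needed to recover a failed member. 1GbE is sufficient for common etcd deployments; for large etcd clusters, a 10GbE network reduces the mean time to recovery.
Deploy etcd members within a single data center when possible, to avoid latency overhead and reduce the chance of partition events. If a failure domain in another data center is required, choose a data center close to the existing one. See also the official tuning documentation for more on cross-data-center deployments.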

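Rough number of pods each hardware configuration can support:

  • 8 cores, 16 GB RAM, SSD: sufficient in most cases, thousands of pods
  • 8 cores, 8 GB RAM, SSD: hundreds of pods
  • 16 cores, 32 GB RAM, SSD: tens of thousands of pods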

Official documentation (maintenance):
https://etcd.io/docs/v3.5/op-guide/maintenance/

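etcd has the following properties:

  • Fully replicated: every node in the cluster has access to the full data store
  • Highly available: etcd is designed to avoid single points of failure caused by hardware or network problems
  • Consistent: every read returns the most recent write across multiple hosts
  • Simple: a well-defined, user-facing API (gRPC)
  • Secure: automatic TLS with optional client certificate authentication
  • Fast: benchmarked at 10,000 writes per second
  • Reliable: the store is properly distributed using the Raft algorithm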

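Note: in this deployment etcd has no separate configuration file; all options are passed on the command line in etcd.service (etcd can also read a YAML file via --config-file):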

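[root@etcd01 ~]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
ExecStart=/usr/local/bin/etcd \
  --name=etcd-192.168.17.140 \
  --cert-file=/etc/kubernetes/ssl/etcd.pem \
  --key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/etcd.pem \
  --peer-key-file=/etc/kubernetes/ssl/etcd-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls=https://192.168.17.140:2380 \
  --listen-peer-urls=https://192.168.17.140:2380 \
  --listen-client-urls=https://192.168.17.140:2379,http://127.0.0.1:2379 \
  --advertise-client-urls=https://192.168.17.140:2379 \
  --initial-cluster-token=etcd-cluster-0 \
  --initial-cluster=etcd-192.168.17.140=https://192.168.17.140:2380,etcd-192.168.17.141=https://192.168.17.141:2380,etcd-192.168.17.142=https://192.168.17.142:2380 \
  --initial-cluster-state=new \
  --data-dir=/var/lib/etcd \
  --wal-dir= \
  --snapshot-count=50000 \
  --auto-compaction-retention=10 \
  --auto-compaction-mode=periodic \
  --max-request-bytes=10485760 \
  --quota-backend-bytes=8589934592
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target

Notes on the settings (systemd only accepts comments on their own lines, so they are kept outside the unit file):

  • WorkingDirectory, --data-dir: data directory.
  • --name: name of this member.
  • --cert-file, --key-file: certificate and private key this member presents to clients.
  • --peer-cert-file, --peer-key-file: certificate and private key used for connections between members.
  • --trusted-ca-file, --peer-trusted-ca-file: CA used to verify client and peer certificates.
  • --initial-advertise-peer-urls: peer URL this member advertises to the rest of the cluster.
  • --listen-peer-urls: address used for traffic between members (port 2380).
  • --listen-client-urls: addresses on which client requests are served (port 2379).
  • --advertise-client-urls: client URL advertised to clients such as kube-apiserver.
  • --initial-cluster-token: token used to bootstrap the cluster; it must be identical on all members of one cluster.
  • --initial-cluster: every member of the cluster, as name=peer-URL pairs.
  • --initial-cluster-state: new when creating a new cluster, existing when joining an existing one.
  • --wal-dir: write-ahead log directory; when empty, the WAL is kept inside the data directory.
  • --snapshot-count: number of committed transactions that triggers a snapshot to disk.
  • --auto-compaction-retention, --auto-compaction-mode, --max-request-bytes, --quota-backend-bytes: tuning flags, described in section II. Periodic compaction raises CPU load and may congest the network.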
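--name and --initial-cluster must agree on every member, so it can help to generate the member list rather than type it by hand. This is a sketch that assumes the `etcd-<IP>` naming and peer port 2380 used in the unit file above. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Build the --initial-cluster value ("name=peer-URL" pairs joined by commas)
# from member IPs, using the etcd-<IP> naming and peer port 2380.
build_initial_cluster() {
  local ip out=""
  for ip in "$@"; do
    out+="${out:+,}etcd-${ip}=https://${ip}:2380"   # add a comma after the first entry
  done
  printf '%s\n' "$out"
}

build_initial_cluster 192.168.17.140 192.168.17.141 192.168.17.142
```

Each member's unit file can then differ only in --name and its own listen/advertise URLs.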

II. etcd parameter tuning:

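  • --auto-compaction-retention=10: retention window for automatic compaction (in hours with periodic mode). The first compaction runs after 10 hours; after that, compaction runs every 10 h × 10% = 1 hour.
  • --auto-compaction-mode=periodic: compact periodically, based on time.
  • --max-request-bytes=10485760: maximum size of a client request. The default is 1.5 MiB and the officially recommended maximum is 10 MiB; 10485760 / 1024 / 1024 = 10 MiB is the largest single write etcd will accept.
  • --quota-backend-bytes=8589934592: storage quota of the backend database. The default is 2 GB; etcd logs a warning at startup if the value exceeds 8 GB (8589934592 bytes = 8 GiB).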
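The byte values above are binary-unit multiples, and the compaction interval follows the 10% rule stated for periodic mode. This small sketch reproduces both numbers. The helper names are hypothetical. `compaction_interval_minutes` only encodes the rule as described above for retentions longer than one hour.

```shell
#!/usr/bin/env bash
# Convert binary-unit sizes to the byte counts etcd's flags expect.
mib() { echo $(( $1 * 1024 * 1024 )); }
gib() { echo $(( $1 * 1024 * 1024 * 1024 )); }

# Periodic compaction interval as 10% of the retention window (hours -> minutes).
compaction_interval_minutes() { echo $(( $1 * 60 / 10 )); }

echo "--max-request-bytes=$(mib 10)"     # 10485760
echo "--quota-backend-bytes=$(gib 8)"    # 8589934592
echo "retention 10h -> compaction every $(compaction_interval_minutes 10) min"   # 60 min
```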

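Cluster defragmentation: after etcd has been running for a long time, deleted and compacted data leaves free, non-contiguous space in the backend database file. Defragmentation rewrites the data contiguously (so that I/O is sequential again) and releases the free space: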

ETCDCTL_API=3 /usr/local/bin/etcdctl defrag --cluster --endpoints=https://192.168.17.140:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem
[root@etcd01 ~]# ETCDCTL_API=3 /usr/local/bin/etcdctl defrag --cluster --endpoints=https://192.168.17.140:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem
Finished defragmenting etcd member[https://192.168.17.141:2379]
Finished defragmenting etcd member[https://192.168.17.142:2379]
Finished defragmenting etcd member[https://192.168.17.140:2379]
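Every etcdctl command in this article repeats the same endpoint and TLS flags. etcdctl v3 also reads each global flag from an environment variable named `ETCDCTL_` plus the flag name in upper case (e.g. `--cacert` becomes `ETCDCTL_CACERT`), so those flags can be exported once. The sketch below assumes the certificate paths used in this article. It also defines a hypothetical helper, `defrag_one_by_one`, that defragments members in the order you list them and stops at the first failure. Defragmenting a live member blocks reads and writes on it while it rebuilds its storage, so controlling the order can be useful. For example, some operators defragment the current leader last.

```shell
#!/usr/bin/env bash
# Export the global etcdctl flags once instead of repeating them.
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/ssl/ca.pem
export ETCDCTL_CERT=/etc/kubernetes/ssl/etcd.pem
export ETCDCTL_KEY=/etc/kubernetes/ssl/etcd-key.pem

# Hypothetical helper: defragment members one at a time, in the given
# order, and stop as soon as one of them fails.
defrag_one_by_one() {
  local ip
  for ip in "$@"; do
    /usr/local/bin/etcdctl --endpoints="https://${ip}:2379" defrag || return 1
  done
}

# Usage (followers first, leader last):
# defrag_one_by_one 192.168.17.140 192.168.17.142 192.168.17.141
```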

etcd stores only metadata, so the data set is usually a few GB at most:

[root@etcd01 ~]# ll /var/lib/etcd/
total 0
drwx------ 4 root root 29 May  4 10:28 member
[root@etcd01 ~]# ll /var/lib/etcd/member/
total 0
drwx------ 2 root root 200 May  4 11:06 snap
drwx------ 2 root root 109 May  4 10:28 wal
[root@etcd01 ~]# ll /var/lib/etcd/member/snap/
total 4236
-rw-r--r-- 1 root root    7601 Apr 19 01:00 0000000000000001-0000000000000003.snap
-rw-r--r-- 1 root root   10318 Apr 22 17:34 0000000000000010-000000000000c354.snap
-rw-r--r-- 1 root root   10318 Apr 22 23:06 0000000000000011-00000000000186a5.snap
-rw-r--r-- 1 root root   11489 May  4 11:06 0000000000000017-00000000000249f6.snap
-rw------- 1 root root 4292608 May  4 15:18 db
[root@etcd01 ~]# ll /var/lib/etcd/member/wal
total 187504
-rw------- 1 root root 64000616 Apr 22 23:42 0000000000000000-0000000000000000.wal
-rw------- 1 root root 64000000 May  4 15:18 0000000000000001-0000000000019b5d.wal
-rw------- 1 root root 64000000 May  4 10:28 0.tmp

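snap: snapshot and data directory; it holds the Raft snapshot files (*.snap) and the backend database file db.
wal: write-ahead log. When data is written, the log entry is written first and the data is stored afterwards; if the log write does not succeed, the write is treated as incomplete. After a failure, the data can be recovered by replaying the log.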
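The write-ahead ordering can be illustrated with a toy key-value store in shell. This is not etcd's WAL format, which is binary, CRC-checked and split into 64 MB segments (the 64000000-byte `.wal` files above). The file names `toy.wal`/`toy.state` and all function names are hypothetical. The sketch only shows the ordering: append to the log, flush it, then apply. Replaying the log recovers the state.

```shell
#!/usr/bin/env bash
# Toy write-ahead log, for illustration only (not etcd's on-disk format).
WAL=${WAL:-./toy.wal}
STATE=${STATE:-./toy.state}

# Apply one PUT to the state file: drop the old value of the key, append the new one.
apply_put() {
  local key=$1 value=$2 tmp
  tmp=$(mktemp) || return 1
  [ -f "$STATE" ] && awk -F'\t' -v k="$key" '$1 != k' "$STATE" > "$tmp"
  printf '%s\t%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$STATE"
}

wal_put() {
  printf 'PUT\t%s\t%s\n' "$1" "$2" >> "$WAL" || return 1  # 1. append to the log
  sync "$WAL" 2>/dev/null || sync                          # 2. flush it to disk
  apply_put "$1" "$2"                                      # 3. only then apply
}

# Rebuild the state from scratch by replaying every logged PUT in order.
wal_replay() {
  local op key value
  : > "$STATE"
  while IFS=$'\t' read -r op key value; do
    if [ "$op" = PUT ]; then apply_put "$key" "$value"; fi
  done < "$WAL"
}

rm -f "$WAL" "$STATE"
wal_put /data "data v1"
wal_put /data "data v2"
rm -f "$STATE"      # simulate losing the state file in a crash
wal_replay          # the log alone is enough to restore it
cat "$STATE"
```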

III. etcd commands (API v3 is the default):

[root@etcd01 ~]# etcdctl --help
NAME:
	etcdctl - A simple command line client for etcd3.

USAGE:
	etcdctl [flags]

VERSION:
	3.5.1

API VERSION:
	3.5


COMMANDS:
	alarm disarm		Disarms all alarms
	alarm list		Lists all alarms
	auth disable		Disables authentication
	auth enable		Enables authentication
	auth status		Returns authentication status
	check datascale		Check the memory usage of holding data for different workloads on a given server endpoint.
	check perf		Check the performance of the etcd cluster
	compaction		Compacts the event history in etcd
	defrag			Defragments the storage of the etcd members with given endpoints
	del			Removes the specified key or range of keys [key, range_end)
	elect			Observes and participates in leader election
	endpoint hashkv		Prints the KV history hash for each endpoint in --endpoints
	endpoint health		Checks the healthiness of endpoints specified in `--endpoints` flag
	endpoint status		Prints out the status of endpoints specified in `--endpoints` flag
	get			Gets the key or a range of keys
	help			Help about any command
	lease grant		Creates leases
	lease keep-alive	Keeps leases alive (renew)
	lease list		List all active leases
	lease revoke		Revokes leases
	lease timetolive	Get lease information
	lock			Acquires a named lock
	make-mirror		Makes a mirror at the destination etcd cluster
	member add		Adds a member into the cluster
	member list		Lists all members in the cluster
	member promote		Promotes a non-voting member in the cluster
	member remove		Removes a member from the cluster
	member update		Updates a member in the cluster
	move-leader		Transfers leadership to another etcd cluster member.
	put			Puts the given key into the store
	role add		Adds a new role
	role delete		Deletes a role
	role get		Gets detailed information of a role
	role grant-permission	Grants a key to a role
	role list		Lists all roles
	role revoke-permission	Revokes a key from a role
	snapshot restore	Restores an etcd member snapshot to an etcd directory
	snapshot save		Stores an etcd node backend snapshot to a given file
	snapshot status		[deprecated] Gets backend snapshot status of a given file
	txn			Txn processes all the requests in one transaction
	user add		Adds a new user
	user delete		Deletes a user
	user get		Gets detailed information of a user
	user grant-role		Grants a role to a user
	user list		Lists all users
	user passwd		Changes password of user
	user revoke-role	Revokes a role from a user
	version			Prints the version of etcdctl
	watch			Watches events stream on keys or prefixes

OPTIONS:
      --cacert=""				verify certificates of TLS-enabled secure servers using this CA bundle
      --cert=""					identify secure client using this TLS certificate file
      --command-timeout=5s			timeout for short running command (excluding dial timeout)
      --debug[=false]				enable client-side debug logging
      --dial-timeout=2s				dial timeout for client connections
  -d, --discovery-srv=""			domain name to query for SRV records describing cluster endpoints
      --discovery-srv-name=""			service name to query when using DNS discovery
      --endpoints=[127.0.0.1:2379]		gRPC endpoints
  -h, --help[=false]				help for etcdctl
      --hex[=false]				print byte strings as hex encoded strings
      --insecure-discovery[=true]		accept insecure SRV records describing cluster endpoints
      --insecure-skip-tls-verify[=false]	skip server certificate verification (CAUTION: this option should be enabled only for testing purposes)
      --insecure-transport[=true]		disable transport security for client connections
      --keepalive-time=2s			keepalive time for client connections
      --keepalive-timeout=6s			keepalive timeout for client connections
      --key=""					identify secure client using this TLS key file
      --password=""				password for authentication (if this option is used, --user option shouldn't include password)
      --user=""					username[:password] for authentication (prompt if password is not supplied)
  -w, --write-out="simple"			set the output format (fields, json, protobuf, simple, table)
  • Check etcd cluster health (useful for monitoring: each endpoint should report "successfully committed proposal")
export NODE_IPS="192.168.17.140 192.168.17.141 192.168.17.142"
for ip in ${NODE_IPS};do ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health;done

[root@etcd01 ~]# export NODE_IPS="192.168.17.140 192.168.17.141 192.168.17.142"
[root@etcd01 ~]# for ip in ${NODE_IPS};do ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health;done
https://192.168.17.140:2379 is healthy: successfully committed proposal: took = 6.786219ms
https://192.168.17.141:2379 is healthy: successfully committed proposal: took = 10.078909ms
https://192.168.17.142:2379 is healthy: successfully committed proposal: took = 8.635907ms
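For monitoring, the loop's output can be checked automatically. This is a sketch with a hypothetical function name. It counts the lines that report "is healthy: successfully committed proposal", as in the output above, and treats any other non-empty line as a failure. That includes error messages, which the usage example captures with `2>&1` on the assumption that etcdctl reports problems on stderr.

```shell
#!/usr/bin/env bash
# Read `etcdctl endpoint health` output on stdin; succeed only if every
# non-empty line reports a healthy endpoint.
check_etcd_health() {
  local total=0 healthy=0 line
  while IFS= read -r line; do
    [ -z "$line" ] && continue
    total=$((total + 1))
    case $line in
      *" is healthy: successfully committed proposal"*) healthy=$((healthy + 1)) ;;
      *) printf 'UNHEALTHY: %s\n' "$line" >&2 ;;
    esac
  done
  echo "${healthy}/${total} endpoints healthy"
  [ "$total" -gt 0 ] && [ "$healthy" -eq "$total" ]
}

# Usage:
# for ip in ${NODE_IPS}; do
#   ETCDCTL_API=3 /usr/local/bin/etcdctl --endpoints=https://${ip}:2379 \
#     --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem \
#     --key=/etc/kubernetes/ssl/etcd-key.pem endpoint health 2>&1
# done | check_etcd_health || echo "etcd cluster needs attention"
```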

Show detailed information for each node as a table:

export NODE_IPS="192.168.17.140 192.168.17.141 192.168.17.142"
for ip in ${NODE_IPS};do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem;done
#--write-out=table displays the detailed information as a table
[root@etcd01 ~]# export NODE_IPS="192.168.17.140 192.168.17.141 192.168.17.142"
[root@etcd01 ~]# for ip in ${NODE_IPS};do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem;done
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.17.140:2379 | f0b00c5fba82edde |   3.5.1 |  2.0 MB |     false |      false |        24 |     196500 |             196500 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.17.141:2379 | 31b79580a6603995 |   3.5.1 |  1.9 MB |      true |      false |        24 |     196500 |             196500 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.17.142:2379 | bd6bb6e56a019be8 |   3.5.1 |  2.0 MB |     false |      false |        24 |     196500 |             196500 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

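# IS LEADER: whether the member is the leader. Writes go through the leader, which then replicates them to the other members
# IS LEARNER: whether the member is a learner (a non-voting member that is still catching up on replication)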
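To find the leader from a script, the table output above can be filtered on the IS LEADER column. This is a sketch with a hypothetical function name. It assumes the etcdctl 3.5 table layout shown above, where IS LEADER is the fifth column (field 6 when each row is split on "|"). For anything long-lived, `--write-out=json` would be less fragile than parsing a table.

```shell
#!/usr/bin/env bash
# Print the endpoint whose IS LEADER column is "true" in
# `etcdctl --write-out=table endpoint status` output read from stdin.
find_etcd_leader() {
  awk -F'|' '
    $2 ~ /^ *https?:\/\// {          # data rows only (skip borders and header)
      ep = $2; leader = $6
      gsub(/ /, "", ep); gsub(/ /, "", leader)
      if (leader == "true") print ep
    }'
}

# Usage:
# for ip in ${NODE_IPS}; do ETCDCTL_API=3 /usr/local/bin/etcdctl --write-out=table endpoint status \
#   --endpoints=https://${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem \
#   --cert=/etc/kubernetes/ssl/etcd.pem --key=/etc/kubernetes/ssl/etcd-key.pem; done | find_etcd_leader
```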
  • List all keys in the etcd cluster
[root@etcd01 ~]# etcdctl  get / --prefix --keys-only
  • Look up the value of a key
[root@etcd01 ~]# etcdctl  get / --prefix --keys-only |grep nginx
/calico/resources/v3/projectcalico.org/workloadendpoints/default/node02-k8s-nginx-eth0
/registry/deployments/test/nginx-deployment
/registry/pods/default/nginx
/registry/services/specs/test/nginx-service
[root@etcd01 ~]# 
[root@etcd01 ~]# etcdctl get /registry/pods/default/nginx
/registry/pods/default/nginx
(binary output omitted: Kubernetes stores the Pod in its protobuf encoding, which begins with the "k8s" magic prefix, so etcdctl get prints mostly unreadable bytes. Readable fragments include the pod name nginx, namespace default, image nginx:latest, node 192.168.17.151, pod IP 10.200.140.73 and phase Running.)
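Kubernetes' protobuf encoding starts every stored object with the 4-byte magic `k8s\0`, which is why the raw value above begins with "k8s". This sketch uses a hypothetical helper name. It checks that prefix on a raw value read from stdin, which makes it easy to tell protobuf-encoded objects from JSON ones before deciding how to decode them.

```shell
#!/usr/bin/env bash
# Succeed if the data on stdin starts with the Kubernetes protobuf magic
# bytes "k8s\0" (hex 6b 38 73 00).
is_k8s_protobuf() {
  local magic
  magic=$(head -c 4 | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "6b387300" ]
}

# Usage:
# etcdctl get /registry/pods/default/nginx --print-value-only | is_k8s_protobuf && echo "protobuf-encoded"
```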
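  • Delete data (here the nginx Pod's key is deleted directly in etcd; afterwards kubectl no longer lists the nginx pod)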
[root@etcd01 ~]# etcdctl del /registry/pods/default/nginx
1
[root@master01 ssl]# kubectl get pods 
NAME          READY   STATUS    RESTARTS        AGE
dujie-test1   1/1     Running   1 (20h ago)     11d
nginx         1/1     Running   2 (5h45m ago)   17d
[root@master01 ssl]# 
[root@master01 ssl]# kubectl get pods 
NAME          READY   STATUS    RESTARTS            AGE
dujie-test1   1/1     Running   1 (<invalid> ago)   7d7h
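  • etcd watch mechanism:
    etcd continuously watches the data and proactively notifies the client when it changes. The etcd v3 watch mechanism can watch a single fixed key or a range of keys.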
#Watch a key on etcd node1. The key does not have to exist yet; it can be created later
[root@etcd01 ~]# ETCDCTL_API=3 etcdctl watch /data
#Modify the data on etcd node2 and check whether node1 sees the change
[root@etcd02 ~]#  ETCDCTL_API=3 etcdctl put /data "data v1"
OK
[root@etcd02 ~]#  ETCDCTL_API=3 etcdctl put /data "data v2"
OK
[root@etcd01 ~]# ETCDCTL_API=3 etcdctl watch /data

PUT
/data
data v1
PUT
/data
data v2
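In the default output, each event spans three lines: event type, key and value, as in the transcript above. A small filter can turn that into one line per event for logging. This is a sketch with a hypothetical function name. It assumes the simple output format, watch run without `--prev-kv`, and values without embedded newlines.

```shell
#!/usr/bin/env bash
# Collapse `etcdctl watch` simple output (type, key, value on separate
# lines) into one "TYPE key=value" line per event.
watch_events_oneline() {
  local type key value
  while IFS= read -r type && IFS= read -r key && IFS= read -r value; do
    printf '%s %s=%s\n' "$type" "$key" "$value"
  done
}

# Usage:
# ETCDCTL_API=3 etcdctl watch /data | watch_events_oneline
```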