Node and Pod Affinity: Production Cases
Node affinity and Pod affinity
Taints and tolerations can handle simple scheduling, but some problems are still beyond them.
Relationships between Pods and nodes:
Some Pods prefer nodes labeled ssd=true, and only consider other nodes if none are available;
Some Pods need to run on nodes labeled ssd=true or type=physical, but should preferably go to the ssd=true nodes;
Relationships between Pods and Pods:
Different replicas of the same application, or applications belonging to the same project, should preferably (or must) not be placed on the same node, on the same class of nodes matching a given label, or in the same topology domain;
Two mutually dependent Pods should preferably (or must) be placed on the same node or in the same topology domain.
Affinity:
NodeAffinity: node affinity / anti-affinity
PodAffinity: Pod affinity
PodAntiAffinity: Pod anti-affinity
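All three are configured under the affinity field of the Pod spec. A bare skeleton for orientation (field names only; the rule bodies are filled in by the examples below):

spec:
  affinity:
    nodeAffinity: {}        # node affinity / anti-affinity rules
    podAffinity: {}         # Pod affinity rules
    podAntiAffinity: {}     # Pod anti-affinity rules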
Node affinity configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # affinity configuration
      affinity:
        nodeAffinity:                                       # node affinity
          requiredDuringSchedulingIgnoredDuringExecution:   # hard node affinity
            nodeSelectorTerms:                              # node label terms, several may be listed
            - matchExpressions:
              - key: region                                 # label key
                operator: In                                # match condition
                values:                                     # label values
                - "chaoyang"
          preferredDuringSchedulingIgnoredDuringExecution:  # soft node affinity
          - weight: 1
            preference:
              matchExpressions:
              - key: another-node-label-key
                operator: In
                values:
                - another-node-label-value
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/dyclouds/nginx:1.15.12
        name: nginx
requiredDuringSchedulingIgnoredDuringExecution: hard affinity configuration
nodeSelectorTerms: the node selector terms; several matchExpressions may be configured (satisfying any one of them is enough); each matchExpressions may contain several key/value selectors (all of which must be satisfied), and values may list several values (matching any one of them is enough)
preferredDuringSchedulingIgnoredDuringExecution: soft affinity configuration
weight: the soft affinity weight; the higher the weight, the higher the priority; range 1-100
preference: the soft affinity term configured together with weight; several matchExpressions may be configured
operator: the match condition
In: equivalent to key=value
NotIn: equivalent to key!=value
Exists: the node only needs to have a label with this key; the values field must not be set
DoesNotExist: the node must not have a label with this key; the values field must not be set
Gt: the label value must be greater than the specified value
Lt: the label value must be less than the specified value
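A small illustrative snippet combining several of these operators (the gpu and cpu-cores label keys are made up for this example); Gt and Lt compare the label value as an integer:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu              # the node must carry a label with key "gpu", any value
          operator: Exists
        - key: cpu-cores        # the node label "cpu-cores" must be greater than 16
          operator: Gt
          values:
          - "16"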
Case 1
The k8s cluster has two machines in the Chaoyang region. Deploy an application onto those two Chaoyang machines, with only one Pod replica allowed per machine.
1. First, label the two Chaoyang machines
[root@k8s-master01 affinity]# kubectl label node k8s-node03 k8s-node06 region=chaoyang
[root@k8s-master01 affinity]# kubectl get nodes --show-labels |grep region=chaoyang
k8s-node03 Ready <none> 4d2h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node03,kubernetes.io/os=linux,region=chaoyang,system=Ansion
k8s-node06 Ready <none> 19h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,environment=test,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node06,kubernetes.io/os=linux,region=chaoyang
2. Create the Deployment
[root@k8s-master01 affinity]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      # affinity configuration
      affinity:
        nodeAffinity:                                       # node affinity
          requiredDuringSchedulingIgnoredDuringExecution:   # hard node affinity
            nodeSelectorTerms:                              # node label terms, several may be listed
            - matchExpressions:
              - key: region                                 # label key
                operator: In                                # match condition
                values:                                     # label values
                - "chaoyang"
        podAntiAffinity:                                    # Pod anti-affinity
          requiredDuringSchedulingIgnoredDuringExecution:   # hard Pod anti-affinity
          - labelSelector:                                  # selector for peer Pod labels
              matchExpressions:
              - key: app                                    # Pod label key
                operator: In                                # match condition
                values:
                - nginx                                     # Pod label value
            topologyKey: kubernetes.io/hostname
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/dyclouds/nginx:1.15.12
        name: nginx
You can see that the Pods have been scheduled onto node03 and node06. If the replica count is then raised above 2, the extra Pods will not start and will stay Pending: the Pod anti-affinity rule forbids Pods with the same label from sharing a topology domain (topologyKey: kubernetes.io/hostname), and since every node has a unique hostname, each node is its own domain.
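A quick way to verify the placement (assuming the Deployment above, whose Pods carry the label app=nginx):

kubectl get pod -l app=nginx -o wide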
Case 2
All replicas of the same application need to run on the same node; for example, the application needs a GPU server, so every replica must be deployed onto the node that has the GPU.
1. Label the GPU server
[root@k8s-master01 affinity]# kubectl label node k8s-node07 gpu=true
2. Create the Deployment and configure Pod affinity
[root@k8s-master01 affinity]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: test
spec:
replicas: 6
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: gpu
operator: In
values:
- "true"
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- nginx
topologyKey: kubernetes.io/hostname
containers:
- image: registry.cn-hangzhou.aliyuncs.com/dyclouds/nginx:1.15.12
name: nginx
You can see that all the Pods ended up on node07.
Case 3
Prefer scheduling onto high-spec servers (here, the nodes labeled region=haidian). Because this is soft (preferred) affinity, the Pods can still be scheduled elsewhere if those nodes are unavailable or full.
[root@k8s-master01 affinity]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: test
spec:
replicas: 6
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: region
operator: In
values:
- haidian
# podAffinity:
# requiredDuringSchedulingIgnoredDuringExecution:
# - labelSelector:
# matchExpressions:
# - key: app
# operator: In
# values:
# - nginx
# topologyKey: kubernetes.io/hostname
# podAntiAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - weight: 100
# podAffinityTerm:
# labelSelector:
# matchExpressions:
# - key: security
# operator: In
# values:
# - S2
# namespaces:
# - default
# topologyKey: failure-domain.beta.kubernetes.io/zone
containers:
- image: registry.cn-hangzhou.aliyuncs.com/dyclouds/nginx:1.15.12
name: nginx
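To check which nodes the soft preference targets (assuming the region=haidian label from the earlier cases has already been applied):

kubectl get nodes -l region=haidian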
Case 4
Deploy the same application across multiple regions. There are three regions: Haidian, Chaoyang, and Daxing, each with two machines. The requirement is that the application runs on every node of every region, and no two replicas share a node.
Labeling the nodes of each region works the same way as above and is skipped here; the listing below shows five machines in total across the three regions.
[root@k8s-master01 affinity]# kubectl get nodes --show-labels |egrep "region=haidian|region=chaoyang|environment=prod"
k8s-node01 Ready <none> 4d4h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,environment=prod,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node01,kubernetes.io/os=linux,region=daxing,ssd=true
k8s-node02 Ready <none> 4d4h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,gpu=false,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node02,kubernetes.io/os=linux,region=haidian,ssd=true
k8s-node03 Ready <none> 4d4h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node03,kubernetes.io/os=linux,region=chaoyang,system=Ansion
k8s-node04 Ready <none> 22h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,environment=prod,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node04,kubernetes.io/os=linux,region=haidian,system=Ansion
k8s-node06 Ready <none> 21h v1.28.2 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,environment=test,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node06,kubernetes.io/os=linux,region=chaoyang
Create the Deployment and add the affinity configuration
[root@k8s-master01 affinity]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: test
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:                 # hard node affinity: must be deployed in the corresponding three regions
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: region
                operator: In
                values:
                - chaoyang
                - haidian
            - matchExpressions:
              - key: environment
                operator: In
                values:
                - prod
        podAntiAffinity:              # hard Pod anti-affinity: every replica must be on a different node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/dyclouds/nginx:1.15.12
        name: nginx
As the scheduling result shows, Pods were deployed successfully onto the five nodes in the corresponding regions. Because the replica count is 6, one more than the five matching nodes, and the hard Pod anti-affinity forbids two replicas on the same node, the remaining Pod can only stay Pending.
topologyKey (topology domains) in detail
topologyKey defines the topology domain. It applies to the hosts themselves, effectively partitioning them into zones based on node labels: nodes that differ in the key, or in the value of that key, belong to different topology domains.
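For a custom key such as zone (used in example 2.3 below) to define a domain, the nodes must carry that label first; an illustrative labeling with placeholder node names:

kubectl label node <node-in-zone-a> zone=zone-a
kubectl label node <node-in-zone-b> zone=zone-b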
2.1 The same application must run on different hosts
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test1.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: diff-nodes
labels:
app: diff-nodes
spec:
selector:
matchLabels:
app: diff-nodes
replicas: 2
template:
metadata:
labels:
app: diff-nodes
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- diff-nodes
topologyKey: kubernetes.io/hostname
namespaces: []
containers:
- name: diff-nodes
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
2.2 The same application should preferably run on different hosts
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: diff-nodes
labels:
app: diff-nodes
spec:
selector:
matchLabels:
app: diff-nodes
replicas: 2
template:
metadata:
labels:
app: diff-nodes
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- diff-nodes
topologyKey: kubernetes.io/hostname
namespaces: []
weight: 100
containers:
- name: diff-nodes
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
2.3 Spreading the same application across different data centers
If the cluster spans several availability zones, the application can be spread across them to improve its availability:
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: diff-zone
labels:
app: diff-zone
spec:
selector:
matchLabels:
app: diff-zone
replicas: 2
template:
metadata:
labels:
app: diff-zone
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- diff-zone
topologyKey: zone
namespaces: []
weight: 100
containers:
- name: diff-nodes
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
2.4 Deploying an application in the same availability zone as its cache service whenever possible
If the cluster spans several availability zones, an application can be co-located with the cache service it depends on to improve the performance of these base components. (Note that the example below uses topologyKey: kubernetes.io/hostname, i.e. co-location on the same node; for zone-level co-location the key would be a zone label.)
First deploy a cache service whose Pods carry the label app: cache, then give the application Pod affinity toward those Pods:
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test4.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-app
labels:
app: my-app
spec:
selector:
matchLabels:
app: my-app
replicas: 2
template:
metadata:
labels:
app: my-app
spec:
affinity:
podAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- cache
topologyKey: kubernetes.io/hostname
weight: 100
containers:
- name: my-app
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
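The cache Deployment itself is not shown above; a minimal sketch of a workload carrying the app: cache label that the affinity rule targets (the name is illustrative, and the document's nginx image is used as a stand-in for a real cache image):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache
  labels:
    app: cache
spec:
  selector:
    matchLabels:
      app: cache
  replicas: 1
  template:
    metadata:
      labels:
        app: cache
    spec:
      containers:
      - name: cache
        image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12   # stand-in image for illustration
        imagePullPolicy: IfNotPresent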
2.5 Compute services must be deployed to high-performance machines
Suppose a batch of machines in the cluster are high-performance machines, and some compute-intensive services should be deployed onto them for better performance. Node affinity can be used to make these Pods prefer, or require, those nodes.
For example, the compute service may only be deployed on nodes whose disks are ssd or nvme:
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test5.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: compute
labels:
app: compute
spec:
selector:
matchLabels:
app: compute
replicas: 2
template:
metadata:
labels:
app: compute
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: disktype
operator: In
values:
- ssd
- nvme
containers:
- name: compute
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
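For the hard requirement above to be satisfiable, the high-performance nodes need the disktype label first, for example (the node name is a placeholder):

kubectl label node <high-performance-node> disktype=ssd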
2.6 Applications should preferably not be deployed to low-performance machines
If some machines in the cluster are known to perform poorly, or are undesirable for other reasons, a service can be steered away from them simply by changing the operator to NotIn:
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test6.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: compute-intensive
labels:
app: compute-intensive
spec:
selector:
matchLabels:
app: compute-intensive
replicas: 2
template:
metadata:
labels:
app: compute-intensive
spec:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: disktype
operator: NotIn
values:
- low
containers:
- name: compute-intensive
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
2.7 Spreading an application evenly across data centers
Kubernetes topologySpreadConstraints are an advanced scheduling policy used to ensure that a workload's replicas are distributed evenly across different topology domains in the cluster (such as nodes, availability zones, or regions).
root@VM-26-130-ubuntu:/usr/local/src/k8s/affinity# cat test7.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: example-deployment
labels:
app: example
spec:
selector:
matchLabels:
app: example
replicas: 2
template:
metadata:
labels:
app: example
spec:
topologySpreadConstraints:
- maxSkew: 1
whenUnsatisfiable: DoNotSchedule
topologyKey: kubernetes.io/hostname
labelSelector:
matchLabels:
app: example
containers:
- name: example
image: registry.cn-beijing.aliyuncs.com/dotbalo/nginx:1.15.12
imagePullPolicy: IfNotPresent
topologySpreadConstraints: the topology spread constraint configuration; it spreads replicas evenly across different domains, and when several constraints are configured all of them must be satisfied
maxSkew: the maximum allowed skew; for example, with maxSkew set to 1, the number of replicas in any two topology domains may differ by at most 1
whenUnsatisfiable: what to do when the constraint cannot be satisfied
DoNotSchedule: do not schedule the new Pod until the constraint can be met
ScheduleAnyway: schedule the new Pod even if the constraint is not met
topologyKey: the key that defines the topology domain
labelSelector: the label selector for the Pods the constraint applies to, usually set to the current Pod's own labels
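The example above spreads Pods across individual nodes. A variant that spreads them across availability zones instead, assuming the nodes carry the standard topology.kubernetes.io/zone label, only changes the constraint:

topologySpreadConstraints:
- maxSkew: 1
  whenUnsatisfiable: ScheduleAnyway   # prefer balance, but still schedule if a zone is full
  topologyKey: topology.kubernetes.io/zone
  labelSelector:
    matchLabels:
      app: example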