k8s (3): kube-router, Direct BGP Network Connectivity

## Preface

In the previous post, we used Traefik to proxy services inside the cluster, flexibly injecting proxy configuration to expose them externally (see: k8s (2): Exposing Services Externally). In this post, we build a BGP environment with Cisco IOU images in the GNS3 emulator, bridge it to the virtual machines through VMware, and peer it directly with the BGP network inside the k8s cluster so the two sides can reach each other.

Topology:

[Topology diagram omitted]

## 1. Router Configuration

Router 1:

```
IOU1#show running-config

interface Ethernet0/0
 ip address 192.168.0.101 255.255.255.0
 vrrp 1 ip 192.168.0.100

router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.102 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary
```

Router 2:

```
IOU2#show running-config

interface Ethernet0/0
 ip address 192.168.0.102 255.255.255.0
 vrrp 1 ip 192.168.0.100
 vrrp 1 priority 100

router bgp 64513
 bgp log-neighbor-changes
 neighbor 192.168.0.101 remote-as 64513
 neighbor 192.168.0.170 remote-as 64512
 neighbor 192.168.0.171 remote-as 64512
 maximum-paths 2
 no auto-summary
```
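Both routers share the VRRP VIP 192.168.0.100, which is the address kube-router will peer with below. Before touching the cluster side, a minimal sanity check from each k8s node confirms the VIP answers:

```bash
# From each k8s node (192.168.0.170 / 192.168.0.171): verify the VRRP VIP is reachable
ping -c 3 192.168.0.100
```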

## 2. kube-router Configuration

For the cluster setup steps, see k8s (1): Notes on an Offline Local Deployment of a 1.9.0 HA Cluster.
In the kube-router YAML manifest configured earlier, add the following arguments, which specify the local BGP AS number, the peer AS number, the peer IPs, and so on.

```yaml
- --advertise-cluster-ip=true      # advertise cluster IPs over BGP
- --advertise-external-ip=true     # advertise Service external IPs (takes effect when a Service sets an external IP)
- --cluster-asn=64512              # local (cluster-side) BGP AS number
- --peer-router-ips=192.168.0.100  # peer router IP(s); multiple entries may be comma-separated. The routers here run VRRP, so the VIP alone is enough
- --peer-router-asns=64513         # peer AS number
```
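For reference, if the routers were not running VRRP, the two physical router addresses from the configs above could be listed instead. A sketch (the per-peer AS list lines up with the IP list):

```yaml
- --peer-router-ips=192.168.0.101,192.168.0.102
- --peer-router-asns=64513,64513
```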

The complete kube-router.yaml is as follows:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-router-cfg
  namespace: kube-system
  labels:
    tier: node
    k8s-app: kube-router
data:
  cni-conf.json: |
    {
      "name":"kubernetes",
      "type":"bridge",
      "bridge":"kube-bridge",
      "isDefaultGateway":true,
      "ipam": {
        "type":"host-local"
      }
    }
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    k8s-app: kube-router
    tier: node
  name: kube-router
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        k8s-app: kube-router
        tier: node
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      serviceAccountName: kube-router
      serviceAccount: kube-router
      containers:
      - name: kube-router
        image: cloudnativelabs/kube-router
        imagePullPolicy: Always
        args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=true
        - --advertise-cluster-ip=true
        - --advertise-external-ip=true
        - --cluster-asn=64512
        - --peer-router-ips=192.168.0.100
        - --peer-router-asns=64513
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        livenessProbe:
          httpGet:
            path: /healthz
            port: 20244
          initialDelaySeconds: 10
          periodSeconds: 3
        resources:
          requests:
            cpu: 250m
            memory: 250Mi
        securityContext:
          privileged: true
        volumeMounts:
        - name: lib-modules
          mountPath: /lib/modules
          readOnly: true
        - name: cni-conf-dir
          mountPath: /etc/cni/net.d
        - name: kubeconfig
          mountPath: /var/lib/kube-router/kubeconfig
          readOnly: true
      initContainers:
      - name: install-cni
        image: busybox
        imagePullPolicy: Always
        command:
        - /bin/sh
        - -c
        - set -e -x;
          if [ ! -f /etc/cni/net.d/10-kuberouter.conf ]; then
            TMP=/etc/cni/net.d/.tmp-kuberouter-cfg;
            cp /etc/kube-router/cni-conf.json ${TMP};
            mv ${TMP} /etc/cni/net.d/10-kuberouter.conf;
          fi
        volumeMounts:
        - mountPath: /etc/cni/net.d
          name: cni-conf-dir
        - mountPath: /etc/kube-router
          name: kube-router-cfg
      hostNetwork: true
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
        operator: Exists
      volumes:
      - name: lib-modules
        hostPath:
          path: /lib/modules
      - name: cni-conf-dir
        hostPath:
          path: /etc/cni/net.d
      - name: kube-router-cfg
        configMap:
          name: kube-router-cfg
      - name: kubeconfig
        hostPath:
          path: /var/lib/kube-router/kubeconfig
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-router
  namespace: kube-system
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-router
  namespace: kube-system
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  - nodes
  - endpoints
  verbs:
  - list
  - get
  - watch
- apiGroups:
  - "networking.k8s.io"
  resources:
  - networkpolicies
  verbs:
  - list
  - get
  - watch
- apiGroups:
  - extensions
  resources:
  - networkpolicies
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kube-router
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-router
subjects:
- kind: ServiceAccount
  name: kube-router
  namespace: kube-system
```

Delete the existing kube-router resources, then re-create them from this YAML file:

```bash
kubectl delete -f kubeadm-kuberouter.yaml   # run before updating the parameters
kubectl create -f kubeadm-kuberouter.yaml   # run after updating the parameters
```
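Before moving on, it is worth confirming the DaemonSet pods came back up cleanly. A quick check (the label comes from the manifest above; the pod name in the logs command is illustrative):

```bash
# List the kube-router pods and confirm they are Running on every node
kubectl -n kube-system get pods -l k8s-app=kube-router -o wide

# Watch one pod's logs for the BGP peering to come up (substitute a real pod name)
kubectl -n kube-system logs kube-router-xxxxx
```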

Next, deploy an nginx pod for testing; the YAML file is as follows:

```
[root@171 nginx]# cat nginx-deploy.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2018-04-09T04:02:02Z
  generation: 4
  labels:
    app: nginx
  name: nginx-deploy
  namespace: default
  resourceVersion: "111504"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/nginx-deploy
  uid: c28090c0-3baa-11e8-b75a-000c29858eab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.9.1
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
```

Look up the pod IP and curl it locally:

```
[root@171 nginx]# kubectl get pods -o wide
NAME                            READY     STATUS    RESTARTS   AGE       IP            NODE
nginx-deploy-5964dfd755-lv2kb   1/1       Running   0          29m       10.244.1.35   171
[root@171 nginx]# curl 10.244.1.35
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@171 nginx]#
```
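As an aside, because --advertise-external-ip is enabled in the manifest above, a Service that declares an external IP would have that address advertised to the routers as well. A hypothetical example (the Service name and the 192.168.0.200 address are illustrative, not part of the original setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc          # illustrative name
spec:
  selector:
    app: nginx             # matches the Deployment above
  ports:
  - port: 80
    targetPort: 80
  externalIPs:
  - 192.168.0.200          # illustrative external IP to be advertised over BGP
```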

## 3. Verifying the Results

On the router, check the BGP neighbors:

```
IOU1#show ip bgp summary
BGP router identifier 192.168.0.101, local AS number 64513
BGP table version is 19, main routing table version 19
4 network entries using 560 bytes of memory
6 path entries using 480 bytes of memory
2 multipath network entries and 4 multipath paths
1/1 BGP path/bestpath attribute entries using 144 bytes of memory
1 BGP AS-PATH entries using 24 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1208 total bytes of memory
BGP activity 5/1 prefixes, 12/6 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.0.102   4 64513     100     106       19    0    0 01:27:20        0
192.168.0.170   4 64512     107      86       19    0    0 00:37:19        3
192.168.0.171   4 64512      98      85       19    0    0 00:37:21        3
IOU1#
```

On the router, check the BGP route entries:

```
IOU1#show ip route bgp
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override

Gateway of last resort is 192.168.0.1 to network 0.0.0.0

      10.0.0.0/8 is variably subnetted, 4 subnets, 2 masks
B        10.96.0.1/32 [20/0] via 192.168.0.171, 00:38:36
                      [20/0] via 192.168.0.170, 00:38:36
B        10.96.0.10/32 [20/0] via 192.168.0.171, 00:38:36
                       [20/0] via 192.168.0.170, 00:38:36
B        10.244.0.0/24 [20/0] via 192.168.0.170, 00:38:36
B        10.244.1.0/24 [20/0] via 192.168.0.171, 00:38:39
```

As shown above, the BGP neighbors have been established and the routes inside the k8s cluster have been learned successfully; note that the cluster Service addresses (10.96.0.1/32 and 10.96.0.10/32) are each installed via both nodes, thanks to maximum-paths 2.
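The same peering can be checked from the cluster side with the gobgp CLI that ships inside the kube-router container (the pod name below is illustrative):

```bash
# Exec into one kube-router pod and list its BGP peers
kubectl -n kube-system exec -it kube-router-xxxxx -- gobgp neighbor
```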

Next, bring up a test machine to check from outside the cluster. Its default gateway is pointed at the router's VRRP VIP, simulating an ordinary PC on the external network:
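Something along these lines sets the gateway (assuming the interface is eth3, as in the transcript that follows):

```bash
# Point the default route at the routers' VRRP VIP
ip route replace default via 192.168.0.100 dev eth3
```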

```
[root@python ~]# ip route
default via 192.168.0.100 dev eth3 proto static

[root@python ~]# curl 10.244.1.35
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
[root@python ~]#
```

The test succeeds. With that, the BGP network inside the cluster and the external BGP network are fully interconnected!

A friendly reminder: don't forget to save the router configuration to NVRAM (`copy running-config startup-config`), or it will be lost on reboot. I hadn't touched network gear in a while, forgot about this, and got bitten once. Heh.


## 9-22: A Pitfall

Today I tried adding a new node (192.168.20.x/24) to the cluster on a different subnet from the existing nodes (192.168.9.x/24). After it joined, something very strange happened: the BGP sessions between kube-router on the new node and the core network device outside the cluster kept flapping, cycling between the Established and Idle states. While a session was Established, packet loss between the new node and the original nodes exceeded 70%; while it was Idle, pings between nodes were normal.
[Screenshot of the flapping BGP sessions omitted]

After exec-ing into the kube-router pod and running gobgp neighbor, I saw that the new node's eBGP session with the external network device and its iBGP sessions with the original nodes were constantly churning and never stabilized. I was stumped until I finally found the only similar issue on the kube-router GitHub project, where a project member explained it as follows:
GitHub issue link
[Screenshot of the maintainer's reply omitted]

My understanding is this:
kube-router only establishes iBGP peering with nodes in the same subnet, so nodes on different subnets cannot form peer relationships. In addition, eBGP has a max-hop (TTL) attribute that defaults to 1. Routers generally allow this value to be changed by hand, but older kube-router versions only supported the default of 1 with no way to configure it, so an eBGP neighbor also had to sit in the same subnet as kube-router. Later versions fixed this; after upgrading, building eBGP peerings across subnets works fine.
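For reference, newer kube-router releases expose this eBGP TTL as a command-line flag. If I remember correctly it is named --peer-router-multihop-ttl, but that name is an assumption from memory; verify it against the documentation of the version you run:

```yaml
# Assumed flag name (check your kube-router version's docs before relying on it):
- --peer-router-multihop-ttl=2   # allow the eBGP peer to be one routed hop away
```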
