This plugin provides integration with BGP-capable network devices, which removes the need for outgoing NAT for container network connectivity and provides ECMP-based load balancing between multiple hosts. More information about the concept can be found in RFC 7938 and on Meta's blog.
You can use any BGP-compatible router. If you don't have one, you can use these steps to set up a lab.
Download the GoBGP binary.
Example config:
[global.config]
as = 65500
router-id = "192.168.8.137"
[[peer-groups]]
[peer-groups.config]
peer-group-name = "bgp-lb"
peer-as = 64512
[[peer-groups.afi-safis]]
[peer-groups.afi-safis.config]
afi-safi-name = "ipv4-unicast"
[[peer-groups.afi-safis]]
[peer-groups.afi-safis.config]
afi-safi-name = "ipv4-flowspec"
[[dynamic-neighbors]]
[dynamic-neighbors.config]
prefix = "192.168.8.0/24"
peer-group = "bgp-lb"

Run it with the command ./gobgpd --log-level=debug -f gobgp.toml
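A mismatch between the router and the plugin settings is a common reason a BGP session never comes up. As a minimal sketch (not part of the plugin), the shell helpers below read gobgp.toml and compare the router's as value with the PEER_AS you plan to pass when installing the plugin. The helper names toml_value and check_peer_as are hypothetical, and the parsing assumes simple key = value lines like those in the example config, not full TOML.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with GoBGP or the plugin.

# toml_value FILE KEY
# Print the value of the first "KEY = value" line in FILE, with quotes removed.
# Assumption: one key per line, as in the example config (not a full TOML parser).
toml_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1 | tr -d '"'
}

# check_peer_as FILE PEER_AS
# Succeed only if PEER_AS equals the router's own AS ("as" under [global.config]).
check_peer_as() {
    router_as=$(toml_value "$1" as)
    if [ "$router_as" = "$2" ]; then
        echo "ok: PEER_AS=$2 matches the router AS"
    else
        echo "mismatch: router AS is '$router_as', PEER_AS is '$2'" >&2
        return 1
    fi
}
```

With the example config, check_peer_as gobgp.toml 65500 should succeed, while passing 64512 (the AS the router expects from its peers) should fail.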
Create a host-specific bridge network for outgoing connectivity (like docker_gwbridge, but for non-Swarm/non-overlay workloads):
docker network create \
--driver bridge \
--subnet 172.23.1.0/24 \
--gateway 172.23.1.1 \
--ipv6 \
--subnet 2001:0db8:0000:1001::/64 \
--gateway 2001:0db8:0000:1001::1 \
-o com.docker.network.bridge.name=bgplb_gwbridge \
-o com.docker.network.bridge.enable_icc=false \
-o com.docker.network.bridge.enable_ip_masquerade=false \
--label bgplb_advertise=true \
bgplb_gwbridge

The label bgplb_advertise=true tells the bgplb driver to advertise this network with BGP.
The option com.docker.network.bridge.enable_ip_masquerade=false disables NAT for outgoing connections.
The option com.docker.network.bridge.enable_icc=false is optional; it disables inter-container connectivity.
docker plugin install \
--grant-all-permissions \
ollijanatuinen/docker-bgp-lb:v1.7 \
ROUTER_ID=192.168.8.40 \
PEER_ADDRESS=192.168.8.137 \
PEER_AS=65500 \
SIGUSR2_HANDLER=true

GoBGP reports the incoming BGP connection with a message like this:
{
"Key": "192.168.8.40",
"Topic": "Peer",
"level": "debug",
"msg": "Accepted a new dynamic neighbor",
"time": "2024-04-10T09:58:09Z"
}

docker network create \
--driver ollijanatuinen/docker-bgp-lb:v1.7 \
--ipam-driver ollijanatuinen/docker-bgp-lb:v1.7 \
--subnet 10.0.0.101/32 \
--ipv6 \
--ipam-opt v6subnet=2001:0db8:0000:1000::101/128 \
web1
docker run -d \
--name=web1 \
--network=bgplb_gwbridge \
--network=web1 \
--ip 172.23.1.25 \
--ip6 2001:0db8:0000:1001::25 \
--add-host web2=2001:0db8:0000:1000::102 \
--health-cmd "curl -f http://localhost/ || exit 1" \
--health-start-period 15s \
--stop-timeout 30 \
--stop-signal SIGUSR2 \
ollijanatuinen/debug:nginx
docker network create \
--driver ollijanatuinen/docker-bgp-lb:v1.7 \
--ipam-driver ollijanatuinen/docker-bgp-lb:v1.7 \
--subnet 10.0.0.102/32 \
--ipv6 \
--ipam-opt v6subnet=2001:0db8:0000:1000::102/128 \
web2
docker run -d \
--name=web2 \
--network=bgplb_gwbridge \
--network=web2 \
--ip 172.23.1.26 \
--ip6 2001:0db8:0000:1001::26 \
--add-host web1=2001:0db8:0000:1000::101 \
--health-cmd "curl -f http://localhost/ || exit 1" \
--health-start-period 15s \
--stop-timeout 30 \
--stop-signal SIGUSR2 \
ollijanatuinen/debug:nginx

After the containers reach the "healthy" state, two things happen:
- New local routes like these are added:
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.0.101 0.0.0.0 255.255.255.255 UH 0 0 0 bgplb-f9bb8454b
10.0.0.102 0.0.0.0 255.255.255.255 UH 0 0 0 bgplb-f9bb8454c
Destination Next Hop Flag Met Ref Use If
2001:db8:0:1::101/128 :: U 1024 3 0 bgplb-f9bb8454b
2001:db8:0:1::101/128 :: U 1024 3 0 bgplb-f9bb8454c
- GoBGP reports the new BGP route with messages like this:
{
"Key": "192.168.8.40",
"Topic": "Peer",
"attributes": [
{
"type": 1,
"value": 0
},
{
"type": 2,
"as_paths": [
{
"segment_type": 2,
"num": 1,
"asns": [
64512
]
}
]
},
{
"type": 3,
"nexthop": "192.168.8.40"
}
],
"level": "debug",
"msg": "received update",
"nlri": [
{
"prefix": "10.0.0.101/32"
}
],
"time": "2024-04-10T10:04:59Z",
"withdrawals": []
}

If you installed the plugin with SIGUSR2_HANDLER=true and started the container with the --stop-signal SIGUSR2 option, three things will happen when the container is stopped:
- GoBGP reports the removed BGP route with a message like this:
{
"Data": {
"nlri": {
"prefix": "10.0.0.101/32"
},
"attrs": [
{
"type": 1,
"value": 0
},
{
"type": 2,
"as_paths": [
{
"segment_type": 2,
"num": 1,
"asns": [
64512
]
}
]
},
{
"type": 3,
"nexthop": "192.168.8.40"
}
],
"age": 1712743499,
"withdrawal": true,
"source-id": "192.168.8.40",
"neighbor-ip": "192.168.8.40"
},
"Key": "192.168.8.40",
"Topic": "Peer",
"level": "debug",
"msg": "From me, ignore",
"time": "2024-04-10T10:10:28Z"
}
- The local route to 10.0.0.101/32 will be removed.
- After a 5-second delay, the normal container stop signal SIGTERM will be sent to the container and it will stop (unless you set SIGUSR2_ACTION to anything other than the default value stop).
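To follow announcements and withdrawals without reading every debug entry, the GoBGP log can be filtered. The sketch below assumes gobgpd writes one compact JSON object per line (the entries in this document are shown pretty-printed) and relies only on the field names visible in the samples above. The function names announced_prefixes and withdrawn_prefixes are hypothetical, and the matching is plain text matching, not real JSON parsing.

```shell
#!/bin/sh
# Hypothetical log filters; they read gobgpd log lines on stdin.
# Assumption: one JSON object per line, field names as in the samples above.

# Print prefixes from "received update" entries that carry no withdrawals.
announced_prefixes() {
    grep -E '"msg": ?"received update"' \
        | grep -E '"withdrawals": ?\[\]' \
        | grep -oE '"prefix": ?"[^"]+"' \
        | sed -E 's/^"prefix": ?"//; s/"$//'
}

# Print prefixes from entries flagged with "withdrawal": true.
withdrawn_prefixes() {
    grep -E '"withdrawal": ?true' \
        | grep -oE '"prefix": ?"[^"]+"' \
        | sed -E 's/^"prefix": ?"//; s/"$//'
}
```

For example, piping the daemon output through one of the filters (./gobgpd --log-level=debug -f gobgp.toml 2>&1 | withdrawn_prefixes) should print one prefix per withdrawn route.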
In Swarm mode, we only define the load balancer subnet for services.
Docker automatically adds docker_gwbridge as a second network for them, which the containers use for outgoing traffic.
To make those connections also use routed connectivity without NAT, we need to reconfigure that network as described here.
docker network rm docker_gwbridge
docker network create \
--driver bridge \
--subnet 172.23.2.0/24 \
--gateway 172.23.2.1 \
--ipv6 \
--subnet 2001:0db8:0000:1002::/64 \
--gateway 2001:0db8:0000:1002::1 \
-o com.docker.network.bridge.name=docker_gwbridge \
-o com.docker.network.bridge.enable_icc=false \
-o com.docker.network.bridge.enable_ip_masquerade=false \
--label bgplb_advertise=true \
docker_gwbridge

Note! It is easiest to do this before the node joins the swarm, because otherwise you also need to remove and recreate the ingress network, which affects all nodes at the same time (however, we do not actually use ingress at all in this configuration). In the current implementation, you also need to disable and re-enable the bgp-lb plugin at this point to trigger BGP advertisement of that network.
In Swarm mode, we want to give two extra parameters:
- --endpoint-mode=dnsrr, which disables the VIP reservation done by Swarm so that our IP address is allocated directly to the container.
- --mode=global, which runs one replica of the container on every node in the Swarm that has the bgp-lb plugin installed.
  - This is where BGP-LB shows its power, because all of those nodes will start advertising our load balancer IPs with BGP.
  - You can still limit this to certain nodes with the --constraint parameter.
docker network create \
--driver ollijanatuinen/docker-bgp-lb:v1.7 \
--ipam-driver ollijanatuinen/docker-bgp-lb:v1.7 \
--subnet 10.0.0.103/32 \
--ipv6 \
--ipam-opt v6subnet=2001:0db8:0000:1000::103/128 \
web
docker service create \
--name web \
--network=web \
--endpoint-mode=dnsrr \
--mode=global \
ollijanatuinen/debug:nginx

Docker does not currently support disabling IPv4, which is why your containers always have an IPv4 address in the bgplb_gwbridge and docker_gwbridge networks.
However, you can skip configuring an IPv4 address for the load balancing interface by omitting the --subnet parameter when creating the load balancing network and specifying only --ipam-opt v6subnet=
Technically, the network will still have 0.0.0.0/32 configured as its IPv4 subnet and it will be assigned to containers, but Linux ignores it and this plugin will not advertise it with BGP.
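As a sketch of the IPv6-only variant described above, the function below wraps the same docker network create call used earlier in this document, with the --subnet parameter left out. The function name create_v6_only_lb, the network name web6, and the address ending in ::104 are hypothetical examples. Keeping --ipv6 is an assumption carried over from the other examples.

```shell
#!/bin/sh
# create_v6_only_lb NAME V6_SUBNET
# Create a load balancing network with only an IPv6 address.
# No --subnet is passed, so no IPv4 load balancer address is configured.
# Assumption: --ipv6 is still required, as in the dual-stack examples.
create_v6_only_lb() {
    docker network create \
        --driver ollijanatuinen/docker-bgp-lb:v1.7 \
        --ipam-driver ollijanatuinen/docker-bgp-lb:v1.7 \
        --ipv6 \
        --ipam-opt "v6subnet=$2" \
        "$1"
}
```

It could be called as create_v6_only_lb web6 2001:0db8:0000:1000::104/128, after which containers attach to the network in the same way as in the dual-stack examples.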
BGP configuration is a bit tricky to get right, so it may not work on the first try.
You can capture BGP messages with tcpdump like this: tcpdump -i eth0 port 179 -n -vvv -s 65535 -w bgp-debug.pcap and then investigate them with Wireshark.
See this example of how to read those messages and this section of the RFC for explanations of the errors.