Photo by Hans Eiskonen on Unsplash
Ever needed to diagnose weird networking bugs on BOSH-deployed VMs? We ran into a problem with a BOSH release that would create routes for itself on the networking stack. We created Arpy to detect this.
How it Works
Arpy is a simple bash script that uses the BOSH CLI and jq. It loops through the IP addresses of the VMs in a particular deployment, establishes an SSH session to each VM, and runs an arping command against all the IP addresses in the deployment. You can quickly identify which virtual machines can (and, importantly, cannot) send and receive traffic.
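To make the loop concrete, here is a rough sketch of the idea, not the actual arpy.sh. The helper names list_vms and plan_probes are made up for illustration. It assumes the rows of `bosh vms --json` carry `instance` and `ips` fields, that each VM has a single IP, and that `sudo arping -c 1` is the probe you want. It only prints the commands it would run, so you can try it without a director.

```shell
#!/usr/bin/env bash
# Sketch only: prints the probe commands instead of running them.

# Hypothetical helper: print one "instance ip" pair per VM in the deployment.
# Assumes each row of `bosh vms --json` has "instance" and "ips" fields
# and that every VM has exactly one IP.
list_vms() {
  bosh vms --json | jq -r '.Tables[0].Rows[] | "\(.instance) \(.ips)"'
}

# Hypothetical helper: for every source instance, emit one arping probe
# per target IP in the deployment (including the source's own IP).
plan_probes() {
  local vms="$1" src _ip _inst dst
  while read -r src _ip; do
    while read -r _inst dst; do
      echo "bosh ssh $src -c 'sudo arping -c 1 $dst'"
    done <<< "$vms"
  done <<< "$vms"
}

# Made-up sample data; swap in "$(list_vms)" against a real director.
sample=$'node/aaa 10.0.0.1\nnode/bbb 10.0.0.2'
plan_probes "$sample"
```

With two VMs this prints four commands, one for each source and target pair.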
Using Arpy
Start by setting a few environment variables:
export BOSH_ENVIRONMENT="name of your bosh director"
export BOSH_DEPLOYMENT="name of your deployment"
To see the results on the screen and dump them to a flat file, run:
bash <(curl -Ls https://raw.githubusercontent.com/starkandwayne/arpy/master/arpy.sh) 2>&1 | tee results.out
Sifting the Output
The output below shows one iteration of arpy. To spot signs of trouble, look for any Received 0 response(s) where the target and source in the ARPING x.x.x.x from x.x.x.x line are different addresses. The last section is the only one that did not get a response, but we can safely ignore it because the VM was sending the ARP request to its own address.
checking arping 10.128.57.198 from node/f75c8f57-8639-4a47-a29a-21fb7b4900e6
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | Unauthorized use is strictly prohibited. All access and activity
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | is subject to logging and monitoring.
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | ARPING 10.128.57.198 from 10.128.57.200 eth0
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Unicast reply from 10.128.57.198 [00:50:56:8B:39:99] 2.048ms
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Sent 1 probes (1 broadcast(s))
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Received 1 response(s)
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | Connection to 10.128.57.200 closed.
checking arping 10.128.57.199 from node/f75c8f57-8639-4a47-a29a-21fb7b4900e6
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | Unauthorized use is strictly prohibited. All access and activity
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | is subject to logging and monitoring.
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | ARPING 10.128.57.199 from 10.128.57.200 eth0
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Unicast reply from 10.128.57.199 [00:50:56:8B:BD:5F] 2.632ms
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Sent 1 probes (1 broadcast(s))
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Received 1 response(s)
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | Connection to 10.128.57.200 closed.
checking arping 10.128.57.200 from node/f75c8f57-8639-4a47-a29a-21fb7b4900e6
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | Unauthorized use is strictly prohibited. All access and activity
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | is subject to logging and monitoring.
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | ARPING 10.128.57.200 from 10.128.57.200 eth0
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Sent 1 probes (1 broadcast(s))
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stdout | Received 0 response(s)
node/f75c8f57-8639-4a47-a29a-21fb7b4900e6: stderr | Connection to 10.128.57.200 closed.
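On a large deployment, scanning this output by eye gets tedious. The sketch below is one way to filter it, assuming the output keeps the shape shown above (an ARPING line followed by a Received line). The function name find_failed_probes is made up, and the sample lines fed to it are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch only: report probes that got no reply, skipping self-probes.

# Hypothetical helper: reads arpy output from the given files or from stdin.
find_failed_probes() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        # Remember the target and source of the most recent probe.
        if ($i == "ARPING" && $(i + 2) == "from") {
          target = $(i + 1)
          source = $(i + 3)
        }
        # Report zero replies unless the VM was probing its own address.
        if ($i == "Received" && $(i + 1) == "0" && target != source) {
          print source " got no ARP reply from " target
        }
      }
    }
  ' "$@"
}

# Made-up sample: one real failure and one self-probe that is ignored.
find_failed_probes <<'EOF'
node/a: stdout | ARPING 10.0.0.2 from 10.0.0.1 eth0
node/a: stdout | Received 0 response(s)
node/a: stdout | ARPING 10.0.0.1 from 10.0.0.1 eth0
node/a: stdout | Received 0 response(s)
EOF
```

Against a real run, you would call it as find_failed_probes results.out.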
This can help to identify odd network configurations such as:
~# ip addr show
...< removed for brevity>
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 1e:ed:d0:2e:29:4e brd ff:ff:ff:ff:ff:ff
inet 10.128.57.200/32 brd 10.128.57.200 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.128.57.199/32 brd 10.128.57.199 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.128.57.198/32 brd 10.128.57.198 scope global kube-ipvs0
valid_lft forever preferred_lft forever
When arpy was run on the above VM with its network configuration, it was not able to communicate with 10.128.57.199-200 since this VM thought those IP addresses were its own. Odd, right? arpy identified that there was a networking misconfiguration.
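Once arpy points at a suspect VM, you may want to confirm this kind of configuration directly. The sketch below is one possible check, assuming `ip addr show` output shaped like the listing above. It prints every IPv4 address bound to an interface flagged NOARP. The function name list_noarp_addresses is made up, and the sample input is trimmed from the listing above.

```shell
#!/usr/bin/env bash
# Sketch only: list IPv4 addresses that sit on NOARP interfaces.

# Hypothetical helper: reads `ip addr show` output from files or stdin.
list_noarp_addresses() {
  awk '
    # Interface header line, e.g. "4: kube-ipvs0: <BROADCAST,NOARP> ..."
    /^[0-9]+: / {
      iface = $2
      sub(/:$/, "", iface)
      noarp = ($3 ~ /NOARP/)
    }
    # Address lines belonging to the current interface.
    noarp && $1 == "inet" { print iface, $2 }
  ' "$@"
}

# Sample input trimmed from the listing above.
list_noarp_addresses <<'EOF'
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether 1e:ed:d0:2e:29:4e brd ff:ff:ff:ff:ff:ff
    inet 10.128.57.200/32 brd 10.128.57.200 scope global kube-ipvs0
    inet 10.128.57.199/32 brd 10.128.57.199 scope global kube-ipvs0
EOF
```

On a live VM you would run ip addr show | list_noarp_addresses and compare the result against the deployment's IPs.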
Hope this helps!