How ARP Killed a Static Route « ipSpace.net blog

The amount of weird stuff we discover in netlab integration tests is astounding, or maybe I just have a knack for looking into the wrong dark corners (my wife would definitely agree with that). Today’s special: when having two next hops kills a static route.

TL&DR: default ARP settings on a multi-subnet Linux host are less than optimal.

We use these principles when creating netlab integration tests:

They should contain a single device-under-test and a bunch of attached probes.
They should test a single feature.
They should not rely on the device-under-test. All validation should be done on probes.

How do you test static routes under these restrictions? Here’s what we did:

Connect a device under test to a Linux node (running FRRouting) with a loopback interface.
Configure a static route for the loopback interface on the device-under-test with the next hop pointing to the Ethernet IP address of the Linux node.
On the Linux node, ping the Ethernet interface of the device-under-test from the loopback IP address.

The outgoing ICMP request packet should always be sent (the destination IP address is directly connected). The reply, however, will come back only when the device under test has a correctly configured static route for the source IP address of the ICMP request packet.
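The reasoning above can be sketched as a tiny Python model. It is an illustration only, not netlab code: the function name is hypothetical, the DUT routing table is reduced to a plain list of prefixes, and the addresses are the ones that show up in the lab printouts (probe loopback 10.0.0.2, link subnet 10.1.0.0/30).

```python
import ipaddress

def reply_comes_back(dut_prefixes, ping_source):
    """Return True when the DUT has a route covering the ping source address."""
    source = ipaddress.ip_address(ping_source)
    return any(source in ipaddress.ip_network(prefix) for prefix in dut_prefixes)

# Prefixes the DUT knows without any static route (its loopback and the link)
connected = ["10.0.0.1/32", "10.1.0.0/30"]

# Ping sourced from the probe's Ethernet address: the DUT can always reply
print(reply_comes_back(connected, "10.1.0.2"))                     # True
# Ping sourced from the probe's loopback: no route back, no reply
print(reply_comes_back(connected, "10.0.0.2"))                     # False
# Same ping once the static route for the loopback is configured
print(reply_comes_back(connected + ["10.0.0.2/32"], "10.0.0.2"))   # True
```

Only the loopback-sourced ping depends on the static route, which is why the test uses it.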

Here’s a sample network topology (the test where we discovered this quirk was a bit more convoluted):

This is the corresponding netlab lab topology:

netlab lab topology configuring a single static route

---
provider: clab
module: [ routing ]

nodes:
  dut:
    device: vjunos-switch
    routing.static:
    - node: probe
      nexthop.node: probe
  probe:
    device: frr

links: [ dut-probe ]

As expected, every correctly configured device easily passes this test. Mission accomplished.

Probe can ping DUT

$ netlab connect probe
Connecting to container clab-X-probe, starting bash

Use vtysh to connect to FRR daemon

probe(bash)# ping -c 3 10.1.0.1 -I 10.0.0.2
PING 10.1.0.1 (10.1.0.1) from 10.0.0.2: 56 data bytes
64 bytes from 10.1.0.1: seq=0 ttl=64 time=0.484 ms
64 bytes from 10.1.0.1: seq=1 ttl=64 time=0.759 ms
64 bytes from 10.1.0.1: seq=2 ttl=64 time=0.661 ms

--- 10.1.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.484/0.634/0.759 ms

Not really. There might be more than one link between the devices, in which case netlab configures multiple static routes for the same destination. Let’s rerun the test with the following topology:

Two links, two next hops for the same destination

Here’s the relevant netlab topology if you want to rerun the experiment. The only change is in line 13 (two links instead of one).

netlab lab topology configuring two static routes for the same destination

provider: clab
module: [ routing ]

nodes:
  dut:
    device: vjunos-switch
    routing.static:
    - node: probe
      nexthop.node: probe
  probe:
    device: frr

links: [ dut-probe, dut-probe ]

Most devices pass this test with flying colors, but the ping command fails with vJunos-switch until we ping the switch from the directly-connected Linux interface:

A ping from the Linux loopback interface fails until we do a simple ping

probe(bash)# ping 10.1.0.1 -I 10.0.0.2
PING 10.1.0.1 (10.1.0.1) from 10.0.0.2: 56 data bytes
^C
--- 10.1.0.1 ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
probe(bash)# ping -c 1 10.1.0.1
PING 10.1.0.1 (10.1.0.1): 56 data bytes
64 bytes from 10.1.0.1: seq=0 ttl=64 time=2.248 ms

--- 10.1.0.1 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 2.248/2.248/2.248 ms
probe(bash)# ping -c 3 10.1.0.1 -I 10.0.0.2
PING 10.1.0.1 (10.1.0.1) from 10.0.0.2: 56 data bytes
64 bytes from 10.1.0.1: seq=0 ttl=64 time=0.437 ms
64 bytes from 10.1.0.1: seq=1 ttl=64 time=0.775 ms
64 bytes from 10.1.0.1: seq=2 ttl=64 time=0.489 ms

--- 10.1.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.437/0.567/0.775 ms

This is obviously a major WTAF moment, but fortunately, I experienced similar stuff before and immediately suspected we had to warm the ARP cache on one of the devices. Next step: a packet capture on the link that the ICMP request should be traversing. It will make your jaw drop 😳

Weird ARP requests sent by the Linux node

$ netlab capture probe eth1 ip or arp
Starting packet capture on probe/eth1: sudo ip netns exec clab-X-probe tcpdump -i eth1 --immediate-mode -l -vv ip or arp
tcpdump: listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
11:31:43.900846 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.0.1 tell 10.0.0.2, length 28
11:31:44.903669 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.0.1 tell 10.0.0.2, length 28
11:31:45.927675 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.0.1 tell 10.0.0.2, length 28
11:31:47.901226 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.1.0.1 tell 10.0.0.2, length 28

I can’t decide what’s weirder:

Linux is sending an ARP request from the IP address belonging to another interface.
Junos is not answering that ARP request.

Side note: Pinging 10.1.0.1 from 10.1.0.2 warms the ARP cache on Linux and the Junos switch, so the next ICMP packet sent from the loopback interface’s IP address needs no ARP resolution.
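The ping sequence above can be replayed with a toy Python model of the probe’s ARP cache. It is a sketch built on one assumption taken from the packet capture: the DUT ignores ARP requests whose sender IP address lies outside the subnet of the receiving interface. The function names are made up for illustration, and return traffic is not modeled.

```python
import ipaddress

LINK_SUBNET = ipaddress.ip_network("10.1.0.0/30")  # first probe-DUT link

def dut_answers_arp(sender_ip):
    """Assumed DUT behavior: reply only to senders within the link subnet."""
    return ipaddress.ip_address(sender_ip) in LINK_SUBNET

def ping(arp_cache, source_ip, target_ip):
    """Return True when the probe can resolve the target and send the request."""
    if target_ip not in arp_cache:
        # Behavior seen in the capture: the ARP request carries the source
        # address of the packet that is waiting to be sent
        if not dut_answers_arp(source_ip):
            return False
        arp_cache.add(target_ip)
    return True

cache = set()
results = [
    ping(cache, "10.0.0.2", "10.1.0.1"),  # from the loopback, cold cache
    ping(cache, "10.1.0.2", "10.1.0.1"),  # from the connected interface
    ping(cache, "10.0.0.2", "10.1.0.1"),  # from the loopback, warm cache
]
print(results)  # [False, True, True]
```

Starting with `cache = {"10.1.0.1"}` makes the first loopback ping succeed as well, which corresponds to a probe that already has an ARP entry for the DUT.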

Reading RFC 826 is useless. It’s so underspecified that it’s impossible to answer whether Linux violates it (probably not). Furthermore, as I’m reading it, the receiver of an ARP request should reply to anything and enter any address information into its ARP table. I’m probably missing something (including some weird Linux sysctl parameter); comments are most welcome.

But Why Did It Work with a Single Link?

Here’s the head-scratching part: why did a single-link test work while the two-link test requires an ARP warmup exercise?

It turns out that Junos (like any other decent network device) preemptively resolves the next hops of the static routes. That creates an ARP entry on both ends of the link, and the ping works.

vJunos-switch configuration, routing, and ARP table with a single link

admin@dut> show configuration routing-options
static {
    route 10.0.0.2/32 next-hop 10.1.0.2;
}

admin@dut> show route table inet.0

inet.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

10.0.0.1/32        *[Direct/0] 00:02:29
                    >  via lo0.0
10.0.0.2/32        *[Static/5] 00:02:26
                    >  to 10.1.0.2 via ge-0/0/0.0
10.1.0.0/30        *[Direct/0] 00:02:29
                    >  via ge-0/0/0.0
10.1.0.1/32        *[Local/0] 00:02:29
                       Local via ge-0/0/0.0

admin@dut> show arp
MAC Address       Address         Name                      Interface               Flags
52:55:0a:00:00:02 10.0.0.2        probe                     fxp0.0                  none
aa:c1:ab:81:dd:60 10.1.0.2        10.1.0.2                  ge-0/0/0.0              none
02:00:00:00:00:10 128.0.0.16      fpc0                      em1.0                   none
Total entries: 3

However, the vJunos-switch VM does that only for one of the next hops. In the two-link scenario, the static route has two next hops, but we can see only a single ARP entry in the ARP table:

Junos static route has two next hops (for two links), but there’s a single next-hop ARP entry

admin@dut> show configuration routing-options
static {
    route 10.0.0.2/32 next-hop [ 10.1.0.2 10.1.0.6 ];
}

admin@dut> show route table inet.0

inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

10.0.0.1/32        *[Direct/0] 00:13:08
                    >  via lo0.0
10.0.0.2/32        *[Static/5] 00:13:05
                       to 10.1.0.2 via ge-0/0/0.0
                    >  to 10.1.0.6 via ge-0/0/1.0
10.1.0.0/30        *[Direct/0] 00:13:08
                    >  via ge-0/0/0.0
10.1.0.1/32        *[Local/0] 00:13:08
                       Local via ge-0/0/0.0
10.1.0.4/30        *[Direct/0] 00:13:08
                    >  via ge-0/0/1.0
10.1.0.5/32        *[Local/0] 00:13:08
                       Local via ge-0/0/1.0

admin@dut> show arp
MAC Address       Address         Name                      Interface               Flags
52:55:0a:00:00:02 10.0.0.2        probe                     fxp0.0                  none
aa:c1:ab:b7:35:90 10.1.0.6        10.1.0.6                  ge-0/0/1.0              none
02:00:00:00:00:10 128.0.0.16      fpc0                      em1.0                   none
Total entries: 3

The ARP cache on Linux mirrors that:

ARP cache on Linux node contains a single Junos IP address

$ netlab connect probe arp
Connecting to container clab-X-probe, executing arp
ge-0-0-1.0.dut (10.1.0.5) at 0c:00:8b:48:ac:02 [ether]  on eth2
ge-0-0-0.0.dut (10.1.0.1) at <incomplete>  on eth1

Ultimately, my incredible “luck” (I chose the wrong interface to ping) created this perfect storm, and I don’t want to tell you how much time we wasted chasing this particular gremlin.

Of Course, There’s a sysctl Hack

A few days after I wrote this blog post, Stefano Sasso encountered the same weird behavior when testing static routes on VyOS. This time, he found the nerd knob to tweak to make Linux behave like a reasonable multi-subnet device. We added that setting to the Linux and FRR initial configuration script, hoping we won’t encounter yet another even weirder variant in the future.
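The text above does not name the setting. The Linux kernel documents an arp_announce parameter (under net.ipv4.conf) that controls which source address goes into ARP requests, so the sketch below assumes that is the knob in question. It is a simplified Python model of levels 0 and 2 as described in the kernel documentation, not the kernel’s algorithm, and the function name is hypothetical.

```python
import ipaddress

def arp_sender_ip(packet_source, target, out_if_addresses, arp_announce=0):
    """Pick the sender IP for an ARP request (simplified model, levels 0 and 2)."""
    target_ip = ipaddress.ip_address(target)
    if arp_announce == 0:
        # Level 0 (default): any local address may be used, so the source
        # address of the outgoing packet is kept
        return packet_source
    if arp_announce == 2:
        # Level 2: ignore the packet source and prefer an address of the
        # outgoing interface whose subnet contains the target
        interfaces = [ipaddress.ip_interface(a) for a in out_if_addresses]
        for iface in interfaces:
            if target_ip in iface.network:
                return str(iface.ip)
        # Simplified fallback: first address on the outgoing interface,
        # else the packet source (the kernel searches other interfaces)
        if interfaces:
            return str(interfaces[0].ip)
        return packet_source
    raise ValueError("only arp_announce levels 0 and 2 are modeled")

# Ping from the probe loopback (10.0.0.2) to the DUT (10.1.0.1) over eth1
print(arp_sender_ip("10.0.0.2", "10.1.0.1", ["10.1.0.2/30"], arp_announce=0))
print(arp_sender_ip("10.0.0.2", "10.1.0.1", ["10.1.0.2/30"], arp_announce=2))
```

With level 0 the model returns 10.0.0.2, the sender address seen in the packet capture; with level 2 it returns 10.1.0.2, an address the DUT would recognize as belonging to the connected subnet. On a live system such a value would be set with sysctl (for example net.ipv4.conf.all.arp_announce=2), but whether that matches the change made in netlab is an assumption.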


