This issue is related to an OSPF route that is in the OSPF database but doesn’t get installed in the routing table.
The sample topology is below. R16 is a router, with the interfaces towards R17/R18 in a vrf called CustA.
R16 isn’t installing R19’s loopback0 in the routing table.
show ip route vrf CustA
Routing Table: CustA
     184.108.40.206/32 is subnetted, 2 subnets
O       220.127.116.11 [110/11] via 172.23.17.17, 00:00:45, Ethernet0/0.1617
O       18.104.22.168 [110/11] via 172.23.16.18, 00:00:45, Ethernet0/0.1618
     172.23.0.0/16 is variably subnetted, 5 subnets, 2 masks
O       172.23.18.0/24 [110/20] via 172.23.17.17, 00:00:45, Ethernet0/0.1617
                       [110/20] via 172.23.16.18, 00:00:45, Ethernet0/0.1618
     192.168.1.0/32 is subnetted, 1 subnets
C       192.168.1.16 is directly connected, Loopback1
But 22.214.171.124 shows up in the OSPF database??
sh ip ospf database summary 126.96.36.199
            OSPF Router with ID (192.168.1.16) (Process ID 2)
                Summary Net Link States (Area 10)
  LS age: 1564
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 188.8.131.52 (summary Network Number)
  Advertising Router: 184.108.40.206
  LS Seq Number: 80000009
  Checksum: 0x16F6
  Length: 28
  Network Mask: /32
        MTID: 0         Metric: 11
So the next step was turning on debugging, specifically debug ip ospf 2 spf, which shows the output of any SPF calculations. After doing a shut/no shut on the R19 loopback, the following debugs showed up:
OSPF-2 INTER: Start partial processing: type 3, LSID 220.127.116.11, mask 255.255.255.255,
OSPF-2 INTER:   adv_rtr 18.104.22.168, age 3600, seq 0x80000002, area 10
OSPF-2 INTER: Downward bit set/Non-backbone LSA
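For anyone wanting to reproduce the debug above, the steps could look roughly like the sketch below. The debug command and process ID 2 come from this post; the hostnames in the prompts and the Loopback0 interface name on R19 are taken from the topology description, and the exact command availability may vary by IOS release.

```
! On R16: enable SPF debugging for OSPF process 2
R16#debug ip ospf 2 spf

! On R19: bounce the loopback so the summary LSA is withdrawn and re-originated
R19#configure terminal
R19(config)#interface Loopback0
R19(config-if)#shutdown
R19(config-if)#no shutdown
R19(config-if)#end

! On R16: turn debugging off again once the output has been captured
R16#undebug all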
So what does Downward bit set/Non-backbone LSA mean?
The Downward (DN) bit is described in RFC4577 and exists to prevent loops when OSPF is used as the PE-CE routing protocol for BGP/MPLS VPNs. Specifically, it addresses routes from a PE router’s BGP process being redistributed into OSPF, and then from OSPF back into BGP, which would cause a loop. To get around this, the DN bit identifies routes that a PE router has redistributed towards the CE router: once the DN bit is set to 1 (shown as Downward in IOS), the route is ignored by any other PE that receives it. This also covers sites that have multiple PE-CE routers and connections.
It makes sense that the PE router ignores the LSA during SPF when the DN bit is set. But it isn’t set in this case, as you can see here in the pcap capture:
The DN bit is not set! What now? We have covered the Downward bit set part of the message, so the problem must be related to the second part, Non-backbone LSA. If we go back to RFC4577 and look a little deeper, there is a section on PEs and OSPF area 0. It states that “If a PE attaches to a CE via a link that is in a non-zero area, then the PE serves as an ABR for that area.” Because the PE functions as an area border router (ABR) for that area, it is allowed to flood inter-area routes to the CE using Type 3 LSAs. The RFC also states that if the OSPF domain connecting to the PE router has any area 0 routers, at least one of them must have an area 0 link to the PE, either directly or through a virtual link.
This means that the MPLS network functions as a “super backbone”, a third level of hierarchy above area 0. Area 0 can be discontiguous across sites, but only if each piece of it is connected to the super backbone. This sounds like our issue, except that we are not running a super backbone with MPLS and BGP. But if we look at the OSPF process by running show ip ospf, R16 actually thinks it is connected to the MPLS VPN Superbackbone, as shown below.
R16#sh ip ospf
 Routing Process "ospf 2" with ID 192.168.1.16
   Domain ID type 0x0005, value 0.0.0.2
 Start time: 00:00:49.870, Time elapsed: 10:47:03.240
 Supports only single TOS(TOS0) routes
 Supports opaque LSA
 Supports Link-local Signaling (LLS)
 Supports area transit capability
 Supports NSSA (compatible with RFC 3101)
 Connected to MPLS VPN Superbackbone, VRF CustA
Aaah, finally. Because we are running OSPF in a VRF, the router automatically treats itself as a PE and therefore applies the loop prevention checks and design constraints. As a PE it considers itself an ABR attached to the super backbone, and an ABR only uses summary LSAs that it receives over the backbone. Our route comes from area 0 and reaches R16 as a Type 3 LSA in area 10, so R16, acting as a PE router, ignores it.
So, we need to tell the router that it is not an MPLS-enabled PE and turn off those loop prevention checks. The command to do this is capability vrf-lite, configured under the OSPF process for the VRF.
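A minimal configuration sketch, assuming the OSPF process ID 2 and the VRF name CustA seen in the outputs in this post (adjust both to match your own router):

```
! On R16: disable the PE-specific checks for the OSPF process running in the VRF
R16#configure terminal
R16(config)#router ospf 2 vrf CustA
R16(config-router)#capability vrf-lite
R16(config-router)#end
```

After this change, show ip ospf should no longer be expected to report the process as connected to the MPLS VPN Superbackbone.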
From the Cisco site: The OSPF Support for Multi-VRF on CE Routers feature provides the capability of suppressing provider edge (PE) checks that are needed to prevent loops when the PE is performing a mutual redistribution of packets between the OSPF and BGP protocols. When VPN routing and forward (VRF) is used on a router that is not a PE (that is, one that is not running BGP), the checks can be turned off to allow for correct population of the VRF routing table with routes to IP prefixes.
As soon as this is enabled, the SPF runs and accepts the LSA.
OSPF-2 INTER: Start partial processing: type 3, LSID 22.214.171.124, mask 255.255.255.255,
OSPF-2 INTER:   adv_rtr 126.96.36.199, age 1, seq 0x80000001, area 10
OSPF-2 SPF : Add better path to LSA ID 188.8.131.52, gateway 0.0.0.0, dist 21
OSPF-2 SPF : Add path: next-hop 172.23.17.17, interface Ethernet0/0.1617
OSPF-2 INTER: Add succeeded for summary route to 184.108.40.206/255.255.255.255, metric 21
OSPF-2 INTER:   next-hop Ethernet0/0.1617/172.23.17.17, area 10
As you can see in the output below, Routing Bit Set on this LSA now appears, which means that the route is valid and present in the routing table.
show ip ospf database summary 220.127.116.11
            OSPF Router with ID (192.168.1.16) (Process ID 2)
                Summary Net Link States (Area 10)
  Routing Bit Set on this LSA in topology Base with MTID 0
  LS age: 79
  Options: (No TOS-capability, DC, Upward)
  LS Type: Summary Links(Network)
  Link State ID: 18.104.22.168 (summary Network Number)
  Advertising Router: 22.214.171.124
  LS Seq Number: 8000000A
  Checksum: 0x14F7
  Length: 28
  Network Mask: /32
        MTID: 0         Metric: 11
I have struggled with this for quite a while, not knowing exactly why you should run capability vrf-lite.
Short answer: if you are running OSPF in a VRF on a router that is not a PE, you need to turn on the vrf-lite capability. But it is extremely useful to understand exactly why.