Welcome to this 2-part series on the open source networking tool Batfish. Within this series we will cover:
- An overview of Batfish.
- How to install Batfish.
- Using Batfish to perform configuration analysis.
- Using Batfish to perform impact analysis.
What is Batfish?
Batfish is an open source network configuration analysis tool that provides the ability to validate configuration data, query network adjacencies, verify firewall ACL rule sets, and analyze routing/flow paths.
In other words (https://www.batfish.org/),
Batfish finds errors and guarantees the correctness of planned or current network configurations. It enables safe and rapid network evolution, without the fear of outages or security breaches.
Batfish runs as a containerized service, with operations against the Batfish service/API performed via the Python SDK, pybatfish.
Network configurations are packaged and fed to Batfish via snapshots. A snapshot is a collection of information (configuration files, routing data, up/down status of nodes and links) that represents the state of the network. Therefore, Batfish does NOT require direct access to network devices. The required layout of a snapshot is based on the following folders:
- configs/ - network device configuration files.
- hosts/ - host configuration files.
- iptables/ - host iptables configuration files.
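As a rough sketch of that layout, the following creates the three folders in a temporary directory. The helper name make_snapshot_skeleton and the snapshot-1 directory name are hypothetical, chosen for illustration; only the folder names come from the list above.

```python
import tempfile
from pathlib import Path

# Folder names taken from the snapshot layout described above.
SNAPSHOT_FOLDERS = ("configs", "hosts", "iptables")


def make_snapshot_skeleton(root):
    """Create the empty snapshot folders under root and return their paths."""
    created = []
    for name in SNAPSHOT_FOLDERS:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Hypothetical usage: build the skeleton in a throwaway directory.
snapshot_root = Path(tempfile.mkdtemp()) / "snapshot-1"
folders = make_snapshot_skeleton(snapshot_root)
print(sorted(p.name for p in folders))
```

Device configurations would then be copied into configs/, with host and iptables files going into their respective folders.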
From the data supplied via the snapshot, Batfish builds a series of models, which are then queried via pybatfish using various questions (example shown below):
>>> bfq.ipOwners().answer().frame().head()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... Tue Jun 25 21:15:10 2019 DST Begin job.
   Node     VRF         Interface    IP            Mask  Active
0  leaf1    default     Vlan10       172.16.1.254  24    True
1  server2  default     eth1         172.16.2.1    24    True
2  spine1   management  mgmt0        10.255.0.130  16    False
3  server1  default     eth1         172.16.1.1    24    True
4  spine1   default     Ethernet1/4  10.1.0.1      30    True
The installation is based on 2 steps. First, we deploy the service via Docker. We then install the client.
docker pull batfish/allinone
docker run --name batfish -d -v batfish-data:/data -p 8888:8888 -p 9997:9997 -p 9996:9996 batfish/allinone
In addition to an API, the Batfish service also provides a set of Jupyter notebooks. These give you pre-baked how-to guides and scenarios so you can start kicking the tyres with minimal fuss. Because they are Jupyter notebooks, you also get additional benefits such as being able to interact with Python inline via the browser. To access them, go to <your_ip>:8888. You will be asked for a token, which you can get by running the command:
docker logs batfish
Figure 1 - Jupyter Notebook.
Next, we install pybatfish via pip, like so:
apt install python3-pip
python3 -m pip install --upgrade git+https://github.com/batfish/pybatfish.git
For the context of this article, our example network will be based on the following topology.
In short, it is a spine and leaf topology that uses OSPF to distribute the loopbacks, with eBGP peerings between the spines and leafs.
Figure 2 - Example Topology.
To download a pre-packaged snapshot of our example network, clone the following repo:
cd /opt/
git clone https://github.com/rickdonato/batfish.git
cd batfish
Initialize Network and Snapshot
Next, via pybatfish, we will initialize our snapshot and load the Batfish questions.
>>> from pybatfish.client.commands import *
>>> from pybatfish.question import bfq
>>> from pybatfish.question.question import load_questions
>>> from pybatfish.datamodel.flow import (HeaderConstraints, PathConstraints)
>>>
>>> bf_session.host = "172.29.236.139" # <batfish_service_ip>
>>> load_questions()
Successfully loaded 63 questions from remote
>>> NETWORK_NAME = "ebgp-spine-leaf-network"
>>> SNAPSHOT_NAME = "ebgp-spine-leaf-snapshot"
>>> SNAPSHOT_PATH = "nxos9k-ebgp-spine-leaf/snapshot-1"
>>> bf_set_network(NETWORK_NAME)
>>> bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)
Once the snapshot is initialized, we need to check how successfully Batfish has parsed our configurations.
To display an overview, run the command:
>>> bfq.fileParseStatus().answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 09:13:37 2019 DST Begin job.
   File_Name                  Status  Nodes
0  configs/leaf1.cfg          PASSED  ['leaf1']
1  configs/leaf2.cfg          PASSED  ['leaf2']
2  configs/spine1.cfg         PASSED  ['spine1']
3  configs/spine2.cfg         PASSED  ['spine2']
4  hosts/server1.json         PASSED  ['server1']
5  hosts/server2.json         PASSED  ['server2']
6  iptables/server1.iptables  PASSED  ['iptables/server1.iptables']
7  iptables/server2.iptables  PASSED  ['iptables/server2.iptables']
As you can see, we have no errors, as everything is showing as PASSED. However, if this were not the case, further information, such as which parts of the configuration were problematic, can be found via the command -
Note: You may be asking what .frame() is used for. In short, this wraps the answer as a Pandas DataFrame.
Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the Numpy package and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.
In other words, the Pandas DataFrame provides us with a great data structure and a set of methods/options that we can use when working with our results.
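To make that concrete, here is a small sketch of working with a frame like the one returned by fileParseStatus. The DataFrame is built by hand from a few rows of the output shown earlier so that it runs without a Batfish service; the variable names are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for bfq.fileParseStatus().answer().frame(),
# using rows copied from the output shown earlier.
parse_status = pd.DataFrame(
    {
        "File_Name": ["configs/leaf1.cfg", "configs/spine1.cfg", "hosts/server1.json"],
        "Status": ["PASSED", "PASSED", "PASSED"],
        "Nodes": [["leaf1"], ["spine1"], ["server1"]],
    }
)

# Boolean indexing: keep only the files that did not parse cleanly.
not_passed = parse_status[parse_status["Status"] != "PASSED"]

print(len(parse_status), "files checked,", len(not_passed), "with problems")
```

The same boolean-indexing pattern is used throughout the checks that follow.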
Now that Batfish is installed, and we have our snapshot initialized we can start to have some fun.
First of all, we will perform configuration analysis in order to ensure our network configuration is aligned to a set of requirements.
Our example will be based on the following requirements:
- Jumbo frames are enabled on the links between the leafs and spines.
- Loopbacks are learned across nodes via OSPF and shown within RIBs.
- BGP Multipath is enabled.
- Leafs are configured to only form BGP peerings with spines.
- Spines are configured to only form BGP peerings with leafs.
In addition, we will also verify, based on our configurations,
- Each BGP adjacency is successfully established.
Note: As noted by one of our readers, you may find that in newer versions of Batfish (>=2019.07.31) using | as the separator within your question raises an exception. Instead use , like so -
Our first check will report any interfaces (excluding edge and loopbacks) that have an MTU lower than 9216.
First we collect the MTU and interface description properties for the leaf and spine nodes only. We also drop any rows that contain a None value using the .dropna() method:
>>> node_props = bfq.interfaceProperties(nodes="/leaf|spine/", properties="Description|MTU").answer().frame().dropna()
Next we filter for only the interface descriptions that do not contain the strings to server, Loopback or OOB. The key part of this filter expression is the invert operator (~):
>>> node_props = node_props[~node_props['Description'].str.contains('to server|Loopback|OOB')]
We can then apply a lambda function to our node properties to display only the MTUs that are lower than 9216.
>>> node_props[node_props["MTU"].apply(lambda x: x < 9216)]
     Interface            Description  MTU
143  spine2[Ethernet1/1]  to leaf2     1500
144  spine2[Ethernet1/2]  to leaf2     1500
145  spine2[Ethernet1/3]  to leaf1     1500
146  spine1[Ethernet1/1]  to leaf1     1500
147  spine1[Ethernet1/4]  to leaf2     1500
148  spine2[Ethernet1/4]  to leaf1     1500
149  spine1[Ethernet1/2]  to leaf1     1500
189  spine1[Ethernet1/3]  to leaf2     1500
243  leaf1[Ethernet1/4]   to spine2    1500
244  leaf1[Ethernet1/1]   to spine1    1500
245  leaf1[Ethernet1/2]   to spine1    1500
246  leaf2[Ethernet1/4]   to spine1    1500
247  leaf1[Ethernet1/3]   to spine2    1500
248  leaf2[Ethernet1/3]   to spine1    1500
249  leaf2[Ethernet1/2]   to spine2    1500
250  leaf2[Ethernet1/1]   to spine2    1500
As you can see from the result of this check, jumbo frames are not enabled!
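The three steps above can be tried offline against a hand-built frame. The rows below are a hypothetical mix: the first two are taken from the output above, while the server-facing port and the interface with no description are invented so that each filter has something to remove.

```python
import pandas as pd

JUMBO_MTU = 9216

# Stand-in for the interfaceProperties answer. The first two rows come from
# the output above; the last two are invented to exercise the filters.
node_props = pd.DataFrame(
    {
        "Interface": [
            "spine1[Ethernet1/1]",
            "leaf1[Ethernet1/1]",
            "leaf1[Ethernet1/5]",
            "leaf1[Ethernet1/6]",
        ],
        "Description": ["to leaf1", "to spine1", "to server1", None],
        "MTU": [1500, 1500, 1500, 1500],
    }
)

# Step 1: drop rows with missing values (the interface with no description).
node_props = node_props.dropna()

# Step 2: ~ inverts the boolean mask, keeping rows that do NOT match.
edge_or_loopback = node_props["Description"].str.contains("to server|Loopback|OOB")
fabric_links = node_props[~edge_or_loopback]

# Step 3: report fabric links whose MTU is below the jumbo frame size.
non_jumbo = fabric_links[fabric_links["MTU"] < JUMBO_MTU]
print(non_jumbo["Interface"].tolist())
```

The vectorized comparison in step 3 selects the same rows as the lambda version; either form works.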
Note: Let’s say you wanted to move the above into a script that could be plugged into a CI pipeline. In that case, we could use additional attributes and methods of the Pandas DataFrame. For example, one attribute that can be extremely useful is .empty, which returns True if 0 rows are returned.
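A hedged sketch of how such a CI check might look follows. The function name mtu_check_exit_code and the sample frames are hypothetical; the only pandas feature relied on is the .empty attribute, which is True when a frame has no rows.

```python
import pandas as pd


def mtu_check_exit_code(violations):
    """Return 0 when no violating rows exist, 1 otherwise."""
    # .empty is True when the frame has no rows.
    if violations.empty:
        return 0
    print(f"{len(violations)} interface(s) below the required MTU")
    return 1


# Hypothetical results: one frame with a violation, one without.
failing = pd.DataFrame({"Interface": ["spine1[Ethernet1/1]"], "MTU": [1500]})
passing = failing[failing["MTU"] < 0]  # a filter that matches nothing

print(mtu_check_exit_code(failing), mtu_check_exit_code(passing))
```

In a pipeline, the return value could be handed to sys.exit() so that a non-empty result fails the job.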
Loopback Advertisement via OSPF
Next, we will check the RIBs to ensure that the loopbacks have been learned via OSPF.
This is achieved via the bfq.routes() question, allowing us to search through the RIBs of each of the leaf and spine nodes.
>>> routes = bfq.routes(nodes="/spine|leaf/", protocols="ospf").answer().frame()
status: ASSIGNED
.... Wed Jun 26 12:24:04 2019 DST Begin job.
status: TERMINATEDNORMALLY
.... Wed Jun 26 12:24:04 2019 DST Begin job.
>>> routes[routes['Network'].str.contains('/32')]
    Node    VRF      Network            Next_Hop  Next_Hop_IP  Next_Hop_Interface  Protocol  Metric  Admin_Distance  Tag
0   leaf2   default  220.127.116.11/32  spine2    10.2.128.1   dynamic             ospf      45      110             None
1   leaf1   default  18.104.22.168/32   spine2    10.2.0.5     dynamic             ospf      45      110             None
2   leaf2   default  22.214.171.124/32  spine1    10.1.128.1   dynamic             ospf      85      110             None
3   leaf1   default  126.96.36.199/32   spine2    10.2.0.1     dynamic             ospf      45      110             None
6   leaf2   default  188.8.131.52/32    spine2    10.2.128.1   dynamic             ospf      85      110             None
8   spine1  default  184.108.40.206/32  leaf1     10.0.0.2     dynamic             ospf      45      110             None
9   leaf1   default  220.127.116.11/32  spine1    10.0.0.1     dynamic             ospf      85      110             None
10  leaf2   default  18.104.22.168/32   spine2    10.2.128.5   dynamic             ospf      45      110             None
11  leaf2   default  22.214.171.124/32  spine1    10.1.0.1     dynamic             ospf      45      110             None
13  leaf1   default  126.96.36.199/32   spine1    10.0.128.1   dynamic             ospf      85      110             None
15  spine2  default  188.8.131.52/32    leaf1     10.2.0.6     dynamic             ospf      85      110             None
18  spine1  default  184.108.40.206/32  leaf2     10.1.128.2   dynamic             ospf      85      110             None
20  spine2  default  220.127.116.11/32  leaf1     10.2.0.2     dynamic             ospf      85      110             None
25  leaf1   default  18.104.22.168/32   spine1    10.0.128.1   dynamic             ospf      45      110             None
26  leaf2   default  22.214.171.124/32  spine1    10.1.0.1     dynamic             ospf      85      110             None
30  spine1  default  126.96.36.199/32   leaf1     10.0.0.2     dynamic             ospf      85      110             None
… <output omitted>
From this we can see the loopbacks within the RIBs and that they have been learnt via OSPF. GREAT!
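Eyeballing the table works for four nodes, but the same check can be automated. The sketch below assumes a hypothetical loopback plan (addresses from the 192.0.2.0/24 documentation range, not the ones in the example network) and reports, per node, any remote loopback missing from its OSPF routes. The function name missing_loopbacks is also illustrative.

```python
# Hypothetical loopback plan; these addresses are NOT from the example network.
LOOPBACKS = {
    "spine1": "192.0.2.1/32",
    "spine2": "192.0.2.2/32",
    "leaf1": "192.0.2.11/32",
    "leaf2": "192.0.2.12/32",
}


def missing_loopbacks(routes):
    """Map each node to the remote loopbacks absent from its OSPF routes.

    routes is an iterable of (node, network, protocol) tuples, mirroring the
    Node, Network and Protocol columns of the routes answer.
    """
    learned = {node: set() for node in LOOPBACKS}
    for node, network, protocol in routes:
        if protocol == "ospf" and node in learned:
            learned[node].add(network)

    missing = {}
    for node in LOOPBACKS:
        expected = {lo for owner, lo in LOOPBACKS.items() if owner != node}
        gap = expected - learned[node]
        if gap:
            missing[node] = sorted(gap)
    return missing


# Sample RIB in which leaf2 never learned spine2's loopback.
sample_routes = [
    (node, lo, "ospf")
    for node in LOOPBACKS
    for owner, lo in LOOPBACKS.items()
    if owner != node and not (node == "leaf2" and owner == "spine2")
]
print(missing_loopbacks(sample_routes))
```

With a real answer, the tuples could be taken from the Node, Network and Protocol columns of the routes frame.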
We will now check that BGP multipath is configured using the bgpProcessConfiguration question, like so:
>>> bfq.bgpProcessConfiguration(properties="Multipath_EBGP|Neighbors").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 14:36:13 2019 DST Begin job.
   Node    VRF      Router_ID      Multipath_EBGP  Neighbors
0  spine1  default  188.8.131.52   False           ['184.108.40.206/32', '220.127.116.11/32']
1  spine2  default  18.104.22.168  False           ['22.214.171.124/32', '126.96.36.199/32']
2  leaf1   default  188.8.131.52   False           ['184.108.40.206/32', '220.127.116.11/32']
3  leaf2   default  18.104.22.168  False           ['22.214.171.124/32', '126.96.36.199/32']
With this we have discovered BGP multipath is not configured within our topology!
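As a sketch of how this requirement could be asserted rather than read off the table, the following lists the nodes where multipath is disabled. The frame is a hand-built stand-in reduced to the two relevant columns, with values copied from the output above.

```python
import pandas as pd

# Stand-in for the bgpProcessConfiguration answer, reduced to two columns.
bgp_proc = pd.DataFrame(
    {
        "Node": ["spine1", "spine2", "leaf1", "leaf2"],
        "Multipath_EBGP": [False, False, False, False],
    }
)

# ~ inverts the boolean column, selecting nodes where multipath is disabled.
without_multipath = bgp_proc[~bgp_proc["Multipath_EBGP"]]["Node"].tolist()
print(without_multipath)
```

An empty list would mean the requirement is met on every node.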
In addition, we can also use bfq.traceroute(). This returns the hops (trace) for each path that traffic would take through the network between the given start and destination locations. For our topology we would therefore expect to see 8 traces for our flow. As you can see from the example below, only 4 traces are shown.
>>> len(bfq.traceroute(startLocation='server1',
...     headers=HeaderConstraints(dstIps='server2'
...     )).answer().frame()['Traces'])
status: TERMINATEDNORMALLY
.... Wed Jul 3 15:34:47 2019 DST Begin job.
4
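The figure of 8 follows from the fabric wiring. Going by the interface descriptions in the MTU output earlier, each leaf has two links to each spine, so a flow from server1 to server2 can pick a spine, then one of two uplinks, then one of two downlinks. The sketch below does that arithmetic; the LINKS table is an assumption reconstructed from those descriptions, and expected_paths is a hypothetical helper.

```python
# Parallel link counts between each leaf and spine, reconstructed from the
# interface descriptions in the MTU check (an assumption, not Batfish output).
LINKS = {
    ("leaf1", "spine1"): 2,
    ("leaf1", "spine2"): 2,
    ("leaf2", "spine1"): 2,
    ("leaf2", "spine2"): 2,
}
SPINES = ("spine1", "spine2")


def expected_paths(src_leaf, dst_leaf):
    """Count leaf -> spine -> leaf paths when all equal-cost links are used."""
    return sum(
        LINKS[(src_leaf, spine)] * LINKS[(dst_leaf, spine)] for spine in SPINES
    )


print(expected_paths("leaf1", "leaf2"))
```

Comparing this number with the length of the Traces column gives a simple pass/fail signal for multipath.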
BGP Peer Configuration
Let's now turn our attention to checking the BGP peering configuration, ensuring that:
- leafs are only allowed to peer with spines,
- spines are only allowed to peer with leafs.
>>> bfq.bgpPeerConfiguration(nodes="/leaf/", properties="Remote_IP").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 11:05:35 2019 DST Begin job.
   Node   VRF      Local_Interface  Remote_IP
0  leaf1  default  None             188.8.131.52
1  leaf2  default  None             184.108.40.206
2  leaf2  default  None             220.127.116.11
3  leaf1  default  None             18.104.22.168
>>> bfq.bgpPeerConfiguration(nodes="/spine/", properties="Remote_IP").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 11:05:42 2019 DST Begin job.
   Node    VRF      Local_Interface  Remote_IP
0  spine1  default  None             22.214.171.124
1  spine2  default  None             126.96.36.199
2  spine1  default  None             188.8.131.52
3  spine2  default  None             184.108.40.206
Great. Our configuration is correct as per our compliance requirements.
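Comparing addresses by eye does not scale, so here is a sketch of automating the rule. It assumes a hypothetical IP_OWNER mapping (which in practice could be derived from the ipOwners question) and that a node's role can be read from its name prefix. The addresses are from a documentation range and are not those of the example network.

```python
# Hypothetical address ownership; not the addresses of the example network.
IP_OWNER = {
    "192.0.2.1": "spine1",
    "192.0.2.2": "spine2",
    "192.0.2.11": "leaf1",
    "192.0.2.12": "leaf2",
}

# Which role each role is allowed to peer with.
ALLOWED_PEER_ROLE = {"leaf": "spine", "spine": "leaf"}


def role_of(node):
    """Derive the role from the node name, e.g. 'leaf1' -> 'leaf'."""
    return node.rstrip("0123456789")


def peering_violations(peerings):
    """Return (node, remote_ip) pairs that break the leaf/spine peering rule."""
    violations = []
    for node, remote_ip in peerings:
        remote_node = IP_OWNER.get(remote_ip)
        remote_role = role_of(remote_node) if remote_node else None
        if remote_role != ALLOWED_PEER_ROLE[role_of(node)]:
            violations.append((node, remote_ip))
    return violations


# Sample peerings: the last one (leaf to leaf) should be flagged.
sample = [("leaf1", "192.0.2.1"), ("spine2", "192.0.2.12"), ("leaf1", "192.0.2.12")]
print(peering_violations(sample))
```

A peer whose address has no known owner is also treated as a violation, which errs on the side of flagging unexpected neighbors.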
BGP Session Establishment
We will now check that the BGP sessions will successfully establish based on the configurations. This is achieved via the bgpSessionStatus question:
>>> bfq.bgpSessionStatus(nodes="/spine|leaf/").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 15:01:16 2019 DST Begin job.
   Node    VRF      Local_AS  Local_Interface  Local_IP        Remote_AS  Remote_Node  Remote_Interface  Remote_IP       Session_Type   Established_Status
0  leaf1   default  64521     None             220.127.116.11  64520      spine1       None              18.104.22.168   EBGP_MULTIHOP  ESTABLISHED
1  leaf1   default  64521     None             22.214.171.124  64520      spine2       None              126.96.36.199   EBGP_MULTIHOP  ESTABLISHED
2  leaf2   default  64522     None             188.8.131.52    64520      spine1       None              184.108.40.206  EBGP_MULTIHOP  ESTABLISHED
3  leaf2   default  64522     None             220.127.116.11  64520      spine2       None              18.104.22.168   EBGP_MULTIHOP  ESTABLISHED
4  spine1  default  64520     None             22.214.171.124  64521      leaf1        None              126.96.36.199   EBGP_MULTIHOP  ESTABLISHED
5  spine1  default  64520     None             188.8.131.52    64522      leaf2        None              184.108.40.206  EBGP_MULTIHOP  ESTABLISHED
6  spine2  default  64520     None             220.127.116.11  64521      leaf1        None              18.104.22.168   EBGP_MULTIHOP  ESTABLISHED
7  spine2  default  64520     None             22.214.171.124  64522      leaf2        None              126.96.36.199   EBGP_MULTIHOP  ESTABLISHED
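As a rough sketch, the same result can be filtered so that only problem sessions remain. The frame below is a hand-built stand-in with three of the columns shown above; the second row's status value is a placeholder invented to show the filter working.

```python
import pandas as pd

# Stand-in for the bgpSessionStatus answer. The second row's status value is
# a made-up placeholder used only to show the filter working.
sessions = pd.DataFrame(
    {
        "Node": ["leaf1", "leaf2"],
        "Remote_Node": ["spine1", "spine2"],
        "Established_Status": ["ESTABLISHED", "NOT_ESTABLISHED"],
    }
)

# Keep only the sessions that are not reported as established.
down = sessions[sessions["Established_Status"] != "ESTABLISHED"]
print(down[["Node", "Remote_Node"]].values.tolist())
```

Against the real output above, this filter would return no rows, since all eight sessions are established.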
For sessions not showing as established, further information as to why can be found using
I hope you have enjoyed part 1 of Unleashing the Batfish. Check out the 2nd installment, where we will be diving into Impact Analysis.