Introduction

Welcome to this 2-part series on the open source networking tool Batfish. In this series we will cover:

  • An overview of Batfish.
  • How to install Batfish.
  • Using Batfish to perform configuration analysis.
  • Using Batfish to perform impact analysis.

Let’s begin...

What is Batfish?

Batfish is an open source network configuration analysis tool that provides the ability to validate configuration data, query network adjacencies, verify firewall ACL rule sets, and analyze routing/flow paths.

In other words (https://www.batfish.org/),

Batfish finds errors and guarantees the correctness of planned or current network configurations. It enables safe and rapid network evolution, without the fear of outages or security breaches.[1]

Batfish runs as a containerized service, with operations against the Batfish service/API performed via the Python SDK, pybatfish.

Network configurations are packaged and fed to Batfish via snapshots. A snapshot is a collection of information (configuration files, routing data, up/down status of nodes and links) that represents the state of the network. Therefore, Batfish does NOT require direct access to network devices.[2] The required layout of a snapshot is based on the following folders:

  • configs/ - network device configuration files.
  • hosts/ - host configuration files.
  • iptables/ - host iptables configuration files.
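The folder layout above can be sketched with the standard library. This is a minimal illustration only: the helper names (create_snapshot_skeleton, absent_snapshot_dirs), the snapshot-1 directory and the placeholder file content are assumptions, and the sketch does not assert which of the folders Batfish treats as mandatory.

```python
import tempfile
from pathlib import Path

# Folder names taken from the snapshot layout listed above.
SNAPSHOT_DIRS = ("configs", "hosts", "iptables")


def create_snapshot_skeleton(root: Path) -> Path:
    """Create an empty snapshot directory tree under `root` (hypothetical helper)."""
    for name in SNAPSHOT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


def absent_snapshot_dirs(root: Path) -> list:
    """Return the listed folders that do not exist under `root`."""
    return [name for name in SNAPSHOT_DIRS if not (root / name).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    snapshot = create_snapshot_skeleton(Path(tmp) / "snapshot-1")
    # Placeholder content, used purely for illustration.
    (snapshot / "configs" / "leaf1.cfg").write_text("hostname leaf1\n")
    # Collect the relative paths before the temporary directory is removed.
    layout = sorted(p.relative_to(snapshot).as_posix() for p in snapshot.rglob("*"))
    absent = absent_snapshot_dirs(snapshot)

print(layout)
print(absent)
```

Running the sketch prints the relative paths of the skeleton followed by an empty list, since all three folders were created.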

From the data supplied via the snapshot, Batfish builds a series of models, which are then queried by asking questions via pybatfish (example shown below),

>>> bfq.ipOwners().answer().frame().head()
status: ASSIGNED
.... no task information
status: TERMINATEDNORMALLY
.... Tue Jun 25 21:15:10 2019 DST Begin job.
      Node         VRF    Interface            IP Mask Active
0    leaf1     default       Vlan10  172.16.1.254   24   True
1  server2     default         eth1    172.16.2.1   24   True
2   spine1  management        mgmt0  10.255.0.130   16  False
3  server1     default         eth1    172.16.1.1   24   True
4   spine1     default  Ethernet1/4      10.1.0.1   30   True

Installation

The installation consists of 2 steps. First, we deploy the service via Docker. We then install the client.

Install Service

docker pull batfish/allinone
docker run --name batfish -d -v batfish-data:/data -p 8888:8888 -p 9997:9997 -p 9996:9996 batfish/allinone

In addition to an API, the Batfish service also provides a set of Jupyter notebooks. These give you pre-baked how-to guides and scenarios, so you can start kicking the tyres with minimal fuss. Because they are provided as Jupyter notebooks, you also get additional benefits, such as being able to interact with Python inline via the browser. To access them, go to <your_ip>:8888. You will be asked for a token, which you can get by running the command docker logs batfish.

Figure 1 - Jupyter Notebook.
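If you would rather not hunt through the log output by eye, the token can be pulled out with a regular expression. The sketch below assumes the log contains a standard Jupyter startup URL of the form ...:8888/?token=<hex string>; the sample log text and the extract_jupyter_token helper are hypothetical, and the real output of docker logs batfish may look different.

```python
import re
from typing import Optional

# Matches the token query parameter in a Jupyter startup URL (assumed format).
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def extract_jupyter_token(log_text: str) -> Optional[str]:
    """Return the first token found in the log text, or None if there is none."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None


# Hypothetical log excerpt; real output from `docker logs batfish` will differ.
sample_log = (
    "Copy/paste this URL into your browser when you connect:\n"
    "    http://localhost:8888/?token=0123456789abcdef\n"
)

print(extract_jupyter_token(sample_log))
```

To run this against the real container, you could capture the command output with subprocess.run(["docker", "logs", "batfish"], capture_output=True, text=True) and search both stdout and stderr, as the startup URL may be written to either stream.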

Install Client

Next, we install pybatfish via pip. Like so,

apt install python3-pip
python3 -m pip install --upgrade git+https://github.com/batfish/pybatfish.git

Example Topology

For the context of this article, our example network will be based on the following topology.

In short, it is a spine and leaf topology that uses OSPF to distribute the loopbacks, with eBGP peerings between the spines and leafs.

Figure 2 - Example Topology.

To download a pre-packaged snapshot of our topology, clone the following repo:

cd /opt/
git clone https://github.com/rickdonato/network-automation.git
cd network-automation/batfish

Initialize Network and Snapshot

Next, via pybatfish, we will initialize our snapshot and also load the Batfish questions.

Imports

>>> from pybatfish.client.commands import *
>>> from pybatfish.question import bfq
>>> from pybatfish.question.question import load_questions
>>> from pybatfish.datamodel.flow import (HeaderConstraints,
                                         PathConstraints)
>>>
>>> bf_session.host = "172.29.236.139" # <batfish_service_ip>
>>> load_questions()
Successfully loaded 63 questions from remote

Initialize

>>> NETWORK_NAME = "ebgp-spine-leaf-network"
>>> SNAPSHOT_NAME = "ebgp-spine-leaf-snapshot"
>>> SNAPSHOT_PATH = "nxos9k-ebgp-spine-leaf/snapshot-1"

>>> bf_set_network(NETWORK_NAME)
>>> bf_init_snapshot(SNAPSHOT_PATH, name=SNAPSHOT_NAME, overwrite=True)

Validate Parsing

Once the snapshot has been initialized, we need to check how successfully Batfish has parsed our configurations.

To display an overview run the command bfq.fileParseStatus().answer().frame().

Like so,

>>> bfq.fileParseStatus().answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 09:13:37 2019 DST Begin job.
                   File_Name  Status                          Nodes
0          configs/leaf1.cfg  PASSED                      ['leaf1']
1          configs/leaf2.cfg  PASSED                      ['leaf2']
2         configs/spine1.cfg  PASSED                     ['spine1']
3         configs/spine2.cfg  PASSED                     ['spine2']
4         hosts/server1.json  PASSED                    ['server1']
5         hosts/server2.json  PASSED                    ['server2']
6  iptables/server1.iptables  PASSED  ['iptables/server1.iptables']
7  iptables/server2.iptables  PASSED  ['iptables/server2.iptables']

As you can see, we have no errors, as everything is showing as PASSED. If this were not the case, further information, such as which parts of the configuration were problematic, can be found via the command - bfq.parseWarning().answer().frame().
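To make this check scriptable rather than visual, the rows of the answer can be scanned for any status other than PASSED. The sketch below works on plain dictionaries, which is what a DataFrame yields from .to_dict(orient="records"); the failed_files helper and the broken.cfg row are hypothetical and exist only to show a failing case.

```python
def failed_files(rows):
    """Return the File_Name of every row whose Status is not PASSED."""
    return [row["File_Name"] for row in rows if row["Status"] != "PASSED"]


# Two rows copied from the output above, plus one hypothetical failing row.
rows = [
    {"File_Name": "configs/leaf1.cfg", "Status": "PASSED"},
    {"File_Name": "configs/spine1.cfg", "Status": "PASSED"},
    {"File_Name": "configs/broken.cfg", "Status": "FAILED"},  # hypothetical
]

print(failed_files(rows))
```

Here only the hypothetical configs/broken.cfg entry is reported; for the snapshot used in this article the list would be empty.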

Note: You may be asking what .frame() is used for. In short, it wraps the answer in a Pandas DataFrame.

Pandas is a high-level data manipulation tool developed by Wes McKinney. It is built on the Numpy package and its key data structure is called the DataFrame. DataFrames allow you to store and manipulate tabular data in rows of observations and columns of variables.[3]

In other words, the Pandas DataFrame provides us with a great data structure and set of methods/options that we can use when working with our results.

Configuration Analysis

Now that Batfish is installed and our snapshot is initialized, we can start to have some fun.

First of all, we will perform configuration analysis in order to ensure our network configuration is aligned to a set of requirements.

Our example will be based on the following requirements:

  • Jumbo frames are enabled on the links between the leafs and spines.
  • Loopbacks are learned across nodes via OSPF and shown within RIBs.
  • BGP Multipath is enabled.
  • Leafs are configured to only form BGP peerings with spines.
  • Spines are configured to only form BGP peerings with leafs.

In addition, we will also verify, based on our configurations,

  • Each BGP adjacency is successfully established.

Note: As noted by one of our readers, in newer versions of Batfish (>=2019.07.31) using | as the separator within your question may raise an exception. Instead, use a comma (,) like so - "Multipath_EBGP,Neighbors".
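If you have existing property strings written with the older separator, a one-line conversion is enough. This is a small sketch; to_comma_list is a hypothetical helper name, and it should only be applied to property lists, not to node regexes such as "/leaf|spine/" where | is regex alternation.

```python
def to_comma_list(properties: str) -> str:
    """Convert a pipe-separated property list to a comma-separated one."""
    return ",".join(part.strip() for part in properties.split("|"))


print(to_comma_list("Multipath_EBGP|Neighbors"))
print(to_comma_list("Description|MTU"))
```

The two calls print Multipath_EBGP,Neighbors and Description,MTU respectively.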

Let’s begin…

Compliance Checks

Jumbo Frames

Our first check will report any interfaces (excluding edge and loopbacks) that have an MTU lower than 9216.

First we collect the MTU and interface description properties for the leaf and spine nodes only. We also drop any rows that contain a None value using the dropna() method.

>>> node_props = bfq.interfaceProperties(nodes="/leaf|spine/", properties="Description|MTU").answer().frame().dropna()

Next, we keep only the interfaces whose descriptions do not contain the strings to server, Loopback or OOB. The key part of this filter expression is the invert operator - ~.

>>> node_props = node_props[~node_props['Description'].str.contains('to server|Loopback|OOB')]

We can then apply a lambda function to our node properties to display only the interfaces with an MTU lower than 9216.

>>> node_props[node_props["MTU"].apply(lambda x: x < 9216)]
               Interface Description   MTU
143  spine2[Ethernet1/1]    to leaf2  1500
144  spine2[Ethernet1/2]    to leaf2  1500
145  spine2[Ethernet1/3]    to leaf1  1500
146  spine1[Ethernet1/1]    to leaf1  1500
147  spine1[Ethernet1/4]    to leaf2  1500
148  spine2[Ethernet1/4]    to leaf1  1500
149  spine1[Ethernet1/2]    to leaf1  1500
189  spine1[Ethernet1/3]    to leaf2  1500
243   leaf1[Ethernet1/4]   to spine2  1500
244   leaf1[Ethernet1/1]   to spine1  1500
245   leaf1[Ethernet1/2]   to spine1  1500
246   leaf2[Ethernet1/4]   to spine1  1500
247   leaf1[Ethernet1/3]   to spine2  1500
248   leaf2[Ethernet1/3]   to spine1  1500
249   leaf2[Ethernet1/2]   to spine2  1500
250   leaf2[Ethernet1/1]   to spine2  1500

As you can see from the result of this check, jumbo frames are not enabled!

Note: If you wanted to move the above into a script that could be plugged into a CI pipeline, you could use additional attributes of the Pandas DataFrame. One that can be extremely useful is .empty, which returns a bool: True if the DataFrame contains 0 rows, otherwise False.
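As a rough idea of how such a CI check could be shaped, the sketch below applies the same MTU threshold to plain row dictionaries and maps the result to an exit code. With a DataFrame, the same pass/fail decision could be taken from .empty on the filtered frame. The helper names and the leaf9 row are hypothetical; the other rows are copied from the output above.

```python
MIN_MTU = 9216  # threshold used in the check above


def low_mtu_interfaces(rows, min_mtu=MIN_MTU):
    """Return the rows whose MTU is below the required minimum."""
    return [row for row in rows if row["MTU"] < min_mtu]


def exit_code(violations):
    """Map the check result to a process exit code (0 = pass, 1 = fail)."""
    return 1 if violations else 0


rows = [
    {"Interface": "spine2[Ethernet1/1]", "Description": "to leaf2", "MTU": 1500},
    {"Interface": "leaf1[Ethernet1/4]", "Description": "to spine2", "MTU": 1500},
    # Hypothetical compliant interface, added to show a row that passes.
    {"Interface": "leaf9[Ethernet1/1]", "Description": "to spine1", "MTU": 9216},
]

violations = low_mtu_interfaces(rows)
for row in violations:
    print(f"{row['Interface']}: MTU {row['MTU']} < {MIN_MTU}")
print("exit code:", exit_code(violations))
```

In a pipeline script, the value from exit_code() would be passed to sys.exit() so that a non-compliant snapshot fails the build.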

Loopback Advertisement via OSPF

Next, we will check the RIBs to ensure that the loopbacks have been learned via OSPF.

This is achieved via the bfq.routes() question, allowing us to search through the RIBs of each of the leaf and spine nodes.

>>> routes = bfq.routes(nodes="/spine|leaf/", protocols="ospf").answer().frame()
status: ASSIGNED
.... Wed Jun 26 12:24:04 2019 DST Begin job.
status: TERMINATEDNORMALLY
.... Wed Jun 26 12:24:04 2019 DST Begin job.
>>> routes[routes['Network'].str.contains('/32')]
      Node      VRF     Network Next_Hop Next_Hop_IP Next_Hop_Interface Protocol Metric Admin_Distance   Tag
0    leaf2  default  2.2.2.2/32   spine2  10.2.128.1            dynamic     ospf     45            110  None
1    leaf1  default  2.2.2.2/32   spine2    10.2.0.5            dynamic     ospf     45            110  None
2    leaf2  default  3.3.3.3/32   spine1  10.1.128.1            dynamic     ospf     85            110  None
3    leaf1  default  2.2.2.2/32   spine2    10.2.0.1            dynamic     ospf     45            110  None
6    leaf2  default  3.3.3.3/32   spine2  10.2.128.1            dynamic     ospf     85            110  None
8   spine1  default  3.3.3.3/32    leaf1    10.0.0.2            dynamic     ospf     45            110  None
9    leaf1  default  4.4.4.4/32   spine1    10.0.0.1            dynamic     ospf     85            110  None
10   leaf2  default  2.2.2.2/32   spine2  10.2.128.5            dynamic     ospf     45            110  None
11   leaf2  default  1.1.1.1/32   spine1    10.1.0.1            dynamic     ospf     45            110  None
13   leaf1  default  4.4.4.4/32   spine1  10.0.128.1            dynamic     ospf     85            110  None
15  spine2  default  1.1.1.1/32    leaf1    10.2.0.6            dynamic     ospf     85            110  None
18  spine1  default  2.2.2.2/32    leaf2  10.1.128.2            dynamic     ospf     85            110  None
20  spine2  default  1.1.1.1/32    leaf1    10.2.0.2            dynamic     ospf     85            110  None
25   leaf1  default  1.1.1.1/32   spine1  10.0.128.1            dynamic     ospf     45            110  None
26   leaf2  default  3.3.3.3/32   spine1    10.1.0.1            dynamic     ospf     85            110  None
30  spine1  default  2.2.2.2/32    leaf1    10.0.0.2            dynamic     ospf     85            110  None
… <output omitted>

From this we can see the loopbacks within the RIBs and that they have been learnt via OSPF. GREAT!
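The same conclusion can be reached programmatically by comparing, per node, the /32 networks learned via OSPF against the loopbacks of every other node. The sketch below assumes each node's loopback equals the Router_ID shown in the BGP output later in this article; missing_loopbacks is a hypothetical helper, and the sample rows are limited to leaf1 and leaf2 because the route output above is truncated.

```python
# Assumed loopback per node (taken from the Router_ID values in this article).
LOOPBACKS = {
    "spine1": "1.1.1.1/32",
    "spine2": "2.2.2.2/32",
    "leaf1": "3.3.3.3/32",
    "leaf2": "4.4.4.4/32",
}


def missing_loopbacks(route_rows, nodes, loopbacks=LOOPBACKS):
    """Return {node: [missing remote loopbacks]} for the given nodes."""
    learned = {node: set() for node in nodes}
    for row in route_rows:
        if row["Protocol"] == "ospf" and row["Node"] in learned:
            learned[row["Node"]].add(row["Network"])
    missing = {}
    for node in nodes:
        # Every loopback except the node's own should be in its RIB.
        expected = set(loopbacks.values()) - {loopbacks[node]}
        gap = expected - learned[node]
        if gap:
            missing[node] = sorted(gap)
    return missing


# (Node, Network) pairs for leaf1 and leaf2, taken from the output above.
pairs = [
    ("leaf1", "1.1.1.1/32"), ("leaf1", "2.2.2.2/32"), ("leaf1", "4.4.4.4/32"),
    ("leaf2", "1.1.1.1/32"), ("leaf2", "2.2.2.2/32"), ("leaf2", "3.3.3.3/32"),
]
rows = [{"Node": n, "Network": net, "Protocol": "ospf"} for n, net in pairs]

print(missing_loopbacks(rows, ["leaf1", "leaf2"]))
# Dropping the last row simulates leaf2 not learning leaf1's loopback.
print(missing_loopbacks(rows[:-1], ["leaf1", "leaf2"]))
```

The first call prints an empty dictionary; the second reports 3.3.3.3/32 as missing on leaf2.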

BGP Multipath

We will now check that BGP multipath is configured using the bgpProcessConfiguration question. Like so,

>>> bfq.bgpProcessConfiguration(properties="Multipath_EBGP|Neighbors").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 14:36:13 2019 DST Begin job.
     Node      VRF Router_ID Multipath_EBGP                     Neighbors
0  spine1  default   1.1.1.1          False  ['3.3.3.3/32', '4.4.4.4/32']
1  spine2  default   2.2.2.2          False  ['3.3.3.3/32', '4.4.4.4/32']
2   leaf1  default   3.3.3.3          False  ['1.1.1.1/32', '2.2.2.2/32']
3   leaf2  default   4.4.4.4          False  ['1.1.1.1/32', '2.2.2.2/32']

With this we have discovered BGP multipath is not configured within our topology!

In addition, we can also use bfq.traceroute(). This returns the hops (trace) in the network between the given start and destination locations, for each path that the traffic would take. For our topology, we would therefore expect to see 8 traces for our flow. As you can see from the example below, only 4 traces are shown.

>>> len(bfq.traceroute(startLocation='server1',
...     headers=HeaderConstraints(dstIps='server2'
...     )).answer().frame()['Traces'][0])
status: TERMINATEDNORMALLY
.... Wed Jul  3 15:34:47 2019 DST Begin job.
4
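The figure of 8 expected traces can be reproduced with some simple arithmetic. The sketch below assumes two parallel links between every leaf and spine (as suggested by the interface descriptions in the jumbo frames output), a single link from each server to its leaf, and that each parallel link counts as a distinct path; count_paths is a hypothetical helper.

```python
# Number of parallel links between each leaf and spine (assumption based on
# the Ethernet1/1-4 interface descriptions shown earlier).
LINKS = {
    ("leaf1", "spine1"): 2,
    ("leaf1", "spine2"): 2,
    ("leaf2", "spine1"): 2,
    ("leaf2", "spine2"): 2,
}


def count_paths(src_leaf, dst_leaf, spines):
    """Count leaf -> spine -> leaf paths, treating each parallel link as distinct."""
    return sum(LINKS[(src_leaf, s)] * LINKS[(dst_leaf, s)] for s in spines)


all_spines = count_paths("leaf1", "leaf2", ["spine1", "spine2"])
one_spine = count_paths("leaf1", "leaf2", ["spine1"])
print(all_spines)
print(one_spine)
```

With both spines in use the count is 8. Restricting the traffic to a single spine gives 4, which would be consistent with the result above, although that interpretation is an assumption rather than something the traceroute answer states.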

BGP Peer Configuration

Let's now turn our attention to checking the BGP peering configuration, ensuring that:

  • leafs are only allowed to peer with spines,
  • spines are only allowed to peer with leafs.

>>> bfq.bgpPeerConfiguration(nodes="/leaf/", properties="Remote_IP").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 11:05:35 2019 DST Begin job.
    Node      VRF Local_Interface Remote_IP
0  leaf1  default            None   2.2.2.2
1  leaf2  default            None   1.1.1.1
2  leaf2  default            None   2.2.2.2
3  leaf1  default            None   1.1.1.1

>>> bfq.bgpPeerConfiguration(nodes="/spine/", properties="Remote_IP").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 11:05:42 2019 DST Begin job.
     Node      VRF Local_Interface Remote_IP
0  spine1  default            None   4.4.4.4
1  spine2  default            None   3.3.3.3
2  spine1  default            None   3.3.3.3
3  spine2  default            None   4.4.4.4

Great. Our configuration is correct as per our compliance requirements.
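Rather than comparing the two tables by eye, the peer addresses can be checked against an allowed set. The sketch below assumes the spine loopbacks are 1.1.1.1 and 2.2.2.2 and the leaf loopbacks are 3.3.3.3 and 4.4.4.4, as the outputs in this article suggest; unexpected_peers is a hypothetical helper, and the rows are copied from the two answers above.

```python
SPINE_LOOPBACKS = {"1.1.1.1", "2.2.2.2"}  # assumed from the outputs above
LEAF_LOOPBACKS = {"3.3.3.3", "4.4.4.4"}   # assumed from the outputs above


def unexpected_peers(rows, allowed):
    """Return (Node, Remote_IP) pairs whose Remote_IP is not in `allowed`."""
    return [(r["Node"], r["Remote_IP"]) for r in rows if r["Remote_IP"] not in allowed]


leaf_rows = [
    {"Node": "leaf1", "Remote_IP": "2.2.2.2"},
    {"Node": "leaf2", "Remote_IP": "1.1.1.1"},
    {"Node": "leaf2", "Remote_IP": "2.2.2.2"},
    {"Node": "leaf1", "Remote_IP": "1.1.1.1"},
]
spine_rows = [
    {"Node": "spine1", "Remote_IP": "4.4.4.4"},
    {"Node": "spine2", "Remote_IP": "3.3.3.3"},
    {"Node": "spine1", "Remote_IP": "3.3.3.3"},
    {"Node": "spine2", "Remote_IP": "4.4.4.4"},
]

print(unexpected_peers(leaf_rows, SPINE_LOOPBACKS))
print(unexpected_peers(spine_rows, LEAF_LOOPBACKS))
```

Both calls print an empty list, matching the conclusion that no leaf peers with a leaf and no spine peers with a spine.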

BGP Session Establishment

We will now check that the BGP sessions will successfully establish based on the configurations. This is achieved via the bfq.bgpSessionStatus() question.

>>> bfq.bgpSessionStatus(nodes="/spine|leaf/").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 15:01:16 2019 DST Begin job.
     Node      VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP   Session_Type Established_Status
0   leaf1  default    64521            None  3.3.3.3     64520      spine1             None   1.1.1.1  EBGP_MULTIHOP        ESTABLISHED
1   leaf1  default    64521            None  3.3.3.3     64520      spine2             None   2.2.2.2  EBGP_MULTIHOP        ESTABLISHED
2   leaf2  default    64522            None  4.4.4.4     64520      spine1             None   1.1.1.1  EBGP_MULTIHOP        ESTABLISHED
3   leaf2  default    64522            None  4.4.4.4     64520      spine2             None   2.2.2.2  EBGP_MULTIHOP        ESTABLISHED
4  spine1  default    64520            None  1.1.1.1     64521       leaf1             None   3.3.3.3  EBGP_MULTIHOP        ESTABLISHED
5  spine1  default    64520            None  1.1.1.1     64522       leaf2             None   4.4.4.4  EBGP_MULTIHOP        ESTABLISHED
6  spine2  default    64520            None  2.2.2.2     64521       leaf1             None   3.3.3.3  EBGP_MULTIHOP        ESTABLISHED
7  spine2  default    64520            None  2.2.2.2     64522       leaf2             None   4.4.4.4  EBGP_MULTIHOP        ESTABLISHED

For sessions not showing as established, further information as to why can be found using bfq.bgpSessionCompatibility().
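To find such sessions without scanning the table manually, the rows can be filtered on the Established_Status column. This is a minimal sketch over plain row dictionaries; not_established is a hypothetical helper, and the third row (including its NOT_ESTABLISHED value) is invented purely to show a failing case.

```python
def not_established(rows):
    """Return (Node, Remote_Node, status) for sessions that are not ESTABLISHED."""
    return [
        (r["Node"], r["Remote_Node"], r["Established_Status"])
        for r in rows
        if r["Established_Status"] != "ESTABLISHED"
    ]


rows = [
    # Two rows taken from the output above.
    {"Node": "leaf1", "Remote_Node": "spine1", "Established_Status": "ESTABLISHED"},
    {"Node": "leaf2", "Remote_Node": "spine2", "Established_Status": "ESTABLISHED"},
    # Hypothetical failing session, not part of the article's snapshot.
    {"Node": "leaf9", "Remote_Node": "spine1", "Established_Status": "NOT_ESTABLISHED"},
]

print(not_established(rows))
```

Only the hypothetical leaf9 session is reported; for the snapshot in this article the list would be empty.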

Outro

I hope you have enjoyed part 1 of "Unleashing the Batfish". Stay tuned (check out our free newsletter) for the 2nd installment, where we will be diving into Impact Analysis.

References


  1. "Batfish - An open source network configuration analysis tool." https://www.batfish.org/. Accessed 25 Jun. 2019.

  2. "batfish/batfish: Batfish is a network configuration analysis tool ... - GitHub." https://github.com/batfish/batfish. Accessed 2 Jul. 2019.

  3. "Pandas Basics - Learn Python - Free Interactive Python Tutorial." https://www.learnpython.org/en/Pandas_Basics. Accessed 3 Jul. 2019.