Introduction

When it comes to network automation tooling, the landscape is vast and can seem daunting. The key to navigating it is understanding the role of each tool. To do this, the landscape can be broken down into the following domains:

  • Configuration Management
  • Monitoring and Logging
  • Testing and Compliance
  • Source of Truth
  • Version Control Systems
  • Automation Hub and Continuous Integration/Continuous Delivery (CI/CD)
  • Event-Driven Automation
  • Infrastructure as Code (IaC) Orchestration
  • Security

image1
Figure 1 - Network Automation Landscape.

Configuration Management

Multiple tools excel in this space, each taking a different approach to configuration generation and deployment. Among the most popular are:

Ansible

Ansible is an open-source IT automation engine based upon an agentless architecture. It works by connecting to nodes to automate software provisioning, configuration management and application deployment.

image7
Figure 2 - Ansible model.[1]

SaltStack

SaltStack, also known as Salt, contains a flexible configuration management framework built on its remote execution core. This framework executes on the minions, allowing simultaneous configuration of many hosts by rendering language-specific state files.

Netmiko/NAPALM/Nornir

Netmiko, NAPALM and Nornir are Python-based libraries and tools that perform different tasks but, when put together, can be used for configuration management.

  • Nornir: A pure Python automation framework intended to be used directly from Python. It handles the inventory and dispatches the tasks you want to run against your nodes and devices.
  • NAPALM: A Python library that implements a set of functions to interact with devices from different vendors through a unified API. NAPALM can connect to the device via Netmiko (SSH) or via other transport mechanisms (REST, NETCONF), depending on the APIs the device supports.
  • Netmiko: A multi-vendor library that simplifies Paramiko SSH connections to network devices.

image5
Figure 3 - Nornir, NAPALM, Netmiko Overview
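
The division of labour described above (an inventory, plus a dispatcher that runs a task against every selected host) can be sketched with the standard library alone. This is not Nornir's API: the `INVENTORY` dictionary, the `backup_config` task and the `run` helper are hypothetical names, and the task returns a placeholder string where a real one would call Netmiko or NAPALM.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inventory: hostname -> connection attributes.
INVENTORY = {
    "leaf1": {"platform": "ios", "address": "192.0.2.11"},
    "leaf2": {"platform": "ios", "address": "192.0.2.12"},
    "spine1": {"platform": "nxos", "address": "192.0.2.1"},
}


def backup_config(name, host):
    """Placeholder task; a real task would open a session to the device here."""
    return f"{name} ({host['platform']}) at {host['address']}: backup queued"


def run(task, inventory, platform=None, workers=4):
    """Run `task` against every host, optionally filtered by platform."""
    selected = {
        name: host
        for name, host in inventory.items()
        if platform is None or host["platform"] == platform
    }
    # Dispatch the task to all selected hosts in parallel threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(task, name, host) for name, host in selected.items()}
        return {name: future.result() for name, future in futures.items()}


results = run(backup_config, INVENTORY, platform="ios")
for name in sorted(results):
    print(results[name])
```

Keeping the inventory filter separate from the task is what lets the same task run unchanged against one device or a whole site.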

Jinja

Jinja is a templating language for Python, mostly used in the web development world to generate HTML pages. Because of its quality and because it is a Python package, it is also used in tools like Ansible and SaltStack, allowing network engineers to create network device configuration from Jinja-based templates.

Below is an example of a Jinja-based template that generates the VLAN configuration for a Cisco switch.

{% for id, name in vlans.items() -%}
!
{% if id is number %}
vlan {{ id }}
    name {{ name }}
{% endif %}
!
{% endfor %}
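
A template like this could be rendered from Python roughly as follows. This is a minimal sketch that assumes the `jinja2` package is installed; the `vlans` data is made up for illustration, and the loop uses `vlans.items()`, the Python 3 dictionary method.

```python
from jinja2 import Template

# The VLAN template, held in a string for the purposes of this sketch.
VLAN_TEMPLATE = """\
{% for id, name in vlans.items() -%}
!
{% if id is number %}
vlan {{ id }}
    name {{ name }}
{% endif %}
!
{% endfor %}
"""

# Hypothetical input data: VLAN ID -> VLAN name.
vlans = {10: "users", 20: "voice"}

# Render the template with the data to produce device configuration text.
config = Template(VLAN_TEMPLATE).render(vlans=vlans)
print(config)
```

In practice the `vlans` dictionary would come from a source of truth rather than being hard-coded.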

Of course there are vendor-related tools in this space. Here are a couple worth mentioning:

Cisco NSO

Cisco Network Services Orchestrator (NSO) is a platform for automating services across networks and can be used to add, change or delete services. It is a model-driven (YANG) orchestration platform that supports multi-vendor networks.

Junos Space

Junos Space Network Management Platform is a platform that simplifies and automates the management of Juniper’s switching, routing and security devices.

Monitoring and Logging

Gone are the days when RRDtool was used to store and graph network device metrics. The SysAdmin world has come a long way with monitoring and observability solutions.

We can leverage most of these tools for metric gathering, streaming telemetry, log collection, visualisation and alerting for the network infrastructure.

Below are some of the most popular tools. Typically, a combination of these pluggable systems is deployed to provide a monitoring solution stack.

Elastic Stack

Elastic Stack was originally conceived as a platform for logging solutions, but it can be augmented with network appliance metrics. This provides an almost complete stack solution for your monitoring needs.

elk
Figure 4 - ELK Stack.

The Elastic stack is composed of:

  • Beats: Lightweight data-shippers. Used to collect data and send it to an upstream processing/storage system.
  • Logstash: Log collection, parsing, and processing.
  • Elasticsearch: Storage and Search analytics engine.
  • Kibana: Visualization and Data exploration tool.

image9
Figure 5 - Kibana.

Note: Before Beats was added, the Elastic Stack was known as ELK (Elasticsearch, Logstash and Kibana).

ElastAlert

ElastAlert is a reliable, modular alerting engine that is easy to set up and configure. It queries Elasticsearch periodically and passes the data to a defined rule type. When a rule matches, it triggers one or more alerts, which determine the action to take.
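
The rule-then-alert flow can be illustrated with a small sketch loosely modelled on a frequency-style rule (match when at least N events occur within a time window). The `FrequencyRule` class, the `alert` function and the timestamps are hypothetical and are not ElastAlert code.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyRule:
    """Match when at least `num_events` events fall inside `timeframe`."""

    def __init__(self, num_events, timeframe):
        self.num_events = num_events
        self.timeframe = timeframe
        self.window = deque()

    def add_event(self, timestamp):
        """Record one event; return True if the rule matches."""
        self.window.append(timestamp)
        # Drop events that are older than the timeframe.
        while timestamp - self.window[0] > self.timeframe:
            self.window.popleft()
        return len(self.window) >= self.num_events


def alert(message):
    """Placeholder action; a real alerter would send an email or chat message."""
    return f"ALERT: {message}"


rule = FrequencyRule(num_events=3, timeframe=timedelta(minutes=5))
start = datetime(2020, 1, 4, 12, 0, 0)
# Hypothetical timestamps of "link down" log entries returned by a query.
events = [start + timedelta(minutes=m) for m in (0, 1, 10, 11, 12)]

alerts = [alert(f"3 link-down events by {ts:%H:%M}") for ts in events if rule.add_event(ts)]
print(alerts)
```

Only the final event triggers an alert here, because it is the first moment at which three events sit inside the same five-minute window.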

Prometheus

Prometheus is an open-source monitoring and alerting system with a built-in time series database. It scrapes metrics from instrumented jobs, either directly or via an intermediary gateway. For visualization it can be integrated with Grafana. For alerting it has an Alertmanager component that can send notifications to cloud alerting services such as PagerDuty or Opsgenie, as well as email, Slack, etc.
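
To make the scraping model more concrete, the sketch below parses a small sample of the text format that scrape targets expose. It is a simplified parser written for illustration (it ignores escaped quotes and timestamps), not Prometheus code, and the metric and label names in `SCRAPE` are invented.

```python
import re

# Hypothetical scrape body in the Prometheus text exposition format.
SCRAPE = """\
# HELP ifHCInOctets Total octets received on the interface.
# TYPE ifHCInOctets counter
ifHCInOctets{device="leaf1",ifName="Ethernet1"} 1027
ifHCInOctets{device="leaf1",ifName="Ethernet2"} 3
up 1
"""

# metric_name, optional {labels}, then the sample value.
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        match = SAMPLE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


samples = parse(SCRAPE)
for name, labels, value in samples:
    print(name, labels, value)
```

The labels are what allow one metric name to be sliced per device or per interface in a dashboard or alert rule.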

Grafana

Grafana is a visualization tool popular for metrics data. The Grafana ecosystem also includes Grafana Loki, a log aggregation system.

image8
Figure 6 - Grafana.

InfluxDB

InfluxDB is an open-source, purpose-built time series database. It is usually coupled with a data shipper like Telegraf and visualisation tools like Grafana, and its data can be queried with the Flux language.

Splunk

Splunk captures, indexes and correlates data in a searchable repository from which it can generate graphs, reports, alerts, dashboards and visualisations. This is generally an enterprise product deployed as a standalone solution, while the other products listed above are either free or open source at their core.

Testing and Compliance

Testing and compliance checks depend on the network design, the infrastructure and even the role configured on each device, so they vary with context. Nevertheless, there are tools and frameworks that can help in this space:

pyATS/Genie

pyATS is an end-to-end testing ecosystem. It specialises in data-driven and reusable testing, is extensible by design, and enables developers to start with small, simple, linear test cases and easily scale towards large and complex tests. Genie builds on top of pyATS to provide usable libraries and a CLI.

$ genie diff demo1 demo2 && cat ./diff_ios-access2_show-vlan_parsed.txt
1it [00:00, 45.41it/s]
+==============================================================================+
| Genie Diff Summary between directories demo1/ and demo2/                     |
+==============================================================================+
|  File: ios-access2_show-vlan_parsed.txt                                      |
|   - Diff can be found at ./diff_ios-access2_show-vlan_parsed.txt             |
|------------------------------------------------------------------------------|

--- demo1/ios-access2_show-vlan_parsed.txt
+++ demo2/ios-access2_show-vlan_parsed.txt
vlans:
+ 200: 
+  mtu: 1500
+  name: TESTVLAN200
+  said: 100200
+  shutdown: False
+  state: active
+  trans1: 0
+  trans2: 0
+  type: enet
+  vlan_id: 200

Robot Framework

Robot Framework is a generic open-source automation framework for acceptance testing. It has easy-to-use tabular test data syntax and utilizes the keyword-driven testing approach. Its testing capabilities can be extended by test libraries implemented either with Python or Java.

Pytest

Pytest is a Python framework that makes it easy to write small tests and yet scales to support complex functional testing for applications and libraries. When combined with other network libraries like pyEZ, python eAPI, NAPALM, etc. it can be used to test the network via assertion methods.
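
A minimal sketch of such assertion-based tests is shown below. The `get_facts` function is a hypothetical stand-in that returns canned data shaped like the facts dictionary a library such as NAPALM provides. In a real test it would query the device.

```python
# test_facts.py: pytest collects functions whose names start with "test_".


def get_facts(hostname):
    """Hypothetical stand-in for a library call that retrieves device facts."""
    return {
        "hostname": hostname,
        "vendor": "Arista",
        "os_version": "4.22.1F",
        "interface_list": ["Ethernet1", "Ethernet2", "Management1"],
    }


def test_os_version_is_approved():
    # Fail if the device runs a software version outside the approved set.
    facts = get_facts("leaf1")
    assert facts["os_version"] in {"4.22.1F", "4.23.0F"}


def test_uplinks_exist():
    # Fail if either expected uplink is missing from the device.
    facts = get_facts("leaf1")
    assert {"Ethernet1", "Ethernet2"} <= set(facts["interface_list"])
```

Saved as `test_facts.py`, these functions would be collected and run by executing `pytest` in the same directory.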

Batfish

Batfish is an open-source network configuration and analysis tool that provides the ability to validate configuration data, query network adjacencies, verify firewall ACL rule sets and also analyse routing/flow paths. There is a commercial offering of Batfish that provides a UI, along with pre-created validation tests.

Below is an example of querying Batfish to find which interfaces have an MTU of less than 9216.

>>> node_props[node_props["MTU"].apply(lambda x: x < 9216)]
               Interface Description   MTU
143  spine2[Ethernet1/1]    to leaf2  1500
144  spine2[Ethernet1/2]    to leaf2  1500
249   leaf2[Ethernet1/2]   to spine2  1500
250   leaf2[Ethernet1/1]   to spine2  1500

Source of Truth

When building and automating our network, how do we know what attributes to use within our configuration templates, or what IP we should assign to our links? This is where a source of truth comes into play.

A source of truth is a single location that holds all of the configuration for, and attributes about, the network infrastructure. Network automation tools consume this data when performing the necessary networking tasks.

In the field of source of truth platforms there are 2 key acronyms you should understand - IPAM and DDI:

  • IPAM (IP Address Management) - IPAM is a means of planning, tracking and managing the Internet Protocol address space used in a network.[2]
  • DDI (DNS, DHCP and IPAM) - A DDI solution provides a centralised platform to manage DNS and DHCP services and has an IPAM component.[3]

A source of truth platform can encompass (but is not limited to) IPAM or DDI.
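
As a small illustration of the kind of question an IPAM answers ("what IP should we assign to our links?"), the sketch below carves /31 subnets out of a reserved block using Python's standard `ipaddress` module. The pool and the link names are assumptions made for the example. A real IPAM would also persist and track these allocations.

```python
import ipaddress

# Hypothetical block reserved for point-to-point links.
LINK_POOL = ipaddress.ip_network("10.0.0.0/29")

# Hypothetical links that need addressing.
links = [("spine1", "leaf1"), ("spine1", "leaf2"), ("spine2", "leaf1")]

assignments = {}
for (a_end, b_end), subnet in zip(links, LINK_POOL.subnets(new_prefix=31)):
    # A /31 holds exactly two addresses, one for each end of the link.
    a_ip, b_ip = list(subnet)
    assignments[(a_end, b_end)] = {a_end: f"{a_ip}/31", b_end: f"{b_ip}/31"}

for link, ips in assignments.items():
    print(link, ips)
```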

There are various solutions in the market, both commercial (for example Infoblox) and open-source. In this article we will cover the 2 main open-source tools available in the industry.

Netbox

Netbox is an open-source application, built upon the Python Django framework, designed to help manage and document computer networks. Netbox provides a good UI allowing you to visualize your racks, a REST API and many other great features such as webhooks and export templating to name but a few.

image11
Figure 7 - Netbox
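
Automation tools typically consume this data through the REST API. The sketch below only builds a request for devices at a given site with a given role and does not send it. The instance URL, the token and the `build_device_query` helper are hypothetical, and the filter names should be checked against the API documentation of your own installation.

```python
import urllib.parse
import urllib.request

NETBOX_URL = "https://netbox.example.com"  # hypothetical instance
API_TOKEN = "0123456789abcdef"             # hypothetical token


def build_device_query(site, role):
    """Build (but do not send) a GET request for devices at a site with a role."""
    params = urllib.parse.urlencode({"site": site, "role": role})
    return urllib.request.Request(
        f"{NETBOX_URL}/api/dcim/devices/?{params}",
        headers={
            "Authorization": f"Token {API_TOKEN}",
            "Accept": "application/json",
        },
    )


request = build_device_query("dc1", "leaf")
print(request.full_url)

# Against a live instance the response could be read like this:
# import json
# with urllib.request.urlopen(request) as response:
#     devices = json.load(response)["results"]
```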

NSoT (Network Source of Truth)

Network Source of Truth (NSoT) is a Django-based, open-source application for IPAM, network devices and network interfaces. NSoT provides a REST API, CLI client, Python modules and a UI for the administration of the database inventory. Though the UI is not as mature as Netbox, NSoT has a great and flexible CLI.

#  nsot devices list -a rack=r1 -a hw_type=switch -a vendor=dell
+--------------------------------------+
| ID   Hostname (Key)   Attributes     |
+--------------------------------------+
| 40   access-sw-010    hw_type=switch |
|                       model=n3048    |
|                       position=u20   |
|                       rack=r1        |
|                       vendor=dell    |
+--------------------------------------+

Version Control Systems

Version control systems are a category of software tools that help a software team manage changes to source code over time. Version control software keeps track of every modification to the code in a special kind of database. If a mistake is made, developers can turn back the clock and compare earlier versions of the code to help fix the mistake while minimizing disruption to all team members.[4]

Historically, Version Control Systems (VCS) have been used for software development. With the introduction of Infrastructure as Code (IaC), the role of VCS has become even greater. More recently, with the rise of network automation, VCS are playing a key role in the management and lifecycle of network configuration, along with NetDevOps CI/CD pipelines.

Below is a summary of some of the roles VCS can play regarding network automation:

  • Application or scripts repository.
  • Ansible Playbooks, Roles or Collections repository (which can be pulled by Ansible Tower to run jobs).
  • Rundeck projects and runbooks repository.
  • Thanks to their diff capabilities, version control systems are a good tool for device configuration source of truth as they can compare configuration files.
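
The diff capability mentioned in the last point can be illustrated with Python's standard `difflib` module, which produces the same style of unified diff that `git diff` shows. The two configuration snippets and the file names are invented for the example.

```python
import difflib

# Hypothetical configuration as stored in the repository...
committed = """\
vlan 100
 name USERS
interface Ethernet1
 switchport access vlan 100
""".splitlines(keepends=True)

# ...and as most recently collected from the device.
collected = """\
vlan 100
 name USERS
vlan 200
 name TESTVLAN200
interface Ethernet1
 switchport access vlan 200
""".splitlines(keepends=True)

# Lines starting with "+" exist only on the device, "-" only in the repository.
diff = list(difflib.unified_diff(committed, collected,
                                 fromfile="repo/access-sw-010.cfg",
                                 tofile="device/access-sw-010.cfg"))
print("".join(diff), end="")
```

A non-empty diff signals configuration drift between the repository and the device.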

Git

Git is installed and maintained on your local system. It allows you to version source code, track local file changes and share changes with a remote repository (such as GitHub, GitLab or Bitbucket).

One important concept of Git is branches. Feature/Dev branches are created from your master branch. Changes are then made within this feature/dev branch and then merged back into the master branch, as shown below.

image10
Figure 8 - Git Branching.

GitHub

GitHub is a cloud service (SaaS) for the remote hosting of Git repositories. In addition to hosting your code, the site helps manage software development projects with features like issue tracking, collaborating with other GitHub users, and hosting web pages.

image6
Figure x - Github

GitLab

GitLab, similar to GitHub, is another web-based service for the remote hosting of Git repositories. Unlike GitHub, it provides an open-source community edition that you can deploy yourself.

Automation Hub and CI/CD

An Automation Hub can be defined as a centralised server that handles your automation tasks. It can usually integrate with multiple tools and systems and is used for scheduling jobs. Continuous Integration and/or Continuous Delivery tools, on the other hand, focus on continuously integrating and testing code along pipelines that deliver it.

Most tools in these areas overlap in much of their functionality. For example:

Ansible Tower or AWX

Ansible Tower or AWX is a centralised server for Ansible playbooks that can execute scheduled jobs and can integrate with version control system tools like GitHub to form part of a CI/CD pipeline. Ansible Tower is Red Hat's commercial offering, built on the open-source AWX project.

image3
Figure 9 - AWX

Rundeck

Rundeck is similar in its functions to Ansible Tower; this software can run jobs in a scheduled manner. Its configuration can be customised to also serve as part of a CI/CD pipeline process.

image4
Figure 10 - Rundeck

Jenkins

Jenkins was originally conceived to handle CI/CD pipeline processes, but it can also serve as a cron job manager.

image2
Figure 11 - Jenkins

Ansible Tower and Rundeck are competing in the same space (both were originally created to handle automated jobs in a planned and scheduled manner), while other tools like Jenkins were originally developed with software CI/CD in mind.

In the CI/CD space there are far more SaaS-based tools available, such as:

  • Travis CI
  • GitLab CI
  • CircleCI
  • Drone CI
  • Bamboo

Infrastructure as Code (IaC) Orchestration

For virtual and cloud network infrastructure environments, Infrastructure as Code tools help provision, manage and control assets via machine-readable definition files. For example, they allow you to create a VPC in GCP or AWS, or to create VMs across different public or private clouds.

Terraform by HashiCorp

Terraform by HashiCorp is a tool for building, changing and versioning infrastructure safely and efficiently. It can be used to manage existing and popular service providers as well as custom, in-house solutions.

Terraform example:

# Create a new instance of the latest Ubuntu 14.04 on an
# t2.micro node with an AWS Tag naming it "HelloWorld"

provider "aws" {
  region = "us-west-2"
}

data "aws_ami" "ubuntu" {
  most_recent = true
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"]
  }
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
  owners = ["099720109477"] # Canonical
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"
  tags = {
    Name = "HelloWorld"
  }
}

Pulumi - Modern Infrastructure as Code

Pulumi allows you to interact with your infrastructure using real programming languages. It is an open-source platform for building and deploying cloud infrastructure and applications in your favourite language on any cloud. This allows you to describe your resources in code, such as VMs, networks, databases, containers, Kubernetes clusters and workloads, and serverless functions.

Here is a Pulumi Python example:

# Copyright 2016-2018, Pulumi Corporation.  All rights reserved.

import pulumi
import pulumi_aws as aws

size = 't2.micro'

ami = aws.get_ami(most_recent=True,
                  owners=["137112412989"],
                  filters=[{"name":"name","values":["amzn-ami-hvm-*"]}])

group = aws.ec2.SecurityGroup('web-secgrp',
    description='Enable HTTP access',
    ingress=[
        { 'protocol': 'tcp', 'from_port': 80, 'to_port': 80, 'cidr_blocks': ['0.0.0.0/0'] }
    ])

user_data = """
#!/bin/bash
echo "Hello, World!" > index.html
nohup python -m SimpleHTTPServer 80 &
"""

server = aws.ec2.Instance('web-server-www',
    instance_type=size,
    security_groups=[group.name],
    user_data=user_data,
    ami=ami.id)
    
pulumi.export('public_ip', server.public_ip)
pulumi.export('public_dns', server.public_dns)

In addition, each cloud provider normally has its own set of tools that can help you automate and treat your infrastructure as code. For example, AWS CloudFormation.

Event-Driven Automation

Event-driven automation is a term generally given to an automated workflow that is actioned when a specific event has been triggered.
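
The idea can be sketched in a few lines: handlers are registered against an event type and run when a matching event arrives. The `on` decorator, the `dispatch` function and the event fields are hypothetical names chosen for this illustration and do not come from any particular platform.

```python
from collections import defaultdict

# Event type -> list of handler functions registered for it.
handlers = defaultdict(list)


def on(event_type):
    """Decorator that registers a function as a handler for `event_type`."""
    def register(func):
        handlers[event_type].append(func)
        return func
    return register


@on("interface_down")
def open_ticket(event):
    return f"ticket opened for {event['device']} {event['interface']}"


@on("interface_down")
def bounce_interface(event):
    return f"bounce scheduled for {event['device']} {event['interface']}"


def dispatch(event):
    """Run every handler registered for the event's type and collect the results."""
    return [handler(event) for handler in handlers[event["type"]]]


actions = dispatch({"type": "interface_down", "device": "leaf2", "interface": "Ethernet1/1"})
ignored = dispatch({"type": "bgp_flap", "device": "leaf2"})
print(actions, ignored)
```

Events with no registered handler are simply ignored, which is why `ignored` is an empty list.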

StackStorm

StackStorm is an event-driven automation platform for auto-remediation, security responses, troubleshooting, deployments and more. It ties together the existing infrastructure and application environment for automation tasks (like taking actions in response to an event).

StackStorm is focused on event-driven automation, but there are other ways to achieve it, although most of them are DIY.

For example, on the Elastic Stack you can have a Logstash instance send a message directly to your alerting platform when a certain log event is matched. Or you can define rules in the Alertmanager of your Prometheus platform to send specific alerts to your ChatOps tool or to trigger an Ansible playbook run.

Most of these solutions require some expertise, which is why StackStorm is appealing: it brings simplicity and more capabilities to this space.

SaltStack

The SaltStack platform (covered previously within the configuration management domain) is also considered to be event-driven, since you can listen for and react to Salt and non-Salt events from its agents. For more information, see the Event-Driven Infrastructure documentation.

Security

The security space usually brings its own set of tools depending on the vendor. Most appliances and solutions provide a northbound API for a network engineer to interact with the system in a programmatic way.

Below are the main tools available for Credential and Sensitive Data Management.

Ansible Vault

Ansible Vault is a feature of Ansible that allows you to keep sensitive data such as passwords or keys in encrypted files. These vault files can then be distributed or placed in source control.

Vault by HashiCorp

Vault is a platform that securely stores and tightly controls access to tokens, passwords, certificates and encryption keys, protecting secrets and other sensitive data through a UI, CLI or HTTP API.

$ vault kv get secret/routers

====== Metadata ======
Key              Value
---              -----
created_time     2020-01-04T12:54:03.250328Z
deletion_time    n/a
destroyed        false
version          2

===== Data =====
Key        Value
---        -----
some_user    some_credential

Ansible Vault is a more lightweight, or even temporary, solution if you are already using Ansible in your workflow, while Vault by HashiCorp provides a more complex but robust, highly available solution with better integration thanks to its HTTP API.
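
As a rough illustration of that HTTP API, the sketch below builds (without sending) the request equivalent to the `vault kv get secret/routers` command shown earlier. It assumes the `secret` mount is a version 2 key/value engine, which the version metadata in the CLI output suggests. The server address, the token and the `build_kv2_read` helper are hypothetical.

```python
import urllib.request

VAULT_ADDR = "https://vault.example.com:8200"  # hypothetical server
VAULT_TOKEN = "s.hypothetical-token"           # hypothetical token


def build_kv2_read(mount, path):
    """Build (but do not send) a read request for a KV version 2 secret."""
    return urllib.request.Request(
        f"{VAULT_ADDR}/v1/{mount}/data/{path}",
        headers={"X-Vault-Token": VAULT_TOKEN},
    )


request = build_kv2_read("secret", "routers")
print(request.full_url)

# Against a live server, the secret's key/value pairs sit under data.data:
# import json
# with urllib.request.urlopen(request) as response:
#     secret = json.load(response)["data"]["data"]
```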

References


  1. "End-to-End Application Provisioning with Ansible and ... - IBM." 21 Nov. 2018, https://www.ibm.com/cloud/blog/end-to-end-application-provisioning-with-ansible-and-terraform. Accessed 20 Jan. 2020.

  2. "What is IPAM (IP Address Management)? | DDI ... - Infoblox." https://www.infoblox.com/glossary/ipam-ip-address-management/. Accessed 21 Jan. 2020.

  3. "What is DDI (DNS, DHCP, IPAM)? | BlueCat Networks." https://www.bluecatnetworks.com/glossary/glossary-ddi/. Accessed 21 Jan. 2020.

  4. "What is version control | Atlassian Git Tutorial." https://www.atlassian.com/git/tutorials/what-is-version-control. Accessed 21 Jan. 2020.