BGP EVPN (Border Gateway Protocol Ethernet Virtual Private Network) is one of the most discussed options for building data center fabrics. With Pluribus now incorporating it into our solution toolkit, it’s time to dive deeper into what BGP EVPN is and how it can best be used as one of many tools to build and operate data center fabrics.
BGP EVPN is one approach to solving some important networking problems in a way that improves over previous protocols and architectures, including better scalability, redundancy, and performance. It has many applications including as a tool to build data center fabric overlays.
In this blog, we will explain what BGP EVPN is and explore a few options for how it can be used to build and scale data center fabrics, including ways we can use it to extend the Pluribus Adaptive Cloud Fabric and provide multi-fabric interoperability. We will also compare it to alternative ways to build data center fabrics, including SDN-automated approaches that can simplify network operations.
What is BGP EVPN?
BGP EVPN provides an increasingly popular, standards-based approach to create overlay networks using various data plane encapsulations, with VXLAN encapsulation most commonly used for data center overlay fabrics.
The original motivations for BGP EVPN are outlined in RFC 7209, Requirements for Ethernet VPN (EVPN). To summarize briefly:
- Extend Layer 2 VPN services across an underlying Layer 3 network. Originally the target networks were service provider networks built on IP/MPLS.
- Improve on prior approaches such as Virtual Private LAN Service (VPLS) in redundancy, optimal traffic forwarding, scalability and simplicity of provisioning.
In principle, these EVPN requirements could be solved with varying control plane protocols, but in practice, BGP (specifically Multiprotocol BGP or MP-BGP) has been the EVPN control plane of choice from the early days, as outlined in RFC 7432. The basic idea of a BGP EVPN is shown in Figure 1, with Ethernet networks connected over an intermediate IP network, labeled the IP underlay. The Provider Edge (PE) routers implement the BGP EVPN protocol and peer with each other over the IP network. Standard BGP architectural tools such as route reflectors (not shown) can be applied as needed for scalability, redundancy and performance.
BGP EVPN supports multiple “route types” to address different use cases. The most commonly implemented are types 2 and 5:
- Type 2, MAC/IP route, advertises Ethernet MAC addresses between connected Ethernet networks and thus enables layer 2 forwarding of Ethernet frames across the EVPN without routing. This is the type used to “stretch” layer 2 Ethernet segments, including VLANs, which is critical to many data center applications.
- Type 5, IP prefix route, advertises IP routes to enable layer 3 routing between the Ethernet segments. An important application of this type is extension of layer 3 Virtual Routing and Forwarding (VRF) instances that isolate or segment different applications or users.
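To make the distinction between the two route types concrete, here is a minimal Python sketch. It is an illustration only: the names MacIpRoute, IpPrefixRoute and install are our own and do not come from any BGP implementation, and real EVPN routes carry additional fields and attributes (route distinguishers and route targets, for example) that are omitted here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified stand-in for an EVPN Type 2 (MAC/IP) route."""
    vni: int                  # overlay segment the MAC address belongs to
    mac: str                  # Ethernet MAC address being advertised
    next_hop: str             # address of the advertising edge device
    ip: Optional[str] = None  # optional host IP carried alongside the MAC
    route_type: int = 2


@dataclass(frozen=True)
class IpPrefixRoute:
    """Simplified stand-in for an EVPN Type 5 (IP prefix) route."""
    vrf: str                  # VRF the prefix belongs to
    prefix: str               # IP prefix being advertised
    next_hop: str             # address of the advertising edge device
    route_type: int = 5


def install(route,
            mac_table: Dict[Tuple[int, str], str],
            vrf_tables: Dict[str, Dict[str, str]]) -> None:
    """Install a received route into the layer 2 or layer 3 table."""
    if isinstance(route, MacIpRoute):
        # Type 2: layer 2 forwarding entry, keyed by segment and MAC.
        mac_table[(route.vni, route.mac)] = route.next_hop
    elif isinstance(route, IpPrefixRoute):
        # Type 5: layer 3 routing entry inside the tenant's VRF.
        vrf_tables.setdefault(route.vrf, {})[route.prefix] = route.next_hop
    else:
        raise TypeError(f"unsupported route: {route!r}")
```

In this sketch, Type 2 routes populate a per-segment MAC table (layer 2 stretch), while Type 5 routes populate per-VRF routing tables (layer 3 segmentation), mirroring the two use cases described above.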
BGP plays a few key roles in realizing the EVPN scaling requirements. Among them are:
- Removes MAC address learning from the data plane. In a standard Ethernet network, MAC addresses are learned by switches observing frames (packets) as they are transmitted. Whenever a switch encounters an unknown destination MAC address, it floods the frame, i.e. replicates it to all possible destinations. This type of unknown unicast frame flooding can lead to traffic congestion, potentially including broadcast storms, and scales very poorly.
- Reduces broadcast and multicast traffic load. Similar to unknown unicast flooding, broadcast and multicast traffic can create heavy traffic loads on the network – and the potential for outages due to broadcast storms. BGP EVPNs can reduce these problems by forwarding such traffic selectively.
- Enables optimal forwarding, load balancing and convergence across the underlay IP network.
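The first point, moving MAC learning out of the data plane, can be sketched with a toy model. The two classes below are hypothetical simplifications of our own, not how any real switch is implemented: one learns from observed frames and floods unknown destinations, while the other forwards only on entries supplied ahead of time by a control plane. Dropping unknown destinations in the second class is an assumption made for brevity; real implementations may handle them differently.

```python
class FloodAndLearnSwitch:
    """Toy model of classic data-plane MAC learning."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}

    def forward(self, src_mac, dst_mac, in_port):
        # Learn the source MAC from the frame itself.
        self.mac_table[src_mac] = in_port
        if dst_mac in self.mac_table:
            return {self.mac_table[dst_mac]}
        # Unknown destination: flood to every port except the ingress port.
        return self.ports - {in_port}


class ControlPlaneSwitch:
    """Toy model of control-plane MAC learning (the role BGP plays in EVPN)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}

    def advertise(self, mac, port):
        # The entry arrives via the control plane, not from observed frames.
        self.mac_table[mac] = port

    def forward(self, src_mac, dst_mac, in_port):
        out_port = self.mac_table.get(dst_mac)
        # Simplifying assumption: unknown destinations are not flooded.
        return {out_port} if out_port is not None else set()
```

With four ports, the first frame to an unknown destination on the flood-and-learn switch is replicated to three ports, whereas the control-plane switch sends it to exactly one port once the address has been advertised.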
It’s important to remember that BGP EVPN is not the only way to accomplish these goals. For example, the Pluribus Adaptive Cloud Fabric achieves equivalent benefits using a protocol-free controllerless SDN architecture, which we’ll discuss below.
How Does BGP EVPN Fit in a Data Center Fabric?
As noted above, BGP EVPN was originally focused on constructing L2VPNs over IP wide area networks, but it has gained industry traction as a way to create data center fabric overlays. Why is that?
We have previously discussed why overlay networks are valuable in building scalable DC fabrics. In summary, industry best practice has moved toward building DC fabric underlay networks with scalable layer 3 architectures and then implementing VXLAN-based overlays to provide virtualized layer 2 network connectivity and secure segmentation between applications and tenants, among other benefits. This general architecture is shown in Figure 2.
There are many options for building and managing the VXLAN tunnels to create an overlay as highlighted in the table below. As you can see, BGP EVPN is just one of the options to choose from when building a stand-alone data center fabric, but it can also play a role in hybrid deployments for multi-fabric interoperability and scaling.
Table 1: Example Approaches for Building and Managing VXLAN Overlays
- No control plane
- Software-defined control plane
- Protocol-based control plane
- Hybrid / mixed control plane
For a very simple network, manual static configuration of VXLAN overlays may be feasible, but in general some type of control plane is required to meet the scalability challenges highlighted in the previous section.
VXLAN Tunnel Endpoints (VTEPs) can be created in software in the servers that run applications or in hardware in the top-of-rack switches they connect to. VMware’s NSX software is perhaps the best-known example of a server-based VXLAN overlay. There are pros and cons to server- vs. switch-based overlays, which we won’t cover here in detail, but the principal reasons for using switch-based overlays are lower cost and higher performance. For the rest of this blog, we will focus on switch-based overlays, particularly on the Pluribus Adaptive Cloud Fabric and BGP EVPN, and the ways in which they can work together.
First, let’s look at how BGP EVPN works in a data center overlay context.
RFC 8365, “A Network Virtualization Overlay Solution Using Ethernet VPN (EVPN),” provides an architectural guide for how to build overlays using BGP EVPN with VXLAN (as well as other data plane encapsulation methods). It outlines the intended data center applications and requirements similarly to those mentioned above:
“An NVO is a solution to address the requirements of a multi-tenant data center, especially one with virtualized hosts, e.g., Virtual Machines (VMs) or virtual workloads. [It provides]:
- Isolation of network traffic per tenant
- Support for a large number of tenants (tens or hundreds of thousands)
- Extension of Layer 2 (L2) connectivity among different VMs belonging to a given tenant segment (subnet) across different Points of Delivery (PoDs) within a data center or between different data centers
- Allowing a given VM to move between different physical points of attachment within a given L2 segment
The underlay network for NVO solutions is assumed to provide IP connectivity between NVO endpoints.”
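One reason VXLAN fits the tenant-scale requirement quoted above is that its header carries a 24-bit VXLAN Network Identifier (VNI), compared with the 12-bit VLAN ID of a standard Ethernet tag. The short Python sketch below packs and unpacks the 8-byte VXLAN header layout defined in RFC 7348. The function names are our own, and the sketch ignores the outer Ethernet, IP and UDP headers entirely.

```python
import struct

VLAN_ID_BITS = 12            # size of the VLAN ID in an 802.1Q tag
VNI_BITS = 24                # size of the VXLAN Network Identifier
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags byte followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", header[:8])
    return vni_word >> 8
```

Here 2 ** VLAN_ID_BITS gives 4,096 possible VLAN IDs, while 2 ** VNI_BITS gives 16,777,216 possible VNIs, which is what leaves room for the large tenant counts the RFC describes.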
When BGP EVPN is used to create an overlay within a single data center, VXLAN tunnels extend between the top-of-rack leaf switches, and every leaf switch now becomes an EVPN instance (effectively taking the role of the PE router in Figure 1). The underlay leaf-and-spine network, like the wide-area IP network in Figure 1, simply provides IP connectivity between these BGP peers on the leaf switches. In most common implementations, the spine switches only participate in the underlay; they do not terminate tunnels or participate in the BGP EVPN control plane. This is depicted in Figure 3 below.
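Because every leaf switch becomes a BGP peer, the number of BGP sessions is worth a quick calculation. The sketch below compares a full mesh of sessions with a design that uses route reflectors, the standard BGP scaling tool mentioned earlier. It assumes iBGP peering, that the route reflectors are separate from the leaf switches, and that the route reflectors peer with each other in a full mesh. The function names and the leaf counts are illustrative assumptions, not figures from any particular deployment.

```python
def full_mesh_sessions(leaves: int) -> int:
    """BGP sessions needed when every leaf peers with every other leaf."""
    return leaves * (leaves - 1) // 2


def route_reflector_sessions(leaves: int, reflectors: int) -> int:
    """BGP sessions needed when each leaf peers only with the route reflectors.

    Assumes the reflectors are separate nodes that also peer with
    each other in a full mesh.
    """
    client_sessions = leaves * reflectors
    reflector_mesh = reflectors * (reflectors - 1) // 2
    return client_sessions + reflector_mesh
```

For example, full_mesh_sessions(32) evaluates to 496, while route_reflector_sessions(32, 2) evaluates to 65, which shows why route reflectors are commonly applied as the number of leaf switches grows.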
One major issue in using BGP EVPN in this way is complex configurations. If you have only a handful of PE routers that implement BGP EVPN, as in Figure 1, the complexity may be manageable. But in the data center overlay use case, that complexity is now replicated in every leaf switch. Figure 4 shows an example of BGP EVPN configuration in a data center fabric with 32 leaf switches requiring 640 lines of configuration code for each service that is required. As the number of switches, tenants and services increases, this can quickly become hard to manage with manual, box-by-box configuration approaches.
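A rough back-of-the-envelope model shows how this adds up. The sketch below assumes the 640 lines in the example are the fabric-wide total for one service, and that configuration volume grows linearly with the number of leaf switches and services. Both are simplifying assumptions of ours, and the function names are hypothetical.

```python
def lines_per_leaf(total_lines: int, leaves: int) -> float:
    """Average configuration lines per leaf switch for one service."""
    return total_lines / leaves


def total_config_lines(leaves: int, services: int,
                       per_leaf_per_service: float) -> float:
    """Fabric-wide configuration lines, assuming linear growth."""
    return leaves * services * per_leaf_per_service
```

Under these assumptions, lines_per_leaf(640, 32) works out to 20 lines per leaf per service, and total_config_lines(32, 10, 20) gives 6,400 lines for ten services on the same 32 leaf switches.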
For many customers, that means practical implementations of BGP EVPN require some type of network automation. Unfortunately, there is no standard or agreed approach to network automation for BGP EVPN or more generally. A few larger operators with deep networking expertise can afford to dedicate staff and time to create their own automation environments, using programming languages such as Python and tools such as Ansible, but most others will need a more complete automation solution provided by either their networking vendor or an independent third party. While comparison of automation approaches is beyond the scope of this blog, it’s clear that any decision to implement BGP EVPN should be considered in the context of network automation.
How Does BGP EVPN Enable Multi-site Data Center Fabrics?
Extending a BGP EVPN fabric between geographically separated data centers raises a choice: whether to extend one large BGP EVPN overlay fabric as in Figure 5(a), or separate the individual DC fabrics with a third data center interconnect (DCI) fabric, as in Figure 5(b). There are pros and cons to each approach.
The single unified fabric with an extended (or “stretched”) overlay, as shown in Figure 5(a), can in theory simplify provisioning of virtual connections between data centers, but in practice configuring BGP EVPN for a large scale multi-site fabric can be quite complex and control plane scaling can be a major limiting factor in the number of nodes and overlay services supported. The single, unified protocol-based BGP control plane also severely limits the allowed distance and latency between DCs, and is more susceptible to failure modes that could take down both data centers. As a result, deployments of such stretched overlays with BGP EVPN are limited.
By contrast, the SDN-enabled Pluribus Adaptive Cloud Fabric offers a much simpler and more robust solution to create unified multi-site data center fabrics and eliminates distance limitations.
The multi-fabric interconnection approach of Figure 5(b) can be more robust and scalable. At present, there is no open standard defined for this approach, and vendor-proprietary implementations vary. Some even use spine switches rather than leaf switches as the border gateway between EVPN fabric domains (a case that is not shown in the figure). In principle, different control plane technologies can be used within each fabric if needed. (For example, Cisco historically recommended a technology called Overlay Transport Virtualization (OTV) specifically for DCI, while the newer Cisco ACI multi-site architecture uses BGP EVPN for the DCI fabric but not for the intra-DC fabrics, which are based on proprietary protocols.) A major downside of any multi-fabric approach is that service provisioning can be more complex and error-prone than for a single extended fabric. Each service instance that crosses multiple DCs must be configured in all three (or more) EVPN domains and stitched together. This drives an even greater urgency to implement some type of network automation, or perhaps better put, network orchestration. This approach also generally requires either a more powerful, specialized border leaf switch to handle the multi-fabric interconnection or separate switches dedicated to the DCI fabric, increasing cost in either case.
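The provisioning burden described above can be illustrated with a small consistency check of the kind an orchestration tool might run. This is a hypothetical sketch: the domain names, service identifiers and function names are invented for illustration and do not correspond to any vendor's tooling.

```python
from typing import Dict, List, Set


def unconfigured_domains(service: str,
                         path: List[str],
                         configured: Dict[str, Set[str]]) -> List[str]:
    """Return the domains on the path where the service is not yet configured."""
    return [domain for domain in path
            if service not in configured.get(domain, set())]


def is_stitched(service: str,
                path: List[str],
                configured: Dict[str, Set[str]]) -> bool:
    """A service is end to end only if every domain on its path has it."""
    return not unconfigured_domains(service, path, configured)


# Hypothetical state: two DC fabrics joined by a DCI fabric.
configured = {
    "dc1": {"vni-10010", "vni-10020"},
    "dci": {"vni-10010"},
    "dc2": {"vni-10010", "vni-10020"},
}
path = ["dc1", "dci", "dc2"]
```

In this made-up state, the service "vni-10010" is configured in all three domains, while "vni-10020" is missing from the DCI fabric and therefore cannot carry traffic between the two data centers. This is the kind of error that becomes more likely as the number of domains and services grows.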
(As an aside, these examples illustrate that the concepts of “Data Center Interconnect” and “Data Center Unification” are interrelated and can be complex. To dive into that topic more deeply, see my blog, “What is Data Center Interconnect vs. Data Center Unification?”)
How is Pluribus using BGP EVPN?
When we announced our support for BGP EVPN as part of our Thousand-Node Fabric architecture, we noted that it will give our customers more choice in how to scale their networks. We are committed to supporting customers who want to use BGP EVPN to meet their requirements, but we also believe that it is best seen as a tool that complements the Pluribus Adaptive Cloud Fabric, rather than replacing it.
The Adaptive Cloud Fabric remains the industry’s most comprehensive, cost-effective and easy-to-use solution for data center fabrics and distributed clouds, what we call “Networking – Simplified.” With our recent innovations, including a hierarchical multi-pod architecture and new capabilities incorporated into the Pluribus UNUM management system (Figure 6), the Adaptive Cloud Fabric can scale to as many as 1024 switches in a unified fabric with full automation of underlay and overlay, and a rich set of overlay services. This is all achieved via the protocol-free controllerless SDN control plane, so a BGP protocol-based control plane does not need to be configured in every switch in the fabric.
As a result, the Adaptive Cloud Fabric can meet the scaling needs of a high percentage of enterprises and even many service providers without the need to configure BGP EVPN.
Nonetheless, BGP EVPN can add value for some customers who use the Pluribus Adaptive Cloud Fabric. We highlighted two applications in our Thousand-Node Fabric blog, and briefly review them here: open, multi-vendor fabric extension and Adaptive Cloud Fabric interconnection.
BGP EVPN for Multi-vendor Fabric Extension
Figure 7 shows how BGP EVPN can support extension of overlay services from an Adaptive Cloud Fabric to fabrics provided by other vendors. We enable BGP EVPN only on a border leaf cluster (two switches) within the Adaptive Cloud Fabric, and that cluster maps the Adaptive Cloud Fabric control plane to the BGP control plane. Note that this is similar to the multi-fabric interconnection approach shown in Figure 5(b), in which the DCI fabric is separate from the fabric in each data center. All the benefits of the Adaptive Cloud Fabric are retained within the data center (or multiple data centers) where it is deployed, and BGP EVPN need not be configured on every leaf switch.
BGP EVPN for Adaptive Cloud Fabric Interconnection
Figure 8 illustrates another use of BGP EVPN: interconnecting multiple Adaptive Cloud Fabrics and extending services between them. By enabling EVPN interconnection of Adaptive Cloud Fabrics, we allow customers to optimize the scale of each individual fabric based on their own constraints, which might include geography or a desire to partition network operations. EVPN interconnection can also enable scaling beyond the thousand nodes supported in an individual Adaptive Cloud Fabric to create an extended fabric of many thousands of switching nodes. Again, this hybrid approach retains all the Adaptive Cloud Fabric benefits within each fabric and limits BGP EVPN configuration to a pair of border leaf switches per fabric.
Summary of Adaptive Cloud Fabric with BGP EVPN
While BGP EVPN and the Pluribus Adaptive Cloud Fabric can both be used to create an overlay network composed of a mesh of VXLAN tunnels, and both can manage control-plane address learning in the overlay for scalability and performance, they do so in different ways. BGP EVPN uses BGP to distribute control plane information, while the Adaptive Cloud Fabric uses a protocol-free controllerless SDN approach that substantially reduces operational complexity and can increase scalability. As a result, Pluribus recommends BGP EVPN as a complementary tool to the Adaptive Cloud Fabric, for uses such as multi-vendor interoperability and multi-fabric interconnection.
About the Author
Jay Gill is Senior Director of Marketing at Pluribus Networks, responsible for product marketing and open networking thought leadership. Prior to Pluribus, he guided product marketing for optical networking at Infinera, and held a variety of positions at Cisco focused on growing the company’s service provider business. Earlier in his career, Jay worked in engineering and product development at several service providers including both incumbents and startups. Jay holds a BSEE and MSEE from Stanford and an MBA from UCLA Anderson.