According to the IDC Innovators: Datacenter Software Defined Networking, 2018 report, SDN “represents an architectural approach to datacenter networking in the cloud era.” IDC continues: “To support digital transformation, datacenter networks must become agile, both architecturally and operationally. They must possess the intelligent automation that will make them ‘cloud like’ and increasingly autonomous.”
There is no single path to the software-defined future of data center infrastructure. In fact, there are several considerations, but perhaps none is as critical as the decision to go with a controller-based versus a controllerless architecture.
According to SDx Central, a controller acts as the brains of the software-defined network, serving as “a strategic control point in the SDN network.” A traditional controller runs on external servers – typically three are deployed for redundancy – and holds a centralized view of the entire state of the network. The controller typically connects to the packet forwarding nodes via an out-of-band (OOB) management channel. A controllerless approach also maintains a view of the entire state of the network but distributes that state and intelligence across all switches – in other words, each node in the network has a view of the network’s full state, not just the state of its neighbors.
To date, SDN has been slow to take off because deployments have been inhibited by the complexity associated with controllers and “open” yet proprietary protocols like OpenFlow that require significant changes to network architecture and operation. In approaches built on the centralized controller model, certain vendors have implementations in which the switch nodes have no intelligence and their forwarding tables are programmed with a protocol that has to traverse the OOB network from the controller to the switch. One obvious challenge in such approaches is that the OOB network is a single point of failure (think DDoS attack), and most switches have only one OOB management port, which is another single point of failure. If the controller is some distance from the nodes, the added latency can result in slow reconvergence and slow processing of new flows, factors that generally limit the ability of this sort of architecture to stretch the network geographically. Other vendors that support in-band communications between the nodes and the controller propose an architecture that requires a minimum of one controller per site, making the design very complicated and costly. Additionally, a central controller can only manage so many nodes, so larger networks need multiple controllers, which becomes expensive because each controller deployment requires three servers for redundancy. Finally, many controllers cannot federate with one another, so multi-controller deployments result in network islands.
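To make the scaling cost concrete, here is a minimal back-of-the-envelope sketch in Python. The three servers per controller deployment come from the text above; the per-controller capacity of 100 switches used in the example is purely a hypothetical figure, since actual limits vary by vendor.

```python
import math


def controller_servers(num_switches: int,
                       switches_per_controller: int,
                       servers_per_deployment: int = 3) -> int:
    """Estimate the external servers needed to control `num_switches`.

    `switches_per_controller` is a hypothetical capacity limit; real
    limits depend on the vendor and on the workload.
    """
    if num_switches < 0 or switches_per_controller <= 0:
        raise ValueError("switch counts must be non-negative and capacity positive")
    # Each controller deployment covers at most `switches_per_controller` nodes.
    deployments = math.ceil(num_switches / switches_per_controller)
    # Each deployment runs on a redundant set of servers (three in the text).
    return deployments * servers_per_deployment


# 250 switches at an assumed 100 switches per controller:
# three controller deployments, hence nine external servers.
print(controller_servers(250, 100))
```

Under these assumptions the server count grows in steps of three every time the network crosses a capacity boundary, in addition to the one-controller-per-site minimum that some designs impose.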
Controllerless: A Novel Approach to SDN
A controllerless architecture, often tagged with the moniker “next-generation” SDN, still enables full state view of the network, but that state is distributed across all nodes in the switching fabric. In this model there are three layers to consider:
- The management layer, where all switches federate together to build a management fabric, sharing the full state of the network with every other switch. The entire fabric can be controlled from the CLI or via an API on any switch. This management plane fabric automatically propagates any config or state changes to all other switches in the fabric.
- The underlay, which provides physical connectivity and IP reachability and is based on standard Layer 2 and Layer 3 protocols. This enables the fabric to be built across a set of leaf nodes only (top-of-rack DC switches) while using standard protocols to interoperate with existing spine (core DC switches) from any vendor, enabling a smooth migration to SDN in brownfield scenarios. In greenfield scenarios, both leaf and spine can be part of a controllerless fabric.
- The VXLAN overlay, which virtualizes the underlay to support any topology, often a two- or three-stage Clos fabric, to provide any-to-any connectivity between all nodes.
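The management layer described in the first bullet can be illustrated with a toy model. This is only a sketch of the idea that every switch holds the full fabric state and that a change made on any switch reaches all of them. The class `FabricSwitch` and its methods are hypothetical names for illustration, not the API of any real network OS, and a real fabric would exchange messages over the network rather than share Python objects.

```python
class FabricSwitch:
    """Toy model of a switch that holds a full copy of the fabric state."""

    def __init__(self, name):
        self.name = name
        self.state = {}       # this switch's copy of the fabric-wide state
        self.peers = [self]   # membership list, shared by all fabric members

    def join(self, member):
        """Join the fabric that `member` already belongs to."""
        self.state = dict(member.state)  # start from the current fabric state
        self.peers = member.peers        # adopt the shared membership list
        self.peers.append(self)

    def configure(self, key, value):
        """Apply a change from this switch; it is copied to every member."""
        for switch in self.peers:
            switch.state[key] = value


leaf1, leaf2, leaf3 = (FabricSwitch(n) for n in ("leaf1", "leaf2", "leaf3"))
leaf2.join(leaf1)
leaf3.join(leaf1)

# The change is entered on leaf3 but is visible on every switch.
leaf3.configure("vlan-100", "tenant-a")
print(leaf1.state == leaf2.state == leaf3.state)  # prints True
```

There is no privileged node in this model: any member can be the entry point for a change, which is the property the controllerless design relies on.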
While an OOB connection can be used, in a more typical deployment the in-band VXLAN tunnel overlay mesh is used for management and control messages, which means there are multiple paths and multiple ports on every node to reach every other node – no single point of failure. If one switch fails, all other switches can continue to operate, update network state and quickly reconverge. Since there is no external controller running on top of three servers for every N switches, the cost and complexity of the external controller is eliminated. Also, since full state is held in every switch and all services are distributed throughout the fabric via objects like an IPv4/IPv6 anycast gateway present in every switch, there is no problem deploying sites that are widely dispersed geographically.
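The "no single point of failure" claim can be checked on a simplified topology model. The sketch below, in Python, treats the network as an undirected graph and lists every node whose loss would disconnect the remaining nodes. The two topologies (a small leaf-spine mesh and a star around a single OOB management network) and all node names are illustrative assumptions, not a description of any particular deployment.

```python
from collections import deque


def is_connected(adj, removed=frozenset()):
    """Return True if all nodes outside `removed` can still reach each other."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:  # breadth-first search over the surviving nodes
        current = queue.popleft()
        for neighbor in adj[current]:
            if neighbor not in removed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)


def single_points_of_failure(adj):
    """List nodes whose failure would partition the rest of the network."""
    return [n for n in adj if not is_connected(adj, {n})]


def leaf_spine(leaves, spines):
    """In-band model: every leaf has a link to every spine."""
    adj = {n: set() for n in leaves + spines}
    for leaf in leaves:
        for spine in spines:
            adj[leaf].add(spine)
            adj[spine].add(leaf)
    return adj


def oob_star(switches, hub="oob-net"):
    """OOB model: every switch reaches the others only through one hub."""
    adj = {hub: set(switches)}
    for switch in switches:
        adj[switch] = {hub}
    return adj


print(single_points_of_failure(leaf_spine(["leaf1", "leaf2", "leaf3"],
                                          ["spine1", "spine2"])))  # prints []
print(single_points_of_failure(oob_star(["sw1", "sw2", "sw3"])))   # prints ['oob-net']
```

The model only covers node failures; link failures and the single OOB management port per switch mentioned earlier would need a slightly richer graph.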
This next-generation SDN delivers operational efficiencies by federating an organization’s array of switches to make them appear as one logical switch. As such, an organization with multiple switches deployed in a distributed environment – inside one data center, across a campus, across cities or across the ocean – benefits from the ability to view and manage the fabric as one logical entity. This provides a level of operational simplicity that dramatically reduces both operating costs and the potential for human error.
In the distributed controllerless approach, a switch that goes out of service has a very limited effect on the overall fabric control plane functionality and no effect at all on the data forwarding plane, making it a very robust design. If there is a switch out of service and a config change is requested, none of the switches will accept the change until the misbehaving switch is ejected from the fabric or recovered, ensuring consistent configurations across the fabric. Areas of the fabric can run and be managed in isolation from the rest during severe connectivity disruptions, gracefully rejoining and regrouping as a whole when connectivity recovers, making it a flexible solution during times of crisis. Finally, since the nodes are still running standard protocols, decision-making in the case of topology changes is rapid, providing very low reconvergence times, which is key in today’s demanding application environments.
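The configuration behavior described above, where no switch accepts a change while a fabric member is out of service, resembles an all-or-nothing commit. The following Python sketch shows that rule under simplified assumptions; `FabricMember`, `ConfigFabric` and `FabricConfigError` are hypothetical names, and a real fabric would use a distributed transaction protocol rather than a loop over in-memory objects.

```python
class FabricConfigError(Exception):
    """Raised when a change cannot be applied to every fabric member."""


class FabricMember:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.config = {}


class ConfigFabric:
    def __init__(self, members):
        self.members = list(members)

    def commit(self, key, value):
        """Apply a change to all members, or to none of them."""
        offline = [m.name for m in self.members if not m.online]
        if offline:
            # Refuse before touching any switch so configs never diverge.
            raise FabricConfigError(f"members out of service: {offline}")
        for member in self.members:
            member.config[key] = value

    def eject(self, name):
        """Remove a failed member so the rest of the fabric can proceed."""
        self.members = [m for m in self.members if m.name != name]


switches = [FabricMember(n) for n in ("leaf1", "leaf2", "leaf3")]
fabric = ConfigFabric(switches)
switches[2].online = False

try:
    fabric.commit("vlan-100", "tenant-a")
except FabricConfigError as err:
    print("rejected:", err)

fabric.eject("leaf3")                  # or wait for leaf3 to recover
fabric.commit("vlan-100", "tenant-a")  # now accepted by leaf1 and leaf2
```

The trade-off in this design is availability of configuration changes for consistency: changes are blocked while a member is unreachable, but no two switches ever hold different configurations.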
Fabric-wide visibility into all attached devices, down to the TCP flow level, simplifies and accelerates troubleshooting. Operators can troubleshoot the entire fabric from any switch and even go back in time to drill down on a particular flow between two endpoints, which further simplifies operations and improves security.
Network Slicing for Security and Services
The ability to slice the fabric for multi-tenancy with complete isolation of the management, control and data planes enables an excellent security posture, especially in light of the rise in IoT traffic. Untrusted traffic, which exposes a large attack surface on shared infrastructure, can be completely isolated from more valuable traffic. This slicing can also be used in multi-tenant virtualized environments to provide full control and rich services for each tenant.
White Box Economics
Optimally, this sort of solution is available to run on open networking hardware, which can be brite box or white box switches. Thus, in addition to simplifying network management and increasing reliability to lower operational costs, capital costs can be reduced on the order of 40 to 50% compared to traditional switching solutions, while preventing vendor lock-in. White box switches are also typically 1RU, enabling a scale-out fabric design similar to those of the large hyperscalers.
A controllerless approach to next-generation SDN represents the best way to transition to the software-defined data center of tomorrow. Organizations that make the move to SDN without a controller are experiencing a dramatic reduction in costs and complexity and gaining the operational flexibility that comes by federating a large number of switches as a single programmable entity. The Pluribus Netvisor OS and the Adaptive Cloud Fabric offer an industry-leading approach to controllerless SDN and eliminate the architectural complexities of traditional SDN controllers, allowing seamless interoperation with existing networks and enabling a more graceful migration to a software-defined architecture.
Ask Econocom Italia. This innovative cloud service provider in Italy has experienced firsthand how a controllerless approach can benefit its business.
“Pluribus’ distributed, controllerless Adaptive Cloud Fabric architecture automates plug and play operation with Econocom Italia’s existing network infrastructure. Pluribus is delivering a non-disruptive and transparent solution that makes it easier for Econocom Italia to deliver, manage and secure service delivery,” Paolo Bombonati, Chief Operating Officer
About the Author
Mike is Chief Marketing Officer of Pluribus Networks. Mike has over 20 years of marketing, product management and business development experience in the networking industry. Prior to joining Pluribus, Mike was VP of Global Marketing at Infinera, where he built a world class marketing team and helped drive revenue from $400M to over $800M. Prior to Infinera, Mike led product marketing across Cisco’s $6B service provider routing, switching and optical portfolio and launched iconic products such as the CRS and ASR routers. He has also held senior positions at Juniper Networks, Pacific Broadband and Motorola.