As the business world re-tools itself to become dramatically more digital-centric and adopts leading-edge technology such as converged infrastructure and containers, the management of these new business services becomes a huge opportunity. If management is implemented in a comprehensive fashion, users will be able to answer simple questions like which VM or container is causing a problem and why some applications see higher latency. Without it, even more basic questions, like what the average latency for a service is and which clients are seeing higher-than-average latency, become impossible to answer. For enterprises, and for public and private cloud operators, if these questions can’t be answered on a daily basis, the SLA is impacted and customers get very unhappy.
Pluribus Virtualization Centric Fabric (VCF) leverages the modern top-of-rack switch architecture to use the switches as a distributed data source, and the Pluribus VCFcenter management console provides the network analytics needed by modern applications. VCFcenter and its associated VCF Insight and Packet Analytics (each an application in VCFcenter) help answer these application-specific questions every day. The VCF architecture tracks every application flow in real time and also records it in its Time-Machine, so users can go back in time and look at every flow from the moment the problem was reported. The Pluribus VCF architecture was founded on the ability to embed telemetry and visibility into any commodity switch, making it both a packet and a flow data source. Major vendors like Cisco (with their fresh launch of Tetration) and VMware (with their recent acquisition of Arkin) have heavily validated the need for application analytics for today’s modern applications.
Issues with Application Analytics today
Traditional switches and routers are dumb packet switching devices. They switch billions of packets per second between servers and clients at sub-microsecond latencies using very fast ASICs but have no capability to record anything. As such, external optical TAPs and monitoring networks have to be built to get a sense of what is actually going on in the infrastructure. The figure below shows what monitoring today looks like.
This is where the challenges start coming together. The typical enterprise and datacenter network that connects the servers is running a combination of 10, 40 and 100Gbps links today. These switches typically have 40-50 servers connected to them, pumping traffic at near line rate.
There are 3 possibilities to see everything that is going on:
1. Provision a fiber optic tap at every link and divert a copy of every packet to the monitoring tools. Since the fiber optic tap is passive, you have to copy every packet, and the monitoring tools need to deal with a few Tbits/sec or 1B+ packets per second from each switch.
2. Selectively place copper or fiber optic taps at uplinks or edge ports. This gets us back to the inner network becoming a black hole, where we have no visibility into what is going on. A key thing we have learned over time is that without 100% visibility, you can’t fix a problem very efficiently.
3. Use the networking switches themselves to selectively mirror traffic to monitoring tools. A more popular approach these days is built upon sampling, where the sampling rates are typically 1 in 5000 to 10000 packets. This approach is better than nothing, but the monitoring software simply does not have enough raw detail to produce meaningful results. Worse yet, the cost climbs steeply as more switches get monitored (the monitoring fabric needs more capacity, and the monitoring software gets more complex and needs more hardware resources).
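To put rough numbers on the first and third options, here is a back-of-the-envelope sketch in Python. It assumes 48 servers per switch, minimum-size 64-byte Ethernet frames (84 bytes on the wire once preamble and inter-frame gap are counted), and uniform random 1-in-N packet sampling. These inputs and the function names are illustrative assumptions, not measurements from any particular product.

```python
# Back-of-the-envelope numbers; every input here is an illustrative assumption.

# 64-byte minimum frame + 8-byte preamble + 12-byte inter-frame gap
WIRE_BYTES_MIN_FRAME = 84


def tap_load(servers: int, link_gbps: float) -> tuple[float, float]:
    """Return (Tbit/s, million packets/s) when every server link is tapped at line rate."""
    total_bps = servers * link_gbps * 1e9
    pps = total_bps / (WIRE_BYTES_MIN_FRAME * 8)
    return total_bps / 1e12, pps / 1e6


def miss_probability(flow_packets: int, sample_one_in: int) -> float:
    """Probability that random 1-in-N sampling captures no packet of a flow."""
    return (1 - 1 / sample_one_in) ** flow_packets


if __name__ == "__main__":
    for gbps in (10, 40):
        tbps, mpps = tap_load(servers=48, link_gbps=gbps)
        print(f"48 x {gbps}G: {tbps:.2f} Tbit/s, up to {mpps:.0f} Mpps")
    for n in (10, 100, 1000):
        p = miss_probability(n, 5000)
        print(f"{n}-packet flow at 1-in-5000 sampling: missed with p={p:.3f}")
```

Under these assumptions, a 100-packet flow goes completely unseen about 98% of the time at 1-in-5000 sampling, and even a 1000-packet flow is missed roughly 82% of the time, which is why sampled data struggles to answer per-flow questions.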
Netvisor provides the distributed monitoring architecture
Pluribus VCFcenter is a collection of tools, including Insight Analytics and Packet Analytics, that present all the packets and/or application flows with drill-downs into each flow, service, VM or bare-metal server.
The data sources can be:
- Meta-data from the Pluribus Virtualization Centric Fabric, where each switch collects the meta-data for every flow and sends it to the Insight Analytics (IA) application over a REST API. The Netvisor Fabric uses its multi-threaded, high-performance control plane, running over the switch chip's PCI Express connection, to pull in useful traffic and process it. The bulk of the processing happens on the local Netvisor instance and only meta-data goes to the IA application, which allows this solution to scale to billions of flows.
- A Netvisor VM running on individual servers. It works similarly to the Netvisor Fabric: each instance does the packet processing and sends the meta-data to the IA application.
- 3rd party switch mirror/SPAN ports. Users can get analytics for their legacy deployments by just pointing the mirror/SPAN ports at the VCF-IA appliance, which still uses the Netvisor vflow capability on the appliance hardware to collect relevant packets. This option does have some scaling limitations.
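The common thread in these data sources is that packets are processed where they are seen and only per-flow meta-data is shipped to the analytics application. A minimal Python sketch of that idea follows. The record fields, the `FlowTable` class, and the JSON export are hypothetical illustrations and do not represent Netvisor's actual data model or REST API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """Classic 5-tuple used to identify a flow (illustrative choice)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str


@dataclass
class FlowRecord:
    """Per-flow meta-data; no packet payload is stored."""
    packets: int = 0
    byte_count: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


class FlowTable:
    """Hypothetical per-switch table that folds packets into flow records."""

    def __init__(self) -> None:
        self.flows: Dict[FlowKey, FlowRecord] = {}

    def observe(self, key: FlowKey, size: int, ts: float) -> None:
        """Account for one packet of `size` bytes seen at time `ts` (seconds)."""
        rec = self.flows.get(key)
        if rec is None:
            rec = self.flows[key] = FlowRecord(first_seen=ts)
        rec.packets += 1
        rec.byte_count += size
        rec.last_seen = ts

    def export(self) -> str:
        """Serialize meta-data only, one JSON object per flow."""
        rows = [{**key._asdict(), **asdict(rec)} for key, rec in self.flows.items()]
        return json.dumps(rows)


if __name__ == "__main__":
    table = FlowTable()
    key = FlowKey("10.0.0.5", "10.0.1.9", 40512, 443, "tcp")
    for i in range(1000):
        table.observe(key, size=1500, ts=i * 0.001)
    print(table.export())
```

In this sketch a thousand packets collapse into a single exported record, which illustrates why shipping meta-data can scale where shipping raw packets does not.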
As applications become more transitory, and services are deployed within private clouds, the ability to monitor each and every connection is of paramount importance. Given the amount of data, the traditional data network analytics sources do not scale and no longer meet the basic requirements. The Pluribus VCF architecture and VCFcenter were built in response to modern computing needs, with data visibility at the packet and transaction level paramount. VCFcenter provides the data sources and analytics tools to observe every packet and flow at a fraction of the cost of the legacy solutions.
About the Author
Sunay is the CTO and a Co-Founder of Pluribus Networks. Prior to Pluribus, Sunay was a Senior Distinguished Engineer for Sun Microsystems, and was the Chief Architect for Kernel/Network Virtualization in Core Solaris OS. Sunay has an extensive 20+ year software background, and was one of the top code contributors to Solaris. Sunay holds over 50 patents encompassing network and server virtualization.