Accelerate Big Data Applications

Time-sensitive application optimization through visibility, monitoring, and control of your network flows.

Accelerating Big Data Applications with Pluribus Netvisor®

The business value hidden in the large volumes of data that organizations produce daily and accumulate over time is becoming increasingly evident. As a consequence, Big Data processing on infrastructure such as Apache Hadoop is becoming a mainstream activity in the typical enterprise data center. In most cases, the execution time of a Hadoop job directly affects the business outcome, and Big Data processing is expected to happen in real time or near real time.

To support the need to exchange, monitor, and control huge flows of data, Pluribus has developed the Netvisor® operating system (OS), which runs on bare-metal switches. Without requiring an external controller, Pluribus switches running Netvisor federate into a fabric that offers unprecedented insight, agility, and security. Pluribus Netvisor switches enhance Apache Hadoop deployments beyond what traditional network operating systems provide.

Pluribus Networks advances network virtualization and software-defined networking (SDN) through Netvisor, the industry's most programmable, open-source-based network operating system. Netvisor is built on the Pluribus Virtualization-Centric Fabric™ (VCF™) architecture, a proven approach to understanding traffic flows, responding rapidly to business needs, and securing your data.

Factors that affect Hadoop processing speed:

  • Data ingestion
  • Mapper data access
  • Intermediate result shuffling from mappers to reducers
  • Reducer data access
  • Results output
  • HDFS initial replication
  • HDFS replica recovery
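
HDFS initial replication weighs on the network because every block is copied to several DataNodes as it is written. The short sketch below gives a back-of-the-envelope estimate of the bytes put on the wire per write. It assumes the HDFS default replication factor of 3 and the standard write pipeline, in which the first replica lands on the writer's own node when the writer is itself a DataNode. The function name is hypothetical, and protocol and checksum overhead are ignored.

```python
def replication_network_bytes(data_bytes: int, replication: int = 3,
                              writer_is_datanode: bool = True) -> int:
    """Rough estimate of the bytes HDFS moves over the network while writing data_bytes.

    Hypothetical helper for illustration. It assumes the HDFS write pipeline: the
    first replica is stored locally when the writer runs on a DataNode, and every
    other replica is forwarded once over the network. Checksum and protocol
    overhead are ignored.
    """
    if data_bytes < 0 or replication < 1:
        raise ValueError("data_bytes must be >= 0 and replication >= 1")
    # One network transfer per replica that is not stored on the writer's node.
    remote_copies = replication - 1 if writer_is_datanode else replication
    return data_bytes * remote_copies


if __name__ == "__main__":
    one_tb = 10**12
    # Ingesting 1 TB from a DataNode with replication 3 puts about 2 TB on the wire.
    print(replication_network_bytes(one_tb))                            # 2000000000000
    # From an edge node that is not a DataNode, all three copies cross the network.
    print(replication_network_bytes(one_tb, writer_is_datanode=False))  # 3000000000000
```

Replica recovery follows similar arithmetic: re-creating a lost replica copies roughly one block's worth of data over the network per missing replica, and that traffic competes with mapper and reducer data access.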

Benefits

  • Simplified joint performance analysis and troubleshooting using vPort data (e.g., node names, data, and roles).
  • Improved control of HDFS primary and secondary traffic through intelligent bandwidth allocation with vFlow commands.
  • Detailed telemetry for troubleshooting conflicts between HDFS replica recovery traffic and mapper/reducer data access.
  • Identification of congestion and hot spots during result shuffling (especially from many mappers to few reducers).
  • Detection of excessive traffic between TaskTrackers and DataNodes caused by suboptimal data locality.
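
Hot-spot detection during shuffling can be illustrated with a small sketch. It assumes flow telemetry has already been exported as simple (source host, destination host, bytes) records. The FlowRecord type, the shuffle_hotspots function, and the 2x threshold are hypothetical choices for illustration, not a Netvisor data format or API. The sketch totals inbound bytes per reducer host and flags reducers that receive far more than the average.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical, simplified flow record; real telemetry carries far more fields.
    src_host: str
    dst_host: str
    byte_count: int


def shuffle_hotspots(flows: Iterable[FlowRecord], reducer_hosts: Set[str],
                     threshold: float = 2.0) -> List[str]:
    """Return reducer hosts whose inbound bytes exceed threshold x the per-reducer mean."""
    if not reducer_hosts:
        return []
    inbound = defaultdict(int)
    for flow in flows:
        # Only traffic terminating on a reducer host is counted as shuffle traffic here.
        if flow.dst_host in reducer_hosts:
            inbound[flow.dst_host] += flow.byte_count
    # Mean over all reducers, including any that received nothing.
    mean = sum(inbound[h] for h in reducer_hosts) / len(reducer_hosts)
    return sorted(h for h in reducer_hosts if inbound[h] > threshold * mean)


if __name__ == "__main__":
    reducers = {"r1", "r2", "r3", "r4"}
    # Nine mappers each send 100 bytes to r1; r2..r4 each receive one 100-byte flow.
    flows = [FlowRecord(f"m{i}", "r1", 100) for i in range(9)]
    flows += [FlowRecord("m0", r, 100) for r in ("r2", "r3", "r4")]
    print(shuffle_hotspots(flows, reducers))  # ['r1']
```

Comparing each reducer against the mean is deliberately crude; a real tool might instead compare per-interval rates against link capacity.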