Accelerating Big Data Applications with Pluribus Netvisor®
The business value hidden in the large quantities of data that organizations produce daily and accumulate over time is becoming increasingly evident. As a consequence, Big Data processing on infrastructure such as Apache Hadoop is now a mainstream activity in the typical enterprise data center. In most cases, the execution time of a Hadoop job is directly tied to a business outcome, and Big Data processing is expected to happen in real time or near real time.
To support the need to exchange, monitor and control huge flows of data, Pluribus has developed the Netvisor® OS (operating system) to run on bare metal switches. Without the need for an external controller, Pluribus switches running Netvisor federate into a fabric, offering unprecedented insight, agility and security. Pluribus Netvisor switches enhance Apache Hadoop deployments above and beyond what traditional network operating systems offer.
Pluribus Networks advances network virtualization and software-defined networking (SDN) through Netvisor, the industry’s most programmable, open source-based network operating system. Netvisor is based on the Pluribus Virtualization-Centric FabricTM (VCFTM) architecture, a proven approach to understanding flows, rapidly responding to business needs and securing your data.
Factors that affect Hadoop processing speed:
- Data ingestion
- Mapper data access
- Intermediate result shuffling from mappers to reducers
- Reducers data access
- Results output
- HDFS initial replication
- HDFS replica recovery
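Several of the stages above map directly onto the MapReduce data flow. As a minimal illustration (plain Python, not Hadoop code), a word-count job exercises the same map, shuffle and reduce phases; in a real cluster, the shuffle step is the one that moves intermediate results across the network from mappers to reducers:

```python
from collections import defaultdict

def map_phase(records):
    # Mapper data access: each mapper reads its input split
    # and emits (key, value) pairs.
    for line in records:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Intermediate result shuffling: pairs are grouped by key and handed
    # to reducers -- on a Hadoop cluster this crosses the network fabric.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducers aggregate each key's values and write the results out.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data", "big fabric"]
result = reduce_phase(shuffle(map_phase(lines)))
print(result)  # {'big': 2, 'data': 1, 'fabric': 1}
```

The toy example runs in one process, but it shows why the shuffle phase dominates network load: every mapper may need to send data to every reducer.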
Netvisor capabilities that benefit Hadoop deployments:
- Simplified joint performance analysis and troubleshooting with vPort data (e.g. node names, roles, traffic details).
- Improved control of HDFS primary and secondary traffic through intelligent bandwidth allocation with vFlow commands.
- Troubleshooting of contention between HDFS replica recovery traffic and Mapper/Reducer data access, backed by detailed telemetry.
- Identification of congestion hot spots during result shuffling (especially from many mappers to few reducers).
- Spotting of excessive traffic between TaskTrackers and DataNodes caused by suboptimal data locality.
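As a hypothetical sketch of the bandwidth-allocation point above, the Netvisor CLI can classify HDFS replication traffic into its own flow class so replica recovery cannot starve the mapper-to-reducer shuffle. The port number assumes the Hadoop 2.x default DataNode transfer port (50010), and the exact vflow-create option names and bandwidth units vary by Netvisor release, so consult the Netvisor CLI reference before use:

```shell
# Illustrative only -- verify option names against your Netvisor release.
# Cap HDFS block-replication traffic (DataNode transfer port 50010)
# with a fabric-wide vFlow bandwidth limit.
vflow-create name hdfs-replication scope fabric proto tcp dst-port 50010 bw-max 2g

# Inspect per-endpoint telemetry collected by the fabric.
vport-show
```

Because the flow is created with fabric scope, the policy applies across every federated switch rather than having to be configured box by box.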