ANT Plumb

Plumb

Plumb is a large-block stream processing system for efficient multi-user pipelines.

Purpose

Plumb is designed for processing large-block, streaming data in a multi-user environment. Plumb’s novelty lies in two areas: it integrates workflows from multiple users while de-duplicating computation and storage, and it uses dynamic scheduling to accommodate structural and computational skew.
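Plumb's own implementation is not shown on this page. The following is a minimal Python sketch of the general idea of de-duplicating computation across users' workflows, assuming each pipeline stage can be identified by its operator together with its full upstream input. The `Stage` class, `merge_pipelines` function, and the operator and file names are hypothetical, chosen for illustration only, and are not part of Plumb's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Stage:
    """Hypothetical pipeline stage: an operator applied to an upstream input.

    The source is either a raw input name (str) or another Stage, so two
    stages compare equal only if their entire upstream chains are identical.
    frozen=True makes instances hashable, so they can serve as dict keys.
    """
    op: str
    source: Union[Stage, str]


def merge_pipelines(
    pipelines: Dict[str, List[Stage]],
) -> Tuple[List[Stage], Dict[Stage, Set[str]]]:
    """Merge per-user pipelines into one set of unique stages.

    Returns the unique stages (in first-seen order) and a map from each
    unique stage to the set of users whose pipelines include it.
    """
    subscribers: Dict[Stage, Set[str]] = {}
    for user, stages in pipelines.items():
        for stage in stages:
            # Equal stages hash to the same key, so a stage shared by
            # several users is recorded once and would be computed once.
            subscribers.setdefault(stage, set()).add(user)
    return list(subscribers), subscribers


if __name__ == "__main__":
    decompress = Stage("decompress", "raw-file-1")
    pipelines = {
        "alice": [decompress, Stage("count_dns", decompress)],
        "bob": [decompress, Stage("top_talkers", decompress)],
    }
    unique, subs = merge_pipelines(pipelines)
    print(len(unique))               # 3: "decompress" is shared by both users
    print(sorted(subs[decompress]))  # ['alice', 'bob']
```

Keying each stage on its whole upstream chain means two users share work only where their pipelines are identical up to that point. This sketch leaves out scheduling, storage of shared outputs, and handling of skew, all of which a real system would need.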

Status

Plumb has been in operational use since July 2019 and, as of September 2019, has processed 110,740 files totaling 221 TB of data.

We are planning to release Plumb in Fall 2019.

Documentation

To install and start using Plumb in your local environment, you’ll need a local Hadoop/YARN/HDFS cluster, a MySQL database server, and a lot of patience. Please see the Plumb Installation Instructions for details on how to set up a Plumb cluster, and the Plumb User Guide for operational tips.

Publications

More information about Plumb can be found in the Plumb Poster or in the Plumb Tech Report.