Plumb is a large-block stream processing system for efficient multi-user pipelines.
First public release.
Plumb is designed to process large-block streaming data in a multi-user environment. Its novelty comes from integrating workflows from multiple users while de-duplicating computation and storage, and from using dynamic scheduling to accommodate structural and computational skew.
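As a rough illustration of what de-duplicating computation across users can mean, the sketch below merges two users' pipelines so that identical stages run only once. It is not Plumb's actual API or algorithm: the stage tuple `(program, input_stream, output_stream)`, the function `merge_pipelines`, and the example program and stream names are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical stage description: (program, input_stream, output_stream).
# These names are illustrative assumptions, not Plumb's real interface.


def merge_pipelines(user_pipelines):
    """Merge per-user stage lists so each distinct stage appears once.

    Returns a dict mapping each unique stage to the sorted list of
    users who requested it.
    """
    merged = defaultdict(set)
    for user, stages in user_pipelines.items():
        for stage in stages:
            merged[stage].add(user)
    return {stage: sorted(users) for stage, users in merged.items()}


# Two users whose pipelines share the same first stage.
pipelines = {
    "alice": [("decompress", "raw", "plain"), ("parse_dns", "plain", "dns")],
    "bob": [("decompress", "raw", "plain"), ("count_flows", "plain", "flows")],
}

merged = merge_pipelines(pipelines)
requested = sum(len(stages) for stages in pipelines.values())
print(f"{requested} requested stages, {len(merged)} after de-duplication")
```

In this toy case the shared `decompress` stage is scheduled once and its output serves both users, which is the kind of saving in computation and storage described above.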
Plumb has been in operational use since July 2019 and (as of September 2019) has processed 110,740 files totaling 221 TB of data.
We are planning to release Plumb in Fall 2019.
To install and start using Plumb in your local environment, you’ll need a local Hadoop/YARN/HDFS cluster, a MySQL database server, and a lot of patience. Please see the Plumb Installation Instructions for details on how to set up a Plumb cluster, and the Plumb User Guide for operation tips.