On Fault Tolerance for Distributed Iterative Dataflow Processing

Conference/Journal
IEEE
Authors
Chen Xu, Markus Holzemer, Manohar Kaul, Juan Soto, Volker Markl
Abstract
Large-scale graph and machine learning analytics widely employ distributed iterative processing. Typically, these analytics are part of a comprehensive workflow, which includes data preparation, model building, and model evaluation. General-purpose distributed dataflow frameworks execute all steps of such workflows holistically. This holistic view enables these systems to reason about and automatically optimize the entire pipeline. Here, graph and machine learning analytics are known to incur a long runtime since they ...