News & Insights
Blumberg Capital portfolio news, startup growth resources and industry insights
Pachyderm Challenges Hadoop with Containerized Data Lakes
May 10, 2016
By Susan Hall
The folks at Pachyderm believe there’s an elephant in the room when it comes to data analytics, namely the weaknesses of Hadoop. So they set out to build an Hadoop alternative based on containers, version one of which has just been released.
The Pachyderm stack uses Docker containers as well as CoreOS and Kubernetes for cluster management. It replaces HDFS with its Pachyderm File System and MapReduce with Pachyderm Pipelines.
The company has focused on fundamental principles including reproducibility, data provenance and, most importantly, collaboration – a feature they say has been sorely missing from the Big Data world and one that has generated the most excitement from potential users, according to CEO and co-founder Joe Doliner.