The framework for fast data science

Go from data extraction to business value in days, not months.
Build on top of open source tech, using Silicon Valley's best practices.

60% of Big Data projects will fail this year
Gartner

big things start small

If you are building your advanced analytics in-house, we're here for you.
Our framework will help you land your first win.

Our solution

We provide a framework so you can focus on the data itself,
without having to build complex infrastructure yourself.

start small

Get started on your laptop: standard tools available out of the box

templates

Quickly extract data from any source, thanks to our template system

interactive coding environment

Experiment with notebooks for data cleaning and transformations

clean data

Organize and partition your data in a way that scales

new-generation ETL

Share your results and automate with our powerful CLI

from dev to prod

Deploy to the cloud when needed

Foundations

dsflow builds on leading open-source projects:

Apache Spark

Spark is a general-purpose distributed compute engine. It ships with data connectors and Spark SQL.
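
For illustration, a minimal PySpark sketch: read a file through a built-in connector, query it with Spark SQL, and write the result partitioned for scale. The paths and column names are placeholders, not part of dsflow.

# Read data through a built-in connector, query it with Spark SQL,
# and write the result partitioned by date. Paths and column names
# ("events.csv", "country", "event_date") are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dsflow-example").getOrCreate()

# Built-in CSV connector; JSON, JDBC, Parquet, etc. work the same way.
events = spark.read.csv("events.csv", header=True, inferSchema=True)
events.createOrReplaceTempView("events")

# Aggregate with Spark SQL over the registered view.
daily = spark.sql("""
    SELECT country, event_date, COUNT(*) AS n_events
    FROM events
    GROUP BY country, event_date
""")

# Partitioned Parquet output keeps reads fast as the data grows.
daily.write.partitionBy("event_date").parquet("output/daily_events")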

Apache Airflow

Airflow is a job scheduler, used to orchestrate ETL pipelines.
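
As a sketch (assuming Airflow 2.x), a one-task daily ETL DAG; the dag_id and the extract_and_load callable are hypothetical, not dsflow conventions.

# A minimal daily ETL DAG. The task body is a placeholder for real work
# (e.g. submitting a Spark job).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_load():
    # Placeholder: extract from a source and load into storage.
    print("running ETL step...")

with DAG(
    dag_id="daily_etl",                  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",          # run once per day
    catchup=False,                       # skip backfill of past runs
) as dag:
    PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)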

Docker containers

Docker containers simplify environment setup.

Jupyter notebooks

Jupyter provides a rich interactive interface for code, with IPython notebooks.

Our motivation

A decade ago, Ruby on Rails and Heroku marked a major evolution for web developers. A much wider audience could build and ship complex applications using state-of-the-art methodologies: a clean codebase, tests, CI, and so on.

We believe data scientists deserve the same revolution: dsflow lets them extract value from data quickly, without compromising on code quality or scalability.

Beta user mailing list