Go from data extraction to business value in days, not months.
Build on top of open source tech, using Silicon Valley's best practices.
We provide a framework so you can focus on the data itself,
without having to build complex infrastructure yourself.
Get started on your laptop: standard tools available out of the box
Quickly extract data from any source, thanks to our template system
Experiment with notebooks for data cleaning and transformations
Organize and partition your data in a way that scales
Share your results and automate with our powerful CLI
Deploy to the cloud when needed
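The "partition your data in a way that scales" idea above can be sketched in plain Python. This is an illustrative Hive-style layout, not dsflow's actual API; the `partition_path` helper and the year/month/day scheme are assumptions:

```python
from datetime import date

def partition_path(base: str, table: str, day: date) -> str:
    """Build a Hive-style partitioned path (illustrative, not dsflow's API)."""
    return f"{base}/{table}/year={day.year}/month={day.month:02d}/day={day.day:02d}"

# Each day's extract lands in its own directory, so downstream jobs
# (Spark queries, scheduler backfills) can read or rebuild one
# partition without touching the rest of the dataset.
path = partition_path("/data/raw", "events", date(2012, 5, 3))
print(path)  # /data/raw/events/year=2012/month=05/day=03
```

Because the partition keys are encoded in the path, engines like Spark can prune whole directories when a query filters on date.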
dsflow builds on leading open source projects:
Spark is a general-purpose distributed compute engine, with built-in data connectors and Spark SQL.
Airflow is a job scheduler, used to orchestrate ETL pipelines.
Docker containers simplify environment setups.
Jupyter provides a rich interactive coding interface through IPython notebooks.
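As a sketch of how these pieces fit together on a laptop, a Docker Compose file could run the notebook and scheduler side by side. The service names and image tags below are assumptions for illustration, not dsflow's shipped configuration:

```yaml
# Illustrative only: services and images are assumptions,
# not dsflow's actual configuration.
services:
  jupyter:
    image: jupyter/pyspark-notebook   # notebooks with a local Spark session
    ports:
      - "8888:8888"
  airflow:
    image: apache/airflow
    command: standalone               # scheduler + webserver for ETL jobs
    ports:
      - "8080:8080"
```

Running everything in containers keeps the local setup identical to what later ships to the cloud.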
A decade ago, Ruby on Rails and Heroku marked a major evolution for web developers. A much wider public became able to code and ship complex applications using state-of-the-art methodologies: clean codebases, tests, CI, and more.
We believe that data scientists deserve a similar revolution: dsflow lets them extract value from data quickly, without compromising on code quality or scalability.
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
— Josh Wills (@josh_wills), May 3, 2012