During our recent exploration of ML collaboration tool suites, we came across dvc, a well-established open-source solution developed by, among others, the folks at Open Data Science, a community beloved by Russian-speaking data scientists.
We'd like to give it a try since it fits really well with our values at source{d} and solves the core part of our problems when we experiment:
- it's open source;
- it relies heavily on git and git-like mechanisms;
- it doesn't try to solve everything in one huge single-entry-point solution but rather tackles the core problems and leaves us free for the rest.
To try dvc, the first step is to use it individually in one or two projects:
- dvc for the CodRep task in https://github.com/src-d/formatml/;
- dvc for the topic modeling experiments (if @m09's first experiments are promising) in https://github.com/src-d/tm-experiments.

The second step is to have the ability to share the large data files and results for good teamwork and collaboration. To test this, two things are needed:
- dvc remote on our ML Cluster;