Try DVC as a collaboration workflow in RML #79

During our recent exploration of ML collaboration tool suites, we came across [`dvc`](https://dvc.org/), a well-established open-source solution developed, among others, by the folks at [Open Data Science](https://ods.ai/), a community beloved by Russian-speaking data scientists.

We'd like to give it a try since it fits really well with our values at source{d} and solves the core part of our problems when we experiment:
- it's open source;
- it relies heavily on git and git-like mechanisms;
- it doesn't try to solve everything in one huge, single-entry-point solution, but rather tackles the core problems and leaves us free to handle the rest.

To try `dvc`, the first step is to use it individually in one or two projects:
- [x] @m09 will set up `dvc` for the CodRep task in https://github.com/src-d/formatml/;
- [ ] @m09 or @r0mainK will set up `dvc` for the topic modeling experiments (if @m09's first experiments are promising) in https://github.com/src-d/tm-experiments.

The second step is to be able to share large data files and results so that we can collaborate effectively. To test this, two things are needed:
- [x] Set up a `dvc` remote on our ML Cluster;
- [ ] Use it in a test project to see if it enhances teamwork, probably https://github.com/src-d/tm-experiments since we need to collaborate on it for dev2dev similarity.
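
For reference, here is a rough sketch of what the two steps could look like in a project, written as a small Python helper that only assembles the `git`/`dvc` commands and prints them by default. The data path, the remote name `cluster`, and the SSH URL are all hypothetical placeholders (nothing here reflects our actual cluster layout), and an SSH remote is just one assumed option among the remote types `dvc` supports.

```python
import posixpath
import shlex
import subprocess
from typing import List


def dvc_sharing_commands(data_path: str, remote_name: str, remote_url: str) -> List[List[str]]:
    """Assemble the commands to track one data file with dvc and push it to a shared remote.

    All arguments are illustrative placeholders, not values taken from a real project.
    """
    # `dvc add` writes `<data_path>.dvc` next to the file and lists the file
    # in a .gitignore located in the same directory.
    gitignore = posixpath.join(posixpath.dirname(data_path), ".gitignore")
    return [
        ["dvc", "init"],                                          # step 1: enable dvc in the git repo
        ["dvc", "add", data_path],                                # step 1: track the large file
        ["dvc", "remote", "add", "-d", remote_name, remote_url],  # step 2: declare the default remote
        ["git", "add", data_path + ".dvc", gitignore, ".dvc/config"],
        ["git", "commit", "-m", "Track data with dvc"],
        ["dvc", "push"],                                          # step 2: upload the data to the remote
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    run(dvc_sharing_commands(
        "data/dataset.tar.gz",
        "cluster",
        "ssh://user@ml-cluster.example.com/srv/dvc-storage",
    ))
```

A collaborator would then only need `git pull` followed by `dvc pull` to fetch the same data, which is the part we want to evaluate in the test project.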
