Scout Archives

DVC

DVC is a distributed revision control system and automation framework for data scientists. The DVC documentation is written primarily around machine learning applications, but very similar workflows arise in many other kinds of statistical analysis and simulation. DVC uses Git to track program code and scripts, and it provides large file storage through backends such as Amazon S3, Azure Blob Storage, Google Drive, and others. This "large file storage" is meant to cover both original source data files and intermediate results (including parameter files for machine learning models, statistical results like fitted curves, and results of simulations). DVC's automation framework allows users to describe the steps in their analysis as stages in a "lightweight pipeline." As users change their scripts and code, DVC can re-run only the stages in the pipeline whose inputs have changed. DVC can also track different analyses as "experiments" that are represented as Git branches, giving users a systematic way to store alternative approaches. DVC is written in Python and should run anywhere that Python does. The Download section of the DVC site provides installation instructions for Windows, macOS, and Linux systems.
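
The idea of re-running only the stages whose inputs have changed can be illustrated with a short Python sketch. This is not DVC's code or its API: the function names `file_digest` and `run_pipeline`, the layout of the stage dictionaries, and the JSON state file are all assumptions made for illustration. The sketch records a content hash for each stage's input files and skips a stage when those hashes match the previous run and its outputs still exist.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def run_pipeline(stages, state_file):
    """Run stages in order, skipping those whose inputs are unchanged.

    `stages` is a list of dicts with hypothetical keys: "name", "deps"
    (input paths), "outs" (output paths) and "run" (a zero-argument
    callable). Returns the names of the stages that were executed.
    """
    state_path = Path(state_file)
    # Digests recorded by the previous run, keyed by stage name.
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    executed = []
    for stage in stages:
        # Hash the inputs now, after any earlier stage has rewritten them.
        current = {dep: file_digest(dep) for dep in stage["deps"]}
        outs_exist = all(Path(out).exists() for out in stage["outs"])
        if state.get(stage["name"]) == current and outs_exist:
            continue  # inputs unchanged and outputs present: skip this stage
        stage["run"]()
        executed.append(stage["name"])
        state[stage["name"]] = current
    state_path.write_text(json.dumps(state, indent=2))
    return executed
```

With two stages where the second reads the first's output, a first call runs both, an immediate second call runs neither, and editing the original input file causes both to run again. A stage that is re-run but produces byte-identical output does not trigger the stages after it, because only content hashes are compared.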
Date of Scout Publication
December 4th, 2020
Date Of Record Creation
November 25th, 2020 at 12:21pm
Date Of Record Release
December 1st, 2020 at 10:00am
Resource URL Clicks
270