
Scout Archives


DVC is a distributed revision control system and automation framework for data scientists. The DVC documentation is written primarily around machine learning applications, but very similar workflows arise in many other kinds of statistical analysis and simulation. DVC relies on Git to track program code and scripts, and it stores large files using backends such as Amazon S3, Azure Blob Storage, Google Drive, and others. This "large file storage" covers both original source data files and intermediate results (including parameter files for machine learning models, statistical results like fitted curves, and results of simulations). DVC's automation framework allows users to describe the steps in their analysis as stages in a "lightweight pipeline." As users change their scripts and code, DVC can re-run only the stages in the pipeline whose inputs have changed. DVC can track different analyses as "experiments" that are represented as Git branches, providing users with a systematic way to store alternative approaches. DVC is written in Python and should run anywhere that Python does. The Download section of the DVC site provides installation instructions for Windows, macOS, and Linux systems.
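The idea of re-running only the stages whose inputs have changed can be illustrated with a small sketch. This is not DVC's implementation or API; it is a minimal Python illustration that assumes each stage lists its input files and that a change is detected by comparing content hashes against those recorded at the previous run. The names `file_hash`, `reproduce`, and `lock` are hypothetical and chosen only for this example.

```python
import hashlib
from pathlib import Path


def file_hash(path):
    """Return the MD5 hex digest of a file's contents (hash choice is illustrative)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def reproduce(stages, lock):
    """Run only the stages whose input hashes differ from the recorded ones.

    stages: list of (name, deps, action) tuples in execution order, where
            deps is a list of input file paths and action is a callable
    lock:   dict mapping stage name -> {path: hash} from the last run;
            it is updated in place (hypothetical stand-in for a lock file)
    Returns the names of the stages that were actually run.
    """
    ran = []
    for name, deps, action in stages:
        # Hash the inputs as they are right now, after any upstream stage ran.
        current = {str(dep): file_hash(dep) for dep in deps}
        if lock.get(name) == current:
            continue  # inputs unchanged since the last run: skip this stage
        action()
        lock[name] = current
        ran.append(name)
    return ran
```

Because stages are checked in order, a stage that rewrites its output with different content causes the downstream stage that reads that output to run as well, while a stage whose output comes out byte-for-byte identical leaves downstream stages skipped.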