- Create a repository on GitHub with a master branch.
- Select a large dataset (at least 100 MB) from Kaggle or any other source of your choice.
- Create a remote storage location (Google Drive, S3, etc.).
- Clone the repository to your local machine and initialize DVC tracking in it.
- Push the locally stored dataset to the remote storage using DVC.
- Create train.py to train an ML model (using sklearn) on the dataset.
- Create a DVC pipeline that produces a metrics.json file, and track that file using DVC. Use GitHub Actions to automate execution of the DVC pipeline.
- Edit train.py to use a different ML technique. Create a pull request with the updated file.
- Compare the results as we did in class, and merge the branches if the results are better.
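
The train.py and metrics.json steps above could be sketched roughly as follows. This is a minimal illustration, not the reference solution for the task: the synthetic data from `make_classification` is only a stand-in for the DVC-tracked dataset, and the function names (`load_data`, `train_and_evaluate`, `write_metrics`), the choice of `LogisticRegression`, and the accuracy and F1 metrics are all assumptions made for the example.

```python
# train.py: a minimal sketch under the assumptions stated above.
import json

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split


def load_data():
    # Assumption: synthetic data stands in for the real dataset.
    # Replace this with code that reads the file you track with DVC.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    return X, y


def train_and_evaluate(model=None):
    """Fit a classifier and return its test metrics as a plain dict."""
    X, y = load_data()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    if model is None:
        # Assumption: logistic regression as the first technique.
        model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    return {
        "accuracy": float(accuracy_score(y_test, preds)),
        "f1": float(f1_score(y_test, preds)),
    }


def write_metrics(metrics, path="metrics.json"):
    # Write the metrics file that the DVC pipeline stage declares as its output.
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)


if __name__ == "__main__":
    metrics = train_and_evaluate()
    write_metrics(metrics)
    print(metrics)
```

For the pull-request step, one option is to pass a different estimator, for example `train_and_evaluate(RandomForestClassifier())` after importing it from `sklearn.ensemble`, so that metrics.json changes on the new branch. The resulting metrics can then be compared between branches, for instance with `dvc metrics diff`, if that matches the comparison used in class.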