khuyentran1401 / data-science-template
Template for a data science project
Hi there!
Besides dvc, pip, and poetry, I've also noticed that some tools used in some branches are simply not used in others. I honestly don't see the reason for that, since no alternative is used in their place...
| Branch | hydra | flake8 | prefect |
|---|---|---|---|
| dvc-poetry | ✅ | ❌ | ❌ |
| dvc-pip | ✅ | ❌ | ❌ |
| prefect-poetry | ❌ | ✅ | ✅ |
Why doesn't prefect-poetry use hydra?
Why don't dvc-poetry and dvc-pip use prefect or flake8?
I understand the reason for having both dvc-poetry and dvc-pip: pip and poetry are two different strategies for managing packages, so two different templates are needed. However, prefect-poetry seems unnecessary, as prefect doesn't overlap with the other tools' goals.
It seems that you started with one model and then recreated it, abandoning some tools and adopting others along the way. I am not sure... If PRs are welcome, I propose adopting the same set of tools in every branch, unless they conflict or overlap. Let me know what your idea is so that I can contribute to your project :)
Hi @khuyentran1401 !
I just saw in cookiecutter/cookiecutter#1881 that cookiecutter finally has the feature of adding human-readable prompts to the different variables. This enables us to create a more sophisticated data science template.
My initial thought is to go a step further than what I did in #18 (I haven't checked exactly how it looks now, since you made some modifications). The idea is to categorize this giant universe of machine learning tools by functionality (logging, orchestration, data storage, Python linting and code formatting, etc.) and then list the tools in each category so that the user can choose one. In other words: create a heavily modularized data science template in which the final structure depends on the tools the user chooses, while ensuring that the directory structure doesn't vary too much.
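To make the idea concrete, here is a minimal sketch of how such a categorized tool list could be turned into a cookiecutter.json with one choice variable per category. The category names, tool options, and the `build_cookiecutter_context` helper are hypothetical, and the `__prompts__` layout reflects my understanding of the human-readable prompts feature, so treat it as an assumption to check against the cookiecutter docs.

```python
import json

# Hypothetical categories and tool options; these names are illustrative only.
TOOL_CATEGORIES = {
    "orchestration": ["prefect", "none"],
    "data_versioning": ["dvc", "none"],
    "dependency_manager": ["poetry", "pip"],
    "linter": ["flake8", "none"],
}


def build_cookiecutter_context(categories):
    """Build a cookiecutter.json-style dict with one choice variable per category."""
    context = {"project_name": "my-project"}
    prompts = {"project_name": "Project name"}
    for category, tools in categories.items():
        # A list value is presented as a choice; the first item is the default.
        context[category] = list(tools)
        # Assumed prompt layout: a question under "__prompt__" plus a label per option.
        labels = {tool: tool for tool in tools}
        labels["__prompt__"] = f"Select a tool for {category.replace('_', ' ')}"
        prompts[category] = labels
    context["__prompts__"] = prompts
    return context


cookiecutter_json = json.dumps(build_cookiecutter_context(TOOL_CATEGORIES), indent=2)
print(cookiecutter_json)
```

Keeping the categories in one mapping means adding a tool is a one-line change, and the template files can then branch on the chosen value of each variable.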
However, I am not sure whether you share this goal. I just saw that you removed DVC, so you may have some considerations of your own regarding it.
What do you think about it?
Hey Khuyen,
Thanks for creating this easy-to-use template. I really like the simple approach.
I tried creating a template based on the instructions provided. Thus far, it works but with some issues around this line (I think):
python = "{{ cookiecutter.author_name }}"
in the pyproject.toml file.
It references the author's name instead of compatible_python_versions.
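A small sketch of why this matters, using a simplified regex substitution as a stand-in for the real Jinja2 rendering that cookiecutter performs. The context values and the `render` helper are hypothetical; only the two variable names come from the report above.

```python
import re

# Hypothetical context values, chosen only for illustration.
context = {
    "author_name": "Jane Doe",
    "compatible_python_versions": "^3.8",
}


def render(line, ctx):
    """Replace each {{ cookiecutter.<key> }} placeholder with its value from ctx."""
    return re.sub(
        r"\{\{\s*cookiecutter\.(\w+)\s*\}\}",
        lambda match: ctx[match.group(1)],
        line,
    )


reported = 'python = "{{ cookiecutter.author_name }}"'
expected = 'python = "{{ cookiecutter.compatible_python_versions }}"'

print(render(reported, context))  # python = "Jane Doe"
print(render(expected, context))  # python = "^3.8"
```

With the reported line, the rendered value is a person's name rather than a version constraint, which is presumably what triggers the issues during setup.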
First, awesome template. I've been doing some research on what's changed in templating data science projects over the last few years and came across yours.
I wasn't sure what the best way to reach out was, so I thought I'd just present it to you in an issue. I built a templating engine very similar to cookiecutter, but it includes the ability to add Python functions as plugins for a command line interface. I thought you might find it interesting or useful if you find maintaining and extending the Makefile annoying.
https://angreal.github.io/angreal/
Again - awesome work !
Hello, have you tried mkdocs for documentation?
I think it's more beautiful and complete. Are you accepting PRs for this repo?
Thanks.
Update hydra to version 1.2 to stop changing the runtime working directory to the job's output directory by default.