coreweave / dataset-downloader
License: MIT License
Hi! I'm taking a look at CoreWeave's LLM documentation, and I'm trying to get GPT-J fine-tuned with the example dataset.
There seems to be an issue with the dataset-downloader: my PVC does not get populated with the dataset.
I'm currently trying to understand whether there's an issue with the path being passed to it, or whether the internals of main.go aren't working correctly. Debugging this is a bit more involved because the dataset-downloader is running in a GitHub-produced distroless container, so it will take some extra effort to inspect the state at runtime.
My K8s logs for the relevant pod only show this:
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/140
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/0
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/20
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/40
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/60
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/80
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/100
2023/01/17 20:04:12 Getting book links from https://www.smashwords.com/books/category/1245/downloads/0/free/any/120
I'd expect the logs to also show something like the following (from this Printf statement):
Downloaded XYZ to /data/finetune-data/dataset/xyz
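One way to narrow this down would be to log every page that fails or yields no links, so the gap between "Getting book links" and "Downloaded" is visible in the pod logs. The following is only a rough sketch of that idea, not the actual code in main.go: `linkFetcher` and `collectLinks` are hypothetical names, and the fetcher is faked so the example runs without network access.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// linkFetcher is a hypothetical stand-in for whatever main.go uses to
// scrape one listing page; it is not the project's real function.
type linkFetcher func(pageURL string) ([]string, error)

// collectLinks calls fetch for every page and logs failures and empty
// pages instead of skipping them silently. It returns all links found
// and the number of pages that produced none.
func collectLinks(pages []string, fetch linkFetcher) (links []string, emptyPages int) {
	for _, p := range pages {
		log.Printf("Getting book links from %s", p)
		found, err := fetch(p)
		if err != nil {
			log.Printf("fetching %s failed: %v", p, err)
			emptyPages++
			continue
		}
		if len(found) == 0 {
			log.Printf("no book links found on %s", p)
			emptyPages++
			continue
		}
		links = append(links, found...)
	}
	return links, emptyPages
}

func main() {
	// Fake fetcher (assumption for illustration only).
	fake := func(pageURL string) ([]string, error) {
		switch pageURL {
		case "page-ok":
			return []string{"book-1", "book-2"}, nil
		case "page-empty":
			return nil, nil
		default:
			return nil, errors.New("unexpected status 403")
		}
	}
	links, empty := collectLinks([]string{"page-ok", "page-empty", "page-err"}, fake)
	fmt.Printf("links=%d emptyPages=%d\n", len(links), empty)
}
```

If every page ends up counted in `emptyPages`, the problem is more likely in link scraping than in the output path or the PVC mount.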
Anyway, I'm still debugging this, but just wanted to reach out and see if you have suggestions in the meantime. Thanks in advance!
We should make it more useful and less hard-coded, as it's hard-coded to Western Romance, for example.
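As a possible starting point, the category could become a parameter. The sketch below assumes the listing URLs keep the shape seen in the pod logs (category ID in the path, offset stepping by 20); the `-category` and `-pages` flags and the helper names are hypothetical and do not exist in the current tool.

```go
package main

import (
	"flag"
	"fmt"
)

// listingURL builds a listing-page URL in the same shape as the URLs in
// the pod logs. The category ID and offset are parameters here instead
// of constants.
func listingURL(categoryID, offset int) string {
	return fmt.Sprintf("https://www.smashwords.com/books/category/%d/downloads/0/free/any/%d", categoryID, offset)
}

// listingURLs returns one URL per page, stepping the offset by pageSize.
func listingURLs(categoryID, pages, pageSize int) []string {
	var urls []string
	for i := 0; i < pages; i++ {
		urls = append(urls, listingURL(categoryID, i*pageSize))
	}
	return urls
}

func main() {
	// Hypothetical flags; the defaults reproduce the URLs seen in the logs.
	category := flag.Int("category", 1245, "category ID (hypothetical flag)")
	pages := flag.Int("pages", 8, "number of listing pages to scan (hypothetical flag)")
	flag.Parse()
	for _, u := range listingURLs(*category, *pages, 20) {
		fmt.Println(u)
	}
}
```

With the defaults this prints the same eight URLs that appear in the logs, so existing behavior would be preserved while other categories become selectable.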
goroutines