Comments (10)
We definitely should try the EleutherAI eval harness, but just testing validation loss will tell us something too: regular finetuning vs. expert finetuning + merge.
from mdel.
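A minimal sketch of the comparison described above, assuming the "merge" step is plain weight averaging (the method named in the related issue on merging expert models) and that validation loss is a mean per-token cross-entropy in nats. The function names and toy parameter dicts are hypothetical and are not taken from the MDEL code.

```python
import math


def average_weights(experts):
    """Element-wise average of parameter dicts that share keys and shapes."""
    if not experts:
        raise ValueError("need at least one expert")
    keys = experts[0].keys()
    for expert in experts[1:]:
        if expert.keys() != keys:
            raise ValueError("experts must share parameter names")
    n = len(experts)
    return {
        k: [sum(vals) / n for vals in zip(*(e[k] for e in experts))]
        for k in keys
    }


def perplexity(mean_val_loss):
    """Perplexity from a mean per-token cross-entropy loss (natural log)."""
    return math.exp(mean_val_loss)


# Toy stand-ins for two expert checkpoints (hypothetical parameter names).
expert_a = {"layer_9.weight": [1.0, 2.0], "layer_9.bias": [0.0]}
expert_b = {"layer_9.weight": [3.0, 4.0], "layer_9.bias": [1.0]}
merged = average_weights([expert_a, expert_b])
```

With real checkpoints, the same validation split would be scored once with the regularly finetuned model and once with the merged model, and the two losses (or perplexities) compared.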
We have an issue for Eval Harness in the backlog.
from mdel.
So I am told that:
It seems they were trained on 1k batches.
I think the batch size was 8 because of the number of GPUs.
So that gives us 8k samples.
So the above 1000 examples should be 8K examples.
from mdel.
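The arithmetic in the comment above can be written out as follows. This is only a sketch: it assumes each of the 1k batches holds 8 samples in total and that there is no gradient accumulation, neither of which is confirmed in the thread.

```python
num_batches = 1_000  # "1k batches", as reported in the comment
batch_size = 8       # assumed total batch size, guessed from the GPU count

# Total samples seen = number of batches * samples per batch
samples = num_batches * batch_size
print(samples)  # 8000
```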
Have you tried using the EleutherAI eval harness? It should give you a good picture of how well the model performs and can be used as an indicator.
from mdel.
I didn't understand the part about the 1000 training examples. Our datasets are much bigger than that!
from mdel.
Didn't we train our models on only 1000 examples? Or did I misunderstand that?
from mdel.
@ontocord for 2., do we want layer_9,10,11,12,13?
from mdel.
@jordiclive @ontocord
We had used layers 9-13 when we trained the experts. See: https://github.com/ontocord/MDEL/blob/main/src/mdel/train.sh#L4
from mdel.
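One way to express "train only layers 9-13" is to filter parameters by the layer index in their names. The sketch below assumes GPT-NeoX-style parameter names of the form `gpt_neox.layers.<N>....`; the names listed and the helper `is_trainable` are illustrative assumptions and are not taken from `train.sh`.

```python
import re

# Layers 9 through 13 inclusive, as mentioned in the comment above.
TRAINABLE_LAYERS = set(range(9, 14))

# Matches the numeric layer index in names like "gpt_neox.layers.9.mlp..."
_LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def is_trainable(param_name, layers=TRAINABLE_LAYERS):
    """Return True if the parameter belongs to one of the selected layers."""
    match = _LAYER_RE.search(param_name)
    return match is not None and int(match.group(1)) in layers


# Hypothetical parameter names used only to exercise the filter.
names = [
    "gpt_neox.embed_in.weight",
    "gpt_neox.layers.8.mlp.dense_h_to_4h.weight",
    "gpt_neox.layers.9.attention.dense.weight",
    "gpt_neox.layers.13.mlp.dense_4h_to_h.bias",
    "gpt_neox.layers.14.attention.dense.weight",
]
selected = [n for n in names if is_trainable(n)]
```

In a PyTorch training script, a filter like this would typically be applied while iterating over `model.named_parameters()`, setting `requires_grad` to the filter's result for each parameter.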
@jordiclive Any updates on this issue?
from mdel.
@mrcabbage972 I trained 1. a model (all layers) on the exact splits...https://wandb.ai/ontocord/jordi_testing/runs/hu8j9ta1?workspace=user-jordanclive if you toggle the evaluation.
But I then thought we had decided to automate the experiment again with more training data / less validation data, and maybe the same amount of final test data (#47).
from mdel.
Related Issues (20)
- Dataset generation open issues
- Report val loss aggregated by data origin
- Fix HF Hub Upload Error
- Add script for merging expert models via weight averaging
- Integrate with LLM evaluation frameworks
- Expert merging: c-BTM
- Training instruction followers as composable layers and expert layers
- inputs_ids cast to fp16 in deeperspeed bug
- Setup separate environments on Redmond.ai box
- Automatic Training Scripts for All Expert Models
- Stabilize Training on Redmond Box
- Evaluate a merged expert model's perplexity
- Investigate Expert Models Having High Perplexity
- Create template for HF dataset config
- Train 2nd batch of expert models
- Get all relevant data for StarCoder into LUMI
- Tokenize the StarCoder dataset
- Set up the training configuration
- Do a small test run