Comments (3)
@ontocord Can you please review the description of this issue? This is an important one, so I'd like to make sure we're aligned on the details.
from mdel.
Perplexity script was run on all combinations of:
Model: expert-arxiv, expert-freelaw, expert-github, EleutherAI/pythia-1b-deduped
Dataset: arxiv, freelaw, github
Split: train, validation_domain, validation_pile
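For reference, the grid above can be enumerated as in the sketch below. This is not the actual perplexity script; the helper name `experiment_grid` is hypothetical, and the Hub-style model and dataset IDs are taken from the result records further down.

```python
from itertools import product

# IDs as they appear in the result records in this issue.
MODELS = [
    "Multi-Domain-Expert-Layers/expert-arxiv",
    "Multi-Domain-Expert-Layers/expert-freelaw",
    "Multi-Domain-Expert-Layers/expert-github",
    "EleutherAI/pythia-1b-deduped",
]
DATASETS = [
    "Multi-Domain-Expert-Layers/arxiv",
    "Multi-Domain-Expert-Layers/freelaw",
    "Multi-Domain-Expert-Layers/github",
]
SPLITS = ["train", "validation_domain", "validation_pile"]


def experiment_grid():
    """Return one config dict per (model, dataset, split) combination.

    Hypothetical helper for illustration; not part of the real script.
    """
    return [
        {"model": m, "dataset": d, "split": s}
        for m, d, s in product(MODELS, DATASETS, SPLITS)
    ]


grid = experiment_grid()
print(len(grid))  # 4 models x 3 datasets x 3 splits
```

Each dict could then be passed to whatever function runs a single evaluation.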
I will post the full results of the 36 experiments below, but first, to highlight the confusing parts:
Perplexity on the training set is higher for the expert than for the base model. When I looked in WandB, I found a run titled pythia-1b-deduped-arxiv.
Assuming that is the run that goes with this model, the loss was going down, as you can see in this chart:
![Screenshot 2023-05-17 at 11 06 32 AM](https://private-user-images.githubusercontent.com/8527894/238994218-a5149ebc-74c0-49a6-84be-e4e6e8c9227b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjIyNzcxODcsIm5iZiI6MTcyMjI3Njg4NywicGF0aCI6Ii84NTI3ODk0LzIzODk5NDIxOC1hNTE0OWViYy03NGMwLTQ5YTYtODRiZS1lNGU2ZThjOTIyN2IucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MDcyOSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDA3MjlUMTgxNDQ3WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MWQ0MDQyN2IxMmMyYmJkYWIxZGIwOTAwZjFkZDUxN2JhZDk4NWY5N2Y2NmRhYmVjNjM3MWQ1Zjc3YjFjNzVlMyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmYWN0b3JfaWQ9MCZrZXlfaWQ9MCZyZXBvX2lkPTAifQ.3CU3Zc3a8LU6X4IsBT6pzXJpZCbgwdS4VtoUR95MKWU)
For every dataset, the expert had higher perplexity than the base model on both the training split and the validation_domain split.
distilgpt2 had very high perplexity on the datasets as expected.
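Why this is confusing: under the standard definition, perplexity is the exponential of the mean per-token cross-entropy loss, so a falling training loss would normally mean falling training-set perplexity. A minimal sketch of that relationship follows, assuming the script uses the standard definition with natural logs (the script itself is not shown in this issue, and the function name `perplexity` is mine):

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log).

    Assumes the standard definition: exp(mean NLL). The actual script
    in this issue may aggregate differently.
    """
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Lower mean loss gives lower perplexity, and vice versa.
before = perplexity([2.0, 1.9, 2.1])
after = perplexity([1.8, 1.7, 1.9])
print(before > after)

# Conversely, a reported perplexity maps back to a mean loss via log().
implied_loss = math.log(6.572989707328704)  # expert-arxiv on arxiv/train
print(implied_loss)
```

If the WandB loss curve and the evaluated checkpoint really belong to the same run, the implied loss here should be comparable to the final training loss in the chart.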
Here are the complete results:
{"date": 1684281437.6640549, "runtime": 58.9195, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.1806483251580175}
{"date": 1684281438.6176283, "runtime": 59.0754, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.589675635229376}
{"date": 1684281442.473437, "runtime": 59.5134, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.7452095055781855}
{"date": 1684281445.6582773, "runtime": 59.3279, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.255214207031588}
{"date": 1684281446.2353806, "runtime": 59.2127, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.366141232556662}
{"date": 1684281446.580657, "runtime": 63.7466, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.561219412349417}
{"date": 1684281450.5335333, "runtime": 62.0645, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.331658572574814}
{"date": 1684281465.1216207, "runtime": 78.9296, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.827975770776606}
{"date": 1684281498.398999, "runtime": 118.9044, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.036974293841685}
{"date": 1684281501.5622356, "runtime": 118.0725, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.012206926101761}
{"date": 1684281503.5901659, "runtime": 118.9363, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.075443609481571}
{"date": 1684281504.2154381, "runtime": 119.622, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 6.14366348878873}
{"date": 1684281505.1637416, "runtime": 118.9924, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.332868620596422}
{"date": 1684281507.152357, "runtime": 119.2627, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.367354075345387}
{"date": 1684281516.2976973, "runtime": 133.1081, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.18129673498018}
{"date": 1684281521.0476763, "runtime": 136.0308, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.25646632845322}
{"date": 1684281792.4260871, "runtime": 412.6193, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 5.603551498049122}
{"date": 1684281796.6836188, "runtime": 416.5038, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 5.679911295033531}
{"date": 1684281798.704142, "runtime": 418.6623, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 5.672741968171792}
{"date": 1684281801.901714, "runtime": 415.2171, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_domain", "max_length": 1024, "dataset_key": "text", "perplexity": 5.675035573566157}
{"date": 1684282123.3902223, "runtime": 742.2889, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.301404276401705}
{"date": 1684282128.2113533, "runtime": 745.8973, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.335512205839025}
{"date": 1684282135.2589061, "runtime": 749.2396, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.226164572185025}
{"date": 1684282153.224812, "runtime": 766.9294, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/github", "split": "validation_pile", "max_length": 1024, "dataset_key": "text", "perplexity": 6.151672905395438}
{"date": 1684284992.2658331, "runtime": 228.1556, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/github", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 5.625481166685019}
{"date": 1684284993.4468331, "runtime": 229.1428, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/github", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 5.703393393203626}
{"date": 1684284993.8094637, "runtime": 229.3736, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/github", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 5.696060657477567}
{"date": 1684284994.4632788, "runtime": 229.3474, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/github", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 5.698515176536461}
{"date": 1684285044.694415, "runtime": 228.2315, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.013907204773048}
{"date": 1684285045.2155526, "runtime": 229.1972, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.0760585282682245}
{"date": 1684285045.391916, "runtime": 229.2373, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.144929907359411}
{"date": 1684285045.5122395, "runtime": 229.2458, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/freelaw", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.038489375419607}
{"date": 1684285211.8575993, "runtime": 229.2016, "model": "EleutherAI/pythia-1b-deduped", "tokenizer": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.539700840094572}
{"date": 1684285236.932937, "runtime": 228.7231, "model": "Multi-Domain-Expert-Layers/expert-arxiv", "tokenizer": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.572989707328704}
{"date": 1684285238.87368, "runtime": 229.1865, "model": "Multi-Domain-Expert-Layers/expert-github", "tokenizer": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.801440338123199}
{"date": 1684285240.2605352, "runtime": 229.3909, "model": "Multi-Domain-Expert-Layers/expert-freelaw", "tokenizer": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "max_length": 1024, "dataset_key": "text", "perplexity": 6.722750104407284}
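To make the expert-vs-base comparison easier to read off, records like the ones above can be grouped by dataset and split and diffed against the base model. The sketch below is only an illustration: `gaps_vs_base` is a hypothetical helper, and `SAMPLE` contains just the four arxiv/train records from this issue, trimmed to the fields the helper needs.

```python
import json

BASE = "EleutherAI/pythia-1b-deduped"

# Four arxiv/train records copied from the results above (fields trimmed).
SAMPLE = """\
{"model": "EleutherAI/pythia-1b-deduped", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "perplexity": 6.539700840094572}
{"model": "Multi-Domain-Expert-Layers/expert-arxiv", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "perplexity": 6.572989707328704}
{"model": "Multi-Domain-Expert-Layers/expert-github", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "perplexity": 6.801440338123199}
{"model": "Multi-Domain-Expert-Layers/expert-freelaw", "dataset": "Multi-Domain-Expert-Layers/arxiv", "split": "train", "perplexity": 6.722750104407284}
"""


def gaps_vs_base(jsonl_text, base=BASE):
    """Map (model, dataset, split) -> perplexity minus the base model's.

    Positive values mean the model is worse than the base on that split.
    """
    records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    base_ppl = {
        (r["dataset"], r["split"]): r["perplexity"]
        for r in records
        if r["model"] == base
    }
    gaps = {}
    for r in records:
        key = (r["dataset"], r["split"])
        if r["model"] != base and key in base_ppl:
            gaps[(r["model"], *key)] = r["perplexity"] - base_ppl[key]
    return gaps


for (model, dataset, split), gap in sorted(gaps_vs_base(SAMPLE).items()):
    print(f"{model} on {dataset}/{split}: {gap:+.4f}")
```

Running the same helper over all 36 records would show whether any expert beats the base model on any split.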
Thanks @Stillerman! Let's close this issue?