We would like to create a script for creating a merged model by averaging expert weights

Add script for merging expert models via weight averaging

huu4ontocord commented on July 30, 2024

Add script for merging expert models via weight averaging

from mdel.

Comments (7)

kenhktsui commented on July 30, 2024

@mrcabbage972 I am interested in helping! We could use lm-evaluation-harness to benchmark the merged model.
The seed LM, EleutherAI/pythia-1b-deduped, would be a good baseline.

mrcabbage972 commented on July 30, 2024

@kenhktsui Great, please assign the ticket to yourself!

Regarding lm-evaluation-harness, can you please create a separate issue for that and add the details (e.g. on which tasks we are going to test)?

kenhktsui commented on July 30, 2024

@mrcabbage972 I have added the evaluation ticket.

For the merge, let's agree on terminology first. I see several different implementations, and naming them lets us assign separate tickets to different contributors:

- c-BTM: a weighted combination of the experts' next-token prediction logits
- element-wise averaging/blending of model parameters
- mixture-of-experts
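
To make the first item concrete, here is a minimal sketch of output-level ensembling. It assumes the experts' distributions are mixed after a softmax; mixing the raw logits instead would be a variant. The function names and the plain-list representation are hypothetical and not taken from any existing script. Unlike parameter averaging, nothing in the models themselves is changed.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_next_token(expert_logits, expert_weights):
    """Mix per-expert next-token distributions; no model parameters change.

    expert_logits: one list of vocabulary logits per expert (hypothetical format).
    expert_weights: one non-negative weight per expert; normalised here.
    """
    if len(expert_logits) != len(expert_weights):
        raise ValueError("need exactly one weight per expert")
    total = sum(expert_weights)
    mixed = [0.0] * len(expert_logits[0])
    for logits, weight in zip(expert_logits, expert_weights):
        for i, p in enumerate(softmax(logits)):
            mixed[i] += (weight / total) * p
    return mixed
```

Every expert has to run a forward pass at inference time here, which is the main practical difference from merging the weights once up front.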

mrcabbage972 commented on July 30, 2024

@kenhktsui Let's keep this ticket as element-wise averaging.
I created a separate one for c-BTM.

kenhktsui commented on July 30, 2024

@mrcabbage972 I think this ticket has already been done by Concedo and TeH_Venom. I would like to work on the c-BTM ticket.

mrcabbage972 commented on July 30, 2024

@kenhktsui The version of Concedo's script that I saw only merges two experts; we need a solution that merges N.

To close the ticket, I think what is needed is a PR that:

- adds the script to the repo
- extends it to support merging of N experts
- adds a section in the readme with usage instructions
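
As a rough illustration of the N-expert extension, here is a minimal sketch of element-wise averaging. It is not Concedo's script. It assumes every expert shares the same parameter names and shapes, and it uses plain lists of floats so that it runs without dependencies; a real script would presumably apply the same arithmetic to the tensors in each model's state dict. The function name and the optional `weights` argument are hypothetical.

```python
def average_state_dicts(state_dicts, weights=None):
    """Element-wise (optionally weighted) average of N parameter dictionaries.

    state_dicts: list of {parameter_name: list_of_floats}, one per expert.
    weights: optional per-expert weights; defaults to a uniform average.
    """
    if not state_dicts:
        raise ValueError("need at least one expert")
    n = len(state_dicts)
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("need exactly one weight per expert")
    total = sum(weights)

    names = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != names:
            raise ValueError("experts do not share the same parameter names")

    merged = {}
    for name in names:
        params = [sd[name] for sd in state_dicts]
        size = len(params[0])
        if any(len(p) != size for p in params):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Weighted sum over experts at each position, then normalise.
        merged[name] = [
            sum(w * p[i] for w, p in zip(weights, params)) / total
            for i in range(size)
        ]
    return merged


experts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
merged = average_state_dicts(experts)
```

With the three toy experts above, the merged `"w"` is the per-position mean of the inputs. Passing `weights` gives a blend instead of a uniform average.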

If you prefer to focus on the c-BTM ticket, I can take this one.

mrcabbage972 commented on July 30, 2024

We may be able to load the models layer by layer.
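
One possible reading of this, sketched under the assumption that the goal is to bound memory use: process one parameter at a time, so that only the running sum and a single expert's copy of that parameter are held at once. The `loaders` and `write_param` callables are hypothetical stand-ins for whatever checkpoint reading and writing the real script would do.

```python
def merge_layer_by_layer(param_names, loaders, write_param):
    """Average experts one parameter at a time.

    param_names: parameter names shared by all experts.
    loaders: one callable per expert, mapping a name to a list of floats
             (hypothetical; e.g. a lazy read from a checkpoint file).
    write_param: callable(name, values) that stores the merged parameter.
    """
    if not loaders:
        raise ValueError("need at least one expert")
    n = len(loaders)
    for name in param_names:
        running = None
        for load in loaders:
            values = load(name)  # only this expert's copy is needed right now
            if running is None:
                running = list(values)
            else:
                if len(values) != len(running):
                    raise ValueError(f"shape mismatch for parameter {name!r}")
                for i, v in enumerate(values):
                    running[i] += v
        write_param(name, [x / n for x in running])


# Toy usage: in-memory dicts stand in for checkpoints on disk.
experts = [{"a": [1.0, 3.0], "b": [2.0]}, {"a": [3.0, 5.0], "b": [4.0]}]
loaders = [lambda name, sd=sd: sd[name] for sd in experts]
merged = {}
merge_layer_by_layer(["a", "b"], loaders, merged.__setitem__)
```

The result is the same uniform average as merging whole models at once; only the order of loading differs.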
