Is there a scientific paper? (xai-org/grok-1)

JudeDavis1 commented on August 28, 2024

A model card would make sense. If there weren't any new techniques per se, then I wouldn't see the need for yet another paper. If there are, then sure!

from grok-1.

Qu3tzal commented on August 28, 2024

> Is there a scientific paper accompanying this release? I've searched but couldn't find one. I find it odd that the weights would be released but not the research.

Because there's no research underlying it? Nothing in the model is new or surprising so far; it's the same architecture as other MoE LLMs, with different data and training compute.
Not every piece of software needs a paper that's going to be rejected at conferences and stay at the preprint stage on arXiv. :)
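For context on what "the same architecture as other MoE LLMs" usually means: the config quoted further down sets num_experts=8 and num_selected_experts=2, i.e. each token is sent to 2 of 8 expert feed-forward blocks. Below is a minimal, pure-Python sketch of generic top-2 routing in the style described for Mixtral (a softmax over the selected experts' router logits). The function names, the toy linear router, and the gate normalisation are illustrative assumptions; this is not Grok-1's actual routing code.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k experts with the largest router logits.

    Returns (expert_index, gate) pairs. Gates are a softmax over the selected
    logits only, so they sum to 1 (an assumed, Mixtral-style choice).
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


def moe_forward(x, router_weights, experts, k=2):
    """Route one token vector x through k of len(experts) experts.

    router_weights: one weight vector per expert (a toy linear router).
    experts: callables mapping a vector to a vector of the same length.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    out = [0.0] * len(x)
    for idx, gate in top_k_route(logits, k):
        y = experts[idx](x)  # only the selected experts are evaluated
        out = [o + gate * y_i for o, y_i in zip(out, y)]
    return out


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts, num_selected = 4, 8, 2  # 8 experts, top-2, as in the config
    router = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
    # Toy experts: expert e just scales its input by (e + 1).
    experts = [lambda v, s=e + 1: [s * t for t in v] for e in range(num_experts)]
    x = [1.0, -0.5, 0.25, 2.0]
    logits = [sum(a * b for a, b in zip(w, x)) for w in router]
    print("routing:", top_k_route(logits, num_selected))
    print("output:", moe_forward(x, router, experts, k=num_selected))
```

Only the two selected experts run for each token, which is why an MoE model's active parameter count per token is much smaller than its total parameter count.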

AlexanderPuckhaber commented on August 28, 2024

> A model card would make sense. If there weren't any new techniques per se, then I wouldn't see the need for yet another paper. If there are, then sure!

Agreed that a model card (documenting what data Grok was trained on) is crucial for this to be truly "open source".

It looks like Grok already had a model card from last November: https://x.ai/model-card/

> Training data: The training data used for the release version of Grok-1 comes from both the Internet up to Q3 2023 and the data provided by our AI Tutors.

I doubt we'll get an answer any more detailed than "the Internet" and "whatever synthetic data our employees made"

NatanFreeman commented on August 28, 2024

> That's a technical report at best though.

Call it what you want, the issue is that it doesn't exist.

yzlnew commented on August 28, 2024

This absolutely needs some experimental details on μTransfer for an MoE model that large; note the several "weird" multipliers here:

grok-1/run.py, lines 31 to 47 in 7050ed2:

    
    output_multiplier_scale=0.5773502691896257,
    embedding_multiplier_scale=78.38367176906169,
    model=TransformerConfig(
        emb_size=48 * 128,
        widening_factor=8,
        key_size=128,
        num_q_heads=48,
        num_kv_heads=8,
        num_layers=64,
        attn_output_multiplier=0.08838834764831845,
        shard_activations=True,
        # MoE.
        num_experts=8,
        num_selected_experts=2,
        # Activation sharding.
        data_axis="data",
        model_axis="model",
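For anyone else puzzling over these constants: all three are within floating-point rounding of simple closed forms, which is easy to check. The sketch below is guesswork, not xAI's: the candidate formulas (sqrt(emb_size), 1/sqrt(key_size), 1/sqrt(3)), the GROK1 dict, and the helper names decode_multipliers and attention_shapes are assumptions for illustration. It also derives the attention shapes the same config implies.

```python
import math

# Values copied from the grok-1/run.py excerpt (lines 31 to 47 in 7050ed2).
GROK1 = {
    "output_multiplier_scale": 0.5773502691896257,
    "embedding_multiplier_scale": 78.38367176906169,
    "attn_output_multiplier": 0.08838834764831845,
    "emb_size": 48 * 128,  # 6144
    "key_size": 128,
    "num_q_heads": 48,
    "num_kv_heads": 8,
}


def decode_multipliers(cfg):
    """Compare each multiplier with a guessed closed form.

    The guesses are an assumption based on matching digits, not taken from xAI.
    """
    guesses = {
        "embedding_multiplier_scale": math.sqrt(cfg["emb_size"]),  # sqrt(6144)
        "attn_output_multiplier": 1 / math.sqrt(cfg["key_size"]),  # 1/sqrt(128)
        "output_multiplier_scale": 1 / math.sqrt(3),               # 1/sqrt(3)
    }
    return {
        name: math.isclose(cfg[name], guess, rel_tol=1e-12)
        for name, guess in guesses.items()
    }


def attention_shapes(cfg):
    """Per-head width and number of query heads sharing each KV head."""
    head_dim = cfg["emb_size"] // cfg["num_q_heads"]
    queries_per_kv_head = cfg["num_q_heads"] // cfg["num_kv_heads"]
    return head_dim, queries_per_kv_head


if __name__ == "__main__":
    for name, ok in decode_multipliers(GROK1).items():
        print(f"{name}: matches guess = {ok}")
    print("head_dim, queries per KV head:", attention_shapes(GROK1))
```

Running it reports a match for all three and (128, 6): 128-wide heads (consistent with key_size=128) and 6 query heads per KV head, i.e. grouped-query attention. Scaling embeddings by sqrt(d_model) and attention logits by 1/sqrt(d_k) are conventions from the original Transformer paper, so if these multipliers play those roles, two of them may have nothing to do with μTransfer; the output's 1/sqrt(3), and how any of them were chosen, is the kind of detail only a report from xAI could settle.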

Explosion-Scratch commented on August 28, 2024

It would be nice to have:

- A paper explaining the methodology
- Benchmarks
- The data this was trained on

NatanFreeman commented on August 28, 2024

> Is there a scientific paper accompanying this release? I've searched but couldn't find one. I find it odd that the weights would be released but not the research.

> Because there's no research underlying it? Nothing in the model is new or surprising so far; it's the same architecture as other MoE LLMs, with different data and training compute. Not every piece of software needs a paper that's going to be rejected at conferences and stay at the preprint stage on arXiv. :)

Disagree. I think @Explosion-Scratch did a good job pointing out why a paper would be useful in this case.

JudeDavis1 commented on August 28, 2024

I'm happy as long as the code is up to date and the science is released, even if not in an academic setting.

Qu3tzal commented on August 28, 2024

That's a technical report at best though.

AsureDay commented on August 28, 2024
