Is there a scientific paper? (grok-1, open issue)

xai-org commented on August 28, 2024 23
Is there a scientific paper?

from grok-1.

Comments (10)

JudeDavis1 commented on August 28, 2024 9

A model card would make sense. If there weren't any new techniques per se, then I wouldn't see the need for yet another paper. If there are, then sure!

Qu3tzal commented on August 28, 2024 5

Is there a scientific paper accompanying this release? I've searched but couldn't find one. I find it odd that the weights would be released but not the research.

Because there's no research underlying it? Nothing new or surprising in the model so far; it's just the same architecture as other MoE LLMs with different data and training compute.
Not every piece of software needs a paper that's going to be rejected at conferences and stay at the preprint stage on arXiv. :)

AlexanderPuckhaber commented on August 28, 2024 3

A model card would make sense. If there weren't any new techniques per se, then I wouldn't see the need for yet another paper. If there are, then sure!

Agreed that a model card (what data Grok was trained on) is crucial for this to be truly "open source".

It looks like Grok has had a model card since last November: https://x.ai/model-card/

Training data: The training data used for the release version of Grok-1 comes from both the Internet up to Q3 2023 and the data provided by our AI Tutors.

I doubt we'll get an answer any more detailed than "the Internet" and "whatever synthetic data our employees made"

NatanFreeman commented on August 28, 2024 2

That's a technical report at best though.

Call it what you want; the issue is that it doesn't exist.

yzlnew commented on August 28, 2024 2

This absolutely needs some experimental details on μTransfer for an MoE model that large, in case anyone has noticed the several 'weird' multipliers here.

grok-1/run.py

Lines 31 to 47 in 7050ed2

    output_multiplier_scale=0.5773502691896257,
    embedding_multiplier_scale=78.38367176906169,
    model=TransformerConfig(
        emb_size=48 * 128,
        widening_factor=8,
        key_size=128,
        num_q_heads=48,
        num_kv_heads=8,
        num_layers=64,
        attn_output_multiplier=0.08838834764831845,
        shard_activations=True,
        # MoE.
        num_experts=8,
        num_selected_experts=2,
        # Activation sharding.
        data_axis="data",
        model_axis="model",
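
One way to make sense of the 'weird' multipliers is to compare them against simple functions of the config's own dimensions. The sketch below is only a numerical sanity check and is not taken from the grok-1 repository: the constants are copied from the excerpt above, while the helper `matches` and the `checks` dictionary are hypothetical names introduced for illustration. It says nothing about how the values were actually chosen or whether μTransfer was used.

```python
import math

# Constants copied verbatim from the run.py excerpt quoted above.
output_multiplier_scale = 0.5773502691896257
embedding_multiplier_scale = 78.38367176906169
attn_output_multiplier = 0.08838834764831845
emb_size = 48 * 128  # num_q_heads * key_size = 6144
key_size = 128


def matches(value: float, candidate: float, rel_tol: float = 1e-12) -> bool:
    """Hypothetical helper: True if value equals candidate up to float rounding."""
    return math.isclose(value, candidate, rel_tol=rel_tol)


# Each entry pairs a config constant with a closed-form guess.
checks = {
    "output_multiplier_scale ~ 1/sqrt(3)": matches(
        output_multiplier_scale, 1 / math.sqrt(3)
    ),
    "embedding_multiplier_scale ~ sqrt(emb_size)": matches(
        embedding_multiplier_scale, math.sqrt(emb_size)
    ),
    "attn_output_multiplier ~ 1/sqrt(key_size)": matches(
        attn_output_multiplier, 1 / math.sqrt(key_size)
    ),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All three comparisons hold to floating-point precision, so the constants look like square-root scalings of model dimensions rather than arbitrary tuned numbers. The excerpt itself gives no indication of where the factor 3 in the output multiplier comes from.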

Explosion-Scratch commented on August 28, 2024 1

It would be nice to have:

  • Paper explaining the methodology
  • Benchmarks
  • Data this was trained on

NatanFreeman commented on August 28, 2024 1

Is there a scientific paper accompanying this release? I've searched but couldn't find one. I find it odd that the weights would be released but not the research.

Because there's no research underlying it? Nothing new or surprising in the model so far; it's just the same architecture as other MoE LLMs with different data and training compute. Not every piece of software needs a paper that's going to be rejected at conferences and stay at the preprint stage on arXiv. :)

Disagree. I think @Explosion-Scratch did a good job pointing out why a paper would be useful in this case.

JudeDavis1 commented on August 28, 2024

I'm happy as long as the code is up to date and the science is released, even if not in an academic setting.

Qu3tzal commented on August 28, 2024

That's a technical report at best though.

AsureDay commented on August 28, 2024

image
