
Comments (7)

woctezuma commented on August 25, 2024 18

First, run:

import gpt_2_simple as gpt2

gpt2.download_gpt2(model_name='345M')

Then:

!mkdir -p checkpoint/
!cp -r models/345M checkpoint/run1

Finally:

import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)

gpt2.generate(sess)
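
The copy step above can also be done from Python, which helps outside a notebook where the `!` prefix is unavailable. This is a minimal sketch using only the standard library. The helper name `copy_model_to_run` is hypothetical, and its default paths simply mirror the shell commands above (`models/345M` copied to `checkpoint/run1`).

```python
import shutil
from pathlib import Path


def copy_model_to_run(model_name="345M", run_name="run1",
                      model_dir="models", checkpoint_dir="checkpoint"):
    """Copy a downloaded model into <checkpoint_dir>/<run_name>.

    Hypothetical helper, not part of gpt-2-simple.
    """
    src = Path(model_dir) / model_name
    dst = Path(checkpoint_dir) / run_name
    if not src.is_dir():
        raise FileNotFoundError(f"model directory not found: {src}")
    # Equivalent of `mkdir -p checkpoint/`.
    dst.parent.mkdir(parents=True, exist_ok=True)
    # Equivalent of `cp -r models/345M checkpoint/run1` for a fresh run;
    # copytree raises FileExistsError if the run directory already exists.
    shutil.copytree(src, dst)
    return dst
```

Call `copy_model_to_run()` after the download and before `gpt2.load_gpt2(sess)`.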

from gpt-2-simple.

woctezuma commented on August 25, 2024 3

With version 0.6, you should be able to directly run:

import gpt_2_simple as gpt2

model_name='774M'

gpt2.download_gpt2(model_name=model_name)

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, model_name=model_name)

gpt2.generate(sess, model_name=model_name)
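
Because the download is large, it can be worth skipping it when the files are already on disk. This is a small sketch that assumes the model is stored under `models/<model_name>`, as the `cp` command earlier in this thread suggests. `needs_download` is a hypothetical helper, not part of gpt-2-simple.

```python
import os


def needs_download(model_name, model_dir="models"):
    """Return True when <model_dir>/<model_name> is missing (hypothetical helper)."""
    return not os.path.isdir(os.path.join(model_dir, model_name))


# Intended use (requires gpt_2_simple, so it is left commented out here):
# if needs_download(model_name):
#     gpt2.download_gpt2(model_name=model_name)
```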

woctezuma commented on August 25, 2024 2

True. There does seem to be some redundancy, since both load_gpt2 and generate load the hyperparameters. However, judging from the code, generate seems to need the model name in order to get the encoder used to encode the prefix. Paging @minimaxir, who knows this better than I do.

def load_gpt2(sess,
              [...]
              multi_gpu=False):
    """Loads the model checkpoint or existing model into a TensorFlow session
    for repeated predictions.
    """
   [...]

    hparams = model.default_hparams()
    with open(os.path.join(checkpoint_path, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

   [...]

def generate(sess,
             [...]
             include_prefix=True):
    """Generates text from a model loaded into memory.
    Adapted from:
    https://github.com/openai/gpt-2/blob/master/src/interactive_conditional_samples.py
    """
   [...]

    enc = encoder.get_encoder(checkpoint_path)
    hparams = model.default_hparams()
    with open(os.path.join(checkpoint_path, 'hparams.json')) as f:
        hparams.override_from_dict(json.load(f))

    if prefix:
        context = tf.compat.v1.placeholder(tf.int32, [batch_size, None])
        context_tokens = enc.encode(prefix)

   [...]

chrisrytting commented on August 25, 2024 1

I'm confused, @woctezuma. Why do you have to pass model_name to the generate function? Shouldn't gpt2 already contain the parameters once you load the model in your penultimate line?

avinregmi commented on August 25, 2024

@minimaxir

jielyugt commented on August 25, 2024

The redundancy is confusing. What happens if I pass a checkpoint fine-tuned from 124M to load_gpt2(), and then point generate() at the base 124M model without any fine-tuning?

chrisrytting commented on August 25, 2024

Yeah, this seems weird to me too: if @woctezuma is right, you could hypothetically use different checkpoints of the same model in the two calls without any error being thrown. We'll see if @minimaxir can shed some light.
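
In the meantime, one way to guard against mixing directories is to compare the `hparams.json` files, which the excerpts quoted earlier show both `load_gpt2` and `generate` reading from `checkpoint_path`. This is a sketch using only the standard library. `read_hparams` and `assert_same_hparams` are hypothetical helpers, not part of gpt-2-simple. Matching hyperparameters only show that the two architectures agree, not that the weights come from the same checkpoint.

```python
import json
import os


def read_hparams(checkpoint_path):
    """Read hparams.json from a model or checkpoint directory."""
    with open(os.path.join(checkpoint_path, "hparams.json")) as f:
        return json.load(f)


def assert_same_hparams(load_path, generate_path):
    """Raise ValueError if the two directories hold different hyperparameters.

    Hypothetical guard: it compares architecture settings only, not weights.
    """
    a = read_hparams(load_path)
    b = read_hparams(generate_path)
    differing = sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))
    if differing:
        raise ValueError(f"hparams differ for keys: {differing}")
```

For example, `assert_same_hparams("checkpoint/run1", "models/124M")` could be run before calling `load_gpt2` and `generate` with different arguments.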
