
gpt-2's Introduction

Status: Archive (code is provided as-is, no updates expected)

gpt-2

Code and models from the paper "Language Models are Unsupervised Multitask Learners".

You can read about GPT-2 and its staged release in our original blog post, 6 month follow-up post, and final post.

We have also released a dataset for researchers to study the models' behaviors.

* Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper). Thus you may have seen the small model referred to as 117M and the medium model referred to as 345M.

Usage

This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.

For basic information, see our model card.

Some caveats

  • GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
  • The dataset our GPT-2 models were trained on contains many texts with biases and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
  • To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.

Work with us

Please let us know if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying

  • Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
  • The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

Development

See DEVELOPERS.md

Contributors

See CONTRIBUTORS.md

Citation

Please use the following BibTeX entry:

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Future work

We may release code for evaluating the models on various benchmarks.

We are still considering release of the larger models.

License

Modified MIT

gpt-2's People

Contributors

albertwujj, armaanbhullar, christopherhesse, cookee12, github30, imgntn, jackclarksf, madisonmay, memo, minimaxir, mrene, natemurthy, rememberlenny, webproduktion01, wuthefwasthat


gpt-2's Issues

Release The Full Model!

I understand your concerns, but I still think it's better to release the full model now and let people poke at its abilities and discover potential issues more quickly.

generate_unconditional_samples.py Only return {text}

I used tensorflow_gpu (I didn't have the patience to use the CPU).

There are no errors, but when I run generate_unconditional_samples.py, with or without flags, it only returns {text}:

======================================== SAMPLE 1 ========================================
{text}
======================================== SAMPLE 2 ========================================
{text}

(and so on for all six samples)

Can you please advise?

Any plans to release WebText corpus?

I've seen #16 and appreciate the valid concerns raised about releasing the model, but the WebText corpus could be a tremendous help to general research if you were able to release it.

Are there plans to do so?

I did wonder whether this might simply enable people to recreate the unreleased GPT-2, but presumably that is no trivial matter, needing expertise and time/resources, thus deterring the casual mischief maker!

Anyway, whatever you end up doing, I wanted to thank you for what you have released already which is really interesting 🙂

License

Hi, is there a license associated with this, or any plans towards MIT-licensing this model?

Requirements on Arch Linux

Those who use Arch Linux should not follow the instructions, but should instead run:
pacman -S python-regex tensorflow
aurman -S python-fire
after uninstalling everything that would otherwise be installed by pacman.
When I installed via pip, the sample did nothing and exited.

Sampling code flags descriptions (support for --help?)

Is there a list of the flags for both conditional and unconditional models with their definitions?
(I looked in the blog and paper and couldn't find any mention.)

In particular, for reproducibility purposes, it'd be great to know the definitions of temperature and top_k and how choosing different values for these affects the results.

Thanks!
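Pending a --help listing, here is a minimal numpy sketch of what temperature and top_k conventionally do in samplers of this kind; it illustrates the common definitions, not the repository's exact implementation:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, rng=np.random.default_rng(0)):
    """Sample one token id from a vector of logits.

    temperature divides the logits: values < 1.0 sharpen the distribution
    (more conservative text), values > 1.0 flatten it (more surprising text).
    top_k, if > 0, keeps only the k highest-scoring tokens and renormalizes,
    cutting off the long tail of unlikely tokens.
    """
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        cutoff = np.sort(logits)[-top_k]           # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# With top_k=1, sampling degenerates to greedy argmax decoding:
print(sample_next_token([2.0, 1.0, 0.1], temperature=0.7, top_k=1))  # → 0
```

Under these definitions, the --top_k 40 seen elsewhere in these issues restricts each step to the 40 most likely tokens, and temperature then controls how peaked the choice among them is.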

Integrate a training feature to pass JSON markup convo data

Other alternatives to this software have allowed such a system to exist problem-free; cakechat by Replika.ai, for example, allows and encourages people to check out the 40GB Reddit corpus you built GPT-2 from. I was getting highly similar results by altering their token model and marking up the JSON with emotion, compared to what I am seeing with your currently released solution.

Examples of things more dangerous than your software that turned out safer than imagined:

Example #1: Fusor.net provides detailed instructions on how to build a nuclear fusion reactor; 12-year-olds in South America have even made them. Yet Bogotá hasn't become Chernobyl.

Example #2: 3D-printed weapons have been around for years and are actually far safer / less accessible than normal weapons. Nobody has ever been harmed by one.

Example #3: The early days of Bitcoin, although accompanied by some awful things, didn't do anything as bad as what the Sinaloa cartel managed daily, unchecked, for decades under El Chapo.

I'll protest this until I get my own copy of this software. It's a human right for you to release your full, open, and honest production software instead of choosing profit and proprietary code.

sh doesn't do anything on Windows 10

Hello, what operating system do the instructions apply to? sh doesn't do anything on Windows 10. How would I install this on Win10?

Also, is the first step to clone the repo? The instructions don't seem to make sense otherwise.

Thanks.

Installation question

Is the Docker installation an alternative to Native installation or are both needed?

Can't install sh_download_model.sh

Noob here (a linguist with rudimentary knowledge of computers). I've installed the gcloud SDK, but I can't get the command sh download_model.sh 117M to run. I get: 'sh' is not recognized as an internal or external command.
Any help would be greatly appreciated.

How to download on windows?

I'm using Windows 10, 64-bit. Could someone please explain to a novice how to download this?

  1. Following the instructions, I downloaded gsutil and started a new configuration, then ran:
    sh download_model.sh 117M
    But I receive the error:
    download_model.sh: download_model.sh: No such file or directory

  2. I tried both:
    gsutil cp -r dir gs://gpt-2/models/117M
    gsutil cp -r dir gs://gpt-2/models/models
    But I receive the error:
    AccessDeniedException: 403 [my gmail] does not have storage.objects.list access to gpt-2.

  3. I tried the solution from loretoparisi at
    https://github.com/loretoparisi/gpt-2/blob/master/download_model.sh
    But I think I'm doing something wrong here. I downloaded curl and grep, then created a .bat file in Notepad++ with his script. I executed the file, but it only opens and closes.

Any help would be greatly appreciated.
Thanks,
Pete
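For readers hitting these Windows issues: the shell script was eventually replaced upstream by a plain-Python downloader, which sidesteps sh and gsutil entirely. A minimal sketch in that spirit; the Azure-hosted base URL is an assumption carried over from that later download_model.py and should be verified before use:

```python
# Pure-Python alternative to download_model.sh (no sh, bash, or gsutil).
# BASE is an assumption taken from the later download_model.py; verify it
# still serves the files before relying on it.
BASE = "https://openaipublic.blob.core.windows.net/gpt-2/models"
FILES = [
    "checkpoint", "encoder.json", "hparams.json",
    "model.ckpt.data-00000-of-00001", "model.ckpt.index",
    "model.ckpt.meta", "vocab.bpe",
]

def model_urls(model_name="117M"):
    """Return URLs for the seven files that make up one model checkpoint."""
    return [f"{BASE}/{model_name}/{f}" for f in FILES]

# To actually download (works identically on Windows):
# import os, urllib.request
# os.makedirs(os.path.join("models", "117M"), exist_ok=True)
# for url in model_urls("117M"):
#     urllib.request.urlretrieve(url, os.path.join("models", "117M", url.rsplit("/", 1)[-1]))
```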

Syntax error in sh_download_model.sh

When running "sh download_model.sh 117M", I am told that there is a syntax error on line 14:

download_model.sh: line 14: syntax error near unexpected token `do'
download_model.sh: line 14: `for filename in checkpoint encoder.json hparams.json model.ckpt.data-00000-of-00001 model.ckpt.index model.ckpt.meta vocab.bpe; do'

Unfortunately, I don't know how to write shell scripts and can't troubleshoot this myself, but I don't see this error reported anywhere else.

encoder.json download error

The download_model.sh script can't download encoder.json from the online repository. Help me! Thanks in advance.

Issue with gsutil download_model.sh

Hi,

I'm not familiar with gsutil. I installed it freshly using the six steps at:
https://cloud.google.com/storage/docs/gsutil_install

Upon running the script:

When I'm not logged in to the cloud:

ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//checkpoint.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//encoder.json.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//hparams.json.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.data-00000-of-00001.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.index.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.meta.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//vocab.bpe.

When I'm logged in to the cloud:
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.

Thanks

\\ issue on Windows

When running GitHub\gpt-2\src>generate_unconditional_samples.py I get the issue below. Any idea whether it is due to \?
FileNotFoundError: [Errno 2] No such file or directory: 'models\117M\encoder.json'
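A note on the path in that error: hard-coded backslash paths are doubly fragile in Python, because sequences like \117 are octal escapes inside ordinary string literals. A small sketch of the portable alternative:

```python
import os

# In a Python string literal, "\117" is the octal escape for 'O' (0o117 == 79),
# so a hard-coded "models\117M\encoder.json" is not the path it appears to be.
assert "\117" == "O"

# os.path.join picks the right separator for the current OS; raw strings
# or forward slashes also avoid accidental escapes:
path = os.path.join("models", "117M", "encoder.json")
print(path)
```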

The model still can't be downloaded

0 0 0 0 0 0 0 0 --:--:-- 0:01:15 --:--:-- 0curl: (7) Failed to connect to drive.google.com port 443: Operation timed out

Hello, we (**) still cannot download from here. Please consider putting the model files on a file-hosting service, or could you send the model directly to my email: [email protected]?

Many thanks.

Is it fine not to use softmax after forwarding?

@WuTheFWasThat
image
I found something different from the paper:
https://github.com/openai/gpt-2/blob/master/src/model.py#L171
There may be an error in the tf.multinomial(logits, num_samples=1, output_dtype=tf.int32) call, because there will be values less than zero.
I tried with softmax, but it does not show good results for text generation.
This is the result when using the softmax function:
image
Was not using softmax your original intent?

I added softmax before multinomial sampling when implementing this in PyTorch, to avoid encountering probability entries < 0:
https://github.com/graykode/gpt-2-Pytorch/blob/master/sample.py#L43
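On the question above: tf.multinomial treats its input as unnormalized log-probabilities, so negative logits are expected and valid; the softmax is effectively applied inside the op. A numpy sketch of that equivalence (illustrative, not the repository's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])   # negative entries are fine

# Sampling "from logits" means sampling from softmax(logits). Taking the log
# of those probabilities and softmaxing again round-trips to the same
# distribution, so no explicit softmax is needed before multinomial sampling:
p = softmax(logits)
assert np.allclose(softmax(np.log(p)), p)

# Feeding softmax *outputs* back in as if they were logits, however, yields
# a different (flatter) distribution, which can hurt sample quality:
print(np.round(softmax(p), 3))
```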

Generate conditional output based on input keywords

I think it would be nice if it were possible to generate an output sentence based on input keywords, where the length of the sentence, or the input/output word ratio, could be fine-tuned. In this way we could create a higher-level system (motive) which could use this model to generate a guided conversation.

Charmap error

File "src/generate_unconditional_samples.py", line 55, in
fire.Fire(sample_model)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\site-packages\fire\core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\site-packages\fire\core.py", line 366, in _Fire
component, remaining_args)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\site-packages\fire\core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "src/generate_unconditional_samples.py", line 52, in sample_model
print(text)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2015' in position 1410: character maps to <undefined>
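The traceback above comes from the Windows console's default cp1252 codec, which cannot represent characters such as '\u2015'; it is not a model problem. A workaround sketch, assuming Python 3.7+ for reconfigure (on older versions, setting the PYTHONIOENCODING=utf-8 environment variable has the same effect):

```python
import sys

# Force stdout to UTF-8 before printing model samples; errors="replace"
# degrades gracefully on consoles that still can't render a character.
if hasattr(sys.stdout, "reconfigure"):  # io.TextIOWrapper.reconfigure, 3.7+
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

print("horizontal bar test: \u2015")  # would raise UnicodeEncodeError under cp1252
```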

My CPU doesn't support Tensorflow AVX instructions

I was able to install all the requirements. However, while generating samples, I get the following error. I have an Intel i3 first-gen processor and run Ubuntu 18.

2019-02-16 03:12:49.453982: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
Aborted (core dumped)

I then installed TensorFlow 1.5 (pip3 install tensorflow==1.5). The sample was generated; however, another warning popped up, shown below. Will this affect the quality? Do I need to compile TensorFlow on my system?

2019-02-16 03:22:19.785441: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2

Window size error when trying to sample

I'm trying to sample from 117M using the following commands:

python3 src/generate_unconditional_samples.py
python3 src/generate_unconditional_samples.py | tee samples
python3 src/interactive_conditional_samples.py

And I get the following error:

File "src/interactive_conditional_samples.py", line 34
raise ValueError(f"can't get samples longer than window size: {hparams.n_ctx}")
^
SyntaxError: invalid syntax
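The ^ pointing into the f"..." literal is the signature of running f-strings on a pre-3.6 interpreter, since f-strings (PEP 498) were added in Python 3.6; whatever python3 resolves to here is most likely 3.5 or older. A version-guard sketch (n_ctx below is an illustrative stand-in, not read from real hparams):

```python
import sys

# f-strings were added in Python 3.6; on 3.5 the parser fails at the
# f"..." literal with exactly the "SyntaxError: invalid syntax" shown above.
if sys.version_info < (3, 6):
    raise SystemExit("Python >= 3.6 required (f-string support); found "
                     + sys.version.split()[0])

n_ctx = 1024  # hypothetical window size standing in for hparams.n_ctx
print(f"can't get samples longer than window size: {n_ctx}")
```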

Translation task

What was the format for the translation task?
Do you provide a sequence of pairs delimited by newlines, e.g. "sentence1 = translation_of_sentence1 \n sentence2 = translation_of_sentence2 \n ... \n testing_sentence = "?
Does the training dataset consist of translations in a similar format?
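For reference, the paper describes conditioning on example pairs of the format "english sentence = french sentence" followed by a final prompt ending in "=". A sketch of building such a context (the sentences are illustrative, not from the actual evaluation):

```python
# Few-shot translation prompt in the newline-delimited "pair = pair" format
# the paper describes; the model is expected to continue after the final "=".
pairs = [
    ("good morning", "bonjour"),
    ("thank you", "merci"),
]
test_sentence = "good night"

prompt = "\n".join(f"{en} = {fr}" for en, fr in pairs)
prompt += f"\n{test_sentence} ="

print(prompt)
```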

[PROPOSAL] Allow remote testing of larger models

Your paper and this implementation could mark a significant evolution in unsupervised multitask learners.
Especially (for what concerns me) in CoQA & translation.
It would be nice to kick the tires of the full model, even remotely, without having direct access to it.
Would you consider setting up, say, an MQTT server with some channels, to be able to interact with a couple of functionalities?
Abuse of this service could be moderated by simply pruning out suspicious requests &&/|| by sending authorization to a validated email.

--R

Help doing transfer learning to generate Spanish-language text?

Hi! Amazing results 😮
I know this is an open-ended and lazy question, but I'd appreciate it if you could give me some pointers on how to re-train the model with additional text in another language (e.g. Spanish). I already have a small (6 MB) dataset in Spanish, and I'm not very well versed in ML, but I'm curious about playing with your model.
Thanks! I'll be sure to report results back if I somehow figure it out :)

-

Download.sh fails: + gsutil cp gs://gpt-2/models//model.ckpt.meta models/
AccessDeniedException: 403 [user] does not have storage.objects.list access to gpt-2.

Error when models folder does not exist

When the models folder does not exist, I am not able to set up using the instructions provided:

gpt-2 $sh download_model.sh 117M
mkdir: models: No such file or directory

Which python should be used? 3.7.2 catches exception

Using 3.7.2 with any of the tutorial commands:

python3 src/interactive_conditional_samples.py --top_k 40
/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
return f(*args, **kwds)
2019-02-19 17:51:24.116332: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "src/interactive_conditional_samples.py", line 68, in
fire.Fire(interact_model)
File "/usr/local/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/usr/local/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/usr/local/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "src/interactive_conditional_samples.py", line 42, in interact_model
temperature=temperature, top_k=top_k
File "/Users/steinmacht/gpt-2/src/sample.py", line 76, in sample_sequence
back_prop=False,
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3291, in while_loop
return_same_structure)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3004, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2924, in _BuildLoop
c = ops.convert_to_tensor(pred(*packed_vars))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3259, in
math_ops.logical_and(i < maximum_iterations, orig_cond(*lv)))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4365, in logical_and
"LogicalAnd", x=x, y=y, name=name)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 542, in make_tensor_proto
append_fn(tensor_proto, proto_values)
File "tensorflow/python/framework/fast_tensor_util.pyx", line 134, in tensorflow.python.framework.fast_tensor_util.AppendBoolArrayToTensorProto
File "/usr/local/lib/python3.7/site-packages/numpy/lib/type_check.py", line 547, in asscalar
return a.item()
UnboundLocalError: local variable 'a' referenced before assignment

PowerShell script to do the same as download_model.sh

I have had to download the model manually on Windows 10, because even when using bash, the path is not exported. I can contribute a PowerShell script to download the model without using bash. I don't see any reason this would be bad.
Would that be desirable?

Errors during model downloading

When I try to download the model on my Ubuntu Linux 14.04 LTS box I get the following errors from gsutil:

$  bash download_model.sh 117M
No command was given.

Choose one of -b, -d, -e, or -r to do something.
Try `/usr/bin/gsutil --help' for more information.

(the same "No command was given" error repeats for each of the seven files the script tries to fetch)

Also, do you need a CUDA card just to run the model?

bug in encoder.py

Add encoding="utf-8" at line 111 of encoder.py, or it won't work on Windows:
with open(os.path.join('models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
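A sketch of that fix applied to both text files the encoder reads (the function name and layout here are illustrative, not the repository's exact encoder.py):

```python
import json
import os

def load_encoder_files(models_dir, model_name):
    """Read the encoder's two text files with an explicit encoding, so the
    result doesn't depend on the platform default (cp1252 on Windows)."""
    with open(os.path.join(models_dir, model_name, 'encoder.json'),
              'r', encoding='utf-8') as f:
        encoder = json.load(f)
    with open(os.path.join(models_dir, model_name, 'vocab.bpe'),
              'r', encoding='utf-8') as f:
        bpe_data = f.read()
    return encoder, bpe_data
```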

Model download failed

Hello.
My environment is Python 3, but gsutil requires Python 2.7. Other than going through gsutil, is there any other way to download the model? Thanks. If possible, could you send it to my email: [email protected]? Thank you.

ModuleNotFoundError: No module named 'src'

maxwoolf$ sudo python3 src/generate_unconditional_samples.py
Traceback (most recent call last):
  File "src/generate_unconditional_samples.py", line 9, in <module>
    from src import model, sample, encoder
ModuleNotFoundError: No module named 'src'

Python can't import from a subfolder unless it's on the Python path.
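A sketch of the usual fix, assuming the repository layout above; the snippet puts the relevant directories on the module search path before the imports run (adjust repo_root to wherever the snippet actually lives):

```python
import os
import sys

# Make both `from src import model` and plain `import model` resolve,
# regardless of the directory the script was started from.
if "__file__" in globals():
    repo_root = os.path.dirname(os.path.abspath(__file__))
else:
    repo_root = os.getcwd()  # fallback for interactive sessions
sys.path.insert(0, repo_root)                       # enables `from src import ...`
sys.path.insert(0, os.path.join(repo_root, "src"))  # enables `import model`
```

Setting the environment variable instead, e.g. running PYTHONPATH=src python3 src/generate_unconditional_samples.py from the repo root, achieves the same without editing the script.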
