
gpt-2's Introduction

Status: Archive (code is provided as-is, no updates expected)

gpt-2

Code and models from the paper "Language Models are Unsupervised Multitask Learners".

You can read about GPT-2 and its staged release in our original blog post, 6 month follow-up post, and final post.

We have also released a dataset for researchers to study the models' behaviors.

* Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper). Thus you may have seen the small model referred to as 117M and the medium model referred to as 345M.

Usage

This repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.

For basic information, see our model card.

Some caveats

  • GPT-2 models' robustness and worst case behaviors are not well-understood. As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.
  • The dataset our GPT-2 models were trained on contains many texts with biases and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.
  • To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination. Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.

Work with us

Please let us know if you’re doing interesting research with or working on applications of GPT-2! We’re especially interested in hearing from and potentially working with those who are studying

  • Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)
  • The extent of problematic content (e.g. bias) being baked into the models and effective mitigations

Development

See DEVELOPERS.md

Contributors

See CONTRIBUTORS.md

Citation

Please use the following BibTeX entry:

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Future work

We may release code for evaluating the models on various benchmarks.

We are still considering release of the larger models.

License

Modified MIT

gpt-2's People

Contributors

albertwujj, armaanbhullar, christopherhesse, cookee12, github30, imgntn, jackclarksf, madisonmay, memo, minimaxir, mrene, natemurthy, rememberlenny, webproduktion01, wuthefwasthat


gpt-2's Issues

Release The Full Model!

I understand your concerns, but I still think it's better to release the full model now and let people poke at its abilities and discover potential issues more quickly.

generate_unconditional_samples.py Only return {text}

I used tensorflow_gpu (I didn't have the patience to use the CPU).

There are no errors, but when I run generate_unconditional_samples.py, with or without flags, it only returns {text}:

======================================== SAMPLE 1 ========================================
{text}
======================================== SAMPLE 2 ========================================
{text}

(and so on for all six samples)

Can you please advise?

Any plans to release WebText corpus?

I've seen #16 and appreciate the valid concerns raised about releasing the model, but the WebText corpus could be a tremendous help to general research if you were able to release it.

Are there plans to do so?

I did wonder whether this might simply enable people to recreate the unreleased GPT-2, but presumably that is no trivial matter, needing expertise and time/resources, thus deterring the casual mischief maker!

Anyway, whatever you end up doing, I wanted to thank you for what you have released already which is really interesting 🙂

License

Hi, is there a license associated with this, or any plans towards MIT-licensing this model?

Requirements on Arch Linux

Those who use Arch Linux should not follow the instructions, but should instead run:
pacman -S python-regex tensorflow
aurman -S python-fire
after uninstalling everything that would otherwise be installed by pacman.
When I installed via pip, the sample did nothing and exited.

Sampling code flags descriptions (support for --help?)

Is there a list of the flags for both conditional and unconditional models with their definitions?
(I looked in the blog and paper and couldn't find any mention.)

In particular, for reproducibility purposes, it'd be great to know the definitions of temperature and top_k and how choosing different values for these affects the results.

Thanks!
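Pending a --help listing, here is a minimal numpy sketch of what temperature and top_k conventionally do in samplers of this kind; it illustrates the common definitions, not the repository's exact implementation:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, rng=np.random.default_rng(0)):
    """Sample one token id from a vector of logits.

    temperature divides the logits: values < 1.0 sharpen the distribution
    (more conservative text), values > 1.0 flatten it (more surprising text).
    top_k, if > 0, keeps only the k highest-scoring tokens and renormalizes,
    cutting off the long tail of unlikely tokens.
    """
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        cutoff = np.sort(logits)[-top_k]           # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())          # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# With top_k=1, sampling degenerates to greedy argmax decoding:
print(sample_next_token([2.0, 1.0, 0.1], temperature=0.7, top_k=1))  # → 0
```

Under these definitions, the --top_k 40 seen elsewhere in these issues restricts each step to the 40 most likely tokens, and temperature then controls how peaked the choice among them is.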

Integrate a training feature to pass JSON markup convo data

Other alternatives to this software have allowed such a system to exist problem-free; cakechat by Replika.ai, for example, allows and encourages people to check out the 40GB Reddit corpus you built GPT-2 from. I was getting highly similar results by altering their token model and marking up the JSON with emotion, compared to what I am seeing with your currently released solution.

Examples of things more dangerous than your software that turned out safer than imagined:

Example #1: Fusor.net provides detailed instructions on how to build a nuclear fusion reactor; 12-year-olds in South America have even made them. Yet Bogotá hasn't become Chernobyl.

Example #2: 3D-printed weapons have been around for years and are actually far safer / less accessible than normal weapons. Nobody has ever been harmed by one.

Example #3: The early days of Bitcoin, although accompanied by some awful things, didn't do anything as bad as what the Sinaloa cartel managed daily, unchecked, for decades under El Chapo.

I'll protest this until I get my own copy of this software. It's a human right for you to release your full, open, and honest production software instead of choosing profit and proprietary code.

sh doesn't do anything on Windows 10

Hello, what operating system do the instructions apply to? sh doesn't do anything on Windows 10. How would I install this on Win10?

Also, is the first step to clone the repo? The instructions don't seem to make sense otherwise.

Thanks.

Installation question

Is the Docker installation an alternative to Native installation or are both needed?

Can't install sh_download_model.sh

Noob here (a linguist with rudimentary knowledge of computers). I've installed the gcloud SDK, but I can't get the command sh download_model.sh 117M to run. I get: 'sh' is not recognized as an internal or external command.
Any help would be greatly appreciated.

How to download on windows?

I'm using Windows 10, 64-bit. Could someone please explain to a novice how to download this?

  1. Following the instructions, I downloaded gsutil and started a new configuration, then ran:
    sh download_model.sh 117M
    But I receive the error:
    download_model.sh: download_model.sh: No such file or directory

  2. I tried both:
    gsutil cp -r dir gs://gpt-2/models/117M
    gsutil cp -r dir gs://gpt-2/models/models
    But I receive the error:
    AccessDeniedException: 403 [my gmail] does not have storage.objects.list access to gpt-2.

  3. I tried the solution from loretoparisi at
    https://github.com/loretoparisi/gpt-2/blob/master/download_model.sh
    But I think I'm doing something wrong here. I downloaded curl and grep, then created a .bat file in Notepad++ with his script. I executed the file, but it only opens and closes.

Any help would be greatly appreciated.
Thanks,
Pete
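For readers hitting these Windows issues: the shell script was eventually replaced upstream by a plain-Python downloader, which sidesteps sh and gsutil entirely. A minimal sketch in that spirit; the Azure-hosted base URL is an assumption carried over from that later download_model.py and should be verified before use:

```python
# Pure-Python alternative to download_model.sh (no sh, bash, or gsutil).
# BASE is an assumption taken from the later download_model.py; verify it
# still serves the files before relying on it.
BASE = "https://openaipublic.blob.core.windows.net/gpt-2/models"
FILES = [
    "checkpoint", "encoder.json", "hparams.json",
    "model.ckpt.data-00000-of-00001", "model.ckpt.index",
    "model.ckpt.meta", "vocab.bpe",
]

def model_urls(model_name="117M"):
    """Return URLs for the seven files that make up one model checkpoint."""
    return [f"{BASE}/{model_name}/{f}" for f in FILES]

# To actually download (works identically on Windows):
# import os, urllib.request
# os.makedirs(os.path.join("models", "117M"), exist_ok=True)
# for url in model_urls("117M"):
#     urllib.request.urlretrieve(url, os.path.join("models", "117M", url.rsplit("/", 1)[-1]))
```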

Syntax error in sh_download_model.sh

When running "sh download_model.sh 117M", I am told that there is a syntax error on line 14:

download_model.sh: line 14: syntax error near unexpected token `do'
download_model.sh: line 14: `for filename in checkpoint encoder.json hparams.json model.ckpt.data-00000-of-00001 model.ckpt.index model.ckpt.meta vocab.bpe; do'

Unfortunately, I don't know how to write shell scripts and can't troubleshoot this myself, but I don't see this error reported anywhere else.

encoder.json download error

The download_model.sh script can't download encoder.json from the online repository. Help me! Thanks in advance.

Issue with gsutil download_model.sh

Hi,

I'm not familiar with gsutil. I installed it freshly using the six steps at:
https://cloud.google.com/storage/docs/gsutil_install

Upon running the script:

When I'm not logged in to the cloud:

ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//checkpoint.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//encoder.json.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//hparams.json.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.data-00000-of-00001.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.index.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//model.ckpt.meta.
ServiceException: 401 Anonymous caller does not have storage.objects.get access to gpt-2/models//vocab.bpe.

When I'm logged in to the cloud:
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to gpt-2.

Thanks

\\ issue on Windows

When running GitHub\gpt-2\src>generate_unconditional_samples.py I get the issue below. Any idea whether it is due to \?
FileNotFoundError: [Errno 2] No such file or directory: 'models\117M\encoder.json'
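A note on the path in that error: hard-coded backslash paths are doubly fragile in Python, because sequences like \117 are octal escapes inside ordinary string literals. A small sketch of the portable alternative:

```python
import os

# In a Python string literal, "\117" is the octal escape for 'O' (0o117 == 79),
# so a hard-coded "models\117M\encoder.json" is not the path it appears to be.
assert "\117" == "O"

# os.path.join picks the right separator for the current OS; raw strings
# or forward slashes also avoid accidental escapes:
path = os.path.join("models", "117M", "encoder.json")
print(path)
```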

The model still can't be downloaded

0 0 0 0 0 0 0 0 --:--:-- 0:01:15 --:--:-- 0curl: (7) Failed to connect to drive.google.com port 443: Operation timed out

Hello, we (**) still cannot download from here. Please consider putting the model files on a file-hosting service, or could you send the model directly to my email: [email protected]?

Many thanks.

Is it fine not to use softmax after forwarding?

@WuTheFWasThat
image
I found something different from the paper:
https://github.com/openai/gpt-2/blob/master/src/model.py#L171
There may be an error in the tf.multinomial(logits, num_samples=1, output_dtype=tf.int32) call, because there will be values less than zero.
I tried with softmax, but it does not show good results for text generation.
This is the result when using the softmax function:
image
Was not using softmax your original intent?

I added softmax before multinomial sampling when implementing this in PyTorch, to avoid encountering probability entries < 0:
https://github.com/graykode/gpt-2-Pytorch/blob/master/sample.py#L43
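On the question above: tf.multinomial treats its input as unnormalized log-probabilities, so negative logits are expected and valid; the softmax is effectively applied inside the op. A numpy sketch of that equivalence (illustrative, not the repository's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])   # negative entries are fine

# Sampling "from logits" means sampling from softmax(logits). Taking the log
# of those probabilities and softmaxing again round-trips to the same
# distribution, so no explicit softmax is needed before multinomial sampling:
p = softmax(logits)
assert np.allclose(softmax(np.log(p)), p)

# Feeding softmax *outputs* back in as if they were logits, however, yields
# a different (flatter) distribution, which can hurt sample quality:
print(np.round(softmax(p), 3))
```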

Generate conditional output based on input keywords

I think it would be nice if it were possible to generate an output sentence based on input keywords, where the length of the sentence, or the input/output word ratio, could be fine-tuned. In this way we could create a higher-level system (motive) which could use this model to generate a guided conversation.

Charmap error

File "src/generate_unconditional_samples.py", line 55, in
fire.Fire(sample_model)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\site-packages\fire\core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\site-packages\fire\core.py", line 366, in _Fire
component, remaining_args)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\site-packages\fire\core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "src/generate_unconditional_samples.py", line 52, in sample_model
print(text)
File "C:\Users\UKumar\AppData\Local\Continuum\anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2015' in position 1410: character maps to <undefined>
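The traceback above comes from the Windows console's default cp1252 codec, which cannot represent characters such as '\u2015'; it is not a model problem. A workaround sketch, assuming Python 3.7+ for reconfigure (on older versions, setting the PYTHONIOENCODING=utf-8 environment variable has the same effect):

```python
import sys

# Force stdout to UTF-8 before printing model samples; errors="replace"
# degrades gracefully on consoles that still can't render a character.
if hasattr(sys.stdout, "reconfigure"):  # io.TextIOWrapper.reconfigure, 3.7+
    sys.stdout.reconfigure(encoding="utf-8", errors="replace")

print("horizontal bar test: \u2015")  # would raise UnicodeEncodeError under cp1252
```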

My CPU doesn't support Tensorflow AVX instructions

I was able to install all the requirements. However, while generating samples, I get the following error. I have an Intel i3 first-gen processor and run Ubuntu 18.

2019-02-16 03:12:49.453982: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
Aborted (core dumped)

I then installed TensorFlow 1.5 (pip3 install tensorflow==1.5). The sample was generated; however, another warning popped up, shown below. Will this affect the quality? Do I need to compile TensorFlow on my system?

2019-02-16 03:22:19.785441: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2

Window size error when trying to sample

I'm trying to sample from 117M using the following commands:

python3 src/generate_unconditional_samples.py
python3 src/generate_unconditional_samples.py | tee samples
python3 src/interactive_conditional_samples.py

And I get the following error:

File "src/interactive_conditional_samples.py", line 34
raise ValueError(f"can't get samples longer than window size: {hparams.n_ctx}")
^
SyntaxError: invalid syntax
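The ^ pointing into the f"..." literal is the signature of running f-strings on a pre-3.6 interpreter, since f-strings (PEP 498) were added in Python 3.6; whatever python3 resolves to here is most likely 3.5 or older. A version-guard sketch (n_ctx below is an illustrative stand-in, not read from real hparams):

```python
import sys

# f-strings were added in Python 3.6; on 3.5 the parser fails at the
# f"..." literal with exactly the "SyntaxError: invalid syntax" shown above.
if sys.version_info < (3, 6):
    raise SystemExit("Python >= 3.6 required (f-string support); found "
                     + sys.version.split()[0])

n_ctx = 1024  # hypothetical window size standing in for hparams.n_ctx
print(f"can't get samples longer than window size: {n_ctx}")
```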

Translation task

What was the format for the translation task?
Do you provide a sequence of pairs delimited by newlines, e.g. "sentence1 = translation_of_sentence1 \n sentence2 = translation_of_sentence2 \n ... \n testing_sentence = "?
Does the training dataset consist of translations in a similar format?
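For reference, the paper describes conditioning on example pairs of the format "english sentence = french sentence" followed by a final prompt ending in "=". A sketch of building such a context (the sentences are illustrative, not from the actual evaluation):

```python
# Few-shot translation prompt in the newline-delimited "pair = pair" format
# the paper describes; the model is expected to continue after the final "=".
pairs = [
    ("good morning", "bonjour"),
    ("thank you", "merci"),
]
test_sentence = "good night"

prompt = "\n".join(f"{en} = {fr}" for en, fr in pairs)
prompt += f"\n{test_sentence} ="

print(prompt)
```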

[PROPOSAL] Allow remote testing of larger models

Your paper and this implementation could mark a significant evolution in unsupervised multitask learners.
Especially (for what concerns me) in CoQA & translation.
It would be nice to kick the tires of the full model, even remotely, without having direct access to it.
Would you consider setting up, say, an MQTT server with some channels, to be able to interact with a couple of functionalities?
Abuse of this service could be moderated by simply pruning out suspicious requests &&/|| by sending authorization to a validated email.

--R

Help doing transfer learning to generate Spanish-language text?

Hi! Amazing results 😮
I know this is an open-ended and lazy question, but I'd appreciate it if you could give me some pointers on how to re-train the model with additional text in another language (e.g. Spanish). I already have a small (6 MB) dataset in Spanish, and I'm not very well versed in ML, but I'm curious about playing with your model.
Thanks! I'll be sure to report results back if I somehow figure it out :)

-

Download.sh fails: + gsutil cp gs://gpt-2/models//model.ckpt.meta models/
AccessDeniedException: 403 [user] does not have storage.objects.list access to gpt-2.

Error when models folder does not exist

When the models folder does not exist, I am not able to set up using the instructions provided:

gpt-2 $sh download_model.sh 117M
mkdir: models: No such file or directory

Which python should be used? 3.7.2 catches exception

Using 3.7.2 with any of the tutorial commands:

python3 src/interactive_conditional_samples.py --top_k 40
/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.6 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.7
return f(*args, **kwds)
2019-02-19 17:51:24.116332: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
File "src/interactive_conditional_samples.py", line 68, in
fire.Fire(interact_model)
File "/usr/local/lib/python3.7/site-packages/fire/core.py", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/usr/local/lib/python3.7/site-packages/fire/core.py", line 366, in _Fire
component, remaining_args)
File "/usr/local/lib/python3.7/site-packages/fire/core.py", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "src/interactive_conditional_samples.py", line 42, in interact_model
temperature=temperature, top_k=top_k
File "/Users/steinmacht/gpt-2/src/sample.py", line 76, in sample_sequence
back_prop=False,
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3291, in while_loop
return_same_structure)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3004, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2924, in _BuildLoop
c = ops.convert_to_tensor(pred(*packed_vars))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3259, in
math_ops.logical_and(i < maximum_iterations, orig_cond(*lv)))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4365, in logical_and
"LogicalAnd", x=x, y=y, name=name)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py", line 542, in make_tensor_proto
append_fn(tensor_proto, proto_values)
File "tensorflow/python/framework/fast_tensor_util.pyx", line 134, in tensorflow.python.framework.fast_tensor_util.AppendBoolArrayToTensorProto
File "/usr/local/lib/python3.7/site-packages/numpy/lib/type_check.py", line 547, in asscalar
return a.item()
UnboundLocalError: local variable 'a' referenced before assignment

PowerShell script to do the same as download_model.sh

I have had to download the model manually on Windows 10, because even when using bash, the path is not exported. I can contribute a PowerShell script to download the model without using bash. I don't see any reason this would be bad.
Would that be desirable?

Errors during model downloading

When I try to download the model on my Ubuntu Linux 14.04 LTS box I get the following errors from gsutil:

$  bash download_model.sh 117M
No command was given.

Choose one of -b, -d, -e, or -r to do something.
Try `/usr/bin/gsutil --help' for more information.

(the same "No command was given" error repeats for each of the seven files the script tries to fetch)

Also, do you need a CUDA card just to run the model?

bug in encoder.py

Add encoding="utf-8" at line 111 of encoder.py, or it won't work on Windows:
with open(os.path.join('models', model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
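A sketch of that fix applied to both text files the encoder reads (the function name and layout here are illustrative, not the repository's exact encoder.py):

```python
import json
import os

def load_encoder_files(models_dir, model_name):
    """Read the encoder's two text files with an explicit encoding, so the
    result doesn't depend on the platform default (cp1252 on Windows)."""
    with open(os.path.join(models_dir, model_name, 'encoder.json'),
              'r', encoding='utf-8') as f:
        encoder = json.load(f)
    with open(os.path.join(models_dir, model_name, 'vocab.bpe'),
              'r', encoding='utf-8') as f:
        bpe_data = f.read()
    return encoder, bpe_data
```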

Model download failed

Hello.
My environment is Python 3, but gsutil requires Python 2.7. Other than going through gsutil, is there any other way to download the model? Thanks. If possible, could you send it to my email: [email protected]? Thank you.

ModuleNotFoundError: No module named 'src'

maxwoolf$ sudo python3 src/generate_unconditional_samples.py
Traceback (most recent call last):
  File "src/generate_unconditional_samples.py", line 9, in <module>
    from src import model, sample, encoder
ModuleNotFoundError: No module named 'src'

Python can't import from a subfolder unless it's on the Python path.
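A sketch of the usual fix, assuming the repository layout above; the snippet puts the relevant directories on the module search path before the imports run (adjust repo_root to wherever the snippet actually lives):

```python
import os
import sys

# Make both `from src import model` and plain `import model` resolve,
# regardless of the directory the script was started from.
if "__file__" in globals():
    repo_root = os.path.dirname(os.path.abspath(__file__))
else:
    repo_root = os.getcwd()  # fallback for interactive sessions
sys.path.insert(0, repo_root)                       # enables `from src import ...`
sys.path.insert(0, os.path.join(repo_root, "src"))  # enables `import model`
```

Setting the environment variable instead, e.g. running PYTHONPATH=src python3 src/generate_unconditional_samples.py from the repo root, achieves the same without editing the script.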
