Comments (8)
The documentation is deprecated and needs to be updated. This is not a bug, and the model should work without such an argument.
from gpt-neo.
> The documentation is deprecated and needs to be updated. This is not a bug, and the model should work without such an argument.
Does this mean you don't want people to file issues here anymore? Because that should be the first thing mentioned in the README if that's the case.
edit: apologies. this was far too salty.
Something like:
> The current README contains inaccuracies. The maintainers have moved on to getting GPU support in another repo. If you would like to run inference on this project, please know that this README contains information, code, snippets, etc. that may require you to understand the Python programming language in order to fix.
Or you could just fix the README
edit: Sorry, musta been in a bad mood this day.
> The documentation is deprecated and needs to be updated. This is not a bug, and the model should work without such an argument.

> Does this mean you don't want people to file issues here anymore? Because that should be the first thing mentioned in the README if that's the case.
No, please do continue to file issues here. We are currently working on fixing the README up. Frankly, we were taken by surprise by the surge of interest... we have been sitting on these models for two months because we didn't think they were big enough to excite people! Clearly we were wrong, and so we're scrambling a bit to get it up to snuff.
@StellaAthena I personally was excited about this model, since I was previously using gpt_2_simple, which could only fine-tune the 355M GPT-2 model in Colab for free. Being able to use a much larger model is exciting, although I have not (yet?) been able to reproduce the quality of results gpt_2_simple was giving me.
That sounds like something is going very wrong. Can you show me an example output? One user got the following:
> The team have made an interesting theory that the unicorns could possibly live among humans, either as a snack or via interactions with humans. This could be the reason why they have been able to communicate with humans in their strange language. The idea is not completely out of the question, as the tiny living quarters they live in are relatively small, and would be packed with food in a way that would force them to interact with humans without them wanting to eat them.
>
> The fact that they the unicorns operated as a functional society with language would prove to be quite a feat considering that they also share a similar similar genetic structure with humans. The team theorized that they could use their language system to communicate with humans, after all, we do it all the time.
>
> The Swiss scientist, Dr. Nikolaus Schmid, was also one of the researchers on the project. He talks about the possibility of aliens communicating with us and how they have intelligence that we do not have. He explained,
>
> “They have consciousness that gives them a strange sense of humour, and I’m sure that they have all sorts of intellectual abilities. Unfortunately, we can only see a tiny fraction of the actual brain cells.”
>
> Schmid also explained how the unicorns were able to communicate with each other and managed to survive it given that they had no cell phones but enhanced cellular technology. He said,
>
> “I don’t believe that unicorns drive cars or recognize people by their name, they don’t get angry, they love children, and they have no nationality or religion.”
@StellaAthena Inference from the base model works very well; I just had problems with fine-tuning. I tried changing the temperature for inference and experimented with different numbers of fine-tuning steps, but the generated text is always either too repetitive, too random, or reproduces the training data verbatim. My data is only ~300 kB, so it may be prone to overfitting? What parameters would you recommend tuning (for training/inference)? For GPT-2 (https://github.com/minimaxir/gpt-2-simple), I used the 355M model, 1000 steps of fine-tuning (batch size 2, learning rate 1e-5), and inference with temperature 0.9, which gave pretty satisfactory results.
(I'm using it for my pet project. I'm training it on transcripts of Peppa Pig, and using the generated screenplays to make creepy YouTube cartoons: https://www.youtube.com/channel/UCeqPYAb-JZRqBvmcn3Fy3Rw)
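(Editorial aside: the repetitive-vs-random trade-off described above is largely governed by sampling temperature, which divides the logits before the softmax. Values well below 1 concentrate probability on the top token, giving repetitive text; values above 1 flatten the distribution toward uniform, giving random text. A minimal NumPy sketch, illustrative only and not code from either repo:)

```python
import numpy as np

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature and softmax into a probability distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

# Three hypothetical next-token logits
logits = [2.0, 1.0, 0.1]
low = apply_temperature(logits, 0.2)   # sharp: nearly all mass on the top token
high = apply_temperature(logits, 2.0)  # flat: mass spread across all tokens
```

With `temperature=0.2` the top token gets over 99% of the probability mass, while at `temperature=2.0` even the weakest token keeps a substantial share, which is why very low temperatures loop and very high ones ramble.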
@JanPokorny I would expect those configs, or similar ones, to work well for our 1.3B model. If you are still having problems, please open a new issue.