Comments (11)
Thanks for the issue.
Torchtext needs to convert the string number to an int or float somewhere down the line, and it currently doesn't do this. A quick fix is to manually add a pipeline to the postprocessing argument that converts everything in the TARGETS field to a number.
Edit: just noticed that your example uses doubles; changed my code accordingly.
With a slightly modified version of your code (tab-separated file):
$ cat test.txt
1.1 test string
1.2 test string2
1.3 test string3
The following works on my machine in the meantime while we patch this:
In [1]: import torch
In [2]: from torchtext import data
In [3]: TEXT = data.Field(batch_first=True)
In [4]: TARGETS = data.Field(sequential=False, tensor_type=torch.DoubleTensor, batch_first=True, use_vocab=False, postprocessing=data.Pipeline(lambda x: float(x)))
In [5]: fields = [('targets', TARGETS), ('text', TEXT)]
In [6]: dataset = data.TabularDataset(path="test.txt", format="tsv", fields=fields)
In [7]: TEXT.build_vocab(dataset)
In [8]: train_iter = data.Iterator(dataset, batch_size=1, sort_key=lambda x: len(x.text), shuffle=True)
In [9]: batch = next(iter(train_iter))
In [10]: batch.targets
Out[10]:
Variable containing:
1.3000
[torch.cuda.DoubleTensor of size 1 (GPU 0)]
Hope that helps.
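For readers who want to see the idea without torchtext, here is a minimal plain-Python sketch of the same fix: parse the tab-separated rows and convert the target column from a string to a float before anything tries to build a tensor from it. The `RAW` string and the `load_examples` helper are hypothetical names used only for illustration; they are not part of torchtext.

```python
import csv
import io

# Stand-in for the tab-separated test.txt shown above (assumed contents).
RAW = "1.1\ttest string\n1.2\ttest string2\n1.3\ttest string3\n"


def load_examples(stream, convert_target=float):
    """Read (target, text) rows and convert the target column to a number.

    Hypothetical helper: it mirrors what a per-example conversion step
    does, namely turning the string "1.1" into the float 1.1.
    """
    examples = []
    for target, text in csv.reader(stream, delimiter="\t"):
        # Whitespace tokenization of the text column, for illustration only.
        examples.append((convert_target(target), text.split()))
    return examples


examples = load_examples(io.StringIO(RAW))
targets = [target for target, _ in examples]
print(targets)
```

Once the targets are Python floats rather than strings, building a numeric tensor from them no longer depends on the library doing the conversion for you.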
The above didn't work for me.
If anyone is still wondering, change postprocessing=data.Pipeline(lambda x: float(x)) to preprocessing=lambda x: float(x).
That made it work for me (PyTorch 0.4 and torchtext 0.2.3).
@greed2411 you don't even need the lambda. Field(use_vocab=False, preprocessing=float) is enough.
edit: It seems to work for RawField but not Field. 😕
edit2: ah, forgot to set sequential=False.
data.LabelField(dtype=torch.float, use_vocab=False, preprocessing=float) does the trick, as data.LabelField already sets sequential=False (and also removes the <unk> token).
Found use_vocab argument 😞
Even after setting use_vocab=False, I get:
RuntimeError: already counted a million dimensions in a given sequence. Most likely your items are also sequences and there's no way to infer how many dimension should the tensor have
It is the same error that you get when you try torch.DoubleTensor('1.2'). Is there something I am doing wrong?
Thanks for the solution @nelson-liu
Could you leave this open for now? There is a bug behind this that would be nice to track (the fact that we do not actually convert values with use_vocab=False to numbers). Thanks!
Sure, I agree.
Yeah, I was originally imagining that values would be provided as Python numerical types -- but that isn't really consistent with the nature of the library as loading mostly text values. Certainly if it sees strings it should convert them!
If both of my fields, target and source, are sequences, we get the same error. Any idea how to resolve this?