Comments (10)
@codelessricky I just started a quick implementation for bringing your own dataset. You can start from this branch: https://github.com/awslabs/djl/tree/model-upload/model-uploader to try it out. Please be aware that this is WIP; feel free to raise a PR if you find anything wrong or want to add more features.
from djl.
We generally try to handle the preprocessing within the dataset class. There is a helper called TextData that we use to configure how the dataset pre-processes textual data, which should be pretty helpful to you. In fact, TextDataset is just a wrapper around RandomAccessDataset that uses TextData. You may want to take a look at the StanfordMovieReview dataset, which is a good example of a text dataset. In summary, you probably want to inherit from TextDataset, but you could also use RandomAccessDataset directly.
I think there may also be some confusion about Record. A Record doesn't hold an NDArray, but an NDList. We use the NDList as a tuple of arrays as well. So, your tuple might be something like Record(data=NDList(contextPreprocessed, questionPreprocessed), labels=NDList(answerPreprocessed, indices)).
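To make the grouping above concrete, here is a minimal sketch in plain Java. It deliberately does not use DJL: Tensor and Sample are hypothetical stand-ins for NDArray and Record, and encode() is a toy placeholder for whatever preprocessing the dataset class would really apply. It only illustrates how the four SQuAD fields could be split into a data group and a labels group.

```java
import java.util.List;

public class SquadRecordSketch {

    // Hypothetical stand-in for an NDArray: a named array of integers.
    record Tensor(String name, int[] values) {}

    // Hypothetical stand-in for a Record: two ordered groups of arrays,
    // mirroring Record(data=NDList(...), labels=NDList(...)).
    record Sample(List<Tensor> data, List<Tensor> labels) {}

    // Toy preprocessing (an assumption for illustration only): each
    // whitespace-separated token is mapped to its length. A real dataset
    // would use a vocabulary or an embedding here.
    static int[] encode(String text) {
        String[] tokens = text.trim().split("\\s+");
        int[] ids = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            ids[i] = tokens[i].length();
        }
        return ids;
    }

    // Groups context and question as data, answer and its start index as labels.
    static Sample toSample(String context, String question, String answer) {
        int answerStart = context.indexOf(answer); // -1 if the answer is not in the context
        List<Tensor> data = List.of(
                new Tensor("context", encode(context)),
                new Tensor("question", encode(question)));
        List<Tensor> labels = List.of(
                new Tensor("answer", encode(answer)),
                new Tensor("indices", new int[] {answerStart}));
        return new Sample(data, labels);
    }

    public static void main(String[] args) {
        Sample s = toSample(
                "DJL is a deep learning library for Java",
                "What is DJL",
                "a deep learning library");
        System.out.println(s.data().size() + " data arrays, " + s.labels().size() + " label arrays");
    }
}
```

In an actual DJL dataset the same split would be expressed with NDList instances inside a Record, so get() can still return a single Record even though SQuAD has more than two fields per entry.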
Hi @zachgk , can you please assign this to me? I am interested in taking this up.
Hi @zachgk, I am working on adding this dataset and trying to understand the general framework. I have one basic question about the dataset artifact's metadata.json. I see that the artifact's URIs point to the djl mlrepo, so how are the artifacts added to this repo? Also, please share any resources that I can refer to. Thanks!
@ghost, once you have the files ready, we can help you upload them to the S3 bucket. @lanking520 can provide more detail regarding the format of metadata.json.
Our Slack channel may be a better way to work with you.
Thanks @frankfliu ! Will post any other clarifications on Slack! :-)
Hi @zachgk, I'm interested in this issue and I want to work on it, so I wonder if you can assign it to me? Thanks!
You've got it @WHALEEYE. Feel free to post here or reach out to me if you have any questions
Hi @zachgk, I've encountered some problems while adding SQuAD to the project.
- At first I wanted to inherit the TextDataset class, but then I found that the get() method inherited from RandomAccessDataset returns a Record object containing two NDArrays, while in SQuAD each record contains a context, a question, a list of answers and a list of indexes (we referred to PyTorch). This does not seem to fit the structure of Record, so now I am a little confused about which class I should inherit.
- Should I preprocess the strings in the data (for example, embedding) while preparing the data (like TensorFlow), or just give the user a record with the context and questions as raw strings (like PyTorch)?
Thanks for your answer!
I think I've got it. Thanks for answering!