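Our data processing pipeline works as follows:
- First, the raw datasets (e.g. ReDial and TG-ReDial) are preprocessed offline: tokenization, entity linking, and so on. Entity linking is performed against both an entity-level and a word-level knowledge graph. The resulting intermediate datasets are hosted on a cloud drive.
- After a developer deploys and runs CRSLab locally, the toolkit automatically calls the Download module to fetch these intermediate datasets and then preprocesses them further on the local machine. The data produced by the Dataset at this stage follows a unified format.
- Once the data is in this unified format, it is passed to the DataLoader, which performs the model-specific processing and dispatches the data to the models.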
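The DataLoader stage described above can be illustrated with a toy sketch. It takes examples that are already in a unified format, maps tokens to ids, and pads them into batches. The field names (`context_tokens`, `response`) and the helpers `build_vocab`, `pad` and `batches` are hypothetical and are not CRSLab's actual schema or API. Index 0 for padding and index 1 for unknown tokens are also assumptions.

```python
from typing import Dict, Iterator, List

# Hypothetical unified-format examples, as a Dataset stage might emit them.
# Field names are illustrative only, not CRSLab's real schema.
examples = [
    {"context_tokens": ["hi", "i", "like", "inception"], "response": ["try", "interstellar"]},
    {"context_tokens": ["any", "comedies"], "response": ["try", "airplane"]},
    {"context_tokens": ["thanks"], "response": ["bye"]},
]


def build_vocab(examples: List[Dict]) -> Dict[str, int]:
    """Assign ids in first-seen order; 0 = padding, 1 = unknown (assumed)."""
    tok2ind = {"<pad>": 0, "<unk>": 1}
    for ex in examples:
        for tok in ex["context_tokens"] + ex["response"]:
            tok2ind.setdefault(tok, len(tok2ind))
    return tok2ind


def pad(seqs: List[List[int]], pad_idx: int = 0) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_idx] * (max_len - len(s)) for s in seqs]


def batches(examples: List[Dict], tok2ind: Dict[str, int],
            batch_size: int) -> Iterator[Dict[str, List[List[int]]]]:
    """Model-specific step: convert tokens to ids and pad, one batch at a time."""
    for start in range(0, len(examples), batch_size):
        chunk = examples[start:start + batch_size]
        ctx = [[tok2ind.get(t, 1) for t in ex["context_tokens"]] for ex in chunk]
        resp = [[tok2ind.get(t, 1) for t in ex["response"]] for ex in chunk]
        yield {"context": pad(ctx), "response": pad(resp)}


vocab = build_vocab(examples)
all_batches = list(batches(examples, vocab, batch_size=2))
```

Different models could build different inputs from the same unified examples. For instance, a knowledge-graph-based recommender might use entity ids where a generator uses token ids. That is why this kind of processing sits after the Dataset stage rather than inside it.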
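If you want to adapt your own dataset to the toolkit, you need to:
- Tokenize the data.
- If you need knowledge graphs (e.g. for models such as KBRD and KGSF), also perform entity linking.
- Write a Dataset subclass for your dataset that converts the data into the required format.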
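The first two steps (tokenization and entity linking) can be sketched as follows. The tokenizer is a regex that only suits English; Chinese data such as TG-ReDial would need a proper word segmenter. Entity linking here is swapped in as a simple greedy longest-match lookup against a surface-form table. This is an illustration, not the linker used to build the released datasets. The table contents and the `ent:` ids are hypothetical placeholders.

```python
import re
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Toy tokenizer: lowercased alphanumeric runs plus single punctuation marks.
    English only; Chinese text needs a real word segmenter."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def link_entities(tokens: List[str],
                  surface_forms: Dict[Tuple[str, ...], str],
                  max_span: int = 4) -> List[str]:
    """Greedy longest-match linking: at each position, try the longest token
    span first and map a matching surface form to its knowledge-graph id."""
    linked: List[str] = []
    i = 0
    while i < len(tokens):
        for span in range(min(max_span, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + span])
            if key in surface_forms:
                linked.append(surface_forms[key])
                i += span
                break
        else:  # no entity starts at position i
            i += 1
    return linked


# Hypothetical surface-form table; the ids are placeholders, not real KG ids.
surface_forms = {
    ("the", "dark", "knight"): "ent:The_Dark_Knight",
    ("nolan",): "ent:Christopher_Nolan",
}
tokens = tokenize("I loved The Dark Knight, anything else by Nolan?")
entities = link_entities(tokens, surface_forms)
```

The same lookup could be run a second time with a word-level vocabulary to produce links for the word-level knowledge graph alongside the entity-level ones.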
The toolkit handles all remaining steps automatically.
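The Dataset subclass step can be sketched as below. It uses a hypothetical stand-in base class, `ToyBaseDataset`, so the example runs on its own. The method names `_load_data` and `_data_preprocess` are assumptions. Compare them with the abstract methods declared by CRSLab's actual base Dataset class before porting anything. The input layout (JSON conversations of `role`/`tokens`/`entities` utterances) and the output fields are likewise illustrative. Emitting one example per recommender turn is a design choice made for this sketch.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Split = List[dict]


class ToyBaseDataset(ABC):
    """Hypothetical stand-in for the toolkit's base Dataset class:
    load the three splits, then convert them to the unified format."""

    def __init__(self, raw: Dict[str, str]):
        train, valid, test = self._load_data(raw)
        self.train_data, self.valid_data, self.test_data = self._data_preprocess(train, valid, test)

    @abstractmethod
    def _load_data(self, raw: Dict[str, str]) -> Tuple[list, list, list]: ...

    @abstractmethod
    def _data_preprocess(self, train: list, valid: list, test: list) -> Tuple[Split, Split, Split]: ...


class MyDataset(ToyBaseDataset):
    """Subclass for a hypothetical corpus already tokenized and entity-linked offline."""

    def _load_data(self, raw):
        # Each split: JSON list of conversations; each conversation is a list
        # of {"role", "tokens", "entities"} utterances.
        return tuple(json.loads(raw[name]) for name in ("train", "valid", "test"))

    def _data_preprocess(self, train, valid, test):
        return tuple(self._convert(split) for split in (train, valid, test))

    @staticmethod
    def _convert(conversations: list) -> Split:
        """Emit one example per recommender turn that has preceding context."""
        examples = []
        for conv in conversations:
            context_tokens: List[List[str]] = []
            context_entities: List[str] = []
            for utt in conv:
                if utt["role"] == "Recommender" and context_tokens:
                    examples.append({
                        "context_tokens": [list(t) for t in context_tokens],
                        "context_entities": list(context_entities),
                        "response": list(utt["tokens"]),
                    })
                context_tokens.append(utt["tokens"])
                for ent in utt["entities"]:  # keep first-mention order, no duplicates
                    if ent not in context_entities:
                        context_entities.append(ent)
        return examples


conv = [
    {"role": "Seeker", "tokens": ["i", "like", "inception"], "entities": ["ent:Inception"]},
    {"role": "Recommender", "tokens": ["try", "interstellar"], "entities": ["ent:Interstellar"]},
    {"role": "Seeker", "tokens": ["seen", "it"], "entities": []},
    {"role": "Recommender", "tokens": ["then", "memento"], "entities": ["ent:Memento"]},
]
ds = MyDataset({"train": json.dumps([conv]), "valid": "[]", "test": "[]"})
```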
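Note: the current offline workflow (tokenization, entity linking, and other preprocessing) is fairly complex, yet it is essential for human-machine interaction (the `interact` feature). We are therefore considering bringing this part into CRSLab. If work on the next release goes smoothly, you will be able to download the raw datasets directly and preprocess them by calling utility functions provided by CRSLab.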