fruitify's Introduction

Fruitify

Dependencies

pip3 install transformers
pip3 install pytorch-lightning

Objective

Monolingual Reverse Dictionary

Given a description of a fruit, have an English BERT rank the fruits that best match the description (out of apple, banana, orange, grape and strawberry).

  • e.g.1: a red fruit of round shape -> apple / strawberry / orange / grape / banana
  • e.g.2: a yellow fruit of round shape -> orange / banana / apple / grape / strawberry

Unaligned Cross-lingual Reverse Dictionary

Given a description of a fruit in Korean, have an mBERT predict the fruits in English that best match with the Korean description.

  • e.g.1: 동그랗고 빨간 과일 -> apple / strawberry / orange / grape / banana
  • e.g.2: 동그랗고 노란 과일 -> orange / banana / apple / grape / strawberry

Note that we attempt to do so with exactly the same training dataset as is used for the monolingual one. This is to explore to what degree mBERT can compensate for unaligned data.

Implementation

We follow the architecture presented in BERT for Monolingual and Cross-Lingual Reverse Dictionary (Yan et al., 2020).

Examples

### desc: The fruit that monkeys love ###
0: ('banana', -10.440376281738281)
1: ('grape', -10.463106155395508)
2: ('strawberry', -10.712398529052734)
3: ('orange', -10.870870590209961)
4: ('apple', -11.218637466430664)
5: ('pineapple', -16.276872634887695)
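
The scores above are negative, which is consistent with summed log-probabilities. Below is a minimal sketch of how such a ranking could be produced, assuming each fruit is split into wordpieces and scored by summing the log-probability of each wordpiece at its [MASK] position. The function name `rank_fruits`, the wordpiece splits, the `[PAD]` filler, and all the numbers are hypothetical and are not taken from the repository.

```python
import math

def rank_fruits(log_probs, fruit2pieces):
    """Rank fruits by the summed log-probability of their wordpieces.

    log_probs: one dict per [MASK] position, mapping a wordpiece to its
               log-probability there (a toy stand-in for the MLM output).
    fruit2pieces: dict mapping a fruit to its list of wordpieces.
    """
    scores = {}
    for fruit, pieces in fruit2pieces.items():
        # Sum log p(piece_i | description) over the positions the fruit fills.
        scores[fruit] = sum(log_probs[i][piece] for i, piece in enumerate(pieces))
    # Highest score first, as in the example output.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Toy numbers (hypothetical): two mask positions with made-up distributions.
log_probs = [
    {"banana": math.log(0.5), "apple": math.log(0.3), "straw": math.log(0.2)},
    {"[PAD]": math.log(0.6), "##berry": math.log(0.4)},
]
fruit2pieces = {
    "banana": ["banana", "[PAD]"],        # assumed padding for short words
    "apple": ["apple", "[PAD]"],
    "strawberry": ["straw", "##berry"],   # assumed wordpiece split
}

for rank, pair in enumerate(rank_fruits(log_probs, fruit2pieces)):
    print(f"{rank}: {pair}")
```

With these toy numbers the order is banana, apple, strawberry. A real model would supply one distribution over the whole vocabulary per mask position.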

fruitify's People

Contributors

eubinecto, teang1995

fruitify's Issues

build a dataset

find at least 5 definitions for:

  • apple
  • banana
  • strawberry
  • orange
  • grape

use the following authoritative dictionaries:

  • Oxford dict
  • Cambridge dict
  • Merriam Webster
  • Longman dict
  • Macmillan dict
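
One way to track progress on this issue is a small coverage check. The sketch below assumes the dataset is a list of (fruit, definition) pairs. That layout, the helper name `missing_definitions`, and the placeholder rows are all hypothetical; real rows would quote the dictionaries listed above.

```python
from collections import Counter

FRUITS = ["apple", "banana", "strawberry", "orange", "grape"]
MIN_DEFS = 5  # "at least 5 definitions" per fruit, as stated in the issue

def missing_definitions(fruit2def):
    """Return how many definitions each fruit still needs to reach MIN_DEFS."""
    counts = Counter(fruit for fruit, _ in fruit2def)
    return {f: MIN_DEFS - counts[f] for f in FRUITS if counts[f] < MIN_DEFS}

# Placeholder rows (hypothetical), not real dictionary entries.
fruit2def = [("apple", f"<apple definition {i}>") for i in range(5)]
fruit2def += [("banana", f"<banana definition {i}>") for i in range(3)]

print(missing_definitions(fruit2def))
```

Fruits that already have five or more definitions are left out of the result, so an empty dict means the dataset is complete.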

Draft the final curriculum

Why?

I want to be an educator with a voice of my own. I want to build a curriculum that only I could teach.

How?

For now, sprint 1 has been settled in #10 (comment).
From here, let's keep adding to it over sprint 2, sprint 3 and sprint 4.

Curriculum

  • week 1
    • Implement it with an inverted index & TF-IDF: identify the strengths / recognise the problems (semantic search is not possible; words that do not appear in a definition cannot be searched for)
  • week 2. Implement it with Word2Vec: strengths over the inverted index / problems - the drawback of averaging vectors to get a sentence vector.
  1. Implement it with RNN & LSTM: strengths over Word2Vec / problems - needs a lot of data, and long sentences are still hard.
  2. Implement it with a Transformer: strengths over RNN & LSTM / problems - still needs a lot of data.
  3. Implement it with BERT: strengths over the Transformer / problems - ...what was the problem with BERT again?
  4. GPT3 & The future of NLP .... (e.g. few-shot learning)
  5. different tasks in NLP other than RD - generation, speech recognition (in particular, the point that this also needs AI),
  6. Practical tips - tokenisation, lemmatization, stemming, etc
  7. To be continued... more to fill in.
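
The week 1 item can be made concrete with a small sketch of a TF-IDF reverse dictionary and of the limitation it runs into. The two definitions, the smoothed IDF formula, and the function names here are illustrative assumptions, not the curriculum's actual material.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_tfidf(word2def):
    """Return (idf, vectors): IDF per term and a TF-IDF vector per headword."""
    docs = {word: Counter(tokenize(d)) for word, d in word2def.items()}
    n_docs = len(docs)
    # Document frequency: in how many definitions each term appears.
    df = Counter(term for counts in docs.values() for term in counts)
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    vectors = {
        word: {term: tf * idf[term] for term, tf in counts.items()}
        for word, counts in docs.items()
    }
    return idf, vectors

def search(query, idf, vectors):
    """Rank headwords by the dot product of query and definition vectors."""
    # Query words never seen in any definition are dropped here.
    q = Counter(t for t in tokenize(query) if t in idf)
    scores = {
        word: sum(tf * idf[t] * vec.get(t, 0.0) for t, tf in q.items())
        for word, vec in vectors.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Toy definitions (hypothetical).
word2def = {
    "apple": "a round red fruit",
    "banana": "a long yellow fruit",
}
idf, vectors = build_tfidf(word2def)
print(search("red fruit", idf, vectors))        # apple ranks first
print(search("crimson produce", idf, vectors))  # no overlap: every score is 0
```

The second query means roughly the same as the first, yet it matches nothing because none of its words occur in a definition. That is the gap the later weeks (Word2Vec onwards) are meant to address.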

Implement CrossLingRD

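What needs to be done?

The structure is exactly the same as MonoLingRD. The only difference is that the pretrained model it uses is mbert_mlm.

Does this need to be defined as a separate class?

Let's define it separately, so that the class states clearly which BERT it requires.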
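
A minimal sketch of that design choice, assuming a shared base class and plain string identifiers for the pretrained models. The base class `RD`, the `required_bert` attribute, and the check in `__init__` are hypothetical. The names `bert_mlm` and `mbert_mlm` come from this page, but how the repository actually passes models around is not shown here.

```python
class RD:
    """Base reverse dictionary: shared logic, parameterised by the pretrained model."""

    required_bert = None  # subclasses name the pretrained model they expect

    def __init__(self, bert_name):
        # Fail early if the wrong pretrained model is handed in.
        if bert_name != self.required_bert:
            raise ValueError(
                f"{type(self).__name__} requires {self.required_bert!r}, "
                f"got {bert_name!r}"
            )
        self.bert_name = bert_name


class MonoLingRD(RD):
    required_bert = "bert_mlm"


class CrossLingRD(RD):
    # Same structure as MonoLingRD; only the required model differs.
    required_bert = "mbert_mlm"


model = CrossLingRD("mbert_mlm")
print(type(model).__name__, model.bert_name)
```

Keeping the two classes separate costs a few lines, but the requirement is then visible in the class definition rather than hidden in a call site.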

Implement MonoFruit

Todo

Implement MonoFruit, defined in fruitify/models.py, together with its three member methods:

  • fruitify
  • forward
  • training_step

Testing (pseudo)

python3 -m fruitify.scripts.train --fruit_type="mono" --k=5 --max_epochs=10
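
The flags in that command could be parsed along the following lines. Only the flag names and the example values come from the command above. The types, defaults, and the `build_parser` helper are assumptions, and the real fruitify.scripts.train may differ.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train a fruit reverse dictionary.")
    # Defaults below mirror the example command; they are assumptions.
    parser.add_argument("--fruit_type", type=str, default="mono")
    parser.add_argument("--k", type=int, default=5)
    parser.add_argument("--max_epochs", type=int, default=10)
    return parser

# Parse an explicit list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--fruit_type=mono", "--k=5", "--max_epochs=10"])
print(args.fruit_type, args.k, args.max_epochs)
```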

Support

Explore the scripts in fruitify/examples for:

  • exploring bert_mlm
  • exploring bert_tokenizer
  • exploring the fruit2def dataset
  • implementing cross entropy in Pytorch
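
For the last item, it may help to see the quantity itself before reading the PyTorch version. The sketch below computes the cross entropy of a single example from raw logits in plain Python, which is the per-example value that `torch.nn.functional.cross_entropy` computes before it averages over a batch. The logits are made up, and the example scripts themselves are not reproduced here.

```python
import math

def cross_entropy(logits, target):
    """Cross entropy of one example: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Made-up logits over five classes (e.g. the five fruits).
logits = [2.0, 0.5, 0.1, -1.0, 0.0]
print(cross_entropy(logits, target=0))  # small loss: class 0 has the top logit
print(cross_entropy(logits, target=3))  # larger loss: class 3 has the lowest logit
```

With all logits equal, the loss is log(number of classes), which is a handy sanity check when debugging a training step.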
