
brightmart / text_classification


all kinds of text classification models and more with deep learning

License: MIT License

Python 94.28% Jupyter Notebook 5.72%
classification nlp fasttext textcnn textrnn tensorflow multi-label multi-class attention-mechanism text-classification

text_classification's Introduction

Text Classification

The purpose of this repository is to explore text classification methods in NLP with deep learning.

Update:

Customize an NLP API in three minutes, for free: NLP API Demo

Language Understanding Evaluation benchmark for Chinese (CLUE benchmark): run 10 tasks & 9 baselines with one line of code, with a detailed performance comparison.

Released pre-trained ALBERT_Chinese models, trained on 30G+ of raw Chinese corpus (xxlarge, xlarge and more), targeting state-of-the-art performance for Chinese, 2019-Oct-7, during the National Day of China!

Large Amount of Chinese Corpus for NLP Available!

Google's BERT achieved new state-of-the-art results on more than 10 NLP tasks by pre-training a language model and then fine-tuning it.

Pre-train TextCNN: an idea from BERT for language understanding, with running code and data set.

Introduction

this repository has all kinds of baseline models for text classification.

it also supports multi-label classification, where multiple labels are associated with a sentence or document.

although many of these models are simple and may not get you to the top of the leaderboard for your task, some of them are classics, so they serve well as baseline models. each model has a test function under its model class; you can run it on a toy task first. the models are independent of the data set.

check here for a formal report on large-scale multi-label text classification with deep learning

several models here can also be used for modelling question answering (with or without context), or for sequence generation.

we explore two seq2seq models (seq2seq with attention, and the Transformer from 'Attention Is All You Need') for text classification. these two models can also be used for sequence generation and other tasks. if your task is multi-label classification, you can cast the problem as sequence generation.

we implement two memory networks. one is the dynamic memory network. it previously reached state of the art in question answering, sentiment analysis and sequence generation tasks; it is the so-called "one model to do several different tasks" and reaches high performance. it has four modules; the key component is the episodic memory module. it uses a gate mechanism to perform attention and a gated GRU to update the episodic memory, then another GRU (in the vertical direction) to update the hidden state. it has the ability to do transitive inference.

the second memory network we implemented is the recurrent entity network: tracking the state of the world. it has blocks of key-value pairs as memory which run in parallel, and it achieved a new state of the art. it can be used to model question answering with context (or history). for example, you can let the model read some sentences (as context) and ask a question (as query), then ask the model to predict an answer; if you feed the same text as both story and query, it can do a classification task.

To discuss ML/DL/NLP problems and get tech support from each other, you can join QQ group: 836811304

Models:

  1. fastText

  2. TextCNN

  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  4. TextRNN

  5. RCNN

  6. Hierarchical Attention Network

  7. seq2seq with attention

  8. Transformer ("Attention Is All You Need")

  9. Dynamic Memory Network

  10. EntityNetwork: tracking the state of the world

  11. Ensemble models

  12. Boosting:

    for a single model, stack identical models together. each layer is a model, and the result is based on the logits of all layers added together. the only connection between layers is the labels' weights: the previous layer's per-label prediction error rate becomes the weight for the next layer. labels with a high error rate get a big weight, so later layers pay more attention to those mis-predicted labels and try to fix the mistakes of the former layers. as a result, we get a much stronger model (a minimal sketch follows below). check a00_boosting/boosting.py
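a minimal sketch of this boosting idea, assuming hypothetical per-layer fit/predict_logits methods and a label_weights argument (these names are illustrative, not the repository's API; the real implementation is in a00_boosting/boosting.py):

    import numpy as np

    def boosting_pass(layers, x, y_true):
        """Train identical layers one after another; sum their logits, and derive
        per-label weights for the next layer from the current per-label error rate."""
        total_logits = np.zeros_like(y_true, dtype=np.float32)
        label_weights = np.ones(y_true.shape[1], dtype=np.float32)   # uniform at first
        for layer in layers:
            layer.fit(x, y_true, label_weights=label_weights)        # hypothetical per-layer API
            total_logits += layer.predict_logits(x)                  # logits added together
            pred = (total_logits > 0).astype(y_true.dtype)           # current ensemble decision
            error_rate = (pred != y_true).mean(axis=0)               # per-label error rate
            label_weights = 1.0 + error_rate                         # high-error labels get big weight
        return total_logits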

and other models:

  1. BiLstmTextRelation;

  2. twoCNNTextRelation;

  3. BiLstmTextRelationTwoRNN

Performance

(multi-label prediction task, asked to predict the top 5 labels, 3 million training examples, full score: 0.5)

| Model    | fastText | TextCNN | TextRNN | RCNN  | HierAtteNet | Seq2seqAttn | EntityNet | DynamicMemory | Transformer |
|----------|----------|---------|---------|-------|-------------|-------------|-----------|---------------|-------------|
| Score    | 0.362    | 0.405   | 0.358   | 0.395 | 0.398       | 0.322       | 0.400     | 0.392         | 0.322       |
| Training | 10m      | 2h      | 10h     | 2h    | 2h          | 3h          | 3h        | 5h            | 7h          |

the BERT model achieves 0.368 on the validation set after the first 9 epochs.

Ensemble of TextCNN, EntityNet, DynamicMemory: 0.411

Ensemble of EntityNet, DynamicMemory: 0.403


Notice:

m stands for minutes; h stands for hours;

HierAtteNet means Hierarchical Attention Network;

Seq2seqAttn means Seq2seq with attention;

DynamicMemory means DynamicMemoryNetwork;

Transformer stands for the model from 'Attention Is All You Need'.

Usage:

  1. model is in xxx_model.py
  2. run python xxx_train.py to train the model
  3. run python xxx_predict.py to do inference (test).

Each model has a test method under the model class. You can run the test method first to check whether the model works properly.


Environment:

python 2.7 + tensorflow 1.8

(tensorflow 1.1 to 1.13 should also work; most models should also work fine with other tensorflow versions, since we use very few features bound to a specific version.)

if you use python 3, it will be fine as long as you change the print statements and try/except syntax in case you meet any error.

the TextCNN model has already been ported to python 3.6.

Sample data: cached file on Baidu or Google Drive: send me an email

to help you run this repository, we re-generated the training/validation/test data and the vocabulary/labels, and saved them as cache files using h5py. we suggest you download them from the link above. they contain everything you need to run this repository: the data is pre-processed, so you can start training a model in a minute.

it's a zip file of about 1.8G containing 3 million training examples. although it's quite big after unzipping, with the help of hdf5 it only needs a normal amount of memory (e.g. 8G or less) during training.

we use the jupyter notebook pre-processing.ipynb to pre-process the data. you can get a better understanding of this task and the data by taking a look at it. you can also generate the data yourself in any way you want; just change a few lines of code in this jupyter notebook.

If you want to try a model now, you can download the cached file from above, then go to folder 'a02_TextCNN' and run

 python  p7_TextCNN_train.py 

it will use data from cached files to train the model, and print loss and F1 score periodically.

old sample data source: if you need some sample data and a word embedding pre-trained with word2vec, you can find it in closed issues, such as issue 3.

you can also find some sample data in the folder "data". it contains two files: 'sample_single_label.txt', which contains 50k examples with a single label, and 'sample_multiple_label.txt', which contains 20k examples with multiple labels. the input and the labels are separated by '__label__'.

if you want to know more detail about the data set, or the tasks these models can be used for, one choice is below:

https://biendata.com/competition/zhihu/

Road Map

One way you can use this repository:

step 1: you can read through this article. you will get a general idea of various classic models used to do text classification.

step 2: pre-process data and/or download cached file.

  a. take a look at the jupyter notebook ('pre-processing.ipynb'), where you can get familiar with this text classification task and data set. you will also see how we pre-process the data and generate the training/validation/test sets. there is a list of things you can try at the end of this notebook.

  b. download the zip file that contains the cached files, so you will have all the necessary data and can start to train models.

step 3: run some of the models listed here, and change some code and configurations as you want, to get good performance.

  record performances, things you did that worked, and things that did not.

  for example, you can take this sequence to explore:

  1) fasttext---> 2)TextCNN---> 3)Transformer---> 4)BERT

additionally, write your own article about this topic; you can follow the style of a paper. you may need to read some papers on the way, many of which are listed in the Reference section at the end of this article; or join a machine learning competition and apply what you've learned.

Use Your Own Data:

replace the data in 'data/sample_multiple_label.txt', and make sure the format is as below:

'word1 word2 word3 __label__l1 __label__l2 __label__l3'

where part1, 'word1 word2 word3', is the input (X), and part2, '__label__l1 __label__l2 __label__l3', means there are three labels: [l1,l2,l3]. between part1 and part2 there should be a single space: ' '.

for example, each line (with multiple labels) looks like:

'w5466 w138990 w1638 w4301 w6 w470 w202 c1834 c1400 c134 c57 c73 c699 c317 c184 __label__5626661657638885119 __label__4921793805334628695 __label__8904735555009151318'

where '5626661657638885119', '4921793805334628695' and '8904735555009151318' are three labels associated with the input string 'w5466 w138990...c699 c317 c184'
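a minimal sketch of how one line in this format can be split into input tokens and labels (illustrative only; the repository's own loader is load_data_multilabel() in data_util.py, mentioned below):

    def parse_line(line):
        """Split one line of the 'word1 word2 __label__l1 __label__l2' format into (words, labels)."""
        tokens = line.strip().split()
        words  = [t for t in tokens if not t.startswith('__label__')]
        labels = [t[len('__label__'):] for t in tokens if t.startswith('__label__')]
        return words, labels

    words, labels = parse_line('w5466 w138990 w1638 __label__5626661657638885119 __label__4921793805334628695')
    print(words)   # ['w5466', 'w138990', 'w1638']
    print(labels)  # ['5626661657638885119', '4921793805334628695']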

Notice:

Some util functions are in data_util.py; check load_data_multilabel() in data_util.py for how the input and labels are processed from raw data.

there is also a function to load a pretrained word embedding and assign it to the model, where the word embedding was pretrained with word2vec or fastText.

Pretrained Word Embedding:

if word2vec.load does not work, you can load the pretrained word embedding (especially a Chinese word embedding) with the following lines:

    import gensim
    from gensim.models import KeyedVectors

    # load a binary word2vec-format embedding file; ignore malformed unicode entries
    word2vec_model = KeyedVectors.load_word2vec_format(word2vec_model_path, binary=True, unicode_errors='ignore')

or you can set the flag for using a pretrained word embedding to False to disable loading the word embedding.
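a minimal TF 1.x sketch of loading the pretrained vectors and assigning them to the model's embedding variable (the attribute name model.Embedding and the helper name are assumptions for illustration; the repository's own helper lives in data_util.py):

    import numpy as np
    import tensorflow as tf
    from gensim.models import KeyedVectors

    def assign_pretrained_embedding(sess, model, index2word, vocab_size, embed_size, word2vec_model_path):
        """Build an embedding matrix from pretrained word2vec vectors and copy it into the model."""
        kv = KeyedVectors.load_word2vec_format(word2vec_model_path, binary=True, unicode_errors='ignore')
        bound = np.sqrt(6.0) / np.sqrt(vocab_size)
        embedding = np.random.uniform(-bound, bound, (vocab_size, embed_size)).astype(np.float32)
        for i in range(vocab_size):
            word = index2word[i]
            if word in kv:                  # keep random init for out-of-vocabulary words
                embedding[i] = kv[word]
        sess.run(tf.assign(model.Embedding, embedding))   # model.Embedding: [vocab_size, embed_size] variable (assumed name)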

Models Detail:

1.fastText:

implementation of Bag of Tricks for Efficient Text Classification

after embedding each word in the sentence, the word representations are averaged into a text representation, which is in turn fed to a linear classifier. it uses a softmax function to compute the probability distribution over the predefined classes, then cross entropy is used to compute the loss. a bag-of-words representation does not consider word order; in order to take word order into account, n-gram features are used to capture some partial information about the local word order. when the number of classes is large, computing the linear classifier is computationally expensive, so it uses hierarchical softmax to speed up training.

  1. use bi-gram and/or tri-gram
  2. use NCE loss to speed up the softmax computation (instead of the hierarchical softmax of the original paper)

result: performance is as good as the paper reports, and it is also very fast.

check: p5_fastTextB_model.py
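a minimal TF 1.x sketch of the fastText idea (average the word embeddings, then a linear classifier trained with NCE loss); variable names are illustrative and this is not the code of p5_fastTextB_model.py:

    import tensorflow as tf  # TF 1.x style

    def fasttext_logits_and_loss(sentence_ids, labels, vocab_size, embed_size, num_classes, num_sampled=100):
        """sentence_ids: [batch, seq_len] word ids; labels: [batch, 1] class ids."""
        embedding = tf.get_variable("Embedding", [vocab_size, embed_size])
        w = tf.get_variable("W", [num_classes, embed_size])       # NCE weights: [classes, dim]
        b = tf.get_variable("b", [num_classes])

        words = tf.nn.embedding_lookup(embedding, sentence_ids)   # [batch, seq_len, embed_size]
        sentence_vec = tf.reduce_mean(words, axis=1)              # average -> [batch, embed_size]

        # NCE loss samples a few negative classes instead of computing the full softmax
        loss = tf.reduce_mean(tf.nn.nce_loss(weights=w, biases=b, labels=labels, inputs=sentence_vec,
                                             num_sampled=num_sampled, num_classes=num_classes))
        logits = tf.matmul(sentence_vec, w, transpose_b=True) + b  # full logits for inference
        return logits, loss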

alt text

2.TextCNN:

Implementation of Convolutional Neural Networks for Sentence Classification

Structure:embedding--->conv--->max pooling--->fully connected layer-------->softmax

Check: p7_TextCNN_model.py

In order to get very good results with TextCNN, you also need to read this paper carefully: A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification. it gives you some insights into the things that can affect performance, although you will need to change some settings according to your specific task.

The Convolutional Neural Network is the main building block for solving problems in computer vision. Now we will show how a CNN can be used for NLP, in particular for text classification. Sentence lengths differ from one sentence to another, so we pad them to a fixed length n. For each token in the sentence, we use a word embedding to get a fixed-dimension vector of size d. So our input is a 2-dimensional matrix: (n,d). This is similar to an image for a CNN.

Firstly, we apply a convolution to the input: an element-wise multiplication between a filter and part of the input. We use k filters, each a 2-dimensional matrix of size (f,d). The output is then k lists, each of length n-f+1, where each element is a scalar. Notice that the second dimension of the filter is always the dimension of the word embedding. We use different filter sizes to get rich features from the text input, which is similar to n-gram features.

Secondly, we apply max pooling to the output of the convolution: from the k lists we get k scalars.

Thirdly, we concatenate the scalars to form the final feature vector. It has a fixed size, independent of the filter sizes we use.

Finally, we use a linear layer to project these features onto the pre-defined labels.
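a minimal TF 1.x sketch of the four steps above (convolutions with several filter sizes, max pooling over time, concatenation, linear projection); parameter names are illustrative, not those of p7_TextCNN_model.py:

    import tensorflow as tf  # TF 1.x style

    def textcnn_logits(embedded, filter_sizes, num_filters, num_classes):
        """embedded: [batch, n, d] padded sentence of word embeddings."""
        d = embedded.get_shape().as_list()[2]
        x = tf.expand_dims(embedded, -1)                    # [batch, n, d, 1] for conv2d
        pooled = []
        for f in filter_sizes:                              # e.g. [3, 4, 5]
            w = tf.get_variable("filter_%d" % f, [f, d, 1, num_filters])
            conv = tf.nn.relu(tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="VALID"))
            # conv: [batch, n-f+1, 1, num_filters]; max over time keeps one scalar per filter
            pooled.append(tf.reduce_max(conv, axis=1))      # [batch, 1, num_filters]
        features = tf.reshape(tf.concat(pooled, axis=2), [-1, num_filters * len(filter_sizes)])
        w_proj = tf.get_variable("W_projection", [num_filters * len(filter_sizes), num_classes])
        b_proj = tf.get_variable("b_projection", [num_classes])
        return tf.matmul(features, w_proj) + b_proj         # logits; softmax is applied in the loss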

alt text


3.BERT:

Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT currently achieves state-of-the-art results on more than 10 NLP tasks. the key idea behind this model is that we can pre-train it using one kind of language model on a huge amount of raw data, which is easy to find. as most of the model's parameters are pre-trained, only the last classifier layer needs to be changed for different tasks. as a result, this model is generic and very powerful: you can just fine-tune the pre-trained model within a short period of time.

however, this model is quite big. with a sequence length of 128, you may only be able to train with a batch size of 32; for long documents with a sequence length of 512, you can only train with a batch size of 4 on a normal GPU (with 11G of memory); and very few people can pre-train this model from scratch, as it takes many days or weeks to train and a normal GPU's memory is too small for it.

Specifically, the backbone model is the Transformer, which you can find in Attention Is All You Need. it uses two kinds of tasks to pre-train the model.

Masked Language Model

generally speaking, given a sentence, some percentage of the words are masked and you need to predict the masked words based on the masked sentence. the masked words are chosen randomly.

we feed the input through a deep Transformer encoder and then use the final hidden states corresponding to the masked

positions to predict what word was masked, exactly like we would train a language model.

in the source file, each line is a sequence of tokens, which can be a sentence.

Input Sequence  : The man went to [MASK] store with [MASK] dog
Target Sequence :                  the                his
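a toy sketch of how tokens could be masked for this task (the real BERT code uses more elaborate rules, e.g. sometimes keeping the original token or replacing it with a random one instead of [MASK]):

    import random

    def mask_tokens(tokens, mask_prob=0.15, mask_token='[MASK]'):
        """Randomly mask a fraction of tokens; return the masked sequence and the prediction targets."""
        masked, targets = [], []
        for tok in tokens:
            if random.random() < mask_prob:
                masked.append(mask_token)
                targets.append(tok)        # the model must predict this original token
            else:
                masked.append(tok)
                targets.append(None)       # no prediction needed at this position
        return masked, targets

    print(mask_tokens("the man went to the store with his dog".split()))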

Next Sentence Prediction

many language understanding tasks, like question answering and inference, need to understand the relationship between sentences. however, a language model alone only understands text within a single sentence. next sentence prediction is a simple task to help the model do better on these kinds of tasks.

50% of the time the second sentence is the actual next sentence of the first one, and 50% of the time it is not.

given two sentences, the model is asked to predict whether the second sentence is the real next sentence of the first one.

Input : [CLS] the man went to the store [SEP] he bought a gallon of milk [SEP]
Label : IsNext

Input = [CLS] the man heading to the store [SEP] penguin [MASK] are flight ##less birds [SEP]
Label = NotNext

How to use BERT?

basically, you can download the pre-trained model and just fine-tune it on your task with your own data.

for a classification task, you can add a processor to define the format of the input and labels from the source data.

Use BERT for multi-label classification?

run the following command under folder a00_Bert:

  python  train_bert_multi-label.py

It achieves 0.368 after 9 epochs. or you can run multi-label classification with downloadable data using BERT from

sentiment_analysis_fine_grain with BERT

Use BERT for online prediction

you can use a session-and-feed style to restore the model and feed data, then get the logits to make an online prediction.

online prediction with BERT

originally, it trains or evaluates the model based on files, not online.

How to get better model for BERT?

firstly, you can use a pre-trained model downloaded from Google. run a few epochs on your dataset and find a suitable sequence length.

secondly, you can pre-train the base model on your own data, as long as you can find a dataset related to your task, and then fine-tune it on your specific task.

thirdly, you can change the loss function and the last layer to better suit your task.

additionally, you can define some extra pre-training tasks that will help the model understand your task much better.

as we learned from experiments, the pre-training task is independent of the model, and pre-training is not limited to the tasks above.


4.TextRNN

Structure v1:embedding--->bi-directional lstm--->concat output--->average----->softmax layer

check: p8_TextRNN_model.py
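a minimal TF 1.x sketch of structure v1 (bi-directional LSTM, concatenate the two directions, average over time, then a softmax layer); names are illustrative, not those of p8_TextRNN_model.py:

    import tensorflow as tf  # TF 1.x style

    def textrnn_logits(embedded, hidden_size, num_classes):
        """embedded: [batch, seq_len, embed_size]."""
        fw = tf.nn.rnn_cell.LSTMCell(hidden_size)
        bw = tf.nn.rnn_cell.LSTMCell(hidden_size)
        (out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(fw, bw, embedded, dtype=tf.float32)
        output = tf.concat([out_fw, out_bw], axis=2)      # [batch, seq_len, 2*hidden_size]
        output = tf.reduce_mean(output, axis=1)           # average over time steps
        w = tf.get_variable("W_projection", [2 * hidden_size, num_classes])
        b = tf.get_variable("b_projection", [num_classes])
        return tf.matmul(output, w) + b                   # logits for the softmax layer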

alt text

Structure v2: embedding-->bi-directional lstm---->dropout-->concat output--->lstm--->dropout-->FC layer-->softmax layer

check: p8_TextRNN_model_multilayer.py

alt text


5.BiLstmTextRelation

Structure is the same as TextRNN, but the input is specially designed. e.g. input: "how much is the computer? EOS price of laptop", where 'EOS' is a special token splitting question1 and question2.

check:p9_BiLstmTextRelation_model.py


6.twoCNNTextRelation

Structure: first use two different convolutional networks to extract features from the two sentences, then concatenate the two features, use a linear transform layer to project onto the target labels, then softmax.

check: p9_twoCNNTextRelation_model.py


7.BiLstmTextRelationTwoRNN

Structure: one bi-directional lstm for one sentence (get output1), another bi-directional lstm for the other sentence (get output2). then: softmax(output1 · M · output2), where M is a trainable matrix.

check:p9_BiLstmTextRelationTwoRNN_model.py

for more detail you can go to: Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow


8.RCNN:

Recurrent convolutional neural network for text classification

implementation of Recurrent Convolutional Neural Network for Text Classification

structure: 1) recurrent structure (convolutional layer) 2) max pooling 3) fully connected layer + softmax

it learns a representation of each word in the sentence or document from its left-side context and right-side context:

representation of current word = [left_side_context_vector, current_word_embedding, right_side_context_vector].

for the left-side context it uses a recurrent structure, a non-linear transform of the previous word and the previous left-side context; the right-side context is computed similarly (see the sketch below).
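a minimal NumPy sketch of the left-context recurrence described above, assuming a non-linearity f and weight matrices W_l (context-to-context) and W_sl (word-to-context); this only illustrates the recurrence, it is not the repository's code:

    import numpy as np

    def left_contexts(word_embs, W_l, W_sl, f=np.tanh):
        """c_l(w_i) = f(W_l @ c_l(w_{i-1}) + W_sl @ e(w_{i-1})); word_embs: [n, d]."""
        n, _ = word_embs.shape
        h = W_l.shape[0]
        contexts = np.zeros((n, h))
        c = np.zeros(h)                            # empty context before the first word
        for i in range(n):
            contexts[i] = c                        # context built from the words left of word i
            c = f(W_l @ c + W_sl @ word_embs[i])   # becomes the left context of word i+1
        return contexts

    # the right contexts use the same recurrence run over the reversed sentence;
    # the final word representation is [left_context_i, embedding_i, right_context_i].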

check: p71_TextRCNN_model.py

alt text


9.Hierarchical Attention Network:

Implementation of Hierarchical Attention Networks for Document Classification

Structure:

  1. embedding

  2. Word Encoder: word level bi-directional GRU to get rich representation of words

  3. Word Attention:word level attention to get important information in a sentence

  4. Sentence Encoder: sentence level bi-directional GRU to get rich representation of sentences

  5. Sentence Attention: sentence-level attention to get the important sentences among the sentences

  6. FC+Softmax

alt text

In NLP, text classification can be done for a single sentence, but it can also be applied to multiple sentences; we may call this document classification. Words form sentences, and sentences form a document. In this circumstance, there may exist an intrinsic structure. So how can we model this kind of task? Are all parts of a document equally relevant? And how do we determine which parts are more important than others?

It has two unique features:

1) it has a hierarchical structure that reflects the hierarchical structure of documents;

2) it has two levels of attention mechanisms, applied at the word and sentence level. this enables the model to capture important information at different levels.

Word Encoder: each word in a sentence is embedded into a word vector in a distributed vector space. A bidirectional GRU is used to encode the sentence; by concatenating the vectors from the two directions, we form a representation of each word that also captures contextual information.

Word Attention: some words are more important than others for the sentence, so an attention mechanism is used. It first uses a one-layer MLP to get u_it, a hidden representation of the word, then measures the importance of the word as the similarity of u_it with a word-level context vector u_w, and gets a normalized importance weight through a softmax function (see the sketch below).
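a minimal TF 1.x sketch of this word-level attention (u_it = tanh(W h_it + b), importance = softmax over u_it · u_w, sentence vector = weighted sum of the word states); names are illustrative:

    import tensorflow as tf  # TF 1.x style

    def word_attention(hidden_states, attention_size):
        """hidden_states: [batch, num_words, 2*hidden] from the word-level bi-GRU."""
        u_it = tf.layers.dense(hidden_states, attention_size, activation=tf.nn.tanh)  # one-layer MLP
        u_w = tf.get_variable("word_context_vector", [attention_size])
        scores = tf.tensordot(u_it, u_w, axes=1)          # [batch, num_words] similarity with u_w
        alpha = tf.nn.softmax(scores, axis=1)             # normalized importance per word
        sentence_vector = tf.reduce_sum(hidden_states * tf.expand_dims(alpha, -1), axis=1)
        return sentence_vector                            # [batch, 2*hidden]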

Sentence Encoder: the sentence vectors are encoded with a bidirectional GRU, similarly to the word encoder.

Sentence Attention: a sentence-level context vector is used to measure the importance among sentences, similarly to the word attention.

Input of data:

Generally speaking, the input of this model should be several sentences instead of a single sentence. the shape is [None, sentence_length], where None is the batch_size.

In my training data, each example has four parts, and each part has the same length. i concatenate the four parts to form one single sentence. the model will split the sentence into four parts, forming a tensor with shape [None, num_sentences, sentence_length], where num_sentences is the number of sentences (equal to 4 in my setting).

check:p1_HierarchicalAttention_model.py

for attentive attention you can check attentive attention


10.Seq2seq with attention

Implementation of seq2seq with attention, derived from NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE

I.Structure:

1) embedding 2) bi-GRU to get a rich representation of the source sentence (forward & backward) 3) decoder with attention.

alt text

II.Input of data:

there are three kinds of inputs: 1) encoder inputs, which is a sentence; 2) decoder inputs, which is a list of labels with fixed length; 3) target labels, which is also a list of labels.

for example, if the labels are "L1 L2 L3 L4", then the decoder inputs will be [_GO,L1,L2,L3,L4,_PAD] and the target labels will be [L1,L2,L3,L4,_END,_PAD]. the length is fixed to 6; any labels in excess will be truncated, and padding is added if there are not enough labels.

III.Attention Mechanism:

  1. take the list of encoder outputs and the hidden state of the decoder.

  2. calculate the similarity of the decoder hidden state with each encoder output, to get a probability distribution over the encoder positions.

  3. compute the weighted sum of the encoder outputs based on this probability distribution,

    then go through the RNN cell using this weighted sum together with the decoder input to get the new hidden state (a minimal sketch follows below).
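a minimal TF 1.x sketch of one such attention step (dot-product similarity, softmax, weighted sum, then the RNN cell); it assumes the decoder state has the same size as the encoder outputs, and the names are illustrative:

    import tensorflow as tf  # TF 1.x style

    def attention_step(cell, decoder_input, prev_state, encoder_outputs):
        """encoder_outputs: [batch, src_len, dim]; prev_state: [batch, dim] (same dim assumed)."""
        # similarity of the decoder hidden state with every encoder output
        scores = tf.reduce_sum(encoder_outputs * tf.expand_dims(prev_state, 1), axis=2)
        weights = tf.nn.softmax(scores, axis=1)                       # [batch, src_len]
        context = tf.reduce_sum(encoder_outputs * tf.expand_dims(weights, -1), axis=1)
        # feed the weighted sum (context) together with the decoder input into the RNN cell
        output, new_state = cell(tf.concat([decoder_input, context], axis=1), prev_state)
        return output, new_state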

IV.How Vanilla Encoder Decoder Works:

the source sentence is encoded by an RNN into a fixed-size vector ("thought vector"). then, during decoding:

  1. during training, another RNN tries to produce a word using this "thought vector" as its initial state, taking input from the decoder input at each timestep. decoding starts from the special token "_GO". after one step is performed, the new hidden state, together with the new input, lets us continue this process until we reach the special token "_END". we can calculate the loss by computing the cross-entropy of the logits and the target label. the logits are obtained through a projection layer applied to the hidden state (i.e. the output of the decoder step; with a GRU we can just use the decoder hidden states as outputs).

  2. during testing, there is no label, so we should feed the output we got from the previous timestep and continue the process until we reach the "_END" token.

V.Notices:

  1. here i use two kinds of vocabularies: one is for words, used by the encoder; the other is for labels, used by the decoder.

  2. for the vocabulary of labels, i insert three special tokens: "_GO", "_END", "_PAD"; "_UNK" is not used, since all labels are pre-defined.


11.Transformer("Attention Is All You Need")

Status: it is able to do the classification task, and to generate the reverse order of its input sequence in a toy task. you can check it by running the test function in the model. check: a2_train_classification.py (train) or a2_transformer_classification.py (model)

we do it in a parallel style. layer normalization, residual connections and masks are also used in the model.

For every building block, we include a test function in each file below, and we've tested each small piece successfully.

Sequence to sequence with attention is a typical model for solving sequence generation problems, such as translation or dialogue systems. most of the time, it uses an RNN as the building block for these tasks. until recently, people also applied convolutional neural networks to the sequence-to-sequence problem. the Transformer, however, performs these tasks solely with the attention mechanism. it is fast and achieves new state-of-the-art results.

alt text

It also has two main parts: encoder and decoder. below is a description from the paper:

Encoder:

6 layers. each layer has two sub-layers: the first is a multi-head self-attention mechanism; the second is a position-wise fully connected feed-forward network. for each sub-layer, use LayerNorm(x + Sublayer(x)). all dimensions are 512.

Decoder:

  1. The decoder is composed of a stack of N= 6 identical layers.
  2. In addition to the two sub-layers in each encoder layer, the decoder inserts a third sub-layer, which performs multi-head attention over the output of the encoder stack.
  3. Similar to the encoder, we employ residual connections around each of the sub-layers, followed by layer normalization. We also modify the self-attention sub-layer in the decoder stack to prevent positions from attending to subsequent positions. This masking, combined with fact that the output embeddings are offset by one position, ensures that the predictions for position i can depend only on the known outputs at positions less than i.

Main Take away from this model:

  1. multi-head self-attention: use self-attention, applying linear transforms multiple times to get projections of queries, keys and values, then do ordinary attention (see the sketch below);

  2. some tricks to improve performance (residual connection, position encoding, position-wise feed forward, label smoothing, masks to ignore things we want to ignore).
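a minimal NumPy sketch of scaled dot-product attention with multiple heads, the core of point 1 above (single example, no masking); it illustrates the mechanism rather than the code in a2_transformer_classification.py:

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_self_attention(x, W_q, W_k, W_v, num_heads):
        """x: [seq_len, d_model]; W_q/W_k/W_v: [d_model, d_model] linear projections."""
        seq_len, d_model = x.shape
        d_head = d_model // num_heads
        q = (x @ W_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)  # [heads, n, d_head]
        k = (x @ W_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
        v = (x @ W_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # scaled dot products, [heads, n, n]
        heads = softmax(scores) @ v                           # ordinary attention per head
        return heads.transpose(1, 0, 2).reshape(seq_len, d_model)  # concatenate the heads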

Use this model to do task classification:

Here we only use the encoder part for the classification task, removed the residual connection and used only 1 layer; there is no need to use a mask. we use multi-head attention and position-wise feed forward to extract features of the input sentence, then use a linear layer to project them to get the logits.

for detail of the model, please check: a2_transformer_classification.py


12.Recurrent Entity Network

Input: 1. story: multiple sentences, used as context. 2. query: a sentence, which is a question. 3. answer: a single label.

Model Structure:

  1. Input encoding: use bag of words to encode the story (context) and the query (question); take position into account by using a position mask.

    by using a bi-directional rnn to encode the story and query, performance boosts from 0.392 to 0.398, an increase of 1.5%.

  2. Dynamic memory:

a. compute the gate using the 'similarity' of the keys and values with the encoded input of the story.

b. get a candidate hidden state by transforming each key, value and the input.

c. combine the gate and the candidate hidden state to update the current hidden state (see the sketch after the output module below).

  3. Output module (uses an attention mechanism): a. get a probability distribution by computing the 'similarity' of the query and the hidden states.

b. get a weighted sum of the hidden states using this probability distribution.

c. apply a non-linear transform of the query and the weighted hidden state to predict the label.
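a minimal NumPy sketch of the dynamic-memory update for one memory block j (gate from the 'similarity' of the input with the block's key and value, candidate hidden state, gated combination, normalization), in the spirit of the recurrent entity network paper; this is an illustration, not a3_entity_network.py:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def entity_block_update(s, h_j, w_j, U, V, W, phi=np.tanh):
        """s: encoded input sentence, h_j: block value (hidden state), w_j: block key;
        U, V, W: square weight matrices of the same size as the vectors."""
        gate = sigmoid(s @ h_j + s @ w_j)                # 'similarity' of the input with value and key
        candidate = phi(U @ h_j + V @ w_j + W @ s)       # candidate hidden state
        h_new = h_j + gate * candidate                   # gated update of this block
        return h_new / (np.linalg.norm(h_new) + 1e-8)    # normalize (acts as a forgetting mechanism)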

alt text

Main take away from this model:

  1. use blocks of keys and values, which are independent from each other, so they can be run in parallel.

  2. model the context and the question together: use memory to track the state of the world, and use a non-linear transform of the hidden state and the question (query) to make a prediction.

  3. a simple model can also achieve very good performance; the encoding is as simple as bag of words.

for detail of the model, please check: a3_entity_network.py

under this model, there is a test function which asks the model to count numbers in both the story (context) and the query (question), where the weight of the story is smaller than that of the query.


13.Dynamic Memory Network

Outlook of Model:

1.Input Module: encode raw texts into vector representation

2.Question Module: encode question into vector representation

3.Episodic Memory Module: given the inputs, it chooses which parts of the inputs to focus on through the attention mechanism, taking into account the question and the previous memory ====> it produces a 'memory' vector.

4.Answer Module: generate an answer from the final memory vector.

alt text

Detail:

1.Input Module:

a. single sentence: use a gru to get the hidden state. b. list of sentences: use a gru to get the hidden state for each sentence, e.g. [hidden state 1, hidden state 2, ..., hidden state n].

2.Question Module: use gru to get hidden state

3.Episodic Memory Module:

use an attention mechanism and a recurrent network to update its memory.

a. gate as attention mechanism:

 a two-layer feed-forward neural network. the input is the candidate fact c, the previous memory m and the question q. the features are obtained by taking element-wise products, matrix products and absolute differences of c with q, and of c with m (see the sketch at the end of this section).

b. memory update mechanism: taking the candidate sentence, the gate and the previous hidden state, it uses a gated GRU to update the hidden state, like: h = f(c, h_previous, g). the final hidden state is the input for the answer module.

c. the need for multiple episodes ===> transitive inference.

e.g. if we ask "where is the football?", the model will first attend to the sentence "john put down the football", then in a second pass it needs to attend to the location of john.

4.Answer Module: taking the final episodic memory and the question, it updates the hidden state of the answer module.
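a minimal NumPy sketch of the episodic-memory gate from a. above: a feature vector built from element-wise products and absolute differences of the candidate fact c, previous memory m and question q, fed through a two-layer feed-forward network; this is an illustration, not the repository's code:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def attention_gate(c, m, q, W1, b1, W2, b2):
        """c: candidate fact, m: previous memory, q: question (vectors of the same size d);
        W1: [hidden, 4*d], b1: [hidden], W2: [hidden], b2: scalar."""
        z = np.concatenate([c * q, c * m, np.abs(c - q), np.abs(c - m)])   # interaction features
        hidden = np.tanh(W1 @ z + b1)                  # first feed-forward layer
        return sigmoid(np.dot(W2, hidden) + b2)        # scalar gate in (0, 1)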

TODO

1.Character-level Convolutional Networks for Text Classification

2.Convolutional Neural Networks for Text Categorization:Shallow Word-level vs. Deep Character-level

3.Very Deep Convolutional Networks for Text Classification

4.Adversarial Training Methods For Semi-supervised Text Classification

5.Ensemble Models

Conclusion:

During the process of doing large-scale multi-label classification, several lessons have been learned; some are listed below:

  1. What is the most important thing for reaching high accuracy? It depends on the task you are doing. From the task we conducted here, we believe that ensembles of models trained on multiple features (including word and character features for title and description) can help reach very high accuracy. However, in some cases, as AlphaGo Zero demonstrated, the algorithm is more important than data or computational power; in fact, AlphaGo Zero did not use any human data.

  2. Is there a ceiling for any specific model or algorithm? The answer is yes. lots of different models were used here, and we found that many models have similar performance, even though they are quite different in structure. To some extent, the difference in performance is not that big.

  3. Is a case study of errors useful? I think it is quite useful, especially when you have done many different things but reached a limit. For example, by doing a case study you can find the labels the models predict correctly and the places where they make mistakes, and then improve performance by increasing the weights of those wrongly predicted labels or by finding potential errors in the data.

  4. How can we become an expert in a specific area of Machine Learning? In my opinion, joining a machine learning competition or beginning a task with lots of data, then reading papers and implementing some of them, is a good starting point. This gives us real experience with and ideas about handling a specific task, and we get to know its challenges. But what's more important is that we should not only follow ideas from papers, but also explore new ideas we think may help to solve the problem. For example, by changing the structures of classic models or even inventing new structures, we may be able to tackle the problem in a much better way, as it may be more suitable for the task we are doing.

Reference:

1.Bag of Tricks for Efficient Text Classification

2.Convolutional Neural Networks for Sentence Classification

3.A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

4.Deep Learning for Chatbots, Part 2 – Implementing a Retrieval-Based Model in Tensorflow, from www.wildml.com

5.Recurrent Convolutional Neural Network for Text Classification

6.Hierarchical Attention Networks for Document Classification

7.Neural Machine Translation by Jointly Learning to Align and Translate

8.Attention Is All You Need

9.Ask Me Anything:Dynamic Memory Networks for Natural Language Processing

10.Tracking the state of world with recurrent entity networks

11.Ensemble Selection from Libraries of Models

12.BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding

13.google-research/bert


to be continued. for any problem, contact [email protected]

text_classification's People

Contributors

bikramkhastgir, brightmart, iofu728, jannisborn, jason-cooke, maldil, phonism, rluvaton, schneiderl, yzy5630


text_classification's Issues

Do we need a mask tensor for averaging

1. get embedding of words in the sentence

    sentence_embeddings = tf.nn.embedding_lookup(self.Embedding,self.sentence)  #  [None,self.sentence_len,self.embed_size]

2.average vectors, to get representation of the sentence

    self.sentence_embeddings = tf.reduce_mean(sentence_embeddings, axis=1)  # [None,self.embed_size]

Since the length of sentences is variable, do we need a mask tensor for getting the average?
For example, tf.reduce_sum(tf.multiply(sentence_embeddings, mask), axis=1) / tf.reduce_sum(mask, axis=1)

TextRNN model details

Hello.

Is there any chance to get some reference to papers (or any other documents) to the TextRNN model?

Thanks in advance.

Data format

Hello, I'd like to ask how I can convert my own English corpus into the same format as your data?

Is it possible to implement Hierarchical Attention Network with parsing real sentences?

Thank you a lot for your sharing.

I find that in your implementation of Hierarchical Attention Network (HAN), the sentences are separated through setting an equal sentence length. This is however not the true sentence length in the data.

I wonder if it would be easy to change this to use a sentence parser to find the sentences? How different would the performance be?

Please kindly let me know if you have any idea on parsing the real sentences based on your HAN code. Many thanks!

the last ouput of Bi-RNN in TextRNN

self.output_rnn_last=tf.reduce_mean(output_rnn,axis=1) #[batch_size,hidden_size*2] #output_rnn_last=output_rnn[:,-1,:] ##[batch_size,hidden_size*2] #TODO

In the implementation, the final outputs of the Bi-RNN are calculated as the reduce mean over all time steps. Compared with output_rnn_last=output_rnn[:,-1,:], what is the difference between these two strategies in terms of their impact on the final classification results?

hyperparameter in textRNN

    def loss(self, l2_lambda=0.0001):
        with tf.name_scope("loss"):
            # input: `logits` and `labels` must have the same shape `[batch_size, num_classes]`
            # output: A 1-D `Tensor` of length `batch_size` of the same type as `logits` with the softmax cross entropy loss.
            losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)
            # alternative: losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)
            # print("1.sparse_softmax_cross_entropy_with_logits.losses:", losses)  # shape=(?,)
            loss = tf.reduce_mean(losses)  # shape=()
            l2_losses = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables() if 'bias' not in v.name]) * l2_lambda
            loss = loss + l2_losses
            return loss

  • the final loss is defined as the sum of the cross-entropy loss and an L2 loss to penalize large variables, where a hyperparameter l2_lambda is used to balance the two terms. In the algorithm, as the hyperparameter l2_lambda is directly set to 0.0001, could it be too small (or too large) under some circumstances, such that one of the two terms loses its contribution to the whole loss?
  • in general, are there any practical methods to guide me in setting the values of hyperparameters, e.g. l2_lambda?
    @brightmart

fastText cannot find p4_zhihu_load_data module

ModuleNotFoundError Traceback (most recent call last)
in ()
8 import numpy as np
9 from p5_fastTextB_model import fastTextB as fastText
---> 10 from p4_zhihu_load_data import load_data,create_voabulary,create_voabulary_label
11 from tflearn.data_utils import to_categorical, pad_sequences
12 import os

ModuleNotFoundError: No module named 'p4_zhihu_load_data'

Hello, every time after I run the CNN train script and then run predict, a problem occurs

File "/home/wt/桌面/class/text_classification-master/a02_TextCNN/other_experiement/data_util_zhihu.py", line 28, in create_voabulary
model=word2vec.load(word2vec_model_path,kind='bin')
File "/usr/local/lib/python3.5/dist-packages/word2vec/io.py", line 18, in load
return word2vec.WordVectors.from_binary(fname, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/word2vec/wordvectors.py", line 185, in from_binary
with open(fname, 'rb') as fin:
FileNotFoundError: [Errno 2] No such file or directory: 'zhihu-word2vec.bin-100'

not found data file

hdf5 is not supported on this machine (please install/reinstall h5py for optimal experience)
('cache_path:', 'cache_vocabulary_label_pik/_word_voabulary.pik', 'file_exists:', False)
('create vocabulary. word2vec_model_path:', 'zhihu-word2vec-title-desc.bin-100')
Traceback (most recent call last):
  File "p5_fastTextB_train.py", line 161, in <module>
    tf.app.run()
  File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "p5_fastTextB_train.py", line 44, in main
    vocabulary_word2index, vocabulary_index2word = create_voabulary()
  File "../aa1_data_util/data_util_zhihu.py", line 26, in create_voabulary
    model=word2vec.load(word2vec_model_path,kind='bin')
  File "/usr/lib64/python2.7/site-packages/word2vec/io.py", line 18, in load
    return word2vec.WordVectors.from_binary(fname, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/word2vec/wordvectors.py", line 154, in from_binary
    with open(fname, 'rb') as fin:
IOError: [Errno 2] No such file or directory: 'zhihu-word2vec-title-desc.bin-100'

please help.

attentive attention of hierarchical attention network

it seems that the way you implement of attention mechanism is different from original paper, can you give more ideas?

Sorry, after reading your HAN_model.py code, it feels incomplete: parts like textRNN.accuracy, textRNN.predictions and textRNN.W_projection are missing, and textRNN.input_y is not defined. Also, the way the attention weights are computed seems different from the original paper; in the paper a softmax seems to be applied before multiplying with the hidden states and summing.
Could you briefly explain the idea of your article? I'm a bit lost. For the word level, why is the input arranged as the first sentence of every document, then the second sentence of every document, looping like that? And what does the final Loss mean?

image

Training and validation accuracy not changing during training

Multiple models (a02_TextCNN, a04_TextRCNN) have training and validation accuracy fixed at 0.5, while training loss/validation loss stably drops over time (dramatically, from 10^8-9 digits down to 1 to 2 digits.)

Is it normal or something wrong?

One of the models, aa6_TwoCNNTextRelation, has training accuracy that fluctuates (above 0.5).

Thx.

How to import word2vec?

the interpreter reports "no module named" this when i run fasttext. sorry, i realize it is a python package

Getting error during running fasttext_train

@brightmart
Hi Mr.Brightmart
I want to run fastText, but in the first step, while running training, I got the error below:
error-fast text

Actually I have 2 datasets (one for economic_news and one for lifestyle -- in total there are 14000 documents); in both of them, labels have been defined at the end of each document.
I attached them
train-zhihu4-only-title-all.txt
Also I made some changes on these 3 program :

data_util_zhihu.txt
p5_fastTextB_model.txt
p5_fastTextB_train.txt

import word2vec

i tried to run the sample, but i found the code "import word2vec" in data_util_zhihu.py couldn't work. i want to know where i can download word2vec, thank you very much.

sess.run() blocks

Hello! I am new to tensorflow and when I run your model TextCNN, I get an issue: sess.run() blocks.
I can only get the print output before the line "curr_loss,curr_acc,_=sess.run([textCNN.loss_val,textCNN.accuracy,textCNN.train_op],feed_dict=feed_dict)" and then the program blocks! I have already made sure the input data exists and I fail to figure it out.
Hope you can give me the answer, thanks for your patience.

sample data is missing

if not elsewhere reported, when you run the "python -u p7_TextCNN_train.py", a sample data is missing:

FileNotFoundError: [Errno 2] No such file or directory: '../data/train_label_single100_merge.txt' - and as noted in code

By the way, what format is the sample data expected to be in? I am speaking of normal text that needs to be classified. How long should each line in the sample data be, and what is it allowed / not allowed to contain? And how is it pre-processed for classification?

error while running p8_TextRNN_train.py from a03_TextRNN

@brightmart
Dear Mr.brightmart,
Hi,

While running training for a03_TextRNN with google_news_wor22vec.bin and a text file with my documents + labels, I've got these errors:

error

How can I solve this issue?

cache_path: cache_vocabulary_label_pik/rnn_word_voabulary.pik file_exists: False
create vocabulary. word2vec_model_path: GoogleNews-vectors-negative300.bin
rnn_model.vocab_size: 3000001
create_voabulary_label_sorted.started.traning_data_path: train-zhihu4-only-title-all.txt
length of list_label: 146
label: 8476641588870267502 count_value: 3
label: 3738968195649774859 count_value: 3
label: -3517637179126242000 count_value: 3
label: 810067918938531886 count_value: 2
label: 7476760589625268543 count_value: 2
label: 4313812860434517324 count_value: 2
label: 1462130073299421617 count_value: 2
label: -8377411942628634656 count_value: 2
label: -7046289575185911002 count_value: 2
label: -6259864339809244567 count_value: 2
count top10: 23
create_voabulary_label_sorted.ended.len of vocabulary_label: 146
load_data.started...
load_data_multilabel_new.training_data_path: train-zhihu4-only-title-all.txt
0 x0: w18476 w4454 w1674 w6 w25 w474 w1333 w1467 w863 w6 w4430 w11 w813 w4463 w863 w6 w4430 w111
0 x1: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ys_index:
0 y: -6270130442784051389 ;ys_mulithot_list: 107
1 x1: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
ys_index:
1 y: 1945786109636206690 ;ys_mulithot_list: 70
ys_index:
2 y: 7792886053889220161 ;ys_mulithot_list: 22
ys_index:
3 y: 465065448523711562 ;ys_mulithot_list: 49
number_examples: 164
load_data.ended...
start padding & transform to one hot...
trainX[0]: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
end padding & transform to one hot...

**

Traceback (most recent call last):
File "p8_TextRNN_train.py", line 184, in
tf.app.run()
File "/home/eslami/anaconda3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "p8_TextRNN_train.py", line 68, in main
vocab_size, FLAGS.embed_size, FLAGS.is_training)
File "/home/eslami/Downloads/all-kind-text_classification-master/a03_TextRNN/p8_TextRNN_model.py", line 33, in init
self.instantiate_weights()
File "/home/eslami/Downloads/all-kind-text_classification-master/a03_TextRNN/p8_TextRNN_model.py", line 45, in instantiate_weights
self.Embedding = tf.get_variable("Embedding",shape=[self.vocab_size, self.embed_size],initializer=self.initializer) #[vocab_size,embed_size] tf.random_uniform([self.vocab_size, self.embed_size],-1.0,1.0)
File "/home/eslami/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/eslami/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/eslami/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/eslami/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/home/eslami/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 664, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable Embedding already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "/home/eslami/Downloads/all-kind-text_classification-master/a03_TextRNN/p8_TextRNN_model.py", line 45, in instantiate_weights
self.Embedding = tf.get_variable("Embedding",shape=[self.vocab_size, self.embed_size],initializer=self.initializer) #[vocab_size,embed_size] tf.random_uniform([self.vocab_size, self.embed_size],-1.0,1.0)
File "/home/eslami/Downloads/all-kind-text_classification-master/a03_TextRNN/p8_TextRNN_model.py", line 33, in init
self.instantiate_weights()
File "/home/eslami/Downloads/all-kind-text_classification-master/a03_TextRNN/p8_TextRNN_model.py", line 123, in test
textRNN=TextRNN(num_classes, learning_rate, batch_size, decay_steps, decay_rate,sequence_length,vocab_size,embed_size,is_training)

**

missing word2vec package

I am wondering whether the word2vec package is a user-defined package; it is not included in the project


conda install word2vec

ValueError: Variable Embedding already exists

Traceback (most recent call last):
File "p5_fastTextB_train.py", line 163, in
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "p5_fastTextB_train.py", line 76, in main
fast_text=fastText(FLAGS.label_size, FLAGS.learning_rate, FLAGS.batch_size, FLAGS.decay_steps, FLAGS.decay_rate,FLAGS.num_sampled,FLAGS.sentence_len,vocab_size,FLAGS.embed_size,FLAGS.is_training)
File "/home/defy/text_classification-master/a01_FastText/p5_fastTextB_model.py", line 29, in init
self.instantiate_weights()
File "/home/defy/text_classification-master/a01_FastText/p5_fastTextB_model.py", line 42, in instantiate_weights
self.Embedding = tf.get_variable("Embedding", [self.vocab_size, self.embed_size])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 653, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable Embedding already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "/home/defy/text_classification-master/a01_FastText/p5_fastTextB_model.py", line 42, in instantiate_weights
self.Embedding = tf.get_variable("Embedding", [self.vocab_size, self.embed_size])
File "/home/defy/text_classification-master/a01_FastText/p5_fastTextB_model.py", line 29, in init
self.instantiate_weights()
File "/home/defy/text_classification-master/a01_FastText/p5_fastTextB_model.py", line 104, in test
fastText=fastTextB(num_classes, learning_rate, batch_size, decay_steps, decay_rate,5,sequence_length,vocab_size,embed_size,is_training)

python 2.7, tensorflow 1.1, run p5_fastTextB_model, but get this error @brightmart

No such file or directory error

I try to run p7_TextCNN_train.py in a02_TextCNN, and I've downloaded the zhihu-word2vec-title-desc.bin-100 and put it in the same place as p7_TextCNN_train.py, then the error occurs: [Errno 2] No such file or directory: 'cache_vocabulary_label_pik/cnn2_word_voabulary.pik'
I've noticed that someone met almost the same problem as me, but I don't understand: since cnn2_word_voabulary.pik does not exist, how can the program utilize it? I can't figure out how to debug it. Should I update something in the program?

where is the _word_voabulary.pik file

when I run a01_FastText/p5_fastTextB_train.py, I get an error "can not find p4_zhihu_load_data library".

Then, I modify the code as
"
sys.path.append('../aa1_data_util')
from data_util_zhihu import load_data,create_voabulary,create_voabulary_label
"

but I get:

cache_path: cache_vocabulary_label_pik/_word_voabulary.pik file_exists: False
create vocabulary. word2vec_model_path: zhihu-word2vec-title-desc.bin-100
Traceback (most recent call last):
File "p5_fastTextB_train.py", line 159, in
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "p5_fastTextB_train.py", line 42, in main
vocabulary_word2index, vocabulary_index2word = create_voabulary()
File "../aa1_data_util/data_util_zhihu.py", line 40, in create_voabulary
with open(cache_path, 'a') as data_f:
FileNotFoundError: [Errno 2] No such file or directory: 'cache_vocabulary_label_pik/_word_voabulary.pik'

So where is the _word_voabulary.pik file? how can I get it?

License?

I'm guessing that this code is intended to be open sourced otherwise it wouldn't be here, but is there a particular license that you'd like to choose for your work?

ModuleNotFoundError: No module named 'a02_TextCNN'

Issue when trying to run the p7_TextCNN_predict.py after training

ModuleNotFoundError: No module named 'a02_TextCNN'

Question: can you get it running with clean anaconda and Python 3.6 and additional such word2vec, etc. installs ? I think that there are a some code changes needed in the files.

FileNotFoundError

No such file or directory: '..test-zhihu-forpredict-title-desc-v6.txt'

About Average

2.average vectors, to get representation of the sentence

self.sentence_embeddings = tf.reduce_mean(sentence_embeddings, axis=1) # [None,self.embed_size]

When training, we pad all sentences to the max sentence length, so tf.reduce_mean(sentence_embeddings, axis=1) means sum/(max sentence length). How could it know the length of each sentence?

Thanks!

pickle.UnpicklingError: could not find MARK

Hi @brightmart
When i run the fasttext train script on a Windows or CentOS machine, I get the error "pickle.UnpicklingError: could not find MARK". it has puzzled me for a few days, please help me

D:\Anaconda3\python.exe D:/text_classification/a01_FastText/p6_fastTextB_train_multilabel.py
started...
ended...
D:\Anaconda3\lib\site-packages\gensim\utils.py:862: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
Traceback (most recent call last):
File "D:/text_classification/a01_FastText/p6_fastTextB_train_multilabel.py", line 192, in
tf.app.run()
File "D:\Anaconda3\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "D:/text_classification/a01_FastText/p6_fastTextB_train_multilabel.py", line 38, in main
vocabulary_word2index, vocabulary_index2word = create_voabulary()
File "D:\text_classification\aa1_data_util\data_util_zhihu.py", line 31, in create_voabulary
model = Word2Vec.load(word2vec_model_path)
File "D:\Anaconda3\lib\site-packages\gensim\models\word2vec.py", line 1483, in load
model = super(Word2Vec, cls).load(*args, **kwargs)
File "D:\Anaconda3\lib\site-packages\gensim\utils.py", line 282, in load
cache_path: cache_vocabulary_label_pik/_word_voabulary.pik file_exists: False
create vocabulary. word2vec_model_path: ../data/zhihu-word2vec-title-desc.bin-100
obj = unpickle(fname)
File "D:\Anaconda3\lib\site-packages\gensim\utils.py", line 935, in unpickle
return _pickle.load(f, encoding='latin1')
_pickle.UnpicklingError: could not find MARK

does not support multi-label classification

Thank you for sharing these wonderful codes!

Just one issue, it seems in a03 TextRNN, the result part (self.predictions) does not support multi-label classification. There is a lot to change if I want to adapt it to a multi-label classification task. Also the inference part is mostly about single label classification.

Best wishes,
Hang

sample data, pre-trained word embedding

I'm getting this issue when I run training on a08 entity network and a06 seq2seq models.

Can I get or train this file?

zhihu-word2vec-title-desc.bin-100

screenshot_20170720_171417

Also, do you have sample datasets compatible with these models?

Errors when debugging

As a noob, I find it hard to debug your programs. There are too many problems...
Dear author, would you please provide even one complete piece of code that can run directly?

Besides, I would really appreciate it if you could explain how to make my own files similar to 'zhihu-word2vec-title-desc.bin-100'.

reload() was moved to importlib in Python 3

As discussed in #6, these four lines (or something similar) need to be added to the following files for them to work in Python 3.

./a8_predict.py:5:1: F821 undefined name 'reload'
./a8_train.py:5:1: F821 undefined name 'reload'
./a01_FastText/p5_fastTextB_predict.py:5:1: F821 undefined name 'reload'
./a01_FastText/p5_fastTextB_predict_multilabel.py:5:1: F821 undefined name 'reload'
./a01_FastText/p5_fastTextB_train.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/p7_TextCNN_predict.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/p7_TextCNN_train.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_predict_exp.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512_0609.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_predict_exp512_simple.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_train_exp.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_train_exp512.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p7_TextCNN_train_exp_512_0609.py:5:1: F821 undefined name 'reload'
./a02_TextCNN/other_experiement/p8_TextCNN_predict_exp.py:5:1: F821 undefined name 'reload'
./a03_TextRNN/p8_TextRNN_predict.py:5:1: F821 undefined name 'reload'
./a03_TextRNN/p8_TextRNN_train.py:5:1: F821 undefined name 'reload'
./a04_TextRCNN/p71_TextRCNN_predict.py:5:1: F821 undefined name 'reload'
./a04_TextRCNN/p71_TextRCNN_train.py:5:1: F821 undefined name 'reload'
./a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_predict.py:5:1: F821 undefined name 'reload'
./a05_HierarchicalAttentionNetwork/p1_HierarchicalAttention_train.py:5:1: F821 undefined name 'reload'
./a06_Seq2seqWithAttention/a1_seq2seq_attention_predict.py:5:1: F821 undefined name 'reload'
./a06_Seq2seqWithAttention/a1_seq2seq_attention_train.py:5:1: F821 undefined name 'reload'
./a07_Transformer/a2_predict.py:5:1: F821 undefined name 'reload'
./a07_Transformer/a2_train.py:5:1: F821 undefined name 'reload'
./a08_EntityNetwork/a3_predict.py:5:1: F821 undefined name 'reload'
./a08_EntityNetwork/a3_train.py:5:1: F821 undefined name 'reload'
./aa1_data_util/2_predict_zhihu_get_question_representation.py:3:1: F821 undefined name 'reload'
./aa1_data_util/3_process_zhihu_question_topic_relation.py:3:1: F821 undefined name 'reload'
./aa4_TextCNN_with_RCNN/p72_TextCNN_with_RCNN_train.py:5:1: F821 undefined name 'reload'
./aa5_BiLstmTextRelation/p9_BiLstmTextRelation_train.py:5:1: F821 undefined name 'reload'
./aa6_TwoCNNTextRelation/p9_twoCNNTextRelation_train.py:5:1: F821 undefined name 'reload'
