metapad's People

Contributors

mjiang89

metapad's Issues

Can MetaPAD work on a Chinese corpus?

Hi,

I am very interested in how MetaPAD works, and in particular how it performs on a Chinese corpus. However, after I replaced the input with a Chinese corpus, the run fails because some files cannot be found. Could you tell me what I need to do to run it on a Chinese corpus? Thank you very much.

When I ran the test provided on GitHub, I got the expected result. But when I ran the test again after changing the corpus, it produced the following output:

rm -rf bin

mkdir -p bin

g++ -std=c++11 -Wall -O3 -msse2 -fopenmp -I.. -pthread -lm -Wno-unused-result -Wno-sign-compare -Wno-unused-variable -Wno-parentheses -Wno-format -o bin/segphrase_train src/main.cpp

Traceback (most recent call last):

File "metapad.py", line 1166, in <module>

Encrypt(file_output_encrypted,file_output_label,file_output_positive,file_output_key,file_input_entitylinking,file_input_goodpattern,file_input_stopwords,10,LEVEL)

File "metapad.py", line 17, in Encrypt

if sentence[n-1][0] == 'PERIOD': sentence = sentence[0:n-1]

IndexError: list index out of range

Traceback (most recent call last):

File "metapad.py", line 1166, in <module>

Encrypt(file_output_encrypted,file_output_label,file_output_positive,file_output_key,file_input_entitylinking,file_input_goodpattern,file_input_stopwords,10,LEVEL)

File "metapad.py", line 17, in Encrypt

if sentence[n-1][0] == 'PERIOD': sentence = sentence[0:n-1]

IndexError: list index out of range

=== Current Settings ===

Iterations = 2

Minimum Support Threshold = 30

Maximum Length Threshold = 20

POS-Tagging Mode Disabled

Discard Ratio = 0.050000

Number of threads = 15

Auto labels from knowledge bases

    Labeling Method = ByLengthByPositive

    Max Positive Samples = 100

    Negative Sampling Ratio = 2

=======

# of total tokens = 6438

# of total word tokens = 6438

max word token id = 1564

# of documents = 810

# of POS tags = 0

The number of sentences = 1

unigrams inserted

# of frequent patterns of length-1 = 1566

# of frequent patterns of length-2 = 2

# of frequent patterns of length-3 = 2

# of frequent patterns of length-4 = 2

# of frequent patterns of length-5 = 2

# of frequent patterns of length-6 = 2

# of frequent patterns of length-7 = 2

# of frequent patterns of length-8 = 2

# of frequent patterns of length-9 = 2

# of frequent patterns of length-10 = 2

# of frequent patterns of length-11 = 2

# of frequent patterns of length-12 = 2

# of frequent patterns of length-13 = 2

# of frequent patterns of length-14 = 2

# of frequent patterns of length-15 = 2

# of frequent patterns of length-16 = 2

# of frequent patterns of length-17 = 2

# of frequent patterns of length-18 = 2

# of frequent patterns of length-19 = 2

# of frequent patterns of length-20 = 2

# of frequent patterns = 1584

total occurrence = 35810

feature extraction done!

=== Generate Labels ===

matched positives = 0

matched negatives = 19

selected positives = 0

selected negatives = 19

Loaded Truth = 19

Recognized Truth = 19

Feature Matrix = 1584 X 14

# of threads = 15

Start Classifier Training...

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree![ERROR] empty node in decision tree![ERROR] empty node in decision tree!

[ERROR] empty node in decision tree![ERROR] empty node in decision tree![ERROR] empty node in decision tree![ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

cp: cannot stat `cseg/tmp/quality_phrases.txt': No such file or directory

=== Current Settings ===

Iterations = 2

Minimum Support Threshold = 30

Maximum Length Threshold = 20

POS-Tagging Mode Disabled

Discard Ratio = 0.050000

Number of threads = 15

Auto labels from knowledge bases

    Labeling Method = ByLengthByPositive

    Max Positive Samples = 100

    Negative Sampling Ratio = 2

=======

# of total tokens = 6438

# of total word tokens = 6438

max word token id = 1564

# of documents = 810

# of POS tags = 0

The number of sentences = 1

unigrams inserted

# of frequent patterns of length-1 = 1566

# of frequent patterns of length-2 = 2

# of frequent patterns of length-3 = 2

# of frequent patterns of length-4 = 2

# of frequent patterns of length-5 = 2

# of frequent patterns of length-6 = 2

# of frequent patterns of length-7 = 2

# of frequent patterns of length-8 = 2

# of frequent patterns of length-9 = 2

# of frequent patterns of length-10 = 2

# of frequent patterns of length-11 = 2

# of frequent patterns of length-12 = 2

# of frequent patterns of length-13 = 2

# of frequent patterns of length-14 = 2

# of frequent patterns of length-15 = 2

# of frequent patterns of length-16 = 2

# of frequent patterns of length-17 = 2

# of frequent patterns of length-18 = 2

# of frequent patterns of length-19 = 2

# of frequent patterns of length-20 = 2

# of frequent patterns = 1584

total occurrence = 35810

feature extraction done!

=== Generate Labels ===

matched positives = 0

matched negatives = 19

selected positives = 0

selected negatives = 19

Loaded Truth = 19

Recognized Truth = 19

Feature Matrix = 1584 X 14

# of threads = 15

Start Classifier Training...

[ERROR] empty node in decision tree![ERROR] empty node in decision tree![ERROR] empty node in decision tree![ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree!

[ERROR] empty node in decision tree![ERROR] empty node in decision tree!cp: cannot stat `cseg/tmp/quality_phrases.txt': No such file or directory

Traceback (most recent call last):

File "metapad.py", line 1236, in <module>

SalientFast(file_output_salient,file_output_key,file_input_phrase,file_input_mapping)

File "metapad.py", line 1148, in SalientFast

fr = open(file_input_phrase,'rb')

IOError: [Errno 2] No such file or directory: 'output/top-token-phrase.txt'

Traceback (most recent call last):

File "metapad.py", line 1236, in <module>

SalientFast(file_output_salient,file_output_key,file_input_phrase,file_input_mapping)

File "metapad.py", line 1148, in SalientFast

fr = open(file_input_phrase,'rb')

IOError: [Errno 2] No such file or directory: 'output/bottom-token-phrase.txt'

Thank you
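
A note on the first traceback: the expression `sentence[n-1][0]` raises `IndexError` when `sentence` is empty or when its last token is empty, which may be what happens for some lines of the new corpus. The following is only a minimal sketch of a guarded version of that check, not the code in `metapad.py`. The helper name `strip_trailing_period` is hypothetical, and it assumes each token is a sequence whose first element is a tag such as `'PERIOD'`.

```python
def strip_trailing_period(sentence):
    """Return the sentence without a trailing 'PERIOD' token.

    Hypothetical helper: assumes `sentence` is a list of tokens and that
    each token is a sequence whose first element is its tag.
    """
    # An empty sentence, or an empty last token, is what makes the
    # unguarded sentence[n-1][0] lookup raise IndexError.
    if sentence and sentence[-1] and sentence[-1][0] == 'PERIOD':
        return sentence[:-1]
    return sentence


if __name__ == "__main__":
    print(strip_trailing_period([]))
    print(strip_trailing_period([["WORD", "hello"], ["PERIOD", "."]]))
```

A guard like this only avoids the crash. It does not explain why the input produced an empty sentence in the first place.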
