ibrahimsharaf / doc2vec Goto Github PK
View Code? Open in Web Editor NEW:notebook: Long(er) text representation and classification using Doc2Vec embeddings
License: MIT License
:notebook: Long(er) text representation and classification using Doc2Vec embeddings
License: MIT License
Add CLI support for the following commands:
Depends on #14
Depends on #13
Trigger automated builds on Travis after every commit, to make sure that the model doesn't break.
First of all I would like to thank you for sharing with us this nice project,
I am getting accuracy and F1 score when running the code,
How I can get precision an recall?
thanks
Use PyUP bot for automated updating of model dependencies.
Depends on #15
[root@ip-172-31-39-31 doc2vec]# python text_classifier.py
Traceback (most recent call last):
File "text_classifier.py", line 1, in
from models.doc2vec_model import doc2VecModel
File "/home/ec2-user/doc2vec/models/doc2vec_model.py", line 1, in
from .model import Model
File "/home/ec2-user/doc2vec/models/model.py", line 1, in
from abc import ABC, abstractmethod
ImportError: cannot import name ABC
Python installed:
[root@ip-172-31-39-31 doc2vec]# python -v
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /usr/lib64/python2.7/site.pyc matches /usr/lib64/python2.7/site.py
import site # precompiled from /usr/lib64/python2.7/site.pyc
# /usr/lib64/python2.7/os.pyc matches /usr/lib64/python2.7/os.py
import os # precompiled from /usr/lib64/python2.7/os.pyc
import errno # builtin
import posix # builtin
# /usr/lib64/python2.7/posixpath.pyc matches /usr/lib64/python2.7/posixpath.py
import posixpath # precompiled from /usr/lib64/python2.7/posixpath.pyc
# /usr/lib64/python2.7/stat.pyc matches /usr/lib64/python2.7/stat.py
import stat # precompiled from /usr/lib64/python2.7/stat.pyc
# /usr/lib64/python2.7/genericpath.pyc matches /usr/lib64/python2.7/genericpath.py
import genericpath # precompiled from /usr/lib64/python2.7/genericpath.pyc
# /usr/lib64/python2.7/warnings.pyc matches /usr/lib64/python2.7/warnings.py
import warnings # precompiled from /usr/lib64/python2.7/warnings.pyc
# /usr/lib64/python2.7/linecache.pyc matches /usr/lib64/python2.7/linecache.py
import linecache # precompiled from /usr/lib64/python2.7/linecache.pyc
# /usr/lib64/python2.7/types.pyc matches /usr/lib64/python2.7/types.py
import types # precompiled from /usr/lib64/python2.7/types.pyc
# /usr/lib64/python2.7/UserDict.pyc matches /usr/lib64/python2.7/UserDict.py
import UserDict # precompiled from /usr/lib64/python2.7/UserDict.pyc
# /usr/lib64/python2.7/_abcoll.pyc matches /usr/lib64/python2.7/_abcoll.py
import _abcoll # precompiled from /usr/lib64/python2.7/_abcoll.pyc
# /usr/lib64/python2.7/abc.pyc matches /usr/lib64/python2.7/abc.py
import abc # precompiled from /usr/lib64/python2.7/abc.pyc
# /usr/lib64/python2.7/_weakrefset.pyc matches /usr/lib64/python2.7/_weakrefset.py
import _weakrefset # precompiled from /usr/lib64/python2.7/_weakrefset.pyc
import _weakref # builtin
# /usr/lib64/python2.7/copy_reg.pyc matches /usr/lib64/python2.7/copy_reg.py
import copy_reg # precompiled from /usr/lib64/python2.7/copy_reg.pyc
# /usr/lib64/python2.7/traceback.pyc matches /usr/lib64/python2.7/traceback.py
import traceback # precompiled from /usr/lib64/python2.7/traceback.pyc
# /usr/lib64/python2.7/sysconfig.pyc matches /usr/lib64/python2.7/sysconfig.py
import sysconfig # precompiled from /usr/lib64/python2.7/sysconfig.pyc
# /usr/lib64/python2.7/re.pyc matches /usr/lib64/python2.7/re.py
import re # precompiled from /usr/lib64/python2.7/re.pyc
# /usr/lib64/python2.7/sre_compile.pyc matches /usr/lib64/python2.7/sre_compile.py
import sre_compile # precompiled from /usr/lib64/python2.7/sre_compile.pyc
import _sre # builtin
# /usr/lib64/python2.7/sre_parse.pyc matches /usr/lib64/python2.7/sre_parse.py
import sre_parse # precompiled from /usr/lib64/python2.7/sre_parse.pyc
# /usr/lib64/python2.7/sre_constants.pyc matches /usr/lib64/python2.7/sre_constants.py
import sre_constants # precompiled from /usr/lib64/python2.7/sre_constants.pyc
dlopen("/usr/lib64/python2.7/lib-dynload/_localemodule.so", 2);
import _locale # dynamically loaded from /usr/lib64/python2.7/lib-dynload/_localemodule.so
# /usr/lib64/python2.7/_sysconfigdata.pyc matches /usr/lib64/python2.7/_sysconfigdata.py
import _sysconfigdata # precompiled from /usr/lib64/python2.7/_sysconfigdata.pyc
import encodings # directory /usr/lib64/python2.7/encodings
# /usr/lib64/python2.7/encodings/__init__.pyc matches /usr/lib64/python2.7/encodings/__init__.py
import encodings # precompiled from /usr/lib64/python2.7/encodings/__init__.pyc
# /usr/lib64/python2.7/codecs.pyc matches /usr/lib64/python2.7/codecs.py
import codecs # precompiled from /usr/lib64/python2.7/codecs.pyc
import _codecs # builtin
# /usr/lib64/python2.7/encodings/aliases.pyc matches /usr/lib64/python2.7/encodings/aliases.py
import encodings.aliases # precompiled from /usr/lib64/python2.7/encodings/aliases.pyc
# /usr/lib64/python2.7/encodings/utf_8.pyc matches /usr/lib64/python2.7/encodings/utf_8.py
import encodings.utf_8 # precompiled from /usr/lib64/python2.7/encodings/utf_8.pyc
Python 2.7.16 (default, Jul 19 2019, 22:59:28)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
dlopen("/usr/lib64/python2.7/lib-dynload/readline.so", 2);
import readline # dynamically loaded from /usr/lib64/python2.7/lib-dynload/readline.so
Add support to train the model on other text classification datasets, which will make it suitable for more use cases.
Convert the model code to OOP structure, for better readability and style.
Hi Ibrahim,
I have been using Scala/Spark for text analysis. I am in the process of switching over to Python as many new algorithms are available in Python. I am trying your doc2vec example mentioned here and able to understand the code. But When I run "python model.py", I was keep on getting basic errors like 'Import error: module not found' etc. I understood that I am missing some of the python modules. It would be very beneficial if you can update the code with 'Usage' as in the packages needed to run this code. For example, does this code needs python3/python? I really appreciate your response. Thanks for your help and time.
Add unit tests to trigger the model training and inference then make sure the performance doesn't worsen below a certain threshold (80%?)
Depends on #14
The script now only supports training and testing given an input dataset, we need to add a new function to support prediction given a new example.
Depends on #14
Hi, it is not clear for me why in the getVecs function,
if(vecs_type == 'Test'): index = index + 183891
Is there any reason to put that value?
Add a Jupyter notebook with the following modifications (more ideas are welcomed!) to better understand what is going on under the hood:
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.