anttttti / wordbatch

Python library for distributed AI processing pipelines, using swappable scheduler backends.

License: GNU General Public License v2.0

C++ 4.85% C 4.20% Python 63.22% Dockerfile 0.23% Makefile 0.53% Cython 26.97%

wordbatch's Issues

Parallelization fail

Any idea how to fix this? It just started happening without any code changes. I thought it was caused by running too many kernels at once, but I get the error even when I run a single kernel.

From the Kaggle script environment:

WORBBAG_ITEM_DESC_PARAMS = {'hash_ngrams': 2, 'hash_ngrams_weights': [1.0, 1.0],
                            'hash_size': 2 ** 26, 'norm': 'l2', 'tf': 1.0, 'idf': None}
wb = wordbatch.WordBatch(normalize_text, extractor=(WordBag, WORBBAG_ITEM_DESC_PARAMS),
                         procs=procs)
wb.dictionary_freeze = True
X_description = wb.fit_transform(full_df['item_description'])


2237.9s
477
Parallelization fail. Method: multiprocessing Task: <function batch_normalize_texts at 0x7f76ce912ea0>
Retrying, attempt: 1 timeout limit: 1200 seconds
2250.8s
478
Parallelization fail. Method: multiprocessing Task: <function batch_normalize_texts at 0x7f76ce912ea0>
Retrying, attempt: 2 timeout limit: 2400 seconds
2263.4s
479
Parallelization fail. Method: multiprocessing Task: <function batch_normalize_texts at 0x7f76ce912ea0>
Retrying, attempt: 3 timeout limit: 4800 seconds
2276.2s
480
Parallelization fail. Method: multiprocessing Task: <function batch_normalize_texts at 0x7f76ce912ea0>
Retrying, attempt: 4 timeout limit: 9600 seconds
2288.9s
481
Parallelization fail. Method: multiprocessing Task: <function batch_normalize_texts at 0x7f76ce912ea0>
Extract wordbags
2288.9s
482
Traceback (most recent call last):
  File "../src/script.py", line 573, in <module>
    sparse_mat = preprocess_for_fm(full_df)
  File "../src/script.py", line 552, in preprocess_for_fm
    X_description = wb.fit_transform(full_df['item_description'])
  File "/opt/conda/lib/python3.6/site-packages/wordbatch/wordbatch.py", line 230, in fit_transform
    return self.transform(texts, labels, extractor, cache_features, input_split)
  File "/opt/conda/lib/python3.6/site-packages/wordbatch/wordbatch.py", line 242, in transform
    texts= extractor.transform(texts, input_split= True, merge_output= True)
  File "wordbatch/extractors/extractors.pyx", line 185, in wordbatch.extractors.extractors.WordBag.transform
  File "/opt/conda/lib/python3.6/site-packages/wordbatch/wordbatch.py", line 305, in parallelize_batches
    paral_params=  [[data_batch]+ args for data_batch in data]
TypeError: 'NoneType' object is not iterable
2288.9s
483
2288.9s
484
Failed. Exited with code 1.
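
The log above shows a retry pattern: each failed attempt is followed by another with a doubled timeout limit (1200, 2400, 4800, 9600 seconds). Once the retries run out, the next step apparently receives None instead of a list of batches, which would explain the final TypeError. Below is a minimal sketch of such a retry loop that raises a clear error instead of passing None on. The names run_with_retries, task and flaky_task are hypothetical and are not part of wordbatch's API.

```python
def run_with_retries(task, timeout=1200, max_attempts=5):
    """Call task(timeout); on failure, double the timeout and try again.

    `task` is a hypothetical callable standing in for one parallel job.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task(timeout)
        except Exception as exc:  # sketch only: real code should catch narrower errors
            last_error = exc
            print(f"Parallelization fail on attempt {attempt}: {exc!r}")
            timeout *= 2
    # Fail loudly here rather than returning None to the caller.
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


# Usage: a fake task that fails twice, then succeeds.
calls = []


def flaky_task(timeout):
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError("worker did not answer")
    return ["batch-1", "batch-2"]


result = run_with_retries(flaky_task)
print(result, calls)
```

With this shape, a run that never succeeds stops at the retry loop with a message naming the real problem, rather than failing later in an unrelated list comprehension.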

Error on trying to import FM_FTRL

I installed wordbatch on a Mac (OS X El Capitan).

import wordbatch works without errors,
but from wordbatch.models import FM_FTRL throws this error:

ImportError                               Traceback (most recent call last)
<ipython-input-5-6f5587655718> in <module>()
----> 1 from wordbatch.models import FM_FTRL

/Users/yuliamahtani/anaconda/lib/python3.5/site-packages/wordbatch/models/__init__.py in <module>()
      2 from .fm_ftrl import FM_FTRL
      3 from .nn_relu_h1 import NN_ReLU_H1
----> 4 from .nn_relu_h2 import NN_ReLU_H2

wordbatch/models/nn_relu_h2.pyx in init wordbatch.models.nn_relu_h2 (wordbatch/models/nn_relu_h2.c:24869)()

/Users/yuliamahtani/anaconda/lib/python3.5/site-packages/randomgen/__init__.py in <module>()
      1 from randomgen.dsfmt import DSFMT
      2 from randomgen.generator import RandomGenerator
----> 3 from randomgen.mt19937 import MT19937
      4 from randomgen.pcg32 import PCG32
      5 from randomgen.pcg64 import PCG64

/Users/yuliamahtani/anaconda/lib/python3.5/site-packages/randomgen/mt19937.pyx in init randomgen.mt19937()
      9 cimport numpy as np
     10 
---> 11 from randomgen.common import interface
     12 from randomgen.common cimport *
     13 from randomgen.distributions cimport brng_t

ImportError: cannot import name interface

Please help.

FM: model coefficients NaN after fit

Hi!
I'm running this kernel on kaggle:
kernel

After fitting on a valid matrix (without NA values), the state of the model still contains NaN.
During the fit there are neither exceptions nor warnings.

To see the state of the model, I called:
model.__getstate__()

The result is:

model FM_FTRL status:
(0.01, 0.01, 1e-05, 0.1, 0.01, 0.0, 1.0, 0.0, 1020688, 200, 17, array([ nan, 0., 0., ..., 1., 1., 1.]), array([ nan, nan, nan, ..., nan, nan, nan]), array([ nan, nan, nan, ..., nan, nan, nan]), array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan]), array([ nan, nan, nan, ..., nan, nan, nan]), array([ nan, nan, nan, ..., nan, nan, nan]), 0, 0, True)

pip install wordbatch failed on python 3.5 on Mac

The traceback is:

Command "/Users/carenv/bin/python3 -u -c "import setuptools, tokenize;file='/private/var/folders/h7/6h97dljx0n7bzvtttwv4jj6c0000gn/T/pip-build-7ibtzw0e/wordbatch/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /var/folders/h7/6h97dljx0n7bzvtttwv4jj6c0000gn/T/pip-0c39szdv-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/shleifer/Dropbox/projects/mercari/carenv/bin/../include/site/python3.5/wordbatch" failed with error code 1 in /private/var/folders/h7/6h97dljx0n7bzvtttwv4jj6c0000gn/T/pip-build-7ibtzw0e/wordbatch/

Are these times normal?

First, thanks for this amazing tool!

My question: is 3-4 seconds a normal time for a single TF-IDF calculation (a text of approximately 300 words)?

I want to use "lime" (Explaining the predictions of any machine learning classifier), but this time is just too long for the number of iterations that lime needs.

Thanks!

About Wordbatch

In Dec/18 I wrote a post on Medium about your awesome library. For anybody who is looking for more info, please visit:
https://medium.com/@d.canivel/wordbatch-a-parallel-text-feature-extraction-for-machine-learning-eb3696f40996

Thanks

cross validation and grid search

I would like to use FM_FTRL in an sklearn cross-validation pipeline, e.g.,

from sklearn.model_selection import cross_val_score
from wordbatch.models import FM_FTRL

modelF = FM_FTRL(
    alpha=0.01,    # learning rate
    beta=0.1,
    L1=0.00001,
    L2=0.10,
    D=X_train.shape[1],
    alpha_fm=0.01,
    L2_fm=0.0,
    init_fm=0.01,
    D_fm=50,
    e_noise=0.0001,
    iters=5,
    inv_link='sigmoid',
    threads=4
)

cv_scores = cross_val_score(modelF, X_train.tocsc(), y_train_fm.target.values, scoring='roc_auc', cv=time_split)

This throws:

TypeError: Cannot clone object '<wordbatch.models.fm_ftrl.FM_FTRL object at 0x557056cfbfa0>' (type <class 'wordbatch.models.fm_ftrl.FM_FTRL'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

This error is also thrown when trying to pass a FM_FTRL model to GridSearchCV.

Can you provide some guidance on how to make this work?

I can see in this thread that you tuned hyperparameters with random search. Can you provide guidance on that?

Thank you!
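
scikit-learn's cross_val_score and GridSearchCV clone the estimator by calling get_params and then re-creating the object from those parameters, which is why the error above appears for a class without get_params. One possible workaround is a thin adapter that stores the constructor arguments and builds the real model inside fit. This is a minimal sketch under assumptions: ParamWrapper and DummyModel are hypothetical names, DummyModel stands in for FM_FTRL so the example runs without wordbatch, and the adapter has not been tested against FM_FTRL itself. Scoring with 'roc_auc' would additionally need predict_proba or decision_function, and newer scikit-learn versions may expect more (for example inheriting from BaseEstimator).

```python
class ParamWrapper:
    """Hypothetical adapter exposing get_params/set_params around a model class."""

    def __init__(self, model_cls, model_kwargs=None):
        # Store constructor arguments untouched; cloning relies on this.
        self.model_cls = model_cls
        self.model_kwargs = model_kwargs

    def get_params(self, deep=True):
        return {"model_cls": self.model_cls, "model_kwargs": self.model_kwargs}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in ("model_cls", "model_kwargs"):
                raise ValueError(f"Unknown parameter: {name}")
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Build a fresh underlying model on every fit.
        self.model_ = self.model_cls(**(self.model_kwargs or {}))
        self.model_.fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


class DummyModel:
    """Stand-in for FM_FTRL so the sketch runs without wordbatch installed."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)

    def predict(self, X):
        return [self.mean_ for _ in X]


wrapper = ParamWrapper(DummyModel, {"alpha": 0.5})
# Re-create the estimator from its parameters, as a clone step would.
copy_of_wrapper = ParamWrapper(**wrapper.get_params())
copy_of_wrapper.fit([[0], [1]], [0.0, 1.0])
print(copy_of_wrapper.predict([[5]]))
```

With an adapter of this shape, a grid search would vary the whole model_kwargs dictionary, for example a list of dictionaries that differ in alpha or D_fm.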

predict() takes a very long time

Hi. I trained an FM_FTRL model for 5 iterations, which took 5 hours on 120 million records.
But when I try to output predictions from this trained model, it takes a very long time (I end up killing it after it runs for 1 hour).
Is this normal? Is prediction supposed to take this long?
I use the latest GitHub version of wordbatch: 1.3.5.

By the way, pickle_model() does not seem to work; it uses get_params, which is not implemented.
I ended up using regular pickle.dump().

Thanks,

from wordbatch.data_utils import *

Hi, I am getting an error when I import wordbatch.data_utils. Can you please help?

ImportError Traceback (most recent call last)
in ()
9 import gc
10 from contextlib import contextmanager
---> 11 from wordbatch.data_utils import *

ImportError: No module named 'wordbatch.data_utils'

TypeError: only size-1 arrays can be converted to Python scalars (Windows, Python 3.5)

Hello,

I run the following code with WordBatch on Windows:

import wordbatch
from wordbatch.extractors import WordBag
from wordbatch.models import FTRL, FM_FTRL
import pandas as pd
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=10000, noise=0.7)
X_t, X_v, y_t, y_v = train_test_split(X, y, stratify=y, test_size=0.2, random_state=8)

train_weight = np.array(pd.DataFrame(y_t).replace(1,400).replace(0,1).astype('float64'))

clf = FM_FTRL(alpha=0.5, beta=1, L1=10.0, L2=10.0, D=2 * 20, alpha_fm=0.02,
              L2_fm=0.0, init_fm=0.01, weight_fm=1.0,
              D_fm=2, e_noise=0.0, iters=3,
              inv_link="sigmoid", threads=4)

clf.fit(X_t, y_t, train_weight)
class_pred = clf.predict(X_v)

I get the following error:

TypeError Traceback (most recent call last)
in ()
---> 30 clf.fit(X_t, y_t, train_weight)
31 class_pred = clf.predict(X_v)

wordbatch\models\fm_ftrl.pyx in wordbatch.models.fm_ftrl.FM_FTRL.fit()

TypeError: only size-1 arrays can be converted to Python scalars

The error appears only when the weight argument is used.

Do you know the possible cause?

Thanks,
Nikita

Licensing for commercial use without open source?

This is an amazing library and I'd like to use it for some commercial work, but we're not able to satisfy the GNU GPL conditions of stating changes or disclosing source. Would you be willing to distribute under an MIT, Apache, or BSD license so that we could use this work?

'tuple' object has no attribute 'transform'

Hello, I'm using the following code to transform features:

wb.fit(X_train_with_new_feature['name'].tolist())
X_train_name_wordbatch = wb.transform(X_train_with_new_feature['name'].tolist())

but I keep getting the error " 'tuple' object has no attribute 'transform' ".

Can anyone help me?

Will it work on Windows?

Does the WordSeq extractor support ngrams?

E.g. if I want sequences of integers, with ngrams appended to the end?

Illegal instruction (core dumped)

Any idea why this error occurs?
('memory GB:', 5.425804138183594)
Illegal instruction (core dumped)

Thanks,

pip install wordbatch on macOS: error: command 'gcc-7' failed with exit status 1

I was unable to install the wordbatch library using the solution from the last issue. Can anyone give me some advice?

The solution from the last issue:

brew install gcc
export CC=gcc-7
export CXX=g++-7

error: command 'gcc-7' failed with exit status 1

WordVec extractor failing due to decode error

After trying out wordbatch using WordVec extractor, I am facing the following problem.

I used the following code to initialize wordbatch:
wb= wordbatch.WordBatch(normalize_text,extractor=(Hstack, [(WordVec, {"wordvec_file": "../input/glove6b300dtxt/glove.6B.300d.txt", "normalize_text": normalize_text}), (WordVec, {"wordvec_file": "../../../data/word2vec/glove.6B.50d.txt.gz", "normalize_text": normalize_text})]))

Python = 3.6

Can anyone advise on an installation issue with wordbatch?

TypeError: object of type 'type' has no len()

My configuration:
wordbatch-1.3.0
pandas-0.22
python 3.6.2
ubuntu 14.04
Executing the Kaggle script without any changes:
https://www.kaggle.com/anttip/wordbatch-ftrl-fm-lgb-lbl-0-42555

TypeError Traceback (most recent call last)
in ()
153 merge['name'] = merge['name'].astype(str)
154 print(len(merge['name']))
--> 155 X_name = wb.fit_transform(merge['name'])
156 del(wb)
157 X_name = X_name[:, np.array(np.clip(X_name.getnnz(axis=0) - 1, 0, 1), dtype=bool)]

~/lal/Kaggle/kaggleme/input/bkup/wordbatch/wordbatch.py in fit_transform(self, texts, labels, extractor, cache_features, input_split)
239
240 def fit_transform(self, texts, labels=None, extractor= None, cache_features= None, input_split= False):
--> 241 return self.transform(texts, labels, extractor, cache_features, input_split)
242
243 def partial_fit(self, texts, labels=None, input_split= False, merge_output= True):

~/lal/Kaggle/kaggleme/input/bkup/wordbatch/wordbatch.py in transform(self, texts, labels, extractor, cache_features, input_split)
248 if extractor== None: extractor= self.extractor
249 if cache_features != None and os.path.exists(cache_features): return extractor.load_features(cache_features)
--> 250 if not(input_split): texts= self.split_batches(texts)
251 texts= self.fit(texts, return_texts=True, input_split=True, merge_output=False)
252 if extractor!= None:

~/lal/Kaggle/kaggleme/input/bkup/wordbatch/wordbatch.py in split_batches(self, *args, **kwargs)
265
266 def split_batches(self, *args, **kwargs):
--> 267 return self.batcher.split_batches(*args, **kwargs)
268
269 def merge_batches(self, *args, **kwargs):

~/lal/Kaggle/kaggleme/input/bkup/wordbatch/batcher.py in split_batches(self, data, minibatch_size)
70 else: len_data= data.shape[0]
71 if minibatch_size> len_data: minibatch_size= len_data
---> 72 if data_type == pd.DataFrame:
73 data_split = [data.iloc[x * minibatch_size:(x + 1) * minibatch_size] for x in
74 range(int(ceil(len_data / minibatch_size)))]

~/anaconda2/envs/sdp/lib/python3.6/site-packages/pandas/core/ops.py in f(self, other)
1326 return self._compare_frame(other, func, str_rep)
1327 elif isinstance(other, ABCSeries):
-> 1328 return self._combine_series_infer(other, func, try_cast=False)
1329 else:
1330

~/anaconda2/envs/sdp/lib/python3.6/site-packages/pandas/core/frame.py in _combine_series_infer(self, other, func, level, fill_value, try_cast)
3946 def _combine_series_infer(self, other, func, level=None,
3947 fill_value=None, try_cast=True):
-> 3948 if len(other) == 0:
3949 return self * np.nan
3950

TypeError: object of type 'type' has no len()
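
The last frames show pandas trying to evaluate the comparison data_type == pd.DataFrame as an element-wise operation, which then fails on the class object. A type check written with isinstance avoids handing the comparison to pandas at all. The following is a minimal sketch of the same batching idea for plain lists and tuples, mirroring the slicing visible in the traceback. It is an illustration under that assumption, not wordbatch's actual split_batches.

```python
from math import ceil


def split_batches(data, minibatch_size):
    """Split a list or tuple into consecutive minibatches."""
    # isinstance never triggers an overloaded == on the data object.
    if not isinstance(data, (list, tuple)):
        raise TypeError("this sketch only handles lists and tuples")
    len_data = len(data)
    if len_data == 0:
        return []
    # Clamp the batch size to the range 1..len_data.
    minibatch_size = max(1, min(minibatch_size, len_data))
    return [data[x * minibatch_size:(x + 1) * minibatch_size]
            for x in range(ceil(len_data / minibatch_size))]


batches = split_batches(["a", "b", "c", "d", "e"], 2)
print(batches)
```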

AttributeError: Can't get attribute 'normalize_text' on <module '__main__'>

Hello,

If I save the model to a pickle (the wordbatch.WordBatch object, with TF-IDF calculated) and then load it in another file, it says "Can't get attribute 'normalize_text' on <module '__main__'>", even though I passed the normalize function to wordbatch when I initialized it before pickling.

If I manually copy the normalize function into the main scope, the problem is solved, but this approach is not useful if I want to use it with a Gunicorn server, for example.
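
The behaviour described here follows from how pickle treats functions: it stores a reference (module name plus function name), not the function body, so the loading process must be able to import the function from the same module path. A function defined in a script is recorded under __main__, and __main__ is a different module when the pickle is loaded by another program. The sketch below demonstrates the reference behaviour with the standard library function string.capwords as a stand-in for normalize_text.

```python
import pickle
import string

# A function is pickled as a reference to "<module>.<name>", not as code.
payload = pickle.dumps(string.capwords)
print(b"string" in payload, b"capwords" in payload)

# Unpickling imports the module and looks the name up again.
restored = pickle.loads(payload)
print(restored is string.capwords)
print(restored("hello world"))
```

In practice this suggests moving normalize_text into an importable module (for example a hypothetical text_utils.py) and importing it from there both in the script that pickles the object and in the one that loads it.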

Import FTRL fails

I have tried to install both the 1.3.3 and 1.3.5 versions from source. When issuing the import FTRL command I get a strange "Illegal instruction" message.

I am working on an Ubuntu 14.01 system and installed Python 3.6 in a virtualenv. Other than some warnings:

wordbatch/models/fm_ftrl.c:2916:10: note: ‘__pyx_v_d’ was declared here
   double __pyx_v_d;

I don't see anything suspicious in the installation.
This is the message I get:

(python36) voglis:~$ python3
Python 3.6.5 (default, Mar 29 2018, 00:00:00)
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import wordbatch
>>> from wordbatch.models import FTRL
Illegal instruction (core dumped)

And this is my Python package list:

(python36) voglis:~$ pip list
Package            Version
------------------ -------
Cython             0.28.2
numpy              1.14.2
pandas             0.22.0
pip                10.0.1
py-lz4framed       0.11.0
python-dateutil    2.7.2
python-Levenshtein 0.12.0
pytz               2018.4
randomgen          1.14.4
randomstate        1.14.0
scikit-learn       0.19.1
scipy              1.0.1
setuptools         39.0.1
six                1.11.0
wheel              0.31.0
Wordbatch          1.3.3

cannot install on windows 8.1

Hi @anttttti, thanks for your great package and congrats on your 5th place in Mercari!
I'm trying to get wordbatch to work on my own PC (Kaggle kernels are great but I prefer getting things done locally ;), however I can't seem to get the pip installation right.

I'm using Anaconda with Python 3.6 and VC++ Build Tools 2015 installed.

Here is the full installation output:

pip install wordbatch
Collecting wordbatch
Using cached Wordbatch-1.3.0.tar.gz
Requirement already satisfied: cython in c:\users\olivier\anaconda3\lib\site-packages (from wordbatch)
Requirement already satisfied: scikit-learn in c:\users\olivier\anaconda3\lib\site-packages (from wordbatch)
Requirement already satisfied: python-Levenshtein in c:\users\olivier\anaconda3\lib\site-packages (from wordbatch)
Requirement already satisfied: py-lz4framed in c:\users\olivier\anaconda3\lib\site-packages (from wordbatch)
Requirement already satisfied: setuptools in c:\users\olivier\anaconda3\lib\site-packages (from python-Levenshtein->wordbatch)
Building wheels for collected packages: wordbatch
Running setup.py bdist_wheel for wordbatch ... error
Complete output from command C:\Users\olivier\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\olivier\AppData\Local\Temp\pip-build-0xefvztp\wordbatch\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d C:\Users\olivier\AppData\Local\Temp\tmp7rkq7xj6pip-wheel- --python-tag cp36:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\wordbatch
copying wordbatch\wordbatch.py -> build\lib.win-amd64-3.6\wordbatch
copying wordbatch\__init__.py -> build\lib.win-amd64-3.6\wordbatch
creating build\lib.win-amd64-3.6\wordbatch\extractors
copying wordbatch\extractors\__init__.py -> build\lib.win-amd64-3.6\wordbatch\extractors
creating build\lib.win-amd64-3.6\wordbatch\models
copying wordbatch\models\__init__.py -> build\lib.win-amd64-3.6\wordbatch\models
running build_ext
error: [WinError 2] The system cannot find the file specified

Failed building wheel for wordbatch
Running setup.py clean for wordbatch
Failed to build wordbatch
Installing collected packages: wordbatch
Running setup.py install for wordbatch ... error
Complete output from command C:\Users\olivier\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\olivier\AppData\Local\Temp\pip-build-0xefvztp\wordbatch\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\olivier\AppData\Local\Temp\pip-07pzlhjs-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\wordbatch
copying wordbatch\wordbatch.py -> build\lib.win-amd64-3.6\wordbatch
copying wordbatch\__init__.py -> build\lib.win-amd64-3.6\wordbatch
creating build\lib.win-amd64-3.6\wordbatch\extractors
copying wordbatch\extractors\__init__.py -> build\lib.win-amd64-3.6\wordbatch\extractors
creating build\lib.win-amd64-3.6\wordbatch\models
copying wordbatch\models\__init__.py -> build\lib.win-amd64-3.6\wordbatch\models
running build_ext
error: [WinError 2] The system cannot find the file specified

----------------------------------------

Command "C:\Users\olivier\Anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\olivier\AppData\Local\Temp\pip-build-0xefvztp\wordbatch\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\olivier\AppData\Local\Temp\pip-07pzlhjs-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\olivier\AppData\Local\Temp\pip-build-0xefvztp\wordbatch\

Would you have any idea?
Thanks.

Wordbatch installs but ImportError for FTRL

I installed wordbatch on macOS Sierra using pip. The installation was successful. The following commands work in a Jupyter notebook:

import wordbatch
from wordbatch.extractors import WordBag, WordHash

However, importing FTRL gives an ImportError (screenshot attached):
from wordbatch.models import FTRL

Please advise.

Regards
Shanth

Not able to install on macOS High Sierra

I tried pip install wordbatch, but building the wheel for py-lz4framed failed.

I have Python 3.6 installed through Anaconda.
The operating system is macOS High Sierra.

Does the library only work on Linux?

Thank you!

Multiprocessing Hanging in Python 3.6+

Hi, I'm building a very simple test script in Jupyter using your own example dataset, Tweets.csv, with the same preprocessing and normalization.

If I use method="serial", it runs without errors, but if I change to multiprocessing, it hangs and stays there forever, regardless of the size of the corpus.

I'm pretty sure there is no error in the corpus, since serial runs fine. It looks like there is some lock or limit for multiprocessing on Windows. I'm researching it; if you have a fix, please let me know.

Support for the FFM_FTRL algorithm?

Can I add documentation for you?

Hey dude, this library is amazing and I like it very much. Can I write documentation for it? I have gone through some of the code in the process of using and understanding it. I think such great work will be even greater with documentation. Maybe we can collaborate?

Waiting for your reply : )

Tried to pickle the fitted wordbatch model, but bumped into this Error: AttributeError: 'function' object has no attribute 'im_self'

Thanks for this great work; hopefully it can incrementally incorporate all the NLP feature extraction tricks!

I tried to pickle the fitted model for use on future test data, but bumped into this error:

wordbatch/extractors/extractors.pyx in wordbatch.extractors.extractors._pickle_method()
AttributeError: 'function' object has no attribute 'im_self'
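
For context, im_self is the Python 2 name for the object a bound method belongs to. Python 3 removed it in favour of __self__, and plain functions have neither attribute. The sketch below shows the Python 3 attributes and a reducer written against them. Extractor and reduce_method are hypothetical names used for illustration; this is not the code in extractors.pyx.

```python
class Extractor:
    def transform(self, texts):
        return [t.lower() for t in texts]


def reduce_method(method):
    """Describe a bound method as (callable, args) so it can be rebuilt later."""
    return getattr, (method.__self__, method.__func__.__name__)


ex = Extractor()
bound = ex.transform
print(hasattr(bound, "im_self"))  # Python 2 attribute name
print(bound.__self__ is ex)       # Python 3 equivalent

# Rebuild the bound method from its description and call it.
rebuild, args = reduce_method(bound)
print(rebuild(*args)(["Hello"]))
```

Python 3's pickle already handles bound methods of picklable objects, so a custom reducer of this kind is often unnecessary there.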

Awesome Library

I developed this, because at the time there was nothing. However I really like your api. So I'm going to try to use it in my next blog post.

Cheers!

IndexError: too many indices for array

I built on your code for FM_FTRL and received this error when running fit:

Traceback (most recent call last):
File "../src/script.py", line 259, in <module>
clf.fit(train_features, labels, 0.2, reset=False)
File "wordbatch/models/fm_ftrl.pyx", line 227, in wordbatch.models.fm_ftrl.FM_FTRL.fit
File "wordbatch/models/fm_ftrl.pyx", line 272, in wordbatch.models.fm_ftrl.FM_FTRL.fit_f
IndexError: too many indices for array

My train and test set shapes are (1503424, 10016) and (508438, 10016).
Any idea how to solve this?

"Illegal operation" when importing wordbatch.extractors

I can "import wordbatch", but importing wordbatch.extractors kills the interpreter with "Illegal operation".

My environment:

CentOS Linux release 7.4.1708
Python 3.6.3
numpy 1.14.2
cython 0.28.2
scipy 1.0.1
pandas 0.22.0

The environment I'm working with is not Anaconda, however, I was able to reproduce it on the very same OS with Python 3.6.4 on Anaconda.
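
An "illegal instruction" or "illegal operation" crash generally means the process executed a CPU instruction that the processor does not support. This can happen when a compiled extension was built with instruction-set flags for a different machine. Whether that is the cause here is an assumption. As a diagnostic sketch for Linux, the following lists the feature flags the CPU reports, so they can be compared against the compiler flags used in the build. parse_cpu_flags and read_cpu_flags are hypothetical helper names.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read flags from the given file; return an empty set if it is unavailable."""
    try:
        with open(path) as fh:
            return parse_cpu_flags(fh.read())
    except OSError:
        return set()


sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"
flags = parse_cpu_flags(sample)
print(sorted(flags))
print("avx2" in flags)
```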