
few's People

Contributors

lacava, ohjeah

few's Issues

ImportError: dlopen ... symbol not found

Hi, I've cloned few, built and installed on OS X 10.12 using:

CC=gcc-7 python setup.py install

But I'm getting a symbol not found error on import of the few module.

I note a few warnings during the build process beginning with: #warning "Using deprecated NumPy API, disable it by ...

and then finally:

g++ -bundle -undefined dynamic_lookup -L/Users/robertreynolds/anaconda3/envs/ml/lib -arch x86_64 -L/Users/robertreynolds/anaconda3/envs/ml/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.6/few/lib/few_lib.o -o build/lib.macosx-10.7-x86_64-3.6/few_lib.cpython-36m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]

Any advice on what to check next?
I'm also not entirely clear on why I'm seeing a clang message at all, so that, along with the deprecation warning above, is my first avenue to explore.

Error with installation

Hello,

I'm trying to install this package but running into some issues; any ideas? I have VS 14.16 on my PC and I get the error below when running 'pip install few'. At first it asked for eigency, but after installing that, this error popped up.

(screenshot of the pip install error)

Sincerely,
G

installing few with pip

few cannot be installed with pip because setup.py imports eigency at the top level, and at that point the build requirements are not installed yet. A possible workaround is sketched below.
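
Here is a hedged sketch (not FEW's actual setup.py) of guarding that import so pip at least fails with a clear message; eigency.get_includes() is the piece the real build needs, and with modern packaging the cleaner fix would be declaring eigency in a pyproject.toml [build-system] block.

from setuptools import setup

try:
    import eigency
    eigen_includes = eigency.get_includes()
except ImportError:
    # eigency is not available yet, so stop with an actionable message
    raise SystemExit('few requires eigency to build; run `pip install eigency` first')

setup(
    name='few',
    install_requires=['eigency'],
    # ... Extension definitions would use eigen_includes here ...
)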

GridSearchCV error

Concerning errors of the form

self.ml.named_steps = undefined
    205                   hasattr(self.ml.named_steps['ml'],'feature_importances_')):
    208                 coef = (self.ml.named_steps['ml'].coef_ if
AttributeError: 'SGDClassifier' object has no attribute 'named_steps'

when using FEW in GridSearchCV while changing the ML parameter. The pipeline object needs to be redefined in the fit method so that GridSearch can change self.ml and the pipeline gets updated.
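
A minimal sketch of that fix, assuming a FEW-like wrapper class; the class and attribute names below are illustrative, not FEW's actual code.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

class FeatureLearner:
    def __init__(self, ml=None):
        self.ml = ml  # bare estimator; GridSearchCV may replace this between fits

    def fit(self, X, y):
        # Rebuild the pipeline here rather than in __init__, so the current
        # value of self.ml is always the object wrapped as named_steps['ml'].
        self.pipeline_ = Pipeline([('scaler', StandardScaler()),
                                   ('ml', self.ml)])
        self.pipeline_.fit(X, y)
        return self

    def feature_weights(self):
        # Works for both linear models (coef_) and tree ensembles (feature_importances_).
        step = self.pipeline_.named_steps['ml']
        return getattr(step, 'coef_', getattr(step, 'feature_importances_', None))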

Issues with current ML validation score

Hello,

Thanks for the help so far. I was able to get the tool up and running in windows.

However, I am observing two odd things:

  1. When I use the Gradient Boosting Regressor, my score gets worse with each generation, even when I switch the sign of the scoring function. The first score is nearly the best score I have gotten on my own (with no feature engineering on the data set).

https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/FEW_GB.ipynb

  2. When I use Random Forest with the same scorer, the current ML validation score returns 0 and the run finishes very quickly.

https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/FEW_RF.ipynb

I think I am missing something about how to use this tool, but I have no idea what. I am trying to use it in tandem with TPOT, as I am exploring GA/GP-based feature creation tools. I sincerely appreciate any advice/guidance you can provide.

Sincerely,
G
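
One hedged sanity check (not from the FEW docs): compare FEW's reported validation score against a plain cross-validated baseline of the same estimator, to see whether the scorer sign convention is the culprit. The make_regression data below is a stand-in for the house-price training set in the linked notebooks.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# stand-in data; replace with the notebook's training features and target
X_train, y_train = make_regression(n_samples=500, n_features=20, noise=10.0,
                                   random_state=0)

baseline = cross_val_score(GradientBoostingRegressor(), X_train, y_train,
                           cv=3, scoring='r2')
print('baseline R^2 per fold:', baseline)
# If this baseline looks reasonable while FEW's "current ML validation score"
# degrades (or reads 0 with Random Forest), the problem is more likely in how
# the scoring function is passed/interpreted than in the data itself.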

If original features are found by FEW, transform() method fails with TypeError

eg:

print('Model: {}'.format(learner.print_model()))

Model: original features

Phi = learner.transform(X_test.values)


TypeError Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 Phi = learner.transform(X_test.values)

~/anaconda3/envs/ml/lib/python3.6/site-packages/FEW-0.0.38-py3.6-macosx-10.7-x86_64.egg/few/few.py in transform(self, x, inds, labels)
395 # return np.asarray(Parallel(n_jobs=10)(delayed(self.out)(I,x,labels,self.otype) for I in self._best_inds)).transpose()
396 return np.asarray(
--> 397 [self.out(I,x,labels,self.otype) for I in self._best_inds]).transpose()
398
399

TypeError: 'NoneType' object is not iterable
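
A hedged workaround sketch (not FEW's actual fix): judging from the traceback, _best_inds can be None when only the original features are kept, so a caller can fall back to the raw feature matrix in that case.

def safe_transform(learner, x):
    # If no engineered features were kept, pass the original features through.
    if getattr(learner, '_best_inds', None) is None:
        return x
    return learner.transform(x)

# usage with the objects from the snippet above
Phi = safe_transform(learner, X_test.values)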

value error in lasso

occasional error:
File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "/home/bill/anaconda3/lib/python3.5/runpy.py", line 85, in run_code
exec(code, run_globals)
File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 506, in
main()
File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 495, in main
learner.fit(training_features, training_labels)
File "/media/bill/Drive/Dropbox/PostDoc/code/few/few/few.py", line 181, in fit
self.ml.fit(pop.X.transpose(),y_t)
File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 1132, in fit
Lars.fit(self, X, y)
File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 671, in fit
return_n_iter=True, positive=self.positive)
File "/home/bill/anaconda3/lib/python3.5/site-packages/sklearn/linear_model/least_angle.py", line 260, in lars_path
sign_active[n_active] = np.sign(C
)
ValueError: cannot convert float NaN to integer

This should not occur, since the operators are supposed to produce safe (non-NaN) outputs.
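
A hedged workaround sketch (not FEW's code), in the spirit of the fit call at few.py line 181: sanitize the engineered feature matrix before handing it to the ML model, so LassoLarsCV never sees NaN or inf.

import numpy as np

def fit_ml_safely(ml, X_features, y):
    # Replace any NaN/inf produced by the engineered features before fitting.
    X_clean = np.nan_to_num(np.asarray(X_features, dtype=float))
    ml.fit(X_clean, y)
    return ml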

stall effects

Track stalling in runs and act on it.
Stalling occurs when the engineered features are no longer improving either 1) the ML model's CV score or 2) the median fitness of the features themselves.
If stalling occurs, there should be options to:

  • exit
  • modify the search to capture a different part of the search space; this could be achieved by increasing the complexity of the features, increasing the variation steps, or lowering the selection pressure (a tracking sketch follows this list).
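
A minimal sketch of the tracking part, assuming a max_stall threshold like the constructor argument that appears later on this page; this is hypothetical, not FEW's implementation.

import numpy as np

class StallTracker:
    def __init__(self, max_stall=10, tol=1e-6):
        self.max_stall = max_stall
        self.tol = tol
        self.best_cv = -np.inf
        self.best_median_fitness = -np.inf
        self.stall_count = 0

    def update(self, cv_score, feature_fitnesses):
        # A generation counts as progress if either signal improves.
        median_fitness = float(np.median(feature_fitnesses))
        improved = (cv_score > self.best_cv + self.tol or
                    median_fitness > self.best_median_fitness + self.tol)
        self.best_cv = max(self.best_cv, cv_score)
        self.best_median_fitness = max(self.best_median_fitness, median_fitness)
        self.stall_count = 0 if improved else self.stall_count + 1
        return self.stall_count >= self.max_stall  # True => exit or perturb the search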

cythonize evaluation routine

Write the feature evaluation routine in C++ with Eigen and interface it with the main codebase via Cython. Include the distutils changes needed to support package distribution (a build sketch is below).
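
A hedged sketch of the build side, using eigency to expose the Eigen headers to a Cython extension; the file names below are illustrative, not FEW's actual layout.

import eigency
from setuptools import setup, Extension
from Cython.Build import cythonize

ext = Extension(
    'few_lib',
    sources=['few/lib/few_lib.pyx', 'few/lib/evaluation.cpp'],
    include_dirs=['few/lib'] + eigency.get_includes(),
    language='c++',
    extra_compile_args=['-std=c++11'],
)

setup(name='few', ext_modules=cythonize([ext]))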

few.model() and few.print_model()

Hello!

Thanks for sharing your work, this is really cool!

I was wondering if you could provide a bit of explanation as to the difference between these two outputs of the algorithm.

Also, is there any (outside) documentation on all this?

Thanks in advance!

Kind regards,
Theodore.

implement 3-fold cross validation for internal updating of best model

Currently the training data is split into training and validation sets, and the best model is updated whenever a model with a higher validation score is found. We could simplify quite a bit, and get a more robust validation measure, by removing train_test_split and the associated numpy arrays and fit/predict code in favor of a direct call to cross_val_score(self.ml, features, labels, cv=3) or cross_val_score(self.ml, self.X[self.valid_loc(), :].transpose(), labels, cv=3).
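
A minimal sketch of the proposed replacement, assuming the self.ml, self.X, labels, and valid_loc() referenced above; not FEW's actual implementation.

import numpy as np
from sklearn.model_selection import cross_val_score

def internal_cv_score(ml, X, labels, valid_loc=None):
    # X is the (feature x sample) matrix of engineered features; valid_loc
    # optionally selects the rows that are valid for validation.
    if valid_loc is not None:
        X = X[valid_loc, :]
    return np.mean(cross_val_score(ml, X.transpose(), labels, cv=3))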

low GPU utilization with tf option

I'm getting low utilization of the GPU using the tensorflow evaluation strategy. There are a few things to try:

  • use this method to profile tensorflow and see where the inefficiencies lie.

  • according to this, using feed_dict is not a good idea; we need to look into using input pipelines or variables for feeding data to the graphs (a rough sketch follows this list).
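
A rough TF 1.x-style sketch of swapping feed_dict for a tf.data input pipeline; the evaluation op below is a stand-in, not FEW's tensorflow backend.

import numpy as np
import tensorflow as tf

features = np.random.rand(100000, 50).astype(np.float32)  # stand-in data

dataset = tf.data.Dataset.from_tensor_slices(features).batch(4096).prefetch(1)
iterator = dataset.make_one_shot_iterator()
batch = iterator.get_next()

evaluated = tf.reduce_sum(tf.square(batch), axis=1)  # stand-in evaluation graph

with tf.Session() as sess:
    results = []
    try:
        while True:
            results.append(sess.run(evaluated))
    except tf.errors.OutOfRangeError:
        pass  # end of the dataset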

add encoding operators for GWAS

Add operators that re-encode input SNPs under different encodings: include (add, dom, rec, het, sub-add, super-add). We need to resolve how the underlying data would be represented; maybe assume the input is additive? A sketch of the simpler encodings is below.
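
A hedged sketch of the simpler encodings, assuming the input genotypes are additively coded as 0/1/2 copies of the minor allele; the sub-additive and super-additive encodings would need a definition before implementing.

import numpy as np

def encode_dominant(g):      # carries at least one minor allele
    return (np.asarray(g) >= 1).astype(int)

def encode_recessive(g):     # homozygous for the minor allele
    return (np.asarray(g) == 2).astype(int)

def encode_heterozygous(g):  # exactly one copy of the minor allele
    return (np.asarray(g) == 1).astype(int)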

normalize feature transformations

Normalize feature transformations automatically before feeding them into the ML fit method, and store the fitted transformer so it can be reused in prediction/transformation as well (see the sketch below).
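
A minimal sketch of that idea (not FEW's code): fit the scaler on the engineered features during training and reuse the same fitted object at prediction/transform time.

from sklearn.preprocessing import StandardScaler

class NormalizedFeatures:
    def fit(self, Phi_train, y=None):
        # Phi_train: engineered feature matrix produced during training
        self.scaler_ = StandardScaler().fit(Phi_train)
        return self

    def transform(self, Phi):
        # reuse the training-time statistics on new engineered features
        return self.scaler_.transform(Phi)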

random numbers seed not working?

Greetings!

I have the following code:

feats_gen = FEW(
                ml=DecisionTreeClassifier(random_state=10, max_depth=None, min_samples_leaf=5), 
                population_size=100, tourn_size=2,                 
                mutation_rate=0.5, crossover_rate=0.5, 
                sel='epsilon_lexicase',   
                clean=True,                
                mdr=True, boolean=True, 
                random_state=10, verbosity=1, 
                scoring_function=roc_auc_score, 
                max_depth=10, min_depth=1, max_depth_init=1, 
                classification=True, 
                generations=50, max_stall=None, 
                names=list(X_train.select_dtypes(include=[np.number]).columns))

feats_gen.fit(X_train.select_dtypes(include=[np.number]).values, 
              y_train.astype(int).values)

test_ = preprocessing_pipeline.transform(e.test)

X_test = test_.X
y_test = test_[test_.target_name].astype(int)

roc_auc_score(y_test, feats_gen._best_estimator.predict_proba(feats_gen.transform(X_test.select_dtypes(include=[np.number]).values))[:, 1])

Every time I run this code, I get different ROC AUC values on both training and test. I'm pretty sure preprocessing_pipeline is deterministic.
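
One hedged debugging step (not a confirmed fix): pin the global RNGs as well, to check whether the nondeterminism comes from somewhere outside the random_state arguments shown above.

import random
import numpy as np

random.seed(10)
np.random.seed(10)
# ...then construct and fit FEW exactly as above and compare the ROC AUC across runs.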

feat vs few?

Greetings!

I would like to know if there is any practical difference between the two projects. I'm asking this because testing feat would require a lot more effort than few and, as such, I need to know if it is worth it.

Thanks in advance!
