
asafmanela / hurdledmr.jl


Hurdle Distributed Multinomial Regression (HDMR) implemented in Julia

License: Other

R 0.74% Julia 99.26%
count distributed hurdle machine-learning multinomial regression text-analysis text-selection

hurdledmr.jl's People

Contributors

asafmanela, jason-xuan


hurdledmr.jl's Issues

Cannot load dataset from sotu.jl

Hi Professor Manela,
I'm running your HurdleDMR tutorial for Python in a Jupyter Notebook, but the data-loading cell fails with the following error:

---------------------------------------------------------------------------
JuliaError                                Traceback (most recent call last)
C:\Users\WILLIA~1\AppData\Local\Temp\15/ipykernel_4324/482917954.py in <module>
----> 1 covarsdf, counts, terms = jl.eval('include("D:/William/sotu.jl")')

C:\ProgramData\Anaconda3\lib\site-packages\julia\core.py in eval(self, src)
    603         if src is None:
    604             return None
--> 605         ans = self._call(src)
    606         if not ans:
    607             return None

C:\ProgramData\Anaconda3\lib\site-packages\julia\core.py in _call(self, src)
    536         # logger.debug("_call(%s)", src)
    537         ans = self.api.jl_eval_string(src.encode('utf-8'))
--> 538         self.check_exception(src)
    539 
    540         return ans

C:\ProgramData\Anaconda3\lib\site-packages\julia\core.py in check_exception(self, src)
    585         else:
    586             exception = sprint(showerror, self._as_pyobj(res))
--> 587         raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
    588                          .format(exception, src))
    589 

JuliaError: Exception 'LoadError: could not load library "libdSFMT"
The specified module could not be found. 
in expression starting at D:\William\sotu.jl:210' occurred while calling julia code:
include("D:/William/sotu.jl")

I think something may be wrong with the code in sotu.jl?

Occasional failures on Windows

Fitting a DMR or HDMR on Windows sometimes fails when trying to map memory (mmap) with parallel=true and local_cluster=true specified (the defaults).

A workaround is to specify local_cluster=false or to turn off parallelization altogether with parallel=false.

This problem should go away after #6 is resolved.

OverflowError on Windows

With Julia 1.3 on Windows, running the test suite produces the following error:

PositivePoisson: Error During Test at C:\Users\jason\.julia\dev\HurdleDMR\test\positive_poisson.jl:1
  Got exception outside of a @test
  OverflowError: 22 is too large to look up in the table; consider using `factorial(big(22))` instead
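
The error message itself points at one fix (`factorial(big(22))`). As a minimal sketch, assuming the failing test needs a factorial inside a Poisson-type log-likelihood (the issue does not say exactly where the call occurs), the factorial can also be avoided entirely by summing logs. The names `logfactorial_sum` and `poisson_logpmf` are hypothetical and are not part of HurdleDMR.

```julia
# Log-factorial as a sum of logs, so no integer factorial table is consulted.
function logfactorial_sum(n::Integer)
    acc = 0.0
    for k in 2:n
        acc += log(k)
    end
    return acc
end

# Log-pmf of a Poisson(λ) at y, written without calling factorial(y).
poisson_logpmf(y::Integer, λ::Real) = y * log(λ) - λ - logfactorial_sum(y)

# factorial(22) on a machine-sized integer throws OverflowError ...
overflowed = try
    factorial(22)
    false
catch err
    err isa OverflowError
end

# ... while the BigInt version suggested by the error message is exact.
exact = factorial(big(22))
```

Working in logs keeps everything in `Float64`, which is usually what a likelihood computation wants anyway. The `BigInt` route is exact but slower.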

Multithreading

Currently, parallelization on a local cluster uses SharedArrays to share memory between distributed worker processes. This is not very efficient and also sometimes fails on Windows.

It would be better to replace it with multithreading, which should be getting more stable in Julia v1.2.
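
A rough sketch of what a threaded version could look like, assuming each column of counts can be fitted independently. Here `fit_column` is a hypothetical stand-in (plain least squares) for the per-column regression, and `fit_columns_threaded` is likewise an illustrative name, not a HurdleDMR function.

```julia
using LinearAlgebra
using Base.Threads: @threads

# Hypothetical stand-in for the per-column regression: ordinary least squares.
fit_column(covars::AbstractMatrix, y::AbstractVector) = covars \ y

function fit_columns_threaded(covars::AbstractMatrix, counts::AbstractMatrix)
    p, d = size(covars, 2), size(counts, 2)
    coefs = Matrix{Float64}(undef, p, d)
    @threads for j in 1:d
        # Each iteration writes only to its own column of coefs,
        # so threads share the array without locks or SharedArrays.
        coefs[:, j] = fit_column(covars, counts[:, j])
    end
    return coefs
end

# Toy data: an intercept plus one covariate, and two count columns.
covars = [ones(6) collect(1.0:6.0)]
counts = [1.0 0.0; 2.0 1.0; 2.0 0.0; 4.0 3.0; 5.0 2.0; 7.0 4.0]
coefs = fit_columns_threaded(covars, counts)
```

All threads live in one process and see the same memory, so covars is never copied or memory-mapped. The number of threads is set when Julia starts, for example through the JULIA_NUM_THREADS environment variable.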

Standardize covars matrix only once

dmr and hdmr calls by default standardize the covars matrix, because they call fit(GammaLassoPath,...) on each column of counts, which standardizes its X (=covars) matrix upon entry by default.
This means the Lasso.standardizeX call is needlessly repeated once per column of counts. See the relevant part of Lasso.jl.

A better solution would:

  1. check for the keyword argument standardize in dmr/hdmr calls
  2. standardize if requested before calling fit(GammaLassoPath,...) with standardize=false, keeping track of Xnorm
  3. multiply coefs by Xnorm as in [Lasso.jl](https://github.com/JuliaStats/Lasso.jl/blob/55718966db53679e333d8a94749e8722b082796c/src/coordinate_descent.jl#L847). Specifically,
    3.1 If returning DMRCoefs/HDMRCoefs (called with dmr/hdmr), then we only keep the coefficients, so just multiply these in place by Xnorm
    3.2 If returning DMRPaths/HDMRPaths (called with dmrpaths/hdmrpaths), then we need to modify path.coefs for each path.
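
The steps above can be sketched as follows, assuming Xnorm holds the reciprocal of each column's (uncorrected) standard deviation. Least squares stands in for `fit(GammaLassoPath, ...; standardize=false)`, and `standardize_once` and `fit_column` are hypothetical names used only for illustration. Constant columns, which have zero standard deviation, are not handled in this sketch.

```julia
using Statistics, LinearAlgebra

# Scale each column of X by 1/std once; return the scaled matrix and the factors.
function standardize_once(X::AbstractMatrix{<:Real})
    Xnorm = 1 ./ vec(std(X; dims=1, corrected=false))
    return X .* transpose(Xnorm), Xnorm
end

# Hypothetical stand-in for the per-column fit with standardize=false.
fit_column(X::AbstractMatrix, y::AbstractVector) = X \ y

X = [1.0 10.0; 2.0 30.0; 3.0 20.0; 4.0 50.0; 5.0 40.0]   # covars
Y = [1.0 0.0; 3.0 1.0; 2.0 1.0; 5.0 2.0; 4.0 3.0]        # counts

Xs, Xnorm = standardize_once(X)      # step 2: done once, not once per column of Y
coefs = hcat([fit_column(Xs, Y[:, j]) for j in 1:size(Y, 2)]...)
lmul!(Diagonal(Xnorm), coefs)        # step 3: rescale in place to the original scale of X
```

In this unpenalized stand-in, the rescaled coefficients match a direct fit on the unstandardized X, which gives a quick check that the bookkeeping with Xnorm is right.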

Re-organize test code with Jive.jl

Currently, all test code runs sequentially, so many tests for a specific function depend on results produced by earlier tests.
I want to organize the tests into separate cases and break these dependencies using Jive.jl.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Turn off Lasso during estimation and report fitted likelihood

Hi Professor Manela,

I am wondering whether the existing implementation supports estimation without the lasso penalty. I could not find a switch for this in the API. I also could not find any information on the fitted likelihood. How can I obtain it? Thanks.
