
ccit's People

Contributors

rajatsen91

ccit's Issues

Bootstrap data is not being used

In the function XGBOUT2, you have the following code:

num_samp = len(all_samples)
if bootstrap:
    np.random.seed()
    random.seed()
    I = np.random.choice(num_samp, size=num_samp, replace=True)
    samples = all_samples[I, :]
else:
    samples = all_samples
Xtrain, Ytrain, Xtest, Ytest, CI_data = CI_sampler_conditional_kNN(
    all_samples[:, Xcoords],
    all_samples[:, Ycoords],
    all_samples[:, Zcoords],
    train_samp,
    k,
)

You create the variable samples when bootstrap is True, but when you call the CI_sampler_conditional_kNN function, you use the variable all_samples. In my understanding, you should use the variable samples in this case. Am I right?

BTW, this is an excellent paper!
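Here is a minimal sketch of the fix the issue suggests. It assumes the only change needed is to slice the X, Y and Z columns from the resampled array samples instead of all_samples. The function name select_coords is hypothetical. The sketch also swaps the global np.random calls for a NumPy Generator so that a caller can pass a seeded rng. The returned blocks would then be passed to CI_sampler_conditional_kNN together with train_samp and k, as in the original call.

```python
import numpy as np


def select_coords(all_samples, Xcoords, Ycoords, Zcoords, bootstrap, rng=None):
    """Return the X, Y, Z column blocks (hypothetical helper).

    When bootstrap is True, the blocks come from a row resample of
    all_samples, so the bootstrap draw is actually used downstream.
    """
    num_samp = len(all_samples)
    if bootstrap:
        rng = np.random.default_rng() if rng is None else rng
        # Resample row indices with replacement.
        I = rng.choice(num_samp, size=num_samp, replace=True)
        samples = all_samples[I, :]
    else:
        samples = all_samples
    # Slice from `samples`, not `all_samples`.
    return samples[:, Xcoords], samples[:, Ycoords], samples[:, Zcoords]
```

Passing a seeded generator such as np.random.default_rng(0) makes a bootstrap draw reproducible. The original np.random.seed() call, with no argument, reseeds from OS entropy instead.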

Add versions to package requirements in setup.py

I tried running CCIT in a clean virtualenv. The installation works fine, but there are numerous scikit-learn and XGBoost errors, mostly due to deprecations. Even the example in the README doesn't work.

Would it be possible to specify the exact versions of the requirements for CCIT? I believe that would solve most of the problems and make CCIT future-proof.

For example, in setup.py we have:

install_requires=[
    'markdown',
    'xgboost',
    'pandas',
    'numpy',
    'scikit-learn',
    'scipy',
    'matplotlib'
],

So the known-working versions of these seven packages are needed. In fact, I think pinning only xgboost and scikit-learn would solve it.
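One way to get the exact versions this issue asks for is to record what is installed in an environment where the README example does run, then paste the output into install_requires. The sketch below uses the standard library's importlib.metadata (Python 3.8+). The function name pin_requirements is hypothetical. The script only reports installed versions and cannot tell which ones are compatible with CCIT.

```python
from importlib.metadata import PackageNotFoundError, version

# The seven requirements listed in CCIT's setup.py.
REQUIREMENTS = ["markdown", "xgboost", "pandas", "numpy",
                "scikit-learn", "scipy", "matplotlib"]


def pin_requirements(names, lookup=version):
    """Return 'name==x.y.z' lines for installed packages; flag missing ones."""
    pins = []
    for name in names:
        try:
            pins.append(f"{name}=={lookup(name)}")
        except PackageNotFoundError:
            pins.append(f"# {name}: not installed")
    return pins


if __name__ == "__main__":
    print("\n".join(pin_requirements(REQUIREMENTS)))
```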

Some question about the distribution of acc1-acc2

Hey there, CCIT contributors,
From line 389 of CCIT.py, it seems you assume that acc1 - acc2 follows the normal distribution N(0, 2\sigma(acc2)^2), where \sigma(acc2) is the standard deviation of acc2. I think this is right too. But under this assumption, two other parts of the code are inconsistent:

  1. In line 373, only "s2 = np.std(cleaned, axis = 0, ddof = 1)[4]" is the sample standard deviation, built from the unbiased estimator of \sigma(acc2)^2. "np.std(cleaned, axis = 0)[4]" is the population standard deviation (ddof = 0), which is built from a biased variance estimator.
  2. In line 391, when bootstrap == False, why is the standard deviation np.sqrt(2) * 1/np.sqrt(ntot) (the np.sqrt(2) factor is applied in the function "pvalue", line 325)? I think it should be np.sqrt(2) * np.sqrt(acc2 * (1 - acc2)/ntot), since acc2 approximately follows N(acc2, acc2*(1-acc2)/ntot): the number of test points with y_pred == y_test follows a binomial distribution, so acc2 is approximately normal.

BTW, I appreciate your paper "Model-Powered Conditional Independence Test". It is great!
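Both points can be checked numerically. The sketch below makes the same assumption as the issue: acc1 and acc2 are independent accuracy estimates with the same standard deviation sigma, so acc1 - acc2 has standard deviation sqrt(2) * sigma. The helper names and the one-sided direction of the test are hypothetical and are not taken from CCIT.py. The sketch shows two things. First, the ddof = 1 estimate equals the population estimate times sqrt(n/(n-1)). Second, because acc2 * (1 - acc2) <= 1/4, the binomial standard error is at most half of 1/sqrt(ntot), so the fixed choice yields larger, more conservative p-values.

```python
import math

import numpy as np


def std_estimates(values):
    """Return (population std with ddof=0, sample std with ddof=1)."""
    values = np.asarray(values, dtype=float)
    return float(np.std(values)), float(np.std(values, ddof=1))


def sigma_fixed(ntot):
    # Standard deviation used when bootstrap == False, as described in the issue.
    return 1.0 / math.sqrt(ntot)


def sigma_binomial(acc2, ntot):
    # Alternative proposed in the issue: binomial standard error of acc2.
    return math.sqrt(acc2 * (1.0 - acc2) / ntot)


def upper_tail_pvalue(diff, sigma):
    """P(D >= diff) for D ~ N(0, 2 * sigma**2).

    The one-sided direction is an illustrative assumption.
    """
    sd_diff = math.sqrt(2.0) * sigma
    # Standard normal upper tail: P(N(0, 1) >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(diff / sd_diff / math.sqrt(2.0))
```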

DeprecationWarning: The truth value of an empty array is ambiguous.

Hey there, CCIT contributors,

I tried the following commands:

data = DataGen.generate_samples_cos(dx=1,dy=1,dz=20,sType='NI')  #non-CI dataset, pvalue should be low

X = data[:,0:1]
Y = data[:,1:2]
Z = data[:,2::]

pvalue = CCIT.CCIT(X,Y,Z) 

I received the following warning:

C:\Users\XXX\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use array.size > 0 to check that an array is not empty.
if diff:

The output pvalue is 0.014717305527501225, which looks fine. So is everything all right with my installation? Could you let me know if there is anything I can do about the warning?

Thank you,
gogotrace
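The warning text itself names the fix: test for emptiness with array.size > 0 rather than with the array's truth value. The message is raised inside scikit-learn's label.py rather than in the calling script, so a caller can at most silence it. The sketch below does both. The helper names is_nonempty and call_quietly are hypothetical. Whether silencing is appropriate is a judgment call, since the pvalue was still computed.

```python
import warnings

import numpy as np


def is_nonempty(arr):
    # The check the warning recommends: size > 0 instead of bool(arr).
    return np.asarray(arr).size > 0


def call_quietly(fn, *args, **kwargs):
    """Call fn, ignoring only the empty-array truth-value DeprecationWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="The truth value of an empty array is ambiguous",
            category=DeprecationWarning,
        )
        return fn(*args, **kwargs)
```

For example, pvalue = call_quietly(CCIT.CCIT, X, Y, Z) would run the test from this issue with only that warning filtered out. Other warnings still surface.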

Possible error in the implementation compared to paper?

Hi,

Cool paper and thanks for uploading the code. Really interesting concepts put forward.

I looked through the code and noticed a possible mismatch between what the implementation does and what the proposed algorithm does.

Specifically,

CCIT/CCIT/CCIT.py

Lines 97 to 103 in 0b9dce9

nbrs = NearestNeighbors(n_neighbors=k + 1, algorithm="ball_tree", metric="l2").fit(
    Z
)
distances, indices = nbrs.kneighbors(Z)
for i in range(len(train_2)):
    index = indices[i, k]
    Yprime[i, :] = Y[index, :]

This snippet builds a nearest-neighbor search tree on all of the "training data", which Algorithm 1 of the paper calls U = U_1 \cup U_2. It then queries the nearest neighbors within that same set U.

Instead, Algorithm 1 proposes constructing the nearest-neighbor tree on a distinct set, U_2, and then querying it with the elements of U_1, swapping y with y' for nearby z' and z. @rajatsen91, is this an issue?
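The following sketch makes the difference concrete. It implements the split that the issue attributes to Algorithm 1: neighbors are searched only in U_2 and queried with points from U_1, and each U_1 point receives the Y of its nearest U_2 neighbor in Z. It uses a brute-force NumPy Euclidean search in place of the sklearn NearestNeighbors ball tree. The function name knn_swap_across_splits is hypothetical. This illustrates the issue's reading of the paper and is not the authors' code.

```python
import numpy as np


def knn_swap_across_splits(Z1, Z2, Y2):
    """For each row of Z1 (from U_1), return the Y of its nearest row in Z2 (from U_2)."""
    Z1 = np.asarray(Z1, dtype=float)
    Z2 = np.asarray(Z2, dtype=float)
    Y2 = np.asarray(Y2)
    # Squared Euclidean distances, shape (len(Z1), len(Z2)).
    d2 = ((Z1[:, None, :] - Z2[None, :, :]) ** 2).sum(axis=-1)
    # Index of the nearest U_2 point for every U_1 query.
    nearest = d2.argmin(axis=1)
    return Y2[nearest, :]
```

Because U_1 and U_2 are disjoint, a query point cannot match itself, so the single nearest neighbor is used directly. In the quoted snippet, Z is both fitted and queried. Column 0 of indices is then typically the point itself, which is presumably why the code reads indices[i, k] from k + 1 neighbors.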

Import Error

Importing CCIT in Jupyter (after installing it with pip install ccit) throws an error:

ModuleNotFoundError: No module named 'DataGen'
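We have not inspected the packaged CCIT source, so the following cause is an assumption. One common reason for this exact message is a Python 2 style implicit relative import such as "import DataGen" inside a package. Python 3 removed implicit relative imports (PEP 328), so a sibling module must be imported explicitly, for example with "from . import DataGen". The sketch below builds a throwaway package and tries both styles. The package name ccit_import_demo and the helper demo_relative_imports are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_relative_imports(pkg_name="ccit_import_demo"):
    """Build a throwaway package and try both import styles (hypothetical names)."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / pkg_name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "DataGen.py").write_text("VALUE = 1\n")
        # Python 2 style implicit relative import: fails on Python 3.
        (pkg / "bad.py").write_text("import DataGen\n")
        # Explicit relative import: works on Python 3.
        (pkg / "good.py").write_text("from . import DataGen\n")
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            for mod in ("bad", "good"):
                try:
                    importlib.import_module(f"{pkg_name}.{mod}")
                    results[mod] = "ok"
                except ModuleNotFoundError as exc:
                    results[mod] = f"ModuleNotFoundError: {exc}"
        finally:
            # Undo the sys.path change and forget the throwaway modules.
            sys.path.remove(root)
            for name in list(sys.modules):
                if name == pkg_name or name.startswith(pkg_name + "."):
                    del sys.modules[name]
    return results
```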
