machenslab / dPCA
An implementation of demixed Principal Component Analysis (a supervised linear dimensionality reduction technique)
License: MIT License
The order of the decisions seems to be reversed when plotting.
I suspect this because setting the firing rates for a particular decision to 0 between two time points produces components that show the abrupt jump in value for the opposite decision.
To reproduce this, simply add a new line in dpca_demo.m after line 63 with firingRates(:, 1, 1, 10:15, :) = 0;
The resulting plots will show that the decision 2 / stimulus 1 line jumps between time points 10 and 15 (it should have been decision 1 / stimulus 1 that shows this behaviour).
I am running a fit on my neural data. How can I access the noise component Xnoise of the fit, as it is described in the paper?
I am trying to install using: pip install dPCA but I get the following error
...
Collecting dPCA
Downloading https://files.pythonhosted.org/packages/b1/e0/6a0b83a5c8f5f23bd0e77d48fe5dc63558c34852d87e5bd1caef91951be9/dPCA-0.1.tar.gz (117kB)
100% |████████████████████████████████| 122kB 1.6MB/s
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-2bcojpbi/dPCA/setup.py", line 2, in
from Cython.Build import cythonize
ModuleNotFoundError: No module named 'Cython'
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2bcojpbi/dPCA/
----------------------------------------
If I install the dependencies manually, I get a different error:
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 1, in
File "/tmp/pip-build-tjy39pjk/dPCA/setup.py", line 16, in
ext_module = cythonize("dPCA/nan_shuffle.pyx")
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 966, in cythonize
aliases=aliases)
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 810, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "/home/edgar/.local/lib/python3.6/site-packages/Cython/Build/Dependencies.py", line 109, in nonempty
raise ValueError(error_msg)
ValueError: 'dPCA/nan_shuffle.pyx' doesn't match any files
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-tjy39pjk/dPCA/
----------------------------------------
Has anybody solved this? Thanks!
I am potentially interested in trying to extend this code for research purposes. This might involve porting some of this code to Julia, since I have been using that for some of my other projects. I also may play around with the MATLAB code a bit (specifically, I'd like to modify the dpca_marginalize
function).
Can I recommend that you add a permissive license (e.g. MIT) to this repo so that others can repurpose your code (under the condition that they give proper attribution/recognition to your work)?
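Since dpca_marginalize came up above: for anyone porting it, the core operation can be sketched in a few lines of numpy (a conceptual sketch, not the repository's actual implementation; the function name is mine):

```python
import numpy as np

def marginalize(X, keep_axes):
    """Average X (neurons x param_1 x ... x param_K) over every
    parameter axis NOT listed in keep_axes; axis 0 (neurons) is kept.
    keepdims=True so the result broadcasts against X."""
    avg_axes = tuple(ax for ax in range(1, X.ndim) if ax not in keep_axes)
    return X.mean(axis=avg_axes, keepdims=True)

# toy data: 4 neurons, 3 stimuli, 5 time points
X = np.random.randn(4, 3, 5)
X_time = marginalize(X, keep_axes=(2,))   # time marginalization: (4, 1, 5)
X_stim = marginalize(X, keep_axes=(1,))   # stimulus marginalization: (4, 3, 1)
```

The real function also subtracts lower-order marginalizations to demix the terms; this sketch only shows the averaging step.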
Hello,
I just read your eLife paper and your MATLAB code. My understanding is that dPCA needs every neuron to be recorded in all trials, but as far as I know, the olfactory categorization task of Kepecs and colleagues in 2008 used electrodes that could be advanced every day, so each time the electrodes advanced, they recorded different neurons. So I think the analysis should be limited to single days, repeating the analysis for every day. Am I correct?
The second question is: I don't understand the "re-stretching" procedure. I follow the first and second steps, setting the Ti and deltaT, but then how does the stretching work?
Thank you very much.
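On the re-stretching question just above: I can't speak for the paper's exact procedure, but stretching trials of unequal length onto a common time base is commonly done by linear interpolation, roughly like this (numpy sketch, variable names are mine):

```python
import numpy as np

rate = np.random.randn(53)                 # one trial, 53 time samples
T = 60                                     # common target length

old_t = np.linspace(0.0, 1.0, len(rate))   # normalized time of each sample
new_t = np.linspace(0.0, 1.0, T)           # common normalized time grid
stretched = np.interp(new_t, old_t, rate)  # trial resampled to T points
```

With alignment events, the same idea is applied piecewise between events rather than over the whole trial.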
I've tried installing dPCA on two Windows 10 x64 computers and run into the same error message when building 'dPCA.nan_shuffle' extension.
Python version is 2.7.14 64-bit. It is an Anaconda distribution.
I tried installing using python setup.py install
after confirming that all the required packages have been installed.
When computing significance, we found that the python code to test significance did not generalise to cases where a second task parameter (such as decision) was added. I have created a notebook that replicates this error for random data, as in your demo. See:
https://github.com/vdplasthijs/dPCA/blob/master/dPCA_significance_test.ipynb
Here, one can either set D=1 or D=2, the latter resulting in the error when dpca.significance_analysis() is called.
While (I think) I was able to resolve the initial error (please see just added pull request), I still think additional changes are needed, because although no errors are raised, the decision component is (unexpectedly) not deemed significant (see final fig of notebook for D=2, when using the updated dPCA code).
Thank you!
In line 121 of dpca_plot_default.m:
data = permute(data, [dims(end) dims(1:end-1)]);
Should I think be:
data = permute(data, [numel(dims) 1:numel(dims)-1]);
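For comparison, the proposed permute (cycling the last dimension to the front) corresponds to numpy's moveaxis; a quick sanity check of the intended reordering (illustrative array only):

```python
import numpy as np

data = np.zeros((4, 3, 2))
# MATLAB: permute(data, [numel(dims) 1:numel(dims)-1]),
# i.e. the last axis moves to the front and the rest shift right
rotated = np.moveaxis(data, -1, 0)
print(rotated.shape)  # (2, 4, 3)
```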
Hi,
I'm trying to run dPCA on this dataset. I suppose that my problem stems either from the shape of the input array or from my erroneous understanding of labels. The shape of the input array is (127, 2, 2, 6, 6). So I've got 127 average cell firing rates for trials with 4 variables having 2, 2, 6 and 6 levels. Is the shape correct? The labels defined below don't seem to cut it. Thanks a bunch!
from dPCA import dPCA
labels = ['cdfo'] # choice, decision, spatial frequency, orientation
dpca = dPCA.dPCA(labels, 5, regularizer = "auto")
demixed_mouse1 = dpca.fit_transform(dPCADataMouse1T5)
UPD: the same error is thrown when no labels are specified:
dpca = dPCA.dPCA()
demixed_mouse1 = dpca.fit_transform(dPCADataMouse1T5)
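A guess, since the error itself isn't shown: other issues in this thread pass labels as a plain string (e.g. labels='st'), with one character per parameter axis. A one-element list like ['cdfo'] iterates as a single label rather than four:

```python
labels_str = list('cdfo')      # four one-character labels, one per axis
labels_list = list(['cdfo'])   # a single label covering only one axis
print(labels_str, labels_list)
```

If that is the cause, labels='cdfo' (a bare string) may be what the constructor expects here, though I can't verify it against this exact dataset.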
Hi. Just to confirm: is it normal for the summed explained_variance_ratio_ to be much less than 1 (about 0.5) in dPCA (Python)? If not, where might I have made a mistake?
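As general context (ordinary PCA, not dPCA-specific): the per-component ratios only sum to 1 when every component is kept, so a sum around 0.5 for a few retained components is not by itself a mistake. A plain numpy illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))       # 100 samples, 10 features
X = X - X.mean(axis=0)                   # center the data

cov = X.T @ X / (len(X) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, descending
ratios = eigvals / eigvals.sum()         # explained_variance_ratio_ analogue

print(ratios.sum())       # all 10 components together: ~1.0
print(ratios[:3].sum())   # only the top 3: strictly less than 1
```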
Hi!
I was trying to use the dPCA code to run a DPCA fit on some data, [ N x T x S ] where N is neurons, T is time, S is stimulus conditions. I ran the analysis previously in MATLAB and combined parameters related to stimulus/stimulus-time interactions using:
combinedParams = { { 1, [1 2] }, {2} };
To do the same in the Python code, I followed the example in the dPCA.py code docstring for the dPCA function to set:
dpca = dPCA.dPCA( labels = 'st' )
dpca.join = { 'st': [ 's', 'st' ] }
dpca.protect = ['t']
When I tried to run fit_transform
I ran into an error:
Z = dpca.fit_transform( mean_data_for_dpca )
File "/snel/home/lwimala/bin/dPCA/python/dPCA/dPCA.py", line 170, in fit_transform
return self.transform(X)
File "/snel/home/lwimala/bin/dPCA/python/dPCA/dPCA.py", line 963, in transform
X_transformed[key] = np.dot(self.D[key].T, X.reshape((X.shape[0],-1))).reshape((self.D[key].shape[1],) + X.shape[1:])
KeyError: 's'
I looked into the error a little bit and figured out that the issue had to do with the self.marginalizations
not being updated to join the combined parameters during the _marginalize
function. To fix the bug, I hacked together a quick fix that adds handling of the marginalizations within the _marginalize function (I used code from the get_parameter_combinations function to handle updating the marginalizations properly).
Added to the end of _marginalize
in dPCA.py (Line 318):
# recompute self.marginalizations (needed if performing regularization optimization)
self.marginalizations = self._get_parameter_combinations(join=False)
# handle updating marginalization names if join is passed (taken from 'get_parameter_combinations' function)
if isinstance(self.join, dict):
    for key, combs in self.join.items():
        tmp = [self.marginalizations[comb] for comb in combs]
        for comb in combs:
            del self.marginalizations[comb]
        self.marginalizations[key] = tmp
This fix enabled me to compute the dPCA transformation for my data and replicate my results that I found using the MATLAB package from your code distribution. Just wanted to raise an issue since this wasn't posted before. Thanks!
The new solution gives this error: "[Errno 2] No such file or directory:
'/content/dPCA/python/'/Users/addison/Desktop/Online_Course/Neuromatch/Users/addison/Desktop/Online_Course/Neuromatch python: can't open file '/content/dPCA/python/setup.py': [Errno 2] No such file or directory"
I just installed the dPCA Python package under Python 3.6 (Anaconda Python distribution). It took quite a bit of time to work out how, so I thought I would share how I got it to work in case other people have the same problems.
Trying to install using PIP I got an error ValueError: 'dPCA/nan_shuffle.pyx' doesn't match any files
Trying to install using python setup.py install
I got an error: error: Unable to find vcvarsall.bat
This was due to Cython not being able to find a C++ compiler. Following the instructions here, I created a file Anaconda\Lib\distutils\distutils.cfg
containing:
[build]
compiler=mingw32
[build_ext]
compiler = mingw32
However I still could not install with either PIP or setup.py, getting an error:
File "C:\Users\takam\Anaconda3\lib\distutils\cygwinccompiler.py", line 126, in
__init__
if self.ld_version >= "2.10.90":
TypeError: '>=' not supported between instances of 'NoneType' and 'str'
Using the Microsoft Visual Studio 2015 build tools as the compiler rather than mingw32 fixed this problem. I downloaded and installed the tools from here and edited the file Anaconda\Lib\distutils\distutils.cfg
to read:
[build]
compiler=msvc
[build_ext]
compiler = msvc
I could then install dPCA from the setup.py file with: python setup.py install
So I am implementing this on a Neurons * Time point * Category matrix. Below is some description of my inputs:
% size(binnedDataStack) = [276 44 8];
margNames = {'Time', 'Category'};
combinedParams = {{1, [1 2]}, {2, [2 1]}};
Running the following...
[W, V, whichMarg] = dpca(binnedDataStack, 15);
whichMarg has 3's present, despite me only defining 2 marginalizations. This leads to errors when a 3rd marginalization is being looked for in margNames. Am I using this correctly, and if so, what should I change?
Apologies if this is a misunderstanding on my part, but I think the example is slightly wrong here?
join : None or dict
...
e.g. if we are only interested in the time-modulated stimulus components.
In this case, we would pass {'ts' : ['t','ts']}.
Shouldn't we pass {'ts' : ['s','ts']}
instead?
I am wondering if I can use the code to merge effects and fit on the combined marginalization, similar to what the paper describes: combining a marginalization main effect (sensory) and its interaction with time into a single component.
I have been trying to use the join parameter but have stumbled on a couple of issues. Also, I am not sure whether the fact that the _marginalize method calls the parameter generator method with join=False by default is related.
Hi,
How do we get the encoder and decoder matrices?
Thank you
Is there a straightforward way to replicate the MATLAB implementation's 'combinedParams'
behavior using the Python dPCA
code? I would like to do a grouping similar to the stimulus, decision, interaction, and time grouping shown in the MATLAB demo.
For an example stimulus-group, would I simply add the s
and the st
components to get the "first stimulus-group component", and also add the explained variances?
(I know this project is not under maintenance anymore, so I can also use the MATLAB version if that is easier.)
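Regarding adding the s and st pieces: I believe the Python code's join argument (seen in other issues here, e.g. dpca.join = {'st': ['s', 'st']}) is the analogue of combinedParams. Either way, summing grouped explained variances amounts to this (hypothetical numbers, made-up key names):

```python
# hypothetical per-marginalization explained variances for labels 'st'
variances = {'s': 0.30, 't': 0.25, 'st': 0.10}

# group pure-stimulus with stimulus-time, mirroring
# MATLAB's combinedParams = {{1, [1 2]}, {2}}
join = {'s_group': ['s', 'st']}

for new_key, members in join.items():
    variances[new_key] = sum(variances.pop(m) for m in members)

print(variances)  # 't' untouched; 's_group' holds the summed variance
```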
It would be great to add titles to all 10 figures produced in dpca_demo.m.
Code works great, but it seems that the dPCA.components_ method was never implemented, although I think that dPCA.D contains exactly what components should?
It seems like explained_variance_ratio_, mean_, n_components_, noise_variance_ are not assigned/set in the python version of the code at all, although mentioned in the docs.
Right after extracting the repository and trying to run dpca_demo.m I ran into
`Elapsed time is 0.266575 seconds.
Iteration #1 out of 2.......................... [1 s]
Iteration #2 out of 2.......................... [1 s]
Repetition # 1 out of 5... [0 s]
Repetition # 2 out of 5... [0 s]
Repetition # 3 out of 5... [0 s]
Repetition # 4 out of 5... [0 s]
Repetition # 5 out of 5... [0 s]
Cell contents reference from a non-cell array object.
Error in dpca_classificationPlot (line 91)
title([options.marginalizationNames{i} ' # ' num2str(i)])
Error in dpca_demo (line 235)
dpca_classificationPlot(accuracy, [], [], [], decodingClasses)`
The docstring for _randomized_dpca says that it returns P, encoding matrices used to transform data, and D, decoding matrices used to inverse transform data.
However, the actual implementation of transform(X)
uses D and the implementation of inverse_transform
uses P.
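To make the reported swap concrete, here is the convention as a generic low-rank sketch (random matrices purely for illustration; per the issue above, transform applies D and inverse_transform applies P):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 50))   # neurons x samples
D = rng.standard_normal((8, 2))    # decoding matrix: data -> components
P = rng.standard_normal((8, 2))    # encoding matrix: components -> data

Z = D.T @ X                        # transform uses D
X_hat = P @ Z                      # inverse_transform uses P
```

So the behavior looks correct; it is the docstring's naming of "transform" vs. "inverse transform" that appears swapped.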
I'm a little confused about whether dpca.m uses 're-balancing' or not. I had thought that dpca.m didn't use 're-balancing' (the default setting is to accept 'balanced' data), since the input of the dpca function is "Xfull" with D+1 dimensions, just like 'firingRatesAverage'. According to the eLife paper (Kobak et al., 2016), if you want to 're-balance' the data, replace X with
I'm new to dPCA and MATLAB, and sorry about my poor English. I hope I've made my question clear. Thanks a lot!
Hello,
I have been applying dPCA in MATLAB and now I am trying to do the same in Python (in order to learn). When I try to run the significance analysis I get the following error:
Compute score of shuffled data: 0 / 100 Traceback (most recent call last):
File "", line 1, in
significance_masks = dpca.significance_analysis(Matr_CondT, firingRates, n_shuffles=100, n_splits=10, n_consecutive=10)
File "C:\Users\amengual\anaconda3\lib\site-packages\dPCA\dPCA.py", line 875, in significance_analysis
self.shuffle_labels(trialX)
File "C:\Users\amengual\anaconda3\lib\site-packages\dPCA\dPCA.py", line 735, in shuffle_labels
nan_shuffle.shuffle2D(trialX)
File "dPCA\nan_shuffle.pyx", line 9, in dPCA.nan_shuffle.shuffle2D
ValueError: Buffer dtype mismatch, expected 'long' but got 'long long'
I am sorry in advance, I am still learning the language, and any help on this would be really appreciated.
Thanks a lot in advance
The Python version works fine, but I don't have MATLAB and need to use dSCA (demixed shared component analysis), which is built on the dPCA MATLAB version.
So I tried installing Octave and octave-statistics on Colab,
and added the following commands in dpca_demo.m:
'''
pkg load statistics
cd /content/dPCA/matlab
'''
I also deleted the minimal-plot and explained-variance computations in step one (they otherwise raise errors),
but I still get an error in:
%% Step 2: PCA in each marginalization separately
dpca_perMarginalization(firingRatesAverage, @dpca_plot_default, ...
'combinedParams', combinedParams);
error: 'containers' undefined near line 86 column 20
error: called from
dpca_marginalize at line 86 column 18
dpca_perMarginalization at line 71 column 18
Is it possible to run this in Octave?
I know the intention is for the user to write their own plotting function, but I came across a small fix that could make dpca_plot_default more generalizable. dpca_plot_default assumes the time dimension is in the data's 4th dimension. A simple fix would be to change this line to:
time = 1:size(data,ndims(data));
My trialX is slightly unbalanced, so I followed the instructions: "If different combinations of features have different number of trials, then set n_samples to the maximum number of trials and fill unoccupied data points with NaN." However, this results in a ValueError: array must not contain infs or NaNs.
Full traceback:
` File "/home/pietro/pythonprojects/starecase/DemixedPCA/my_dPCA.py", line 113, in
significance_masks = dpca.significance_analysis(trial_average_data,single_trial_data,axis='t',n_shuffles=10,n_splits=10,n_consecutive=10)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 864, in significance_analysis true_score = compute_mean_score(X,trialX,n_splits)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 821, in compute_mean_score trainZ = self.fit_transform(trainX)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 168, in fit_transform
self._fit(X,trialX=trialX)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 570, in _fit
self.P, self.D = self._randomized_dpca(regX,regmXs,pinvX=pregX)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/dPCA/dPCA.py", line 472, in _randomized_dpca
U,s,V = randomized_svd(np.dot(C,rX),n_components=self.n_components,n_iter=self.n_iter,random_state=np.random.randint(10e5))
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 364, in randomized_svd
power_iteration_normalizer, random_state)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/sklearn/utils/extmath.py", line 266, in randomized_range_finder
Q, _ = linalg.qr(safe_sparse_dot(A, Q), mode='economic')
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/scipy/linalg/decomp_qr.py", line 126, in qr
a1 = numpy.asarray_chkfinite(a)
File "/home/pietro/Envs/basic3/lib/python3.5/site-packages/numpy/lib/function_base.py", line 1215, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs`
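For reference, the NaN-padding scheme from the instructions looks like this in plain numpy, with the trial average taken via nanmean so the padding drops out (toy sizes, illustrative names):

```python
import numpy as np

n_neurons, n_stim, n_time, max_trials = 3, 2, 4, 5
rng = np.random.default_rng(1)

# pad every condition out to the maximum trial count with NaN
trialX = np.full((n_neurons, n_stim, n_time, max_trials), np.nan)
trial_counts = [5, 3]   # unbalanced: the second stimulus has fewer trials
for s, k in enumerate(trial_counts):
    trialX[:, s, :, :k] = rng.standard_normal((n_neurons, n_time, k))

# trial average that ignores the NaN padding
X_avg = np.nanmean(trialX, axis=-1)
```

The traceback suggests significance_analysis feeds the padded single-trial array into routines that are not NaN-aware, which is where the error comes from.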
Dear Dmitry,
thanks for the code and the eLife paper on dPCA. I am trying to get started with it for my data set; however, I struggle to understand how to generate the firingRates matrix (despite looking through the demo).
Let's say I have:
N = 30 neurons;
T = a trial length of 60 data points;
maxTrialNum = 135.
So e.g. this is represented as a 3D 30x60x135 matrix already in my data structure.
And then I have two different conditions (S) and two different Decisions (D) in my behaviour which are represented as two different 1D 135x1 arrays (with 0 and 1 for right/left stimuli/decisions).
I struggle wrapping my head around how to produce from this your
firingRates: N x S x D x T x maxTrialNum
data structure to get started with dPCA (should it end up being a 30x2x2x60x135 array?). I guess my head stops working beyond 3 dimensions.
Anyhow thanks a lot for the tool and have a nice weekend,
Eduardo
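One possible way to build that array in numpy (MATLAB would be analogous); spikes, stim and decision stand for the 30x60x135 data and the two 135x1 label vectors described above, and unused trial slots are NaN-padded:

```python
import numpy as np

N, T, K = 30, 60, 135                    # neurons, time points, total trials
rng = np.random.default_rng(0)
spikes = rng.standard_normal((N, T, K))  # placeholder for the real recordings
stim = rng.integers(0, 2, K)             # 0/1 stimulus label per trial
decision = rng.integers(0, 2, K)         # 0/1 decision label per trial

firingRates = np.full((N, 2, 2, T, K), np.nan)
trialNum = np.zeros((N, 2, 2), dtype=int)

for k in range(K):
    s, d = stim[k], decision[k]
    tr = trialNum[0, s, d]               # next free trial slot for (s, d)
    firingRates[:, s, d, :, tr] = spikes[:, :, k]
    trialNum[:, s, d] += 1

# the last axis only needs the largest per-condition trial count,
# so the final array is N x 2 x 2 x T x maxTrialNum (not x 135)
maxTrialNum = trialNum.max()
firingRates = firingRates[..., :maxTrialNum]
```

The trial average would then be something like nanmean over the last axis (np.nanmean(firingRates, axis=-1) in Python).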
Hi,
I constructed the initial neural activity matrix (dF_dff) of size (N, S, D, T, E):
N = number of neurons
S = number of stimuli
D = number of decisions
T = time points per trial
E = max number of trials per condition
In my case dF_dff shape is (1307, 2, 2, 83, 43), and I take the average on the first dimension (neurons), obtaining dF_dff_average, shape (2, 2, 83, 43).
To run the dPCA, I am doing the following thing:
label = 'tsd'
join = [{'s': ['s', 'ts']}, {'d': ['d', 'td']}, {'t'}, {'sd': ['tsd']}]
dpca = dPCA.dPCA(labels=label, join=join, n_components=2, regularizer='auto')
dpca.protect = ['t']
Z = dpca.fit_transform(dF_dff_average, dF_dff)
But this doesn't work. I get an index error:
IndexError: index 43 is out of bounds for axis 3 with size 43
Maybe I am doing something wrong. Hopefully you can help me figure out what the problem is.
Thank you
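One guess (an assumption, not a confirmed diagnosis): the trial-averaged input to fit_transform should keep the neuron axis and average over the trial axis, i.e. the last dimension rather than the first. With toy sizes:

```python
import numpy as np

dF_dff = np.random.randn(7, 2, 2, 9, 5)    # toy (N, S, D, T, E) array

# averaging over axis 0 drops the neuron axis: shape (2, 2, 9, 5)
neuron_average = dF_dff.mean(axis=0)

# averaging over the last axis keeps neurons: shape (7, 2, 2, 9)
trial_average = np.nanmean(dF_dff, axis=-1)
```

Also, going by the docstring excerpt quoted in another issue above, join looks like it should be a single dict (e.g. {'s': ['s', 'ts'], 'd': ['d', 'td'], 'sd': ['sd', 'tsd']}) rather than a list of dicts and sets.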