blmm's Issues

Consistent filenames

The filenames in the lib folder should be consistent with one another: either all should have the blmm_ prefix or none should.

Dimension of `sigma2` and `dldsigma2`

Currently, all variables in the code are assumed to have 3 dimensions (i.e. [numvox,numrows,numcols]), apart from sigma2 and dldsigma2. It would be nice at some point to make these consistent by reshaping them to [numvox,1,1] throughout the code. This is, however, a low priority.
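
A minimal sketch of the proposed change (variable names here are illustrative, not taken from the codebase): reshaping sigma2 to [numvox, 1, 1] lets it broadcast directly against the 3-dimensional arrays used elsewhere.

```python
import numpy as np

numvox, numrows, numcols = 5, 3, 4
sigma2 = np.random.rand(numvox)       # currently stored as a flat [numvox] array

# Reshape to [numvox, 1, 1] so it broadcasts against [numvox, numrows, numcols]
sigma2 = sigma2.reshape(numvox, 1, 1)

resids = np.random.rand(numvox, numrows, numcols)
scaled = resids / sigma2              # voxel-wise scaling with no explicit loop
```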

generalizing submission engine to non-fMRIB clusters

Right now, job submission cannot be used on non-fMRIB clusters. Fortunately, some straightforward (though time-consuming) revisions would make it ready for other systems.

Currently the submission engine for batch_cluster.py relies on a specific configuration for the HPC system. I will revise the code in a separate fork to enable the system to generalize to SLURM clusters without requiring a specific configuration. The revisions will be backwards compatible and work with prior builds.

Update to provide SLURM hooks

The BDI BMRC system is now using SLURM and has just wound down its SGE system. Please revise to use SLURM (it probably doesn't hurt to leave the SGE hooks commented out) and test to ensure it works with the new system.

PeLS

All PLS code should be renamed to PeLS to highlight that it does not refer to some form of partial least squares methodology.

Add inference functions for 2D

The following 3D functions should have 2D counterparts:

  • get_resms3D
  • covB3D
  • varLB3D
  • get_T3D
  • get_F3D
  • get_R23D
  • T2P3D
  • F2P3D
  • get_swdf_T3D
  • get_swdf_F3D
  • get_dS23D
  • get_InfoMat3D

And corresponding tests.

Updated Tests

The BLMM package should have an updated test suite resembling that of BLM, based on the SpeedUpdateWithSim branch.

Potential computational speedups

It may be worth exploring the speedups obtained by:

  • Replacing expressions of the form np.sum(X**2) with np.linalg.norm(X)**2
  • Replacing pinvs with SVDs and simplifying all expressions wherever possible.
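
As a quick illustration of the first point, the two expressions below compute the same quantity; np.linalg.norm may be faster since it avoids materialising the intermediate X**2 array.

```python
import numpy as np

X = np.random.rand(1000)

# Equivalent squared Euclidean norms
a = np.sum(X**2)
b = np.linalg.norm(X)**2
```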

Extra Big mode

There should be an extra big mode which allows for parallelization over voxels even at the batch stage. This is for the case where the transfer cost is too high.

Inverse overflow

To help with overflow, where possible all matrix inversions should be replaced with pinv or if possible solve. It may be useful to add overflow checks but should be weighed against computation time costs.
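
A sketch of the substitution (the matrices here are synthetic, chosen only to be well-conditioned): solving the linear system directly avoids forming an explicit inverse, and pinv remains a fallback for (near-)singular matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A @ A.T + 4 * np.eye(4)        # well-conditioned SPD matrix
b = rng.standard_normal((4, 1))

# Instead of explicitly inverting...
x_inv = np.linalg.inv(A) @ b

# ...solve the linear system directly (more numerically stable):
x_solve = np.linalg.solve(A, b)

# pinv as a fallback when A may be singular or near-singular:
x_pinv = np.linalg.pinv(A) @ b
```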

Speed improvements with memory mapping

Currently, when working with large arrays, the BLMM code utilizes the numpy memory map. This is extremely fast at first, but as more memory maps are read in and out the code appears to slow down, despite said memory maps being removed from memory and flushed. This appears to be a common problem on Stack Overflow and, as a result, perhaps alternative packages should be used.

I have tried h5py but found its performance was worse than the numpy memory map. As all of these objects are built on the python mmap object, it may be best to wait until better support exists for this before trying out any other packages.

This issue is however a low priority, as the "slowing down" mentioned above is only observed for extremely large designs (likely much larger than the average user would ever want).

FS methods main script

All Fisher Scoring methods should be combined into one central script, blmm_paramEstimation.py perhaps.

Implement setup for toolbox installation

Summary

In order to implement CI testing, we need to be able to install the library with pip, even if it is not yet ready to run as a whole package.

Next Steps

  • Restructure the directory so it follows the standard toolbox organization
  • Add setup and all the necessary files for pip install
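
A minimal sketch of the packaging file this would require (the version number and dependency list below are assumptions, not taken from the repo):

```toml
# Hypothetical minimal pyproject.toml for a pip-installable blmm package
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "blmm"
version = "0.0.1"
dependencies = ["numpy", "scipy", "nibabel", "pandas"]
```

With this in place, `pip install -e .` installs the library for CI even before the package is fully functional.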

Accounting for sparsity

Currently, the BLMM code does not utilize the sparsity of a LMM. The primary reason for this is that, unfortunately, in Python, there is not a strong ecosystem for sparse matrix operations. The only sparse solve implementation is available in the package cvxopt and the only package working towards broadcasting sparse matrix operations is sparse. However, it is early days for sparse.

In future, as the Python ecosystem develops, it would be good for BLMM to account for the sparse nature of the LMM. At the time of writing, however, this does not seem possible.

`nparams` renaming

Technically, the random effects are not parameters so the variable name nparams is misleading. It should be renamed to nranfx throughout the code to signify "number of random effects".

REML

REML should be possible as it allows unbiased variance estimation. However, as the 3D code is designed for high n, this is a low priority.

Broaden imaging interface to include CIFTIs/GIFTIs

Currently, the pipeline operates on volumetric data only; the interface for MRI data is largely volumetric. I will add options to recognize CIFTIs and GIFTIs and integrate them into the pipeline.

This will enable BLMM to leverage HCP-style pipelines, which would make it one of the best available options for ABCC/HBCD linear regression.

Do unit-test with GitHub actions

Summary

After #55, we can try the first unit tests in GitHub Actions, even if they don't run yet.

Next Steps

  • Add GitHub Actions unit tests
  • Add GitHub Actions linting

Speeding up computation for multiple levels

When a random factor contains many levels, the code currently slows down due to the for loops used in the calculation of the covariance of dl/dDk and dl/dsigma, the covariance of dl/dDk1 and dl/dDk2, and dl/dD itself. This can be resolved using matrix reshapes.
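
A generic illustration of the idea (this is not the actual covariance calculation, just a toy per-level accumulation): a Python loop over levels can be replaced by folding the level axis into the array and summing it in a single einsum.

```python
import numpy as np

n, l, q = 100, 50, 2                  # observations, levels, effects per level
Z = np.random.rand(n, l * q)

# Loop version: accumulate the per-level blocks one at a time
loop_out = np.zeros((q, q))
for k in range(l):
    Zk = Z[:, k*q:(k+1)*q]
    loop_out += Zk.T @ Zk

# Reshape version: expose the level axis and sum over it in one call
Zr = Z.reshape(n, l, q)
reshape_out = np.einsum('nki,nkj->ij', Zr, Zr)
```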

2D and 3D SFS and FS

The 2D and 3D versions of SFS and FS seem to give different results occasionally. This should be checked over.

Design rank check

All voxels with designs (or product matrices X'X, Z'Z) which have rank less than p should be discarded before parameter estimation.
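
A minimal sketch of such a check (the helper name is hypothetical): compute the rank of the product matrix and compare against p.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 3
X = rng.standard_normal((10, p))                          # full-rank design
X_deficient = X.copy()
X_deficient[:, 2] = X_deficient[:, 0] + X_deficient[:, 1]  # dependent column

def has_full_rank(X, p):
    # Keep only voxels whose product matrix X'X has rank at least p
    return np.linalg.matrix_rank(X.T @ X) >= p
```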

Gibbs Sampling

At some point it would be nice to scale up and move over the Gibbs Sampler code from the notebooks for comparison purposes. This is, however, a low priority currently.

Voxel block partition

The code should partition NIFTIs by voxels as well, in the case that large designs are specified. This should, at least, exist as a backdoor option.

Satterthwaite Degrees of Freedom and inference

Currently the Satterthwaite degrees of freedom, as well as inference techniques in general, are implemented but not integrated with the rest of the code. F statistics are also currently not implemented.

Computational gain for the one factor model

For the one factor model, large gains can be made by considering that Z'Z is block diagonal. Ideally it would be nice to account for this and implement speedier computation of D(I+Z'ZD)^(-1) for this use case.
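
A sketch of the gain, under the assumption that both Z'Z and D share the same block structure (the blocks here are synthetic): D(I+Z'ZD)^(-1) can then be assembled from l small q x q inverses instead of one lq x lq inverse.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
l, q = 4, 2                              # levels, random effects per level
Dk = np.eye(q) * 0.5                     # covariance block, repeated l times
ZtZ_blocks = [rng.standard_normal((q, q)) for _ in range(l)]
ZtZ_blocks = [B @ B.T for B in ZtZ_blocks]   # SPD diagonal blocks of Z'Z
ZtZ = block_diag(*ZtZ_blocks)
D = block_diag(*([Dk] * l))

# Naive version: one large lq x lq inverse
full = D @ np.linalg.inv(np.eye(l*q) + ZtZ @ D)

# Blockwise version: l small q x q inverses
fast = block_diag(*[Dk @ np.linalg.inv(np.eye(q) + Bk @ Dk)
                    for Bk in ZtZ_blocks])
```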

Testing

The notebooks currently contain a lot of code for unit testing that does not exist in the repo. This should all be moved over.

Overflow in `llh3D`

Very occasionally the function llh3D experiences overflow errors in its calculation of the determinant. This is likely easily fixed by using a log-determinant instead of det.
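
The fix can be demonstrated on a synthetic matrix whose determinant overflows float64:

```python
import numpy as np

A = np.diag(np.full(400, 10.0))      # determinant is 10**400, beyond float64

det = np.linalg.det(A)               # overflows to inf
sign, logdet = np.linalg.slogdet(A)  # stable: sign = 1.0, logdet = 400*ln(10)
```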

Notebooks

At some point it would be good to create a branch and/or folder containing all the google colab notebooks.

Simulations

The simulations are currently spread all across the repo and have little documentation. They should all appear in the sim folder and be well documented.

File locking system for batch outputs

Currently each batch job outputs to a separate file. Ideally they should all add their contributions to the same file in order to reduce the storage cost. A file locking system would have to be added, and the functions memorySafeAtB in blmm_batch.py and memorySafeReadAndSumAtB in blmm_concat.py would have to be modified accordingly.
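
A minimal sketch of such a locking scheme, using the Unix-only fcntl module (the function name and file layout here are hypothetical, not the repo's actual API): each batch job takes an exclusive lock on the shared file, reads the running total, adds its contribution, and writes the result back.

```python
import fcntl
import numpy as np

def memory_safe_add(fname, contribution):
    # Hypothetical sketch of a locked read-add-write on a shared .npy file.
    # The file must first be initialised, e.g. np.save(fname, np.zeros(shape)).
    with open(fname, 'r+b') as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # block until exclusive lock acquired
        try:
            total = np.load(f)             # read the running total
            f.seek(0)
            np.save(f, total + contribution)
            f.truncate()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # release for the next batch job
```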

Overflow in FS and SFS

There appears to be some overflow in the FS and SFS methods, the cause of which is unclear. This is not a crucially important issue as the recommended method is pSFS anyway; but it would be nice to understand what is causing this overflow.

Use Dask

Ideally, it would be good to remove the dependency of BLMM on FSL. The only reason this dependency exists is that BLMM uses fsl_sub and the fslpython environment for job submission. If instead Dask could be used for job submission, this issue would be averted and the dependency would be removed. It would also remove the need for any shell scripts, and make the code much more portable. Currently, however, this is a low priority.

Record cause of voxel dropout

The code should also output a blmm_vox_dropout map which encodes what has caused voxels to drop out of an analysis. For example, if a voxel has:

  • value 1: It was removed by masks a user specified.
  • value 2: It was removed by a threshold the user specified.
  • value 3: It was removed because the design was too low rank.
  • value 4: It was removed as parameter estimation reached the maximum iteration limit.
  • value 0: It was not removed.
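
A sketch of how the encoding could be built (the mask names are hypothetical): write the codes in reverse priority order so that earlier causes in the pipeline overwrite later ones.

```python
import numpy as np

n_vox = 6
dropout = np.zeros(n_vox, dtype=np.int8)   # 0 = voxel retained

# Hypothetical boolean masks recording why each voxel was excluded
user_mask      = np.array([1, 0, 0, 0, 0, 0], dtype=bool)
below_thresh   = np.array([0, 1, 0, 0, 0, 0], dtype=bool)
rank_deficient = np.array([0, 0, 1, 0, 0, 0], dtype=bool)
hit_max_iter   = np.array([0, 0, 0, 1, 0, 0], dtype=bool)

# Reverse priority order: earlier-pipeline causes overwrite later ones
dropout[hit_max_iter]   = 4
dropout[rank_deficient] = 3
dropout[below_thresh]   = 2
dropout[user_mask]      = 1

print(dropout)  # [1 2 3 4 0 0]
```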

Pip release

The BLMM package is not currently pip released.
