juaml / nimgen



License: GNU Affero General Public License v3.0

Python 98.25% R 1.75%

nimgen's People


omidvarnia tyasird lesasse stnoronha

nimgen's Issues

Update README

When I have a bit of time, the README should be updated (after merging the significant changes that come with the new command line tool and the pipelining module).

D-CCA integration

Integrate D-CCA from this paper to compare domain-general and domain-specific gene expression gradients.

There are no tests yet. I want to add at least a few basic unit tests. I am not sure how easy this will be, in particular for the AHBA infrastructure, but at least the basic functions should be tested. For now, I will try to bring coverage to around 60 to 70% and add some basic CI infrastructure using pytest, tox and GitHub Actions.

[ENH]: Update/configure the logger

Which feature do you want to include?

At the moment the logger does not work as well as I'd like, but fixing it takes time I don't currently have.

How do you imagine this integrated in nimgen?

Later. Better.

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

[ENH]: Add MIC and TIC statistics

Which feature do you want to include?

For now, we support Pearson and Spearman correlation for the mass-univariate correlation analysis between markers and gene expression. We would also like to include TIC (total information coefficient) and MIC (maximal information coefficient) from the paper here:

https://academic.oup.com/gigascience/article/7/4/giy032/4958979?login=false

How do you imagine this integrated in nimgen?

Write a new correlation function that can be dropped in alongside the existing Pearson and Spearman options.

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

[BUG]: Smashed parcellation applied to both marker and gene expression data

Is there an existing issue for this?

I have searched the existing issues

Current Behavior

The parcellation is smashed, and the smashed parcellation is applied to both marker and gene expression data in the surrogate analysis for empirical p-value calculation.

Expected Behavior

The smashed parcellation should only be applied to the marker, not the gene expression data, which should use the real parcellation.
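A minimal numpy sketch of the expected behavior, assuming the marker is available as voxel values with one parcel label per voxel, that each smashed parcellation is a label array of the same length, and that every parcel occurs in every surrogate. The function names are hypothetical and this is not nimgen's implementation; the "+1" correction in the p-value is a common convention, not something the issue specifies.

```python
import numpy as np


def parcel_means(voxel_values, labels, parcel_ids) -> np.ndarray:
    """Mean voxel value within each parcel, in the order given by parcel_ids."""
    voxel_values = np.asarray(voxel_values, dtype=float)
    labels = np.asarray(labels)
    # Assumes every parcel id occurs at least once in `labels`.
    return np.array([voxel_values[labels == p].mean() for p in parcel_ids])


def smash_test(marker_voxels, real_labels, surrogate_labels, gene_expression, parcel_ids):
    """Observed correlation and empirical p-value for one gene.

    gene_expression holds one value per parcel and is computed ONCE from the
    real parcellation; it is never re-parcellated with a surrogate.
    """
    gene_expression = np.asarray(gene_expression, dtype=float)
    observed_marker = parcel_means(marker_voxels, real_labels, parcel_ids)
    observed_r = np.corrcoef(observed_marker, gene_expression)[0, 1]

    null_r = []
    for surrogate in surrogate_labels:
        # Only the marker is parcellated with the smashed (surrogate) labels.
        null_marker = parcel_means(marker_voxels, surrogate, parcel_ids)
        null_r.append(np.corrcoef(null_marker, gene_expression)[0, 1])
    null_r = np.asarray(null_r)

    # Two-sided empirical p-value with the common "+1" correction.
    n_extreme = np.sum(np.abs(null_r) >= abs(observed_r))
    p_value = (n_extreme + 1) / (len(null_r) + 1)
    return float(observed_r), float(p_value)
```

The key point is that `gene_expression` is computed once outside the loop, so only the marker side of the correlation changes from one surrogate to the next.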

Steps To Reproduce

Run the pipeline.

Relevant log output

No response

Anything else?

No response

Output structure/format

There are a few things that need adjusting in the output format:

- DataFrames should be saved as TSV files with the correct extension (.tsv), including the index column and header row.
- Correlation matrices should also be saved for the significant gene lists obtained after partial correlation with the gene expression PCs as covariates.
- The output should also contain correlation matrices constructed from a random list of genes that is as long as the significant gene list.
- Overall, I will have to review the naming of files and folders.
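A minimal pandas sketch of the first and third points, assuming gene expression is held in a regions × genes DataFrame. `save_tsv`, `random_gene_list` and `gene_correlation_matrix` are hypothetical helper names; the partial-correlation step with PCs as covariates is not shown.

```python
import random
from pathlib import Path

import pandas as pd


def save_tsv(df: pd.DataFrame, path) -> Path:
    """Save a DataFrame as tab-separated values with index column and header row."""
    path = Path(path).with_suffix(".tsv")  # enforce the .tsv extension
    df.to_csv(path, sep="\t", index=True, header=True)
    return path


def random_gene_list(all_genes, n_genes: int, seed=None) -> list:
    """Draw a random gene list as long as the significant gene list."""
    all_genes = list(all_genes)
    if n_genes > len(all_genes):
        raise ValueError("Cannot draw more genes than are available.")
    return random.Random(seed).sample(all_genes, n_genes)


def gene_correlation_matrix(expression: pd.DataFrame, genes) -> pd.DataFrame:
    """Gene-by-gene correlation matrix; `expression` is regions x genes (assumed)."""
    return expression[list(genes)].corr()  # Pearson by default
```

Passing a seed to `random_gene_list` keeps the random baseline reproducible across runs.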

[ENH]: Run correlation methods and gene enrichment analysis for different alpha levels in sequence

Which feature do you want to include?

The lists provided by these fields

correlation_method:
    - spearman
alpha:
    - 0.05

are run in parallel. They share many of the same data fetching and processing steps, so it will likely be more efficient to simply run them in sequence.

How do you imagine this integrated in nimgen?

Each step can read in the configuration file as an argument and then run the pipeline in a for loop over the product of these lists.
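A minimal sketch of running the combinations in sequence, assuming the YAML file has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`). `run_step`, `load_data` and `analyse` are hypothetical names standing in for a pipeline step and its own functions.

```python
from itertools import product


def run_step(config: dict, load_data, analyse) -> dict:
    """Run one pipeline step for every (correlation method, alpha) pair in sequence.

    config mirrors the YAML fields correlation_method and alpha;
    load_data and analyse are placeholders for the step's own functions.
    """
    # Shared data fetching/processing happens once, not once per combination.
    data = load_data(config)
    results = {}
    for method, alpha in product(config["correlation_method"], config["alpha"]):
        results[(method, alpha)] = analyse(data, method=method, alpha=alpha)
    return results
```

The saving comes from hoisting `load_data` out of the loop; only the cheap, method- and alpha-specific part is repeated.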

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

Issue formats/how to contribute

I should probably provide some templates for issues and some instructions on how people can contribute to the project.

Add more customisation options for null map generation.

Which feature do you want to include?

Currently most parameters for null map generation are hard-coded to implicit defaults in smash.py. I would like to add some fields to the configuration file to allow customisation of this process using the neuromaps API (within some constraints regarding volumetric data and MNI space).

How do you imagine this integrated in nimgen?

Add fields to the configuration file and improve smash.py.
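A minimal sketch of how such configuration fields could be declared and validated, assuming a hypothetical `null_maps` section in the configuration file. The field names, defaults and the MNI152-only check are illustrative assumptions, not nimgen's or neuromaps' actual options; passing the validated values on to neuromaps is not shown.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class NullMapOptions:
    """Hypothetical configuration fields for null map generation."""

    n_perm: int = 1000          # number of surrogate maps
    seed: Optional[int] = None  # random seed for reproducibility
    space: str = "MNI152"       # template space of the volumetric input

    def __post_init__(self):
        if self.n_perm < 1:
            raise ValueError("n_perm must be a positive integer.")
        # Assumed constraint: volumetric data must be in MNI space.
        if self.space != "MNI152":
            raise ValueError("Volumetric null maps are only supported in MNI152 space.")

    @classmethod
    def from_config(cls, config: dict) -> "NullMapOptions":
        """Build options from the (hypothetical) 'null_maps' config section."""
        section = config.get("null_maps", {}) or {}
        unknown = set(section) - {f.name for f in fields(cls)}
        if unknown:
            raise ValueError(f"Unknown null map option(s): {sorted(unknown)}")
        # Missing fields fall back to the defaults declared above.
        return cls(**section)
```

Rejecting unknown keys catches typos such as `n_perms` early instead of silently falling back to a default.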

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

[ENH]: Outputs based on all genes should not be saved in marker-specific directories

Which feature do you want to include?

Outputs based on all genes should not be saved in marker-specific directories: they typically take up a lot of disk space, and saving them there creates duplicates across the marker-specific directories, which is inefficient.

How do you imagine this integrated in nimgen?

Save them once in a top-level (general) directory for all-gene outputs.
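A minimal sketch of such a layout, with hypothetical directory names ("all_genes", "markers"): every marker resolves to the same shared directory for all-gene outputs and to its own directory for marker-specific outputs.

```python
from pathlib import Path


def output_dirs(output_root, marker_name: str) -> dict:
    """Return output directories: one shared by all markers, one per marker.

    Directory names ("all_genes", "markers") are hypothetical.
    """
    root = Path(output_root)
    return {
        # Outputs based on all genes: written once, reused by every marker.
        "all_genes": root / "all_genes",
        # Outputs that depend on the marker (e.g. its significant gene lists).
        "marker": root / "markers" / marker_name,
    }
```

A pipeline step could check whether a file already exists under the shared directory before recomputing it, so a second marker reuses the first marker's all-gene outputs.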

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

Type hints

I want to add type hints everywhere as soon as I have the time for it.

Examples

Once I merge the command line tool, I will need to create some examples of how to run a pipeline using a YAML file. I also need to create some examples of how to use some of the toolbox functions directly.

Pipelining/Command Line Tool

At the moment, nimgen consists mainly of two modules: "expressions" (for loading and processing gene expression data and performing statistical analyses with a parcellated marker) and "smash" (for creating surrogate NIfTI (.nii) parcellation files, which are used to run n correlation analyses with a "smashed" brain and obtain empirical p-values). However, both of these modules also take care of quite a bit of "pipelining", i.e., saving outputs etc., whereas other pipelining tasks are left to the user.

I would like to cleanly separate the purely analysis-based modules from the pipelining. The idea is to have an additional "pipeline" module in which individual jobs are defined for each scheduling system (e.g. HTCondor/juseless, slurm/jureca, multiprocessing/pc) in its own submodule.
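A minimal sketch of what a scheduler-agnostic interface could look like, assuming each submodule provides a class with a `submit(jobs)` method. `Job`, `Backend`, `LocalBackend` and `HTCondorBackend` are hypothetical names; only the submit-file keys (`executable`, `arguments`, `log`, `output`, `error`, `queue`) and the `condor_submit` command are standard HTCondor. A Slurm backend is not shown.

```python
import subprocess
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Job:
    """One unit of work: an executable plus its command-line arguments."""
    name: str
    executable: str
    arguments: list = field(default_factory=list)


class Backend(ABC):
    """Interface each scheduler-specific submodule would implement."""

    @abstractmethod
    def submit(self, jobs: list) -> None:
        ...


class LocalBackend(Backend):
    """Run jobs one after another on the local machine."""

    def submit(self, jobs):
        for job in jobs:
            subprocess.run([job.executable, *job.arguments], check=True)


class HTCondorBackend(Backend):
    """Write one HTCondor submit file and pass it to condor_submit."""

    def __init__(self, submit_file="nimgen.submit"):
        self.submit_file = Path(submit_file)

    def render(self, jobs) -> str:
        lines = []
        for job in jobs:
            # Each "queue" statement queues one job with the settings above it.
            # Arguments containing spaces would need HTCondor's quoting rules.
            lines += [
                f"executable = {job.executable}",
                f"arguments = {' '.join(job.arguments)}",
                f"log = {job.name}.log",
                f"output = {job.name}.out",
                f"error = {job.name}.err",
                "queue",
                "",
            ]
        return "\n".join(lines)

    def submit(self, jobs):
        self.submit_file.write_text(self.render(jobs))
        subprocess.run(["condor_submit", str(self.submit_file)], check=True)
```

Keeping `render` separate from `submit` makes it possible to inspect or test the generated submit file without access to a cluster.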

Ideally I would then like to configure pipelines in a YAML file (i.e. inputs, outputs, analysis options, etc.) and run them using nimgen as a command line tool, e.g.:

nimgen run config.yaml

or something along those lines. nimgen could then create the submit files, submit them, and perhaps ask me to confirm that I really want to submit x jobs.
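A minimal argparse sketch of the `nimgen run config.yaml` entry point with a confirmation prompt. The `--yes` flag is a hypothetical addition, and `plan_jobs` (config path to list of jobs) and `submit` are placeholders passed in by the caller rather than existing nimgen functions; parsing the YAML itself is not shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="nimgen")
    subcommands = parser.add_subparsers(dest="command", required=True)
    run = subcommands.add_parser("run", help="Run a pipeline from a YAML configuration file.")
    run.add_argument("config", help="Path to the YAML configuration file.")
    # Hypothetical flag to skip the interactive confirmation.
    run.add_argument("-y", "--yes", action="store_true", help="Submit without asking.")
    return parser


def confirm(n_jobs: int, ask=input) -> bool:
    """Ask the user to confirm before submitting n_jobs jobs."""
    reply = ask(f"Submit {n_jobs} jobs? [y/N] ")
    return reply.strip().lower() in {"y", "yes"}


def main(argv, plan_jobs, submit, ask=input) -> int:
    """plan_jobs(config_path) -> list of jobs and submit(jobs) are placeholders."""
    args = build_parser().parse_args(argv)
    jobs = plan_jobs(args.config)
    if not args.yes and not confirm(len(jobs), ask):
        print("Aborted, nothing was submitted.")
        return 1
    submit(jobs)
    return 0
```

Injecting `ask` instead of calling `input` directly keeps the prompt testable; a console-script entry point would simply call `main(sys.argv[1:], ...)` with the real planning and submission functions.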
