nimgen's People

Contributors

dependabot[bot], fraimondo, kaurao, lesasse, tyasird

nimgen's Issues

update readme

When I have a bit of time, the README should be updated (after merging the significant changes that come with the new command line tool and the pipelining module).

D-CCA integration

Integrate D-CCA from this paper to compare domain-general and domain-specific gene expression gradients.

Tests and CI

There are no tests yet. I want to add at least a few basic unit tests. I am not sure how easy this will be, in particular for the AHBA infrastructure, but at least the basic functions should be tested. For now I will try to bring coverage to around 60 to 70% and add some basic CI infrastructure using pytest, tox and GitHub Actions.

[ENH]: Update/configure the logger

Which feature do you want to include?

At the moment the logger does not work as well as I'd like, but fixing it takes time I don't currently have.

How do you imagine this integrated in nimgen?

Later. Better.

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

[ENH]: Add MIC and TIC statistics

Which feature do you want to include?

For now, we support Pearson and Spearman correlation for the mass-univariate correlation analysis between markers and gene expression. We also want to include TIC and MIC from the paper here:

https://academic.oup.com/gigascience/article/7/4/giy032/4958979?login=false

How do you imagine this integrated in nimgen?

Make a new correlation function that can be dropped in.

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

[BUG]: Smashed parcellation applied to both marker and gene expression data

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The parcellation is smashed, and the smashed parcellation is applied to both marker and gene expression data in the surrogate analysis for empirical p-value calculation.

Expected Behavior

The smashed parcellation should only be applied to the marker, not the gene expression data, which should use the real parcellation.

Steps To Reproduce

Run the pipeline.

Relevant log output

No response

Anything else?

No response

Output structure/format

There are a few things that need adjusting for the output format:

  1. Dataframes should be saved as TSVs with the correct extension, with an index column and a header row.
  2. Correlation matrices should also be saved for significant gene lists after partial correlation with gene expression PCs as covariates.
  3. Output should also contain correlation matrices constructed using a random list of genes that is as long as the significant gene list.
  4. Overall, I will have to review the naming of files and folders.
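For the random gene list mentioned in the list above, a minimal sketch could look like this. `sample_random_genes` is a hypothetical helper, and whether significant genes should be excluded from the pool is left open by the issue; here they are not excluded.

```python
import random


def sample_random_genes(all_genes, significant_genes, seed=None):
    """Draw a random gene list as long as the significant gene list.

    Sampling is without replacement; passing a seed makes the draw
    reproducible across runs.
    """
    rng = random.Random(seed)
    return rng.sample(list(all_genes), k=len(significant_genes))


# Placeholder gene names, for illustration only.
all_genes = [f"GENE{i}" for i in range(100)]
significant = ["GENE3", "GENE17", "GENE42"]
random_genes = sample_random_genes(all_genes, significant, seed=1)
```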

[ENH]: Run correlation methods and gene enrichment analysis for different alpha levels in sequence

Which feature do you want to include?

The lists provided by these fields

correlation_method:
    - spearman
alpha:
    - 0.05

are run in parallel. Overall, they share a lot of the same data fetching and processing steps, so it will likely be better and more efficient to simply run them in sequence.

How do you imagine this integrated in nimgen?

Each step can read in the configuration file as an argument and then run the pipeline in a for loop over the product of these lists.
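A minimal sketch of that loop, assuming the configuration has already been parsed into a dictionary. `fetch_shared_data` and `run_step` are hypothetical placeholders, and the extra list entries (`pearson`, `0.01`) are added only to make the product visible.

```python
from itertools import product

# Stand-in for the parsed configuration file; only the two fields
# discussed in the issue are used.
config = {
    "correlation_method": ["spearman", "pearson"],
    "alpha": [0.05, 0.01],
}


def fetch_shared_data():
    """Placeholder for the fetching/processing shared by all runs."""
    return {"loaded": True}


def run_step(data, correlation_method, alpha):
    """Placeholder for one analysis run; returns a label for the run."""
    return f"{correlation_method}_alpha-{alpha}"


def run_all(config):
    data = fetch_shared_data()  # done once, reused by every combination
    results = []
    for method, alpha in product(config["correlation_method"], config["alpha"]):
        results.append(run_step(data, method, alpha))
    return results


results = run_all(config)
```

The shared data is loaded once before the loop, which is where the efficiency gain over parallel, independent runs would come from.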

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

Add more customisation options for null map generation.

Which feature do you want to include?

Currently, most parameters for null map generation are hard-coded to some implicit defaults in smash.py. I would like to add some fields to the configuration file to allow for customisation of this process using the neuromaps API (within some constraints regarding volumetric data and MNI space).

How do you imagine this integrated in nimgen?

Add fields to configuration file and improve smash.py

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

[ENH]: Outputs based on all genes should not be saved in marker specific directory

Which feature do you want to include?

Outputs based on all genes should not be saved in the marker-specific directory. They typically take a lot of disk space, and saving them per marker creates duplicates across the marker-specific directories, which is inefficient.

How do you imagine this integrated in nimgen?

Save it in a top level/general directory for all genes.
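As a rough sketch of such a layout, the helper below resolves one shared directory for all-gene outputs and one directory per marker. The function and the directory names (`all_genes`, `markers`) are assumptions for illustration, not nimgen's actual structure.

```python
from pathlib import Path


def output_paths(output_dir, marker_name):
    """Return the shared and the marker-specific output directories.

    The all-gene directory does not depend on the marker, so outputs
    stored there are written once instead of once per marker.
    """
    root = Path(output_dir)
    return {
        "all_genes": root / "all_genes",
        "marker": root / "markers" / marker_name,
    }


paths_a = output_paths("results", "marker_a")
paths_b = output_paths("results", "marker_b")
```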

Do you have a sample code that implements this outside of nimgen?

No response

Anything else to say?

No response

Type hints

I want to add type hints everywhere as soon as I have the time for it.

Examples

Once I merge the command line tool, I will need to create some examples of how to run a pipeline using a yaml file. I also need to create some examples of how to directly use some of the functions from the toolbox.

Pipelining/Command Line Tool

At the moment, nimgen consists mainly of two modules: "expressions" (for loading and processing gene expression data and performing statistical analyses with some parcellated marker) and "smash" (for creation of surrogate nii parcellation files used to perform n correlation analyses with a "smashed" brain and obtain empirical p-values). However, both of these modules take care of quite a bit of "pipelining" i.e. saving outputs etc, whereas other pipelining tasks are left to the user.

I think I would like to cleanly separate the purely analysis based modules from the pipelining. The idea is to have an additional "pipeline" module where individual jobs are defined based on the scheduling system (i.e. HTCondor/juseless, slurm/jureca, multiprocessing/pc etc.) in individual submodules.
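One possible shape for such a "pipeline" module is a common interface with one implementation per scheduling system. The class and function names below are hypothetical; only the `condor_submit` and `sbatch` commands are real tools, and nothing is executed here.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical common interface for the scheduler submodules."""

    @abstractmethod
    def submit_command(self, submit_file):
        """Return the command (as an argument list) that submits a job."""


class HTCondor(Scheduler):
    def submit_command(self, submit_file):
        return ["condor_submit", submit_file]


class Slurm(Scheduler):
    def submit_command(self, submit_file):
        return ["sbatch", submit_file]


# Names a configuration file might use to select a scheduler (assumed).
SCHEDULERS = {"htcondor": HTCondor, "slurm": Slurm}


def get_scheduler(name):
    try:
        return SCHEDULERS[name.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported scheduler: {name!r}") from None
```

The analysis modules would then never need to know which scheduler is in use; they would only be called from jobs that the pipeline module defines.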

Ideally I would then like to be able to configure pipelines in a yaml file (i.e. inputs, outputs, analysis options, etc.) and run it using nimgen as a command line tool, i.e.:

nimgen run config.yaml

or something along those lines. nimgen could then create the submit files and submit them, perhaps prompting me to confirm that I really want to submit x jobs.
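A command line entry point of this kind could be sketched with `argparse` as below. This is an illustration of the idea, not the actual nimgen CLI: the `--yes` flag and the `confirm_submission` helper are assumptions, and reading the yaml file is left out.

```python
import argparse


def build_parser():
    """Parser for a hypothetical `nimgen run config.yaml` command."""
    parser = argparse.ArgumentParser(prog="nimgen")
    subparsers = parser.add_subparsers(dest="command", required=True)
    run = subparsers.add_parser("run", help="Run a pipeline from a yaml file")
    run.add_argument("config", help="Path to the yaml configuration file")
    # Assumed convenience flag for non-interactive use.
    run.add_argument("-y", "--yes", action="store_true",
                     help="Submit without asking for confirmation")
    return parser


def confirm_submission(n_jobs, assume_yes=False, ask=input):
    """Ask whether n_jobs jobs should really be submitted."""
    if assume_yes:
        return True
    answer = ask(f"Submit {n_jobs} jobs? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


args = build_parser().parse_args(["run", "config.yaml"])
```

Passing `ask` as a parameter keeps the prompt testable without a terminal.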
