Connectocross: statistical characterizations and comparisons of nanoscale connectomes across taxa (A paper in progress)

Home Page: https://docs.neurodata.io/connectocross/

License: MIT License

Python 21.64% Jupyter Notebook 78.36%

connectome

connectocross's Introduction

Connectocross: statistical characterizations and comparisons of nanoscale connectomes across taxa

Datasets

C. elegans male and hermaphrodite, full body


Paper	Link
Data	Link
Raw data location
# nodes	~300
# edges
# synapses
# graphs	2

Notes

has chemical and gap junction graphs
has some single-cell transcriptomics
has cell lineage

C. elegans timeseries, nerve ring


Paper	Link
Data
Raw data location
# nodes	~50 - 150 per graph?
# edges
# synapses
# graphs	8

Notes

time series of graphs (though from different animals)
2 animals at the last timepoint
I have code to pull data

Drosophila larva brain


Paper	not yet available
Data	we have it
Raw data location	CATMAID
# nodes	2971
# edges	~100k
# synapses	~300k
# graphs	1

Notes

Have incomplete cell lineage
I think Marta's lab has some single-cell RNA-seq (scRNA-seq) data
Have edge types split by axon/dendrite compartment

Drosophila adult brain chunk (hemibrain)


Paper	Link
Data	Link
Raw data location	neuPrint
# nodes	20 - 25k, 67k more small objects
# edges
# synapses	64M
# graphs	1

Drosophila adult brain sparse (FAFB)


Paper	Link
Data	Link to overview, Link to CATMAID
Raw data location	CATMAID
# nodes
# edges
# synapses
# graphs	1

Platynereis larva full


Paper	Link
Data	not yet available (I think)
Raw data location	CATMAID
# nodes	2728
# edges	11437
# synapses
# graphs	1

MICrONS

Ciona intestinalis


Paper	Link
Data
# nodes	~200?
# edges
# synapses
# graphs

Simple a priori models

a.k.a. look at the data, more or less

Simplest statistics

Things that we always want to know about a graph. Usually:

Number of nodes
Number of edges
For a connectome, maybe number of actual synapses

Density (ER)

Compute the density (p, the estimated ER edge probability) for each connectome; can simply plot them side by side.
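A minimal sketch of the density computation, assuming each connectome is available as a square NumPy adjacency matrix with `adj[i, j]` holding the synapse count from `i` to `j`, and that self-loops are ignored. The function name and the toy matrix are made up for illustration.

```python
import numpy as np


def density(adj):
    """Fraction of possible off-diagonal edges that are present."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    off_diag = ~np.eye(n, dtype=bool)  # ignore self-loops
    return np.count_nonzero(adj[off_diag]) / (n * (n - 1))


# Toy 3-node directed graph with 3 of the 6 possible edges present.
toy = np.array([[0, 2, 0],
                [0, 0, 1],
                [3, 0, 0]])
p_hat = density(toy)
```

Because both the edge count and the number of possible edges double for a symmetric matrix, the same formula also works for undirected graphs stored as symmetric adjacency matrices.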

Left/right (SBM/DCSBM)

Test different hypotheses about $\hat{B}$ (see statistical connectomics)
- is it more densely connected within block than between? To what extent?
  - maybe can compare this for many of the connectomes. probably not all
- core-periphery
- etc.
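As a starting point for these tests, here is a rough sketch of estimating $\hat{B}$ from a hemisphere label per node. It assumes a binary view of the adjacency matrix, no self-loops, and at least two nodes per block; the function name and the toy data are illustrative only.

```python
import numpy as np


def block_probabilities(adj, labels):
    """Estimate the SBM block probability matrix from node labels."""
    binary = np.asarray(adj) != 0
    labels = np.asarray(labels)
    blocks = np.unique(labels)
    off_diag = ~np.eye(binary.shape[0], dtype=bool)
    b_hat = np.zeros((len(blocks), len(blocks)))
    for a, src in enumerate(blocks):
        for b, tgt in enumerate(blocks):
            # potential edges from block `src` to block `tgt`, minus self-loops
            mask = np.outer(labels == src, labels == tgt) & off_diag
            b_hat[a, b] = binary[mask].sum() / mask.sum()
    return blocks, b_hat


toy = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]])
blocks, b_hat = block_probabilities(toy, ["L", "L", "R", "R"])
# Diagonal entries are within-hemisphere, off-diagonal are between-hemisphere.
```

Comparing the diagonal of `b_hat` to its off-diagonal entries gives a first look at whether connections are denser within a block than between blocks.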

Left/right + any known metadata (SBM/DCSBM)

If any putative cell types are known, use those
now we get a more refined SBM than the above, maybe interesting, maybe not?
- cell type data may not be available for all of the above
can do similar tests, results may or may not be different

General low rank (RDPG)

Scree plots
estimation of rank (ZG2)
Not sure whether this will be interesting to compare across connectomes. Would probably have to normalize for the number of nodes somehow.
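For reference, a plain-NumPy sketch of the profile-likelihood elbow idea behind ZG2 (Zhu and Ghodsi), applied to a decreasing sequence such as singular values. This is an illustration written under the usual two-group Gaussian assumption with a pooled variance, not the implementation from any particular library; the function name and the toy values are made up.

```python
import numpy as np


def zg_elbows(values, n_elbows=2):
    """Return successive profile-likelihood elbows of a sequence of values."""
    values = np.sort(np.asarray(values, dtype=float))[::-1]
    elbows, offset = [], 0
    for _ in range(n_elbows):
        tail = values[offset:]
        n = len(tail)
        if n < 3:
            break
        best_q, best_ll = None, -np.inf
        for q in range(1, n):
            left, right = tail[:q], tail[q:]
            # pooled sum of squares around the two group means
            ss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            var = max(ss / (n - 2), 1e-12)
            ll = -0.5 * n * np.log(2 * np.pi * var) - ss / (2 * var)
            if ll > best_ll:
                best_q, best_ll = q, ll
        offset += best_q  # the next elbow is searched in the remaining tail
        elbows.append(offset)
    return elbows


# Toy "scree" values with three clear tiers.
toy_values = [10.0, 10.1, 9.9, 5.0, 5.1, 4.9, 1.0, 1.1, 0.9, 1.0]
elbows = zg_elbows(toy_values)
```

For a real connectome the input would be something like `np.linalg.svd(adj, compute_uv=False)`, and the second entry of `elbows` plays the role of the ZG2 rank estimate.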

Distribution of weights, degrees

Can just look at the distribution of edge weights for each connectome, where weight is the number of synapses
in/out degree distributions, marginal and joint, are easy enough to plot.
- again, don't know whether it'll be meaningful to compare across connectomes or not
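A small sketch of pulling out the quantities to plot, assuming the convention that `adj[i, j]` is the number of synapses from `i` to `j` (so row sums are outputs and column sums are inputs). The function name is illustrative.

```python
import numpy as np


def degree_summary(adj):
    """In/out degrees and nonzero edge weights of a weighted directed graph."""
    adj = np.asarray(adj)
    binary = adj != 0
    return {
        "out_degree": binary.sum(axis=1),  # number of distinct targets per node
        "in_degree": binary.sum(axis=0),   # number of distinct sources per node
        "edge_weights": adj[binary],       # synapse count per existing edge
    }


toy = np.array([[0, 2, 5],
                [0, 0, 1],
                [0, 0, 0]])
summary = degree_summary(toy)
```

The marginals can go straight into histograms, and the joint distribution is a scatter (or 2D histogram) of `in_degree` against `out_degree`.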

More complicated a priori models

Homotypic affinity

can test for whether cell pairs (or blocks?) are more likely than chance to connect (homotypic affinity)
requires having cell pairs
- probably only maggot and C. elegans
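One simple way to quantify this, sketched under the assumption that pairs are given as two aligned index lists (`left[k]` and `right[k]` are homologous cells): compare the connection probability between paired cells to the connection probability between all other contralateral cell pairs. The function name and toy graph are illustrative, and no significance test is included.

```python
import numpy as np


def homotypic_affinity(adj, left, right):
    """Edge probability between paired cells vs. other contralateral pairs."""
    binary = np.asarray(adj) != 0
    lr = binary[np.ix_(left, right)]  # left -> right block, pairs on the diagonal
    rl = binary[np.ix_(right, left)]  # right -> left block, pairs on the diagonal
    k = len(left)
    diag = np.eye(k, dtype=bool)
    p_pair = (lr[diag].sum() + rl[diag].sum()) / (2 * k)
    p_other = (lr[~diag].sum() + rl[~diag].sum()) / (2 * k * (k - 1))
    return p_pair, p_other


# Nodes 0 and 2 are a pair, nodes 1 and 3 are a pair.
toy = np.zeros((4, 4), dtype=int)
for src, tgt in [(0, 2), (2, 0), (1, 3), (0, 3)]:
    toy[src, tgt] = 1
p_pair, p_other = homotypic_affinity(toy, left=[0, 1], right=[2, 3])
```

The two counts behind `p_pair` and `p_other` form a 2x2 contingency table, so a standard test of independence could be layered on top.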

Testing left vs right, quantify correlation, spectral similarity, GM performance, etc.

Testing for gaia's directedness (or just quantifying to what extent it happens)

Degree of reciprocal feedback? Had thought about something along the lines of testing for the difference between left and right latent positions, but maybe a simpler first statistic to compute is: P(edge from j to i | edge from i to j)
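The conditional probability above can be estimated directly from a binarized adjacency matrix. A minimal sketch, ignoring self-loops and edge weights (the function name is illustrative):

```python
import numpy as np


def reciprocity(adj):
    """Estimate P(edge from j to i | edge from i to j)."""
    binary = np.asarray(adj) != 0      # comparison makes a new boolean array
    np.fill_diagonal(binary, False)    # drop self-loops
    n_edges = binary.sum()
    n_reciprocated = (binary & binary.T).sum()  # edges whose reverse also exists
    return n_reciprocated / n_edges


# Edges 0->1, 1->0, 1->2: two of the three edges are reciprocated.
toy = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 0, 0]])
r = reciprocity(toy)
```

Comparing this value to the overall density gives a rough sense of whether reciprocal edges occur more often than expected if edges were independent.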

A posteriori models

Spectral clustering and estimating an SBM, DCSBM, DDSBM

can try to incorporate homotypic affinity also... or correlation L/R
figure 3 from maggot paper

Feedforward layout and proportion of feedforward edges
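A sketch of the second half of this, assuming some ordering of the nodes is already available (how to obtain the feedforward layout is not specified here). Given a position per node, the proportion of feedforward synapses is the share of edge weight that goes from an earlier node to a later one. The function name is illustrative.

```python
import numpy as np


def proportion_feedforward(adj, order):
    """Share of edge weight going from earlier to later nodes in `order`.

    order[i] is the position of node i in the layout (smaller = earlier).
    """
    adj = np.asarray(adj, dtype=float)
    order = np.asarray(order)
    forward = order[:, None] < order[None, :]
    backward = order[:, None] > order[None, :]
    n_forward = adj[forward].sum()
    n_backward = adj[backward].sum()
    return n_forward / (n_forward + n_backward)


toy = np.array([[0, 2, 5],
                [0, 0, 1],
                [1, 0, 0]])
p_ff = proportion_feedforward(toy, order=[0, 1, 2])
```

Passing a binarized matrix instead of synapse counts gives the proportion of feedforward edges rather than feedforward synapses.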

Models with biological metadata

Testing for Peters' rule via the contact graph

is the adjacency a noisy version of the contact graph?
how does rank change as we jitter xyz of synapses
could we also just swap synapses in an epsilon ball and see how structure changes?

Spectral clustering that uses morphology

Configuration models that swap synapses within an epsilon ball
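One possible reading of this idea, as a rough sketch: treat each synapse as a row with a postsynaptic partner and an xyz location, then repeatedly pick a synapse and exchange its postsynaptic partner with that of another synapse within distance `eps`. The table layout, the function name, and the brute-force distance search are all assumptions for illustration. Note that a swap can create a self-loop, which this sketch does not prevent.

```python
import numpy as np


def swap_within_ball(post, xyz, eps, n_swaps, seed=None):
    """Randomly swap postsynaptic partners between nearby synapses."""
    rng = np.random.default_rng(seed)
    post = np.array(post, copy=True)
    xyz = np.asarray(xyz, dtype=float)
    indices = np.arange(len(post))
    for _ in range(n_swaps):
        a = rng.integers(len(post))
        dist = np.linalg.norm(xyz - xyz[a], axis=1)
        neighbors = np.flatnonzero((dist <= eps) & (indices != a))
        if len(neighbors) == 0:
            continue  # no synapse close enough to swap with
        b = rng.choice(neighbors)
        post[a], post[b] = post[b], post[a]
    return post


# Two well-separated clusters of two synapses each.
xyz = np.array([[0, 0, 0], [0.5, 0, 0], [100, 0, 0], [100.5, 0, 0]])
post = np.array([10, 11, 12, 13])
shuffled = swap_within_ball(post, xyz, eps=1.0, n_swaps=20, seed=0)
```

Presynaptic partners and synapse locations are untouched, so every neuron keeps its number of output and input synapses; only who connects to whom within each neighborhood changes.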

Can we cluster edges via connectivity + space?

had talked about trying to cluster the line graph
spectral embedding of the line graph looked bad when I tried it. Need to follow up.

Niche models that may not work for all data

Different hypotheses for a multilayer SBM-like model

maggot data

Matching FAFB and hemibrain or either to maggot

could be spectral, could be GM
results maybe bad?
could use morphology, could not

Spectral coarsening between maggot and adult

connectocross's People

Forkers

jingyan230 caseyweiner pauladkisson

connectocross's Issues

define a file spec for attributed graphs

should be well suited to the data we care about here as a first use case
in the past we have discussed a csv edgelist for the graph itself with separate json(s) for metadata
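To make the csv + json idea concrete, here is one possible layout, sketched with the standard library only. The file names (`edgelist.csv`, `meta.json`), the column names, and the split into graph-level and node-level metadata are all assumptions, not a decided spec.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_graph(directory, edges, node_meta, graph_meta):
    """Write an edgelist csv plus a json file with node and graph metadata."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "edgelist.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "target", "weight"])
        writer.writerows(edges)
    with open(directory / "meta.json", "w") as f:
        json.dump({"graph": graph_meta, "nodes": node_meta}, f, indent=2)


def load_graph(directory):
    """Read back what save_graph wrote."""
    directory = Path(directory)
    with open(directory / "edgelist.csv", newline="") as f:
        edges = [(r["source"], r["target"], int(r["weight"]))
                 for r in csv.DictReader(f)]
    with open(directory / "meta.json") as f:
        meta = json.load(f)
    return edges, meta["nodes"], meta["graph"]


# Round trip on a toy graph.
toy_dir = tempfile.mkdtemp()
toy_edges = [("a", "b", 3), ("b", "c", 1)]
toy_nodes = {"a": {"hemisphere": "L"}, "b": {"hemisphere": "R"}, "c": {"hemisphere": "L"}}
save_graph(toy_dir, toy_edges, toy_nodes, {"species": "toy", "directed": True})
edges, nodes, graph = load_graph(toy_dir)
```

Keeping node ids as strings in both files avoids ambiguity between ids that different hosting platforms store as integers or text.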

L/R homotypic affinity?

to what extent are edges between L/R pairs more probable than any L/R connection?

(which datasets have L/R pairs?)

(If they don't have pairs, can we use graph matching to predict?)

adjacency matrix plots

will probably have to figure out how to sort each one

write data pulling functions for the connectomes of interest

for the connectomes listed in the readme (roughly, exact ones may change)

want a simple, clear script to pull necessary graph + metadata from wherever it's hosted online
saves that data to the format specified in #1

Note: obviously a bit downstream of #1, but some work on pulling the data could probably start concurrently. configuring how to save to whatever format we pick is likely not the bottleneck for this issue.

write data pulling functions for the connectomes of interest - CATMAID

for the connectomes listed in the readme (roughly, exact ones may change)

want a simple, clear script to pull necessary graph + metadata from wherever it's hosted online
saves that data to the format specified in #1

Note: Sub-issue of #2

Specifically, focus on datasets stored as CATMAID files.

write dataset fetching functions, include as lightweight package

see sklearn's dataset fetchers: https://scikit-learn.org/stable/datasets/index.html#real-world-datasets

they download a dataset if necessary, and then load it from local storage into memory in Python (I think)

Would want something similar that pulls the nicely formatted data from #2, which is saved in the spec specified by #1, and then loads it into Python
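The download-if-missing pattern could look roughly like the sketch below. The function name, the cache directory argument, and the URL are placeholders; where the formatted data will actually be hosted is not decided here.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def fetch_file(url, filename, data_home):
    """Return a local path to `filename`, downloading it only if not cached."""
    data_home = Path(data_home).expanduser()
    data_home.mkdir(parents=True, exist_ok=True)
    path = data_home / filename
    if not path.exists():
        urlretrieve(url, str(path))  # only hit the network on a cache miss
    return path


# Demo without network access: a file that is already cached is returned as-is.
demo_home = tempfile.mkdtemp()
(Path(demo_home) / "toy_edgelist.csv").write_text("source,target,weight\na,b,3\n")
cached = fetch_file("https://example.invalid/toy_edgelist.csv",
                    "toy_edgelist.csv", demo_home)
```

A loader on top of this would read the returned path into memory, so each notebook only needs one call per dataset.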

Write data pulling function for Ciona intestinalis

paper: https://elifesciences.org/articles/16962
data (xlsx): https://elifesciences.org/articles/16962/figures#content

data is stored in multiple xlsx files with different metadata

Consistent styling

Palettes

As a group, please decide on a consistent color palette for species/dataset:
https://seaborn.pydata.org/tutorial/color_palettes.html
https://matplotlib.org/stable/tutorials/colors/colormaps.html

It may make sense to do something like have two different shades of the same color for multiple related datasets (e.g. two C. elegans could be two shades of blue or something) as long as this is less distinct than the species annotation.

I'd like this palette to just be saved somewhere as a json that any script can just import
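A sketch of what the saved palette could look like. The dataset keys, the hex values, and the file name `palette.json` are placeholders to be replaced by whatever the group decides; the two C. elegans entries illustrate the two-shades-of-one-color idea.

```python
import json
import tempfile
from pathlib import Path

# Placeholder palette: related datasets share a hue and differ in shade.
palette = {
    "C. elegans hermaphrodite": "#1f77b4",
    "C. elegans male": "#aec7e8",
    "Drosophila larva": "#ff7f0e",
    "Platynereis larva": "#2ca02c",
}

palette_path = Path(tempfile.mkdtemp()) / "palette.json"
palette_path.write_text(json.dumps(palette, indent=2))

# Any script can then load the same mapping.
loaded_palette = json.loads(palette_path.read_text())
```

The loaded dict can be passed directly as the `palette` argument of seaborn plotting functions when the hue column uses the same dataset names.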

Style

As a group, please decide on a consistent style for matplotlib. This is also something that can be saved and imported easily. One example is here, which you are welcome to use or modify.

E.g. each notebook you each separately make can then just call set_theme() and everyone's plots will look the same

simple statistics

number of nodes
number of edges
max degree
graph density

chart of the above for each connectome we have

figure out how to extract contact graphs from CATMAID

email Albert, see if such a thing already exists
If it's not there currently, write code to do it based on skeletons

fit a priori SBMs to the connectomes

Spencer:

seems like something that is easy to do and we could do for all of them
could just use the metadata, find some features that we care about.

Ben:

maybe we fit using various node metadata columns and just report likelihood, number of parameters, BIC or something like that
maybe also just plot them and show that we can fit these models
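A rough sketch of what reporting these quantities could look like for a Bernoulli SBM with known labels, assuming a binarized directed adjacency matrix without self-loops. The function name is illustrative, and the BIC convention used here (parameters times log of the number of potential edges, minus twice the log-likelihood) is one reasonable choice among several.

```python
import numpy as np


def sbm_fit_summary(adj, labels):
    """Log-likelihood, parameter count and BIC of an a priori Bernoulli SBM."""
    binary = np.asarray(adj) != 0
    labels = np.asarray(labels)
    n = binary.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    blocks = np.unique(labels)
    log_lik = 0.0
    for src in blocks:
        for tgt in blocks:
            mask = np.outer(labels == src, labels == tgt) & off_diag
            n_possible = mask.sum()
            if n_possible == 0:
                continue
            n_edges = binary[mask].sum()
            p = n_edges / n_possible
            if 0 < p < 1:  # blocks with p of 0 or 1 contribute zero
                log_lik += n_edges * np.log(p) + (n_possible - n_edges) * np.log(1 - p)
    n_params = len(blocks) ** 2
    bic = n_params * np.log(n * (n - 1)) - 2 * log_lik
    return {"log_lik": log_lik, "n_params": n_params, "bic": bic}


toy = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]])
sbm = sbm_fit_summary(toy, ["L", "L", "R", "R"])
er = sbm_fit_summary(toy, ["all"] * 4)  # a single block is the ER model
```

Running this once per metadata column gives a small table of models per connectome, with the single-block ER fit as the baseline.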

graph matching stuff

we could run graph matching on a lot of these connectomes
maybe even match some of them to other connectomes

write data pulling functions for the connectomes of interest - NeuPrint

for the connectomes listed in the readme (roughly, exact ones may change)

want a simple, clear script to pull necessary graph + metadata from wherever it's hosted online
saves that data to the format specified in #1

Note: Sub-issue of #2

Specifically, focus on datasets stored as NeuPrint files.

(decide if we need a) lightweight graph + metadata object

i often find something like this helpful.

Just a light graph object that stores adjacency matrix + pandas metadata on the nodes dataframe

haven't done anything smart for multigraphs or edges with features

there is likely a better way to implement the above

unsure whether it is even necessary or whether networkx is enough, but I end up using the adjacency matrix representation so much that it was convenient
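For discussion, a minimal sketch of such an object: an adjacency matrix plus a pandas DataFrame with one row per node, kept aligned when subsetting. The class name `MetaGraph` and its single method are hypothetical, and multigraphs or edge features are not handled.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class MetaGraph:
    """Adjacency matrix with row-aligned node metadata."""

    adj: np.ndarray
    meta: pd.DataFrame

    def __post_init__(self):
        self.adj = np.asarray(self.adj)
        n = len(self.meta)
        if self.adj.shape != (n, n):
            raise ValueError("adjacency shape must match number of metadata rows")

    def subset(self, mask):
        """Keep only nodes where `mask` is True, in both adj and meta."""
        idx = np.flatnonzero(np.asarray(mask))
        return MetaGraph(self.adj[np.ix_(idx, idx)], self.meta.iloc[idx])


meta = pd.DataFrame({"hemisphere": ["L", "R", "L"]}, index=["a", "b", "c"])
adj = np.array([[0, 1, 2],
                [3, 0, 4],
                [5, 6, 0]])
graph = MetaGraph(adj, meta)
left_graph = graph.subset(meta["hemisphere"] == "L")
```

Sorting for adjacency matrix plots could follow the same pattern, reordering `adj` and `meta` with one shared index array.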

what connectomes specifically
what questions are biologically interesting

want a simple, clear script to pull necessary graph + metadata from wherever it's hosted online
saves that data to the format specified in #1

Note: Sub-issue of #2

Specifically, focus on datasets stored as .xlsx files.

Add list of available connectomes with references to a page in the documentation

References can be added as bibtex (see the current references file)
