dendrograms / astrodendro Goto Github PK

View Code? Open in Web Editor NEW

36.0 36.0 36.0 4.29 MB

Generate a dendrogram from a dataset

Home Page: https://dendrograms.readthedocs.io/

License: Other

Python 100.00%

astrodendro's People

Contributors

Stargazers

Watchers

astrodendro's Issues

More user-friendly way to figure out what metadata is needed where

There should be some nice way to lookup what statistic properties require what metadata items -- maybe this should be embedded into the docstring

Make BasicDendrogramViewer extensible

I'm working on dendrogramming Tom Dame's CO datacube, which is much longer along one axis than the other two, and so the default Viewer performs poorly for imagecube slices with that aspect ratio.

As a temporary fix, I've forked astrodendro to implement my own "workaround" in the style of this gist. In the long term, it would be interesting if I could write my own Viewer classes that extend BasicDendrogramViewer and can be used in place of it.

As an even better "bonus", it would be wonderful if the extensibility allowed for me to put in multiple "image" panels, so that (e.g.) I could see both an (l, b) and an (l, v) slice at the same time, or even add a GUI button that let me toggle between the two.

This is something that I'd be happy to help develop over the coming weeks, and could probably submit a pull request with my own idea of how to "extensibilize" BasicDendrogramViewer, in it if that would be helpful.

Package name

Just to check, are we happy with the name astrodendro?

Alternatives are for example dendrogram or dendrograms or dendro, with dendrogram.astro containing astronomy-specific routines and allowing people from other fields to contribute? Alternatively, we could even separate the astronomy-specific stuff to another repository, but then users have to install two packages, so not ideal.

Avoid recursionlimit in plotter

Currently, the DendrogramPlotter code is susceptible to recursion limits, so we need to make the code smarter to avoid this.

Ignore

Ignore; I hadn't pulled from the right master. Sorry.

Implement mass vs radius plotting, etc.

This is related to the PR I submitted on the old version of astrodendro, but basically: I typically want to run dendrograms on 2D maps to see, as a function of contour level, how much mass is in that contour. It's a very convenient way to make density-threshold based cutoffs (see, e.g., Jens Kauffman's recent papers).

Anyway, to do this, you need some tree-walking code, which I don't think is presently in astrodendro. Here's a sample that I used:

def get_mostest_children(item, props=('npix','f_sum'), mostest='f_sum'):
    """
    Return certain properties of all children selecting the mostest
    of something (e.g., brightest)
    """
    item.get_dens = lambda: item.values().sum()/item.get_npix()**(1.5)
    item.get_f_sum = lambda: item.values().sum()
    if item.is_leaf:
        d = dict([(p,[get(item,p)]) for p in props])
        d['branch'] = [item]
        return d
    else:
        brightest = item.children[0]
        brightest.get_dens = lambda: brightest.values().sum()/item.get_npix()**(1.5)
        brightest.get_f_sum = lambda: brightest.values().sum()
        for child in item.children[1:]:
            child.get_dens = lambda: child.values().sum()/item.get_npix()**(1.5)
            child.get_f_sum = lambda: child.values().sum()
            if get(child,mostest) > get(brightest,mostest):
                brightest = child
        brightest_props = get_mostest_children(brightest)
        d = dict([(p,[get(item,p)] + brightest_props[p])
                     for p in props])
        d['branch'] = [item] + brightest_props['branch']
        return d

(where I think get is a matplotlib-only convenience function...).

I'm not submitting this as a PR yet, in part because I think there are some "convenience functions" that ought to be implemented, e.g.:
Structure.get_sum(self): return self.values().sum()
and maybe some mechanism for computing other values, e.g. the density as I've done above, for a given branch/tree.

I've used this approach effectively already:

Show coordinate axes on image, rather than pixel axes

When I compute a dendrogram of the CO survey, using the syntax

>>> from astrodendro import Dendrogram
>>> d = Dendrogram.compute(array)

after using astropy.io.fits.getdata to extract the CO datacube, my dendrograms always are displayed with pixel values instead of (l, b, v) coordinates. I'm guessing this is at least partly because I'm passing the data in as a pure ndarray instead of giving the whole FITS object, hence the dendrogram doesn't know anything about the scaling factors.

example image in (l, v) space, where (l) should run from 180->0->360->180 from left-to-right:

Is there a good way to get the coordinate scaling back into the dendrogram so that I can show plots in (l, b, v)?

Allow manual seeding of leaves

A feature I would find useful is the ability to pass a list of seed pixels to compute. The dendrogram would then be constructed in such a way that each leaf contains exactly one seed pixel. This would be helpful for, e.g., resolution studies.

Make example plot with simplified cloud properties overlaid

It would be nice to extend an example using e.g.. pp_catalog which would show an overlay of the structures found with ellipses showing the principal axis, and the major/minor axis.

dendrograms.org "HTTPS" website doesn't work?

In the one-line description of this repo there's a link to the dendrograms.org website:

Generate a dendrogram from a dataset 
https://dendrograms.org

but the link seems to be broken unless I change "https" -> "http" and try

 http://dendrograms.org

as the other links on the homepage use. I tried a couple proxies and it seems to be not just me, dendrograms.org doesn't respond to https.

This is small potatoes but I figured I'd let someone know.

Wrap data at the edges when computing dendrograms

This is a simple question that may end up being deceptively difficult to handle.

I am working with a datacube that spans galactic longitude 0->360 and technically should "wrap around" on the edges. The dendrogram software doesn't know this, and so it naturally treats the left and right edges of the data unconnected. Hence, the red and blue structures in the following image correspond to different trees in the dendrogram -- but if I were to "rotate" the same data by 180 degrees and put the boundary at the galactic center, not anticenter, then they would match together in a merged tree.

Is it possible to compute a dendrogram where the data wraps at the edges? Could this be added as an optional input when making a dendrogram, so that I could specify e.g., wrapx=True and then have these structures be properly merged?

Interactive viewer crashes when dendrogram is loaded from file

Hi,
When I save a dendrogram object to a file, and then load it back, I have the following bug. I call up the viewer, then click on the intensity map, and get these errors:

In [8]: d
Out[8]: <astrodendro.dendrogram.Dendrogram at 0x106435f90>
In [9]: d.save_to('saved_dendrogram_test.hdf5')
In [12]: d2 = astrodendro.Dendrogram.load_from('saved_dendrogram_test.hdf5')
In [13]: d2.viewer()
# Then I interactively click on the map in the viewer and the following occurs:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Users/tsrice/anaconda/lib/python2.7/site-packages/matplotlib/cbook.pyc in process(self, s, *args, **kwargs)
    525                     del self.callbacks[s][cid]
    526                 else:
--> 527                     proxy(*args, **kwargs)
    528
    529

/Users/tsrice/anaconda/lib/python2.7/site-packages/matplotlib/cbook.pyc in __call__(self, *args, **kwargs)
    403             mtd = self.func
    404         # invoke the callable and return the result
--> 405         return mtd(*args, **kwargs)
    406
    407     def __eq__(self, other):

/Users/tsrice/Documents/Code/astrodendro/astrodendro/viewer.pyc in select_from_map(self, event)
    219             # Select the structure
    220             structure = self.dendrogram.structure_at(indices)
--> 221             self.hub.select(input_key, structure)
    222
    223             # Re-draw

/Users/tsrice/Documents/Code/astrodendro/astrodendro/viewer.pyc in select(self, id, structure)
     51         self.selections[id] = structure
     52         for cb in self._callbacks:
---> 53             cb(id)
     54
     55

/Users/tsrice/Documents/Code/astrodendro/astrodendro/viewer.pyc in _on_selection_change(self, selection_id)
    178     def _on_selection_change(self, selection_id):
    179         self._update_lines(selection_id)
--> 180         self.update_contours()
    181
    182     def update_vmin(self, vmin):

/Users/tsrice/Documents/Code/astrodendro/astrodendro/viewer.pyc in update_contours(self)
    312             if struct is None:
    313                 continue
--> 314             mask = struct.get_mask(subtree=True)
    315             if self.array.ndim == 3:
    316                 mask = mask[self.slice, :, :]

/Users/tsrice/Documents/Code/astrodendro/astrodendro/structure.pyc in get_mask(self, shape, subtree)
    476         """
    477         if shape is None:
--> 478             shape = self._dendrogram.data.shape
    479         indices = self.indices(subtree=True) if subtree else self.indices
    480         mask = np.zeros(shape, dtype=bool)

AttributeError: 'NoneType' object has no attribute 'data'

I assume this is a bug in how the dendrogram object is reconstructed when loading from a file, or perhaps the .viewer() method should have a way of handling this case?

Link to other dendrogram implementations on dendrograms.org

Assign contiguous index values to structures

Currently, the set of Structure.idx values is a sparse subset of range(1, data.size). For the index code I am working on, it would be much simpler if they were just range(1, n_structure).

@astrofrog, is the particular value of Structure.idx used for anything right now? Would anything break if we switched schemes?

Add support for WCS

It would be nice to allow a WCS object to be passed for the data, and be able to return the world coordinates of elements inside structures - e.g. the world coordinates of all voxels inside a leaf from a 3-d cube. This could use a different attribute to what would normally be used to get the pixel coordinates (e.g. leaf.world_coords instead of leaf.coords, or something like this).

Trim index_map at the end of compute

Dendrogram.index_map gives the structure ID at each pixel. This map is created with a 1 pixel border within compute, for edge-checking reasons. However, this border should probably be removed at the end of compute. Otherwise, the node_at behavior is broken.

Release 0.1 package

GitHub will close the milestone if all issues are completed, so this ensures that the milestone stays open until the package is actually released. Leave any reminders here about anything release-related.

Collection of helpful is_indepndent functions

If there are specific pruning functions that would be generically useful to use for Dendrogram.compute(is_independent), we should provide them. Some ideas:

min_peak -- peak pixel passes a threshold
min_sum -- sum of values passes a threshold
contains_seeds -- contains at least one of a list of seed locations

It would be cool if these were composable, so that I could write something like

Dendrogram.compute(x, is_independent = min_peak(3) & min_sum(10))

And that would build an object that runs both tests. I'm not sure if &, and, or + makes the most sense to use.

Rename repository to ``astrodendro``

The repository name doesn't currently reflect the name of the package, which may be confusing - shall I rename the repository name to astrodendro? (since in the end, we included the analysis and visualization code in the same package?)

cc @ChrisBeaumont @low-sky

Should pixel statistics be quantities or floats

For statistics which are e.g. angles, it makes sense to return Astropy Quantity objects. However, for statistics in pixel units, I wonder whether maybe we should return simple Python floats rather than Quantity objects with u.pix objects. Otherwise we end up with this kind of problem: #71 - we can still have the heading in the catalog table be set to pix automatically, which is what this was intended for.

Implement min_delta, min_npix, etc as is_independent functions

Now that we have the is_independent option available, it makes more sense to implement the "built-in" pruning criteria using this. We could keep the keyword arguments if that is more user friendly, but the test inside Dendrogram.compute should just be a single call to an is_indendent function.

Debug method to assert consistent structure

The PR in #111 has me thinking about how to ensure that dendrograms remain in a valid state as they are constructed, and when they are modified after construction. It would be useful to have a check_invariants(dendrogram) method that asserts a bunch of properties that should never be violated (that the index map is consistent with the structures, that structure.parent and structure.children are always consistent, that vmax/vmin always increase from trunk to leaf, etc).

This method would probably be expensive to compute, so it should be tied to a DEBUG global variable that is False by default, but can be enabled by the unit tests or when working on new features. Then we should put calls to check_invariants inside the loops of compute, build_index, and post_pruning

Use ProgressBar class from Astropy

To avoid code duplication, we can make use of the ProgressBar class from astropy.utils.console. I plan to do this once #2 and #3 are resolved.

Viewer option for "integrated" datacube rather than datacube "slice"

When I'm working on the dendrogram of Tom Dame's CO survey, I often want to compare the structures identified by the dendrogram with the structures that were highlighted in these integrated maps: http://www.cfa.harvard.edu/mmw/MilkyWayinMolClouds.html

but the Viewer constrains me to look only at single slices (in velocity or latitude). Assuming that integrating along a single axis is a well-defined operation, would it be difficult to add an option that shows an "integrated" image rather than a single slice? The dendrogram computation should still use the full datacube, which is why I wouldn't want to feed Dendrogram an already-integrated array.

This may tie in neatly with Issue #94, wherein the best way to do this would be to write custom Viewers that take care of this.

make `fill_footprint` a private method in Dendrogram`

I don't see Structure.fill_footprint as having much utility outside the creation of the index map in Dendrogram.compute. Perhaps this should be a private Dendrogram method?

Correctly handle non-binary splitting

Add support for NDData objects

This is related to #4 - once we have WCS support, it would be trivial to then accept astropy NDData objects, which can carry around data with WCS and other meta-data.

Allow a custom user function to be passed for determining mergeability

As well as having 'standard' criteria for merging two structures (minimum number of pixels, etc.) it would be nice to allow users to be able to specify their own function based on their own criteria.

@bradenmacdonald @ChrisBeaumont - what do you think about this?

Consider providing a way to provide 'background subtracted flux' for structures

Can we use the same kind of approach as in height to find the 'background' of a structure and compute a background-subtracted flux? (in addition to a raw total flux)

Write utility function to convert from K to sensible units

Given the complexities involved with the K units, we could consider providing a utility function that pre-converts the data from K to less ambiguous units that can be then used by the analysis tools.

coveralls info out of date

The coveralls link still only works for the old name dendro-core

https://coveralls.io/r/dendrograms/dendro-core

As a result, the icon on the README is broken

Make final dendrograms immutable

As discussed in #20, part of the infrastructure in the Structure class (e.g. the tree index) relies on the dendrogram being immutable after the computation. One of the suggestions we mentioned there was to split Structure into two classes - one representing mutable structures that are used for the initial computation, and one that is used for the final structure, which is efficient but immutable.

One question that I'm wondering now is that if we do this, the tree can no longer be pruned by the user after the fact, so is that what we want?

If we decide we want an alternative that leaves the possibility of having a flexible tree after the computation, we can use the hash of the index map to check whether the tree has changed - Numpy arrays can be hashed, so this is trivial. If the hash of the index map changes, all cached variables (including the tree index) would get reset.

Review luminosity calculation

The luminosity calculation should take into account that flux units might be per pixel or per unit area, and I don't think we should default to a given pixel scale since there is no sensible default - we should just raise an exception.

Basic analysis in core package

I've started a wiki page discussion how to port over the "levelprops" functionality from IDL to python. @low-sky's input would be valuable here, since he best understands what quantities need to be extracted, and how they should be computed.

https://github.com/dendrograms/dendro-core/wiki/Structure-analysis-API-proposal

cc @astrofrog

Basic visualization in core package

@bradenmacdonald wrote a lot of visualization code, and we initially decided that this should stay out of the core package, especially the GTK viewer, which makes sense. However, I think that it would make sense to have a static Matplotlib-based viewer just to plot the basic dendrogram, and this won't add much code to the core-package. It will only require Matplotlib as an optional dependency when that method is called. Does that sound sensible? If so, I can start working on this.

Improve warnings for catalog generation

At the moment, the warnings for the catalog generation are very verbose and might scare users a little:

https://dendrograms-astrofrog-v2.readthedocs.org/en/latest/catalog.html

We should try making them more concise, e.g.

WARNING: bmaj (Beam major axis, sigma) missing, defaulting to 0 [astrodendro.analysis]

Show example of using contour plotting with APLpy

As indicated in #33, it would be nice to have a tutorial on using this with APLpy to show world coordinates.

Example data in docs

@ChrisBeaumont - I think it would be good to add a simple example dataset to the docs that we can use for all the examples (we can then use the Matplotlib sphinx directive to generate plots for the docs rather than having to include static plots). Do you have any interesting and small datasets that can be public? The extinction map of Perseus? Or anything else? Otherwise, I can check on my side.

Interface cleanup

I am in favor of the following interface tweaks. None are super important, so feel free to veto any, @astrofrog

Remove Dendrogram.all_nodes
Move min_data_value, min_delta, min_npix into a Dendrogram.params dict (this is the style that scikit-learn uses)
Remove Structure.merge_level
redefine Structure.height to suggestion #25 (comment)
implement Dendrogram.__getitem__ to fetch from node dict

Rename ``dx`` meta-data element to ``scale``

In the analysis tools, I think we should rename dx to scale as it is a more explicit name.

(see, I'm trying to not put everything in a single PR! ;) Easier to keep track of separate discussions this way...)

Index on restored dendrograms

The I/O utilities should take advantage of the new TreeIndex class

Performance issues in astrodendro.analysis

If anyone has a chance to help me understand the following performance issues I've been running into, I'd be grateful.

On one particular machine, I've been running into issues when calculating ppv_catalog on dendrogrammed data of modest to large size. (In particular, the file I'm running into trouble on is hosted here: http://www.cfa.harvard.edu/rtdc/CO/NumberedRegions/DHT21/DHT21_Taurus_mom.fits and is around 10 MB in size)

In [6]: d, catalog, x, y = perseus_analysis() # perseus_analysis loads the data, computes a dendrogram, and then calls ppv_catalog on the dendrogram using appropriate metadata.
loading data: ...
transposing, downsampling, and unit-converting data: ...
computing dendrogram: ...
Generating dendrogram using 307,278 of 5,296,671 pixels (5% of data)
[========================================>] 100%
# the above dendrogram computation takes 30 - 60 seconds using the following parameters:
#  min_value=0.01
#  min_delta=0.005
#  min_npix=20

# then the code hangs while running ppv_catalog

Based on the tracebacks I'm seeing, the resulting dendrogram seems to have around ~~100,000 indexed structures.~~ The dendrogram has 1029 structures, and for some of those structures, the struct.indices has over 100,000 items.

When I hit ctrl-c after about five or so minutes, the interrupt typically hits one of the following lines of code (below examples chosen as illustrative) :

# line number one
> /Users/tsrice/Documents/Code/astrodendro/astrodendro/analysis.py(65)mom1()
     64         m0 = self.mom0()
---> 65         return [np.nansum(i * self.values) / m0 for i in self.indices]
     66

# line number two
/Users/tsrice/Documents/Code/astrodendro/astrodendro/analysis.pyc in mom2(self)
     77
     78         for i in range(nd):
---> 79             result[i, i] = np.nansum(v * zyx[i] ** 2)
     80             for j in range(i + 1, nd):
     81                 result[i, j] = result[j, i] = np.nansum(v * zyx[i] * zyx[j])

# line number three
/Users/tsrice/Documents/Code/astrodendro/astrodendro/analysis.pyc in mom2(self)
     79             result[i, i] = np.nansum(v * zyx[i] ** 2)
     80             for j in range(i + 1, nd):
---> 81                 result[i, j] = result[j, i] = np.nansum(v * zyx[i] * zyx[j])
     82
     83         return result

The versions of my python packages:

Python 2.7.6 |Anaconda 2.0.0 (x86_64)| (default, May 27 2014, 14:58:54)
IPython 2.1.0 -- An enhanced Interactive Python.

In [4]: np.__version__
Out[4]: '1.8.1'

In [5]: astropy.__version__
Out[5]: '0.3.2'

Some questions:

Is this datacube too big to reasonably compute a catalog for? (I hope not!)
Are there better ways to profile the performance in situations like this, so I can better understand the code pinch points?
Is it possible or desirable for us to improve the performance of the analysis code?

Rename ``sky_major_sigma`` to ``major_sigma``

We had mentioned maybe renaming sky_major_sigma and sky_minor_sigma - I just realized just now that maybe the sky is not needed - if one is just looking at a collapsed simulation, the prefix sky is not really relevant.

Diagrams for `dendrogram.compute` documentation

To explain things like delta with pictures

Improve performance of dendrogram viewer

At the moment, the viewer implemented in #29 is quite slow when sliders are changed - this is because the slider automatically calls fig.canvas.draw, which re-draws the whole figure. It's possible to improve performance by making use of draw_artist to selectively re-draw only elements that are changing, though this requires re-implementing a slider class to allow this. I've done this all locally in a hacky way, but will try and tidy it up and submit a patch here.

Unfortunately, draw_artist doesn't work properly with the MacOS X backend, so it will have to be a solution for other backends only.

Release 0.2 package

Allow computation with manual specification of leaves

Anticipating that the astrodendro routines will be ultimately used as the core functionality for cataloging of structures based in python, it will become important to be able to specify the indices of the leaves that will define the dendrograms. Specifically, this is necessary functionality to recompute the dendrogram after filtering and pruning.

This is replicating the functionality lost by dropping friends and specfriends from the base API. I envision a use case where the dendrogram is computed, the leaves are then filtered based on delta or minimum/maximum separation in pixel space.

I think it would be okay if this functionality can insist that these leaves are a subset of an original dendrogram leaves.

Reference data should be easier to load

We should adopt scikit-learn's idea of putting the Perseus image and other files (like the scuba image referenced in the docs) into a data submodule, with easy loader functions. This would make it easier to run the examples in the docs

PPVStatistic.flux wrongly ignores velocity units

Hi,
I was working on getting integrated fluxes of PPV structures in a datacube using ppv_catalog. The unit handling for summing up the flux when the input units are (u.Jy/u.beam) seems to correctly handle integrating over 2d spatial units to produce u.Jy output, but the integration over the velocity axis should also contribute a velocity units term (usually I use u.km/u.s) to the computed flux (producing u.Jy * u.km / u.s) in order to be a physically meaningful sum over all 3 axes.

Currently I am doing a clumsy workaround that looks like this:

catalog = astrodendro.ppv_catalog(d, metadata)
flux_kms = catalog['flux'] * metadata['velocity_scale']

but it seems to me that the PPVStatistic and compute_flux code should be updated to correctly deal with this use case. Do you agree?

Viewer does not handle degenerate coordinates properly

Hi,

I run dendrograms on an integrated intensity map from JVLA successfully, which has two degenerate coordinates in the header (Stokes=1, and Frequency=1). The size of the array is:

In [14]: array.shape
Out[14]: (1, 1, 1024, 972)

However, the dendrogram viewer does not know how to handle this image:

In [9]: v = d2.viewer()
ERROR: ValueError: Only 2- and 3-dimensional arrays are supported at this time [astrodendro.viewer]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-3d0d5ce782bb> in <module>()
----> 1 v = d2.viewer()

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/astrodendro-0.1.0-py2.7.egg/astrodendro/dendrogram.pyc in viewer(self)
    454         """
    455         from .viewer import BasicDendrogramViewer
--> 456         return BasicDendrogramViewer(self)
    457 
    458 

/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/astrodendro-0.1.0-py2.7.egg/astrodendro/viewer.pyc in __init__(self, dendrogram)
     10 
     11         if dendrogram.data.ndim not in [2, 3]:
---> 12             raise ValueError("Only 2- and 3-dimensional arrays are supported at this time")
     13 
     14         self.array = dendrogram.data

ValueError: Only 2- and 3-dimensional arrays are supported at this time

I had to go back into CASA and export the data without the degenerate axes, however, since the 2 extra dimensions in the array have length=1 I think that this should be accepted .

Cheers,
Jaime

dendrograms / astrodendro Goto Github PK

astrodendro's People

Contributors

Stargazers

Watchers

Forkers

astrodendro's Issues

Recommend Projects

Recommend Topics

Recommend Org