Machine learning, statistics, and data mining for astronomy and astrophysics

License: BSD 2-Clause "Simplified" License

astroml's Introduction

AstroML: Machine Learning for Astronomy

AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD license. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets.

This project was started in 2012 by Jake VanderPlas to accompany the book Statistics, Data Mining, and Machine Learning in Astronomy by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray.

Important Links

HTML documentation: https://www.astroML.org
Core source-code repository: https://github.com/astroML/astroML
Figure source-code repository: https://github.com/astroML/astroML-figures
Issue Tracker: https://github.com/astroML/astroML/issues
Mailing List: https://groups.google.com/forum/#!forum/astroml-general

Installation

Before installation, make sure your system meets the prerequisites listed in the Dependencies section below.

Core

To install the core astroML package in your home directory, use:

pip install astroML

A conda package for astroML is also available on either the conda-forge or the astropy conda channel:

conda install -c astropy astroML

The core package is pure python, so installation should be straightforward on most systems. To install from source, use:

python setup.py install

You can specify an arbitrary directory for installation using:

python setup.py install --prefix='/some/path'

To install system-wide on Linux/Unix systems:

python setup.py build
sudo python setup.py install

Dependencies

There are two levels of dependencies in astroML. Core dependencies are required for the core astroML package. Optional dependencies are required to run some (but not all) of the example scripts. Individual example scripts will list their optional dependencies at the top of the file.

Core Dependencies

The core astroML package requires the following (some of the functionality might work with older versions):

Python version 3.6+
Numpy >= 1.13
Scipy >= 0.19
Scikit-learn >= 0.18
Matplotlib >= 3.0
AstroPy >= 3.0

Optional Dependencies

Several of the example scripts require specialized or upgraded packages. These requirements are listed at the top of the particular scripts.

HEALPy provides an interface to the HEALPix pixelization scheme, as well as fast spherical harmonic transforms.

Development

This package is designed to be a repository for well-written astronomy code, and submissions of new routines are encouraged. After installing the version-control system Git, you can check out the latest sources from GitHub using:

git clone git://github.com/astroML/astroML.git

or if you have write privileges:

git clone git@github.com:astroML/astroML.git

Contribution

We strongly encourage contributions of useful astronomy-related code: for astroML to be a relevant tool for the python/astronomy community, it will need to grow with the field of research. There are a few guidelines for contribution:

General

Any contribution should be done through the GitHub pull request system (for more information, see the help page). Code submitted to astroML should conform to a BSD-style license and follow the PEP8 style guide.

Documentation and Examples

All submitted code should be documented following the Numpy Documentation Guide. This is a unified documentation style used by many packages in the scipy universe.

In addition, it is highly recommended to create example scripts that show the usefulness of the method on an astronomical dataset (preferably making use of the loaders in astroML.datasets). These example scripts are in the examples subdirectory of the main source repository.

Authors

Package Author

Jake Vanderplas https://github.com/jakevdp http://jakevdp.github.com

Maintainer

Brigitta Sipocz https://github.com/bsipocz

Contributors

Alex Conley
Andreas Kopecky
Andrew Connolly
Asif Imran
Benjamin Alan Weaver
Brigitta Sipőcz
Chris Desira
Daniel Andreasen
Dino Bektešević
Edward Betts
Hans Moritz Günther
Hugo van Kemenade
Jake Vanderplas
Jeremy Blow
Jonathan Sick
Joris van Vugt
Juanjo Bazán
Julian Taylor
Lars Buitinck
Michael Radigan
Morgan Fouesneau
Nicholas Hunt-Walker
Ole Streicher
Pey Lian Lim
Rodrigo Nemmen
Ross Fadely
Vlad Skripniuk
Zlatan Vasović
Engineero
stonebig


astroml's Issues

Figure 9.12 incompatible with Scikit-learn 0.14

The tree module was re-done - this figure uses private attributes to visualize the tree. This should be fixed so as to work with both versions.

Wiener Filter fails for some data sets

(thanks to @acbecker)

In particular, data with large offsets should be centered, and the starting iteration for the fit should be estimated from the input data.

Also, a user-specified guess of the smoothing scale would be useful.

It might also be useful to have a static Wiener filter object, which could be instantiated for the optimal parameters, and then optimized if desired.

0.2 Release

Checklist:

Python 3 support
Separate astroML & astroML addons
Include tests in setup.py install
Fix filtering bug: #7
Add book figure captions to examples
Fix tree visualization incompatibility: #26
sklearn 0.14 enhancements: KDE (done in 50e0b17)
sklearn 0.14 enhancements: correlation function (done in 735af45)

Figure 6.5 error

Got an error while plotting figure 6.5:
ax.plot(t, true_pdf(t), ':', color='black', zorder=3,
        label="Generating Distribution")

The error stated that x and y must have the same dimension. I looked and found that true_pdf returns a single value, while the array t had 1000 elements.

I corrected this myself by adding the following code after defining t:

t_true = []
for item in t:
    t_true.append(true_pdf(item))

and then plotting t_true instead of true_pdf(t). I don't know if that is the most efficient way of correcting the error, but it worked and produced the figure as it appears on the website.
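
The loop above can also be written with NumPy's vectorization helper. This is only a sketch: true_pdf_scalar is a hypothetical stand-in for a true_pdf that accepts one value at a time, not the function used in the book figure.

```python
import numpy as np


def true_pdf_scalar(x):
    """Hypothetical scalar-only density (a standard normal), standing in for true_pdf."""
    return float(np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi))


# np.vectorize wraps the scalar function so it can be called on a whole array.
# It is a convenience loop, not a speed-up.
true_pdf_vec = np.vectorize(true_pdf_scalar)

t = np.linspace(-5, 5, 1000)
t_true = true_pdf_vec(t)  # same shape as t, so ax.plot(t, t_true) is valid
```

If true_pdf can be rewritten with array operations throughout, that is usually preferable to wrapping it.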

nosetests runs zero tests

If you run nosetests astroML outside the source directory, zero tests are run. This is likely an issue with the setup.py file: I think it does not include the tests submodule in the installation.

Figure 3.19 y axis is off by a factor of 2

We should fix this figure.

See astroML/text_errata#29

Add dependency checks in setup.py

Installation fails in an opaque way if dependencies such as scikit-learn are not installed. We should add explicit checks to the setup script which give more informative failure messages.
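
One way such a check could look is sketched below. The helper names are hypothetical and this is not the check astroML ships; the module list is taken from the core dependencies in the README. importlib.util.find_spec locates a package without importing it, so a broken installation of a dependency does not crash the check itself.

```python
import importlib.util

# Import names of the core dependencies listed in the README.
REQUIRED = ["numpy", "scipy", "sklearn", "matplotlib", "astropy"]


def find_missing(modules):
    """Return the top-level module names that cannot be located (nothing is imported)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def check_dependencies(modules=REQUIRED):
    """Raise ImportError with a readable message if any dependency is missing."""
    missing = find_missing(modules)
    if missing:
        raise ImportError(
            "astroML requires the following packages, which were not found: "
            + ", ".join(missing)
        )
```

Calling check_dependencies() at the top of the setup script would then stop with a single readable line instead of a deep traceback.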

Summer 2015 Roadmap

@nhuntwalker will be working on astroML part time this summer! Here I want to brainstorm some of the work that should be done.

Maintenance:

remove astroML.cosmology; use astropy.cosmology instead
depend on gatspy for periodogram stuff.
move book_figures & associated test scripts to a separate repository; this will make it easier to work with and update astroML.
improve test coverage. We should focus on simple tests which reflect what is going on in the book figures without needing to run all the code.
KDE stuff can probably go, and use the scikit-learn functionality instead
perhaps depend on Bovy's extreme deconvolution? It's much faster than astroML's current version, but a bit of a pain to install. That said, it's much better than it was when I originally decided not to use it...

New Functionality

Time series stuff? I think we could build a better interface for structure functions, etc. and do some interesting science (perhaps put into gatspy, with examples in astroML?)
New Datasets: a lot has become available since 2012!
I've been wanting to do a general "scikit-learn for noisy data" type thing. Maybe factor out these routines into a separate package & add others? This would be a fun API/package-design project, but not as directly relevant to astronomy.
Other ideas?

Single block with bayesian blocks?

I'm attempting to use Bayesian Blocks to estimate the number of bins in a 2D histogram for a scatter plot (one dimension at a time), but I keep getting very few bins returned by the function, most of the time a single one.

Is this correct? (MWE below)

import numpy as np
from astroML.plotting import hist
import matplotlib.pyplot as plt


def rand_data(N):
    return np.random.uniform(low=1., high=20., size=(2,int(N)))

# Generate random data in 2D.
N = 500
P = rand_data(N)

# Obtain bin edges for each dimension separately.
b_rx = hist(P[0], bins='blocks')[1]
b_ry = hist(P[1], bins='blocks')[1]
print(b_rx)
print(b_ry)

# 2D histogram with the bin edges returned above.
d_1 = np.histogram2d(P[0], P[1], bins=[b_rx, b_ry])[0]

# Plot.
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.imshow(d_1.transpose(), origin='lower', aspect='auto')
ax2.scatter(P[0], P[1], c='r')
plt.show()

Wavelet transform power spectrum density shows power at a wrong spot.

Here is how to reproduce. It looks like the problem is really with FT_continuous/IFT_continuous.

# I am not sure if it is a bug or a feature, but wavelet transform of 
# signals with non-centered x axis sure looks weird. Please check this file. 
# It correctly detects signals at frequency=1.0 and 1.5, but locates them in a wrong place. 


from astroML.fourier import\
    FT_continuous, IFT_continuous, sinegauss, sinegauss_FT, wavelet_PSD

import numpy as np

from matplotlib import pyplot as plt 

def testS1():
    x = np.arange(0,100,.1)
    y = x*0.0
    y[500:600] = np.sin(x[500:600]*1.0*2*np.pi) + np.sin(x[500:600]*1.5*2*np.pi)
    return x,y


def twv(x,y,f0,Q=5):
    # no rescaling, test signals
    # 
    wPSD = wavelet_PSD(x, y, f0, Q = Q)
    fig = plt.figure(figsize=(6, 8))
    fig.subplots_adjust(hspace=0.05, left=0.12, right=0.95, bottom=0.08, top=0.95)
    ax = fig.add_subplot(211)
    ax.plot(x,y, '-k', lw=1)
    ax.set_ylabel('Time signal')

    ax = plt.subplot(212)
    ax.set_ylim(np.min(f0),np.max(f0))

    ax.imshow(wPSD,
              origin='lower', aspect='auto', cmap=plt.cm.jet,
              extent=[x[0], x[-1], f0[0], f0[-1]]);   # vmin=
    ax.text(0.02, 0.95, ("Wavelet PSD"), color='w',
            ha='left', va='top', transform=ax.transAxes)


    ax.set_xlabel('$t$')
    ax.set_ylabel(r'$\omega_0$')


    plt.show()

if __name__ == '__main__':

    fr= np.arange(0.1,2,.01)
    xx,yy = testS1()
    twv(xx,yy,fr)

Fetch routine crash with new astropy version

I updated to astropy v1.0 today and found that now the fetch routines in astroML crash.

IOError Traceback (most recent call last)
in ()
40 #------------------------------------------------------------
41 # Fetch the great wall data
---> 42 X = fetch_great_wall()
43
44 #------------------------------------------------------------

IOError: [Errno 13] Permission denied: '/Users/Kyle/.astropy/config/astropy.1.0.cfg'

README file broken

The README file on https://github.com/astroML/astroML is not rendered in HTML.

No module named sklearn.metrics

Hi,
Sorry, I am trying to install astroML on Ubuntu 12.04, and after the command python setup.py build it gives this error:
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import astroML
  File "/home/astroML-0.1.1/astroML/__init__.py", line 3, in <module>
    from astroML.density_estimation import histogram
  File "/home/astroML-0.1.1/astroML/density_estimation/__init__.py", line 1, in <module>
    from density_estimation import KDE, KNeighborsDensity
  File "/home/astroML-0.1.1/astroML/density_estimation/density_estimation.py", line 11, in <module>
    from sklearn.metrics import pairwise_kernels, pairwise_distances
ImportError: No module named sklearn.metrics
Can you please suggest some help?
Best,
Narek

Spectrum of Vega no longer available

In fetch_vega_spectrum, the link points to a no-longer existing file:

ftp://ftp.stsci.edu/cdbs/current_calspec/ascii_files/1732526_nic_002.ascii

The closest alternative might be

ftp://ftp.stsci.edu/cdbs/current_calspec/1732526_stisnic_001.fits

but that won't be the same spectra -- and it's fits. If you still want ascii, perhaps:

ftp://ftp.stsci.edu/cdbs/calspec/ascii/alpha_lyr_stis_005.ascii

Extreme Deconvolution

Currently, extreme deconvolution in astroML.density_estimation.XDGMM is not full-featured: it cannot use the projection matrix R. Additionally, this should be improved so that diagonal covariances are handled efficiently.

Chapter 1: Wrong URL pointer?

I'm working through the code examples for the Statistics, Data Mining, and Machine Learning in Astronomy text. I've found that this bit of code gives an error.

from astroML.datasets import tools
target = tools.TARGET_GALAXY
# main galaxy sample
plt, mjd, fib = tools.query_plate_mjd_fiber(5, primtarget=target)

I think that the issue might be that the URL is pointing to the wrong place now that DR12 is public.

Any suggestions for a solution?

astroML_addons installed but still getting "using slow version" warnings

Hi,

I'm running Ubuntu 12.04 and I've got astroML running properly. From the output to the terminal (see below) it looks like astroML_addons were also installed properly using 'pip install'. However when I try to run 'fig_LS_sg_comparison.py' downloaded from the website I get a warning that it is still using the slow version of the LS. I know I am missing something here, perhaps something obvious. Any pointers would be greatly appreciated.

Thanks,
Dipankar

Here's the output of the installation:

$ sudo pip install astroML_addons
Downloading/unpacking astroML-addons
Downloading astroML_addons-0.1.1.tar.gz (257Kb): 257Kb downloaded
Running setup.py egg_info for package astroML-addons

Installing collected packages: astroML-addons
Running setup.py install for astroML-addons

Successfully installed astroML-addons
Cleaning up...
$

And here's what I get when I try to run the Lomb-Scargle routine afterwards

$ python ./fig_LS_sg_comparison.py
/usr/local/lib/python2.7/dist-packages/astroML/time_series/periodogram.py:8: UserWarning: Using slow version of lomb_scargle. Install astroML_addons to use an optimized version
warnings.warn("Using slow version of lomb_scargle. Install astroML_addons "

$

Silent mode for 'knuth' binning method?

I'm using astroML as part of a larger project and I'd like to be able to run:

from astroML.plotting import hist
hist(x, bins='knuth')

without it printing out to screen:

Optimization terminated successfully.
         Current function value: -68.087905
         Iterations: 17
         Function evaluations: 47
...

Is there a way to run this in "silent" mode?
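
Until the library offers a switch for this, one possible workaround is to capture standard output around the call. The sketch below uses a hypothetical noisy_optimizer in place of the hist(x, bins='knuth') call and assumes the report is written through Python's sys.stdout. For reference, scipy.optimize.fmin itself accepts disp=False to suppress this report, so a fix inside the library could pass that flag.

```python
import contextlib
import io


def noisy_optimizer():
    """Hypothetical stand-in for a routine that prints a convergence report."""
    print("Optimization terminated successfully.")
    return 42


# Everything printed to sys.stdout inside the block goes into the buffer instead.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    result = noisy_optimizer()

captured = buffer.getvalue()  # the suppressed text, still available if needed
```

The return value is unaffected; only the printed text is diverted.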

HierarchicalClustering edge_cutoff

The edge_cutoff is passed as a percentile. It would be good to have the option of specifying the length of the cutoff instead.
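
A sketch of how both options might sit side by side is given below. The function edge_mask and its kind argument are hypothetical and are not part of HierarchicalClustering. The sketch assumes the cutoff is given as a fraction between 0 and 1 in the percentile case.

```python
import numpy as np


def edge_mask(edge_lengths, cutoff, kind="percentile"):
    """Return a boolean mask of the edges to keep.

    kind="percentile": cutoff is a fraction in [0, 1]; keep edges at or below that quantile.
    kind="length": cutoff is an absolute length in the units of the data.
    """
    edge_lengths = np.asarray(edge_lengths, dtype=float)
    if kind == "percentile":
        threshold = np.percentile(edge_lengths, 100 * cutoff)
    elif kind == "length":
        threshold = float(cutoff)
    else:
        raise ValueError("kind must be 'percentile' or 'length'")
    return edge_lengths <= threshold


lengths = [1.0, 2.0, 3.0, 4.0, 10.0]
keep_by_fraction = edge_mask(lengths, 0.5)               # threshold is the median
keep_by_length = edge_mask(lengths, 3.5, kind="length")  # threshold is 3.5
```

An absolute length makes the cut independent of how many long edges a particular data set happens to contain.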

Small quirks in the documentation

Hi,
thanks for providing astroML!
It looks as if it's going to help me a lot with my current project (I am just starting with the analysis).
Here are a few things that I discovered when I started reading the documentation
(at http://www.astroml.org/examples/algorithms/plot_bayesian_blocks.html):

In the left sidebar, the link "citing astroML" is broken.
In the text it says "More fitness functions are available: see density_estimation".
It looks as if density_estimation should be a link, but it is not. Maybe a typo in the Sphinx document?

devectorize_axes

Hi Jake,

new versions of mpl are painful with unicode etc even when using python 2.x

:func:plotting.devectorize_axes does not work as is anymore.

here is an updated version

def devectorize_axes(ax=None, dpi=None, transparent=True):
    """Convert axes contents to a png.

    This is useful when plotting many points, as the size of the saved file
    can become very large otherwise.

    Parameters
    ----------
    ax : Axes instance (optional)
        Axes to de-vectorize.  If None, this uses the current active axes
        (plt.gca())
    dpi: int (optional)
        resolution of the png image.  If not specified, the default from
        'savefig.dpi' in rcParams will be used
    transparent : bool (optional)
        if True (default) then the PNG will be made transparent

    Returns
    -------
    ax : Axes instance
        the in-place modified Axes instance

    Examples
    --------
    The code can be used in the following way::

        import numpy as np
        import matplotlib.pyplot as plt
        fig, ax = plt.subplots()
        x, y = np.random.random((2, 10000))
        ax.scatter(x, y)
        devectorize_axes(ax)
        plt.savefig('devectorized.pdf')

    The resulting figure will be much smaller than the vectorized version.
    """
    from matplotlib import pyplot as plt
    from matplotlib.transforms import Bbox
    from matplotlib import image
    try:
        from io import BytesIO as StringIO
    except ImportError:
        try:
            from cStringIO import StringIO
        except ImportError:
            from StringIO import StringIO

    if ax is None:
        ax = plt.gca()

    fig = ax.figure
    axlim = ax.axis()

    # setup: make all visible spines (axes & ticks) & text invisible
    # we need to set these back later, so we save their current state
    _sp = {}
    _txt_vis = [t.get_visible() for t in ax.texts]
    for k in ax.spines:
        _sp[k] = ax.spines[k].get_visible()
        ax.spines[k].set_visible(False)
    for t in ax.texts:
        t.set_visible(False)

    _xax = ax.xaxis.get_visible()
    _yax = ax.yaxis.get_visible()
    _patch = ax.patch.get_visible()
    ax.patch.set_visible(False)
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)

    # convert canvas to PNG
    extents = ax.bbox.extents / fig.dpi
    sio = StringIO()
    fig.savefig(sio, format='png', dpi=dpi,
                transparent=transparent,
                bbox_inches=Bbox([extents[:2], extents[2:]]))
    sio.seek(0)
    im = image.imread(sio)

    # clear everything on axis (but not text); Artist.remove() is used
    # because recent matplotlib versions do not allow assigning to these lists
    for group in (ax.lines, ax.patches, ax.tables, ax.artists,
                  ax.images, ax.collections):
        for artist in list(group):
            artist.remove()

    # Show the image
    ax.imshow(im, extent=axlim, aspect='auto', interpolation='nearest')

    # restore all the spines & text
    for k in ax.spines:
        ax.spines[k].set_visible(_sp[k])
    for t, v in zip(ax.texts, _txt_vis):
        t.set_visible(v)
    ax.patch.set_visible(_patch)
    ax.xaxis.set_visible(_xax)
    ax.yaxis.set_visible(_yax)

    if plt.isinteractive():
        plt.draw()

    return ax

ENH: take advantage of scikit-learn's new neighbors tools

Scikit-learn 0.14 includes a new Ball Tree and KD Tree which implement a number of generalized N-body routines. These could be used to enhance the astroML examples in the following cases:

Faster kernel density estimation
Faster cross-matching on the sphere
Faster correlation function computation

TLS_logL

The TLS_logL function returns the wrong result when dX is 2-dimensional. It should be dX * v_hat ** 2, not (dX * v_hat) ** 2.

Should fix and add a unit test.

plot_tissot_ellipse: failing when an array is given as input to radius parameter

While experimenting with plot_tissot_ellipse (astroML/plotting/ellipse.py), I ran into a problem when, instead of a single value for radius, I tried to give an array of values (one for each ellipse center).
I was able to work around it by modifying line 29 in my local copy:

for long, lat, rad in np.broadcast(longitude, latitude, radius):
        el = Ellipse((long, lat), radius / np.cos(lat), radius, **kwargs)

I modified to:

for long, lat, rad in np.broadcast(longitude, latitude, radius):
        el = Ellipse((long, lat), rad / np.cos(lat), rad, **kwargs)

It seems that the problem was that, within the for loop, the original input (radius) to np.broadcast was used instead of the rad loop variable.

setup.py should not import sklearn

Just wanted to install astroML and it failed because at the moment I have a broken sklearn installed.

Can you use another way to get the version info in setup.py so that this doesn't happen?

$ python setup.py install --user
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import astroML
  File "/Users/deil/code/astroML/astroML/__init__.py", line 3, in <module>
    from astroML.density_estimation import histogram
  File "/Users/deil/code/astroML/astroML/density_estimation/__init__.py", line 2, in <module>
    from xdeconv import XDGMM
  File "/Users/deil/code/astroML/astroML/density_estimation/xdeconv.py", line 15, in <module>
    from sklearn.mixture import GMM
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/mixture/__init__.py", line 5, in <module>
    from .gmm import sample_gaussian, log_multivariate_normal_density
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/mixture/gmm.py", line 17, in <module>
    from .. import cluster
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/__init__.py", line 6, in <module>
    from .spectral import spectral_clustering, SpectralClustering
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/spectral.py", line 15, in <module>
    from .k_means_ import k_means
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/k_means_.py", line 29, in <module>
    from . import _k_means
ImportError: dlopen(/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/_k_means.so, 2): Symbol not found: _ATL_ddot
  Referenced from: /Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/_k_means.so
  Expected in: flat namespace
 in /Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/_k_means.so

Looking forward to your tutorials and book, thanks for making astroML!
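
A common way to avoid the import is to read the version string straight from the source file. The sketch below is illustrative only: read_version is a hypothetical helper, and it assumes the package stores its version as a plain __version__ = '...' assignment.

```python
import os
import re
import tempfile


def read_version(init_path):
    """Extract __version__ from a Python source file without importing it."""
    with open(init_path, encoding="utf-8") as f:
        match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", f.read(), re.M)
    if match is None:
        raise RuntimeError("no __version__ string found in " + init_path)
    return match.group(1)


# Demonstration on a throwaway file standing in for astroML/__init__.py.
demo = os.path.join(tempfile.mkdtemp(), "__init__.py")
with open(demo, "w", encoding="utf-8") as f:
    f.write("import something_broken\n__version__ = '0.2'\n")

version = read_version(demo)
```

Because the file is only read as text, the broken import on its first line is never executed.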

Knuth binning algorithm fails for some datasets

AstroML users,
I've run into a strange issue using the knuth_bin_width provided in the density_estimation module.

When I apply the algorithm to certain datasets that I'm analyzing I get a strange error that I don't understand. Below is a minimum working example that produces the error. I've found that I can avoid the error by only sending data points below a threshold value to knuth_bin_width. (See code below.)

I am using Mac OS 10.9, Python 2.7, scipy 0.11.0, and astroML 0.2. Please let me know if you need additional information.

import numpy as np
from astroML import density_estimation as de

data = np.genfromtxt('http://www.lpl.arizona.edu/~bjackson/knuth_example_data.txt', delimiter = ', ')

#This one works
binwidth, bins = de.knuth_bin_width(data[data <= 1.], return_bins=True)
print("data binwidth:", binwidth)
#This one doesn't
binwidth, bins = de.knuth_bin_width(data, return_bins=True)
print("data binwidth:", binwidth)

And here is the Traceback:

Traceback (most recent call last):
  File "knuth_example.py", line 10, in <module>
    binwidth, bins = de.knuth_bin_width(data, return_bins=True)
  File "/Library/Python/2.7/site-packages/astroML/density_estimation/histtools.py", line 226, in knuth_bin_width
    M = optimize.fmin(knuthF, len(bins0))[0]
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/optimize.py", line 360, in fmin
    res = _minimize_neldermead(func, x0, args, callback=callback, **opts)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/optimize.py", line 452, in _minimize_neldermead
    fxr = func(xr)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/optimize.py", line 268, in function_wrapper
    return function(x, *args)
  File "/Library/Python/2.7/site-packages/astroML/density_estimation/histtools.py", line 155, in __call__
    return self.eval(M)
  File "/Library/Python/2.7/site-packages/astroML/density_estimation/histtools.py", line 173, in eval
    nk, bins = np.histogram(self.data, bins)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0-py2.7-macosx-10.7-intel.egg/numpy/lib/function_base.py", line 199, in histogram
    sa.searchsorted(bins[-1], 'right')]
IndexError: index out of bounds

Thanks for any suggestions.

knuth rule bin width tries negative number of bins

function: knuth_bin_width in density_estimation

Problem: Some data will cause this function to crash in an IndexError: index out of bounds.

The error traces to calling np.histogram in the KnuthF class instance, where it was called with bins=[] (an empty list). This could only occur if -1 < int(M) < 0 in the bins class method above.

Possible solution:
I tried a hack-y but effective fix of adding the line
if M<0: M = 1
right above the return line in the bins method of KnuthF.
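
A slightly more general guard is sketched below. safe_bin_edges is a hypothetical helper, not the KnuthF.bins method, and it assumes the edges are built with np.linspace from a trial bin count M supplied by the optimizer.

```python
import numpy as np


def safe_bin_edges(data, M):
    """Hypothetical helper: build equal-width bins from a trial count, never fewer than one."""
    n_bins = max(int(M), 1)  # int(-0.5) is 0 and int(-3.2) is -3; both become 1
    data = np.asarray(data, dtype=float)
    return np.linspace(data.min(), data.max(), n_bins + 1)


data = np.array([0.1, 0.4, 0.9, 2.5])
edges = safe_bin_edges(data, -0.5)   # an optimizer trial value below zero
counts, _ = np.histogram(data, edges)
```

Clamping on the integer count also covers trial values between 0 and 1, which the M < 0 test alone would let through.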

Bug in cas_query?

When trying to run tools.query_plate_mjd_fiber(), I kept getting the error

ValueError: #Table1
plate,mjd,fiberid
2128,53800,575
2128,53800,577
2128,53800,578
2128,53800,579
2128,53800,581

I dug around in the code, and the error comes from line 78 of cas_query.py. The "output" array is

output = ['#Table1\n', 'plate,mjd,fiberid\n', '2128,53800,575\n', '2128,53800,577\n', '2128,53800,578\n', '2128,53800,579\n', '2128,53800,581\n'],

so the map() in line 78 is calling int('plate'), int('mjd'), etc, which raises an error.

I got around this by changing line 76 to
for i, line in enumerate(output[2:]):

But I think the problem is that for some reason, the comment '#Table1\n' is getting added to output by sql_query().
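
A parser that tolerates the extra comment line regardless of where it appears might look like the sketch below. parse_cas_output is a hypothetical function, not the code in cas_query.py, and it assumes every data field is an integer, as in the output shown above.

```python
def parse_cas_output(lines):
    """Parse CSV lines of integers, skipping '#' comments and one header row."""
    stripped = (ln.strip() for ln in lines)
    # Drop blank lines and comment lines such as '#Table1'.
    data = [s for s in stripped if s and not s.startswith("#")]
    header, rows = data[0].split(","), data[1:]
    values = [[int(field) for field in row.split(",")] for row in rows]
    return header, values


output = ['#Table1\n', 'plate,mjd,fiberid\n', '2128,53800,575\n', '2128,53800,577\n']
header, values = parse_cas_output(output)
```

Filtering on the leading '#' keeps the parser working whether or not the server emits the comment line.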

Weights for 2-point correlation function

Hi,

Is there a way to include weights when using the astroML two-point correlation function?

I checked the source code, but it seems that it is not possible to do so.

Thanks

Segmentation fault: 11 in nosetests on Mac OS 10.7.5

Hi,

I got a seg fault using nosetests astroML on my Mac with:

Enthought Python Distribution -- www.enthought.com
Version: 7.3-1 (64-bit)

Python 2.7.3 |EPD 7.3-1 (64-bit)| (default, Apr 12 2012, 11:14:05)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin

I have a crash report available, let me know how to transmit it.

Cheers,
Jean-Baptiste

Figure 6.17

In figure 6.17, we should use the correlation from the full data rather than the mean of bootstrap samples as the best estimate.

Datasets not written in binary mode

Data set loaders should write files in binary mode. The current behavior causes a crash on some systems
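
The difference matters because text mode may translate newline characters on some platforms, and in Python 3 it does not accept bytes at all. A minimal sketch, using a hypothetical save_download helper and a made-up payload rather than astroML's loader code:

```python
import os
import tempfile


def save_download(payload, path):
    """Write raw bytes exactly as received; 'wb' disables newline translation."""
    with open(path, "wb") as f:
        f.write(payload)


# Hypothetical payload containing bytes that text mode could alter.
payload = b"SIMPLE  =\r\n\x00\x01\n"
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
save_download(payload, path)

# Read back in binary mode to confirm the bytes are unchanged.
with open(path, "rb") as f:
    roundtrip = f.read()
```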

Consistent example naming

In examples/datasets, plot_SDSS_SSPP.py should be plot_sdss_sspp.py to be consistent with other file names.

SDSSfits is incompatible with DR9

DR9 changed the FITS format, and SDSSfits fails when loading DR9 spectra. This should be fixed.

Using extreme_deconvolution library in XDGMM?

Hello. I've been using the XDGMM code to estimate density distributions in color-color diagrams. It worked well except that it took very long to converge. At some point I heard that Bovy et al. put a link in their paper to the code they wrote. This is the link

https://code.google.com/p/extreme-deconvolution/

I installed their extreme_deconvolution library and their python wrapper and did this

import extreme_deconvolution as xd
from sklearn.mixture import GMM
from astroML.density_estimation import XDGMM

# Load data and measurement covariates in XTrain and XerrTrain
# Choose number of gaussians `n_components`

# Stole this from XDGMM
gmm = GMM(n_components, n_iter=10, covariance_type='full').fit(XTrain)
amp = gmm.weights_; mean = gmm.means_; covar = gmm.covars_

# Results are saved in `amp`, `mean`, and `covar`
xd.extreme_deconvolution(XTrain,XerrTrain,amp,mean,covar)

# Load results in XDGMM object
clf = XDGMM(n_components)
clf.alpha = amp; clf.mu = mean; clf.V = covar

I found this to be much much faster than the current code in XDGMM, so I wanted to share this here. I thought it may be a good idea to use their library in XDGMM (The library also has the projection matrix feature which is another good thing). I guess that shipping astroML with another library may create more dependencies and other potential complications. In any case, I hope this is useful and let me know if I can be of any help.

Incompatibility with python3

Hi there,
I tried to install astroML via pip, but it failed due to the new print syntax of python3 with respect to python2.

Bug in fetch_and_shift_spectra()

I'm pretty sure there's a bug here: http://www.astroml.org/examples/datasets/compute_sdss_pca.html#example-datasets-compute-sdss-pca

This condition:

    if np.all(spec_rebin.spectrum == 0):
        num_skipped += 1
        print("%i, %i, %i is all zero" % (plate[i], mjd[i], fiber[i]))
        continue

needs an "i += 1" added before the continue statement.

As it is now, if a bad spectrum (or one outside the given wavelength range) is found, no progress is made, because the function keeps trying to get the same spectrum over and over, hitting the same condition each time without incrementing i.
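
The pattern can be reproduced in a few lines. The sketch below is not the fetch_and_shift_spectra code; collect_valid is a hypothetical function that assumes the same structure, namely a while loop driven by an index i.

```python
def collect_valid(spectra):
    """Loop in the style of the reported code, with the index advanced before `continue`."""
    kept, num_skipped = [], 0
    i = 0
    while i < len(spectra):
        spec = spectra[i]
        if all(value == 0 for value in spec):
            num_skipped += 1
            i += 1  # without this line the loop would revisit spectra[i] forever
            continue
        kept.append(spec)
        i += 1
    return kept, num_skipped


kept, num_skipped = collect_valid([[1, 2], [0, 0], [3, 4]])
```

Iterating with a for loop over the indices would remove the need to increment i by hand in every branch.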

Automatic choice of dimensionality for PCA

Minka (2001), NIPS13 offers a Bayesian solution: http://papers.nips.cc/paper/1853-automatic-choice-of-dimensionality-for-pca.pdf

Cross-matching uses incorrect distances

The cross-matching utility does not take into account the cosine of the declination. For this reason, distances at large declinations will be incorrect.
The best fix would be to use the haversine distance within the new scikit-learn Ball Tree, for which there is a current pull request in scikit-learn: scikit-learn/scikit-learn#1732
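
To illustrate the size of the effect, the haversine formula can be written directly with the standard library. This is a sketch of the distance itself under the assumption that coordinates are in degrees; haversine_deg is a hypothetical helper, not the cross-matching routine or the Ball Tree metric.

```python
import math


def haversine_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, Dec) points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Two pairs of points 1 degree apart in RA: on the equator and at Dec = 60 degrees.
sep_equator = haversine_deg(10.0, 0.0, 11.0, 0.0)
sep_high_dec = haversine_deg(10.0, 60.0, 11.0, 60.0)
```

At a declination of 60 degrees the true separation is roughly half the RA difference, so a distance computed from raw coordinate differences overestimates it by about a factor of two.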

Drop PyFITS in favor of astropy

PyFITS has stopped being maintained, in favor of astropy.io.fits. This should be updated within the astroML package.

Bayesian Blocks ported to astropy

Bayesian Blocks has been ported to astropy; when the dependency is updated we should switch to using that version.

Fig 8.4 (Lasso/Ridge regression) does not take errors into account

Fairly easy to fix; probably need to define some convenience routines because sklearn does not provide this.

Bayesian blocks function should return N posterior samples

Thus making it Bayesian Bayesian blocks! :-)

sorted indices issues

Hey Jake.
I just noticed a subtle issue in the graph code and I wasn't sure where to report it.
Basically, the Euclidean MST with the ball tree (which is super-sweet by the way ;) does not do what I thought it does. The MST code assumes has_sorted_indices, which the ball tree does not seem to provide.
So I had an issue where the distance matrix was connected, but the MST was not - that's how I noticed. Anyhow, it was probably completely bogus, as has_sorted_indices was False.

Maybe we should add sort_indices into the BallTree? (in which case I should have reported this to sklearn?) or the MST algorithm should check if the indices are sorted and sort (in which case I should have reported to scipy?).

Anyhow, thanks for your work on the graph algorithms, it helps me a lot!

Figure for Bivariate Gaussian in Chapter 3

In chapter 3 the code for the bivariate Gaussian has

alpha = np.pi / 4

If alpha is changed (e.g. np.pi/3.) the density plot and the ellipses no longer align.

I believe this is a quirk of imshow and you need to transpose H

ax.imshow(H.transpose(), origin='lower', cmap=plt.cm.binary, interpolation='nearest',
          extent=[bins[0][0], bins[0][-1], bins[1][0], bins[1][-1]])
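A small sketch of why the transpose is needed, assuming H comes from a numpy 2D histogram (the issue does not show how H is built). `np.histogram2d` puts x along axis 0, whereas `imshow` draws axis 0 vertically. A distribution that is symmetric about the line y = x, as the alpha = np.pi / 4 case can be, would hide the mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 3.0, 10000)   # broad in x
y = rng.normal(0, 0.5, 10000)   # narrow in y

# Use different bin counts per axis so the axis order is easy to see.
H, xedges, yedges = np.histogram2d(x, y, bins=(40, 10))

# H.shape is (n_xbins, n_ybins); imshow expects (rows = y, columns = x).
image = H.T
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]

# With matplotlib this would then be drawn as:
#   ax.imshow(image, origin='lower', extent=extent)
```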

Extreme Deconvolution Example error (n_iter)

I was trying to reproduce the Extreme Deconvolution found here: http://astroml.github.com/paper_figures/CIDU2012/fig_XD_example.html

And I got an error -- for some reason the __init__ of GMM can't handle being fed n_iter.

I've reproduced the code (which is simply the code up to the line where the error occurs).

import os
import cPickle

import numpy as np
from matplotlib import pyplot as plt
from matplotlib.patches import Ellipse

from astroML.decorators import pickle_results
from astroML.density_estimation import XDGMM
from astroML.plotting.tools import draw_ellipse

#------------------------------------------------------------
# Sample the dataset
N = 2000
np.random.seed(0)

# generate the true data
x_true = (1.4 + 2 * np.random.random(N)) ** 2
y_true = 0.1 * x_true ** 2

# add scatter to "true" distribution
dx = 0.1 + 4. / x_true ** 2
dy = 0.1 + 10. / x_true ** 2

x_true += np.random.normal(0, dx, N)
y_true += np.random.normal(0, dy, N)

# add noise to get the "observed" distribution
dx = 0.2 + 0.5 * np.random.random(N)
dy = 0.2 + 0.5 * np.random.random(N)

x = x_true + np.random.normal(0, dx)
y = y_true + np.random.normal(0, dy)

# stack the results for computation
X = np.vstack([x, y]).T
Xerr = np.zeros(X.shape + X.shape[-1:])
diag = np.arange(X.shape[-1])
Xerr[:, diag, diag] = np.vstack([dx ** 2, dy ** 2]).T


#------------------------------------------------------------
# compute and save results
@pickle_results("XD_toy.pkl")
def compute_XD_results(n_components=10, n_iter=500):
    clf = XDGMM(n_components, n_iter=n_iter)
    clf.fit(X, Xerr)
    return clf

clf = compute_XD_results(10, 500)

Yields the following error for me:

@pickle_results: computing results and saving to 'XD_toy.pkl'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9ceea794cbda> in <module>()
     49 
     50 
---> 51 clf = compute_XD_results(10, 500)

/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/astroML/decorators.pyc in new_f(*args, **kwargs)
     88                         print "    - args match:   %s" % args_match
     89                         print "    - kwargs match: %s" % kwargs_match
---> 90                 retval = f(*args, **kwargs)
     91                 cPickle.dump(dict(funcname=f.__name__, retval=retval,
     92                                   args=args, kwargs=kwargs),

<ipython-input-6-9ceea794cbda> in compute_XD_results(n_components, n_iter)
     45 def compute_XD_results(n_components=10, n_iter=500):
     46     clf = XDGMM(n_components, n_iter=n_iter)
---> 47     clf.fit(X, Xerr)
     48     return clf
     49 

/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/astroML/density_estimation/xdeconv.pyc in fit(self, X, Xerr, R)
     72         # initialize components via a few steps of GMM
     73         # this doesn't take into account errors, but is a fast first-guess
---> 74         gmm = GMM(self.n_components, n_iter=10, covariance_type='full').fit(X)
     75         self.mu = gmm.means_
     76         self.alpha = gmm.weights_

TypeError: __init__() got an unexpected keyword argument 'n_iter'
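The traceback shows that the installed GMM constructor does not accept n_iter, which may point to a mismatch between the astroML and scikit-learn versions (an assumption; the issue does not confirm it). A generic way to check what a constructor accepts is sketched below. It needs Python 3 for `inspect.signature`, and the two model classes are hypothetical stand-ins, not scikit-learn classes.

```python
import inspect

def accepts_keyword(callable_obj, name):
    """Return True if callable_obj names `name` as a parameter or takes **kwargs."""
    params = inspect.signature(callable_obj).parameters
    if name in params:
        return True
    # A **kwargs catch-all would also accept the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

class OldStyleModel:
    """Stand-in for an estimator that takes n_iter in fit()."""
    def __init__(self, n_components=1, covariance_type='diag'):
        self.n_components = n_components
        self.covariance_type = covariance_type

class NewStyleModel:
    """Stand-in for an estimator that takes n_iter in the constructor."""
    def __init__(self, n_components=1, covariance_type='diag', n_iter=100):
        self.n_components = n_components
        self.covariance_type = covariance_type
        self.n_iter = n_iter

old_ok = accepts_keyword(OldStyleModel, 'n_iter')
new_ok = accepts_keyword(NewStyleModel, 'n_iter')
```

Running the same check against the GMM class that is actually installed would show whether n_iter belongs in the constructor for that version.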

scatter_contour function IndexError

The plotting function scatter_contour() takes in at least 2 arguments, which are supposed to be the x and y data (array-type). For small data arrays, the function throws an IndexError. I've tested the function a little bit myself and I've concluded that the main issue occurs on this line:

outer_poly = outline.allsegs[0][0]
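One plausible cause, not confirmed in the issue, is that no contour exists at the lowest level for small inputs, so `allsegs[0]` is an empty list and indexing `[0]` into it fails. A defensive lookup could look like the sketch below; `first_outline` is a hypothetical helper, and plain lists stand in for a contour set's `allsegs` attribute.

```python
def first_outline(allsegs):
    """Return the first segment of the lowest contour level, or None if absent."""
    if len(allsegs) == 0 or len(allsegs[0]) == 0:
        return None
    return allsegs[0][0]

# A contour set with one polygon at the lowest level.
with_contour = [[[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]], []]
# A contour set where the lowest level produced no polygons at all.
without_contour = [[], []]

poly = first_outline(with_contour)
missing = first_outline(without_contour)
```

A caller could fall back to a plain scatter plot of all points when the helper returns None.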

Using hist forces plot to appear

I'm currently using the hist function (from astroML.plotting import hist) to obtain bin edges, not to generate a plot.

When a plt.show() line appears much further down the code, the histogram generated by hist is shown next to whatever I am plotting.

Can this behaviour be prevented somehow?
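If only the bin edges are needed, one workaround is to compute them without touching matplotlib. The sketch below uses numpy's own binning rules, which is an assumption about what is acceptable here: numpy offers rules such as Freedman-Diaconis (`'fd'`) but not every rule that astroML's hist supports.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Bin edges only; nothing is drawn.
edges = np.histogram_bin_edges(x, bins='fd')

# Counts for those edges, again without creating a figure.
counts, _ = np.histogram(x, bins=edges)
```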

ACF_scargle fails

I'm trying to run the example from http://www.astroml.org/book_figures/chapter10/fig_autocorrelation.html, but the ACF_scargle function is failing and returning all NaNs. The issue seems to be coming up when computing the power of the window function, which returns all infs. I tried computing a similar periodogram (random times, y=1 everywhere), and had similar results from the astroML and gatspy functions:

import numpy as np

from astroML import time_series
from gatspy import periodic

t = np.sort(1000*np.random.random_sample(100))
y = np.ones(100)
dy = y * 0.1

pw = time_series.lomb_scargle(t,y,dy,np.arange(0.1,100),generalized=False,subtract_mean=False)

pw = periodic.lomb_scargle.LombScargle(center_data=False).fit(t,y,dy)
pw.score(np.arange(0.1,100))

Those yield an array of infs and an array of NaNs, respectively. Is this coming from the functions themselves, or the inputs?

I'm using astroML v. 0.3 and gatspy version 0.2, for reference.

SDSS filters not available

Hi,

Thanks for a wonderful work and an amazing package.

I tried to run one of the pictures from the homepage

from astroML.datasets import fetch_sdss_filter
u = fetch_sdss_filter('u')

But I get an error. The cause is that the URL it requests no longer exists:
http://www.sdss.org/dr7/instruments/imager/filters/u.dat

I do not know where the data is now hosted, so I don't have a solution.
