astroml / astroml

Machine learning, statistics, and data mining for astronomy and astrophysics

Home Page: https://www.astroml.org/

License: BSD 2-Clause "Simplified" License

Makefile 0.19% Python 99.81%

astroml's Introduction

AstroML: Machine Learning for Astronomy

AstroML is a Python module for machine learning and data mining built on numpy, scipy, scikit-learn, and matplotlib, and distributed under the BSD license. It contains a growing library of statistical and machine learning routines for analyzing astronomical data in python, loaders for several open astronomical datasets, and a large suite of examples of analyzing and visualizing astronomical datasets.

This project was started in 2012 by Jake VanderPlas to accompany the book Statistics, Data Mining, and Machine Learning in Astronomy by Zeljko Ivezic, Andrew Connolly, Jacob VanderPlas, and Alex Gray.

Installation

Before installation, make sure your system meets the prerequisites listed in Dependencies, below.

Core

To install the core astroML package in your home directory, use:

pip install astroML

A conda package for astroML is also available either on the conda-forge or on the astropy conda channels:

conda install -c astropy astroML

The core package is pure python, so installation should be straightforward on most systems. To install from source, use:

python setup.py install

You can specify an arbitrary directory for installation using:

python setup.py install --prefix='/some/path'

To install system-wide on Linux/Unix systems:

python setup.py build
sudo python setup.py install

Dependencies

There are two levels of dependencies in astroML. Core dependencies are required for the core astroML package. Optional dependencies are required to run some (but not all) of the example scripts. Individual example scripts will list their optional dependencies at the top of the file.

Core Dependencies

The core astroML package requires the following (some of the functionality might work with older versions):

Optional Dependencies

Several of the example scripts require specialized or upgraded packages. These requirements are listed at the top of the particular scripts.

  • HEALPy provides an interface to the HEALPix pixelization scheme, as well as fast spherical harmonic transforms.

Development

This package is designed to be a repository for well-written astronomy code, and submissions of new routines are encouraged. After installing the version-control system Git, you can check out the latest sources from GitHub using:

git clone https://github.com/astroML/astroML.git

or if you have write privileges:

git clone git@github.com:astroML/astroML.git

Contribution

We strongly encourage contributions of useful astronomy-related code: for astroML to be a relevant tool for the python/astronomy community, it will need to grow with the field of research. There are a few guidelines for contribution:

General

Any contribution should be made through the GitHub pull request system (for more information, see GitHub's help pages on pull requests). Code submitted to astroML should conform to a BSD-style license and follow the PEP8 style guide.

Documentation and Examples

All submitted code should be documented following the Numpy Documentation Guide. This is a unified documentation style used by many packages in the scipy universe.
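As an illustration of the Numpy docstring layout (Parameters, Returns, Examples sections with underlined headers), here is a short hypothetical helper. `bin_centers` is not part of astroML; it exists only to show the section structure.

```python
import numpy as np


def bin_centers(edges):
    """Compute the centers of histogram bins.

    Parameters
    ----------
    edges : array_like
        Monotonic sequence of bin edges, of length ``n_bins + 1``.

    Returns
    -------
    centers : ndarray
        Array of length ``n_bins`` holding the midpoint of each bin.

    Examples
    --------
    >>> bin_centers([0, 1, 3])
    array([0.5, 2. ])
    """
    edges = np.asarray(edges, dtype=float)
    # midpoint of each consecutive pair of edges
    return 0.5 * (edges[1:] + edges[:-1])
```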

In addition, it is highly recommended to create example scripts that show the usefulness of the method on an astronomical dataset (preferably making use of the loaders in astroML.datasets). These example scripts are in the examples subdirectory of the main source repository.

Authors

Package Author

Maintainer

Contributors

  • Alex Conley
  • Andreas Kopecky
  • Andrew Connolly
  • Asif Imran
  • Benjamin Alan Weaver
  • Brigitta Sipőcz
  • Chris Desira
  • Daniel Andreasen
  • Dino Bektešević
  • Edward Betts
  • Hans Moritz Günther
  • Hugo van Kemenade
  • Jake Vanderplas
  • Jeremy Blow
  • Jonathan Sick
  • Joris van Vugt
  • Juanjo Bazán
  • Julian Taylor
  • Lars Buitinck
  • Michael Radigan
  • Morgan Fouesneau
  • Nicholas Hunt-Walker
  • Ole Streicher
  • Pey Lian Lim
  • Rodrigo Nemmen
  • Ross Fadely
  • Vlad Skripniuk
  • Zlatan Vasović
  • Engineero
  • stonebig


astroml's Issues

Wiener Filter fails for some data sets

(thanks to @acbecker)

In particular, data with large offsets should be centered, and the starting iteration for the fit should be estimated from the input data.

Also, a user-specified guess of the smoothing scale would be useful.

It might also be useful to have a static Wiener filter object, which could be instantiated for the optimal parameters, and then optimized if desired.

0.2 Release

Checklist:

  • Python 3 support
  • Separate astroML & astroML addons
  • Include tests in setup.py install
  • Fix filtering bug: #7
  • Add book figure captions to examples
  • Fix tree visualization incompatibility: #26
  • sklearn 0.14 enhancements: KDE (done in 50e0b17)
  • sklearn 0.14 enhancements: correlation function (done in 735af45)

Figure 6.5 error

Got an error while plotting figure 6.5:
ax.plot(t, true_pdf(t), ':', color='black', zorder=3,
        label="Generating Distribution")

The error stated that x and y must have the same dimension. I looked and found that true_pdf returns a single value, while the array t had 1000 elements.

I corrected this myself by adding the following code after defining t:

t_true = []
for item in t:
    t_true.append(true_pdf(item))

and then plotting t_true instead of true_pdf(t). I don't know if that is the most efficient way of correcting the error, but it worked and produced the figure as it appears on the website.
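A more concise equivalent of the loop above is `np.vectorize`, which wraps a scalar-only function so that it accepts arrays. It is not faster than the explicit loop, but it returns a NumPy array with the same shape as `t`. The real `true_pdf` from the figure script is not shown here, so `true_pdf_scalar` below is a hypothetical stand-in that only handles scalar input.

```python
import numpy as np


def true_pdf_scalar(x):
    # Hypothetical stand-in for a pdf written for scalar input only
    # (an exponential distribution; the script's real pdf is not shown).
    if x < 0:
        return 0.0
    return float(np.exp(-x))


t = np.linspace(-1, 5, 1000)

# np.vectorize calls true_pdf_scalar once per element of t and collects the
# results into a float array with the same shape as t.
true_pdf = np.vectorize(true_pdf_scalar, otypes=[float])
y = true_pdf(t)
```

With this wrapper, `ax.plot(t, true_pdf(t), ...)` works unchanged.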

nosetests runs zero tests

If you run nosetests astroML outside the source directory, zero tests are run. This is likely an issue with the setup.py file: I think it does not include the tests submodule in the installation.

Add dependency checks in setup.py

Installation fails in an opaque way if dependencies such as scikit-learn are not installed. We should add explicit checks to the setup script which give more informative failure messages.
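One way to get a readable failure is to try each import up front and collect every failure into a single message. The sketch below is an assumption about how such a check could look, not code from astroML's setup script; `check_dependencies` and the mapping format are hypothetical. Catching `ImportError` also covers a package that is installed but broken.

```python
import importlib


def check_dependencies(required):
    """Raise one informative ImportError listing every missing dependency.

    `required` maps an importable module name to the package name a user
    would install, e.g. {'sklearn': 'scikit-learn'} (hypothetical format).
    """
    missing = []
    for module_name, package_name in required.items():
        try:
            importlib.import_module(module_name)
        except ImportError as exc:
            # keep the original message: a broken install fails differently
            # from a missing one
            missing.append('%s (import %r failed: %s)'
                           % (package_name, module_name, exc))
    if missing:
        raise ImportError('astroML requires the following packages, which '
                          'could not be imported:\n  ' + '\n  '.join(missing))
```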

Summer 2015 Roadmap

@nhuntwalker will be working on astroML part time this summer! Here I want to brainstorm some of the work that should be done:

Maintenance:

  • remove astroML.cosmology; use astropy.cosmology instead
  • depend on gatspy for periodogram stuff.
  • move book_figures & associated test scripts to a separate repository; this will make it easier to work with and update astroML.
  • improve test coverage. We should focus on simple tests which reflect what is going on in the book figures without needing to run all the code.
  • KDE stuff can probably go, and use the scikit-learn functionality instead
  • perhaps depend on Bovy's extreme deconvolution? It's much faster than astroML's current version, but a bit of a pain to install. That said, it's much better than it was when I originally decided not to use it...

New Functionality

  • Time series stuff? I think we could build a better interface for structure functions, etc. and do some interesting science (perhaps put into gatspy, with examples in astroML?)
  • New Datasets: a lot has become available since 2012!
  • I've been wanting to do a general "scikit-learn for noisy data" type thing. Maybe factor out these routines into a separate package & add others? This would be a fun API/package-design project, but not as directly relevant to astronomy.
  • Other ideas?

Single block with bayesian blocks?

I'm attempting to use Bayesian Blocks to estimate the number of bins in a 2D histogram for a scatter plot (one dimension at a time), but the function keeps returning very few bins, most of the time a single one.

Is this correct? (MWE below)

import numpy as np
from astroML.plotting import hist
import matplotlib.pyplot as plt


def rand_data(N):
    return np.random.uniform(low=1., high=20., size=(2,int(N)))

# Generate random data in 2D.
N = 500
P = rand_data(N)

# Obtain bin edges for each dimension separately.
b_rx = hist(P[0], bins='blocks')[1]
b_ry = hist(P[1], bins='blocks')[1]
print(b_rx)
print(b_ry)

# 2D histogram with the bin edges returned above.
d_1 = np.histogram2d(P[0], P[1], bins=[b_rx, b_ry])[0]

# Plot.
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.imshow(d_1.transpose(), origin='lower', aspect='auto')
ax2.scatter(P[0], P[1], c='r')
plt.show()

Wavelet transform power spectrum density shows power at a wrong spot.

Here is how to reproduce it. It looks like the problem is really with FT_continuous/IFT_continuous.

# I am not sure if it is a bug or a feature, but wavelet transform of 
# signals with non-centered x axis sure looks weird. Please check this file. 
# It correctly detects signals at frequency=1.0 and 1.5, but locates them in a wrong place. 


from astroML.fourier import\
    FT_continuous, IFT_continuous, sinegauss, sinegauss_FT, wavelet_PSD

import numpy as np

from matplotlib import pyplot as plt 

def testS1():
    x = np.arange(0,100,.1)
    y = x*0.0
    y[500:600] = np.sin(x[500:600]*1.0*2*np.pi) + np.sin(x[500:600]*1.5*2*np.pi)
    return x,y


def twv(x,y,f0,Q=5):
    # no rescaling, test signals
    # 
    wPSD = wavelet_PSD(x, y, f0, Q = Q)
    fig = plt.figure(figsize=(6, 8))
    fig.subplots_adjust(hspace=0.05, left=0.12, right=0.95, bottom=0.08, top=0.95)
    ax = fig.add_subplot(211)
    ax.plot(x,y, '-k', lw=1)
    ax.set_ylabel('Time signal')

    ax = plt.subplot(212)
    ax.set_ylim(np.min(f0),np.max(f0))

    ax.imshow(wPSD,
              origin='lower', aspect='auto', cmap=plt.cm.jet,
              extent=[x[0], x[-1], f0[0], f0[-1]])
    ax.text(0.02, 0.95, ("Wavelet PSD"), color='w',
            ha='left', va='top', transform=ax.transAxes)


    ax.set_xlabel('$t$')
    ax.set_ylabel(r'$\omega_0$')


    plt.show()

if __name__ == '__main__':

    fr= np.arange(0.1,2,.01)
    xx,yy = testS1()
    twv(xx,yy,fr)
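For context, approximating a continuous Fourier transform with the FFT on a grid that does not start at t = 0 needs an extra phase factor exp(-2πi f t0). Dropping it leaves the power spectrum unchanged but shifts features in time after an inverse transform, which resembles the symptom above. The sketch below is not astroML's `FT_continuous`, and we have not confirmed that this is the cause of the reported bug; `continuous_ft` is a hypothetical helper assuming uniformly spaced `t`.

```python
import numpy as np


def continuous_ft(t, h):
    """Approximate H(f) = integral of h(t) exp(-2j*pi*f*t) dt with the FFT.

    Assumes t is uniformly spaced. Returns ascending frequencies and H(f).
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(h)
    N = len(t)
    dt = (t[-1] - t[0]) / (N - 1)
    f = np.fft.fftshift(np.fft.fftfreq(N, dt))
    # Riemann sum: sum_n h_n exp(-2j*pi*f*(t0 + n*dt)) * dt.
    # The exp(-2j*pi*f*t0) factor accounts for a grid starting at t0 != 0;
    # the remaining sum at f_k = k/(N*dt) is exactly the DFT from np.fft.fft.
    H = dt * np.exp(-2j * np.pi * f * t[0]) * np.fft.fftshift(np.fft.fft(h))
    return f, H
```

A Gaussian exp(-π(t - 5)²) sampled on a grid starting at t = -5 has the exact transform exp(-10πi f) exp(-π f²), which this approximation reproduces.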

Fetch routine crash with new astropy version

I updated to astropy v1.0 today and found that now the fetch routines in astroML crash.

IOError Traceback (most recent call last)
in ()
40 #------------------------------------------------------------
41 # Fetch the great wall data
---> 42 X = fetch_great_wall()
43
44 #------------------------------------------------------------

IOError: [Errno 13] Permission denied: '/Users/Kyle/.astropy/config/astropy.1.0.cfg'

No module named sklearn.metrics

Hi,
sorry, I am trying to install astroML on Ubuntu 12.04, and after running python setup.py build it gives this error:
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import astroML
  File "/home/astroML-0.1.1/astroML/__init__.py", line 3, in <module>
    from astroML.density_estimation import histogram
  File "/home/astroML-0.1.1/astroML/density_estimation/__init__.py", line 1, in <module>
    from density_estimation import KDE, KNeighborsDensity
  File "/home/astroML-0.1.1/astroML/density_estimation/density_estimation.py", line 11, in <module>
    from sklearn.metrics import pairwise_kernels, pairwise_distances
ImportError: No module named sklearn.metrics
Can you please suggest a fix?
Best,
Narek

Spectrum of Vega no longer available

In fetch_vega_spectrum, the link points to a no-longer existing file:

ftp://ftp.stsci.edu/cdbs/current_calspec/ascii_files/1732526_nic_002.ascii

The closest alternative might be

ftp://ftp.stsci.edu/cdbs/current_calspec/1732526_stisnic_001.fits

but that won't be the same spectrum, and it's in FITS format. If you still want ASCII, perhaps:

ftp://ftp.stsci.edu/cdbs/calspec/ascii/alpha_lyr_stis_005.ascii

Extreme Deconvolution

Currently, extreme deconvolution in astroML.density_estimation.XDGMM is not full-featured: it cannot use the projection matrix R. Additionally, this should be improved so that diagonal covariances are handled efficiently.
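For reference, supporting a projection matrix changes the per-point likelihood of the mixture: in the extreme-deconvolution model of Bovy, Hogg & Roweis, an observation x_i = R_i v_i + noise has, under component j, mean R_i μ_j and covariance T_ij = R_i V_j R_iᵀ + S_i. The sketch below evaluates that log-likelihood with NumPy only. It is not astroML's `XDGMM` API; `xd_log_likelihood` and the array shapes are assumptions for illustration.

```python
import numpy as np


def xd_log_likelihood(X, Xcov, R, alpha, mu, V):
    """Per-point log-likelihood of an XD mixture with projection matrices.

    X: (N, D) observations; Xcov: (N, D, D) measurement covariances S_i;
    R: (N, D, K) projections; alpha: (M,) weights; mu: (M, K); V: (M, K, K).
    """
    N, D = X.shape
    M = len(alpha)
    logL = np.empty((N, M))
    for j in range(M):
        # projected mean R_i mu_j and covariance T_ij = R_i V_j R_i^T + S_i
        m = np.einsum('ndk,k->nd', R, mu[j])
        T = np.einsum('ndk,kl,nml->ndm', R, V[j], R) + Xcov
        diff = X - m
        sol = np.linalg.solve(T, diff[..., None])[..., 0]
        _, logdet = np.linalg.slogdet(T)
        maha = np.einsum('nd,nd->n', diff, sol)
        logL[:, j] = np.log(alpha[j]) - 0.5 * (D * np.log(2 * np.pi)
                                               + logdet + maha)
    # log-sum-exp over components for numerical stability
    mx = logL.max(axis=1, keepdims=True)
    return (mx + np.log(np.exp(logL - mx).sum(axis=1, keepdims=True)))[:, 0]
```

Diagonal measurement covariances could be special-cased to avoid the full `solve`, which is the efficiency point raised above; the sketch does not do that.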

Chapter 1: Wrong URL pointer?

I'm working through the coding examples for the text Statistics, Data Mining, and Machine Learning in Astronomy, and I've found that this bit of code gives an error:

from astroML.datasets import tools
target = tools.TARGET_GALAXY
# main galaxy sample
plate, mjd, fib = tools.query_plate_mjd_fiber(5, primtarget=target)

I think the issue might be that the URL is pointing to the wrong place now that DR12 is the public release.

Any suggestions for a solution?

astroML_addons installed but still getting "using slow version" warnings

Hi,

I'm running Ubuntu 12.04 and I've got astroML running properly. From the output to the terminal (see below) it looks like astroML_addons were also installed properly using 'pip install'. However when I try to run 'fig_LS_sg_comparison.py' downloaded from the website I get a warning that it is still using the slow version of the LS. I know I am missing something here, perhaps something obvious. Any pointers would be greatly appreciated.

Thanks,
Dipankar


Here's the output of the installation:

$ sudo pip install astroML_addons
Downloading/unpacking astroML-addons
Downloading astroML_addons-0.1.1.tar.gz (257Kb): 257Kb downloaded
Running setup.py egg_info for package astroML-addons

Installing collected packages: astroML-addons
Running setup.py install for astroML-addons

Successfully installed astroML-addons
Cleaning up...
$

And here's what I get when I try to run the Lomb-Scargle routine afterwards

$ python ./fig_LS_sg_comparison.py
/usr/local/lib/python2.7/dist-packages/astroML/time_series/periodogram.py:8: UserWarning: Using slow version of lomb_scargle. Install astroML_addons to use an optimized version
warnings.warn("Using slow version of lomb_scargle. Install astroML_addons "

$

Silent mode for 'knuth' binning method?

I'm using astroML as part of a larger project and I'd like to be able to run:

from astroML.plotting import hist
hist(x, bins='knuth')

without it printing out to screen:

Optimization terminated successfully.
         Current function value: -68.087905
         Iterations: 17
         Function evaluations: 47
...

Is there a way to run this in "silent" mode?
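Until astroML exposes such an option, one general Python workaround is to redirect standard output while the call runs. The text above comes from the optimizer's progress report (SciPy's `optimize.fmin` prints it when its `disp` option is on), and it is printed from Python, so `contextlib.redirect_stdout` should capture it; output written directly by C code would not be captured. `run_silently` is a hypothetical helper.

```python
import contextlib
import io


def run_silently(func, *args, **kwargs):
    """Call func, capturing anything it prints to sys.stdout.

    Returns (result, captured_text) so the output can still be inspected.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()
```

Usage would look like `result, _ = run_silently(hist, x, bins='knuth')`.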

Small quirks in the documentation

Hi,
thanks for providing astroML!
It looks as if it's going to help me a lot with my current project (I am just starting with the analysis).
Here are a few things that I discovered when I started reading the documentation
(at http://www.astroml.org/examples/algorithms/plot_bayesian_blocks.html):

  • In the left sidebar, the link "citing astroML" is broken.
  • In the text it says "More fitness functions are available: see density_estimation".
    It looks as if density_estimation should be a link, but it is not. Maybe a typo in the Sphinx document?

devectorize_axes

Hi Jake,

New versions of matplotlib are painful with unicode etc., even when using Python 2.x.

plotting.devectorize_axes no longer works as is.

Here is an updated version:

def devectorize_axes(ax=None, dpi=None, transparent=True):
    """Convert axes contents to a png.

    This is useful when plotting many points, as the size of the saved file
    can become very large otherwise.

    Parameters
    ----------
    ax : Axes instance (optional)
        Axes to de-vectorize.  If None, this uses the current active axes
        (plt.gca())
    dpi: int (optional)
        resolution of the png image.  If not specified, the default from
        'savefig.dpi' in rcParams will be used
    transparent : bool (optional)
        if True (default) then the PNG will be made transparent

    Returns
    -------
    ax : Axes instance
        the in-place modified Axes instance

    Examples
    --------
    The code can be used in the following way::

        import matplotlib.pyplot as plt
        import numpy as np
        fig, ax = plt.subplots()
        x, y = np.random.random((2, 10000))
        ax.scatter(x, y)
        devectorize_axes(ax)
        plt.savefig('devectorized.pdf')

    The resulting figure will be much smaller than the vectorized version.
    """
    import matplotlib.pyplot as plt
    from matplotlib.transforms import Bbox
    from matplotlib import image
    # PNG data is binary; io.BytesIO exists on Python 2.6+ and Python 3
    from io import BytesIO

    if ax is None:
        ax = plt.gca()

    fig = ax.figure
    axlim = ax.axis()

    # setup: make all visible spines (axes & ticks) & text invisible
    # we need to set these back later, so we save their current state
    _sp = {}
    _txt_vis = [t.get_visible() for t in ax.texts]
    for k in ax.spines:
        _sp[k] = ax.spines[k].get_visible()
        ax.spines[k].set_visible(False)
    for t in ax.texts:
        t.set_visible(False)

    _xax = ax.xaxis.get_visible()
    _yax = ax.yaxis.get_visible()
    # ax.patch is the axes background (the old ax.axesPatch alias was removed)
    _patch = ax.patch.get_visible()
    ax.patch.set_visible(False)
    ax.xaxis.set_visible(False)
    ax.yaxis.set_visible(False)

    # convert canvas to PNG
    extents = ax.bbox.extents / fig.dpi
    buf = BytesIO()
    fig.savefig(buf, format='png', dpi=dpi,
                transparent=transparent,
                bbox_inches=Bbox([extents[:2], extents[2:]]))
    buf.seek(0)
    im = image.imread(buf)

    # clear everything on axis (but not text); recent matplotlib versions
    # no longer allow assigning to ax.lines etc., so remove each artist
    for artist in (list(ax.lines) + list(ax.patches) + list(ax.tables) +
                   list(ax.artists) + list(ax.images) +
                   list(ax.collections)):
        artist.remove()

    # Show the image
    ax.imshow(im, extent=axlim, aspect='auto', interpolation='nearest')

    # restore all the spines & text
    for k in ax.spines:
        ax.spines[k].set_visible(_sp[k])
    for t, v in zip(ax.texts, _txt_vis):
        t.set_visible(v)
    ax.patch.set_visible(_patch)
    ax.xaxis.set_visible(_xax)
    ax.yaxis.set_visible(_yax)

    if plt.isinteractive():
        plt.draw()

    return ax

ENH: take advantage of scikit-learn's new neighbors tools

Scikit-learn 0.14 includes a new Ball Tree and KD Tree which implement a number of generalized N-body routines. These could be used to enhance the astroML examples in the following cases:

  • Faster kernel density estimation
  • Faster cross-matching on the sphere
  • Faster correlation function computation

TLS_logL

The TLS_logL function returns the wrong result when dX is 2-dimensional. It should be dX * v_hat ** 2, not (dX * v_hat) ** 2.

Should fix and add a unit test.
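The quantity at stake is the measurement variance projected onto the unit normal v̂ of the line, σ² = v̂ᵀ Σ v̂. If a 2-D `dX` holds per-axis variances (our assumption, consistent with the report that `dX * v_hat ** 2` is correct), this reduces to Σ_k dX_k v̂_k², while `(dX * v_hat) ** 2` squares the variances. The sketch below is a hypothetical helper, not astroML's `TLS_logL`; it also handles a 3-D `dX` of full covariance matrices, which doubles as a unit-test reference.

```python
import numpy as np


def projected_variance(v, dX):
    """Variance of each point projected onto the unit vector along v.

    dX of shape (N, D) is assumed to hold per-axis variances;
    dX of shape (N, D, D) is assumed to hold full covariance matrices.
    """
    v = np.asarray(v, dtype=float)
    v_hat = v / np.linalg.norm(v)
    dX = np.asarray(dX, dtype=float)
    if dX.ndim == 2:
        # diagonal case: sum_k dX[n, k] * v_hat[k]**2
        return np.sum(dX * v_hat ** 2, axis=1)
    if dX.ndim == 3:
        # general case: v_hat^T Sigma_n v_hat for each point n
        return np.einsum('i,nij,j->n', v_hat, dX, v_hat)
    raise ValueError("dX must be 2- or 3-dimensional")
```

A test comparing the 2-D and diagonal 3-D inputs would catch the reported bug.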

plot_tissot_ellipse: failing when an array is given as input to radius parameter

While experimenting with plot_tissot_ellipse (astroML / plotting / ellipse.py), I ran into a problem when, instead of a single value for radius, I tried to give an array of values (one for each ellipse center).
I was able to work around it by modifying line 29 (in my local copy):

for long, lat, rad in np.broadcast(longitude, latitude, radius):
        el = Ellipse((long, lat), radius / np.cos(lat), radius, **kwargs)

I modified to:

for long, lat, rad in np.broadcast(longitude, latitude, radius):
        el = Ellipse((long, lat), rad / np.cos(lat), rad, **kwargs)

It seems that the problem was that, inside the for loop, the original input to np.broadcast (radius) was used instead of the per-ellipse loop variable rad.
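The behavior of the corrected loop can be checked without plotting. The hypothetical helper below mirrors the fixed loop but returns the (center, width, height) it would pass to `Ellipse`, so a scalar radius and a per-ellipse radius array can both be tested. It assumes latitudes in radians, as the `np.cos(lat)` call implies.

```python
import numpy as np


def tissot_ellipse_sizes(longitude, latitude, radius):
    """Hypothetical helper: (center, width, height) for each Tissot ellipse.

    np.broadcast pairs up scalar or array inputs; latitude is in radians.
    """
    sizes = []
    for lon, lat, rad in np.broadcast(longitude, latitude, radius):
        # use the per-ellipse rad from np.broadcast, not the whole radius array
        sizes.append(((lon, lat), rad / np.cos(lat), rad))
    return sizes
```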

setup.py should not import sklearn

Just wanted to install astroML and it failed because at the moment I have a broken sklearn installed.

Can you use another way to get the version info in setup.py so that this doesn't happen?

$ python setup.py install --user
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    import astroML
  File "/Users/deil/code/astroML/astroML/__init__.py", line 3, in <module>
    from astroML.density_estimation import histogram
  File "/Users/deil/code/astroML/astroML/density_estimation/__init__.py", line 2, in <module>
    from xdeconv import XDGMM
  File "/Users/deil/code/astroML/astroML/density_estimation/xdeconv.py", line 15, in <module>
    from sklearn.mixture import GMM
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/mixture/__init__.py", line 5, in <module>
    from .gmm import sample_gaussian, log_multivariate_normal_density
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/mixture/gmm.py", line 17, in <module>
    from .. import cluster
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/__init__.py", line 6, in <module>
    from .spectral import spectral_clustering, SpectralClustering
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/spectral.py", line 15, in <module>
    from .k_means_ import k_means
  File "/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/k_means_.py", line 29, in <module>
    from . import _k_means
ImportError: dlopen(/Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/_k_means.so, 2): Symbol not found: _ATL_ddot
  Referenced from: /Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/_k_means.so
  Expected in: flat namespace
 in /Users/deil/Library/Python/2.7/lib/python/site-packages/sklearn/cluster/_k_means.so

Looking forward to your tutorials and book, thanks for making astroML!
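A common way to avoid importing the package (and therefore scikit-learn) in setup.py is to read the version string from the source file as text. The sketch assumes the package defines `__version__ = '...'` on its own line in a file such as `astroML/__init__.py`; that path and both helper names are hypothetical, not astroML's current setup code.

```python
import re

# matches a line like: __version__ = '0.2'
_VERSION_RE = re.compile(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
                         re.MULTILINE)


def parse_version(text):
    """Extract the __version__ string from Python source text."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise RuntimeError("Unable to find a __version__ string")
    return match.group(1)


def read_version(path):
    """Read a file and return its __version__ without importing it."""
    with open(path) as f:
        return parse_version(f.read())
```

In setup.py this would be used as `version=read_version('astroML/__init__.py')` (path assumed).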

Knuth binning algorithm fails for some datasets

AstroML users,
I've run into a strange issue using the knuth_bin_width provided in the density_estimation module.

When I apply the algorithm to certain datasets that I'm analyzing I get a strange error that I don't understand. Below is a minimum working example that produces the error. I've found that I can avoid the error by only sending data points below a threshold value to knuth_bin_width. (See code below.)

I am using Mac OS 10.9, Python 2.7, scipy 0.11.0, and astroML 0.2. Please let me know if you need additional information.

import numpy as np
from astroML import density_estimation as de

data = np.genfromtxt('http://www.lpl.arizona.edu/~bjackson/knuth_example_data.txt', delimiter = ', ')

#This one works
binwidth, bins = de.knuth_bin_width(data[data <= 1.], return_bins=True)
print("data binwidth: ", binwidth)
#This one doesn't
binwidth, bins = de.knuth_bin_width(data, return_bins=True)
print("data binwidth: ", binwidth)

And here is the Traceback:

Traceback (most recent call last):
  File "knuth_example.py", line 10, in <module>
    binwidth, bins = de.knuth_bin_width(data, return_bins=True)
  File "/Library/Python/2.7/site-packages/astroML/density_estimation/histtools.py", line 226, in knuth_bin_width
    M = optimize.fmin(knuthF, len(bins0))[0]
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/optimize.py", line 360, in fmin
    res = _minimize_neldermead(func, x0, args, callback=callback, **opts)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/optimize.py", line 452, in _minimize_neldermead
    fxr = func(xr)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/optimize/optimize.py", line 268, in function_wrapper
    return function(x, *args)
  File "/Library/Python/2.7/site-packages/astroML/density_estimation/histtools.py", line 155, in __call__
    return self.eval(M)
  File "/Library/Python/2.7/site-packages/astroML/density_estimation/histtools.py", line 173, in eval
    nk, bins = np.histogram(self.data, bins)
  File "/Library/Python/2.7/site-packages/numpy-1.8.0-py2.7-macosx-10.7-intel.egg/numpy/lib/function_base.py", line 199, in histogram
    sa.searchsorted(bins[-1], 'right')]
IndexError: index out of bounds

Thanks for any suggestions.

knuth rule bin width tries negative number of bins

function: knuth_bin_width in density_estimation

Problem: Some data will cause this function to crash in an IndexError: index out of bounds.

The error traces to the call to np.histogram in the KnuthF class instance, which was made with bins=[] (an empty list). This could only occur if -1<int(M)<0 in the bins class method above.

Possible solution:
I tried a hack-y but effective fix of adding the line
if M<0: M = 1
right above the return line in the bins method of KnuthF.
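An alternative that avoids non-integer or negative bin counts altogether is to search over integer M ≥ 1 directly instead of handing a continuous M to `optimize.fmin` (the Nelder-Mead call in the traceback above). The sketch below is a brute-force substitute, not astroML's `knuth_bin_width`; it maximizes Knuth's log-posterior for M equal-width bins, F(M) = N log M + log Γ(M/2) - M log Γ(1/2) - log Γ(N + M/2) + Σ_k log Γ(n_k + 1/2), up to a constant. The default `max_bins` is an arbitrary assumption.

```python
import math

import numpy as np


def knuth_log_posterior(data, M):
    """Knuth's log-posterior (up to a constant) for M equal-width bins."""
    data = np.asarray(data, dtype=float)
    M = max(int(M), 1)  # guard: never evaluate with fewer than one bin
    N = data.size
    counts, _ = np.histogram(data, bins=M, range=(data.min(), data.max()))
    return (N * math.log(M) + math.lgamma(M / 2.0)
            - M * math.lgamma(0.5) - math.lgamma(N + M / 2.0)
            + sum(math.lgamma(n + 0.5) for n in counts))


def knuth_num_bins(data, max_bins=None):
    """Brute-force search over integer M in [1, max_bins]."""
    data = np.asarray(data, dtype=float)
    if max_bins is None:
        max_bins = max(1, 4 * int(np.sqrt(data.size)))  # arbitrary cap
    candidates = range(1, max_bins + 1)
    scores = [knuth_log_posterior(data, M) for M in candidates]
    return candidates[int(np.argmax(scores))]
```

The search costs one histogram per candidate, which is acceptable for moderate `max_bins` and never produces an empty bin list.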

Bug in cas_query?

When trying to run tools.query_plate_mjd_fiber(), I kept getting the error

ValueError: #Table1
plate,mjd,fiberid
2128,53800,575
2128,53800,577
2128,53800,578
2128,53800,579
2128,53800,581

I dug around in the code, and the error comes from line 78 of cas_query.py. The "output" array is

output = ['#Table1\n', 'plate,mjd,fiberid\n', '2128,53800,575\n', '2128,53800,577\n', '2128,53800,578\n', '2128,53800,579\n', '2128,53800,581\n']

so the map() in line 78 is calling int('plate'), int('mjd'), etc, which raises an error.

I got around this by changing line 76 to
for i, line in enumerate(output[2:]):

But I think the problem is that for some reason, the comment '#Table1\n' is getting added to output by sql_query().
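Rather than hard-coding an offset such as `output[2:]`, the parser could skip comment lines and treat the first remaining line as the header, which works whether or not the server prepends `#Table1`. `parse_cas_csv` is a hypothetical helper, not the code in cas_query.py.

```python
def parse_cas_csv(lines):
    """Split CAS CSV output into (header, rows of ints).

    Blank lines and lines starting with '#' (such as '#Table1') are skipped;
    the first remaining line is taken as the column header.
    """
    content = [ln.strip() for ln in lines
               if ln.strip() and not ln.lstrip().startswith('#')]
    header = content[0].split(',')
    rows = [tuple(int(v) for v in ln.split(',')) for ln in content[1:]]
    return header, rows
```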

Weights for 2-point correlation function

Hi,

Is there a way to include weights when using the astroML two-point correlation function?

I checked the source code, but it seems that this is not possible.

Thanks
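For small samples, weights can be handled by summing w_i w_j over pairs in each separation bin and normalizing each pair count by its total pair weight before applying the Landy-Szalay estimator. The brute-force O(N²) sketch below is an illustration under those assumptions, not astroML's `two_point` (which uses a tree); `weighted_pair_counts` and `landy_szalay` are hypothetical names.

```python
import numpy as np


def weighted_pair_counts(A, wA, B, wB, bins, same=False):
    """Sum of w_i * w_j over pairs whose separation falls in each bin."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    w = np.outer(wA, wB)
    if same:
        # count each unordered pair once and skip self-pairs
        iu = np.triu_indices(len(A), k=1)
        d, w = d[iu], w[iu]
    counts, _ = np.histogram(d.ravel(), bins=bins, weights=w.ravel())
    return counts


def landy_szalay(data, w_data, rand, w_rand, bins):
    """Weighted Landy-Szalay estimate (DD - 2DR + RR) / RR per bin."""
    wD = np.asarray(w_data, dtype=float)
    wR = np.asarray(w_rand, dtype=float)
    DD = weighted_pair_counts(data, wD, data, wD, bins, same=True)
    RR = weighted_pair_counts(rand, wR, rand, wR, bins, same=True)
    DR = weighted_pair_counts(data, wD, rand, wR, bins)
    # normalize by the total weight of all pairs of each kind
    DD = DD / ((wD.sum() ** 2 - (wD ** 2).sum()) / 2)
    RR = RR / ((wR.sum() ** 2 - (wR ** 2).sum()) / 2)
    DR = DR / (wD.sum() * wR.sum())
    return (DD - 2 * DR + RR) / RR
```

With all weights equal to one, the normalizations reduce to the usual N(N - 1)/2 and N_D N_R pair counts.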

Segmentation fault: 11 in nosetests on Mac OS 10.7.5

Hi,

I got a seg fault using nosetests astroML on my Mac with:

Enthought Python Distribution -- www.enthought.com
Version: 7.3-1 (64-bit)

Python 2.7.3 |EPD 7.3-1 (64-bit)| (default, Apr 12 2012, 11:14:05)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin

I have a crash report available, let me know how to transmit it.

Cheers,
Jean-Baptiste

Figure 6.17

In figure 6.17, we should use the correlation from the full data rather than the mean of bootstrap samples as the best estimate.
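The suggested change amounts to computing the statistic once on the full sample and using the bootstrap only for its spread. A generic NumPy sketch, with a hypothetical `estimate_with_bootstrap` and an arbitrary default of 200 resamples:

```python
import numpy as np


def estimate_with_bootstrap(data, estimator, n_boot=200, seed=0):
    """Return estimator(data) and the bootstrap standard deviation.

    The best estimate comes from the full data; bootstrap resamples
    (rows drawn with replacement) are used only for the uncertainty.
    """
    data = np.asarray(data)
    rng = np.random.default_rng(seed)
    best = estimator(data)
    boots = np.array([estimator(data[rng.integers(0, len(data), len(data))])
                      for _ in range(n_boot)])
    return best, boots.std(axis=0, ddof=1)
```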

Consistent example naming

In examples/datasets, plot_SDSS_SSPP.py should be plot_sdss_sspp.py to be consistent with other file names.

Using extreme_deconvolution library in XDGMM?

Hello. I've been using the XDGMM code to estimate density distributions in color-color diagrams. It worked well except that it took very long to converge. At some point I heard that Bovy et al. put a link in their paper to the code they wrote. This is the link:

https://code.google.com/p/extreme-deconvolution/

I installed their extreme_deconvolution library and their python wrapper and did this

import extreme_deconvolution as xd
from sklearn.mixture import GMM
from astroML.density_estimation import XDGMM

# Load data and measurement covariates in XTrain and XerrTrain
# Choose number of gaussians `n_components`

# Stole this from XDGMM
gmm = GMM(n_components, n_iter=10, covariance_type='full').fit(XTrain)
amp = gmm.weights_; mean = gmm.means_; covar = gmm.covars_

# Results are saved in `amp`, `mean`, and `covar`
xd.extreme_deconvolution(XTrain,XerrTrain,amp,mean,covar)

# Load results in XDGMM object
clf = XDGMM(n_components)
clf.alpha = amp; clf.mu = mean; clf.V = covar

I found this to be much much faster than the current code in XDGMM, so I wanted to share this here. I thought it may be a good idea to use their library in XDGMM (The library also has the projection matrix feature which is another good thing). I guess that shipping astroML with another library may create more dependencies and other potential complications. In any case, I hope this is useful and let me know if I can be of any help.

Incompatibility with python3

Hi there,
I tried to install astroML via pip, but it failed because the code uses Python 2 print statements, which are a syntax error in Python 3.

Bug in fetch_and_shift_spectra()

I'm pretty sure there's a bug here: http://www.astroml.org/examples/datasets/compute_sdss_pca.html#example-datasets-compute-sdss-pca

This condition:

    if np.all(spec_rebin.spectrum == 0):
        num_skipped += 1
        print("%i, %i, %i is all zero" % (plate[i], mjd[i], fiber[i]))
        continue

needs an "i += 1" added before the continue statement.

As it is now, if a bad spectrum (or one outside the given wavelength range) is found, no progress is made, because the function keeps trying to get the same spectrum over and over, hitting the same condition each time without incrementing i.
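An alternative to adding `i += 1` before each `continue` is to iterate with a `for` loop, which advances the index even when an item is skipped. In the sketch below, `collect_nonzero_spectra` and the `fetch_spectrum` callable are hypothetical stand-ins for the example's download-and-rebin step.

```python
import numpy as np


def collect_nonzero_spectra(ids, fetch_spectrum):
    """Fetch each spectrum once, skipping (and recording) all-zero ones."""
    spectra, skipped = [], []
    for i in ids:  # the for loop advances i even when we `continue`
        spectrum = fetch_spectrum(i)
        if np.all(spectrum == 0):
            skipped.append(i)
            continue
        spectra.append(spectrum)
    return spectra, skipped
```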

Cross-matching uses incorrect distances

The cross-matching utility does not take into account the cosine of the declination. For this reason, distances at large declinations will be incorrect.
The best fix would be to use the haversine distance within the new scikit-learn Ball Tree, for which there is a current pull request in scikit-learn: scikit-learn/scikit-learn#1732
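For reference, the haversine formula gives the great-circle separation and accounts for the cos(dec) factor automatically. scikit-learn's `BallTree` accepts `metric='haversine'` with (latitude, longitude) inputs in radians; the NumPy sketch below (hypothetical `angular_separation`) only shows the distance itself.

```python
import numpy as np


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula (all in radians)."""
    sin_ddec = np.sin(0.5 * (dec2 - dec1))
    sin_dra = np.sin(0.5 * (ra2 - ra1))
    a = sin_ddec ** 2 + np.cos(dec1) * np.cos(dec2) * sin_dra ** 2
    # clip guards against rounding slightly outside [0, 1]
    return 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```

At dec = 60°, a 1° offset in RA corresponds to about 0.5° on the sky, whereas a naive Euclidean distance in (RA, Dec) would report 1°.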

sorted indices issues

Hey Jake.
I just noticed a subtle issue in the graph code and I wasn't sure where to report it.
Basically, the Euclidean MST with the ball tree (which is super-sweet, by the way ;) does not do what I thought it does. The MST code assumes has_sorted_indices, which the ball tree does not seem to provide.
So I had an issue where the distance matrix was connected, but the MST was not - that's how I noticed. Anyhow, it was probably complete bogus, as has_sorted_indices was False.

Maybe we should add sort_indices into the BallTree? (in which case I should have reported this to sklearn?) or the MST algorithm should check if the indices are sorted and sort (in which case I should have reported to scipy?).

Anyhow, thanks for your work on the graph algorithms, it helps me a lot!

Figure for Bivariate Gaussian in Chapter 3

In chapter 3 the code for the bivariate Gaussian has

alpha = np.pi / 4

If alpha is changed (e.g. np.pi/3.) the density plot and the ellipses no longer align.

I believe this is a quirk of imshow and you need to transpose H

ax.imshow(H.transpose(), origin='lower', cmap=plt.cm.binary, interpolation='nearest',
          extent=[bins[0][0], bins[0][-1], bins[1][0], bins[1][-1]])
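The reason the transpose is needed can be checked with a single point: `np.histogram2d` indexes its output as `H[x_bin, y_bin]`, while `imshow` treats the first index as the row (vertical) axis. This sketch uses NumPy only; the point and bin layout are arbitrary choices for illustration.

```python
import numpy as np

# one point near the upper-left of the unit square
x = np.array([0.1])
y = np.array([0.9])
H, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

# H is indexed H[x_bin, y_bin], so the point is counted in H[0, 1].
# imshow draws array[row, col] with rows vertical, so H.T puts y on the
# vertical axis; with origin='lower', H.T[1, 0] appears at upper-left.
H_for_imshow = H.T
```

For a symmetric distribution (alpha = π/4) H and H.T look alike, which is why the mismatch only shows up for other angles.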

Extreme Deconvolution Example error (n_iter)

I was trying to reproduce the Extreme Deconvolution found here: http://astroml.github.com/paper_figures/CIDU2012/fig_XD_example.html

And I got an error -- for some reason the __init__ of GMM can't handle being fed n_iter.

I've reproduced the code (which is simply the code up to the line where the error occurs).

import os
import cPickle

import numpy as np
from matplotlib import pyplot as plt
from matplotlib.patches import Ellipse

from astroML.decorators import pickle_results
from astroML.density_estimation import XDGMM
from astroML.plotting.tools import draw_ellipse

#------------------------------------------------------------
# Sample the dataset
N = 2000
np.random.seed(0)

# generate the true data
x_true = (1.4 + 2 * np.random.random(N)) ** 2
y_true = 0.1 * x_true ** 2

# add scatter to "true" distribution
dx = 0.1 + 4. / x_true ** 2
dy = 0.1 + 10. / x_true ** 2

x_true += np.random.normal(0, dx, N)
y_true += np.random.normal(0, dy, N)

# add noise to get the "observed" distribution
dx = 0.2 + 0.5 * np.random.random(N)
dy = 0.2 + 0.5 * np.random.random(N)

x = x_true + np.random.normal(0, dx)
y = y_true + np.random.normal(0, dy)

# stack the results for computation
X = np.vstack([x, y]).T
Xerr = np.zeros(X.shape + X.shape[-1:])
diag = np.arange(X.shape[-1])
Xerr[:, diag, diag] = np.vstack([dx ** 2, dy ** 2]).T


#------------------------------------------------------------
# compute and save results
@pickle_results("XD_toy.pkl")
def compute_XD_results(n_components=10, n_iter=500):
    clf = XDGMM(n_components, n_iter=n_iter)
    clf.fit(X, Xerr)
    return clf

clf = compute_XD_results(10, 500)

This yields the following error for me:

@pickle_results: computing results and saving to 'XD_toy.pkl'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-9ceea794cbda> in <module>()
     49 
     50 
---> 51 clf = compute_XD_results(10, 500)

/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/astroML/decorators.pyc in new_f(*args, **kwargs)
     88                         print "    - args match:   %s" % args_match
     89                         print "    - kwargs match: %s" % kwargs_match
---> 90                 retval = f(*args, **kwargs)
     91                 cPickle.dump(dict(funcname=f.__name__, retval=retval,
     92                                   args=args, kwargs=kwargs),

<ipython-input-6-9ceea794cbda> in compute_XD_results(n_components, n_iter)
     45 def compute_XD_results(n_components=10, n_iter=500):
     46     clf = XDGMM(n_components, n_iter=n_iter)
---> 47     clf.fit(X, Xerr)
     48     return clf
     49 

/Library/Frameworks/EPD64.framework/Versions/7.2/lib/python2.7/site-packages/astroML/density_estimation/xdeconv.pyc in fit(self, X, Xerr, R)
     72         # initialize components via a few steps of GMM
     73         # this doesn't take into account errors, but is a fast first-guess
---> 74         gmm = GMM(self.n_components, n_iter=10, covariance_type='full').fit(X)
     75         self.mu = gmm.means_
     76         self.alpha = gmm.weights_

TypeError: __init__() got an unexpected keyword argument 'n_iter'
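The failing line is the GMM initialization inside XDGMM.fit. The error means the installed scikit-learn's GMM constructor does not accept n_iter. In current scikit-learn, GMM has been removed in favour of sklearn.mixture.GaussianMixture, whose iteration limit is named max_iter. The following is a minimal sketch of the same first-guess step against that newer API, assuming a recent scikit-learn. It is not astroML's own code, and the random data are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))  # placeholder data standing in for the XD sample

n_components = 10
# a few EM steps that ignore measurement errors, as a fast first guess
gmm = GaussianMixture(n_components, max_iter=10, covariance_type='full',
                      random_state=0).fit(X)
mu = gmm.means_          # shape (n_components, n_features)
alpha = gmm.weights_     # shape (n_components,)
V = gmm.covariances_     # shape (n_components, n_features, n_features)
```

With only ten iterations, scikit-learn may emit a ConvergenceWarning. That is expected for a rough initialization.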

scatter_contour function IndexError

The plotting function scatter_contour() takes at least two arguments, the x and y data (array-like). For small data arrays, the function raises an IndexError. I've tested the function a bit myself and concluded that the problem occurs on this line:

outer_poly = outline.allsegs[0][0]

Using hist forces plot to appear

I'm currently using the hist function (from astroML.plotting import hist) to obtain bin edges, not to generate a plot.

When plt.show() is called much further down in the code, the histogram drawn by hist is shown alongside whatever else I am plotting.

Can this behaviour be prevented somehow?
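If only the bin edges are needed, one option is to compute them without any plotting call, so no figure is ever created. The following is a minimal sketch using NumPy's np.histogram_bin_edges (NumPy 1.15 or later). NumPy implements rules such as 'fd' (Freedman-Diaconis) and 'scott', but not the 'knuth' or 'blocks' options of astroML's hist, so this is a substitute, not an equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)

# bin edges from the Freedman-Diaconis rule; matplotlib is never touched
edges = np.histogram_bin_edges(data, bins='fd')
counts, _ = np.histogram(data, bins=edges)
```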

ACF_scargle fails

I'm trying to run the example from http://www.astroml.org/book_figures/chapter10/fig_autocorrelation.html, but the ACF_scargle function fails and returns all NaNs. The problem seems to arise when computing the power of the window function, which comes out as all infs. I tried computing a similar periodogram (random times, y = 1 everywhere) and got similar results from both the astroML and gatspy functions:

import numpy as np
from astroML import time_series
from gatspy import periodic

t = np.sort(1000 * np.random.random_sample(100))
y = np.ones(100)
dy = y * 0.1

pw = time_series.lomb_scargle(t, y, dy, np.arange(0.1, 100), generalized=False, subtract_mean=False)

pw = periodic.lomb_scargle.LombScargle(center_data=False).fit(t, y, dy)
pw.score(np.arange(0.1, 100))

Those yield an array of infs and an array of NaNs, respectively. Is this coming from the functions themselves, or the inputs?

I'm using astroML v. 0.3 and gatspy version 0.2, for reference.
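One plausible source of the infs is offered here as an assumption rather than a reading of the astroML or gatspy source. If the power is normalized by the variance of y, a constant y (as used for a window function) makes that variance exactly zero, so every frequency divides by zero. The sketch below computes the classic (Scargle 1982) periodogram by hand, without mean subtraction. It then compares a variance normalization with a normalization by sum(y**2), which stays finite. classic_lomb_scargle is a hypothetical helper, not an astroML function.

```python
import numpy as np


def classic_lomb_scargle(t, y, omega):
    """Hypothetical helper: unnormalized classic periodogram, y not mean-subtracted."""
    t = np.asarray(t, dtype=float)[:, None]          # shape (N, 1)
    y = np.asarray(y, dtype=float)[:, None]          # shape (N, 1)
    omega = np.asarray(omega, dtype=float)[None, :]  # shape (1, M)
    # time offset tau that decouples the sine and cosine terms:
    # tan(2 omega tau) = sum sin(2 omega t) / sum cos(2 omega t)
    tau = np.arctan2(np.sum(np.sin(2 * omega * t), axis=0),
                     np.sum(np.cos(2 * omega * t), axis=0)) / (2 * omega[0])
    arg = omega * (t - tau)                          # shape (N, M)
    c, s = np.cos(arg), np.sin(arg)
    return 0.5 * (np.sum(y * c, axis=0) ** 2 / np.sum(c ** 2, axis=0)
                  + np.sum(y * s, axis=0) ** 2 / np.sum(s ** 2, axis=0))


rng = np.random.default_rng(0)
t = np.sort(1000 * rng.random(100))
y = np.ones_like(t)
omega = np.arange(0.1, 100)

P = classic_lomb_scargle(t, y, omega)
with np.errstate(divide='ignore', invalid='ignore'):
    P_var = P / np.var(y)      # var(y) is exactly 0 for constant y
P_win = P / np.sum(y ** 2)     # stays finite for a window function
```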
