daler / trackhub

create, manage, and upload track hubs for use in the UCSC genome browser

Home Page: https://daler.github.io/trackhub/

License: MIT License

Python 98.93% Shell 1.07%
python genomics-visualization

trackhub's Introduction

trackhub

See the documentation at https://daler.github.io/trackhub for more details.

Data visualization is critical at all steps of genomic data analysis, from QC through final figure preparation. A track hub is a way of organizing large numbers of genomic data "tracks" (data files in a supported format), configured with a set of plain-text files that determine the organization, UI, labels, color, and other details. The files comprising a track hub are uploaded to a server, and a genome browser (e.g., UCSC Genome Browser) is pointed to the served URL for viewing. For example, here is a track hub created by the ENCODE project. It is straightforward to write the configuration files and upload the tracks manually if you have a small number of tracks. For larger data sets, however, this becomes tedious and error-prone.

trackhub is a Python package that enables the programmatic construction and upload of arbitrarily complex track hubs. It has no dependencies besides Python itself, the common Python package docutils, and the availability of rsync (a standard Unix command-line tool for remotely transferring files). It is available on PyPI, bioconda, and GitHub; an automated test suite and tested documentation ensure high-quality code and help.

Installation

Using pip: pip install trackhub

Using bioconda: conda install trackhub

Features

Validation

trackhub validates parameters against UCSC's documented options, so errors are caught early and less time is spent debugging in the Genome Browser.

Filename handling

The directory structure of an analysis rarely matches the organization you want for a track hub. trackhub symlinks track files to a staging area so the hub can be inspected locally before being uploaded, e.g., with rsync. Staging also enables rapid deployment and updating since only files that have changed will be uploaded on subsequent calls.
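The staging idea can be illustrated with the standard library alone. The following is a minimal sketch, not trackhub's own implementation: `stage_files` is a hypothetical helper, and the file and directory names are made up for the demonstration.

```python
import os
import tempfile


def stage_files(sources, staging_dir):
    """Symlink each source file into staging_dir and return the link paths.

    Hypothetical helper for illustration; not part of the trackhub API.
    """
    os.makedirs(staging_dir, exist_ok=True)
    links = []
    for src in sources:
        link = os.path.join(staging_dir, os.path.basename(src))
        # Replace a stale link so that re-running the staging step is safe.
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(os.path.abspath(src), link)
        links.append(link)
    return links


# Demonstrate with a throwaway file buried in a deep "analysis" directory.
workdir = tempfile.mkdtemp()
data_dir = os.path.join(workdir, "analysis", "deep", "path")
os.makedirs(data_dir)
source = os.path.join(data_dir, "sample1.bw")
with open(source, "w") as handle:
    handle.write("placeholder")

staged = stage_files([source], os.path.join(workdir, "staging"))
```

The staging directory ends up flat even though the source file sits several levels deep, which is the kind of decoupling described above.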

Flexibility

Sensible defaults make it easy to build a functioning track hub. However, these defaults can always be overridden for complex configurations or when more precise control is needed. For example, by default a track's name also becomes the shortLabel, longLabel and filename of the track in the hub unless any of these are overridden by the user.
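As a rough illustration of this fallback behavior, the sketch below fills in each label from the track name unless a value is supplied. `track_labels` is a hypothetical function written for this example and does not reflect how trackhub stores these settings internally.

```python
def track_labels(name, short_label=None, long_label=None, filename=None):
    """Return label settings, falling back to the track name when unset.

    Hypothetical helper for illustration; not part of the trackhub API.
    """
    return {
        "name": name,
        "shortLabel": name if short_label is None else short_label,
        "longLabel": name if long_label is None else long_label,
        "filename": name if filename is None else filename,
    }


# With no overrides, everything is derived from the name.
defaults = track_labels("sample1")

# Overriding one setting leaves the other defaults in place.
custom = track_labels("sample1", long_label="Sample 1, replicate 1")
```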

Easy track documentation

Write track hub documentation in reStructuredText, and it is converted to HTML, connected to the track, and uploaded with the rest of the hub. This allows for programmatically including content without the tedium of writing HTML by hand.

Extensible

The framework provided by trackhub can be extended as new hub functionality is added to the UCSC Genome Browser.

Full documentation can be found at https://daler.github.io/trackhub. The code in the documentation is run as part of the test suite to guarantee correctness.

Basic example

The following code demonstrates a track hub built out of all bigWig files found in a directory. It is relatively simple; see these other examples from the documentation for complex usage.

This basic example is run automatically when the documentation is re-generated. You can view the uploaded files in the trackhub-demo GitHub repository, and load the hub directly into UCSC to see what it looks like.

import glob, os
import trackhub

# First we initialize the components of a track hub

hub, genomes_file, genome, trackdb = trackhub.default_hub(
    hub_name="myhub",
    short_label='myhub',
    long_label='myhub',
    genome="hg38",
    email="[email protected]")

# Next we add tracks for some bigWigs. These can be anywhere on the
# filesystem; symlinks will be made to them. Here we use some example data
# included with the trackhub package; in practice you'd point to your own
# data.

for bigwig in glob.glob('trackhub/test/data/sine-hg38-*.bw'):

    # track names can't have any spaces or special characters. Since we'll
    # be using filenames as names, and filenames have non-alphanumeric
    # characters, we use the sanitize() function to remove them.

    name = trackhub.helpers.sanitize(os.path.basename(bigwig))

    # We're keeping this relatively simple, but arguments can be
    # programmatically determined (color tracks based on sample; change scale
    # based on criteria, etc).

    track = trackhub.Track(
        name=name,          # track names can't have any spaces or special chars.
        source=bigwig,      # filename to build this track from
        visibility='full',  # shows the full signal
        color='128,0,5',    # brick red
        autoScale='on',     # allow the track to autoscale
        tracktype='bigWig', # required when making a track
    )

    # Each track is added to the trackdb

    trackdb.add_tracks(track)

# In this example we "upload" the hub locally. Files are created in the
# "example_hub" directory, along with symlinks to the tracks' data files.
# This directory can then be pushed to GitHub or rsynced to a server.

trackhub.upload.upload_hub(hub=hub, host='localhost', remote_dir='example_hubs/example_hub')

# Alternatively, we could upload directly to a web server (not run in this
# example):

if 0:
    trackhub.upload.upload_hub(
        hub=hub, host='example.com', user='username',
        remote_dir='/var/www/example_hub')

Copyright 2012-2020 Ryan Dale; MIT license.

trackhub's People

Contributors

daler, detrout, evajason, jgoldmann, lrowe, mgperry, siebrenf, vsmalladi

trackhub's Issues

symlink to staging area before uploading

Currently, a separate rsync call is made for uploading every single file. For large hubs, a lot of time can be wasted on starting and stopping each connection.

A nicer approach would be to first symlink all files to a local temp dir used as a staging area, and then rsync the whole thing in one shot to the remote host (using the -L flag). This strategy has the additional benefits of 1) identifying missing files before anything gets uploaded (currently, you have to pay close attention to the rsync output to see if something failed) and 2) seeing the full directory structure of the hub exactly as it will be uploaded.
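A minimal sketch of the two checks this proposal describes, assuming a staging directory that already contains symlinks. `find_broken_links` and `rsync_command` are hypothetical helpers, and the user, host, and remote directory are placeholders. The command is only built, not executed.

```python
import os
import tempfile


def find_broken_links(staging_dir):
    """Return symlinks under staging_dir whose targets do not exist."""
    broken = []
    for root, _dirs, files in os.walk(staging_dir):
        for fname in files:
            path = os.path.join(root, fname)
            # os.path.exists() follows the link, so it is False when broken.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)


def rsync_command(staging_dir, user, host, remote_dir):
    """Build one rsync call for the whole staging area.

    -L makes rsync copy the files the symlinks point to rather than the links.
    The trailing slash copies the directory's contents, not the directory.
    """
    src = staging_dir.rstrip("/") + "/"
    return ["rsync", "-a", "-v", "-L", src, f"{user}@{host}:{remote_dir}"]


# Build a staging area with one valid link and one dangling link.
workdir = tempfile.mkdtemp()
staging = os.path.join(workdir, "staging")
os.makedirs(staging)

real_file = os.path.join(workdir, "present.bw")
with open(real_file, "w") as handle:
    handle.write("placeholder")
os.symlink(real_file, os.path.join(staging, "present.bw"))
# This link points at a file that was never created.
os.symlink(os.path.join(workdir, "absent.bw"), os.path.join(staging, "absent.bw"))

broken = find_broken_links(staging)
command = rsync_command(staging, "username", "example.com", "/var/www/example_hub")
```

Checking `broken` before running the command surfaces missing files up front instead of leaving them buried in rsync output.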

Are there some features of trackDB that are not supported on trackhub?

Normally, when I want to add features or change options on my tracks, I check the specifications in the UCSC documentation (https://genome.ucsc.edu/goldenpath/help/trackDb/trackDbHub.html) and simply add the relevant parameters to my trackhub.Track() initialization parameters with no issues.

Recently, I have come across fields that are not recognized by the module.
It raises errors if I use the labelFields, defaultLabelFields, or mouseOverField options when defining tracks.

trackhub.track.ParameterError: Unhandled keyword arguments: {'labelFields': 'postarID,dataSource,cellType,expSource,postarScore', 'defaultLabelFields': 'cellType', 'mouseOverField': 'expSource'}

Just thought I'd mention it here.

priority must be float not integer

Hi,
I realised that if I assign an integer priority to a track the priority does not get rendered. I tracked this down to it failing validation because the priority parameter should be a float, not an int. Although it's easy enough for me to assign a float priority to my tracks, it's also easy to forget to do this. Would it be possible to coerce an integer priority to a float inside add_params? Or produce a warning when adding an invalid parameter, or something else to alert the user early on.
Thanks!

x.add_params(priority = 1)
print(x)
# track peaks_mysample_Rep1
# bigDataUrl peaks_mysample_Rep1.bigBed
# shortLabel mysample
# longLabel Peaks mysample_Rep1
# type bigBed
# color 222,45,38
# autoScale off

x.add_params(priority = 1.0)
print(x)
# track peaks_mysample_Rep1
# bigDataUrl peaks_mysample_Rep1.bigBed
# shortLabel mysample
# longLabel Peaks mysample_Rep1
# type bigBed
# color 222,45,38
# priority 1.0
# autoScale off
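One way the requested behavior could look is sketched below: integers are converted to floats and a warning is emitted so the user notices early. `coerce_priority` is a hypothetical function written for this example; it is not an existing trackhub function, and where such a check would live inside add_params is not shown.

```python
import warnings


def coerce_priority(value):
    """Return value as a float, warning when an int had to be converted.

    Hypothetical helper for illustration; not part of the trackhub API.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("priority must be a number, not a bool")
    if isinstance(value, int):
        warnings.warn(f"priority {value!r} is an int; converting to float")
        return float(value)
    if isinstance(value, float):
        return value
    raise TypeError(f"priority must be a number, got {type(value).__name__}")


# Capture the warning so the demonstration can inspect it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = coerce_priority(1)
```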

how to setup and run

Hi Daler, thank you very much for the great effort.
I think this might be very useful for people like us. Your documentation and tutorial are quite good, but I could not figure out how to set up this framework on my Windows computer and how to run this code (which version of Python is required, etc.).
It would be greatly appreciated if you could provide some information on this.

test scripts

It would be nice to have some scripts to run examples and upload them, in order to make sure they play nicely with the browser.

For example:

  • a basic hub
  • a hub with composite track
  • a hub with composite and view tracks
  • a hub with a supertrack
  • ...

The script could read a config file of user/host/url (or pass from cmdline); maybe a cmdline flag to choose which kind of hub to create/upload; after running it should print the URL for the hub.

Ideally this would be pulled from sphinx doctests so that the creation of each hub is tested against the code, followed by testing against the browser upon upload. But I don't know how to do this other than copy/paste.

Or maybe a different kind of testing framework is needed?

ImportError: cannot import name Hub

$ conda list | grep hub
trackhub                  0.1.3                    py27_0    bioconda
$ python
>>> import trackhub
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "trackhub.py", line 1, in <module>
    from trackhub import Hub, GenomesFile, Genome, TrackDb, Track
ImportError: cannot import name Hub
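The traceback mentions a file named "trackhub.py", which suggests (as an assumption, since the report does not confirm it) that a local script with that name is being imported instead of the installed package. A minimal sketch for checking where a module name resolves from, using only the standard library; `module_origin` and `is_shadowed_by_cwd` are hypothetical helper names:

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module name resolves to, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_shadowed_by_cwd(name):
    """Return True if `name` resolves to a file directly in the current directory."""
    spec = importlib.util.find_spec(name)
    # Built-in and frozen modules have no file location to compare.
    if spec is None or not spec.has_location:
        return False
    origin_dir = os.path.realpath(os.path.dirname(spec.origin))
    return origin_dir == os.path.realpath(os.getcwd())


# Standard-library modules are used here so the example runs anywhere;
# in practice the name to check would be "trackhub".
json_origin = module_origin("json")
missing_origin = module_origin("no_such_module_for_this_example")
```

If `is_shadowed_by_cwd("trackhub")` returned True, renaming the local script would be the first thing to try.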

Is trackhub still maintained?

Anyways, learnt a lot about UCSC trackhubs just from reading the docs so thanks either way :)

bigWig file names

Hi,

Thanks a lot for this package, it's super helpful.

Just a quick note that file names for bigWig files with the extension .bw are being changed to .bigWig when calling trackhub.upload.upload_hub(...). I'm not sure if this is done on purpose, but you may want to put it in the change log, or allow .bw files (which are common).

thanks again

example hubs don't go to a good location

Each of the example hubs that demonstrate the different kinds of track types have data that are on different chromosomes. These should either be moved (e.g., by downloading the example tracks, converting to plain text, manually editing them to be in the same genomic region, and converting back to big* format) or the example hubs should be edited to include a relevant default position.

The latter can be worked out by looking at the table schema for a track to see which chromosome at least the first data point is found on, and then zooming to an appropriate region from there.

ssh_askpass error

I am trying to upload my hub into a private server using this command

# Alternatively, we could upload directly to a web server (not run in this
# example):
trackhub.upload.upload_hub(
    hub=hub, host='host', user='user',
    remote_dir='remote_dir')

And I got the following error:

ssh_askpass: exec(/usr/bin/ssh-askpass): No such file or directory
Permission denied, please try again.

How can I fix the ssh password file?

ViewTrack url not mapped to bigDataUrl

I am trying to create a trackDb file with composite tracks and views; however, when I render the trackDb, the url attribute is rendered as 'url' instead of 'bigDataUrl', as shown below:

print ViewTrack(name='foo', url='bar', view='foobar')
>>> track foo
... shortLabel foo
... longLabel foo
... url bar
... view foobar

However when I use a simple Track the url attribute is rendered as 'bigDataUrl'.

I tried to add the 'bigDataUrl' parameter manually with add_params; it seems to be a valid parameter, but it is not rendered in the final file.

Is there another way to add a 'bigDataUrl' to a ViewTrack?

Thanks

`pip install` error

likely related to python3 incompatibility

MLovci-OSX:~ mlovci$ pip install trackhub
Collecting trackhub
  Downloading trackhub-0.1.3.tar.gz (557kB)
    100% |████████████████████████████████| 561kB 405kB/s 
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/private/var/folders/l0/b826yztn02b2mp7zt4g1_p440000gp/T/pip-build-t6v29t80/trackhub/setup.py", line 1, in <module>
        import ez_setup
      File "/private/var/folders/l0/b826yztn02b2mp7zt4g1_p440000gp/T/pip-build-t6v29t80/trackhub/ez_setup.py", line 106
        except pkg_resources.VersionConflict, e:
                                            ^
    SyntaxError: invalid syntax

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/l0/b826yztn02b2mp7zt4g1_p440000gp/T/pip-build-t6v29t80/trackhub

Track parent on/off

I would like to have some flexibility around whether tracks contained in a composite track are on or off by default. I can specify this by using one of the following options in the trackdb.txt file:
parent mycomposite on
parent mycomposite off

Would it be possible to add functionality to enable this? Or would you be open to receiving a modified version of the package that includes this functionality?
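For reference, the requested output amounts to appending an optional on/off flag to the parent line. The sketch below only renders that line; `parent_line` is a hypothetical function and says nothing about how trackhub would expose the option.

```python
def parent_line(parent_name, enabled=None):
    """Render a trackDb `parent` line, optionally with an on/off flag.

    Hypothetical helper for illustration; not part of the trackhub API.
    """
    if enabled is None:
        # No flag given: leave the default behavior to the browser.
        return f"parent {parent_name}"
    return f"parent {parent_name} {'on' if enabled else 'off'}"


on_line = parent_line("mycomposite", enabled=True)
off_line = parent_line("mycomposite", enabled=False)
plain_line = parent_line("mycomposite")
```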

Inconsistent method names for adding tracks

Currently there is an inconsistent naming system for methods that add tracks. To add a track to an aggregate track or composite track, the method add_subtrack is used. When adding a track or a container track to a super track or directly to the trackdb, the method add_tracks is used.

Though these methods have different names, they have the same result of connecting an individual track to its parent container track. Additionally, when view tracks are added to a composite track, the method add_view is used.

The add_tracks convention can be extended to all situations where a track is added to another, and add_view and add_subtrack would become obsolete. The class of a container track is declared when the track is made, so the package can use the appropriate method for appending it to the trackdb. Using a consistent name would make the package more user-friendly.

Support URL for 2bit files

Currently the package does not support using a URL for a 2bit file. It should behave similarly to the "source" and "bigDataUrl" keyword arguments.

example data

We currently have bigBed and bigWig example data, but need to make some BAM and vcfTabix example data as well.
