
ababaian / biosyntax-archive


Syntax highlighting for computational biology

Home Page: http://bioSyntax.org

License: GNU General Public License v3.0

Shell 17.57% Vim Script 82.11% Awk 0.32%
bioinformatics computational-biology syntax-highlighting vim less sublime gedit sam bam vcf

biosyntax-archive's Introduction

biosyntax-archive's People

Contributors

alyeffy, cpan-chu, ebedthan, fransilvion, jwong684, lazypanda10117

biosyntax-archive's Issues

bioSyntax Release TODO

Hey team,
Our little project is really coming together! I know it's a busy time for all of us coming up, but I think pushing back the release deadline by a week and a half will let us finish this strong. Wednesday December 13th is the goal deadline.

VanBug

On Dec 14th there is a VanBug (Vancouver Bioinformatics) meeting with 3-minute lightning talks. We can use this to let bioinformaticians across Vancouver know about bioSyntax and to get the word out. This is a great opportunity to put yourself out there and meet more of the bioinformatics community. I'm hoping one of you will be keen to represent bioSyntax.

  • Prepare and Give 3 minute lightning talk on bioSyntax (Alyssa)
    🎉 Alyssa won 3rd place =D w00t

Manuscript

This past weekend I spent a few hours going over the manuscript and I've evenly divided the remaining work. As per ICMJE, being a scientific author means you must contribute a non-trivial portion to the writing and take responsibility for everything included (so proofread others' work too). This needs to be done earlier than the final deadline because it's going to have to go through some revisions. Please take 3-4 hours over the next week to complete your section; they are mostly short but have to be thoughtfully written. Due: Dec 5th

  • Manuscript Draft Finished (All)
  • Compile references in google doc into one reference manager and upload library (?)

Tasks Remaining

  • Core Syntaxes: We are incredibly close! I can do sam-vim. Anicet is working on PDB-gedit. We need VCF-Gedit and SAM-Gedit.

  • VCF-sublime (German)

  • PDB-gedit (Anicet)

  • SAM-vim (Artem)

  • VCF-gedit (German)

  • SAM-gedit (Jeff)

  • Go back to all Fasta formats (Sublime / Vim / Gedit) add 16-color NT coloring (no context recog.)

Open Bug-fixes

  • VCF-less ID highlighting (Artem)

  • VCF-less simplify last columns to speed up (Artem)

  • FQ-vim FQ-sublime does not work with comments on + line (Jasper + Artem)

  • BED-gedit use robust column selection ( Push to Post-Release)

  • FQ-gedit is incomplete (Artem)

  • Testing Syntaxes: It'll be important to test all the syntaxes across ports to make sure they behave and look the same. We should assign 2 people to start going through it and looking for inconsistencies and bugs.

  • Tester 1 (Everyone)

  • Installer: Alyssa has been leading this! Again having it done for next week would be great and we can start debugging and testing.

  • Complete Installer (Alyssa)

  • Update 'Install' instructions page with automatic installer and also manual installation instructions (Alyssa)

  • Theme: We need to make full ANSI 256-color schemes for bioSyntax, a single set of variable names, and then go through the syntax files and implement the variable names and theme. Pretty big task but can be started for the finished ports (sublime / less).

  • Fill out bioSyntax.xls theme file for ANSI 256-color, decide on one logical system for variable naming (Artem)

  • Go through sublime syntax files and re-name variables (Artem)

  • Go through less syntax files and re-name variables (Artem)

  • Go through vim files and re-name variables (Jasper)

  • Go through gedit files and re-name variables (Artem Started)

I've made our gedit theme: bioSyntax/syntax/gedit/bioSyntax.xml. We need someone to go through our gedit syntax files and change the <styles> section so that all colors are map-to="bioSyntax:VARNAME". I've done bed.lang, clustal.lang and faidx.lang as examples.
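To keep track of which gedit syntax files still need this change, something like the following could help. It is only a sketch: the function name unconverted_styles is made up here, and it assumes each <style .../> element sits on a single line of the .lang file.

```shell
#!/bin/bash
# Hypothetical helper: list the <style ...> lines (with line numbers) in a
# gedit .lang file that are not yet mapped to the bioSyntax theme.
# Assumes one <style .../> element per line.
unconverted_styles() {
    grep -n '<style ' "$1" | grep -v 'map-to="bioSyntax:'
}

# Example (file name taken from the list of remaining syntaxes):
#   unconverted_styles sam.lang
```

An empty result (and a non-zero exit status from grep) means every style line in that file already points at the bioSyntax theme.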

Meeting up

Let's take a break this week from meeting as last week went pretty long. If you can all work independently on the thing you're assigned and the manuscript then we can meet next week to plan out the finish and do what we have to do. I think we're all a bit less busy then too.

OK, I think those are the major points. Let me know if there's anything more we'd like to discuss. Please sign yourself up for tasks which don't have someone working on them (just edit this comment). Also, do one thing at a time.

bioSyntax TODO -- Post-Release

We're coming up with a few good ideas for things we should work on that don't fall into the initial release. Feel free to edit and add things here:

  • Add Atomic Coloring. JMol / CPK coloring to atoms/elements when they appear in a file-format (PDB).

  • Set-up Vim / Sublime / Gedit to only use bioSyntax theme when a bioSyntax format is being used; otherwise use the default or preset theme. Sublime Example

  • Re-write the bed-gedit syntax to use 'Robust Column Selection'

  • Optimize the Regex Engine for VCF -gedit -sublime -vim(?) to account for catastrophic backtracking. (See vcf-less for a fixed example)

  • Secondary Color Gradient: In BED/WIG files where there can be a score, have one color scheme (like we have) for the 0-1000 range. Have a second color gradient (orange?) which recognizes 0.0 - 1.00 (decimal scale). This will support the two widely used data ranges: 0-1 and 0-1000.

  • Make 'Infographics' for complex file-types (SAM, VCF, GTF) to help users learn and interpret the file formats. Include things such as PHRED numeric scale, FLAG conversion bits, what each field is etc... Use bioSyntax theme colours as a teaching tool here.
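For the secondary color gradient idea in the list above, the two score scales can be told apart by shape alone. Here is a rough sketch of that distinction, not the real syntax rules; the function name score_scale is hypothetical, and it treats a bare 0 or 1 as belonging to the integer scale since those are ambiguous.

```shell
#!/bin/bash
# Hypothetical helper: classify a BED/WIG score string as belonging to the
# 0.0-1.00 decimal scale, the 0-1000 integer scale, or neither.
score_scale() {
    printf '%s\n' "$1" | awk '
        /^(0\.[0-9]+|1\.0+)$/          { print "decimal"; next }
        /^(1000|[0-9][0-9]?[0-9]?)$/   { print "integer"; next }
                                       { print "unknown" }'
}

for s in 0.35 1.00 850 1000 1001; do
    echo "$s -> $(score_scale "$s")"
done
```

The same two patterns could then be given separate scopes (and so separate gradients) in each editor's syntax file.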

Installer Updates:

  • The script should output the requirements if it fails; or prompt user to type in which software to install for
  • For less: inform user and prompt (Y/N) for software updates and adding alias commands
  • On the website, include uninstall instructions. (i.e. delete these files)
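The first installer item above (report the requirements when something is missing) could look roughly like this. It is a sketch only: check_requirements is a made-up name and not part of the actual installer, and the programs passed to it are just examples.

```shell
#!/bin/bash
# Hypothetical helper: report every required program that is not on PATH.
# Prints one message per missing program to stderr and returns non-zero
# if anything is missing.
check_requirements() {
    local missing=0
    local prog
    for prog in "$@"; do
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "Missing requirement: $prog" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the program names are for illustration only.
if ! check_requirements less source-highlight; then
    echo "Install the programs listed above, then re-run the installer." >&2
fi
```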

hackseq Day 3 Goals

Priorities for Day 3

  • Make Installer "readme" for each operating system + commands
  • Outline key concepts of the publication; define 'topics' for group writing
  • Begin port to Gedit: Anicet
  • Port to Vim / make pipeline: Jasper
  • FASTA (aa): Context dependent recognition in the standard file. Alyssa?
  • VCF: Color headers. German
  • VCF: Regex/Color Data. German
  • GTF: Fix Attributes Fields. Jeffrey
  • SAM : Polish coloring scheme; do some testing of syntax. Artem
  • Presentation: Artem (1pm - 3pm)
  • Organize the data in the repo into a simple 'release'

Sprint Meet-up - Nov 20th + 24th

As discussed in the meeting, it would be fun to meet up and work together on this.

Monday November 20th -- 1:30 pm-9:00 pm+. We'll meet at the BCCRC (675 W 10th Ave), 13th Floor Meeting Room. You'll have to get a hold of me to let you up. You don't have to come right at 1:30 pm but whenever you're available and stay as long as you're available : )

Parking is available outside but is paid; free after 5pm (but I have to let you in).

Remote people; we can get on slack and chat during this time and make it a party

Phase 1: Alpha-completion (Due Tuesday 21st)

Goals

  • Complete all core syntax files (#14)
  • Define a complete theme set
  • On the website have "Install Instructions" page up and correct for beta-testers
  • Have a running less/vim installer.
  • Create test sublime / gedit packages

Round 2: Friday November 24th -- 5:00 pm-9:00 pm+*

Team Meeting Times

Schedule for team meetings on Slack Channel

For the people working remote; make sure to login for these times on Slack.

Day 1

  • Initial Meeting: 9:30AM PST
  • Mid-day Meet: 1:00PM PST
  • End of day Meet: 4:30PM PST

Day 2

  • Initial Meeting: 9:30AM PST
  • Mid-day Meet: 1:00PM PST
  • End of day Meet: 4:30PM PST

Day 3

  • Initial Meeting: 9:30AM PST
  • Mid-day Meet: 1:00PM PST
  • End of day Meet: 4:30PM PST

Porting to less

Two formats, .sam and .vcf, are often very large and cannot be opened quickly in vim or any other text editor without loading to memory (although vim is decent if you have enough memory). This can be sort of solved by using head. The better solution is using less for .sam and .vcf. So can we have syntax highlighting there for these important formats?

We can leverage the source-highlight package to accomplish this. I believe the syntax-language files may be shared with gedit which will save work on that end.

Installing source-highlight in less (Ubuntu)

  1. Install source-highlight to your system:
sudo apt-get update
sudo apt-get install source-highlight
  2. Append these lines to your ~/.bashrc and/or ~/.zshrc

## Syntax highlighting in less
## For Ubuntu / Fedora
export LESSOPEN="| /usr/share/source-highlight/src-hilite-lesspipe.sh %s"
export LESS=" -R "

alias less='less -NSi -# 10'
alias more='less'

# Explicit fasta / sam less call for piping
# i.e:   samtools view -h aligned_hits.bam | sam-less
#
alias fa-less='source-highlight -f esc --lang-def=fasta.lang --outlang-def=bioSyntax.outlang --style-file=fasta.style | less'
alias sam-less='source-highlight -f esc --lang-def=sam.lang --outlang-def=bioSyntax.outlang --style-file=sam.style | less'
alias vcf-less='source-highlight -f esc --lang-def=vcf.lang --outlang-def=bioSyntax-vcf.outlang --style-file=vcf.style | less'

Note: On different systems the /usr/share/source-highlight/src-hilite-lesspipe.sh may be installed to a different directory. (i.e CentOS: export LESSOPEN="| /usr/bin/src-hilite-lesspipe.sh %s")
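Since the script location varies, the rc file could probe for it instead of hard-coding one path. A minimal sketch, assuming only the two locations mentioned in this thread; find_lesspipe is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print the first candidate path that exists as a file.
find_lesspipe() {
    local candidate
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Only set LESSOPEN when one of the known locations is present.
if lesspipe_path=$(find_lesspipe \
        /usr/share/source-highlight/src-hilite-lesspipe.sh \
        /usr/bin/src-hilite-lesspipe.sh); then
    export LESSOPEN="| $lesspipe_path %s"
    export LESS=" -R "
fi
```

Other distributions may install the script elsewhere, in which case more candidates would need to be added to the list.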

Installing bioSyntax for less (Ubuntu)

  1. Update the src-hilite-lesspipe.sh script in the source-highlight directory.
# source-highlight directory on your system
SRCDIR='/usr/share/source-highlight'

cd  $bioSyntax_PATH/syntax/less/

sudo cp src-hilite-lesspipe_BIO.sh $SRCDIR/src-hilite-lesspipe.sh
  2. Copy over the *.lang, *.style and *.outlang files to the source-highlight directory.
#!/bin/bash
# quickInstall.sh
# Quick installer for less syntax
# for testing purposes

SRCDIR='/usr/share/source-highlight'

# Copy over src-hilite script
sudo cp src-hilite-lesspipe_BIO.sh $SRCDIR/src-hilite-lesspipe.sh


# Copy over language files
sudo cp fasta.lang $SRCDIR/
sudo cp sam.lang $SRCDIR/
sudo cp vcf.lang $SRCDIR/

# Copy over style files
sudo cp fasta.style $SRCDIR/
sudo cp sam.style $SRCDIR/
sudo cp vcf.style $SRCDIR/

# Copy over outlang files
sudo cp bioSyntax.outlang $SRCDIR/
sudo cp bioSyntax-vcf.outlang $SRCDIR/
  3. Restart your terminal (or run source ~/.bashrc) so the updated rc file takes effect.

Running bio-aware less

  1. Automatic detection of file-extensions when reading entire file *.fa, *.fasta, *.sam
    less hgr1.fa

  2. Piping requires explicit use of fa-less, sam-less or vcf-less which can be combined in all the interesting ways you can come up with.
    samtools view -h accepted_hits.bam | sam-less

Developing language syntax files (ongoing)

  • Source-highlight Documentation
  • Coloring is done by ANSI escape code.
  • While any color can be used, we'll impose a limit of 256 colors to maximize compatibility
  • Sadly Source-highlight only has 17 colors defined in its colors.h file. We would have to re-compile it to add more color compatibility. Can do quite a bit with 17 colors; just not amino-acid coloring.
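As a rough illustration of the ANSI escape codes mentioned above, this is how a single character can be printed in one of the 256 palette colors directly from the shell. The function name ansi256 and the color indices are arbitrary examples, not the bioSyntax theme.

```shell
#!/bin/bash
# Print TEXT using foreground colour INDEX (0-255) from the 256-colour
# palette, then reset attributes so later output is unaffected.
ansi256() {
    printf '\033[38;5;%dm%s\033[0m' "$1" "$2"
}

# Colour indices below are arbitrary examples.
ansi256 46 "A"; ansi256 33 "C"; ansi256 220 "G"; ansi256 196 "T"
printf '\n'
```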
  1. Syntax regexes are defined in /usr/share/source-highlight/<Language>.lang, and each has an associated <Language>.style.
  2. The matches are piped into a less-readable format by esc.outlang,
  3. which is then made pretty by /usr/share/source-highlight/esc.style.
  4. Automatic file-extension recognition for less is performed in src-hilite-lesspipe.sh, which contains the logic for running source-highlight. At line 11, insert:
	*.fasta|*.fa|*.mfa)
	source-highlight -f esc --lang-def=fasta.lang --outlang-def=bioSyntax.outlang --style-file=fasta.style -i "$source" ;;
	*.sam)
	source-highlight -f esc --lang-def=sam.lang --outlang-def=bioSyntax.outlang --style-file=sam.style -i "$source" ;;
	*.vcf)
	source-highlight -f esc --lang-def=vcf.lang --outlang-def=bioSyntax-vcf.outlang --style-file=vcf.style -i "$source" ;;

To get less working, we define one <language>.lang and one <language>.style file per language. fasta.lang and sam.lang share the bioSyntax.outlang file, while vcf.lang uses its own bioSyntax-vcf.outlang.
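The mapping from file extension to definition files can be tried out on its own, without editing the system script. A minimal sketch, where biosyntax_defs is a hypothetical function name and calls.vcf and notes.txt are made-up file names:

```shell
#!/bin/bash
# Print "<lang> <outlang> <style>" for a file name, or return 1 when the
# extension is not one of the three formats handled so far.
biosyntax_defs() {
    case "$1" in
        *.fasta|*.fa|*.mfa) echo "fasta.lang bioSyntax.outlang fasta.style" ;;
        *.sam)              echo "sam.lang bioSyntax.outlang sam.style" ;;
        *.vcf)              echo "vcf.lang bioSyntax-vcf.outlang vcf.style" ;;
        *)                  return 1 ;;
    esac
}

# Show what would be used for a few example file names.
for f in hgr1.fa accepted_hits.sam calls.vcf notes.txt; do
    if defs=$(biosyntax_defs "$f"); then
        echo "$f -> $defs"
    else
        echo "$f -> (not a bioSyntax format, left to default handling)"
    fi
done
```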

Known Bugs

  • Some Terminals have 8-color support; some have 256-color. If the output in less looks like gibberish then chances are your terminal doesn't support 256 colors. Try tput colors to tell how many colors are supported. Will add 8-color theme in the future.
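A quick way to run that check from a script is sketched below. The function name supports_256_colors is made up, and the fallback value of 0 when tput fails is an assumption.

```shell
#!/bin/bash
# Return success only when the given colour count is a number >= 256.
supports_256_colors() {
    [ "$1" -ge 256 ] 2>/dev/null
}

# tput can fail (for example when TERM is unset), so fall back to 0.
ncolors=$(tput colors 2>/dev/null || echo 0)

if supports_256_colors "$ncolors"; then
    echo "Terminal reports $ncolors colours: the 256-colour theme should display correctly."
else
    echo "Terminal reports $ncolors colours: highlighted output may look garbled."
fi
```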

Porting to Atom

By popular demand, I think it'd be a good idea to port all the syntax files to Atom as well. Atom is GitHub's open source text-editor that is cross-platform (Mac, Windows, Linux).

Its syntax highlighting system is based on TextMate's language grammars, so converting our current Sublime Text files to be compatible with Atom should be fairly straightforward. The only difference is that we'll be creating .cson files (CoffeeScript Object Notation, a JSON-like format) for the grammars and a .less stylesheet for the colour scheme. I would like to work on this and if anyone else would as well, let me know :)

Here's some resources I found that seem helpful:

hs17: Introductions

Welcome buddies,
Looks like we're a team for hackseq 17 to work on bioSyntax. Welcome! Let's start with some brief introductions?

My name is Artem, I'm a grad student at UBC in Genetics. I'm (mainly) a biologist and have merged computational work to further my research over my PhD. My research is split now between studying variation in human ribosomal RNAs and studying the effects of Transposable Elements on transcriptional innovation in cancer. Besides that I'm an avid climber and love talking about crazy / far off biology ideas.

@fransilvion
@Jwong684
@alyeffy
@lazypanda10117
@Ebedthan
@ahmdeen

Fixing GTF Syntax

Hi all,
I am trying to complete the GTF syntax and port it over to gedit. (@Ebedthan, @ababaian). As suggested in issue 14, this is near-complete, so I want to know what I need to fix for the sublime version first, and then port it over to gedit. Thank you.

Biosyntax Publication

Hi all,

During the Hackathon, I have been working on drafting up a report for a paper that we could put together for publication. I have drafted up a brief skeleton of what our project is about and added figures to demonstrate our tool's utility.

UPDATED MANUSCRIPT FILE. SEE COMMENT BELOW.

I'm not exactly sure how public this is, so I have set the share settings to "can comment" for now. Let me know if you want to add/modify anything in there.

Cheers,

J

bioSyntax Meeting 2

Time / Place

Please complete the dudle poll to select a date for the next meeting.

The next meeting will be 6:30pm on Wednesday November 15th, in Room 416 Irving Barber Library. Note the half hour delay due to room booking. We'll discord in remote people.

Assignments / Items Due

Minutes from Meeting 1

  • Vim Core Syntaxes - Jasper / Gherman
  • Gedit Core Syntaxes - Anicet / Jeff
  • Less syntaxes - Artem
  • SAM gedit/sublime - Eric
  • Running Installer - Eric / Alyssa
  • Port to Atom - Alyssa
  • Outline for website documentation - Artem // Alyssa

I'll book a room in the library once we have a date; remote ppl we can Discord again.

Agenda (add items)

  1. Wrap-up-a-thon Date
  2. Define Release 1.0 'Finish Criteria' for bioSyntax (All)
    -- File Formats
    -- Software Ports
    -- Themes
  3. Report (Jasper)
    -- Authorship. Requirements + Responsibilities for each of us
    -- Funding for publication, hackseq... others
  4. Installer (Eric + Alyssa)
  5. Documentation / Website (Alyssa + Artem)
    -- Website drafted
    -- Manual Installations
    -- How To: Make your own syntax
    -- How To: Contribute
  6. Syntax Specific Discussion
  7. Assign Tasks

Vision + Plan for hackathon

BioSyntax: Parsing biological file formats for humans with syntax highlighting

A large component of bioinformatics involves reading and writing data in biological file-formats such as fasta, fastq, bed, gtf, vcf, sam, etc... While being easy to parse computationally, these and other biological file-formats are often illegible for scientists to read and write to directly. I'd propose you join the bioSyntax team and together we will develop a suite of syntax highlighting for bio-formats to be used with common text editors such as gedit or vim. This design solution will help researchers interact with their data more efficiently and gain better insight into the biological world. This project requires a strong understanding of regular expressions, an intimate familiarity with use-cases for some biological file specifications and a flair for human-interface design.

The core idea here is: how can we bring scientists closer to the underlying data so they are able to interact with it and interpret it more intuitively?


I'm trying to brainstorm some of the things we'll need to prep for this. Feel free to edit and add to the list. This isn't my project or my team, it's all of ours so chip in :)

Literature Of Interest

  1. Which file formats do we want to develop syntax for? (What do you use?)
  • SAM / BAM
  • FASTA / FASTQ
  • VCF
  • Wig
  • Bed
  • GTF
  • PDB
  2. Which programs are we going to develop the syntax for? (What do you use?)
  • gedit (and other simple text editors)
  • Vim / gVim
  • Emacs ?
  3. How can we unify the colors, look and feel of all the file formats into one standard so it's universal?
  • Define a central color scheme which includes biological classes (nucleotides / amino acids / coordinates / strings / number values / ...)
  4. Is there other non-syntax functionality which we would like to develop as well to help understand data?
  • Include a simple installer which installs auto-detection for file formats and the syntax files to a system

Feature List (To Do)

Features to improve bioSyntax which we are working on

Automated Installer Scripts (Eric -- Nov 12th)

Linux -

  • Sublime
  • Gedit / gtksourceview
  • Vim
  • less (+ Source Highlight)

Mac

  • Sublime
  • Gedit
  • Vim
  • less

Windows

  • Sublime
  • Gedit

Port to Vim

  • Core Syntaxes
  • Auxiliary Syntaxes

Port to Gedit

  • Core Syntaxes
  • Auxiliary Syntaxes

Pretty Features

  • To the main color scheme; add a slightly different blue for Uridine
  • In the SAM syntax; make distinct colors for SO:unsorted (invalid highlight) and SO:coordinate (nice color) or other SO:
  • In the VCF Syntax: add REGEX for recognition and parsing of the data fields
  • Make a block gradient coloring scope where foreground == background and it's all scaled
  • Research for stream highlighting. less more or something else for big data (Artem)
  • Write up short 'feature' description for front README (highlight things from the presentation as opposed to listing each format?)

File Format / Syntax Compatibility Matrix

File format and software compatibility matrix for bioSyntax.

status
X Syntax Complete
o In Development
- Unavailable
* Bug Fix Needed

Core Syntaxes

File Format   Description                   sublime   vim   gedit   less
.fasta        Generic nt/aa sequence        X         X     X       X
.fastq        Fasta + PHRED quality         X         X     X       X
.clustal      Multiple Sequence Alignment   X         X     X       X
.bed          Genomic Ranges                X         X     X       X
.gtf          Genomic Annotation            X         X     X       X
.pdb          Protein Structure             X         X     Anc     X
.vcf          Variant Call Format           X         X     X       X
.sam          NGS Sequence Data             X         X     X       X

Auxiliary Syntaxes

File Format   Description                   sublime   vim   gedit   less
.fasta        fasta alternative AA colors
-             Clustal                       X         o     X       -
-             Taylor                        X         o     X       -
-             Zappo                         X         o     X       -
-             Hydrophobicity                X         o     X       -
.fai          Fasta Index (faidx)           X         X     X       X
.flagstat     samtools flag summary         X         -     -       X
.wig          Wiggle data                   o         -     X       -
.newick       Tree Format                   -         -     -       -
.pdbx         Protein Structure (large)     -         -     -       -
.phylip       Multiple Sequence Alignment   -         -     -       -
.cwl          Common Workflow Language      -         -     -       -

Porting to vim

This is a useful tutorial to get into vim syntax:
http://vim.wikia.com/wiki/Creating_your_own_syntax_files

Enable syntax in ~/.vimrc:

syntax enable

Vim keeps its per-user configuration in a .vim folder in your home directory. Make these subdirectories:

  • ~/.vim/syntax
  • ~/.vim/ftdetect

The file names in these subdirectories must match (i.e. fasta.vim in each of those folders for *.fasta).

In ftdetect/fasta.vim: (detects file formats)
au BufRead,BufNewFile *.fasta set filetype=fasta
au BufRead,BufNewFile *.fa set filetype=fasta

In syntax/fasta.vim: (specifications)

" Do nothing if a syntax has already been loaded for this buffer
if exists("b:current_syntax")
        finish
endif

" Header lines start with '>'; each nucleotide gets its own group
syntax match fastaHeader "^>.*$"
syntax match ntA "A"
syntax match ntG "G"
syntax match ntC "C"
syntax match ntT "T"

hi def link fastaHeader Identifier
highlight ntA ctermfg=Black ctermbg=Green guibg=#272822
highlight ntG ctermfg=Black ctermbg=Yellow guibg=#FF8C00
highlight ntC ctermfg=Black ctermbg=Blue guibg=#2A0AFD
highlight ntT ctermfg=Black ctermbg=Red guibg=#FD0A0A

let b:current_syntax = "fasta"

Something kept breaking when I followed the online tutorial so I broke it down to this skeleton for now. More to come.
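The directory setup described above can be scripted. This is only a sketch: setup_vim_fasta is a hypothetical helper, and it writes just the ftdetect file (the syntax file still has to be copied in separately).

```shell
#!/bin/bash
# Hypothetical helper: create the two Vim directories under a given root
# (defaults to ~/.vim) and write the FASTA filetype-detection file there.
setup_vim_fasta() {
    local vimdir="${1:-$HOME/.vim}"
    mkdir -p "$vimdir/syntax" "$vimdir/ftdetect"
    cat > "$vimdir/ftdetect/fasta.vim" <<'EOF'
au BufRead,BufNewFile *.fasta set filetype=fasta
au BufRead,BufNewFile *.fa set filetype=fasta
EOF
}

# Example: try it in a scratch directory first, then run it with no
# argument to install into ~/.vim.
#   setup_vim_fasta /tmp/vim-test
```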

bioSyntax ToDo List Day 1

Example files

  • upload example files to git repo

Installer

  • Windows Installer script
  • Linux Installer script
  • Mac Installer script

Syntax for File Formats

  • SAM, BAM: Eric
  • FASTA: Anicet
  • FASTQ: Jasper
  • VCF: Alyssa
  • WIG: Artem
  • BED: Gherman
  • GTF: Jeffrey
  • PDB
  • TSV / CSV - maybe introduce ways to view these nicely in the text editor

Color Scheme

  • Define Color Scheme File
  • Gradient Coloring
  • Full IUPAC nucleotides
  • Amino Acids (various kinds)

Hackseq Day 2 Goals

Goals for Day 2 Hackseq

  • Alpha for Windows Installer -- ??
  • SAM : Eric
  • FASTA: Artem
  • FASTQ: Jasper
  • VCF: Alyssa / Jasper
  • GTF: Jeffrey
  • PDB: Germen
  • WIG / BED Testing
  • Define Color Scheme Theme File: Artem
  • Amino Acids Coloring: German

Porting to Sublime

Initially we're going to be focusing on SublimeText / YAML for all the formats; we'll port it from there.

Starting an Example File for your bioformat

In Sublime > Tools > Developer > New Syntax

Installing a syntax for

Copy the '.sublime-syntax' file to

Linux: ~/.config/sublime-text-3/Packages/User
Windows: %APPDATA%/Sublime Text 3/Packages/User
Mac: ~/Library/Application Support/Sublime Text 3/Packages/User/

NOTE: The .sublime-syntax file cannot contain any Tabs; everything is space-indented
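On Linux and Mac the destination can be picked automatically from the output of uname. A small sketch using the paths listed above; sublime_user_dir is a hypothetical name, and Windows is left out because it would not normally be running this shell.

```shell
#!/bin/bash
# Print the Sublime Text 3 user package directory for a given `uname -s`
# value, or return 1 for systems not covered here.
sublime_user_dir() {
    case "$1" in
        Linux)  echo "$HOME/.config/sublime-text-3/Packages/User" ;;
        Darwin) echo "$HOME/Library/Application Support/Sublime Text 3/Packages/User" ;;
        *)      return 1 ;;
    esac
}

if dest=$(sublime_user_dir "$(uname -s)"); then
    echo "Copy your .sublime-syntax file to: $dest"
else
    echo "Unrecognised system; use the paths listed above." >&2
fi
```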

Defining / Changing the color scheme

Overview

We'll be using 'Monokai.tmTheme' as the base theme for now.

Your bioformat syntax file should use already existing definitions in the theme file for the 'scope'

The default theme file is available here or it's zipped under
'sublime_text_3/Packages/Color Scheme - Default.sublime-package'

bioSyntax Theme

Special Biological Classes and their target 'scope'

In the bioSyntax Theme there are custom defined colors / classes for highlighting. Entities which are 'biological' such as genomic coordinates, nucleotides, software names should look the same across all formats so we are defining one naming scheme (also called Scope) for each of these bio-classes

If you have a different type of data to add, comment below and I'll add it to our list so its color scheme works with everything here.

bioSyntax Theme File

As of November 30th, 2017, there is now a single unified theme defined in Hex / ANSI / Cterm for biological classes. All of the sublime syntax variable names and the bioMonokai theme have been updated to refer to the unified theme.

Porting core syntax to gedit

Hey team,

First of all I want to focus on porting the core syntaxes to gedit. The work already done (I need your review and critique of it, and also for you to test it on your laptop) is in this folder.

Completed syntax

  • fasta.lang: defining colors for fasta files;
  • fasta-zappo.lang: defining zappo colors for fasta files;
  • fasta-hydrophobicity.lang: defining hydrophobicity colors for fasta files;
  • fasta-taylor.lang: defining taylor colors for fasta files;
  • fastq.lang: defining color for fastq files.

Remaining syntax

  • sam.lang;
  • bed.lang;
  • pdb.lang;
  • vcf.lang.

I plan to work on these files on a daily basis and finish this work as soon as possible.
I'm open to your remarks and wait for your analysis of this part of the project on what we have to do.

bioSyntax Meeting 1

Moving forward with bioSyntax we'll meet for ~1 hour to discuss the logistics of the next month, maybe also use the opportunity to hammer out some code while we're together. I've set times for 6pm PST since we have work/school. Please also indicate if you would prefer to meet at the UBC library or in the BC Cancer Research Centre (Broadway + Cambie).

We'll meet on Wednesday Nov 1 at 6pm at the Irving K Barber Library and video conference in remote people.

Agenda: (add things)

  1. Define Release 1.0 'Finish Criteria' for bioSyntax (All)
    -- File Formats
    -- Software Ports
    -- Themes
  2. Report (Jasper)
    -- What research do we each need to do
  3. Installer (Eric)
  4. Documentation / Front-end
    -- Manual Installations
    -- How To: Make your own syntax
    -- How To: Contribute
  5. Syntax Specific Discussion
    -- Robust column selection (Artem)
    -- Nucleotide Color Scheme
  6. Assign Tasks
  7. ... hack till you have to go home.

Sublime-Fastq Syntax breaks when comment included

@Jwong684, When you get a moment there's a bug in the fastq-sublime syntax files. If there is text in the comment row (+ ...) then it breaks that row and the subsequent one. Also check out the other fq file from the SRA examples/nt-seq/test2.fq.gz

sublime-fq-bug
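One way to see why text on the + line is awkward: a quality string can itself start with + or @, so the first character alone does not identify a row. The sketch below labels rows by position instead. It assumes strict 4-line FASTQ records (no wrapped sequences) and is meant as a way to generate test expectations, not as the fix for the Sublime syntax; fastq_line_roles is a made-up name.

```shell
#!/bin/bash
# Label each line of a 4-line-per-record FASTQ stream by its position
# within the record rather than by its first character.
fastq_line_roles() {
    awk '{
        r = NR % 4
        if (r == 1)      role = "header"
        else if (r == 2) role = "sequence"
        else if (r == 3) role = "separator"
        else             role = "quality"
        print role "\t" $0
    }'
}

# Example record with a comment on the + line and a quality string
# that also begins with +.
printf '%s\n' '@read1' 'ACGT' '+read1 some comment' '+!!!' | fastq_line_roles
```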

Fasta Syntax

Fasta files are appropriate for viewing:

  • DNA sequences;
  • RNA sequences;
  • and protein sequences.

I have added example files of RNA and protein sequences.

Selecting Arbitrary Nth Column in Syntax

I was working on the mostly trivial case of fasta-index format (faidx) and I think because it was so simple I found a very nice way to select columns by the order in which they appear. The only requirement right now is that it is in a tab-delimited file.

What it does is match the first column until the first tab, scopes it, then pushes to contig.length

In contig.length every non-whitespace character is selected and scoped. Then when it hits the next tab it pops out.

The third column is then selected, scoped and pushed to genomic.offset. The fourth column is selected and then popped at the tab.

etc... This push-pop back and forth with tabs can be repeated for N number of columns which means that .bed, .bedpe, .gtf, .sam, and possibly some of .vcf can now be 'solved' since we know what type of data is supposed to be in the Nth column.

Can anyone think of a reason that this won't work or will break at some edge-case?

If not, we'll need to re-work those syntaxes, as I think this is a more robust approach than trying to select each column by the data range which could be there.

faidx.sublime-syntax

%YAML 1.2
---
name: faidx
file_extensions: [fa.fai,fasta.fai]
scope: source.faidx

contexts:
  main:
    # COLUMN 1
    - match: '^[\S]*\t'
      scope: coord.Chr.faidx
      push: contig.length

    # COLUMN 3
    - match: '(?<=\t)[\S]*\t'
      scope: constant.numeric.faidx
      push: genomic.offset

    # COLUMN 5
    - match: '[\S]*$'
      scope: comment.line.faidx

  contig.length:
    # COLUMN 2
    - match: '[\S]*'
      scope: coord.Start.faidx
    - match: \t
      pop: true

  genomic.offset:
    # COLUMN 4
    - match: '[\S]*'
      scope: comment.line.faidx
    - match: \t
      pop: true
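The same "color by column position" idea can be tried quickly in a terminal before writing a full syntax file. This is only a sketch with awk rather than the Sublime push/pop mechanism; color_columns is a hypothetical name and the palette indices are arbitrary, not the bioSyntax theme.

```shell
#!/bin/bash
# Wrap each tab-separated column in an ANSI 256-colour code chosen purely
# by column position. The palette is a comma-separated list of colour
# indices; it is reused cyclically if there are more columns than colours.
color_columns() {
    awk -F'\t' -v palette="${1:-244}" '
        BEGIN { n = split(palette, color, ",") }
        {
            out = ""
            for (i = 1; i <= NF; i++) {
                c = color[(i - 1) % n + 1]
                out = out sprintf("\033[38;5;%dm%s\033[0m", c, $i)
                if (i < NF) out = out "\t"
            }
            print out
        }'
}

# Example (file name is illustrative):
#   color_columns "196,46,33,244,208" < genome.fa.fai | less -R
```

Because the column is identified by counting tabs, the content of each field never has to be guessed from its value, which is the property that makes the approach robust.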
