epaule / biofast

This project forked from lh3/biofast

Benchmarking programming languages/implementations for common tasks in Bioinformatics (forked to add some Crystal implementations)

JavaScript 3.11% Python 7.65% C 32.01% Lua 4.56% Scala 9.54% R 1.42% D 4.33% Go 3.82% F# 4.96% Rust 1.90% Nim 9.47% Julia 9.90% Crystal 5.71% Makefile 1.61%

Introduction

Biofast is a small benchmark for evaluating the performance of programming languages and implementations on a few common tasks in the field of Bioinformatics. It currently includes two benchmarks: interval query and FASTQ parsing. Please see also the companion blog post.

Results

Setup

We ran the tests on a CentOS 7 server with two EPYC 7301 CPUs and 1 TB of memory. The system comes with gcc-4.8.5, python-3.7.6, nim-1.2.0, julia-1.4.1, go-1.14.3, luajit-322db02 and k8-0.2.5. Relatively small libraries are included in the lib directory.

We tried to avoid other active processes while the test programs were running. Timings on this page were obtained with hyperfine, which reports CPU time averaged over at least ten rounds. Peak memory was often measured only once, as hyperfine doesn't report memory usage.
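The source does not say which tool was used for the one-off peak memory measurements. As a minimal sketch of one possible approach, assuming a Linux system (where `ru_maxrss` is reported in kilobytes), the hypothetical helper `measure` below runs a command once and reads the CPU time and peak resident set size of child processes from Python's standard `resource` module:

```python
import resource
import subprocess


def measure(cmd):
    """Run cmd once and return (cpu_seconds, peak_rss_mb).

    Assumption: Linux, where ru_maxrss is in kilobytes.
    Note: for RUSAGE_CHILDREN, ru_maxrss is the peak of the largest child
    waited for so far, so call this from a fresh process per measurement.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # CPU time = user + system time consumed by the child we just ran.
    cpu_seconds = (after.ru_utime - before.ru_utime) + (
        after.ru_stime - before.ru_stime
    )
    peak_rss_mb = after.ru_maxrss / 1024.0
    return cpu_seconds, peak_rss_mb
```

A call might look like `measure(["./bedcov", "ex-rna.bed", "ex-anno.bed"])`, where the binary path is a placeholder. This measures a single run and does not replace hyperfine's averaging over repeated rounds.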

Full results can be found in the bedcov and fqcnt directories. This README only shows one implementation per language. We exclude implementations that bind to C libraries and try to select the one whose algorithm is closest to the C version.

Computing the depth and breadth of coverage from BED files

In this benchmark, we load one BED file into memory. We then stream another BED file and compute the coverage of each of its intervals using the cgranges algorithm (see the C++ header for algorithm details). The output of all programs should be identical to that of "bedtools coverage". In the table below, "t" stands for CPU time in seconds and "M" for peak memory in megabytes. Subscripts "g2r" and "r2g" correspond to the following two command lines, respectively:

bedcov ex-rna.bed ex-anno.bed  # g2r
bedcov ex-anno.bed ex-rna.bed  # r2g

Both input BED files can be found in biofast-data-v1.tar.gz from the download page.

Program               Language    t_g2r (s)  M_g2r (MB)  t_r2g (s)  M_r2g (MB)
bedcov_c1_cgr.c       C           5.2        138.4       10.7       19.1
bedcov_cr1_klib.cr    Crystal     8.8        319.6       14.8       40.7
bedcov_nim1_klib.nim  Nim         16.6       248.4       26.0       34.1
bedcov_jl1_klib.jl    Julia       25.9       428.1       63.0       257.0
bedcov_go1.go         Go          34.0       318.9       21.8       47.3
bedcov_js1_cgr.js     JavaScript  76.4       2219.9      80.0       316.8
bedcov_lua1_cgr.lua   LuaJIT      174.7      2668.0      218.9      364.6
bedcov_py1_cgr.py     PyPy        17332.9    1594.3      5481.2     256.8
bedcov_py1_cgr.py     Python      >33770.4   2317.6      >20722.0   313.7
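
The task in this section (load one BED file, then report for each streamed interval how many loaded intervals overlap it and how many of its bases they cover) can be illustrated with a minimal Python sketch. This is not the cgranges algorithm used by the benchmarked programs: it replaces the interval index with a plain linear scan over start-sorted intervals, so it only shows what is being computed. The function names `load_bed` and `coverage` are hypothetical, and BED coordinates are assumed to be 0-based and half-open.

```python
from collections import defaultdict


def load_bed(lines):
    """Index BED records by contig as start-sorted (start, end) tuples."""
    index = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        index[fields[0]].append((int(fields[1]), int(fields[2])))
    for intervals in index.values():
        intervals.sort()
    return index


def coverage(index, ctg, st, en):
    """Return (count, covered) for the query interval [st, en) on ctg.

    count   = number of loaded intervals overlapping the query (depth)
    covered = number of query bases hit by at least one interval (breadth)
    Linear scan for illustration only; cgranges answers this much faster.
    """
    count = covered = 0
    cur_st = cur_en = None  # current merged block, clipped to the query
    for a, b in index.get(ctg, ()):
        if a >= en:
            break  # sorted by start: nothing further can overlap
        if b <= st:
            continue  # ends before the query begins
        count += 1
        a, b = max(a, st), min(b, en)  # clip to the query
        if cur_en is None or a > cur_en:
            if cur_en is not None:
                covered += cur_en - cur_st
            cur_st, cur_en = a, b
        else:
            cur_en = max(cur_en, b)  # extend the current merged block
    if cur_en is not None:
        covered += cur_en - cur_st
    return count, covered
```

In the "g2r" orientation, the first file on the command line would be passed to `load_bed` and each record of the second file would be passed to `coverage`.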

FASTQ parsing

In this benchmark, we parse a 4-line FASTQ file consisting of 5,682,010 records and report the number of records and the total lengths of the sequences and quality strings. The input file is M_abscessus_HiSeq.fq in biofast-data-v1.tar.gz from the download page. In the table below, "tgzip" gives the CPU time in seconds for gzip'd input and "tplain" gives the time for raw input without compression.

Program                  Language    t_gzip (s)  t_plain (s)  Comments
fqcnt_rs2_needletail.rs  Rust        9.3         0.8          needletail; fasta/4-line fastq
fqcnt_c1_kseq.c          C           9.7         1.4          multi-line fasta/fastq
fqcnt_cr1_klib.cr        Crystal     9.7         1.5          kseq.h port
fqcnt_nim1_klib.nim      Nim         10.5        2.3          kseq.h port
fqcnt_jl1_klib.jl        Julia       11.2        2.9          kseq.h port
fqcnt_js1_k8.js          JavaScript  17.5        9.4          kseq.h port
fqcnt_go1.go             Go          19.1        2.8          4-line only
fqcnt_lua1_klib.lua      LuaJIT      28.6        27.2         partial kseq.h port
fqcnt_py2_rfq.py         PyPy        28.9        14.6         partial kseq.h port
fqcnt_py2_rfq.py         Python      42.7        19.1         partial kseq.h port
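
What the FASTQ benchmark reports can be illustrated with a minimal Python sketch for strict 4-line FASTQ. It is not a port of kseq.h and does not handle multi-line records; the function names `open_maybe_gzip` and `count_fastq` are hypothetical. Gzip'd input is detected by its two-byte magic number, which is an assumption about how input is distinguished rather than something the benchmark specifies.

```python
import gzip


def open_maybe_gzip(path):
    """Open path for binary reading, transparently handling gzip files."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rb")
    return open(path, "rb")


def count_fastq(lines):
    """Return (n_records, total_seq_len, total_qual_len) for 4-line FASTQ.

    lines is any iterable of bytes lines, e.g. an open binary file.
    """
    n = seq_len = qual_len = 0
    it = iter(lines)
    for header in it:
        if not header.startswith(b"@"):
            raise ValueError("expected '@' at start of record %d" % (n + 1))
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None or not plus.startswith(b"+"):
            raise ValueError("truncated or malformed record %d" % (n + 1))
        n += 1
        seq_len += len(seq.rstrip(b"\r\n"))
        qual_len += len(qual.rstrip(b"\r\n"))
    return n, seq_len, qual_len
```

A typical use would be `with open_maybe_gzip(path) as fh: print(count_fastq(fh))`, with `path` pointing at a FASTQ file such as the benchmark input.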
