WebAssembly modules for genomics

Home Page: https://biowasm.com

License: MIT License

Shell 2.93% JavaScript 1.56% HTML 40.87% C 51.90% Python 0.76% Svelte 1.98%
webassembly genomics samtools bedtools javascript seqtk bioinformatics awk bowtie2 minimap2

biowasm's Introduction

biowasm

A repository of genomics tools, compiled from C/C++ to WebAssembly so they can run in a web browser.

Getting started

Who uses biowasm?

Tool | Why biowasm?
sandbox.bio | Runs command-line tools in the browser to power interactive tutorials
42basepairs | Runs samtools, bedtools, bcftools and other tools to preview genomic files
CZ ID (repo) | Runs htsfile and seqtk to identify data issues before file upload
Nanopore | Runs samtools to generate .bam files after basecalling in the browser
ViralWasm (repo) | Runs minimap2 and ViralConsensus for viral molecular epidemiology analysis
Datagrok (repo) | Runs kalign in the browser for multiple-sequence alignment analysis
bedqc (repo) | Runs bedtools in the browser to validate BED files
Ribbon (repo) | Runs samtools in the browser to parse, estimate coverage and subsample BAM files
fastq.bio (repo) | Runs fastp in the browser to evaluate sequencing data quality

How it works

Tool | Description | Link
biowasm | Recipes for compiling C/C++ genomics tools to WebAssembly | This repo
biowasm CDN | Free server hosting pre-compiled tools for use in your apps | biowasm.com/cdn
Aioli | Tool for running these modules in a browser, inside WebWorkers | biowasm/aioli

Contributing

See CONTRIBUTING.md.

biowasm's People

Contributors

cjw85, daniel-ji, dependabot[bot], fbitti, niemasd, orangesi, robertaboukhalil, stevenweaver

biowasm's Issues

ViralConsensus 0.0.4 not working

Testing out ViralConsensus 0.0.4 here is giving an error:

Uncaught Error: Failed to execute 'importScripts' on 'WorkerGlobalScope': The script at 'https://biowasm.com/cdn/v3/ViralConsensus/0.0.4/viral_consensus.js' failed to load.
    at Object._setup (blob:https://cdpn.io/98e1826f-9b2f-451e-83c1-04b84f5e14b2:1:7513)
    at Object.init (blob:https://cdpn.io/98e1826f-9b2f-451e-83c1-04b84f5e14b2:1:4026)
    at i (blob:https://cdpn.io/98e1826f-9b2f-451e-83c1-04b84f5e14b2:1:1015)

using stg environment:

const CLI = await new Aioli(["ViralConsensus/viral_consensus/0.0.4"], {env:"stg"});
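
For anyone debugging this, it may help to see how the tool string relates to the failing URL. The sketch below splits a string like the one passed to Aioli above and rebuilds a URL of the same shape as the one in the error message. `parseToolSpec` and `cdnScriptUrl` are hypothetical helpers written for illustration, not part of Aioli, and the two-part `tool/version` form is assumed to imply a program named after the tool.

```javascript
// Hypothetical helper: split "tool/program/version" or "tool/version".
function parseToolSpec(spec) {
  const parts = spec.split("/");
  if (parts.length === 3) {
    const [tool, program, version] = parts;
    return { tool, program, version };
  }
  if (parts.length === 2) {
    const [tool, version] = parts;
    // Assumption: the program is named after the tool when it is omitted.
    return { tool, program: tool, version };
  }
  throw new Error(`Unrecognized tool spec: ${spec}`);
}

// Hypothetical helper: rebuild a script URL shaped like the one in the error.
function cdnScriptUrl(baseUrl, spec) {
  const { tool, program, version } = parseToolSpec(spec);
  return `${baseUrl}/${tool}/${version}/${program}.js`;
}

console.log(
  cdnScriptUrl("https://biowasm.com/cdn/v3", "ViralConsensus/viral_consensus/0.0.4")
);
```

Opening the printed URL directly in a browser is a quick way to check whether the file exists on the CDN at all.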

how to run samtools.wasm in wasmer

Hello, thanks for the great work on samtools.wasm! Wasmer (https://github.com/wasmerio/wasmer) is a WebAssembly runtime.
When I tried to run samtools.wasm with it, I got this error:

$ wasmer samtools.wasm 
error: failed to run `samtools.wasm`
╰─> 1: Emscripten requires at least one imported table

Do you have any suggestions for this error?

The example usage of wasmer is like this:

$ cat simple.rs
#[no_mangle]
pub extern "C" fn sum(x: i32, y: i32) -> i32 {
    x + y
}
$ # compile it to simple.wasm
$ wasmer simple.wasm  -i sum 1 2
3

Thanks!
Si

Add bowtie2 to biowasm

  • Add bowtie2 to repo
  • Check if it compiles with Emscripten 2.0.0

See branch tools/bowtie2

kalign fails to align more than ~3000 sequences

I have two sample FASTA files: the first is processed successfully, but the second fails with a familiar exception (already seen in #60):

RangeError: Maximum call stack size exceeded
    at kalign.wasm:0x277e
    at kalign.wasm:0xbc82
    at kalign.wasm:0xd417
    at kalign.wasm:0x6fce
    at kalign.wasm:0x7342
    at kalign.wasm:0x7353
    at kalign.wasm:0x7353
    at kalign.wasm:0x7353
    at kalign.wasm:0x7353
    at kalign.wasm:0x7353

The only difference between these files is the number of sequences: 3000 and 4000.
Is the issue linked to the Emscripten FS implementation? Or maybe to JS module initialization? Or something else?

Could it stem from the emcc build behaving differently from the original gcc build? The kalign binary compiled natively on Linux readily processes both files. Is it possible that we are hitting an out-of-memory or too-small-stack issue?
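
As background for the stack question, the repeated frames at the same address in the trace suggest recursion. The sketch below is not kalign code; it only illustrates, in plain JavaScript, how a call depth that grows with input size produces this class of error while a loop-based equivalent does not. The function names and the chosen depths are illustrative assumptions.

```javascript
// Recursive depth counter: each level adds one frame to the engine's call stack.
function recursiveDepth(n) {
  return n === 0 ? 0 : 1 + recursiveDepth(n - 1);
}

// Same result with a loop, so the call stack stays flat.
function iterativeDepth(n) {
  let depth = 0;
  for (let i = 0; i < n; i++) depth += 1;
  return depth;
}

// Run fn(n) and report either the value or the name of the error thrown.
function tryDepth(fn, n) {
  try {
    return { ok: true, value: fn(n) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

console.log(tryDepth(recursiveDepth, 100));
console.log(tryDepth(recursiveDepth, 1e7));
console.log(tryDepth(iterativeDepth, 1e7));
```

If the same pattern applies here, the limit would depend on the browser's stack size rather than on available memory, which would be consistent with the native build handling both files.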

biowasm v2 - Cloudflare Workers Improvements

Stats

  • Copy over v1 stats to v2 KV store
  • Is there a more accurate way to calculate stats without incrementing counts on KV? (not guaranteed to be correct). Should we instead log events then aggregate everything in a Cloudflare Worker cron separately?
  • Filter out unknown package names in stats
  • Update Cloudflare Workers KV ID we're using so we keep it different between v1/v2
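
The "log events, then aggregate" idea above could look roughly like the sketch below. It assumes a hypothetical event shape (`{ tool, version }`) and a hypothetical list of known tool names; how events are stored and how the cron job is scheduled are left out.

```javascript
// Hypothetical event shape: { tool: "samtools", version: "1.10" }.
// Counts events per "tool/version" and drops unknown package names.
function aggregateDownloads(events, knownTools) {
  const known = new Set(knownTools);
  const counts = new Map();
  for (const { tool, version } of events) {
    if (!known.has(tool)) continue; // filter out unknown package names
    const key = `${tool}/${version}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

const sampleEvents = [
  { tool: "samtools", version: "1.10" },
  { tool: "samtools", version: "1.10" },
  { tool: "seqtk", version: "1.2" },
  { tool: "not-a-tool", version: "0.0.0" },
];
console.log(aggregateDownloads(sampleEvents, ["samtools", "seqtk"]));
```

Because the totals are recomputed from the event log on each run, they do not depend on concurrent increments of a shared counter.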

General

  • Update build links on README
  • Update CodePen snippets

create a pipeline

With Aioli we can run a single tool at a time. Is it possible to create a pipeline of multiple tools in Aioli? That way Aioli could run both single tools and pipelines.
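
One way to picture such a pipeline is as a list of commands run in order on one object that exposes `exec(command)`, as the `CLI.exec(...)` calls elsewhere on this page do. The sketch below is an assumption about how this could be wrapped, not an Aioli feature: `runPipeline` is a hypothetical helper, and `mockCli` is a stand-in so the example runs without a browser.

```javascript
// Run commands one after another, awaiting each before starting the next.
// A rejected exec() stops the pipeline and propagates the error.
async function runPipeline(cli, commands) {
  const outputs = [];
  for (const command of commands) {
    outputs.push(await cli.exec(command));
  }
  return outputs;
}

// Stand-in for an initialized CLI object; it only echoes the command.
const mockCli = {
  exec: async (command) => `ran: ${command}`,
};

runPipeline(mockCli, [
  "samtools sort in.sam -o sorted.bam",
  "samtools index sorted.bam",
]).then((outputs) => console.log(outputs));
```

For the second step to see the first step's output, both tools would need to share a filesystem, which is the goal described in the biowasm v2 notes.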

Allow shipping multiple binaries (e.g. with/without -msimd128)

It would be very useful to ship multiple differently compiled wasm binaries. This could be used for still being compatible with browsers that don't have WASM SIMD support, while allowing fast(er) code for browsers with WASM SIMD.

In Aioli, I guess this could be done by allowing a function to be passed to urlModule; however, this feature would also need support from the biowasm CDN.
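
A rough sketch of the selection logic follows. It probes SIMD support by validating a tiny hand-assembled module that uses a SIMD instruction (the same general idea as feature-detection libraries), then picks a file name. The `-simd` suffix, the example base URL, and both function names are hypothetical; nothing here reflects an existing Aioli or CDN convention.

```javascript
// Bytes of a minimal module whose function uses v128 instructions.
// WebAssembly.validate returns true only if the engine accepts them.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0, 1, 5, 1, 96, 0, 1, 123, 3, 2, 1, 0,
  10, 10, 1, 8, 0, 65, 0, 253, 15, 253, 98, 11,
]);

function detectSimd() {
  return typeof WebAssembly === "object" && WebAssembly.validate(SIMD_PROBE);
}

// Hypothetical naming scheme: "<program>.wasm" vs "<program>-simd.wasm".
function pickBinaryUrl(baseUrl, program, simdSupported) {
  const suffix = simdSupported ? "-simd" : "";
  return `${baseUrl}/${program}${suffix}.wasm`;
}

console.log(pickBinaryUrl("https://example.com/tool/1.0", "tool", detectSimd()));
```

Keeping detection separate from URL selection means the choice can be overridden, for example to force the non-SIMD build while debugging.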

Compile an existing biowasm tool

Thanks for the project.

I have looked at CONTRIBUTING.md and wanted to compile the project by following it.

I created a file build.sh:

# Fetch Emscripten docker image
docker pull emscripten/emsdk:2.0.25

# Create the container and mount ~/wasm to /src in the container
docker run \
    -it -d \
    -p 80:80 \
    --name wasm \
    --volume ~/wasm:/src \
    emscripten/emsdk:2.0.25

# Go into your container
docker exec -it wasm bash

# Compile seqtk
cd biowasm/
bin/compile.py --tools seqtk --versions 1.2

# This will create tools/<tool name>/build with .js/.wasm files
ls tools/seqtk/build

But I get an error saying that emcc is not found.

I have little experience with Docker. Can you suggest what I'm doing wrong? I followed the instructions exactly.

> Resetting code changes...
HEAD is now at 1a8319b r94: added "seq -S"; released 1.2
HEAD is now at 1a8319b r94: added "seq -S"; released 1.2
> Applying patch file <../patches/v1.2.patch>...
Checking patch seqtk.c...
Applied patch seqtk.c cleanly.
> Compiling...
../compile.sh: line 3: emcc: command not found
> Finalizing glue code...
ls: cannot access '../build/*.js': No such file or directory
cp tools/seqtk/build/seqtk.js /home/sergei/Desktop/system/node/biowasm/build/seqtk/1.2/
cp: cannot stat 'tools/seqtk/build/seqtk.js': No such file or directory
Return code not 0:  Command 'cp tools/seqtk/build/seqtk.js /home/sergei/Desktop/system/node/biowasm/build/seqtk/1.2/' returned non-zero exit status 1.

samtools sort not working

Hi There,

I just found a new issue: the output of samtools sort is not sorted. It seems it did not do anything; the content is still the same as the unsorted BAM or SAM file. Here is how I tested it:

  1. Upload a sam file 2.sam;
  2. samtools sort /data/2.sam -o /samtools/examples/2.bam
  3. samtools index /samtools/examples/2.bam

Then I got the error:

[E::hts_idx_push] NO_COOR reads not in a single block at the end 1 -1
[E::sam_index] Read 'M02034:479:000000000-D7LT7:1:1101:13367:1705' with ref_name='DELLA-4B-Kronos', ref_length=301, flags=73, pos=62 cannot be indexed
samtools index: failed to create index for "/samtools/examples/2.bam": No such file or directory

Then I checked the contents of 2.bam with samtools view /samtools/examples/2.bam, and the read order is the same as in the SAM file.

Thanks!

Support version-specific patches

Support version-specific patch files so we can host multiple versions of a tool that needs different patch files to compile to WebAssembly

Docker build not working

Hey @robertaboukhalil, the Docker build isn't working for me. It says it's missing lzma. Before going down the debugging rabbit hole, I just wanted to check whether you're regularly using Docker to build biowasm tools and would expect it to work.

[jq] quotes not accepted as in the example

Hi,

Trying to run the example in the latest version of Chrome and Firefox, an error is raised:

jq: error: syntax error, unexpected INVALID_CHARACTER, expecting $end (Unix shell quoting issues?) at , line 1:
'.some.data'
jq: 1 compile error

Removing the quotes in the jq query fixes the issue, but it means that many queries can't be passed (because they require quotes to be valid).
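
To illustrate what shell-style quote handling would involve, here is a small sketch that splits a command string into arguments while honoring single and double quotes. It is not Aioli's parser; `splitArgs` is a hypothetical helper, and it deliberately skips escape sequences and variable expansion.

```javascript
// Split a command string into arguments, honoring single and double quotes.
function splitArgs(command) {
  const args = [];
  let current = "";
  let quote = null;     // the active quote character, if any
  let hasToken = false; // lets an empty quoted string ('') count as an argument
  for (const ch of command) {
    if (quote) {
      if (ch === quote) quote = null; // closing quote: drop it
      else current += ch;             // inside quotes: keep everything
    } else if (ch === "'" || ch === '"') {
      quote = ch;
      hasToken = true;
    } else if (/\s/.test(ch)) {
      if (hasToken) args.push(current);
      current = "";
      hasToken = false;
    } else {
      current += ch;
      hasToken = true;
    }
  }
  if (quote) throw new Error("Unterminated quote");
  if (hasToken) args.push(current);
  return args;
}

console.log(splitArgs(`jq '.some.data' input.json`));
```

With this kind of splitting, the query reaches jq as `.some.data` without the surrounding quotes, which is what a Unix shell would pass.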

Can't run kalign multiple times in a row

I am trying to run kalign multiple times to feed it different inputs. But if I do:

let CLI = await new Aioli("kalign/3.3.1");
await CLI.fs.writeFile("input.fa", fasta);
let output = await CLI.exec(`kalign input.fa -f fasta -o result.fasta`);
let out = await CLI.cat("result.fasta");

it executes once without any failures. The subsequent runs of this piece of code lead to the exception:

RangeError: Maximum call stack size exceeded
    at kalign.wasm:0x277e
    at kalign.wasm:0xbc82
    at kalign.wasm:0xd417
    at kalign.wasm:0xd9e1
    at kalign.wasm:0xdd6e
    at kalign.wasm:0xdd7f
    at kalign.wasm:0xdd7f
    at kalign.wasm:0xdd7f
    at kalign.wasm:0xdd7f
    at kalign.wasm:0xdd7f

OK, I thought: the CLI instance should probably not be created each time the tool is run, so I should create it once and reuse it for subsequent runs. With that change, the second and later runs end with:

Kalign (3.3.1)

Copyright (C) 2006,2019,2020,2021 Timo Lassmann

This program comes with ABSOLUTELY NO WARRANTY; for details type:
`kalign -showw'.
This is free software, and you are welcome to redistribute it
under certain conditions; consult the COPYING file for details.

Please cite:
  Lassmann, Timo.
  "Kalign 3: multiple sequence alignment of large data sets."
  Bioinformatics (2019) 
  https://doi.org/10.1093/bioinformatics/btz795


WARNING: AVX2 instruction set not found!
         Kalign will not run optimally.


Usage: kalign  -i <seq file> -o <out aln> 

Options:

   --format           : Output format. [Fasta]
   --reformat         : Reformat existing alignment. [NA]
   --gpo              : Gap open penalty. [5.5]
   --gpe              : Gap extension penalty. [2.0]
   --tgpe             : Terminal gap extension penalty. [1.0]
   --version (-V/-v)  : Prints version. [NA]

Examples:

Passing sequences via stdin:

   cat input.fa | kalign -f fasta > out.afa

Combining multiple input files:

   kalign seqsA.fa seqsB.fa seqsC.fa -f fasta > combined.afa

[2022-02-03 16:27:08] :     LOG : No input files

But the input file is there. I checked that with await CLI.fs.readdir('.'):
['.', '..', 'result.fasta', 'input.fa']

So the question is how can we run a tool more than once?

problem with chrome

Hello, I tried the WebAssembly version of muscle and the application works fine. However, my web page also uses some SVG graphics, and the result is very strange: when I move the mouse, the SVG graphics flash and some parts of them disappear. If I remove muscle, there is no flashing.
I tried the same code in Firefox and it works fine.

Do you have any idea what causes this?

In any case, it is really great work to offer WebAssembly versions of bioinformatics tools.

Fastp Reads Count Constraint

I was trying to run biowasm's fastp package on a large FASTQ file of 1 million sequences and it wasn't running: CPU usage stayed at around 20% on my computer and memory held at ~550MB. I tried again with 600k sequences and once again it stalled at ~550MB. When I ran 500k sequences, however, it worked fine, reaching the same memory usage and even going over, but still finishing. Something interesting to note is that I can select two files of 500k sequences and run them one after the other and it works fine (appending the two together simulates the behavior of running fastp on 1 million sequences).

@robertaboukhalil followed up on this: "I traced the issue back to the variables PACK_SIZE and PACK_IN_MEM_LIMIT, which when multiplied together gives 500,000. When fastp processes >500K reads, it runs a usleep() command, which the browser can't handle. Commenting out the sleep command seems to fix the issue."
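
The arithmetic in that explanation can be checked against the observations above. In the sketch below, the two individual values are assumptions chosen for illustration; only their product, 500,000, comes from the quoted follow-up. `hitsSleepPath` is a hypothetical helper name.

```javascript
// Assumed individual values; only their product (500,000) is stated above.
const PACK_SIZE = 1000;        // reads per pack (assumption)
const PACK_IN_MEM_LIMIT = 500; // packs held in memory (assumption)

const READ_LIMIT = PACK_SIZE * PACK_IN_MEM_LIMIT;

// True when an input has more reads than the limit, i.e. large enough
// to reach the sleep-based throttling path described in the quote.
function hitsSleepPath(readCount) {
  return readCount > READ_LIMIT;
}

for (const n of [500000, 600000, 1000000]) {
  console.log(n, hitsSleepPath(n));
}
```

This matches the report: 500k reads stay at the limit and finish, while 600k and 1 million reads exceed it and stall.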

bhtsne - send back cluster info to main thread

Currently, bhtsne only sends the final result to the main thread, but not intermediate results.

To show progress to the user, it would be useful to:

  • Send back parsed row names to the main thread (to use in legend)
  • Send back intermediate matrices so they can be plotted

This depends on biowasm/aioli#18

Error when compiling bcftools

Hello.

I cloned the repository and fetched all the modules, but several of them did not compile:

bcftools, coreutils, gawk, grep, ivar, kalign, modbam2bed, samtools, sed

Log for bcftools:

What is going wrong during compilation?

 /usr/bin/python3.8 /home/sergey/Desktop/newkind/db-control/node/biowasm/bin/compile.py --tools bcftools --versions  1.10
 /usr/bin/python3.8 /home/sergey/Desktop/newkind/db-control/node/biowasm/bin/compile.py --tools bcftools --versions  1.10
git submodule update --init --recursive tools/bcftools/src/ && git submodule status tools/bcftools/src/
9f0a0a2451bb64e52a12c4a586ffa5744a4bd965 tools/bcftools/src (1.10)
    git submodule update --init --recursive tools/htslib/src/ && git submodule status tools/htslib/src/
7c16b5665daf4b2af82574d24f1649f3c385fe2c tools/htslib/src (1.10)
    mkdir -p /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10
    bin/compile.sh tools/htslib 1.10
——————————————————————————————————————————————————
🧬 htslib, branch '1.10'
——————————————————————————————————————————————————
> Resetting code changes...
HEAD is now at 7c16b56 Release 1.10
Removing a.wasm
HEAD is now at 7c16b56 Release 1.10
> Applying patch file <../patches/1.10.patch>...
Checking patch Makefile...
Checking patch version.sh...
Applied patch Makefile cleanly.
Applied patch version.sh cleanly.
> Compiling...
Reading package lists...
Building dependency tree...
Reading state information...
autoconf is already the newest version (2.69-11.1).
libbz2-dev is already the newest version (1.0.8-2).
libcurl4-gnutls-dev is already the newest version (7.68.0-1ubuntu2.13).
liblzma-dev is already the newest version (5.2.4-1ubuntu1.1).
libssl-dev is already the newest version (1.1.1f-1ubuntu2.16).
zlib1g-dev is already the newest version (1:1.2.11.dfsg-2ubuntu1.4).
0 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
rm -f test/*.tmp test/*.tmp.* test/longrefs/*.tmp.* test/tabix/*.tmp.* test/tabix/FAIL* header-exports.txt shlib-exports-so.txt
rm -f *.o *.pico cram/*.o cram/*.pico test/*.o test/*.dSYM version.h
rm -f hts-object-files
rm -f libhts.so libhts.so.*
rm -f libhts.a bgzip htsfile tabix  test/hts_endian test/fieldarith test/hfile test/pileup test/sam test/test_bgzf test/test_kstring test/test_realn test/test-regidx test/test_str2int test/test_view test/test_index test/test-vcf-api test/test-vcf-sweep test/test-bcf-sr test/fuzz/hts_open_fuzzer.o test/test-bcf-translate test/test-parse-reg test/thrash_threads1 test/thrash_threads2 test/thrash_threads3 test/thrash_threads4 test/thrash_threads5 test/thrash_threads6 test/thrash_threads7
configure: ./configure "CFLAGS=-s USE_ZLIB=1 -s USE_BZIP2=1" --disable-lzma
checking for gcc... /home/sergey/emsdk/upstream/emscripten/emcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/home/sergey/Desktop/newkind/db-control/node/biowasm/tools/htslib/src':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
emconfigure: error: './configure "CFLAGS=-s USE_ZLIB=1 -s USE_BZIP2=1" --disable-lzma' failed (returned 1)
make: make tabix CC=emcc AR=emar CFLAGS=-O2 -s USE_ZLIB=1 -s USE_BZIP2=1 LDFLAGS=-s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -O2
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o tabix.o tabix.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o kfunc.o kfunc.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o knetfile.o knetfile.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o kstring.o kstring.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o bcf_sr_sort.o bcf_sr_sort.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o bgzf.o bgzf.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o errmod.o errmod.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o faidx.o faidx.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o header.o header.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o hfile.o hfile.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o hfile_net.o hfile_net.c
echo '#define HTS_VERSION_TEXT "1.10"' > version.h
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o hts.o hts.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o hts_os.o hts_os.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o md5.o md5.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o multipart.o multipart.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o probaln.o probaln.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o realn.o realn.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o regidx.o regidx.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o region.o region.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o sam.o sam.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o synced_bcf_reader.o synced_bcf_reader.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o vcf_sweep.o vcf_sweep.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o tbx.o tbx.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o textutils.o textutils.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o thread_pool.o thread_pool.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o vcf.o vcf.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o vcfutils.o vcfutils.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_codecs.o cram/cram_codecs.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_decode.o cram/cram_decode.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_encode.o cram/cram_encode.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_external.o cram/cram_external.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_index.o cram/cram_index.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_io.o cram/cram_io.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_samtools.o cram/cram_samtools.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/cram_stats.o cram/cram_stats.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/mFILE.o cram/mFILE.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/open_trace_file.o cram/open_trace_file.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/pooled_alloc.o cram/pooled_alloc.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/rANS_static.o cram/rANS_static.c
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o cram/string_alloc.o cram/string_alloc.c
emar -rc libhts.a kfunc.o knetfile.o kstring.o bcf_sr_sort.o bgzf.o errmod.o faidx.o header.o hfile.o hfile_net.o hts.o hts_os.o md5.o multipart.o probaln.o realn.o regidx.o region.o sam.o synced_bcf_reader.o vcf_sweep.o tbx.o textutils.o thread_pool.o vcf.o vcfutils.o cram/cram_codecs.o cram/cram_decode.o cram/cram_encode.o cram/cram_external.o cram/cram_index.o cram/cram_io.o cram/cram_samtools.o cram/cram_stats.o cram/mFILE.o cram/open_trace_file.o cram/pooled_alloc.o cram/rANS_static.o cram/string_alloc.o  
/home/sergey/emsdk/upstream/emscripten/emranlib libhts.a
emcc -s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -O2 -o ../build/tabix.js tabix.o libhts.a -lbz2 -lz   -lpthread
make: make htsfile CC=emcc AR=emar CFLAGS=-O2 -s USE_ZLIB=1 -s USE_BZIP2=1 LDFLAGS=-s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -O2
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o htsfile.o htsfile.c
emcc -s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -O2 -o ../build/htsfile.js htsfile.o libhts.a -lbz2 -lz   -lpthread
make: make bgzip CC=emcc AR=emar CFLAGS=-O2 -s USE_ZLIB=1 -s USE_BZIP2=1 LDFLAGS=-s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -O2
emcc -O2 -s USE_ZLIB=1 -s USE_BZIP2=1 -I.  -c -o bgzip.o bgzip.c
emcc -s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -O2 -o ../build/bgzip.js bgzip.o libhts.a -lbz2 -lz   -lpthread
> Finalizing glue code...
    cp tools/htslib/build/tabix.js /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/
    md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/tabix.js | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
    cp tools/htslib/build/tabix.wasm /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/
    md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/tabix.wasm | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
    cp tools/htslib/build/htsfile.js /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/
    md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/htsfile.js | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
    cp tools/htslib/build/htsfile.wasm /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/
    md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/htsfile.wasm | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
    cp tools/htslib/build/bgzip.js /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/
    md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/bgzip.js | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
    cp tools/htslib/build/bgzip.wasm /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/
    md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/htslib/1.10/bgzip.wasm | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
mkdir -p /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10
bin/compile.sh tools/bcftools 1.10
——————————————————————————————————————————————————
🧬 bcftools, branch '1.10'
——————————————————————————————————————————————————
> Resetting code changes...
HEAD is now at 9f0a0a2 Release 1.10
Removing a.wasm
HEAD is now at 9f0a0a2 Release 1.10
> Applying patch file <../patches/1.10.patch>...
Checking patch Makefile...
Checking patch main.c...
Checking patch version.sh...
Applied patch Makefile cleanly.
Applied patch main.c cleanly.
Applied patch version.sh cleanly.
> Compiling...
configure.ac:94: warning: AC_CONFIG_SUBDIRS: you should use literals
../../lib/autoconf/status.m4:1097: AC_CONFIG_SUBDIRS is expanded from...
m4/ax_with_htslib.m4:55: AX_WITH_HTSLIB is expanded from...
configure.ac:94: the top level
configure.ac:94: warning: AC_CONFIG_SUBDIRS: you should use literals
../../lib/autoconf/status.m4:1097: AC_CONFIG_SUBDIRS is expanded from...
m4/ax_with_htslib.m4:55: AX_WITH_HTSLIB is expanded from...
configure.ac:94: the top level
configure: ./configure --with-htslib=../../htslib/src/ "CFLAGS=-s USE_ZLIB=1 -s USE_BZIP2=1"
checking for gcc... /home/sergey/emsdk/upstream/emscripten/emcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... configure: error: in `/home/sergey/Desktop/newkind/db-control/node/biowasm/tools/bcftools/src':
configure: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'.
See `config.log' for more details
emconfigure: error: './configure --with-htslib=../../htslib/src/ "CFLAGS=-s USE_ZLIB=1 -s USE_BZIP2=1"' failed (returned 1)
make: make bcftools CC=emcc AR=emar CFLAGS=-O2 -s USE_ZLIB=1 -s USE_BZIP2=1 LDFLAGS=-s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=["callMain","FS","PROXYFS","WORKERFS"] -s MODULARIZE=1 -s ENVIRONMENT="web,worker" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -s ERROR_ON_UNDEFINED_SYMBOLS=0 -O2 --preload-file test/annotate.vcf@/bcftools/annotate.vcf
config.mk:34: ../htslib/htslib_static.mk: No such file or directory
make: *** No rule to make target '../htslib/htslib_static.mk'.  Stop.
emmake: error: 'make bcftools CC=emcc AR=emar "CFLAGS=-O2 -s USE_ZLIB=1 -s USE_BZIP2=1" "LDFLAGS=-s USE_ZLIB=1 -s INVOKE_RUN=0 -s FORCE_FILESYSTEM=1 -s EXPORTED_RUNTIME_METHODS=[\"callMain\",\"FS\",\"PROXYFS\",\"WORKERFS\"] -s MODULARIZE=1 -s ENVIRONMENT=\"web,worker\" -s ALLOW_MEMORY_GROWTH=1 -lworkerfs.js -lproxyfs.js -s ERROR_ON_UNDEFINED_SYMBOLS=0 -O2 --preload-file test/annotate.vcf@/bcftools/annotate.vcf"' failed (returned 2)
> Finalizing glue code...
ls: cannot access '../build/*.js': No such file or directory
cp tools/bcftools/build/bcftools.js /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/
cp: cannot stat 'tools/bcftools/build/bcftools.js': No such file or directory
md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/bcftools.js | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
md5sum: /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/bcftools.js: No such file or directory
cp tools/bcftools/build/bcftools.wasm /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/
cp: cannot stat 'tools/bcftools/build/bcftools.wasm': No such file or directory
md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/bcftools.wasm | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
md5sum: /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/bcftools.wasm: No such file or directory
cp tools/bcftools/build/bcftools.data /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/
cp: cannot stat 'tools/bcftools/build/bcftools.data': No such file or directory
md5sum /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/bcftools.data | sed 's|/home/sergey/Desktop/newkind/db-control/node/biowasm/build/||' >> /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp
md5sum: /home/sergey/Desktop/newkind/db-control/node/biowasm/build/bcftools/1.10/bcftools.data: No such file or directory
rm /home/sergey/Desktop/newkind/db-control/node/biowasm/build/manifest.stg.tmp

What is going wrong during compilation?

Support for Cloudflare Workers

Since Cloudflare Workers are powered by V8, it should be possible to use biowasm modules as is with Cloudflare Workers. The key differences are: the code runs in a ServiceWorker, and there is a global variable WASM_MODULE that should contain the contents of the .wasm file (whereas in the browser, the .js file created by Emscripten downloads the .wasm file from a URL).

It should be possible for biowasm modules to support both Cloudflare Workers and the browser if we set -s TEXTDECODER=0 -s ENVIRONMENT="web" for all packages, along with defining a pre.js file that declares const document = this; if document is not already defined.
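
A pre.js along those lines might look like the sketch below. Since a `const` cannot be declared conditionally at the top level, this version assigns a property on the global scope instead; that substitution, and the `ensureDocument` name, are assumptions made for illustration rather than the approach actually used.

```javascript
// If the scope has no `document` (as in a Worker), point it at the scope
// itself so glue code that touches `document` does not throw.
function ensureDocument(scope) {
  if (typeof scope.document === "undefined") {
    scope.document = scope;
  }
  return scope.document;
}

// In a real pre.js this would be called as ensureDocument(globalThis).
const fakeWorkerScope = {};
ensureDocument(fakeWorkerScope);
console.log(fakeWorkerScope.document === fakeWorkerScope); // true

const fakeBrowserScope = { document: { title: "page" } };
ensureDocument(fakeBrowserScope);
console.log(fakeBrowserScope.document.title); // "page"
```

An existing `document` is left untouched, so the same file could be prepended to builds used in the browser.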

Then, the user can use robertaboukhalil/cf-workers-emscripten to deploy the compiled biowasm modules to Cloudflare Workers.

Add stats to CDN

  • Cloudflare Worker cron for estimating CDN usage stats
  • Cloudflare Worker page for plotting stats

t-coffee

Originally posted by @orangeSi in biowasm/aioli#40 (comment)

When I compile t_coffee (http://tcoffee-packages.s3-website.eu-central-1.amazonaws.com/#Stable/Latest/) to WebAssembly in the same way (using emscripten/emsdk:2.0.34), I get t_coffee.wasm without errors, but running it in Chrome gives this error:
[WebWorker] Executing t_coffee -seq=input.fa -reg    args=null
20b6be06-daba-47f1-bec0-8e1412abe74c:1 RuntimeError: unreachable
    at set_nproc(int) (t_coffee.wasm:0x1689fa)
    at t_coffee_dpa(int, char**) (t_coffee.wasm:0x4735be)
    at batch_main(int, char**) (t_coffee.wasm:0x4538ab)
    at main (t_coffee.wasm:0x44d553)
    at VM13 t_coffee.js:1581:22
    at Object.callMain (VM13 t_coffee.js:5618:15)
    at Object.exec (20b6be06-daba-47f1-bec0-8e1412abe74c:1:5793)
    at i (20b6be06-daba-47f1-bec0-8e1412abe74c:1:1015)

The disassembly of set_nproc(int) (t_coffee.wasm:0x1689fa) is shown below:

  (func $set_nproc(int) (;882;) (param $var0 i32) (result i32)
    (local $var1 i32)
    (local $var2 i32)
    (local $var3 i32)
    (local $var4 i32)
    (local $var5 i32)
    global.get $__stack_pointer
    local.set $var1
    i32.const 16
    local.set $var2
    local.get $var1
    local.get $var2
    i32.sub
    local.set $var3
    local.get $var3
    local.get $var0
    i32.store offset=12
    local.get $var3
    i32.load offset=12
    local.set $var4
    i32.const 0
    local.set $var5
    local.get $var5
    local.get $var4
    i32.store offset=696988
    unreachable
  )

Have you run into a similar error?

t_coffee.wasm and t_coffee.js are in t_coffee.wasm.js.zip

Thanks~
Si

Implement some execve mechanism

I am still trying to get MMseqs2 to fully work in WebAssembly. For this I would need execve to work. In MMseqs2 we basically use bash as a scripting language/workflow engine.

Essentially we have:

MMseqs2 workflow (c++)
  --> execve
      --> bash (or dash, busybox ash, any posix sh)
          --> execve
              --> mmseqs call1 (c++)
              --> mmseqs call2 
              --> ...

I've also managed to compile https://github.com/mgree/libdash for biowasm and I implemented something to call into javascript/aioli from C++ if an execve system call is encountered in milot-mirdita@0959917.

Now I would need to implement an execve function in aioli.worker.js to spawn a new wasm module, sync the file system, wait until everything finishes, and sync the file system again.
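
One possible shape for such a handler, as a rough sketch only. The tool index, the `run` callback signature, and the in-memory `Map` that stands in for each module's virtual file system are all hypothetical; none of them are part of Aioli's API.

```javascript
// Hypothetical sketch: none of these names exist in Aioli or biowasm.
// Each "file system" is a Map of path -> contents, standing in for a
// module's Emscripten virtual file system.

// Hypothetical tool index used for PATH-style lookup of program names.
const toolIndex = new Map();

// Register a program; `run` receives (args, fs) and returns an exit code.
function registerTool(name, run) {
  toolIndex.set(name, run);
}

// Copy every file from one file system to another.
function syncFiles(from, to) {
  for (const [path, contents] of from) {
    to.set(path, contents);
  }
}

// Emulate execve: resolve the program, sync files in, run, sync files back.
async function execve(parentFs, program, args) {
  const run = toolIndex.get(program);
  if (run === undefined) {
    return 127; // conventional shell exit code for "command not found"
  }
  const childFs = new Map();
  syncFiles(parentFs, childFs); // parent -> child before the call
  const exitCode = await run(args, childFs);
  syncFiles(childFs, parentFs); // child -> parent after it finishes
  return exitCode;
}
```

This sketch copies every file in both directions and does not propagate deletions, so a real implementation would need a smarter sync step (or a shared file system) to be practical.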

I could start hacking something, but I don't really have a bigger picture of what this feature should look like. If this should be integrated with aioli/biowasm somehow, it would also need some kind of PATH resolution through the biowasm index, I guess? Do you have any ideas/opinions?

Alternatively, I could think of implementing this 100% inside MMseqs2 somehow, without additional biowasm/aioli support.

biowasm v2

Allow bioinformatics WebAssembly modules to share the same virtual filesystem, without manually copying or transferring data between each tool's thread. This is essential when running multiple tools where the output of one is used as the input of the next one.

Summary

| | v1 | v2 |
| --- | --- | --- |
| WebAssembly Compilation | -s MODULARIZE=0 | -s MODULARIZE=1 |
| Modules per WebWorker | 1 per WebWorker | All modules of interest run in one WebWorker |
| WebWorker Communication | postMessage | Comlink |
| Sharing Virtual File System | Transfer File objects | Use PROXYFS! |
| Running samtools index | Not supported | Symlink |
| CDN Path | cdn.biowasm.com/<tool>/<version>/ | cdn.biowasm.com/v2/<tool>/<version>/ |
| Wasm feature detection | wasm-feature-detect | Same |
| Where the modules run | WebWorkers + WORKERFS | Same |

Details

WebAssembly Compilation

Use Emscripten, but compile each tool as a separate module (-s MODULARIZE=1), i.e. the .js file will contain a Module function that initializes the module and returns a Promise that resolves when the module is loaded. This encapsulates each module so we can initialize several of them on the same page.
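
To illustrate why a factory function helps, here is a minimal sketch. `createTool` is a hypothetical stand-in for the factory that Emscripten generates with -s MODULARIZE=1; it is not real generated code and only mimics the shape (a function returning a Promise that resolves to an instance with its own state).

```javascript
// Hypothetical stand-in for an Emscripten MODULARIZE=1 factory.
function createTool(name) {
  // State lives inside the closure instead of on a shared global `Module`.
  const output = [];
  const instance = {
    name,
    // Pretend entry point: records the arguments it was called with.
    callMain(args) {
      output.push(`${name} ${args.join(" ")}`);
      return 0;
    },
    // Return a copy so callers cannot mutate the instance's state.
    getOutput() {
      return output.slice();
    },
  };
  // A real factory resolves once the .wasm file is loaded; this one resolves at once.
  return Promise.resolve(instance);
}

// Initialize two modules side by side without them clobbering each other.
async function demo() {
  const [samtools, seqtk] = await Promise.all([
    createTool("samtools"),
    createTool("seqtk"),
  ]);
  samtools.callMain(["view", "in.bam"]);
  seqtk.callMain(["seq", "in.fq"]);
  return { samtools: samtools.getOutput(), seqtk: seqtk.getOutput() };
}
```

With MODULARIZE=0, by contrast, each script configures itself through a single global `Module` object, which is what makes loading several tools in one context awkward.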

Where the modules run

Use WebWorkers with WORKERFS to mount large local files without loading any of their contents into memory.

Getting around read-only WORKERFS

Currently, running a command such as samtools index /data/abc.bam will fail when it wants to create the index file /data/abc.bam.bai since /data (WORKERFS) is read-only. An alternative is to create a symlink to the file on WORKERFS from a different path which is writeable. It would be useful if Aioli automated this process for the end user.
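
A minimal sketch of how that automation might look. It assumes an `FS` object offering `mkdir(path)` and `symlink(oldpath, newpath)`, as the Emscripten file system API does; the function name and the `/shared` directory are hypothetical.

```javascript
// Sketch: expose a file from a read-only mount under a writable directory.
// `FS` is expected to offer mkdir(path) and symlink(oldpath, newpath).
// The function name and the default directory are hypothetical.
function linkIntoWritableDir(FS, readOnlyPath, writableDir = "/shared") {
  const fileName = readOnlyPath.split("/").pop();
  const linkPath = `${writableDir}/${fileName}`;
  try {
    FS.mkdir(writableDir);
  } catch (err) {
    // Assumption: a failure here means the directory already exists.
  }
  // Point the writable path at the file that lives on the read-only mount.
  FS.symlink(readOnlyPath, linkPath);
  return linkPath;
}
```

The idea is that a command such as samtools index would then be given the returned path (for example /shared/abc.bam), so the index file is created next to the symlink rather than under /data.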

Backwards compatibility

Since biowasm v2 + Aioli v2 won't be backwards compatible, we'll keep the existing CDN paths as is and create a separate path for v2.

To do

Build

  • Make sure we can still deploy CDN v1 in case it's needed
  • Copy over Aioli code from updated path
  • Ability to launch compilation on a custom branch
  • Ability to deploy CDN without re-compiling to WebAssembly
  • Add playground/web/stats to GitHub Actions workflow
  • Update config/tools.json: "branch": "v2" to 2.0.0 once the branch is tagged accordingly

General

  • Use -s MODULARIZE=1 and export PROXYFS
  • Create base module, which will host the main virtual file system
  • Update deploy.sh so we compile the base module without needing a separate repo

README

  • At the top, add a description of each supported tool
  • Remove mentions of .html files (no longer generated by Emscripten)
  • Update Emscripten version
  • Update make instructions
  • Under Getting Started, point to Aioli for more examples

Support loading biowasm modules locally or from CDN

Currently, .js files hosted on the CDN assume that the corresponding .wasm and .data files will also be pulled from the CDN, as opposed to from local files. It would be useful if biowasm could autodetect where the .js file is being loaded from and load the .wasm/.data files from the same place, whether that's locally or from the CDN.
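
The core of such autodetection could be as small as resolving the companion file against the URL of the .js file. A sketch, where the function name and the example file names are assumptions:

```javascript
// Sketch: resolve a companion file (.wasm or .data) relative to the URL the
// .js file was loaded from. The function name is hypothetical.
function resolveCompanion(scriptUrl, fileName) {
  // new URL(relative, base) replaces the last path segment of the base URL.
  return new URL(fileName, scriptUrl).href;
}

// Example inputs: one CDN-style URL and one local development URL.
const fromCdn = resolveCompanion(
  "https://cdn.biowasm.com/v2/samtools/1.10/samtools.js",
  "samtools.wasm"
);
const fromLocal = resolveCompanion(
  "http://localhost:8080/tools/samtools.js",
  "samtools.data"
);
```

In an Emscripten build, a helper like this could plausibly be wired into the `Module.locateFile` hook, with the script URL taken from `self.location.href` inside a WebWorker.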

"samtools index" did not work

Hello, I tried the samtools playground, but it seems samtools index did not work. Here is the message:

samtools index: failed to create or write index

Thanks.

Refactor + simplify

  • Move compilation out of Makefiles (difficult to modify the scripts)
  • Use git patch files instead of sed to denote changes to code base
  • Simplify repo structure
  • Add README

Correctly set the git version of repos

Biowasm generally patches the source code of tools so that they can be compiled to WebAssembly. This results in version numbers being appended with -dirty.

e.g. samtools --version outputs 1.10-dirty instead of just 1.10

We should patch up variables such as PACKAGE_STRING / PACKAGE_VERSION (and their equivalents in other tools) to remove the -dirty suffix.
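
As a rough sketch of what such a patch step could do, assuming the version ends up in a generated header as a `#define` (this varies by tool, and the function name is hypothetical):

```javascript
// Sketch: strip a trailing "-dirty" from version macros in header text.
// The macro names come from the issue; the function name is hypothetical.
function stripDirtySuffix(headerText) {
  return headerText.replace(
    // Match: #define PACKAGE_STRING|PACKAGE_VERSION "<anything>-dirty"
    /^(#define\s+PACKAGE_(?:STRING|VERSION)\s+".*?)-dirty(")/gm,
    "$1$2"
  );
}
```

Macros other than the two named above are left untouched, so each tool's equivalent variable would need its own pattern.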

Adding agc to biowasm

Hey, Robert! I was hoping to add the Assembled Genomes Compressor (agc) to biowasm, as it's really good at compressing collections of viral (as well as other) genomes. I was able to add the agc repo as src in my fork of the biowasm repo:

https://github.com/niemasd/biowasm/tree/main/tools/agc

But I'm struggling to actually get it to compile locally. I tried writing a compile.sh as follows (note that we only need the agc Makefile target):

emmake make agc \
    CXX=em++ \
    EXE=../build/agc.js \
    CFLAGS="-fPIC -m64 -mavx -msimd128 -std=c++17 $EM_FLAGS" \
    LIBS="-lm -lpthread"

Would you be able to help me with adding agc to biowasm?
