Comments (7)
Personally, I really like this. I think the data are a worthwhile addition and I agree with @AoifeHughes that this is likely to be of interest to readers. I defer to @AoifeHughes and @yongrenjie, but as far as I'm concerned this addresses the original issue (it actually goes beyond what we asked for, but in a very positive way).
from kana.
We have these timings in the bioRxiv version of the paper (see Table 2), but we had to cut them out to fit into JOSS's word limits. I'll copy them here for your convenience:
To evaluate the efficiency of our Wasm strategy, we compared a kana analysis in the browser to that of a native executable compiled from the same C++ libraries. We analyzed several public scRNA-seq datasets (Table 1) using the default kana parameters for both approaches, i.e., QC filtering to 3 MADs from the median; PCA on the top 2500 HVGs to obtain the top 25 PCs; SNN graph construction with 10 neighbors and multi-level community detection at a resolution of 0.5; t-SNE with a perplexity of 30; UMAP with 15 neighbors and a minimum distance of 0.01; and 8 threads for all parallel sections (i.e., web workers for kana, see below). We collected timings on an Intel Core i7-8850H CPU (2.60GHz, 6 cores, 32 GB memory) running Manjaro Linux. For convenience, we ran the kana timings in batch using Puppeteer to control a headless Chrome browser (HeadlessChrome/98.0.4758.0).
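For readers who want the defaults above in one place, they can be collected into a single structure. This is a minimal sketch assuming plain dictionary keys of my own choosing; the names below are illustrative and do not mirror kana's actual API or configuration format:

```python
# Default kana analysis parameters, as described in the text above.
# Key names are illustrative only, not kana's real configuration schema.
default_params = {
    "qc": {"mads_threshold": 3},              # filter to 3 MADs from the median
    "pca": {"num_hvgs": 2500, "num_pcs": 25}, # top 2500 HVGs, top 25 PCs
    "snn_graph": {"num_neighbors": 10},
    "clustering": {"method": "multi-level", "resolution": 0.5},
    "tsne": {"perplexity": 30},
    "umap": {"num_neighbors": 15, "min_dist": 0.01},
    "threads": 8,                             # web workers in kana
}
```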
Our results indicate that kana analyses took approximately 25-50% longer to run compared to the native executable (Table 2). This is consistent with other benchmarking results (Jangda et al., 2019) where the performance gap is attributed to Wasm's design constraints and the overhead of the browser's Wasm runtime environment. Our native executable was also created with a different compiler toolchain (GCC, instead of LLVM for the Wasm binary), where the same nominal optimization level (O3) may have different effects. These results suggest that some work may still be required to completely fulfill Wasm's promise of "near-native execution". Nonetheless, the current performance is largely satisfactory for kana, and will likely improve over time as browser implementations evolve along with our understanding of the relevant optimizations.
| Dataset | Number of cells | kana | Native |
| --- | --- | --- | --- |
| Zeisel | 3005 | 7.00 ± 0.10 | 5.60 ± 0.05 |
| Paul | 10368 | 17.59 ± 0.20 | 13.52 ± 0.38 |
| Bach | 25806 | 54.96 ± 1.13 | 43.33 ± 0.39 |
| Ernst | 68937 | 157.15 ± 7.39 | 114.67 ± 1.86 |
| Bacher | 104417 | 228.02 ± 2.85 | 170.32 ± 1.34 |
| Zilionis | 173954 | 272.265 ± 4.22 | 183.77 ± 2.46 |
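As a quick sanity check on the "approximately 25-50% longer" summary, the per-dataset slowdown can be computed directly from the mean timings in the table (the numbers below are copied verbatim from it):

```python
# Mean timings (kana, native) per dataset, copied from the table above.
timings = {
    "Zeisel":   (7.00, 5.60),
    "Paul":     (17.59, 13.52),
    "Bach":     (54.96, 43.33),
    "Ernst":    (157.15, 114.67),
    "Bacher":   (228.02, 170.32),
    "Zilionis": (272.265, 183.77),
}

# Relative slowdown of kana over the native executable.
slowdown = {name: kana / native - 1 for name, (kana, native) in timings.items()}

for name, s in sorted(slowdown.items(), key=lambda kv: kv[1]):
    print(f"{name}: kana is {s:.0%} slower than native")
```

The ratios span roughly 25% (Zeisel) to 48% (Zilionis), consistent with the quoted range.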
I can put all this stuff back in, though as you can see, there is a lot of associated commentary, e.g., descriptions of the datasets, the laptop, and the timing parameters and configuration, plus some discussion of the results. It would probably double the length of the current manuscript if I also added the text about memory usage. So I don't know whether JOSS (or more specifically, @AoifeHughes) would be willing to consider that.
In the meantime, I just added some brief details in kanaverse/kana-paper#9. Hopefully this is a satisfactory compromise.
Thanks for providing the detail and for the changes to your text. For me this is really interesting material and seems relevant for JOSS; however I can also appreciate it's a problem to include it in full given the word limit.
I'm personally happy with the compromise wording. It gives a flavour of the results (although I think the summary you give in your bioRxiv paper is more useful in practice: "Our results indicate that kana analyses took approximately 25-50% longer to run compared to the native executable").
Is there a reason you don't cite your bioRxiv paper directly? This would seem like a natural thing to do and provide an easy way for the reader to drill down further into the detail.
> Is there a reason you don't cite your bioRxiv paper directly? This would seem like a natural thing to do and provide an easy way for the reader to drill down further into the detail.
Oh. Are we allowed to do that? Seems kinda recursive to cite a different version of the same paper.
I'm happy to do it. Just didn't know whether it was "proper".
I think this all seems fine. It would be great if you add the reference to the bioRxiv paper.
From a quick skim of the current draft of the paper, I would suggest that the Further comments section be reduced and the above-mentioned table be added. Having the concrete numbers for performance values is very useful for a technical audience.
Alright, as suggested, I added an abbreviated version of the table, added the citation to the bioRxiv paper, and trimmed out some stuff from the Further comments (though not too much, in order to still address #234 properly).
Changes are in kanaverse/kana-paper#11; I'll trigger a new build on the main issue.