
Comments (7)

llewelld commented on May 29, 2024

Personally, I really like this. I think the data are a worthwhile addition and I agree with @AoifeHughes that this is likely to be of interest to readers. I defer to @AoifeHughes and @yongrenjie, but as far as I'm concerned this addresses the original issue (it actually goes beyond what we asked for, but in a very positive way).

from kana.

LTLA commented on May 29, 2024

We have these timings in the bioRxiv version of the paper (see Table 2), but we had to cut them to fit within JOSS's word limits. I'll copy the relevant text here for your convenience:

To evaluate the efficiency of our Wasm strategy, we compared a kana analysis in the browser to that of a native executable compiled from the same C++ libraries. We analyzed several public scRNA-seq datasets (Table 1) using the default kana parameters for both approaches, i.e., QC filtering to 3 MADs from the median; PCA on the top 2500 HVGs to obtain the top 25 PCs; SNN graph construction with 10 neighbors and multi-level community detection at a resolution of 0.5; t-SNE with a perplexity of 30; UMAP with 15 neighbors and a minimum distance of 0.01; and 8 threads for all parallel sections (i.e., web workers for kana, see below). We collected timings on an Intel Core i7-8850H CPU (2.60GHz, 6 cores, 32 GB memory) running Manjaro Linux. For convenience, we ran the kana timings in batch using Puppeteer to control a headless Chrome browser (HeadlessChrome/98.0.4758.0).

Our results indicate that kana analyses took approximately 25-50% longer to run compared to the native executable (Table 2). This is consistent with other benchmarking results (Jangda et al., 2019) where the performance gap is attributed to Wasm's design constraints and the overhead of the browser's Wasm runtime environment. Our native executable was also created with a different compiler toolchain (GCC, instead of LLVM for the Wasm binary), where the same nominal optimization level (O3) may have different effects. These results suggest that some work may still be required to completely fulfill Wasm's promise of "near-native execution". Nonetheless, the current performance is largely satisfactory for kana, and will likely improve over time as browser implementations evolve along with our understanding of the relevant optimizations.

Dataset    Number of cells   kana             Native
Zeisel     3005              7.00 ± 0.10      5.60 ± 0.05
Paul       10368             17.59 ± 0.20     13.52 ± 0.38
Bach       25806             54.96 ± 1.13     43.33 ± 0.39
Ernst      68937             157.15 ± 7.39    114.67 ± 1.86
Bacher     104417            228.02 ± 2.85    170.32 ± 1.34
Zilionis   173954            272.265 ± 4.22   183.77 ± 2.46
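
As a quick cross-check of the "approximately 25-50% longer" summary, the relative overhead can be recomputed from the mean timings in the table. This is only a minimal sketch and not part of the original analysis: the names `TIMINGS`, `overhead_percent` and `summarize` are made up for illustration, and the ± spreads are ignored. Since only the kana/native ratio is used, the time unit does not matter.

```python
# Mean timings copied from the table above: (dataset, kana, native).
# The +/- spreads are deliberately ignored in this rough check.
TIMINGS = [
    ("Zeisel", 7.00, 5.60),
    ("Paul", 17.59, 13.52),
    ("Bach", 54.96, 43.33),
    ("Ernst", 157.15, 114.67),
    ("Bacher", 228.02, 170.32),
    ("Zilionis", 272.265, 183.77),
]


def overhead_percent(kana: float, native: float) -> float:
    """How much longer the kana run took relative to native, in percent."""
    return (kana / native - 1.0) * 100.0


def summarize(timings):
    """Map each dataset name to its overhead, rounded to one decimal place."""
    return {name: round(overhead_percent(k, n), 1) for name, k, n in timings}


if __name__ == "__main__":
    for name, pct in summarize(TIMINGS).items():
        print(f"{name:10s} {pct:5.1f}% longer than native")
```

On these means, the overhead falls between roughly 25% for the smallest dataset (Zeisel) and just under 50% for the largest (Zilionis), which matches the range quoted in the text.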

I can put all this stuff back in, though as you can see, there is a lot of associated commentary, e.g., description of the datasets, description of the laptop, description of the timing parameters and configuration, some discussion of the results. It would probably double the length of the current manuscript if I also added the text about memory usage. So I don't know whether JOSS (or more specifically, @AoifeHughes) would be willing to consider that.


LTLA commented on May 29, 2024

In the meantime, I just added some brief details in kanaverse/kana-paper#9. Hopefully this is a satisfactory compromise.


llewelld commented on May 29, 2024

Thanks for providing the detail and for the changes to your text. For me this is really interesting material and seems relevant for JOSS; however, I can also appreciate that it's a problem to include it in full given the word limit.

I'm personally happy with the compromise wording. It gives a flavour of the results (although I think the summary you give in your bioRxiv paper is more useful in practice: "Our results indicate that kana analyses took approximately 25-50% longer to run compared to the native executable").

Is there a reason you don't cite your bioRxiv paper directly? This would seem like a natural thing to do and provide an easy way for the reader to drill down further into the detail.


LTLA commented on May 29, 2024

> Is there a reason you don't cite your bioRxiv paper directly? This would seem like a natural thing to do and provide an easy way for the reader to drill down further into the detail.

Oh. Are we allowed to do that? Seems kinda recursive to cite a different version of the same paper.

I'm happy to do it. Just didn't know whether it was "proper".


AoifeHughes commented on May 29, 2024

I think this all seems fine. It would be great if you add the reference to the bioRxiv paper.

From a quick skim of the current draft of the paper, I would suggest that the Further comments section be shortened and the above-mentioned table be added. Having concrete performance numbers is very useful for a technical audience.


LTLA commented on May 29, 2024

Alright, as suggested, I added an abbreviated version of the table, added the citation to the bioRxiv paper, and trimmed out some stuff from the Further comments (though not too much, in order to still address #234 properly).

Changes are in kanaverse/kana-paper#11; I'll trigger a new build on the main issue.

