Comments (11)

abhi-agg commented on July 25, 2024

There is no way yet to work with this in (1) QualityScore, which uses string_view, or (2) TranslationResult, which uses sentence mappings of string_views and is missing word byte ranges.

@jerinphilip I am replacing all references to std::string_view in the unified APIs with

struct ByteRange {
    size_t beginByteOffset_;
    size_t endByteOffset_;
};

(following the style guideline), where ByteRange represents the half-open interval [beginByteOffset_, endByteOffset_).

If you have any concerns, please raise them; otherwise I will submit a PR soon 👍
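A minimal sketch of the half-open convention, assuming the field names proposed above: a std::string_view that points into a base text can be turned into a ByteRange by subtracting the base pointer (giving the begin offset) and adding the view's length (giving the end offset). The helper toByteRange is hypothetical and not part of bergamot-translator.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of the proposed struct: a half-open interval [begin, end) of byte
// offsets into some base text. Field names mirror the proposal above.
struct ByteRange {
  std::size_t beginByteOffset_;
  std::size_t endByteOffset_;

  // end is excluded, so the length is a plain difference (no +1).
  std::size_t size() const { return endByteOffset_ - beginByteOffset_; }
};

// Hypothetical helper: convert a std::string_view that points into `text`
// into offsets relative to the start of `text`.
ByteRange toByteRange(std::string_view text, std::string_view piece) {
  // The view must lie entirely inside the base text for the offsets to make sense.
  assert(piece.data() >= text.data() &&
         piece.data() + piece.size() <= text.data() + text.size());
  std::size_t begin = static_cast<std::size_t>(piece.data() - text.data());
  return ByteRange{begin, begin + piece.size()};
}
```

Because end is excluded, size() is a plain difference, and a range ending at offset 12 shares no byte with a range beginning at offset 12.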

from bergamot-translator.

jerinphilip commented on July 25, 2024

@abhi-agg You're writing the conversion code as well (Response -> TranslationResult), so feel free to do whatever suits your use case. Don't say I didn't warn you that your current edits won't suffice. I'm absolving myself of the trouble of making edits to incomplete structs (and to the unified API altogether, seeing where it's headed). 🤦‍♂️

Here's the documentation for Annotation, AnnotatedBlob, and Response.

jerinphilip commented on July 25, 2024
struct ByteRange {
    size_t beginByteOffset_;
    size_t endByteOffset_;
};

@abhi-agg: Also, no trailing _ on the members; it's a struct.

jerinphilip commented on July 25, 2024

Also, wouldn't begin and end suffice, given that ByteRange is already implied by the type? Can you let me know whether you're depending on my implementation in marian::bergamot:: or bringing your own? In the latter case, you will have to write the conversion code anyway. @abhi-agg
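A sketch of the shorter naming, assuming plain begin/end members as suggested: the type name already says these are byte offsets. The slice helper is hypothetical; it illustrates that a range can be mapped back onto any copy of the same text, since it stores offsets rather than a pointer into one particular buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Variant with the shorter field names: plain public members, no trailing
// underscore, since this is a struct rather than a class with invariants.
struct ByteRange {
  std::size_t begin;  // first byte included
  std::size_t end;    // one past the last byte included

  std::size_t size() const { return end - begin; }
  bool empty() const { return begin == end; }
};

// Hypothetical helper: map a ByteRange back onto the text it indexes.
std::string_view slice(std::string_view text, ByteRange range) {
  assert(range.begin <= range.end && range.end <= text.size());
  return text.substr(range.begin, range.size());
}
```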

abhi-agg commented on July 25, 2024

@jerinphilip I think we can safely close this issue, as it is already being handled as part of #77. As soon as that issue is resolved, we will close this one as well; I am fine with keeping it open until then, though. What do you think?

jerinphilip commented on July 25, 2024

I think this will require some back and forth, and #77 already covers too much ground, so shall we leave this issue open to discuss binding specifics? The collapse (transfer of bindings) I did only serves as a proof of concept that the extension isn't broken by the changes. We will still need to discuss and synchronize on alignments, quality scores, etc. from the bindings perspective, which I think we can use this issue for.

There are currently no bindings for ByteRange or Annotation for convenient use from WASM.

jerinphilip commented on July 25, 2024

@abhi-agg Any updates on this front? How long will it take to have functioning ByteRanges, and perhaps bindings for Annotation?

Is it optimal to return a vector of things to JavaScript all at once, or to have individual accessors in Annotation, the way it is now?

lars-t-hansen commented on July 25, 2024

Is it optimal to return a vector of things to JavaScript all at once, or to have individual accessors in Annotation, the way it is now?

Jumping in here without sufficient context, but if this is a question about whether it is OK to use wasm's multi-value return functionality to receive an Array datum on the JS side, then the answer is that this works fine but is not going to be very fast, and it's probably best avoided if the operation is very hot. The array allocation has a cost, and that particular functionality forces a slow path through the VM for both the call and the return. It would be worth measuring the performance before committing to it.

kpu commented on July 25, 2024

The question is really which is better:

  1. Copy an array of size_t to javascript then it accesses that array.
  2. For each element of the array, call from javascript to a C++ accessor returning a size_t.

I'm guessing 1 is better @lars-t-hansen? In general, we should be benchmarking where all the time is going.

lars-t-hansen commented on July 25, 2024

Normally, I would say that the most performant way to return a bunch of int values from wasm to JS is to place them in a C array in memory while in wasm, then return the address to JS and read the values out of memory from JS. If JS needs to see this sequence as an array, then mapping an Int32Array onto the wasm memory at the right location is usually a pretty good solution. (There are some perf traps with Uint32Array, so avoid that if you can.)
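The C-array approach could look roughly like this on the C++ side. This is a hypothetical sketch, not the bergamot-translator API: flattenRanges packs each range as a consecutive begin/end pair into a std::vector<std::int32_t>, using signed 32-bit values per the note about Uint32Array. A binding would then hand flat.data() and flat.size() to JS, which could wrap them with new Int32Array(memory.buffer, ptr, length). The vector must stay alive while JS reads it, and the view typically has to be recreated if the wasm memory grows, because growing detaches the old ArrayBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical: pack ranges as [begin0, end0, begin1, end1, ...] in one
// contiguous buffer that lives in wasm memory and can be viewed from JS.
std::vector<std::int32_t> flattenRanges(const std::vector<ByteRange> &ranges) {
  std::vector<std::int32_t> flat;
  flat.reserve(2 * ranges.size());
  for (const ByteRange &r : ranges) {
    // Offsets must fit in a signed 32-bit integer to be read back losslessly.
    assert(r.end <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
    flat.push_back(static_cast<std::int32_t>(r.begin));
    flat.push_back(static_cast<std::int32_t>(r.end));
  }
  return flat;
}
```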

Calling C++ accessors from JS is actually pretty fast, though, because we have a special fast path from JS jitted code directly into wasm, so this may be competitive in some cases.

+1 on benchmarking.

abhi-agg commented on July 25, 2024

The question is really which is better:

1. Copy an array of size_t to javascript then it accesses that array.

2. For each element of the array, call from javascript to a C++ accessor returning a size_t.

It certainly makes sense to consider performance, but I believe we should base the performance discussion on how tag translation pans out. We can benchmark JS<->Wasm performance once the tag-translation functionality is complete.

For now, I will provide JS bindings for the accessor functions so that the extension can access the ByteRange of each sentence in both the source and the translated text (btw, it is easy to expose the other one as well in the current setup).
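For comparison with a flattened array, the accessor style might look like the following sketch. SentenceRanges and its method names are hypothetical stand-ins, not the actual Annotation interface; the point is that each exported getter returns a single size_t, so JS makes one call per value, and one such object would exist per text (source and translation).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical accessor-style interface: a count plus per-index getters,
// the kind of functions a binding layer could export one by one.
class SentenceRanges {
 public:
  explicit SentenceRanges(std::vector<ByteRange> ranges) : ranges_(std::move(ranges)) {}

  std::size_t numSentences() const { return ranges_.size(); }

  ByteRange sentenceRange(std::size_t index) const {
    assert(index < ranges_.size());
    return ranges_[index];
  }

  // Scalar getters avoid returning a struct across the JS/wasm boundary.
  std::size_t sentenceBegin(std::size_t index) const { return sentenceRange(index).begin; }
  std::size_t sentenceEnd(std::size_t index) const { return sentenceRange(index).end; }

 private:
  std::vector<ByteRange> ranges_;
};
```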
