Comments (11)
There is no way yet to operate with this in 1) QualityScore (uses string_view), or 2) TranslationResult (uses sentence mappings for string_views, and is missing word byte ranges).
@jerinphilip I am replacing all the references to std::string_view in the unified APIs with
ByteRange {
  size_t beginByteOffset_;
  size_t endByteOffset_;
}
(following the style guideline), where ByteRange follows the half-open interval [beginByteOffset_, endByteOffset_).
If you have any concerns, please raise them; otherwise I will submit a PR soon 👍
from bergamot-translator.
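To make the half-open convention concrete, here is a minimal sketch of how such a ByteRange could be mapped back onto the text it indexes. The helpers `byteLength` and `toView` are hypothetical names used only for illustration; they are not part of bergamot-translator, and the field names simply follow the proposal above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Half-open byte interval [beginByteOffset_, endByteOffset_), as proposed above.
struct ByteRange {
  std::size_t beginByteOffset_;
  std::size_t endByteOffset_;
};

// Hypothetical helper: number of bytes covered by the range.
// With a half-open interval this is simply end minus begin.
inline std::size_t byteLength(const ByteRange &range) {
  assert(range.beginByteOffset_ <= range.endByteOffset_);
  return range.endByteOffset_ - range.beginByteOffset_;
}

// Hypothetical helper: recover a string_view from the owning text.
// The returned view is only valid while `text` is alive and unmodified.
inline std::string_view toView(const std::string &text, const ByteRange &range) {
  assert(range.endByteOffset_ <= text.size());
  return std::string_view(text).substr(range.beginByteOffset_, byteLength(range));
}
```

Unlike a std::string_view, a range stored this way holds no pointer, so it stays meaningful when the text is copied or moved; the view is rebuilt on demand from whichever copy of the text is at hand.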
@abhi-agg You're writing the conversion code as well (Response -> TranslationResult), so feel free to do whatever suits your use-case. Don't say I didn't warn you that your current edits won't suffice. I'm absolving myself of the trouble of editing incomplete structs (and the unified API altogether, seeing where it's headed). 🤦‍♂️
Here's documentation for Annotation, AnnotatedBlob, Response.
struct ByteRange { size_t beginByteOffset_; size_t endByteOffset_; }
@abhi-agg: Also no _, it's a struct.
Also, wouldn't begin and end suffice, given that ByteRange is implied by the type? Can you let me know whether you're dependent on my implementation in marian::bergamot:: or bringing your own, in which case you will have to write the conversion code anyway? @abhi-agg
@jerinphilip I think we can safely close this issue as it is already being handled as a part of #77. As soon as that issue gets resolved, we will close this as well. I am fine with keeping it open until then though. What do you think?
I think this will require some back and forth, and #77 already covers too much ground, so let's leave this open to discuss binding specifics? The collapse (transfer of bindings) I did only covers a proof of concept that the extension isn't broken by the changes. We will still need to discuss and synchronize on alignments, quality scores, etc. from the bindings perspective, which I think we can use this issue for.
There are currently no bindings for ByteRange or Annotation for convenient use from WASM.
@abhi-agg Any updates on this front? How long will it take to have functioning ByteRanges, and perhaps bindings for Annotation?
Is it optimal to return a vector of things to JavaScript all at once, or to have individual accessors in Annotation the way it is now?
Is it optimal to return vector of things to JavaScript all at once or have individual accessors in Annotation the way it is now?
Jumping in here without sufficient context, but if this is a question about whether it is OK to use the multi-value-return functionality of wasm to receive an Array datum on the JS side, then the answer is that this works fine but it is not going to be very fast, and it's probably best avoided if the operation is very hot. The array allocation costs some, and then that particular functionality forces a slow path through the VM for both the call and return. It would be worth measuring the performance before committing to it.
The question is really which is better:
1. Copy an array of size_t to JavaScript, which then accesses that array.
2. For each element of the array, call from JavaScript into a C++ accessor returning a size_t.
I'm guessing 1 is better @lars-t-hansen? In general, we should be benchmarking where all the time is going.
Normally, I would say that the most performant way to return a bunch of int values from wasm to JS is to place them in a C array in memory while in wasm, and then return the address to JS and read the values out of the memory from JS. If JS needs to see this sequence as an array, then mapping an Int32Array onto the wasm memory at the right location is usually a pretty good solution. (There are some perf traps with Uint32Array, so avoid that if you can.)
Calling C++ accessors from JS is actually pretty fast, though, because we have a special fast path from JS jitted code directly into wasm, so this may be competitive in some cases.
+1 on benchmarking.
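As a rough sketch of the C++ half of the first suggestion, the offsets could be narrowed into one contiguous int32 buffer whose address is handed to JS. The function names `toInt32Buffer` and `bufferAddress` are hypothetical, and the sketch assumes every offset fits in 32 bits, which holds on wasm32 for any text smaller than 2 GiB. The JS side is not shown; it would construct an Int32Array over the wasm memory buffer at the returned byte address with the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: narrow size_t offsets into a contiguous int32 buffer,
// so that JS could view the same memory through an Int32Array.
inline std::vector<std::int32_t> toInt32Buffer(const std::vector<std::size_t> &offsets) {
  const std::size_t maxValue =
      static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
  std::vector<std::int32_t> buffer;
  buffer.reserve(offsets.size());
  for (std::size_t offset : offsets) {
    assert(offset <= maxValue);  // assumption: offsets fit in 32 bits
    buffer.push_back(static_cast<std::int32_t>(offset));
  }
  return buffer;
}

// Hypothetical helper: the address that would be returned to JS.
// The buffer must stay alive (and unresized) while JS reads from it.
inline std::uintptr_t bufferAddress(const std::vector<std::int32_t> &buffer) {
  return reinterpret_cast<std::uintptr_t>(buffer.data());
}
```

The main cost to watch is lifetime: JS reads the memory after the call returns, so the buffer has to be owned by something that outlives the read, and any growth of the wasm memory invalidates typed-array views created earlier.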
The question is really which is better:
1. Copy an array of size_t to javascript then it accesses that array.
2. For each element of the array, call from javascript to a C++ accessor returning a size_t.
It surely makes sense to consider performance, but I believe we should base performance discussions on how tag-translation pans out. We can benchmark JS<->Wasm performance once the tag-translation functionality is complete.
For now, I will provide JS bindings for the accessor functions so that the ByteRange for each sentence within the source and translated text can be accessed in the extension (btw, it is easy to expose the other one as well in the current setup).
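For illustration, the accessor style described here might look roughly like the following on the C++ side. This is a sketch under assumptions, not the actual Annotation class: `SentenceAnnotation`, `numSentences`, and `sentence` are hypothetical names, the `begin`/`end` fields follow the shorter naming suggested earlier in the thread, and the JS binding layer itself is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Half-open byte interval [begin, end).
struct ByteRange {
  std::size_t begin;
  std::size_t end;
};

// Hypothetical stand-in for Annotation: one ByteRange per sentence,
// exposed through per-element accessors rather than as a whole vector.
class SentenceAnnotation {
 public:
  explicit SentenceAnnotation(std::vector<ByteRange> sentences)
      : sentences_(std::move(sentences)) {}

  // Number of sentences; JS would loop from 0 to this value.
  std::size_t numSentences() const { return sentences_.size(); }

  // Byte range of the sentence at `index`, returned by value.
  ByteRange sentence(std::size_t index) const {
    assert(index < sentences_.size());
    return sentences_[index];
  }

 private:
  std::vector<ByteRange> sentences_;
};

}  // namespace sketch
```

With this shape, the extension makes one call per sentence, which is the second option from the question above; a bulk accessor could be added later if benchmarking shows the per-call overhead matters.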