Giter Site home page Giter Site logo

Comments (12)

kpu avatar kpu commented on August 29, 2024 1

Reasons to maybe retain Unified API So, unified API, while it was told this was library to combine multiple translator apps - the true intentions I can understand now to be a JS conversion layer, given that's where WASM bindings are set up. The classes are envisioned in JavaScript and translated without much thought for people on the other side to C++ (lacking constructors and missing constructs). So the API sits both in C++ and JS at the same time. So long as WASM direction for the project exist, there will be shackles on the classes exposed to WASM, and I'd rather these be on UnifiedAPI and not marian::bergamot. Therefore, I propose we retain UnifiedAPI, but shift the responsibility of maintenance entirely to Mozilla. I propose Mozilla be responsible for the glue code (i.e, conversion code) because they're lagging way behind.

Because:

1. This frees us from JS constraints on the `marian::bergamot` classes which we'd additionally have to bother about. We don't want a replacement server to have these constraints imposed.

2. This reduces friction between me and @abhi-agg as both can vary independently so long as they don't break each other. Also adheres to Conway's law. There is no chicken egg problem anymore where I have to wait for a complete API + bindings or my changes break bindings and incomplete API which the bindings use.

3. Eliminates Mozilla (@abhi-agg) from having to review code they don't have incentive to speed up review of and fastrack development of features that are eventually needed (optimal batching policy, alignments, qualityscores until they're extracted).

In general my worries are (1) WASM constraining the classes I am developing on, (2) Lack of timely feedback from Mozilla despite being a stakeholder (like what happened with Alignments PR). If there can be resolutions in both, I'm happy to collapse.

@jerinphilip The primary purpose of this repo is integration. Get on with it. If you're just coding stuff for fun and expecting Mozilla to do all the integration, then there is almost zero value added from your code.

from bergamot-translator.

jerinphilip avatar jerinphilip commented on August 29, 2024

On Feasibility of Collapse

  1. TranslationRequest and marian::bergamot::Request cannot be collapsed, they're nowhere near parity and never will be (Request has marian internal structs, it's not going anywhere else). TranslationRequest doesn't even have the input string, it's an options dict on what they need in addition to translation. Good thing is TranslationRequest is useless everywhere and can be removed.
  2. marian::bergamot::Response, TranslationResult should be largely collapsable after resolving 3 issues (1 of which is solved in a pending PR). This is assuming that UnifiedAPI can't use QualityScore which cannot be constructed and Alignments which are missing. Our data structures will perhaps need some adjustments to maintain WASM compatibility.
  3. marian::bergamot::Service, TranslationModel is probably collapsable, given TranslationModel doesn't do anything as of now. This will however needs propagation due to isAlignmentSupported being present in places at extension. TranslationModel has a blocking API (no std::future<Response>, perhaps offends WASM?) with weird arguments (vector<string>) which creates issues with batching.

Reasons to maybe retain Unified API So, unified API, while it was told this was library to combine multiple translator apps - the true intentions I can understand now to be a JS conversion layer, given that's where WASM bindings are set up. The classes are envisioned in JavaScript and translated without much thought for people on the other side to C++ (lacking constructors and missing constructs). So the API sits both in C++ and JS at the same time. So long as WASM direction for the project exist, there will be shackles on the classes exposed to WASM, and I'd rather these be on UnifiedAPI and not marian::bergamot. Therefore, I propose we retain UnifiedAPI, but shift the responsibility of maintenance entirely to Mozilla. I propose Mozilla be responsible for the glue code (i.e, conversion code) because they're lagging way behind.

Because:

  1. This frees us from JS constraints on the marian::bergamot classes which we'd additionally have to bother about. We don't want a replacement server to have these constraints imposed.
  2. This reduces friction between me and @abhi-agg as both can vary independently so long as they don't break each other. Also adheres to Conway's law. There is no chicken egg problem anymore where I have to wait for a complete API + bindings or my changes break bindings and incomplete API which the bindings use.
  3. Eliminates Mozilla (@abhi-agg) from having to review code they don't have incentive to speed up review of and fastrack development of features that are eventually needed (optimal batching policy, alignments, qualityscores until they're extracted).

In general my worries are (1) WASM constraining the classes I am developing on, (2) Lack of timely feedback from Mozilla despite being a stakeholder (like what happened with Alignments PR). If there can be resolutions in both, I'm happy to collapse.

from bergamot-translator.

abhi-agg avatar abhi-agg commented on August 29, 2024

The following class pairs should collapse into one class.

1. `Service` and `TranslationModel`

2. `Response` and `TranslationResult`

3. `Request` and `TranslationRequest`

My understanding is the Translation classes came from the "Unified API" which was a draft of what the API could look like. During that exercise, we agreed that the API would change with actual implementation. It always does. An API written without implementation inevitably does not capture the full complexity of the task. It also adds unnecessary complexity with functionality that neither the backend nor front end actually uses. Moreover, it does not consider what is efficient to pass and there's no point to multiple format conversions.

Ideally a skeleton API class should be subsumed into the implemented class. There will be no converter between them, which is just a source of bugs.

I am in for merging the API class and the implemented class and I propose following:

My understanding is nobody is calling alignments and quality estimation, so it's easiest to merge there.

  1. Let's merge the classes you mentioned above before we add Alignment and Quality Estimation feature. @kpu IIUC, you are also proposing the same. Right?

  2. A single PR for this change please (containing nothing else but this change). It is possible because Service, Response and Request classes and their counterpart Translation* classes have feature parity if we consider current main branch.

  3. Let's maintain a clean interface (i.e. the public member functions etc.) for the API consumer. Interface gives an idea regarding what features are exposed by the class. Some examples to explain what I want to convey:

    1. We don't need these functions in the interface.
    2. Request class constructor currently requires vocabs_, segments, sourceTokenRanges which, technically an API consumer doesn't and shouldn't need to provide to create a translation request. In that sense, TranslationRequest class is more closer to what a translation request should look like from the perspective of the consumer of the API.

    I have no problem with whatever private data members + private member functions we require for the implementation.

  4. Let's bring everything to Translation* classes. Reasons:

    1. More complete documentation of the APIs exists in Translation* classes
    2. We don't need to change the extension code because it already uses TranslationModel class.
    3. We don't need to adapt JS bindings

    We can add all the data members and private member functions that we require in Translation* classes. The public member functions already give a clear idea on what a user can expect out of the APIs (we can add/modify them as well)

Please correct me if I am wrong anywhere above. I am open for discussion on these points 👍

@jerinphilip Above comments are by no means a criticism of your work. I just want to list here the concerns from the perspective of the API consumer so that both of us can work together and resolve them if we go ahead with this task 👍

Please leave Alignments and Quality estimation out of the whole conversation while discussing this issue because (IIUC) we want to make this change before having these features in the main branch.

from bergamot-translator.

jerinphilip avatar jerinphilip commented on August 29, 2024

@abhi-agg:

@kpu IIUC, you are also proposing the same. Right?

@abhi-agg A working implementation should not be taxed because its design differs.

3i We don't need these functions in the interface.

I should be let finish #66 following the alignments PR (#46).

It is possible because Service, Response and Request classes and their counterpart Translation* classes have feature parity if we consider current main branch.

Umm.. No, at least for Response - TranslationResult. Write-up. Response is richer. Also, I need to remove Histories (#53) so regression-tests don't break.

We don't need to change the extension code because it already uses TranslationModel class. We don't need to adapt JS bindings.

You need to, for incoming features. That's the whole point. With this suggestion I run around incomplete types and we are back to square 1. This is the problem which gets worse as we go forward. Extension is using isAlignmentSupported, TranslationRequest at places despite not having a concrete object which is passed around. This is what needs to stop soon.

More complete documentation of the APIs exists in Translation* classes

I'd claim otherwise - marian::bergamot, Translation*.

Please leave Alignments and Quality estimation out of the whole conversation while discussing this issue

No, I wait another month or maybe more and do more work overall while you still work with incomplete types? Merge it in, it's harmless (doesn't break anything since it operates in marian layer), use the Response with complete concrete types in your efforts to establish feature-parity between Response and TranslationResult. I'm even willing to help actively with the conversion code to poke where stuff is missing.

from bergamot-translator.

abhi-agg avatar abhi-agg commented on August 29, 2024

As I said above, I am flexible and open for proposals on how we should go about this change. Just want to agree on 1 strategy and execute this task to save everyone's time and energy.

from bergamot-translator.

jerinphilip avatar jerinphilip commented on August 29, 2024

The following needs to happen before a collapse to save everyone's time and energy.

  1. Shortlist PR needs to merge. It operates on Service if we change one way or the other it'll become a mess to bring it up to date.
  2. Alignment PR needs to merge. Not just QualityScore or Alignments it does the cleanup act one more level from mts untangling from deep marian:: internal dependencies. This needs to be followed by 2-3 refactors to get something stabler (which are mentioned in thread already). What's important here is that the constructs here, similar to how we came from mts -> Service are useful for you to iterate 2-3 times on at the other end and provide feedback.

I suggest we do the collapse in a 3 step approach:

  1. Get concrete implementations in marian::bergamot in. That you're missing this is the reason why you have abstract structs and classes which cannot materialize. Let's say these concretes can change, unlike the Unified API which seems to not budge.
  2. Establish equivalency between Translation* and marian::bergamot counterparts while retaining bindings on Translation*
  3. Once equivalency/parity is established, collapse either one on to the other.

Just want to agree on 1 strategy and execute this task to save everyone's time and energy.

@abhi-agg It needs to be acceptable to you that software can do a bunch of iterations until it becomes stable. If you remember, I salvaged the existing Service from mts, which I hope is an improvement. I gave you an unpolished intermediate in the interest of speeding up the iterations, which is why you have a working extension now. If I had waited for API freezing etc level perfection we wouldn't be where we are right now. Iterate until it converges to something stable is the strategy. It's not do once and not reinvent the wheel.

from bergamot-translator.

jerinphilip avatar jerinphilip commented on August 29, 2024

Edit: The following was my proposition for a collapse, in topological sorted order based on dependencies. I'm collapsing it to not bloat this thread.

Click to expand

Preliminary cleanup and provision of concrete classes

Response needs to shed weight to be able to collapse to TranslationResult. Histories was kept because alignment was requested by Mozilla and I needed to process it to get alignments. In the attempt to connect to Unified API, I have already processed these into structures enough to be able to remove Histories now to let that member go.

  • Alignments (PR #46) @jerinphilip : This does surgery to remove std::string_view, absl::string_view dependencies throughout. It has most content preprocessed as ByteRanges instead of string_views already. Recently it's reworked to reset Unified API so it won't break anything. That alignments and qualityscores of some form are extracted are at least proof of concept we have these ready. It is easier to reshape the preprocessed things into classes you bring in. @abhi-agg Please go complete review.
  • Make regression test apps independent of histories and vocab (which needs to go out) (#66). @jerinphilip
  • Abstract and hide vocabs, promise from marian::bergamot::Request through callback which builds Response (#65) @jerinphilip
  • Remove Histories to thin down Response (#53) @jerinphilip

Establish equivalency

TranslationModel - Service

  • TranslationModel @abhi-agg @jerinphilip

    • I don't understand the vector<string> use case, you need to elaborate on why you need it than specify the signature, it looks confusing. My guess is the browser wants to break texts and achieve more granular control but same speed for translation (at the moment). Please communicate.
    • Rename TranslationRequest to RequestParams or RequestOptions. It need to be a YAML string like TranslationModelConfig or give me a class like marian::Options, which can work like a dictionary. You have (key, value) pairs of switches which avoids a lot of boilerplate. No setX, getX for each X. I should be able to set and lookup keys easily. Adjust TranslationModel accordingly. Also WASM bindings.
    • Update the bergamot-translator-app to have RequestParams with legal inits (not a dummy), to be used now at Service accordingly.
  • Service @jerinphilip

    • Change translate signature to accept RequestParams in addition to the string text.
    • Remove the public vocab methods. I have already staged this in issues, you have to let me finish this behind alignments PR.
    • I see you cannot export std::future<Response>, which convert this into a blocking function and give it a different name until you have std::future capability.

I hope the above resolves Request TranslationRequest collapse as well (they can't be collapsed, they do different things). We just change the naming.

TranslationResult - Response

  • TranslationResult @abhi-agg
    • Prepare your Alignments and QualityScore structures from the concrete reference which should be in. You are welcome to reuse my definitions if you deem fit or reinvent/reshape the wheel. This time they should have proper constructors.
    • Experiment with what works for WASM bindings as well I'm assuming you will need to incorporate changes.
  • Response @jerinphilip: After your experiments with TranslationResult for what interoperates well with WASM as well
    • Change the internals to achieve parity with TranslationResult if remain differences with WASM experimentation.

Collapse

  • Move Bindings code @abhi-agg @jerinphilip
  • Check nothing breaks for rest of Service @jerinphilip
    • typedef TranslationModel Service
    • typedef Response TranslationResult
  • Merge with and use sed to standardize names for whatever is the consensus TranslationResult* or whatever.

from bergamot-translator.

kpu avatar kpu commented on August 29, 2024

I am in for merging the API class and the implemented class and I propose following:

Good.

My understanding is nobody is calling alignments and quality estimation, so it's easiest to merge there.

1. Let's merge the classes you mentioned above before we add Alignment and Quality Estimation feature. @kpu IIUC, you are  also proposing the same. Right?

I agree it would be less work to implement the alignment and quality estimating passing once rather than twice.

2. A single PR for this change please (containing nothing else but this change). It is possible because `Service`, `Response` and `Request` classes and their counterpart `Translation*` classes have feature parity if we consider current `main` branch.

The changes should be focused. i think by "single PR" you're really getting at it shouldn't have other stuff changed in them. Multiple steps towards this i.e. removing Vocab from Service as a focused change could happen, yes?

3. Let's maintain a clean interface (i.e. the [public member functions](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/service.h#L39-L68) etc.) for the API consumer. Interface gives an idea regarding what features are exposed by the class. Some examples to explain what I want to convey:
   
   1. We don't need [these functions](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/service.h#L59-L62) in the interface.

Agreed. I think removing those should be a first step.

   2. [Request class constructor](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/request.h#L39-L40) currently requires `vocabs_`, `segments`, `sourceTokenRanges` which, technically an API consumer doesn't and shouldn't need to provide to create a translation request. In that sense, [TranslationRequest](https://github.com/browsermt/bergamot-translator/blob/main/src/TranslationRequest.h#L43-L81) class is more closer to what a translation request should look like from the perspective of the consumer of the API.

I agree it should just be what the client needs. The consumer shouldn't be providing all that stuff in the constructor; the Service surely knows e.g. vocabs internally. Honestly this looks more like an internal representation of a request for the Service.

Aside: I don't know why it's vocabs_ when it should be vocab. But that's going anyway.

   I have no problem with whatever [private data members + private member functions](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/service.h#L72-L117) we require for the implementation.

4. Let's bring everything to Translation* classes. Reasons:

This is a merger. Let's take the language of dominance out of this. I don't care if the result of the merger begins with Translation or not.

   1. More complete documentation of the APIs exists in Translation* classes

I am embarrassed by the documentation of Request's constructor. Nonetheless, I believe our goal should be best design that works, which is then documented.

   2. We don't need to change the extension code because it already uses TranslationModel class.
   3. We don't need to adapt JS bindings

There will be changes to the API. After all, Mozilla asked for binary model loading which exists in C++ code and you want alignments. That said, I won't want to come in with a wrecking ball for the sake of it and want to be stable.

from bergamot-translator.

abhi-agg avatar abhi-agg commented on August 29, 2024

My understanding is nobody is calling alignments and quality estimation, so it's easiest to merge there.

1. Let's merge the classes you mentioned above before we add Alignment and Quality Estimation feature. @kpu IIUC, you are  also proposing the same. Right?

I agree it would be less work to implement the alignment and quality estimating passing once rather than twice.

Just to be sure, it means we make this change and merge it before we merge Alignment+QE PR. Right?

2. A single PR for this change please (containing nothing else but this change). It is possible because `Service`, `Response` and `Request` classes and their counterpart `Translation*` classes have feature parity if we consider current `main` branch.

The changes should be focused. i think by "single PR" you're really getting at it shouldn't have other stuff changed in them. Multiple steps towards this i.e. removing Vocab from Service as a focused change could happen, yes?

Yes and yes.

  1. Let's bring everything to Translation* classes. Reasons:

This is a merger. Let's take the language of dominance out of this. I don't care if the result of the merger begins with Translation or not.

I wrote in my comment before stating all the points that it is just a proposal and I am flexible and open for discussion on these points. Using the language of dominance wasn't my intention at all. I am sorry if it came out that way.

   1. More complete documentation of the APIs exists in Translation* classes

I am embarrassed by the documentation of Request's constructor. Nonetheless, I believe our goal should be best design that works, which is then documented.

   2. We don't need to change the extension code because it already uses TranslationModel class.
   3. We don't need to adapt JS bindings

There will be changes to the API. After all, Mozilla asked for binary model loading which exists in C++ code and you want alignments. That said, I won't want to come in with a wrecking ball for the sake of it and want to be stable.

I am fine with removing the Translation* classes and let the Service, Response and Request classes become the interface for this repository. I can modify the bindings code accordingly. No issues on this front.

3. Let's maintain a clean interface (i.e. the [public member functions](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/service.h#L39-L68) etc.) for the API consumer. Interface gives an idea regarding what features are exposed by the class. Some examples to explain what I want to convey:
   
   1. We don't need [these functions](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/service.h#L59-L62) in the interface.

Agreed. I think removing those should be a first step.

   2. [Request class constructor](https://github.com/browsermt/bergamot-translator/blob/main/src/translator/request.h#L39-L40) currently requires `vocabs_`, `segments`, `sourceTokenRanges` which, technically an API consumer doesn't and shouldn't need to provide to create a translation request. In that sense, [TranslationRequest](https://github.com/browsermt/bergamot-translator/blob/main/src/TranslationRequest.h#L43-L81) class is more closer to what a translation request should look like from the perspective of the consumer of the API.

I agree it should just be what the client needs. The consumer shouldn't be providing all that stuff in the constructor; the Service surely knows e.g. vocabs internally. Honestly this looks more like an internal representation of a request for the Service.

Aside: I don't know why it's vocabs_ when it should be vocab. But that's going anyway.

As you also said that these changes need to happen, I see following tasks at hand:

  1. Removing Translation* classes
  2. Refactoring/Cleaning up the Service, Request and Response classes to address issues just discussed above (cleaning up constructors, a cleaner interface for API user etc. etc.)
  3. Adjusting the WASM bindings

3rd is clearly my responsibility (but it can happen only after 1st and 2nd are done). I can do 1st as well. I would like to know if I can contribute in some way for 2nd.

If I missed any task in this list then please feel free to add it here 👍

from bergamot-translator.

abhi-agg avatar abhi-agg commented on August 29, 2024

@kpu @jerinphilip Please let me know which PRs belong to this issue so that I can review them 👍

On a side note, I noticed that you already merged the Alignment+QE to main branch although we agreed that we should intercept that change after this issue is resolved. If you are planing to do things differently than what we agreed here, a heads up would be nice 👍

from bergamot-translator.

jerinphilip avatar jerinphilip commented on August 29, 2024

Please let me know which PRs belong to this issue so that I can review them
@abhi-agg

In order:

  1. #85: This is a prerequisite to rework the existing marian-decoder-new's operations without vocab() and histories. Essentially a fix for what was earlier behaviour unaccounted for. This was discovered during your suggestion to remove vocab.
  2. #79: This removes vocab and fixes decoder to use a non-vocab based implementation, also removing histories.
  3. #80: Is step 2 in the task list you provided, removing Histories further from response (as it is a marian-internal structure and offends a public API). It also incorporates the desired features that's supposed to be the outcome of TranslationRequest. Hence, ResponseBuilder, the proposition, does the cleanup to reduce marian-dependencies further. This is a big PR, and I don't think I can sanely break it down without a broken intermediate. Parity (accepting TranslationRequest and more behind it) is achieved here. Which allows me to do 4.
  4. #87: This is transfer of bindings, skipping UnifiedAPI and plumbing at extension, getting the Service, Response classes to be compatible with extension and proof-of-concept that marian::bergamot is collapse ready. I don't expect this PR to merge if you have strong concerns with UnifiedAPI, and am okay about the time sunk.

from bergamot-translator.

abhi-agg avatar abhi-agg commented on August 29, 2024

In order:

1. #85: This is a prerequisite to rework the existing marian-decoder-new's operations without vocab() and histories. Essentially a fix for what was earlier behaviour unaccounted for. This was discovered during your suggestion to remove vocab.

Do I need to review this one (it was created 6 days ago and I am not a reviewer there)? In general (not just specific to this PR), if you want me to review any of the PRs then it would be great if you can add me as a reviewer explicitly. It would be more time efficient 👍

3. #80: Is step 2 in the task list you provided, removing Histories further from response (as it is a marian-internal structure and offends a public API). It also incorporates the desired features that's supposed to be the outcome of TranslationRequest. Hence, ResponseBuilder, the proposition, does the cleanup to reduce marian-dependencies further. This is a big PR, and I don't think I can sanely break it down without a broken intermediate. Parity (accepting TranslationRequest and more behind it) is achieved here. Which allows me to do 4.

4. #87: This is transfer of bindings, skipping UnifiedAPI and plumbing at extension, getting the Service, Response classes to be compatible with extension and proof-of-concept that `marian::bergamot` is collapse ready. I don't expect this PR to merge if you have strong concerns with UnifiedAPI, and am okay about the time sunk.

Both of these PRs are marked as draft. I take it as a WIP from your side. So not ready for review?

from bergamot-translator.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.