Comments (3)

TGuiMel commented on August 19, 2024

Issue 1 could be solved by extracting paragraphs instead of individual text elements, then merging all the text elements of each paragraph into a single string. You can split the paragraph into sentences if you would rather detect the source language and translate sentence by sentence.

The downside is that all formatting is lost. Perhaps the TranslateArray2 method, which returns alignment information, could be used to restore the original formatting in the translated string.
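A minimal sketch of the paragraph-merge idea, to make the trade-off concrete. The `Run` class and both helper functions are hypothetical stand-ins, not part of the project or of any Office library, and `translate` is any callable that maps a string to its translation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Run:
    """Hypothetical stand-in for one formatted text element inside a paragraph."""
    text: str
    bold: bool = False


def merge_paragraph(runs: List[Run]) -> str:
    """Join the text of every element in a paragraph into a single string."""
    return "".join(run.text for run in runs)


def translate_paragraph(runs: List[Run], translate: Callable[[str], str]) -> List[Run]:
    """Translate a paragraph as one string and write it back as a single element.

    Only the formatting of the first element survives; the per-element
    formatting of the remaining elements is lost.
    """
    if not runs:
        return []
    translated = translate(merge_paragraph(runs))
    return [Run(text=translated, bold=runs[0].bold)]
```

Here a paragraph made of three elements, one of them bold, comes back as a single non-bold element, which is the formatting loss described above.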

from documenttranslator-legacy.

jsypkens commented on August 19, 2024

Right now TranslateArray2 doesn't support neural network translation, which puts a slight damper on that approach.

For issue 2: yesterday I modified the detection as described above. It now runs the entire document through "DetectArray" first, then filters out any array elements whose detected language matches the target language of the translation. Only the remaining elements are passed through the translation, and the document is updated with the translations for those elements.

It seems like a decent approach, and anecdotally it appears to have resolved my issue with this one German document. We need to see how it performs on a wider variety of documents to determine if it's a scalable approach.
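Roughly, the flow looks like the sketch below. This is not the actual change: `detect_array` and `translate_array` are hypothetical callables standing in for the service's DetectArray and TranslateArray calls, and their signatures here are assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def translate_foreign_elements(
    elements: Sequence[str],
    target_language: str,
    detect_array: Callable[[Sequence[str]], List[str]],
    translate_array: Callable[[Sequence[str], str], List[str]],
) -> List[str]:
    """Translate only the elements that are not already in the target language."""
    # Detect the language of every element in one batch.
    detected = detect_array(elements)

    # Keep the indexes of elements whose language differs from the target.
    pending = [i for i, lang in enumerate(detected) if lang != target_language]
    if not pending:
        return list(elements)

    # Translate only the remaining elements.
    translations = translate_array([elements[i] for i in pending], target_language)

    # Write each translation back to its original position.
    result = list(elements)
    for i, translated in zip(pending, translations):
        result[i] = translated
    return result
```

Elements already in the target language are returned untouched, so they never go through the translation call.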

chriswendt1 commented on August 19, 2024

@jsypkens, did your approach succeed? Do you want to make a pull request with your change?
