microsofttranslator / documenttranslation

Command Line tool and Windows application for document translation, a local interface to the Azure Document Translation service for Windows, macOS and Linux.

License: Other

C# 98.82% Batchfile 1.18%
document-translation translator translate microsoft-cognitive-services translator-resource azure-cognitive-services translation

documenttranslation's Introduction

Microsoft Document Translation

Translate local files or network files in many different formats, to more than 100 different languages. Supported formats include HTML, PDF, all Office document formats, Markdown, MHTML, Outlook .MSG, XLIFF, CSV, TSV and plain text. The complete list of document formats is here.

You can select up to 1000 files and translate them to one or more different languages with a single command. The Windows UI lets you conveniently select source files, one or more target languages, and the folder in which to deposit the translations. It comes with a command line utility that does the same thing using a command line interface. Document Translation uses the Azure Translator Service to perform the translations. You need an Azure subscription, and you need to register a Translator resource as well as a storage resource. The documentation gives detailed instructions on how to obtain those.

For the translation you can specify a glossary (custom dictionary) to use. You can also make use of a custom translation system you may have built with Custom Translator.

You can manage the credentials for accessing the Azure services in Azure Key Vault: the app will read them from there, based on your identity. This is useful if you want to manage the credentials centrally.

Works with Azure sovereign clouds.

Document Translation UI

The main UI provides document translation: Multiple documents to multiple languages.

Text Translation UI

A simple copy-and-paste text translation interface is present in the Windows UI.

Download

A Windows installer (.MSI) and signed binaries (.ZIP) for manual installation on other OSes are provided in the releases folder.

Documentation

See the complete documentation of the tool.

The documentation is stored in the /docs folder of the project.

Implementation

Document Translation is written and compiled for .NET 6. The command line utility should be compatible with other platforms running .NET 6, namely macOS and Linux. It has been tested on Windows 10, Windows 11 and Mac OS X at this point. Please let us know via an issue if you find problems with other platforms running .NET 6. Signed binaries are provided in the releases folder. To compile it yourself, install the .NET 6 SDK, then build and run the tool in Visual Studio 2022.

This tool makes use of the Azure Document Translation service. The Azure Document Translation service translates a set of documents that reside in an Azure storage container, and delivers the translations in another Azure storage container. This app provides a local interface to that service, allowing you to translate a locally residing file or folder and receive the translations of these documents in a local folder. The tool uploads the local documents, invokes the translation, monitors the translation progress, downloads the translated documents to your local machine, and then deletes the containers from the service. Each run is independent of all others, because the tool gives the containers it uses a unique name within the common storage account. Multiple people may run translations concurrently, using the same credentials and the same storage account.
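The per-run isolation described above can be illustrated with a minimal sketch. It is not the tool's actual code: the function name, the `-src`/`-tgt`/`-gls` suffixes, and the assumption of separate source, target and glossary containers are hypothetical. The only external fact relied on is the Azure Blob Storage naming rule for containers (3 to 63 characters; lowercase letters, digits and non-consecutive hyphens).

```python
import re
import uuid

# Azure Blob Storage container names: 3-63 characters, lowercase letters,
# digits and hyphens, starting and ending with a letter or digit, and
# without consecutive hyphens.
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def make_run_containers(run_id=None):
    """Return per-run container names (hypothetical naming scheme)."""
    run_id = run_id or uuid.uuid4().hex  # 32 lowercase hex characters
    names = {
        "source": f"{run_id}-src",
        "target": f"{run_id}-tgt",
        "glossary": f"{run_id}-gls",
    }
    for name in names.values():
        if not (3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name)):
            raise ValueError(f"invalid container name: {name}")
    return names


# Two runs against the same storage account never share a container.
first = make_run_containers()
second = make_run_containers()
print(set(first.values()).isdisjoint(second.values()))
```

Because every run draws a fresh random identifier, concurrent users of one storage account do not need to coordinate with each other.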

Project 'doctr' contains the command line processing based on Nate McMaster's Command Line Utilities. All user interaction is handled here. Project 'DocumentTranslationService' contains three relevant classes: 'DocumentTranslationService' handles all the interaction with the Azure service. 'DocumentTranslationBusiness' handles the local file operations and business logic. 'Glossary' handles the upload of the glossary, when a glossary is specified.

Works with Azure sovereign clouds. The app accepts fully qualified service endpoints.

Privacy and Security

This client side app is a lightweight frontend to the Azure Document Translation service. The Azure Document Translation service reads the documents to be translated from Azure Blob Storage and deposits the translated documents into Azure Blob Storage. All processing and storage is within the user-provided accounts. This app does not have its own Azure credentials; the user supplies the identities and authentication for Azure services. In a successful run, the app uploads the user-supplied local documents to Azure Blob Storage in the user's Azure account, initiates the translation, waits for completion, downloads the translated documents to the user-specified location, and then deletes all original and translated copies of the documents from Azure Blob Storage. In an unsuccessful run the deletion may be skipped, leaving abandoned storage containers behind. On average, every 10th run of the app automatically deletes any left-behind storage containers that are older than one week. The command doctr clear forces a deletion of storage containers older than one week. To delete storage containers sooner, the user has to delete them manually within the Azure storage account. Deleting them sooner may disrupt the translation runs of other users using the same Azure credentials.
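The cleanup policy above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: it assumes that "on average every 10th run" means a random one-in-ten draw per run, and it represents containers as a plain mapping from a hypothetical container name to a last-modified time rather than calling the storage service.

```python
import random
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)   # "older than one week", per the text above
CLEANUP_PROBABILITY = 0.1     # assumption: a random 1-in-10 draw per run


def should_clean_up(rng=random):
    """Decide whether this run also sweeps abandoned containers."""
    return rng.random() < CLEANUP_PROBABILITY


def stale_containers(containers, now, max_age=MAX_AGE):
    """Return the names of containers last modified more than max_age ago.

    `containers` maps a container name to its last-modified time.
    """
    return sorted(name for name, modified in containers.items()
                  if now - modified > max_age)


now = datetime(2023, 9, 15, tzinfo=timezone.utc)
containers = {
    "run-a-src": now - timedelta(days=8),   # left behind by a failed run
    "run-b-src": now - timedelta(hours=2),  # may belong to a run in progress
}
print(stale_containers(containers, now))  # ['run-a-src']
```

The one-week threshold is what keeps the sweep from touching containers that another user's run may still be using.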

The Azure privacy statement applies.

The app stores the Azure credentials unencrypted, in JSON format, in a settings file named appsettings.json in the user's app settings folder: C:\Users\<user>\AppData\Roaming\Document Translation on Windows, and the /usr/ folder on macOS and other Unix flavors.
To avoid storing any Azure credentials on the client, please use the Azure Key Vault. In this case only the URL to the customer's Key Vault is stored in the user's settings file. Other Azure credentials are stored in the user's key vault. See the Key Vault section in the Document Translator documentation. The app stores UI settings (not credentials) in the uisettings.json file in the user settings folder. It stores a log of the last run in docTrLog.txt in the same folder. It stores references to custom translation systems in CustomCategories.json, also in the same folder.

The app does not create any local copies of the original or translated documents, not even temporary ones, EXCEPT in the case of document formats that are locally supplied. As of September 2023 the locally supplied formats are SubRip (SRT) and WebVTT (VTT). For these formats the app creates a temporary file in the user's temp folder, storing the content of the file to be translated in Markdown format. It also creates a temporary file in Markdown format in the target folder for translated documents. A successful run of the translation deletes the temporary files. While converting between the local format and Markdown, the app processes the content of the file in memory.

The app does not include any telemetry or instrumentation. It does not report usage to any service.

Issues

Please submit an issue for questions or comments, and of course for any bug or problem you encounter here.

Contributions

Please contribute your bug fixes and functionality additions. Submit a pull request. We will review and integrate quickly, or reject with comments.

Future plans

  • Option to extend the set of file formats with format conversions that are processed locally, as a library within this tool.
  • Web interface with .NET 6 MAUI.
  • A shared storage for the glossary, so that multiple clients can refer to a single company-wide glossary.

Credits

The tool uses the following NuGet packages:

  • Nate McMaster's Command Line Utilities for the CLI command and options processing.
  • Azure.Storage.Blobs for the interaction with the Azure storage service.
  • Azure.AI.Translation.Document, a client library for the Azure Document Translation Service.
  • Azure.Identity for authentication to Key Vault.
  • Azure.Security.KeyVault.Secrets for reading the credentials from Azure Key Vault.

Our sincere thanks to the authors of these packages.

documenttranslation's People

Contributors

chriswendt1, jann-skotdal, martinskeem

documenttranslation's Issues

Cognitive services key should fail the Test

Using a cognitive services key passes the Test, works with text translation, but fails the format enumeration.
Using the new zip file under releases I'm getting the following error: List of translatable extensions cannot be null. (Parameter 'Extensions')

Using the GUI this is the error I'm getting

Capture

Originally posted by @parclarke in #16 (comment)

Glossary to use option error encountered

Unfortunately when using the glossary, this error pops up "Glossary element must contain only 2 nodes (source and target)". Better understanding and format of the glossary file would be much appreciated.
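The error message suggests that every glossary entry must consist of exactly a source term and a target term. As a rough aid, and assuming the glossary is a tab-separated (TSV) file, the sketch below flags rows that do not have exactly two non-empty fields. The function name and the sample terms are hypothetical, and the check is independent of how the tool itself parses glossaries.

```python
import csv
import io


def check_glossary_tsv(text):
    """Return (line_number, problem) pairs for rows that are not source<TAB>target."""
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            problems.append((line_number, f"expected 2 fields, found {len(row)}"))
        elif not row[0].strip() or not row[1].strip():
            problems.append((line_number, "empty source or target"))
    return problems


# Line 1 is well formed; line 2 has a third field; line 3 has no target.
sample = "cloud\tWolke\nstorage account\tSpeicherkonto\textra\nglossary\n"
for line_number, problem in check_glossary_tsv(sample):
    print(f"line {line_number}: {problem}")
```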

Fails to run after initial install - Test failedkey

After installing and adding configurations, app closed twice after clicking Save and Test. Now I have entered service key, region, endpoint and I am receiving error Test failedkey. I have entered values as described in docs.

Document formatting causing issues

We are having some issues with document translation where if there are bolded words the formatting gets lost and the translation is terrible. It is like the bolded word becomes a heading. Is this due to the application or the service?

Crash with invalid credentials

Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
at TranslationService.CLI.Program.<>c.<b__1_19>d.MoveNext() in C:\work\Source\Repos\DocumentTranslation\DocumentTranslation.CLI\Program.cs:line 253

Originally posted by @parclarke in #16 (comment)

The target folder is not defaulting correctly

The target folder is meant to default to the previously chosen folder for this target language.
If you never translated to this target language, it is meant to default to ..
This works only AFTER choosing "Select folder". It should set the default without having to press the select button.
Exact same for the glossary.

Progress counting refinement

Currently progress is counted based on the number of files in this request, regardless of size. Not very useful for a single large file or a small number of differently sized files. Consider the size, and use the per-document status to report on the individual document.
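A minimal sketch of the proposed refinement, under the assumption that each document's size and a per-document completion fraction are available; the function names and the tuple layout are hypothetical. It contrasts counting finished files with weighting each document by its size.

```python
def weighted_progress(documents):
    """Overall progress in [0, 1], weighting each document by its size.

    `documents` is a list of (size_in_bytes, fraction_done) pairs, where
    fraction_done is the per-document progress between 0.0 and 1.0.
    """
    total = sum(size for size, _ in documents)
    if total == 0:
        return 0.0
    return sum(size * done for size, done in documents) / total


def count_progress(documents):
    """Progress by file count: finished files divided by all files."""
    if not documents:
        return 0.0
    return sum(1 for _, done in documents if done >= 1.0) / len(documents)


# One large file half done, two small files finished.
docs = [(9_000_000, 0.5), (500_000, 1.0), (500_000, 1.0)]
print(f"by count: {count_progress(docs):.2f}")
print(f"by size:  {weighted_progress(docs):.2f}")
```

In this example the count-based figure reports two thirds complete, while the size-weighted figure reports 55%, which reflects that most of the bytes belong to the file still in progress.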

Standard subscription required for document translation

I was receiving a 403 error when trying to translate documents. I was using the Free pricing tier for the Azure Translator and document translation requires the Standard pricing tier. This should be indicated in the setup documentation.

Document translation not working with S1 subscription

Hi,

I've created a Translator resource with a Free Trial subscription (that was the only option available). Then I upgraded to: Azure subscription 1.

After that I tried to use this tool to translate documents. I pass the Authentication test in Settings, but when I try to Select a file to translate I get the error "S1 o higher...".

Any idea what I should do to get it working?

Thanks much.

Some more useful information in Status bar

Status bar could show

  1. size of file - shown now
  2. count of characters translated - earlier bits had this - now missing
  3. time to upload = job created time - 'Translate Documents' button press time.
  4. time in queue = job start time - job created time
  5. time in translate = job complete time - job start time
  6. time to download = files downloaded to target folder - job complete time
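The four timings proposed in the list above are plain differences between consecutive timestamps. A small sketch, with hypothetical parameter names and made-up sample times:

```python
from datetime import datetime, timedelta


def phase_durations(pressed, created, started, completed, downloaded):
    """Split one run into the four phases listed above."""
    return {
        "upload": created - pressed,        # job created - button press
        "queue": started - created,         # job start - job created
        "translate": completed - started,   # job complete - job start
        "download": downloaded - completed, # files downloaded - job complete
    }


t0 = datetime(2023, 9, 1, 12, 0, 0)
durations = phase_durations(
    pressed=t0,
    created=t0 + timedelta(seconds=20),
    started=t0 + timedelta(seconds=65),
    completed=t0 + timedelta(seconds=185),
    downloaded=t0 + timedelta(seconds=200),
)
for phase, duration in durations.items():
    print(f"{phase}: {duration.total_seconds():.0f} s")
```

The four phases always add up to the total time between the button press and the last downloaded file.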

Translate document button is enabled when no files are selected

  • If you press "Select" Documents but then cancel out of the dialog without selecting anything, the "Translate Documents" button is enabled. It shouldn't be.
  • If you press the button "Translate Documents" without any files selected, the app crashes. Expected: No crash.

Stuck inside "In Progress"

I've been successfully using the translation app a few times earlier this month to translate txt files. It is a great tool. Thank you.

This week, however, I tried again to translate a simple txt file, but the app seems to get stuck in Running/In progress mode or in NotStarted and it never completes. I wait for a while (15-30 mins), but still nothing. I tried switching languages, with/without glossary, using a simple txt file with just a few sentences inside, but had no luck getting it to complete. The test in Settings passes fine and Translate Text also works fine. Thinking there could be some conflict, I also deleted all contents from the BLOB inside the storage resource and re-ran the translation, with no luck. Any idea what could be happening? Thanks much.

"documents to translate" select button not working

Hello,

Pressing the select button for "documents to translate" brings up no file picker dialog.
The app closes a few seconds later.

Azure Keys etc are configured. Test configuration shows ok, S1 pricing tier.
Tried Text translation in the application, this is ok.

Select picker for "Target folder" is ok, Dialog is shown.

Maybe somebody has an idea?

Regards

Marc

Robustness for wrong region

Currently a wrong region doesn't matter for document translation, but it matters for text translation. Be robust for wrong choice and add to "test" option.

Trouble running the project in visual studio code

Hi! I'm trying to open and run the DocumentTranslation-master code in visual studio to customize the dashboard. After opening the folder in visual studio, and running it, I get the errors:

The type or namespace name 'DocumentTranslationService' could not be found (are you missing a using directive or an assembly reference?) [doctr]

The type or namespace name 'DocTransAppSettings' could not be found (are you missing a using directive or an assembly reference?) [doctr]

The name 'AppSettingsSetter' does not exist in the current context [doctr]

May you please give some guidance on how to open and run the code in visual studio. Thank you for your time! I appreciate it
