Comments (4)
Hi, Hieu! It's nice to hear from you.
We created the MaCoCu corpora using Bitextor and additional software from the Bitextor organization: https://macocu.eu
Not a corpus, but it seems warc2text is being used to train the models in the HPLT project: https://hplt-project.org
from bitextor.
Hi there!
It was also used in Europat (https://europat.net/), at least in part (https://aclanthology.org/2022.lrec-1.78/).
Hi Hieu!
There are a few more corpora created using Bitextor. The corpora created in the GoURMET project (https://opus.nlpl.eu/GoURMET.php) were produced using it. Also, the parallel corpora produced in the AbuMaTran project used Bitextor, even though it was an older version:
- https://www.clarin.si/repository/xmlui/handle/11356/1059
- https://www.clarin.si/repository/xmlui/handle/11356/1061
- https://www.clarin.si/repository/xmlui/handle/11356/1058
- https://www.clarin.si/repository/xmlui/handle/11356/1060
- https://www.clarin.si/repository/xmlui/handle/11356/1049
Thanks, guys. Good to know it's still ticking along.