This tool downloads bilingual captions (of titles, pictures, figures, etc.) from Wikipedia and builds a parallel corpus from them.
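The tool's internal pipeline is not described here, so the following is only a minimal sketch of the corpus-building idea, not the tool's actual code. It assumes the captions for each language have already been collected into dictionaries keyed by a shared identifier (hypothetically, an image file name). The function name `build_parallel_corpus` and the sample data are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def build_parallel_corpus(
    src_captions: Dict[str, str],
    tgt_captions: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Pair captions that share a key (assumed here: an image file name)."""
    pairs = []
    # Only keys present in both languages can yield an aligned pair.
    for key in sorted(src_captions.keys() & tgt_captions.keys()):
        # Collapse runs of whitespace so each caption fits on one line.
        src = " ".join(src_captions[key].split())
        tgt = " ".join(tgt_captions[key].split())
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


# Hypothetical sample data, not taken from Wikipedia.
en = {
    "Tower.jpg": "The tower at  night",
    "Map.png": "Map of the city",
    "Only_en.jpg": "No counterpart",
}
pl = {"Tower.jpg": "Wieża nocą", "Map.png": "Mapa miasta"}

corpus = build_parallel_corpus(en, pl)
for src, tgt in corpus:
    print(f"{src}\t{tgt}")
```

Captions without a counterpart in the other language are dropped, which keeps the two sides of the corpus aligned line by line.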
Feel free to use this tool, provided that you cite: Wołk K., Marasek K., “Unsupervised comparable corpora preparation and exploration for bi-lingual translation equivalents”, Proceedings of the 12th International Workshop on Spoken Language Translation, Da Nang, Vietnam, December 3-4, 2015, pp. 118-125.
For more information, see: http://arxiv.org/pdf/1512.01641
For any questions, contact Krzysztof Wolk: [email protected]