Pedro Ortiz Suarez's Projects
Source code for the paper "Neural Architectures for Nested NER through Linearization"
My bad solutions to Advent of Code 2023
The Alephn Site
TensorFlow implementation of contextualized word representations from bi-directional language models
A notebook with CamemBERT experiments.
The website of CamemBERT
Tools to download and clean up Common Crawl data
🕸️ A simple way to extract data from Common Crawl
A collection of utilities related to CTC
🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools
A Deep Learning Framework for Text
Easily apply transformer models to downstream NLP tasks
An extremely fast entity-fishing client
An extremely simple and naïve program to deduplicate huge plain text files.
Terminal tool that converts file encodings to UTF-8
HPLT to WET conversion
ISO 639 and IETF Language Code Lookup Tool
A minimal & modern LaTeX template for your (bachelor's | master's | doctoral) thesis
Data and models for lemmatising and POS-tagging modern French (16-18th c.)
A new set of utilities to work with the OSCAR Corpus
Parquet2text
My personal website
Pedro's Personal Website in German
Pedro's Personal Website in English
Pedro's Personal Website in Spanish
Pedro's Personal Website in French