amld's Introduction

AMLD Africa Workshop, September 04, 2021

Organizers

Khalil Mrini, Imane Khaouja, Ihsane Gryech, Anass Sedrati and Abdelhak Mahmoudi

Moroccan Darija Wikipedia: Basics of Natural Language Processing for a Low-Resource Language

Description

Natural language processing (NLP) is a field in high demand, where research progresses quickly. Whereas language technology for languages like English and French is highly developed, low-resource languages (like most indigenous African languages) have been left behind and marginalized. This leaves many opportunities to create new tools for languages with few resources. In this tutorial, we take the example of Moroccan Darija, the national vernacular of Morocco. Our use-case dataset is the Moroccan Darija Wikipedia.

In the tutorial, participants first learn statistical tools for analyzing language. The tutorial covers NLP notions including text pre-processing and tokenization, n-gram language modeling, n-gram frequency, topic modeling, and word embeddings, through theoretical definitions and concrete examples in Python. Participants then move to the practice part of the workshop, in teams of 1 to 5 people. Each team is given the Moroccan Darija Wikipedia and analyzes the dataset from an angle of its choice. At the end of the workshop, the teams are invited to share their findings in a short presentation.
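To give a flavor of the notions listed above, here is a minimal sketch of tokenization, n-gram frequency, and a maximum-likelihood bigram estimate. It is not taken from the workshop labs: the function names, the toy English sentence, and the regex-based tokenizer are all assumptions made for illustration, and the sketch uses only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical toy text; the workshop itself uses the Moroccan Darija Wikipedia.
SAMPLE_TEXT = "the cat sat on the mat. the cat ate."


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    \\w+ is Unicode-aware in Python 3, so it also matches Arabic-script
    letters, but it splits on combining diacritics and punctuation.
    """
    return re.findall(r"\w+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous sequence of n tokens (sentence boundaries are ignored)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first), without smoothing."""
    context_counts = Counter(tokens[:-1])  # tokens that have a successor
    if context_counts[first] == 0:
        return 0.0
    return ngram_counts(tokens, 2)[(first, second)] / context_counts[first]


tokens = tokenize(SAMPLE_TEXT)
print(ngram_counts(tokens, 1).most_common(3))   # most frequent words
print(ngram_counts(tokens, 2).most_common(3))   # most frequent bigrams
print(bigram_probability(tokens, "the", "cat"))
```

An unsmoothed estimate assigns zero probability to any bigram that never occurs in the data, which is a real concern for a small corpus such as a low-resource Wikipedia; smoothing techniques are the usual remedy.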

Labs

Lab1: Wikipedia Darija Cleaning (open in Colab)

Lab2: Wikipedia Darija Topic Detection (open in Colab)

Lab3: NLP Tasks and Tools (open in Colab)

Slides

[The link to the presentation will be available soon.]
