If you are new to NLP, or want a broader perspective on the project, you can start with our microsite.
Help us improve DaNLP
- 💬 Have you tried the DaNLP package? Then we would love to chat with you about your experiences from a company perspective. It takes approximately 20-30 minutes and requires no preparation. English or Danish, as you prefer. Please leave your details here and we will reach out to arrange a call. We also welcome and appreciate any written feedback. Reach us at [email protected]
News
- 🔧 A first version of a spaCy model for sentiment analysis, trained using hard distillation from BERT, has been added to the repo; read about it in the docs
- 🆕 Version 0.0.9 has been released with an update of the storage host for the models and datasets hosted by DaNLP. This means that downloading models and datasets from the DaNLP host is broken in older pip versions of the package.
- 🇩🇰 Support for Danish in the new spaCy version 2.3. The progress on spaCy support can be followed in issue #3056. The spaCy model is trained using the DaNE and DDT datasets; read more about using spaCy through DaNLP here
Next up
- 📓 Example tutorials in Jupyter notebooks and getting-started guides are coming soon!
- 🔧 Improving the spaCy NER model using hard distillation of the BERT NER model
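Hard distillation, mentioned above, means the large teacher model (here BERT) labels unannotated text, and the smaller student model (here the spaCy model) is then trained on those predicted labels as if they were gold annotations. A purely illustrative toy sketch of the idea; the teacher rule, data, and student below are hypothetical stand-ins, not DaNLP's actual training code:

```python
from collections import Counter, defaultdict

def teacher_predict(text):
    # Stand-in teacher: the real one would be a BERT sentiment classifier.
    return "positive" if "god" in text else "negative"

unlabelled = ["en god film", "en kedelig film", "god mad", "kedelig aften"]

# Step 1: the teacher assigns hard (argmax) labels to the unlabelled corpus.
distilled = [(text, teacher_predict(text)) for text in unlabelled]

# Step 2: a tiny "student" learns word -> label counts from those labels.
counts = defaultdict(Counter)
for text, label in distilled:
    for word in text.split():
        counts[word][label] += 1

def student_predict(text):
    # Predict by majority vote over the labels each word was seen with.
    votes = Counter()
    for word in text.split():
        votes.update(counts[word])
    return votes.most_common(1)[0][0] if votes else "negative"
```

The student never sees human labels; its supervision comes entirely from the teacher's predictions, which is what lets a fast model approximate a slow one.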
To get started using DaNLP in your Python project, simply install the pip package. Note, however, that installing the pip package does not install all the underlying NLP libraries, because we want you to have the freedom to limit the dependencies to what you actually use.
To get started using DaNLP simply install the project with pip:
pip install danlp
Note that the installation of DaNLP does not install other NLP libraries such as Gensim, spaCy, Flair or Transformers. This keeps the installation as minimal as possible and lets you choose to e.g. load word embeddings with either spaCy, Flair or Gensim. Therefore, depending on the functions you need, you should install one or more of the following: pip install flair, pip install spacy and/or pip install gensim. You can check the requirements.txt file to see which versions the packages have been tested with.
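Because the backends are optional, calling a function whose backend you have not installed fails with an import error. The usual way a package guards such optional dependencies looks roughly like this; a generic sketch of the pattern, not DaNLP's actual internals:

```python
import importlib

def require_backend(module_name, pip_name=None):
    """Import an optional backend, with a helpful error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        raise ImportError(
            f"This function requires the optional dependency '{module_name}'. "
            f"Install it with: pip install {pip_name or module_name}"
        )

# A stdlib module is always importable, so this succeeds:
json = require_backend("json")
```

The error message points the user at the exact pip command, which is why a missing backend is only noticed when a function that needs it is called, not at install time.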
If you want to use the latest developments before they are released in a new pip package, or you want to modify the code yourself, then clone this repo and install from source:
git clone https://github.com/alexandrainst/danlp.git
cd danlp
pip install .
To install the dependencies used in the package with the tested versions:
pip install -r requirements.txt
To quickly get started with DaNLP and try out the models, you can use our Docker image. To start an ipython session, simply run:
docker run -it --rm alexandrainst/danlp ipython
If you want to run a <script.py> in your current working directory, you can run:
docker run -it --rm -v "$PWD":/usr/src/app -w /usr/src/app alexandrainst/danlp python <script.py>
Natural Language Processing is an active area of research and it consists of many different tasks. The DaNLP repository provides an overview of Danish models for some of the most common NLP tasks.
The repository is under development, and this is the list of NLP tasks we have covered or plan to cover:
- Embedding of text
- Part of speech
- Named Entity Recognition
- Sentiment Analysis
- Dependency parsing
- Coreference resolution
- Lemmatization
If you are interested in Danish support for any specific NLP task you are welcome to get in contact with us.
We also recommend checking out Finn Årup Nielsen's awesome list of Danish NLP resources.
The number of Danish datasets is limited. The DaNLP repository provides an overview of the available Danish datasets that can be used for commercial purposes.
The DaNLP package allows you to download and preprocess datasets. You can read about the datasets here.
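The download step typically follows a cache-then-fetch pattern: look for the file in a local cache directory and only download on a cache miss, so a dataset is fetched at most once. A minimal stdlib sketch of that pattern; the directory name and helper are illustrative assumptions, not DaNLP's actual internals:

```python
import os

def cached_path(filename, fetch,
                cache_dir=os.path.join(os.path.expanduser("~"), ".danlp")):
    """Return a local path for `filename`, calling `fetch(path)` on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    if not os.path.exists(path):
        fetch(path)  # in a real package this would be an HTTP download
    return path
```

On the second call with the same filename, the file already exists and `fetch` is never invoked, which is what makes repeated loads of the same dataset fast.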
Here you will find examples and tutorials that show how to use NLP in Danish. This project keeps a Danish-language blog on Medium where we write about Danish NLP, and in time we will also provide some real cases of how NLP is applied in Danish companies.
To help you navigate, here is an overview of the structure of the GitHub repository:
.
├── danlp              # Source files
│   ├── datasets       # Code to load datasets with different frameworks
│   └── models         # Code to load models with different frameworks
├── docker             # Docker image
├── docs               # Documentation files for datasets and models
│   ├── imgs           # Images used in the documentation
│   └── models         # Overview of available models with code snippets and benchmark results
├── examples           # Examples, tutorials and benchmark scripts
│   └── benchmarks     # Scripts for reproducing benchmark results reported in the docs
└── tests              # Tests for continuous integration with Travis
If you want to contribute to the DaNLP repository and make it better, your help is very welcome. You can contribute to the project in many ways:
- Help us write good tutorials on Danish NLP use-cases
- Contribute with your own pretrained NLP models or datasets in Danish
- Notify us of other Danish NLP resources
- Create GitHub issues with questions and bug reports
The DaNLP repository is maintained by the Alexandra Institute which is a Danish non-profit company with a mission to create value, growth and welfare in society. The Alexandra Institute is a member of GTS, a network of independent Danish research and technology organisations.
The work on this repository is part of the Dansk For Alle performance contract allocated to the Alexandra Institute by the Danish Ministry of Higher Education and Science. The project runs over two years, 2019 and 2020, and an overview of the project can be found on our microsite.