Duolingo Vocab Lists

This repository is a collection of vocabulary lists from Duolingo courses.

Currently, the only course is Spanish (learned from English) in the "english-spanish" folder.

The list of words for a course can be downloaded in JSON and CSV format.

The main purpose of this project is to provide a means of getting the vocab in a format that is easy to import into a memorization tool. Tinycards does not seem to be maintained that well anymore (sadface).

Parsing other courses

Originally, I wrote this program to parse the words in the Spanish course from an awesome post in a Duolingo discussion (which is gone now because Duolingo sunset the forums 🥲). Thank you so much FieryCat for your inspiration from that post though!

However, the program now parses words from the website duome, which has a very comprehensive list of the words for each course. I switched to this approach based on the detailed blog post by Melle Dijkstra.

To use the provided code:

Step 1: Clone it.

Run git clone git@github.com:jmbeach/duolingo-vocab-lists.git

Step 2: Install Dependencies / Build

Run yarn install.

Run yarn build.

Step 3: Get Course Data

Log in to Duolingo.com. Open the network tab of your browser's developer tools and look for a request that has the word acquisition in it. Open the Response tab and copy all of the text. Save the text to a local file (for example, english-spanish/raw-course-data.json).

Note: This file may have sensitive data in it. Be sure to delete anything sensitive before committing it to your repo.
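
One way to clean the file before committing is to drop suspicious properties programmatically. The following is a minimal sketch, not part of this repository: the function name stripSensitive and the key names in SENSITIVE_KEYS are assumptions for illustration, so inspect your own file to see which fields it actually contains.

```typescript
// Hypothetical key names; check your own raw-course-data.json for the real ones.
const SENSITIVE_KEYS = new Set(["email", "jwt", "token", "username"]);

// Recursively copy a parsed JSON value, dropping any property whose key is listed above.
function stripSensitive(value: unknown, keys: Set<string> = SENSITIVE_KEYS): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => stripSensitive(item, keys));
  }
  if (value !== null && typeof value === "object") {
    const cleaned: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      if (!keys.has(key)) {
        cleaned[key] = stripSensitive(child, keys);
      }
    }
    return cleaned;
  }
  // Strings, numbers, booleans, and null pass through unchanged.
  return value;
}

const sample = { email: "me@example.com", skills: [{ name: "Basics", token: "abc" }] };
console.log(JSON.stringify(stripSensitive(sample))); // {"skills":[{"name":"Basics"}]}
```

A key-name filter only catches what you list, so it is still worth reading through the result by hand.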

Step 4: Get Vocab HTML

Go to https://duome.eu/<your-user-name>/progress. The skills tab contains an in-order list of all of the skills in your language. In the Chrome developer console, use document.querySelectorAll('.click.skill') to select every skill and click each match so that every item on the page is expanded.

Once you've run the querySelectorAll command, save the page to an HTML file. NOTE: you may have to clean the HTML file to ensure there is only one root node. For example: only <body> as the root.
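
Note that querySelectorAll on its own only selects elements; something still has to click them. Here is a minimal sketch of that step, assuming each element matching .click.skill expands when clicked. The helper expandAll and the two small interfaces are hypothetical and exist only so the sketch type-checks outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on browser typings.
interface Clickable {
  click(): void;
}
interface Queryable {
  querySelectorAll(selector: string): ArrayLike<Clickable>;
}

// Click every element matching the selector and return how many were clicked.
function expandAll(root: Queryable, selector = ".click.skill"): number {
  const items = Array.from(root.querySelectorAll(selector));
  items.forEach((item) => item.click());
  return items.length;
}
```

The developer console runs plain JavaScript, so there the equivalent one-liner would be document.querySelectorAll('.click.skill').forEach((el) => el.click()).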

Step 5: Download Translations

Run deno run src/index.ts download -f <path-to-vocab-html-file> -s <path-to-course-data-json> [-a <google-api-key>] to download the translations to a JSON file.

The translator first looks for translations of each word on Duolingo.com. If it can't find one, it falls back to Google Translate. To use Google Translate, you'll have to get an API key and put it into a .env file like this:

GOOGLE_TRANSLATE_API_KEY=<my-api-key>

NOTE: Make sure to change your desired language pair inside TranslationDownloader (it's es, en by default).
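
The lookup order described in this step (Duolingo first, Google Translate only when Duolingo has no answer) can be sketched as follows. This is an illustration of the pattern, not the code in TranslationDownloader: the Lookup type, translateWithFallback, and both stub lookups are hypothetical names, and the stubs return canned strings instead of calling any service.

```typescript
// A lookup resolves to a translation, or to undefined when it has none.
type Lookup = (word: string) => Promise<string | undefined>;

// Try the primary source first; only call the fallback when the primary has no answer.
async function translateWithFallback(
  word: string,
  primary: Lookup,
  fallback: Lookup,
): Promise<string | undefined> {
  const fromPrimary = await primary(word);
  if (fromPrimary !== undefined) {
    return fromPrimary;
  }
  return fallback(word);
}

// Stand-in lookups for illustration only; they do not contact Duolingo or Google.
const duolingoStub: Lookup = async (word) =>
  ({ gato: "cat" } as Record<string, string>)[word];
const googleStub: Lookup = async (word) => `machine translation of ${word}`;

translateWithFallback("perro", duolingoStub, googleStub).then((t) => console.log(t));
```

Because the fallback is only invoked on a miss, words that Duolingo already covers never consume Google Translate quota.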

Step 6: Generate CSV Files

Finally, run deno run src/index.ts create -f <path-to-json-file> to turn the translations into CSVs.
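
For readers curious what such a conversion involves, here is a minimal sketch of turning translation entries into CSV text. The Entry shape and the word,translation header are assumptions made for illustration; the JSON produced by the download step and the columns written by the create command may differ.

```typescript
// Hypothetical entry shape; the real JSON from the download step may differ.
interface Entry {
  word: string;
  translation: string;
}

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded double quotes (the convention described in RFC 4180).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Build one header row followed by one row per entry.
function toCsv(entries: Entry[]): string {
  const rows = entries.map((e) => [e.word, e.translation].map(csvField).join(","));
  return ["word,translation", ...rows].join("\n");
}

console.log(toCsv([{ word: "hola", translation: "hello, hi" }]));
// word,translation
// hola,"hello, hi"
```

Quoting matters here because a translation often lists several meanings separated by commas, which would otherwise be read as extra columns by a flashcard tool.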

If the new CSVs aren't in this repository yet, please feel free to create a pull request to add them. Currently, I've only processed Spanish (for English speakers), but I would love to get other languages in here.

Step 7 (Optional): Create Combined CSV Files

It might be preferable for some people to have all of the CSV files for each section combined into one file. To generate these, run deno run src/index.ts combine -p <path to language directory>.
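
Conceptually, combining the section files means concatenating their rows while keeping a single header. The sketch below shows that idea on in-memory strings; combineCsv is a hypothetical helper, and it assumes every file starts with the same header row and that no field contains an embedded line break. The actual combine command may work differently.

```typescript
// Merge several CSV documents that share the same header row, keeping the header once.
function combineCsv(files: string[]): string {
  const combined: string[] = [];
  files.forEach((content, index) => {
    // Split into lines and drop empty ones, such as a trailing newline.
    const lines = content.split(/\r?\n/).filter((line) => line.length > 0);
    // Keep the header from the first file only.
    combined.push(...(index === 0 ? lines : lines.slice(1)));
  });
  return combined.join("\n");
}

const basics = "word,translation\nhola,hello\n";
const animals = "word,translation\ngato,cat\n";
console.log(combineCsv([basics, animals]));
// word,translation
// hola,hello
// gato,cat
```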
