
Comments (4)

Rascalov commented on July 24, 2024

Good day,

I have not tried the script myself yet, and I'm curious how it prevents Forvo from blocking an IP that is bulk-downloading all the audio for a language. I'll soon have some fresh throwaway IPs to test it on.
I'm also curious how the creator got a txt dump of all the words recorded on Forvo. That is very useful.

For now, I will mark this as an enhancement. This proposal could well solve the issues my original bulk scraper had, by having users give the addon a dictionary file to work with.
I just need some time to figure out how best to do this. Taking an .mdx dictionary file as input seems like my best bet.

from anki-simple-forvo-audio.

commented on July 24, 2024

The author of the script is from China. He is a member of the "FreeMDict" Telegram group:
https://t.me/freemdict

He obtained a list of 5.7 million URLs from Forvo using Python, which took him several weeks! He finished the work in August 2021 and shared the script with me.

The original author tried to scrape all the sounds from Forvo too quickly, and after querying 1 or 2 million URLs his IP was blocked in China. He then asked me to scrape from my IP. I did it slowly (at a speed of 400 Kb/s) and successfully queried all 5.7 million URLs 🥇

Forvo never blocked me. 👯 😃
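
For anyone who wants to reproduce the "slow" part, here is a minimal sketch of a bandwidth throttle in Python. It is not the script from FreeMDict; the class name `Throttle`, the `consume` method and the `fetch` callable in the usage note are all hypothetical, and reading "400 Kb/s" as roughly 400,000 bytes per second is an assumption. The idea is just to sleep whenever the total downloaded so far is ahead of the allowed average rate.

```python
import time


class Throttle:
    """Keep average throughput at or below `max_bytes_per_sec` (hypothetical helper)."""

    def __init__(self, max_bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(max_bytes_per_sec)
        self.clock = clock      # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.start = clock()    # moment the budget starts counting
        self.sent = 0           # total bytes recorded so far

    def consume(self, nbytes):
        """Record `nbytes` transferred and pause if we are ahead of the budget.

        Returns the number of seconds slept (0.0 if no pause was needed).
        """
        self.sent += nbytes
        # Earliest time at which `self.sent` bytes are allowed at the target rate.
        earliest = self.start + self.sent / self.rate
        delay = earliest - self.clock()
        if delay > 0:
            self.sleep(delay)
            return delay
        return 0.0
```

In a download loop you would call something like `data = fetch(url)` (where `fetch` is whatever function returns the audio bytes) and then `throttle.consume(len(data))` after each file. Note that this only caps the average rate since the start: after a long idle period it will allow a burst, so a stricter limiter may be needed if the site reacts to short spikes.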

In September I obtained 620,000 German pronunciations from Forvo and made an .mdx dictionary (on FreeMDict, private post).

Yesterday I ran the Python script and it is still working perfectly! I tried Russian, French and English, and those languages work OK.

Just follow the instructions on FreeMDict where the script was posted:
https://forum.freemdict.com/t/topic/8100 (private post, registration required)

Please contact me on the FreeMDict forum. My nickname there is "tovaremeterio": https://forum.freemdict.com/u/tovaremeterio/

I want to scrape several languages (including Russian). We could split the work to avoid duplication of effort :D


commented on July 24, 2024

@Rascalov

Someone from https://forum.ru-board.com/ (aleven) is downloading all the Russian pronunciations from Forvo.com. He might finish within 3-4 days.

Please let me know if you are interested in the sounds.


commented on July 24, 2024

@Rascalov All Forvo audios are now available for download:

https://forum.freemdict.com/t/topic/11947

You can use the Russian audios for your language learning :D
