
Forvo downloader (anki-addons, 8 comments, closed)

ospalh commented on August 18, 2024
Forvo downloader

from anki-addons.

Comments (8)

ospalh commented on August 18, 2024

> it crash(es)

Really? With a core dump and everything?

> at this point

What point? Do you have a stack trace?

Anyway. The forvo.py file is only an inspiration. It is not intended to download anything.


basilcool commented on August 18, 2024

For some reason urllib2 makes the request in UTF-16 (or UTF-8); I'm not sure of the proper name, but it is a 4-byte encoding.
It happens only when forvo.py is called from Anki. When I call it from a terminal, urllib2 makes the usual request (in ASCII, I guess).
tcpdump (forvo.py called from Anki):
0000 47 00 00 00 45 00 00 00 54 00 00 00 20 00 00 00 G...E...T... ...
0010 2f 00 00 00 61 00 00 00 63 00 00 00 74 00 00 00 /...a...c...t...
0020 69 00 00 00 6f 00 00 00 6e 00 00 00 2f 00 00 00 i...o...n.../...
0030 77 00 00 00 6f 00 00 00 72 00 00 00 64 00 00 00 w...o...r...d...
0040 2d 00 00 00 70 00 00 00 72 00 00 00 6f 00 00 00 -...p...r...o...
0050 6e 00 00 00 75 00 00 00 6e 00 00 00 63 00 00 00 n...u...n...c...
0060 69 00 00 00 61 00 00 00 74 00 00 00 69 00 00 00 i...a...t...i...
0070 6f 00 00 00 6e 00 00 00 73 00 00 00 2f 00 00 00 o...n...s.../...
0080 66 00 00 00 6f 00 00 00 72 00 00 00 6d 00 00 00 f...o...r...m...
0090 61 00 00 00 74 00 00 00 2f 00 00 00 6a 00 00 00 a...t.../...j...
00a0 73 00 00 00 6f 00 00 00 6e 00 00 00 2f 00 00 00 s...o...n.../...
00b0 6b 00 00 00 65 00 00 00 79 00 00 00 2f 00 00 00 k...e...y.../...
...

The usual request, when called from the terminal:
0000 47 45 54 20 2f 61 63 74 69 6f 6e 2f 77 6f 72 64 GET /action/word
0010 2d 70 72 6f 6e 75 6e 63 69 61 74 69 6f 6e 73 2f -pronunciations/
0020 66 6f 72 6d 61 74 2f 6a 73 6f 6e 2f 6b 65 79 2f format/json/key/
...

I guess the same happens with wiktionary.org (wiktionary.py), but I have not checked it yet.
Maybe it is additional encoding by httplib2? I found it in /usr/share/anki/thirdparty/httplib2
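
The byte pattern in the two dumps can be reproduced with a short sketch. It is written for Python 3 (an assumption; the thread uses Python 2), and the helper name hexdump_prefix is made up for illustration. It only shows that the first dump looks like the same text with four bytes per character (UTF-32, little-endian); it does not identify where in Anki or urllib2 the conversion happens.

```python
def hexdump_prefix(data, width=16):
    """Return the first `width` bytes of `data` as space-separated hex."""
    return " ".join("{:02x}".format(byte) for byte in data[:width])


# Start of the request line seen in both dumps (API key omitted).
request_line = "GET /action/word-pronunciations/format/json/key/"

as_ascii = request_line.encode("ascii")      # one byte per character
as_utf32 = request_line.encode("utf-32-le")  # four bytes per character, no BOM

print(hexdump_prefix(as_ascii))
# 47 45 54 20 2f 61 63 74 69 6f 6e 2f 77 6f 72 64
print(hexdump_prefix(as_utf32))
# 47 00 00 00 45 00 00 00 54 00 00 00 20 00 00 00
```

The first line matches the terminal dump and the second matches the dump taken while running inside Anki.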


basilcool commented on August 18, 2024

>> it crash(es)

> Really? With a core dump and everything?

No dump. I checked stderr with sys.stderr = file("/tmp/py.log", "ab", 1) (I guess this is correct; I'm a newbie in Python) and found no warning or error for the request.

>> at this point

> What point. Do you have a stack trace?

I mean when forvo.py calls self.get_data_from_url() from downloader.py.

> Anyway. The forvo.py file is only an inspiration. It is not intended to download anything.

It downloads correctly when I call it from the console. I really need it to download Czech pronunciations.
Let me know if you have any idea about the request encoding.
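
As an alternative to redirecting sys.stderr, the traceback can be written to a file at the point of the call. This is a minimal Python 3 sketch, not code from the add-on; run_logged and the log path are hypothetical names chosen for illustration.

```python
import traceback


def run_logged(func, log_path, *args, **kwargs):
    """Call func; on failure append the full traceback to log_path, then re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        with open(log_path, "a") as log_file:
            log_file.write(traceback.format_exc())
        raise
```

A call such as run_logged(downloader.get_data_from_url, "/tmp/py.log", url) would then leave a stack trace in the log if the request raises, assuming an object with that method is at hand. If the code swallows the exception before it reaches this wrapper, nothing is logged, which would be consistent with an empty log.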


ospalh commented on August 18, 2024

I can download from Forvo and from Wiktionary.
I just successfully tried

  • “Česko” with language code cz – got it from Forvo
  • “Mühle” with de – got it from German Wiktionary and other sources
  • “quarter” with en – got it from English Wiktionary and other sources

And I have no idea where the 32-bit encoding comes from (is there a UTF-32?). Maybe it is about the version of some library, with Anki using a different one than the standard Python.

What kind of system are you on, what Python, what urllib2? Where are the differences between the environment of Anki versus the shell?
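
One way to compare the two environments is to run the same small report in Anki's debug console and in a shell, then diff the output. The sketch below is written for Python 3 and uses only standard-library calls; environment_report is a hypothetical helper name. On the Python 2 setup discussed in this thread, "urllib2" would be passed as the module name instead of "urllib".

```python
import locale
import sys


def environment_report(module_names=("urllib",)):
    """Collect interpreter and encoding details that may differ between hosts."""
    report = {
        "python": sys.version,
        "executable": sys.executable,
        # Largest code point of the build; distinguishes narrow and wide Python 2 builds.
        "maxunicode": sys.maxunicode,
        "default_encoding": sys.getdefaultencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }
    for name in module_names:
        module = __import__(name)
        report[name] = getattr(module, "__file__", "(built-in)")
    return report


for key, value in sorted(environment_report().items()):
    print("{}: {}".format(key, value))
```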

One other thing you can try:

  • Comment back in the right raise in download.py.
  • Comment out all the other downloaders except Forvo in downloaders/__init__.py
  • When the download doesn’t work, you'll get the exception pop-up. Useful for debugging, annoying for production.

(Not sure that will help much in this case.)


ospalh commented on August 18, 2024

This is what comes out of my tcpdump when I run it from Anki:

        0x0000:  4500 0104 6cec 4000 4006 edb4 c0a8 0017  E...l.@.@.......
        0x0010:  b01f 6e74 9956 0050 b085 5484 b14f b44e  ..nt.V.P..T..O.N
        0x0020:  5018 3908 0151 0000 4745 5420 2f61 6374  P.9..Q..GET./act
        0x0030:  696f 6e2f 776f 7264 2d70 726f 6e75 6e63  ion/word-pronunc
        0x0040:  6961 7469 6f6e 732f 666f 726d 6174 2f6a  iations/format/j
        0x0050:  736f 6e2f XXXXXXXXXXXXXXXXXXXXXXXXXXXXX  son/key/XXXXXXXX
XXX key XXX
        0x0070:  XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 7a6c  XXXXXXXX/word/zl
        0x0080:  6f6d 656e 2543 3325 4244 2f6c 616e 6775  omen%C3%BD/langu
        0x0090:  6167 652f 637a 2f20 4854 5450 2f31 2e31  age/cz/.HTTP/1.1
        0x00a0:  0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
        0x00b0:  673a 2069 6465 6e74 6974 790d 0a48 6f73  g:.identity..Hos
        0x00c0:  743a 2061 7069 6672 6565 2e66 6f72 766f  t:.apifree.forvo
        0x00d0:  2e63 6f6d 0d0a 436f 6e6e 6563 7469 6f6e  .com..Connection
        0x00e0:  3a20 636c 6f73 650d 0a55 7365 722d 4167  :.close..User-Ag
        0x00f0:  656e 743a 204d 6f7a 696c 6c61 2f35 2e30  ent:.Mozilla/5.0
        0x0100:  0d0a 0d0a                                ....

The text/word is sent percent-escaped and UTF-8-encoded. (I looked up "kaputt", i.e. "broken" in Czech; "%C3%BD" is "ý".)
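
The escaping visible in the dump can be checked with the standard library. This sketch uses Python 3's urllib.parse (an assumption; under Python 2 the equivalent function is urllib.quote applied to UTF-8 bytes), and the word is the one from the dump.

```python
from urllib.parse import quote, unquote

word = "zlomený"

# quote() encodes to UTF-8 by default and percent-escapes the non-ASCII bytes.
escaped = quote(word)
print(escaped)                    # zlomen%C3%BD
print(unquote(escaped) == word)   # True
print("ý".encode("utf-8").hex())  # c3bd
```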


ospalh commented on August 18, 2024

To check which urllib2 Anki is using, open the debug console ("Ctrl colon") and type

import urllib2
print(urllib2.__version__)
print(urllib2.__file__)

and use "Ctrl Return" to execute that.

I get

>>> import urllib2
... print(urllib2.__version__)
... print(urllib2.__file__)
2.7
/usr/lib64/python2.7/urllib2.pyc


basilcool commented on August 18, 2024

My system is:

Python 2.7.1+ (r271:86832, Sep 27 2012, 21:16:52) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
>>> print(urllib2.__version__)
2.7
>>> print(urllib2.__file__)
/usr/lib/python2.7/urllib2.pyc

I changed downloader.py. I replaced

request = urllib2.Request(url_in)

with

request = urllib2.Request(url_in.encode('ascii'))

and replaced

request.add_header('User-agent', self.user_agent)

with

request.add_header('User-agent', self.user_agent.encode('ascii'))

Now Forvo and Wiktionary download properly. Maybe this happens because I use the Russian interface in Anki.
Thank you very much for the ideas!
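
The same idea, making sure the URL and the header are plain ASCII before the request is built, can be sketched for Python 3 as follows. This is not the change that went into the add-on: build_ascii_request is a hypothetical helper, and percent-escaping the path is an added assumption so that non-ASCII words do not simply raise an error.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from urllib.request import Request


def build_ascii_request(url, user_agent):
    """Build a Request whose URL and User-agent header contain only ASCII."""
    parts = urlsplit(url)
    # Percent-escape non-ASCII path characters; keep "/" and existing escapes.
    safe_path = quote(parts.path, safe="/%")
    ascii_url = urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )
    # Raises UnicodeEncodeError if anything non-ASCII is left (e.g. in the host).
    ascii_url.encode("ascii")
    user_agent.encode("ascii")
    request = Request(ascii_url)
    request.add_header("User-agent", user_agent)
    return request


request = build_ascii_request(
    "http://apifree.forvo.com/word/zlomený/language/cz/", "Mozilla/5.0"
)
print(request.full_url)
# http://apifree.forvo.com/word/zlomen%C3%BD/language/cz/
```

Nothing is sent over the network here; the function only constructs the request object.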


ospalh commented on August 18, 2024

I've uploaded a new version.

