
tools's Issues

compiler didn't finish the whole dict

I am using Ubuntu 10.04 and followed the instructions at http://aarddict.org/aardtools/doc/aardtools.html.
Dumping the wiki to cdb works fine, but after about six hours of compiling, the process stops at 24.81%.

Here is the compiler output.

(env-aard)mayli@matrix:~$ aardc wiki zhwiki-20110521-pages-articles.cdb --siteinfo zh.json
Session dir ./aardc-1306893341-52
Writing log to ./aardc-1306893341-52/log
Converting zhwiki-20110521-pages-articles.cdb
total: 665100
24.81% t: 6:11:46 avg: 7.4/s a: 73101 r: 90087 s: 0 e: 0 to: 1798 f: 0
Compiling .aar files
Creating volume 1
Wrote volume 1
zhwiki-20110521-pages-articles.aar.1 sha1: 338e89ebfdf77043ec156e258ecd397dbda13cd1
Created zhwiki-20110521-pages-articles.aar
Compilation took 6:12:01

Here is the log file (tail):

.....
16:06:48 WARNING [wiki] Worker pool timed out
16:06:48 INFO [wiki] Terminating current worker pool
16:06:48 INFO [wiki] Creating new worker pool with wiki cdb at zhwiki-20110521-pages-articles.cdb
16:07:00 WARNING [aardtools.mwaardhtmlwriter] Could not render math in u'\u51af\u8bfa\u4f9d\u66fc\u7a33\u5b9a\u6027\u5206\u6790' with 'latex': Couldn't convert equation '\n\begin{align}\n \epsilon_j^n & = e^{at} e^{ik_m x} \\n \epsilon_j^{n+1} & = e^{a(t+\Delta t)} e^{ik_m x} \\n \epsilon_{j+1}^n & = e^{at} e^{ik_m (x+\Delta x)} \\n \epsilon_{j-1}^n & = e^{at} e^{ik_m (x-\Delta x)},\n\end{align}\n' (failed cmd: 'latex -halt-on-error -output-directory /tmp/math-2BaLkB /tmp/math-2BaLkB/eq.tex', error: '')
16:07:00 WARNING [aardtools.mwaardhtmlwriter] Could not render math in u'\u51af\u8bfa\u4f9d\u66fc\u7a33\u5b9a\u6027\u5206\u6790' with 'blahtex': Couldn't convert equation '\n\begin{align}\n \epsilon_j^n & = e^{at} e^{ik_m x} \\n \epsilon_j^{n+1} & = e^{a(t+\Delta t)} e^{ik_m x} \\n \epsilon_{j+1}^n & = e^{at} e^{ik_m (x+\Delta x)} \\n \epsilon_{j-1}^n & = e^{at} e^{ik_m (x-\Delta x)},\n\end{align}\n' (failed cmd: 'blahtexml --texvc-compatible-commands --png --temp-directory /tmp/math-d2hSAh --png-directory /tmp/math-d2hSAh', error: 'Unrecognised command "\begin{align}"')
16:07:28 WARNING [wiki] Worker pool timed out
16:07:28 INFO [wiki] Terminating current worker pool
16:07:28 INFO [wiki] Creating new worker pool with wiki cdb at zhwiki-20110521-pages-articles.cdb
16:07:28 INFO [wiki] Creating new worker pool with wiki cdb at zhwiki-20110521-pages-articles.cdb
16:07:28 INFO [wiki] Creating new worker pool with wiki cdb at zhwiki-20110521-pages-articles.cdb
16:07:28 INFO [compiler] Compiling zhwiki-20110521-pages-articles.aar
16:07:29 INFO [compiler] Creating temporary index 1 file /home/mayli/aardc-1306893341-52/index1dExDHK
16:07:29 INFO [compiler] Creating temporary index 2 file /home/mayli/aardc-1306893341-52/index2fQ_ZGn
16:07:29 INFO [compiler] Creating temporary articles file /home/mayli/aardc-1306893341-52/articlesTKLswW
16:07:39 INFO [compiler] Creating volume 1
16:07:42 INFO [compiler] Done with zhwiki-20110521-pages-articles.aar.1
16:07:42 INFO [compiler] Wrote volume 1
16:07:42 INFO [compiler] Writing volume count 1 to all volumes as >H
16:07:42 INFO [compiler] Calculating checksum for zhwiki-20110521-pages-articles.aar.1
16:07:43 INFO [compiler] zhwiki-20110521-pages-articles.aar.1 sha1: 338e89ebfdf77043ec156e258ecd397dbda13cd1
16:07:43 INFO [compiler] Renaming zhwiki-20110521-pages-articles.aar.1 ==> zhwiki-20110521-pages-articles.aar
16:07:43 INFO [compiler] total: 665100, skipped: 0, failed: 0, empty: 0, timed out: 1798, articles: 73101, redirects: 90087, average: 7.39/s elapsed: 6:12:01
16:07:43 INFO [compiler] Compression: _zlib - 63331, none - 90009, _bz2 - 9850
16:07:43 INFO [compiler] Compilation took 6:12:01.425855

Option to override specific wiki pages

Some templates are hopelessly broken when used with mwlib.

For example, the enwiktionary Template:plural_of uses a call to Special:Whatlinkshere:articlename to check for the existence of an article. This call fails on mwlib and will probably always fail on mwlib. "Correcting" the template on wiktionary.org isn't an option, as the online behavior works correctly.

Since the call fails, the template assumes that the article doesn't exist and so doesn't create a link to the page containing the actual word definition.

It's possible to replace the article text before the conversion to cdb, but it would be nice if aardc provided some means of overriding pages/templates.

PS, the recent changes look great, it's becoming much easier to clean up the articles, thank you!

lang-links option while compiling wiki

Wikidata is migrating interlanguage wiki links from individual articles into a central database to ease maintenance.
Detailed information: https://en.wikipedia.org/wiki/Wikipedia:Wikidata
All links are now stored in the Wikidata database, which is about 15 GB in size.
The entity structure is something like this:
Q12345
enwiki: 'water'
ruwiki: 'вода'
and so on. Each article now has its own identifier, starting with Q.
I have written a small script to parse that dump and create a separate XDXF dictionary in one direction, for example ENG -> RUS, FR: a sort of English-to-Russian dictionary based on Wikipedia.
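A minimal sketch of what such a parsing script might look like, assuming the documented Wikidata JSON dump layout (one entity per line inside a JSON array, with a `sitelinks` mapping); the function name and the pre-processing are my own, not the reporter's actual script:

```python
import json

def sitelink_pairs(lines, source="enwiki", targets=("ruwiki", "frwiki")):
    """Yield (source_title, {target_wiki: title}) for each entity that
    has a sitelink in the source wiki. `lines` is an iterable of strings,
    one Wikidata entity per line; the dump wraps entities in a JSON
    array, so brackets and trailing commas are stripped first."""
    for line in lines:
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        entity = json.loads(line)
        links = entity.get("sitelinks", {})
        if source not in links:
            continue
        translations = {t: links[t]["title"] for t in targets if t in links}
        if translations:
            yield links[source]["title"], translations
```

Feeding the resulting pairs into an XDXF writer would then give the one-direction dictionary described above.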

Tutorial to create custom aard dictionary

Hello,
I plan to use aard to make a custom HTML-based dictionary with pictures and links.
Could you describe the steps to make one?
I have lots of HTML pages, pictures and CSS stylesheets in a local directory.
What should I do next?
Thank you.

Cannot install using Ubuntu 11.10

I tried installing the aard tools on Ubuntu 11.10. The problem seems to be the libicu38 package, which apt-get is unable to locate using the standard 11.10 installation. After I added the "hardy" packages to my sources file, the installation of libicu38 succeeded.

I could then compile aardtools, but on starting it I only get this message:
ImportError: /usr/local/lib/python2.7/dist-packages/_icu.so: undefined symbol: _ZTIN7icu_3_814TransliteratorE

Any ideas? All tips are appreciated.

Use LZMA - better compression, faster decompression

(Originally reported by itkach on Feb 11, 2009 at BitBucket)

LZMA promises a better compression ratio than gzip and bzip2, and faster decompression than bzip2. Using LZMA in the aard format to compress articles may result in smaller .aar files and better word-lookup performance.
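A rough size comparison with Python 3's standard-library codecs; this sketches only the codec trade-off, not any change to the .aar container itself:

```python
import bz2
import lzma
import zlib

def compressed_sizes(data: bytes) -> dict:
    """Compress the same payload at maximum level with each codec
    and return the resulting sizes in bytes."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }

# Round trip: LZMA decompression must recover the original bytes.
sample = ("Amdahl's law is a formula in computer architecture. " * 100).encode()
assert lzma.decompress(lzma.compress(sample)) == sample
```

Running `compressed_sizes` over a representative sample of articles would show whether the promised ratio holds for real dictionary content.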

aardc slow

aardc's compile time puts conversion of large dictionaries beyond the reach of most users.

Expected processing time was 66 days for enwiki on a machine with 4 processors and 2 GB of RAM. After moving to a machine with 32 processors and 100 GB of RAM (some of which was used as a ramdisk for aardc's source and destination files), it still took 18 hours.

If the process were faster (less than two days on a current mid-range PC), users would be more likely to compile up-to-date dictionaries for the community.

Converting lurkmore.to site

Hello!
I'm trying to dump the wiki-based site http://lurkmore.to.
I do everything according to the instructions provided at http://aarddict.org/aardtools/doc/aardtools.html, but the command:

aard-siteinfo lurkmore.to > lu.json

fails to execute.

Message from terminal after running the command

(env-aard)artemon@artemon-ubuntu:~$ aard-siteinfo lurkmore.to > lu.json
fetching 'http://lurkmore.to/w/api.php?action=query&meta=siteinfo&siprop=general|namespaces|namespacealiases|magicwords|interwikimap&format=json'
Traceback (most recent call last):
  File "/home/artemon/env-aard/bin/aard-siteinfo", line 9, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aard-siteinfo')()
  File "/home/artemon/env-aard/src/aardtools/aardtools/fetchsiteinfo.py", line 23, in main
    serialized_data = fetch(sitehostname)
  File "/home/artemon/env-aard/src/aardtools/aardtools/fetchsiteinfo.py", line 14, in fetch
    data = json.loads(data)['query']
  File "/home/artemon/env-aard/local/lib/python2.7/site-packages/simplejson/__init__.py", line 451, in loads
    return _default_decoder.decode(s)
  File "/home/artemon/env-aard/local/lib/python2.7/site-packages/simplejson/decoder.py", line 406, in decode
    obj, end = self.raw_decode(s)
  File "/home/artemon/env-aard/local/lib/python2.7/site-packages/simplejson/decoder.py", line 426, in raw_decode
    raise JSONDecodeError("No JSON object could be decoded", s, idx)
simplejson.decoder.JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)

How can I get this site's information?

P.S. Sorry for my bad English :-)

mmap.error: [Errno 12] Cannot allocate memory when starting to compile wiki dictionary

After processing of the CDB files from dewiki finished, I got an error when aardc was about to start compiling the .aar files.

100.00% t: 2 days, 16:11:42 avg: 10.0/s a: 1373941 r: 946049 s: 0 e: 0 to: 0 f: 0
Compiling .aar files
Traceback (most recent call last):
  File "/home/oliver/env-aard/bin/aardc", line 9, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aardc')()
  File "/home/oliver/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 1094, in main
    compiler.compile()
  File "/home/oliver/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 507, in compile
    for volume in self.make_volumes(create_volume_func, articles):
  File "/home/oliver/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 526, in make_volumes
    for title, serialized_article in articles:
  File "/home/oliver/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 388, in sorted
    article_store = mmap.mmap(article_store_f.fileno(), 0)
mmap.error: [Errno 12] Cannot allocate memory

Compilation of an older dewiki dump finished successfully a couple of days ago.

Now I have the following files in the aardc working directory:
-rw------- 1 oliver oliver 38905125 2012-03-01 14:38 aa-8x_Ux4.titles
-rw------- 1 oliver oliver 3374091660 2012-03-01 14:38 aa-Pg6S6z.articles
-rw------- 1 oliver oliver 41759820 2012-03-01 14:38 aa-R5GzZO.index

Processing took two days and 16 hours, so I don't feel like doing it again.
Free disk space is 7.7 GB, but I'm not sure if that is the problem.
Is there a way to start the compilation of the .aar files from the existing titles, articles and index files without creating them again from the CDB files?

Aardc not showing interwiki links in en.wiktionary.org

en.wiktionary.org seems to heavily use interwiki links to en.wikipedia.org. See attached file Wiktionary.jpg.

Aard Dictionary does not show them, in fact rendering something like what is shown in the attached Aarddict.jpg.

Is there any possibility of including these interwiki links in the compilation?

Error installing aardtools on Ubuntu 11.10

Everything goes smoothly following the examples on the website until the step where you install aardtools with "pip install aardtools", which fails with this error:

layoutengine.cpp: In function ‘void _init_layoutengine(PyObject*)’:

layoutengine.cpp:551:5: error: ‘kTypoFlagKern’ is not a member of ‘icu_44::LayoutEngine’

layoutengine.cpp:552:5: error: ‘kTypoFlagLiga’ is not a member of ‘icu_44::LayoutEngine’

error: command 'gcc' failed with exit status 1


Command /home/user/env-aard/bin/python -c "import setuptools;file='/home/user/env-aard/build/PyICU/setup.py';exec(compile(open(file).read().replace('\r\n', '\n'), file, 'exec'))" install --single-version-externally-managed --record /tmp/pip-4wWS0O-record/install-record.txt --install-headers /home/user/env-aard/include/site/python2.7 failed with error code 1

Any advice on what could be going wrong with the installation?

Compile error when using example comn_dictd04_wn.tar.bz2

I installed your tool on Ubuntu 10.04.4 LTS and tried to compile my custom XDXF dictionary, but it fails to compile.
So I tested my setup with your example:
I downloaded http://downloads.sourceforge.net/xdxf/comn_dictd04_wn.tar.bz2

aardc xdxf comn_dictd04_wn.tar.bz2
The output is:
Session dir ./aardc-comn_dictd04_wn.tar.bz2-1375169810
Writing log to ./aardc-comn_dictd04_wn.tar.bz2-1375169810/log
Converting comn_dictd04_wn.tar.bz2
Calculating total number of items...
total: 136971
Traceback (most recent call last):
  File "/root/mytest/bin/aardc", line 9, in <module>
    load_entry_point('aardtools==0.9.0', 'console_scripts', 'aardc')()
  File "/root/mytest/src/aardtools/aardtools/compiler.py", line 1067, in main
    compiler.run()
  File "/root/mytest/src/aardtools/aardtools/compiler.py", line 436, in run
    for article in self.article_source:
  File "/root/mytest/src/aardtools/aardtools/xdxf.py", line 222, in parse
    for _, element in etree.iterparse(f):
  File "", line 91, in next
cElementTree.ParseError: not well-formed (invalid token): line 1, column 7

Error on big wiki dump

99.46% t: 16:46:25 avg: 30.1/s a: 869231 r: 949552 s: 0 e: 0 to: 107 f: 0
Compiling .aar files
Traceback (most recent call last):
  File "/home/u/env-aad/bin/aardc", line 9, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aardc')()
  File "/home/u/env-aad/local/lib/python2.7/site-packages/aardtools/compiler.py", line 1094, in main
    compiler.compile()
  File "/home/u/env-aad/local/lib/python2.7/site-packages/aardtools/compiler.py", line 507, in compile
    for volume in self.make_volumes(create_volume_func, articles):
  File "/home/u/env-aad/local/lib/python2.7/site-packages/aardtools/compiler.py", line 526, in make_volumes
    for title, serialized_article in articles:
  File "/home/u/env-aad/local/lib/python2.7/site-packages/aardtools/compiler.py", line 388, in sorted
    article_store = mmap.mmap(article_store_f.fileno(), 0)
ValueError: mmap length is too large
I don't understand why I got this error.
It happens while compiling ruwiki20120710 to .aar.
Please explain this error.

Siteinfo argument broken

In class MediawikiArticleSource in wiki.py, line 331:

    parser.add_argument('siteinfo',
                        help=('Path to Mediawiki JSON-formatted site info file. Get it with '
                              'aard-siteinfo command'))

Judging by the other add_argument calls in that function, it should be '--siteinfo'. It was broken by this commit: ac21e5f
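The one-character fix, shown here against a bare `ArgumentParser` for illustration:

```python
import argparse

parser = argparse.ArgumentParser()
# The leading '--' makes siteinfo an option (supplied as `--siteinfo PATH`)
# rather than a required positional argument, matching the other
# add_argument calls in MediawikiArticleSource.
parser.add_argument('--siteinfo',
                    help=('Path to Mediawiki JSON-formatted site info file. '
                          'Get it with aard-siteinfo command'))

args = parser.parse_args(['--siteinfo', 'zh.json'])
```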

Need mechanism for specifying article content filters for Mediawiki dumps

Currently certain Mediawiki content is excluded during article conversion based on a hardcoded list of class and id attribute values (such as "navbox", "printonly" etc.). Content that needs to be excluded, however, is different in each dump, even if some of the class and id values are common. Also, sometimes compiled articles need additional cleanup, for example to remove empty sections. A generic way to specify such content filters outside of converter code is needed.
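One possible shape for such a mechanism, sketched with the standard library; the filter-spec dictionary and the `strip_filtered` helper are hypothetical illustrations, not existing aardtools API:

```python
import xml.etree.ElementTree as ET

# A per-dump filter spec a converter could load from an external file
# instead of hardcoding the class/id values.
FILTERS = {"classes": {"navbox", "printonly"}, "ids": {"coordinates"}}

def strip_filtered(root, filters=FILTERS):
    """Remove elements whose class or id matches the filter spec."""
    for parent in root.iter():
        # Copy the child list so removal is safe during iteration.
        for child in list(parent):
            classes = set(child.get("class", "").split())
            if classes & filters["classes"] or child.get("id") in filters["ids"]:
                parent.remove(child)
    return root

html = ET.fromstring('<div><table class="navbox infobox"/><p>kept</p></div>')
strip_filtered(html)
```

Post-processing passes such as dropping empty sections could be expressed the same way, as additional entries in the external spec.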

MediaWiki metadata does not always contain rights

wiki.py assumes that the 'rights' attribute exists in the general siteinfo. It seems it was previously blank if not set; however, this is no longer the case, at least as of MediaWiki 1.23 and perhaps earlier.

        if options.license:
            license_file = options.license
        else:
            rights = general_siteinfo['rights']
            if rights in known_licenses:
                license_file = known_licenses[rights]
            else:
                license_file = None
                self.metadata['license'] = rights

aardc fails with "KeyError: 'rights'" on the general_siteinfo line.

One workaround is to add "--license path/to/text/file" before the "wiki" on the command line.
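A defensive rewrite of that lookup could look like the following; `pick_license` is an illustrative helper of mine, with `known_licenses` playing the same role as in the original code:

```python
def pick_license(general_siteinfo, known_licenses):
    """Resolve a license file from siteinfo, tolerating a missing
    'rights' key (absent as of MediaWiki 1.23, per this report).
    Returns (license_file, fallback_rights_string)."""
    rights = general_siteinfo.get('rights', '')
    if rights in known_licenses:
        return known_licenses[rights], None
    # No known license file; record the raw rights text, if any,
    # so the caller can still store it in the dictionary metadata.
    return None, rights or None
```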

aardc can break on FreeBSD without --nomp due to lack of default semaphore support

I am using FreeBSD 8.1 in a FreeBSD jail on an 8-core machine, with python's virtualenv set up as described.

The default multiprocessing setup fails with the default settings of the python26 port. With --nomp it works fine, it just takes a while (9:14, 39.5/s). This is because the semaphore component of the multiprocessing module is broken on FreeBSD by default.

This is not technically an aarddict problem, but it would be a good idea to check whether Pool creation failed and give a clearer error message like "you need a working python multiprocessing module, which may involve enabling pth or kernel semaphores in your config when compiling python". This might also be mentioned on the website.
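A sketch of how the converter could wrap pool creation to produce such a message (the `make_pool` helper is mine, not actual aardtools code):

```python
import multiprocessing

def make_pool(processes=None, initializer=None, initargs=()):
    """Create a worker pool; on platforms without sem_open (e.g. FreeBSD
    with default settings) Pool() raises ImportError, which is turned
    into an actionable message instead of a later AttributeError."""
    try:
        return multiprocessing.Pool(processes, initializer, initargs)
    except (ImportError, OSError) as err:
        raise SystemExit(
            "Could not create worker pool: %s\n"
            "You need a working multiprocessing module; on FreeBSD this "
            "may mean enabling pth or kernel semaphores when building "
            "Python, or you can fall back to --nomp." % err)
```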

With experimental kernel semaphores enabled in the port config, I got 94.89/s out of it, though rather disturbingly the resulting file was slightly smaller than with --nomp. Slightly greater performance (100.1/s) and a larger file size were achieved using the pth support offered by the port instead (http://www.gnu.org/software/pth/).

Here's my initial output:

(env-aard)[wikifur@wikifur ~]$ aardc wiki wikifur.en.cdb --siteinfo wikifur.en.json --wiki-lang en
/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/compiler.py:31: DeprecationWarning: Module 'PyICU' is deprecated, import 'icu' instead'
  from PyICU import Locale, Collator
Session dir ./aardc-1305488951-65
texvc: not found
blahtexml: not found
Writing log to ./aardc-1305488951-65/log
Converting wikifur.en.cdb
total: 21911
Traceback (most recent call last):
  File "/usr/home/wikifur/env-aard/bin/aardc", line 8, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aardc')()
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 1093, in main
    converter.collect_articles(converter.make_input(input_file), options, compiler)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 250, in collect_articles
    p.parse(input_file)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 463, in parse_mp
    self.pool.close()
AttributeError: 'NoneType' object has no attribute 'close'

My log file contained:

22:10:01 INFO [compiler] Maximum file size is 2147483647 bytes
22:10:01 INFO [compiler] Wikipedia language: en
22:10:01 WARNING [compiler] Dictionary version is not specified and couldn't be guessed from input file name, using 20110515221001
22:10:01 INFO [compiler] Collecting articles
22:10:02 INFO [compiler] Collecting articles in wikifur.en.cdb
22:10:02 WARNING [wiki] No metadata file specified
22:10:02 INFO [wiki] Language: en (en)
22:10:02 INFO [wiki] Creating new worker pool with wiki cdb at wikifur.en.cdb

If I put an "if self.pool:" before the line mentioned, I get a clearer error which led to the above solution:

  File "/usr/home/wikifur/env-aard/bin/aardc", line 8, in <module>
    load_entry_point('aardtools==0.8.3', 'console_scripts', 'aardc')()
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/compiler.py", line 1093, in main
    converter.collect_articles(converter.make_input(input_file), options, compiler)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 250, in collect_articles
    p.parse(input_file)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 409, in parse_mp
    self.reset_pool(f)
  File "/usr/home/wikifur/env-aard/lib/python2.6/site-packages/aardtools/wiki.py", line 388, in reset_pool
    initargs=[cdbdir, self.lang, self.rtl])
  File "/usr/local/lib/python2.6/multiprocessing/__init__.py", line 227, in Pool
    return Pool(processes, initializer, initargs)
  File "/usr/local/lib/python2.6/multiprocessing/pool.py", line 84, in __init__
    self._setup_queues()
  File "/usr/local/lib/python2.6/multiprocessing/pool.py", line 130, in _setup_queues
    from .queues import SimpleQueue
  File "/usr/local/lib/python2.6/multiprocessing/queues.py", line 22, in <module>
    from multiprocessing.synchronize import Lock, BoundedSemaphore, Semaphore, Condition
  File "/usr/local/lib/python2.6/multiprocessing/synchronize.py", line 33, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.

Missing article in English Wikipedia

(Originally reported by nobodysbusiness on Feb 14, 2009 at BitBucket)

For some reason, it is possible to search for Amdahl's Law and click on a result, only to get an error. This was done with the downloaded version of the full English Wikipedia available via BitTorrent from the www.aarddict.org site.

MediaWiki metadata does not always contain server; use base if so

In wiki.py the server name is extracted from JSON metadata:

server = general_siteinfo['server']

However, in versions of MediaWiki prior to 1.16, the general metadata does not contain the name of the server as a variable. For example:

{"query":{"general":{"mainpage":"Huvudsida","base":"http:\/\/www.wikilurv.se\/wiki\/Huvudsida","sitename":"WikiLurv","generator":"MediaWiki 1.15.4","case":"first-letter","rights":"GNU Free Documentation License","lang":"sv","fallback8bitEncoding":"windows-1252","writeapi":"","timezone":"UTC","timeoffset":0}}}

The lack of this variable causes the script to fail.

A more robust approach is to extract the server name from base if server is not available.

server = general_siteinfo['server'] if 'server' in general_siteinfo else general_siteinfo['base'].split("/")[2]
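The same fallback written as a small helper with `urlsplit` (Python 3 spelling; the function name is mine, and unlike the bare `split("/")[2]` it keeps the scheme along with the host):

```python
from urllib.parse import urlsplit

def server_from_siteinfo(general):
    """Prefer the 'server' key; for MediaWiki < 1.16 derive
    scheme://host from the always-present 'base' URL."""
    if 'server' in general:
        return general['server']
    parts = urlsplit(general['base'])
    return '%s://%s' % (parts.scheme, parts.netloc)
```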

Compilation of German Wikipedia failed

(Originally reported by anonymous on Apr 24, 2009 at BitBucket)

aardc wiki dewiki-20090311-pages-articles.xml.bz2 -t dewiki-20090311-pages-articles.cdb

INFO: Maximum file size is 4294967295 bytes
INFO: Creating temp dir ./aardc-article-db-40eM6K
INFO: Collecting articles
INFO: Collecting articles in dewiki-20090311-pages-articles.xml.bz2
sh: texvc: not found
sh: blahtexml: not found
INFO: Creating new worker pool
INFO: Language: de
98INFO: Special article Wikipedia:Aktuelles, skipping (1 so far)

### skipping thousands of lines

647831INFO: Special article Datei:Poleck.JPG, skipping (294585 so far)
Traceback (most recent call last):
  File "/usr/bin/aardc", line 8, in <module>
    load_entry_point('aardtools==0.7.2.dev', 'console_scripts', 'aardc')()
  File "build/bdist.linux-i686/egg/aardtools/compiler.py", line 659, in main
  File "build/bdist.linux-i686/egg/aardtools/compiler.py", line 485, in compile_wiki
  File "build/bdist.linux-i686/egg/aardtools/wiki.py", line 230, in parse_mp
  File "/usr/lib/python2.5/site-packages/multiprocessing-2.6.1.1-py2.5-linux-i686.egg/multiprocessing/pool.py", line 520, in next
    raise value
EnvironmentError: [Errno 12] Cannot allocate memory

Broken one click install - OS2008

Using the install file from the aarddict homepage, or from maemo.org downloads, after adding the aarddict repository I get an error: unable to install aarddict, Application Package Not Found.

Avoid python warning

Maybe line 356 in mwaardhtmlwriter.py should be changed to keep Python 2.7.2+ from displaying thousands of warnings like "The behavior of this method will change in future versions. Use specific 'len(elem)' or 'elem is not None' test instead".
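The warning is triggered by truth-testing an Element (the trailing `and parent` in the line quoted in the Turkish-wiktionary report below); a future-proof version of that emptiness test, sketched standalone with the helper name being mine:

```python
import xml.etree.ElementTree as ET

def is_removable(element, parent):
    """Equivalent of the warning-triggering test: truth-testing an
    Element is deprecated, so check emptiness explicitly with len()
    and compare the parent against None."""
    return (len(element) == 0
            and not element.text
            and not element.tail
            and parent is not None)

root = ET.fromstring('<div><span/><p>text</p></div>')
```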

Is it possible to convert slob format into aar?

The software I use (GoldenDict on desktop and Aard 1 on Android) supports only aar files. The link to aar dictionaries (http://aarddict.org/1/) is always down, and there are more dictionaries in slob format anyway. Is it possible to convert them into the old format (maybe even with loss of some data that is unsupported there)?

missing files during install

Downloading/unpacking odfpy>=0.9,<0.10 (from mwlib>=0.14.1->aardtools)
Could not find any downloads that satisfy the requirement odfpy>=0.9,<0.10 (from mwlib>=0.14.1->aardtools)
Some externally hosted files were ignored (use --allow-external odfpy to allow).
Cleaning up...
No distributions at all found for odfpy>=0.9,<0.10 (from mwlib>=0.14.1->aardtools)
Storing debug log for failure in /home/jani/.pip/pip.log

Same results with --allow-external odfpy (same error with pyPdf, but --allow-external and --allow-unverified worked for that).

problem when compiling Turkish wiktionary

Copied from aarddict/desktop#26
Hello
OS: Ubuntu 12.04 i386
After compiling tr.wiktionary.org there are no word definitions in the articles, only information like language, tense, etc.
Example:
Original article:

ffordd
Galce
Ad
Anlamlar
[1] yol

My case (from aarddict):

ffordd
Galce
Ad

As you see, there is no translation.
I've followed the instructions at http://aarddict.org/aardtools/doc/aardtools.html, except that:

  1. installed libicu48 instead of libicu38
  2. had to give executable permission to env-aard/bin/activate
  3. simplewiki-20101026-pages-articles.cdb is not a file, but a folder
  4. had lots of these messages during aardc wiki ... execution:
.../env-aard/local/lib/python2.7/site-packages/aardtools/mwaardhtmlwriter.py:356: FutureWarning: The behavior of this method will change in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.
  not (element.getchildren() or element.text or element.tail) and parent):

thank you

Incorrect installation of aardtools (Ubuntu 11.04)

Hello,

I'm using Ubuntu 11.04. I had the same problem as in
#14
However, when trying the proposed fix of installing PyICU==1.2 and then aardtools, the installation succeeded, but I get the following result:

(env-aard)waynec@laptop:~/libs$ aard-siteinfo de.wiktionary.org > dewiktionary.json
Traceback (most recent call last):
  File "/usr/local/bin/aard-siteinfo", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2671, in <module>
    working_set.require(requires)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 654, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 552, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: PyICU>=0.8.1

Naturally I followed the guide very carefully, but I don't know what else to try.
The version of ICU installed on my system is 4.4, and everything was installed using normal apt-get commands as instructed.

Do you have any suggestions or things I could try? Thank you in advance.
