
ilius / pyglossary


A tool for converting dictionary files aka glossaries. Mainly to help use our offline glossaries in any Open Source dictionary we like on any modern operating system / device.

License: GNU General Public License v3.0

Python 98.99% CSS 0.08% Makefile 0.09% Shell 0.50% Dockerfile 0.02% XSLT 0.32%
dictionary pygi tkinter linux gtk python glossaries stardict dictionary-tools dictionary-conversion

pyglossary's Introduction

PyGlossary

A tool for converting dictionary files aka glossaries.

The primary purpose is to be able to use our offline glossaries in any Open Source dictionary we like on any OS/device.

There are countless formats, and my time is limited, so I implement the formats that seem most useful to myself or to the Open Source community. The diversity of languages is also taken into account. Pull requests are welcome.

Screenshots

Linux - Gtk3-based interface


Windows - Tkinter-based interface


Linux - command-line interface


Android Termux - interactive command-line interface

Supported formats

Format Extension Read Write
Aard 2 (slob) 🔢 .slob
ABBYY Lingvo DSL 📝 .dsl
Almaany.com (SQLite3, Arabic) 🔢 .db
AppleDict Binary 📁 .dictionary
AppleDict Source 📁
Babylon BGL 🔢 .bgl
CC-CEDICT (Chinese) 📝
cc-kedict (Korean) 📝
CSV 📝 .csv
Dict.cc (SQLite3, German) 🔢 .db
DICT.org / Dictd server 📁 (📝.index)
DICT.org / dictfmt source 📝 (.dtxt)
dictunformat output file 📝 (.dictunformat)
DictionaryForMIDs 📁 (📁.mids)
DigitalNK (SQLite3, N-Korean) 🔢 .db
DIKT JSON 📝 (.json)
EDLIN 📁 .edlin
EPUB-2 E-Book 📦 .epub
FreeDict 📝 .tei
Gettext Source 📝 .po
HTML Directory (by file size) 📁
JMDict (Japanese) 📝
JSON 📝 .json
Kobo E-Reader Dictionary 📦 .kobo.zip
Kobo E-Reader Dictfile 📝 .df
Lingoes Source 📝 .ldf
Mobipocket E-Book 🔢 .mobi
Octopus MDict 🔢 .mdx
QuickDic version 6 📁 .quickdic
SQL 📝 .sql
StarDict 📁 (📝.ifo)
StarDict Textual File 📝 (.xml)
Tabfile 📝 .txt, .tab
Wiktextract 📝 .jsonl
Wordset.org 📁
XDXF 📝 .xdxf
Yomichan 📦 (.zip)
Zim (Kiwix) 🔢 .zim

Legend:

  • 📁 Directory
  • 📝 Text file
  • 📦 Package/archive file
  • 🔢 Binary file
  • ✔ Supported
  • ❌ Will not be supported

Note: SQLite-based formats are not detected by extension (.db), so you need to select the format (with the UI or the --read-format flag). Also, don't confuse SQLite-based formats with SQLite mode.

Requirements

PyGlossary requires Python 3.10 or higher, and works in practically all modern operating systems. While primarily designed for GNU/Linux, it works on Windows, Mac OS X and other Unix-based operating systems as well.

As shown in the screenshots, there are multiple User Interface types (multiple ways to use the program).

  • Gtk3-based interface, which uses PyGI (Python GObject Introspection). You can install it on:

    • Debian/Ubuntu: apt install python3-gi python3-gi-cairo gir1.2-gtk-3.0
    • openSUSE: zypper install python3-gobject gtk3
    • Fedora: dnf install pygobject3 python3-gobject gtk3
    • ArchLinux:
    • Mac OS X: brew install pygobject3 gtk+3
    • Nix / NixOS: nix-shell -p pkgs.gobject-introspection python38Packages.pygobject3 python38Packages.pycairo
  • Tkinter-based interface, which works when Gtk is not available, especially on Windows, where the Tkinter library is installed along with Python itself. You can also install it on:

    • Debian/Ubuntu: apt-get install python3-tk tix
    • openSUSE: zypper install python3-tk tix
    • Fedora: yum install python3-tkinter tix
    • Mac OS X: read https://www.python.org/download/mac/tcltk/
    • Nix / NixOS: nix-shell -p python38Packages.tkinter tix
  • Command-line interface, works in all operating systems without any specific requirements, just type:

    python3 main.py --help

    • Interactive command-line interface
      • Requires: pip install prompt_toolkit
      • Perfect for mobile devices (like Termux on Android) where no GUI is available
      • Automatically selected if output file argument is not passed and one of these:
        • On Linux and $DISPLAY environment variable is empty or not set
          • For example when you are using a remote Linux machine over SSH
        • On Mac and no tkinter module is found
      • Manually select with --cmd or --ui=cmd
        • Minimally: python3 main.py --cmd
        • You can still pass input file, or any flag/option
      • If both input and output files are passed, non-interactive cmd ui will be default
      • If you are writing a script, you can pass --no-interactive to force disable interactive ui
        • Then you have to pass both input and output file arguments
      • Don't forget to use Up/Down or Tab keys in prompts!
        • Up/Down key shows you recent values you have used
        • Tab key shows available values/options
      • You can press Control+C (on Linux/Windows) at any prompt to exit
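The auto-selection rules above can be summarized as a small decision function. This is a minimal sketch under stated assumptions: `auto_select_interactive` and its parameters are hypothetical names, `platform` is assumed to follow `sys.platform` values (`"linux"`, `"darwin"`, ...), and PyGlossary's real selection logic may check more conditions.

```python
def auto_select_interactive(
	platform: str,
	environ: dict[str, str],
	has_tkinter: bool,
	output_given: bool,
) -> bool:
	"""Return True if the interactive command-line UI would be auto-selected.

	Hypothetical helper mirroring the rules listed above; not PyGlossary code.
	"""
	if output_given:
		# An output file was passed: no automatic interactive mode.
		return False
	if platform.startswith("linux"):
		# Empty or unset $DISPLAY, e.g. a remote Linux machine over SSH.
		return not environ.get("DISPLAY")
	if platform == "darwin":
		# On Mac, the interactive UI is chosen when tkinter is missing.
		return not has_tkinter
	return False
```

For example, `auto_select_interactive(sys.platform, dict(os.environ), has_tkinter=True, output_given=False)` would evaluate the rules for the current machine.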

UI (User Interface) selection

When you run PyGlossary without any command-line arguments or options/flags, PyGlossary tries to find PyGI and open the Gtk3-based interface. If it fails, it tries to find Tkinter and open the Tkinter-based interface. If that fails, it tries to find prompt_toolkit and run interactive command-line interface. And if none of these libraries are found, it exits with an error.

But you can explicitly select the user interface type using --ui:

  • python3 main.py --ui=gtk
  • python3 main.py --ui=tk
  • python3 main.py --ui=cmd
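The default fallback order (Gtk, then Tkinter, then prompt_toolkit) could be expressed roughly like this. `pick_ui` and `module_available` are hypothetical helpers, and mapping each UI to a single importable module (`gi`, `tkinter`, `prompt_toolkit`) is a simplifying assumption; PyGlossary's own startup code may probe differently.

```python
import importlib.util
from collections.abc import Callable


def module_available(name: str) -> bool:
	"""True if the top-level module `name` can be found, without importing it."""
	return importlib.util.find_spec(name) is not None


def pick_ui(available: Callable[[str], bool] = module_available) -> str:
	"""Sketch of the no-arguments UI fallback order (hypothetical helper)."""
	# (UI name, module that must be importable), in order of preference
	candidates = [("gtk", "gi"), ("tk", "tkinter"), ("cmd", "prompt_toolkit")]
	for ui_name, module_name in candidates:
		if available(module_name):
			return ui_name
	# None of the libraries were found: exit with an error.
	raise SystemExit("no usable user interface library found")
```

Passing the availability check as a parameter keeps the order testable without installing or removing GUI libraries.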

Installation on Windows

  • Download and install Python (3.10 or above)
  • Open Start -> type Command -> right-click on Command Prompt -> Run as administrator
  • To ensure you have pip, run: python -m ensurepip --upgrade
  • To install, run: pip install --upgrade pyglossary
  • Now you should be able to run pyglossary command
  • If command was not found, make sure Python environment variables are set up:

Feature-specific requirements

  • Using Sort by Locale feature requires PyICU

  • Using --remove-html-all flag requires:

    pip install lxml beautifulsoup4

Some formats have additional requirements. If you have trouble with any format, please check the link given for that format to see its documentation.

Using Termux on Android? See doc/termux.md

Configuration

See doc/config.rst.

Direct and indirect modes

Indirect mode means the input glossary is completely read and loaded into RAM, then converted into the output format. This was the only method available in old versions (before 3.0.0).

Direct mode means entries are one-at-a-time read, processed and written into output glossary.

Direct mode was added to limit memory usage for large glossaries, but it may also reduce conversion time in most cases.
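To make the difference concrete, here is a toy sketch with entries modeled as `(word, definition)` tuples. The function names and the `Writer` callback type are hypothetical; the point is only the order in which reading and writing happen.

```python
from collections.abc import Callable, Iterable

Entry = tuple[str, str]  # (word, definition): a simplified stand-in
Writer = Callable[[Entry], None]  # hypothetical "write one entry" callback


def convert_indirect(reader: Iterable[Entry], write: Writer) -> None:
	# Indirect mode: load every entry into RAM first, then write them all.
	entries = list(reader)
	for entry in entries:
		write(entry)


def convert_direct(reader: Iterable[Entry], write: Writer) -> None:
	# Direct mode: read, process and write one entry at a time.
	for entry in reader:
		write(entry)
```

With a generator-based reader, `convert_direct` interleaves reads and writes, whereas `convert_indirect` finishes reading (holding every entry in memory) before the first write.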

Converting glossaries into these formats requires sorting entries:

That's why direct mode will not work for these formats, and PyGlossary has to switch to indirect mode (or it previously had to, see SQLite mode).

For other formats, direct mode is the default. You may override this with the --indirect flag.

SQLite mode

As mentioned above, converting glossaries to some specific formats requires them to be loaded into RAM.

This can be problematic if the glossary is too big to fit into RAM. That's when you should try adding --sqlite flag to your command. Then it uses SQLite3 as intermediate storage for storing, sorting and then fetching entries. This fixes the memory issue, and may even reduce running time of conversion (depending on your home directory storage).

The temporary SQLite file is stored in the cache directory, then deleted after conversion (unless you pass the --no-cleanup flag).
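The idea behind SQLite mode (spill entries to a temporary SQLite file, let SQLite do the sorting, stream the results back, then delete the file) can be sketched with the standard `sqlite3` module. The table layout, the plain `ORDER BY word` sort, and the `sorted_via_sqlite` name are assumptions for illustration, not PyGlossary's actual schema or sort keys.

```python
import os
import sqlite3
import tempfile
from collections.abc import Iterable, Iterator


def sorted_via_sqlite(
	entries: Iterable[tuple[str, str]],
	cache_dir: str,
	cleanup: bool = True,
) -> Iterator[tuple[str, str]]:
	"""Store (word, defi) pairs in a temporary SQLite file, yield them sorted."""
	fd, path = tempfile.mkstemp(suffix=".db", dir=cache_dir)
	os.close(fd)
	con = sqlite3.connect(path)
	try:
		con.execute("CREATE TABLE entry (word TEXT, defi TEXT)")
		con.executemany("INSERT INTO entry VALUES (?, ?)", entries)
		con.commit()
		# Sorting happens inside SQLite (on disk), not in a Python list.
		yield from con.execute("SELECT word, defi FROM entry ORDER BY word")
	finally:
		con.close()
		if cleanup:
			# Mirrors the default behavior; keep the file like --no-cleanup otherwise.
			os.remove(path)
```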

SQLite mode is automatically enabled for writing these formats if the auto_sqlite config parameter is true (which is the default). This also applies when you pass the --sort flag for any format. You may use --no-sqlite to override this and switch to indirect mode.

Currently you can not disable alternates in SQLite mode (--no-alts is ignored).

Sorting

There are two things that can activate sorting of entries:

  • Output format requires sorting (as explained above)
  • You pass --sort flag in command line.

In the case of passing --sort, you can also pass:

  • --sort-key to select sort key aka sorting order (including locale), see doc/sort-key.md

  • --sort-encoding to change the encoding used for sort

    • UTF-8 is the default encoding for all sort keys and all output formats (unless mentioned otherwise)
    • This only affects the order of entries, and will not corrupt words / definitions
    • Non-encodable characters are replaced with ? byte (only for sorting)
    • Conflicts with --sort-locale
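A rough sketch of how an encoding-based sort key could behave, assuming the key is simply the word encoded with `errors="replace"` (Python substitutes `?` for characters the encoding cannot represent). `sort_key` is a hypothetical helper; PyGlossary's real sort keys are described in doc/sort-key.md and may differ.

```python
def sort_key(word: str, encoding: str = "utf-8") -> bytes:
	# Only the key is affected; the word itself is never modified.
	# Characters the encoding cannot represent become b"?".
	return word.encode(encoding, errors="replace")


words = ["zebra", "épée", "apple"]
print(sorted(words, key=sort_key))  # ['apple', 'zebra', 'épée'] (UTF-8 byte order)
print(sorted(words, key=lambda w: sort_key(w, "ascii")))  # ['épée', 'apple', 'zebra']
```

In the ASCII case, "épée" sorts first because its key becomes `b"?p?e"`, and `?` precedes all lowercase letters.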

Cache directory

Cache directory is used for storing temporary files which are either moved or deleted after conversion. You can pass --no-cleanup flag in order to keep them.

The path for cache directory:

  • Linux or BSD: ~/.cache/pyglossary/
  • Mac: ~/Library/Caches/PyGlossary/
  • Windows: C:\Users\USERNAME\AppData\Local\PyGlossary\Cache\

User plugins

If you want to add your own plugin without adding it to source code directory, or you want to use a plugin that has been removed from repository, you can place it in this directory:

  • Linux or BSD: ~/.pyglossary/plugins/
  • Mac: ~/Library/Preferences/PyGlossary/plugins/
  • Windows: C:\Users\USERNAME\AppData\Roaming\PyGlossary\plugins\
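Both directory lists above follow a simple per-platform pattern. `pyglossary_dirs` is a hypothetical helper that builds those paths from a home directory; `system` is assumed to use `platform.system()` values, and the real implementation may also honor environment variables that this sketch ignores.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def pyglossary_dirs(system: str, home: PurePath) -> dict[str, PurePath]:
	"""Return the cache and user-plugin directories listed above (hypothetical)."""
	if system == "Windows":
		appdata = home / "AppData"
		return {
			"cache": appdata / "Local" / "PyGlossary" / "Cache",
			"plugins": appdata / "Roaming" / "PyGlossary" / "plugins",
		}
	if system == "Darwin":  # Mac
		library = home / "Library"
		return {
			"cache": library / "Caches" / "PyGlossary",
			"plugins": library / "Preferences" / "PyGlossary" / "plugins",
		}
	# Linux or BSD
	return {
		"cache": home / ".cache" / "pyglossary",
		"plugins": home / ".pyglossary" / "plugins",
	}
```

Using pure paths lets the same function describe Windows paths on a Linux machine and vice versa, e.g. `pyglossary_dirs("Windows", PureWindowsPath(r"C:\Users\USERNAME"))`.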

Using PyGlossary as a Python library

There are a few examples in doc/lib-examples directory.

Here is a basic script that converts any supported glossary format to Tabfile:

import sys
from pyglossary import Glossary

# Glossary.init() should be called only once, so make sure you put it
# in the right place
Glossary.init()

glos = Glossary()
glos.convert(
	inputFilename=sys.argv[1],
	outputFilename=f"{sys.argv[1]}.txt",
	# although it can detect format for *.txt, you can still pass outputFormat
	outputFormat="Tabfile",
	# you can pass readOptions or writeOptions as a dict
	# writeOptions={"encoding": "utf-8"},
)

And if you choose to use glossary_v2:

import sys
from pyglossary.glossary_v2 import ConvertArgs, Glossary

# Glossary.init() should be called only once, so make sure you put it
# in the right place
Glossary.init()

glos = Glossary()
glos.convert(ConvertArgs(
	inputFilename=sys.argv[1],
	outputFilename=f"{sys.argv[1]}.txt",
	# although it can detect format for *.txt, you can still pass outputFormat
	outputFormat="Tabfile",
	# you can pass readOptions or writeOptions as a dict
	# writeOptions={"encoding": "utf-8"},
))

You may look at docstring of Glossary.convert for full list of keyword arguments.

If you need to add entries inside your Python program (rather than converting one glossary into another), use write instead of convert. Here is an example:

from pyglossary import Glossary

Glossary.init()

glos = Glossary()
mydict = {
	"a": "test1",
	"b": "test2",
	"c": "test3",
}
for word, defi in mydict.items():
	glos.addEntryObj(glos.newEntry(
		word,
		defi,
		defiFormat="m",  # "m" for plain text, "h" for HTML
	))

glos.setInfo("title", "My Test StarDict")
glos.setInfo("author", "John Doe")
glos.write("test.ifo", format="Stardict")

Note: addEntryObj is renamed to addEntry in pyglossary.glossary_v2.

Note: Switching to glossary_v2 is optional and recommended.

And if you need to read a glossary from file into a Glossary object in RAM (without immediately converting it), you can use glos.read(filename, format=inputFormat). Be wary of RAM usage in this case.

If you want to include images, css, js or other files in a glossary that you are creating, you need to add them as Data Entries, for example:

import os

with open(os.path.join(imageDir, "a.jpeg"), "rb") as fp:  # "rb": data entries hold bytes
	glos.addEntry(glos.newDataEntry("img/a.jpeg", fp.read()))

The first argument to newDataEntry must be the relative path (the one that the HTML in your definitions generally points to).

Internal glossary structure

A glossary contains a number of entries.

Each entry contains:

  • Headword (title or main phrase for lookup)
  • Alternates (some alternative phrases for lookup)
  • Definition

In PyGlossary, headword and alternates together are accessible as a single Python list: entry.l_word

entry.defi is the definition as a Python Unicode str. Also entry.b_defi is definition in UTF-8 byte array.

entry.defiFormat is the definition format. If the definition is plain text (not rich text), the value is m. If it's HTML (contains any HTML tag), then defiFormat is h. The value x is also allowed for XDXF, but XDXF is not widely supported in dictionary applications.
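As a rough illustration of the m versus h distinction, a definition could be classified by looking for anything shaped like an HTML tag. This regex heuristic and the `guess_defi_format` name are assumptions, not how PyGlossary decides; it also never returns x for XDXF.

```python
import re

# Anything shaped like an opening, closing or self-closing tag: <b>, </div>, <br/>
_HTML_TAG = re.compile(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^<>]*)?/?>")


def guess_defi_format(defi: str) -> str:
	"""Return "h" if the definition seems to contain an HTML tag, else "m"."""
	return "h" if _HTML_TAG.search(defi) else "m"
```

A bare comparison such as `x < y and y > z` stays m, because the `<` is not followed directly by a tag name.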

There is another type of entry which is called Data Entry, and generally contains an image, audio, css, or any other file that was included in input glossary. For data entries:

  • entry.s_word is file name (and l_word is still a list containing this string),
  • entry.defiFormat is b
  • entry.data gives the content of file in bytes.
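The attributes described above can be modeled with a small dataclass. `SketchEntry` is a hypothetical stand-in (PyGlossary's real entry classes have more behavior, and `s_word` is omitted here), and the headword-first ordering of `l_word` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchEntry:
	"""Hypothetical model of the entry attributes described above."""

	l_word: list[str]  # headword (assumed first), then any alternates
	defi: str = ""  # definition as a Unicode str
	defiFormat: str = "m"  # "m" plain text, "h" HTML, "x" XDXF, "b" data entry
	data: Optional[bytes] = None  # file content, only for data entries

	@property
	def b_defi(self) -> bytes:
		# The definition as UTF-8 bytes, derived from defi.
		return self.defi.encode("utf-8")
```

For a data entry, `l_word` holds just the file name, e.g. `SketchEntry(["img/a.jpeg"], defiFormat="b", data=image_bytes)`.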

Entry filters

Entry filters are internal objects that modify words/definition of entries, or remove entries (in some special cases).

They work like several filters in a pipe that connects a reader object to a writer object (both classes are defined in plugins and instantiated in the Glossary class).

You can enable/disable some of these filters using config parameters / command-line flags, which are documented in doc/config.rst.

The full list of entry filters is also documented in doc/entry-filters.md.
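The pipe analogy can be sketched with plain generators: each filter takes an entry and returns a modified entry, or `None` to drop it. The filter functions and `run_filters` are hypothetical names; the actual filters are listed in doc/entry-filters.md.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Optional

Entry = tuple[str, str]  # (word, definition): a simplified stand-in
EntryFilter = Callable[[Entry], Optional[Entry]]  # None means "remove entry"


def strip_whitespace(entry: Entry) -> Optional[Entry]:
	word, defi = entry
	return word.strip(), defi.strip()


def skip_empty_definition(entry: Entry) -> Optional[Entry]:
	return entry if entry[1] else None


def run_filters(reader: Iterable[Entry], filters: list[EntryFilter]) -> Iterator[Entry]:
	"""Pass each entry from the reader through every filter, in order."""
	for entry in reader:
		for entry_filter in filters:
			entry = entry_filter(entry)
			if entry is None:
				break  # a filter removed this entry
		else:
			yield entry  # survived every filter: hand it to the writer
```

Because everything is lazy, the writer can consume `run_filters(reader, filters)` one entry at a time, which fits direct mode.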

pyglossary's People

Contributors

ashwinvis, bao-qian, behrouz-m, bergentroll, bobotig, clach04, crissium, darkgeek, doozan, germn, glowinthedark, hmgqzx, holyspiritomb, ilius, karlb, kevinsung, kianmeng, linzhp, master-bob, mtlive, ousia, pgaskin, proletarius101, ratijas, soshial, srghma, timgates42, tomtung, tsihyoung, xiaoqiangwang


pyglossary's Issues

Cannot play audio

I used dsl.py to generate some dictionaries.
I just found that the object tag can no longer play audio in the dictionary on my Mac; it worked fine before.

OS X version:10.8.4
Installed all update.
Dictionary:Version 2.2.1 (143.1)
Could you do me a favor to help me test that audio feature?

Format for CSV files?

Is there any sort of documentation that details how the data inside a csv file should be formatted for reading?

DSL to StarDict conversion issues

  1. PyGlossary was used to convert a DSL dictionary to a GoldenDict one. The conversion succeeded and PyGlossary showed no errors. However, raw HTML tags can be seen when the dictionary is opened in the GoldenDict application. A possible workaround is to put the sametypesequence=h instruction in the resulting .ifo file. Only then are the HTML tags handled correctly by GoldenDict, and the card text is displayed formatted. But this creates another issue: the symbol m appears at the beginning of each card, which is apparently part of the implicit sametypesequence=m flag set for each article.

  2. There is also a second type of issue. A well-known trick is to include an empty line between paragraphs in a DSL file. This line consists of several leading spaces, a backslash \ and one trailing space:

    headword
      card body line 1
      \space
      card body line 2
    

    This line is an undocumented feature of the DSL markup language and is used for a reason. It looks like PyGlossary processes it as a line containing only a backslash, and treats that backslash as text rather than markup. PyGlossary just removes the trailing space and leaves the \ symbol, which then shows up in every empty line of the cards once the dictionary is loaded into a dictionary application.
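A reader could honor this trick by treating a body line that contains only a backslash (ignoring surrounding whitespace) as an intentionally empty line. The following is a hypothetical sketch of that behavior; `normalize_dsl_body_line` is not code from PyGlossary's DSL plugin.

```python
def normalize_dsl_body_line(line: str) -> str:
	"""Strip indentation; map a lone-backslash line to an empty line."""
	text = line.strip()
	if text == "\\":
		return ""  # the "\ " trick: an intentionally blank line in the card
	return text
```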

In order to reproduce the issues, please don't hesitate to request some samples of DSL and SD files used.

MDX: AssertionError while converting to AppleDict

Issues on OS X 10.11.5

When I try to use this command:

~/Downloads/py/pyglossary.pyw --read-options=resPath=OtherResources --write-format=AppleDict collins.mdx collins.xml

I get an error like this:

LZO compression support is not available
error while saving MDict data files
Traceback (most recent call last):
  File "/Users/Jerry/Downloads/py/pyglossary/plugins/octopus_mdict.py", line 74, in open
    self.writeDataFiles()
  File "/Users/Jerry/Downloads/py/pyglossary/plugins/octopus_mdict.py", line 84, in writeDataFiles
    fpath = ''.join([self._dataDir, key.replace('\\', os.path.sep)]);
TypeError: a bytes-like object is required, not 'str'

using Reader class from OctopusMdict plugin for direct conversion without loading into memory

Writing to file "collins.xml"
exception while calling plugin's write function
Traceback (most recent call last):
  File "/Users/Jerry/Downloads/py/pyglossary/glossary.py", line 771, in write
    self.writeFunctions[format].__call__(self, filename, **options)
  File "/Users/Jerry/Downloads/py/pyglossary/plugins/appledict/__init__.py", line 218, in write
    write_plist(glos, dict_name + '.plist', xsl=xsl, defaultPrefs=defaultPrefs, prefsHTML=prefsHTML, frontBackMatter=frontBackMatter)
  File "/Users/Jerry/Downloads/py/pyglossary/plugins/appledict/__init__.py", line 89, in write_plist
    copyright = ('%s' % bs4.BeautifulSoup(glos.getInfo('copyright'), "lxml").text)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bs4/__init__.py", line 156, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Writing file "collins.xml" failed.

Any thoughts or resolutions?

Export to SQL

CREATE TABLE dbinfo (title CHAR(16), author CHAR(14), email CHAR(21), description CHAR(2), copyright CHAR(2), sourceLang CHAR(9), targetLang CHAR(8), bgl_defaultCharset CHAR(12), bgl_sourceCharset CHAR(12), bgl_targetCharset CHAR(8), bgl_creationTime CHAR(19), bgl_middleUpdated CHAR(19), bgl_lastUpdated CHAR(2), sourceCharset CHAR(7), targetCharset CHAR(7));
CREATE TABLE word ('s_id' INTEGER PRIMARY KEY NOT NULL, 'wname' TEXT, 'wmean' TEXT);

Delete the single quotes from the second line; they make it invalid in MySQL.
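For reference, here is the word table definition with the quotes removed, checked against SQLite through Python's `sqlite3` module. This only demonstrates that the unquoted form works in SQLite; MySQL compatibility is taken from the report above and is not tested here.

```python
import sqlite3

# Unquoted identifiers; in MySQL, single quotes denote string literals.
CREATE_WORD = (
	"CREATE TABLE word ("
	"s_id INTEGER PRIMARY KEY NOT NULL, "
	"wname TEXT, "
	"wmean TEXT)"
)

con = sqlite3.connect(":memory:")
con.execute(CREATE_WORD)
con.execute("INSERT INTO word VALUES (?, ?, ?)", (1, "apple", "a fruit"))
```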

AttributeError in AppleDict write

  File "pyglossary/pyglossary.pyw", line 311, in <module>
    reverse=args.reverse,
  File "pyglossary/ui/ui_cmd.py", line 241, in run
    if not g.write(opath, format=write_format, **write_options):
  File "pyglossary/pyglossary/glossary.py", line 408, in write
    getattr(self, 'write%s'%format).__call__(filename, **options)
  File "pyglossary/pyglossary/plugins/appledict/__init__.py", line 219, in write
    write_xml(glos, dict_name + '.xml', cleanHTML=="yes", frontBackMatter=frontBackMatter, indexes=indexes)
  File "pyglossary/pyglossary/plugins/appledict/_dict.py", line 313, in write_xml
    write_entries(glos, f, cleanHTML, indexes)
  File "pyglossary/pyglossary/plugins/appledict/_dict.py", line 260, in write_entries
    long_title = _normalize.title_long(_normalize.title(item[0], BeautifulSoup))
  File "pyglossary/pyglossary/plugins/appledict/_normalize.py", line 97, in title
    title = BeautifulSoup.BeautifulSoup(title, "html").get_text(strip=True).encode('utf-8')
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1191, in _feed
    self.endData()
  File "/usr/lib/python2.7/dist-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
$ pip2 show BeautifulSoup

---
Metadata-Version: 1.1
Name: BeautifulSoup
Version: 3.2.1
Summary: HTML/XML parser for quick-turnaround applications like screen-scraping.
Home-page: http://www.crummy.com/software/BeautifulSoup/
Author: Leonard Richardson
Author-email: [email protected]
License: BSD
Location: /usr/lib/python2.7/dist-packages
Requires: 

Read Lingoes LD2

It is the format of the Lingoes dictionary. Would you like to make this tool able to read it?

What is the right way to modify style?

Say I want to beautify a DSL-formatted dictionary in Apple Dictionary. Where should I make changes?
Add an embedded style in the dsl.py plugin? Or change the CSS file in appledict.py?
Things I want to do:

  1. beautiful phonetic symbol
    change
    temp
    to
    temp
  2. suitable audio tag alignment
    change
    temp
    to
    temp

And some minor improvement.

StarDict to txt error

First of all - your program is a life saver!

However, I have bumped into an issue, and before I dig into the code I wanted to ask you (or whoever sees this). Maybe you have encountered something like it and know how to deal with it.

tldr: I load a stardict file (*.ifo) and it loads fine, when I try to write it to text I get errors.

Longer version:

See screenshot:
screenshot from 2015-10-24 23 49 12

I load the stardict dictionary (can be found here) and it seems to load fine. Output in terminal:

Reading from StarDict (ifo), please wait...
reading Stardict file: "/home/igor/Downloads/stardict-freedict-deu-eng-2.4.2/dictd_www.freedict.de_deu-eng.ifo" done.
81628 words found.
/home/igor/External/pyglossary/ui/ui_gtk.py:1285: DeprecationWarning: 
  self.progressbar.update(rat)
time left = 3.134076 seconds
version="2.4.2"
wordcount="81628"
idxfilesize="1730874"
bookname="German - English"
description="Made by Hu Zheng"
date="2007.8.29"
sametypesequence="m"

When I try to convert it to text I get the following error:

Converting to Tabfile (txt, dic), please wait...
filename=/home/igor/Downloads/stardict-freedict-deu-eng-2.4.2/dictd_www.freedict.de_deu-eng.txt
Traceback (most recent call last):
  File "/home/igor/External/pyglossary/ui/ui_gtk.py", line 406, in convert_clicked
    self.glos.write(oPath, format=format)
  File "/home/igor/External/pyglossary/pyglossary/glossary.py", line 372, in write
    getattr(self, 'write%s'%format).__call__(filename, **options)
  File "/home/igor/External/pyglossary/pyglossary/plugins/tabfile.py", line 72, in write
    ext='.txt',
  File "/home/igor/External/pyglossary/pyglossary/glossary.py", line 421, in writeTxt
    alts = item[2]['alts']
KeyError: 'alts'

Any ideas?
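The traceback suggests the writer assumed every entry carried an alts key. A defensive lookup avoids the KeyError; this sketch assumes the (word, defi, options) tuple layout visible in the traceback, and `get_alternates` is a hypothetical name.

```python
def get_alternates(item: tuple) -> list[str]:
	"""Return the alternates from an entry tuple's options dict, if present.

	Entries read from StarDict may simply have no "alts" key (or no options).
	"""
	options = item[2] if len(item) > 2 else {}
	return options.get("alts", [])
```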

README

This is a very good project, keep on it!
I just translated a dictionary from BGL to AppleDict, and I had to look at the code to figure out some things.
In the README you have a command like this: pyglossary.pyw --read-options=resPath=OtherResources dict.BGL dict.xml. But it actually tries to convert it to XDXF format (I just assume this is correct, I do not know anything about that format). So I figured out I should explicit the write format and I had to look at the code to find the string I had to use: --write-format=AppleDict.
Perhaps you could add the exact wording for read/write format options in the README or --help option, because AppleDict Source (xml) is misleading.

And also, please update the readme to pyglossary.pyw --read-options=resPath=OtherResources --write-format=AppleDict dict.BGL dict.xml.

LZO compression is not supported in Mac os 10.10.5

Hi,

I followed the README to try to convert an MDX file to a Mac dictionary, but it says LZO compression is not supported.

Environment:

  1. Mac OS 10.10.5
  2. Python 2.7.9
  3. XCode 7.1
$ python pyglossary/pyglossary.pyw --read-options=resPath=OtherResources --write-format=AppleDict dict.mdx dict.xml
No handlers could be found for logger "root"
Reading file "dict.mdx"
LZO compression is not supported
LZO compression is not supported
Traceback (most recent call last):
  File "pyglossary/pyglossary.pyw", line 301, in <module>
    reverse=args.reverse,
  File "/Users/linbo/Downloads/dictionary/pyglossary/ui/ui_cmd.py", line 196, in run
    if g.read(ipath, format=read_format, **read_options)!=False:
  File "/Users/linbo/Downloads/dictionary/pyglossary/pyglossary/glossary.py", line 305, in read
    getattr(self, 'read%s'%format).__call__(filename, **options)
  File "/Users/linbo/Downloads/dictionary/pyglossary/pyglossary/plugins/octopus_mdict.py", line 36, in read
    for key, value in mdx.items():
  File "/Users/linbo/Downloads/dictionary/pyglossary/pyglossary/readmdict.py", line 469, in _decode_record_block
    assert(size_counter == record_block_size)
AssertionError

setup.py error mac os x

I'm trying to use this on Mac OS X. I'm very much a newbie with Python, but I can use basic command-line stuff. I downloaded the package, extracted the archive to ~/Desktop/, then did the following:

cd ~/Desktop/pyglossary-master
python setup.py install

result:

Traceback (most recent call last):
File "setup.py", line 18, in
from pyglossary.glossary import VERSION
File "~/Desktop/pyglossary-master/pyglossary/glossary.py", line 1277
yield wordI
SyntaxError: 'return' with argument inside generator

note: OS X 10.11.6. installed ActiveTCL from binary .pkg.
python --version
Python 2.7.8

Read Wikimedia Dumps

Hi @ilius,

Thanks for developing this wonderful package! I would like to check whether this package is able to or has any plan to support wikidump. I think wikidump is some kind of glossary, and somehow can fit into the scope of this project.

Regards,

Longqi

installation on windows

Dear Ilius, I'm new to Python (using Python 3.5.2) and at the moment I can't use Linux. Shouldn't I have copied the package into the site-packages folder and installed it using the following command?!
C:\python pip install pyglossary\setup.py
Do I have to use the Tkinter library to install it? How do I install it using Tkinter?
Thanks a lot in advance

Tag a new release for py3 support?

The original reason I asked about py3 support is that I made a tool that uses PyGlossary which I'd like to update for py3. As it is, it looks like I'd need to mark a random commit as a dependency to do that. I'd really rather reference some tag/release as a dependency, so will there be one I can use soon?

RSS for latest release?

Hello. Excuse me, does this project have an RSS feed for getting notified about its latest stable release? Thanks.

Windows installation

Hi,

I would like to install this programme on windows 7 32bit. If there is a GUI for windows I would prefer this version.

So my question is: which files do I have to download, and how do I install it? (Normal Windows programs have only one setup.exe file, and to install one you just click on that file. But here it seems to be different and more complex.)

Thank you.

New Data Structure and Glossary API

As you can see in commit 5d58a57 I have changed data structure and API of Glossary class

1- No more direct access to glos.data, it's now a private attribute glos._data
Plugins must use glos.addEntry(word, defi) or glos.addEntry(word, defi, defiFormat) where both word and defi can be either list (including alternates) or str (if there are no alternates)
Also plugins should use for entry in glos: for iterating over entries (glossary data). Although they can call glos.next() manually.
For API of Entry class, check out pyglossary/entry.py

2- The underlying structure of _data is different from the former data. Members of _data are either (word, defi) or (word, defi, defiFormat), where in both cases word and defi can each be a list or a str (as mentioned above), and we store them exactly the same way in the Entry object when iterating over glos.
Please note that Entry objects are kind of temporary, we don't keep them in Glossary class or anywhere else, we keep tuples instead for memory efficiency.
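The design described above (compact tuples stored privately, short-lived entry objects built only during iteration) can be sketched as follows. `SketchGlossary` and `TempEntry` are hypothetical names, and defaulting a missing defiFormat to "m" is an assumption of this sketch.

```python
from collections.abc import Iterator
from typing import NamedTuple, Optional, Union

Word = Union[str, list[str]]  # str, or a list that includes alternates


class TempEntry(NamedTuple):
	"""Short-lived view built while iterating; never stored."""

	word: Word
	defi: Word
	defiFormat: str


class SketchGlossary:
	def __init__(self) -> None:
		self._data: list[tuple] = []  # private storage of compact tuples

	def addEntry(self, word: Word, defi: Word, defiFormat: Optional[str] = None) -> None:
		if defiFormat is None:
			self._data.append((word, defi))
		else:
			self._data.append((word, defi, defiFormat))

	def __iter__(self) -> Iterator[TempEntry]:
		for item in self._data:
			fmt = item[2] if len(item) > 2 else "m"  # assumed default
			yield TempEntry(item[0], item[1], fmt)
```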

The Future

Next step is to define a Reader class in all plugins with read support (starting with important ones), to use Reader class instead of old read function. This Reader class must be iterable so that Glossary class can do for entry in reader:, and the best way to implement Reader is to define an __iter__ method which is a generator and does yield entry.

In Glossary class, we will have a self._readers which is a list of plugins' Reader instances. So that we can read from multiple glossary files (with various formats) and write directly to the output glossary, without loading the whole data into memory (thanks to Python's generators).
With use of Glossary.entryFilters that we have right now, we can modify / skip entries before writing into output glossary.
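A Reader in this design is just an iterable whose `__iter__` is a generator, so several readers can be chained and streamed into a writer without loading everything into memory. `LinesReader` (which parses hypothetical word-TAB-definition lines) and `iter_all` are illustrative names, not PyGlossary's plugin API.

```python
from collections.abc import Iterable, Iterator


class LinesReader:
	"""Hypothetical Reader: parses "word<TAB>definition" lines lazily."""

	def __init__(self, lines: Iterable[str]) -> None:
		self._lines = lines

	def __iter__(self) -> Iterator[tuple[str, str]]:
		for line in self._lines:
			word, sep, defi = line.rstrip("\n").partition("\t")
			if sep:  # skip malformed lines that have no tab
				yield word, defi


def iter_all(readers: list[Iterable[tuple[str, str]]]) -> Iterator[tuple[str, str]]:
	"""Chain several readers so entries stream straight to the writer."""
	for reader in readers:
		yield from reader
```

Passing an open file object as `lines` would keep only one line in memory at a time.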

The only problem is sorting entries, which is required, for example, for writing into StarDict. But I already have a kind of solution for that too: I have written a stream sorter that has a buffer with a maximum size and can partially sort a stream (Python generator)
https://gist.github.com/ilius/71cfdeb004b568ecd40c
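A bounded-buffer stream sorter of this kind can be sketched with `heapq`: keep at most `max_size` items in a heap, and once it is full, emit the smallest buffered item each time a new one arrives. This is an independent sketch of the idea, not the code in the linked gist, and `buffered_sort` is a hypothetical name. The output is only fully sorted when no item arrives too far after the position it belongs in; otherwise the order is approximate.

```python
import heapq
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def buffered_sort(items: Iterable[T], max_size: int) -> Iterator[T]:
	"""Partially sort a stream while buffering at most `max_size` items."""
	heap: list[T] = []
	for item in items:
		if len(heap) < max_size:
			heapq.heappush(heap, item)
		else:
			# Push the new item, then emit the smallest item in the buffer.
			yield heapq.heappushpop(heap, item)
	# Input exhausted: drain the buffer in sorted order.
	while heap:
		yield heapq.heappop(heap)
```

Memory stays bounded by `max_size`, which is the trade-off: a larger buffer tolerates more disorder in the input.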

Please let me know if you have any suggestions about the design
@tuxor1337 @xiaoqiangwang @kubtek @ratijas

Also, due to major changes and refactoring in the StarDict and BGL plugins, bugs might have been introduced, even though I tested them on a few glossaries. I would very much appreciate it if you review my code and/or test it on different glossaries.

Thanks :)

BGL: OSError: CRC check failed

I'm testing the latest master branch to convert Accounting_English_Persian.BGL file to .csv, using Python35-32 on Windows.
The program gives me this error:

Traceback (most recent call last):
  File "pyglossary\glossary.py", line 522, in read
    Reader = self.readerClasses[format]
KeyError: 'BabylonBgl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ui\ui_tk.py", line 86, in CallWrapper__call__
    return self.func(*args)
  File "ui\ui_tk.py", line 728, in load
    ex = self.glos.read(iPath, format=format)
  File "pyglossary\glossary.py", line 529, in read
    **options
  File "pyglossary\plugins\babylon_bgl.py", line 2820, in read
    if not reader.read():
  File "pyglossary\plugins\babylon_bgl.py", line 1269, in read
    if not self.readBlock(block):
  File "pyglossary\plugins\babylon_bgl.py", line 1210, in readBlock
    length = self.readBytes(1)
  File "pyglossary\plugins\babylon_bgl.py", line 1252, in readBytes
    buf = self.file.read(bytes)
  File "C:\Python35-32\lib\gzip.py", line 274, in read
    return self._buffer.read(size)
  File "C:\Python35-32\lib\_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "C:\Python35-32\lib\gzip.py", line 452, in read
    self._read_eof()
  File "C:\Python35-32\lib\gzip.py", line 499, in _read_eof
    hex(self._crc)))
OSError: CRC check failed 0x0 != 0x550f2e74

The main site shows its entries here. So I don't think it has a CRC error.

glossary.Glossary.__init__

data: "Default argument value is mutable". I just shot myself in the foot accidentally because of this.
The same goes for info.

Is there any reason to keep it like this?
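For reference, a default such as `{}` or `[]` is created once, when the function is defined, so every call that relies on the default shares the same object. The classes below are simplified and hypothetical (not the real `Glossary` signature). They show the problem and the usual `None` idiom.

```python
class SharedDefaults:
    # Problematic: the same dict/list objects are reused by every instance
    # created without explicit arguments.
    def __init__(self, info={}, data=[]):
        self.info = info
        self.data = data


class FreshDefaults:
    # Idiomatic: use None as a sentinel and build new containers per call.
    # Copying also keeps the caller's own containers from being mutated.
    def __init__(self, info=None, data=None):
        self.info = {} if info is None else dict(info)
        self.data = [] if data is None else list(data)


a, b = SharedDefaults(), SharedDefaults()
a.data.append(("word", "definition"))
print(b.data)  # [('word', 'definition')]: b sees a's entry

c, d = FreshDefaults(), FreshDefaults()
c.data.append(("word", "definition"))
print(d.data)  # []
```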

AppleDict BeautifulSoup warning

Recently, I installed pyglossary on Ubuntu 14.04 and used it in GUI mode to convert StarDict to AppleDict.

I encountered this warning:

Converting to AppleDict Source (xml), please wait...
filename=/home/charles/Desktop/01/LexitronEnTh-2.4.2/LexitronEnTh.xml
/usr/local/lib/python2.7/dist-packages/beautifulsoup4-4.4.1-py2.7.egg/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

BeautifulSoup([your markup])

to this:

BeautifulSoup([your markup], "lxml")

log.txt

MDX to AppleDict conversion issue

Dear all,

I have several dictionaries in MDX format, which I converted into XML with pyglossary. When I tried to convert them into Mac dictionaries, the word definitions of some dictionaries didn't show up at all in Dictionary.app. I have uploaded the head of one problematic dictionary in XML format, a screenshot, and the MDX files of two dictionaries.

Any help and suggestions are appreciated.
screen shot 2016-09-14 at 23 35 46

headofdictionary.xml.zip

Here are links to the two MDX dictionaries: OED and Collins.

BGL: TypeError: a bytes-like object is required, not 'str'

I'm testing the latest master branch to convert the Iranian_Geologist.BGL file to .csv, using Python35-32 on Windows.
The program gives me this error:

Traceback (most recent call last):
  File "pyglossary\pyglossary\glossary.py", line 522, in read
    Reader = self.readerClasses[format]
KeyError: 'BabylonBgl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pyglossary\ui\ui_tk.py", line 86, in CallWrapper__call__
    return self.func(*args)
  File "pyglossary\ui\ui_tk.py", line 728, in load
    ex = self.glos.read(iPath, format=format)
  File "pyglossary\pyglossary\glossary.py", line 529, in read
    **options
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 2835, in read
    for index, (words, defis) in enumerate(reader):
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 1778, in __iter__
    yield self.readEntry()
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 1759, in readEntry
    [res, pos, defi, key_defi] = self.readEntry_defi(block, pos, raw_key)
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 1850, in readEntry_defi
    defi = self.processEntryDefinition(raw_defi, raw_key)
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 2277, in processEntryDefinition
    fields.utf8_defi = self.replace_html_entries(fields.utf8_defi)
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 1917, in replace_html_entries
    return re.sub(pat_entry, replace_html_entry, text)
  File "C:\Python35-32\lib\re.py", line 182, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "pyglossary\pyglossary\plugins\babylon_bgl.py", line 243, in replace_html_entry
    return xml_escape(res)
  File "pyglossary\pyglossary\xml_utils.py", line 15, in xml_escape
    data = data.replace("&", "&amp;")
TypeError: a bytes-like object is required, not 'str'
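The last frame shows `xml_escape` receiving `bytes` while calling `.replace()` with `str` arguments. Below is a minimal sketch of an escape helper that tolerates both types. It assumes byte input is UTF-8, although the correct encoding for a given BGL file may differ. It is illustrative, not pyglossary's actual `xml_utils.xml_escape`, and the proper fix may instead be to decode the definition earlier in the plugin.

```python
def xml_escape(data):
    """Escape XML special characters; accepts str, or bytes assumed to be UTF-8."""
    if isinstance(data, (bytes, bytearray)):
        data = bytes(data).decode("utf-8", "replace")
    # "&" must be replaced first so the entities added below are not re-escaped.
    data = data.replace("&", "&amp;")
    data = data.replace("<", "&lt;")
    data = data.replace(">", "&gt;")
    data = data.replace('"', "&quot;")
    return data


print(xml_escape(b"AT&T <rocks>"))  # AT&amp;T &lt;rocks&gt;
```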

Write E-Book formats (to replace Penelope)

Hi,

I am the developer behind Penelope, "a Python tool for creating, editing and converting dictionaries, especially for eReader devices": https://github.com/pettarin/penelope

Penelope is mostly used to convert dictionaries to Kobo and Bookeen eReader formats.

Since I no longer have time to develop or maintain it, I am looking for someone to take the project over or for similar tools to include the I/O to those formats, in order not to let the current users down.

I think pyglossary is the best candidate for the job. If you can "import" the current Penelope formats to pyglossary, I would gladly forward the current users of Penelope to pyglossary.

Penelope's code is MIT licensed, so you can do whatever you want with it. If you are going to copy code into pyglossary, feel free to do so. Attribution would be appreciated, though not required.

Let me know if I can help.

BGL: Unicode decoding is faulty!

When I decode my BGL file, all pronunciations are ruined like this:

mis- <BR>I <i>prefix</i><BR> 1) wrong, bad, or erroneous; wrongly, badly, or erroneously<BR> <font color="#008000">misunderstanding</font><BR> <font color="#008000">misfortune</font><BR> <font color="#008000">misspelling</font><BR> <font color="#008000">mistreat</font><BR> <font color="#008000">mislead</font><BR> 2) lack of; not<BR> <font color="#008000">mistrust</font><BR> •<BR>Etymology:<BR>Old English <i>mis</i>(<i>se</i>)<i>-</i><i>;</i> related to Middle English <i>mes-</i><i>,</i> from Old French <i>mes-</i><i>;</i> compare Old High German <i>missa-</i><i>,</i> Old Norse <i>mis-</i><BR>II <i>prefix</i><BR> a variant of <b><a href="entry://miso-">miso-</a></b> before a vowel </> mis-sell <BR> <i><font color="#00FF00"> <i>vb</i></font></i>, <i><font color="#00FF00"> <i>pl</i></font></i> -sells, -selling, <i>-sold</i><BR> to sell a financial product that is inappropriate for the needs of the customer </> misadventure <BR><b><font color="#FF0000"> ,mرچsة™dثˆvة›ntتƒة™</font></b><BR> <i><font color="#00FF00"> n</font></i><BR> 1) an unlucky event; misfortune<BR> 2) <i>law</i> accidental death not due to crime or negligence </> misaligned <BR><b><font color="#FF0000"> ,mرچsة™ثˆlaرچnd</font></b><BR> <i><font color="#00FF00"> adj</font></i><BR> placed or positioned wrongly or badly<BR> Derived words:<BR> misalignment <i><font color="#00FF00"> n</font></i> </> misalliance <BR><b><font color="#FF0000"> ,mرچsة™ثˆlaرچة™ns</font></b><BR> <i><font color="#00FF00"> n</font></i><BR> an unsuitable alliance or marriage </>
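Judging only from the sample, the damaged pronunciations look like UTF-8 IPA bytes that were decoded as Windows-1256 (the Arabic/Persian code page). For example, `ة™` is the byte pair `C9 99` in cp1256, which is the UTF-8 encoding of `ə`. If that guess is right, the text can be recovered by reversing the wrong decode. The helper name below is hypothetical, and the real fix belongs in the BGL reader's choice of encoding.

```python
def undo_cp1256_mojibake(text: str) -> str:
    """Re-encode text that was wrongly decoded as cp1256, then decode it as UTF-8."""
    try:
        return text.encode("cp1256").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or mixed content): leave it unchanged.
        return text


print(undo_cp1256_mojibake("dثˆvة›ntتƒة™"))  # dˈvɛntʃə
```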

SQL export is broken

SQL export from the BGL format is broken.
First, it does not generate complete CREATE TABLE statements:

CREATE TABLE dbinfo_extra (
CREATE TABLE word ('id' INTEGER PRIMARY KEY NOT NULL, 

Second, it crashes when converting Unicode characters. Here is the stack trace:

using Reader class from BabylonBgl plugin for direct conversion without loading into memory
Writing to file "c:\test\test.sql"
exception while calling plugin's write function      |  0.0% ETA: 00:00:00
Traceback (most recent call last):
  File "C:\..\pyglossary.git\pyglossary\glossary.py", line 822, in write
    self.writeFunctions[format].__call__(self, filename, **options)
  File "..\pyglossary.git\pyglossary\plugins\sql.py", line 17, in write
    fp.write(line + '\n')
  File "C:\Program Files\Python35\lib\encodings\cp1256.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 733: character maps to <undefined>
Writing file "c:\test\test.sql" failed.
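The traceback shows Python encoding the output with `cp1256`, the default locale encoding on that Windows system, and `α` (U+03B1) has no cp1256 mapping. Opening the output file with an explicit encoding removes the dependence on the locale. The sketch below is a hypothetical, simplified writer, not the actual `sql.py` plugin, and it does not address the incomplete `CREATE TABLE` statements.

```python
def write_sql_lines(filename, lines):
    """Write SQL statements one per line as UTF-8, regardless of the OS locale."""
    # Without encoding=..., open() would use the locale encoding (cp1256 here).
    with open(filename, "w", encoding="utf-8") as fp:
        for line in lines:
            fp.write(line + "\n")
```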

Data file is corrupted. Word "..."

For every article in the StarDict *.ifo file, I'm getting
Data file is corrupted. Word "..."

python2 ~/projects/pyglossary/pyglossary.pyw --ui=cmd PhraseBookRuEs.ifo txt
dictionary taken from here: http://rutracker.org/forum/viewtopic.php?p=69468986
The *.idx.gz file was gunzipped, and the *.dict.dz file was renamed to *.dict.gz and gunzipped.
Note that dictionaries unpacked this way work well in "Dictionary Universal" (an iOS app).
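One way to check whether the unpacked files are consistent is to walk the `.idx` file and verify that every entry's offset and size fall inside the `.dict` file. The sketch below assumes the common StarDict layout: each index record is a NUL-terminated UTF-8 word followed by a 32-bit big-endian offset and a 32-bit big-endian size (that is, `idxoffsetbits` is not 64). The function names are hypothetical, and this is not necessarily the check that produces pyglossary's message.

```python
import struct
from typing import Iterator, List, Tuple


def iter_idx(idx: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (word, offset, size) records from raw .idx bytes."""
    pos = 0
    while pos < len(idx):
        end = idx.index(b"\x00", pos)  # the word is NUL-terminated
        word = idx[pos:end].decode("utf-8", "replace")
        offset, size = struct.unpack(">II", idx[end + 1:end + 9])
        pos = end + 9
        yield word, offset, size


def out_of_range_words(idx: bytes, dict_size: int) -> List[str]:
    """Return the words whose data would extend past the end of the .dict file."""
    return [w for w, off, size in iter_idx(idx) if off + size > dict_size]
```

`dict_size` would be the size of the unpacked `.dict` file, for example from `os.path.getsize()`.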

python-lzo & MSYS2

I can't seem to find a python-lzo package that works in the MSYS2 environment under Windows.

Any suggestions are appreciated.

Read Babylon's BDC

Hi, what should be done in order to use Babylon dictionary files with the .bdc extension? Is there some alternative?

Edit: For legal reasons, I think I should rephrase my question as follows:

Hi, are there any plans to support reading .bdc files?

Problem loading Chinese dictionary

I have a Babylon (.bgl) dictionary, which is Chinese to English.
When I load it with pyglossary, the console prints some unexpected info, and some items are missing after conversion. I get a different output file every time I load and convert it.

There is another dictionary with the same content but English to Chinese, and it has no problems.

I guess this is an encoding issue.
pyglossary

doc/bgl_structure.svgz doesn't exist or not a regular file

root@host:/home/me/Downloads/pyglossary-master# ./setup.py install
LZO compression support is not available
running install
running build
running build_py
running install_lib
running install_data
error: can't copy 'doc/bgl_structure.svgz': doesn't exist or not a regular file

AppleDict plugin: remove `jing` binary

@ratijas Please remove contents of this directory
https://github.com/ilius/pyglossary/tree/master/pyglossary/plugins/appledict/jing/jing

Especially the bin folder, which is 2.9 MB.
You may want to add instructions on how to download/install the latest version of it (that's not my concern).

The reasons:

  • A git repo (especially one containing a Python project) is not supposed to contain binary files, especially with a large total size (larger than the whole project source). And git is not good at tracking binary files.
  • It's not clear to the user what the license of those files is, and the conditions link in the readme file is broken, along with other links: https://github.com/ilius/pyglossary/blob/master/pyglossary/plugins/appledict/jing/jing/readme.html
  • I haven't read the license conditions; we may have to do something when shipping it with our project.
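If the bundled copy is removed, the plugin could look for a user-installed `jing` on `PATH` instead and print installation instructions when it is missing. Below is a minimal sketch using `shutil.which`. The function name and the message text are hypothetical.

```python
import shutil
from typing import Optional


def find_jing(executable: str = "jing") -> Optional[str]:
    """Return the full path of a user-installed jing, or None if it is not on PATH."""
    return shutil.which(executable)


if find_jing() is None:
    print("jing was not found; please install it and make sure it is on your PATH")
```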

Invalid newline character

Hi.
When I convert a BGL file to mtxt format, all entries end with a "\r" newline character instead of "\r\n".
Please fix this issue.
Thanks.
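Here is a minimal sketch of producing consistent Windows-style line endings, assuming the goal is `\r\n` everywhere. It first normalizes any stray `\r` or `\r\n` to `\n`, then lets Python's text I/O translate `\n` on write via the `newline` argument of `open()`. The function name is hypothetical, and this is not the mtxt writer's actual code.

```python
def write_crlf(filename, entries):
    """Write entries so that every line ends with CRLF (hypothetical helper)."""
    with open(filename, "w", encoding="utf-8", newline="\r\n") as fp:
        for text in entries:
            # Collapse CRLF and lone CR to LF first; on write, newline="\r\n"
            # turns every LF into CRLF.
            normalized = text.replace("\r\n", "\n").replace("\r", "\n")
            fp.write(normalized + "\n")
```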

AppleDict Parse Error

I'm getting a parse error

  • Building SBBIC.dictionary.
  • Cleaning objects directory.
  • Preparing dictionary template.
  • Preprocessing dictionary sources.
  • Extracting index data.
  • Preparing dictionary bundle.
  • Adding body data.
  • Preparing index data.
    *** Error: Parse failure [All executable files in this folder will appear in the Scripts menu. Choosing a script from the menu will run that script.\n\nWhen executed from a local folder, scripts will be passed the selected file names. When executed from a remote folder (e.g. a folder showing web or ftp content), scripts will be passed no parameters.\n\nIn all cases, the following environment variables will be set by Nautilus, which the scripts may use:\n\nNAUTILUS_SCRIPT_SELECTED_FILE_PATHS: newline-delimited paths for selected files (only if local)\n\nNAUTILUS_SCRIPT_SELECTED_URIS: newline-delimited URIs for selected files\n\nNAUTILUS_SCRIPT_CURRENT_URI: URI for current location\n\nNAUTILUS_SCRIPT_WINDOW_GEOMETRY: position and size of current window\n\nNAUTILUS_SCRIPT_NEXT_PANE_SELECTED_FILE_PATHS: newline-delimited paths for selected files in the inactive pane of a split-view window (only if local)\n\nNAUTILUS_SCRIPT_NEXT_PANE_SELECTED_URIS: newline-delimited URIs for selected files in the inactive pane of a split-view window\n\nNAUTILUS_SCRIPT_NEXT_PANE_CURRENT_URI: URI for current location in the inactive pane of a split-view window 1164000 0 All executable files in this folder will appear in the Scripts menu. Choosing a script from the menu will run that script.\n\nWhen executed from a local folder, scripts will be passed the selected file names. When executed from a remote folder (e.g. 
a folder showing web or ftp content), scripts will be passed no parameters.\n\nIn all cases, the following environment variables will be set by Nautilus, which the scripts may use:\n\nNAUTILUS_SCRIPT_SELECTED_FILE_PATHS: newline-delimited paths for selected files (only if local)\n\nNAUTILUS_SCRIPT_SELECTED_URIS: newline-delimited URIs for selected files\n\nNAUTILUS_SCRIPT_CURRENT_URI: URI for current location\n\nNAUTILUS_SCRIPT_WINDOW_GEOMETRY: position and size of current window\n\nNAUTILUS_SCRIPT_NEXT_PANE_SELECTED_FILE_PATHS: newline-delimited paths for selected files in the inactive pane of a split-view window (only if local)\n\nNAUTILUS_SCRIPT_].
    normalize_key_text aborted.
    Error.
    make: *** [all] Error 1

Do you know what might cause this? Thanks!

Some part of the text is missing when the import format is .dsl

Hi @xiaoqiangwang

The text after a [com]...[/com] tag goes missing.
Here is an example:

bracket
    \[[t]bræ̱kɪt[/t]\]
    [m2][*][b]Syn:[/b][/*][/m]
    [m2][*][com]categorized[/com][/*][/m]
    [m1]4) [p][i][c][trn]N-COUNT:[/p] usu [p]pl[/p], oft in N[/c][/i] [b]Brackets[/b] are a pair of written marks that you place round a word, expression, or sentence in order to indicate that you are giving extra information. In British English, curved marks like these are also called [b]brackets[/b], but in American English, they are called parentheses.[/trn][/m]
    [m2][*][ex][lang id=1033]The prices in brackets are special rates for the under 18s...[/lang][/ex][/*][/m]
    [m2][*][ex][lang id=1033]My annotations appear in square brackets.[/lang][/ex][/*][/m]
    [m2][*][b]Syn:[/b][/*][/m]
    [m2][*][ref]parenthesis[/ref][/*][/m]
    [m1]5) [p][i][c][trn]N-COUNT:[/p] usu [p]pl[/c][/i][/p] [b]Brackets[/b] are pair of marks that are placed around a series of symbols in a mathematical expression to indicate that those symbols function as one item within the expression.[/trn][/m]

The result in glos is:

[('bracket', '<div style="margin-left:0em">[<!-- T -->br\xc3\xa6\xcc\xb1k\xc9\xaat<!-- T -->]</div>\n<div style="margin-left:2em"><b>Syn:</b></div>', {'alts': []})]

After a while of debugging, I'm pretty sure the problem is this part of the code in dsl.py, because I get the right result after I comment it out:

tags_open  = re.findall('(?<!\\\\)\[[cuib]', line)
tags_close = re.findall('\[/[cuib]\]', line)
if len(tags_open) != len(tags_close):
    unfinished_line = line
    continue
else:
    unfinished_line = ''

I'm confused by this part of the code. What's the meaning of (?<!\\\\)\[[cuib]?
Are there any situations where only the c, u, i, and b tags can span two lines?

At least this line of code does the right thing:

tags_open  = re.findall('(?<!\\\\)\[[cuib]\]', line)

You missed a bracket.

Any help would be appreciated.
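On the question about the pattern: in the original non-raw string, `'(?<!\\\\)'` becomes the regex `(?<!\\)`, a negative lookbehind that skips an escaped `\[`. The rest, `\[[cuib]`, matches only `[` plus one tag letter and does not require `]` afterward. As a result, `[com]` is counted as an opening `[c` tag, while `[/com]` never matches the closing pattern, so the line looks unfinished. The sketch below demonstrates this. The stricter pattern is only a suggestion (an assumption, not pyglossary's fix). Unlike `\[[cuib]\]`, it also accepts tags with arguments such as `[c red]`.

```python
import re

# The pattern from dsl.py, written as a raw string: "[" + c/u/i/b, not preceded by "\".
OPEN_LOOSE = re.compile(r"(?<!\\)\[[cuib]")
# Suggested stricter opening-tag pattern (assumption): the tag letter must be
# followed by "]" or by whitespace plus arguments, e.g. "[c red]".
OPEN_STRICT = re.compile(r"(?<!\\)\[[cuib](?:\s[^\]]*)?\]")
CLOSE = re.compile(r"\[/[cuib]\]")

line = "[m2][*][com]categorized[/com][/*][/m]"
print(OPEN_LOOSE.findall(line), CLOSE.findall(line))   # ['[c'] []
print(OPEN_STRICT.findall(line), CLOSE.findall(line))  # [] []

colored = "[c red]text[/c]"
print(OPEN_STRICT.findall(colored), CLOSE.findall(colored))  # ['[c red]'] ['[/c]']
```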

TypeError: a bytes-like object is required, not 'str'

Any thoughts on the cause or solution for this?

pyglossary-master> python3 pyglossary.pyw "/home/andrew/somefile.BGL" txt
Traceback (most recent call last):
  File "pyglossary.pyw", line 421, in <module>
    convertOptions=convertOptions,
  File "/home/andrew/pyglossary-master/ui/ui_cmd.py", line 269, in run
    **convertOptions
  File "/home/andrew/pyglossary-master/pyglossary/glossary.py", line 842, in convert
    **readOptions
  File "/home/andrew/pyglossary-master/pyglossary/glossary.py", line 553, in read
    reader.open(filename, **options)
  File "/home/andrew/pyglossary-master/pyglossary/plugins/babylon_bgl/bgl_reader.py", line 390, in open
    if not self.readInfo():
  File "/home/andrew/pyglossary-master/pyglossary/plugins/babylon_bgl/bgl_reader.py", line 448, in readInfo
    self.readType3(block)
  File "/home/andrew/pyglossary-master/pyglossary/plugins/babylon_bgl/bgl_reader.py", line 706, in readType3
    value = func(b_value)
  File "/home/andrew/pyglossary-master/pyglossary/plugins/babylon_bgl/bgl_info.py", line 60, in aboutInfoDecode
    aboutExt, _, aboutContents = b_value.partition('\x00')
TypeError: a bytes-like object is required, not 'str'
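The failing line calls `bytes.partition()` with a `str` separator. Under Python 3 the separator has to be `bytes` as well. Here is a minimal sketch of the corrected split, with a hypothetical function name and illustrative sample data. Decoding the two parts is left to the caller.

```python
def split_about(b_value: bytes):
    # The separator must be bytes: b"\x00", not "\x00".
    about_ext, _, about_contents = b_value.partition(b"\x00")
    return about_ext, about_contents


print(split_about(b"ext\x00data"))  # (b'ext', b'data')
```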

No such file or directory: '/usr/local/share/pyglossary/AUTHORS'

The installed app does not run:

# pyglossary
LZO compression support is not available
Traceback (most recent call last):
  File "/usr/local/share/pyglossary/pyglossary.pyw", line 27, in <module>
    from ui.ui_cmd import COMMAND, printAsError, help, parseFormatOptionsStr
  File "/usr/local/share/pyglossary/ui/ui_cmd.py", line 24, in <module>
    from base import *
  File "/usr/local/share/pyglossary/ui/base.py", line 26, in <module>
    authors = open(join(rootDir, 'AUTHORS')).read().split('\n')
IOError: [Errno 2] No such file or directory: '/usr/local/share/pyglossary/AUTHORS'
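The crash comes from `ui/base.py` reading `AUTHORS` unconditionally from the install directory, where the installer apparently did not copy it. Besides installing the file, the UI could degrade gracefully when it is missing. Below is a minimal sketch with a hypothetical helper name.

```python
from os.path import isfile, join
from typing import List


def read_authors(root_dir: str) -> List[str]:
    """Return the lines of AUTHORS, or an empty list if the file is missing."""
    path = join(root_dir, "AUTHORS")
    if not isfile(path):
        return []
    with open(path, encoding="utf-8") as f:
        return f.read().split("\n")
```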

TypeError: str() takes at most 1 argument (2 given)

I want to convert .mdx to .ifo.

$ python ~/GitHub/pyglossary-master/pyglossary.pyw --read-options=resPath=OtherResources --write-format=AppleDict The_Oxford_Dictionary_of_Economics.mdx The_Oxford_Dictionary_of_Economics.xml
Reading file "The_Oxford_Dictionary_of_Economics.mdx"
Traceback (most recent call last):
  File "/Users/yangyy/GitHub/pyglossary-master/pyglossary.pyw", line 423, in <module>
    reverse=args.reverse,
  File "/Users/yangyy/GitHub/pyglossary-master/ui/ui_cmd.py", line 226, in run
    if not g.read(ipath, format=read_format, **read_options):
  File "/Users/yangyy/GitHub/pyglossary-master/pyglossary/glossary.py", line 519, in read
    self.setInfo('name', split(filename)[1])
  File "/Users/yangyy/GitHub/pyglossary-master/pyglossary/glossary.py", line 386, in setInfo
    key = fixUtf8(key)
  File "/Users/yangyy/GitHub/pyglossary-master/pyglossary/text_utils.py", line 76, in <lambda>
    fixUtf8 = lambda st: toBytes(st).replace(b'\x00', b'').decode('utf-8', 'replace')
  File "/Users/yangyy/GitHub/pyglossary-master/pyglossary/text_utils.py", line 73, in <lambda>
    toBytes = lambda s: bytes(s, 'utf8') if isinstance(s, str) else bytes(s)
TypeError: str() takes at most 1 argument (2 given)

Is there something wrong with the original .mdx file?
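The .mdx file is probably not the problem, because the error happens while setting the glossary name from the file name. `str() takes at most 1 argument (2 given)` is what Python 2 raises for `bytes(s, 'utf8')`, since in Python 2 `bytes` is just an alias of `str`. So `python` here most likely runs Python 2, and running the script with `python3` should avoid this error. The sketch below rewrites the two helpers for Python 3 using `str.encode()`. The snake_case names are illustrative, not pyglossary's.

```python
def to_bytes(s) -> bytes:
    # str.encode is the idiomatic Python 3 spelling; bytes(s, "utf8")
    # fails on Python 2, where bytes is an alias of str.
    if isinstance(s, str):
        return s.encode("utf-8")
    return bytes(s)


def fix_utf8(st) -> str:
    # Drop NUL bytes, then decode, replacing invalid UTF-8 sequences.
    return to_bytes(st).replace(b"\x00", b"").decode("utf-8", "replace")


print(fix_utf8("The_Oxford\x00.mdx"))  # The_Oxford.mdx
```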

glossary.Glossary design question

I'm refactoring the glossary.Glossary class, so I have some questions about the original intentions of the author(s).

  • Is it supposed to be a singleton?
  • Why is the data attribute global to the class?
  • exec('read%s = mod.read'%format) umm…?
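On the `exec` question, a similar effect can be achieved without `exec` by storing plugin functions in a dictionary keyed by format name. They can optionally also be exposed as methods with `setattr`, which behaves like a plain assignment in the class body. The sketch uses hypothetical names (`readFunctions`, `registerPlugin`). It keeps the registry as a class attribute, since it describes plugins, and keeps entry data per instance.

```python
import types


class Glossary:
    # Plugin registry shared by all instances (hypothetical name).
    readFunctions = {}

    def __init__(self):
        # Per-instance state: each glossary gets its own entry list.
        self.data = []

    @classmethod
    def registerPlugin(cls, fmt, mod):
        # Explicit registration instead of exec('read%s = mod.read' % format).
        if hasattr(mod, "read"):
            cls.readFunctions[fmt] = mod.read
            # A plain function set on the class becomes a method, so
            # glos.readTabfile(filename) calls mod.read(glos, filename).
            setattr(cls, "read" + fmt, mod.read)


plugin = types.ModuleType("fake_plugin")
plugin.read = lambda glos, filename: (glos, filename)
Glossary.registerPlugin("Tabfile", plugin)
```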

Some problems

MDX --> Mac OS X .dictionary
Example:
Merriam-Webster Collegiate Dictionary

 <d:entry id="make a face" d:title="make a face">
 <d:index d:value="make a face"/>
 <html><body><h1>make a face</h1>
 <a href="entry://grimace">grimace</a></body></html>
 </d:entry>

 <d:entry id="make faces" d:title="make faces">
 <d:index d:value="make faces"/>
 <html><body><h1>make faces</h1>
 <a href="entry://make a face">make a face</a></body></html>
 </d:entry>

entry id="make a face" --> entry id="make_a_face"
href="entry://grimace" --> href="x-dictionary:r:grimace"
href="entry://make a face" --> href="x-dictionary:r:make_a_face"
How can I use a sed command to replace href="entry://make a face" with href="x-dictionary:r:make_a_face"?
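Since the project is written in Python, here is the same substitution as a small Python sketch rather than a sed command. It rewrites `entry://` links to the `x-dictionary:r:` form and replaces spaces with underscores in both link targets and `d:entry` ids, following the mapping listed in the question. Whether Dictionary.app resolves the links exactly this way is taken from the post, not verified here. The function name is hypothetical.

```python
import re

HREF_RE = re.compile(r'href="entry://([^"]*)"')
ENTRY_ID_RE = re.compile(r'(<d:entry id=")([^"]*)(")')


def fix_apple_links(xml: str) -> str:
    # href="entry://make a face"  ->  href="x-dictionary:r:make_a_face"
    xml = HREF_RE.sub(
        lambda m: 'href="x-dictionary:r:%s"' % m.group(1).replace(" ", "_"), xml
    )
    # <d:entry id="make a face"  ->  <d:entry id="make_a_face"
    return ENTRY_ID_RE.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_") + m.group(3), xml
    )


sample = '<d:entry id="make faces" d:title="make faces">\n<a href="entry://make a face">make a face</a>'
print(fix_apple_links(sample))
```

To process a whole file, read it into a string, pass it through `fix_apple_links`, and write the result back. Note that `d:title` is left unchanged.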

Python 3 support

I used pyglossary recently as part of a tool to produce StarDict-format dictionaries from CC-CEDICT, and Python 2's handling of Unicode was a major annoyance. It'd be nice to be able to use this in the future without having to worry about that.

RuntimeError

  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 1063, in decode
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 1126, in decode_contents
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 1063, in decode
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 1123, in decode_contents
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 678, in output_ready
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 160, in format_string
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 117, in substitute_xml
  File "build/bdist.macosx-10.9-intel/egg/bs4/element.py", line 107, in _substitute_if_appropriate
  File "build/bdist.macosx-10.9-intel/egg/bs4/dammit.py", line 151, in substitute_xml
RuntimeError: maximum recursion depth exceeded while calling a Python object

It happens when I convert this file.
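This `RuntimeError` typically means BeautifulSoup is recursing through a very deeply nested (possibly malformed) HTML tree. A common, blunt workaround is to raise Python's recursion limit temporarily around the conversion. The context manager below is a hypothetical helper. Setting the limit too high can crash the interpreter with a real stack overflow, so fixing the input markup may be the better option.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(limit: int):
    """Temporarily raise (never lower) the interpreter's recursion limit."""
    old = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old, limit))
    try:
        yield
    finally:
        sys.setrecursionlimit(old)


with recursion_limit(10000):
    pass  # e.g. str(soup) on the deeply nested document
```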
