
pypostal's Introduction

pypostal


These are the official Python bindings to https://github.com/openvenues/libpostal, a fast statistical parser/normalizer for street addresses anywhere in the world.

Usage

from postal.expand import expand_address
expand_address('Quatre vingt douze Ave des Champs-Élysées')

from postal.parser import parse_address
parse_address('The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, EC2A 4RH, United Kingdom')

Installation

Before using the Python bindings, you must install the libpostal C library. Make sure you have the following prerequisites:

On Ubuntu/Debian

sudo apt-get install curl autoconf automake libtool python-dev pkg-config

On CentOS/RHEL

sudo yum install curl autoconf automake libtool python-devel pkgconfig

On Mac OSX

brew install curl autoconf automake libtool pkg-config

Installing libpostal

git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure --datadir=[...some dir with a few GB of space...]
make
sudo make install

# On Linux it's probably a good idea to run
sudo ldconfig

To install the Python library, just run:

pip install postal

Compatibility

pypostal supports Python 2.7+ and Python 3.4+. These bindings are written using the Python C API and thus support CPython only. Since libpostal is a standalone C library, support for PyPy is still possible with a CFFI wrapper, but is not a goal for this repo.

Tests

Make sure you have nose installed, then run:

python setup.py build_ext --inplace
nosetests postal/tests

The build_ext --inplace step is needed so that the C extensions are built in the source checkout directory and are accessible/importable by the Python modules.

pypostal's People

Contributors

albarrentine, cipri-tom, easherma, jordanhamill, myles, uzadude

pypostal's Issues

cl : Command line error D8021 : invalid numeric argument '/Wno-unused-function' with pip installation

I tried to install pypostal from pip using pip install postal and pip3 install postal but got these errors on Windows 10 using Python 3.6.2:

C:\Users\decagon>pip3 install postal
Collecting postal
  Using cached postal-1.0.tar.gz
Requirement already satisfied: six in c:\users\decagon\appdata\local\programs\python\python36-32\lib\site-packages (from postal)
Installing collected packages: postal
  Running setup.py install for postal ... error
    Complete output from command "c:\users\decagon\appdata\local\programs\python\python36-32\python.exe" -u -c "import setuptools, tokenize;__file__='C:\\Users\\decagon~1\\AppData\\Local\\Temp\\pip-build-x3iroft6\\postal\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\decagonYO~1\AppData\Local\Temp\pip-sc3m5z47-record\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-3.6
    creating build\lib.win32-3.6\postal
    copying postal\expand.py -> build\lib.win32-3.6\postal
    copying postal\parser.py -> build\lib.win32-3.6\postal
    copying postal\__init__.py -> build\lib.win32-3.6\postal
    creating build\lib.win32-3.6\postal\tests
    copying postal\tests\test_expand.py -> build\lib.win32-3.6\postal\tests
    copying postal\tests\test_parser.py -> build\lib.win32-3.6\postal\tests
    copying postal\tests\__init__.py -> build\lib.win32-3.6\postal\tests
    creating build\lib.win32-3.6\postal\utils
    copying postal\utils\encoding.py -> build\lib.win32-3.6\postal\utils
    copying postal\utils\enum.py -> build\lib.win32-3.6\postal\utils
    copying postal\utils\omitted.py -> build\lib.win32-3.6\postal\utils
    copying postal\utils\__init__.py -> build\lib.win32-3.6\postal\utils
    running build_ext
    building 'postal._expand' extension
    creating build\temp.win32-3.6
    creating build\temp.win32-3.6\Release
    creating build\temp.win32-3.6\Release\postal
    C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I/usr/local/include "-Ic:\users\decagon\appdata\local\programs\python\python36-32\include" "-Ic:\users\decagon\appdata\local\programs\python\python36-32\include" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.10240.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\shared" "-IC:\Program Files (x86)\Windows Kits\8.1\include\um" "-IC:\Program Files (x86)\Windows Kits\8.1\include\winrt" /Tcpostal/pyexpand.c /Fobuild\temp.win32-3.6\Release\postal/pyexpand.obj -std=c99 -Wno-unused-function
    cl : Command line error D8021 : invalid numeric argument '/Wno-unused-function'
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\cl.exe' failed with exit status 2

    ----------------------------------------
Command ""c:\users\decagon\appdata\local\programs\python\python36-32\python.exe" -u -c "import setuptools, tokenize;__file__='C:\\Users\\decagon~1\\AppData\\Local\\Temp\\pip-build-x3iroft6\\postal\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\decagon~1\AppData\Local\Temp\pip-sc3m5z47-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\decagon~1\AppData\Local\Temp\pip-build-x3iroft6\postal\

Do you know how I could fix it? In an unrelated project (easysnmp/easysnmp#32), someone suggested that omitting -Wno-unused-function when the platform is win32 might help, but I haven't tested it.
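
The suggestion above could be sketched as follows. This is a minimal sketch under stated assumptions, not pypostal's actual setup.py: the helper name extra_compile_args is hypothetical, and the only facts taken from the log are that -std=c99 -Wno-unused-function is appended to the compiler command and that cl.exe rejects -Wno-unused-function.

```python
import sys

# GCC/Clang-style flags seen at the end of the compiler command in the log above.
GCC_STYLE_FLAGS = ['-std=c99', '-Wno-unused-function']


def extra_compile_args(platform=None):
    """Return extra compiler flags for a given sys.platform value.

    Hypothetical helper: on win32 (where MSVC's cl.exe is used) the
    GCC-style flags are dropped; everywhere else they are kept.
    """
    if platform is None:
        platform = sys.platform
    if platform == 'win32':
        return []
    # Return a copy so callers cannot mutate the module-level list.
    return list(GCC_STYLE_FLAGS)
```

The returned list would be passed as the extra_compile_args argument of each setuptools Extension. Dropping the flags only addresses error D8021; the build would still need the libpostal headers and library to be available on Windows.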

Need an option to set the data-dir in python

Hi,
We have the same requirement mentioned here, but in pypostal instead of jpostal.
We tracked the Python call down to here, but we are not sure how you would suggest initializing the expand object with a datadir parameter.
Could you please take a look?

can't install it via pip on Windows

C:>pip install postal
Collecting postal
Using cached postal-1.0.tar.gz
Requirement already satisfied: six in c:\program files\python36\lib\site-packages (from postal)
Installing collected packages: postal
Running setup.py install for postal ... error
Complete output from command "c:\program files\python36\python.exe" -u -c "import setuptools, tokenize;__file__='C:\Users\adity\AppData\Local\Temp\pip-build-ye1wzy53\postal\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\adity\AppData\Local\Temp\pip-g4h3aprc-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-3.6
creating build\lib.win-amd64-3.6\postal
copying postal\expand.py -> build\lib.win-amd64-3.6\postal
copying postal\parser.py -> build\lib.win-amd64-3.6\postal
copying postal\__init__.py -> build\lib.win-amd64-3.6\postal
creating build\lib.win-amd64-3.6\postal\tests
copying postal\tests\test_expand.py -> build\lib.win-amd64-3.6\postal\tests
copying postal\tests\test_parser.py -> build\lib.win-amd64-3.6\postal\tests
copying postal\tests\__init__.py -> build\lib.win-amd64-3.6\postal\tests
creating build\lib.win-amd64-3.6\postal\utils
copying postal\utils\encoding.py -> build\lib.win-amd64-3.6\postal\utils
copying postal\utils\enum.py -> build\lib.win-amd64-3.6\postal\utils
copying postal\utils\omitted.py -> build\lib.win-amd64-3.6\postal\utils
copying postal\utils\__init__.py -> build\lib.win-amd64-3.6\postal\utils
running build_ext
building 'postal._expand' extension
creating build\temp.win-amd64-3.6
creating build\temp.win-amd64-3.6\Release
creating build\temp.win-amd64-3.6\Release\postal
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -I/usr/local/include "-Ic:\program files\python36\include" "-Ic:\program files\python36\include" "-IC:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /Tcpostal/pyexpand.c /Fobuild\temp.win-amd64-3.6\Release\postal/pyexpand.obj -std=c99 -Wno-unused-function
cl : Command line error D8021 : invalid numeric argument '/Wno-unused-function'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2

----------------------------------------

Command ""c:\program files\python36\python.exe" -u -c "import setuptools, tokenize;__file__='C:\Users\adity\AppData\Local\Temp\pip-build-ye1wzy53\postal\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\adity\AppData\Local\Temp\pip-g4h3aprc-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\adity\AppData\Local\Temp\pip-build-ye1wzy53\postal\

nose tests failing

I'm trying to run the tests as documented, using nosetests postal/tests, but both test modules fail to import:
ERROR: Failure: ImportError (cannot import name '_expand')
ERROR: Failure: ImportError (cannot import name '_parser')
Any help?
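
The README notes that build_ext --inplace is needed so that the C extensions are importable from the source checkout. A quick way to check whether the compiled modules are visible before running the tests might look like this. It is a minimal sketch: the helper name is hypothetical, and the module names postal._expand and postal._parser are taken from the error messages above.

```python
import importlib.util


def extension_available(name):
    """Return True if the module `name` can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised (as ModuleNotFoundError) when the parent package is missing.
        return False


if __name__ == '__main__':
    for module in ('postal._expand', 'postal._parser'):
        status = 'found' if extension_available(module) else 'missing'
        print(module, status)
```

If both are reported as missing when run from the checkout directory, the extensions have probably not been built in place yet.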

fatal error: 'libpostal/libpostal.h' file not found

I followed the instructions and installed libpostal.

git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
mkdir ~/libpostal-datadir
./configure --datadir=~/libpostal-datadir
make

I ran into issues with language_classifier failing, but downloaded it manually with curl:

curl http://libpostal.s3.amazonaws.com/language_classifier.tar.gz -o ~/libpostal-datadir/libpostal/language_classifier.tar.gz
cd ~/libpostal-datadir/libpostal
tar -xzf language_classifier.tar.gz

I'm able to use the library:

➜  libpostal git:(master) ✗ src/address_parser
Loading models...

Welcome to libpostal's address parser.

Type in any address to parse and print the result.

Special commands:

.language [code] to specify a language
.country [code] to specify a country
.exit to quit the program

> 123 Main Street

Result:

{
  "house_number": "123",
  "road": "main street"
}

> 

But can't install the pypostal package:

(country-append) ➜  pypostal git:(master) pip install postal
Collecting postal
  Using cached postal-0.3.tar.gz
Requirement already satisfied: six in /Users/nicole/.virtualenvs/country-append/lib/python2.7/site-packages (from postal)
Building wheels for collected packages: postal
  Running setup.py bdist_wheel for postal ... error
  Complete output from command /Users/nicole/.virtualenvs/country-append/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/pip-build-AbquqQ/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/tmpF3t5tapip-wheel- --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.12-intel-2.7
  creating build/lib.macosx-10.12-intel-2.7/postal
  copying postal/__init__.py -> build/lib.macosx-10.12-intel-2.7/postal
  copying postal/expand.py -> build/lib.macosx-10.12-intel-2.7/postal
  copying postal/parser.py -> build/lib.macosx-10.12-intel-2.7/postal
  creating build/lib.macosx-10.12-intel-2.7/postal/tests
  copying postal/tests/__init__.py -> build/lib.macosx-10.12-intel-2.7/postal/tests
  copying postal/tests/test_expand.py -> build/lib.macosx-10.12-intel-2.7/postal/tests
  copying postal/tests/test_parser.py -> build/lib.macosx-10.12-intel-2.7/postal/tests
  creating build/lib.macosx-10.12-intel-2.7/postal/utils
  copying postal/utils/__init__.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
  copying postal/utils/encoding.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
  copying postal/utils/enum.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
  copying postal/utils/omitted.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
  running build_ext
  building 'postal._expand' extension
  creating build/temp.macosx-10.12-intel-2.7
  creating build/temp.macosx-10.12-intel-2.7/postal
  cc -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -arch i386 -arch x86_64 -pipe -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.12-intel-2.7/postal/pyexpand.o -std=c99 -Wno-unused-function
  postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
  #include <libpostal/libpostal.h>
           ^
  1 error generated.
  error: command 'cc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for postal
  Running setup.py clean for postal
Failed to build postal
Installing collected packages: postal
  Running setup.py install for postal ... error
    Complete output from command /Users/nicole/.virtualenvs/country-append/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/pip-build-AbquqQ/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/pip-E3CTvr-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/nicole/.virtualenvs/country-append/include/site/python2.7/postal:
    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.12-intel-2.7
    creating build/lib.macosx-10.12-intel-2.7/postal
    copying postal/__init__.py -> build/lib.macosx-10.12-intel-2.7/postal
    copying postal/expand.py -> build/lib.macosx-10.12-intel-2.7/postal
    copying postal/parser.py -> build/lib.macosx-10.12-intel-2.7/postal
    creating build/lib.macosx-10.12-intel-2.7/postal/tests
    copying postal/tests/__init__.py -> build/lib.macosx-10.12-intel-2.7/postal/tests
    copying postal/tests/test_expand.py -> build/lib.macosx-10.12-intel-2.7/postal/tests
    copying postal/tests/test_parser.py -> build/lib.macosx-10.12-intel-2.7/postal/tests
    creating build/lib.macosx-10.12-intel-2.7/postal/utils
    copying postal/utils/__init__.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
    copying postal/utils/encoding.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
    copying postal/utils/enum.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
    copying postal/utils/omitted.py -> build/lib.macosx-10.12-intel-2.7/postal/utils
    running build_ext
    building 'postal._expand' extension
    creating build/temp.macosx-10.12-intel-2.7
    creating build/temp.macosx-10.12-intel-2.7/postal
    cc -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -Qunused-arguments -Qunused-arguments -arch i386 -arch x86_64 -pipe -I/usr/local/include -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.12-intel-2.7/postal/pyexpand.o -std=c99 -Wno-unused-function
    postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
    #include <libpostal/libpostal.h>
             ^
    1 error generated.
    error: command 'cc' failed with exit status 1
    
    ----------------------------------------
Command "/Users/nicole/.virtualenvs/country-append/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/pip-build-AbquqQ/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/pip-E3CTvr-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/nicole/.virtualenvs/country-append/include/site/python2.7/postal" failed with error code 1 in /private/var/folders/8f/_lk_kbrj49q3b317sp3b9y1c0000gn/T/pip-build-AbquqQ/postal/

I added the location of libpostal.pc to PKG_CONFIG_PATH:

(country-append) ➜  pypostal git:(master) echo $PKG_CONFIG_PATH 
/Users/nicole/GitHub/libpostal

However, it still fails with the same error. I've also tried cloning this repository and installing with python setup.py install, but I get the same error. Any help is appreciated!
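
One thing worth checking: the steps quoted above stop at make, while the README also runs sudo make install, and the failing compiler command only searches /usr/local/include plus the Python include directory. The following is a minimal sketch for checking whether the header is visible in a set of include directories; the helper name and the directory list are assumptions for illustration.

```python
import os


def find_header(include_dirs, relative_path='libpostal/libpostal.h'):
    """Return the first existing path to the header, or None if not found."""
    for directory in include_dirs:
        candidate = os.path.join(os.path.expanduser(directory), relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == '__main__':
    # /usr/local/include is the -I directory shown in the compiler command above.
    print(find_header(['/usr/local/include']))
```

If this prints None, the compiler has no way to find libpostal.h in that directory, which would match the error in the log.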

pip install libpostal issue on Windows

Hey guys,

I'm getting the following error when trying to install the Python bindings for libpostal on a Windows 10 machine while running setup.py: "Cannot open include file: 'libpostal/libpostal.h': No such file or directory". It looks like the same problem as the one reported for macOS. Could you please estimate when this could be fixed?

Thanks,
Leonid

postal.parser.parse_address() returns a list instead of a dict

I just began using libpostal and decided to try its Python bindings.

What I immediately noticed is that, compared to the command-line 'src/address_parser', the corresponding Python function returns a list. Compare:

src/address_parser:

1010 EASY ST OTTAWA K1A 0B1

Result:

{
"house_number": "1010",
"road": "easy st",
"city": "ottawa",
"postcode": "k1a 0b1"
}

python postal:

parse('1010 EASY ST OTTAWA K1A 0B1')
[(u'1010', u'house_number'), (u'easy st', u'road'), (u'ottawa', u'city'), (u'k1a 0b1', u'postcode')]

I could not find a way to output the data in a more organized, JSON way. What's the reason it's done like that? Is it always guaranteed to have a value as a first element, and a key as a second?

Please advise.
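
The output shown above is a list of (value, label) pairs. Assuming that ordering holds, as in the example, the pairs can be folded into a dict in plain Python. This is a minimal sketch and not part of pypostal: the helper name is hypothetical, and joining repeated labels with a space is one design choice among several.

```python
from collections import OrderedDict


def pairs_to_dict(pairs):
    """Fold (value, label) pairs into an ordered {label: value} mapping.

    If a label occurs more than once, its values are joined with a space
    so that nothing is silently dropped.
    """
    result = OrderedDict()
    for value, label in pairs:
        if label in result:
            result[label] = result[label] + ' ' + value
        else:
            result[label] = value
    return result
```

The result can be passed to json.dumps to get output similar to the command-line tool. One possible reason for returning a list is that a list can represent the same label appearing more than once, which a plain dict cannot do without a merging rule like the one above.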

building 'postal._expand' extension error

Hi.
I'm trying to install pypostal on Alpine Linux. libpostal compiled successfully, but I get this error when running:
pip install "postal==1.1.8"

The output of the command:

    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/postal
    copying postal/expand.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/token_types.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/dedupe.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/normalize.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/tokenize.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/near_dupe.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/__init__.py -> build/lib.linux-x86_64-3.6/postal
    copying postal/parser.py -> build/lib.linux-x86_64-3.6/postal
    creating build/lib.linux-x86_64-3.6/postal/tests
    copying postal/tests/test_parser.py -> build/lib.linux-x86_64-3.6/postal/tests
    copying postal/tests/test_expand.py -> build/lib.linux-x86_64-3.6/postal/tests
    copying postal/tests/_test_near_dupes.py -> build/lib.linux-x86_64-3.6/postal/tests
    copying postal/tests/__init__.py -> build/lib.linux-x86_64-3.6/postal/tests
    creating build/lib.linux-x86_64-3.6/postal/utils
    copying postal/utils/encoding.py -> build/lib.linux-x86_64-3.6/postal/utils
    copying postal/utils/enum.py -> build/lib.linux-x86_64-3.6/postal/utils
    copying postal/utils/omitted.py -> build/lib.linux-x86_64-3.6/postal/utils
    copying postal/utils/__init__.py -> build/lib.linux-x86_64-3.6/postal/utils
    copying postal/pyutils.h -> build/lib.linux-x86_64-3.6/postal
    running build_ext
    building 'postal._expand' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/postal
    gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/local/include -I/usr/local/include/python3.6m -c postal/pyexpand.c -o build/temp.linux-x86_64-3.6/postal/pyexpand.o -std=c99
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1

Any ideas?
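
The last two lines of the log say that the gcc executable itself could not be found, so the failure appears to be a missing compiler in the environment where pip runs rather than a problem in the bindings. A minimal sketch to confirm this from the same environment follows; the helper name and the candidate list are assumptions for illustration.

```python
import shutil


def find_compiler(candidates=('gcc', 'cc', 'clang')):
    """Return the path of the first compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == '__main__':
    print(find_compiler())
```

If this prints None, a C compiler needs to be installed first; on Alpine it usually comes from the package manager (for example, the build-base package).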

from postal.parser import parse_address is very slow initially

Hello! I'm using libpostal and pypostal for an application, where users' input addresses are parsed and geocoded. I notice that the initial import of

from postal.parser import parse_address

runs very slowly (it takes as long as 7s on my Mac OS X machine).

Now this is fine if I just want to play in the shell, since after the initial import all parse_address calls run blazing fast. However, to serve each request and geocode the address in the backend, I need to keep importing the function before calling it. This adds a significant amount of latency into each request where the address parsing functionality is called upon.

My question is: is this the expected behavior? If so, is there a way to speed the import up? If not, what could have gone wrong? The steps I followed were:

brew install curl autoconf automake libtool pkg-config
git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure --datadir=[...some dir with a few GB of space...]
make
sudo make install
pip install postal

Thanks in advance!
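
Python caches imported modules in sys.modules, so within one long-lived process the import cost is paid only once. The latency described above would recur only if each request starts a fresh interpreter or process. The following is a minimal sketch of loading the function once and reusing it; the helper name is hypothetical, and whether it fits depends on how the backend serves requests.

```python
import functools
import importlib


@functools.lru_cache(maxsize=None)
def load_callable(module_name, attribute):
    """Import `module_name` once and return one of its attributes.

    Later calls with the same arguments return the cached object
    without touching the import machinery again.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attribute)
```

In a server, calling load_callable('postal.parser', 'parse_address') during startup would move the one-time loading delay out of the request path.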

Issue with directory names with spaces.

I am currently running macOS v10.13.1 and trying to compile libpostal.

I did the following so far:
git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh

These steps work fine. However, when make is run, it fails with a directory-not-found error and gets stuck there. I fixed the permission errors, but the problem remained, even though I know the directory exists and I can cd into it without any problem.
I figured out the cause: my directories had names containing spaces, such as example 1/example 2/libpostal.
When I cd into the directory, I use cd "example 1/example 2/libpostal", which works fine. However, make does not handle the spaces, hence the error.

Please push an update to handle such cases, or add a disclaimer about this.
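
Until the build handles such paths, a pre-flight check can at least make the cause obvious. This is a minimal sketch with a hypothetical helper that reports whether the absolute path of a checkout contains whitespace:

```python
import os


def has_whitespace(path):
    """Return True if the absolute form of `path` contains any whitespace."""
    return any(ch.isspace() for ch in os.path.abspath(path))


if __name__ == '__main__':
    if has_whitespace(os.getcwd()):
        print('Warning: the current directory path contains spaces.')
```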

Error loading transliteration module

Good day

I have installed libpostal and can successfully run address_parser

root@ubuntu:/home/pdossantos/libpostal# ./src/address_parser
Loading models...

Welcome to libpostal's address parser.

Type in any address to parse and print the result.

Special commands:
.exit to quit the program

> 6 Pringle St, Boksburg

Result:

{
  "house_number": "6",
  "road": "pringle st",
  "city": "boksburg"
}

>

However, when trying to use the Python bindings I get the following:

root@ubuntu:/home/pdossantos/tests# python test.py
ERR   Error loading transliteration module, dir=(null)
   at libpostal_setup_datadir (libpostal.c:1057) errno: No such file or directory
Traceback (most recent call last):
  File "test.py", line 1, in <module>
    from postal.parser import parse_address
  File "/usr/local/lib/python2.7/dist-packages/postal/parser.py", line 2, in <module>
    from postal import _parser
TypeError: Error loading libpostal data

Is there something additional that needs to be configured? I don't really understand why the dir is coming back as null.

near_dupe_hashes returns empty list

The near_dupe_hashes function from the module postal.near_dupe seems to invariably return an empty list.

Here are some examples of US addresses I have tried. The tokens and labels were obtained with parse_address. The parameters do not change the output, but I have included them just in case.

near_dupe_hashes(['house_number', 'road', 'city', 'state', 'postcode'],['209', 'st michaels circle', 'odenton', 'md', '21113'], with_city_or_equivalent=True, with_postal_code=True)

near_dupe_hashes(['house_number', 'road', 'city', 'state', 'postcode'],['1', 'six flags blvd', 'jackson township', 'nj', '08527'])

near_dupe_hashes(['house_number', 'road', 'city', 'state', 'postcode'], ['1313', 'disneyland dr', 'anaheim', 'ca', '92802'])
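
For reference, the parallel lists used in the calls above can be derived from parse_address-style output, which (as shown in another report on this page) is a list of (value, label) pairs. This is a minimal sketch with a hypothetical helper name; it only prepares the arguments in the labels-then-values order used above and does not by itself explain the empty result.

```python
def split_pairs(pairs):
    """Split (value, label) pairs into parallel (labels, values) lists."""
    labels = [label for _value, label in pairs]
    values = [value for value, _label in pairs]
    return labels, values
```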

pip install issue on Mac

Dear Barrentine,

I get the following error message when I try to install the postal Python library. I followed the instructions given here:
https://github.com/openvenues/pypostal

Collecting postal
Using cached https://files.pythonhosted.org/packages/3d/0b/2f077c14165c0e4ed795c3fa83e1b68d357186da42ee3ab5d64b77424f12/postal-1.1.7.tar.gz
Requirement already satisfied: six in /Users/appfolio/Library/Python/2.7/lib/python/site-packages (from postal) (1.11.0)
Building wheels for collected packages: postal
Running setup.py bdist_wheel for postal ... error
Complete output from command /usr/local/opt/python@2/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-install-DpJRLT/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-wheel-EbeFEp --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.13-x86_64-2.7
creating build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/token_types.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/dedupe.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/normalize.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/__init__.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/parser.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/near_dupe.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/tokenize.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/expand.py -> build/lib.macosx-10.13-x86_64-2.7/postal
creating build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/test_parser.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/__init__.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/test_expand.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/test_near_dupes.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
creating build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/encoding.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/__init__.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/omitted.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/enum.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/pyutils.h -> build/lib.macosx-10.13-x86_64-2.7/postal
running build_ext
building 'postal._expand' extension
creating build/temp.macosx-10.13-x86_64-2.7
creating build/temp.macosx-10.13-x86_64-2.7/postal
clang -fno-strict-aliasing -fno-common -dynamic -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include -I/usr/local/include -I/usr/local/opt/openssl/include -I/usr/local/opt/sqlite/include -I/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.13-x86_64-2.7/postal/pyexpand.o -std=c99
postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
#include <libpostal/libpostal.h>
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command 'clang' failed with exit status 1


Failed building wheel for postal
Running setup.py clean for postal
Failed to build postal
Installing collected packages: postal
Running setup.py install for postal ... error
Complete output from command /usr/local/opt/python@2/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-install-DpJRLT/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-record-WC0B8k/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.macosx-10.13-x86_64-2.7
creating build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/token_types.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/dedupe.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/normalize.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/__init__.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/parser.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/near_dupe.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/tokenize.py -> build/lib.macosx-10.13-x86_64-2.7/postal
copying postal/expand.py -> build/lib.macosx-10.13-x86_64-2.7/postal
creating build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/test_parser.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/__init__.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/test_expand.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
copying postal/tests/test_near_dupes.py -> build/lib.macosx-10.13-x86_64-2.7/postal/tests
creating build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/encoding.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/__init__.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/omitted.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/utils/enum.py -> build/lib.macosx-10.13-x86_64-2.7/postal/utils
copying postal/pyutils.h -> build/lib.macosx-10.13-x86_64-2.7/postal
running build_ext
building 'postal._expand' extension
creating build/temp.macosx-10.13-x86_64-2.7
creating build/temp.macosx-10.13-x86_64-2.7/postal
clang -fno-strict-aliasing -fno-common -dynamic -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include -I/usr/local/include -I/usr/local/opt/openssl/include -I/usr/local/opt/sqlite/include -I/usr/local/Cellar/python@2/2.7.15_1/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.13-x86_64-2.7/postal/pyexpand.o -std=c99
postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
#include <libpostal/libpostal.h>
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command 'clang' failed with exit status 1

----------------------------------------

Command "/usr/local/opt/python@2/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-install-DpJRLT/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-record-WC0B8k/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/3z/7q10k8ss2rvflxq36z_bsxk80000gp/T/pip-install-DpJRLT/postal/

Release on pypi.python.org

Hi,

Is a release on pypi.python.org planned? It would be very convenient: pip could be used to install the package, and it could be listed in requirements files easily. At the moment, a release on pypi.python.org would be blocked by a name clash with the existing package https://pypi.python.org/pypi/pyPostal.

Best regards

Issue when installing on mac (Both prerequisites as well as the actual package)

I was following the installation instructions step by step in the terminal. I first copied the following command:
brew install curl autoconf automake libtool pkg-config
It took some time, and then I copied the following block directly:
git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure --datadir=[...some dir with a few GB of space...]
make
sudo make install

# On Linux it's probably a good idea to run
sudo ldconfig

and I am getting the following issues on the terminal:
glibtoolize: putting auxiliary files in '.'.
glibtoolize: copying file './ltmain.sh'
glibtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
glibtoolize: copying file 'm4/libtool.m4'
glibtoolize: copying file 'm4/ltoptions.m4'
glibtoolize: copying file 'm4/ltsugar.m4'
glibtoolize: copying file 'm4/ltversion.m4'
glibtoolize: copying file 'm4/lt~obsolete.m4'
glibtoolize: Consider adding '-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
./bootstrap.sh
configure.ac:14: installing './compile'
configure.ac:14: installing './config.guess'
configure.ac:14: installing './config.sub'
configure.ac:12: installing './install-sh'
configure.ac:12: installing './missing'
src/Makefile.am: installing './depcomp'
parallel-tests: installing './test-driver'
(base) Shengyus-MacBook-Pro-2:libpostal shengyuchen$ ./bootstrap.sh
glibtoolize: putting auxiliary files in '.'.
glibtoolize: copying file './ltmain.sh'
glibtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
glibtoolize: copying file 'm4/libtool.m4'
glibtoolize: copying file 'm4/ltoptions.m4'
glibtoolize: copying file 'm4/ltsugar.m4'
glibtoolize: copying file 'm4/ltversion.m4'
glibtoolize: copying file 'm4/lt~obsolete.m4'
glibtoolize: Consider adding '-I m4' to ACLOCAL_AMFLAGS in Makefile.am.
configure.ac:14: installing './compile'
configure.ac:12: installing './missing'
src/Makefile.am: installing './depcomp'
(base) Shengyus-MacBook-Pro-2:libpostal shengyuchen$ make
make: *** No targets specified and no makefile found. Stop.
(base) Shengyus-MacBook-Pro-2:libpostal shengyuchen$ sudo make install
Password:
make: *** No rule to make target `install'. Stop.
(base) Shengyus-MacBook-Pro-2:libpostal shengyuchen$ pip install postal
Collecting postal
Using cached https://files.pythonhosted.org/packages/3d/0b/2f077c14165c0e4ed795c3fa83e1b68d357186da42ee3ab5d64b77424f12/postal-1.1.7.tar.gz
Requirement already satisfied: six in /anaconda3/lib/python3.6/site-packages (from postal) (1.12.0)
Building wheels for collected packages: postal
Running setup.py bdist_wheel for postal ... error
Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-install-7a1n45t_/postal/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" bdist_wheel -d /private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-wheel-bxkeetgi --python-tag cp36:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.6
creating build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/token_types.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/dedupe.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/normalize.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/parser.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/near_dupe.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/tokenize.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/expand.py -> build/lib.macosx-10.7-x86_64-3.6/postal
creating build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/test_parser.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/test_expand.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/test_near_dupes.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
creating build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/encoding.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/omitted.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/enum.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/pyutils.h -> build/lib.macosx-10.7-x86_64-3.6/postal
running build_ext
building 'postal._expand' extension
creating build/temp.macosx-10.7-x86_64-3.6
creating build/temp.macosx-10.7-x86_64-3.6/postal
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/usr/local/include -I/anaconda3/include/python3.6m -c postal/pyexpand.c -o build/temp.macosx-10.7-x86_64-3.6/postal/pyexpand.o -std=c99
postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
#include <libpostal/libpostal.h>
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1


Failed building wheel for postal
Running setup.py clean for postal
Failed to build postal
Installing collected packages: postal
Running setup.py install for postal ... error
Complete output from command /anaconda3/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-install-7a1n45t_/postal/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-record-fu7tijzh/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build/lib.macosx-10.7-x86_64-3.6
creating build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/token_types.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/dedupe.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/normalize.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/parser.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/near_dupe.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/tokenize.py -> build/lib.macosx-10.7-x86_64-3.6/postal
copying postal/expand.py -> build/lib.macosx-10.7-x86_64-3.6/postal
creating build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/test_parser.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/test_expand.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
copying postal/tests/test_near_dupes.py -> build/lib.macosx-10.7-x86_64-3.6/postal/tests
creating build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/encoding.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/__init__.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/omitted.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/utils/enum.py -> build/lib.macosx-10.7-x86_64-3.6/postal/utils
copying postal/pyutils.h -> build/lib.macosx-10.7-x86_64-3.6/postal
running build_ext
building 'postal._expand' extension
creating build/temp.macosx-10.7-x86_64-3.6
creating build/temp.macosx-10.7-x86_64-3.6/postal
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/anaconda3/include -arch x86_64 -I/anaconda3/include -arch x86_64 -I/usr/local/include -I/anaconda3/include/python3.6m -c postal/pyexpand.c -o build/temp.macosx-10.7-x86_64-3.6/postal/pyexpand.o -std=c99
postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
#include <libpostal/libpostal.h>
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
error: command 'gcc' failed with exit status 1

----------------------------------------

Command "/anaconda3/bin/python -u -c "import setuptools, tokenize;file='/private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-install-7a1n45t_/postal/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record /private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-record-fu7tijzh/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/19/1pcpc2r57_g6gjl_2v5_4kgc0000gn/T/pip-install-7a1n45t_/postal/

Please help. I need this package to parse some address fields.
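
The fatal error in the log above means the compiler could not find libpostal/libpostal.h on any include path, which fits with make reporting that no makefile was found earlier in the same session. A minimal sketch for checking whether the header was ever installed, assuming the candidate directories listed in the code (they are guesses for a typical setup, and find_libpostal_header is a hypothetical helper, not part of pypostal):

```python
import os

# Hypothetical candidate include directories; adjust to your own --prefix.
DEFAULT_INCLUDE_DIRS = ["/usr/local/include", "/usr/include", "/opt/homebrew/include"]


def find_libpostal_header(include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first existing path to libpostal/libpostal.h, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, "libpostal", "libpostal.h")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    header = find_libpostal_header()
    print(header or "libpostal.h not found in the candidate directories")
```

If this prints nothing useful, the libpostal build itself most likely needs to succeed before pip install postal can.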

no difference in parse_address output when passing language

Given this address:

〒106-0044東京都港区東麻布1-8-1 東麻布ISビル4F

The code listed below produces the same parsed output regardless of whether the language code is set to 'ja' or 'en'. The code was run with Python 2.7.

Output:

JA: {postcode:〒106-0044} {state:東} {city:京} {city_district:都港区} {house_number:東麻布1-8-1} {house:東麻布isビル} {house_number:4f}
EN: {postcode:〒106-0044} {state:東} {city:京} {city_district:都港区} {house_number:東麻布1-8-1} {house:東麻布isビル} {house_number:4f}

Code:

from postal.parser import parse_address
import sys
sys.stdout.write('JA: ')
for x,y in parse_address('〒106-0044東京都港区東麻布1-8-1 東麻布ISビル4F', language='ja'):
    sys.stdout.write(' {')
    sys.stdout.write(y)
    sys.stdout.write(':')
    sys.stdout.write(x)
    sys.stdout.write('} ')
sys.stdout.write('\n')
sys.stdout.write('EN: ')
for x,y in parse_address('〒106-0044東京都港区東麻布1-8-1 東麻布ISビル4F', language='en'):
    sys.stdout.write(' {')
    sys.stdout.write(y)
    sys.stdout.write(':')
    sys.stdout.write(x)
    sys.stdout.write('} ')
sys.stdout.write('\n')
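
The two loops above differ only in the language argument, so the formatting can be factored into one helper. A sketch, assuming parse_address returns (value, label) pairs in the order the loops above unpack them; format_components is a hypothetical name, not part of pypostal:

```python
def format_components(pairs):
    """Render (value, label) pairs as '{label:value}' items separated by spaces."""
    return " ".join("{%s:%s}" % (label, value) for value, label in pairs)


if __name__ == "__main__":
    # Hand-written sample in the same shape the loops above iterate over.
    sample = [("london", "city"), ("ec2a 4rh", "postcode")]
    print("EN: " + format_components(sample))
```

With this helper, each case becomes a single call such as format_components(parse_address(address, language='ja')), which makes the two outputs easier to compare side by side.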

Pip installation fails

Linux Mint 18.1 64bit with pip and pip3

    postal/pyexpand.c:1:20: fatal error: Python.h: Datei oder Verzeichnis nicht gefunden
    compilation terminated.
    error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-A8bI6q/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-psmMIx-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-A8bI6q/postal

Setup problem

Hi there: I hit a snag when trying to follow the setup instructions. At the "python setup.py install" step, I got an error message that indicates the compiler can't find libpostal/libpostal.h.

libpostal itself seems to be correctly installed and I'm able to run it from the command line.

I'm running the OSX system version of Python 2.7.10. Before I go mucking with new Python installations or moving directories around, thought I'd post this here. Since no one else has reported this issue, it's probably something stupid on my end...

Here's what I see when I try to run setup.py. It's the same error regardless of whether I preface with sudo:

rob$ python setup.py install
running install
Checking .pth file support in /Library/Python/2.7/site-packages/
/usr/bin/python -E -c pass
TEST PASSED: /Library/Python/2.7/site-packages/ appears to support .pth files
running bdist_egg
running egg_info
writing requirements to pypostal.egg-info/requires.txt
writing pypostal.egg-info/PKG-INFO
writing top-level names to pypostal.egg-info/top_level.txt
writing dependency_links to pypostal.egg-info/dependency_links.txt
reading manifest file 'pypostal.egg-info/SOURCES.txt'
writing manifest file 'pypostal.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.11-intel/egg
running install_lib
running build_py
running build_ext
building 'postal._expand' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.11-intel-2.7/postal/pyexpand.o -std=c99 -Wno-unused-function
postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
#include <libpostal/libpostal.h>
         ^
1 error generated.
error: command 'cc' failed with exit status 1

Import Error using Arch

Hello, I am currently running Arch with Python 3.7 on my server.
It is a clean install of Arch. I managed to install everything from libpostal, and it works correctly using src/address_parser.
I exported LD_LIBRARY_PATH, installed postal from pip, and I get:
ImportError: libpostal.so.1: cannot open shared object file: No such file or directory
I did my homework and read all the issues (open and closed). I cloned the repo, and nosetests fails with the same error.
I reinstalled libpostal, set --datadir to a place with enough GB, put that place in $PATH, and tested with python setup.py develop.
I have tried every combination I could think of and I still cannot access the C lib.
ImportError: libpostal.so.1: cannot open shared object file: No such file or directory happens every single time.
I have also installed from source with python setup.py build_ext --inplace.

I created a virtualenv and installed postal from pip and from source, with no success. Same error, and nosetests keeps failing.

All I do to trigger the error is import the parser:
import postal.parser
This happens both inside and outside the repo. I understand the difference: if I launch from the checkout folder and import postal, postal.__file__ points to the cloned repo; if I import from a different directory, postal.__file__ shows the virtualenv directory.

Any idea how I can hook up to the C lib?
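
This ImportError comes from the dynamic linker rather than from Python's module search, so $PATH is not consulted; on Linux the linker looks in its cache (refreshed by ldconfig) and in LD_LIBRARY_PATH. A sketch for listing where libpostal shared objects actually live, assuming a few common library directories (hypothetical defaults; find_shared_objects is not part of pypostal):

```python
import glob
import os

# Hypothetical search locations; add the --prefix you passed to ./configure.
DEFAULT_LIB_DIRS = ["/usr/local/lib", "/usr/lib", "/usr/lib64"]


def find_shared_objects(lib_dirs=DEFAULT_LIB_DIRS, stem="libpostal"):
    """Return sorted paths of files named like '<stem>.so*' in lib_dirs."""
    matches = []
    for directory in lib_dirs:
        matches.extend(glob.glob(os.path.join(directory, stem + ".so*")))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_shared_objects():
        print(path)
```

If libpostal.so.1 turns up in a directory that is neither in the linker cache nor in the LD_LIBRARY_PATH of the process running Python, the import would be expected to fail as described.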

ImportError: cannot import name _parser

If I install the package with python setup.py install, I get an ImportError: cannot import name '_parser'. If I run python setup.py develop instead, it works.

This is on Ubuntu using Python 2.7.6 or 3.4.3.

merger with usaddress

hi @thatdatabaseguy ,

At @datamade, we are increasingly in need of a multinational version of usaddress. Now that libpostal has moved to a CRF model, it seems a little silly to not try to combine our efforts. Before we can do that we need

  • pypostal to be fully installable from pip (this would mean vendoring libpostal and providing binary wheels for Linux, Mac, and Windows).
  • For United States addresses, extending the libpostal address model to be similar to the one that we use for usaddress. At the least, for our purposes, it's critical to break out street type and street direction.
  • Porting our training data for use with libpostal.

Are these things you would consider? @datamade would do the necessary work to make these happen.

Gem install failure: fatal error: 'libpostal/libpostal.h' file not found

I read through two similar issues but they were dated from 2016 so I'm trying a fresh one.

I installed the C library & lib postal per these instructions. I tried a second time with these slightly different instructions (adding 'make -j4' instead of just 'make'). When I try doing a bundle install it always fails at the ruby postal gem.

If I manually do 'gem install ruby_postal', this is the error:
expand.c:1:10: fatal error: 'libpostal/libpostal.h' file not found
#include <libpostal/libpostal.h>
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [expand.o] Error 1

make failed, exit code 2

Any ideas?

ModuleNotFoundError: No module named 'postal'

Using macOS Mojave I followed the instructions and installed libpostal.

git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
mkdir ~/libpostal-datadir
./configure --datadir=~/libpostal-datadir

This is the output

~/dev_environment/new_data_science/etl/postal_address/libpostal$ sudo make install
Password:
Making install in src
./libpostal_data download all ~/libpostal-datadir/libpostal
Checking for new libpostal data file...
libpostal data file up to date
Checking for new libpostal parser data file...
libpostal parser data file up to date
Checking for new libpostal language classifier data file...
libpostal language classifier data file up to date
 .././install-sh -c -d '/usr/local/bin'
 /usr/bin/install -c libpostal_data '/usr/local/bin'
 .././install-sh -c -d '/usr/local/lib'
 /bin/sh ../libtool   --mode=install /usr/bin/install -c   libpostal.la '/usr/local/lib'
libtool: install: /usr/bin/install -c .libs/libpostal.1.dylib /usr/local/lib/libpostal.1.dylib
libtool: install: (cd /usr/local/lib && { ln -s -f libpostal.1.dylib libpostal.dylib || { rm -f libpostal.dylib && ln -s libpostal.1.dylib libpostal.dylib; }; })
libtool: install: /usr/bin/install -c .libs/libpostal.lai /usr/local/lib/libpostal.la
libtool: install: /usr/bin/install -c .libs/libpostal.a /usr/local/lib/libpostal.a
libtool: install: chmod 644 /usr/local/lib/libpostal.a
libtool: install: ranlib /usr/local/lib/libpostal.a
/Library/Developer/CommandLineTools/usr/bin/ranlib: file: /usr/local/lib/libpostal.a(libpostal_la-strndup.o) has no symbols
/Library/Developer/CommandLineTools/usr/bin/ranlib: file: /usr/local/lib/libpostal.a(libscanner_la-drand48.o) has no symbols
 .././install-sh -c -d '/usr/local/include/libpostal'
 /usr/bin/install -c -m 644 libpostal.h '/usr/local/include/libpostal'
Making install in test
make[2]: Nothing to be done for `install-exec-am'.
make[2]: Nothing to be done for `install-data-am'.
make[2]: Nothing to be done for `install-exec-am'.
 ./install-sh -c -d '/usr/local/lib/pkgconfig'
 /usr/bin/install -c -m 644 libpostal.pc '/usr/local/lib/pkgconfig'

I added export LD_LIBRARY_PATH=/usr/local/lib to my ~/.bash_profile and sourced it.
When I try to import postal in a Python session, I get the error No module named 'postal'.
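
No module named 'postal' is raised by Python's import system before any shared library is loaded, so LD_LIBRARY_PATH is unlikely to be the cause here; the steps quoted above also do not show pip install postal being run. A sketch for checking what the current interpreter can see (module_available is a hypothetical helper):

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path matters: pip may have installed into a different one.
    print("interpreter:", sys.executable)
    print("postal importable:", module_available("postal"))
```

Running this with the same interpreter that raises the error shows whether the package is missing entirely or was installed for a different Python.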

Data downloads page broken

The downloads page at https://results.openaddresses.io/ is currently down and returning a 502 Error:

Proxy Error
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /.

Reason: Error reading from remote server

Apache/2.4.18 (Ubuntu) Server at results.openaddresses.io Port 80

Error "Cannot allocate memory" when importing `parse_address`

I just followed the installation instructions in the README on a fresh Ubuntu 18.10 machine. expand_address works well, but importing parse_address throws an error.

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from postal.expand import expand_address
>>> from postal.parser import parse_address
ERR   Error loading address parser module, dir=(null)
   at libpostal_setup_parser_datadir (libpostal.c:410) errno: Cannot allocate memory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/.local/share/virtualenvs/test-_wVQhZDO/lib/python3.6/site-packages/postal/parser.py", line 2, in <module>
    from postal import _parser
SystemError: initialization of _parser raised unreported exception
>>>

Also, I checked RAM usage and there is still about 1.5 GB memory available. So I'm not sure why it cannot allocate memory.

Did I miss something?

Cannot Install pip install postal in windows

# Also installed Visual C++ compiler for python

E:\Desktop>pip install postal
Collecting postal
Using cached postal-1.0.tar.gz
Requirement already satisfied (use --upgrade to upgrade): six in c:\python27\lib\site-packages (from postal)
Building wheels for collected packages: postal
Running setup.py bdist_wheel for postal ... error
Complete output from command c:\python27\python.exe -u -c "import setuptools, tokenize;file='c:\users\ibrahim\appdata\local\temp\pip-build-x33nju\postal\setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" bdist_wheel -d c:\users\ibrahim\appdata\local\temp\tmpyovm4wpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\postal
copying postal\expand.py -> build\lib.win-amd64-2.7\postal
copying postal\parser.py -> build\lib.win-amd64-2.7\postal
copying postal\__init__.py -> build\lib.win-amd64-2.7\postal
creating build\lib.win-amd64-2.7\postal\tests
copying postal\tests\test_expand.py -> build\lib.win-amd64-2.7\postal\tests
copying postal\tests\test_parser.py -> build\lib.win-amd64-2.7\postal\tests
copying postal\tests\__init__.py -> build\lib.win-amd64-2.7\postal\tests
creating build\lib.win-amd64-2.7\postal\utils
copying postal\utils\encoding.py -> build\lib.win-amd64-2.7\postal\utils
copying postal\utils\enum.py -> build\lib.win-amd64-2.7\postal\utils
copying postal\utils\omitted.py -> build\lib.win-amd64-2.7\postal\utils
copying postal\utils\__init__.py -> build\lib.win-amd64-2.7\postal\utils
running build_ext
building 'postal._expand' extension
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
creating build\temp.win-amd64-2.7\Release\postal
C:\Users\Ibrahim\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I/usr/local/include -Ic:\python27\include -Ic:\python27\PC /Tcpostal/pyexpand.c /Fobuild\temp.win-amd64-2.7\Release\postal/pyexpand.obj -std=c99 -Wno-unused-function
cl : Command line error D8021 : invalid numeric argument '/Wno-unused-function'
error: command 'C:\Users\Ibrahim\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe' failed with exit status 2


Failed building wheel for postal
Running setup.py clean for postal
Failed to build postal
Installing collected packages: postal
Running setup.py install for postal ... error
Complete output from command c:\python27\python.exe -u -c "import setuptools, tokenize;file='c:\users\ibrahim\appdata\local\temp\pip-build-x33nju\postal\setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record c:\users\ibrahim\appdata\local\temp\pip-xlfvwe-record\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\postal
copying postal\expand.py -> build\lib.win-amd64-2.7\postal
copying postal\parser.py -> build\lib.win-amd64-2.7\postal
copying postal\__init__.py -> build\lib.win-amd64-2.7\postal
creating build\lib.win-amd64-2.7\postal\tests
copying postal\tests\test_expand.py -> build\lib.win-amd64-2.7\postal\tests
copying postal\tests\test_parser.py -> build\lib.win-amd64-2.7\postal\tests
copying postal\tests\__init__.py -> build\lib.win-amd64-2.7\postal\tests
creating build\lib.win-amd64-2.7\postal\utils
copying postal\utils\encoding.py -> build\lib.win-amd64-2.7\postal\utils
copying postal\utils\enum.py -> build\lib.win-amd64-2.7\postal\utils
copying postal\utils\omitted.py -> build\lib.win-amd64-2.7\postal\utils
copying postal\utils\__init__.py -> build\lib.win-amd64-2.7\postal\utils
running build_ext
building 'postal._expand' extension
creating build\temp.win-amd64-2.7
creating build\temp.win-amd64-2.7\Release
creating build\temp.win-amd64-2.7\Release\postal
C:\Users\Ibrahim\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -I/usr/local/include -Ic:\python27\include -Ic:\python27\PC /Tcpostal/pyexpand.c /Fobuild\temp.win-amd64-2.7\Release\postal/pyexpand.obj -std=c99 -Wno-unused-function
cl : Command line error D8021 : invalid numeric argument '/Wno-unused-function'
error: command 'C:\Users\Ibrahim\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\amd64\cl.exe' failed with exit status 2

----------------------------------------

Command "c:\python27\python.exe -u -c "import setuptools, tokenize;file='c:\users\ibrahim\appdata\local\temp\pip-build-x33nju\postal\setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record c:\users\ibrahim\appdata\local\temp\pip-xlfvwe-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\ibrahim\appdata\local\temp\pip-build-x33nju\postal\
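
The D8021 error in the log above appears because -std=c99 and -Wno-unused-function are GCC/Clang-style flags that MSVC's cl.exe does not accept. A sketch of how a setup.py could choose flags per platform; this is illustrative only and is not the repository's actual setup.py (extra_compile_args_for is a hypothetical helper):

```python
import sys


def extra_compile_args_for(platform=sys.platform):
    """Return C compiler flags suited to the platform's default compiler."""
    if platform.startswith("win"):
        # MSVC rejects GCC-style flags, so pass none.
        return []
    return ["-std=c99", "-Wno-unused-function"]


if __name__ == "__main__":
    print(extra_compile_args_for())
```

Even with the flags sorted out, the extension would still need a Windows build of libpostal to compile and link against.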

Cannot find postal.lib file (Windows)

Hi,

I tried following the instructions on the README. The core issue I have is the following error:

LINK : fatal error LNK1181: cannot open input file 'postal.lib'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.14.26428\bin\HostX86\x64\link.exe' failed with exit status 1181

Not sure if it's related, but I had two other issues that may have impacted something here:

  1. Previously it couldn't find "libpostal\libpostal.h", so I simply added a folder called "libpostal" and copied the "libpostal.h" file into both the mingw64 and conda "include" folders. The error doesn't come up anymore.

  2. When installing on MSYS2 and typing "make install" it ends with the following error:

C:\msys64\home\user\libpostal\test/test_expand.c:19: undefined reference to `libpostal_expand_address_root'
collect2.exe: error: ld returned 1 exit status
make[1]: *** [Makefile:626: test_libpostal.exe] Error 1
make[1]: Leaving directory '/home/user/libpostal/test'
make: *** [Makefile:454: install-recursive] Error 1

Wasn't able to resolve this issue, but I assumed it was okay because it appears to only be doing a test run. Please let me know if this issue needs to be resolved as well.

Thanks in advance!

Standardizer

Thanks for creating this useful library.
I have a feature request. There are use cases where we need to match addresses from two different datasets. To match properly, the addresses from both datasets need to be standardized into a single standard format.

Current solution: Using expand_address, we can get a list of possible expansions for an address. One option for comparing any two addresses is to expand both and check whether any element of the first list matches any element of the second list. This works, but it becomes very inefficient as the datasets grow.

Requirement: What would work better here is a standardize_address function that returns a single standardized address instead of a list of all possible addresses. For example, st., st, St, ST., and str in an address should all be rewritten as "street". This function could be directly useful for many tasks and would be a nice piece of functionality for this library.
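
Until something like standardize_address exists, the pairwise comparison described above can be avoided by indexing every expansion once. A sketch, assuming the expansions have already been computed (for example with expand_address) and are passed in as plain lists; build_expansion_index and match_records are hypothetical names, not part of pypostal:

```python
from collections import defaultdict


def build_expansion_index(records):
    """Map each expansion string to the set of record ids that produce it.

    records: dict of record id -> list of expansion strings.
    """
    index = defaultdict(set)
    for record_id, expansions in records.items():
        for expansion in expansions:
            index[expansion].add(record_id)
    return index


def match_records(left, right):
    """Return sorted (left_id, right_id) pairs sharing at least one expansion."""
    index = build_expansion_index(right)
    pairs = set()
    for left_id, expansions in left.items():
        for expansion in expansions:
            for right_id in index.get(expansion, ()):
                pairs.add((left_id, right_id))
    return sorted(pairs)
```

Each expansion is looked up in a dictionary instead of being compared against every expansion in the other dataset, so the work grows roughly with the total number of expansions plus the number of matches rather than with the product of the dataset sizes.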

near_dupe_hashes returns empty array

Not sure what's going on: postal.near_dupe.near_dupe_hashes simply returns an empty list no matter what the inputs are. I'm passing in the labels and values produced by the parse_address function.

Errors on installation - Debian 8.6 VM

What I did:
(installed in ~/Documents/libpostal)

git clone https://github.com/openvenues/libpostal
cd libpostal
./bootstrap.sh
./configure LDFLAGS=-L/usr/lib64 --datadir=$(pwd)/data --prefix=$(realpath $(pwd)) --bindir=$(realpath $(pwd)/bin)
sudo make install
./src/address_parser
Loading models...
ERR Error loading transliteration module, LIBPOSTAL_DATA_DIR=/home/user/Documents/libpostal/data/libpostal
at libpostal_setup (libpostal.c:1069) errno: No such file or directory

libpostal/data is only 29.2 kB, but I have 24.3 GB of free space.

libpostal/data/libpostal only has last_updated, last_updated_geo, last_updated_language_classifier, and last_updated_parser, and they all only have `Jan 1 00:00:00 1970'.

SystemError: initialization of _parser raised unreported exception

I am having an issue with the parser module; the expand module works fine. Detailed error info is below:


SystemError Traceback (most recent call last)
in ()
----> 1 from postal.parser import parse_address

/Users/zhanghuiting/anaconda/lib/python3.6/site-packages/postal/parser.py in ()
1 """Python bindings to libpostal parse_address."""
----> 2 from postal import _parser
3 from postal.utils.encoding import safe_decode
4
5

SystemError: initialization of _parser raised unreported exception

I also just tested address_parser, and it shows the error below:

ERR Could not find parser model file of known type
at address_parser_load (address_parser.c:208) errno: No such file or directory

Thank you in advance.

Nick

Lack of clear documentation

Hello,

I've been using the libpostal python bindings for a while now and it has really helped with database cleansing and the like.

One issue I have, however, is that the only functions which have any documentation or usage examples are parse_address and expand_address. Even this comes from the main libpostal page, and not this page.

By looking at the source code, I can find some sparse documentation on how some of the additional functions work, but having to open source files for reference is tedious and not all of the functions have documentation. Please add some documentation for the rest of the functions provided by this library.

Thanks

Installation/Usage unclear

Hi there. Running in terminal on a CentOS7 box. I installed libpostal and can use its utilities, and followed instructions to install python bindings (sudo pip install postal), but I get this error when trying to use code examples:

from postal.expand import expand_address ...
cannot open shared object file: No such file or directory...

dir(postal) = ['__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__']

Ubuntu: libpostal.so.1: cannot open shared object file: No such file or directory

I seem to have the same problem as issue #8:

cannot open shared object file: No such file or directory

I tried the fixes suggested in the issue including adding export LD_LIBRARY_PATH=/usr/local/lib (lib being where libpostal is installed)

I can import postal, just can't follow the tutorial due to this error:

>>> import postal
>>> postal.__file__
'/home/sean/.local/lib/python2.7/site-packages/postal/__init__.pyc'
>>> from postal.expand import expand_address
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sean/.local/lib/python2.7/site-packages/postal/expand.py", line 5, in <module>
    from postal import _expand
ImportError: libpostal.so.1: cannot open shared object file: No such file or directory
sean@sean-Virtual-Machine:~/.local/lib/python2.7/site-packages/postal$ ls

dedupe.py   __init__.py    normalize.pyc  pyutils.h       token_types.pyc
dedupe.pyc  __init__.pyc   _normalize.so  tests           _token_types.so
_dedupe.so  near_dupe.py   parser.py      tokenize.py     utils
expand.py   near_dupe.pyc  parser.pyc     tokenize.pyc
expand.pyc  _near_dupe.so  _parser.so     _tokenize.so
_expand.so  normalize.py   __pycache__    token_types.py

pypostal with airflow

Hi, I got pypostal working from the command line and from a Jupyter notebook, but I get an error when importing it from an Airflow job: ImportError: No module named postal. I initially had this error from the command line and the Jupyter notebook as well, and both were fixed with LD_LIBRARY_PATH. I'm not sure what the problem is with Airflow now.

thanks
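
A scheduler such as Airflow may start tasks with a different interpreter and environment than an interactive shell, so settings exported in a shell profile are not necessarily inherited. A sketch for logging what the task process actually sees, to compare against the working command-line session; runtime_report is a hypothetical helper:

```python
import os
import sys


def runtime_report(environ=None):
    """Summarise the interpreter and library search path of this process."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LD_LIBRARY_PATH", "")
    return {
        "executable": sys.executable,
        # Split on the platform's path separator and drop empty entries.
        "ld_library_path": [p for p in raw.split(os.pathsep) if p],
    }


if __name__ == "__main__":
    print(runtime_report())
```

Calling this once from the shell and once from inside the job shows whether the two runs differ in interpreter or in LD_LIBRARY_PATH.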

Can't import `dedupe` module

I'm looking to investigate the new APIs in the alpha release of libpostal. I can't import the module on 3.4.6 or 3.6.2, though I can on 2.7.2. It seems like a Python version issue, but I can't find much about _PyInt_FromSsize_t and where it is indirectly used.

Python 3.6.2 (default, Jul 23 2017, 13:15:15)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import postal.near_dupe
>>> import postal.dedupe
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jordan/.pyenv/versions/3.6.2/lib/python3.6/site-packages/postal/dedupe.py", line 3, in <module>
    from postal import _dedupe
ImportError: dlopen(/Users/jordan/.pyenv/versions/3.6.2/lib/python3.6/site-packages/postal/_dedupe.cpython-36m-darwin.so, 2): Symbol not found: _PyInt_FromSsize_t
  Referenced from: /Users/jordan/.pyenv/versions/3.6.2/lib/python3.6/site-packages/postal/_dedupe.cpython-36m-darwin.so
  Expected in: flat namespace
 in /Users/jordan/.pyenv/versions/3.6.2/lib/python3.6/site-packages/postal/_dedupe.cpython-36m-darwin.so
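
On macOS, C symbols carry a leading underscore, so the missing symbol is PyInt_FromSsize_t. The PyInt_* functions belong to the Python 2 C API; Python 3 provides PyLong_FromSsize_t instead. That suggests the _dedupe extension references a Python 2-only function, although this is an inference from the error message rather than something confirmed in this report. The sketch below uses ctypes.pythonapi to check which of the two symbols the running interpreter exports; has_c_api_symbol is a hypothetical helper name.

```python
import ctypes
import sys


def has_c_api_symbol(name):
    """Return True if the running interpreter exports the C API function `name`."""
    # ctypes.pythonapi resolves symbols lazily; a missing symbol raises
    # AttributeError, which hasattr() turns into False.
    return hasattr(ctypes.pythonapi, name)


if __name__ == "__main__":
    print(sys.version_info[:2])
    for symbol in ("PyInt_FromSsize_t", "PyLong_FromSsize_t"):
        print(symbol, has_c_api_symbol(symbol))
```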

configure --datadir doesn't accept path

Hello, I'm trying to set up pypostal, and for that I have to set datadir:

./configure --datadir=[...some dir with a few GB of space...]

but I get the error message:

 ./configure --datadir=["/opt/libpostal/"]
configure: error: expected an absolute directory name for --datadir: [/opt/libpostal/]

But I don't understand: /opt/libpostal/ is an absolute directory name.
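
The square brackets in the README's ./configure --datadir=[...] line are placeholder notation, not syntax. The shell strips the quotes and passes the brackets through, so configure receives [/opt/libpostal/], exactly as the error message prints it, and a value that starts with "[" is not an absolute path. The invocation would presumably be ./configure --datadir=/opt/libpostal/. The short check below uses posixpath.isabs as a stand-in for configure's own test, which is an assumption made for illustration.

```python
import posixpath

placeholder_style = "[/opt/libpostal/]"  # what configure received, per the error message
plain_path = "/opt/libpostal/"

print(posixpath.isabs(placeholder_style))  # False: the value starts with "["
print(posixpath.isabs(plain_path))         # True: the value starts with "/"
```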

running parse_address() throws "parser is not setup" error

I was running the example in README:

from postal.parser import parse_address
parse_address('The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, EC2A 4RH, United Kingdom')

The second line gives me this:

In [3]: parse_address('The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, EC2A 4RH, United Kingdom')
ERR   parser is not setup, call libpostal_setup_address_parser()
   at address_parser_parse (address_parser.c:1666) errno: None
ERR   Parser returned NULL
   at libpostal_parse_address (libpostal.c:243) errno: None
---------------------------------------------------------------------------
SystemError                               Traceback (most recent call last)
<ipython-input-3-16e3f788b70c> in <module>()
----> 1 parse_address('The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, EC2A 4RH, United Kingdom')

/Users/yulong/.local/share/virtualenvs/ml-rVau861g/lib/python2.7/site-packages/postal/parser.pyc in parse_address(address, language, country)
     13     """
     14     address = safe_decode(address, 'utf-8')
---> 15     return _parser.parse_address(address, language=language, country=country)

SystemError: error return without exception set

I'm not sure whether this is a pypostal problem or a libpostal problem in general. I was able to use other functions, such as expand_address and the dedupe functions.

pip installer looks for README.rst, whereas it's README.md

Trying to install in Anaconda Python 3.6 on Ubuntu 16.10:

pip install pypostal
Collecting pypostal
Downloading pyPostal-1.1.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/pip-build-it1zbgs0/pypostal/setup.py", line 10, in <module>
    long_description=codecs.open('README.rst', "r", "utf-8").read(),
  File "/home/david/anaconda3/envs/py36/lib/python3.6/codecs.py", line 895, in open
    file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: 'README.rst'

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-it1zbgs0/pypostal/
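
The traceback shows setup.py opening README.rst unconditionally, so the build aborts when the file is missing from the archive. Note also that the archive being built is pyPostal-1.1, fetched by pip install pypostal, while the README for these bindings says pip install postal, so this may be a different distribution altogether. For a setup.py that should tolerate a missing or renamed README, one possible pattern is sketched below; read_long_description is a hypothetical helper, not code from either package.

```python
import codecs
import os


def read_long_description(candidates=("README.rst", "README.md"), base_dir="."):
    """Return the text of the first existing candidate file, or "" if none exists."""
    for filename in candidates:
        path = os.path.join(base_dir, filename)
        if os.path.exists(path):
            with codecs.open(path, "r", "utf-8") as handle:
                return handle.read()
    return ""
```

A call such as setup(long_description=read_long_description(), ...) then degrades to an empty description instead of failing during egg_info.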

Do I have to install the C library on Mac?

I got the following error when I ran pip install postal directly on a Mac.

Do I have to install the C library manually? I'd rather not install packages manually on a Mac. Is there a way to automate the installation, either by including the C library in the pip package or by providing it through Homebrew?

$ pip install postal

Collecting postal
  Using cached https://files.pythonhosted.org/packages/3d/0b/2f077c14165c0e4ed795c3fa83e1b68d357186da42ee3ab5d64b77424f12/postal-1.1.7.tar.gz
Requirement already satisfied: six in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages (from postal) (1.11.0)
Building wheels for collected packages: postal
  Running setup.py bdist_wheel for postal ... error
  Complete output from command /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python -u -c "import setuptools, tokenize;__file__='/private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-install-guQ8Tb/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-wheel-cO252B --python-tag cp27:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.9-x86_64-2.7
  creating build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/token_types.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/dedupe.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/normalize.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/__init__.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/parser.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/near_dupe.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/tokenize.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  copying postal/expand.py -> build/lib.macosx-10.9-x86_64-2.7/postal
  creating build/lib.macosx-10.9-x86_64-2.7/postal/tests
  copying postal/tests/test_parser.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
  copying postal/tests/__init__.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
  copying postal/tests/test_expand.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
  copying postal/tests/test_near_dupes.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
  creating build/lib.macosx-10.9-x86_64-2.7/postal/utils
  copying postal/utils/encoding.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
  copying postal/utils/__init__.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
  copying postal/utils/omitted.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
  copying postal/utils/enum.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
  copying postal/pyutils.h -> build/lib.macosx-10.9-x86_64-2.7/postal
  running build_ext
  building 'postal._expand' extension
  creating build/temp.macosx-10.9-x86_64-2.7
  creating build/temp.macosx-10.9-x86_64-2.7/postal
  gcc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.9-x86_64-2.7/postal/pyexpand.o -std=c99
  postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
  #include <libpostal/libpostal.h>
           ^~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.
  error: command 'gcc' failed with exit status 1
  
  ----------------------------------------
  Failed building wheel for postal
  Running setup.py clean for postal
Failed to build postal
Installing collected packages: postal
  Running setup.py install for postal ... error
    Complete output from command /Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python -u -c "import setuptools, tokenize;__file__='/private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-install-guQ8Tb/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-record-0EXPw6/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.macosx-10.9-x86_64-2.7
    creating build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/token_types.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/dedupe.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/normalize.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/__init__.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/parser.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/near_dupe.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/tokenize.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    copying postal/expand.py -> build/lib.macosx-10.9-x86_64-2.7/postal
    creating build/lib.macosx-10.9-x86_64-2.7/postal/tests
    copying postal/tests/test_parser.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
    copying postal/tests/__init__.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
    copying postal/tests/test_expand.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
    copying postal/tests/test_near_dupes.py -> build/lib.macosx-10.9-x86_64-2.7/postal/tests
    creating build/lib.macosx-10.9-x86_64-2.7/postal/utils
    copying postal/utils/encoding.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
    copying postal/utils/__init__.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
    copying postal/utils/omitted.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
    copying postal/utils/enum.py -> build/lib.macosx-10.9-x86_64-2.7/postal/utils
    copying postal/pyutils.h -> build/lib.macosx-10.9-x86_64-2.7/postal
    running build_ext
    building 'postal._expand' extension
    creating build/temp.macosx-10.9-x86_64-2.7
    creating build/temp.macosx-10.9-x86_64-2.7/postal
    gcc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -g -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/usr/local/include -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c postal/pyexpand.c -o build/temp.macosx-10.9-x86_64-2.7/postal/pyexpand.o -std=c99
    postal/pyexpand.c:2:10: fatal error: 'libpostal/libpostal.h' file not found
    #include <libpostal/libpostal.h>
             ^~~~~~~~~~~~~~~~~~~~~~~
    1 error generated.
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python -u -c "import setuptools, tokenize;__file__='/private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-install-guQ8Tb/postal/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-record-0EXPw6/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/r7/bvmh1vvx41d63snvgbdz7bl40000gr/T/pip-install-guQ8Tb/postal/
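
The compile step fails because libpostal/libpostal.h is missing, which is what happens when the C library has not been installed; the README states that libpostal must be installed before the Python bindings. A quick way to check for the header before running pip is sketched below. find_header is a hypothetical helper, /usr/local/include comes from the -I flag in the gcc command above, and /opt/homebrew/include is an assumption for Homebrew on Apple silicon.

```python
import os


def find_header(relative_path, include_dirs):
    """Return the first full path where the header exists, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # The second directory is an assumption, not taken from the log above.
    print(find_header("libpostal/libpostal.h",
                      ["/usr/local/include", "/opt/homebrew/include"]))
```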

parse_address leaks memory on Python2.7 when passing language and/or country

Tested with Python2.7 and Python3.4.
Passing any string value for either language or country in a call to postal.parser.parse_address leaks memory on Python2.7. Unicode vs string does not change it. Garbage collection has no impact. Neither has regular reloading of the Python code modules. I've tried reloading the _postal shared library, but apparently the Python reload machinery does not actually release the .so once loaded, so I gave up on this.

Known workarounds:
  • Use Python 3.
  • Pass only the address; never pass a non-empty language or country.

The leak is around 40 bytes per passed country/language argument, per invocation, for strings of two or more characters. Empty strings do not leak. Single-character strings do not leak either (presumably because CPython reuses single-character str instances).

Observed on this system:

$ uname -a
Linux ron-VirtualBox 3.13.0-91-generic #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/version
Linux version 3.13.0-91-generic (buildd@lgw01-21) (gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3) ) #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016
$ python2.7 --version && python3 --version
Python 2.7.6
Python 3.4.3
$ (pip2.7 freeze && pip3 freeze) | grep postal
postal==0.3
postal==0.3

Note: 2.7.6 is Ubuntu's standard Python version. I have seen the same behavior with Python 2.7.11 compiled from source (with --enable-unicode=ucs4) on the same machine, and also on a CentOS box.
libpostal (the C library) is installed system-wide, so both Python versions' postal packages definitely layer onto the same libpostal.so.

Reproduction script (supports both Python2 and Python3)

"""Demonstrate leaking memory on calling postal.parser.parse_address with
`language` and/or `country` argument(s).
Leak only observable on Python 2.7. Python 3.4 memory usage completely stable.
Module reloading and garbage collection have zero impact.
Using unicode or str for country/language arguments has zero impact.
Length of country/language strings has impact: longer strings leak more memory.
"""


from __future__ import print_function

import resource
import gc
import imp

import postal.parser

try:
    # Python 2: xrange already exists
    xrange
except NameError:
    # Python 3: range is already lazy, so alias it
    xrange = range

def reload_all_postal_python_modules():
    # reload everything implicitly pulled in by importing postal.parser
    # >>> import sys
    # >>> import postal.parser
    # >>> [name for (name, mod) in sys.modules.items() if 'post' in name and mod]
    # ['postal', 'postal.utils.encoding', 'postal._parser', 'postal.parser', 'postal.utils']
    import postal.parser
    import postal
    import postal.utils
    import postal.utils.encoding
    postal.utils.encoding = imp.reload(postal.utils.encoding)
    postal.utils = imp.reload(postal.utils)
    globals()['postal'].parser = imp.reload(postal.parser)
    globals()['postal'] = imp.reload(postal)

def format_maxrss():
    return "%dkiB" % (resource.getrusage(resource.RUSAGE_SELF).ru_maxrss, )

def run(per_spin=50000, spins=20):
    parse_invocations = 0
    print("maxrss=%s at startup" % (format_maxrss(), ))
    for spin in xrange(spins):
        for invocation in xrange(per_spin):
            # memory continues to grow linearly per invocation
            # ONLY IF either country or language or both are passed.
            # Length of passed value for country / language directly affects rate of memory usage growth.
            # * language='idontknow' => ~47MB per million invocations
            # * country='idontknow' => ~47MB per million invocations
            # * language='idontknowbutletsmakethisalittlelongernow' => ~80MB per million invocations
            # * country='idontknow', language='idontknow' => ~95MB per million invocations
            _ = postal.parser.parse_address("Hello", language='idontknow')
        parse_invocations += per_spin
        print("maxrss=%s after spin %d (%d calls to postal.parser.parse_address)" % (format_maxrss(), spin, parse_invocations))

        # reloading postal Python modules regularly does not influence memory usage at all
        reload_all_postal_python_modules()
        # garbage collection does not influence memory usage at all
        gc.collect()

if __name__ == '__main__':
    run()

Example output:

$ python2.7 pypostal_memory_leak_demo.py 
maxrss=961376kiB at startup
maxrss=963748kiB after spin 0 (50000 calls to postal.parser.parse_address)
maxrss=966008kiB after spin 1 (100000 calls to postal.parser.parse_address)
maxrss=968380kiB after spin 2 (150000 calls to postal.parser.parse_address)
maxrss=971024kiB after spin 3 (200000 calls to postal.parser.parse_address)
maxrss=973140kiB after spin 4 (250000 calls to postal.parser.parse_address)
maxrss=975784kiB after spin 5 (300000 calls to postal.parser.parse_address)
maxrss=977904kiB after spin 6 (350000 calls to postal.parser.parse_address)
maxrss=980548kiB after spin 7 (400000 calls to postal.parser.parse_address)
maxrss=982664kiB after spin 8 (450000 calls to postal.parser.parse_address)
maxrss=985308kiB after spin 9 (500000 calls to postal.parser.parse_address)
maxrss=987688kiB after spin 10 (550000 calls to postal.parser.parse_address)
maxrss=990072kiB after spin 11 (600000 calls to postal.parser.parse_address)
maxrss=992188kiB after spin 12 (650000 calls to postal.parser.parse_address)
maxrss=994832kiB after spin 13 (700000 calls to postal.parser.parse_address)
maxrss=997212kiB after spin 14 (750000 calls to postal.parser.parse_address)
maxrss=999596kiB after spin 15 (800000 calls to postal.parser.parse_address)
maxrss=1001712kiB after spin 16 (850000 calls to postal.parser.parse_address)
maxrss=1004356kiB after spin 17 (900000 calls to postal.parser.parse_address)
maxrss=1006736kiB after spin 18 (950000 calls to postal.parser.parse_address)
maxrss=1009120kiB after spin 19 (1000000 calls to postal.parser.parse_address)
$ python3 pypostal_memory_leak_demo.py 
maxrss=962880kiB at startup
maxrss=962880kiB after spin 0 (50000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 1 (100000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 2 (150000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 3 (200000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 4 (250000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 5 (300000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 6 (350000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 7 (400000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 8 (450000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 9 (500000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 10 (550000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 11 (600000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 12 (650000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 13 (700000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 14 (750000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 15 (800000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 16 (850000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 17 (900000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 18 (950000 calls to postal.parser.parse_address)
maxrss=962880kiB after spin 19 (1000000 calls to postal.parser.parse_address)
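
The per-call leak can be read off these numbers directly. The short calculation below takes the first and last maxrss readings from each run and divides the growth by the number of calls; leak_bytes_per_call is a hypothetical helper name, and the readings are copied from the output above.

```python
def leak_bytes_per_call(start_kib, end_kib, calls):
    """Average growth in bytes per call, given maxrss readings in KiB."""
    return (end_kib - start_kib) * 1024.0 / calls


# Readings copied from the python2.7 run above.
py27 = leak_bytes_per_call(961376, 1009120, 1000000)
# Readings copied from the python3 run above.
py3 = leak_bytes_per_call(962880, 962880, 1000000)

print("Python 2.7: %.1f bytes per call" % py27)  # 48.9
print("Python 3:   %.1f bytes per call" % py3)   # 0.0
```

Roughly 49 bytes per call for language='idontknow' is in line with the "~47MB per million invocations" comment in the script.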

HTH

Fails to install from pip with a complaint about a missing README

pip install pypostal
Collecting pypostal
  Downloading https://files.pythonhosted.org/packages/e7/96/145d60501264de505926f48f52c29c5fd20d066f2e158107a66ea963cf56/pyPostal-1.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-x4mnpikd/pypostal/setup.py", line 10, in <module>
        long_description=codecs.open('README.rst', "r", "utf-8").read(),
      File "/var/lang/lib/python3.7/codecs.py", line 898, in open
        file = builtins.open(filename, mode, buffering)
    FileNotFoundError: [Errno 2] No such file or directory: 'README.rst'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-x4mnpikd/pypostal/

Cannot Allocate memory error

While parsing an address given as a string, I get a "Cannot allocate memory" error.
My machine has 4 GB of RAM and a high-end processor.
Please help me solve this.
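
For scale, the reproduction output in the memory-leak report on this page shows a maxrss of roughly 960,000 kiB at startup, right after import postal.parser, so the parser alone may need close to 1 GB of free memory. A minimal sketch for measuring this on a given machine follows; maxrss_growth is a hypothetical helper, and it relies on the Unix-only resource module.

```python
import importlib
import resource


def maxrss_growth(module_name):
    """Import `module_name` and return how much ru_maxrss grew while doing so.

    The unit is whatever the platform reports (KiB on Linux, bytes on macOS).
    """
    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    importlib.import_module(module_name)
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return after - before


if __name__ == "__main__":
    try:
        # Assumes pypostal is installed in this interpreter.
        print(maxrss_growth("postal.parser"))
    except ImportError:
        print("postal is not installed in this interpreter")
```

Comparing the result with the free memory reported by the operating system shows whether the machine has enough headroom for the parser.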

CFFI bindings for PyPy

CPython extensions are significantly slower on PyPy.
It'd be useful to support PyPy as well using CFFI.
