Comments (14)
Could you get the stack trace? Take a look at [1] for instructions. Basically, it is:
ulimit -c unlimited
pg_ctl start -D /path/to/pgdata
psql -c "my query here" mydb
[... crash ...]
gdb /path/to/postgres /path/to/pgdata/core
(gdb) bt
[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
from pg_similarity.
I probably won't be able to do this anytime soon.
I could provide you the 3400+ organization_names, if you are able to set up a test for yourself and see if you can replicate the problem.
Here is a csv of the organizations
Please send me the file, I will test over the weekend. What is your OS?
% uname -a
Linux mshapiro 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux
To speed-up, can you also share the import code and table creation?
I would like to make sure you used one of the LATIN* encodings. Can you see entries in your db like 2910,"University of León", with the proper accent?
Anyway, even with the wrong encoding, the query shouldn't have failed.
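The accent check above is a quick way to tell whether the text was stored and read back under the same encoding. A minimal Python sketch, using only the sample name quoted in this thread, of what a mismatch would look like (the variable names are illustrative):

```python
# Sample value quoted in the thread.
name = "University of León"

utf8_bytes = name.encode("utf-8")      # 'ó' becomes two bytes: 0xC3 0xB3
latin1_bytes = name.encode("latin-1")  # 'ó' becomes one byte: 0xF3

# Reading UTF-8 bytes as if they were LATIN1 garbles the accented letter.
misread = utf8_bytes.decode("latin-1")

print(len(name), len(utf8_bytes), len(latin1_bytes))
print(misread)
```

If the accent shows up correctly in the client, the stored bytes and the client encoding agree. Note that in UTF8 the byte length of such a string is larger than its character length.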
I can see the accent.
CREATE DATABASE mshapiro
WITH OWNER = mshapiro
ENCODING = 'UTF8'
TABLESPACE = pg_default
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
CONNECTION LIMIT = -1;
CREATE TABLE organizations
(
organization_id integer not null,
organization_name character varying(300) NOT NULL,
CONSTRAINT pk_organizations PRIMARY KEY (organization_id),
CONSTRAINT org_name_uk UNIQUE (organization_name)
)
WITH (
OIDS=TRUE
);
@shapiro2 thanks for the test case. I reproduced the bug here. I'll fix it later.
Great. I think I may have hit a similar bug with one or two other measures (same query), but I didn't keep track so I can't be sure which ones. I'll run the query with other measures and let you know if I hit it or not. Might be a few days before I can get back to you on it. I know lev and qgram worked OK.
I ran the same query for all but the hamming and soundex operators.
They all worked except needlemanwunsch (reported above).
But note that the ~?? operator (jaccard) is difficult to use from Perl, since the Perl DBI treats ? as a placeholder and tries to substitute a bound value for each ?.
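The clash can be illustrated with a deliberately naive placeholder scanner. This is a hypothetical sketch, not the actual parser used by DBI or DBD::Pg: `count_placeholders` is an invented helper that counts every `?` outside single-quoted literals, which is enough to show why `~??` looks like two extra bind parameters.

```python
def count_placeholders(sql: str) -> int:
    """Count '?' characters outside single-quoted SQL literals.

    Hypothetical helper for illustration only; real drivers also have
    to handle comments, dollar quoting, and quoted identifiers.
    """
    count = 0
    in_literal = False
    for ch in sql:
        if ch == "'":
            # Toggle on every quote; a doubled '' toggles twice and
            # therefore leaves the state unchanged.
            in_literal = not in_literal
        elif ch == "?" and not in_literal:
            count += 1
    return count


# One intended bind parameter, but the operator contributes two more '?'.
query = (
    "SELECT organization_name FROM organizations "
    "WHERE organization_name ~?? ?"
)
print(count_placeholders(query))
```

A driver that scans this way would expect three bound values for a query that only intends one.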
@shapiro2 Take a look at [1] to solve your placeholder problem.
[1] http://blog.endpoint.com/2015/01/dbdpg-escaping-placeholders-with.html
Thanks. I saw that. In general, though, I have run into the ? issue with other interfaces, too, not just Perl.
For example, in one tool I have to use ~????. Also, on our system we don't have the latest version of DBD::Pg installed, and we have so many production scripts using DBD::Pg that upgrading it for this is not something our admins will do. I personally don't think I will use this operator, but it might be a good idea to offer an alternative (keep ~??, but also have one like ~--).
@shapiro2 There are many operators and few options to consider. Someone already suggested deprecating this operator for the same reason. I'll consider it for the next release. You can always replace this operator with one that has a different name (drop the old one and create another -- look at pg_similarity--1.0.sql).
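If you do create a replacement operator, the new name has to satisfy PostgreSQL's operator-naming rules. The sketch below checks a candidate name against the rules documented for CREATE OPERATOR and prints the statement you might run. Several things here are assumptions: `is_valid_operator_name` and `alias_operator_sql` are hypothetical helpers, `~&~` is only an example name that you should check for collisions, and the procedure name `jaccard_op` should be verified against pg_similarity--1.0.sql.

```python
OPERATOR_CHARS = set("+-*/<>=~!@#%^&|`?")
SPECIAL_CHARS = set("~!@#%^&|`?")


def is_valid_operator_name(name: str) -> bool:
    """Check a name against the documented CREATE OPERATOR naming rules."""
    if not 1 <= len(name) <= 63:          # NAMEDATALEN - 1 by default
        return False
    if not set(name) <= OPERATOR_CHARS:
        return False
    if "--" in name or "/*" in name:      # would start a SQL comment
        return False
    if name == "=>":                      # reserved by the SQL grammar
        return False
    # A multi-character name may end in + or - only if it also contains
    # at least one of the "special" characters.
    if len(name) > 1 and name[-1] in "+-" and not (set(name) & SPECIAL_CHARS):
        return False
    return True


def alias_operator_sql(name: str, procedure: str = "jaccard_op") -> str:
    """Build a CREATE OPERATOR statement for an alias of ~??.

    The default procedure name is an assumption, not confirmed here.
    """
    if not is_valid_operator_name(name):
        raise ValueError(f"not a valid PostgreSQL operator name: {name!r}")
    return (
        f"CREATE OPERATOR {name} (\n"
        f"    LEFTARG = text,\n"
        f"    RIGHTARG = text,\n"
        f"    PROCEDURE = {procedure}\n"
        f");"
    )


print(alias_operator_sql("~&~"))
```

Note that a name such as ~-- is rejected by this check, because -- starts a comment in SQL, so an alias would need a different spelling.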
@shapiro2 Thanks for your report. Fixed in be1a8b0.