Comments (14)

eulerto commented on July 20, 2024

Could you get the stack trace? Take a look at [1] for instructions. Basically, it is:

ulimit -c unlimited
pg_ctl start -D /path/to/pgdata
psql -c "my query here" mydb
[... crash ...]
gdb /path/to/postgres /path/to/pgdata/core
(gdb) bt

[1] https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

from pg_similarity.

shapiro2 commented on July 20, 2024

I probably won't be able to do this anytime soon.

I could provide you the 3400+ organization_names, if you are able to set up a test for yourself and see if you can replicate the problem.

shapiro2 commented on July 20, 2024

Here is a csv of the organizations

org.txt

RazvanFromAltair commented on July 20, 2024

Please send me the file, I will test over the weekend. What is your OS?

shapiro2 commented on July 20, 2024

%uname -a

Linux mshapiro 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux

RazvanFromAltair commented on July 20, 2024

To speed things up, can you also share the import code and the table creation statement?
I would like to make sure you used one of the LATIN* encodings. Can you see entries in your db like 2910,"University of León", with the proper accent?

Anyway, even with the wrong encoding, the query shouldn't have failed.

shapiro2 commented on July 20, 2024

I can see the accent.

CREATE DATABASE mshapiro
  WITH OWNER = mshapiro
       ENCODING = 'UTF8'
       TABLESPACE = pg_default
       LC_COLLATE = 'en_US.UTF-8'
       LC_CTYPE = 'en_US.UTF-8'
       CONNECTION LIMIT = -1;
CREATE TABLE organizations
(
  organization_id integer not null,
  organization_name character varying(300) NOT NULL,
  CONSTRAINT pk_organizations PRIMARY KEY (organization_id),
  CONSTRAINT org_name_uk UNIQUE (organization_name)
)
WITH (
  OIDS=TRUE
);

eulerto commented on July 20, 2024

@shapiro2 thanks for the test case. I reproduced the bug here. I'll fix it later.

shapiro2 commented on July 20, 2024

Great. I think I may have hit a similar bug with one or two other measures (same query), but I didn't keep track so I can't be sure which ones. I'll run the query with other measures and let you know if I hit it or not. Might be a few days before I can get back to you on it. I know lev and qgram worked OK.

shapiro2 commented on July 20, 2024

I ran the same query for all but the hamming and soundex operators.
They all worked except needlemanwunsch (reported above).

But NOTE that the name of the ~?? operator (jaccard) makes it difficult to use from Perl, since the Perl DBI treats ? as a placeholder and tries to substitute a bound value for each ?.

eulerto commented on July 20, 2024

@shapiro2 Take a look at [1] to solve your placeholder problem.

[1] http://blog.endpoint.com/2015/01/dbdpg-escaping-placeholders-with.html

shapiro2 commented on July 20, 2024

Thanks. I saw that. In general, though, I have run into the ? issue with other interfaces, too, not just Perl.
For example, in one tool I have to use ~????. Also, on our system we don't have the latest version of DBD::Pg installed, and we have so many production scripts using DBD::Pg that upgrading it for this is not something our admins will do. I personally don't think I will use this operator, but it might be a good idea to offer an alternative (keep ~??, but also have one like ~--)

eulerto commented on July 20, 2024

@shapiro2 There are so many operators and few options to consider. Someone already suggested deprecating this operator for the same reason. I'll consider it in the next release. In the meantime, you can always replace this operator with another one that has a different name (drop the old one and create another; look at pg_similarity--1.0.sql).
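
A minimal sketch of that workaround, under some assumptions: the function name jaccard_op(text, text) is taken to be what backs ~?? in pg_similarity--1.0.sql (check the file before relying on it), and the new name ~&& is purely hypothetical and assumed not to be in use already. Note that a name like ~-- is not possible, because PostgreSQL does not allow -- inside an operator name (it starts a comment). The sketch only adds an alias and leaves ~?? in place.

```sql
-- Hypothetical alias for the jaccard operator without any "?" in its name.
-- Assumption: jaccard_op(text, text) is the boolean function behind ~??.
CREATE OPERATOR ~&& (
    LEFTARG   = text,
    RIGHTARG  = text,
    PROCEDURE = jaccard_op
);

-- Usage against the organizations table from this thread:
-- pairs of distinct organizations whose names the operator considers similar.
SELECT a.organization_name, b.organization_name
FROM organizations a
JOIN organizations b
  ON a.organization_id < b.organization_id
 AND a.organization_name ~&& b.organization_name;
```

Because the alias contains no ?, clients that treat ? as a bind placeholder should not try to substitute a value into it.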

eulerto commented on July 20, 2024

@shapiro2 Thanks for your report. Fixed in be1a8b0.
