Giter Site home page Giter Site logo

Comments (21)

vog avatar vog commented on July 29, 2024 1

I propose the following additional testcase:

          with_conn (fun db ->
            simple_query db {|
              CREATE TEMPORARY TABLE this_test (id text);
              INSERT INTO this_test (id) VALUES('test-\303\244-test')
            |}
            >>= fun _ ->
            execute db "SELECT id FROM this_test"
            >>| function
            | [[ result ]] ->
              assert_equal (Some {|test-\303\244-test|}) (Pgx.Value.to_string result)
            | _ -> assert false
          )
        )

This inserts the pure ASCII string with backslashes test-\303\244-test into the database and checks that this ASCII string is returned. (Currently, the non-ASCII string test-ä-test is returned instead.)

Note that in all recent PostgreSQL versions, adhering to the SQL standard, the backslashes in 'test-\303\244-test' are treated as-is. You would have to write E'test-\303\244-test' to have it interpreted as test-ä-test.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

I added this as a test in pull request #40 but it seems to be working for me.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Hm actually I see you meant inserting manually. It's weird that this test works:

          with_conn (fun db ->
            simple_query db {|
              CREATE TEMPORARY TABLE this_test (id text);
              INSERT INTO this_test (id) VALUES('test-ä-test')
            |}
            >>= fun _ ->
            execute db "SELECT id FROM this_test"
            >>| function
            | [[ result ]] ->
              assert_equal (Some "test-ä-test") (Pgx.Value.to_string result)
            | _ -> assert false
          )
        )

But this fails:

          let expect = "test-ä-test" in
          with_conn (fun db ->
            simple_query db {|
              CREATE TEMPORARY TABLE this_test (id text);
              INSERT INTO this_test (id) VALUES('test-ä-test')
            |}
            >>= fun _ ->
            execute db ~params:Pgx.Value.[ of_string expect]
              "SELECT id FROM this_test WHERE id = $1"
            >>| function
            | [[ result ]] ->
              assert_equal (Some expect) (Pgx.Value.to_string result)
            | _ -> assert false
          )
        )

:\

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Since it's just WHERE statements affected, could this be some sort of collation thing and our encoding is confusing it?

from pgx.

bradlangel avatar bradlangel commented on July 29, 2024

Could it be this encode step in the execute_iter function?

let params =
        List.map (fun s ->
            Value.to_string s
            |> Option.map encode_unprintable)
          params
      in

from pgx.

vog avatar vog commented on July 29, 2024

@brendanlong Regarding your comment #38 (comment)

This is no surprise at all.

          with_conn (fun db ->
            simple_query db {|
              CREATE TEMPORARY TABLE this_test (id text);
              INSERT INTO this_test (id) VALUES('test-ä-test')
            |}

This inserts the correct value test-ä-test into the database.

            >>= fun _ ->
            execute db "SELECT id FROM this_test"
            >>| function
            | [[ result ]] ->
              assert_equal (Some "test-ä-test") (Pgx.Value.to_string result)
            | _ -> assert false
          )
        )

This loads the value test-ä-test from the database, and wrongly "unescapes" it. However, since test-ä-test contains no backslash sequence, it is unescaped to itself.

from pgx.

vog avatar vog commented on July 29, 2024

@brendanlong This issue is not restricted to WHERE clauses. It affects parameter encoding as well as result decoding! But since it affects both in a "compatible" way, it is hard to debug and one gets easily confused. I had to check in parallel with other SQL interfaces (the psql command line tool as well as some simple Python script) to make sure I don't get confused along the way.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Ahh that makes sense. That's a lot for the test case and explanation! It sounds likely that it's the issue @bradlangel found. We'll try to put up a fix as soon as we can but I'd also be happy to review a pull request if you put a fix up first (I probably don't have time to write and test this until next week).

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Hm I created a series of test cases and I don't really understand what's wrong. This seems to specifically only be a problem for WHERE id = $1. Inserting the literal string and then selecting the entire table back works fine:

(* this works *)
let expect = "test-ä-test" in
with_conn (fun db ->
  simple_query db {|
      CREATE TEMPORARY TABLE this_test (id text);
      INSERT INTO this_test (id) VALUES ('test-ä-test')
    |}
  >>= fun _ ->
  execute db "SELECT id FROM this_test"
  >>| function
  | [[ result ]] -> assert_equal (Some expect) (Pgx.Value.to_string result)
  | _ -> assert false
)
(* this doesn't work *)
let expect = "test-ä-test" in
with_conn (fun db ->
  simple_query db {|
        CREATE TEMPORARY TABLE this_test (id text);
        INSERT INTO this_test (id) VALUES ('test-ä-test')
      |}
  >>= fun _ ->
  execute db ~params:[ Pgx.Value.of_string expect ]
    "SELECT id FROM this_test WHERE id = $1"
  >>| function
  | [[ result ]] -> assert_equal (Some expect) (Pgx.Value.to_string result)
  | [] -> assert_failure "Expected one row but got zero"
  | _ -> assert false
)

Note that bound parameters are sent as prepared statement parameters, not inserted literally into the query, so I don't think the E'' stuff applies.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Note that the other test cases show that result decoding seems to work fine no matter how we insert the values.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Hm this text seems to agree with you though:

https://www.postgresql.org/docs/10/protocol-overview.html#PROTOCOL-FORMAT-CODES

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)

Our escaping is specifically there to prevent embedded nulls so maybe we have to send strings using the binary protocol?

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Oh right, the issue is that double-decode results, which is safe. Right..

Sorry this is hard to keep track of.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

I can't find any documentation for the binary protocol so this will be.. fun.

from pgx.

vog avatar vog commented on July 29, 2024

The PostgreSQL manual is pretty comprehensive in this regard:

Is there anything important missing? Then it's perhaps worth asking on their (very active and friendly) mailing list.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

Unfortunately the only thing I can find for parameter encoding is:

The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)

Binary representations for integers use network byte order (most significant byte first). For other data types consult the documentation or source code to learn about the binary representation. Keep in mind that binary representations for complex data types might change across server versions; the text format is usually the more portable choice.

https://www.postgresql.org/docs/11/protocol-overview.html#PROTOCOL-FORMAT-CODES

And I can't figure out where in the documentation the binary representations are implemented, or how to tell Postgres which data type a binary parameter is (the "extended query" protocol docs say there's an OID column in Parse, but it doesn't say anything about what format to send unspecified format data in).

from pgx.

vog avatar vog commented on July 29, 2024

Would you mind asking this on the respective PostgreSQL mailing list?

I think the pgsql-general list is the best fit for this question:

Maybe pgsql-interfaces is also a good fit, but that one looks a bit inactive.

If you don't receive a good answer there, you should go straight to pgsql-hackers:

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

To do this we need to:

  • Make Pgx_value.t have an OID and binary string
  • Add a way to register encoders and decoders
  • Add encoders and decoders for the build-in types
  • Make our Parse message send binary and a list of OID's
  • Make our Bind message send binary (assumes the same types as the Parse, so we should probably have error checking here that the bound params are the same type as the ones sent with Parse)

The annoying bit here is that the types need to exactly match between two different messages.

The encoding is generally straightforward (usually just length + bytes). You can see examples here: https://github.com/MagicStack/py-pgproto/tree/6079e5b2addf7717aabbdcdb7825d6b68c731409/codecs

For the OID's, asyncpg has a list here: https://github.com/MagicStack/asyncpg/blob/master/asyncpg/protocol/pgtypes.pxi

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

#83 fixes this in a more straightforward way by adding Pgx_value.of/to_binary and not treating strings as binary by default.

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

@vog This should be fixed. If you use Pgx_value.of/to_string, the strings will not be quoted in any way (which means you can't pass null chars this way), and if you use the new Pgx_value.of/to_binary it will handle nulls and other things correctly as long as you're putting the data in a bytea column.

from pgx.

vog avatar vog commented on July 29, 2024

Thanks! I'll keep that in mind when I have the next opportunity to try Pgx. (At the moment we switched to Caqti in the project that was originally designated to use Pgx, but that decision is not final and the top selling points of Pgx are a simpler yet typesafe API, as well the independency from native C libraries (libpq in this case).)

from pgx.

brendanlong avatar brendanlong commented on July 29, 2024

If you're ok with the C dependency and a libpq wrapper does what you want, I'd probably still recommend that in general just because libpq is a lot more mature, but if you need a pure OCaml Postgres library now it exists :)

from pgx.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.