Comments (21)
I propose the following additional testcase:
with_conn (fun db ->
simple_query db {|
CREATE TEMPORARY TABLE this_test (id text);
INSERT INTO this_test (id) VALUES('test-\303\244-test')
|}
>>= fun _ ->
execute db "SELECT id FROM this_test"
>>| function
| [[ result ]] ->
assert_equal (Some {|test-\303\244-test|}) (Pgx.Value.to_string result)
| _ -> assert false
)
)
This inserts the pure ASCII string with backslashes test-\303\244-test
into the database and checks that this ASCII string is returned. (Currently, the non-ASCII string test-ä-test
is returned instead.)
Note that in all recent PostgreSQL versions, adhering to the SQL standard, the backslashes in 'test-\303\244-test'
are treated as-is. You would have to write E'test-\303\244-test'
to have it interpreted as test-ä-test
.
from pgx.
I added this as a test in pull request #40 but it seems to be working for me.
from pgx.
Hm actually I see you meant inserting manually. It's weird that this test works:
with_conn (fun db ->
simple_query db {|
CREATE TEMPORARY TABLE this_test (id text);
INSERT INTO this_test (id) VALUES('test-ä-test')
|}
>>= fun _ ->
execute db "SELECT id FROM this_test"
>>| function
| [[ result ]] ->
assert_equal (Some "test-ä-test") (Pgx.Value.to_string result)
| _ -> assert false
)
)
But this fails:
let expect = "test-ä-test" in
with_conn (fun db ->
simple_query db {|
CREATE TEMPORARY TABLE this_test (id text);
INSERT INTO this_test (id) VALUES('test-ä-test')
|}
>>= fun _ ->
execute db ~params:Pgx.Value.[ of_string expect]
"SELECT id FROM this_test WHERE id = $1"
>>| function
| [[ result ]] ->
assert_equal (Some expect) (Pgx.Value.to_string result)
| _ -> assert false
)
)
:\
from pgx.
Since it's just WHERE statements affected, could this be some sort of collation thing and our encoding is confusing it?
from pgx.
Could it be this encode step in the execute_iter
function?
let params =
List.map (fun s ->
Value.to_string s
|> Option.map encode_unprintable)
params
in
from pgx.
@brendanlong Regarding your comment #38 (comment)
This is no surprise at all.
with_conn (fun db -> simple_query db {| CREATE TEMPORARY TABLE this_test (id text); INSERT INTO this_test (id) VALUES('test-ä-test') |}
This inserts the correct value test-ä-test
into the database.
>>= fun _ -> execute db "SELECT id FROM this_test" >>| function | [[ result ]] -> assert_equal (Some "test-ä-test") (Pgx.Value.to_string result) | _ -> assert false ) )
This loads the value test-ä-test
from the database, and wrongly "unescapes" it. However, since test-ä-test
contains no backslash sequence, it is unescaped to itself.
from pgx.
@brendanlong This issue is not restricted to WHERE clauses. It affects parameter encoding as well as result decoding! But since it affects both in a "compatible" way, it is hard to debug and one gets easily confused. I had to check in parallel with other SQL interfaces (the psql
command line tool as well as some simple Python script) to make sure I don't get confused along the way.
from pgx.
Ahh that makes sense. That's a lot for the test case and explanation! It sounds likely that it's the issue @bradlangel found. We'll try to put up a fix as soon as we can but I'd also be happy to review a pull request if you put a fix up first (I probably don't have time to write and test this until next week).
from pgx.
Hm I created a series of test cases and I don't really understand what's wrong. This seems to specifically only be a problem for WHERE id = $1
. Inserting the literal string and then selecting the entire table back works fine:
(* this works *)
let expect = "test-ä-test" in
with_conn (fun db ->
simple_query db {|
CREATE TEMPORARY TABLE this_test (id text);
INSERT INTO this_test (id) VALUES ('test-ä-test')
|}
>>= fun _ ->
execute db "SELECT id FROM this_test"
>>| function
| [[ result ]] -> assert_equal (Some expect) (Pgx.Value.to_string result)
| _ -> assert false
)
(* this doesn't work *)
let expect = "test-ä-test" in
with_conn (fun db ->
simple_query db {|
CREATE TEMPORARY TABLE this_test (id text);
INSERT INTO this_test (id) VALUES ('test-ä-test')
|}
>>= fun _ ->
execute db ~params:[ Pgx.Value.of_string expect ]
"SELECT id FROM this_test WHERE id = $1"
>>| function
| [[ result ]] -> assert_equal (Some expect) (Pgx.Value.to_string result)
| [] -> assert_failure "Expected one row but got zero"
| _ -> assert false
)
Note that bound parameters are sent as prepared statement parameters, not inserted literally into the query, so I don't think the E''
stuff applies.
from pgx.
Note that the other test cases show that result decoding seems to work fine no matter how we insert the values.
from pgx.
Hm this text seems to agree with you though:
https://www.postgresql.org/docs/10/protocol-overview.html#PROTOCOL-FORMAT-CODES
The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)
Our escaping is specifically there to prevent embedded nulls so maybe we have to send strings using the binary protocol?
from pgx.
Oh right, the issue is that double-decode results, which is safe. Right..
Sorry this is hard to keep track of.
from pgx.
I can't find any documentation for the binary protocol so this will be.. fun.
from pgx.
The PostgreSQL manual is pretty comprehensive in this regard:
Is there anything important missing? Then it's perhaps worth asking on their (very active and friendly) mailing list.
from pgx.
Unfortunately the only thing I can find for parameter encoding is:
The text representation of values is whatever strings are produced and accepted by the input/output conversion functions for the particular data type. In the transmitted representation, there is no trailing null character; the frontend must add one to received values if it wants to process them as C strings. (The text format does not allow embedded nulls, by the way.)
Binary representations for integers use network byte order (most significant byte first). For other data types consult the documentation or source code to learn about the binary representation. Keep in mind that binary representations for complex data types might change across server versions; the text format is usually the more portable choice.
https://www.postgresql.org/docs/11/protocol-overview.html#PROTOCOL-FORMAT-CODES
And I can't figure out where in the documentation the binary representations are implemented, or how to tell Postgres which data type a binary parameter is (the "extended query" protocol docs say there's an OID column in Parse, but it doesn't say anything about what format to send unspecified format data in).
from pgx.
Would you mind asking this on the respective PostgreSQL mailing list?
I think the pgsql-general
list is the best fit for this question:
Maybe pgsql-interfaces
is also a good fit, but that one looks a bit inactive.
If you don't receive a good answer there, you should go straight to pgsql-hackers
:
from pgx.
To do this we need to:
- Make Pgx_value.t have an OID and binary string
- Add a way to register encoders and decoders
- Add encoders and decoders for the build-in types
- Make our Parse message send binary and a list of OID's
- Make our Bind message send binary (assumes the same types as the Parse, so we should probably have error checking here that the bound params are the same type as the ones sent with Parse)
The annoying bit here is that the types need to exactly match between two different messages.
The encoding is generally straightforward (usually just length + bytes). You can see examples here: https://github.com/MagicStack/py-pgproto/tree/6079e5b2addf7717aabbdcdb7825d6b68c731409/codecs
For the OID's, asyncpg has a list here: https://github.com/MagicStack/asyncpg/blob/master/asyncpg/protocol/pgtypes.pxi
from pgx.
#83 fixes this in a more straightforward way by adding Pgx_value.of/to_binary and not treating strings as binary by default.
from pgx.
@vog This should be fixed. If you use Pgx_value.of/to_string
, the strings will not be quoted in any way (which means you can't pass null chars this way), and if you use the new Pgx_value.of/to_binary
it will handle nulls and other things correctly as long as you're putting the data in a bytea column.
from pgx.
Thanks! I'll keep that in mind when I have the next opportunity to try Pgx. (At the moment we switched to Caqti in the project that was originally designated to use Pgx, but that decision is not final and the top selling points of Pgx are a simpler yet typesafe API, as well the independency from native C libraries (libpq in this case).)
from pgx.
If you're ok with the C dependency and a libpq wrapper does what you want, I'd probably still recommend that in general just because libpq is a lot more mature, but if you need a pure OCaml Postgres library now it exists :)
from pgx.
Related Issues (20)
- Module name clash HOT 1
- Support connection service file HOT 1
- Remove Pgx_async_test from the public API
- Pgx_eof errors are hard to debug HOT 1
- Support TLS encryption HOT 4
- Support TLS for Pgx_lwt_unix
- Support PG* environment variables for TLS configuration HOT 1
- Nested Transactions
- More guards around transactions
- Server hangs up during authentication HOT 5
- Support SASL and SCRAM-SHA-256
- Any ORM suite for OCaml , working with pgx together?
- Use BINARY mode for results HOT 1
- Wrong clause number in LGPL special exception
- Deprecated `Stream` module HOT 1
- unix_domain_socket_dir HOT 1
- Add Date and Time Pgx.Values for Lwt backend HOT 4
- Run dune-release lint in CI HOT 1
- Build in CI with `-p` argument HOT 1
- Add Pgx_value_ptime with ptime converters HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pgx.