Comments (12)
Shouldn't be an issue to use whatever charset you prefer. Have you tried?
The engine itself does not interpret strings, just stores them as a bunch of bytes.
from pinba2.
Thx for the info. I will give it a try and come back with the results
One potential issue I can think of is the protobuf-c implementation on the client side.
In case you use it from PHP, it might store strings as C strings (i.e. with a terminating null byte). That shouldn't be a problem for utf8, but it might be for other charsets.
The engine itself parses protobuf as data+length, so null bytes should not be a problem there.
In any case - curious to see if it works for you, if it doesn't - we'll investigate.
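To illustrate the data+length point, here is a minimal Python sketch of how a protobuf length-delimited string field carries its own length, so an embedded null byte survives the round trip, whereas a C-string view of the same bytes stops at the first null. The field number 1 and the helper names are hypothetical, chosen for illustration; this is not the Pinba message definition or the engine's actual parser.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, value: bytes) -> bytes:
    """Encode one length-delimited field: tag, length, then the raw bytes."""
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(value)) + value


def decode_string_field(buf: bytes):
    """Decode one length-delimited field; return (field number, payload)."""
    tag, pos = decode_varint(buf, 0)
    length, pos = decode_varint(buf, pos)
    return tag >> 3, buf[pos:pos + length]


def as_c_string(value: bytes) -> bytes:
    """What a C-string reader would see: everything before the first null."""
    return value.split(b"\x00", 1)[0]


raw = b"index\x00.php"               # script name with an embedded null byte
packet = encode_string_field(1, raw)  # field number 1 is hypothetical
field_no, payload = decode_string_field(packet)
```

Here `payload` still holds all ten bytes including the null, while `as_c_string(raw)` yields only `b"index"`.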
I did test this locally with a "simple" utf8 script name such as 'κόσμε', and it works.
The Pinba2 server in use was the official container from this project. The clients were the PHP pinba extension (ver. 1.1.1) from Ubuntu Focal and a pure-PHP protobuf implementation.
I created the reports table as
CREATE TABLE `report_by_script_name` (
`script` varchar(64) NOT NULL,
...
) ENGINE=PINBA DEFAULT CHARSET=utf8 COMMENT='v2/request/60/~script/no_percentiles/no_filters';
And I had to add in the code querying its data: $db->set_charset('utf8');
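As a rough illustration of why the reading side has to agree on the charset, the Python sketch below takes the same stored bytes and decodes them once as UTF-8 and once as Latin1. It only models "same bytes, different interpretation"; it does not reproduce MySQL's own connection-charset conversion, and the variable names are made up for the example.

```python
# 'κόσμε' written with explicit code points so the example does not depend
# on how this page was encoded.
text = "\u03ba\u03cc\u03c3\u03bc\u03b5"

# The engine just stores bytes; here they are the UTF-8 encoding of the name.
stored = text.encode("utf-8")

# Reading the bytes back as UTF-8 restores the original string.
as_utf8 = stored.decode("utf-8")

# Reading the very same bytes as Latin1 yields one character per byte,
# i.e. mojibake instead of Greek letters.
as_latin1 = stored.decode("latin-1")

print(as_utf8 == text)    # the UTF-8 reading matches the original
print(as_latin1 == text)  # the Latin1 reading does not
```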
ps: about NULL chars: indeed, if I add chr(0) in the middle of the script_name, the corresponding line does not show up any more in the db after calling flush on the client side. I did not investigate the raw protobuf packet to find out what was being sent (or not) in that case...
Ok, great to know.
I'll keep this one open for a bit longer, please let me know if you get to check out the null chars - is it a client php extension issue or an engine issue.
WDYT about changing the sql files which are part of this repo? At the very least we could add a comment in them mentioning Latin1 is not a requirement...
Sure, happy to accept a PR with utf8 everywhere in sql files.
Sure, will do.
Btw, I checked the "raw message" being sent when there is a NUL byte in the middle of the php string used to indicate the script_name, using Wireshark as protobuf dissector:
- both the php extension and my pure-php polyfill do send a message in that scenario
- the packet sent by the php extension is dissected as having a string with a value made of all chars up to the null one, excluding it
- the packet sent by the polyfill apparently sends "garbled" data
So it seems that the code dropping those packets might be on the server side (I will test that again with Pinba2; the last tests were run against Pinba1).
> if I add chr(0) in the middle of the script_name, the corresponding line does not show up any more in the db after calling flush on the client side
Further testing disproved the above.
I ran fresh tests with a NUL char in the middle of hostname, server_name and script_name, after having patched my own lib to work the same way as the php extension (ie. truncate strings at the first NUL char found), and both pinba1 and pinba2 engines are happy with what they get.
I wonder why I got no-data results in the first rounds of testing... probably because sending a single NUL char results in an empty string, which might not be considered a valid script_name/server_name/hostname.
In any case: sending NUL chars does not seem to have much to do with the character set chosen for retrieving the data once it's in the DB (at least as long as we are talking about UTF8 vs Latin1 - they both use the same encoding for NUL).
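Both observations can be checked with a few lines of Python. This is only a sketch: `truncate_at_nul` is a hypothetical helper that mimics the truncate-at-first-NUL behaviour described above, not code taken from the PHP extension or from the patched library.

```python
def truncate_at_nul(value: str) -> str:
    """Keep only the part of the string before the first NUL character."""
    return value.split("\x00", 1)[0]


# NUL is a single zero byte in both UTF-8 and Latin1, so the charset choice
# does not change how it is encoded.
nul_utf8 = "\x00".encode("utf-8")
nul_latin1 = "\x00".encode("latin-1")

# A NUL in the middle leaves a usable prefix...
middle = truncate_at_nul("index\x00.php")

# ...but a string that starts with NUL collapses to an empty string, which
# may well be rejected as a script_name/server_name/hostname.
only_nul = truncate_at_nul("\x00")
```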
I will send a PR as discussed above.
I kind of assumed that the pinba and pinba2 engines would treat strings with null-bytes in the middle differently, owing to the slightly different ways they unpack incoming protobuf packets.
Pinba1 - truncating, pinba2 - preserving.
For utf8 this point is moot since you're "not supposed to" have null-bytes inside strings.
Thank you for the PRs. I have been very busy lately, will review asap.
> I kind of assumed that pinba and pinba2 engines should treat strings with null-bytes in the middle differently, attributing to slightly different ways they unpack incoming protobuf packets.
This is possible.
What I did verify is that, for the pinba php extension, the truncation happens on the client side - possibly within code generated by the protoc compiler - so in those tests the NUL char was not sent to the engine at all.
As for being able to send a NUL char as part of the protobuf packet: I did manage to do that by using my own php implementation of the client, but the results were mixed. I tried to decode the raw protobuf packet to check its contents: using protoc --decode_raw, the string was decoded correctly, with NUL displayed as \u0000; using wireshark, the string was highlighted as being malformed. This prompted me to alter the php code to behave the same way as the php extension, as there seems to be little point in being able to send NUL chars if all it would be helpful for is to expose weird corner cases on the receiver side. For all I know, it could even be used for some hacking attempt :-) I cannot reliably report on behaviour differences between pinba1 and pinba2.
> For utf8 this point is moot since you're "not supposed to" have null-bytes inside strings.
Not to be pedantic, but the NUL character is valid in utf8, just as it is valid in ascii. It is just C strings that have issues with that ;-)