halaxa / json-machine
Efficient, easy-to-use, and fast PHP JSON stream parser
License: Apache License 2.0
@dizirator, I see you need support for PHP 5.5. Would you like to make a pull request to make it official in JSON Machine?
This library should not have either of the methods fromFile() or fromStream(). Perhaps these could be kept around as a convenience, but a pure implementation should not have methods that deal with reading files or streams, because in so doing, the library only supports files and streams. If I wanted to use this library with a general-purpose async framework like Amp or React, I couldn't, because it doesn't explicitly support them. A well-designed implementation would not tie itself to particular data protocols, instead just accepting incomplete JSON fragments from a string buffer, like Duct.
foreach while parsing data is taking a lot of time on the live server compared to local. Need help.
here is my code
$jsonFilePath = dirname(__FILE__) . "/cronJob.json";
$compare = [];
foreach ($array as $value) {
    $compare[] = "/" . $value;
}
$array = $compare;
$dataToFetch = [];
try {
    $jsonData = Items::fromFile($jsonFilePath, [
        'pointer' => $array
    ]);
} catch (\Throwable $e) {
    return $dataToFetch;
}
foreach ($jsonData as $key => $value) {
    if ($key == 'id') {
        $id = $value;
    }
    $dataToFetch[$id][$key] = $value;
}
$dataToFetch = array_values($dataToFetch);
return $dataToFetch;
Currently, the precision of JSON numbers is not, in general, preserved.
Unfortunately, using JSON_BIGINT_AS_STRING, at least by itself, doesn't help: first because it converts all "big integers" to strings (see below), and second because it does nothing for "big decimals" (meaning big or little decimals).
Perhaps preserving decimal precision is too much to ask of this project; if so, please interpret this ER as a request for the preservation of integer precision.
--
print json_encode(json_decode("[\"123\", 123]", flags: JSON_BIGINT_AS_STRING))."\n";
print json_encode(json_decode("[\"123000000000000000000000000123\", 123000000000000000000000000123]", flags: JSON_BIGINT_AS_STRING))."\n";
produces:
["123",123]
["123000000000000000000000000123","123000000000000000000000000123"]
whereas for the second array, we want:
["123000000000000000000000000123",123000000000000000000000000123]
Hey!
This is going to be a really basic question. I downloaded the library last week using brew, but I'm having some issues getting it working. I've never used Composer before, and when it tries to run the import-file command it doesn't work.
I've used the example, but it falls over on trying to use the Items::fromFile command.
The code looks like this, but I'm unsure how to fix it:
use JsonMachine\Items;
use JsonMachine\JsonDecoder\PassThruDecoder;
$users = Items::fromFile
Hello,
how can you send the header with the request?
$context_re = stream_context_create(array(
'http' => array(
'header' => "Authorization: Basic " . base64_encode($user . ":" . $pass)
)
));
$json = Items::fromFile($domain . "/exports/missions-published.json", ['debug' => true]);
print_r($json);
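One approach worth sketching (not authoritative; assumes a json-machine version that offers fromStream(), and the $domain, $user and $pass placeholders come from the question above): fromFile() has no HTTP options, but a PHP stream opened with a context does carry headers, and the parser can consume any readable stream resource.

```php
<?php
// Sketch: build a stream context carrying the Authorization header, open the
// URL with it, and hand the resulting stream resource to the parser.
// $domain, $user and $pass are placeholders from the question.
$user = 'user';
$pass = 'pass';

$context = stream_context_create([
    'http' => [
        'header' => "Authorization: Basic " . base64_encode($user . ":" . $pass),
    ],
]);

// With a real endpoint and the library installed (composer require halaxa/json-machine):
// $stream = fopen($domain . "/exports/missions-published.json", 'r', false, $context);
// foreach (\JsonMachine\Items::fromStream($stream) as $key => $value) {
//     // process each mission as it streams in
// }

var_dump(is_resource($context)); // bool(true) -- a stream-context resource
```

The fopen/Items lines are commented out because they need a live endpoint and the library installed; the context-building part is plain PHP.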
This tool is quite useful for me to synchronize products in my ecommerce, but I need to be able to detect when a transmission ended unexpectedly, in order to cancel the operation.
For example:
This is the format used
{
"status": "success",
"data": [
{"id": 1, "name": ...},
{"id": 2, "name": ...},
{"id": 3, "name": ...},
{"id": 4, "name": ...},
.
.
.
{"id": N, "name": ...}
]
}
But when the transmission ends unexpectedly, the format is truncated
{
"status": "success",
"data": [
{"id": 1, "name": ...},
{"id": 2, "name": ...},
{"id": 3, "name": ...},
{"id": 4, "name": ...},
.
.
.
{"id": X, "name": ...}
Note that sometimes the last item's object is complete, so there is no syntax error in the item itself.
When processing, this library ignores that unexpected ending; I need to be able to detect that the JSON document has not ended correctly.
Is this possible to do?
It would be ideal to throw an exception in that case.
Thank you
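On the detection question: a cleanly truncated stream is tricky precisely because every item that did arrive is valid JSON on its own; only the enclosing document is unfinished. A small pure-PHP sketch of that distinction, using only json_decode (no json-machine required):

```php
<?php
// Sketch: each received item is valid in isolation; only the whole document
// fails to parse, which is what a truncation check must look at.
$complete  = '{"status":"success","data":[{"id":1},{"id":2}]}';
$truncated = '{"status":"success","data":[{"id":1},{"id":2}'; // cut off mid-stream

json_decode($truncated);
$truncatedIsBroken = (json_last_error() === JSON_ERROR_SYNTAX);

json_decode($complete);
$completeIsFine = (json_last_error() === JSON_ERROR_NONE);

// The last item '{"id":2}' still decodes fine in isolation:
$lastItemOk = (json_decode('{"id":2}') !== null);

var_dump($truncatedIsBroken, $completeIsFine, $lastItemOk); // all bool(true)
```

So an item-by-item parser has no local signal of truncation; only reaching (or failing to reach) the document's closing brackets tells you the transmission completed.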
This will make it more predictable, as json_decode works the same way. The only thing needed is to change the line with the default instantiation of ExtJsonDecoder in Parser.
Huge BC break - will wait for version 1.0
in /var/www/html/vendor/halaxa/json-machine/src/Parser.php:181
$accounts =\JsonMachine\JsonMachine::fromFile('data.json');
FILE:
https://mega.nz/#!SUchmYSA
key is 8nEKU0-JUQzx39x8V1_my_dELb71C12rG5knMULEySc
Never mind, someone (stupid me) removed an index from our MySQL join column without letting anyone know, which caused everything to slow down. Fixed that, and now the entire file parses in 15 minutes. JSON Machine works great!
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /var/www/html/vendor/halaxa/json-machine/src/Parser.php on line 177
Json-machine version is 0.3.2
I am trying to read a file with a size of 27 MB.
[InvalidArgumentException]
Package halaxa/json-machine at version has a PHP requirement incompatible
with your PHP version (5.3.28)
Composer error. Help me, please.
My PHP version is 5.6.36.
Update composer.json
Add scalar typehints
Update phpunit
...
I saw the README code:
// this often causes Allowed Memory Size Exhausted
- $users = json_decode(file_get_contents('500MB-users.json'));
// this usually takes few kB of memory no matter the file size
+ $users = \JsonMachine\JsonMachine::fromFile('500MB-users.json');
But what if the file is over the network, or perhaps a call to an external HTTPS/HTTP endpoint returning valid JSON?
Is it applicable, or will it die as well?
I want to know the last item index of my JSON array.
I sometimes get this error reading valid json files. Any idea why this could be happening?
If I load the file into any JSON validator, there is no error at ','. Sometimes the position is not ',' but another arbitrary part of the JSON file.
PHP Fatal error: Uncaught JsonMachine\Exception\UnexpectedEndSyntaxErrorException: JSON string ended unexpectedly ',' At position 0. in /.../.../.../.../.../vendor/halaxa/json-machine/src/Parser.php:368 Stack trace: #0 /.../.../.../.../.../vendor/halaxa/json-machine/src/Parser.php(249): JsonMachine\Parser->error() #1 /.../.../.../.../.../.../load.php(92): JsonMachine\Parser->getIterator()
All exceptions in namespace JsonMachine\Exception should extend one common exception, say JsonMachineException, so that userland code can catch just one type and thereby catch anything from this library. Feel free to create a pull request and participate :)
I'm loading JSON from the FDA (example query) which contains meta values which are non-arrays. I would like to be able to retrieve the last_updated and total values from this JSON:
{
"meta": {
"terms": "https://open.fda.gov/terms/",
"license": "https://open.fda.gov/license/",
"last_updated": "2019-12-20",
"results": {
"skip": 0,
"limit": 105845,
"total": 105845
}
},
"results": [
"… 105845 records here …"
]
}
I've tried to get the values from meta like this, but it doesn't work because meta is not an iterable object:
$meta = \JsonMachine\JsonMachine::fromFile($import_file, '/meta');
foreach ($meta as $key => $val) {
if ($key == 'last_updated') {
$last_updated = $val;
}
}
Is there a way to get these values using JsonMachine?
Edit: I also tried this:
$last_updated = \JsonMachine\JsonMachine::fromFile($import_file, '/meta/last_updated');
echo $last_updated;
echo $last_updated->current();
echo end($last_updated);
I'm not familiar with using IteratorAggregate, so I'm not sure how to get a value from the $last_updated object.
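One workaround, since the meta subtree in question is tiny: decode just such a fragment eagerly with plain json_decode and stream only the huge results list. A sketch (the sample JSON below mirrors the structure from the question; the json-machine remark in the comment is something to verify against your installed version, not a guarantee):

```php
<?php
// Sketch: "meta" is small, so one option is to decode that fragment eagerly
// and reserve streaming for the 105845-record "results" list. (With recent
// json-machine versions, iterating the '/meta' pointer and checking
// $key === 'last_updated' is reported to work as well -- worth verifying
// against your installed version.)
$json = '{"meta":{"last_updated":"2019-12-20","results":{"skip":0,"limit":105845,"total":105845}}}';

$meta = json_decode($json, true)['meta'];
$last_updated = $meta['last_updated'];
$total        = $meta['results']['total'];

var_dump($last_updated); // string(10) "2019-12-20"
var_dump($total);        // int(105845)
```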
Please let me know by reactions/voting or comments if a CLI version of JSON Machine would be useful to have. Thanks.
A jm command would take a JSON stream from stdin and send items one by one to stdout, each wrapped in a single-item JSON object encoded as {key: value}.
Possible usage:
$ wget <big list of users to stdout> | jm --pointer=/results
{"0": {"name": "Frank Sinatra", ...}}
{"1": {"name": "Ray Charles", ...}}
...
Another idea might be to wrap the item in a JSON list instead of an object, like so:
$ wget <big list of users to stdout> | jm --pointer=/results
[0, {"name": "Frank Sinatra", ...}]
[1, {"name": "Ray Charles", ...}]
...
@halaxa asked me to start a new thread stemming from the discussion at #73
Hopefully the script described below, which I'll call jm, will provide a basis for discussion or perhaps even the foundation of a script worthy of inclusion in the JSON Machine repository.
The --help option produces quite extensive documentation, so for now I'll just add two points:
[ EDIT: The jm script is now in the "jm" repository. ]
Rename Lexer to Tokens.
Split Parser into 2 parts:
SyntaxCheckedTokens - will iterate Tokens, only check the syntax of the tokens, and yield them along.
PhpItems - will iterate SyntaxCheckedTokens and yield PHP structures.
This makes it possible to skip syntax checking (to gain speed where applicable) very easily: the user can simply remove SyntaxCheckedTokens from the generator stack.
It will also pave the way to #36.
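The proposed generator stack can be sketched with plain PHP generators (the toy "lexer" below just splits on whitespace; it is a stand-in, not the real Lexer, and the stage names follow the proposal):

```php
<?php
// Sketch of the proposed pipeline: each stage wraps the previous one, so
// dropping the syntax-checking stage is just removing one wrapper.
function tokens(string $json): Generator {
    // toy tokenizer: whitespace split (stand-in for the real Lexer/Tokens)
    foreach (preg_split('/\s+/', trim($json)) as $t) {
        yield $t;
    }
}

function syntaxCheckedTokens(iterable $tokens): Generator {
    foreach ($tokens as $t) {
        // the real stage would validate token order here and throw on errors
        yield $t;
    }
}

// Full stack: tokens -> syntax check -> items. Skipping the check means
// iterating tokens() directly instead.
$checked = iterator_to_array(syntaxCheckedTokens(tokens('[ 1 , 2 ]')), false);

var_dump($checked); // ["[", "1", ",", "2", "]"]
```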
I'm currently trying to encode a large number of arrays with json.
Does this package only support json_decode? Is there any way to do the same for json_encode?
Hi
if I run $json = JsonMachine::fromFile($_FILES["file"]["tmp_name"]); on a large file and foreach over it, it crashes
foreach ($json as $e) {
$element = (object)$e;
}
If I add var_dump($element); inside the foreach loop, it works again. Why?
thanks
Peter
It should throw an error, that the key "" (empty string) was not found.
Hi guys,
the memory usage is awesome, but the CPU time is ~100x that of json_decode (100 MB JSON with 10,000 entries).
Did you consider using a C extension for the tokenizing/parsing?
I've never written an extension, but it looks like we could extend ext-json or even just use ext-parle for the heavy lifting.
I could try to implement a lexer with ext-parle, see how the performance changes, and then implement a parser if you guys think this is a good idea.
Greetings
In my use case I want to get all results for a certain nth depth.
"rest": [
{
"mode": "server",
"resource": [
"type": "AllergyIntolerance",
...]
"resource": [
"type": "MedicationStatement",
...]
So I want to return all resource types, i.e.:
["AllergyIntolerance","MedicationStatement"]
Is that possible?
I want to thank you from the bottom of my heart.
This library is just awesome, especially the ease of using decoders and pointers.
A very big thank you.
Hi @halaxa
Is there a way to stop iterating the subtree? I have a JSON file of 500 GB with 10 subtrees. Right now the code continues iterating the subtree and thus wastes a lot of time doing so.
The problem is there is no way - with the current code base - to know how to break out of the for loop. I would argue that it is most useful for the iteration to stop by itself instead of having to code a break yourself. What do you think?
Reference: #21
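On the break concern: json-machine is generator-based, so a break in the consuming foreach stops the pull immediately and nothing beyond the current item is parsed. The mechanics can be shown with a bare generator, no library needed:

```php
<?php
// Sketch: a pull parser only does work when the consumer asks for the next
// item, so `break` really does stop parsing -- the generator below records
// how far it was pulled.
$parsed = [];
$items = (function () use (&$parsed) {
    foreach ([1, 2, 3, 4, 5] as $n) {
        $parsed[] = $n; // stands in for "parser consumed this item's bytes"
        yield $n;
    }
})();

foreach ($items as $n) {
    if ($n === 2) {
        break; // done with this subtree -- stop pulling
    }
}

var_dump($parsed); // [1, 2] -- items 3..5 were never parsed
```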
It is mentioned in the guide that it's possible to use the library by just cloning the repository.
Could anyone kindly provide a test script demonstrating this?
A bit surprisingly, this allocates a lot of memory with large JSON files; in this example a 181 MB one (found here: https://github.com/zemirco/sf-city-lots-json/blob/master/citylots.json)
<?php
require_once __DIR__ . '/vendor/autoload.php';
$client = new \GuzzleHttp\Client();
$response = $client->request('GET', 'http://127.0.0.1:8001/storage/citylots.json');
// Gets PHP stream resource from Guzzle stream
$phpStream = \GuzzleHttp\Psr7\StreamWrapper::getResource($response->getBody());
foreach (\JsonMachine\JsonMachine::fromStream($phpStream) as $key => $value) {
//
}
% php memory.php
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 20480 bytes) in /tmp/test/vendor/halaxa/json-machine/src/Parser.php on line 177
PHP Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 20480 bytes) in /tmp/test/vendor/guzzlehttp/promises/src/TaskQueue.php on line 24
I'm 100% guessing it's because json-machine is not registering as the sink for Guzzle:
http://docs.guzzlephp.org/en/stable/request-options.html#sink
Hi,
How can I know how many items the JSON has?
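A streamed source has no length up front; you only know the count after iterating. iterator_count() can do it, but it consumes the iterator, so counting inside the processing loop avoids a second pass. A sketch with a bare generator standing in for the parser:

```php
<?php
// Sketch: count items in the same pass that processes them. (With
// json-machine, $gen would be the Items instance; here a plain generator
// stands in so the snippet runs without the library.)
$count = 0;
$gen = (function () {
    yield from [['id' => 1], ['id' => 2], ['id' => 3]];
})();

foreach ($gen as $item) {
    // ...process $item...
    $count++;
}

var_dump($count); // int(3)
```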
I tried with 187 Mb of json data and boom! Still get the "Allowed memory size of xxx bytes exhausted (tried to allocate xx bytes)".
So I'm trying to parse a very large JSON that is effectively a one-dimensional series of elements (though with nested parameters belonging to each particular element)
https://drive.google.com/drive/folders/1PA9_Hq1Te7aBUoPurVdICKuCuY5Peka8?usp=sharing
Memory issues seem to happen intermittently. I'd really love to be able to just foreach this, but json-machine seems to die on it even after setting PHP to higher memory limits.
Adventures so far
https://gist.github.com/yosun/d1ef6ef56943bd2417b07f4970ff7447
It would be useful when it is necessary to read the same file a few times. :)
When calling fromFile on a wrong file format I get some uncatchable errors.
I had this issue with both 0.7.0 and 1.1.1
\JsonMachine\JsonMachine::fromFile($filename, "/myattribute");
I get the following fatal error with a zip file, but I can also have similar issues with text files
message: Undefined variable $P
script: .../halaxa/json-machine/src/Parser.php
line: 115
Testing the vars before $tokenType = ${$token[0]}; seems to be helpful:
if ($token == null || !isset($token[0]) || !isset(${$token[0]})) { throw new JsonMachineException("Error parsing stream."); }
Is it possible for JsonMachine to return a response without using foreach?
Hello there,
| Stack trace:
| #0 /var/www/vendor/halaxa/json-machine/src/Parser.php(245): JsonMachine\Parser->error('Cannot iterate ...', NULL)
| #1 /var/www/src/Workorder.php(101): JsonMachine\Parser->getIterator()
| #2 /var/www/src/Workorder.php(87): App\Workorder->count(Object(JsonMachine\Items))
| #3 /var/www/src/Workflow.php(20): App\Workorder->push()
| #4 /var/www/public/index.php(11): App\Workflow->execute()
| #5 {main}
| thrown in /var/www/vendor/halaxa/json-machine/src/Parser.php on line 368
Hello, can you tell me how to avoid this:
2022-05-31T21:19:57+00:00 [info] User Deprecated: Method "IteratorAggregate::getIterator()" might add "\Traversable" as a native return type declaration in the future. Do the same in implementation "JsonMachine\Items" now to avoid errors or add an explicit @return annotation to suppress this message.
Support incrementally feeding the parser via an explicit method call where the pull approach of foreach cannot be used. Useful, for example, for curl's CURLOPT_WRITEFUNCTION or when receiving JSON chunks in an event loop.
Proposed usage (implicit):
$items = new PushItems(['pointer' => '/results']);
$callback = function ($jsonChunk) use ($items) {
$items->push($jsonChunk);
foreach($items as $item) {
// process currently available items
}
};
or more explicit (similar to current API):
$queue = new QueueChunks();
$items = Items::fromQueue($queue, ['pointer' => '/results']);
$callback = function ($jsonChunk) use ($items, $queue) {
$queue->push($jsonChunk);
foreach($items as $item) {
// process currently available items
}
};
Any other proposal?
Please add the country "Kosovo"
I guess this sounds strange, but I need to process JSON files gigabytes in size... and I don't want to decode the nodes to arrays or objects. I just want the JSON!
Obviously I could json_encode the array that's produced, but with millions of transactions it's worrying to put them through an unnecessary step where an error could be introduced in the decode/encode process.
(The background is this: I have millions of user transactions to feed into a webhook. The webhook is expecting JSON formatted exactly in the way the nodes in the JSON blob are formatted. I just need to take each node, feed it to the webhook, check the response, and move onto the next one.)
Any options here?
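json-machine's bundled PassThruDecoder is aimed at exactly this: it skips decoding and yields each item's raw JSON string. A sketch (guarded so it only runs where the library is installed via Composer; the sample data is illustrative):

```php
<?php
// Sketch: with PassThruDecoder, each yielded value is the item's raw JSON
// substring, so the bytes you forward to the webhook are exactly the bytes
// from the source document -- no decode/encode round trip.
$raw = '[{"id":1,"amount":"9.99"},{"id":2,"amount":"0.10"}]';

if (class_exists(\JsonMachine\Items::class)) {
    $items = \JsonMachine\Items::fromString($raw, [
        'decoder' => new \JsonMachine\JsonDecoder\PassThruDecoder(),
    ]);
    foreach ($items as $key => $rawJson) {
        // $rawJson is an undecoded string such as '{"id":1,"amount":"9.99"}'
        // -- POST it to the webhook as-is, check the response, move on.
    }
}

var_dump(json_decode($raw) !== null); // bool(true) -- sample input is valid JSON
```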
Hello!
I have a problem using the wildcard pointer /results/-/color like in the example from the root documentation:
as a result I receive only the 1st iteration, and the second is not available through the foreach or the iterator
$fruits = Items::fromFile('fruitsArray.json', ['pointer' => '/results/-/color']);
<?php
$json = '{"results":[{"name":"apple","color":"red"},{"name":"pear","color":"yellow"},{"name":"some","color":"green"}]}';
$fruits = Items::fromString($json, ['pointer' => "/results/-/color"]);
var_dump(iterator_count($fruits->getIterator())); // result 1
foreach ($fruits as $key => $value) {
echo "{$value}\n"; // result only red
}
P.S.: Your library is awesome :)
The idea is not to throw an exception if a parse error occurs inside the structure which is about to be yielded. Instead, some kind of parse-error object could be yielded so the consumer can decide whether to stop iteration or just skip the erroneous structure and continue.
Usage:
foreach(JsonMachine::fromFile('a.json') as $key => $value) {
if ($key instanceof JsonError || $value instanceof JsonError) {
// continue / log / throw ...
}
// process $key, $value
}
By default, json_decode will convert the string to an object, not to associative arrays. I wonder if it's possible to get the same behaviour in json-machine, or should I use json_decode(json_encode($field), false)?
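No json_decode(json_encode(...)) round trip should be needed: json-machine's ExtJsonDecoder takes the same associative flag as json_decode (as its README shows), so the choice can be made up front. The flag's effect, demonstrated with plain json_decode (the Items lines are commented out since they need the library installed):

```php
<?php
// Sketch: json_decode's second argument picks stdClass objects vs associative
// arrays; ExtJsonDecoder mirrors that flag, e.g. with json-machine installed:
// $items = \JsonMachine\Items::fromFile($file, [
//     'decoder' => new \JsonMachine\JsonDecoder\ExtJsonDecoder(false), // stdClass
// ]);
$json = '{"name":"apple","color":"red"}';

$asObject = json_decode($json);        // default: stdClass object
$asArray  = json_decode($json, true);  // associative array

var_dump($asObject instanceof stdClass);     // bool(true)
var_dump(is_array($asArray));                // bool(true)
var_dump($asObject->name, $asArray['name']); // both "apple"
```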
// works
// $json = '{"result":{"items":[]}}';
// throws the SyntaxError exception because of `"foo":[]`
$json = '{"result":{"foo":[], "items":[]}}';
$items = \JsonMachine\JsonMachine::fromString($json, "/result/items");
foreach ($items as $name => $data) {
echo $data, "\n";
}
Hello halaxa!
First of all, thank you for your work!
I would need, if possible, a bit of help or more examples of how to use this lib.
I have a json file (19,1MB) and I'm trying to read it using
foreach (\JsonMachine\JsonMachine::fromFile(BASE_PATH . '/sm/seasons.json') as $key => $value) { var_dump([$key, $value]); }
but my host returns the message:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /***/htdocs/vendor/halaxa/json-machine/src/Parser.php on line 177
Probably there is something that I don't understand, and I cannot use the library in the right way.
Any help will be appreciated!
Thank you!
Hi,
I'm sorry for also reporting, but I too get a memory error.
My source file is 5.6 GB, and I've put it up for download here (the download is 900 MB)
My PHP code is just:
<?php
use JsonMachine\JsonMachine;
require('vendor/autoload.php');
$maxlen = [
'url' => 0,
'title' => 0,
'markdownbody' => 0,
];
$counter = 0;
foreach (JsonMachine::fromFile('23-12-2-sites.json') as $item) {
foreach ($maxlen as $stat => $max) {
$maxlen[$stat] = max($max, strlen($item[$stat]));
}
$counter++;
echo "item $counter done\n";
}
echo "Found $counter elements\n";
var_dump($maxlen);
Error:
...
item 3850 done
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 60817416 bytes) in D:\...\vendor\halaxa\json-machine\src\Lexer.php on line 57
Call Stack:
0.0004 406064 1. {main}() D:\...\script.php:0
320.8091 23634624 2. JsonMachine\Parser->getIterator() D:\...\script.php:24
320.8148 8966824 3. JsonMachine\Lexer->getIterator() D:\...\vendor\halaxa\json-machine\src\Parser.php:102
The structure is quite simple, it's just an array of basic objects.
It is, in theory, possible that the JSON is not valid, I suppose, since it was generated by someone else's script. However, no invalid-JSON exception was thrown, possibly because the out-of-memory error hit first.
Could you take a look if I've done something silly? Thanks!
Hi,
I discovered your library this morning. I really like it, congrats!
I'm planning to add a documentation page on my own library (loophp/collection), would it be ok for you if I do that?
Do you have a particular example that you would like to provide?
I opened a discussion (loophp/collection#92), feel free to join the chat!
Hello, I'm trying to parse a large JSON file and I get an error. Only the first object is parsed, and then there's some error.
$response = JsonMachine::fromFile(storage_path('app/file.json'), '/products', new ErrorWrappingDecoder(new ExtJsonDecoder()));
this is the code I used (I also tried with PassThruDecoder and json_decode myself, but that doesn't work because not all items have { at the beginning).
Attaching a test JSON file.
"identifiers": {} ---> this line causes the error.
Hi!
Is it normal for a response to take 8-9 seconds on a 32MB file? When using json_decode, it was maybe a second or two (at the cost of resources).
I am simply using
$array = \JsonMachine\JsonMachine::fromFile($file);
and matching an email with foreach to return all objects containing that email (usually 5-10 objects and 20kB or so).
Cheers