
talendcomp_tjsondoc's Introduction

Talend Component Suite tJSONDoc*

This project consists of components to work with JSON in a fine grained way.

  • tJSONDocOpen - creates or opens a JSON document
  • tJSONDocOutput - creates sub-nodes in the document
  • tJSONDocInput - selects nodes via JSON path and reads attributes from them
  • tJSONDocExtractFields - similar to tJSONDocInput but can work within a flow and does not have to start the flow
  • tJSONDocInputStream - reads a large JSON file and uses the streaming API to extract values or objects
  • tJSONDocSave - writes the JSON document to a file or provides the content as a flow
  • tJSONDocDiff - compares 2 JSON documents and returns a detailed list of the differences
  • tJSONDocMerge - merges 2 JSON documents with the help of key attributes
  • tJSONDocTraverseFields - returns all key-value pairs for simple attributes, traversing the whole object and array hierarchy

These components will be published on Talend Exchange: http://exchange.talend.com or, better, here in the Releases section.

Please refer to the documentation

talendcomp_tjsondoc's People

Contributors

dependabot[bot], jlolling


talendcomp_tjsondoc's Issues

Error type on tJSONDocOpen/Save

Hi,

I would like to log errors from JSON Schema validation to a different file than my "runtime-error" log.

For now I treat every tJSONDocSave_1_ERROR_MESSAGE as a JSON Schema error, but sometimes I get a Talend file-open error instead, with no way to prevent this...

And the same the other way around: my runtime-error log catches the full JSON Schema error (this may be acceptable).

Could you add an ERROR_CODE we could check, or another way to determine the nature of the error?

Thanks a lot,

support for log4j2

Talend 7.3.1 no longer ships log4j. This causes compilation errors when using tJobInstance components: the tJobInstanceEnd part contains code to close appenders even if log4j is not explicitly activated in the tJobInstanceStart component.
The line marked with --> error causes the compilation error:

// close all other known appenders in this job
for (java.util.Map.Entry<String, Object> entry : globalMap.entrySet()) {
	if (entry.getKey().endsWith("_APPENDER")) {
		Object object = entry.getValue();
		if (object instanceof org.apache.log4j.Appender) {
			// detach appender
--> error			org.apache.log4j.Logger.getLogger("talend")
					.removeAppender((org.apache.log4j.Appender) object);
			try {
				// close appender
				((org.apache.log4j.Appender) object).close();
			} catch (Throwable t) {
				// ignore errors
			}
		}
	}
}
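One possible direction (a hypothetical sketch, not the component's actual fix): performing the appender cleanup entirely via reflection removes the compile-time dependency on org.apache.log4j, so generated jobs would still compile on Talend 7.3.1 where log4j 1.x is no longer on the classpath.

```java
// Hypothetical sketch: detach and close a log4j appender via reflection,
// so no org.apache.log4j type appears at compile time.
public class AppenderCleanup {

    /** Detach and close a suspected log4j appender; no-op if log4j is absent. */
    public static boolean closeAppender(Object appender) {
        try {
            Class<?> appenderClass = Class.forName("org.apache.log4j.Appender");
            if (!appenderClass.isInstance(appender)) {
                return false;
            }
            Class<?> loggerClass = Class.forName("org.apache.log4j.Logger");
            // Logger.getLogger("talend") is static, hence invoke(null, ...)
            Object talendLogger = loggerClass
                    .getMethod("getLogger", String.class)
                    .invoke(null, "talend");
            // detach appender
            loggerClass.getMethod("removeAppender", appenderClass)
                    .invoke(talendLogger, appender);
            // close appender
            appenderClass.getMethod("close").invoke(appender);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // log4j not on the classpath: nothing to clean up
        } catch (Exception e) {
            return false; // ignore errors, as the original code does
        }
    }
}
```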

tJSONDocInput - add filter capabilities

The JSON path sometimes does not help because we have already addressed an array of JSON objects. These JSON objects should additionally be filterable by an attribute value.

Empty array or object for child objects in case there are no child objects

If child objects are assigned to a parent object via a foreign-key relation in a subsequent tJSONDocOutput component, the target array (or object) is only created if child objects exist.
There should be an option to create an empty array or object when there are no children.

tJSONDocOpen

I found two little issues in the component tJSONDocOpen.

  1. If you are using "Validate input" you always get a debug log message: "Prepare json schema for id: .tJSONDocOpen_"
  2. If you are using a JSON schema for validation, have checked "Validate input", and have set "JSON Schema" to "No JSON Schema", the "Validate input" function is still executed and throws an exception.

Schema Validation: Allow filtering of rejects

In a current use case, I would like to be able to distinguish between valid and invalid JSON objects so that they end up in different flows (main/filter and reject), similar to tXSDValidator. In the schema for the reject flow, an errorMessage column would contain all validation error messages for that particular JSON.

I do not know whether it is possible to have a meaningful errorCode column, especially in the case of multiple validation errors.

Possible extensions of this feature:

  • An option to specify the separator that is used between the different validation errors in the errorMessage column.
  • An option to emit one row on the reject flow for each validation error (default: false).

[tJSONDocInputStream] Trouble with backslashes in json field

Hello,

We are facing an issue when one of the fields contains "\\". Example:

{
  "operations": {
    "operation": [
      {
        "date": "2020-07-08T16:45:56",
        "name": "Operation01 \\"
      }
    ]
  }
}

The component creates a new JSON in the output, but it deletes one backslash. Example:

"name": "Operation01 \"

This generates an error when it is transformed into XML. Can you see how to avoid this issue so that an XML can be generated from the JSON example above?

We are using Talend 7.3 version.

Thanks in advance!
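For reference, here is a minimal sketch of the escaping rule involved, independent of the component: the two source characters \\ inside a JSON string decode to a single backslash, and a serializer must re-escape that backslash on output, otherwise the emitted value ends in \" and the document is no longer valid JSON.

```java
// Minimal sketch of JSON backslash escaping (simplified; handles only
// backslash and quote, not the full JSON escape table).
public class BackslashDemo {

    /** Re-escape a decoded string value for JSON output. */
    static String escapeJson(String decoded) {
        return decoded.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        String decoded = "Operation01 \\"; // parsed value: one trailing backslash
        // correct serialization restores the two-character escape sequence
        System.out.println("\"name\": \"" + escapeJson(decoded) + "\"");
    }
}
```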

Real Life Example Code?

@jlolling
This looks very promising. The documentation is thorough. Would it be possible to share example talend jobs which we can import and actually see it working?

Thank you!

Json Schema validator can't find the type mismatch error with string in INT field

Hello,
I'm using the 16.2 build and encountered a bug:
When using this simple Json schema

{
  "type": "object",
  "properties": {
    "ID": {
      "type": "integer"
    }
  }
}

I'm able to validate this JSON: {"ID": "2"}

But as explained here https://json-schema.org/understanding-json-schema/reference/numeric.html#integer the string "2" should not be valid.

I think it's a bug in the validator engine; do you know if there is a way to patch it?
Thanks in advance,
Best Regards
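A small illustration of the expected behaviour (not the validator's code): per the JSON Schema specification, "type": "integer" matches only unquoted integer tokens, so {"ID": 2} conforms while {"ID": "2"} must be rejected; a validator that accepts the quoted form is coercing types.

```java
// Lexical check distinguishing a raw JSON integer token from a quoted string.
public class IntegerTypeCheck {

    /** True only for a raw JSON integer token such as 2 or -17, never "2". */
    static boolean isJsonInteger(String token) {
        return token.matches("-?(0|[1-9][0-9]*)");
    }

    public static void main(String[] args) {
        System.out.println(isJsonInteger("2"));      // unquoted literal: valid
        System.out.println(isJsonInteger("\"2\"")); // quoted string: not valid
    }
}
```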

de.jlo.talendcomp.json.JsonDocument is not recognized as a valid type

Hello Jan,
First I'd like to thank you for all your very useful custom components.
I'm a big user of the tJsonDoc components myself, but I just ran into an error I never saw before (right after installing a TAC patch, Patch_20210924_TPS-4913_v1-7.3.1, though I'm pretty sure this doesn't impact any of the Studio features).

I have a compilation error:
de.jlo.talendcomp.json.JsonDocument n’est pas reconnu comme type valide
(not recognized as a valid type)

I tried to reinstall the components or update them from 15.1 to 16.7; however, it still shows "15.1" in the advanced parameters...
I've tried different methods:

  • Download the components and add them to my "custom components" folder (replacing the old ones), then reload the custom components folder from the preference settings in Studio
  • Download and install the components from the "Exchange" view directly inside the Studio.

I'm using Talend Data Management Platform Version: 7.3.1 Build id: R2021-09

To reproduce my error, create a new job with a tJsonDocOpen inside and open the "code" view.

I have no idea what you need to help me solve the issue. Feel free to contact me.

Best regards
Simon

IllegalStateException:Cloned objects detected! Use the latest Jayway library 2.2.1+

Hi Jan,

I'm using your library in a Talend Dataservice, deployed as a Microservice on the latest Remote Engine (with almost the latest Talend patch level for Studio). The Dataservice (and your components) works like a charm in the Studio locally, but fails with the following exception on the remote engine:

IllegalStateException:Cloned objects detected! Use the latest Jayway library 2.2.1+

I saw the other ticket here on GitHub from 2017, but it didn't really help, especially because I'm using the latest Talend versions.

Do you have an idea what the root cause could be and/or how to work around it?

Thanks!

failure while streaming multiple array elements

Hi... I have attached a sample JSON file and a Talend job (version 6.4.1) to reproduce the behaviour.
test_data.txt
test_json.zip

Let's assume you have multiple arrays within a JSON element like the following, and you want to read all sub-elements of the arrays together with the header information.

[
  {  
      "header":"global header1",
	  "items":[{...},{...}],
	  "more_items":[{...},{...}]
  }
]

I have configured the components according to your documentation and this works pretty well, except for the second array element more_items. For all elements except the first element in more_items, I receive NULL as the header value. To make this a bit more transparent I have attached the log. I'm not sure whether this is a failure in the configuration, in the component, or maybe in the streaming API itself. It's possible to solve this by using the basic tJSONDocInput component instead of tJSONDocInputStream, but it would be great to do this in streaming mode for large JSON files. Hope you can help me.

.--------------+-------------+--------.
|                ITEMS                |
|=-------------+-------------+-------=|
|GLOBAL_HEADER |GROUP_HEADER |ITEM_KEY|
|=-------------+-------------+-------=|
|global header1|group_header1|1       |
|global header1|group_header1|2       |
|global header1|group_header1|3       |
|global header1|group_header2|1       |
|global header1|group_header2|2       |
|global header1|group_header2|3       |
|global header2|group_header1|1       |
|global header2|group_header1|2       |
|global header2|group_header1|3       |
|global header2|group_header2|1       |
|global header2|group_header2|2       |
|global header2|group_header2|3       |
'--------------+-------------+--------'
.--------------+-------------+--------.
|             MORE_ITEMS              |
|=-------------+-------------+-------=|
|GLOBAL_HEADER |GROUP_HEADER |ITEM_KEY|
|=-------------+-------------+-------=|
|global header1|group_header1|1       |
|null          |group_header1|2       |
|null          |group_header1|3       |
|null          |group_header2|1       |
|null          |group_header2|2       |
|null          |group_header2|3       |
'--------------+-------------+--------'

Is it working on version 7.3 of Talend Open Studio for Data Integration?

Hello,
Thank you for your amazing component.
I would like to know whether the current version of tJSONDoc (16.2) works with Talend 7.3.

I unsuccessfully tried to install it via Exchange, and with the user component route I got the "SLF4J: Class path contains multiple SLF4J bindings" issue.

Regards
Benoît

Compare 2 nodes feature as component

This would greatly help to build test jobs.
The nodes compare well with the equals method, but we need a bit more detailed information about the differences.

Value if attribute is missing

Hi Jan... Unfortunately I'm not able to get the value out of the field "value if the attribute is missing" in tJSONDocInput. Here is my JSON and the component configuration.

I would like to check whether the field exterior is present or not. As you can see, for adId=1 there is no exterior, but the component delivers "[]" and not "false". Can you please give me a hint how I can achieve this, or is this a bug?

{
  "images": [
    {
      "adId": "1",
      "customerId": "1",
      "interior": {
        "reference": "unknown",
        "initiator": "unknown",
        "initiatorRole": "unknown",
        "channel": "UNKNOWN"
      }
    },
    {
      "adId": "2",
      "customerId": "1",
      "exterior": {
        "reference": "unknown",
        "initiator": "360test",
        "initiatorRole": "USER",
        "channel": "APP"
      },
      "interior": {
        "reference": "unknown",
        "initiator": "unknown",
        "initiatorRole": "unknown",
        "channel": "UNKNOWN"
      }
    }
  ]
}


Catching json schema error validation

Hi,

I can't find a way to catch the error thrown when validating a schema; is there a way?

For now, using an OnComponentError trigger and row8.content = (String)globalMap.get("tJSONDocSave_1_ERROR_MESSAGE"); I can get some output.


error: string "971" is too long (length: 3, maximum allowed: 2)
error: string "01" is too short (length: 2, required minimum: 3)

But I don't have any context about where this occurs...

Also, it seems [$id, examples] are not supported, even though they are part of the latest JSON Schema drafts; which versions are supported?

Thanks,
Blag

There is an issue with the content of the rows listed in tJSONDocInputStream

The JSON content is as follows:

[{
		"header": "global header1",
		"items": []
	}, {
		"header": "global header2",
		"items": [{
				"group_header": "group_header21",
				"item_data": [{
						"item-key": 211
					}, {
						"item-key": 212,
						"item-value": {
							"a4": "b4"
						}
					}, {
						"item-key": 213
					}
				]
			}, {
				"group_header": "group_header22",
				"item_data": [{
						"item-key": 221
					}, {
						"item-key": 222
					}, {
						"item-key": 223,
						"item-value": {
							"a5": "b5"
						}
					}
				]
			}
		]
	}
]

Using the JsonPath $[*].items[*] to find the content of column 'group_header' yields:

.--------------.
|  tLogRow_9   |
|=------------=|
|group_header  |
|=------------=|
|null          |
|group_header22|
'--------------'

The expected content should actually be:

.--------------.
|  tLogRow_9   |
|=------------=|
|group_header  |
|=------------=|
|group_header21|
|group_header22|
'--------------'

or

.--------------.
|  tLogRow_9   |
|=------------=|
|group_header  |
|=------------=|
|null          |
|group_header21|
|group_header22|
'--------------'

Best Regards

Improve Error message from Validation

Hello,

Could you change the following part :

public String validate(String schemaId) throws Exception {
	JsonNode schemaNode = schemaMap.get(schemaId);
	if (schemaNode != null) {
		JsonValidator v = schemaFactory.getValidator();
		ProcessingReport report = v.validate(schemaNode, rootNode, true);
		if (report.isSuccess()) {
			return null;
		} else {
			StringBuilder sb = new StringBuilder();
			for (ProcessingMessage message : report) {
				sb.append(message.getLogLevel());
				sb.append(": ");
				sb.append(message.getMessage());
				sb.append("\n");
			}
			return sb.toString();
		}
	} else {
		throw new Exception("No json schema defined for the component: " + schemaId);
	}
}

To include the Schema pointer given by:

JsonNode jsonNode = processingMessage.asJson();
String fieldJsonSchemaPath = jsonNode.get("schema").get("pointer").toString();

From java-json-tools/json-schema-validator#193 (comment)

It'll greatly improve the quality of the Error output.

It seems like a small patch, but I'm not good enough with Java to make it myself (or, to be precise, to recompile the project afterwards).

Kind regards,
Blag
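A stdlib-only sketch of the requested output format, assuming the schema pointer has been extracted from each ProcessingMessage as quoted above: each error line would be prefixed with the pointer of the failing schema node, which supplies the missing context about where a violation occurred. The pointer and message values below are hypothetical examples.

```java
// Sketch of the proposed error-line format: "<schema pointer> <level>: <message>".
public class ValidationMessageFormat {

    static String formatLine(String schemaPointer, String logLevel, String message) {
        return schemaPointer + " " + logLevel + ": " + message;
    }

    public static void main(String[] args) {
        // hypothetical pointer and message, echoing the errors quoted earlier
        System.out.println(formatLine("/properties/department", "error",
                "string \"971\" is too long (length: 3, maximum allowed: 2)"));
    }
}
```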

incompatibility with tJsonDocOpen and native Talend component tExtractJsonField

When these 2 components are used in the same job, the error "cloned object detected" occurs. Please update the Jayway json-path library to 2.2.1+.
I'm using TESB 640 with your latest release package from June.
(I was not able to find more info in the log about this error)

Hope it helps to improve your marvellous components (they are blazing fast in comparison to the native JSON components).

SLF4J: Class path contains multiple SLF4J bindings

Hello,

I'm using Talend ESB and the components tRESTRequest / tRESTResponse to build a REST webservice, trying to use the tJSONDoc* components as a back end, but without any success...

I'm getting the following error:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/xxx/workspace/.Java/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/xxx/workspace/.Java/lib/slf4j-simple-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Prepare json schema for id: InsertJsonChild.tJSONDocOpen_2
Exception in component tJSONDocOpen_2 (InsertJsonChild)
java.lang.IllegalStateException: Clones objects detected! Use the latest Jayway library 2.2.1+
	at de.jlo.talendcomp.json.JsonDocument.<init>(JsonDocument.java:131)
	at local_project.insertjson_0_1.InsertJsonChild.tJSONDocOpen_2Process(InsertJsonChild.java:969)
	at local_project.insertjson_0_1.InsertJsonChild.tFileInputRaw_2Process(InsertJsonChild.java:728)
	at local_project.insertjson_0_1.InsertJsonChild.runJobInTOS(InsertJsonChild.java:1477)
	at local_project.insertjson_0_1.InsertJsonChild.main(InsertJsonChild.java:1216)

It seems to be linked to issue #14.

Do you have any idea how we can make them work together?

Performance issues when building 6 levels deep

Me again~

I need to build a 6-level-deep JSON, a bit like this 3-level sample:

{
  "PAYS": [
    {
      "NAME": "ANDORRE",
      "REGIONS": [
        {
          "NAME": "ANDORRE default regions",
          "DEPARTEMENTS": [
            {
              "NAME": "ANDORRE default departements"
            }
          ]
        }
      ]
    },
    {
      "NAME": "ARENTINA",
      "REGIONS": [
        {
          "NAME": "regions 1",
          "DEPARTEMENTS": [
            {
              "NAME": "departements A"
            },
            {
              "NAME": "departements B"
            }
          ]
        }
      ]
    },
    {
      "NAME": "KOREA",
      "REGIONS": [
        {
          "NAME": "regions 1",
          "DEPARTEMENTS": [
            {
              "NAME": "departements A"
            }
          ]
        },
        {
          "NAME": "regions 2",
          "DEPARTEMENTS": [
            {
              "NAME": "departements C"
            }
          ]
        }
      ]
    }
  ]
}

I'm using the following job, but it takes 2:30 hours to process; is there a quicker way to do it?

+----------+--------------------------+-------------------------------+
|   PAYS   |         REGIONS          |         DEPARTEMENTS          |
+----------+--------------------------+-------------------------------+
| ANDORRE  | ANDORRE default regions  | ANDORRE default departements  |
| ARENTINA | regions 1                | departements A                |
| ARENTINA | regions 1                | departements B                |
| KOREA    | regions 1                | departements A                |
| KOREA    | regions 2                | departements C                |
+----------+--------------------------+-------------------------------+


The tricky points are:

  • the levels are arrays without explicit indices (so I can't target a fixed JSON path)
  • the NAME values (in fact NAME1 to NAME6) are composite keys (there are 2 different "regions 1" entries, in two PAYS)
  • I'm processing 36k lines

I'm wondering if the Iterate approach is the right one...

Add Support for UTF-8

The pre-installed tFileInputJSON component has the ability to select the encoding (UTF-8 among other types). The current component seems to fail to read UTF-8 characters.
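A hedged sketch of the requested behaviour (the component's real reader API may differ): decoding the input bytes with an explicitly chosen charset instead of the platform default is what keeps non-ASCII characters intact.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Decode raw JSON bytes with an explicit charset rather than the platform
// default, which is how an encoding option on the component would work.
public class EncodingAwareRead {

    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] raw = "{\"name\": \"Müller\"}".getBytes(StandardCharsets.UTF_8);
        // platform-default decoding could mangle the ü; explicit UTF-8 does not
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```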

CREATING JSON FORMAT IN TALEND (LOOPS AND ARRAYS INSIDE AN OBJECT)

Hi jlolling,
I have the use case below and I am very new to this kind of object in Talend; could you please help me resolve this issue?

Use case: I am trying to create a job which reads data from MySQL and calls a REST API (which will insert/save the data).
Problem: I have been given a specific format in which I need to call the API, e.g. as below.
Payload:
{
  "uiid": "",
  "correlationId": "",
  "provider": [ (main loop element)
    {
      "createdOn": "2019-11-15T11:59:12.000+00:00",
      "createdBy": null,
      "updatedOn": "2019-11-15T11:59:12.000+00:00",
      "updatedBy": null,
      "providerLanguages": [
        {
          "languagetypeCd": null,
          "providerSpeaksInd": null,
          "prgrmTypeCd": "Medicaid",
          "effectiveStartDate": "2019-04-01",
          "effectiveEndDate": null,
          "createdOn": "2019-11-15T11:59:12.000+00:00",
          "createdBy": null,
          "updatedOn": "2019-11-15T11:59:12.000+00:00",
          "updatedBy": null
        }
      ],
      "providerSpeciality": [
        {
          "specialityCd": null,
          "primarySpecialityInd": null,
          "prgmTypeCd": null,
          "effectiveStartDate": "2019-11-15",
          "effectiveEndDate": null,
          "createdOn": "2019-11-15T11:59:12.000+00:00",
          "createdBy": "121",
          "updatedOn": "2019-11-15T11:59:12.000+00:00",
          "updatedBy": null
        }
      ]
    }
  ]
}
...something like the above.

The challenge is that I have lang1, lang2, lang3, ... lang7 coming in different columns, and I need to use the same columns (created date, created by etc.) repeated for lang1 to lang7, and it should be in an array.

I would need help in designing the solution for this use case. Would you please help me in this regard?

Add `Die on error` feature

Hello,

Does tJSONDocSave provide a Die on error option? I can't find one and need to use an OnComponentError->tDie to kill the parent job...

It could be a good idea to add one, like other read/write components.

Best regards,

tJSONDocInputStream error

I'm getting the following error when trying to use the "tJSONDocInputStream" component:

`Starting job test_extract at 10:17 03/10/2021.
[statistics] connecting to socket on port 3972
[statistics] connected
[statistics] disconnected
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/log4j/Logger
at de.jlo.talendcomp.json.streaming.JsonStreamParser.(JsonStreamParser.java:27)
at talendjobs.test_extract_0_1.test_extract.tJSONDocInputStream_1Process(test_extract.java:710)
at talendjobs.test_extract_0_1.test_extract.runJobInTOS(test_extract.java:1396)
at talendjobs.test_extract_0_1.test_extract.main(test_extract.java:1166)
Caused by: java.lang.ClassNotFoundException: org.apache.log4j.Logger
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 4 more

Job test_extract ended at 10:17 03/10/2021. [Exit code = 1]`

Thanks for any help.
