ITxPT / DATA4PTTools

Shared space for the development of the DATA4PT Greenlight NeTEx validation tool(s)

License: MIT License

Dockerfile 0.29% JavaScript 11.58% Go 27.20% TypeScript 59.33% Makefile 0.45% C 1.15%

data4pttools's People

Contributors

eliotterrier, lekotros, pkvarnfors, skinkie, theobald-nutte

Forkers

skinkie aurige

data4pttools's Issues

Compressed (gzip) files are not handled.

The tool does not report problems with processing compressed files:

go run cmd/*.go validate -i /var/tmp/tec-netex.xml.gz

Decompressing the file manually and validating the result does produce output:

skinkie@thinkpad ~/Sources/DATA4PTTools $ go run cmd/*.go validate -i /tmp/test/tec-netex.xml 
┌ tec-nete ─╼
│ frame-defaults ...          ok
│ journey-pattern-timings ... failed with 2728 errors and 0 warnings
│ passing-times ...           ok
│ stop-point-names ...        ok
│ xsd ...                     failed with 1 errors and 0 warnings
└───╼

Expected: since libxml2 handles compression without issue, this should work.

Documentation: Improve how to use the CLI

"A step for step installation guide"
"A sample procedure of an analyse"
"Running the tool requires some learning for a user who is new to using Docker, especially to validate own files stored on local path."
"Instructions on https://github.com/ITxPT/DATA4PTTools on how to run the tool for validating your own files are difficult to follow. So the documentation could be improved."
"The installation is easy if you are familiar with docker, but if you are on Windows the use of the CLI version could be a little bit tricky: in this OS Docker uses WSL and if you need to connect a directory from a host OS to a guest OS you have to pass through the linux layer."
"The documentation is a little poor but suffice to use the tool (some more info about the CLI version could be useful)"
"The configuration of the CLI version is a bit difficult because the poor documentation; the CLI options are limited and the only way we found to do a more “advanced” validation was the use of the yaml file like the example provided in the documentation: how can you use the rules in this version? Have you to modify the scripts section? It’s not clear."

For an error, a matching line number must be available

This report is not enough. You cannot use this for presentation purposes.

  {
    "name": "journey-pattern-timings",
    "description": "Make sure that every StopPointInJourneyPattern contains a arrival/departure time and that every ScheduledStopPointRef exist",
    "valid": false,
    "error_count": 2728,
    "errors": [
      {
        "message": "Missing ScheduledStopPoint(@id=TECBrabantWallon:ScheduledStopPoint:LBkcare*)",
        "type": "consistency"
      }
    ]
  }

"error_count" does not match "errors"

In my ideal world the human-readable errors would be unique, but each would carry a list of references to the exact positions in the file (for simplicity: line based). That will obviously not always work, but I think it can be a good acceptance criterion.

I currently see an error count in thousands, but I only see messages in the thirties. That does not make sense.
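
One way to make the two numbers consistent is to deduplicate at report time: keep one entry per distinct message and attach every occurrence to it. A sketch (the ValidationError shape and the Line field are assumptions; today's report carries no line field):

```go
package main

import "fmt"

// ValidationError mirrors the fields seen in the JSON report; Line is the
// hypothetical addition requested above.
type ValidationError struct {
	Message string
	Line    int
}

// groupErrors collapses repeated messages into one entry per unique
// message, keeping every line number, so error_count (the total) and the
// unique message list can be derived from the same structure.
func groupErrors(errs []ValidationError) map[string][]int {
	grouped := make(map[string][]int)
	for _, e := range errs {
		grouped[e.Message] = append(grouped[e.Message], e.Line)
	}
	return grouped
}

func main() {
	errs := []ValidationError{
		{"Missing ScheduledStopPoint", 120},
		{"Missing ScheduledStopPoint", 458},
		{"Missing arrival time", 77},
	}
	grouped := groupErrors(errs)
	fmt.Println(len(errs), "errors,", len(grouped), "unique messages")
	fmt.Println(grouped["Missing ScheduledStopPoint"]) // [120 458]
}
```

With this shape, error_count could stay at 2728 while the rendered report shows only the few dozen unique messages, each with its occurrence lines.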

Is it OK with Docker Desktop on Windows 10?

I have installed the tool on Windows 10 using Docker Desktop.

docker run -it lekojson/greenlight -i testdata and docker run -it lekojson/greenlight --help are OK, but everything else results in a nice

panic: runtime error: invalid memory address or nil pointer dereference
   panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x821e18]

goroutine 1 [running]:
github.com/concreteit/greenlight.(*Validator).Validate.func1()
   /usr/local/greenlight/validator.go:33 +0x18
panic({0xb99080, 0x123e2f0})
   /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/concreteit/greenlight.(*Validator).Validate(0xc0001b2980, 0x0)
   /usr/local/greenlight/validator.go:36 +0x82
main.validate(0x1248540, {0xc80e3c, 0x2, 0x2})
   /usr/local/greenlight/cmd/validate.go:243 +0x5b0
github.com/spf13/cobra.(*Command).execute(0x1248540, {0xc0001cd2e0, 0x2, 0x2})
   /go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0x12482c0)
   /go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
   /go/pkg/mod/github.com/spf13/[email protected]/command.go:902
main.main()
   /usr/local/greenlight/cmd/root.go:19 +0x25

Is there any specific syntax to provide Windows files from PowerShell (or any other environment I should use)?

growing nodeset hits limit

Error occurs during network offer validation (file size 600MB).

DEBU[2022-09-15T12:39:38+02:00] configured max distance: 500 document=epip.xml id=ZV4GB5-iaOInY6zA8qBZE scope=main script=stopPlaceQuayDistanceIsReasonable type=LOG valid=false
DEBU[2022-09-15T12:39:38+02:00] validation using schema "[email protected]" document=epip.xml id=ZV4GB5-iaOInY6zA8qBZE scope=main script=xsd type=LOG valid=false
XPath error : Memory allocation failed : growing nodeset hit limit

growing nodeset hit limit

^
XPath error : Memory allocation failed : growing nodeset hit limit

growing nodeset hit limit

^
panic: TypeError: Cannot read property 'push' of undefined or null at builtin/everyStopPlaceIsReferenced.js:33:9(53)

journey-pattern-timings: tailor-made for Nordic profile passingTimes

The Nordic profile uses passingTimes. An error like "Expected passing times for StopPointInJourneyPattern(@id='TECBrabantWallon:StopPointInJourneyPattern:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" shows that other forms, such as calls, are not considered. I do like that this uncovers that there is no 1:1 relationship between the Call and the inferred ServiceJourneyPattern.

<StopPointInJourneyPattern dataSourceRef="TECBrabantWallon:DataSource" derivedFromObjectRef="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" id="TECBrabantWallon:StopPointInJourneyPattern:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" order="1" version="20220111">
  <ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
  <OnwardTimingLinkRef ref="TECBrabantWallon:TimingLink:-551368533" version="20220111"/>
  <ForAlighting>false</ForAlighting>
</StopPointInJourneyPattern>
<Call dataSourceRef="TECBrabantWallon:DataSource" id="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-02-0010000-1" order="1" version="20220111">
  <ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
  <OnwardTimingLinkView>
    <TimingLinkRef ref="TECBrabantWallon:TimingLink:-551368533"/>
    <RunTime>PT0S</RunTime>
  </OnwardTimingLinkView>
  <Arrival>
    <Time>12:23:00</Time>
  </Arrival>
  <Departure>
    <Time>12:23:00</Time>
    <WaitTime>PT0S</WaitTime>
  </Departure>
</Call>

Documentation: Running the tool from source

"But to find the correct library and necessary additional libraries (green etc.) were challenging. 

As it was the first time working with Docker and Docker_Libraries and therefore it was not easy to understand how to get the necessaries Libraries from GitHub repositories"

CEN SIRI Question

Hi folks,
Thank you so much for providing and maintaining this repository. Kudos!

Do you mind me asking whether this tool validates NeTEx only, or also CEN SIRI? SIRI is mentioned quite a lot in this repository, but I have not found any information about SIRI validation yet.

Cheers!

Web Interface: Upload your own schema/XSD

"The usage is simple but the fact that you do not have the ability to select your own XSD to test against, is a problem (we could not validate the level 2 of our profile)"

Web Interface: Better navigation

Improve the navigation in the web interface so that you can Start over, Navigate Backwards, etc.

"There is no menu structure visible and no structure to go back to the input page, which is necessary to navigate within the web interface."

DATA4PTTools v0.4.2: everyLineIsReferenced

const journeyLinePath = xpath.join(
  xpath.path.FRAMES,
  "TimetableFrame",
  "vehicleJourneys",
  "ServiceJourney",
);

In EPIP-based documents, the LineRef cannot be specified within ServiceJourney. Instead, the Line is referenced from either:

ServiceJourneyPattern id="C::ServiceJourneyPattern:1::" version="any"
  RouteView id="C::RouteView:1::"
    LineRef ref="C::Line:1::"

or:

routes
  Route id="C::Route:1:" version="any"
    LineRef ref="C::Line:1"

versions are not evaluated in the xpath searches

The current JavaScript XPath searches never include the version attribute, while the key/keyref identity constraints in the XSD do. This also shows there is no handling or application logic for 'any' (the default for the version attribute).
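
A sketch of what a version-aware lookup predicate could look like, mirroring the XSD's intent (the function name and the exact matching semantics are assumptions, not the tool's code):

```go
package main

import "fmt"

// refPredicate builds an XPath predicate for resolving a reference the
// way the XSD key/keyref constraints suggest: a missing or "any" version
// matches every version of the target, otherwise the versions must agree
// (with "any" on the target side still matching everything).
func refPredicate(id, version string) string {
	if version == "" || version == "any" {
		return fmt.Sprintf("[@id='%s']", id)
	}
	return fmt.Sprintf("[@id='%s' and (@version='%s' or @version='any')]", id, version)
}

func main() {
	fmt.Println(".//ScheduledStopPoint" + refPredicate("sp1", "20220111"))
	fmt.Println(".//ScheduledStopPoint" + refPredicate("sp1", "any"))
}
```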

NeTEx XSD schema version

In the Web GUI, when looking at the schemas to validate against, the "NeTEx" entry links to the NeTEx GitHub repository, but I noticed the validation does not use the latest version of the NeTEx master branch.

Can you add information on the XSD version to the web interface?
This would probably avoid a lot of confusion when using different validation tools.
When will the web validator be updated with the latest XSD?

Warning on "sarama" while compiling from source

Hello! I'm compiling from source to evaluate the validator (we cannot use it from Docker in our production, which itself runs as a docker app), and wanted to provide the following feedback:

While compiling the app, I see a warning:

❯ go get
go: downloading github.com/eclipse/paho.mqtt.golang v1.3.5
# SNIP

go: warning: github.com/Shopify/[email protected]: retracted by module author: producer deadlock https://github.com/Shopify/sarama/issues/2129
go: to switch to the latest unretracted version, run:
	go get github.com/Shopify/sarama@latest

The issue (IBM/sarama#2129) has apparently been fixed in the sarama library.

Reporting: Better naming of the Jobs to understand what is validated

"The use of Ref instead of something more “speaking” does not let you search the report corresponding to a known validated file (especially when you try to do more than one validation before examining the result: the best way in this case is to download the result after the validation and not use the job page at all)."

Building from source: error in compilation

Hi,

I work on Windows 10 and I tried to install your validation tool on my laptop. I followed all the steps in the section "Building from source" and installed all dependencies (make, libxml2, pkg-config, gcc). But at the end, when I executed the command below:

go run cmd\file.go cmd\mqtt.go cmd\root.go cmd\server.go cmd\session.go cmd\static_dir.go cmd\validate.go validate -i testdata

I got the message below:

# github.com/lestrrat-go/libxml2/clib
fork\libxml2\clib\clib.go:5:10: fatal error: libxml/parserInternals.h: No such file or directory
    5 | #include <libxml/parserInternals.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Do you have a solution to debug this?

Best regards,
Alban GOUGOUA
Data Analyst at the French Transport Regulatory Body (ART, Autorité de Régulation des Transports)

"Opt-out telemetry" should probably be more advertised

I realised, by looking at the code only, that some stats about use appear to be sent by default to a remote database.

Although I understand the benefits of such stats as a tool builder, in particular during a beta, I find it problematic that this is not advertised clearly in the readme with instructions to opt out, or made a completely opt-in operation instead (GDPR compliance, etc.).

I doubt most users will realise that (especially when using the Docker version), so I'm creating an issue to give a bit of visibility to this topic.

const (
	influxURL    = "https://europe-west1-1.gcp.cloud2.influxdata.com"
	influxToken  = "ZgkcIAuMuoSM0KcG38iui5nLQrYv9oLiSCfJ2sin2exvxJnbMQjUea1kGQrsGteKCazgo_83thED1lS1O1XYEw=="
	influxOrg    = "4b2adfedb7f7619e"
	influxBucket = "greenlight"
)

validateCmd.Flags().BoolP("telemetry", "", true, "Whether to collect and send information about execution time")

viper.BindPFlag("telemetry", validateCmd.Flags().Lookup("telemetry"))

if viper.GetBool("telemetry") {
	logTelemetry(validator, ctx.Results())
}

func logTelemetry(validator *greenlight.Validator, results []*greenlight.ValidationResult) {
	client := influxdb2.NewClient(influxURL, influxToken)
	defer client.Close()
	writeAPI := client.WriteAPI(influxOrg, influxBucket)
	for _, r := range results {
		if viper.GetBool("telemetry") {
			p := newPoint("document")
			p.AddField("schema_name", validator.SchemaPath())
			p.AddField("schema_bytes", validator.SchemaSize())
			p.AddField("execution_time_ms", r.ExecutionTime().Milliseconds())
			p.AddField("name", r.Name)
			p.AddField("valid", r.Valid)
			writeAPI.WritePoint(p)
			for _, rule := range r.ValidationRules {
				p := newPoint("rule")
				p.AddField("schema_name", validator.SchemaPath())
				p.AddField("schema_bytes", validator.SchemaSize())
				p.AddField("execution_time_ms", rule.ExecutionTime().Milliseconds())
				p.AddField("document_name", r.Name)
				p.AddTag("name", rule.Name)
				p.AddField("valid", rule.Valid)
				p.AddField("error_count", rule.ErrorCount)
				writeAPI.WritePoint(p)
			}
		}
	}
	writeAPI.Flush()
}

Config file not taken into account

Hello,

I'm trying to launch the tool using a config file as follows:

docker run -it -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex/OFFRE_bordeaux_metropole_20220302001237Z/BORDEAUX_METROPOLE_offre_Bus_1_54_54.xml:/greenlight/documents -v /home/francis/projects/transport/greenlight_validator/config.yaml:/greenlight/config.yaml lekojson/greenlight -i /greenlight

┌ /greenlight/documents ─╼
│ xsd ... ok
└───╼ 

The validation seems to be done successfully.
The config file contains the example configuration given in the documentation:

schema: xsd/NeTEx_publication.xsd # schema to use for validation, comes shipped with the source/container image
logLevel: debug # default is undefined, setting this parameter disables the fancy setting, regardless of its value
fancy: true # displays a progress instead of log
inputs: # where to look for documents
  - ~/.greenlight/documents
  - /etc/greenlight/documents
  - /documents
  - /greenlight/documents
  - ./documents
outputs:
  - report: # logged in standard output
      format: mdext # mdext (markdown extended) or mds (markdown simple)
  - file:
      format: json # formats available are: json or xml
      path: . # where to save the file (filename format is ${path}/report-${current_date_time}.${format})
builtin: true # whether to use builtin scripts
scripts: # where to look for custom scripts
  - ~/.greenlight/scripts
  - /etc/greenlight/scripts
  - /scripts
  - /greenlight/scripts
  - ./scripts

I was expecting a JSON output report to be generated in the current directory, but none appears.
The only output I can see is the terminal output saying ok.

Can you help me understand how to have access to a full output report?
Thanks

DATA4PTTools v0.4.2 panics

DEBU[2022-09-15T07:57:23+02:00] validation using schema "[email protected]" document=epip.xml id=0jh1ZLPFuDKxD_-PWLWFr scope=main script=xsd type=LOG valid=false
panic: TypeError: Cannot read property 'push' of undefined or null at builtin/everyLineIsReferenced.js:39:9(52)

goroutine 29 [running]:
github.com/dop251/goja.(*Runtime).wrapJSFunc.func1({0xc000c74678, 0x1, 0x1?})
	/Users/user/go/pkg/mod/github.com/dop251/[email protected]/runtime.go:2183 +0x525
github.com/concreteit/greenlight/js.(*Script).Run(0xc00003c780, {0x7ff7bfeff9c6, 0xb}, {0x4da8188?, 0xc00057e010}, 0xc00040f0e0, 0xc0005796b0, 0x0)
	/Users/user/Projects/DATA4PTToolsv0.4.2/js/script.go:108 +0x41f
github.com/concreteit/greenlight.(*Validation).validateDocument.func1.1(0x4b8a620?)
	/Users/user/Projects/DATA4PTToolsv0.4.2/validation.go:137 +0x57
github.com/concreteit/greenlight/internal.(*Queue).Run.func1(0x0?)
	/Users/user/Projects/DATA4PTToolsv0.4.2/internal/queue.go:29 +0xa2
created by github.com/concreteit/greenlight/internal.(*Queue).Run
	/Users/user/Projects/DATA4PTToolsv0.4.2/internal/queue.go:27 +0x8d
exit status 2

Cannot validate local file

Hello,
I'm trying to give the tool a first spin.

docker run -it -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex.zip:/greenlight/documents lekojson/greenlight
stat /root/.greenlight/documents: no such file or directory

I've tried to work with an uncompressed NeTEx directory, but I get the same error.
Any advice?
Thanks!

Web Interface: Improve Jobs page

"The refresh in the jobs page (the summary of the validation done) of the web interface version is annoying because reloads the page every time moving the content up and down"

Web Interface: Add progress indicator

"We think it would be better if there was some sign of the progress just to know if the application still is working or has stopped by some reason. Also we could then estimate the total the application need to validate all files. "

Special characters?

The files from Portugal cause very strange artifacts.

          {
            "name": "xsd",
            "description": "General XSD schema validation",
            "valid": false,
            "error_count": 32,
            "errors": [
              {
                "message": "\u0005\ufffd\ufffd\ufffd\u0007",
                "line": 65535,
                "type": "xsd"
              },
              {
                "message": "u\ufffd*\\O^?",
                "line": 65535,
                "type": "xsd"
              },
              {
                "message": "\ufffd\ufffd*\\O^?",
                "line": 65535,
                "type": "xsd"
              },
              {
                "message": "\ufffd\ufffd*\\O^?",
                "line": 65535,
                "type": "xsd"
              },


Config error

I get errors when I run with the config.yaml file, but it does not say what the error is.
I get: testdata/line_3 9011005000300000.xml
xsd ... failed with 2 errors and 0 warnings

Cannot load local config file

I'm trying to launch the tool with a config.yaml file on my computer.

Here is the command:

docker run -it -v /home/francis/projects/transport/greenlight_validator/config.yaml:/greenlight/config.yaml -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex/OFFRE_bordeaux_metropole_20220302001237Z/BORDEAUX_METROPOLE_offre_Bus_1_54_54.xml:/greenlight lekojson/greenlight -i /greenlight

It fails with the following message:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/home/francis/projects/transport/greenlight_validator/config.yaml" to rootfs at "/greenlight/config.yaml" caused: mount through procfd: open o_path procfd: open /var/snap/docker/common/var-lib-docker/overlay2/5bf4783716880385cc8d323ff83e64bb840da606013fd2cda2521844d3be4451/merged/greenlight/config.yaml: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

The content of the config.yaml is the one given in the doc.

Note: the -i /greenlight is the workaround suggested in #16.

Validation: Validation of large files ends without any output

"We happened to use big xml and the validation ended without saying anything (this happened in the first version of the tool with both CLI and web interface version: we could not try we the recent one, so we do not know if the problem has been solved)"

Words of feedback from the French NAP

Hello there! I work on https://transport.data.gouv.fr/ and I did some testing on the NeTEx validator, I wanted to share it with the authors & other users here. Thanks for your work on this!

I have not yet been able to review the output of the reports themselves, so this first round is more about onboarding, the "surprise report", and potential production issues (RAM use etc.) than about the quality of the reports.

Hope this helps!

Installing from source is useful

Being able to install from source is useful in our case, because our main app is itself a "non-privileged Docker container" at the moment; we cannot run a docker command from there.

Nor can we run go run cmd/*.go in production, because the go binary won't be there in the final Docker image.

I explored the Dockerfile and found how you actually build the binary:

RUN go build -o glc cmd/*.go

Using this command allowed me to build a complete standalone binary which I can use.

Initially, the cmd/* pattern was a bit confusing, since it was unclear whether these were different programs or parts of the same program (the latter turned out to be the answer), but that is not a big problem.

The doc could be improved to document how to prepare a full binary.

There is telemetry by default

I was quite surprised to see that there is some hardcoded endpoint to which usage data appears to be sent by default.

I have created a specific issue for that part:

#23

Although I understand the usefulness of this to you while building the tool, I believe this is problematic from a GDPR point of view.

At the very least I feel this should be advertised in the readme, or even made an opt-in behaviour (at the cost of reducing your tracing rate).

I don’t believe most users will realize this is happening!

Time taken & memory used

Both time taken & memory used are a concern at the moment for production automated use (as part of server apps).

I have commented on existing issues:

docker run -it -m 1GB -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip

Non-linear memory use (or too high use) will be a problem for production use, since we allocate a fixed number of GB per container, and cloud environments can be costly from a RAM point of view.

Documentation improvement

For docker run, it could be nice to assume that the user validates files in their current folder; this is traditionally done by using $(pwd) on Mac/Linux and %cd% on Windows.

docker run -it -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip

The help at https://github.com/ITxPT/DATA4PTTools lacks the -i myfile flag at the time of writing.

Some dependencies should be upgraded

See "Warning on “sarama” while compiling from source" (ITxPT/DATA4PTTools issue #22).

That's it for today; thanks for open-sourcing the tool and for the discussion!
