itxpt / data4pttools
Shared space for the development of the DATA4PT Greenlight NeTEx validation tool(s)
License: MIT License
The tool does not report problems with processing compressed files.
go run cmd/*.go validate -i /var/tmp/tec-netex.xml.gz
Decompressing manually does report output.
skinkie@thinkpad ~/Sources/DATA4PTTools $ go run cmd/*.go validate -i /tmp/test/tec-netex.xml
┌ tec-nete ─╼
│ frame-defaults ... ok
│ journey-pattern-timings ... failed with 2728 errors and 0 warnings
│ passing-times ... ok
│ stop-point-names ... ok
│ xsd ... failed with 1 errors and 0 warnings
└───╼
Expected: since libxml2 handles compression without an issue, this must work.
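A minimal sketch of the expected behaviour, assuming a wrapper that sniffs the gzip magic bytes before handing the document to the validator. The helper names `open_maybe_gzip` and `read_document` are hypothetical and not part of the tool, which is written in Go on top of libxml2.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_maybe_gzip(path):
    """Return a binary file object that yields decompressed bytes.

    Hypothetical helper: falls back to a plain open() when the file
    does not start with the gzip magic number.
    """
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


def read_document(path):
    """Read the whole (possibly compressed) document into memory."""
    with open_maybe_gzip(path) as handle:
        return handle.read()
```

Checking the content rather than the `.gz` extension means a misnamed file is still handled, and an unreadable file raises an error instead of producing an empty report.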
"A step for step installation guide"
"A sample procedure of an analyse"
"Running the tool requires some learning for a user who is new to using Docker, especially to validate own files stored on local path."
"Instructions on https://github.com/ITxPT/DATA4PTTools on how to run the tool for validating your own files are difficult to follow. So the documentation could be improved."
"The installation is easy if you are familiar with docker, but if you are on Windows the use of the CLI version could be a little bit tricky: in this OS Docker uses WSL and if you need to connect a directory from a host OS to a guest OS you have to pass through the linux layer."
"The documentation is a little poor but suffice to use the tool (some more info about the CLI version could be useful)"
"The configuration of the CLI version is a bit difficult because the poor documentation; the CLI options are limited and the only way we found to do a more “advanced” validation was the use of the yaml file like the example provided in the documentation: how can you use the rules in this version? Have you to modify the scripts section? It’s not clear."
This report is not enough. You cannot use this for presentation purposes.
{
"name": "journey-pattern-timings",
"description": "Make sure that every StopPointInJourneyPattern contains a arrival/departure time and that every ScheduledStopPointRef exist",
"valid": false,
"error_count": 2728,
"errors": [
{
"message": "Missing ScheduledStopPoint(@id=TECBrabantWallon:ScheduledStopPoint:LBkcare*)",
"type": "consistency"
}
]
}
When I submit a file where the 'shared data' is the main document, the document itself does not seem to be correctly validated.
https://github.com/ITxPT/DATA4PTTools/blob/develop/builtin/journey_pattern_timings.js#L35
Since the #2 (line number) does not appear in the report it is hard to find the cause. The suggestion is that both the _shared and document are searched.
In my ideal world the human readable errors would be unique, but contain a list with references to the exact positions in the file, for simplicity reasons: line based. That will obviously not always work, but I think that can be a good input criteria.
I currently see an error count in thousands, but I only see messages in the thirties. That does not make sense.
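A rough sketch of the reporting idea described above, assuming a rule report shaped like the JSON excerpts in this thread (`error_count`, plus an `errors` list whose entries carry a `message` and an optional `line`). The function names are hypothetical; this is post-processing of a report, not how the tool itself builds one.

```python
from collections import defaultdict


def group_errors(rule_report):
    """Collapse repeated messages into one entry holding all line numbers.

    `rule_report` is one rule object such as
    {"name": ..., "error_count": ..., "errors": [...]}.
    """
    lines_by_message = defaultdict(list)
    for error in rule_report.get("errors", []):
        bucket = lines_by_message[error["message"]]
        # "line" is optional: the consistency errors quoted above have none.
        line = error.get("line")
        if line is not None:
            bucket.append(line)
    return {msg: sorted(lines) for msg, lines in lines_by_message.items()}


def unlisted_errors(rule_report):
    """Number of counted errors that have no entry in the `errors` list."""
    listed = len(rule_report.get("errors", []))
    return rule_report.get("error_count", 0) - listed
```

A non-zero result from `unlisted_errors` makes the gap between the error count and the listed messages explicit.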
I have installed the tool on Windows 10 using Docker Desktop.
docker run -it lekojson/greenlight -i testdata and docker run -it lekojson/greenlight --help are OK, but everything else results in a nice
panic: runtime error: invalid memory address or nil pointer dereference
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x821e18]
goroutine 1 [running]:
github.com/concreteit/greenlight.(*Validator).Validate.func1()
/usr/local/greenlight/validator.go:33 +0x18
panic({0xb99080, 0x123e2f0})
/usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/concreteit/greenlight.(*Validator).Validate(0xc0001b2980, 0x0)
/usr/local/greenlight/validator.go:36 +0x82
main.validate(0x1248540, {0xc80e3c, 0x2, 0x2})
/usr/local/greenlight/cmd/validate.go:243 +0x5b0
github.com/spf13/cobra.(*Command).execute(0x1248540, {0xc0001cd2e0, 0x2, 0x2})
/go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0x12482c0)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
/go/pkg/mod/github.com/spf13/[email protected]/command.go:902
main.main()
/usr/local/greenlight/cmd/root.go:19 +0x25
Is there any specific syntax for providing Windows files from PowerShell (or any other environment I should use)?
"Or the reported issues are not distinguished between general errors or schema errors."
"everyStopPointHaveAnArrivalAndDepartureTime - the original implementation worked incorrectly, modify line 55"
When running a validation via the web interface, the indicator for Success/Error doesn't update automatically and you have to press Refresh in the browser to get the correct status.
Error occurs during network offer validation (file size 600MB).
DEBU[2022-09-15T12:39:38+02:00] configured max distance: 500 document=epip.xml id=ZV4GB5-iaOInY6zA8qBZE scope=main script=stopPlaceQuayDistanceIsReasonable type=LOG valid=false
DEBU[2022-09-15T12:39:38+02:00] validation using schema "[email protected]" document=epip.xml id=ZV4GB5-iaOInY6zA8qBZE scope=main script=xsd type=LOG valid=false
XPath error : Memory allocation failed : growing nodeset hit limit
growing nodeset hit limit
^
XPath error : Memory allocation failed : growing nodeset hit limit
growing nodeset hit limit
^
panic: TypeError: Cannot read property 'push' of undefined or null at builtin/everyStopPlaceIsReferenced.js:33:9(53)
The Nordic profile uses passingTimes. An error like: "Expected passing times for StopPointInJourneyPattern(@id='TECBrabantWallon:StopPointInJourneyPattern:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" shows that other forms, such as calls, are not considered. I do like that this uncovers that there is no 1:1 relationship between the call and the inferred ServiceJourneyPattern.
<StopPointInJourneyPattern dataSourceRef="TECBrabantWallon:DataSource" derivedFromObjectRef="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" id="TECBrabantWallon:StopPointInJourneyPattern:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" order="1" version="20220111">
<ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
<OnwardTimingLinkRef ref="TECBrabantWallon:TimingLink:-551368533" version="20220111"/>
<ForAlighting>false</ForAlighting>
</StopPointInJourneyPattern>
<Call dataSourceRef="TECBrabantWallon:DataSource" id="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-02-0010000-1" order="1" version="20220111">
<ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
<OnwardTimingLinkView>
<TimingLinkRef ref="TECBrabantWallon:TimingLink:-551368533"/>
<RunTime>PT0S</RunTime>
</OnwardTimingLinkView>
<Arrival>
<Time>12:23:00</Time>
</Arrival>
<Departure>
<Time>12:23:00</Time>
<WaitTime>PT0S</WaitTime>
</Departure>
</Call>
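As an illustration of treating calls as an alternative source of timing, here is a small sketch that reads the arrival and departure times from a Call element like the one above. The fragment is un-namespaced, as quoted; real NeTEx documents place these elements in a namespace, and the function names are hypothetical rather than part of the builtin rules.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Call quoted above (timing link view omitted).
CALL_XML = """
<Call id="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-02-0010000-1" order="1" version="20220111">
  <ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
  <Arrival><Time>12:23:00</Time></Arrival>
  <Departure><Time>12:23:00</Time><WaitTime>PT0S</WaitTime></Departure>
</Call>
"""


def call_times(call):
    """Return (arrival, departure) for a Call element; None when absent."""
    return call.findtext("Arrival/Time"), call.findtext("Departure/Time")


def has_timing(call):
    """A Call counts as timed if it has an arrival or a departure time."""
    return any(call_times(call))
```

A rule that accepts either passing times or timed calls could use a check like `has_timing(ET.fromstring(CALL_XML))` before reporting a missing time.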
"But to find the correct library and necessary additional libraries (green etc.) were challenging.
As it was the first time working with Docker and Docker_Libraries and therefore it was not easy to understand how to get the necessaries Libraries from GitHub repositories"
Describe how to write your own scripts
Document the API in Core
Hi folks,
Thank you so much for providing and maintaining this repository. Kudos!
Do you mind me asking whether this tool validates NeTEx only, or also CEN SIRI? SIRI is mentioned quite a lot in this repository, but I have not found any information about SIRI validation yet.
Cheers!
"The usage is simple but the fact that you do not have the ability to select your own XSD to test against, is a problem (we could not validate the level 2 of our profile)"
Improve the navigation in the web interface so that you can Start over, Navigate Backwards, etc.
"There is no menu structure visible and no structure to go back to the input page, which is necessary to navigate within the web interface."
Remember selected profile/schema and selected validation rules
"The fact that every time you have to remember to change the XSD to use sometimes bring you to use the wrong XSD"
const journeyLinePath = xpath.join(
  xpath.path.FRAMES,
  "TimetableFrame",
  "vehicleJourneys",
  "ServiceJourney",
);
In the EPIP-based documents, the LineRef cannot be specified within ServiceJourney. Instead, the Line is referenced from the:
ServiceJourneyPattern id="C::ServiceJourneyPattern:1::" version="any"
  RouteView id="C::RouteView:1::"
    LineRef ref="C::Line:1::"
or
routes
  Route id="C::Route:1:" version="any"
    LineRef ref="C::Line:1"
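The two lookup paths can be sketched as follows. This is an illustration under assumptions, not the builtin script: element names are un-namespaced for brevity, the function name is hypothetical, and the link from a pattern to a Route is assumed to be a RouteRef child of the ServiceJourneyPattern.

```python
import xml.etree.ElementTree as ET


def line_ref_for_pattern(root, pattern_id):
    """Resolve the LineRef of a ServiceJourneyPattern.

    Tries RouteView/LineRef first, then RouteRef -> Route/LineRef
    (the RouteRef link is an assumption, see the lead-in).
    Returns the referenced line id, or None when nothing resolves.
    """
    for pattern in root.iter("ServiceJourneyPattern"):
        if pattern.get("id") != pattern_id:
            continue
        line_ref = pattern.find("RouteView/LineRef")
        if line_ref is not None:
            return line_ref.get("ref")
        route_ref = pattern.find("RouteRef")
        if route_ref is None:
            return None
        for route in root.iter("Route"):
            if route.get("id") == route_ref.get("ref"):
                line_ref = route.find("LineRef")
                return None if line_ref is None else line_ref.get("ref")
    return None
```

A ServiceJourney check would first follow the journey's pattern reference and then call a lookup like this one, instead of expecting a LineRef directly inside ServiceJourney.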
The current JavaScript XPath searches never include the version attribute, while the key-identity constraint in the XSD does. This also shows there is no handling of 'any' (the default for the version attribute) in the application logic.
"Multiple errors to row 0 in the linefile, the valuation should be able to refer to the right row. "
In the Web GUI, when looking at the schemas to validate against, the "NeTEx" one redirects to the NeTEx GitHub, but I noticed the validation does not use the latest version of the NeTEx master.
Can you add the information on the XSD version to the Web interface?
This will probably avoid a lot of confusion when using different validation tools.
When will the Web Validator be updated with the latest XSD?
Hello! I'm compiling from source to evaluate the validator (we cannot use it from Docker in our production, which itself runs as a docker app), and wanted to provide the following feedback:
While compiling the app, I see a warning:
❯ go get
go: downloading github.com/eclipse/paho.mqtt.golang v1.3.5
# SNIP
go: warning: github.com/Shopify/[email protected]: retracted by module author: producer deadlock https://github.com/Shopify/sarama/issues/2129
go: to switch to the latest unretracted version, run:
go get github.com/Shopify/sarama@latest
The issue IBM/sarama#2129 has been fixed in the sarama library apparently.
"The use of Ref instead of something more “speaking” does not let you search the report corresponding to a known validated file (especially when you try to do more than one validation before examining the result: the best way in this case is to download the result after the validation and not use the job page at all)."
"The rules are unclear with what the error is and need more information and details about how to solve it"
"Documentation of the analyse steps"
Hi,
I work on Windows 10 and I tried to install your validation tool on my laptop. I followed all the steps in the section Building from source and installed all dependencies (make, libxml2, pkg-config, gcc). But at the end, when I executed the command below:
go run cmd\file.go cmd\mqtt.go cmd\root.go cmd\server.go cmd\session.go cmd\static_dir.go cmd\validate.go validate -i testdata
I got the message below:
# github.com/lestrrat-go/libxml2/clib
fork\libxml2\clib\clib.go:5:10: fatal error: libxml/parserInternals.h: No such file or directory
5 | #include <libxml/parserInternals.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Do you have a solution for this?
Best regards,
Alban GOUGOUA
Data Analyst at the French Transport Regulatory Body (ART - Autorité de Régulation des Transports)
Hi,
The everyScheduledStopPointHasAName.js does not report missing attribute, name or short name (res not returned).
"Although we waited for an hour or so it did not finish"
Hi,
unreferenced stop places are not reported by everyStopPlaceIsReferenced.js (res not returned).
I realised, by looking at the code only, that some stats about use appear to be sent by default to a remote database.
Although I understand the benefits of such stats as a tool builder, in particular during a beta, I find it problematic that it is not advertised clearly in the readme, with instructions to opt out, or made a completely opt-in operation instead (GDPR compliance etc.).
I doubt most users will realise that (especially when using the Docker version), so I'm creating an issue to give a bit of visibility to this topic.
Lines 27 to 32 in c9c2573
Line 73 in c9c2573
Line 103 in c9c2573
Lines 221 to 222 in c9c2573
Lines 226 to 257 in c9c2573
Hello,
I'm trying to launch the tool using a config file as follows:
docker run -it -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex/OFFRE_bordeaux_metropole_20220302001237Z/BORDEAUX_METROPOLE_offre_Bus_1_54_54.xml:/greenlight/documents -v /home/francis/projects/transport/greenlight_validator/config.yaml:/greenlight/config.yaml lekojson/greenlight -i /greenlight
┌ /greenlight/documents ─╼
│ xsd ... ok
└───╼
The validation seems to be done successfully.
The config file contains the configuration given in example in the documentation:
schema: xsd/NeTEx_publication.xsd # schema to use for validation, comes shipped with the source/container image
logLevel: debug # default is undefined, setting this parameter disables the fancy setting, regardless of its value
fancy: true # displays a progress instead of log
inputs: # where to look for documents
  - ~/.greenlight/documents
  - /etc/greenlight/documents
  - /documents
  - /greenlight/documents
  - ./documents
outputs:
  - report: # logged in standard output
      format: mdext # mdext (markdown extended) or mds (markdown simple)
  - file:
      format: json # formats available are: json or xml
      path: . # where to save the file (filename format is ${path}/report-${current_date_time}.${format})
builtin: true # whether to use builtin scripts
scripts: # where to look for custom scripts
  - ~/.greenlight/scripts
  - /etc/greenlight/scripts
  - /scripts
  - /greenlight/scripts
  - ./scripts
I am expecting to have a json output report generated in the current directory but I have none.
The only output I can see is the terminal output saying ok
Can you help me understand how to have access to a full output report?
Thanks
DEBU[2022-09-15T07:57:23+02:00] validation using schema "[email protected]" document=epip.xml id=0jh1ZLPFuDKxD_-PWLWFr scope=main script=xsd type=LOG valid=false
panic: TypeError: Cannot read property 'push' of undefined or null at builtin/everyLineIsReferenced.js:39:9(52)
goroutine 29 [running]:
github.com/dop251/goja.(*Runtime).wrapJSFunc.func1({0xc000c74678, 0x1, 0x1?})
/Users/user/go/pkg/mod/github.com/dop251/[email protected]/runtime.go:2183 +0x525
github.com/concreteit/greenlight/js.(*Script).Run(0xc00003c780, {0x7ff7bfeff9c6, 0xb}, {0x4da8188?, 0xc00057e010}, 0xc00040f0e0, 0xc0005796b0, 0x0)
/Users/user/Projects/DATA4PTToolsv0.4.2/js/script.go:108 +0x41f
github.com/concreteit/greenlight.(*Validation).validateDocument.func1.1(0x4b8a620?)
/Users/user/Projects/DATA4PTToolsv0.4.2/validation.go:137 +0x57
github.com/concreteit/greenlight/internal.(*Queue).Run.func1(0x0?)
/Users/user/Projects/DATA4PTToolsv0.4.2/internal/queue.go:29 +0xa2
created by github.com/concreteit/greenlight/internal.(*Queue).Run
/Users/user/Projects/DATA4PTToolsv0.4.2/internal/queue.go:27 +0x8d
exit status 2
Hello,
I'm trying to give the tool a first spin.
docker run -it -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex.zip:/greenlight/documents lekojson/greenlight
stat /root/.greenlight/documents: no such file or directory
I've tried to work with an uncompressed netex directory, but I get the same error.
Any advice?
Thanks!
For example the report files have no file type
"The refresh in the jobs page (the summary of the validation done) of the web interface version is annoying because reloads the page every time moving the content up and down"
"A dragdrop zone of the files to validate would be interesting."
"We think it would be better if there was some sign of the progress just to know if the application still is working or has stopped by some reason. Also we could then estimate the total the application need to validate all files. "
The files from Portugal cause very strange artifacts.
{
"name": "xsd",
"description": "General XSD schema validation",
"valid": false,
"error_count": 32,
"errors": [
{
"message": "\u0005\ufffd\ufffd\ufffd\u0007",
"line": 65535,
"type": "xsd"
},
{
"message": "u\ufffd*\\O^?",
"line": 65535,
"type": "xsd"
},
{
"message": "\ufffd\ufffd*\\O^?",
"line": 65535,
"type": "xsd"
},
{
"message": "\ufffd\ufffd*\\O^?",
"line": 65535,
"type": "xsd"
},
--> no output in the console
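To triage a report like the one above, the following sketch flags entries whose message contains replacement or control characters, or whose line number sits at 65535. That value is the ceiling of a 16-bit counter, so treating it as "capped rather than real" is an assumption about how the underlying parser stores line numbers. Both function names are hypothetical.

```python
import unicodedata

LINE_CAP = 65535  # largest value a 16-bit line counter can hold


def looks_garbled(message):
    """True if the message holds U+FFFD or non-whitespace control characters."""
    for ch in message:
        if ch == "\ufffd":
            return True
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            return True
    return False


def suspicious_entries(errors):
    """Yield report entries whose message or line number cannot be trusted."""
    for error in errors:
        garbled = looks_garbled(error.get("message", ""))
        if garbled or error.get("line") == LINE_CAP:
            yield error
```

Separating these entries from readable ones makes it easier to see whether the schema errors themselves are lost or only their text.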
I get errors when I run the config.yaml file, but it does not say what the error is.
I get: testdata/line_3 9011005000300000.xml
xsd ... failed with 2 errors and 0 warnings
I'm trying to launch the tool with a config.yaml file on my computer.
Here is the command:
docker run -it -v /home/francis/projects/transport/greenlight_validator/config.yaml:/greenlight/config.yaml -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex/OFFRE_bordeaux_metropole_20220302001237Z/BORDEAUX_METROPOLE_offre_Bus_1_54_54.xml:/greenlight lekojson/greenlight -i /greenlight
It fails with the following message:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/home/francis/projects/transport/greenlight_validator/config.yaml" to rootfs at "/greenlight/config.yaml" caused: mount through procfd: open o_path procfd: open /var/snap/docker/common/var-lib-docker/overlay2/5bf4783716880385cc8d323ff83e64bb840da606013fd2cda2521844d3be4451/merged/greenlight/config.yaml: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.
The content of the config.yaml is the one given in the doc.
The -i /greenlight is the workaround suggested in #16.
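The daemon error above is consistent with a single file being mounted at /greenlight while config.yaml is mounted underneath it. One way to avoid nesting a mount below a file is sketched here. It only builds the argument list; the image name, the `validate -i` form and the container paths are taken from commands quoted in this thread, while the function name is hypothetical and the resulting command is untested against the image.

```python
from pathlib import PurePosixPath

IMAGE = "lekojson/greenlight"  # image name used in this thread
CONTAINER_DOCS = PurePosixPath("/greenlight/documents")
CONTAINER_CONFIG = "/greenlight/config.yaml"


def docker_args(host_document, host_config=None):
    """Build a `docker run` argument list that never nests two mounts.

    The document is mounted as a file *inside* /greenlight/documents, so
    a config mounted at /greenlight/config.yaml has a directory above it
    rather than a file. Host paths are assumed to be POSIX paths.
    """
    name = PurePosixPath(host_document).name
    target = str(CONTAINER_DOCS / name)
    args = ["docker", "run", "-it", "-v", f"{host_document}:{target}"]
    if host_config is not None:
        args += ["-v", f"{host_config}:{CONTAINER_CONFIG}"]
    return args + [IMAGE, "validate", "-i", target]
```

The list can be passed to `subprocess.run`, or joined with spaces to inspect the command before running it.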
"Could be possible to have a selection of a directory for what to validate? Because if you have five file to validate, you must repeat the same process every time"
"We happened to use big xml and the validation ended without saying anything (this happened in the first version of the tool with both CLI and web interface version: we could not try we the recent one, so we do not know if the problem has been solved)"
Hello there! I work on https://transport.data.gouv.fr/ and I did some testing on the NeTEx validator, I wanted to share it with the authors & other users here. Thanks for your work on this!
I have not yet been able to review the output of the reports themselves, so this first round is more about onboarding / "surprise report" and potential production issue (RAM use etc), than the quality of the reports.
Hope this helps!
Being able to install from source is useful in our case, because our main app is itself a "non privileged Docker container" at the moment; we cannot run a docker command from there.
We also cannot run go cmd/* in production, because the go binary won't be there in the final Docker image.
I explored the Dockerfile and found how you actually build the binary:
Line 7 in c9c2573
Using this command allowed me to build a complete standalone binary which I can use.
Initially, the cmd/* pattern was a bit confusing to see, since it was unclear whether these were different programs or parts of the same program (the latter was the answer), but that is not a big problem.
The doc could be improved to document how to prepare a full binary.
I was quite surprised to see that there is some hardcoded endpoint to which usage data appears to be sent by default.
I have created a specific issue for that part:
Although I understand the usefulness of this to you while building the tool, I believe this is problematic from a GDPR point of view.
At the very least I feel this should be advertised in the readme, or even made an opt-in behaviour (at the cost of reducing your tracing rate).
I don’t believe most users will realize this is happening!
Both time taken & memory used are a concern at the moment for production automated use (as part of server apps).
I have commented on existing issues:
-m flag:
docker run -it -m 1GB -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip
Non-linear memory use (or too high use) will be a problem for production use, since we allocate a fixed number of GB per container, and cloud environments can be costly from a RAM point of view.
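For tracking time and peak memory across input sizes, a small harness along these lines may help. It assumes a standalone binary built from source, because with `docker run` the measured child would only be the Docker client and not the container. The function name is hypothetical and the `resource` module is Unix only.

```python
import resource
import subprocess
import time


def run_and_measure(command):
    """Run `command` and return (exit_code, seconds, peak_rss).

    ru_maxrss is the largest resident set size among the children this
    process has waited for: kilobytes on Linux, bytes on macOS.
    """
    started = time.monotonic()
    completed = subprocess.run(command, check=False)
    elapsed = time.monotonic() - started
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, elapsed, peak
```

Running it once per file, in a fresh Python process each time, keeps the peak value from being dominated by an earlier and larger run.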
For docker run, it could be nice to assume that the user validates files in their current folder; this is traditionally done by using $(pwd) on Mac/Linux and %cd% on Windows.
docker run -it -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip
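The same command can be assembled without shell-specific syntax such as $(pwd) or %cd%. This is a sketch only: the function names are hypothetical, the image name and container path come from the command above, and how Docker Desktop interprets a Windows path in `-v` has not been checked here.

```python
import os

CONTAINER_DOCS = "/greenlight/documents"  # container path used in this thread


def current_dir_mount():
    """Return the `-v` value that mounts the current working directory."""
    return f"{os.getcwd()}:{CONTAINER_DOCS}"


def validate_command(filename, image="lekojson/greenlight"):
    """Argument list validating `filename` from the current directory."""
    return [
        "docker", "run", "-it",
        "-v", current_dir_mount(),
        image, "validate", "-i", f"{CONTAINER_DOCS}/{filename}",
    ]
```

Passing the list to `subprocess.run` avoids quoting problems when the current directory contains spaces.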
The help at GitHub - ITxPT/DATA4PTTools: Shared space for the development of the DATA4PT validation tool(s) lacks the -i myfile flag at time of writing.
Warning on “sarama” while compiling from source · Issue #22 · ITxPT/DATA4PTTools · GitHub
That's it for today; thanks for open-sourcing the tool & the discussion!