ITxPT / DATA4PTTools

Shared space for the development of the DATA4PT Greenlight NeTEx validation tool(s)

License: MIT License

Dockerfile 0.29% JavaScript 11.58% Go 27.20% TypeScript 59.33% Makefile 0.45% C 1.15%

data4pttools's People

Contributors

eliotterrier, lekotros, pkvarnfors, skinkie, theobald-nutte

Forkers

skinkie aurige

data4pttools's Issues

Compressed (gzip) files are not handled.

The tool does not report problems with processing compressed files:

go run cmd/*.go validate -i /var/tmp/tec-netex.xml.gz

Decompressing the file manually and validating the result does produce output:

skinkie@thinkpad ~/Sources/DATA4PTTools $ go run cmd/*.go validate -i /tmp/test/tec-netex.xml 
┌ tec-nete ─╼
│ frame-defaults ...          ok
│ journey-pattern-timings ... failed with 2728 errors and 0 warnings
│ passing-times ...           ok
│ stop-point-names ...        ok
│ xsd ...                     failed with 1 errors and 0 warnings
└───╼

Expected: since libxml2 handles compression without issue, this should work.

Documentation: Improve how to use the CLI

"A step for step installation guide"
"A sample procedure of an analyse"
"Running the tool requires some learning for a user who is new to using Docker, especially to validate own files stored on local path."
"Instructions on https://github.com/ITxPT/DATA4PTTools on how to run the tool for validating your own files are difficult to follow. So the documentation could be improved."
"The installation is easy if you are familiar with docker, but if you are on Windows the use of the CLI version could be a little bit tricky: in this OS Docker uses WSL and if you need to connect a directory from a host OS to a guest OS you have to pass through the linux layer."
"The documentation is a little poor but suffice to use the tool (some more info about the CLI version could be useful)"
"The configuration of the CLI version is a bit difficult because the poor documentation; the CLI options are limited and the only way we found to do a more “advanced” validation was the use of the yaml file like the example provided in the documentation: how can you use the rules in this version? Have you to modify the scripts section? It’s not clear."

For an error, a matching line number must be available

This report is not enough. You cannot use this for presentation purposes.

  {
    "name": "journey-pattern-timings",
    "description": "Make sure that every StopPointInJourneyPattern contains a arrival/departure time and that every ScheduledStopPointRef exist",
    "valid": false,
    "error_count": 2728,
    "errors": [
      {
        "message": "Missing ScheduledStopPoint(@id=TECBrabantWallon:ScheduledStopPoint:LBkcare*)",
        "type": "consistency"
      }
    ]
  }

"error_count" does not match "errors"

In my ideal world the human-readable errors would be unique, but each would carry a list of references to the exact positions in the file (for simplicity: line based). That will obviously not always work, but I think it can be a good acceptance criterion.

I currently see an error count in thousands, but I only see messages in the thirties. That does not make sense.
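
One way to make the two numbers consistent is to deduplicate at report time: keep one entry per distinct message and attach every occurrence to it. A sketch (the ValidationError shape and the Line field are assumptions; today's report carries no line field):

```go
package main

import "fmt"

// ValidationError mirrors the fields seen in the JSON report; Line is the
// hypothetical addition requested above.
type ValidationError struct {
	Message string
	Line    int
}

// groupErrors collapses repeated messages into one entry per unique
// message, keeping every line number, so error_count (the total) and the
// unique message list can be derived from the same structure.
func groupErrors(errs []ValidationError) map[string][]int {
	grouped := make(map[string][]int)
	for _, e := range errs {
		grouped[e.Message] = append(grouped[e.Message], e.Line)
	}
	return grouped
}

func main() {
	errs := []ValidationError{
		{"Missing ScheduledStopPoint", 120},
		{"Missing ScheduledStopPoint", 458},
		{"Missing arrival time", 77},
	}
	grouped := groupErrors(errs)
	fmt.Println(len(errs), "errors,", len(grouped), "unique messages")
	fmt.Println(grouped["Missing ScheduledStopPoint"]) // [120 458]
}
```

With this shape, error_count could stay at 2728 while the rendered report shows only the few dozen unique messages, each with its occurrence lines.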

Is it OK with Docker Desktop on Windows 10?

I have installed the tool on Windows 10 using Docker Desktop.

docker run -it lekojson/greenlight -i testdata and docker run -it lekojson/greenlight --help are OK, but everything else results in a nice

panic: runtime error: invalid memory address or nil pointer dereference
   panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x821e18]

goroutine 1 [running]:
github.com/concreteit/greenlight.(*Validator).Validate.func1()
   /usr/local/greenlight/validator.go:33 +0x18
panic({0xb99080, 0x123e2f0})
   /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/concreteit/greenlight.(*Validator).Validate(0xc0001b2980, 0x0)
   /usr/local/greenlight/validator.go:36 +0x82
main.validate(0x1248540, {0xc80e3c, 0x2, 0x2})
   /usr/local/greenlight/cmd/validate.go:243 +0x5b0
github.com/spf13/cobra.(*Command).execute(0x1248540, {0xc0001cd2e0, 0x2, 0x2})
   /go/pkg/mod/github.com/spf13/[email protected]/command.go:860 +0x5f8
github.com/spf13/cobra.(*Command).ExecuteC(0x12482c0)
   /go/pkg/mod/github.com/spf13/[email protected]/command.go:974 +0x3bc
github.com/spf13/cobra.(*Command).Execute(...)
   /go/pkg/mod/github.com/spf13/[email protected]/command.go:902
main.main()
   /usr/local/greenlight/cmd/root.go:19 +0x25

Is there any specific syntax to provide Windows files from PowerShell (or any other environment I should use)?

growing nodeset hits limit

Error occurs during network offer validation (file size 600MB).

DEBU[2022-09-15T12:39:38+02:00] configured max distance: 500 document=epip.xml id=ZV4GB5-iaOInY6zA8qBZE scope=main script=stopPlaceQuayDistanceIsReasonable type=LOG valid=false
DEBU[2022-09-15T12:39:38+02:00] validation using schema "[email protected]" document=epip.xml id=ZV4GB5-iaOInY6zA8qBZE scope=main script=xsd type=LOG valid=false
XPath error : Memory allocation failed : growing nodeset hit limit

growing nodeset hit limit

^
XPath error : Memory allocation failed : growing nodeset hit limit

growing nodeset hit limit

^
panic: TypeError: Cannot read property 'push' of undefined or null at builtin/everyStopPlaceIsReferenced.js:33:9(53)

journey-pattern-timings: tailor-made for Nordic profile passingTimes

The Nordic profile uses passingTimes. An error like "Expected passing times for StopPointInJourneyPattern(@id='TECBrabantWallon:StopPointInJourneyPattern:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" shows that other forms, such as calls, are not considered. I do like that this uncovers that there is no 1:1 relationship between the Call and the inferred ServiceJourneyPattern.

<StopPointInJourneyPattern dataSourceRef="TECBrabantWallon:DataSource" derivedFromObjectRef="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" id="TECBrabantWallon:StopPointInJourneyPattern:32775967-L_PA_2022-22_LG_ME-Mercredi-01-0010000-1" order="1" version="20220111">
  <ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
  <OnwardTimingLinkRef ref="TECBrabantWallon:TimingLink:-551368533" version="20220111"/>
  <ForAlighting>false</ForAlighting>
</StopPointInJourneyPattern>
<Call dataSourceRef="TECBrabantWallon:DataSource" id="TECBrabantWallon:Call:32775967-L_PA_2022-22_LG_ME-Mercredi-02-0010000-1" order="1" version="20220111">
  <ScheduledStopPointRef ref="TECBrabantWallon:ScheduledStopPoint:LBkcare*" version="20220111"/>
  <OnwardTimingLinkView>
    <TimingLinkRef ref="TECBrabantWallon:TimingLink:-551368533"/>
    <RunTime>PT0S</RunTime>
  </OnwardTimingLinkView>
  <Arrival>
    <Time>12:23:00</Time>
  </Arrival>
  <Departure>
    <Time>12:23:00</Time>
    <WaitTime>PT0S</WaitTime>
  </Departure>
</Call>

Documentation: Running the tool from source

"But to find the correct library and necessary additional libraries (green etc.) were challenging. 

As it was the first time working with Docker and Docker_Libraries and therefore it was not easy to understand how to get the necessaries Libraries from GitHub repositories"

CEN SIRI Question

Hi folks,
Thank you so much for providing and maintaining this repository. Kudos!

Do you mind me asking whether this tool validates NeTEx only, or also CEN SIRI? SIRI is mentioned quite a lot in this repository, but I have not found any information about SIRI validation yet.

Cheers!

Web Interface: Upload your own schema/XSD

"The usage is simple but the fact that you do not have the ability to select your own XSD to test against, is a problem (we could not validate the level 2 of our profile)"

Web Interface: Better navigation

Improve the navigation in the web interface so that you can Start over, Navigate Backwards, etc.

"There is no menu structure visible and no structure to go back to the input page, which is necessary to navigate within the web interface."

DATA4PTTools v0.4.2: everyLineIsReferenced

const journeyLinePath = xpath.join(
  xpath.path.FRAMES,
  "TimetableFrame",
  "vehicleJourneys",
  "ServiceJourney",
);

In EPIP-based documents, the LineRef cannot be specified within ServiceJourney. Instead, the Line is referenced from either:

ServiceJourneyPattern id="C::ServiceJourneyPattern:1::" version="any"
  RouteView id="C::RouteView:1::"
    LineRef ref="C::Line:1::"

or:

routes
  Route id="C::Route:1:" version="any"
    LineRef ref="C::Line:1"

versions are not evaluated in the xpath searches

The current JavaScript XPath searches never include the version attribute, while the key/keyref identity constraints in the XSD do. This also shows there is no handling or application logic for 'any' (the default for the version attribute).
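
A sketch of what a version-aware lookup predicate could look like, mirroring the XSD's intent (the function name and the exact matching semantics are assumptions, not the tool's code):

```go
package main

import "fmt"

// refPredicate builds an XPath predicate for resolving a reference the
// way the XSD key/keyref constraints suggest: a missing or "any" version
// matches every version of the target, otherwise the versions must agree
// (with "any" on the target side still matching everything).
func refPredicate(id, version string) string {
	if version == "" || version == "any" {
		return fmt.Sprintf("[@id='%s']", id)
	}
	return fmt.Sprintf("[@id='%s' and (@version='%s' or @version='any')]", id, version)
}

func main() {
	fmt.Println(".//ScheduledStopPoint" + refPredicate("sp1", "20220111"))
	fmt.Println(".//ScheduledStopPoint" + refPredicate("sp1", "any"))
}
```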

NeTEx XSD schema version

In the Web GUI, when looking at the schemas to validate against, the "NeTEx" entry links to the NeTEx GitHub repository, but I noticed the validation does not use the latest version of the NeTEx master branch.

Can you add information on the XSD version to the web interface?
This would probably avoid a lot of confusion when using different validation tools.
When will the web validator be updated with the latest XSD?

Warning on "sarama" while compiling from source

Hello! I'm compiling from source to evaluate the validator (we cannot use it from Docker in our production, which itself runs as a docker app), and wanted to provide the following feedback:

While compiling the app, I see a warning:

❯ go get
go: downloading github.com/eclipse/paho.mqtt.golang v1.3.5
# SNIP

go: warning: github.com/Shopify/[email protected]: retracted by module author: producer deadlock https://github.com/Shopify/sarama/issues/2129
go: to switch to the latest unretracted version, run:
	go get github.com/Shopify/sarama@latest

The issue (IBM/sarama#2129) has apparently been fixed in the sarama library.

Reporting: Better naming of the Jobs to understand what is validated

"The use of Ref instead of something more “speaking” does not let you search the report corresponding to a known validated file (especially when you try to do more than one validation before examining the result: the best way in this case is to download the result after the validation and not use the job page at all)."

Building from source: error in compilation

Hi,

I work on Windows 10 and I tried to install your validation tool on my laptop. I followed all the steps in the section "Building from source" and installed all dependencies (make, libxml2, pkg-config, gcc). But at the end, when I executed the command below:

go run cmd\file.go cmd\mqtt.go cmd\root.go cmd\server.go cmd\session.go cmd\static_dir.go cmd\validate.go validate -i testdata

I got the message below:

# github.com/lestrrat-go/libxml2/clib
fork\libxml2\clib\clib.go:5:10: fatal error: libxml/parserInternals.h: No such file or directory
    5 | #include <libxml/parserInternals.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Do you have a solution to debug this?

Best regards,
Alban GOUGOUA
Data Analyst at the French Transport Regulatory Body (ART, Autorité de Régulation des Transports)

"Opt-out telemetry" should probably be more advertised

I realised, by looking at the code only, that some stats about use appear to be sent by default to a remote database.

Although I understand the benefits of such stats as a tool builder, in particular during a beta, I find it problematic that this is not advertised clearly in the readme with instructions to opt out, or made a completely opt-in operation instead (GDPR compliance, etc.).

I doubt most users will realise that (especially when using the Docker version), so I'm creating an issue to give a bit of visibility to this topic.

const (
	influxURL    = "https://europe-west1-1.gcp.cloud2.influxdata.com"
	influxToken  = "ZgkcIAuMuoSM0KcG38iui5nLQrYv9oLiSCfJ2sin2exvxJnbMQjUea1kGQrsGteKCazgo_83thED1lS1O1XYEw=="
	influxOrg    = "4b2adfedb7f7619e"
	influxBucket = "greenlight"
)

validateCmd.Flags().BoolP("telemetry", "", true, "Whether to collect and send information about execution time")

viper.BindPFlag("telemetry", validateCmd.Flags().Lookup("telemetry"))

if viper.GetBool("telemetry") {
	logTelemetry(validator, ctx.Results())
}

func logTelemetry(validator *greenlight.Validator, results []*greenlight.ValidationResult) {
	client := influxdb2.NewClient(influxURL, influxToken)
	defer client.Close()
	writeAPI := client.WriteAPI(influxOrg, influxBucket)
	for _, r := range results {
		if viper.GetBool("telemetry") {
			p := newPoint("document")
			p.AddField("schema_name", validator.SchemaPath())
			p.AddField("schema_bytes", validator.SchemaSize())
			p.AddField("execution_time_ms", r.ExecutionTime().Milliseconds())
			p.AddField("name", r.Name)
			p.AddField("valid", r.Valid)
			writeAPI.WritePoint(p)
			for _, rule := range r.ValidationRules {
				p := newPoint("rule")
				p.AddField("schema_name", validator.SchemaPath())
				p.AddField("schema_bytes", validator.SchemaSize())
				p.AddField("execution_time_ms", rule.ExecutionTime().Milliseconds())
				p.AddField("document_name", r.Name)
				p.AddTag("name", rule.Name)
				p.AddField("valid", rule.Valid)
				p.AddField("error_count", rule.ErrorCount)
				writeAPI.WritePoint(p)
			}
		}
	}
	writeAPI.Flush()
}

Config file not taken into account

Hello,

I'm trying to launch the tool using a config file as follows:

docker run -it -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex/OFFRE_bordeaux_metropole_20220302001237Z/BORDEAUX_METROPOLE_offre_Bus_1_54_54.xml:/greenlight/documents -v /home/francis/projects/transport/greenlight_validator/config.yaml:/greenlight/config.yaml lekojson/greenlight -i /greenlight

┌ /greenlight/documents ─╼
│ xsd ... ok
└───╼ 

The validation seems to be done successfully.
The config file contains the example configuration given in the documentation:

schema: xsd/NeTEx_publication.xsd # schema to use for validation, comes shipped with the source/container image
logLevel: debug # default is undefined, setting this parameter disables the fancy setting, regardless of its value
fancy: true # displays a progress instead of log
inputs: # where to look for documents
  - ~/.greenlight/documents
  - /etc/greenlight/documents
  - /documents
  - /greenlight/documents
  - ./documents
outputs:
  - report: # logged in standard output
      format: mdext # mdext (markdown extended) or mds (markdown simple)
  - file:
      format: json # formats available are: json or xml
      path: . # where to save the file (filename format is ${path}/report-${current_date_time}.${format})
builtin: true # whether to use builtin scripts
scripts: # where to look for custom scripts
  - ~/.greenlight/scripts
  - /etc/greenlight/scripts
  - /scripts
  - /greenlight/scripts
  - ./scripts

I was expecting a JSON output report to be generated in the current directory, but none appears.
The only output I can see is the terminal output saying ok.

Can you help me understand how to have access to a full output report?
Thanks

DATA4PTTools v0.4.2 panics

DEBU[2022-09-15T07:57:23+02:00] validation using schema "[email protected]" document=epip.xml id=0jh1ZLPFuDKxD_-PWLWFr scope=main script=xsd type=LOG valid=false
panic: TypeError: Cannot read property 'push' of undefined or null at builtin/everyLineIsReferenced.js:39:9(52)

goroutine 29 [running]:
github.com/dop251/goja.(*Runtime).wrapJSFunc.func1({0xc000c74678, 0x1, 0x1?})
	/Users/user/go/pkg/mod/github.com/dop251/[email protected]/runtime.go:2183 +0x525
github.com/concreteit/greenlight/js.(*Script).Run(0xc00003c780, {0x7ff7bfeff9c6, 0xb}, {0x4da8188?, 0xc00057e010}, 0xc00040f0e0, 0xc0005796b0, 0x0)
	/Users/user/Projects/DATA4PTToolsv0.4.2/js/script.go:108 +0x41f
github.com/concreteit/greenlight.(*Validation).validateDocument.func1.1(0x4b8a620?)
	/Users/user/Projects/DATA4PTToolsv0.4.2/validation.go:137 +0x57
github.com/concreteit/greenlight/internal.(*Queue).Run.func1(0x0?)
	/Users/user/Projects/DATA4PTToolsv0.4.2/internal/queue.go:29 +0xa2
created by github.com/concreteit/greenlight/internal.(*Queue).Run
	/Users/user/Projects/DATA4PTToolsv0.4.2/internal/queue.go:27 +0x8d
exit status 2

Cannot validate local file

Hello,
I'm trying to give the tool a first spin.

docker run -it -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex.zip:/greenlight/documents lekojson/greenlight
stat /root/.greenlight/documents: no such file or directory

I've tried to work with an uncompressed NeTEx directory, but I get the same error.
Any advice?
Thanks!

Web Interface: Improve Jobs page

"The refresh in the jobs page (the summary of the validation done) of the web interface version is annoying because reloads the page every time moving the content up and down"

Web Interface: Add progress indicator

"We think it would be better if there was some sign of the progress just to know if the application still is working or has stopped by some reason. Also we could then estimate the total the application need to validate all files. "

Special characters?

The files from Portugal cause very strange artifacts.

          {
            "name": "xsd",
            "description": "General XSD schema validation",
            "valid": false,
            "error_count": 32,
            "errors": [
              {
                "message": "\u0005\ufffd\ufffd\ufffd\u0007",
                "line": 65535,
                "type": "xsd"
              },
              {
                "message": "u\ufffd*\\O^?",
                "line": 65535,
                "type": "xsd"
              },
              {
                "message": "\ufffd\ufffd*\\O^?",
                "line": 65535,
                "type": "xsd"
              },
              {
                "message": "\ufffd\ufffd*\\O^?",
                "line": 65535,
                "type": "xsd"
              },


Config error

I get errors when I run with the config.yaml file, but it does not say what the error is.
I get: testdata/line_3 9011005000300000.xml
xsd ... failed with 2 errors and 0 warnings

Cannot load local config file

I'm trying to launch the tool with a config.yaml file on my computer.

Here is the command:

docker run -it -v /home/francis/projects/transport/greenlight_validator/config.yaml:/greenlight/config.yaml -v /home/francis/Downloads/TBM_NeTEx/bordeaux_metropole-aggregated-netex/OFFRE_bordeaux_metropole_20220302001237Z/BORDEAUX_METROPOLE_offre_Bus_1_54_54.xml:/greenlight lekojson/greenlight -i /greenlight

It fails with the following message:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/home/francis/projects/transport/greenlight_validator/config.yaml" to rootfs at "/greenlight/config.yaml" caused: mount through procfd: open o_path procfd: open /var/snap/docker/common/var-lib-docker/overlay2/5bf4783716880385cc8d323ff83e64bb840da606013fd2cda2521844d3be4451/merged/greenlight/config.yaml: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

The content of the config.yaml is the one given in the doc.

Note: the -i /greenlight is the workaround suggested in #16.

Validation: Validation of large files ends without any output

"We happened to use big xml and the validation ended without saying anything (this happened in the first version of the tool with both CLI and web interface version: we could not try we the recent one, so we do not know if the problem has been solved)"

Words of feedback from the French NAP

Hello there! I work on https://transport.data.gouv.fr/ and I did some testing on the NeTEx validator, I wanted to share it with the authors & other users here. Thanks for your work on this!

I have not yet been able to review the output of the reports themselves, so this first round is more about onboarding, the "surprise report", and potential production issues (RAM use etc.) than about the quality of the reports.

Hope this helps!

Installing from source is useful

Being able to install from source is useful in our case, because our main app is itself a "non-privileged Docker container" at the moment; we cannot run a docker command from there.

Nor can we run go run cmd/*.go in production, because the go binary won't be there in the final Docker image.

I explored the Dockerfile and found how you actually build the binary:

RUN go build -o glc cmd/*.go

Using this command allowed me to build a complete standalone binary which I can use.

Initially, the cmd/* pattern was a bit confusing, since it was unclear whether these were different programs or parts of the same program (the latter turned out to be the answer), but that is not a big problem.

The doc could be improved to document how to prepare a full binary.

There is telemetry by default

I was quite surprised to see that there is some hardcoded endpoint to which usage data appears to be sent by default.

I have created a specific issue for that part:

#23

Although I understand the usefulness of this to you while building the tool, I believe this is problematic from a GDPR point of view.

At the very least I feel this should be advertised in the readme, or even made an opt-in behaviour (at the cost of reducing your tracing rate).

I don’t believe most users will realize this is happening!

Time taken & memory used

Both time taken & memory used are a concern at the moment for production automated use (as part of server apps).

I have commented on existing issues:

docker run -it -m 1GB -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip

Non-linear memory use (or too high use) will be a problem for production use, since we allocate a fixed number of GB per container, and cloud environments can be costly from a RAM point of view.

Documentation improvement

For docker run, it could be nice to assume that the user validates files in their current folder; this is traditionally done by using $(pwd) on Mac/Linux and %cd% on Windows.

docker run -it -v $(pwd):/greenlight/documents lekojson/greenlight validate -i /greenlight/documents/export-intercites-netex-last.zip

The help at https://github.com/ITxPT/DATA4PTTools lacks the -i myfile flag at the time of writing.

Some dependencies should be upgraded

See "Warning on “sarama” while compiling from source" (ITxPT/DATA4PTTools issue #22).

That's it for today; thanks for open-sourcing the tool and for the discussion!
