etalab / transpo-rt

Simple API for public transport realtime data

Home Page: https://tr.transport.data.gouv.fr/

License: GNU Affero General Public License v3.0

Rust 99.63% Makefile 0.37%
gtfs-rt siri-lite realtime-data

transpo-rt's People

Contributors

antoine-de, jerome1337, l-vincent-l, thbar, tristramg

transpo-rt's Issues

Dataset detail page `/{id}` currently returns 502 if static data is unavailable

(Following work on #109/#107).

If one queries /{id} when the base schedule (GTFS) is not available, they will get a 502 with an error.

For better ergonomics, we could instead generate a degraded version of the detail page (e.g. https://tr.transport.data.gouv.fr/reseau-de-transports-en-commun-de-la-communaute-dagglomeration-de-lauxerrois/) filling what we can.

I believe most of what is generated could still be generated, actually.

To be discussed!

Better readme

The readme should:

  • provide a general view of the project
  • list the routes (+ parameters ?)
  • provide an explanation of the actor model used? Maybe it's better to put this in the wiki, but since it is not trivial, I think we should document it.

Follow the siri-lite URL format

We should have URLs of the form
http://[host]:[port]/siri/[version]/[siri-service][response-format]?[parameters]
We currently have
http://[host]:[port]/siri-lite//[siri-service][response-format]?[parameters]
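
As an illustration of the expected pattern, here is a minimal sketch of a URL builder. The helper name `siri_lite_url` and the example values are hypothetical and not taken from transpo-rt; parameters are assumed to be already URL-safe.

```rust
/// Builds a URL following the pattern
/// http://[host]:[port]/siri/[version]/[siri-service][response-format]?[parameters]
/// `siri_lite_url` is a hypothetical helper, not part of transpo-rt.
fn siri_lite_url(
    host: &str,
    port: u16,
    version: &str,
    service: &str,
    format: &str,
    params: &[(&str, &str)],
) -> String {
    let mut url = format!("http://{}:{}/siri/{}/{}{}", host, port, version, service, format);
    // Assumption: keys and values are already URL-safe, so no percent-encoding is done here.
    if !params.is_empty() {
        let query: Vec<String> = params.iter().map(|(k, v)| format!("{}={}", k, v)).collect();
        url.push('?');
        url.push_str(&query.join("&"));
    }
    url
}
```

For instance, `siri_lite_url("localhost", 8080, "2.0", "stop-monitoring", ".json", &[("MonitoringRef", "stop1")])` yields a path starting with `/siri/2.0/` instead of `/siri-lite//`.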

Questions about the siri-lite standard

This issue is here to keep track of questions about the siri-lite standard.

  1. Is it possible to query StopPointDiscovery to get StopArea ids?
  2. Is it possible to give a StopArea id to StopMonitoring?
  3. For StopMonitoring, how should the MonitoredStopVisit entries be ordered? By scheduled time, or by updated time?
  4. Along the same lines, what should the PreviewInterval parameter filter on? Scheduled times? Realtime?

Errors are sent in text/plain rather than json content-type

During my manual tests of #109, I noticed that when the base schedule is not available, we get a text/plain error currently.

Here is an example, using a local test config and querying http://localhost:8080/test:

HTTP/1.1 502 Bad Gateway
content-length: 170
connection: close
content-type: text/plain; charset=utf-8
date: Mon, 18 Jan 2021 10:01:12 GMT

theoretical dataset temporarily unavailable : impossible to read GTFS http://localhost:8000/SEM-GTFS.zip because Invalid Zip archive: Could not find central directory end

Since the whole API is JSON, I believe standardising on a JSON content-type for errors would make life easier for end users.
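
To make the idea concrete, here is a minimal, framework-independent sketch of what a JSON error payload could look like. The `{"error": ...}` shape and the function names are assumptions made for illustration; the project may prefer another structure or a JSON library for serialisation.

```rust
/// Escapes a string so it can be embedded in a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Returns the (content type, body) pair for an error response.
/// The `{"error": "..."}` shape is a hypothetical choice.
fn json_error(message: &str) -> (&'static str, String) {
    (
        "application/json",
        format!("{{\"error\":\"{}\"}}", escape_json(message)),
    )
}
```

The HTTP status (502 in the example above) would stay the same; only the content-type and body would change.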

Useful resources

relaxed trip matching in gtfs-rt

For the moment we have to find the trip id in the GTFS to apply the GTFS-RT.

If the need arises, we could also try to find the trip by matching:

  • route_id
  • direction_id
  • start_time
  • start_date

as stated in the doc
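
A minimal sketch of such a fallback, assuming simplified in-memory types: `TripDescriptor` mirrors the GTFS-RT fields listed above, while `ScheduledTrip` and `find_trip` are hypothetical and do not reflect the project's actual data model. The sketch only accepts a relaxed match when exactly one scheduled trip fits.

```rust
/// Subset of the GTFS-RT trip descriptor fields used for matching.
struct TripDescriptor {
    trip_id: Option<String>,
    route_id: Option<String>,
    direction_id: Option<u32>,
    start_time: Option<String>, // "HH:MM:SS"
    start_date: Option<String>, // "YYYYMMDD"
}

/// Simplified scheduled trip (hypothetical shape, for illustration only).
struct ScheduledTrip {
    trip_id: String,
    route_id: String,
    direction_id: u32,
    start_time: String,         // first departure, "HH:MM:SS"
    service_dates: Vec<String>, // days the trip runs, "YYYYMMDD"
}

fn find_trip<'a>(trips: &'a [ScheduledTrip], desc: &TripDescriptor) -> Option<&'a ScheduledTrip> {
    // 1. Current behaviour: exact match on the trip id.
    if let Some(id) = &desc.trip_id {
        if let Some(t) = trips.iter().find(|t| &t.trip_id == id) {
            return Some(t);
        }
    }
    // 2. Relaxed matching: all four fields must be present.
    let (route, dir, time, date) = match (
        &desc.route_id,
        desc.direction_id,
        &desc.start_time,
        &desc.start_date,
    ) {
        (Some(r), Some(d), Some(t), Some(dt)) => (r, d, t, dt),
        _ => return None,
    };
    let mut candidates = trips.iter().filter(|t| {
        &t.route_id == route
            && t.direction_id == dir
            && &t.start_time == time
            && t.service_dates.contains(date)
    });
    // Only accept an unambiguous match: exactly one candidate.
    match (candidates.next(), candidates.next()) {
        (Some(t), None) => Some(t),
        _ => None,
    }
}
```

Rejecting ambiguous matches is a design choice of this sketch: applying a realtime update to the wrong trip seems worse than dropping it.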

Test the Siri-lite conformity

We need to check that the siri-lite responses are usable.

It's tough to find siri-lite users; we might need to look at irys for this.

Lots of "impossible to find stop" warns in the production logs

Not sure if problematic but creating the issue to track this.

2021-01-18T15:58:05+01:00 Jan 18 14:58:05.097 WARN impossible to find stop Some("80701") for vj 113|25
2021-01-18T15:58:05+01:00 Jan 18 14:58:05.097 WARN impossible to find stop Some("80699") for vj 113|25
2021-01-18T15:58:05+01:00 Jan 18 14:58:05.097 WARN impossible to find stop Some("80697") for vj 113|25
2021-01-18T15:58:05+01:00 Jan 18 14:58:05.097 WARN impossible to find stop Some("80747") for vj 203|218
2021-01-18T15:58:05+01:00 Jan 18 14:58:05.097 WARN impossible to find stop Some("80749") for vj 203|218

StopPointMonitoring in production

Scenario:

  • input: stoppoint_id
  • output: the next two passages, in siri-lite

Limitations

  • no validity patterns
  • purely theoretical (scheduled data only)
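
The scenario above can be sketched as follows, under the stated limitations (scheduled times only, no validity patterns). The `Passage` type, the `next_passages` function, and the use of seconds since midnight are assumptions made for illustration.

```rust
/// A scheduled passage at a stop point (hypothetical, simplified shape).
#[derive(Debug, Clone, PartialEq)]
struct Passage {
    stop_point_id: String,
    /// Scheduled departure, in seconds since midnight.
    departure: u32,
}

/// Returns the next `count` scheduled passages at `stop_point_id` from `now` onwards,
/// ordered by departure time.
fn next_passages(schedule: &[Passage], stop_point_id: &str, now: u32, count: usize) -> Vec<Passage> {
    let mut upcoming: Vec<Passage> = schedule
        .iter()
        .filter(|p| p.stop_point_id == stop_point_id && p.departure >= now)
        .cloned()
        .collect();
    upcoming.sort_by_key(|p| p.departure);
    upcoming.truncate(count);
    upcoming
}
```

Calling `next_passages(&schedule, "some_stop", now, 2)` gives the two next passages for that stop point; serialising them to a siri-lite response is left out of this sketch.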

create swagger

To be referenced by api.data.gouv.fr we need a swagger.

We'll see whether we hand-craft it or try to generate it automatically.

Better handling of invalid GTFS

For the moment, an invalid GTFS removes the dataset from the API at startup.
If the GTFS becomes invalid while reloading, the old one is kept (and a sentry error is issued).

There are several flaws in this approach:

startup

We want at least to keep the invalid dataset and mark it as invalid in the / api.
An even nicer approach would be to mark it as invalid and try again to load it later (once a day?). This would imply a bigger refactoring of the code, since the DatasetActor would return an option (or result), but I think it would not be that bad.

We should also check whether the sentry message marking an invalid dataset is sent; I don't see them on sentry.

base schedule reloading

Retrying every 5mn seems a bit overkill 🤔 Maybe we should have an exponential backoff, and a limit afterward?
It seems way less important than the previous point though.
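
A minimal sketch of an exponential backoff with an upper limit, using only the standard library. The function name `backoff_delay` and the choice of doubling are assumptions; the 5mn base comes from the current retry period mentioned above.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_pow / checked_mul avoid overflow for large attempt counts;
    // any overflow falls back to the cap.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 5mn base and a 24h cap, the delays would be 5mn, 10mn, 20mn, and so on, until they reach 24h and stay there.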

Implement HEAD queries

When playing with data, it is very common to use curl -I (aka --head) to fetch only the headers.

Currently, doing so on this app returns a 404, which can be a bit confusing to newcomers.

I am unsure how costly it would be to implement that at the moment given my current knowledge of Actix.

Maybe there is a nice trick in Actix to ensure most GET queries can also be transformed to HEAD more or less automatically.

Less urgent than other enhancements, but still nice to have!

Retry sooner than every 24h in case of unavailable base schedule

The work done in #107/#109 already improved the boot process by ensuring that a dataset is not "unregistered" (disappears from the routes) completely until next deploy if the static part is not available at boot time.

After #109 is merged, though, in case of unavailability at boot, the dataset download will only be retried 24h later at the moment, via this code:

impl actix::Actor for BaseScheduleReloader {
    type Context = actix::Context<Self>;

    fn started(&mut self, ctx: &mut Self::Context) {
        info!(self.log, "Starting the base schedule updater actor");
        ctx.run_interval(std::time::Duration::from_secs(60 * 60 * 24), |act, ctx| {
            info!(act.log, "reloading baseschedule data");
            act.update_data(ctx);
        });
    }
}

Detecting the unavailability (at boot and also later) and rescheduling with shorter timeframes (possibly with a form of backoff) would be a real improvement.

Todos

  • Figure out how to move forward in time, to implement proper tests on this workflow. It could be done via introspection of scheduled run_interval in Actix, if available, or via another form of tweak.
  • Make sure to avoid cascading downloads (e.g. all of a sudden, we end up downloading multiple times) - see cascading failures in distributed systems for the overall idea
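
One possible shape for this, sketched independently of Actix: a small piece of state that refuses to start a download while another is in flight, and that picks the next delay from the last outcome. `ReloadState`, its methods, and the 10mn retry value are hypothetical; only the 24h interval comes from the code above.

```rust
use std::time::Duration;

/// Interval used by the current code after a successful load.
const NORMAL_INTERVAL: Duration = Duration::from_secs(60 * 60 * 24);
/// Assumed shorter delay after a failed download (value chosen for illustration).
const RETRY_INTERVAL: Duration = Duration::from_secs(60 * 10);

/// Tracks whether a base schedule download is currently running.
#[derive(Default)]
struct ReloadState {
    in_flight: bool,
}

impl ReloadState {
    /// Returns true if a new download may start, false if one is already running.
    /// This is what prevents several downloads from piling up.
    fn try_start(&mut self) -> bool {
        if self.in_flight {
            return false;
        }
        self.in_flight = true;
        true
    }

    /// Records the outcome of a download and returns the delay before the next attempt.
    fn finish(&mut self, success: bool) -> Duration {
        self.in_flight = false;
        if success {
            NORMAL_INTERVAL
        } else {
            RETRY_INTERVAL
        }
    }
}
```

In the actor, the returned delay could presumably be used to schedule a single follow-up run (for example with a one-shot timer rather than a fixed `run_interval`), so that only one retry is ever pending. Keeping the decision logic in a plain struct like this also makes it testable without moving forward in time.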
