cutr-at-usf / ontime-performance-calculator

An application to calculate on-time performance using archived GTFS-realtime data

License: Other

Java 100.00%

gtfs gtfs-realtime-data performing-calculations relational-databases sql-server-database schedule-deviation

ontime-performance-calculator's Introduction

ontime-performance-calculator

An application to calculate on-time performance using GTFS-realtime data that has been archived using the gtfsrdb tool.

For more about what this tool does, please see the chapter "Producing On-time Performance from GTFS-realtime Data" in this final report.

Prerequisites

The On-time Performance Calculator is written in Java. Maven is the build management tool for this project.

The following are required to get the project up and running:

JDK 6 or higher
Apache Maven
(Optional) Git to clone the GitHub repository

Note that you can also build the project with an integrated development environment (IDE) such as NetBeans or IntelliJ, which usually include integrated support for Maven and Git.

You'll also need data:

GTFS data - A zip file containing GTFS data from the same time period as the archived GTFS-realtime data. Check out Transitland (and in particular the feed_versions API), TransitFeeds.com or GTFS Data Exchange (Deprecated) to find archived GTFS data.
Archived GTFS-realtime data - Data in a relational database (e.g., MS SQL Server) that has been archived using the gtfsrdb tool (or another tool that uses the same database schema). See TransitFeeds.com for a list of publicly-available GTFS-realtime feeds.

The following instructions are for building the project from the command line using Maven.

1. Download the code

The source files are needed in order to build the jar file. You can obtain them by downloading the files directly or by cloning the Git repository (recommended).

Download zipped version of the repository

Download the current snapshot of the project to your local machine using the "Download Zip" link on the project home page (https://github.com/CUTR-at-USF/ontime-performance-calculator).

Clone this repository to your local machine.

With Git installed on your system, clone the repository to your local machine:

git clone https://github.com/CUTR-at-USF/ontime-performance-calculator.git

2. Build the project

From the command-line:

mvn install

3. Run the application

The build step above generates an executable JAR file in the target/ directory that bundles all the dependencies needed to run the application.

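Before running the application, create an info.txt file in the project's src/main/resources folder. This file contains the information needed to connect to the Microsoft SQL Server database. Since the file is packaged into the JAR during the build, create it before running mvn install, or re-run the build after creating it.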

The info.txt file should contain the following entries, one per line, each consisting of a tag and its value separated by a colon (:):
- server : name of the server to connect to
- username : name of the user
- password : user password
- database : name of the database to connect to
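A minimal sketch of how these entries might be parsed, assuming exactly the format above (one "tag : value" pair per line). The class name InfoFileParser, case-insensitive tags, splitting on the first colon only, and trimming of surrounding whitespace are all assumptions for illustration; this is not the project's DatabaseConnectionInfo code. It assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not the project's DatabaseConnectionInfo class) that parses
// info.txt lines of the form "tag : value" into a map of tag -> value.
class InfoFileParser {
    static final String[] REQUIRED_TAGS = {"server", "username", "password", "database"};

    // Parses the whole input; the reader is closed when parsing finishes.
    static Map<String, String> parse(Reader reader) {
        Map<String, String> values = new HashMap<>();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            int lineNumber = 0;
            while ((line = in.readLine()) != null) {
                lineNumber++;
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // ignore blank lines
                }
                // Split on the first ':' only, so values such as passwords may contain ':'
                int sep = line.indexOf(':');
                if (sep < 0) {
                    // Report the line number rather than the content, which may hold a password
                    throw new IllegalArgumentException(
                            "info.txt line " + lineNumber + " is not in the form tag : value");
                }
                String tag = line.substring(0, sep).trim().toLowerCase(Locale.ROOT);
                values.put(tag, line.substring(sep + 1).trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Fail early with a clear message if a required entry is missing
        for (String tag : REQUIRED_TAGS) {
            if (!values.containsKey(tag)) {
                throw new IllegalArgumentException("info.txt is missing the '" + tag + "' entry");
            }
        }
        return values;
    }

    // Convenience overload for parsing in-memory text
    static Map<String, String> parse(String text) {
        return parse(new StringReader(text));
    }
}
```

Because whitespace around each value is trimmed, a password that begins or ends with spaces would not survive this sketch.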

Finally, to run the application execute a command in the format:

java -jar target/ontime-performance-calculator.jar <path/to/GTFS_file.zip> <arrival_time OR departure_time> [number of records to fetch]

The application takes up to three arguments:
1. Path to the static GTFS zip file
2. arrival_time or departure_time - Tells the application whether to calculate schedule_deviation using the scheduled arrival_time or departure_time at each stop
3. Number of records to fetch from the database table - Optional. If it’s not provided, all records are retrieved from the table
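The argument handling above could be validated along these lines. This is a sketch only: CommandLineOptions, TimeField and the error messages are hypothetical names, and the checks (two or three arguments, a positive integer record count) are assumptions based on the description above. It needs Java 7 or later for the switch on strings.

```java
import java.util.Locale;

// Hypothetical sketch of validating the command-line arguments described above;
// the class and field names are illustrative, not taken from the project.
class CommandLineOptions {
    // Which scheduled time from stop_times.txt to compare vehicle timestamps against
    enum TimeField { ARRIVAL_TIME, DEPARTURE_TIME }

    static final String USAGE = "Usage: java -jar target/ontime-performance-calculator.jar "
            + "<path/to/GTFS_file.zip> <arrival_time OR departure_time> [number of records to fetch]";

    final String gtfsZipPath;
    final TimeField timeField;
    final Integer recordLimit; // null means "fetch all records"

    private CommandLineOptions(String gtfsZipPath, TimeField timeField, Integer recordLimit) {
        this.gtfsZipPath = gtfsZipPath;
        this.timeField = timeField;
        this.recordLimit = recordLimit;
    }

    static CommandLineOptions parse(String[] args) {
        if (args.length < 2 || args.length > 3) {
            throw new IllegalArgumentException(USAGE);
        }
        TimeField field;
        switch (args[1].toLowerCase(Locale.ROOT)) {
            case "arrival_time":
                field = TimeField.ARRIVAL_TIME;
                break;
            case "departure_time":
                field = TimeField.DEPARTURE_TIME;
                break;
            default:
                throw new IllegalArgumentException(
                        "Second argument must be arrival_time or departure_time.\n" + USAGE);
        }
        Integer limit = null;
        if (args.length == 3) {
            try {
                limit = Integer.valueOf(args[2]);
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException("Number of records must be an integer: " + args[2]);
            }
            if (limit <= 0) {
                throw new IllegalArgumentException("Number of records must be positive: " + args[2]);
            }
        }
        return new CommandLineOptions(args[0], field, limit);
    }
}
```

With this sketch, the arguments gtfs.zip arrival_time select ARRIVAL_TIME and leave recordLimit as null, so every record would be fetched.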

Example command:

java -jar target/ontime-performance-calculator.jar gtfs.zip arrival_time

The above command assumes that the gtfs.zip file is in the same directory you're executing the java command from.

Note that this project is currently configured to connect to a SQL Server database. Other relational databases could also be used, but this currently requires code changes (see the "Support more databases" issue below).

ontime-performance-calculator's People

Contributors

mohangandhigh alankessler

ontime-performance-calculator's Issues

Support using arrival or departure time when calculating schedule deviation

Summary:

Currently, we calculate a simplified schedule deviation using the scheduled arrival time at each stop. We should provide an option to use either the arrival or the departure time (both are pulled from stop_times.txt). We can pass this in as a command-line parameter.

Steps to reproduce:

Run the tool

Expected behavior:

Allow me to choose whether schedule deviation is calculated using arrival time or departure time

Observed behavior:

Arrival time is always used

Executing project without info.txt causes NullPointerException

If you try to execute the project using directions in the README without creating the info.txt file that holds the database credentials, the application crashes with the following:

$ java -jar target/ontime-performance-calculator.jar target/hart.zip

Processing feed :hart.zip
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Agency
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.ShapePoint
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Route
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Stop
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Trip
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.StopTime
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.ServiceCalendar
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.ServiceCalendarDate
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.FareAttribute
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.FareRule
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Frequency
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Pathway
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Transfer
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.FeedInfo

LoadStatus : SUCCESS
loadFailureReason : null
Exception in thread "main" java.lang.NullPointerException
        at java.io.Reader.<init>(Unknown Source)
        at java.io.InputStreamReader.<init>(Unknown Source)
        at edu.usf.cutr.ontimeperformance.DatabaseConnectionInfo.setFields(DatabaseConnectionInfo.java:47)
        at edu.usf.cutr.ontimeperformance.DatabaseConnectionInfo.<init>(DatabaseConnectionInfo.java:23)
        at edu.usf.cutr.ontimeperformance.FeedProcessor.load(FeedProcessor.java:122)
        at edu.usf.cutr.ontimeperformance.OntimePerformanceMain.main(OntimePerformanceMain.java:28)

We should provide a friendly error message saying to create the file instead of crashing.
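One way to provide that friendly message, sketched under the assumption that info.txt is loaded from the classpath (a null stream reaching InputStreamReader is consistent with the stack trace above). InfoFileLoader and openInfoFile are hypothetical names; the real fix would go into DatabaseConnectionInfo.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: open info.txt from the classpath (e.g., src/main/resources,
// packaged into the JAR) and fail with a readable message instead of a NullPointerException.
class InfoFileLoader {
    static final String INFO_FILE = "info.txt";

    static Reader openInfoFile(String resourceName) {
        // getResourceAsStream returns null, rather than throwing, when the resource is missing
        InputStream stream = InfoFileLoader.class.getClassLoader().getResourceAsStream(resourceName);
        if (stream == null) {
            throw new IllegalStateException("Could not find " + resourceName + " on the classpath. "
                    + "Create it in src/main/resources with server, username, password and database entries "
                    + "(one \"tag : value\" pair per line), then rebuild with mvn install.");
        }
        return new InputStreamReader(stream, StandardCharsets.UTF_8);
    }
}
```

In main, the IllegalStateException could be caught, its message printed to System.err, and the program ended with System.exit(1) instead of a stack trace.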

StartDate and EndDate are coming back as Null

I modified the tool to work with my postgres database:

alankessler@01d10c7

However, I think that's unrelated to the error I'm now getting:

[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Agency
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.ShapePoint
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Route
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Stop
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Trip
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.StopTime
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.ServiceCalendar
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.ServiceCalendarDate
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.FareAttribute
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.FareRule
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Frequency
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Pathway
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.Transfer
[main] INFO org.onebusaway.gtfs.serialization.GtfsReader - reading entities: org.onebusaway.gtfs.model.FeedInfo

LoadStatus of GTFS Feed: SUCCESS

Connected to Database: trimet

GTFS Data Valid Start Date: null

GTFS Data Valid End Date: null
Finished updating table

I've tried with each of these trimet data sets with the same result:
https://developer.trimet.org/schedule/gtfs.zip
https://transitfeeds-data.s3-us-west-1.amazonaws.com/public/feeds/trimet/43/20170714/gtfs.zip
https://transitfeeds-data.s3-us-west-1.amazonaws.com/public/feeds/trimet/43/20170727/gtfs.zip

I'd love any suggestions on how to resolve this.

Thanks,
Alan
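The thread doesn't show what caused the null dates. One thing worth checking is whether the feed defines its service days only in calendar_dates.txt, which GTFS allows; a date range computed from calendar.txt alone would then come back empty. The sketch below computes the range from both files. ServiceDateRange and its inputs are hypothetical; in the tool they could be populated from the ServiceCalendar and ServiceCalendarDate entities that the GtfsReader log shows being loaded. Assumes Java 8 (java.time).

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical sketch: compute the feed's service date range from both calendar files.
class ServiceDateRange {
    final LocalDate start;
    final LocalDate end;

    ServiceDateRange(LocalDate start, LocalDate end) {
        this.start = start;
        this.end = end;
    }

    /**
     * @param calendarRanges {start_date, end_date} pairs from calendar.txt (may be empty)
     * @param addedDates     dates from calendar_dates.txt rows with exception_type = 1 (may be empty)
     * @return the earliest and latest service dates, or null if the feed defines none.
     *         Removed dates (exception_type = 2) are ignored, so the range may be slightly wider than needed.
     */
    static ServiceDateRange compute(List<LocalDate[]> calendarRanges, List<LocalDate> addedDates) {
        LocalDate min = null;
        LocalDate max = null;
        for (LocalDate[] range : calendarRanges) {
            min = earlier(min, range[0]);
            max = later(max, range[1]);
        }
        for (LocalDate date : addedDates) {
            min = earlier(min, date);
            max = later(max, date);
        }
        return min == null ? null : new ServiceDateRange(min, max);
    }

    private static LocalDate earlier(LocalDate current, LocalDate candidate) {
        return current == null || candidate.isBefore(current) ? candidate : current;
    }

    private static LocalDate later(LocalDate current, LocalDate candidate) {
        return current == null || candidate.isAfter(current) ? candidate : current;
    }
}
```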

'ORDER BY' in SELECT SQL query in FeedProcessor.java is not required

Since we retrieve all records from the database that fall between the service start and end dates of the GTFS static data, the ORDER BY clause in the SELECT query is not required.
Moreover, removing ORDER BY from the query returns results from the server faster.
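For illustration, a query builder without the ORDER BY clause might look like this. The column and table names come from the SQL Server query in the "Add unit tests" issue; VehiclePositionQuery is a hypothetical name, and the ? placeholders are meant for a JDBC PreparedStatement bound to the service start and end timestamps.

```java
// Hypothetical sketch: SELECT for vehicle positions in the GTFS service range, without ORDER BY.
class VehiclePositionQuery {
    static String build(Integer maxRecords) {
        StringBuilder sql = new StringBuilder("SELECT ");
        if (maxRecords != null) {
            // TOP (n) is SQL Server syntax; other databases use LIMIT or FETCH FIRST n ROWS ONLY
            sql.append("TOP (").append(maxRecords.intValue()).append(") ");
        }
        // The two parameters are the service start and end timestamps of the static GTFS feed
        sql.append("[oid], [trip_id], [timestamp], [position_latitude], [position_longitude] ")
           .append("FROM [dbo].[vehicle_positions] ")
           .append("WHERE [timestamp] >= ? AND [timestamp] <= ?");
        // No ORDER BY: every row in the range is processed, so server-side sorting only adds cost
        return sql.toString();
    }
}
```

Without ORDER BY, TOP (n) returns an arbitrary n rows rather than, say, the oldest n, which is only fine if the processing doesn't depend on row order.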

Add unit tests

The On-time Performance tool requires a GTFS feed as input.
This tool has been tested with the HART GTFS feed google_transit.zip.

The tool then performs the calculations needed to populate the fields closest_stop_id, distance_to_stop, closest_to_stop, schedule_deviation and timepoint.
The desired output is in output.xlsx.

The query used to view the output in SQL Server is:
SELECT TOP (10000) [oid]
,[trip_id], [timestamp] , [position_latitude], [position_longitude], [distance_to_stop]
,[closest_stop_id], [closest_to_stop], [schedule_deviation], [timepoint]
FROM [gtfsrdb_HART_static_10-17-2016].[dbo].[vehicle_positions] /*databasename.schema.tablename**/
WHERE [timestamp]>='2016-03-27 00:00:00.0' AND [timestamp]<='2017-03-25 00:00:00.0' /*service range of GTFS feed**/
AND trip_id=192226
ORDER BY [oid] DESC

Populating closest_to_stop field in database

closest_to_stop is true (1) if the record's distance_to_stop is the smallest among all records with the same closest_stop_id for a particular trip on a given service day; otherwise it is false (0).

In the image shown, there are five records with the same closest_stop_id for trip 192226 on 10/18/2016. We set closest_to_stop to 1 for the one closest to the stop (i.e., the one with the smallest distance_to_stop) and to 0 for the others.
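The rule above can be sketched in Java as a single pass that tracks, for each (trip_id, service date, closest_stop_id) group, the record with the smallest distance_to_stop. The VehiclePosition class is a hypothetical in-memory stand-in for rows of dbo.vehicle_positions, and the tie-breaking (the first record seen wins) is an assumption.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the closest_to_stop rule described above.
class ClosestToStop {
    // Minimal stand-in for a row of dbo.vehicle_positions (illustrative, not the project's model class)
    static final class VehiclePosition {
        final String tripId;
        final LocalDate serviceDate;
        final String closestStopId;
        final double distanceToStop;
        boolean closestToStop;

        VehiclePosition(String tripId, LocalDate serviceDate, String closestStopId, double distanceToStop) {
            this.tripId = tripId;
            this.serviceDate = serviceDate;
            this.closestStopId = closestStopId;
            this.distanceToStop = distanceToStop;
        }
    }

    static void populate(List<VehiclePosition> positions) {
        // (trip_id, service date, closest_stop_id) -> record with the smallest distance_to_stop so far
        Map<List<Object>, VehiclePosition> nearest = new HashMap<>();
        for (VehiclePosition p : positions) {
            List<Object> key = Arrays.<Object>asList(p.tripId, p.serviceDate, p.closestStopId);
            VehiclePosition best = nearest.get(key);
            if (best == null || p.distanceToStop < best.distanceToStop) {
                nearest.put(key, p); // on ties, the first record seen is kept
            }
        }
        // Every record starts as false (0); the nearest record in each group becomes true (1)
        for (VehiclePosition p : positions) {
            p.closestToStop = false;
        }
        for (VehiclePosition winner : nearest.values()) {
            winner.closestToStop = true;
        }
    }
}
```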

Populating schedule_deviation column in dbo.vehicle_positions table

schedule_deviation = (GPS timestamp - scheduled arrival time from GTFS stop_times.txt), in milliseconds. Positive numbers mean the vehicle is running late, while negative numbers mean the vehicle is arriving early.
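A sketch of this calculation, assuming the scheduled time is a stop_times.txt value interpreted in the agency's time zone (agency_timezone in agency.txt). Per the GTFS reference, these times are measured from "noon minus 12h" on the service day and may exceed 24:00:00 for trips that run past midnight, so the sketch handles both. Class and method names are hypothetical; Java 8 or later.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical sketch of the schedule_deviation formula above.
class ScheduleDeviation {
    /** Parses a GTFS time such as "25:10:00" into seconds after "noon minus 12h". */
    static int parseGtfsTime(String hhmmss) {
        String[] parts = hhmmss.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected HH:MM:SS, got " + hhmmss);
        }
        return Integer.parseInt(parts[0]) * 3600 + Integer.parseInt(parts[1]) * 60 + Integer.parseInt(parts[2]);
    }

    /** Converts a stop_times.txt time on a given service date into an absolute instant. */
    static Instant scheduledInstant(LocalDate serviceDate, String gtfsTime, ZoneId agencyZone) {
        // "Noon minus 12h" equals local midnight except on daylight saving transition days;
        // minusHours/plusSeconds on ZonedDateTime work on the instant time-line.
        ZonedDateTime noonMinus12h = ZonedDateTime.of(serviceDate, LocalTime.NOON, agencyZone).minusHours(12);
        return noonMinus12h.plusSeconds(parseGtfsTime(gtfsTime)).toInstant();
    }

    /** schedule_deviation in milliseconds: positive = running late, negative = early. */
    static long deviationMillis(Instant gpsTimestamp, Instant scheduled) {
        return gpsTimestamp.toEpochMilli() - scheduled.toEpochMilli();
    }
}
```

For example, in the America/New_York zone a scheduled time of 08:30:00 on 2016-10-18 corresponds to 12:30:00 UTC, so a GPS timestamp of 12:32:00 UTC gives a schedule_deviation of 120000 ms (two minutes late).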

Manually installing sqljdbc4 driver not required - remove section from Readme

@mohangandhiGH I cloned the repo and ran mvn install, and it looks like Maven is taking care of downloading the SQL Server JDBC driver from the OneBusAway Maven Repository (if you look in the pom.xml, there is a <dependency> for sqljdbc4).

Here's the log:

...
Downloading: http://nexus.onebusaway.org/content/groups/public/com/microsoft/sqlserver/sqljdbc4/4.0/sqljdbc4-4.0.pom
...
Downloaded: http://nexus.onebusaway.org/content/groups/public/com/microsoft/sqlserver/sqljdbc4/4.0/sqljdbc4-4.0.jar (525 KB at 493.1 KB/sec)
...

So you should be able to remove section 2 of the README, which describes installing the JDBC driver manually.

Move info.txt out of `/target` folder

Currently, the info.txt file that contains the database credentials is assumed to be in the /target directory - see https://github.com/CUTR-at-USF/ontime-performance-calculator#4-run-the-application.

This can cause problems, as the /target directory could potentially be deleted when running mvn clean.

We should move it into the project structure itself, so it can be read when running from an IDE (i.e., on the classpath) as well as when it gets packaged into the JAR after the project is built.

I believe you can do this by moving the info.txt to the src/main/resources directory, and read it from there instead - see http://stackoverflow.com/questions/20389255/reading-a-resource-file-from-within-jar.

I recall there being a catch to this the last time I had to implement it relating to reading the file both within the IDE and within a packaged JAR, but I can't recall exactly what that was. I want to say that the accepted answer to the above StackOverflow question worked, but for some reason I didn't upvote it.

If this becomes difficult, we can always just read directly from the project root folder - the main goal is just to move the file out of /target where it could be deleted.

If we move it elsewhere, we need to add that file name to a .gitignore file in that directory to make sure the credentials don't accidentally get committed and pushed to Github.

Timezone or DST?

I imported a week of captured Trimet data and ran it against the tool. The output looks like:

(d = schedule_deviation/60000, i.e., the deviation in minutes)
The cluster around 60 minutes makes me suspect that there's a timezone or DST issue.
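One way to test the timezone hypothesis: interpreting the same wall-clock time in two zones whose offsets differ by an hour shifts every deviation by exactly 3,600,000 ms, i.e., 60 minutes in the d column. For instance, during daylight saving time, 08:00 in America/Los_Angeles (TriMet's time zone) and 08:00 at a fixed UTC-8 offset are one hour apart. The helper is a hypothetical diagnostic, not part of the tool, and it doesn't establish which conversion the tool actually performs.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical diagnostic: how far apart are two interpretations of the same local time?
class TimeZoneOffsetCheck {
    /** Returns (instant in zoneB) - (instant in zoneA), in milliseconds, for the same wall-clock time. */
    static long offsetDifferenceMillis(LocalDateTime wallClock, ZoneId zoneA, ZoneId zoneB) {
        return Duration.between(wallClock.atZone(zoneA).toInstant(),
                                wallClock.atZone(zoneB).toInstant()).toMillis();
    }
}
```

Calling offsetDifferenceMillis(LocalDateTime.of(2017, 7, 14, 8, 0), ZoneId.of("America/Los_Angeles"), ZoneOffset.ofHours(-8)) returns 3600000, while the same call for a January date returns 0, which is the pattern a DST-unaware conversion would leave in the data.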

Support more databases

Summary:

Currently the project is hard-coded for a MS SQL Server database. We should modify the project to be database-agnostic so any database can be used.

@alankessler made some changes here in his fork to make it work with Postgres:
alankessler@01d10c7
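As a rough sketch of one piece of that work, the JDBC URL could be chosen from a configured database type instead of being hard-coded. The type setting and the JdbcUrls class are hypothetical (info.txt has no such entry today); the URL formats are the standard ones for the Microsoft SQL Server and PostgreSQL JDBC drivers, and the matching driver must be on the classpath (the pom.xml already pulls in sqljdbc4; PostgreSQL would need its own dependency).

```java
import java.util.Locale;

// Hypothetical sketch: pick a JDBC URL format based on a configured database type.
class JdbcUrls {
    static String build(String type, String server, String database) {
        switch (type.toLowerCase(Locale.ROOT)) {
            case "sqlserver":
                // Microsoft SQL Server JDBC driver URL format
                return "jdbc:sqlserver://" + server + ";databaseName=" + database;
            case "postgresql":
                // PostgreSQL JDBC driver URL format; server may include ":port"
                return "jdbc:postgresql://" + server + "/" + database;
            default:
                throw new IllegalArgumentException("Unsupported database type: " + type);
        }
    }
}
```

The resulting URL would then be passed to DriverManager.getConnection(url, username, password). SQL dialect differences, such as TOP (n) versus LIMIT n and bracketed versus double-quoted identifiers, would still need separate handling.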