znmeb / gtfs-collector
A Docker image to collect General Transit Feed Specification (GTFS) data
License: MIT License
The only piece of the puzzle that needs to run continuously on a cloud server is collecting the real-time data as it's published. Everything else can be done on the desktop or another server. This also dramatically reduces the disk space the cloud server needs.
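To make the "only real-time collection runs continuously" idea concrete, here is a minimal sketch of that always-on piece: a polling loop. It is not the project's code. `collect_forever`, `fetch` and `store` are hypothetical names; `fetch` stands for something like an HTTP GET of the GTFS-realtime feed and `store` for an insert into PostgreSQL. Each poll is scheduled against a fixed grid, so slow fetches don't make the polling drift, and network errors are logged rather than allowed to kill the collector.

```python
import time
from typing import Callable, Optional


def collect_forever(
    fetch: Callable[[], bytes],
    store: Callable[[bytes], None],
    interval_s: float = 30.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll a feed on a fixed schedule and return how many payloads were stored.

    With max_attempts=None (the default) the loop runs until the process is stopped.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    stored = 0
    attempts = 0
    next_due = clock()
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            store(fetch())
            stored += 1
        except OSError as exc:
            # Network hiccups (urllib's URLError is an OSError subclass) are logged, not fatal.
            print(f"poll {attempts} failed: {exc}")
        # Advance along a fixed grid instead of sleeping a full interval after each fetch.
        next_due += interval_s
        sleep(max(0.0, next_due - clock()))
    return stored
```

In practice `fetch` could be as simple as `lambda: urllib.request.urlopen(url, timeout=10).read()`. Keeping the loop this small is the point: everything heavier (static GTFS loading, analysis) stays off the server.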
This would have two pieces:
`gtfsdb`
executable.

Describe the bug
Executing `gtfsrdb` sometimes works and sometimes doesn't. It's obviously an environment issue.
To Reproduce
Steps to reproduce the behavior:
Clone the repo and execute it. The error has something to do with the order in which it creates tables: if it tries to create a table with a foreign key before the table that key references has been created, it crashes.
Expected behavior
Collecting data begins.
Additional context
It usually works with the native Python on Arch, but it needs to run in containers.
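If the crash really does come from creating a child table before its parent, the general fix is to create tables in foreign-key dependency order. A minimal sketch using Python's standard `graphlib` module (Python 3.9+) follows. The table names and the `FOREIGN_KEYS` map are illustrative assumptions, not gtfsrdb's actual schema, and `creation_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Illustrative schema only: each table maps to the set of tables its foreign keys reference.
FOREIGN_KEYS: dict[str, set[str]] = {
    "trip_updates": set(),
    "stop_time_updates": {"trip_updates"},
    "vehicle_positions": set(),
    "alerts": set(),
    "entity_selectors": {"alerts"},
}


def creation_order(fk_graph: dict[str, set[str]]) -> list[str]:
    """Return table names so every referenced table comes before the tables that reference it.

    graphlib treats the mapped sets as predecessors, so static_order() yields parents first.
    A circular foreign-key chain raises graphlib.CycleError instead of crashing mid-creation.
    """
    return list(TopologicalSorter(fk_graph).static_order())


for table in creation_order(FOREIGN_KEYS):
    print("CREATE TABLE", table)
```

Computing the order up front makes table creation deterministic, which is exactly what a "sometimes works, sometimes doesn't" bug needs.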
Ideally, there would be something that could work in a "forever free" mode. The biggest hurdle is likely to be disk space; most free-tier services come with small SSDs.
One for PostGIS, one for `gtfsdb`, and one for `gtfsrdb`. This will help isolate the crashes and other bugs.
The intended use case is that the user starts the container collecting data, then accesses it via a PostgreSQL client library from R, Python, Julia, or some other data science language. Since collection is supposed to be continuous (we wouldn't need a server otherwise), there needs to be a way to periodically back up and truncate the database, or some kind of round-robin scheme with roll-ups, perhaps daily or weekly. Monthly seems too long; weekly feels right to me.
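A weekly round-robin scheme mostly comes down to bucketing rows by week and deciding which old buckets to back up, roll up, and truncate. The sketch below shows only that bookkeeping. The Monday-00:00-UTC week boundary, the four-week retention, and the names `week_start` and `weeks_to_roll_up` are all assumptions for illustration; the actual backup and truncation would happen in PostgreSQL (for example by dropping weekly partitions).

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def week_start(ts: datetime) -> datetime:
    """Monday 00:00 UTC of the ISO week containing ts (ts must be timezone-aware)."""
    ts = ts.astimezone(timezone.utc)
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def weeks_to_roll_up(
    week_starts: Iterable[datetime], now: datetime, keep_weeks: int = 4
) -> list[datetime]:
    """Weeks older than the newest keep_weeks (current week included), oldest first.

    These are the buckets to back up, summarize, and then truncate.
    """
    cutoff = week_start(now) - timedelta(weeks=keep_weeks - 1)
    return sorted(w for w in week_starts if w < cutoff)


now = datetime(2024, 5, 15, 12, 30, tzinfo=timezone.utc)  # a Wednesday
existing = [week_start(now) - timedelta(weeks=i) for i in range(6)]
print(weeks_to_roll_up(existing, now))  # the two oldest weeks, starting April 8 and April 15
```

Keeping the retention window as a parameter makes it easy to try daily or monthly variants if weekly turns out to be wrong for the available disk.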
This is required to persist data in the event of an unplanned container shutdown.
That way, we can use experimental TriMet feeds.
There's a lot of good transit code out there in R, and it would relieve me of having to troubleshoot other people's Python code. And it would make a cool workshop for R and TriMet nerds in Portland. ;-)
Not sure it's a necessity given `psql`, but it's nice to have.
The intended use case is that a user deploys this container, tells it the URLs for the GTFS real-time feed and any authentication secrets required to connect. Then the container starts collecting data into the PostgreSQL database.
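One common way to "tell" a container its feed URLs and secrets is through environment variables (for example with `docker run --env-file`). Here is a minimal sketch of the container side under that assumption. The variable names `GTFSR_FEED_URLS`, `GTFSR_API_KEY` and `GTFSR_POLL_SECONDS`, and the `CollectorConfig`/`load_config` names, are hypothetical and not part of the project.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class CollectorConfig:
    feed_urls: list[str]
    # repr=False keeps the secret out of log lines that print the config.
    api_key: Optional[str] = field(default=None, repr=False)
    poll_seconds: int = 30


def load_config(env: Mapping[str, str] = os.environ) -> CollectorConfig:
    """Build the collector config from (hypothetical) environment variables."""
    # Comma-separated list of GTFS-realtime feed URLs; blanks are ignored.
    urls = [u.strip() for u in env.get("GTFSR_FEED_URLS", "").split(",") if u.strip()]
    if not urls:
        raise ValueError("GTFSR_FEED_URLS must list at least one feed URL")
    return CollectorConfig(
        feed_urls=urls,
        api_key=env.get("GTFSR_API_KEY") or None,  # empty string means "no key"
        poll_seconds=int(env.get("GTFSR_POLL_SECONDS", "30")),
    )
```

Failing fast on a missing feed URL means a misconfigured container stops immediately instead of silently collecting nothing.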
As of now, if the container dies, we lose the data. So we need a volume.
Not sure where to prioritize this, but there needs to be a bare-metal version, at least to create the database populated by `gtfsdb`. I'm building the Linux version with Conda so it will be Windows-ready.
On a DigitalOcean mini-droplet with 1 GB of RAM, the `gtfsdb` (non-real-time) collector crashes with "Killed" (presumably the kernel's out-of-memory killer). It runs on the workstation, where there appears to be a steady-state requirement of 2.5 GB and a spike over 5 GB! So the plan for deployment is to collect only real-time data.