umdio's Introduction

UMD.io

UMD.io is an open API for the University of Maryland. The main purpose is to give developers easy access to data to build great applications. In turn, developers can improve the University of Maryland with the things they build.

Features

Easy API access to

  • Four years of course data
  • Live Bus data, through NextBus
  • Campus Building names and locations
  • Information on Professors and Faculty
  • Basic info about all Majors

Getting Started

To use the API, please refer to our documentation.

Development

If you're interested in contributing to UMD.io, please read our Contributing guide. To work on umd.io, or to run your own instance, start by forking and cloning this repo.

Setting Up Your Environment

First, install docker and docker-compose. Then, clone the repo along with the umdio-data submodule.

git clone --recurse-submodules https://github.com/umdio/umdio.git

Then, launch the development environment.

# You may need to run docker-related commands with `sudo` if you're a Linux user
docker-compose -f docker-compose-dev.yml up

Once launched, run the scrapers. This will take some time, so in the meantime, review the rest of the guide.

# You may need to run `chmod +x umdio.sh`
./umdio.sh scrape

Credits

See contributors

License

We use the MIT License.

umdio's People

Contributors

beane, chen-justin, donisaac, eveninglily, javathunderman, joshg4096, nickav, nyaculak, rrcobb, rstumbaugh, tybug

umdio's Issues

Testing Issues

Even though testing works now, it's not anywhere near perfect.

For one, the pagination tests are disabled because they failed for some unknown reason with Travis. I'd like to see them re-added, and expanded.

More than that, we should try and aim to have all of the API tested and covered.

After that, we can do a lot of work on speeding up the test run by swapping out some of the backend systems, while making sure the public-facing API keeps working.

Website formatting breaks

Hi, I don't know if this is an appropriate place to report issues with your website, but if you click on any of the links under "API" except "Introduction" or "Tips and Tricks", it highlights the link above the one you clicked. Also, if you click on any of them except "Introduction", and then scroll back to the top, the menu moves up under the "umd.io" link and the space above the "Introduction" heading disappears.

Cleaning database yields error

Running
docker exec -it umdio_umdio_1 bundle exec rake db:clean
yields
rm: cannot remove './data/mongo': No such file or directory

Inaccurate data

Some of the data from past semesters is flat out wrong, i.e., compare this with this. There's a clear discrepancy here; also see https://api.umd.io/v0/courses/sections?semester=201705&course=CMSC131

There are probably more examples, but this is the state of things right now. I'd be willing to bet it's due to a buildup of various errors from the scraper over time; see #63 for some comments on that.

When we deploy the beta endpoint (soon), hopefully the data there will be correct

Special Gen Ed cases

It looks like the current method of parsing the Gen Eds for a course just checks for Gen Ed codes separated by commas. When a course's Gen Ed requirements look like "DSNS or DSSP", this causes the Gen Ed to be parsed as "DSNSDSSP". I would start working on a fix, but I'm not exactly sure how the "or" case should be handled. Just adding it to the Gen Ed array wouldn't be very clear (is it DSNS?), and I think adding "ABCD or WXYZ" as a single entry means the course wouldn't appear in the search results if you were searching something like /courses?gen_ed=ABCD. There are even weirder situations, like "DSNL (if taken with WXYZ123) or DSNS" in the case of AOSC200. Maybe just split by commas, store everything in an array, and do a text search instead of the standard MongoDB search?
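
One possible shape for handling the "or" case, sketched in JavaScript for illustration only (the scraper itself is not written in JavaScript, and `parseGenEds` and `matchesGenEd` are hypothetical names). It assumes every Gen Ed code is exactly four uppercase letters and keeps each comma-separated requirement as a group of alternatives, so a search for either code can still match.

```javascript
// Hypothetical helper: turn a raw Gen Ed string into groups of alternatives.
// Assumption: every Gen Ed code is exactly four uppercase letters.
function parseGenEds(raw) {
  return raw
    .split(",") // each comma-separated part is one requirement
    .map((part) => part.match(/\b[A-Z]{4}\b/g) || []) // codes joined by "or" share a group
    .filter((group) => group.length > 0);
}

// Hypothetical helper: true if the code appears in any parsed group.
function matchesGenEd(groups, code) {
  return groups.some((group) => group.includes(code));
}
```

With this sketch, "DSNS or DSSP" becomes one group holding both codes, and course codes such as WXYZ123 are ignored because the four letters are followed by digits. Conditions like "if taken with" are dropped, so they would need to be stored separately if they matter.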

Unable to get request headers in JavaScript

getAllResponseHeaders() only returns Content-Type and Cache-Control.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Document</title>
</head>
<body>
  <script type="text/javascript">
    var request = new XMLHttpRequest();
    request.open('GET', 'http://api.umd.io/v0/courses', true);

    request.onload = function() {
      if (request.status >= 200 && request.status < 400) {
        // Success!
        var data = JSON.parse(request.responseText);
        console.log(request.getAllResponseHeaders());
      }
    };
    request.send();
  </script>
</body>
</html>

Thanks to @IsabellaS09 for finding this.
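
A possible explanation for the report above, offered as an assumption rather than a confirmed diagnosis: for cross-origin requests, browsers expose only a small safelist of response headers to scripts unless the server lists additional ones in an Access-Control-Expose-Headers response header. To check which headers actually reach the page, the raw string can be parsed into an object. `parseResponseHeaders` is a hypothetical helper name.

```javascript
// Hypothetical helper: parse the CRLF-separated string returned by
// XMLHttpRequest.getAllResponseHeaders() into a plain object keyed by
// lower-cased header name.
function parseResponseHeaders(raw) {
  const headers = {};
  for (const line of raw.trim().split(/[\r\n]+/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue; // skip anything that is not "name: value"
    const name = line.slice(0, idx).trim().toLowerCase();
    headers[name] = line.slice(idx + 1).trim();
  }
  return headers;
}
```

Inside the `onload` handler this could be called as `parseResponseHeaders(request.getAllResponseHeaders())`.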

Consider adding a professor endpoint

Would be awesome if this api provided data on current umd professors, what courses they're teaching this semester, and reviews (which could be pulled from OurUMD or ratemyprofessor). Could look into this myself later, when I have time. Just leaving a note.

Outdated Data

The course data doesn't yet include fall 2016's schedule of classes. (Querying http://api.umd.io/v0/courses?semester>201601 returns an empty array)

Also: would it be possible to set up the course scrapers to run on a daily (or more frequently) basis? It would be useful to have live data on open seats and the waitlist for each class.

Inaccurate messages in sections_scraper.rb

Around line 101 there's a message that keeps track of sections processed. Ideally it should keep track of how many courses are remaining and calculate from there but as of now there's a hard-coded value.
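
As a rough illustration of the idea (in JavaScript, not the scraper's own language), the message could be derived from the real counts instead of a hard-coded value. `progressMessage` and both of its parameters are hypothetical.

```javascript
// Hypothetical helper: build a progress line from real counts rather than
// a hard-coded total. Both arguments are assumed to be non-negative integers.
function progressMessage(processed, total) {
  const remaining = Math.max(total - processed, 0);
  const percent = total === 0 ? 100 : Math.round((processed / total) * 100);
  return `${processed}/${total} courses processed (${percent}%), ${remaining} remaining`;
}
```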

errors at api.umd.io/v0/bus/routes

In http://api.umd.io/v0/bus/routes, every bus route has a double entry EXCEPT:

  • 125 Campus Circulator
  • 129 Franklin Park at Greenbelt Station
  • 130 Greenbelt
  • 137 IKEA
  • 138 Greenbelt
  • 139 NASA Goddard
  • 140 UMB Law
  • 140 UMB Law
  • 141 Gaithersburg Park & Ride
  • 141 Gaithersburg Park & Ride

Every other route needs one of its duplicate entries removed.
Also, many bus routes are deprecated but still show up, such as:

  • 125 Campus Circulator
  • 139 NASA Goddard
  • 138 Greenbelt (is now the 143 Greenbelt)

500 Internal Server Error when using encoded URLs

The server returns a 500 status code when certain characters are used in the URL after being encoded using PHP's urlencode function (or any similar functionality).

For example: The call to http://api.umd.io/v0/courses?credits=2,3& is valid; however, the equivalent URL encoded call to http://api.umd.io/v0/courses?credits%3D2%2C3%26 fails.
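
On the client side, one way to avoid the failing form is to encode only the parameter names and values and leave the "=" and "&" separators alone. The sketch below uses the standard `URLSearchParams` class; `buildCoursesUrl` is a hypothetical helper, and whether the server accepts an encoded comma inside a value is not confirmed by this report.

```javascript
// Hypothetical helper: build a courses URL, encoding only parameter names
// and values. The "=" and "&" separators are left intact.
function buildCoursesUrl(params) {
  const query = new URLSearchParams(params).toString();
  return query
    ? `http://api.umd.io/v0/courses?${query}`
    : "http://api.umd.io/v0/courses";
}
```

For example, `buildCoursesUrl({ credits: "2,3" })` yields `http://api.umd.io/v0/courses?credits=2%2C3`. By contrast, `encodeURIComponent("credits=2,3&")` yields `credits%3D2%2C3%26`, which no longer contains a key/value separator at all.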

DNS issues

When I was setting up HTTPS, there were some issues concerning www.umd.io, specifically the way the DNS is configured. Right now it's not a problem, as it just seems to redirect at the DNS level, but I wanted to make sure this is intended and fine.

Also, I'd like to request a beta.umd.io domain to test things on the production server without breaking what we have.

@rrcobb

Data migration

With the upcoming switch to docker, and planned changes in data storage (mongo -> postgres), we're at risk of losing some of the old course data that doesn't seem to be scrapable anymore. i.e., 2015 data is not on testudo anymore.

In addition, due to older versions of the scraper, some of the older semester class listings on umd.io are missing a lot of data.

I think in the upgrade, we should just get rid of the old data we can't access anymore; create a dump of the existing mongo db and have that downloadable for anyone who wants the data that is there, and going forward look into long-term ways to make sure course data is accessible even when it leaves testudo (possibly using archive.org?)

Enhance map scraper with GIS data

Right now, the buildings scraper just loads JSON that contains building names, codes, and locations. However, this is in no way dynamic, and has to be updated manually. If we wanted to add addresses, for example, it'd be hard.

I've done some research already on how we can go about getting this data -

http://maps.umd.edu/arcgis/rest/services is the service we need and already know about. After some digging, https://maps.umd.edu/arcgis/rest/services/Layers/CampusMapDefault/MapServer/find in specific is what we can use to query for the data. All we need is a building name/code, and to put "BuildingPoint" as the layer. For example, the url
https://maps.umd.edu/arcgis/rest/services/Layers/CampusMapDefault/MapServer/find?searchText=esj&contains=true&searchFields=&sr=&layers=BuildingPoint&layerDefs=&returnGeometry=true&maxAllowableOffset=&geometryPrecision=&dynamicLayers=&returnZ=false&returnM=false&gdbVersion=&returnUnformattedValues=false&returnFieldName=false&datumTransformations=&layerParameterValues=&mapRangeValues=&layerRangeValues=&f=json

gives us the data for ESJ. I haven't quite figured out how to get a list of all the buildings, as it doesn't allow the search text to be empty. It does accept a space, but I need to do some research to see if there are any buildings that don't have a space in their name. We should also figure out what data from here we want to gather and display, and adjust the query parameters accordingly. From there, it's just grabbing the data and storing it.
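
A minimal sketch of building that query in code, using only parameters that appear with a value in the example URL above. `buildFindUrl` is a hypothetical helper, and leaving out the empty parameters assumes the service treats them as optional.

```javascript
// Hypothetical helper: build a "find" query for one building.
// Only parameters with a value in the example URL are sent; the empty
// ones are omitted on the assumption that the service treats them as optional.
function buildFindUrl(searchText) {
  const base =
    "https://maps.umd.edu/arcgis/rest/services/Layers/CampusMapDefault/MapServer/find";
  const params = new URLSearchParams({
    searchText,
    contains: "true",
    layers: "BuildingPoint",
    returnGeometry: "true",
    f: "json",
  });
  return `${base}?${params.toString()}`;
}
```

Calling `buildFindUrl("esj")` produces the same endpoint with `searchText=esj&contains=true&layers=BuildingPoint&returnGeometry=true&f=json` as the query string.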

Unexpected in Courses endpoint

Sending a GET request to http://api.umd.io/v0/courses returns results that start in the Cs.

[  
   {  
      "course_id":"CPPL100",
      "name":"College Park Scholars: Public Leadership First-Year Colloquium I",
      "dept_id":"CPPL",
      "department":"College Park Scholars-Public Leadership",
      "semester":"201508",
      "credits":"1",
      ...
]


As default sort is ascending, I would expect it to start at A, as in the documentation.

Also, a GET request to http://api.umd.io/v0/courses with no params, according to the documentation, should return an Array of objects with three properties: course_id, name, and department.

[
  {
    "course_id": "AASP100",
    "name": "Introduction to African American Studies",
    "department": "African American Studies"
  },
  {
    "course_id": "AASP101",
    "name": "Public Policy and the Black Community",
    "department":"African American Studies"
  }
]

Instead, an array of full Course objects is returned. Is this intended?

Setup Docker to run in production

The current Docker system is set up for development. We need to figure out a plan to move away from Vagrant and deploy the app in production with Docker.

Data Entry - Scraping University Resources vs Making Custom Frontend

You guys are doing a great job with the api!

Looked through the ideas doc- curious about your thoughts on continuing to serve data by scraping university resources, vs building a frontend for keeping track of the new data types you're proposing (assuming they're not already available on the web to be scraped).

This is something I'm working on up north as well- currently just using google spreadsheets as the site of data entry. I just think you guys are laying down a really good boilerplate on rails and am curious whether you've seen any good data entry frontends for university data applications.

Dead Links

Issues when installing via Vagrant

Here's a paste of the output from attempting vagrant up.

bundle exec rake up still works and I can access the API through localhost:3000. However, navigating any endpoint doesn't return the expected data.

Jekyll serve not running

From @rstumbaugh:
I still can't seem to get jekyll running on my machine. I'm running jekyll serve from the docs/ folder, but it looks like the _config.yml is in the src/ folder. Even when I put the config file in the docs/ folder, jekyll can't find any of the posts and the Sass doesn't get compiled. Any ideas?

Bus Scraper Dupes

Follow-up to #35

Somehow, a lot of duplicate routes got into the bus collection. We should figure out how this happened, and how to avoid it in the future.
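
As a stopgap while the cause is investigated, duplicates could be collapsed by route ID. This is a sketch, not the scraper's code: `dedupeRoutes` is a hypothetical helper, and the `route_id` field name is an assumption about the shape of a route object.

```javascript
// Hypothetical helper: keep the first route seen for each route_id.
// Assumption: every route object carries a unique "route_id" string.
function dedupeRoutes(routes) {
  const seen = new Map();
  for (const route of routes) {
    if (!seen.has(route.route_id)) {
      seen.set(route.route_id, route);
    }
  }
  return Array.from(seen.values());
}
```

On the storage side, writing each route as an upsert keyed on the same ID would be one way to keep repeated scraper runs from inserting a second copy.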

Certain methods do not support ?page and ?per_page parameters

The http://api.umd.io/v0/courses/list method does not support ?page or ?per_page parameters. Per the documentation, it seems like, since this endpoint returns a large number of items (4261 for 2015 spring semester) that it should support pagination.
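
For illustration, the slicing behind `page` and `per_page` could look like the sketch below. `paginate` is a hypothetical helper; 1-indexed pages and a fallback of 30 items per page are assumptions, not the API's documented behavior.

```javascript
// Hypothetical helper: return one page of a list.
// Assumptions: pages are 1-indexed; invalid or missing values fall back to
// page 1 and 30 items per page (the default size here is illustrative).
function paginate(items, page = 1, perPage = 30) {
  const p = Number.isInteger(page) && page > 0 ? page : 1;
  const size = Number.isInteger(perPage) && perPage > 0 ? perPage : 30;
  const start = (p - 1) * size;
  return items.slice(start, start + size);
}
```

A page past the end of the list returns an empty array, which gives clients a simple stopping condition.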

Not returning errors

Hey, so far it's got great response times and is working out great. However, I did find a little bug. Looking up courses that don't exist returns a 200 with null. For example,

http://api.umd.io/v0/courses/ENES100 returns 200 with valid json
http://api.umd.io/v0/courses/ENES101 returns 200 with null

It should instead return a 400 with an error code.

Professors API doesn't sort by department?

http://api.umd.io/v0/professors?department=CMSC,BMGT&sort=-name returns an empty array, [] instead of the list of professors in the CMSC and BMGT department (sorted by name) (note that this is the example provided on the website).

With some further testing, here's a table with some examples of the parameters that do and do not work for the professors API:

Call                                                                       Works?
http://api.umd.io/v0/professors?department=CMSC                            FALSE
http://api.umd.io/v0/professors?semester=201801                            TRUE
http://api.umd.io/v0/professors?sort=name                                  TRUE
http://api.umd.io/v0/professors?semester=201801&sort=name                  TRUE
http://api.umd.io/v0/professors?department=CMSC&sort=name                  FALSE
http://api.umd.io/v0/professors?department=CMSC&semester=201801            FALSE
http://api.umd.io/v0/professors?semester=201801&department=CMSC&sort=name  FALSE

The Call column shows the GET request made, and the Works? column indicates whether the call returned anything.

It appears that filtering professors by department doesn't work as intended.

Multiple courses picked up as one

On this page: http://api.umd.io/v0/courses?page=91

There is a course_id: "GVPT388WGVPT389GVPT390GVPT396GVPT397GVPT409CGVPT409EGVPT409HGVPT409IGVPT409OGVPT421GVPT423HGVPT429EGVPT432GVPT439AGVPT454GVPT456GVPT459EGVPT459MGVPT459OGVPT459RGVPT473GVPT479DGVPT479LGVPT479PGVPT479YGVPT482HGVPT722GVPT729MGVPT743GVPT771GVPT799GVPT808BGVPT808CGVPT831GVPT873GVPT888BGVPT888EGVPT898GVPT899"

It appears that the rest of the fields are just a concatenation of the information for all of those courses.
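
To see which courses were merged, the concatenated ID can be split back apart. This is a diagnostic sketch, not a fix for the scraper: `splitCourseIds` is a hypothetical helper, and it assumes an ID is four uppercase letters, three digits, and at most one suffix letter.

```javascript
// Hypothetical helper: split a run of concatenated course IDs.
// Assumption: an ID is four uppercase letters, three digits, and at most
// one optional suffix letter (e.g. "GVPT388W").
function splitCourseIds(joined) {
  // The lookahead keeps the optional suffix letter from swallowing the
  // first letter of the next ID.
  const pattern = /[A-Z]{4}\d{3}[A-Z]?(?=[A-Z]{4}\d{3}|$)/g;
  return joined.match(pattern) || [];
}
```

For example, `splitCourseIds("GVPT388WGVPT389GVPT390")` returns `["GVPT388W", "GVPT389", "GVPT390"]`.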

Enable HTTPS

Many browsers will block mixed content, so it would be ideal if the API could serve over HTTPS so it plays nicely with browsers and we aren't telling users to turn off their security settings to use a site.

Needs Spring 2017

Hi team, I find your service very useful. It needs the 2017 Spring data.

Malformed JSON response to buildings list query

When I query the buildings list, JavaScript's standard JSON.parse fails due to an unexpected comma, with the error:

undefined:1
88","lat":"38.9902169"},{"name":"J.H. Kehoe Track and Ludwig Field",,"code":""
                                                                    ^
SyntaxError: Unexpected token ,
  at Object.parse (native)
  at IncomingMessage.<anonymous> (/home/adrusi/Code/umdio/index.coffee:19:24)
  at IncomingMessage.emit (events.js:129:20)
  at _stream_readable.js:908:16
  at process._tickCallback (node.js:355:11)

200 status returned with a 404 error code

For the following 404 examples in the documentation,

http://api.umd.io/v5/courses
http://api.umd.io/courses
http://api.umd.io/v0/course
http://api.umd.io/v0/courses/ENGL115

the status code returned is 200 OK. However, the JSON key error_code returns:

error_code : 404

Since the example for a 400 error,

http://api.umd.io/v0/courses/ene10

returns a 400 status code along with the JSON key error_code

error_code : 400

should this be the same for 404 errors? i.e. a 404 status code being returned as well?
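
Until the status line and the body agree, a client can guard against the mismatch by checking the body as well. This is a sketch of a client-side workaround, not a description of the API: `effectiveStatus` is a hypothetical helper, and it assumes error responses carry a numeric `error_code` key as in the examples above.

```javascript
// Hypothetical helper: decide which status a client should act on.
// Assumption: error responses carry a numeric "error_code" key, as in the
// examples above; anything else is treated as a normal payload.
function effectiveStatus(httpStatus, body) {
  if (body !== null && typeof body === "object" && Number.isInteger(body.error_code)) {
    return body.error_code;
  }
  return httpStatus;
}
```

With this check, a 200 response whose body contains `error_code: 404` is handled as a 404, while ordinary arrays and objects keep the HTTP status they arrived with.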

Unable to run docs server locally

I've tried to run the docs server locally, but there seems to be a problem with the configuration with regards to the styles. Executing the make file results in:
jekyll build -s src/ -d public/
Configuration file: src/_config.yml
Source: src/
Destination: public/
Incremental build: disabled. Enable with --incremental
Generating...
jekyll 3.6.0 | Error: No such file or directory @ rb_sysopen - /home/nick/dev/web/umdio/docs/src/css/main.scss

Use proxy for bus requests

Yo! Umd.io is looking sweet. I saw that you guys are working on the Bus API and it looks pretty cool. One thing I noticed with a43102b is that there's another request on top of the request to umd.io. For speed, I think you should add nginx as a reverse proxy in front of the bus API. For example, I set up this server on DO.

http://45.55.130.81/bus/?a=umd&command=routeList

You could add your domain name, and basically forward any request from http://umd.io/bus --> http://webservices.nextbus.com/.

Here's the nginx config I used. It's really simple and effective.

location /bus/ {
    proxy_pass http://webservices.nextbus.com/service/publicJSONFeed;
}

Otherwise, things look awesome, keep up the good work.

Issue with bus locations

Getting an interesting error:

{"lastTime":{"time":"0"},"copyright":"All data copyright University of Maryland 2015.","Error":{"content":"last time \"t\" parameter must be specified in query string","shouldRetry":"false"}}

from visiting http://api.umd.io/v0/bus/routes/115/locations per http://umd.io/bus/#route_locations
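
The error text suggests the upstream NextBus request was sent without its `t` parameter. A sketch of a request that includes it follows; `buildVehicleLocationsUrl` is a hypothetical helper, the feed URL and the `umd` agency tag are taken from the proxy discussion elsewhere on this page, and the `r` parameter name and the use of 0 as the default for `t` are assumptions based on the NextBus feed conventions rather than anything confirmed here.

```javascript
// Hypothetical helper: build a NextBus vehicleLocations request.
// "t" is the time (epoch milliseconds) of the last update the caller saw;
// 0 is used as the default when there is no earlier timestamp (an assumption).
function buildVehicleLocationsUrl(routeTag, lastTime = 0) {
  const params = new URLSearchParams({
    command: "vehicleLocations",
    a: "umd",
    r: String(routeTag),
    t: String(lastTime),
  });
  return `http://webservices.nextbus.com/service/publicJSONFeed?${params.toString()}`;
}
```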

Inaccurate and conflicting documentation for courses

On the main page of your website, under "Sample requests and responses", you have:

http://api.umd.io/v0/courses/ENGL115
200: Returns empty braces, because ENGL115 is not a real course. If your app thinks it’ll get a real response here, it might break.

This conflicts with the documentation under "Courses", which says:

Returns: The course object specified, or null

Which also conflicts with the actual response returned by "http://api.umd.io/v0/courses/ENGL115", which is:

{"error_code":404,"message":"Course with course_id ENGL115 not found!","available_courses":"http://api.umd.io/v0/courses","docs":"http://umd.io/courses/"}
