umdio / umdio

An open API for the University of Maryland

Home Page: https://umd.io

License: MIT License

Ruby 99.86% Dockerfile 0.11% Shell 0.04%

maryland umd api university ruby openapi

umdio's Introduction

UMD.io

UMD.io is an open API for the University of Maryland. The main purpose is to give developers easy access to data to build great applications. In turn, developers can improve the University of Maryland with the things they build.

Features

Easy API access to:

Four years of course data
Live Bus data, through NextBus
Campus Building names and locations
Information on Professors and Faculty
Basic info about all Majors

Getting Started

To use the API, please refer to our documentation.
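As a quick illustration, here is a minimal Ruby sketch that fetches a single course using only the standard library. It assumes the `v0` URL shape shown in the issues below (for example `http://api.umd.io/v0/courses/ENES100`). The helper names `course_uri` and `fetch_course` are hypothetical, and the live API may use a different version, so check the documentation first.

```ruby
require "net/http"
require "json"
require "uri"

# Base URL taken from the v0 examples on this page (an assumption; the live API may differ).
API_BASE = "http://api.umd.io/v0"

# Hypothetical helper: build the URI for one course, e.g. /courses/ENES100.
def course_uri(course_id)
  URI("#{API_BASE}/courses/#{URI.encode_www_form_component(course_id)}")
end

# Hypothetical helper: GET the course and parse the JSON body.
# Returns nil on a non-2xx response (or when the body is JSON null).
def fetch_course(course_id)
  response = Net::HTTP.get_response(course_uri(course_id))
  return nil unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body)
end
```

Usage: `p fetch_course("ENES100")` prints the parsed course hash if the endpoint responds.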

Development

If you're interested in contributing to UMD.io, please read our Contributing guide. To work on umd.io, or to run your own instance, start by forking and cloning this repo.

Setting Up Your Environment

First, install Docker and Docker Compose. Then, clone the repo along with the umdio-data submodule.

git clone --recurse-submodules https://github.com/umdio/umdio.git

Then, launch the development environment.

# You may need to run docker-related commands with `sudo` if you're a linux user
docker-compose -f docker-compose-dev.yml up

Once launched, run the scrapers. This will take some time, so in the meantime, review the rest of the guide.

# You may need to run `chmod +x umdio.sh`
./umdio.sh scrape

Credits

See contributors

License

We use the MIT License.

umdio's Issues

Testing Issues

Even though testing works now, it's not anywhere near perfect.

For one, the pagination tests are disabled because they failed on Travis for an unknown reason. I'd like to see them re-added and expanded.

More than that, we should aim to have the entire API tested and covered.

After that, we can do a lot of work on speeding this up by swapping out some of the backend systems, while making sure the front-facing API keeps working.

Website formatting breaks

Hi, I don't know if this is an appropriate place to report issues with your website, but if you click on any of the links under "API" except "Introduction" or "Tips and Tricks", it highlights the link above the one you clicked. Also, if you click on any of them except "Introduction", and then scroll back to the top, the menu moves up under the "umd.io" link and the space above the "Introduction" heading disappears.

Cleaning database yields error

Running
docker exec -it umdio_umdio_1 bundle exec rake db:clean
yields
rm: cannot remove './data/mongo': No such file or directory

last time "t" parameter must be specified in query string

Running into an issue with many bus-related methods where the resulting response contains a key as follows:

"Error":{"content":"last time \"t\" parameter must be specified in query string","shouldRetry":"false"}

Inaccurate data

Some of the data from past semesters is flat-out wrong, e.g., compare this with this. There's a clear discrepancy here; also see https://api.umd.io/v0/courses/sections?semester=201705&course=CMSC131

There are probably more examples, but this is the state of things right now. I'd be willing to bet it's due to a buildup of various errors from the scraper over time; see #63 for some comments on that.

When we deploy the beta endpoint (soon), hopefully the data there will be correct.

Ignoring sentences directly after "Prerequisites: ..." and other relationships

If you look at a course like ENEE150, the scraper ignores the sentence directly after the one containing "Prerequisite:", even though that sentence is a continuation of the prerequisites.

Special Gen Ed cases

Looks like the current method of parsing a course's Gen Eds just checks for Gen Ed codes separated by commas. When a course's Gen Ed requirements look like "DSNS or DSSP", the Gen Ed is parsed as "DSNSDSSP". I would start working on a fix, but I'm not exactly sure how the "or" case should be handled. Just adding it to the Gen Ed array wouldn't be very clear (is it DSNS …), but I think adding "ABCD or WXYZ" means the course wouldn't appear in the search results if you searched for something like /courses?gen_ed=ABCD. There are even weirder situations, like "DSNL (if taken with WXYZ123) or DSNS" in the case of AOSC200. Maybe just split by commas, store everything in an array, and do a text search instead of the standard MongoDB search?
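One possible way to handle the "or" case, sketched in Ruby: keep each comma-separated requirement as a group of alternatives, keep any parenthesized condition as text, and also derive a flat list of codes so a filter like `gen_ed=DSNS` can still match. The struct and method names are hypothetical. The sketch assumes Gen Ed codes are four capital letters and that conditions contain neither commas nor the word "or".

```ruby
# Hypothetical record for one alternative inside a Gen Ed requirement.
GenEdOption = Struct.new(:code, :condition)

# Parses e.g. "DSNL (if taken with AOSC201) or DSNS, SCIS" into
# [[DSNL (cond), DSNS], [SCIS]]: one inner array per comma-separated group.
def parse_gen_eds(text)
  text.split(",").map(&:strip).reject(&:empty?).map do |group|
    group.split(/\s+or\s+/).map do |alt|
      alt = alt.strip
      # Four-letter code, optionally followed by "(condition text)".
      m = alt.match(/\A([A-Z]{4})(?:\s*\((.+)\))?\z/)
      m ? GenEdOption.new(m[1], m[2]) : GenEdOption.new(alt, nil)
    end
  end
end

# Flat, de-duplicated list of codes for simple membership searches.
def gen_ed_codes(groups)
  groups.flat_map { |group| group.map(&:code) }.uniq
end
```

Storing both shapes keeps the structured meaning ("one of these") while the flat list supports the simple search described above.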

Unable to get response headers in JavaScript

getAllResponseHeaders() only returns Content-Type and Cache-Control.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Document</title>
</head>
<body>
  <script type="text/javascript">
    var request = new XMLHttpRequest();
    request.open('GET', 'http://api.umd.io/v0/courses', true);

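    request.onload = function() {
      if (request.status >= 200 && request.status < 400) {
        // Success!
        var data = JSON.parse(request.responseText);
        // For a cross-origin request, the browser only exposes CORS-safelisted
        // response headers here unless the server lists others in
        // Access-Control-Expose-Headers.
        console.log(request.getAllResponseHeaders());
      }
    };
    request.send();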
  </script>
</body>
</html>

Thanks to @IsabellaS09 for finding this.
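The behavior is likely the browser's CORS rules rather than a bug in the page. For a cross-origin request, `getAllResponseHeaders()` only returns CORS-safelisted response headers (Content-Type and Cache-Control are among them) unless the server names other headers in `Access-Control-Expose-Headers`. Below is a minimal Rack-style middleware sketch that adds that header. The class name and the header names in the usage line are hypothetical, since this page doesn't say which headers the API would need to expose.

```ruby
# Minimal Rack-style middleware (defining it needs no gems): wraps an app and
# adds Access-Control-Expose-Headers so browser scripts may read the listed headers.
class ExposeHeaders
  def initialize(app, header_names)
    @app = app
    @exposed = header_names.join(", ")
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Lowercase key: valid for Rack 3 and harmless for Rack 2 (names are case-insensitive).
    headers["access-control-expose-headers"] = @exposed
    [status, headers, body]
  end
end
```

In a `config.ru`, this would be mounted with something like `use ExposeHeaders, ["Link", "X-Total-Count"]`. The response also still needs the usual `Access-Control-Allow-Origin` header for the request to succeed cross-origin.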

Consider adding a professor endpoint

It would be awesome if this API provided data on current UMD professors, what courses they're teaching this semester, and reviews (which could be pulled from OurUMD or ratemyprofessor). I could look into this myself later, when I have time. Just leaving a note.

Gened error

AOSC200's geneds are: DSNL (if taken with AOSC201) or DSNS, SCIS

I think umd.io has problems scraping that. This is what the API says the geneds for that course are: ["DSNL(fkwhAOSC201)DSNS","SCIS"]

https://api.umd.io/v0/courses?semester=201808&page=6

Outdated Data

The course data doesn't yet include fall 2016's schedule of classes. (Querying http://api.umd.io/v0/courses?semester>201601 returns an empty array)

Also: would it be possible to set up the course scrapers to run on a daily (or more frequently) basis? It would be useful to have live data on open seats and the waitlist for each class.

Inaccurate messages in sections_scraper.rb

Around line 101 there's a message that tracks how many sections have been processed. Ideally it should track how many courses remain and calculate progress from there, but right now it uses a hard-coded value.
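A minimal sketch of deriving the message from the actual total instead of a hard-coded count. `progress_message` is a hypothetical helper, not code from sections_scraper.rb.

```ruby
# Hypothetical helper: build the progress line from the real total.
def progress_message(done, total)
  remaining = total - done
  # Guard against division by zero when there is nothing to process.
  percent = total.zero? ? 100.0 : (done * 100.0 / total).round(1)
  "Processed #{done}/#{total} courses (#{percent}%), #{remaining} remaining"
end
```

Inside the scraping loop, this could be called as `courses.each_with_index { |course, i| ...; puts progress_message(i + 1, courses.size) }`, so the total always matches what was actually fetched.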

errors at api.umd.io/v0/bus/routes

In http://api.umd.io/v0/bus/routes, every bus route has a double entry EXCEPT:

125 Campus Circulator
129 Franklin Park at Greenbelt Station
130 Greenbelt
137 IKEA
138 Greenbelt
139 NASA Goddard
140 UMB Law
140 UMB Law
141 Gaithersburg Park & Ride
141 Gaithersburg Park & Ride

All other routes need one of their duplicate entries removed.
Also, many deprecated bus routes still show up, such as:

125 Campus Circulator
139 NASA Goddard
138 Greenbelt (is now the 143 Greenbelt)
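A sketch of removing the duplicates and keeping future scrapes idempotent. It assumes each route is a hash with a `"route_id"` key, which is a hypothetical field name. Keying on the route ID means a re-scrape overwrites an entry instead of appending a second copy.

```ruby
# Keep the first entry seen for each route_id and drop the rest.
def dedupe_routes(routes)
  routes.uniq { |route| route["route_id"] }
end

# When storing a fresh scrape, index by route_id so re-runs replace entries
# rather than adding duplicates. Later (scraped) entries win.
def merge_scraped_routes(existing, scraped)
  by_id = existing.to_h { |route| [route["route_id"], route] }
  scraped.each { |route| by_id[route["route_id"]] = route }
  by_id.values
end
```

Deprecated routes would still need a separate check against the current route list from NextBus, which this sketch doesn't cover.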

Professor scraper should ignore TBA

If a course section does not have a professor yet, the name is listed as "Instructor: TBA". The scraper treats this as a regular professor name, but it should probably be ignored. See http://api.umd.io/v0/professors?name=Instructor:%20TBA
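A minimal filter sketch for the scraper. It drops placeholder names before professors are stored. The pattern also accepts a bare "TBA", which is an assumption beyond the "Instructor: TBA" string reported here, and `real_instructors` is a hypothetical helper.

```ruby
# Matches "Instructor: TBA" (reported) and a bare "TBA" (assumed), case-insensitively.
PLACEHOLDER_INSTRUCTOR = /\A(?:instructor:\s*)?tba\z/i

# Hypothetical helper: strip names and drop empty or placeholder entries.
def real_instructors(names)
  names.map(&:strip).reject { |name| name.empty? || PLACEHOLDER_INSTRUCTOR.match?(name) }
end
```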

Courses without description blocks missing?

Courses like PHIL209E cannot be found by the API.

course_scraper.rb should be updated to get all available semesters from Testudo

Currently, course_scraper.rb guesses which semesters are available to scrape based on the current month and year. Instead of guessing, it should scrape the term drop-down menu in Testudo's Schedule of Classes to determine which semesters to scrape.
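A rough sketch of reading term codes from the drop-down. It assumes the menu renders options like `<option value="201808">Fall 2018</option>`, which is a guess about Testudo's markup. The YYYYMM codes match the semester format used throughout this page. A real scraper should use an HTML parser such as Nokogiri instead of a regex.

```ruby
# Assumed markup: <option ... value="YYYYMM" ...>; captures the six-digit term code.
TERM_OPTION = /<option[^>]*\bvalue="(\d{6})"/

# Returns the unique term codes found in the page, in ascending order.
def available_semesters(html)
  html.scan(TERM_OPTION).flatten.uniq.sort
end
```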

500 Internal Server Error when using encoded URLs

The server returns a 500 status code when certain characters are used in the URL after being encoded using PHP's urlencode function (or any similar functionality).

For example: The call to http://api.umd.io/v0/courses?credits=2,3& is valid; however, the equivalent URL encoded call to http://api.umd.io/v0/courses?credits%3D2%2C3%26 fails.
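The two URLs aren't actually equivalent. Encoding the whole query also escapes the `=` and `&` separators, so the server receives a single parameter named `credits=2,3&` with an empty value. The 500 is still worth fixing (a 400 would be more helpful), but clients should encode each value, not the entire query. A sketch using Ruby's standard `URI.encode_www_form`; `courses_url` is a hypothetical helper:

```ruby
require "uri"

# Encode each key/value pair separately; "=" and "&" stay literal separators.
def courses_url(params)
  "http://api.umd.io/v0/courses?#{URI.encode_www_form(params)}"
end

# Decoding the fully encoded query from the report shows what the server sees:
# one key "credits=2,3&" with an empty value.
def decoded_pairs(query)
  URI.decode_www_form(query)
end
```

With this approach, the comma inside the value becomes `%2C` while the query structure survives.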

DNS issues

When I was setting up HTTPS, there were some issues concerning www.umd.io, specifically the way the DNS is configured. Right now it's not a problem, since it just seems to redirect at the DNS level, but I wanted to make sure this is intended and fine.

Also, I'd like to request a beta.umd.io domain to test things on the production server with, without breaking what we have

@rrcobb

Data migration

With the upcoming switch to Docker and the planned changes in data storage (Mongo -> Postgres), we're at risk of losing some of the old course data that no longer seems to be scrapable, e.g., 2015 data is no longer on Testudo.

In addition, due to older versions of the scraper, some of the older semester class listings on umd.io are missing a lot of data.

I think that in the upgrade we should drop the old data we can no longer access, create a dump of the existing Mongo database and make it downloadable for anyone who wants that data, and, going forward, look into long-term ways to keep course data accessible even after it leaves Testudo (possibly using archive.org?).

Some professors have two spaces in their name

Example: https://bapi.umd.io/v0/professors?name=Daniel%20%20Contreras

Although this is an error on Testudo's side, we should probably filter it out on our end.
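A one-line normalization that could run on every scraped name before it is stored. It collapses any run of whitespace (including non-breaking spaces) into a single space and trims the ends. `normalize_name` is a hypothetical helper.

```ruby
# Collapse runs of whitespace to one space and trim both ends.
def normalize_name(name)
  name.gsub(/[[:space:]]+/, " ").strip
end
```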

Enhance map scraper with GIS data

Right now, the buildings scraper just loads JSON that contains building names, codes, and locations. However, this is in no way dynamic, and has to be updated manually. If we wanted to add addresses, for example, it'd be hard.

I've done some research already on how we can go about getting this data -

http://maps.umd.edu/arcgis/rest/services is the service we need and already know about. After some digging, https://maps.umd.edu/arcgis/rest/services/Layers/CampusMapDefault/MapServer/find specifically is what we can use to query for the data. All we need is a building name or code, with "BuildingPoint" as the layer. For example, the URL
https://maps.umd.edu/arcgis/rest/services/Layers/CampusMapDefault/MapServer/find?searchText=esj&contains=true&searchFields=&sr=&layers=BuildingPoint&layerDefs=&returnGeometry=true&maxAllowableOffset=&geometryPrecision=&dynamicLayers=&returnZ=false&returnM=false&gdbVersion=&returnUnformattedValues=false&returnFieldName=false&datumTransformations=&layerParameterValues=&mapRangeValues=&layerRangeValues=&f=json

gives us the data for ESJ. I haven't quite figured out how to get a list of all the buildings, as it doesn't allow the search text to be empty. It does accept a space, but I need to do some research to see if there are any buildings that don't have a space in their name. We should also figure out what data from here we want to gather and display, and adjust the query parameters accordingly. From there, it's just grabbing the data and storing it.
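A sketch that builds the same `find` request as the ESJ example. It keeps only the parameters given non-empty values in that URL, on the assumption that the server treats an omitted parameter like an empty one. `building_find_uri` is a hypothetical helper.

```ruby
require "uri"

# The "find" endpoint quoted above.
FIND_URL = "https://maps.umd.edu/arcgis/rest/services/Layers/CampusMapDefault/MapServer/find"

# Rebuilds the ESJ example query for any search text.
def building_find_uri(search_text)
  uri = URI(FIND_URL)
  uri.query = URI.encode_www_form(
    "searchText" => search_text,
    "contains" => "true",
    "layers" => "BuildingPoint",
    "returnGeometry" => "true",
    "returnZ" => "false",
    "returnM" => "false",
    "returnUnformattedValues" => "false",
    "returnFieldName" => "false",
    "f" => "json"
  )
  uri
end
```

Something like `JSON.parse(Net::HTTP.get(building_find_uri("esj")))` would then return the parsed response, from which we can decide which attributes to keep.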

Unexpected results from Courses endpoint

Sending a GET request to http://api.umd.io/v0/courses returns results that start in the Cs.

[  
   {  
      "course_id":"CPPL100",
      "name":"College Park Scholars: Public Leadership First-Year Colloquium I",
      "dept_id":"CPPL",
      "department":"College Park Scholars-Public Leadership",
      "semester":"201508",
      "credits":"1",
      ...
]

As default sort is ascending, I would expect it to start at A, as in the documentation.

Also, a GET request to http://api.umd.io/v0/courses with no params, according to the documentation, should return an Array of objects with three properties: course_id, name, and department.

[
  {
    "course_id": "AASP100",
    "name": "Introduction to African American Studies",
    "department": "African American Studies"
  },
  {
    "course_id": "AASP101",
    "name": "Public Policy and the Black Community",
    "department":"African American Studies"
  }
]

Instead, an array of full Course objects is returned. Is this intended?

Setup Docker to run in production

The current Docker setup is for development only. We need a plan to move away from Vagrant and deploy the app in production with Docker.

Data Entry - Scraping University Resources vs Making Custom Frontend

You guys are doing a great job with the api!

I looked through the ideas doc and am curious about your thoughts on continuing to serve data by scraping university resources vs. building a frontend for entering the new data types you're proposing (assuming they're not already available on the web to be scraped).

This is something I'm working on up north as well; currently I'm just using Google spreadsheets for data entry. I think you guys are laying down a really good boilerplate on rails, and I'm curious whether you've seen any good data entry frontends for university data applications.

Whitespace issue

The instructors field for course section AOSC652-AM01 has whitespace issues: instead of two instructors, the API returns a single instructor.

See http://api.umd.io/v0/courses/AOSC652/sections

Dead Links

Here are all the dead links I could find in the project.

https://github.com/umdio/umdio/blob/master/docs/src/api.md

http://www.robcobb.me/2015/04/14/why-umdio.html (Luckily, on the wayback machine at https://web.archive.org/web/20160811182733/http://www.robcobb.me/2015/04/14/why-umdio.html)

https://github.com/umdio/umdio/blob/master/CONTRIBUTING.md#read

http://www.robcobb.me/2015/04/14/why-umdio.html (See above)
http://github.com/umdio/umdio/blob/master/Contributing.md (Just needs to be changed to the updated link)
[todo]({% post_url Todo %}) (This should probably point to https://github.com/umdio/umdio/blob/master/Todo.md)

https://github.com/umdio/umdio/blob/master/docs/src/bus.md

http://api-portal.anypoint.mulesoft.com/nextbus/api/nextbus-api (The code seems to use the correct link https://github.com/umdio/umdio/blob/master/app/scrapers/bus_schedules_scraper.rb#L22)

Generally, all of these are super quick to update; the only question is what we want http://www.robcobb.me/2015/04/14/why-umdio.html to point to. Possibly add it as another Markdown file in the GitHub repo?

Issues when installing via Vagrant

Here's a paste of the output from attempting vagrant up.

bundle exec rake up still works and I can access the API through localhost:3000. However, navigating any endpoint doesn't return the expected data.

Jekyll serve not running

From @rstumbaugh:
I still can't get Jekyll running on my machine. I'm running jekyll serve from the docs/ folder, but it looks like _config.yml is in the src/ folder. Even when I put the config file in the docs/ folder, Jekyll can't find any of the posts and the Sass doesn't get compiled. Any ideas?

Bus Scraper Dupes

Follow-up to #35

Somehow, a lot of duplicate routes got into the bus collection. We should figure out how this happened, and how to avoid it in the future.

Certain methods do not support ?page and ?per_page parameters

The http://api.umd.io/v0/courses/list method does not support the ?page or ?per_page parameters. Since this endpoint returns a large number of items (4261 for the spring 2015 semester), it seems from the documentation that it should support pagination.
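A minimal sketch of page/per_page slicing that such an endpoint could apply to its result list. Pages are 1-based, and the default page size and cap are assumptions, not values from the umd.io documentation. `paginate` is a hypothetical helper; query-string values arrive as strings, hence the `to_i` calls.

```ruby
DEFAULT_PER_PAGE = 30  # assumption
MAX_PER_PAGE = 100     # assumption

# Returns the requested 1-based page of items, or [] past the end.
def paginate(items, page: 1, per_page: DEFAULT_PER_PAGE)
  page = [page.to_i, 1].max
  per_page = (per_page || DEFAULT_PER_PAGE).to_i.clamp(1, MAX_PER_PAGE)
  # Array#slice returns nil when the start index is beyond the array.
  items.slice((page - 1) * per_page, per_page) || []
end
```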

Not returning errors

Hey, so far the API has great response times and is working out well. However, I did find a little bug: looking up courses that don't exist returns 200 with null. For example,

http://api.umd.io/v0/courses/ENES100 returns 200 with valid json
http://api.umd.io/v0/courses/ENES101 returns 200 with null

It should return a 400 with an error code instead.

/professors docs are incomplete

If you look at https://umd.io/professors/, it doesn't fully document the professors endpoint. In particular, it makes no mention of the ?name= or ?course= parameters.

See
https://github.com/umdio/umdio/blob/master/app/controllers/professors_controller.rb
https://github.com/umdio/umdio/tree/master/docs/src/professors/_posts

Professors API doesn't sort by department?

http://api.umd.io/v0/professors?department=CMSC,BMGT&sort=-name returns an empty array, [] instead of the list of professors in the CMSC and BMGT department (sorted by name) (note that this is the example provided on the website).

With some further testing, here's a table with some examples of the parameters that do and do not work for the professors API:

Call	Works?
http://api.umd.io/v0/professors?department=CMSC	FALSE
http://api.umd.io/v0/professors?semester=201801	TRUE
http://api.umd.io/v0/professors?sort=name	TRUE
http://api.umd.io/v0/professors?semester=201801&sort=name	TRUE
http://api.umd.io/v0/professors?department=CMSC&sort=name	FALSE
http://api.umd.io/v0/professors?department=CMSC&semester=201801	FALSE
http://api.umd.io/v0/professors?semester=201801&department=CMSC&sort=name	FALSE

The Call column is the GET request made, and the Works? column indicates whether the call returned anything.

It appears that filtering out professors by which department doesn't work as intended.
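For reference, a sketch of the behavior the documented example implies: split `?department=CMSC,BMGT` on commas and keep professors who belong to any listed department. It assumes each professor record has an array of department codes under `"departments"`, which is a hypothetical field name. The actual cause of the empty result isn't known from this report.

```ruby
# Keep professors belonging to any department in a comma-separated list.
def filter_by_department(professors, param)
  wanted = param.to_s.split(",").map { |d| d.strip.upcase }.reject(&:empty?)
  return professors if wanted.empty?

  professors.select { |prof| (Array(prof["departments"]) & wanted).any? }
end
```

Sorting (e.g. `sort=-name`) would then apply to the filtered list.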

Multiple courses picked up as one

On this page: http://api.umd.io/v0/courses?page=91

There is a course_id: "GVPT388WGVPT389GVPT390GVPT396GVPT397GVPT409CGVPT409EGVPT409HGVPT409IGVPT409OGVPT421GVPT423HGVPT429EGVPT432GVPT439AGVPT454GVPT456GVPT459EGVPT459MGVPT459OGVPT459RGVPT473GVPT479DGVPT479LGVPT479PGVPT479YGVPT482HGVPT722GVPT729MGVPT743GVPT771GVPT799GVPT808BGVPT808CGVPT831GVPT873GVPT888BGVPT888EGVPT898GVPT899"

It appears that the rest of the fields are just a concatenation of the information for all of those courses.
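A sketch of a sanity check the scraper could run before saving. The course IDs on this page follow four capital letters, three digits, and an optional letter suffix (e.g. GVPT388W), so anything longer signals merged rows. For diagnosis, the merged string can be split back apart. The helper names are hypothetical.

```ruby
# Expected shape of a single course ID, based on the examples on this page.
COURSE_ID = /\A[A-Z]{4}\d{3}[A-Z]?\z/

def valid_course_id?(id)
  COURSE_ID.match?(id)
end

# Splits a merged ID string. The negative lookahead stops a trailing letter from
# being taken as a suffix when it is really the start of the next department code.
MERGED_ID = /[A-Z]{4}\d{3}(?:[A-Z](?![A-Z]{3}\d))?/

def split_merged_course_ids(merged)
  merged.scan(MERGED_ID)
end
```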

Enable HTTPS

Many browsers will block mixed content, so it would be ideal if the API could serve over HTTPS so it plays nicely with browsers and we aren't telling users to turn off their security settings to use a site.

Add Winter 2017 and Spring 2018 Semesters to umd.io database

Both semesters are available on the schedule of classes.

Missing data for Spring 2017 (201701)

There is no course info for the spring semester this year.

Needs Spring 2017

Hi team, I find your service very useful. It needs the 2017 Spring data.

Restrictions field for courses

It looks like the restriction is being added twice.

Malformed JSON response to buildings list query

When I query the buildings list, JavaScript's standard JSON.parse fails on an unexpected comma, with this error:

undefined:1
88","lat":"38.9902169"},{"name":"J.H. Kehoe Track and Ludwig Field",,"code":""
                                                                    ^
SyntaxError: Unexpected token ,
  at Object.parse (native)
  at IncomingMessage.<anonymous> (/home/adrusi/Code/umdio/index.coffee:19:24)
  at IncomingMessage.emit (events.js:129:20)
  at _stream_readable.js:908:16
  at process._tickCallback (node.js:355:11)
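The report doesn't say how the response is built. If it is assembled by string concatenation (an assumption), serializing plain Ruby objects with the standard `json` library rules out stray commas, because a missing value becomes `null` instead of an empty slot. The field names below come from the error output. `buildings_json` is a hypothetical helper.

```ruby
require "json"

# Let the json library handle commas and quoting for every building.
def buildings_json(buildings)
  rows = buildings.map { |b| { "name" => b[:name], "code" => b[:code], "lat" => b[:lat] } }
  JSON.generate(rows)
end
```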

200 status returned with a 404 error code

For the following 404 examples in the documentation,

http://api.umd.io/v5/courses
http://api.umd.io/courses
http://api.umd.io/v0/course
http://api.umd.io/v0/courses/ENGL115

the status code returned is 200 OK. However, the JSON key error_code returns:

error_code : 404

Since the example for a 400 error,

http://api.umd.io/v0/courses/ene10

returns a 400 status code along with the JSON key error_code

error_code : 400

shouldn't 404 errors work the same way, i.e., return a 404 status code as well?
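A Rack-style sketch that keeps the two in sync by deriving the HTTP status and the JSON `error_code` from the same value. `error_response` is a hypothetical helper. The body fields mirror the 404 response quoted in the documentation issue on this page.

```ruby
require "json"

# One source of truth: the status is used for both the HTTP status and error_code.
def error_response(status, message, extra = {})
  body = JSON.generate({ "error_code" => status, "message" => message }.merge(extra))
  [status, { "content-type" => "application/json" }, [body]]
end
```

A handler could then return `error_response(404, "Course with course_id ENGL115 not found!", "docs" => "http://umd.io/courses/")` when a lookup comes back empty.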

Unable to run docs server locally

I've tried to run the docs server locally, but there seems to be a configuration problem with regard to the styles. Running the Makefile results in:
jekyll build -s src/ -d public/
Configuration file: src/_config.yml
Source: src/
Destination: public/
Incremental build: disabled. Enable with --incremental
Generating...
jekyll 3.6.0 | Error: No such file or directory @ rb_sysopen - /home/nick/dev/web/umdio/docs/src/css/main.scss

Use proxy for bus requests

Yo! Umd.io is looking sweet. I saw that you guys are working on the Bus API, and it looks pretty cool. One thing I noticed with a43102b is that there's another request on top of the request to umd.io; I think for speed you should add nginx as a reverse proxy for the bus API. For example, I set up this server on DO.

http://45.55.130.81/bus/?a=umd&command=routeList

You could add your domain name and basically forward any request from http://umd.io/bus --> http://webservices.nextbus.com/.

Here's the nginx config I used; it's really simple and effective.

location /bus/ {
    proxy_pass http://webservices.nextbus.com/service/publicJSONFeed;
}

Otherwise, things look awesome, keep up the good work.

Issue with bus locations

Getting an interesting error:

{"lastTime":{"time":"0"},"copyright":"All data copyright University of Maryland 2015.","Error":{"content":"last time \"t\" parameter must be specified in query string","shouldRetry":"false"}}

from visiting http://api.umd.io/v0/bus/routes/115/locations per http://umd.io/bus/#route_locations
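The error says the upstream feed wants a `t` parameter. As we understand the NextBus feed, `vehicleLocations` takes the agency (`a`), route (`r`), and `t`, the epoch time in milliseconds of the previous request, with 0 used for a first request. This sketch builds such a request against the JSON feed URL from the nginx example above, with agency `umd` as in the routeList example. Treat the parameter semantics as an assumption to verify against the NextBus documentation.

```ruby
require "uri"

NEXTBUS_FEED = "http://webservices.nextbus.com/service/publicJSONFeed"

# Builds a vehicleLocations request; pass the previous response's lastTime
# value as last_time_ms on later calls (0 when there is no previous request).
def vehicle_locations_uri(route_tag, last_time_ms = 0)
  uri = URI(NEXTBUS_FEED)
  uri.query = URI.encode_www_form("command" => "vehicleLocations", "a" => "umd",
                                  "r" => route_tag, "t" => last_time_ms)
  uri
end
```

The `lastTime` object visible in the error response above is where the next `t` value would come from.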

Scrape clubs

List of clubs is at: http://orgsync.umd.edu/browse_student_organizations

It'd be great to at least scrape the name and orgsync link, as well as the description (though that might require using a headless browser)

Inaccurate and conflicting documentation for courses

On the main page of your website, under "Sample requests and responses", you have:

http://api.umd.io/v0/courses/ENGL115
200: Returns empty braces, because ENGL115 is not a real course. If your app thinks it’ll get a real response here, it might break.

This conflicts with the documentation under "Courses", which says:

Returns: The course object specified, or null

Both of these also conflict with the actual response returned by "http://api.umd.io/v0/courses/ENGL115", which is:

{"error_code":404,"message":"Course with course_id ENGL115 not found!","available_courses":"http://api.umd.io/v0/courses","docs":"http://umd.io/courses/"}