slo_scraper's People

Contributors

clopez02, demnusoic, ibvandersluis, jboydt, kadenhurlimann, mechami

slo_scraper's Issues

Alternative courses need formatting

When scraping the courses related to a program, some courses are alternatives and are prefixed with 'or', e.g. 'ART 2' and 'or ART 2H' in the Graphic Design program.

Possible fix:
1. Use text processing to normalize the format of alternative courses, e.g. 'or ART 2H' becomes 'ART 2H'.
2. Create a self-referential relationship on the programs_courses table. Each programs_courses entry is a required course in the program, and a required course can have many alternative courses.
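
The two steps above can be sketched together as follows. This is only an illustration under stated assumptions, not the project's code: the `normalize_course` and `insert_program_courses` helpers, the `alternative_to` column, and the use of SQLite are all hypothetical. It also assumes an 'or' course always follows the required course it replaces, and that no real course code begins with the word 'or' followed by a space.

```python
import re
import sqlite3

# Hypothetical pattern: a leading "or " marks an alternative course.
_OR_PREFIX = re.compile(r"^\s*or\s+", re.IGNORECASE)

def normalize_course(label):
    """Return (course_code, is_alternative) for a scraped course label."""
    is_alternative = bool(_OR_PREFIX.match(label))
    code = _OR_PREFIX.sub("", label).strip()
    return code, is_alternative

def insert_program_courses(conn, program_id, labels):
    """Insert courses in page order; an 'or' course points at the
    most recent required course via the self-referential column."""
    last_required_id = None
    for label in labels:
        code, is_alternative = normalize_course(label)
        if last_required_id is None:
            # Assumption: an 'or' course with nothing before it is treated as required.
            is_alternative = False
        parent = last_required_id if is_alternative else None
        cur = conn.execute(
            "INSERT INTO programs_courses (program_id, course_code, alternative_to) "
            "VALUES (?, ?, ?)",
            (program_id, code, parent),
        )
        if not is_alternative:
            last_required_id = cur.lastrowid

conn = sqlite3.connect(":memory:")
# Assumed schema: alternative_to is NULL for required courses and
# references the required course's row for alternatives.
conn.execute("""
    CREATE TABLE programs_courses (
        id INTEGER PRIMARY KEY,
        program_id INTEGER NOT NULL,
        course_code TEXT NOT NULL,
        alternative_to INTEGER REFERENCES programs_courses(id)
    )
""")
insert_program_courses(conn, 1, ["ART 2", "or ART 2H", "ART 5"])
```

With this layout, the alternatives for a required course can be found by selecting the rows whose `alternative_to` equals that course's `id`.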

Courses Dictionary causes an error

In PLODB, adding the courses list to the plo_data dictionary as a list of dictionaries
causes 'TypeError: sequence item 0: expected str instance, dict found' for any SQL query.

Workaround:
The dictionaries are separated before insertion.
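
The error message is the one Python's `str.join` raises when it is handed a non-string item, which suggests (as an assumption, since PLODB's internals are not shown here) that the query text is built by joining values from plo_data. The sketch below reproduces the message and shows one way to separate the course dictionaries first. The `split_courses` helper and the sample keys are hypothetical.

```python
# Hypothetical shape of plo_data; the real keys may differ.
plo_data = {
    "program": "Graphic Design",
    "courses": [{"code": "ART 2"}, {"code": "ART 2H"}],
}

def join_error(values):
    """Return the TypeError message raised by str.join, or None if it succeeds."""
    try:
        ", ".join(values)
    except TypeError as exc:
        return str(exc)
    return None

# Joining a list of dicts reproduces the reported message.
error = join_error(plo_data["courses"])

def split_courses(data):
    """Separate the course dictionaries from the remaining fields."""
    fields = {key: value for key, value in data.items() if key != "courses"}
    courses = list(data.get("courses", []))
    return fields, courses

program_fields, courses = split_courses(plo_data)

# Only string keys are joined now, so building a column list is safe.
columns = ", ".join(program_fields)
```

The courses can then be inserted in their own pass, one row per dictionary.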

Programs with identical outcomes

Many programs have an identical CERT program or alternate degree that contains the same outcomes (e.g. Certificate of Achievement in Early Childhood Education and AS-T Degree in Early Childhood Education). Currently the CERT outcomes are stored separately from the degree outcomes, but if they are always identical and assessed at the same time, this could lead to data anomalies. Some programs share an outcome but also have other, differing outcomes (e.g. AS Degree in Agricultural Business and AS Degree in Agriculture Business).

Possible fix:
The relationship between programs and outcomes could become many to many.

Programs can share outcomes, but each degree requires different courses, so a separate program entry is required for each degree.

A program can have many outcomes; an outcome can be in many programs. A new entry will be created in a join table for each of a program's outcomes.

Example:

Early Childhood Education (AS-T) is already in the database. When Early Childhood Education (CERT) is inserted into the database:
1. Insert Early Childhood Education (CERT) into the program table.
2. Insert the related outcomes into the outcome table. Outcomes that were already inserted along with the AS-T are duplicates and will be ignored; any new outcomes will be inserted.
3. Create a join table entry between the program and each of its outcomes.

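The insertion steps in the example can be sketched with SQLite as below. This is a sketch under assumptions, not the project's schema: the table names `program`, `outcome`, and `program_outcome`, the `insert_program` helper, and the placeholder outcome texts are all hypothetical. Duplicate outcomes are skipped with SQLite's `INSERT OR IGNORE`, which relies on the `UNIQUE` constraint on the outcome text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE program (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        degree TEXT NOT NULL,
        UNIQUE (name, degree)
    );
    CREATE TABLE outcome (
        id INTEGER PRIMARY KEY,
        text TEXT NOT NULL UNIQUE
    );
    -- Join table: one row per (program, outcome) pair.
    CREATE TABLE program_outcome (
        program_id INTEGER NOT NULL REFERENCES program(id),
        outcome_id INTEGER NOT NULL REFERENCES outcome(id),
        PRIMARY KEY (program_id, outcome_id)
    );
""")

def insert_program(conn, name, degree, outcomes):
    """Insert a program, its outcomes (ignoring duplicates), and the join rows."""
    conn.execute(
        "INSERT OR IGNORE INTO program (name, degree) VALUES (?, ?)", (name, degree)
    )
    program_id = conn.execute(
        "SELECT id FROM program WHERE name = ? AND degree = ?", (name, degree)
    ).fetchone()[0]
    for text in outcomes:
        # An outcome already stored for another program is left untouched.
        conn.execute("INSERT OR IGNORE INTO outcome (text) VALUES (?)", (text,))
        outcome_id = conn.execute(
            "SELECT id FROM outcome WHERE text = ?", (text,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO program_outcome (program_id, outcome_id) VALUES (?, ?)",
            (program_id, outcome_id),
        )
    return program_id

# Placeholder outcome texts, not real catalog data.
shared = ["Outcome A (placeholder)", "Outcome B (placeholder)"]
insert_program(conn, "Early Childhood Education", "AS-T", shared)
insert_program(conn, "Early Childhood Education", "CERT", shared + ["Outcome C (placeholder)"])
```

After both inserts the outcome table holds three rows, while the join table holds five: two for the AS-T and three for the CERT.
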

Before a solution can be chosen, it must be determined whether outcomes are assessed for individual programs or for all programs that contain that outcome.

Conserving Manual Database Additions

Items in the database can be deleted, edited, and/or created by select users. The issue is that these changes are not conserved when the slo-scraper executes again: re-acquiring information via the scraper does not keep any manual changes or additions to the database. A special tag may be needed to differentiate modified scraped items and manually created items, so that they are either not deleted from the database, or are removed and then re-added after re-scraping.
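
One possible form of the tagging idea, sketched with SQLite: every row carries an `origin` tag, and a re-scrape only replaces rows that are still tagged as untouched scraped data. The `course` table, the `origin` column and its three values, and the `rescrape` helper are hypothetical. The sketch also assumes that whatever code lets users edit a row sets its tag to 'modified'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: origin records where the current row contents came from.
conn.execute("""
    CREATE TABLE course (
        code TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        origin TEXT NOT NULL DEFAULT 'scraped'
            CHECK (origin IN ('scraped', 'modified', 'manual'))
    )
""")

def rescrape(conn, scraped_rows):
    """Refresh scraped data while keeping user-edited and user-created rows."""
    # Drop only rows that no user has touched.
    conn.execute("DELETE FROM course WHERE origin = 'scraped'")
    # Rows still present are 'modified' or 'manual', so a scraped row with
    # the same primary key is skipped rather than overwriting them.
    conn.executemany(
        "INSERT OR IGNORE INTO course (code, title, origin) VALUES (?, ?, 'scraped')",
        scraped_rows,
    )

# Sample state before re-scraping (placeholder titles).
conn.executemany(
    "INSERT INTO course (code, title, origin) VALUES (?, ?, ?)",
    [
        ("ART 2", "Old scraped title", "scraped"),
        ("ART 5", "Title edited by a user", "modified"),
        ("ART 99", "Course added by hand", "manual"),
    ],
)

rescrape(conn, [("ART 2", "New scraped title"), ("ART 5", "Scraped title")])
rows = dict(conn.execute("SELECT code, title FROM course"))
```

Here the untouched row is refreshed, while the edited and hand-made rows survive. A user deletion would need its own marker (for example a tombstone row), since a plain delete would be undone by the next scrape.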
