Giter Site home page Giter Site logo

rooday / course-2017-fal-proj Goto Github PK

View Code? Open in Web Editor NEW

This project forked from data-mechanics/course-2017-fal-proj

1.0 1.0 0.0 48.36 MB

Projects developed during the Fall 2017 iteration of the Data Mechanics course at Boston University.

Python 84.63% JavaScript 0.39% HTML 14.81% CSS 0.18%

course-2017-fal-proj's Introduction

course-2017-fal-proj

Joint repository for the collection of student course projects in the Fall 2017 iteration of the Data Mechanics course at Boston University.

In this project, you will implement platform components that can obtain a some data sets from web services of your choice, and platform components that combine these data sets into at least two additional derived data sets. These components will interct with the backend repository by inserting and retrieving data sets as necessary. They will also satisfy a standard interface by supporting specified capabilities (such as generation of dependency information and provenance records).

This project description will be updated as we continue work on the infrastructure.

MongoDB infrastructure

Setting up

We have committed setup scripts for a MongoDB database that will set up the database and collection management functions that ensure users sharing the project data repository can read everyone's collections but can only write to their own collections. Once you have installed your MongoDB instance, you can prepare it by first starting mongod without authentication:

mongod --dbpath "<your_db_path>"

If you're setting up after previously running setup.js, you may want to reset (i.e., delete) the repository as follows.

mongo reset.js

Next, make sure your user directories (e.g., alice_bob if Alice and Bob are working together on a team) are present in the same location as the setup.js script, open a separate terminal window, and run the script:

mongo setup.js

Your MongoDB instance should now be ready. Stop mongod and restart it, enabling authentication with the --auth option:

mongod --auth --dbpath "<your_db_path>"

Working on data sets with authentication

With authentication enabled, you can start mongo on the repository (called repo by default) with your user credentials:

mongo repo -u alice_bob -p alice_bob --authenticationDatabase "repo"

However, you should be unable to create new collections using db.createCollection() in the default repo database created for this project:

> db.createCollection("EXAMPLE");
{
  "ok" : 0,
  "errmsg" : "not authorized on repo to execute command { create: \"EXAMPLE\" }",
  "code" : 13
}

Instead, load the server-side functions so that you can use the customized createCollection() function, which creates a collection that can be read by everyone but written only by you:

> db.loadServerScripts();
> var EXAMPLE = createCollection("EXAMPLE");

Notice that this function also prefixes the user name to the name of the collection (unless the prefix is already present in the name supplied to the function).

> EXAMPLE
alice_bob.EXAMPLE
> db.alice_bob.EXAMPLE.insert({value:123})
WriteResult({ "nInserted" : 1 })
> db.alice_bob.EXAMPLE.find()
{ "_id" : ObjectId("56b7adef3503ebd45080bd87"), "value" : 123 }

If you do not want to run db.loadServerScripts() every time you open a new terminal, you can use a .mongorc.js file in your home directory to store any commands or calls you want issued whenever you run mongo.

Other required libraries and tools

You will need the latest versions of the PROV, DML, and Protoql Python libraries. If you have pip installed, the following should install the latest versions automatically:

pip install prov --upgrade --no-cache-dir
pip install dml --upgrade --no-cache-dir
pip install protoql --upgrade --no-cache-dir

If you are having trouble installing lxml in a Windows environment, you could try retrieving it here.

Note that you may need to use python -m pip install <library> to avoid issues if you have multiple versions of pip and Python on your system.

Formatting the auth.json file

The auth.json file should remain empty and should not be submitted. When you are running your algorithms, you should use the file to store your credentials for any third-party data resources, APIs, services, or repositories that you use. An example of the contents you might store in your auth.json file is as follows:

{
    "services": {
        "cityofbostondataportal": {
            "service": "https://data.cityofboston.gov/",
            "username": "[email protected]",
            "token": "XxXXXXxXxXxXxxXXXXxxXxXxX",
            "key": "xxXxXXXXXXxxXXXxXXXXXXxxXxxxxXXxXxxX"
        },
        "mbtadeveloperportal": {
            "service": "http://realtime.mbta.com/",
            "username": "alice_bob",
            "token": "XxXX-XXxxXXxXxXXxXxX_x",
            "key": "XxXX-XXxxXXxXxXXxXxx_x"
        }
    }
}

To access the contents of the auth.json file after you have loaded the dml library, use dml.auth.

Running the execution script for a contributed project.

To execute all the algorithms for a particular contributor (e.g., alice_bob) in an order that respects their explicitly specified data flow dependencies, you can run the following from the root directory:

python execute.py alice_bob

To execute the algorithms for a particular contributor in trial mode, use the -t or --trial option:

python execute.py alice_bob --trial

course-2017-fal-proj's People

Contributors

agdsouza avatar aquan6 avatar bgaudiosi avatar bohorqux avatar bursteinalan avatar erj826 avatar esaracin avatar franciszamora avatar jadotis avatar jbloomfeld avatar jdbrawn avatar jesusfranco1 avatar jmcaloon avatar lapets avatar mfyc avatar nathan-weinberg avatar ngeeslin avatar rfballesteros avatar rocksdan avatar rooday avatar sameena-bajwa avatar shreyapandit avatar slarbibu avatar stevenbrz avatar ticzu avatar tpotye avatar wenjun666 avatar whyjun avatar yidingou avatar zalacheung avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.