zooniverse / panoptes

Zooniverse API to support user defined volunteer research projects

License: Apache License 2.0

Ruby 88.51% JavaScript 0.11% CSS 0.31% HTML 3.18% Shell 0.05% API Blueprint 7.73% Dockerfile 0.10%
zooniverse panoptes-platform ruby docker hacktoberfest

panoptes's Issues

User validations

Just thinking ahead to the migration of zooniverse users to panoptes...

Currently, we validate:

validates :login, presence: true, uniqueness: { case_sensitive: false }
validates :email, presence: true, uniqueness: { case_sensitive: false }
validates :password, length: { minimum: 4 }, confirmation: true

Devise is requiring a password length of 8 -- which is fine, but will require all users with shorter passwords to reset them.
We're also allowing case sensitivity on login and email (e.g. parrish and Parrish are unique) -- I'll likely change this shortly.
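
A minimal sketch of a pre-migration audit, assuming legacy users can be loaded as plain hashes with :login and :email keys. The helper name and data shape are hypothetical, not Panoptes code. It groups values case-insensitively to find the accounts that would collide once uniqueness is enforced with case_sensitive: false.

```ruby
# Hypothetical audit helper: finds values that differ only by case.
# `users` is assumed to be an array of hashes like { login: "parrish" }.
def case_insensitive_collisions(users, key)
  users
    .group_by { |user| user.fetch(key).downcase }
    .select { |_normalized, group| group.size > 1 }
    .transform_values { |group| group.map { |user| user.fetch(key) } }
end

users = [
  { login: "parrish", email: "a@example.com" },
  { login: "Parrish", email: "b@example.com" },
  { login: "camallen", email: "c@example.com" }
]

# Maps each normalized login to the original spellings that clash.
collisions = case_insensitive_collisions(users, :login)
```

Running the same helper with :email would give the list of addresses to resolve before the migration.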

Allow a project's owner to add a list of Subject Id's for expert classification

After chatting with @chrislintott and @aliburchard, we identified the need to allow a set of subjects to be marked for expert review.

E.g. say a project owner realises that 2% of subjects aren't reaching consensus (say 80% consensus) due to interesting features in the images. It is then decided that these 2% need to be classified by the project's expert classifiers when they log on.

I was thinking about an array of subject Zooniverse IDs on the project model that Cellect picks up and serves to a project's expert classifier (assuming they have been set up) when they next classify on the project. As per normal, the seen subject ID would be tracked and the expert wouldn't see it again. Once they have completed the list they would revert back to normal subject selection. @parrish thoughts on Cellect and how this might work?
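
The selection rule described above could be sketched roughly as follows. This is plain Ruby with hypothetical names (next_subject_id, the fallback lambda), not Cellect's actual interface: an expert drains the project's expert list first, skipping anything already seen, then falls back to normal selection.

```ruby
require 'set'

# Hypothetical selection rule: experts work through the project's expert
# list first, skipping subjects they have already seen, then fall back
# to whatever the normal selection strategy returns.
def next_subject_id(expert_subject_ids:, seen_ids:, expert:, fallback:)
  if expert
    pending = expert_subject_ids.find { |id| !seen_ids.include?(id) }
    return pending if pending
  end
  fallback.call
end

seen = Set.new([101])
normal_selection = -> { 999 } # stand-in for the usual subject selection

first = next_subject_id(expert_subject_ids: [101, 102], seen_ids: seen,
                        expert: true, fallback: normal_selection)
seen << first
second = next_subject_id(expert_subject_ids: [101, 102], seen_ids: seen,
                         expert: true, fallback: normal_selection)
```

Here the expert gets subject 102 first (101 was already seen), and once the list is exhausted the call falls through to normal selection.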

@stuartlynn this might tie into SOCS (spelling?) work of per user subject sets to classify.

remove URI_names for users and user groups

Remove this model in favour of user logins and user_group names.

URLs will have escaped characters if needed, perhaps forcing login / group_name validations for new signups.

Build science team / project owner tools

Science team / project owner may want the ability to:

  • Download a list of all subject metadata
  • Update subject metadata
  • Download current consensus / raw classifications
  • Perform ad-hoc analysis of current classifications
    • Spark, Hive & Hadoop, RedShift, custom code?
    • This would depend on how we build the consensus engine.

@aliburchard, did I cover it all?

What happens when a user deletes their account or a project is deleted?

Hypothetically, let's say a user logs in and sets up an account and the following happens.

  • They classify across the Zooniverse on quite a few different projects and participate in talk conversations.
  • They then spin up their own project to look for aphid species in UK beer gardens by uploading a series of macro pictures of beer garden leaf litter.
    After a while they finish their aphid quest and their project comes to a conclusion.
  • They then leave the site for quite some time and months later they decide to delete their account as they have lost interest (somehow).

The current data model proposes to keep all classifications, and subjects but delete the associated records (follow the black diamonds). We should specify this in the Terms of Service when creating an account.

So the real question is how do we handle User account and Project deletions?

  • What does this mean for the associated records across the Zooniverse?
  • Do we remove everything but subjects and classifications?
  • Do we Archive and mark as deleted?
  • What legal obligations do we have?

Not suggesting they are the authority but GitHub delete everything 'apparently', https://help.github.com/articles/deleting-your-user-account.
Others:
https://support.google.com/accounts/answer/61177?hl=en
http://en-gb.facebook.com/help/224562897555674

Thoughts all?

@chrislintott

Slow index routes on large tables

RestPack uses Kaminari for pagination which counts all the rows of a table to determine the number of pages. It's a known issue of slow counts in PostgreSQL and Kaminari. I think we'll have to look at using an estimated count and submitting a PR for kaminari.

Started GET "/api/classifications" for 127.0.0.1 at 2014-08-05 11:31:38 +0100
Processing by Api::V1::ClassificationsController#index as JSON_API
  Classification Load (0.5ms)  SELECT  "classifications".* FROM "classifications"  LIMIT 20 OFFSET 0
   (24957.0ms)  SELECT COUNT(*) FROM "classifications"
Completed 200 OK in 25003ms (Views: 3.3ms | ActiveRecord: 24959.7ms)


Started GET "/api/classifications" for 127.0.0.1 at 2014-08-05 11:32:12 +0100
Processing by Api::V1::ClassificationsController#index as JSON_API
  Classification Load (0.7ms)  SELECT  "classifications".* FROM "classifications"  LIMIT 20 OFFSET 0
   (1362.8ms)  SELECT COUNT(*) FROM "classifications"
Completed 200 OK in 1372ms (Views: 2.7ms | ActiveRecord: 1363.5ms)


Started GET "/api/classifications" for 127.0.0.1 at 2014-08-05 11:41:00 +0100
Processing by Api::V1::ClassificationsController#index as JSON_API
  Classification Load (0.4ms)  SELECT  "classifications".* FROM "classifications"  LIMIT 20 OFFSET 0
   (1226.8ms)  SELECT COUNT(*) FROM "classifications"
Completed 200 OK in 1245ms (Views: 12.6ms | ActiveRecord: 1227.1ms)
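
One way to get an estimated count in PostgreSQL is to read the planner statistics in pg_class.reltuples instead of running COUNT(*). A rough sketch follows; the constant and helper names are hypothetical, and this is not an existing Kaminari or RestPack option. The estimate is only as fresh as the last VACUUM / ANALYZE on the table.

```ruby
# Approximate row count from planner statistics instead of COUNT(*).
# reltuples is only as fresh as the last VACUUM / ANALYZE on the table.
ESTIMATED_COUNT_SQL = <<~SQL
  SELECT reltuples::bigint AS estimate
  FROM pg_class
  WHERE relname = 'classifications'
SQL

# Hypothetical helper: page count derived from an estimated total.
# A non-positive estimate (empty or never-analyzed table) is treated
# as a single page so the first page is always reachable.
def estimated_total_pages(estimated_rows, per_page)
  return 1 if estimated_rows <= 0
  (estimated_rows.to_f / per_page).ceil
end

pages = estimated_total_pages(1_000_001, 20)
```

The trade-off is that the last page number can be slightly off, which is usually acceptable for tables large enough for COUNT(*) to hurt.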

Project Content

What types of content do we need to store for project pages? I can think of

  • Description
  • Team
  • Science Case
  • Publications?

Anything else?

Optimize DB indexes

Ensure we're using compound indexes where possible and removing unnecessary ones. I think the foreign keys should have indexes.

@parrish did you want to have a look at this?

Standardize JSON error response formats

JSON API doesn't specify an error response format. We currently return an error message in the response body but it's not standard yet, e.g. here and here.

In keeping with JSON API format I think our error responses should return an array of errors, each with its own self-referencing message and possibly an internal code (rate limiting, API version deprecation, etc).

{ "errors": [ { "message": "record not found",  "internal_code": XXX } ] }
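
A small sketch of a helper that builds this envelope, using only Ruby's standard JSON library. The helper name and the code value 1001 are made up for illustration; the issue does not define any internal codes.

```ruby
require 'json'

# Hypothetical helper: wraps one or more errors in the proposed envelope.
# `internal_code` is optional and is left out of the output when nil.
def error_response(*errors)
  {
    errors: errors.map do |error|
      { message: error.fetch(:message), internal_code: error[:internal_code] }.compact
    end
  }
end

# 1001 is an invented code, used only to show the shape of the payload.
body = JSON.generate(error_response({ message: "record not found", internal_code: 1001 }))
```

Always returning an array means clients can handle one error or many (for example, several validation failures) with the same code path.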

Online examples are:

add specs for oauth applications controller actions

Spec this controller. https://github.com/zooniverse/Panoptes/blob/master/app/controllers/oauth/applications_controller.rb

It should only respond to JSON requests and doorkeeper should be protecting it instead of devise authentication. I'm not 100% on what this is for, @edpaget, can you spec it out?

Starting point:

require 'spec_helper'

describe Oauth::ApplicationsController, type: :controller do

  before(:each) do
    sign_in create(:user)
  end

  describe "#index", :focus do
    it "should list the applications" do
      get :index
      expect(response.status).to eq(200)
    end
  end
end

OAuth Application Trust Levels

It seems like OAuth applications should have different trust levels which restrict what types of scopes they're allowed to request and what types of oauth grants they can use, basically

  • Zooniverse applications, all scopes, user can't reject scope grants
    • Grants: Password, Session
    • Scopes: All
  • Third-party trusted clients, apps that are theoretically capable of keeping secrets, they only make api requests from their backends
    • Grants: Authorization, Client
    • Scopes: All but user
  • Third-party untrusted clients, apps that can't keep secrets like JS app or mobile apps
    • Grants: Implicit
    • Scopes: Public, Project, Send-Classification

Apps that request scopes or attempt to use grant types they're not allowed to use will have a 403 Forbidden response returned.
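
The levels above can be written down as a lookup table plus a check. This is a sketch under assumptions: the symbol and string names are hypothetical transcriptions of the list, and it is not Doorkeeper configuration.

```ruby
# Hypothetical policy table transcribed from the trust levels listed above.
TRUST_LEVELS = {
  first_party: { grants: %w[password session],     scopes: :all },
  trusted:     { grants: %w[authorization client], scopes: :all_but_user },
  untrusted:   { grants: %w[implicit],             scopes: %w[public project send_classification] }
}.freeze

# Returns the HTTP status the request would receive: 403 when the app uses
# a grant type or requests a scope its trust level does not allow.
def request_status(trust_level, grant:, scopes:)
  policy = TRUST_LEVELS.fetch(trust_level)
  return 403 unless policy[:grants].include?(grant)

  allowed = case policy[:scopes]
            when :all then true
            when :all_but_user then !scopes.include?("user")
            else (scopes - policy[:scopes]).empty?
            end
  allowed ? 200 : 403
end

status = request_status(:untrusted, grant: "implicit", scopes: %w[public project])
```

Keeping the policy in one table makes it easy to review which combinations are permitted without reading through controller code.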

Translation Endpoints

We need to create endpoints for working with translations, updating them, etc.

There should be an endpoint with the form /project_content/:content_id that will allow editing of project content and related workflow content in a single language. It should be restricted to only allow project translators and project devs access.

Translated content will still be delivered to users through the normal /project/:project_id endpoint.

How will email work?

How will users be messaged about projects? Will they be at all at a project level? I assume we will use Amazon SES but if so, we need to work out what it will cost.

@mrniaboc and I are talking newsletters and messaging and this came up.

Fix serialised JSON representation of owner polymorphic links

As the owner relationship is polymorphic and RestPack Serializer doesn't embed the resource class context the owner routes will be useless.

I've got a PR open with RestPack Serializer but haven't gotten around to implementing anything there.

My plan is to introspect on the belongs_to association and if it's polymorphic then represent this link as a resource object (note the href override link), e.g.

{
  "id": "1",
  "links": {
    "owner": {
      "href": "http://api.panoptes.org/api/groups/17",
      "id": "17",
      "type": "user_group"
    }
  }
}
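
A plain-Ruby sketch of how such a link object might be built. The structs stand in for ActiveRecord models and owner_link is a hypothetical helper, not RestPack Serializer's API.

```ruby
# Hypothetical stand-ins for the models; only the attributes used here.
Owner   = Struct.new(:id, :type_name, :route)
Project = Struct.new(:id, :owner)

# Builds the link as a resource object so clients can tell which class
# the polymorphic owner id refers to.
def owner_link(record, base_url)
  owner = record.owner
  {
    "href" => "#{base_url}/#{owner.route}/#{owner.id}",
    "id"   => owner.id.to_s,
    "type" => owner.type_name
  }
end

project = Project.new(1, Owner.new(17, "user_group", "groups"))
link = owner_link(project, "http://api.panoptes.org/api")
```

In a real serializer the type and route would come from the association's stored type column rather than being passed in by hand.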

Workflow Content

To elaborate on what I mentioned in the standup and in #74. I think translated strings for task questions/answers/instructions need to be separate from the ProjectContent. This enables people who copy a workflow into their project to benefit from any translations for that workflow that may already exist. So, even if the rest of the content on their site hasn't been translated yet, the basic question/answer part will be.

Panoptes Routes

We talked about this a bit, and I think the answer is going to be related to #7. @brian-c has ideas.

I think if we're building to serve content from Rails it makes sense to drop explicit resource names from the first bits of the url so we end up with something like this:

/:(user_name|group_name)/:project_name/
/:(user_name|group_name)/collections/:collection_name
/:(user_name|group_name)/:project_name/workflows/
/:(user_name|group_name)/:project_name/workflows/:workflow_id/subject_sets

And maybe have a provision for serving some projects directly as /:project_name like zooniverse.org/galaxy_zoo for instance.

Otherwise if this is just serving JSON urls could look like

/users/:user_name
/collections/(:user_name|:group_name)/:collection_name
/projects/(:user_name|:group_name)/:project_name

Anyone have any other thoughts about this?

Explicit link between a subject and the project they are imported for.

Do we want to track which project a subject is created for? If a subject is shared between multiple projects, there is no mechanism to ascertain the original project link. If a subject is not shared we could join on the SubjectSet / Project.

A use case example, although I am not sure we want to do this: hypothetically, say we want to allow an admin / project owner to remove their subjects, e.g. graphic images that are being used across multiple projects and that a user reports to us.

Thoughts?

Add opt in/out user communication preferences

Related to #67 and #73.

Add functionality to capture a user's preference to opt in/out to global and per-project communications.

Perhaps there are two options

  1. Opt-in/out to all communications
  2. Opt-in/out to any participating project communications.
     • I.e. if they opt in, then any time they classify on a project it adds that project to the list of communications.

Finally add the ability to list and update their global and per-project preferences.

Post OAuth Sign Up flow

We need to determine a couple things after a user signs up using an OAuth provider:

  • Newsletter Subscriptions
  • User's URL - in #92 we just downcase and replace spaces in the display name
  • User's display name - in #92 we just use their name on the service

The latter two could be pre-filled from the provider's responses, but we should probably still allow some customization.
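
For reference, the rule described for #92 amounts to something like the sketch below. The method name and the underscore separator are assumptions, since the issue only says spaces are replaced.

```ruby
# Hypothetical version of the rule described for #92: downcase the display
# name and replace runs of whitespace. The underscore is an assumption.
def url_name_from(display_name)
  display_name.strip.downcase.gsub(/\s+/, "_")
end

suggested = url_name_from("Galaxy Zoo Fan")
```

A value like this works as a pre-filled suggestion. It does not guarantee uniqueness, which is one reason to let the user adjust it.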

First Party Authentication

I think the best way to handle first-party authentication is to override the devise controllers to allow JSON-based login and logout, user creation, and password reset, like @camallen started doing here. We'll want to move those methods out of the API namespace since we don't want to allow direct third-party access to them.

This will also play nicely with OmniAuth logins, in fact it'll be pretty easy to communicate to the frontend all the possible ways you could login with it.

GET /users/sign_in
Content-Type: application/json

{
  "login": "/users/sign_in",
  "facebook": "/users/auth/facebook",
  "google": "/users/auth/google"
}

Once a user has created a devise session we can retrieve tokens for a limited set of OAuth applications (only Zooniverse-owned ones) by creating a /users/tokens endpoint that accepts a POST request with a single client_id: field and responds with the logged-in user's token for that application. This way Panoptes Front End will only need to have its client_id, and not its application secret, in code.

You could think of this as similar to the Resource Owner Credential Workflow, but restricted to only First-Party applications.

I think it'd be nice to eliminate the token endpoint and just return the token after login, but I can't think of a way to do that from OmniAuth callbacks since they only use redirects, and we can't attach extra data to the response.

In summation, logging into the Panoptes Front end would look like this:

  • User is presented with a Username/Password login form, and a list of other auth providers (Facebook, G+, etc.)
  • User submits their credentials or authenticates through Facebook (G+, etc.)
  • Panoptes creates a devise session cookie for the authenticated user
  • Panoptes returns the user's info to the Front End
  • The Front End requests a token for the user
  • Panoptes creates or returns a token for the Front End application and the current devise user
  • The Front End can make authorized OAuth requests against the protected portions of the Panoptes API

To the user that looks like "I click log-in....I'm logged in!"
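
The token step in the flow above could be modelled roughly like this. It is an in-memory sketch with hypothetical names (TokenIssuer, the "panoptes-front-end" client id) and assumed status codes, not the Doorkeeper or devise implementation.

```ruby
require 'securerandom'

# Hypothetical in-memory model of the proposed /users/tokens endpoint.
class TokenIssuer
  def initialize(first_party_client_ids)
    @first_party = first_party_client_ids
    @tokens = {} # [user_id, client_id] => token
  end

  # Returns [status, body]. Only a signed-in user asking on behalf of a
  # Zooniverse-owned application receives a token; the same token is
  # returned on repeat requests. The 401/403 choices are assumptions.
  def call(session_user_id, client_id)
    return [401, nil] if session_user_id.nil?
    return [403, nil] unless @first_party.include?(client_id)

    token = (@tokens[[session_user_id, client_id]] ||= SecureRandom.hex(16))
    [200, { token: token }]
  end
end

issuer = TokenIssuer.new(["panoptes-front-end"])
status, body = issuer.call(42, "panoptes-front-end")
```

Because the session cookie identifies the user, the front end only ever sends its client_id, which matches the goal of keeping the application secret out of client code.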

Subject counts on indirectly associated models

Re Ed's comment in #15 about counting subjects.

We can't use Rails counter caches (counter_cache) here as there are no direct associations between these models. If we really want them we should just craft some SQL to get the counts via the linked / join tables.

CanCanCan?

Hey all, CanCan is no longer maintained. If we're happy with its authorization model, would anyone have a problem with using the updated version, CanCanCan?

Also there are a couple of alternatives I've seen mentioned, but have not looked into:

Apiary

It looks really sweet. I think we should use it. I'm going to write some docs and open a pull request for them in a bit.

How do we want to document?

@camallen and I were talking before the standup and I think it's clear we need to document how parts of the Panoptes model fit together. How do we want to do that?

I see three choices:

  • Inline code docs
  • Super explicit RSpec tests
  • A master reference document

I don't think any of these are mutually exclusive, but is there a particular format that would make it easier for people so we can focus efforts?

Nameable concern

Creation on assignment leads to lots of atomicity problems.
We probably should be building the record and creating within a transaction.

revoke / reject a doorkeeper token if a user is disabled

It looks like doorkeeper doesn't provide hooks to add custom checks around token validation. @edpaget, is this correct?

Scenario:

  1. A user logs into the system and gets a token.
  2. User then deletes their account.
  3. Client doesn't forget the token (we can control our client behaviour but not other clients, if we allow them).
  4. Client then uses valid token for disabled user account to access non-authorised resources (token will pass doorkeeper but will fail on any pundit authorization).

The only thing I can think of is to enable hooks in doorkeeper OR revoke the token on user delete. E.g. https://github.com/doorkeeper-gem/doorkeeper/blob/master/app/controllers/doorkeeper/tokens_controller.rb

It'd be better to rely on the doorkeeper end point implementation rather than duplicating code.

Any thoughts?

determine how omniauth (FB, Twitter, Github, etc) will work for user signups

When creating an account, Panoptes requires a user to have a login, email and uri_name. What will happen when a user signs up using an omniauth provider but doesn't allow access to the required details?

  • Will we prompt them for the details before allowing them to complete the sign up process?
    • This will hinder sign ups and leave an account in limbo?

Do we allow users to create a half completed account?

@mrniaboc really wants to have confirmed email addresses. Perhaps an alternative strategy is to create a confirm signup link without logging the user into the system, e.g. https://github.com/plataformatec/devise/wiki/How-To:-Override-confirmations-so-users-can-pick-their-own-passwords-as-part-of-confirmation-activation

Thoughts?

Track subject metadata changes after ingestion into the system

After discussions with @chrislintott and @aliburchard we decided we need to track changes to subject metadata to ensure classification can be aligned to any decisions based on subject metadata.

E.g. Say a user uploads a set of images from a camera trap and the readymade decision tree shows certain images based on subject metadata. If the metadata is updated later due to camera failure / fieldworker error then we need to be able to link any classification to the subject at that time.

Accordingly it was decided that we need to version subject metadata changes through time. Perhaps via https://github.com/airblade/paper_trail or a custom JSON field on the subject using nested data.
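
To make the requirement concrete, here is a minimal in-memory sketch of looking up the metadata that was in force when a classification was made. The class and method names are hypothetical; a real implementation might use paper_trail or a JSON column as suggested above.

```ruby
# Hypothetical in-memory version history for one subject's metadata.
class VersionedMetadata
  Version = Struct.new(:recorded_at, :data)

  def initialize
    @versions = []
  end

  # Records a new version without discarding earlier ones.
  def update(data, at:)
    @versions << Version.new(at, data.dup.freeze)
  end

  # The metadata in force at `time`, or nil if the subject had none yet.
  def at(time)
    version = @versions.select { |v| v.recorded_at <= time }.max_by(&:recorded_at)
    version && version.data
  end
end

history = VersionedMetadata.new
history.update({ "camera" => "A" }, at: Time.utc(2014, 1, 1))
history.update({ "camera" => "B" }, at: Time.utc(2014, 6, 1)) # later correction
seen_by_classification = history.at(Time.utc(2014, 3, 15))
```

A classification made in March resolves to the original metadata, even though the record was corrected in June.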

Thoughts?
Cam

ENTRYPOINT for Dockerfile

At the moment the docker image doesn't do anything when you run it. What command should it be running to actually run Panoptes?

How are we building the front end?

@brian-c do you have any opinions on how we should do this? I assume this application will also end up serving things like the new Zoo-Home and User profiles.

Should we use normal rails views and turbolinks for content sections, then load readymade/youniverse on the pages where they're needed? Or are you planning to build a single application that consumes from Panoptes?

What's a User?

Currently a user signs up with:

  • Username
  • Password
  • Real name (for recognition in papers, etc.)
  • "Send me beta announcements"

In #38 @edpaget suggested a separate key for URL. That gives us three "name" properties. Can we use the login for the URL, encoding what needs encoding when calling the API? URLs in the browser should render okay: #/ß®ï@ñ-¢.
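
A quick check of the encoding idea using only Ruby's standard library. This is a sketch; the choice of ERB::Util.url_encode for the client side is an assumption, and any percent-encoder that works on UTF-8 bytes would do.

```ruby
require 'erb'
require 'uri'

login = "ß®ï@ñ-¢"

# Percent-encode the UTF-8 bytes so the login is safe as a URL path segment.
encoded = ERB::Util.url_encode(login)

# Decoding on the server side recovers the original login.
decoded = URI.decode_www_form_component(encoded)
```

The encoded form is plain ASCII and decodes back to the original login, so the login itself could serve as the URL key without a separate property.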

GUIDs for Users and Groups

Hey all, this has become a huge point of contention and something we really need to flesh out. Let's talk here, and also try to all have a hangout about it (I can do tomorrow morning even though I'm off if people just want to get it done with).

I think we all agree that we need a Globally Unique ID for Users and Groups, so we can have nice-looking URLs in our Client. It's much easier to distribute postcards at conferences that say (just making stuff up here, no assumptions about the final URLs) zooniverse.org/zooniverse/galaxy_zoo than zooniverse.org/projects?owner=zooniverse&name=galaxy_zoo

This is what we need to decide:

  • Is a User's login always going to be their GUID?
  • Is a Group's name always going to be their GUID?
  • Are GUIDs changeable?

Depending on how we decide to answer those I think there are a few further questions we should figure out (we can open separate issues to flesh these out if we need to):

  • At what point is a User or Group's GUID created?
  • What characters are allowed in a GUID?
  • Is there a name blacklist?
  • Who's buying the first round the next time we're all together?
