dataswift / hat2.0

The HAT Personal Microserver

Home Page: https://hubofallthings.com

License: GNU Affero General Public License v3.0

Languages: Scala 74.27%, HTML 24.92%, JavaScript 0.46%, Perl 0.12%, Makefile 0.11%, Smarty 0.11%, Dockerfile 0.01%
Topics: hat, database, stack, akka, slick, play, api, scala

hat2.0's Introduction


This repository contains an implementation of the Hub-of-All-Things HAT Microserver project.

Releases

The current project version is here.

About the project

The Hub-of-All-Things is a HAT Microserver for individuals to own, control and share their data.

A Personal Microserver ("the HAT") is a single-tenant ("the individual self") technology system, fully self-serviced by the individual, that enables an individual to define a full set of "meta-data": a specific set of personal data, personal preferences and personal behaviour events.

The HAT enables individuals to share the correct information (in quality and quantity), with the correct people, in the correct situations, for the correct purposes, and to gain the benefits of doing so.

The HAT Microserver is the technology base of the Personal Data Server.

Technology stack

This HAT Microserver implementation is written in Scala (2.12.11) and builds on a technology stack that includes the Play framework, Akka, Slick and PostgreSQL.

Running the HAT project

HAT runs as a combination of a backing PostgreSQL database (with a public schema for flattened data storage) and a software stack that provides logic to work with the schema using HTTP APIs.

To run it from source in a development environment you will need the Scala build tooling (sbt, driven here through the included Makefile) and a PostgreSQL database, provided either by a local installation or by Docker.

1. Get the source and the submodules (required for either database method in step 3)

> git clone https://github.com/Hub-of-all-Things/HAT2.0.git
> cd HAT2.0
> git submodule init 
> git submodule update

2. Configure your /etc/hosts

127.0.0.1   bobtheplumber.hat.org
127.0.0.1   bobtheplumber.example.com

3. Create the database

There are 2 ways of doing this.

3.1. Using your local postgresql instance

    > createdb testhatdb1
    > createuser testhatdb1
    > psql postgres -c "GRANT CREATE ON DATABASE testhatdb1 TO testhatdb1"

3.2. Using docker-compose

A docker-compose.yml file has been included in this project to boot up a dockerized PostgreSQL instance. If you use this method, you need to create a .env file; the included .env.example will work as is. If you make a change, make the corresponding change to ./hat/conf/dev.conf. Then run:

    > make docker-db

You can stop the database with

    > make docker-db-stop

4. Run the project!

    > make run-dev

Go to http://bobtheplumber.example.com:9000

You're all set!

Customising your development environment

Your best source of information on how the development environment can be customised is the hat/conf/dev.conf configuration file. Make sure you run the project locally with this configuration enabled (using the steps above), otherwise it will just show a message that the HAT could not be found.

Among other things, the configuration includes:

  • host names alongside port numbers of the test HATs (http://yourname.hat.org:9000)
  • access credentials used to log in to the HAT as the owner or the restricted platform user (the default password is the very unsafe "testing")
  • database connection details (important if you want to change your database setup above)
  • private and public keys used for token signing and verification

Specifically, it has 4 major sections:

  • Enables the development environment self-initialisation module:
    play.modules {
      enabled += "org.hatdex.hat.modules.DevHatInitializationModule"
    }
    
  • Sets the list of database evolutions to be executed on initialisation:
    devhatMigrations = [
      "evolutions/hat-database-schema/11_hat.sql",
      "evolutions/hat-database-schema/12_hatEvolutions.sql",
      "evolutions/hat-database-schema/13_liveEvolutions.sql",
      "evolutions/hat-database-schema/14_newHat.sql"]
    
  • The devhats list sets out the HATs served by the current server, including, for each, the owner details it is initialised with and the database access credentials. Each specified database must exist before launching the server, but is initialised with the right schema at start time.
  • The hat section lists all corresponding HAT configurations to serve; here you can change the HAT domain name, the owner's email address, or the public/private keypair used by the HAT for its token operations.

Using Helm 3

The HAT solution is easily deployable on top of Kubernetes via a Helm 3 chart.

Additional information

License

HAT including HAT Schema and API is licensed under AGPL - GNU AFFERO GENERAL PUBLIC LICENSE

hat2.0's People

Contributors

4knahs, andriusa, augustinas, benshaw, coolseraph, dataswifty, ddeak, dependabot[bot], gscardine, ireneclng, leymytel, mrmrtalbot, mudspot, pperzyna, r-gl, scala-steward, tjweir, whiteshadow-gr, xavji


hat2.0's Issues

Reducing the number of entity tables

To simplify the underlying data model (not the conceptual one), consider merging all entity data tables (person, thing, event, location and organisation) into a single table differentiated by kind. This would simplify API programming and reduce the number of tables.
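
For illustration, a minimal Slick sketch of what such a unified table might look like; the table and column names here are hypothetical, not the actual HAT schema:

    import slick.jdbc.PostgresProfile.api._

    // Hypothetical sketch only: one "entities" table differentiated by a `kind`
    // discriminator ("person", "thing", "event", "location", "organisation")
    // instead of five separate entity tables.
    case class Entity(id: Int, kind: String, name: String)

    class Entities(tag: Tag) extends Table[Entity](tag, "entities") {
      def id   = column[Int]("id", O.PrimaryKey, O.AutoInc)
      def kind = column[String]("kind") // replaces the per-entity tables
      def name = column[String]("name")
      def * = (id, kind, name) <> (Entity.tupled, Entity.unapply)
    }

    object Entities {
      val table = TableQuery[Entities]
      // e.g. all locations, without a dedicated locations table
      def ofKind(kind: String) = table.filter(_.kind === kind)
    }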

Deployment documentation out of date?

Running the deploy.sh script throws a couple of missing-file errors:

Setting up database user and database
dbuser: hat20test, database: hat20test
CREATE ROLE hat20test NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;
createuser: creation of new role failed: ERROR:  role "hat20test" already exists
createdb: database creation failed: ERROR:  database "hat20test" already exists
ALTER ROLE
Handling schemas
NOTICE:  drop cascades to 2 other objects
DETAIL:  drop cascades to extension uuid-ossp
drop cascades to extension pgcrypto
DROP SCHEMA
CREATE SCHEMA
ALTER SCHEMA
Setting up database
CREATE EXTENSION
CREATE EXTENSION
./deployment/deploy.sh: line 31: ./src/sql/HAT-V2.0.sql: No such file or directory
./src/sql/HAT-V2.0.sql
Setting up corresponding configuration
Setting up HAT access
./deployment/deploy.sh: line 51: ./src/sql/boilerplate/authentication.sql: No such file or directory
bash: applyEvolutions.sh: No such file or directory
Preparing the project to be executed

These look like references to code that is present in https://github.com/Hub-of-all-Things/hat-database-schema, but that repository is not a submodule of the HAT2.0 repo, and even then the HAT-V2.0.sql file doesn't seem to be present in any repo.

Add API support for list values

As an example, a user might want to save their Facebook post record with one of the values being a list of likes. Currently, storing such a record is cumbersome: it requires creating an additional "likes" data source model and manually referencing each "like" record inside the "post" record.
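
For illustration, a hedged sketch (using Play JSON, which is part of the stack; the field names are invented) of how such a record could be expressed if list values were supported directly:

    import play.api.libs.json.{ JsValue, Json }

    // Hypothetical record: a Facebook post whose "likes" value is a list, stored in
    // one record rather than as a separate "likes" data source cross-referenced
    // from the post. Field names are invented for the example.
    val post: JsValue = Json.obj(
      "source"  -> "facebook",
      "message" -> "Hello HAT",
      "likes"   -> Json.arr(
        Json.obj("name" -> "Alice"),
        Json.obj("name" -> "Bob")
      )
    )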

IllegalArgumentException

Encountered an error when handling GET /users/access_token:

Unexpected exception[IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern

Reproduced with Postman but not via Rumpel. Status code: 500.

Location relationship with (event, thing, organisation) cannot be created

Having created the location and (event,thing,organisation):
location

{
  "id": 191,
  "name": "stairs"
}

thing

{
  "id": 44,
  "name": "pressure sensor"
}

I cannot create a relationship between them by posting

{
 "relationshipType": "parent child"
}

to
location/191/thing/44?userAuth..

The call returns 400, and it seems to bind the first id to the thing rather than taking the second one as designed:

ERROR: insert or update on table "locations_locationthingcrossref" violates foreign key constraint "thing_id_refs_id_fk"
  Detail: Key (thing_id)=(191) is not present in table "things_thing".

Some API calls return 401 (Unauthorized) instead of 400 (Bad Request) for invalid input

When calling into the API to create a new table (POST /data/table) with invalid data (such as an empty JSON object {}), you correctly get error code 400 (Bad Request, "Object is missing required member 'name'") when authenticating with username and password, but when authenticating with an access token you get 401 (Unauthorized), even though the token is valid.

The correct error code (and the very useful detailed JSON validation or database error message that goes along with it) should be returned when a valid access token is presented as well.

Data Debit Key expiration is not checked

You can currently continue to use an expired data debit key (or one whose startDate has not yet been reached) to retrieve the bundle contents.

DataDebitAuthorization.scala checks whether the recipient is the right one and whether the data debit has been enabled. It should also check that the current date is between the data debit startDate and endDate.
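
A minimal sketch of the missing check, assuming java.time and illustrative field names rather than the actual DataDebitAuthorization types:

    import java.time.LocalDateTime

    // Illustrative only: besides verifying the recipient and the enabled flag, the
    // authorization should confirm that "now" falls inside the data debit window.
    case class DataDebitWindow(startDate: LocalDateTime, endDate: LocalDateTime)

    def withinValidityWindow(debit: DataDebitWindow,
                             now: LocalDateTime = LocalDateTime.now()): Boolean =
      !now.isBefore(debit.startDate) && !now.isAfter(debit.endDate)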

From Install instructions, getting "Hat not available" startup screen. Is this normal?

Continued from "Trying to run the project, getting an error " #36

Hi AndriusA. I think the startup page I am receiving is an error page "Hat not available". The two buttons on the page link off to https://hubofallthings.com/main/what-is-the-hat/ (file not found) and https://hatters.hubofallthings.com/ , and do not allow me to create a hat on my local system.

Is this an actual error page or the default front page? Looking at the page source I see HTML such as

<div class="error-logo">, <h1 class="error-title">, <h3 class="error-subtitle">, <h3 class="error-subtitle-action">

I am getting some warnings that I don't know how to resolve, about private keys.

How can I create a hat? For sure I am missing something.

[info] Exporting web-assets:hat
[info] Compiling 16 Scala sources and 3 Java sources to /home/don-w510-u/HAT2.0/hat/target/scala-2.11/classes...
[WARN ] [11/13/2017 09:09:01] [o.h.h.r.a.HatServerActor] Error while trying to fetch HAT bobtheplumber.hat.org:9000 Server configuration: Private Key for bobtheplumber.hat.org:9000 not found
[WARN ] [11/13/2017 09:09:01] [o.h.h.r.HatServerProviderImpl] Error while retrieving HAT Server info: Private Key for bobtheplumber.hat.org:9000 not found

Thanks, Don

API requires field/table name in addition to field/table id, but does not validate it

When doing things like creating a new record with fields filled in, you need to provide the API with the relevant field or table name in addition to the field or table id (which is already unique). If the name is missing, the API call is rejected.

But it seems that the names provided do not have to be correct (i.e. match the field/table definition). You can post any dummy name, and the request will be processed.

If the purpose of including the name in the request is to provide for extra validation of intent / avoid mistakes, then this parameter should be checked and the request refused when invalid names are specified.

If there is no validation, there should be no need to pass in the name at all (which makes writing API client code easier, too).

Possible middle-ground / best-of-both-worlds: Make the name optional, but if specified, it must be correct.
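
A small sketch of that middle-ground rule, with hypothetical names: an omitted name is accepted, while a supplied name must match the stored definition.

    // Illustrative only: validate an optional client-supplied name against the name
    // already stored for the referenced table/field id.
    def nameAcceptable(suppliedName: Option[String], storedName: String): Boolean =
      suppliedName.forall(_ == storedName)

    // nameAcceptable(None, "events")           == true   (name omitted: accepted)
    // nameAcceptable(Some("events"), "events") == true   (name matches: accepted)
    // nameAcceptable(Some("wrong"), "events")  == false  (mismatch: reject with 400)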

Exception in Update API when data is an empty array

Endpoint: PUT /api/v2.6/data
Description: Internal Server Error thrown when updating data to an empty array
Steps to replicate

  1. Create the following data into an endpoint POST /api/v2.6/data
    { "id": "some-random-id", "symptoms": ["fever", "cough"], "timestamp": 1591802656398 }

  2. Update the above data using PUT /api/v2.6/data with
    { "id": "some-random-id", "symptoms": [], "timestamp": 1591802656398 }

  3. Internal Server Error thrown.
    Expected behaviour: Data record should be updated with symptoms = []


Other information
There are no issues creating an initial record with an empty array; only updating is affected.

UUIDs for all publicly visible IDs

For consideration: to avoid simple sequencing issues and further minimise exposure of internal HAT state, all IDs that are publicly visible via the APIs (and hence the corresponding database IDs) should be converted to the UUID type instead of auto-incrementing integers.
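
For example, identifiers could be generated as UUIDs at the application layer (the deploy script already enables the uuid-ossp and pgcrypto extensions on the database side); a trivial Scala sketch:

    import java.util.UUID

    // Illustrative: a UUID identifier avoids exposing the sequence position of
    // internal rows through the public APIs. Slick's PostgresProfile maps
    // column[UUID] directly, so table definitions can use it as the key type.
    val recordId: UUID = UUID.randomUUID()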

Authentication from own server working with Rumpel, but not data-plugs or data-market

Managed to get my own server running (localhost proxied through ngrok for now), and I can access Rumpel and update my details, but when trying to enable a data plug, like social-plug.hubofallthings.com, I receive a 502 after entering my PHATA.

The ngrok proxy logs a few GET /data/table 404 Not Found responses, but these seem to be Rumpel querying data sources that are not yet present:

[INFO ] [08/07/2016 13:33:22] API-Access - GET:http://a5b6c56f.ngrok.io/data/table?name=events&source=facebook:404 Not Found
[INFO ] [08/07/2016 13:33:22] API-Access - GET:http://a5b6c56f.ngrok.io/data/table?name=posts&source=facebook:404 Not Found
[INFO ] [08/07/2016 13:33:22] API-Access - GET:http://a5b6c56f.ngrok.io/data/table?name=events&source=ical:404 Not Found
[INFO ] [08/07/2016 13:33:23] API-Access - GET:http://a5b6c56f.ngrok.io/data/table?name=locations&source=iphone:404 Not Found

Trying to run the project, getting an error

I am following the install instructions from: https://github.com/Hub-of-all-Things/HAT2.0 specifically Running the project, HAT Setup.

Cloned the repository:
https://github.com/Hub-of-all-Things/HAT2.0
Steps 1-3, creating the postgreSQL db, compiling the project and local domain mapping went without error.

Step 4, sbt "project hat" -Dconfig.resource=dev.conf run, gives me this error:

don-u@don--u:~$ sbt compile

[info] Loading project definition from /home/don-w510-u/project
[info] Set current project to don-u (in build file:/home/don-u/)
[info] Executing in batch mode. For better performance use sbt's shell
[success] Total time: 0 s, completed 13-Nov-2017 8:28:01 AM

don-u@don-u:~$ sbt "project hat" -Dconfig.resource=dev.conf run

[info] Loading project definition from /home/don-u/project
[info] Set current project to don-w510-u (in build file:/home/don-u/)
[error] Not a valid project ID: hat
[error] project hat
[error]            ^

Also tried:

don-u@don-u:~$ sbt "project hat2.0" -Dconfig.resource=dev.conf run
[info] Loading project definition from /home/don-u/project
[info] Set current project to don-u (in build file:/home/don-u/)
[error] Expected project ID
[error] project hat2.0
[error]             ^

Please tell me what I am doing incorrectly. Thanks, Don

Parametrized combinators

Suggestion from Chris Pointon

To reduce the code complexity and the load on the DS backend, can I propose parameterized combinators as a future enhancement?

e.g.

[
  {
    "endpoint": "rumpel/locations",
    "filters": [
      {
        "field": "data.locations.timestamp",
        "transformation": {
          "transformation": "datetimeExtract",
          "part": "hour"
        },
        "operator": {
          "operator": "between",
          "lower": "$lower",
          "upper": "$upper"
        }
      }
    ]
  }
]

The call to get the result would then be:
GET /api/v2.6/combinator/$COMBINATOR_NAME?lower=7&upper=9
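
A rough Scala sketch of the substitution step this implies; the Operator type below is a simplified stand-in, not the actual HAT model:

    // Simplified stand-in for the combinator filter operator, not the real HAT model.
    case class Operator(operator: String, lower: String, upper: String)

    // Replace "$name" placeholders with values taken from the request's query string,
    // e.g. ?lower=7&upper=9.
    def substitute(value: String, params: Map[String, String]): String =
      if (value.startsWith("$")) params.getOrElse(value.drop(1), value) else value

    def parametrize(op: Operator, params: Map[String, String]): Operator =
      op.copy(lower = substitute(op.lower, params), upper = substitute(op.upper, params))

    // parametrize(Operator("between", "$lower", "$upper"), Map("lower" -> "7", "upper" -> "9"))
    //   == Operator("between", "7", "9")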

Docker deployment: Boilerplate data conflicts with table PK sequence

I just deployed HAT2.0 using the provided Docker scripts (thanks for those, by the way, this makes installation so much faster), and when I tried to create a new data table (using POST API), I got this error:

{
  "message": "Error creating Table",
  "cause": "ERROR: duplicate key value violates unique constraint \"data_table_pk\"\n
    Detail: Key (id)=(4) already exists."
}

The id mentioned increases by one for each request.

I believe this is caused by using a sequence to generate the PK field, while the boilerplate data inserted upon deployment does not increment the sequence. That boilerplate should be changed to use nextval() in its INSERT statements instead.
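
On the application side the same principle is to let the sequence assign the key rather than supplying ids explicitly; a minimal Slick-style sketch (reusing the data_table column names, otherwise illustrative):

    import slick.jdbc.PostgresProfile.api._

    // Illustrative only: with the id column marked O.AutoInc, inserts omit the id
    // and the underlying sequence assigns it, so later API inserts cannot collide
    // with ids that boilerplate data claimed explicitly.
    class DataTables(tag: Tag) extends Table[(Int, String, String)](tag, "data_table") {
      def id         = column[Int]("id", O.PrimaryKey, O.AutoInc)
      def name       = column[String]("name")
      def sourceName = column[String]("source_name")
      def * = (id, name, sourceName)
    }
    val dataTables = TableQuery[DataTables]

    // The id is left to the database sequence; only name and source_name are supplied.
    val insertAction = dataTables.map(t => (t.name, t.sourceName)) += (("events", "facebook"))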

Updates made via REST API should be transactional/atomic

When running FacebookDataPlug, it tries to create a table model in HAT (see https://github.com/Hub-of-all-Things/DataPlugFacebook/blob/master/config/fbHatModels.js)

I had part of that fail because of primary key violations (which is fine).

But as a result, parts of the table model have been successfully created (the top-level table), while others have not (such as fields or sub-tables), making it difficult to clean up before the request can be run again.

Wherever feasible, modifications made by a REST API call should be applied all or nothing. This should be possible here with database transactions.
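
With Slick this would typically mean composing the individual inserts into one action and running it with .transactionally; a hedged sketch with made-up action parameters:

    import slick.jdbc.PostgresProfile.api._

    // Illustrative only: if any composed action fails (e.g. a primary key violation
    // on a sub-table), the whole transaction rolls back and no partial table model
    // is left behind.
    def createTableModel(insertTable: DBIO[Int],
                         insertFields: DBIO[Int],
                         insertSubTables: DBIO[Int]): DBIO[Unit] =
      DBIO.seq(insertTable, insertFields, insertSubTables).transactionally

    // db.run(createTableModel(...)) would then apply the whole model all-or-nothing.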

Merge entity creation and data filling into one API call

Creating an entity and filling it with data (attaching types and properties) requires several API calls, with the associated overhead and loss of transactional capabilities.

It would be nice to be able to create an entity and specify its types and properties in a single call, similar to how it can be done for data records.

Improve efficiency for dealing with hierarchical/recursive data structures

Currently (a limitation of Slick), to reconstruct deeply nested data structures, such as the structure of nested data_tables, it is necessary to do one roundtrip to the database per table, another roundtrip to collect its related tables, and then to repeat the same operation on them recursively, resulting in O(n) trips to the database, where n is the number of tables in the given structure.

One way around it is to use SQL common table expressions (also marked for the 3.1.0 release of Slick), e.g.:

WITH RECURSIVE recursive_data_tables(id, date_created, last_updated, name, source_name) AS (
    SELECT id, date_created, last_updated, name, source_name FROM data_table WHERE name = 'HyperDataBrowser'
  UNION ALL
    SELECT tab.id, tab.date_created, tab.last_updated, tab.name, tab.source_name
    FROM recursive_data_tables r_tab, data_table tab
    WHERE tab.id IN (SELECT table2 FROM data_tabletotablecrossref WHERE table1 = r_tab.id)
  )
  SELECT * FROM recursive_data_tables;

However, we would need to work out parsing the results back into Slick data models and be careful about infinite recursion.
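
One possible approach (a sketch, not the project's actual code) is to run the recursive CTE through Slick's plain SQL API and map the rows back by hand:

    import slick.jdbc.PostgresProfile.api._
    import slick.jdbc.GetResult

    // Illustrative sketch: run the recursive CTE as plain SQL and map the rows
    // back manually, since lifted Slick queries cannot express the recursion.
    case class DataTableRow(id: Int, name: String, sourceName: String)

    implicit val getDataTableRow: GetResult[DataTableRow] =
      GetResult(r => DataTableRow(r.nextInt(), r.nextString(), r.nextString()))

    def nestedTables(rootName: String): DBIO[Vector[DataTableRow]] =
      sql"""
        WITH RECURSIVE recursive_data_tables(id, name, source_name) AS (
            SELECT id, name, source_name FROM data_table WHERE name = $rootName
          UNION ALL
            SELECT tab.id, tab.name, tab.source_name
            FROM recursive_data_tables r_tab, data_table tab
            WHERE tab.id IN (SELECT table2 FROM data_tabletotablecrossref
                             WHERE table1 = r_tab.id)
        )
        SELECT id, name, source_name FROM recursive_data_tables
      """.as[DataTableRow]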

Update feed mapper for Instagram data

The Instagram API data format has changed significantly in a recent migration to the Graph API. Currently, the SHE feed mapper supports only the new Instagram data format; it should be improved to support both the old and the new formats.
