htm-community / hitc
HTM In The Cloud
License: GNU Affero General Public License v3.0
`DELETE /models/<id>`: delete the model identified by `<id>`.
If the model does not exist, return 404. Otherwise, return 200 on success. Any other error should return a non-200 response status code.
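The delete semantics above can be sketched as a plain function returning an HTTP-style status. This is a hypothetical illustration (the in-memory `models` dict and the function name are assumptions, not the actual hitc implementation):

```python
# Hypothetical in-memory model registry, standing in for the server's state.
models = {}

def delete_model(model_id):
    """Return (body, status) following the delete semantics described above."""
    if model_id not in models:
        return ('no such model: %s' % model_id, 404)
    try:
        del models[model_id]
        return ('deleted', 200)
    except Exception as exc:
        # Any other failure maps to a non-200 status.
        return (str(exc), 500)
```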
Depends on #5
I'm starting to get Error R14 (Memory quota exceeded) messages on Heroku.
Depends on #9.
Each model must receive data chronologically. If data is received that occurred before the last known time passed to the model, a 4XX error should be returned immediately and no data should be processed.
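The ordering rule above could look something like this sketch: reject any row whose timestamp is not strictly after the last one the model has seen. The `last_seen` dict and `accept_row` helper are assumptions for illustration, not the real request handler:

```python
from datetime import datetime

# Same timestamp format used by the test script elsewhere on this page.
DATE_FORMAT = "%m/%d/%y %H:%M"

# Hypothetical per-model record of the last timestamp processed.
last_seen = {}

def accept_row(model_id, row):
    """Return an HTTP-style status: 400 for out-of-order data, 200 otherwise."""
    ts = datetime.strptime(row['timestamp'], DATE_FORMAT)
    last = last_seen.get(model_id)
    if last is not None and ts <= last:
        return 400  # out-of-order: reject immediately, process nothing
    last_seen[model_id] = ts
    return 200
```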
This will require persistence, and might best be accomplished after we've established how `GET /models/<id>` is going to work on the backend (this does not have a spec yet, hence the P4 priority).
I started this discussion already, let's finish it here.
`PUT /models/<id>`: push data to the model identified by `<id>`.
If the model does not exist, return 404. Success should return 200. Errors should return non-200 response status codes.
Depends on #5.
Given that I have model params and appropriate data, an HTTP user should be able to:
I think we should use some form of CI so that major bugs don't get into production. My Heroku app currently pulls automatically from GitHub when I push to master, so it would be good to catch major bugs before they land.
The current endpoint at `PUT /models/<id>` expects the request body to contain a JSON object representing one row of data. This task is to make that endpoint accept many rows of data.
If the JSON object is a list instead of an object, assume it is a list of rows.
Return only the very last result object from the HTM in the response.
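The list-vs-object handling could be sketched like this. `run_row` stands in for pushing one row through the HTM model; the helper name is an assumption for illustration:

```python
import json

def handle_put_body(body, run_row):
    """Accept one row (JSON object) or many rows (JSON list) in a PUT body.

    Runs every row through `run_row` and returns only the very last result,
    per the behaviour described above.
    """
    data = json.loads(body)
    rows = data if isinstance(data, list) else [data]
    result = None
    for row in rows:
        result = run_row(row)
    return result
```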
I need to identify my own models with a string. If the id already exists, a 400 error should be returned.
Prove that you can install and run hot gym or something. I'm making this ticket because I think this will be harder than you think.
If I am providing my own model params, the `predictedField` is not necessary; it just means that inference will not be enabled. Currently the server returns an error if `predictedField` is missing.
The script below creates a new custom model with these model params, which declare the following fieldnames:
```python
#!/usr/bin/env python
import requests

DATE_FORMAT = "%m/%d/%y %H:%M"

desc = """
A test that shows created models do not return the right field names.
"""

URL = 'http://localhost:5000/'


def post(url, params=None):
    return requests.post(URL + url, data=params)


def get(url, params=None):
    return requests.get(URL + url, params=params)


def create_model(model_spec):
    r = post('models', model_spec).json()
    return r['guid']


def get_model(model):
    return get('models/' + model)


if __name__ == "__main__":
    from pprint import pprint
    print(desc)
    print("Making prediction model from model_params.json")
    with open('model_params.json') as data_file:
        model_spec = data_file.read()
    custom_model = create_model(model_spec)
    pprint(get_model(custom_model).json())
```
However, the resulting model params used to create the model don't respect the input field names; instead, the fields `c0` and `c1` are created, as shown in this Python dict representing the created model params:
{u'guid': u'cfa3bb3e-4769-41b0-9299-0fb903d4e820',
u'last': None,
u'params': {u'aggregationInfo': {u'days': 0,
u'fields': [],
u'hours': 0,
u'microseconds': 0,
u'milliseconds': 0,
u'minutes': 0,
u'months': 0,
u'seconds': 0,
u'weeks': 0,
u'years': 0},
u'model': u'CLA',
u'modelParams': {u'anomalyParams': {u'anomalyCacheRecords': None,
u'autoDetectThreshold': None,
u'autoDetectWaitRecords': 5030},
u'clEnable': False,
u'clParams': {u'alpha': 0.035828933612158,
u'clVerbosity': 0,
u'regionName': u'CLAClassifierRegion',
u'steps': u'1'},
u'inferenceType': u'TemporalAnomaly',
u'sensorParams': {u'encoders': {u'c0_dayOfWeek': None,
u'c0_timeOfDay': {u'fieldname': u'c0',
u'name': u'c0',
u'timeOfDay': [21,
9.49122334747737],
u'type': u'DateEncoder'},
u'c0_weekend': None,
u'c1': {u'fieldname': u'c1',
u'name': u'c1',
u'resolution': 21,
u'seed': 42,
u'type': u'RandomDistributedScalarEncoder'}},
u'sensorAutoReset': None,
u'verbosity': 0},
u'spEnable': True,
u'spParams': {u'columnCount': 2048,
u'globalInhibition': 1,
u'inputWidth': 0,
u'maxBoost': 1.0,
u'numActiveColumnsPerInhArea': 40,
u'potentialPct': 0.8,
u'seed': 1956,
u'spVerbosity': 0,
u'spatialImp': u'cpp',
u'synPermActiveInc': 0.0015,
u'synPermConnected': 0.1,
u'synPermInactiveDec': 0.0005},
u'tpEnable': True,
u'tpParams': {u'activationThreshold': 13,
u'cellsPerColumn': 32,
u'columnCount': 2048,
u'globalDecay': 0.0,
u'initialPerm': 0.21,
u'inputWidth': 2048,
u'maxAge': 0,
u'maxSegmentsPerCell': 128,
u'maxSynapsesPerSegment': 32,
u'minThreshold': 10,
u'newSynapseCount': 20,
u'outputType': u'normal',
u'pamLength': 3,
u'permanenceDec': 0.1,
u'permanenceInc': 0.1,
u'seed': 1960,
u'temporalImp': u'cpp',
u'verbosity': 0},
u'trainSPNetOnlyIfRequested': False},
u'predictAheadTime': None,
u'version': 1},
u'predicted_field': u'c1',
u'seen': 0}
Should have complete functionality of the API. Repo is https://github.com/nupic-community/hitc-py. Should be `pip`-installable.
I'll do this today.
When running data through a model during a PUT, the request handler assumes the temporal data stream is named `timestamp`, but this is not always the case. During model creation, any temporal timestamp fields can be found by inspecting the model parameters. This should be stored in memory for future requests and used when handling input data.
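Finding the temporal field from the model params could look like this sketch, which follows the encoder layout shown in the model-params dump elsewhere on this page (`find_temporal_field` is a hypothetical helper name):

```python
def find_temporal_field(model_params):
    """Return the fieldname handled by a DateEncoder, or None if absent.

    Inspects modelParams.sensorParams.encoders, matching the structure of
    the model-params dict dumped elsewhere on this page.
    """
    encoders = model_params['modelParams']['sensorParams']['encoders']
    for enc in encoders.values():
        if enc and enc.get('type') == 'DateEncoder':
            return enc['fieldname']
    return None
```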
The models need to be periodically stored in a database as a BLOB (since NuPIC models can be converted to a string using cPickle). Code might look something like this:

```python
import cPickle as pickle

def store_model(guid):
    model = models[guid]  # in-memory model registry
    blob = pickle.dumps(model['model'])
    # This won't work as-is; I can't tell how network saving actually works.
    netinfo = pickle.dumps(model['model']._netinfo.save())
    store_blob_in_db(guid, blob, netinfo)
    del blob, netinfo
```

We also need to take into account that the OPF CLA model needs some extra serialization: https://github.com/numenta/nupic/blob/master/src%2Fnupic%2Fframeworks%2Fopf%2Fclamodel.py?ts=4#L1286-L1306
This data could probably be stored in a second BLOB column.
Basically, I propose the following behaviour (subject to feedback): when a GET, PUT, or reset API call is made to a model that is not in memory, the server will attempt to retrieve the model by its GUID from the database and then perform the requested action.

Proposed table schema:
id: Integer, pk
guid: varchar
model: BLOB
net_info: BLOB
created: timestamp
updated: timestamp
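The lazy-load behaviour proposed above could be sketched as follows. The `db_fetch` callback, the in-memory `memory` dict, and the use of plain `pickle` are assumptions for illustration, not the actual schema code:

```python
import pickle

# Hypothetical in-memory cache of unpickled models, keyed by GUID.
memory = {}

def get_or_restore(guid, db_fetch):
    """Return the model for `guid`, restoring it from the DB if needed.

    `db_fetch(guid)` should return the pickled model BLOB, or None if the
    GUID is unknown (in which case the caller should answer 404).
    """
    if guid in memory:
        return memory[guid]
    blob = db_fetch(guid)
    if blob is None:
        return None
    memory[guid] = pickle.loads(blob)
    return memory[guid]
```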
Other questions:
Very simple endpoints:

`POST /models`: create model. The request body takes:

- `modelParams`: same structure as the NuPIC / HTM.Java model params (sorry, we have no real spec for this, you'll just have to look at some example code)
- `predictedField` [optional]: the string name of the field to be predicted (not necessary for anomaly models)

Success returns a JSON body containing the model id (e.g. `{"id":"asdf"}`). Errors should return a non-200 status code representing the error as well as possible.
This probably needs some kind of test suite. I've started one in the `test` folder, but it should probably be migrated to a better system that properly tests the entire API.