hitc's People

Contributors

jonnoftw, rhyolight

hitc's Issues

When pushing data to a model, return an error if the data is earlier than the last seen time

Depends on #9.

Each model must receive data chronologically. If data is received that occurred before the last known time passed to the model, a 4XX error should be returned immediately and no data should be processed.

This will require persistence, and might best be accomplished after we've established how GET /models/<id> is going to work on the backend (this does not have a spec yet, hence the P4 priority).
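A minimal sketch of the ordering check described in this issue, assuming timestamps have already been parsed into datetime objects. The names last_seen and check_chronological are hypothetical, and 400 is only one possible choice of 4XX status. A real version would read the last seen time from whatever persistence layer is chosen rather than from a module-level dict.

```python
from datetime import datetime

# Hypothetical in-memory record of the last timestamp accepted per model id.
last_seen = {}


def check_chronological(model_id, row_time):
    """Return an (HTTP status, message) pair for one incoming row.

    The row is rejected with 400 if it is earlier than the last accepted
    row for the same model; equal timestamps are accepted.
    """
    previous = last_seen.get(model_id)
    if previous is not None and row_time < previous:
        message = "data at %s is earlier than last seen time %s" % (
            row_time, previous)
        return 400, message
    last_seen[model_id] = row_time
    return 200, "ok"
```

Because the rejected row never updates last_seen, a later row with a valid timestamp is still accepted.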

Deploy a test NuPIC app to Heroku

Prove that you can install and run hot gym or something. I'm making this ticket because I think this will be harder than you think.

Handle batch data via PUT /models/<guid>

The current endpoint at PUT /models/<id> expects the request body to contain a JSON object representing one row of data. This task is to make that endpoint accept many rows of data.

If the JSON object is a list instead of an object, assume it is a list of rows.

Return only the very last result object from the HTM in the response.
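The list-or-object rule above could be sketched roughly as follows. The function run_rows and the run_one callback are hypothetical names; run_one stands in for whatever pushes a single row into the HTM and returns its result object.

```python
import json


def run_rows(body, run_one):
    """Feed one row or a list of rows to the model and return the last result.

    `body` is the raw JSON request body. `run_one` is a hypothetical
    callable that pushes a single row dict into the model.
    """
    payload = json.loads(body)
    # A JSON list is treated as many rows; anything else as a single row.
    rows = payload if isinstance(payload, list) else [payload]
    result = None
    for row in rows:
        result = run_one(row)
    # Only the very last result is returned; an empty list yields None.
    return result
```

Existing single-row clients keep working unchanged, since a plain object is wrapped in a one-element list.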

REST API cannot assume the temporal field name is "timestamp"

When running data through a model during a PUT, the request handler assumes that the temporal field is named timestamp, but this is not always the case. When a model is created, any temporal fields can be found by inspecting the model parameters. These field names should be stored in memory and used when handling input data in later requests.
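One way to find the temporal fields, sketched under the assumption that they are exactly the fields fed to an encoder of type DateEncoder, and that the params have the modelParams / sensorParams / encoders layout shown in the model dump further down this page. The function name temporal_field_names is hypothetical.

```python
def temporal_field_names(params):
    """Return the sorted field names that are fed to a DateEncoder."""
    encoders = params["modelParams"]["sensorParams"]["encoders"]
    names = set()
    for encoder in encoders.values():
        # Disabled encoders appear as None in the params, so skip them.
        if encoder and encoder.get("type") == "DateEncoder":
            names.add(encoder["fieldname"])
    return sorted(names)
```

The result could be computed once at model creation and kept alongside the model, so the PUT handler does not need to inspect the params on every request.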

What HTM implementation will we use?

I started this discussion already; let's finish it here.


Ideas

  1. You could use HTM Engine and the HTTP interface provided by the
    skeleton app to get started [2]. This might be really easy, and
    provide some scalability right off the bat. The work would mostly just
    be figuring out the Docker configuration for deployment. However, it
    would not allow users to provide model params, and adding this
    functionality would need to be done in the HTM Engine itself (others
    want this too [3]).
  2. You could use the simple HTTP wrapper around NuPIC provided by
    Jared Weiss [4]. It is just a SimpleHttpServer but it would be a fast
    prototype. Again, the work would be deployment configuration.
  3. You could use HTM-Moclu [5] and HTM.Java. I haven't been able to
    get this running without multiple servers (because Akka), but someone
    with the know-how could take it on.
  4. You could wrap a Java HTTP server over HTM.Java.

Model Persistence

The models need to be periodically stored in a database using a BLOB (since nupic models can be converted to a string using cPickle).

Code might look something like this:

import cPickle as pickle

def store_model(guid):
    # Assumes `models` is the server's dict of in-memory models keyed by GUID.
    model = models[guid]
    blob = pickle.dumps(model['model'])
    # This won't work as written; I can't tell how network saving actually works.
    netinfo = pickle.dumps(model['model']._netinfo.save())
    store_blob_in_db(guid, blob, netinfo)

We also need to take into account the fact that the OPF CLA model needs some extra serialization: https://github.com/numenta/nupic/blob/master/src%2Fnupic%2Fframeworks%2Fopf%2Fclamodel.py?ts=4#L1286-L1306
This data could probably be stored in a second BLOB column.

Basically, I propose the following behaviour (subject to feedback):

  • Models should be persisted in a BLOB field in MySQL.
  • All in-memory models should be stored in the database every 5 minutes.
  • Models should be dumped to the database if they are unused for 5 minutes.
  • If a GET, PUT or reset API call is made to a model that is not in memory, the server will attempt to retrieve the model by its GUID from the database and then perform the requested action.
  • DB schema to look like:
      id: Integer, pk
      guid: varchar
      model: BLOB
      net_info: BLOB
      created: timestamp
      updated: timestamp
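A rough sketch of the proposed table and the store/load round trip. It uses the standard-library sqlite3 module in place of MySQL so that it runs without a server, and a plain dict stands in for the NuPIC model, which may need the extra serialization mentioned above. The helper names open_db, store_model and load_model are hypothetical.

```python
import pickle
import sqlite3
import time

# Mirrors the proposed schema; sqlite3 is a stand-in for MySQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS models (
    id INTEGER PRIMARY KEY,
    guid VARCHAR(36) UNIQUE,
    model BLOB,
    net_info BLOB,
    created TIMESTAMP,
    updated TIMESTAMP
)
"""


def open_db(path=":memory:"):
    """Open the database and create the models table if needed."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def store_model(db, guid, model, net_info=None):
    """Insert or update the pickled model stored under `guid`."""
    now = time.time()
    blob = sqlite3.Binary(pickle.dumps(model))
    info = sqlite3.Binary(pickle.dumps(net_info))
    cur = db.execute(
        "UPDATE models SET model = ?, net_info = ?, updated = ? "
        "WHERE guid = ?", (blob, info, now, guid))
    if cur.rowcount == 0:
        # No existing row for this guid, so create one.
        db.execute(
            "INSERT INTO models (guid, model, net_info, created, updated) "
            "VALUES (?, ?, ?, ?, ?)", (guid, blob, info, now, now))
    db.commit()


def load_model(db, guid):
    """Return the unpickled model for `guid`, or None if it is not stored."""
    row = db.execute(
        "SELECT model FROM models WHERE guid = ?", (guid,)).fetchone()
    return pickle.loads(bytes(row[0])) if row else None
```

A periodic task could call store_model for every in-memory model, and the request handlers could fall back to load_model when a GUID is not found in memory.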

Other questions:

  • Should models not in memory be included in the list of all models?

Test Suite

This probably needs some kind of test suite. I've started one in the test folder, but it should probably be migrated to a better system that properly tests the entire API.

As an HTTP client, I can create an HTM model in a RESTful way

Very simple endpoints:

  • POST /models: create model
    • body [JSON] example:
      • modelParams: same structure as the NuPIC / HTM.Java model params (sorry, we have no real spec for this, you'll just have to look at some example code)
      • predictedField [optional]: the string name of the field to be predicted (not necessary for anomaly models)
    • returns [JSON]: model id to identify the model instance for future calls ({"id":"asdf"})

Errors should return a non-200 status code that represents the error as accurately as possible.
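A framework-free sketch of how the POST /models handler might validate its body. The function create_model and the models registry are hypothetical, the choice of 400 for bad input is an assumption, and the step that would actually build the HTM model is left out.

```python
import json
import uuid

# Hypothetical in-memory registry: model id -> stored creation request.
models = {}


def create_model(body):
    """Handle the body of POST /models; return (status code, JSON string)."""
    try:
        spec = json.loads(body)
    except ValueError:
        return 400, json.dumps({"error": "body is not valid JSON"})
    if not isinstance(spec, dict) or "modelParams" not in spec:
        return 400, json.dumps({"error": "modelParams is required"})
    model_id = str(uuid.uuid4())
    # A real handler would build the HTM model here; this only records the spec.
    models[model_id] = {
        "modelParams": spec["modelParams"],
        # Optional: not needed for anomaly models.
        "predictedField": spec.get("predictedField"),
    }
    return 200, json.dumps({"id": model_id})
```

Returning the status code next to the payload keeps the validation logic easy to test without starting an HTTP server.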

Continuous Integration

I think we should use some form of CI so that major bugs don't get into production. My Heroku app currently pulls automatically from GitHub when I push to master, so it would be good to catch major bugs before that happens.

Creating a custom model does not respect input model param field names

The script below creates a new custom model with these model params, which declare the following fieldnames:

#!/usr/bin/env python
import requests
import json

DATE_FORMAT = "%m/%d/%y %H:%M"
desc = """
A test that shows created models do not return the right field names.
"""

URL = 'http://localhost:5000/'


def post(url, params=None):
    return requests.post(URL+url, data=params)

def get(url, params=None):
    return requests.get(URL+url, params=params)

def create_model(model_spec):
    r = post('models', model_spec).json()
    return r['guid']

def get_model(model):
    return get('models/'+model)

if __name__ == "__main__":
    from pprint import pprint
    print(desc)
    print ("Making prediction model from model_params.json")
    with open('model_params.json') as data_file:
        model_spec = data_file.read()
    custom_model = create_model(model_spec)
    pprint (get_model(custom_model).json())

However, the model params actually used to create the model don't respect the input field names. Instead, the fields c0 and c1 are created, as shown in this Python dict representing the created model params:

{u'guid': u'cfa3bb3e-4769-41b0-9299-0fb903d4e820',
 u'last': None,
 u'params': {u'aggregationInfo': {u'days': 0,
                                  u'fields': [],
                                  u'hours': 0,
                                  u'microseconds': 0,
                                  u'milliseconds': 0,
                                  u'minutes': 0,
                                  u'months': 0,
                                  u'seconds': 0,
                                  u'weeks': 0,
                                  u'years': 0},
             u'model': u'CLA',
             u'modelParams': {u'anomalyParams': {u'anomalyCacheRecords': None,
                                                 u'autoDetectThreshold': None,
                                                 u'autoDetectWaitRecords': 5030},
                              u'clEnable': False,
                              u'clParams': {u'alpha': 0.035828933612158,
                                            u'clVerbosity': 0,
                                            u'regionName': u'CLAClassifierRegion',
                                            u'steps': u'1'},
                              u'inferenceType': u'TemporalAnomaly',
                              u'sensorParams': {u'encoders': {u'c0_dayOfWeek': None,
                                                              u'c0_timeOfDay': {u'fieldname': u'c0',
                                                                                u'name': u'c0',
                                                                                u'timeOfDay': [21,
                                                                                               9.49122334747737],
                                                                                u'type': u'DateEncoder'},
                                                              u'c0_weekend': None,
                                                              u'c1': {u'fieldname': u'c1',
                                                                      u'name': u'c1',
                                                                      u'resolution': 21,
                                                                      u'seed': 42,
                                                                      u'type': u'RandomDistributedScalarEncoder'}},
                                                u'sensorAutoReset': None,
                                                u'verbosity': 0},
                              u'spEnable': True,
                              u'spParams': {u'columnCount': 2048,
                                            u'globalInhibition': 1,
                                            u'inputWidth': 0,
                                            u'maxBoost': 1.0,
                                            u'numActiveColumnsPerInhArea': 40,
                                            u'potentialPct': 0.8,
                                            u'seed': 1956,
                                            u'spVerbosity': 0,
                                            u'spatialImp': u'cpp',
                                            u'synPermActiveInc': 0.0015,
                                            u'synPermConnected': 0.1,
                                            u'synPermInactiveDec': 0.0005},
                              u'tpEnable': True,
                              u'tpParams': {u'activationThreshold': 13,
                                            u'cellsPerColumn': 32,
                                            u'columnCount': 2048,
                                            u'globalDecay': 0.0,
                                            u'initialPerm': 0.21,
                                            u'inputWidth': 2048,
                                            u'maxAge': 0,
                                            u'maxSegmentsPerCell': 128,
                                            u'maxSynapsesPerSegment': 32,
                                            u'minThreshold': 10,
                                            u'newSynapseCount': 20,
                                            u'outputType': u'normal',
                                            u'pamLength': 3,
                                            u'permanenceDec': 0.1,
                                            u'permanenceInc': 0.1,
                                            u'seed': 1960,
                                            u'temporalImp': u'cpp',
                                            u'verbosity': 0},
                              u'trainSPNetOnlyIfRequested': False},
             u'predictAheadTime': None,
             u'version': 1},
 u'predicted_field': u'c1',
 u'seen': 0}
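The mismatch could be checked automatically with something like the sketch below, which compares the encoder field names in the submitted params against those in the params the server reports back. The helper names are hypothetical, and both dicts are assumed to follow the modelParams / sensorParams / encoders layout of the dump above.

```python
def encoder_field_names(params):
    """Return the set of field names declared by the enabled encoders."""
    encoders = params["modelParams"]["sensorParams"]["encoders"]
    # Disabled encoders are None and declare no field.
    return {enc["fieldname"] for enc in encoders.values() if enc}


def missing_fields(requested, created):
    """Return requested field names that are absent from the created model."""
    missing = encoder_field_names(requested) - encoder_field_names(created)
    return sorted(missing)
```

An empty list from missing_fields would mean the created model kept every requested field name; for the model above it would list the original names that were replaced by c0 and c1.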
