
zephyr's People

Contributors

frances-h, sarahmish, sarapido

Forkers

sarapido boom90lb

zephyr's Issues

Add XGB Pipeline

  • add XGB primitive with predict_proba
  • add FindThreshold primitive
  • create XGB pipeline
  • create a Zephyr class to interact with pipelines.

Design of Zephyr hyperparameters

Currently Supported Design

The current approach to changing hyperparameters is to pass a dictionary to the Zephyr API. The dictionary names the primitive and the hyperparameter of that primitive that the user wishes to modify.

Example code for an xgb pipeline:

hyperparameters = {
    "xgboost.XGBClassifier#1": {
        "n_estimators": 50
    }
}

zephyr = Zephyr('xgb', hyperparameters)

Ideal Design

The proposed design exposes hyperparameters at the pipeline level, without the need for a dictionary.

Example code that should achieve the same change as the previous example:

zephyr = Zephyr('xgb', n_estimators=50)

Under the hood, these hyperparameters should be mapped to the right primitive and altered.

Challenges

Several cases need to be resolved with this strategy:

  • If the hyperparameter name belongs to more than one primitive, which one is the user altering?
  • These hyperparameters are dynamic in the sense that they change from pipeline to pipeline. How do we support them, and how would the user know which hyperparameters they can change?
  • Can these hyperparameters be modified only at instantiation, or should we allow other phases such as fit (e.g. epochs)?
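
One way to approach the first two challenges can be sketched as follows. This is only an illustration, not Zephyr's implementation: the registry `XGB_PIPELINE_TUNABLES`, the primitive name `FindThreshold#1`, the hyperparameters `max_depth` and `metric`, and both helper functions are hypothetical. The idea is to keep a per-pipeline mapping of primitives to the hyperparameters they expose, reject names that are unknown or ambiguous, and translate flat keyword arguments back into the nested dictionary the current API accepts.

```python
# Hypothetical registry: which hyperparameters each primitive of a pipeline exposes.
# The primitive name "FindThreshold#1" and the hyperparameter sets are illustrative only.
XGB_PIPELINE_TUNABLES = {
    "xgboost.XGBClassifier#1": {"n_estimators", "max_depth"},
    "FindThreshold#1": {"metric"},
}


def available_hyperparameters(tunables):
    """List every hyperparameter name the pipeline exposes, so users can discover them."""
    return sorted({name for names in tunables.values() for name in names})


def resolve_hyperparameters(tunables, **kwargs):
    """Map flat keyword arguments onto the nested {primitive: {name: value}} dictionary."""
    resolved = {}
    for name, value in kwargs.items():
        owners = [primitive for primitive, names in tunables.items() if name in names]
        if not owners:
            raise ValueError(
                f"Unknown hyperparameter {name!r}; "
                f"available: {available_hyperparameters(tunables)}"
            )
        if len(owners) > 1:
            # Ambiguous names are rejected rather than guessed.
            raise ValueError(f"Hyperparameter {name!r} is ambiguous; it belongs to {owners}")
        resolved.setdefault(owners[0], {})[name] = value
    return resolved


hyperparameters = resolve_hyperparameters(XGB_PIPELINE_TUNABLES, n_estimators=50)
print(hyperparameters)
```

Raising on ambiguity is one possible policy; falling back to the dictionary form for ambiguous names would be another. The third challenge (fit-time hyperparameters such as epochs) is not addressed by this sketch.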

Integrating `SigPro`

SigPro allows users to process time series signals and perform a wide range of transformations and aggregations. We want to allow users to use SigPro through Zephyr.

Given a set of transformations and aggregations, we want to apply them to pidata or scada data.

Desired Behavior

Suppose we have the following pidata dataframe

_index  timestamp            COD_ELEMENT  val1    val2
0       2022-01-02 13:21:01  0            1002.0  -98.7
1       2022-03-08 13:21:01  0            56.8    1004.2

and we want to compute the mean of the amplitude of the column val1 for each month of readings, using the SigPro primitive sigpro.aggregations.amplitude.statistical.mean. We then get the following processed dataframe

_index  time        COD_ELEMENT  mean
0       2022-01-31  0            1002.0
1       2022-02-28  0            null
2       2022-03-31  0            56.8
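
The binning behaviour in the tables above can be sketched in plain Python. This does not call SigPro or Zephyr; it only illustrates the expected semantics under a few assumptions: each bin is labelled with the last day of its month, months without readings are kept with a null mean, and the second value column is named `val2`. The helper names `month_end` and `monthly_mean` are hypothetical.

```python
import calendar
from datetime import date, datetime


def month_end(year, month):
    """Return the last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def monthly_mean(rows, signal_column):
    """Average signal_column per COD_ELEMENT and month, keeping empty months as None."""
    # Collect the readings that fall into each (turbine, year, month) bin.
    bins = {}
    for row in rows:
        ts = row["timestamp"]
        key = (row["COD_ELEMENT"], ts.year, ts.month)
        bins.setdefault(key, []).append(row[signal_column])

    result = []
    for cod in sorted({row["COD_ELEMENT"] for row in rows}):
        stamps = [row["timestamp"] for row in rows if row["COD_ELEMENT"] == cod]
        first, last = min(stamps), max(stamps)
        year, month = first.year, first.month
        # Walk every month between the first and last reading, including empty ones.
        while (year, month) <= (last.year, last.month):
            values = bins.get((cod, year, month))
            result.append({
                "time": month_end(year, month),
                "COD_ELEMENT": cod,
                "mean": sum(values) / len(values) if values else None,
            })
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


pidata = [
    {"timestamp": datetime(2022, 1, 2, 13, 21, 1), "COD_ELEMENT": 0, "val1": 1002.0, "val2": -98.7},
    {"timestamp": datetime(2022, 3, 8, 13, 21, 1), "COD_ELEMENT": 0, "val1": 56.8, "val2": 1004.2},
]
processed = monthly_mean(pidata, "val1")
for row in processed:
    print(row["time"], row["COD_ELEMENT"], row["mean"])
```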

Proposed Function

def process_signals(es, signal_dataframe_name, signal_column, transformations, aggregations,
                    window_size, replace_dataframe=False, **kwargs):
    '''Process signals using SigPro.

    Apply SigPro transformations and aggregations on the specified entity from the
    given entityset. If ``replace_dataframe=True``, then the old entity will be updated.

    Args:
        es (featuretools.EntitySet):
            Entityset to extract signals from.
        signal_dataframe_name (str):
            Name of the dataframe in the entityset containing signal data to process.
        signal_column (str):
            Name of the column containing the signal values to apply the signal processing pipeline to.
        transformations (list[dict]):
            List of dictionaries containing the transformation primitives.
        aggregations (list[dict]):
            List of dictionaries containing the aggregation primitives.
        window_size (str):
            Size of the window to bin the signals over, e.g. ``'1h'``.
        replace_dataframe (bool):
            If ``True``, will replace the entire signal dataframe in the EntitySet with the
            processed signals. Defaults to ``False``, creating a new child dataframe containing
            processed signals with the suffix ``_processed``.
    '''

Display summary of loaded entities

Currently, the validation function in Zephyr raises an error when the entityset being loaded does not conform to the expected metadata. For example, if the turbines entity is missing COD_ELEMENT, the following error is displayed:

ValueError: Turbines index column "COD_ELEMENT" missing from data for turbines entity

Rather than stopping at the first error, the validation function should display a full summary covering every entity:

  • whether it passed or not
  • if it did not pass, which columns are missing / have a problem and what is the message

The output would look something like a unit-test report:

Name                                                                      Pass   Fail  Cover
---------------------------------------------------------------------------------------------
turbines                                                                   x      1     90%
work_orders                                                                x      1     95%
notifications                                                              x      1     92%
stoppages                                                                  ✓            100%
alarms                                                                     ✓            100%
pidata                                                                     ✓            100%
---------------------------------------------------------------------------------------------
TOTAL                                                                      6      3     96%

turbines
ValueError: Turbines index column "COD_ELEMENT" missing from data for turbines entity

work_orders
ValueError: Expected index column "COD_ORDER" of work_orders entity is not unique

notifications
ValueError: Missing time index column "DAT_POSTING" from notifications entity

To use Zephyr, every entity in the validation above must pass with 100%.
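
The change from failing fast to reporting everything can be sketched as below. This is a minimal illustration, not Zephyr's validation code: `run_validation`, `format_summary` and `require_column` are hypothetical helpers, the column names `TURBINE_NAME` and `DAT_START` are made up, and the Cover column of the report is left out because the source does not define how it is computed.

```python
def run_validation(checks):
    """Run every check, collecting failures instead of stopping at the first one.

    checks maps an entity name to a list of zero-argument callables
    that raise ValueError when the entity does not conform.
    """
    summary = {}
    for entity, entity_checks in checks.items():
        errors = []
        for check in entity_checks:
            try:
                check()
            except ValueError as error:
                errors.append(f"ValueError: {error}")
        summary[entity] = errors
    return summary


def format_summary(summary):
    """Render a pass/fail table followed by the messages of each failing entity."""
    lines = [f"{'Name':<16}{'Pass':<6}{'Fail':<6}", "-" * 28]
    for entity, errors in summary.items():
        mark = "x" if errors else "✓"
        fail = len(errors) if errors else ""
        lines.append(f"{entity:<16}{mark:<6}{fail!s:<6}")
    for entity, errors in summary.items():
        if errors:
            lines.extend(["", entity, *errors])
    return "\n".join(lines)


def require_column(entity, columns, column):
    """Build a check that fails when `column` is absent (hypothetical helper)."""
    def check():
        if column not in columns:
            raise ValueError(f'Expected column "{column}" missing from {entity} entity')
    return check


# Illustrative data: column names other than COD_ELEMENT are made up.
checks = {
    "turbines": [require_column("turbines", ["TURBINE_NAME"], "COD_ELEMENT")],
    "alarms": [require_column("alarms", ["COD_ELEMENT", "DAT_START"], "COD_ELEMENT")],
}
summary = run_validation(checks)
print(format_summary(summary))
```

A caller that needs the old strict behaviour could still raise after printing the report whenever any entity has a non-empty error list.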

Create vibrations entityset

In addition to create_entityset_scada and create_entityset_pidata, we want to add create_entityset_vibrations.

Entities to include:

  • Turbines
  • Alarms
  • Notifications
  • Stoppages
  • Work Orders
  • Pi Data
  • Vibrations

Necessary columns in vibrations

  • COD_ELEMENT
  • timestamp
  • signal_id
  • xvalues
  • yvalues

Metadata of vibrations

'vibrations': {
    'index': '_index',
    'make_index': True,
    'time_index': 'timestamp',
    'logical_types': {
        'COD_ELEMENT': 'categorical',
        'turbine_id': 'categorical',
        'signal_id': 'categorical',
        'timestamp': 'datetime',
        'sensorName': 'categorical',
        'sensorType': 'categorical',
        'sensorSerial': 'integer_nullable',
        'siteName': 'categorical',
        'turbineName': 'categorical',
        'turbineSerial': 'integer_nullable',
        'configurationName': 'natural_language',
        'softwareVersion': 'categorical',
        'rpm': 'double',
        'rpmStatus': 'natural_language',
        'duration': 'natural_language',
        'condition': 'categorical',
        'maskTime': 'datetime',
        'Mask Status': 'natural_language',
        'System Serial': 'categorical',
        'WPS-ActivePower-Average': 'double',
        'WPS-ActivePower-Minimum': 'double',
        'WPS-ActivePower-Maximum': 'double',
        'WPS-ActivePower-Deviation': 'double',
        'WPS-ActivePower-StartTime': 'datetime',
        'WPS-ActivePower-StopTime': 'datetime',
        'WPS-ActivePower-Counts': 'natural_language',
        'Measured RPM': 'double',
        'WPS-ActivePower': 'double',
        'WPS-Gearoiltemperature': 'double',
        'WPS-GeneratorRPM': 'double',
        'WPS-PitchReference': 'double',
        'WPS-RotorRPM': 'double',
        'WPS-Windspeed': 'double',
        'WPS-YawAngle': 'double',
        'overload warning': 'categorical',
        'bias warning': 'categorical',
        'bias voltage': 'double',
        'xValueOffset': 'double',
        'xValueDelta': 'double',
        'xValueUnit': 'categorical',
        'yValueUnit': 'categorical',
        'TotalCount-RPM0': 'double',
        'TotalCount-RPM1': 'double',
        'TotalCount-RPM2': 'double',
        'TotalCount-RPM3': 'double'
    }
}
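
A quick consistency check between the necessary columns and the metadata can be sketched as follows. The function `untyped_required_columns` is hypothetical, and `VIBRATIONS_METADATA` is an abbreviated copy of the metadata above that repeats only the entries matching a necessary column. With the metadata as listed, the check reports xvalues and yvalues, which are required but have no declared logical type.

```python
REQUIRED_VIBRATION_COLUMNS = ["COD_ELEMENT", "timestamp", "signal_id", "xvalues", "yvalues"]

# Abbreviated copy of the vibrations metadata: only the logical types that
# correspond to a necessary column are repeated here.
VIBRATIONS_METADATA = {
    "index": "_index",
    "make_index": True,
    "time_index": "timestamp",
    "logical_types": {
        "COD_ELEMENT": "categorical",
        "signal_id": "categorical",
        "timestamp": "datetime",
    },
}


def untyped_required_columns(required, metadata):
    """Return the required columns that have no entry in metadata['logical_types']."""
    typed = set(metadata["logical_types"])
    return [column for column in required if column not in typed]


untyped = untyped_required_columns(REQUIRED_VIBRATION_COLUMNS, VIBRATIONS_METADATA)
print(untyped)
```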
