sintel-dev / zephyr
https://dtail.gitbook.io/zephyr/
License: MIT License
predict_proba
FindThreshold primitive

I have been looking at the labelling tool. I believe we are losing many False classes because we are not setting `drop_empty` to False in the label maker `search` method. I think this would have a big impact if our monitoring records are abbreviated or pre-filtered (to some degree).
> `drop_empty` (bool) – Whether to drop empty slices. Default value is True.
I have noted that the labelling data slices are not well controlled; we should have clear guidance and/or controls over how the slices are generated.
I think most labels should overlap in some way to present more training examples for positive classes.
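The effect described above can be sketched in plain Python. This is illustrative only, not the actual label maker API; each slice is a list of event flags and the label is True when any failure event falls inside the slice:

```python
# Minimal sketch of the drop_empty behaviour; illustrative only, not the
# actual label maker API.
def label_slices(slices, drop_empty=True):
    labels = []
    for data_slice in slices:
        if len(data_slice) == 0 and drop_empty:
            # Empty (e.g. pre-filtered) slices are silently discarded,
            # even though they would have produced a False label.
            continue
        labels.append(any(data_slice))
    return labels

slices = [[1, 0], [], [0], []]
print(label_slices(slices, drop_empty=True))   # [True, False]
print(label_slices(slices, drop_empty=False))  # [True, False, False, False]
```

With `drop_empty=True`, the two empty slices never become negative examples, which is exactly how pre-filtered monitoring records could bias the label distribution.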
The current approach to changing hyperparameters is through a dictionary we pass to the Zephyr API. This dictionary specifies the primitive and the hyperparameter of that primitive that the user wishes to modify.

Example code for an `xgb` pipeline:
```python
hyperparameters = {
    "xgboost.XGBClassifier#1": {
        "n_estimators": 50
    }
}
zephyr = Zephyr('xgb', hyperparameters)
```
The proposed design is that hyperparameters can be exposed at the pipeline level without the need for a dictionary.

Example code that achieves the same change as the previous example:

```python
zephyr = Zephyr('xgb', n_estimators=50)
```
Under the hood, these hyperparameters should be mapped to the right primitive and altered.

There are several cases that need to be resolved with this strategy:

- hyperparameters that are set at `fit` time? e.g. `epochs`
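One way to sketch the mapping step is to assign each flat keyword argument to whichever primitive declares it. The function name and dictionary layout below are hypothetical, not Zephyr's implementation:

```python
# Hypothetical sketch of mapping flat keyword arguments onto per-primitive
# hyperparameter dictionaries; not Zephyr's actual implementation.
def map_hyperparameters(tunable, **kwargs):
    """Assign each flat kwarg to every primitive that declares it.

    ``tunable`` maps primitive names to the hyperparameters they accept,
    e.g. the tunable hyperparameters reported by the pipeline.
    """
    mapped = {}
    for primitive, params in tunable.items():
        for name, value in kwargs.items():
            if name in params:
                mapped.setdefault(primitive, {})[name] = value
    return mapped

tunable = {'xgboost.XGBClassifier#1': {'n_estimators': 100, 'max_depth': 3}}
print(map_hyperparameters(tunable, n_estimators=50))
# {'xgboost.XGBClassifier#1': {'n_estimators': 50}}
```

This also surfaces one of the cases to resolve: a hyperparameter name declared by more than one primitive would, in this sketch, be assigned to all of them.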
SigPro allows users to process time series signals and perform a wide range of transformations and aggregations. We want to allow users to use SigPro through Zephyr.
Assume we have `transformations` and `aggregations`; we want to apply them to `pidata` or `scada` data.

Suppose we have the following `pidata` dataframe:
| _index | timestamp | COD_ELEMENT | val1 | val2 |
|---|---|---|---|---|
| 0 | 2022-01-02 13:21:01 | 0 | 1002.0 | -98.7 |
| 1 | 2022-03-08 13:21:01 | 0 | 56.8 | 1004.2 |
If we want to compute the mean of the amplitude using the SigPro primitive `sigpro.aggregations.amplitude.statistical.mean` for each month of readings in the column `val1`, then we get the following processed dataframe:
| _index | time | COD_ELEMENT | mean |
|---|---|---|---|
| 0 | 2022-01-31 | 0 | 1002.0 |
| 1 | 2022-02-28 | 0 | null |
| 2 | 2022-03-31 | 0 | 56.8 |
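As a sanity check on the expected binning, the monthly mean above can be reproduced with plain pandas. This is a sketch of the behaviour, not the SigPro pipeline itself:

```python
import pandas as pd

# Sketch of the expected monthly binning using plain pandas, not SigPro.
pidata = pd.DataFrame({
    'timestamp': pd.to_datetime(['2022-01-02 13:21:01', '2022-03-08 13:21:01']),
    'COD_ELEMENT': [0, 0],
    'val1': [1002.0, 56.8],
})

processed = (
    pidata.set_index('timestamp')
          .groupby('COD_ELEMENT')['val1']
          .resample('M')          # one bin per calendar month
          .mean()
          .reset_index()
          .rename(columns={'timestamp': 'time', 'val1': 'mean'})
)
print(processed)
# February has no readings, so its mean is null (NaN).
```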
```python
def process_signals(es, signal_dataframe_name, signal_column, transformations,
                    aggregations, window_size, replace_dataframe=False, **kwargs):
    '''Process signals using SigPro.

    Apply SigPro transformations and aggregations on the specified entity from the
    given entityset. If ``replace_dataframe=True``, then the old entity will be updated.

    Args:
        es (featuretools.EntitySet):
            Entityset to extract signals from.
        signal_dataframe_name (str):
            Name of the dataframe in the entityset containing signal data to process.
        signal_column (str):
            Name of the column containing signal values to apply the signal
            processing pipeline to.
        transformations (list[dict]):
            List of dictionaries containing the transformation primitives.
        aggregations (list[dict]):
            List of dictionaries containing the aggregation primitives.
        window_size (str):
            Size of the window to bin the signals over, e.g. ``'1h'``.
        replace_dataframe (bool):
            If ``True``, will replace the entire signal dataframe in the EntitySet
            with the processed signals. Defaults to ``False``, creating a new child
            dataframe containing processed signals with the suffix ``_processed``.
    '''
```
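For reference, the `transformations` and `aggregations` arguments follow SigPro's primitive-specification format (a list of dictionaries naming each primitive). The aggregation path is the one given above; the identity transformation path is an assumption for illustration:

```python
# Example primitive specifications for the arguments above. The identity
# transformation path is assumed here purely for illustration.
transformations = [{
    'name': 'identity',
    'primitive': 'sigpro.transformations.amplitude.identity.identity',
}]
aggregations = [{
    'name': 'mean',
    'primitive': 'sigpro.aggregations.amplitude.statistical.mean',
}]

# Hypothetical call, assuming an entityset ``es`` with a ``pidata`` dataframe:
# process_signals(es, 'pidata', 'val1', transformations, aggregations, '1m')
```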
Currently the validation function in Zephyr raises an error when a loaded entityset does not conform to the expected metadata. For example, if the turbines entity is missing COD_ELEMENT, the following error will be displayed:

```
ValueError: Turbines index column "COD_ELEMENT" missing from data for turbines entity
```

Rather than erroring out, the validation function should display a full summary for each entity. This will look something like a unit-testing report:
```
Name             Pass   Fail   Cover
-------------------------------------
turbines           x      1     90%
work_orders        x      1     95%
notifications      x      1     92%
stoppages          ✓            100%
alarms             ✓            100%
pidata             ✓            100%
-------------------------------------
TOTAL              6      3     96%
```
- turbines: `ValueError: Turbines index column "COD_ELEMENT" missing from data for turbines entity`
- work_orders: `ValueError: Expected index column "COD_ORDER" of work_orders entity is not unique`
- notifications: `ValueError: Missing time index column "DAT_POSTING" from notifications entity`
To use Zephyr, all of the validations above must pass at 100%.
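A sketch of how the validator could aggregate failures instead of raising on the first one; the names here are illustrative, not Zephyr's API:

```python
# Illustrative sketch: run every entity check, collect failures, and report
# a summary instead of raising on the first ValueError. Not Zephyr's API.
def validate_entityset(checks):
    """``checks`` maps entity names to zero-argument check callables."""
    results = {}
    for entity, check in checks.items():
        try:
            check()
            results[entity] = None          # passed
        except ValueError as error:
            results[entity] = str(error)    # failed; keep the message
    return results

def check_turbines():
    raise ValueError('Turbines index column "COD_ELEMENT" missing '
                     'from data for turbines entity')

results = validate_entityset({'turbines': check_turbines, 'alarms': lambda: None})
for entity, error in results.items():
    print(f"{entity:<15} {'x' if error else '✓'}")
```

The collected messages could then be printed beneath the summary table, one entity per line, as in the mock-up above.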
In addition to `create_entityset_scada` and `create_entityset_pidata`, we want to add `create_entityset_vibrations`.

Entities to include:

Necessary columns in `vibrations`:

Metadata of `vibrations`:
```python
'vibrations': {
    'index': '_index',
    'make_index': True,
    'time_index': 'timestamp',
    'logical_types': {
        'COD_ELEMENT': 'categorical',
        'turbine_id': 'categorical',
        'signal_id': 'categorical',
        'timestamp': 'datetime',
        'sensorName': 'categorical',
        'sensorType': 'categorical',
        'sensorSerial': 'integer_nullable',
        'siteName': 'categorical',
        'turbineName': 'categorical',
        'turbineSerial': 'integer_nullable',
        'configurationName': 'natural_language',
        'softwareVersion': 'categorical',
        'rpm': 'double',
        'rpmStatus': 'natural_language',
        'duration': 'natural_language',
        'condition': 'categorical',
        'maskTime': 'datetime',
        'Mask Status': 'natural_language',
        'System Serial': 'categorical',
        'WPS-ActivePower-Average': 'double',
        'WPS-ActivePower-Minimum': 'double',
        'WPS-ActivePower-Maximum': 'double',
        'WPS-ActivePower-Deviation': 'double',
        'WPS-ActivePower-StartTime': 'datetime',
        'WPS-ActivePower-StopTime': 'datetime',
        'WPS-ActivePower-Counts': 'natural_language',
        'Measured RPM': 'double',
        'WPS-ActivePower': 'double',
        'WPS-Gearoiltemperature': 'double',
        'WPS-GeneratorRPM': 'double',
        'WPS-PitchReference': 'double',
        'WPS-RotorRPM': 'double',
        'WPS-Windspeed': 'double',
        'WPS-YawAngle': 'double',
        'overload warning': 'categorical',
        'bias warning': 'categorical',
        'bias voltage': 'double',
        'xValueOffset': 'double',
        'xValueDelta': 'double',
        'xValueUnit': 'categorical',
        'yValueUnit': 'categorical',
        'TotalCount-RPM0': 'double',
        'TotalCount-RPM1': 'double',
        'TotalCount-RPM2': 'double',
        'TotalCount-RPM3': 'double'
    }
}
```