@ilvalerione We're glad you've chosen Rubix ML to learn with. Feel free to ask questions, and welcome to our community!
I don't quite understand your objective, so help me understand.
What is the target variable that you are trying to predict? Duration and Memory Peak?
If so, your labels will be either the duration or the memory peak (not both, since we don't support multi-label regressors yet). Since those variables are continuous in nature, you'll need a regressor to predict the value of duration or memory peak given some input features, such as the hour of the day and the day of the week (using your example). See the section of the docs on inference for more info. Note that despite having regression in the name, Logistic Regression is a classifier.
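To make the label/feature split concrete, here is a minimal plain-PHP sketch that turns rows in the `[duration, memory_peak, hour_of_day, day_of_week]` layout used in this thread into feature vectors plus a single continuous label column. The function name `splitFeaturesAndLabels` is hypothetical and not part of Rubix ML.

```php
<?php
// Rows are assumed to follow the layout [duration, memory_peak, hour_of_day, day_of_week].
// Only one continuous column becomes the label, since the regressor predicts a single value.
function splitFeaturesAndLabels(array $rows, int $labelColumn): array
{
    $samples = [];
    $labels = [];
    foreach ($rows as $row) {
        $samples[] = [$row[2], $row[3]];          // features: hour_of_day, day_of_week
        $labels[] = (float) $row[$labelColumn];   // label: 0 = duration, 1 = memory_peak
    }
    return [$samples, $labels];
}
```

Calling it with `$labelColumn = 0` gives a duration regression problem; a second model trained with `$labelColumn = 1` would cover memory peak.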
Since you have a categorical feature 'day of week' in your dataset, you'll need a regressor that is compatible with both categorical and continuous features. For your case, I would recommend a Regression Tree because it is simple, fast, and explainable. Another option is Gradient Boost, which has a tutorial but may be overkill for your dataset.
Unfortunately, neither of those learners can be partially trained. However, you can transform your categorical features into continuous ones using One Hot Encoder and then use Adaline.
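The idea of one hot encoding the categorical column and then training a linear model incrementally can be sketched in plain PHP. This is not Rubix ML's One Hot Encoder or Adaline; the helper names, the Monday to Sunday column order, the scaling of the hour to 0..1, and the learning rate are all assumptions for illustration. Each call to `partial()` takes one least-mean-squares gradient step, which is roughly the kind of update an Adaline-style linear neuron performs.

```php
<?php
// Hypothetical column order for the day_of_week feature.
const DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'];

// One hot encode a day name into 7 binary (continuous) columns.
function oneHotDay(string $day): array
{
    $index = array_search($day, DAYS, true);
    if ($index === false) {
        throw new InvalidArgumentException("Unknown day: $day");
    }
    $encoded = array_fill(0, count(DAYS), 0.0);
    $encoded[$index] = 1.0;
    return $encoded;
}

// Fully continuous feature vector: [hour scaled to 0..1, one hot day columns...].
function encodeSample(int $hour, string $day): array
{
    return array_merge([$hour / 23.0], oneHotDay($day));
}

// Adaline-style online linear regressor trained with the LMS (Widrow-Hoff) rule.
final class OnlineLinearRegressor
{
    private array $weights;
    private float $bias = 0.0;

    public function __construct(int $features, private float $rate = 0.05)
    {
        $this->weights = array_fill(0, $features, 0.0);
    }

    public function predict(array $x): float
    {
        $sum = $this->bias;
        foreach ($x as $i => $value) {
            $sum += $this->weights[$i] * $value;
        }
        return $sum;
    }

    // One stochastic gradient step on the squared error of a single sample,
    // so the model can keep learning as new transactions arrive.
    public function partial(array $x, float $y): void
    {
        $error = $this->predict($x) - $y;
        foreach ($x as $i => $value) {
            $this->weights[$i] -= $this->rate * $error * $value;
        }
        $this->bias -= $this->rate * $error;
    }
}
```

Scaling the hour keeps every feature in a similar range, which lets a fixed learning rate stay stable; with raw hours up to 23 the same rate could diverge.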
One last option is to use KNN Regressor with a Gower distance kernel (since it is compatible with both categorical and continuous data types). KNN has the added benefit of implementing the Online interface; however, it can become computationally intractable with large training sets.
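A simplified plain-PHP sketch of the Gower distance and a KNN regressor shows why this pairing handles mixed data: numeric features contribute their absolute difference divided by the feature's range, categorical (string) features contribute 0 when equal and 1 otherwise, and the result is averaged over all features. The function names and the per-feature `$ranges` array are hypothetical; Rubix ML's own Gower kernel and KNN Regressor may differ in detail.

```php
<?php
// Simplified Gower distance between two mixed-type samples.
// Numeric features: |a - b| / range; categorical (string) features: 0 if equal, else 1.
function gower(array $a, array $b, array $ranges): float
{
    $total = 0.0;
    foreach ($a as $i => $value) {
        if (is_string($value)) {
            $total += $value === $b[$i] ? 0.0 : 1.0;
        } else {
            $total += abs($value - $b[$i]) / $ranges[$i];
        }
    }
    return $total / count($a);
}

// KNN regression: average the labels of the $k training samples closest to $query.
function knnPredict(array $samples, array $labels, array $query, int $k, array $ranges): float
{
    $distances = [];
    foreach ($samples as $i => $sample) {
        $distances[$i] = gower($sample, $query, $ranges);
    }
    asort($distances); // sort by distance, keeping the sample indices as keys
    $nearest = array_slice(array_keys($distances), 0, $k);

    $sum = 0.0;
    foreach ($nearest as $i) {
        $sum += $labels[$i];
    }
    return $sum / count($nearest);
}
```

Because the "model" is just the stored samples, learning online amounts to appending new rows to `$samples` and `$labels`. The cost is that every prediction scans the whole training set, which is why KNN slows down on large datasets.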
You can obtain a 'confidence interval', or a 'range of expected values' in your words, through cross-validation, in which the model is tested on data it did not see during training. A report such as Residual Analysis will give you error metrics such as MAE (mean absolute error); an MAE of 10 means that predictions are off by about 10 on average, in either direction.
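A minimal plain-PHP sketch of the evaluation idea: hold out the most recent portion of the data, predict it, and average the absolute residuals. The helper names and the 20% default are assumptions; in Rubix ML you would use its cross-validation tools and the report mentioned above instead.

```php
<?php
// Hold out the last $ratio of the rows as a test set. Assumes rows are in
// chronological order, so the model is scored on data that comes after its training data.
function holdout(array $samples, array $labels, float $ratio = 0.2): array
{
    $split = (int) floor(count($samples) * (1.0 - $ratio));
    return [
        [array_slice($samples, 0, $split), array_slice($labels, 0, $split)],
        [array_slice($samples, $split), array_slice($labels, $split)],
    ];
}

// Mean absolute error: the average size of the residuals, regardless of sign.
function meanAbsoluteError(array $predictions, array $labels): float
{
    $total = 0.0;
    foreach ($predictions as $i => $prediction) {
        $total += abs($prediction - $labels[$i]);
    }
    return $total / count($predictions);
}
```

With a model in hand, you would train on the first set, predict the samples of the second, and pass those predictions together with the held-out labels to `meanAbsoluteError()`.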
Could it be that what you are really looking for is a way to forecast this time series so that you can predict the next k time steps starting from an initial timestamp? If so, you'll have to wait for time series support.
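For a sense of what "predict the next k time steps" means in practice, here is a seasonal naive forecast, a common baseline that repeats the value from the same position in the most recent full season. It is not a Rubix ML feature; the function name is hypothetical, and for hourly data with weekly seasonality `$season` would be 168 (24 * 7).

```php
<?php
// Seasonal naive forecast over a 0-indexed list: each of the next $k steps repeats
// the value at the same position in the last full season (wrapping when $k > $season).
function seasonalNaiveForecast(array $series, int $season, int $k): array
{
    $n = count($series);
    if ($season < 1 || $n < $season) {
        throw new InvalidArgumentException('Need at least one full season of history.');
    }
    $forecast = [];
    for ($h = 1; $h <= $k; $h++) {
        $forecast[] = $series[$n - $season + (($h - 1) % $season)];
    }
    return $forecast;
}
```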
Hi @ilvalerione, thanks for the interesting question.
Let me start by making sure that our understanding is consistent.
Logistic Regression is an Online classifier whose predictions are class labels such as 'cat' or 'dog'. It can also output a probability distribution over these classes as it implements the Probabilistic interface. Is this what you mean by 'range of correctness'?
Incidentally, Logistic Regression can also be used as a supervised anomaly detector where the class labels are 'anomaly' and 'not anomaly'. Is this how you plan to use the estimator, as opposed to an unsupervised online anomaly detector such as Gaussian MLE?
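To illustrate how a binary Logistic Regression both predicts a class and outputs a probability distribution, and how an Online learner updates from one sample at a time, here is a minimal plain-PHP sketch. It is not Rubix ML's implementation; the class name, the learning rate, and the 'anomaly' / 'not anomaly' labels are assumptions for illustration.

```php
<?php
// Minimal online binary logistic regression (not Rubix ML's implementation).
final class OnlineLogisticRegression
{
    private array $weights;
    private float $bias = 0.0;

    // $classes holds exactly two labels; the second one is the "positive" class.
    public function __construct(int $features, private array $classes, private float $rate = 0.1)
    {
        $this->weights = array_fill(0, $features, 0.0);
    }

    // Probability distribution over both class labels.
    public function proba(array $x): array
    {
        $z = $this->bias;
        foreach ($x as $i => $value) {
            $z += $this->weights[$i] * $value;
        }
        $p = 1.0 / (1.0 + exp(-$z)); // sigmoid gives P(second class)
        return [$this->classes[0] => 1.0 - $p, $this->classes[1] => $p];
    }

    // The predicted label is the class with the highest probability.
    public function predict(array $x): string
    {
        $distribution = $this->proba($x);
        return (string) array_search(max($distribution), $distribution, true);
    }

    // One stochastic gradient step on the log loss, so training can continue as data arrives.
    public function partial(array $x, string $label): void
    {
        $target = $label === $this->classes[1] ? 1.0 : 0.0;
        $error = $this->proba($x)[$this->classes[1]] - $target;
        foreach ($x as $i => $value) {
            $this->weights[$i] -= $this->rate * $error * $value;
        }
        $this->bias -= $this->rate * $error;
    }
}
```

Note that `partial()` needs a label for every sample, which is exactly what makes this a supervised approach.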
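For contrast, an unsupervised detector needs no labels at all. Below is a simplified plain-PHP sketch in the spirit of Gaussian MLE: each feature is modeled as an independent normal distribution whose mean and variance are maximum likelihood estimates, updated incrementally with Welford's algorithm, and a sample is flagged when any feature lies more than `$threshold` standard deviations from its mean. This is not Rubix ML's Gaussian MLE code, which may score and threshold samples differently; the class name and the threshold of 3 are assumptions.

```php
<?php
// Simplified online Gaussian anomaly detector (in the spirit of Gaussian MLE, not Rubix ML's code).
final class OnlineGaussianDetector
{
    private int $n = 0;
    private array $means;
    private array $m2; // running sum of squared deviations per feature (Welford)

    public function __construct(int $features, private float $threshold = 3.0)
    {
        $this->means = array_fill(0, $features, 0.0);
        $this->m2 = array_fill(0, $features, 0.0);
    }

    // Update the estimates with one unlabeled sample; no anomaly labels are needed.
    public function partial(array $x): void
    {
        $this->n++;
        foreach ($x as $i => $value) {
            $delta = $value - $this->means[$i];
            $this->means[$i] += $delta / $this->n;
            $this->m2[$i] += $delta * ($value - $this->means[$i]);
        }
    }

    // Largest absolute z-score across features, using the MLE variance M2 / n.
    public function score(array $x): float
    {
        $max = 0.0;
        foreach ($x as $i => $value) {
            $std = sqrt(max(0.0, $this->m2[$i]) / max($this->n, 1));
            $deviation = abs($value - $this->means[$i]);
            if ($std == 0.0) {
                // Constant history: any deviation at all is treated as maximally unusual.
                if ($deviation > 0.0) {
                    return INF;
                }
                continue;
            }
            $max = max($max, $deviation / $std);
        }
        return $max;
    }

    public function isAnomaly(array $x): bool
    {
        return $this->score($x) > $this->threshold;
    }
}
```

Because the estimates keep updating, the notion of "normal" drifts with the data, though a single global mean ignores daily or weekly patterns.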
As others have asked recently in issues #38 and #35, your problem involves non-stationary time series data, which Rubix ML does not currently support. There are models, for example ARIMA, that can handle non-stationary time series natively, and given the recent interest, I am currently looking into how models like these would fit into the Rubix ML architecture. As such, we may end up implementing time series support in the near future.
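One way ARIMA-style models cope with non-stationarity is differencing (the "I" in ARIMA): modeling the change between observations instead of the raw values. Seasonal variants such as SARIMA also difference at the seasonal lag. A minimal plain-PHP sketch, with a hypothetical function name:

```php
<?php
// Lag-$lag differencing: out[t] = series[t] - series[t - lag].
// lag = 1 turns a linear trend into a constant; a seasonal lag (e.g. 168 for hourly
// data with a weekly pattern) removes a pattern that repeats with that period.
function difference(array $series, int $lag = 1): array
{
    $out = [];
    for ($t = $lag; $t < count($series); $t++) {
        $out[] = $series[$t] - $series[$t - $lag];
    }
    return $out;
}
```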
Hi @andrewdalpino, thank you for your message.
After reading your documentation I understand my problem better, and I appreciated your educational content.
I should emphasize that I approached this from a developer's point of view, so many details may be beyond my current skills.
I was thinking of using Logistic Regression to classify data by "hour of the day" and "day of the week":
$transactions = [
// [duration, memory_peak, hour_of_day, day_of_week],
[12.1, 4.2, 10, 'Saturday'],
[20.0, 6.7, 11, 'Saturday'],
[68.35, 12.0, 11, 'Thursday'],
];
In this way I'm trying to correlate duration and memory_peak, but tying this classification to the hour and day of the week amounts to assuming that the data has weekly seasonality. I thought that using an online detector could mitigate the seasonality assumption by updating the model over time.
It can also output a probability distribution over these classes as it implements the Probabilistic interface. Is this what you mean by 'range of correctness?'
Yes, I wanted to use this information to build the "dynamic grey band" in the chart.
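One simple way to draw such a band, under the weekly-seasonality assumption described above, is to group historical durations by (day of week, hour of day) and take the mean plus or minus k standard deviations in each bucket. This is a hypothetical plain-PHP sketch, not something Rubix ML provides; the function name and k = 2 are assumptions.

```php
<?php
// Rows are assumed to follow the layout [duration, memory_peak, hour_of_day, day_of_week].
// Returns, per "day hour" bucket, the bounds mean - k*std and mean + k*std of the durations.
function seasonalBands(array $rows, float $k = 2.0): array
{
    $buckets = [];
    foreach ($rows as [$duration, , $hour, $day]) {
        $buckets["$day $hour"][] = $duration;
    }

    $bands = [];
    foreach ($buckets as $key => $values) {
        $n = count($values);
        $mean = array_sum($values) / $n;
        $squares = 0.0;
        foreach ($values as $value) {
            $squares += ($value - $mean) ** 2;
        }
        $std = sqrt($squares / $n); // population standard deviation of the bucket
        $bands[$key] = ['lower' => $mean - $k * $std, 'upper' => $mean + $k * $std];
    }
    return $bands;
}
```

Buckets with only one observation collapse to a zero-width band, so in practice each bucket needs several weeks of history before the band means much.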
I'm not sure that classifiers are the right choice for this scenario because, in the end, I think I'm dealing with an unsupervised dataset. I can't know which past samples are anomalies, so I can't train the model accordingly. My idea is that the ability to tell whether a sample is an anomaly should be learned by the algorithm itself, based on the historical dataset.
Thanks to your advice I now understand Gaussian MLE better; it could be another reasonable approach.
I hear more and more often about algorithms like ARIMA or SARIMA (the S stands for seasonal).
I'm a developer who is trying to implement better solutions to problems. This is a completely new world for me, so thank you for the information.
Related Issues
- Prune redundant Decision Tree leaf nodes
- Fixed Array Memory Optimizations
- Use new PHP 8.0 features in version 3.0
- Warning: Ambiguous class resolution, `Rubix\ML\Kernels\Distance\Gower`
- Use pretrained models?
- How to Train only One Class?
- Not working in PHP 8.2 because of voku/portable-utf8/src/voku/helper/UTF8.php
- Question, which model users for Fraud Prediction
- Incorrect SVC save/load methods implementation
- psr/log old version is limiting the project to be used on modern frameworks
- Which alogorithm can be used for search result ranking ?
- Is "Transformer Architecture Marchine Learning Model" supported on RubixML ???
- Map method in Dataset doesn't exist
- Multi Language Tokenization Support
- WordCountVectorizer Memory Issue
- TruncatedSVD() made PHP crash without any message
- Evaluation of the cluster quality with indicators
- Requirements not resolved to an installable set of packages
- Softmax Classifier & partial training
- Does Rubix ML support Natural Language Processing (NLP)?