Comments (15)

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet can you share the function setup? The part of the error message that says [Predict.V1] applies only to MultiRMSE (regression with more than one target variable). If you set EvalMetric or LossFunction to 'MultiRMSE' with a single target variable, that could be the issue. Otherwise, the internal function is mistakenly treating this as a MultiRMSE use case when it shouldn't, and I'll likely just have to modify an if-statement.
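
The mismatch described above can be checked before calling a modeling function. The following is a minimal base R sketch, not the package's internal logic: `check_metric_setup` and its argument names are hypothetical and exist only for illustration.

```r
# Hypothetical helper (not part of the package): flag a mismatch between
# the number of target columns and a MultiRMSE metric setting.
check_metric_setup <- function(target_cols, eval_metric, loss_function) {
  multi_metric <- "MultiRMSE" %in% c(eval_metric, loss_function)
  multi_target <- length(target_cols) > 1L
  if (multi_metric && !multi_target) {
    return("MultiRMSE is set but there is only one target column")
  }
  if (!multi_metric && multi_target) {
    return("Several target columns are given but MultiRMSE is not set")
  }
  "ok"
}

# A single target with MultiRMSE in either argument is reported as a mismatch.
print(check_metric_setup("y", eval_metric = "MultiRMSE", loss_function = "RMSE"))
print(check_metric_setup("y", eval_metric = "RMSE", loss_function = "RMSE"))
```

Checking both arguments matters because either one set to 'MultiRMSE' is enough to put the call on the multi-target path.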

from autoquant.

PyntieHet avatar PyntieHet commented on September 25, 2024

That appears to be my mistake then; I was under the assumption that MultiRMSE was needed with a grouped dataset, for each of the levels.

I have moved off the example script and am now testing on my own dataset, with only 1 grouping variable as opposed to the 2 in the example. I reran with the changes, and that was the issue. It leads to yet another new roadblock, though.
Thank you again, this truly is an incredible tool you've built.
image

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet thanks for the compliment. These CARMA functions have been fun to build and a headache to get right. There's a ton happening internally.

As for the issue, can you provide a snapshot of your data? If you are also providing an XREGS dataset, can you provide a snapshot of that as well? I haven't encountered an error like this before...

PyntieHet avatar PyntieHet commented on September 25, 2024

Certainly. This is a significantly trimmed down version of the full dataset that I'll add other target variables into.

Right now it's just 3 columns: Date, GCN (think of this like a store ID), and the target variable TGM.
There are no xregs at this point in time so that setting has been set to NULL.
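
For anyone wanting to reproduce the shape of this input, here is a small base R sketch of a three-column dataset like the one described. The GCN labels, the date range, and the TGM values are all invented for illustration; only the column names come from the description above.

```r
# Toy data only: GCN labels, dates, and TGM values are made up.
set.seed(1)
dates <- seq(as.Date("2019-01-01"), by = "month", length.out = 24)
ids <- c("A", "B", "C")

toy_data <- data.frame(
  Date = rep(dates, times = length(ids)),   # one monthly series per GCN
  GCN = rep(ids, each = length(dates)),     # single grouping variable
  TGM = round(runif(length(dates) * length(ids), min = 100, max = 200), 2),
  stringsAsFactors = FALSE
)

str(toy_data)
```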

image

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet I found some issues with the single group variable use case. I'm going to go through and make the fixes and I'll post back. I should have the fix today at some point. Thanks for bringing the issues to my attention!

PyntieHet avatar PyntieHet commented on September 25, 2024

Greatly appreciated! I look forward to it.

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet Giving you a heads-up: this might take more than tonight. I'll keep working on it until it's correct, however.

PyntieHet avatar PyntieHet commented on September 25, 2024

No worries at all. Thanks for the update.

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet I just made a push. I was able to test the single-group and two-group cases, each with and without xregs. I'm going to leave the issue open until I can do some further testing, but I think you are good to go with the new update.

PyntieHet avatar PyntieHet commented on September 25, 2024

I removed the package completely and reinstalled from the newest push, and I'm still running into the same error.
Screenshot 2021-04-22 092111
Screenshot 2021-04-22 092148

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet Real quick - did you restart R after installing the package and before running your script? If not, the package changes may not take effect.

Otherwise, I'm seeing "GroupVar_12248" at the end of the error, which is how the DummifyDT function names the dummy variables it creates. However, that should only occur when you have MultiRMSE specified. I think I only mentioned it being specified in the LossFunction argument, but the EvalMetric argument can also be set that way and, as a result, trigger the DummifyDT process.

I am keeping an eye on the CatBoost team: multiple regression (MultiRMSE) currently cannot be run with any non-numeric data, but they have mentioned that this won't be the case eventually.

Lastly, can you paste in the setup you have for the AutoCatBoostCARMA() function call? I can scan your setup and see if there is anything that might be causing the issue.
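
On the restart question, a quick way to see what the current session holds is sketched below in base R. `session_pkg_status` is a hypothetical helper written for this illustration; it inspects the session and the library without loading the package.

```r
# Hypothetical helper: report what the current session knows about a package.
# It does not load the package; it only inspects the session and the library.
session_pkg_status <- function(pkg) {
  installed <- nzchar(system.file(package = pkg))
  list(
    loaded_in_session = pkg %in% loadedNamespaces(),
    installed = installed,
    version_on_disk = if (installed) {
      as.character(utils::packageVersion(pkg))
    } else {
      NA_character_
    }
  )
}

str(session_pkg_status("RemixAutoML"))
```

If `loaded_in_session` is TRUE right after a reinstall, the session may still be running the code that was loaded before the reinstall, and a restart is the safe option.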

PyntieHet avatar PyntieHet commented on September 25, 2024

Edit: Error still persists.

I believe I had, but I'm restarting and rerunning the function now to see if it persists.

Here is the function call; most of it should be stock from the example.

Changed from weekly data to monthly
Changed target column
No xregs
No bootstrapping
Metrics = "RMSE"

# Build forecast

CatBoostResults <- RemixAutoML::AutoCatBoostVectorCARMA(

# data args
data = data, 
TimeWeights = NULL,
TargetColumnName = c("TGM"),
DateColumnName = "Date",
HierarchGroups = NULL,
GroupVariables = c("GCN"),
TimeUnit = "months",
TimeGroups = c("months"),

# Production args
TrainOnFull = TRUE,
PartitionType = "timeseries",
FC_Periods = 4,
Timer = TRUE,
DebugMode = TRUE,

# Target transformations
TargetTransformation = TRUE,
Methods = c("BoxCox", "Asinh", "Asin", "Log",
            "LogPlus1", "Logit", "YeoJohnson"),
Difference = FALSE,
NonNegativePred = FALSE,
RoundPreds = FALSE,

# Date features
CalendarVariables = c( "month", "quarter"),
HolidayVariable = c("USPublicHolidays",
                    "EasterGroup",
                    "ChristmasGroup","OtherEcclesticalFeasts"),
HolidayLookback = NULL,
HolidayLags = 1,
HolidayMovingAverages = 1:2,

# Time series features
Lags = list("months" = c(1:3)),
MA_Periods = list("months" = c(2,3)),
SD_Periods = NULL,
Skew_Periods = NULL,
Kurt_Periods = NULL,
Quantile_Periods = NULL,
Quantiles_Selected = c("q5","q95"),

# Bonus features
AnomalyDetection = NULL,
XREGS = NULL,
FourierTerms = 2,
TimeTrendVariable = TRUE,
ZeroPadSeries = NULL,
DataTruncate = FALSE,

# ML Args
NumOfParDepPlots = 100L,
EvalMetric = "RMSE",
EvalMetricValue = 1.5,
LossFunction = "RMSE",
LossFunctionValue = 1.5,
GridTune = FALSE,
PassInGrid = NULL,
ModelCount = 5,
TaskType = "GPU",
NumGPU = 1,
MaxRunsWithoutNewWinner = 50,
MaxRunMinutes = 60*60,
Langevin = FALSE,
DiffusionTemperature = 10000,
NTrees = 500,
L2_Leaf_Reg = 3.0,
RandomStrength = 1,
BorderCount = 254,
BootStrapType = c("No"),
Depth = 6)

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet I probably should've asked for your code setup earlier in the conversation. I pasted your setup below using AutoCatBoostCARMA() instead of AutoCatBoostVectorCARMA(). The latter is intended for use with multiple target variable columns and the MultiRMSE loss and eval metrics. I set it up to run on CPU, but if you have a GPU you can switch the TaskType argument to "GPU". Also, BootStrapType would automatically get switched to "Bayesian" instead of "MVS", since "MVS" cannot be used with GPU, but it's probably better practice to set the args up properly than to let the function make the corrections internally.

CatBoostResults <- RemixAutoML::AutoCatBoostCARMA(

  # data args
  data = data_new,
  TimeWeights = NULL,
  TargetColumnName = "TGM",
  DateColumnName = "Date",
  HierarchGroups = NULL,
  GroupVariables = c("GCN"),
  TimeUnit = "months",
  TimeGroups = "months",

  # Production args
  TrainOnFull = TRUE,
  SplitRatios = c(0.95,0.05),
  PartitionType = "random",
  FC_Periods = 4,
  TaskType = "CPU",
  NumGPU = 1,
  Timer = TRUE,
  DebugMode = TRUE,

  # Target variable transformations
  TargetTransformation = TRUE,
  Methods = c("YeoJohnson", "BoxCox", "Asinh", "Log", "LogPlus1", "Sqrt", "Asin", "Logit"),
  Difference = FALSE,
  NonNegativePred = FALSE,
  RoundPreds = FALSE,

  # Calendar-related features
  CalendarVariables = c("month","quarter"),
  HolidayVariable = c("USPublicHolidays","EasterGroup","ChristmasGroup","OtherEcclesticalFeasts"),
  HolidayLookback = NULL,
  HolidayLags = c(1),
  HolidayMovingAverages = c(2,3),

  # Lags, moving averages, and other rolling stats
  Lags = list("months" = c(1:3)),
  MA_Periods = list("months" = c(2,3)),
  SD_Periods = NULL,
  Skew_Periods = NULL,
  Kurt_Periods = NULL,
  Quantile_Periods = NULL,
  Quantiles_Selected = NULL,

  # Bonus features
  AnomalyDetection = NULL,
  XREGS = xregs_new,
  FourierTerms = 2,
  TimeTrendVariable = TRUE,
  ZeroPadSeries = NULL,
  DataTruncate = FALSE,

  # ML grid tuning args
  GridTune = FALSE,
  PassInGrid = NULL,
  ModelCount = 5,
  MaxRunsWithoutNewWinner = 50,
  MaxRunMinutes = 60*60,

  # ML evaluation output
  PDFOutputPath = NULL,
  SaveDataPath = NULL,
  NumOfParDepPlots = 0L,

  # ML loss functions
  EvalMetric = "RMSE",
  EvalMetricValue = 1,
  LossFunction = "RMSE",
  LossFunctionValue = 1,

  # ML tuning args
  NTrees = 500L,
  Depth = 9,
  L2_Leaf_Reg = 3.0,
  LearningRate = NULL,
  Langevin = TRUE,
  DiffusionTemperature = 10000,
  RandomStrength = 1.0,
  BorderCount = 254,
  RSM = 1,
  GrowPolicy = "SymmetricTree",
  BootStrapType = "MVS",
  ModelSizeReg = 0.0,
  FeatureBorderType = "GreedyLogSum",
  SamplingUnit = "Group",
  SubSample = NULL,
  ScoreFunction = "Cosine",
  MinDataInLeaf = 1)
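
The TaskType and BootStrapType interaction mentioned above can be made explicit before the call. This is a minimal sketch under the single rule stated in this thread ("MVS" is not usable with GPU and is switched to "Bayesian"); `resolve_bootstrap_type` is a hypothetical helper, not the package's internal code.

```r
# Hypothetical helper: choose a bootstrap type compatible with the task type.
# Only the rule stated in this thread is encoded: GPU + "MVS" -> "Bayesian".
resolve_bootstrap_type <- function(task_type, bootstrap_type) {
  if (identical(task_type, "GPU") && identical(bootstrap_type, "MVS")) {
    message("'MVS' is not available on GPU; using 'Bayesian' instead")
    return("Bayesian")
  }
  bootstrap_type
}

# Resolve once, then pass the result as the BootStrapType argument.
bootstrap_for_gpu <- resolve_bootstrap_type("GPU", "MVS")
bootstrap_for_cpu <- resolve_bootstrap_type("CPU", "MVS")
print(c(GPU = bootstrap_for_gpu, CPU = bootstrap_for_cpu))
```

Resolving the value up front keeps the arguments you pass in line with what actually runs, rather than relying on a silent correction inside the function.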

PyntieHet avatar PyntieHet commented on September 25, 2024

I see. That got it to run to completion. Thank you for your time and clarifications.

AdrianAntico avatar AdrianAntico commented on September 25, 2024

@PyntieHet that's great! Thanks for raising the ticket because it did uncover a bug on my side.
