fedegarzar OP t1_jaevmj5 wrote on February 28, 2023 at 10:52 PM

Reply to comment by More-Horse-3281 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar

I agree. Overfitting is a common problem in AutoML solutions. A proper validation strategy should improve performance on unseen data, but in our experience, most AutoML solutions lack this feature.
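To make "a proper validation strategy" concrete, here is a minimal sketch of rolling-origin (time series) cross-validation in plain Python. The function names, the seasonal naive forecaster, and the MAE metric are illustrative choices of mine, not what any AutoML product or StatsForecast does internally. The key point is that every training window ends before its test window starts.

```python
def rolling_origin_splits(n_obs, horizon, n_windows):
    """Yield (train_end, test_end) index pairs, oldest window first.

    Training data is series[:train_end]; test data is series[train_end:test_end].
    """
    for w in range(n_windows, 0, -1):
        test_end = n_obs - (w - 1) * horizon
        train_end = test_end - horizon
        if train_end <= 0:
            raise ValueError("series too short for the requested windows")
        yield train_end, test_end


def seasonal_naive(train, horizon, season_length):
    """Repeat the last observed season (a stand-in for any forecaster)."""
    if len(train) < season_length:
        raise ValueError("need at least one full season of training data")
    last_season = train[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def cross_validate(series, horizon, n_windows, season_length):
    """Return one out-of-sample MAE per validation window."""
    errors = []
    for train_end, test_end in rolling_origin_splits(len(series), horizon, n_windows):
        train = series[:train_end]          # only data from before the test window
        test = series[train_end:test_end]   # strictly later observations
        errors.append(mae(test, seasonal_naive(train, horizon, season_length)))
    return errors
```

Averaging the per-window errors gives an estimate of out-of-sample accuracy that you can use to compare models before trusting any of them on truly unseen data.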

fedegarzar OP t1_jaev47v wrote on February 28, 2023 at 10:48 PM

Reply to comment by cristianic18 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar

That's an interesting question. Behind the scenes, BigQuery uses an auto ARIMA model to extrapolate the trend of each time series after deseasonalizing it (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-time-series). I would say that the complexity of that pipeline makes it slower. Also, our implementations use Numba, which speeds up fitting.
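For intuition only, a "deseasonalize, extrapolate the trend, then add the seasonality back" pipeline can be sketched as below. This is not BigQuery's implementation: I swap the auto ARIMA step for a simple least-squares line to keep the example short, the seasonal estimate is a crude additive one, and all function names are made up for illustration.

```python
def seasonal_offsets(series, season_length):
    """Additive seasonal offsets: mean of each seasonal position minus the overall mean.

    Crude on purpose: with a strong trend these means absorb part of the trend,
    so a production pipeline would use a more robust decomposition.
    """
    overall = sum(series) / len(series)
    offsets = []
    for pos in range(season_length):
        values = series[pos::season_length]
        offsets.append(sum(values) / len(values) - overall)
    return offsets


def fit_line(values):
    """Ordinary least-squares line over t = 0, 1, ..., n-1; returns (slope, intercept)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def decompose_and_extrapolate(series, horizon, season_length):
    """Deseasonalize, extrapolate a linear trend (stand-in for ARIMA), re-seasonalize."""
    offsets = seasonal_offsets(series, season_length)
    deseasonalized = [y - offsets[t % season_length] for t, y in enumerate(series)]
    slope, intercept = fit_line(deseasonalized)
    n = len(series)
    return [intercept + slope * t + offsets[t % season_length]
            for t in range(n, n + horizon)]
```

Each stage adds its own fitting cost per series, which is the sense in which a longer pipeline tends to be slower than fitting a single simple model.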

fedegarzar OP t1_jaeue2g wrote on February 28, 2023 at 10:43 PM

Reply to comment by MyActualUserName99 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar

Yes, I agree with your intuitions. However, we used the datasets from the official BigQuery tutorial (https://cloud.google.com/bigquery-ml/docs/arima-speed-up-tutorial). More generally, it isn't easy to generalize in time series forecasting because the datasets in the field are so diverse. The central point of the experiment is that running less sophisticated methods and pipelines first could be a better practice than using AutoML as is.

fedegarzar OP t1_j04jp9e wrote on December 14, 2022 at 12:40 AM

Reply to comment by xgboostftw in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar

Here are the results: https://github.com/Nixtla/statsforecast/tree/main/experiments/amazon_forecast
Here is the step-by-step guide to reproduce results: https://nixtla.github.io/statsforecast/examples/aws/statsforecast.html
Here are the steps for Amazon Forecast: https://nixtla.github.io/statsforecast/examples/aws/amazonforecast.html

Here is the data:
Train set: https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet
Temporal exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/temporal.parquet
Static exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/static.parquet

fedegarzar OP t1_j048qe0 wrote on December 13, 2022 at 11:21 PM

Reply to comment by xgboostftw in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar

Here is the step-by-step guide to reproducing Amazon Forecast: https://nixtla.github.io/statsforecast/examples/aws/amazonforecast.html

As you can see, all the exogenous variables of M5 are included in Amazon Forecast.

Concretely, if you read the same link you posted, you'll see that we even provide links to the static and temporal exogenous variables you mention.

From the README:

The data are ready for download at the following URLs:

Train set: https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet
Temporal exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/temporal.parquet
Static exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/static.parquet

fedegarzar OP t1_izycx10 wrote on December 12, 2022 at 7:28 PM

Reply to comment by Mark8472 in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar

We did not run those experiments. But in our opinion, it's easier to maintain a Python pipeline than to use the AWS UI or CLI.
In terms of scalability, I think StatsForecast wins by far, given that it takes a lot less time to compute and supports integration with Spark and Ray.
The point of the whole experiment is to show that the AutoML solution is far more expensive in the long run.