fedegarzar
fedegarzar OP t1_jaev47v wrote
Reply to comment by cristianic18 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar
That's an interesting question. Behind the scenes, BigQuery uses an auto-ARIMA model to extrapolate the trend of each time series after deseasonalizing it (https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-time-series). I would say that the complexity of that pipeline is what makes it slower. Our implementations also use Numba, which speeds up fitting.
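To make the "deseasonalize, then extrapolate the trend" idea concrete, here is a minimal sketch in plain Python. It is not BigQuery's or StatsForecast's implementation: the seasonal step is a classical additive decomposition, and a least-squares line stands in for auto-ARIMA purely to keep the example short. All function names are hypothetical, and the toy series is made up.

```python
def centered_moving_average(y, period):
    """Centered moving average; None where the window does not fit."""
    half = period // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if period % 2:  # odd period: plain centered window
            out[t] = sum(window) / period
        else:  # even period: half weights on the two end points
            out[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return out


def seasonal_indices(y, period):
    """Additive seasonal effect per cycle position (needs >= 2 full cycles)."""
    trend = centered_moving_average(y, period)
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(y, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period  # normalise so the effects sum to zero
    return [r - offset for r in raw]


def fit_line(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ... (stand-in for ARIMA)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, y_mean - slope * t_mean


def forecast_deseasonalized_trend(y, period, horizon):
    """Remove seasonality, extrapolate the trend, then add seasonality back."""
    idx = seasonal_indices(y, period)
    deseasonalized = [v - idx[t % period] for t, v in enumerate(y)]
    slope, intercept = fit_line(deseasonalized)
    n = len(y)
    return [intercept + slope * t + idx[t % period] for t in range(n, n + horizon)]


# Toy data (made up): linear trend plus a period-4 seasonal pattern.
pattern = [3, -1, -4, 2]
series = [10 + 2 * t + pattern[t % 4] for t in range(12)]
print(forecast_deseasonalized_trend(series, period=4, horizon=4))
```

Even in this reduced form the pipeline has three separate stages per series, which gives a feel for why a multi-stage approach can cost more than fitting a single model directly.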
fedegarzar OP t1_jaeue2g wrote
Reply to comment by MyActualUserName99 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar
Yes, I agree with your intuitions. That said, we used the datasets from the official BigQuery tutorial (https://cloud.google.com/bigquery-ml/docs/arima-speed-up-tutorial). More generally, it isn't easy to generalize in time series forecasting because the datasets in the field are so diverse. The central point of the experiment is that running simpler methods and pipelines first could be better practice than using AutoML as is.
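As an illustration of what a "less sophisticated" first step can look like, here is a minimal sketch of a seasonal naive baseline with a mean absolute error check. It is not the code used in the experiment; the function names and the toy numbers are assumptions made for the example.

```python
def seasonal_naive(history, period, horizon):
    """Forecast by repeating the last observed seasonal cycle."""
    last_cycle = history[-period:]
    return [last_cycle[k % period] for k in range(horizon)]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy data (made up): two cycles of length 3, then four held-out points.
history = [5, 7, 9, 6, 8, 10]
actual = [7, 8, 12, 6]

baseline = seasonal_naive(history, period=3, horizon=len(actual))
print(baseline, mean_absolute_error(actual, baseline))
```

Any heavier pipeline, AutoML included, can then be scored with the same error function on the same held-out points, so the baseline acts as a cheap sanity check before paying for anything more complex.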
fedegarzar OP t1_j04jp9e wrote
Reply to comment by xgboostftw in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar
Here are the results: https://github.com/Nixtla/statsforecast/tree/main/experiments/amazon_forecast
Here is the step-by-step guide to reproduce results: https://nixtla.github.io/statsforecast/examples/aws/statsforecast.html
Here are the steps for Amazon Forecast: https://nixtla.github.io/statsforecast/examples/aws/amazonforecast.html
Here is the data:
Train set: https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet
Temporal exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/temporal.parquet
Static exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/static.parquet
fedegarzar OP t1_j048qe0 wrote
Reply to comment by xgboostftw in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar
Here is the step-by-step guide to reproducing Amazon Forecast: https://nixtla.github.io/statsforecast/examples/aws/amazonforecast.html
As you can see, all the exogenous variables of M5 are included in the Amazon Forecast run.
In fact, the same link you posted provides links to the static and temporal exogenous variables you mention.
From the ReadMe:
The data are ready for download at the following URLs:
- Train set: https://m5-benchmarks.s3.amazonaws.com/data/train/target.parquet
- Temporal exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/temporal.parquet
- Static exogenous variables (used by AmazonForecast): https://m5-benchmarks.s3.amazonaws.com/data/train/static.parquet
fedegarzar OP t1_izycx10 wrote
Reply to comment by Mark8472 in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar
- We did not run those experiments. But in our opinion, it's easier to maintain a Python pipeline than to use the AWS UI or CLI.
- In terms of scalability, I think StatsForecast wins by far, since it takes much less time to compute and integrates with Spark and Ray.
- The point of the whole experiment is to show that the AutoML solution is far more expensive in the long run.
fedegarzar OP t1_jaevmj5 wrote
Reply to comment by More-Horse-3281 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar
I agree. Overfitting is a common problem in AutoML solutions. A proper validation strategy should improve performance on unseen data, but in our experience, most AutoML solutions lack this feature.
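One common validation strategy for time series is rolling-origin evaluation, where the model is refit on successively longer histories and always scored on the points that follow. The sketch below is a plain-Python illustration under that assumption, not the procedure of any particular AutoML product or library; the function and parameter names are hypothetical.

```python
def rolling_origin_splits(n_obs, horizon, n_windows, step=1):
    """Yield (train_end, test_end) cutoffs, oldest window first."""
    last_train_end = n_obs - horizon
    for w in range(n_windows - 1, -1, -1):
        train_end = last_train_end - w * step
        if train_end <= 0:
            raise ValueError("series too short for the requested windows")
        yield train_end, train_end + horizon


def cross_validate(y, forecaster, horizon, n_windows, step=1):
    """Average MAE of `forecaster` over all rolling-origin windows."""
    errors = []
    for train_end, test_end in rolling_origin_splits(len(y), horizon, n_windows, step):
        predicted = forecaster(y[:train_end], horizon)  # only past data is visible
        actual = y[train_end:test_end]
        errors.append(sum(abs(a - p) for a, p in zip(actual, predicted)) / horizon)
    return sum(errors) / len(errors)


def last_value(history, horizon):
    """Trivial forecaster used only to exercise the validation loop."""
    return [history[-1]] * horizon


# Toy data (made up): a steadily increasing series.
series = list(range(10))
print(list(rolling_origin_splits(len(series), horizon=2, n_windows=3, step=2)))
print(cross_validate(series, last_value, horizon=2, n_windows=3, step=2))
```

Because every window is scored only on data the forecaster has not seen, a model that merely memorises the training history tends to score poorly here, which is the kind of signal that helps catch overfitting.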