')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "source": [
+ "Displaying the run objects gives you links to the visual tools in the Azure Portal. Go try them!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "source": [
+ "### Retrieve the Best Model for Each Algorithm\n",
+ "Below we select the best pipeline from our iterations. The get_output method on automl_classifier returns the best run and the fitted model for the last fit invocation. There are overloads on get_output that allow you to retrieve the best run and fitted model for any logged metric or a particular iteration."
+ ]
+ },
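+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "source": [
+ "For reference, a minimal sketch of the `get_output` overloads, assuming `remote_run` is the completed AutoML run from the training step:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# best run and fitted model from the last fit invocation\n",
+ "# (note: this downloads the fitted model locally)\n",
+ "best_run, fitted_model = remote_run.get_output()\n",
+ "\n",
+ "# best run and model judged by a specific logged metric\n",
+ "best_run_by_metric, model_by_metric = remote_run.get_output(\n",
+ "    metric=\"normalized_root_mean_squared_error\"\n",
+ ")\n",
+ "\n",
+ "# run and model from a particular iteration\n",
+ "first_run, first_model = remote_run.get_output(iteration=0)"
+ ]
+ },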
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "from helper import get_result_df\n",
+ "\n",
+ "summary_df = get_result_df(remote_run)\n",
+ "summary_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core.run import Run\n",
+ "from azureml.widgets import RunDetails\n",
+ "\n",
+ "forecast_model = \"TCNForecaster\"\n",
+ "if not forecast_model in summary_df[\"run_id\"]:\n",
+ " forecast_model = \"ForecastTCN\"\n",
+ "\n",
+ "best_dnn_run_id = summary_df[\"run_id\"][forecast_model]\n",
+ "best_dnn_run = Run(experiment, best_dnn_run_id)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "best_dnn_run.parent\n",
+ "RunDetails(best_dnn_run.parent).show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "best_dnn_run\n",
+ "RunDetails(best_dnn_run).show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "source": [
+ "## Evaluate on Test Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "source": [
+ "We now use the best fitted model from the AutoML Run to make forecasts for the test set. \n",
+ "\n",
+ "We always score on the original dataset whose schema matches the training set schema."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core import Dataset\n",
+ "\n",
+ "test_dataset = Dataset.Tabular.from_delimited_files(\n",
+ " path=[(datastore, \"github-dataset/tabular/test.csv\")]\n",
+ ")\n",
+ "# preview the first 3 rows of the dataset\n",
+ "test_dataset.take(5).to_pandas_dataframe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "compute_target = ws.compute_targets[\"github-cluster\"]\n",
+ "test_experiment = Experiment(ws, experiment_name + \"_test\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import shutil\n",
+ "\n",
+ "script_folder = os.path.join(os.getcwd(), \"inference\")\n",
+ "os.makedirs(script_folder, exist_ok=True)\n",
+ "shutil.copy(\"infer.py\", script_folder)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from helper import run_inference\n",
+ "\n",
+ "test_run = run_inference(\n",
+ " test_experiment,\n",
+ " compute_target,\n",
+ " script_folder,\n",
+ " best_dnn_run,\n",
+ " test_dataset,\n",
+ " valid_dataset,\n",
+ " forecast_horizon,\n",
+ " target_column_name,\n",
+ " time_column_name,\n",
+ " freq,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "RunDetails(test_run).show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from helper import run_multiple_inferences\n",
+ "\n",
+ "summary_df = run_multiple_inferences(\n",
+ " summary_df,\n",
+ " experiment,\n",
+ " test_experiment,\n",
+ " compute_target,\n",
+ " script_folder,\n",
+ " test_dataset,\n",
+ " valid_dataset,\n",
+ " forecast_horizon,\n",
+ " target_column_name,\n",
+ " time_column_name,\n",
+ " freq,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "for run_name, run_summary in summary_df.iterrows():\n",
+ " print(run_name)\n",
+ " print(run_summary)\n",
+ " run_id = run_summary.run_id\n",
+ " test_run_id = run_summary.test_run_id\n",
+ " test_run = Run(test_experiment, test_run_id)\n",
+ " test_run.wait_for_completion()\n",
+ " test_score = test_run.get_metrics()[run_summary.primary_metric]\n",
+ " summary_df.loc[summary_df.run_id == run_id, \"Test Score\"] = test_score\n",
+ " print(\"Test Score: \", test_score)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "hideCode": false,
+ "hidePrompt": false
+ },
+ "outputs": [],
+ "source": [
+ "summary_df"
+ ]
+ }
+ ],
+ "metadata": {
+ "authors": [
+ {
+ "name": "jialiu"
+ }
+ ],
+ "hide_code_all_hidden": false,
+ "kernelspec": {
+ "display_name": "Python 3.6 - AzureML",
+ "language": "python",
+ "name": "python3-azureml"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.9"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/github_dau_2011-2018_test.csv b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/github_dau_2011-2018_test.csv
new file mode 100644
index 000000000..6061b0d21
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/github_dau_2011-2018_test.csv
@@ -0,0 +1,455 @@
+date,count,day_of_week,month_of_year,holiday
+2017-06-04,104663,6.0,5.0,0.0
+2017-06-05,155824,0.0,5.0,0.0
+2017-06-06,164908,1.0,5.0,0.0
+2017-06-07,170309,2.0,5.0,0.0
+2017-06-08,164256,3.0,5.0,0.0
+2017-06-09,153406,4.0,5.0,0.0
+2017-06-10,97024,5.0,5.0,0.0
+2017-06-11,103442,6.0,5.0,0.0
+2017-06-12,160768,0.0,5.0,0.0
+2017-06-13,166288,1.0,5.0,0.0
+2017-06-14,163819,2.0,5.0,0.0
+2017-06-15,157593,3.0,5.0,0.0
+2017-06-16,149259,4.0,5.0,0.0
+2017-06-17,95579,5.0,5.0,0.0
+2017-06-18,98723,6.0,5.0,0.0
+2017-06-19,159076,0.0,5.0,0.0
+2017-06-20,163340,1.0,5.0,0.0
+2017-06-21,163344,2.0,5.0,0.0
+2017-06-22,159528,3.0,5.0,0.0
+2017-06-23,146563,4.0,5.0,0.0
+2017-06-24,92631,5.0,5.0,0.0
+2017-06-25,96549,6.0,5.0,0.0
+2017-06-26,153249,0.0,5.0,0.0
+2017-06-27,160357,1.0,5.0,0.0
+2017-06-28,159941,2.0,5.0,0.0
+2017-06-29,156781,3.0,5.0,0.0
+2017-06-30,144709,4.0,5.0,0.0
+2017-07-01,89101,5.0,6.0,0.0
+2017-07-02,93046,6.0,6.0,0.0
+2017-07-03,144113,0.0,6.0,0.0
+2017-07-04,143061,1.0,6.0,1.0
+2017-07-05,154603,2.0,6.0,0.0
+2017-07-06,157200,3.0,6.0,0.0
+2017-07-07,147213,4.0,6.0,0.0
+2017-07-08,92348,5.0,6.0,0.0
+2017-07-09,97018,6.0,6.0,0.0
+2017-07-10,157192,0.0,6.0,0.0
+2017-07-11,161819,1.0,6.0,0.0
+2017-07-12,161998,2.0,6.0,0.0
+2017-07-13,160280,3.0,6.0,0.0
+2017-07-14,146818,4.0,6.0,0.0
+2017-07-15,93041,5.0,6.0,0.0
+2017-07-16,97505,6.0,6.0,0.0
+2017-07-17,156167,0.0,6.0,0.0
+2017-07-18,162855,1.0,6.0,0.0
+2017-07-19,162519,2.0,6.0,0.0
+2017-07-20,159941,3.0,6.0,0.0
+2017-07-21,148460,4.0,6.0,0.0
+2017-07-22,93431,5.0,6.0,0.0
+2017-07-23,98553,6.0,6.0,0.0
+2017-07-24,156202,0.0,6.0,0.0
+2017-07-25,162503,1.0,6.0,0.0
+2017-07-26,158479,2.0,6.0,0.0
+2017-07-27,158192,3.0,6.0,0.0
+2017-07-28,147108,4.0,6.0,0.0
+2017-07-29,93799,5.0,6.0,0.0
+2017-07-30,97920,6.0,6.0,0.0
+2017-07-31,152197,0.0,6.0,0.0
+2017-08-01,158477,1.0,7.0,0.0
+2017-08-02,159089,2.0,7.0,0.0
+2017-08-03,157182,3.0,7.0,0.0
+2017-08-04,146345,4.0,7.0,0.0
+2017-08-05,92534,5.0,7.0,0.0
+2017-08-06,97128,6.0,7.0,0.0
+2017-08-07,151359,0.0,7.0,0.0
+2017-08-08,159895,1.0,7.0,0.0
+2017-08-09,158329,2.0,7.0,0.0
+2017-08-10,155468,3.0,7.0,0.0
+2017-08-11,144914,4.0,7.0,0.0
+2017-08-12,92258,5.0,7.0,0.0
+2017-08-13,95933,6.0,7.0,0.0
+2017-08-14,147706,0.0,7.0,0.0
+2017-08-15,151115,1.0,7.0,0.0
+2017-08-16,157640,2.0,7.0,0.0
+2017-08-17,156600,3.0,7.0,0.0
+2017-08-18,146980,4.0,7.0,0.0
+2017-08-19,94592,5.0,7.0,0.0
+2017-08-20,99320,6.0,7.0,0.0
+2017-08-21,145727,0.0,7.0,0.0
+2017-08-22,160260,1.0,7.0,0.0
+2017-08-23,160440,2.0,7.0,0.0
+2017-08-24,157830,3.0,7.0,0.0
+2017-08-25,145822,4.0,7.0,0.0
+2017-08-26,94706,5.0,7.0,0.0
+2017-08-27,99047,6.0,7.0,0.0
+2017-08-28,152112,0.0,7.0,0.0
+2017-08-29,162440,1.0,7.0,0.0
+2017-08-30,162902,2.0,7.0,0.0
+2017-08-31,159498,3.0,7.0,0.0
+2017-09-01,145689,4.0,8.0,0.0
+2017-09-02,93589,5.0,8.0,0.0
+2017-09-03,100058,6.0,8.0,0.0
+2017-09-04,140865,0.0,8.0,1.0
+2017-09-05,165715,1.0,8.0,0.0
+2017-09-06,167463,2.0,8.0,0.0
+2017-09-07,164811,3.0,8.0,0.0
+2017-09-08,156157,4.0,8.0,0.0
+2017-09-09,101358,5.0,8.0,0.0
+2017-09-10,107915,6.0,8.0,0.0
+2017-09-11,167845,0.0,8.0,0.0
+2017-09-12,172756,1.0,8.0,0.0
+2017-09-13,172851,2.0,8.0,0.0
+2017-09-14,171675,3.0,8.0,0.0
+2017-09-15,159266,4.0,8.0,0.0
+2017-09-16,103547,5.0,8.0,0.0
+2017-09-17,110964,6.0,8.0,0.0
+2017-09-18,170976,0.0,8.0,0.0
+2017-09-19,177864,1.0,8.0,0.0
+2017-09-20,173567,2.0,8.0,0.0
+2017-09-21,172017,3.0,8.0,0.0
+2017-09-22,161357,4.0,8.0,0.0
+2017-09-23,104681,5.0,8.0,0.0
+2017-09-24,111711,6.0,8.0,0.0
+2017-09-25,173517,0.0,8.0,0.0
+2017-09-26,180049,1.0,8.0,0.0
+2017-09-27,178307,2.0,8.0,0.0
+2017-09-28,174157,3.0,8.0,0.0
+2017-09-29,161707,4.0,8.0,0.0
+2017-09-30,110536,5.0,8.0,0.0
+2017-10-01,106505,6.0,9.0,0.0
+2017-10-02,157565,0.0,9.0,0.0
+2017-10-03,164764,1.0,9.0,0.0
+2017-10-04,163383,2.0,9.0,0.0
+2017-10-05,162847,3.0,9.0,0.0
+2017-10-06,153575,4.0,9.0,0.0
+2017-10-07,107472,5.0,9.0,0.0
+2017-10-08,116127,6.0,9.0,0.0
+2017-10-09,174457,0.0,9.0,1.0
+2017-10-10,185217,1.0,9.0,0.0
+2017-10-11,185120,2.0,9.0,0.0
+2017-10-12,180844,3.0,9.0,0.0
+2017-10-13,170178,4.0,9.0,0.0
+2017-10-14,112754,5.0,9.0,0.0
+2017-10-15,121251,6.0,9.0,0.0
+2017-10-16,183906,0.0,9.0,0.0
+2017-10-17,188945,1.0,9.0,0.0
+2017-10-18,187297,2.0,9.0,0.0
+2017-10-19,183867,3.0,9.0,0.0
+2017-10-20,173021,4.0,9.0,0.0
+2017-10-21,115851,5.0,9.0,0.0
+2017-10-22,126088,6.0,9.0,0.0
+2017-10-23,189452,0.0,9.0,0.0
+2017-10-24,194412,1.0,9.0,0.0
+2017-10-25,192293,2.0,9.0,0.0
+2017-10-26,190163,3.0,9.0,0.0
+2017-10-27,177053,4.0,9.0,0.0
+2017-10-28,114934,5.0,9.0,0.0
+2017-10-29,125289,6.0,9.0,0.0
+2017-10-30,189245,0.0,9.0,0.0
+2017-10-31,191480,1.0,9.0,0.0
+2017-11-01,182281,2.0,10.0,0.0
+2017-11-02,186351,3.0,10.0,0.0
+2017-11-03,175422,4.0,10.0,0.0
+2017-11-04,118160,5.0,10.0,0.0
+2017-11-05,127602,6.0,10.0,0.0
+2017-11-06,191067,0.0,10.0,0.0
+2017-11-07,197083,1.0,10.0,0.0
+2017-11-08,194333,2.0,10.0,0.0
+2017-11-09,193914,3.0,10.0,0.0
+2017-11-10,179933,4.0,10.0,1.0
+2017-11-11,121346,5.0,10.0,0.0
+2017-11-12,131900,6.0,10.0,0.0
+2017-11-13,196969,0.0,10.0,0.0
+2017-11-14,201949,1.0,10.0,0.0
+2017-11-15,198424,2.0,10.0,0.0
+2017-11-16,196902,3.0,10.0,0.0
+2017-11-17,183893,4.0,10.0,0.0
+2017-11-18,122767,5.0,10.0,0.0
+2017-11-19,130890,6.0,10.0,0.0
+2017-11-20,194515,0.0,10.0,0.0
+2017-11-21,198601,1.0,10.0,0.0
+2017-11-22,191041,2.0,10.0,0.0
+2017-11-23,170321,3.0,10.0,1.0
+2017-11-24,155623,4.0,10.0,0.0
+2017-11-25,115759,5.0,10.0,0.0
+2017-11-26,128771,6.0,10.0,0.0
+2017-11-27,199419,0.0,10.0,0.0
+2017-11-28,207253,1.0,10.0,0.0
+2017-11-29,205406,2.0,10.0,0.0
+2017-11-30,200674,3.0,10.0,0.0
+2017-12-01,187017,4.0,11.0,0.0
+2017-12-02,129735,5.0,11.0,0.0
+2017-12-03,139120,6.0,11.0,0.0
+2017-12-04,205505,0.0,11.0,0.0
+2017-12-05,208218,1.0,11.0,0.0
+2017-12-06,202480,2.0,11.0,0.0
+2017-12-07,197822,3.0,11.0,0.0
+2017-12-08,180686,4.0,11.0,0.0
+2017-12-09,123667,5.0,11.0,0.0
+2017-12-10,130987,6.0,11.0,0.0
+2017-12-11,193901,0.0,11.0,0.0
+2017-12-12,194997,1.0,11.0,0.0
+2017-12-13,192063,2.0,11.0,0.0
+2017-12-14,186496,3.0,11.0,0.0
+2017-12-15,170812,4.0,11.0,0.0
+2017-12-16,110474,5.0,11.0,0.0
+2017-12-17,118165,6.0,11.0,0.0
+2017-12-18,176843,0.0,11.0,0.0
+2017-12-19,179550,1.0,11.0,0.0
+2017-12-20,173506,2.0,11.0,0.0
+2017-12-21,165910,3.0,11.0,0.0
+2017-12-22,145886,4.0,11.0,0.0
+2017-12-23,95246,5.0,11.0,0.0
+2017-12-24,88781,6.0,11.0,0.0
+2017-12-25,98189,0.0,11.0,1.0
+2017-12-26,121383,1.0,11.0,0.0
+2017-12-27,135300,2.0,11.0,0.0
+2017-12-28,136827,3.0,11.0,0.0
+2017-12-29,127700,4.0,11.0,0.0
+2017-12-30,93014,5.0,11.0,0.0
+2017-12-31,82878,6.0,11.0,0.0
+2018-01-01,86419,0.0,0.0,1.0
+2018-01-02,147428,1.0,0.0,0.0
+2018-01-03,162193,2.0,0.0,0.0
+2018-01-04,163784,3.0,0.0,0.0
+2018-01-05,158606,4.0,0.0,0.0
+2018-01-06,113467,5.0,0.0,0.0
+2018-01-07,118313,6.0,0.0,0.0
+2018-01-08,175623,0.0,0.0,0.0
+2018-01-09,183880,1.0,0.0,0.0
+2018-01-10,183945,2.0,0.0,0.0
+2018-01-11,181769,3.0,0.0,0.0
+2018-01-12,170552,4.0,0.0,0.0
+2018-01-13,115707,5.0,0.0,0.0
+2018-01-14,121191,6.0,0.0,0.0
+2018-01-15,176127,0.0,0.0,1.0
+2018-01-16,188032,1.0,0.0,0.0
+2018-01-17,189871,2.0,0.0,0.0
+2018-01-18,189348,3.0,0.0,0.0
+2018-01-19,177456,4.0,0.0,0.0
+2018-01-20,123321,5.0,0.0,0.0
+2018-01-21,128306,6.0,0.0,0.0
+2018-01-22,186132,0.0,0.0,0.0
+2018-01-23,197618,1.0,0.0,0.0
+2018-01-24,196402,2.0,0.0,0.0
+2018-01-25,192722,3.0,0.0,0.0
+2018-01-26,179415,4.0,0.0,0.0
+2018-01-27,125769,5.0,0.0,0.0
+2018-01-28,133306,6.0,0.0,0.0
+2018-01-29,194151,0.0,0.0,0.0
+2018-01-30,198680,1.0,0.0,0.0
+2018-01-31,198652,2.0,0.0,0.0
+2018-02-01,195472,3.0,1.0,0.0
+2018-02-02,183173,4.0,1.0,0.0
+2018-02-03,124276,5.0,1.0,0.0
+2018-02-04,129054,6.0,1.0,0.0
+2018-02-05,190024,0.0,1.0,0.0
+2018-02-06,198658,1.0,1.0,0.0
+2018-02-07,198272,2.0,1.0,0.0
+2018-02-08,195339,3.0,1.0,0.0
+2018-02-09,183086,4.0,1.0,0.0
+2018-02-10,122536,5.0,1.0,0.0
+2018-02-11,133033,6.0,1.0,0.0
+2018-02-12,185386,0.0,1.0,0.0
+2018-02-13,184789,1.0,1.0,0.0
+2018-02-14,176089,2.0,1.0,0.0
+2018-02-15,171317,3.0,1.0,0.0
+2018-02-16,162693,4.0,1.0,0.0
+2018-02-17,116342,5.0,1.0,0.0
+2018-02-18,122466,6.0,1.0,0.0
+2018-02-19,172364,0.0,1.0,1.0
+2018-02-20,185896,1.0,1.0,0.0
+2018-02-21,188166,2.0,1.0,0.0
+2018-02-22,189427,3.0,1.0,0.0
+2018-02-23,178732,4.0,1.0,0.0
+2018-02-24,132664,5.0,1.0,0.0
+2018-02-25,134008,6.0,1.0,0.0
+2018-02-26,200075,0.0,1.0,0.0
+2018-02-27,207996,1.0,1.0,0.0
+2018-02-28,204416,2.0,1.0,0.0
+2018-03-01,201320,3.0,2.0,0.0
+2018-03-02,188205,4.0,2.0,0.0
+2018-03-03,131162,5.0,2.0,0.0
+2018-03-04,138320,6.0,2.0,0.0
+2018-03-05,207326,0.0,2.0,0.0
+2018-03-06,212462,1.0,2.0,0.0
+2018-03-07,209357,2.0,2.0,0.0
+2018-03-08,194876,3.0,2.0,0.0
+2018-03-09,193761,4.0,2.0,0.0
+2018-03-10,133449,5.0,2.0,0.0
+2018-03-11,142258,6.0,2.0,0.0
+2018-03-12,208753,0.0,2.0,0.0
+2018-03-13,210602,1.0,2.0,0.0
+2018-03-14,214236,2.0,2.0,0.0
+2018-03-15,210761,3.0,2.0,0.0
+2018-03-16,196619,4.0,2.0,0.0
+2018-03-17,133056,5.0,2.0,0.0
+2018-03-18,141335,6.0,2.0,0.0
+2018-03-19,211580,0.0,2.0,0.0
+2018-03-20,219051,1.0,2.0,0.0
+2018-03-21,215435,2.0,2.0,0.0
+2018-03-22,211961,3.0,2.0,0.0
+2018-03-23,196009,4.0,2.0,0.0
+2018-03-24,132390,5.0,2.0,0.0
+2018-03-25,140021,6.0,2.0,0.0
+2018-03-26,205273,0.0,2.0,0.0
+2018-03-27,212686,1.0,2.0,0.0
+2018-03-28,210683,2.0,2.0,0.0
+2018-03-29,189044,3.0,2.0,0.0
+2018-03-30,170256,4.0,2.0,0.0
+2018-03-31,125999,5.0,2.0,0.0
+2018-04-01,126749,6.0,3.0,0.0
+2018-04-02,186546,0.0,3.0,0.0
+2018-04-03,207905,1.0,3.0,0.0
+2018-04-04,201528,2.0,3.0,0.0
+2018-04-05,188580,3.0,3.0,0.0
+2018-04-06,173714,4.0,3.0,0.0
+2018-04-07,125723,5.0,3.0,0.0
+2018-04-08,142545,6.0,3.0,0.0
+2018-04-09,204767,0.0,3.0,0.0
+2018-04-10,212048,1.0,3.0,0.0
+2018-04-11,210517,2.0,3.0,0.0
+2018-04-12,206924,3.0,3.0,0.0
+2018-04-13,191679,4.0,3.0,0.0
+2018-04-14,126394,5.0,3.0,0.0
+2018-04-15,137279,6.0,3.0,0.0
+2018-04-16,208085,0.0,3.0,0.0
+2018-04-17,213273,1.0,3.0,0.0
+2018-04-18,211580,2.0,3.0,0.0
+2018-04-19,206037,3.0,3.0,0.0
+2018-04-20,191211,4.0,3.0,0.0
+2018-04-21,125564,5.0,3.0,0.0
+2018-04-22,136469,6.0,3.0,0.0
+2018-04-23,206288,0.0,3.0,0.0
+2018-04-24,212115,1.0,3.0,0.0
+2018-04-25,207948,2.0,3.0,0.0
+2018-04-26,205759,3.0,3.0,0.0
+2018-04-27,181330,4.0,3.0,0.0
+2018-04-28,130046,5.0,3.0,0.0
+2018-04-29,120802,6.0,3.0,0.0
+2018-04-30,170390,0.0,3.0,0.0
+2018-05-01,169054,1.0,4.0,0.0
+2018-05-02,197891,2.0,4.0,0.0
+2018-05-03,199820,3.0,4.0,0.0
+2018-05-04,186783,4.0,4.0,0.0
+2018-05-05,124420,5.0,4.0,0.0
+2018-05-06,130666,6.0,4.0,0.0
+2018-05-07,196014,0.0,4.0,0.0
+2018-05-08,203058,1.0,4.0,0.0
+2018-05-09,198582,2.0,4.0,0.0
+2018-05-10,191321,3.0,4.0,0.0
+2018-05-11,183639,4.0,4.0,0.0
+2018-05-12,122023,5.0,4.0,0.0
+2018-05-13,128775,6.0,4.0,0.0
+2018-05-14,199104,0.0,4.0,0.0
+2018-05-15,200658,1.0,4.0,0.0
+2018-05-16,201541,2.0,4.0,0.0
+2018-05-17,196886,3.0,4.0,0.0
+2018-05-18,188597,4.0,4.0,0.0
+2018-05-19,121392,5.0,4.0,0.0
+2018-05-20,126981,6.0,4.0,0.0
+2018-05-21,189291,0.0,4.0,0.0
+2018-05-22,203038,1.0,4.0,0.0
+2018-05-23,205330,2.0,4.0,0.0
+2018-05-24,199208,3.0,4.0,0.0
+2018-05-25,187768,4.0,4.0,0.0
+2018-05-26,117635,5.0,4.0,0.0
+2018-05-27,124352,6.0,4.0,0.0
+2018-05-28,180398,0.0,4.0,1.0
+2018-05-29,194170,1.0,4.0,0.0
+2018-05-30,200281,2.0,4.0,0.0
+2018-05-31,197244,3.0,4.0,0.0
+2018-06-01,184037,4.0,5.0,0.0
+2018-06-02,121135,5.0,5.0,0.0
+2018-06-03,129389,6.0,5.0,0.0
+2018-06-04,200331,0.0,5.0,0.0
+2018-06-05,207735,1.0,5.0,0.0
+2018-06-06,203354,2.0,5.0,0.0
+2018-06-07,200520,3.0,5.0,0.0
+2018-06-08,182038,4.0,5.0,0.0
+2018-06-09,120164,5.0,5.0,0.0
+2018-06-10,125256,6.0,5.0,0.0
+2018-06-11,194786,0.0,5.0,0.0
+2018-06-12,200815,1.0,5.0,0.0
+2018-06-13,197740,2.0,5.0,0.0
+2018-06-14,192294,3.0,5.0,0.0
+2018-06-15,173587,4.0,5.0,0.0
+2018-06-16,105955,5.0,5.0,0.0
+2018-06-17,110780,6.0,5.0,0.0
+2018-06-18,174582,0.0,5.0,0.0
+2018-06-19,193310,1.0,5.0,0.0
+2018-06-20,193062,2.0,5.0,0.0
+2018-06-21,187986,3.0,5.0,0.0
+2018-06-22,173606,4.0,5.0,0.0
+2018-06-23,111795,5.0,5.0,0.0
+2018-06-24,116134,6.0,5.0,0.0
+2018-06-25,185919,0.0,5.0,0.0
+2018-06-26,193142,1.0,5.0,0.0
+2018-06-27,188114,2.0,5.0,0.0
+2018-06-28,183737,3.0,5.0,0.0
+2018-06-29,171496,4.0,5.0,0.0
+2018-06-30,107210,5.0,5.0,0.0
+2018-07-01,111053,6.0,6.0,0.0
+2018-07-02,176198,0.0,6.0,0.0
+2018-07-03,184040,1.0,6.0,0.0
+2018-07-04,169783,2.0,6.0,1.0
+2018-07-05,177996,3.0,6.0,0.0
+2018-07-06,167378,4.0,6.0,0.0
+2018-07-07,106401,5.0,6.0,0.0
+2018-07-08,112327,6.0,6.0,0.0
+2018-07-09,182835,0.0,6.0,0.0
+2018-07-10,187694,1.0,6.0,0.0
+2018-07-11,185762,2.0,6.0,0.0
+2018-07-12,184099,3.0,6.0,0.0
+2018-07-13,170860,4.0,6.0,0.0
+2018-07-14,106799,5.0,6.0,0.0
+2018-07-15,108475,6.0,6.0,0.0
+2018-07-16,175704,0.0,6.0,0.0
+2018-07-17,183596,1.0,6.0,0.0
+2018-07-18,179897,2.0,6.0,0.0
+2018-07-19,183373,3.0,6.0,0.0
+2018-07-20,169626,4.0,6.0,0.0
+2018-07-21,106785,5.0,6.0,0.0
+2018-07-22,112387,6.0,6.0,0.0
+2018-07-23,180572,0.0,6.0,0.0
+2018-07-24,186943,1.0,6.0,0.0
+2018-07-25,185744,2.0,6.0,0.0
+2018-07-26,183117,3.0,6.0,0.0
+2018-07-27,168526,4.0,6.0,0.0
+2018-07-28,105936,5.0,6.0,0.0
+2018-07-29,111708,6.0,6.0,0.0
+2018-07-30,179950,0.0,6.0,0.0
+2018-07-31,185930,1.0,6.0,0.0
+2018-08-01,183366,2.0,7.0,0.0
+2018-08-02,182412,3.0,7.0,0.0
+2018-08-03,173429,4.0,7.0,0.0
+2018-08-04,106108,5.0,7.0,0.0
+2018-08-05,110059,6.0,7.0,0.0
+2018-08-06,178355,0.0,7.0,0.0
+2018-08-07,185518,1.0,7.0,0.0
+2018-08-08,183204,2.0,7.0,0.0
+2018-08-09,181276,3.0,7.0,0.0
+2018-08-10,168297,4.0,7.0,0.0
+2018-08-11,106488,5.0,7.0,0.0
+2018-08-12,111786,6.0,7.0,0.0
+2018-08-13,178620,0.0,7.0,0.0
+2018-08-14,181922,1.0,7.0,0.0
+2018-08-15,172198,2.0,7.0,0.0
+2018-08-16,177367,3.0,7.0,0.0
+2018-08-17,166550,4.0,7.0,0.0
+2018-08-18,107011,5.0,7.0,0.0
+2018-08-19,112299,6.0,7.0,0.0
+2018-08-20,176718,0.0,7.0,0.0
+2018-08-21,182562,1.0,7.0,0.0
+2018-08-22,181484,2.0,7.0,0.0
+2018-08-23,180317,3.0,7.0,0.0
+2018-08-24,170197,4.0,7.0,0.0
+2018-08-25,109383,5.0,7.0,0.0
+2018-08-26,113373,6.0,7.0,0.0
+2018-08-27,180142,0.0,7.0,0.0
+2018-08-28,191628,1.0,7.0,0.0
+2018-08-29,191149,2.0,7.0,0.0
+2018-08-30,187503,3.0,7.0,0.0
+2018-08-31,172280,4.0,7.0,0.0
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/github_dau_2011-2018_train.csv b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/github_dau_2011-2018_train.csv
new file mode 100644
index 000000000..5a409ad26
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/github_dau_2011-2018_train.csv
@@ -0,0 +1,2286 @@
+date,count,day_of_week,month_of_year,holiday
+2011-03-01,8583,1.0,2.0,0.0
+2011-03-02,8561,2.0,2.0,0.0
+2011-03-03,8406,3.0,2.0,0.0
+2011-03-04,7921,4.0,2.0,0.0
+2011-03-05,5597,5.0,2.0,0.0
+2011-03-06,6400,6.0,2.0,0.0
+2011-03-07,8043,0.0,2.0,0.0
+2011-03-08,8666,1.0,2.0,0.0
+2011-03-09,8344,2.0,2.0,0.0
+2011-03-10,8344,3.0,2.0,0.0
+2011-03-11,8017,4.0,2.0,0.0
+2011-03-12,5756,5.0,2.0,0.0
+2011-03-13,6294,6.0,2.0,0.0
+2011-03-14,8210,0.0,2.0,0.0
+2011-03-15,8882,1.0,2.0,0.0
+2011-03-16,8849,2.0,2.0,0.0
+2011-03-17,8611,3.0,2.0,0.0
+2011-03-18,8160,4.0,2.0,0.0
+2011-03-19,6068,5.0,2.0,0.0
+2011-03-20,6485,6.0,2.0,0.0
+2011-03-21,8596,0.0,2.0,0.0
+2011-03-22,9240,1.0,2.0,0.0
+2011-03-23,9005,2.0,2.0,0.0
+2011-03-24,8653,3.0,2.0,0.0
+2011-03-25,8288,4.0,2.0,0.0
+2011-03-26,6317,5.0,2.0,0.0
+2011-03-27,6793,6.0,2.0,0.0
+2011-03-28,9369,0.0,2.0,0.0
+2011-03-29,8589,1.0,2.0,0.0
+2011-03-30,9100,2.0,2.0,0.0
+2011-03-31,9013,3.0,2.0,0.0
+2011-04-01,8439,4.0,3.0,0.0
+2011-04-02,6142,5.0,3.0,0.0
+2011-04-03,6703,6.0,3.0,0.0
+2011-04-04,9516,0.0,3.0,0.0
+2011-04-05,9736,1.0,3.0,0.0
+2011-04-06,9370,2.0,3.0,0.0
+2011-04-07,9178,3.0,3.0,0.0
+2011-04-08,8862,4.0,3.0,0.0
+2011-04-09,6183,5.0,3.0,0.0
+2011-04-10,6798,6.0,3.0,0.0
+2011-04-11,9661,0.0,3.0,0.0
+2011-04-12,9498,1.0,3.0,0.0
+2011-04-13,9668,2.0,3.0,0.0
+2011-04-14,9651,3.0,3.0,0.0
+2011-04-15,9052,4.0,3.0,0.0
+2011-04-16,6559,5.0,3.0,0.0
+2011-04-17,6826,6.0,3.0,0.0
+2011-04-18,9243,0.0,3.0,0.0
+2011-04-19,9787,1.0,3.0,0.0
+2011-04-20,9259,2.0,3.0,0.0
+2011-04-21,9090,3.0,3.0,0.0
+2011-04-22,7812,4.0,3.0,0.0
+2011-04-23,6081,5.0,3.0,0.0
+2011-04-24,6106,6.0,3.0,0.0
+2011-04-25,7975,0.0,3.0,0.0
+2011-04-26,9656,1.0,3.0,0.0
+2011-04-27,9090,2.0,3.0,0.0
+2011-04-28,8600,3.0,3.0,0.0
+2011-04-29,9050,4.0,3.0,0.0
+2011-04-30,6073,5.0,3.0,0.0
+2011-05-01,6554,6.0,4.0,0.0
+2011-05-02,8287,0.0,4.0,0.0
+2011-05-03,9763,1.0,4.0,0.0
+2011-05-04,10105,2.0,4.0,0.0
+2011-05-05,10113,3.0,4.0,0.0
+2011-05-06,9085,4.0,4.0,0.0
+2011-05-07,6286,5.0,4.0,0.0
+2011-05-08,6674,6.0,4.0,0.0
+2011-05-09,9810,0.0,4.0,0.0
+2011-05-10,9390,1.0,4.0,0.0
+2011-05-11,10237,2.0,4.0,0.0
+2011-05-12,9630,3.0,4.0,0.0
+2011-05-13,9248,4.0,4.0,0.0
+2011-05-14,6785,5.0,4.0,0.0
+2011-05-15,7197,6.0,4.0,0.0
+2011-05-16,9794,0.0,4.0,0.0
+2011-05-17,10042,1.0,4.0,0.0
+2011-05-18,9978,2.0,4.0,0.0
+2011-05-19,10032,3.0,4.0,0.0
+2011-05-20,8662,4.0,4.0,0.0
+2011-05-21,6172,5.0,4.0,0.0
+2011-05-22,6423,6.0,4.0,0.0
+2011-05-23,10039,0.0,4.0,0.0
+2011-05-24,10487,1.0,4.0,0.0
+2011-05-25,10291,2.0,4.0,0.0
+2011-05-26,10188,3.0,4.0,0.0
+2011-05-27,8773,4.0,4.0,0.0
+2011-05-28,6323,5.0,4.0,0.0
+2011-05-29,6728,6.0,4.0,0.0
+2011-05-30,8663,0.0,4.0,1.0
+2011-05-31,10047,1.0,4.0,0.0
+2011-06-01,10183,2.0,5.0,0.0
+2011-06-02,9305,3.0,5.0,0.0
+2011-06-03,9493,4.0,5.0,0.0
+2011-06-04,6682,5.0,5.0,0.0
+2011-06-05,7043,6.0,5.0,0.0
+2011-06-06,9619,0.0,5.0,0.0
+2011-06-07,10108,1.0,5.0,0.0
+2011-06-08,10330,2.0,5.0,0.0
+2011-06-09,9792,3.0,5.0,0.0
+2011-06-10,9287,4.0,5.0,0.0
+2011-06-11,6432,5.0,5.0,0.0
+2011-06-12,6278,6.0,5.0,0.0
+2011-06-13,9515,0.0,5.0,0.0
+2011-06-14,10155,1.0,5.0,0.0
+2011-06-15,9979,2.0,5.0,0.0
+2011-06-16,9880,3.0,5.0,0.0
+2011-06-17,9855,4.0,5.0,0.0
+2011-06-18,6356,5.0,5.0,0.0
+2011-06-19,7028,6.0,5.0,0.0
+2011-06-20,10335,0.0,5.0,0.0
+2011-06-21,10383,1.0,5.0,0.0
+2011-06-22,10391,2.0,5.0,0.0
+2011-06-23,7190,3.0,5.0,0.0
+2011-06-24,9613,4.0,5.0,0.0
+2011-06-25,5890,5.0,5.0,0.0
+2011-06-26,6256,6.0,5.0,0.0
+2011-06-27,8825,0.0,5.0,0.0
+2011-06-28,10263,1.0,5.0,0.0
+2011-06-29,10628,2.0,5.0,0.0
+2011-06-30,10043,3.0,5.0,0.0
+2011-07-01,9403,4.0,6.0,0.0
+2011-07-02,6294,5.0,6.0,0.0
+2011-07-03,6485,6.0,6.0,0.0
+2011-07-04,8954,0.0,6.0,1.0
+2011-07-05,9672,1.0,6.0,0.0
+2011-07-06,10488,2.0,6.0,0.0
+2011-07-07,10199,3.0,6.0,0.0
+2011-07-08,9300,4.0,6.0,0.0
+2011-07-09,6544,5.0,6.0,0.0
+2011-07-10,6898,6.0,6.0,0.0
+2011-07-11,10087,0.0,6.0,0.0
+2011-07-12,10623,1.0,6.0,0.0
+2011-07-13,10201,2.0,6.0,0.0
+2011-07-14,9771,3.0,6.0,0.0
+2011-07-15,9339,4.0,6.0,0.0
+2011-07-16,6690,5.0,6.0,0.0
+2011-07-17,7059,6.0,6.0,0.0
+2011-07-18,10367,0.0,6.0,0.0
+2011-07-19,10123,1.0,6.0,0.0
+2011-07-20,10370,2.0,6.0,0.0
+2011-07-21,10296,3.0,6.0,0.0
+2011-07-22,9479,4.0,6.0,0.0
+2011-07-23,6667,5.0,6.0,0.0
+2011-07-24,6929,6.0,6.0,0.0
+2011-07-25,9924,0.0,6.0,0.0
+2011-07-26,10840,1.0,6.0,0.0
+2011-07-27,10588,2.0,6.0,0.0
+2011-07-28,10195,3.0,6.0,0.0
+2011-07-29,9688,4.0,6.0,0.0
+2011-07-30,6070,5.0,6.0,0.0
+2011-07-31,6858,6.0,6.0,0.0
+2011-08-01,9822,0.0,7.0,0.0
+2011-08-02,10529,1.0,7.0,0.0
+2011-08-03,10392,2.0,7.0,0.0
+2011-08-04,10498,3.0,7.0,0.0
+2011-08-05,9775,4.0,7.0,0.0
+2011-08-06,6653,5.0,7.0,0.0
+2011-08-07,6361,6.0,7.0,0.0
+2011-08-08,10287,0.0,7.0,0.0
+2011-08-09,10742,1.0,7.0,0.0
+2011-08-10,10086,2.0,7.0,0.0
+2011-08-11,10391,3.0,7.0,0.0
+2011-08-12,9614,4.0,7.0,0.0
+2011-08-13,6835,5.0,7.0,0.0
+2011-08-14,6912,6.0,7.0,0.0
+2011-08-15,10075,0.0,7.0,0.0
+2011-08-16,10949,1.0,7.0,0.0
+2011-08-17,11041,2.0,7.0,0.0
+2011-08-18,10742,3.0,7.0,0.0
+2011-08-19,10146,4.0,7.0,0.0
+2011-08-20,6424,5.0,7.0,0.0
+2011-08-21,7248,6.0,7.0,0.0
+2011-08-22,10650,0.0,7.0,0.0
+2011-08-23,11171,1.0,7.0,0.0
+2011-08-24,11385,2.0,7.0,0.0
+2011-08-25,10968,3.0,7.0,0.0
+2011-08-26,10179,4.0,7.0,0.0
+2011-08-27,7129,5.0,7.0,0.0
+2011-08-28,7341,6.0,7.0,0.0
+2011-08-29,10953,0.0,7.0,0.0
+2011-08-30,11251,1.0,7.0,0.0
+2011-08-31,11103,2.0,7.0,0.0
+2011-09-01,11120,3.0,8.0,0.0
+2011-09-02,10610,4.0,8.0,0.0
+2011-09-03,7280,5.0,8.0,0.0
+2011-09-04,7798,6.0,8.0,0.0
+2011-09-05,10391,0.0,8.0,1.0
+2011-09-06,11625,1.0,8.0,0.0
+2011-09-07,11869,2.0,8.0,0.0
+2011-09-08,11653,3.0,8.0,0.0
+2011-09-09,10962,4.0,8.0,0.0
+2011-09-10,7616,5.0,8.0,0.0
+2011-09-11,8209,6.0,8.0,0.0
+2011-09-12,11410,0.0,8.0,0.0
+2011-09-13,12278,1.0,8.0,0.0
+2011-09-14,12162,2.0,8.0,0.0
+2011-09-15,11739,3.0,8.0,0.0
+2011-09-16,11476,4.0,8.0,0.0
+2011-09-17,7297,5.0,8.0,0.0
+2011-09-18,8467,6.0,8.0,0.0
+2011-09-19,11276,0.0,8.0,0.0
+2011-09-20,11934,1.0,8.0,0.0
+2011-09-21,12059,2.0,8.0,0.0
+2011-09-22,12279,3.0,8.0,0.0
+2011-09-23,11209,4.0,8.0,0.0
+2011-09-24,7928,5.0,8.0,0.0
+2011-09-25,8584,6.0,8.0,0.0
+2011-09-26,12586,0.0,8.0,0.0
+2011-09-27,13016,1.0,8.0,0.0
+2011-09-28,12805,2.0,8.0,0.0
+2011-09-29,12525,3.0,8.0,0.0
+2011-09-30,11612,4.0,8.0,0.0
+2011-10-01,7829,5.0,9.0,0.0
+2011-10-02,8493,6.0,9.0,0.0
+2011-10-03,11934,0.0,9.0,0.0
+2011-10-04,12469,1.0,9.0,0.0
+2011-10-05,12576,2.0,9.0,0.0
+2011-10-06,12347,3.0,9.0,0.0
+2011-10-07,11916,4.0,9.0,0.0
+2011-10-08,8281,5.0,9.0,0.0
+2011-10-09,8830,6.0,9.0,0.0
+2011-10-10,12618,0.0,9.0,1.0
+2011-10-11,13105,1.0,9.0,0.0
+2011-10-12,12897,2.0,9.0,0.0
+2011-10-13,12674,3.0,9.0,0.0
+2011-10-14,11783,4.0,9.0,0.0
+2011-10-15,8104,5.0,9.0,0.0
+2011-10-16,8805,6.0,9.0,0.0
+2011-10-17,12899,0.0,9.0,0.0
+2011-10-18,13196,1.0,9.0,0.0
+2011-10-19,13200,2.0,9.0,0.0
+2011-10-20,13142,3.0,9.0,0.0
+2011-10-21,12269,4.0,9.0,0.0
+2011-10-22,8506,5.0,9.0,0.0
+2011-10-23,9133,6.0,9.0,0.0
+2011-10-24,13230,0.0,9.0,0.0
+2011-10-25,13364,1.0,9.0,0.0
+2011-10-26,13443,2.0,9.0,0.0
+2011-10-27,11080,3.0,9.0,0.0
+2011-10-28,10718,4.0,9.0,0.0
+2011-10-29,7997,5.0,9.0,0.0
+2011-10-30,8613,6.0,9.0,0.0
+2011-10-31,12319,0.0,9.0,0.0
+2011-11-01,12598,1.0,10.0,0.0
+2011-11-02,13218,2.0,10.0,0.0
+2011-11-03,12805,3.0,10.0,0.0
+2011-11-04,12883,4.0,10.0,0.0
+2011-11-05,8569,5.0,10.0,0.0
+2011-11-06,9090,6.0,10.0,0.0
+2011-11-07,11174,0.0,10.0,0.0
+2011-11-08,14122,1.0,10.0,0.0
+2011-11-09,12036,2.0,10.0,0.0
+2011-11-10,12966,3.0,10.0,0.0
+2011-11-11,12005,4.0,10.0,1.0
+2011-11-12,8419,5.0,10.0,0.0
+2011-11-13,9036,6.0,10.0,0.0
+2011-11-14,12804,0.0,10.0,0.0
+2011-11-15,13378,1.0,10.0,0.0
+2011-11-16,12693,2.0,10.0,0.0
+2011-11-17,13360,3.0,10.0,0.0
+2011-11-18,11744,4.0,10.0,0.0
+2011-11-19,8190,5.0,10.0,0.0
+2011-11-20,9690,6.0,10.0,0.0
+2011-11-21,12145,0.0,10.0,0.0
+2011-11-22,13212,1.0,10.0,0.0
+2011-11-23,13477,2.0,10.0,0.0
+2011-11-24,12085,3.0,10.0,1.0
+2011-11-25,10505,4.0,10.0,0.0
+2011-11-26,8705,5.0,10.0,0.0
+2011-11-27,9648,6.0,10.0,0.0
+2011-11-28,13613,0.0,10.0,0.0
+2011-11-29,14272,1.0,10.0,0.0
+2011-11-30,13957,2.0,10.0,0.0
+2011-12-01,14827,3.0,11.0,0.0
+2011-12-02,13591,4.0,11.0,0.0
+2011-12-03,9827,5.0,11.0,0.0
+2011-12-04,10540,6.0,11.0,0.0
+2011-12-05,14286,0.0,11.0,0.0
+2011-12-06,14420,1.0,11.0,0.0
+2011-12-07,13800,2.0,11.0,0.0
+2011-12-08,13077,3.0,11.0,0.0
+2011-12-09,13409,4.0,11.0,0.0
+2011-12-10,9537,5.0,11.0,0.0
+2011-12-11,9686,6.0,11.0,0.0
+2011-12-12,14003,0.0,11.0,0.0
+2011-12-13,13616,1.0,11.0,0.0
+2011-12-14,13695,2.0,11.0,0.0
+2011-12-15,13702,3.0,11.0,0.0
+2011-12-16,13328,4.0,11.0,0.0
+2011-12-17,8779,5.0,11.0,0.0
+2011-12-18,9541,6.0,11.0,0.0
+2011-12-19,13250,0.0,11.0,0.0
+2011-12-20,12924,1.0,11.0,0.0
+2011-12-21,12238,2.0,11.0,0.0
+2011-12-22,11812,3.0,11.0,0.0
+2011-12-23,10407,4.0,11.0,0.0
+2011-12-24,6600,5.0,11.0,0.0
+2011-12-25,5670,6.0,11.0,0.0
+2011-12-26,7446,0.0,11.0,1.0
+2011-12-27,9742,1.0,11.0,0.0
+2011-12-28,10019,2.0,11.0,0.0
+2011-12-29,10927,3.0,11.0,0.0
+2011-12-30,10146,4.0,11.0,0.0
+2012-01-01,6587,6.0,0.0,0.0
+2012-01-02,10254,0.0,0.0,1.0
+2012-01-03,12412,1.0,0.0,0.0
+2012-01-04,11806,2.0,0.0,0.0
+2012-01-05,13030,3.0,0.0,0.0
+2012-01-06,13081,4.0,0.0,0.0
+2012-01-07,9688,5.0,0.0,0.0
+2012-01-08,9682,6.0,0.0,0.0
+2012-01-09,12389,0.0,0.0,0.0
+2012-01-10,12888,1.0,0.0,0.0
+2012-01-11,14916,2.0,0.0,0.0
+2012-01-12,13966,3.0,0.0,0.0
+2012-01-13,13629,4.0,0.0,0.0
+2012-01-14,9862,5.0,0.0,0.0
+2012-01-15,10764,6.0,0.0,0.0
+2012-01-16,14066,0.0,0.0,1.0
+2012-01-17,14636,1.0,0.0,0.0
+2012-01-18,14308,2.0,0.0,0.0
+2012-01-19,14301,3.0,0.0,0.0
+2012-01-20,13525,4.0,0.0,0.0
+2012-01-21,10410,5.0,0.0,0.0
+2012-01-22,10384,6.0,0.0,0.0
+2012-01-23,14114,0.0,0.0,0.0
+2012-01-24,14996,1.0,0.0,0.0
+2012-01-25,14904,2.0,0.0,0.0
+2012-01-26,14957,3.0,0.0,0.0
+2012-01-27,15145,4.0,0.0,0.0
+2012-01-28,11182,5.0,0.0,0.0
+2012-01-29,11845,6.0,0.0,0.0
+2012-01-30,15747,0.0,0.0,0.0
+2012-01-31,16974,1.0,0.0,0.0
+2012-02-01,16410,2.0,1.0,0.0
+2012-02-02,15344,3.0,1.0,0.0
+2012-02-03,15275,4.0,1.0,0.0
+2012-02-04,10634,5.0,1.0,0.0
+2012-02-05,11996,6.0,1.0,0.0
+2012-02-06,13976,0.0,1.0,0.0
+2012-02-07,14838,1.0,1.0,0.0
+2012-02-08,15306,2.0,1.0,0.0
+2012-02-09,15598,3.0,1.0,0.0
+2012-02-10,14349,4.0,1.0,0.0
+2012-02-11,11061,5.0,1.0,0.0
+2012-02-12,12209,6.0,1.0,0.0
+2012-02-13,13869,0.0,1.0,0.0
+2012-02-14,15581,1.0,1.0,0.0
+2012-02-15,13850,2.0,1.0,0.0
+2012-02-16,15864,3.0,1.0,0.0
+2012-02-17,15855,4.0,1.0,0.0
+2012-02-18,11506,5.0,1.0,0.0
+2012-02-19,12713,6.0,1.0,0.0
+2012-02-20,15871,0.0,1.0,1.0
+2012-02-21,18141,1.0,1.0,0.0
+2012-02-22,18658,2.0,1.0,0.0
+2012-02-23,18336,3.0,1.0,0.0
+2012-02-24,17493,4.0,1.0,0.0
+2012-02-25,13047,5.0,1.0,0.0
+2012-02-26,13470,6.0,1.0,0.0
+2012-02-27,18588,0.0,1.0,0.0
+2012-02-28,19337,1.0,1.0,0.0
+2012-02-29,18919,2.0,1.0,0.0
+2012-03-01,16831,3.0,2.0,0.0
+2012-03-02,16858,4.0,2.0,0.0
+2012-03-03,12768,5.0,2.0,0.0
+2012-03-04,11378,6.0,2.0,0.0
+2012-03-05,17247,0.0,2.0,0.0
+2012-03-06,19299,1.0,2.0,0.0
+2012-03-07,19070,2.0,2.0,0.0
+2012-03-08,18345,3.0,2.0,0.0
+2012-03-09,17563,4.0,2.0,0.0
+2012-03-10,4558,5.0,2.0,0.0
+2012-03-11,11403,6.0,2.0,0.0
+2012-03-12,19012,0.0,2.0,0.0
+2012-03-13,19453,1.0,2.0,0.0
+2012-03-14,18612,2.0,2.0,0.0
+2012-03-15,18516,3.0,2.0,0.0
+2012-03-16,17712,4.0,2.0,0.0
+2012-03-17,12388,5.0,2.0,0.0
+2012-03-18,13136,6.0,2.0,0.0
+2012-03-19,19017,0.0,2.0,0.0
+2012-03-20,19748,1.0,2.0,0.0
+2012-03-21,19332,2.0,2.0,0.0
+2012-03-22,19193,3.0,2.0,0.0
+2012-03-23,17920,4.0,2.0,0.0
+2012-03-24,12753,5.0,2.0,0.0
+2012-03-25,13249,6.0,2.0,0.0
+2012-03-26,19124,0.0,2.0,0.0
+2012-03-27,19509,1.0,2.0,0.0
+2012-03-28,19821,2.0,2.0,0.0
+2012-03-29,19472,3.0,2.0,0.0
+2012-03-30,18427,4.0,2.0,0.0
+2012-03-31,13115,5.0,2.0,0.0
+2012-04-01,13515,6.0,3.0,0.0
+2012-04-02,18399,0.0,3.0,0.0
+2012-04-03,19605,1.0,3.0,0.0
+2012-04-04,19252,2.0,3.0,0.0
+2012-04-05,18543,3.0,3.0,0.0
+2012-04-06,16503,4.0,3.0,0.0
+2012-04-07,12460,5.0,3.0,0.0
+2012-04-08,12448,6.0,3.0,0.0
+2012-04-09,17445,0.0,3.0,0.0
+2012-04-10,19932,1.0,3.0,0.0
+2012-04-11,20228,2.0,3.0,0.0
+2012-04-12,19756,3.0,3.0,0.0
+2012-04-13,18782,4.0,3.0,0.0
+2012-04-14,13467,5.0,3.0,0.0
+2012-04-15,14327,6.0,3.0,0.0
+2012-04-16,20054,0.0,3.0,0.0
+2012-04-17,20519,1.0,3.0,0.0
+2012-04-18,20550,2.0,3.0,0.0
+2012-04-19,20701,3.0,3.0,0.0
+2012-04-20,19581,4.0,3.0,0.0
+2012-04-21,13836,5.0,3.0,0.0
+2012-04-22,15203,6.0,3.0,0.0
+2012-04-23,21022,0.0,3.0,0.0
+2012-04-24,21531,1.0,3.0,0.0
+2012-04-25,20843,2.0,3.0,0.0
+2012-04-26,20502,3.0,3.0,0.0
+2012-04-27,19350,4.0,3.0,0.0
+2012-04-28,13435,5.0,3.0,0.0
+2012-04-29,13740,6.0,3.0,0.0
+2012-04-30,18399,0.0,3.0,0.0
+2012-05-01,18568,1.0,4.0,0.0
+2012-05-02,20450,2.0,4.0,0.0
+2012-05-03,20346,3.0,4.0,0.0
+2012-05-04,19046,4.0,4.0,0.0
+2012-05-05,13624,5.0,4.0,0.0
+2012-05-06,14067,6.0,4.0,0.0
+2012-05-07,19843,0.0,4.0,0.0
+2012-05-08,20642,1.0,4.0,0.0
+2012-05-09,20494,2.0,4.0,0.0
+2012-05-10,20582,3.0,4.0,0.0
+2012-05-11,19082,4.0,4.0,0.0
+2012-05-12,12969,5.0,4.0,0.0
+2012-05-13,13213,6.0,4.0,0.0
+2012-05-14,19891,0.0,4.0,0.0
+2012-05-15,20429,1.0,4.0,0.0
+2012-05-16,19803,2.0,4.0,0.0
+2012-05-17,18502,3.0,4.0,0.0
+2012-05-18,17863,4.0,4.0,0.0
+2012-05-19,11967,5.0,4.0,0.0
+2012-05-20,12955,6.0,4.0,0.0
+2012-05-21,19504,0.0,4.0,0.0
+2012-05-22,21177,1.0,4.0,0.0
+2012-05-23,20755,2.0,4.0,0.0
+2012-05-24,20334,3.0,4.0,0.0
+2012-05-25,18596,4.0,4.0,0.0
+2012-05-26,11896,5.0,4.0,0.0
+2012-05-27,12267,6.0,4.0,0.0
+2012-05-28,16877,0.0,4.0,1.0
+2012-05-29,20475,1.0,4.0,0.0
+2012-05-30,20843,2.0,4.0,0.0
+2012-05-31,19725,3.0,4.0,0.0
+2012-06-01,18977,4.0,5.0,0.0
+2012-06-02,12762,5.0,5.0,0.0
+2012-06-03,13811,6.0,5.0,0.0
+2012-06-04,19603,0.0,5.0,0.0
+2012-06-05,20407,1.0,5.0,0.0
+2012-06-06,20109,2.0,5.0,0.0
+2012-06-07,20065,3.0,5.0,0.0
+2012-06-08,18897,4.0,5.0,0.0
+2012-06-09,12974,5.0,5.0,0.0
+2012-06-10,13579,6.0,5.0,0.0
+2012-06-11,19795,0.0,5.0,0.0
+2012-06-12,20766,1.0,5.0,0.0
+2012-06-13,20493,2.0,5.0,0.0
+2012-06-14,20337,3.0,5.0,0.0
+2012-06-15,18872,4.0,5.0,0.0
+2012-06-16,12563,5.0,5.0,0.0
+2012-06-17,12595,6.0,5.0,0.0
+2012-06-18,19942,0.0,5.0,0.0
+2012-06-19,20901,1.0,5.0,0.0
+2012-06-20,20460,2.0,5.0,0.0
+2012-06-21,20208,3.0,5.0,0.0
+2012-06-22,18334,4.0,5.0,0.0
+2012-06-23,12188,5.0,5.0,0.0
+2012-06-24,12974,6.0,5.0,0.0
+2012-06-25,19997,0.0,5.0,0.0
+2012-06-26,21259,1.0,5.0,0.0
+2012-06-27,20474,2.0,5.0,0.0
+2012-06-28,19885,3.0,5.0,0.0
+2012-06-29,18686,4.0,5.0,0.0
+2012-06-30,12240,5.0,5.0,0.0
+2012-07-01,12825,6.0,6.0,0.0
+2012-07-02,19514,0.0,6.0,0.0
+2012-07-03,20326,1.0,6.0,0.0
+2012-07-04,18182,2.0,6.0,1.0
+2012-07-05,19268,3.0,6.0,0.0
+2012-07-06,19182,4.0,6.0,0.0
+2012-07-07,12835,5.0,6.0,0.0
+2012-07-08,13365,6.0,6.0,0.0
+2012-07-09,20486,0.0,6.0,0.0
+2012-07-10,21706,1.0,6.0,0.0
+2012-07-11,21626,2.0,6.0,0.0
+2012-07-12,21252,3.0,6.0,0.0
+2012-07-13,20151,4.0,6.0,0.0
+2012-07-14,12797,5.0,6.0,0.0
+2012-07-15,13483,6.0,6.0,0.0
+2012-07-16,20626,0.0,6.0,0.0
+2012-07-17,21534,1.0,6.0,0.0
+2012-07-18,21272,2.0,6.0,0.0
+2012-07-19,20996,3.0,6.0,0.0
+2012-07-20,19689,4.0,6.0,0.0
+2012-07-21,12728,5.0,6.0,0.0
+2012-07-22,13196,6.0,6.0,0.0
+2012-07-23,20682,0.0,6.0,0.0
+2012-07-24,21436,1.0,6.0,0.0
+2012-07-25,20928,2.0,6.0,0.0
+2012-07-26,20682,3.0,6.0,0.0
+2012-07-27,19471,4.0,6.0,0.0
+2012-07-28,12348,5.0,6.0,0.0
+2012-07-29,13181,6.0,6.0,0.0
+2012-07-30,20472,0.0,6.0,0.0
+2012-07-31,20755,1.0,6.0,0.0
+2012-08-01,20981,2.0,7.0,0.0
+2012-08-02,20754,3.0,7.0,0.0
+2012-08-03,19474,4.0,7.0,0.0
+2012-08-04,12608,5.0,7.0,0.0
+2012-08-05,13300,6.0,7.0,0.0
+2012-08-06,20171,0.0,7.0,0.0
+2012-08-07,21381,1.0,7.0,0.0
+2012-08-08,21414,2.0,7.0,0.0
+2012-08-09,21189,3.0,7.0,0.0
+2012-08-10,20258,4.0,7.0,0.0
+2012-08-11,13126,5.0,7.0,0.0
+2012-08-12,13542,6.0,7.0,0.0
+2012-08-13,21095,0.0,7.0,0.0
+2012-08-14,21820,1.0,7.0,0.0
+2012-08-15,20412,2.0,7.0,0.0
+2012-08-16,20654,3.0,7.0,0.0
+2012-08-17,19865,4.0,7.0,0.0
+2012-08-18,13124,5.0,7.0,0.0
+2012-08-19,13500,6.0,7.0,0.0
+2012-08-20,21156,0.0,7.0,0.0
+2012-08-21,22188,1.0,7.0,0.0
+2012-08-22,22133,2.0,7.0,0.0
+2012-08-23,21972,3.0,7.0,0.0
+2012-08-24,20575,4.0,7.0,0.0
+2012-08-25,13606,5.0,7.0,0.0
+2012-08-26,14147,6.0,7.0,0.0
+2012-08-27,21513,0.0,7.0,0.0
+2012-08-28,22396,1.0,7.0,0.0
+2012-08-29,22023,2.0,7.0,0.0
+2012-08-30,22032,3.0,7.0,0.0
+2012-08-31,20667,4.0,7.0,0.0
+2012-09-01,13193,5.0,8.0,0.0
+2012-09-02,14236,6.0,8.0,0.0
+2012-09-03,19533,0.0,8.0,1.0
+2012-09-04,22529,1.0,8.0,0.0
+2012-09-05,23006,2.0,8.0,0.0
+2012-09-06,22463,3.0,8.0,0.0
+2012-09-07,21547,4.0,8.0,0.0
+2012-09-08,14061,5.0,8.0,0.0
+2012-09-09,15149,6.0,8.0,0.0
+2012-09-10,22730,0.0,8.0,0.0
+2012-09-11,23336,1.0,8.0,0.0
+2012-09-12,23521,2.0,8.0,0.0
+2012-09-13,23435,3.0,8.0,0.0
+2012-09-14,21632,4.0,8.0,0.0
+2012-09-15,14370,5.0,8.0,0.0
+2012-09-16,15122,6.0,8.0,0.0
+2012-09-17,23351,0.0,8.0,0.0
+2012-09-18,24066,1.0,8.0,0.0
+2012-09-19,23742,2.0,8.0,0.0
+2012-09-20,23585,3.0,8.0,0.0
+2012-09-21,22157,4.0,8.0,0.0
+2012-09-22,14539,5.0,8.0,0.0
+2012-09-23,15735,6.0,8.0,0.0
+2012-09-24,23613,0.0,8.0,0.0
+2012-09-25,24315,1.0,8.0,0.0
+2012-09-26,24513,2.0,8.0,0.0
+2012-09-27,23950,3.0,8.0,0.0
+2012-09-28,22489,4.0,8.0,0.0
+2012-09-29,15130,5.0,8.0,0.0
+2012-09-30,15516,6.0,8.0,0.0
+2012-10-01,22938,0.0,9.0,0.0
+2012-10-02,23758,1.0,9.0,0.0
+2012-10-03,24048,2.0,9.0,0.0
+2012-10-04,23651,3.0,9.0,0.0
+2012-10-05,22488,4.0,9.0,0.0
+2012-10-06,15261,5.0,9.0,0.0
+2012-10-07,16074,6.0,9.0,0.0
+2012-10-08,24300,0.0,9.0,1.0
+2012-10-09,26112,1.0,9.0,0.0
+2012-10-10,26118,2.0,9.0,0.0
+2012-10-11,25481,3.0,9.0,0.0
+2012-10-12,23749,4.0,9.0,0.0
+2012-10-13,16161,5.0,9.0,0.0
+2012-10-14,17196,6.0,9.0,0.0
+2012-10-15,25711,0.0,9.0,0.0
+2012-10-16,26368,1.0,9.0,0.0
+2012-10-17,26436,2.0,9.0,0.0
+2012-10-18,25588,3.0,9.0,0.0
+2012-10-19,24120,4.0,9.0,0.0
+2012-10-20,16546,5.0,9.0,0.0
+2012-10-21,17939,6.0,9.0,0.0
+2012-10-22,26790,0.0,9.0,0.0
+2012-10-23,26904,1.0,9.0,0.0
+2012-10-24,27135,2.0,9.0,0.0
+2012-10-25,26631,3.0,9.0,0.0
+2012-10-26,24735,4.0,9.0,0.0
+2012-10-27,16414,5.0,9.0,0.0
+2012-10-28,17832,6.0,9.0,0.0
+2012-10-29,26382,0.0,9.0,0.0
+2012-10-30,27051,1.0,9.0,0.0
+2012-10-31,26630,2.0,9.0,0.0
+2012-11-01,25001,3.0,10.0,0.0
+2012-11-02,24505,4.0,10.0,0.0
+2012-11-03,17411,5.0,10.0,0.0
+2012-11-04,18421,6.0,10.0,0.0
+2012-11-05,27468,0.0,10.0,0.0
+2012-11-06,28425,1.0,10.0,0.0
+2012-11-07,27405,2.0,10.0,0.0
+2012-11-08,28017,3.0,10.0,0.0
+2012-11-09,26332,4.0,10.0,0.0
+2012-11-10,18246,5.0,10.0,0.0
+2012-11-11,19133,6.0,10.0,0.0
+2012-11-12,27814,0.0,10.0,1.0
+2012-11-13,28922,1.0,10.0,0.0
+2012-11-14,28695,2.0,10.0,0.0
+2012-11-15,28078,3.0,10.0,0.0
+2012-11-16,26404,4.0,10.0,0.0
+2012-11-17,18254,5.0,10.0,0.0
+2012-11-18,19573,6.0,10.0,0.0
+2012-11-19,28486,0.0,10.0,0.0
+2012-11-20,28976,1.0,10.0,0.0
+2012-11-21,28161,2.0,10.0,0.0
+2012-11-22,24228,3.0,10.0,1.0
+2012-11-23,22550,4.0,10.0,0.0
+2012-11-24,17484,5.0,10.0,0.0
+2012-11-25,19188,6.0,10.0,0.0
+2012-11-26,28974,0.0,10.0,0.0
+2012-11-27,29963,1.0,10.0,0.0
+2012-11-28,30244,2.0,10.0,0.0
+2012-11-29,29538,3.0,10.0,0.0
+2012-11-30,26786,4.0,10.0,0.0
+2012-12-01,19253,5.0,11.0,0.0
+2012-12-02,20778,6.0,11.0,0.0
+2012-12-03,30026,0.0,11.0,0.0
+2012-12-04,30295,1.0,11.0,0.0
+2012-12-05,30105,2.0,11.0,0.0
+2012-12-06,29559,3.0,11.0,0.0
+2012-12-07,26613,4.0,11.0,0.0
+2012-12-08,18467,5.0,11.0,0.0
+2012-12-09,20055,6.0,11.0,0.0
+2012-12-10,28579,0.0,11.0,0.0
+2012-12-11,29642,1.0,11.0,0.0
+2012-12-12,29168,2.0,11.0,0.0
+2012-12-13,28652,3.0,11.0,0.0
+2012-12-14,26568,4.0,11.0,0.0
+2012-12-15,17788,5.0,11.0,0.0
+2012-12-16,18785,6.0,11.0,0.0
+2012-12-17,27496,0.0,11.0,0.0
+2012-12-18,27723,1.0,11.0,0.0
+2012-12-19,27055,2.0,11.0,0.0
+2012-12-20,26013,3.0,11.0,0.0
+2012-12-21,23140,4.0,11.0,0.0
+2012-12-22,15245,5.0,11.0,0.0
+2012-12-23,14097,6.0,11.0,0.0
+2012-12-24,16373,0.0,11.0,0.0
+2012-12-25,13596,1.0,11.0,1.0
+2012-12-26,17465,2.0,11.0,0.0
+2012-12-27,20445,3.0,11.0,0.0
+2012-12-28,20120,4.0,11.0,0.0
+2012-12-29,16407,5.0,11.0,0.0
+2012-12-30,15777,6.0,11.0,0.0
+2012-12-31,6200,0.0,11.0,0.0
+2013-01-01,11208,1.0,0.0,1.0
+2013-01-02,22522,2.0,0.0,0.0
+2013-01-03,24859,3.0,0.0,0.0
+2013-01-04,25302,4.0,0.0,0.0
+2013-01-05,19114,5.0,0.0,0.0
+2013-01-06,19650,6.0,0.0,0.0
+2013-01-07,27504,0.0,0.0,0.0
+2013-01-08,29375,1.0,0.0,0.0
+2013-01-09,29679,2.0,0.0,0.0
+2013-01-10,29661,3.0,0.0,0.0
+2013-01-11,28997,4.0,0.0,0.0
+2013-01-12,19920,5.0,0.0,0.0
+2013-01-13,21301,6.0,0.0,0.0
+2013-01-14,30089,0.0,0.0,0.0
+2013-01-15,30936,1.0,0.0,0.0
+2013-01-16,31416,2.0,0.0,0.0
+2013-01-17,30992,3.0,0.0,0.0
+2013-01-18,29420,4.0,0.0,0.0
+2013-01-19,20790,5.0,0.0,0.0
+2013-01-20,21897,6.0,0.0,0.0
+2013-01-21,29606,0.0,0.0,1.0
+2013-01-22,31573,1.0,0.0,0.0
+2013-01-23,32344,2.0,0.0,0.0
+2013-01-24,32485,3.0,0.0,0.0
+2013-01-25,30793,4.0,0.0,0.0
+2013-01-26,21917,5.0,0.0,0.0
+2013-01-27,23032,6.0,0.0,0.0
+2013-01-28,31946,0.0,0.0,0.0
+2013-01-29,33487,1.0,0.0,0.0
+2013-01-30,33192,2.0,0.0,0.0
+2013-01-31,32722,3.0,0.0,0.0
+2013-02-01,30716,4.0,1.0,0.0
+2013-02-02,21484,5.0,1.0,0.0
+2013-02-03,22962,6.0,1.0,0.0
+2013-02-04,31284,0.0,1.0,0.0
+2013-02-05,33106,1.0,1.0,0.0
+2013-02-06,32976,2.0,1.0,0.0
+2013-02-07,32429,3.0,1.0,0.0
+2013-02-08,30524,4.0,1.0,0.0
+2013-02-09,21085,5.0,1.0,0.0
+2013-02-10,22281,6.0,1.0,0.0
+2013-02-11,30989,0.0,1.0,0.0
+2013-02-12,32543,1.0,1.0,0.0
+2013-02-13,31854,2.0,1.0,0.0
+2013-02-14,30875,3.0,1.0,0.0
+2013-02-15,29531,4.0,1.0,0.0
+2013-02-16,22299,5.0,1.0,0.0
+2013-02-17,23941,6.0,1.0,0.0
+2013-02-18,33106,0.0,1.0,1.0
+2013-02-19,35274,1.0,1.0,0.0
+2013-02-20,35265,2.0,1.0,0.0
+2013-02-21,34535,3.0,1.0,0.0
+2013-02-22,33009,4.0,1.0,0.0
+2013-02-23,23466,5.0,1.0,0.0
+2013-02-24,24903,6.0,1.0,0.0
+2013-02-25,35081,0.0,1.0,0.0
+2013-02-26,36143,1.0,1.0,0.0
+2013-02-27,35992,2.0,1.0,0.0
+2013-02-28,35284,3.0,1.0,0.0
+2013-03-01,33063,4.0,2.0,0.0
+2013-03-02,23944,5.0,2.0,0.0
+2013-03-03,25119,6.0,2.0,0.0
+2013-03-04,35777,0.0,2.0,0.0
+2013-03-05,36559,1.0,2.0,0.0
+2013-03-06,35998,2.0,2.0,0.0
+2013-03-07,35682,3.0,2.0,0.0
+2013-03-08,33619,4.0,2.0,0.0
+2013-03-09,23860,5.0,2.0,0.0
+2013-03-10,25293,6.0,2.0,0.0
+2013-03-11,36253,0.0,2.0,0.0
+2013-03-12,37391,1.0,2.0,0.0
+2013-03-13,37132,2.0,2.0,0.0
+2013-03-14,36044,3.0,2.0,0.0
+2013-03-15,34297,4.0,2.0,0.0
+2013-03-16,24005,5.0,2.0,0.0
+2013-03-17,25836,6.0,2.0,0.0
+2013-03-18,36614,0.0,2.0,0.0
+2013-03-19,38229,1.0,2.0,0.0
+2013-03-20,38085,2.0,2.0,0.0
+2013-03-21,37290,3.0,2.0,0.0
+2013-03-22,35173,4.0,2.0,0.0
+2013-03-23,23732,5.0,2.0,0.0
+2013-03-24,26573,6.0,2.0,0.0
+2013-03-25,38095,0.0,2.0,0.0
+2013-03-26,38959,1.0,2.0,0.0
+2013-03-27,36841,2.0,2.0,0.0
+2013-03-28,35861,3.0,2.0,0.0
+2013-03-29,31458,4.0,2.0,0.0
+2013-03-30,23375,5.0,2.0,0.0
+2013-03-31,23229,6.0,2.0,0.0
+2013-04-01,32188,0.0,3.0,0.0
+2013-04-02,37574,1.0,3.0,0.0
+2013-04-03,37688,2.0,3.0,0.0
+2013-04-04,36662,3.0,3.0,0.0
+2013-04-05,35247,4.0,3.0,0.0
+2013-04-06,25579,5.0,3.0,0.0
+2013-04-07,28152,6.0,3.0,0.0
+2013-04-08,38770,0.0,3.0,0.0
+2013-04-09,39537,1.0,3.0,0.0
+2013-04-10,39099,2.0,3.0,0.0
+2013-04-11,38970,3.0,3.0,0.0
+2013-04-12,37006,4.0,3.0,0.0
+2013-04-13,25241,5.0,3.0,0.0
+2013-04-14,26604,6.0,3.0,0.0
+2013-04-15,38046,0.0,3.0,0.0
+2013-04-16,39572,1.0,3.0,0.0
+2013-04-17,39873,2.0,3.0,0.0
+2013-04-18,39338,3.0,3.0,0.0
+2013-04-19,36343,4.0,3.0,0.0
+2013-04-20,25210,5.0,3.0,0.0
+2013-04-21,26877,6.0,3.0,0.0
+2013-04-22,39663,0.0,3.0,0.0
+2013-04-23,40706,1.0,3.0,0.0
+2013-04-24,39844,2.0,3.0,0.0
+2013-04-25,38703,3.0,3.0,0.0
+2013-04-26,35427,4.0,3.0,0.0
+2013-04-27,26071,5.0,3.0,0.0
+2013-04-28,27388,6.0,3.0,0.0
+2013-04-29,37487,0.0,3.0,0.0
+2013-04-30,36940,1.0,3.0,0.0
+2013-05-01,33606,2.0,4.0,0.0
+2013-05-02,37390,3.0,4.0,0.0
+2013-05-03,35633,4.0,4.0,0.0
+2013-05-04,24228,5.0,4.0,0.0
+2013-05-05,24997,6.0,4.0,0.0
+2013-05-06,36749,0.0,4.0,0.0
+2013-05-07,37704,1.0,4.0,0.0
+2013-05-08,37857,2.0,4.0,0.0
+2013-05-09,35833,3.0,4.0,0.0
+2013-05-10,34646,4.0,4.0,0.0
+2013-05-11,24376,5.0,4.0,0.0
+2013-05-12,25378,6.0,4.0,0.0
+2013-05-13,38290,0.0,4.0,0.0
+2013-05-14,39639,1.0,4.0,0.0
+2013-05-15,38600,2.0,4.0,0.0
+2013-05-16,38360,3.0,4.0,0.0
+2013-05-17,35699,4.0,4.0,0.0
+2013-05-18,23617,5.0,4.0,0.0
+2013-05-19,24777,6.0,4.0,0.0
+2013-05-20,36164,0.0,4.0,0.0
+2013-05-21,38868,1.0,4.0,0.0
+2013-05-22,39343,2.0,4.0,0.0
+2013-05-23,38808,3.0,4.0,0.0
+2013-05-24,35952,4.0,4.0,0.0
+2013-05-25,23631,5.0,4.0,0.0
+2013-05-26,24617,6.0,4.0,0.0
+2013-05-27,33553,0.0,4.0,1.0
+2013-05-28,38933,1.0,4.0,0.0
+2013-05-29,39393,2.0,4.0,0.0
+2013-05-30,37654,3.0,4.0,0.0
+2013-05-31,36341,4.0,4.0,0.0
+2013-06-01,23781,5.0,5.0,0.0
+2013-06-02,25611,6.0,5.0,0.0
+2013-06-03,38377,0.0,5.0,0.0
+2013-06-04,39508,1.0,5.0,0.0
+2013-06-05,38949,2.0,5.0,0.0
+2013-06-06,38397,3.0,5.0,0.0
+2013-06-07,36512,4.0,5.0,0.0
+2013-06-08,24453,5.0,5.0,0.0
+2013-06-09,25513,6.0,5.0,0.0
+2013-06-10,35931,0.0,5.0,0.0
+2013-06-11,36456,1.0,5.0,0.0
+2013-06-12,36649,2.0,5.0,0.0
+2013-06-13,37838,3.0,5.0,0.0
+2013-06-14,35372,4.0,5.0,0.0
+2013-06-15,22633,5.0,5.0,0.0
+2013-06-16,23632,6.0,5.0,0.0
+2013-06-17,36996,0.0,5.0,0.0
+2013-06-18,38905,1.0,5.0,0.0
+2013-06-19,38128,2.0,5.0,0.0
+2013-06-20,37205,3.0,5.0,0.0
+2013-06-21,34488,4.0,5.0,0.0
+2013-06-22,22328,5.0,5.0,0.0
+2013-06-23,24116,6.0,5.0,0.0
+2013-06-24,37051,0.0,5.0,0.0
+2013-06-25,38924,1.0,5.0,0.0
+2013-06-26,38481,2.0,5.0,0.0
+2013-06-27,37527,3.0,5.0,0.0
+2013-06-28,35081,4.0,5.0,0.0
+2013-06-29,22609,5.0,5.0,0.0
+2013-06-30,23535,6.0,5.0,0.0
+2013-07-01,35825,0.0,6.0,0.0
+2013-07-02,37818,1.0,6.0,0.0
+2013-07-03,37797,2.0,6.0,0.0
+2013-07-04,33322,3.0,6.0,1.0
+2013-07-05,32777,4.0,6.0,0.0
+2013-07-06,22675,5.0,6.0,0.0
+2013-07-07,24558,6.0,6.0,0.0
+2013-07-08,38713,0.0,6.0,0.0
+2013-07-09,40620,1.0,6.0,0.0
+2013-07-10,42070,2.0,6.0,0.0
+2013-07-11,41020,3.0,6.0,0.0
+2013-07-12,37346,4.0,6.0,0.0
+2013-07-13,23190,5.0,6.0,0.0
+2013-07-14,24518,6.0,6.0,0.0
+2013-07-15,38390,0.0,6.0,0.0
+2013-07-16,40149,1.0,6.0,0.0
+2013-07-17,40568,2.0,6.0,0.0
+2013-07-18,40213,3.0,6.0,0.0
+2013-07-19,38293,4.0,6.0,0.0
+2013-07-20,24090,5.0,6.0,0.0
+2013-07-21,24762,6.0,6.0,0.0
+2013-07-22,39038,0.0,6.0,0.0
+2013-07-23,40878,1.0,6.0,0.0
+2013-07-24,39600,2.0,6.0,0.0
+2013-07-25,38883,3.0,6.0,0.0
+2013-07-26,36596,4.0,6.0,0.0
+2013-07-27,23424,5.0,6.0,0.0
+2013-07-28,24364,6.0,6.0,0.0
+2013-07-29,37997,0.0,6.0,0.0
+2013-07-30,39569,1.0,6.0,0.0
+2013-07-31,39220,2.0,6.0,0.0
+2013-08-01,38151,3.0,7.0,0.0
+2013-08-02,35991,4.0,7.0,0.0
+2013-08-03,23359,5.0,7.0,0.0
+2013-08-04,24392,6.0,7.0,0.0
+2013-08-05,37880,0.0,7.0,0.0
+2013-08-06,39787,1.0,7.0,0.0
+2013-08-07,40562,2.0,7.0,0.0
+2013-08-08,39204,3.0,7.0,0.0
+2013-08-09,36126,4.0,7.0,0.0
+2013-08-10,23322,5.0,7.0,0.0
+2013-08-11,24528,6.0,7.0,0.0
+2013-08-12,37294,0.0,7.0,0.0
+2013-08-13,38848,1.0,7.0,0.0
+2013-08-14,38772,2.0,7.0,0.0
+2013-08-15,34626,3.0,7.0,0.0
+2013-08-16,34857,4.0,7.0,0.0
+2013-08-17,23932,5.0,7.0,0.0
+2013-08-18,24779,6.0,7.0,0.0
+2013-08-19,37843,0.0,7.0,0.0
+2013-08-20,38890,1.0,7.0,0.0
+2013-08-21,39298,2.0,7.0,0.0
+2013-08-22,38649,3.0,7.0,0.0
+2013-08-23,36410,4.0,7.0,0.0
+2013-08-24,23893,5.0,7.0,0.0
+2013-08-25,25183,6.0,7.0,0.0
+2013-08-26,37745,0.0,7.0,0.0
+2013-08-27,40279,1.0,7.0,0.0
+2013-08-28,40041,2.0,7.0,0.0
+2013-08-29,39814,3.0,7.0,0.0
+2013-08-30,36737,4.0,7.0,0.0
+2013-08-31,23496,5.0,7.0,0.0
+2013-09-01,24887,6.0,8.0,0.0
+2013-09-02,34734,0.0,8.0,1.0
+2013-09-03,40062,1.0,8.0,0.0
+2013-09-04,40547,2.0,8.0,0.0
+2013-09-05,39817,3.0,8.0,0.0
+2013-09-06,36795,4.0,8.0,0.0
+2013-09-07,25041,5.0,8.0,0.0
+2013-09-08,26867,6.0,8.0,0.0
+2013-09-09,40162,0.0,8.0,0.0
+2013-09-10,41282,1.0,8.0,0.0
+2013-09-11,41776,2.0,8.0,0.0
+2013-09-12,40797,3.0,8.0,0.0
+2013-09-13,39038,4.0,8.0,0.0
+2013-09-14,25547,5.0,8.0,0.0
+2013-09-15,27248,6.0,8.0,0.0
+2013-09-16,41174,0.0,8.0,0.0
+2013-09-17,41800,1.0,8.0,0.0
+2013-09-18,40673,2.0,8.0,0.0
+2013-09-19,35777,3.0,8.0,0.0
+2013-09-20,37267,4.0,8.0,0.0
+2013-09-21,25963,5.0,8.0,0.0
+2013-09-22,28105,6.0,8.0,0.0
+2013-09-23,40921,0.0,8.0,0.0
+2013-09-24,42979,1.0,8.0,0.0
+2013-09-25,42683,2.0,8.0,0.0
+2013-09-26,42336,3.0,8.0,0.0
+2013-09-27,39720,4.0,8.0,0.0
+2013-09-28,26060,5.0,8.0,0.0
+2013-09-29,29404,6.0,8.0,0.0
+2013-09-30,41805,0.0,8.0,0.0
+2013-10-01,41029,1.0,9.0,0.0
+2013-10-02,41378,2.0,9.0,0.0
+2013-10-03,40288,3.0,9.0,0.0
+2013-10-04,38966,4.0,9.0,0.0
+2013-10-05,26606,5.0,9.0,0.0
+2013-10-06,28694,6.0,9.0,0.0
+2013-10-07,42983,0.0,9.0,0.0
+2013-10-08,45969,1.0,9.0,0.0
+2013-10-09,45673,2.0,9.0,0.0
+2013-10-10,44823,3.0,9.0,0.0
+2013-10-11,42240,4.0,9.0,0.0
+2013-10-12,28719,5.0,9.0,0.0
+2013-10-13,29129,6.0,9.0,0.0
+2013-10-14,42706,0.0,9.0,1.0
+2013-10-15,45380,1.0,9.0,0.0
+2013-10-16,46301,2.0,9.0,0.0
+2013-10-17,45649,3.0,9.0,0.0
+2013-10-18,42778,4.0,9.0,0.0
+2013-10-19,28774,5.0,9.0,0.0
+2013-10-20,31296,6.0,9.0,0.0
+2013-10-21,45838,0.0,9.0,0.0
+2013-10-22,46948,1.0,9.0,0.0
+2013-10-23,46510,2.0,9.0,0.0
+2013-10-24,44514,3.0,9.0,0.0
+2013-10-25,44395,4.0,9.0,0.0
+2013-10-26,29485,5.0,9.0,0.0
+2013-10-27,31661,6.0,9.0,0.0
+2013-10-28,46946,0.0,9.0,0.0
+2013-10-29,48500,1.0,9.0,0.0
+2013-10-30,48321,2.0,9.0,0.0
+2013-10-31,46159,3.0,9.0,0.0
+2013-11-01,41112,4.0,10.0,0.0
+2013-11-02,29827,5.0,10.0,0.0
+2013-11-03,31521,6.0,10.0,0.0
+2013-11-04,47735,0.0,10.0,0.0
+2013-11-05,49358,1.0,10.0,0.0
+2013-11-06,49622,2.0,10.0,0.0
+2013-11-07,48864,3.0,10.0,0.0
+2013-11-08,46153,4.0,10.0,0.0
+2013-11-09,31598,5.0,10.0,0.0
+2013-11-10,33505,6.0,10.0,0.0
+2013-11-11,47101,0.0,10.0,1.0
+2013-11-12,50609,1.0,10.0,0.0
+2013-11-13,48306,2.0,10.0,0.0
+2013-11-14,49673,3.0,10.0,0.0
+2013-11-15,46797,4.0,10.0,0.0
+2013-11-16,32098,5.0,10.0,0.0
+2013-11-17,34542,6.0,10.0,0.0
+2013-11-18,50981,0.0,10.0,0.0
+2013-11-19,51901,1.0,10.0,0.0
+2013-11-20,51862,2.0,10.0,0.0
+2013-11-21,51330,3.0,10.0,0.0
+2013-11-22,48100,4.0,10.0,0.0
+2013-11-23,32590,5.0,10.0,0.0
+2013-11-24,34863,6.0,10.0,0.0
+2013-11-25,49346,0.0,10.0,0.0
+2013-11-26,51549,1.0,10.0,0.0
+2013-11-27,49231,2.0,10.0,0.0
+2013-11-28,42985,3.0,10.0,1.0
+2013-11-29,39014,4.0,10.0,0.0
+2013-11-30,29927,5.0,10.0,0.0
+2013-12-01,32875,6.0,11.0,0.0
+2013-12-02,50342,0.0,11.0,0.0
+2013-12-03,52500,1.0,11.0,0.0
+2013-12-04,52398,2.0,11.0,0.0
+2013-12-05,51352,3.0,11.0,0.0
+2013-12-06,47337,4.0,11.0,0.0
+2013-12-07,32551,5.0,11.0,0.0
+2013-12-08,34756,6.0,11.0,0.0
+2013-12-09,50839,0.0,11.0,0.0
+2013-12-10,51506,1.0,11.0,0.0
+2013-12-11,50204,2.0,11.0,0.0
+2013-12-12,48640,3.0,11.0,0.0
+2013-12-13,45504,4.0,11.0,0.0
+2013-12-14,30350,5.0,11.0,0.0
+2013-12-15,32192,6.0,11.0,0.0
+2013-12-16,47571,0.0,11.0,0.0
+2013-12-17,48189,1.0,11.0,0.0
+2013-12-18,46983,2.0,11.0,0.0
+2013-12-19,44986,3.0,11.0,0.0
+2013-12-20,41717,4.0,11.0,0.0
+2013-12-21,26649,5.0,11.0,0.0
+2013-12-22,26917,6.0,11.0,0.0
+2013-12-23,36144,0.0,11.0,0.0
+2013-12-24,30015,1.0,11.0,0.0
+2013-12-25,23280,2.0,11.0,1.0
+2013-12-26,29732,3.0,11.0,0.0
+2013-12-27,32334,4.0,11.0,0.0
+2013-12-28,26369,5.0,11.0,0.0
+2013-12-29,27110,6.0,11.0,0.0
+2013-12-30,35237,0.0,11.0,0.0
+2013-12-31,12471,1.0,11.0,0.0
+2014-01-01,19103,2.0,0.0,1.0
+2014-01-02,38454,3.0,0.0,0.0
+2014-01-03,38788,4.0,0.0,0.0
+2014-01-04,31132,5.0,0.0,0.0
+2014-01-05,32334,6.0,0.0,0.0
+2014-01-06,44539,0.0,0.0,0.0
+2014-01-07,47256,1.0,0.0,0.0
+2014-01-08,47472,2.0,0.0,0.0
+2014-01-09,48662,3.0,0.0,0.0
+2014-01-10,46462,4.0,0.0,0.0
+2014-01-11,32376,5.0,0.0,0.0
+2014-01-12,34043,6.0,0.0,0.0
+2014-01-13,49000,0.0,0.0,0.0
+2014-01-14,50766,1.0,0.0,0.0
+2014-01-15,51247,2.0,0.0,0.0
+2014-01-16,51321,3.0,0.0,0.0
+2014-01-17,48280,4.0,0.0,0.0
+2014-01-18,33741,5.0,0.0,0.0
+2014-01-19,35398,6.0,0.0,0.0
+2014-01-20,48750,0.0,0.0,1.0
+2014-01-21,52079,1.0,0.0,0.0
+2014-01-22,52542,2.0,0.0,0.0
+2014-01-23,52376,3.0,0.0,0.0
+2014-01-24,48155,4.0,0.0,0.0
+2014-01-25,36337,5.0,0.0,0.0
+2014-01-26,38223,6.0,0.0,0.0
+2014-01-27,51032,0.0,0.0,0.0
+2014-01-28,52414,1.0,0.0,0.0
+2014-01-29,51673,2.0,0.0,0.0
+2014-01-30,50439,3.0,0.0,0.0
+2014-01-31,47161,4.0,0.0,0.0
+2014-02-01,33166,5.0,1.0,0.0
+2014-02-02,34890,6.0,1.0,0.0
+2014-02-03,47975,0.0,1.0,0.0
+2014-02-04,51265,1.0,1.0,0.0
+2014-02-05,51264,2.0,1.0,0.0
+2014-02-06,52288,3.0,1.0,0.0
+2014-02-07,50247,4.0,1.0,0.0
+2014-02-08,36698,5.0,1.0,0.0
+2014-02-09,38503,6.0,1.0,0.0
+2014-02-10,54226,0.0,1.0,0.0
+2014-02-11,56115,1.0,1.0,0.0
+2014-02-12,56224,2.0,1.0,0.0
+2014-02-13,55362,3.0,1.0,0.0
+2014-02-14,50776,4.0,1.0,0.0
+2014-02-15,35096,5.0,1.0,0.0
+2014-02-16,38108,6.0,1.0,0.0
+2014-02-17,53408,0.0,1.0,1.0
+2014-02-18,57332,1.0,1.0,0.0
+2014-02-19,56375,2.0,1.0,0.0
+2014-02-20,53624,3.0,1.0,0.0
+2014-02-21,53840,4.0,1.0,0.0
+2014-02-22,37965,5.0,1.0,0.0
+2014-02-23,38901,6.0,1.0,0.0
+2014-02-24,57056,0.0,1.0,0.0
+2014-02-25,58723,1.0,1.0,0.0
+2014-02-26,58317,2.0,1.0,0.0
+2014-02-27,58104,3.0,1.0,0.0
+2014-02-28,54417,4.0,1.0,0.0
+2014-03-01,37253,5.0,2.0,0.0
+2014-03-02,39545,6.0,2.0,0.0
+2014-03-03,56281,0.0,2.0,0.0
+2014-03-04,58275,1.0,2.0,0.0
+2014-03-05,58531,2.0,2.0,0.0
+2014-03-06,58230,3.0,2.0,0.0
+2014-03-07,54381,4.0,2.0,0.0
+2014-03-08,36908,5.0,2.0,0.0
+2014-03-09,38903,6.0,2.0,0.0
+2014-03-10,57466,0.0,2.0,0.0
+2014-03-11,58201,1.0,2.0,0.0
+2014-03-12,59508,2.0,2.0,0.0
+2014-03-13,58819,3.0,2.0,0.0
+2014-03-14,54631,4.0,2.0,0.0
+2014-03-15,37045,5.0,2.0,0.0
+2014-03-16,40071,6.0,2.0,0.0
+2014-03-17,58119,0.0,2.0,0.0
+2014-03-18,60296,1.0,2.0,0.0
+2014-03-19,60348,2.0,2.0,0.0
+2014-03-20,59653,3.0,2.0,0.0
+2014-03-21,54723,4.0,2.0,0.0
+2014-03-22,38438,5.0,2.0,0.0
+2014-03-23,41116,6.0,2.0,0.0
+2014-03-24,59413,0.0,2.0,0.0
+2014-03-25,58491,1.0,2.0,0.0
+2014-03-26,57706,2.0,2.0,0.0
+2014-03-27,59629,3.0,2.0,0.0
+2014-03-28,55961,4.0,2.0,0.0
+2014-03-29,37785,5.0,2.0,0.0
+2014-03-30,40405,6.0,2.0,0.0
+2014-03-31,59279,0.0,2.0,0.0
+2014-04-01,59904,1.0,3.0,0.0
+2014-04-02,61078,2.0,3.0,0.0
+2014-04-03,60665,3.0,3.0,0.0
+2014-04-04,56880,4.0,3.0,0.0
+2014-04-05,38088,5.0,3.0,0.0
+2014-04-06,40745,6.0,3.0,0.0
+2014-04-07,59201,0.0,3.0,0.0
+2014-04-08,62373,1.0,3.0,0.0
+2014-04-09,61447,2.0,3.0,0.0
+2014-04-10,60721,3.0,3.0,0.0
+2014-04-11,57113,4.0,3.0,0.0
+2014-04-12,38778,5.0,3.0,0.0
+2014-04-13,41754,6.0,3.0,0.0
+2014-04-14,60584,0.0,3.0,0.0
+2014-04-15,62573,1.0,3.0,0.0
+2014-04-16,61692,2.0,3.0,0.0
+2014-04-17,58954,3.0,3.0,0.0
+2014-04-18,50828,4.0,3.0,0.0
+2014-04-19,37908,5.0,3.0,0.0
+2014-04-20,37161,6.0,3.0,0.0
+2014-04-21,53971,0.0,3.0,0.0
+2014-04-22,63154,1.0,3.0,0.0
+2014-04-23,64034,2.0,3.0,0.0
+2014-04-24,63013,3.0,3.0,0.0
+2014-04-25,58866,4.0,3.0,0.0
+2014-04-26,41588,5.0,3.0,0.0
+2014-04-27,46281,6.0,3.0,0.0
+2014-04-28,62917,0.0,3.0,0.0
+2014-04-29,58880,1.0,3.0,0.0
+2014-04-30,59201,2.0,3.0,0.0
+2014-05-01,51873,3.0,4.0,0.0
+2014-05-02,53187,4.0,4.0,0.0
+2014-05-03,38524,5.0,4.0,0.0
+2014-05-04,42442,6.0,4.0,0.0
+2014-05-05,59938,0.0,4.0,0.0
+2014-05-06,63226,1.0,4.0,0.0
+2014-05-07,64056,2.0,4.0,0.0
+2014-05-08,62352,3.0,4.0,0.0
+2014-05-09,58471,4.0,4.0,0.0
+2014-05-10,40697,5.0,4.0,0.0
+2014-05-11,42929,6.0,4.0,0.0
+2014-05-12,62494,0.0,4.0,0.0
+2014-05-13,63889,1.0,4.0,0.0
+2014-05-14,63489,2.0,4.0,0.0
+2014-05-15,62615,3.0,4.0,0.0
+2014-05-16,57477,4.0,4.0,0.0
+2014-05-17,38612,5.0,4.0,0.0
+2014-05-18,41834,6.0,4.0,0.0
+2014-05-19,61093,0.0,4.0,0.0
+2014-05-20,63539,1.0,4.0,0.0
+2014-05-21,63520,2.0,4.0,0.0
+2014-05-22,62947,3.0,4.0,0.0
+2014-05-23,58949,4.0,4.0,0.0
+2014-05-24,38860,5.0,4.0,0.0
+2014-05-25,41500,6.0,4.0,0.0
+2014-05-26,54191,0.0,4.0,1.0
+2014-05-27,62110,1.0,4.0,0.0
+2014-05-28,62119,2.0,4.0,0.0
+2014-05-29,57778,3.0,4.0,0.0
+2014-05-30,55360,4.0,4.0,0.0
+2014-05-31,36956,5.0,4.0,0.0
+2014-06-01,38803,6.0,5.0,0.0
+2014-06-02,57279,0.0,5.0,0.0
+2014-06-03,61545,1.0,5.0,0.0
+2014-06-04,62143,2.0,5.0,0.0
+2014-06-05,61565,3.0,5.0,0.0
+2014-06-06,56682,4.0,5.0,0.0
+2014-06-07,37230,5.0,5.0,0.0
+2014-06-08,39439,6.0,5.0,0.0
+2014-06-09,56406,0.0,5.0,0.0
+2014-06-10,60934,1.0,5.0,0.0
+2014-06-11,61030,2.0,5.0,0.0
+2014-06-12,58531,3.0,5.0,0.0
+2014-06-13,54801,4.0,5.0,0.0
+2014-06-14,36688,5.0,5.0,0.0
+2014-06-15,38911,6.0,5.0,0.0
+2014-06-16,57628,0.0,5.0,0.0
+2014-06-17,60437,1.0,5.0,0.0
+2014-06-18,59666,2.0,5.0,0.0
+2014-06-19,58550,3.0,5.0,0.0
+2014-06-20,54734,4.0,5.0,0.0
+2014-06-21,36936,5.0,5.0,0.0
+2014-06-22,40998,6.0,5.0,0.0
+2014-06-23,57817,0.0,5.0,0.0
+2014-06-24,59898,1.0,5.0,0.0
+2014-06-25,59275,2.0,5.0,0.0
+2014-06-26,58194,3.0,5.0,0.0
+2014-06-27,54687,4.0,5.0,0.0
+2014-06-28,34376,5.0,5.0,0.0
+2014-06-29,36039,6.0,5.0,0.0
+2014-06-30,56288,0.0,5.0,0.0
+2014-07-01,57564,1.0,6.0,0.0
+2014-07-02,58226,2.0,6.0,0.0
+2014-07-03,57447,3.0,6.0,0.0
+2014-07-04,46868,4.0,6.0,1.0
+2014-07-05,31976,5.0,6.0,0.0
+2014-07-06,35625,6.0,6.0,0.0
+2014-07-07,57648,0.0,6.0,0.0
+2014-07-08,59817,1.0,6.0,0.0
+2014-07-09,58684,2.0,6.0,0.0
+2014-07-10,59610,3.0,6.0,0.0
+2014-07-11,56361,4.0,6.0,0.0
+2014-07-12,36405,5.0,6.0,0.0
+2014-07-13,37367,6.0,6.0,0.0
+2014-07-14,57220,0.0,6.0,0.0
+2014-07-15,60954,1.0,6.0,0.0
+2014-07-16,60772,2.0,6.0,0.0
+2014-07-17,58139,3.0,6.0,0.0
+2014-07-18,55605,4.0,6.0,0.0
+2014-07-19,35444,5.0,6.0,0.0
+2014-07-20,37516,6.0,6.0,0.0
+2014-07-21,58789,0.0,6.0,0.0
+2014-07-22,61115,1.0,6.0,0.0
+2014-07-23,61183,2.0,6.0,0.0
+2014-07-24,60482,3.0,6.0,0.0
+2014-07-25,56642,4.0,6.0,0.0
+2014-07-26,37052,5.0,6.0,0.0
+2014-07-27,40482,6.0,6.0,0.0
+2014-07-28,58625,0.0,6.0,0.0
+2014-07-29,60214,1.0,6.0,0.0
+2014-07-30,60244,2.0,6.0,0.0
+2014-07-31,59555,3.0,6.0,0.0
+2014-08-01,54851,4.0,7.0,0.0
+2014-08-02,34918,5.0,7.0,0.0
+2014-08-03,36852,6.0,7.0,0.0
+2014-08-04,57355,0.0,7.0,0.0
+2014-08-05,60536,1.0,7.0,0.0
+2014-08-06,60691,2.0,7.0,0.0
+2014-08-07,59387,3.0,7.0,0.0
+2014-08-08,55824,4.0,7.0,0.0
+2014-08-09,35770,5.0,7.0,0.0
+2014-08-10,38102,6.0,7.0,0.0
+2014-08-11,59054,0.0,7.0,0.0
+2014-08-12,60590,1.0,7.0,0.0
+2014-08-13,60448,2.0,7.0,0.0
+2014-08-14,58944,3.0,7.0,0.0
+2014-08-15,53160,4.0,7.0,0.0
+2014-08-16,35988,5.0,7.0,0.0
+2014-08-17,39009,6.0,7.0,0.0
+2014-08-18,59031,0.0,7.0,0.0
+2014-08-19,61664,1.0,7.0,0.0
+2014-08-20,61490,2.0,7.0,0.0
+2014-08-21,61343,3.0,7.0,0.0
+2014-08-22,58054,4.0,7.0,0.0
+2014-08-23,38573,5.0,7.0,0.0
+2014-08-24,41813,6.0,7.0,0.0
+2014-08-25,58804,0.0,7.0,0.0
+2014-08-26,61870,1.0,7.0,0.0
+2014-08-27,61716,2.0,7.0,0.0
+2014-08-28,60539,3.0,7.0,0.0
+2014-08-29,56147,4.0,7.0,0.0
+2014-08-30,36483,5.0,7.0,0.0
+2014-08-31,38402,6.0,7.0,0.0
+2014-09-01,53643,0.0,8.0,1.0
+2014-09-02,62318,1.0,8.0,0.0
+2014-09-03,63877,2.0,8.0,0.0
+2014-09-04,63233,3.0,8.0,0.0
+2014-09-05,59368,4.0,8.0,0.0
+2014-09-06,39023,5.0,8.0,0.0
+2014-09-07,40969,6.0,8.0,0.0
+2014-09-08,59558,0.0,8.0,0.0
+2014-09-09,63536,1.0,8.0,0.0
+2014-09-10,64457,2.0,8.0,0.0
+2014-09-11,64373,3.0,8.0,0.0
+2014-09-12,60704,4.0,8.0,0.0
+2014-09-13,40285,5.0,8.0,0.0
+2014-09-14,42980,6.0,8.0,0.0
+2014-09-15,63854,0.0,8.0,0.0
+2014-09-16,66603,1.0,8.0,0.0
+2014-09-17,66943,2.0,8.0,0.0
+2014-09-18,65374,3.0,8.0,0.0
+2014-09-19,61976,4.0,8.0,0.0
+2014-09-20,41540,5.0,8.0,0.0
+2014-09-21,45895,6.0,8.0,0.0
+2014-09-22,65680,0.0,8.0,0.0
+2014-09-23,65894,1.0,8.0,0.0
+2014-09-24,67516,2.0,8.0,0.0
+2014-09-25,66172,3.0,8.0,0.0
+2014-09-26,62052,4.0,8.0,0.0
+2014-09-27,40681,5.0,8.0,0.0
+2014-09-28,44507,6.0,8.0,0.0
+2014-09-29,66009,0.0,8.0,0.0
+2014-09-30,65377,1.0,8.0,0.0
+2014-10-01,64361,2.0,9.0,0.0
+2014-10-02,63192,3.0,9.0,0.0
+2014-10-03,58623,4.0,9.0,0.0
+2014-10-04,40046,5.0,9.0,0.0
+2014-10-05,42635,6.0,9.0,0.0
+2014-10-06,64289,0.0,9.0,0.0
+2014-10-07,67728,1.0,9.0,0.0
+2014-10-08,70580,2.0,9.0,0.0
+2014-10-09,68939,3.0,9.0,0.0
+2014-10-10,65565,4.0,9.0,0.0
+2014-10-11,45396,5.0,9.0,0.0
+2014-10-12,46315,6.0,9.0,0.0
+2014-10-13,68081,0.0,9.0,1.0
+2014-10-14,70462,1.0,9.0,0.0
+2014-10-15,71679,2.0,9.0,0.0
+2014-10-16,71133,3.0,9.0,0.0
+2014-10-17,66584,4.0,9.0,0.0
+2014-10-18,45259,5.0,9.0,0.0
+2014-10-19,46726,6.0,9.0,0.0
+2014-10-20,71061,0.0,9.0,0.0
+2014-10-21,74351,1.0,9.0,0.0
+2014-10-22,71496,2.0,9.0,0.0
+2014-10-23,72852,3.0,9.0,0.0
+2014-10-24,68836,4.0,9.0,0.0
+2014-10-25,46343,5.0,9.0,0.0
+2014-10-26,51704,6.0,9.0,0.0
+2014-10-27,72386,0.0,9.0,0.0
+2014-10-28,73319,1.0,9.0,0.0
+2014-10-29,71694,2.0,9.0,0.0
+2014-10-30,73188,3.0,9.0,0.0
+2014-10-31,66606,4.0,9.0,0.0
+2014-11-01,43864,5.0,10.0,0.0
+2014-11-02,48725,6.0,10.0,0.0
+2014-11-03,72901,0.0,10.0,0.0
+2014-11-04,75637,1.0,10.0,0.0
+2014-11-05,76423,2.0,10.0,0.0
+2014-11-06,74164,3.0,10.0,0.0
+2014-11-07,71186,4.0,10.0,0.0
+2014-11-08,49033,5.0,10.0,0.0
+2014-11-09,52480,6.0,10.0,0.0
+2014-11-10,74984,0.0,10.0,0.0
+2014-11-11,76113,1.0,10.0,1.0
+2014-11-12,77768,2.0,10.0,0.0
+2014-11-13,77072,3.0,10.0,0.0
+2014-11-14,72203,4.0,10.0,0.0
+2014-11-15,50149,5.0,10.0,0.0
+2014-11-16,53584,6.0,10.0,0.0
+2014-11-17,77223,0.0,10.0,0.0
+2014-11-18,79371,1.0,10.0,0.0
+2014-11-19,79472,2.0,10.0,0.0
+2014-11-20,78357,3.0,10.0,0.0
+2014-11-21,73355,4.0,10.0,0.0
+2014-11-22,50881,5.0,10.0,0.0
+2014-11-23,55522,6.0,10.0,0.0
+2014-11-24,78109,0.0,10.0,0.0
+2014-11-25,79387,1.0,10.0,0.0
+2014-11-26,76170,2.0,10.0,0.0
+2014-11-27,67321,3.0,10.0,1.0
+2014-11-28,61368,4.0,10.0,0.0
+2014-11-29,46144,5.0,10.0,0.0
+2014-11-30,51623,6.0,10.0,0.0
+2014-12-01,77869,0.0,11.0,0.0
+2014-12-02,80799,1.0,11.0,0.0
+2014-12-03,80672,2.0,11.0,0.0
+2014-12-04,78755,3.0,11.0,0.0
+2014-12-05,74423,4.0,11.0,0.0
+2014-12-06,51209,5.0,11.0,0.0
+2014-12-07,54238,6.0,11.0,0.0
+2014-12-08,77058,0.0,11.0,0.0
+2014-12-09,79147,1.0,11.0,0.0
+2014-12-10,77471,2.0,11.0,0.0
+2014-12-11,76005,3.0,11.0,0.0
+2014-12-12,70489,4.0,11.0,0.0
+2014-12-13,47806,5.0,11.0,0.0
+2014-12-14,50486,6.0,11.0,0.0
+2014-12-15,73353,0.0,11.0,0.0
+2014-12-16,74217,1.0,11.0,0.0
+2014-12-17,72849,2.0,11.0,0.0
+2014-12-18,70140,3.0,11.0,0.0
+2014-12-19,64016,4.0,11.0,0.0
+2014-12-20,42131,5.0,11.0,0.0
+2014-12-21,45466,6.0,11.0,0.0
+2014-12-22,59804,0.0,11.0,0.0
+2014-12-23,57678,1.0,11.0,0.0
+2014-12-24,45609,2.0,11.0,0.0
+2014-12-25,34924,3.0,11.0,1.0
+2014-12-26,40747,4.0,11.0,0.0
+2014-12-27,37359,5.0,11.0,0.0
+2014-12-28,39682,6.0,11.0,0.0
+2014-12-29,53699,0.0,11.0,0.0
+2014-12-30,54029,1.0,11.0,0.0
+2014-12-31,18574,2.0,11.0,0.0
+2015-01-01,33211,3.0,0.0,1.0
+2015-01-02,48077,4.0,0.0,0.0
+2015-01-03,44563,5.0,0.0,0.0
+2015-01-04,49137,6.0,0.0,0.0
+2015-01-05,66676,0.0,0.0,0.0
+2015-01-06,66039,1.0,0.0,0.0
+2015-01-07,70055,2.0,0.0,0.0
+2015-01-08,71505,3.0,0.0,0.0
+2015-01-09,66446,4.0,0.0,0.0
+2015-01-10,49634,5.0,0.0,0.0
+2015-01-11,52346,6.0,0.0,0.0
+2015-01-12,76021,0.0,0.0,0.0
+2015-01-13,77374,1.0,0.0,0.0
+2015-01-14,78209,2.0,0.0,0.0
+2015-01-15,77896,3.0,0.0,0.0
+2015-01-16,73533,4.0,0.0,0.0
+2015-01-17,51229,5.0,0.0,0.0
+2015-01-18,54212,6.0,0.0,0.0
+2015-01-19,75243,0.0,0.0,1.0
+2015-01-20,80898,1.0,0.0,0.0
+2015-01-21,81397,2.0,0.0,0.0
+2015-01-22,80848,3.0,0.0,0.0
+2015-01-23,77202,4.0,0.0,0.0
+2015-01-24,55935,5.0,0.0,0.0
+2015-01-25,61597,6.0,0.0,0.0
+2015-01-26,79962,0.0,0.0,0.0
+2015-01-27,82207,1.0,0.0,0.0
+2015-01-28,82554,2.0,0.0,0.0
+2015-01-29,81467,3.0,0.0,0.0
+2015-01-30,76405,4.0,0.0,0.0
+2015-01-31,53052,5.0,0.0,0.0
+2015-02-01,55516,6.0,1.0,0.0
+2015-02-02,78483,0.0,1.0,0.0
+2015-02-03,80571,1.0,1.0,0.0
+2015-02-04,83041,2.0,1.0,0.0
+2015-02-05,82992,3.0,1.0,0.0
+2015-02-06,79509,4.0,1.0,0.0
+2015-02-07,54980,5.0,1.0,0.0
+2015-02-08,59201,6.0,1.0,0.0
+2015-02-09,84344,0.0,1.0,0.0
+2015-02-10,85600,1.0,1.0,0.0
+2015-02-11,84990,2.0,1.0,0.0
+2015-02-12,84056,3.0,1.0,0.0
+2015-02-13,78771,4.0,1.0,0.0
+2015-02-14,50473,5.0,1.0,0.0
+2015-02-15,55681,6.0,1.0,0.0
+2015-02-16,76934,0.0,1.0,1.0
+2015-02-17,80882,1.0,1.0,0.0
+2015-02-18,80672,2.0,1.0,0.0
+2015-02-19,79879,3.0,1.0,0.0
+2015-02-20,77309,4.0,1.0,0.0
+2015-02-21,56256,5.0,1.0,0.0
+2015-02-22,62005,6.0,1.0,0.0
+2015-02-23,81400,0.0,1.0,0.0
+2015-02-24,84252,1.0,1.0,0.0
+2015-02-25,85804,2.0,1.0,0.0
+2015-02-26,86417,3.0,1.0,0.0
+2015-02-27,81035,4.0,1.0,0.0
+2015-02-28,57647,5.0,1.0,0.0
+2015-03-01,59286,6.0,2.0,0.0
+2015-03-02,87020,0.0,2.0,0.0
+2015-03-03,89520,1.0,2.0,0.0
+2015-03-04,90519,2.0,2.0,0.0
+2015-03-05,88078,3.0,2.0,0.0
+2015-03-06,83016,4.0,2.0,0.0
+2015-03-07,57201,5.0,2.0,0.0
+2015-03-08,60121,6.0,2.0,0.0
+2015-03-09,88330,0.0,2.0,0.0
+2015-03-10,91456,1.0,2.0,0.0
+2015-03-11,91102,2.0,2.0,0.0
+2015-03-12,90934,3.0,2.0,0.0
+2015-03-13,86003,4.0,2.0,0.0
+2015-03-14,58089,5.0,2.0,0.0
+2015-03-15,62177,6.0,2.0,0.0
+2015-03-16,90924,0.0,2.0,0.0
+2015-03-17,93210,1.0,2.0,0.0
+2015-03-18,92153,2.0,2.0,0.0
+2015-03-19,91674,3.0,2.0,0.0
+2015-03-20,86065,4.0,2.0,0.0
+2015-03-21,59532,5.0,2.0,0.0
+2015-03-22,65999,6.0,2.0,0.0
+2015-03-23,91418,0.0,2.0,0.0
+2015-03-24,94159,1.0,2.0,0.0
+2015-03-25,93458,2.0,2.0,0.0
+2015-03-26,92072,3.0,2.0,0.0
+2015-03-27,83128,4.0,2.0,0.0
+2015-03-28,57894,5.0,2.0,0.0
+2015-03-29,60676,6.0,2.0,0.0
+2015-03-30,91212,0.0,2.0,0.0
+2015-03-31,93079,1.0,2.0,0.0
+2015-04-01,90691,2.0,3.0,0.0
+2015-04-02,86589,3.0,3.0,0.0
+2015-04-03,74443,4.0,3.0,0.0
+2015-04-04,55184,5.0,3.0,0.0
+2015-04-05,54861,6.0,3.0,0.0
+2015-04-06,78892,0.0,3.0,0.0
+2015-04-07,93458,1.0,3.0,0.0
+2015-04-08,95291,2.0,3.0,0.0
+2015-04-09,93141,3.0,3.0,0.0
+2015-04-10,86853,4.0,3.0,0.0
+2015-04-11,59522,5.0,3.0,0.0
+2015-04-12,63432,6.0,3.0,0.0
+2015-04-13,91817,0.0,3.0,0.0
+2015-04-14,94974,1.0,3.0,0.0
+2015-04-15,94061,2.0,3.0,0.0
+2015-04-16,94221,3.0,3.0,0.0
+2015-04-17,88699,4.0,3.0,0.0
+2015-04-18,59654,5.0,3.0,0.0
+2015-04-19,65146,6.0,3.0,0.0
+2015-04-20,94916,0.0,3.0,0.0
+2015-04-21,97299,1.0,3.0,0.0
+2015-04-22,97751,2.0,3.0,0.0
+2015-04-23,95638,3.0,3.0,0.0
+2015-04-24,89613,4.0,3.0,0.0
+2015-04-25,61119,5.0,3.0,0.0
+2015-04-26,68408,6.0,3.0,0.0
+2015-04-27,94300,0.0,3.0,0.0
+2015-04-28,97417,1.0,3.0,0.0
+2015-04-29,95247,2.0,3.0,0.0
+2015-04-30,89512,3.0,3.0,0.0
+2015-05-01,71100,4.0,4.0,0.0
+2015-05-02,55068,5.0,4.0,0.0
+2015-05-03,59245,6.0,4.0,0.0
+2015-05-04,89677,0.0,4.0,0.0
+2015-05-05,94643,1.0,4.0,0.0
+2015-05-06,94869,2.0,4.0,0.0
+2015-05-07,93583,3.0,4.0,0.0
+2015-05-08,85836,4.0,4.0,0.0
+2015-05-09,57774,5.0,4.0,0.0
+2015-05-10,61098,6.0,4.0,0.0
+2015-05-11,92261,0.0,4.0,0.0
+2015-05-12,96912,1.0,4.0,0.0
+2015-05-13,94490,2.0,4.0,0.0
+2015-05-14,88189,3.0,4.0,0.0
+2015-05-15,84151,4.0,4.0,0.0
+2015-05-16,57518,5.0,4.0,0.0
+2015-05-17,62282,6.0,4.0,0.0
+2015-05-18,92330,0.0,4.0,0.0
+2015-05-19,96248,1.0,4.0,0.0
+2015-05-20,96061,2.0,4.0,0.0
+2015-05-21,94121,3.0,4.0,0.0
+2015-05-22,87344,4.0,4.0,0.0
+2015-05-23,56965,5.0,4.0,0.0
+2015-05-24,60744,6.0,4.0,0.0
+2015-05-25,77609,0.0,4.0,1.0
+2015-05-26,93876,1.0,4.0,0.0
+2015-05-27,95475,2.0,4.0,0.0
+2015-05-28,92911,3.0,4.0,0.0
+2015-05-29,86540,4.0,4.0,0.0
+2015-05-30,56399,5.0,4.0,0.0
+2015-05-31,59770,6.0,4.0,0.0
+2015-06-01,89681,0.0,5.0,0.0
+2015-06-02,94065,1.0,5.0,0.0
+2015-06-03,93262,2.0,5.0,0.0
+2015-06-04,89150,3.0,5.0,0.0
+2015-06-05,84240,4.0,5.0,0.0
+2015-06-06,55264,5.0,5.0,0.0
+2015-06-07,59114,6.0,5.0,0.0
+2015-06-08,89414,0.0,5.0,0.0
+2015-06-09,94342,1.0,5.0,0.0
+2015-06-10,92730,2.0,5.0,0.0
+2015-06-11,90337,3.0,5.0,0.0
+2015-06-12,82629,4.0,5.0,0.0
+2015-06-13,54393,5.0,5.0,0.0
+2015-06-14,58454,6.0,5.0,0.0
+2015-06-15,88580,0.0,5.0,0.0
+2015-06-16,91424,1.0,5.0,0.0
+2015-06-17,91408,2.0,5.0,0.0
+2015-06-18,89458,3.0,5.0,0.0
+2015-06-19,82843,4.0,5.0,0.0
+2015-06-20,52691,5.0,5.0,0.0
+2015-06-21,57034,6.0,5.0,0.0
+2015-06-22,84455,0.0,5.0,0.0
+2015-06-23,90430,1.0,5.0,0.0
+2015-06-24,89483,2.0,5.0,0.0
+2015-06-25,88234,3.0,5.0,0.0
+2015-06-26,81883,4.0,5.0,0.0
+2015-06-27,52129,5.0,5.0,0.0
+2015-06-28,54858,6.0,5.0,0.0
+2015-06-29,86080,0.0,5.0,0.0
+2015-06-30,88498,1.0,5.0,0.0
+2015-07-01,86019,2.0,6.0,0.0
+2015-07-02,84921,3.0,6.0,0.0
+2015-07-03,72626,4.0,6.0,1.0
+2015-07-04,47682,5.0,6.0,0.0
+2015-07-05,51161,6.0,6.0,0.0
+2015-07-06,84781,0.0,6.0,0.0
+2015-07-07,89887,1.0,6.0,0.0
+2015-07-08,89657,2.0,6.0,0.0
+2015-07-09,88592,3.0,6.0,0.0
+2015-07-10,82408,4.0,6.0,0.0
+2015-07-11,52448,5.0,6.0,0.0
+2015-07-12,56396,6.0,6.0,0.0
+2015-07-13,87354,0.0,6.0,0.0
+2015-07-14,88965,1.0,6.0,0.0
+2015-07-15,88859,2.0,6.0,0.0
+2015-07-16,86788,3.0,6.0,0.0
+2015-07-17,80759,4.0,6.0,0.0
+2015-07-18,51601,5.0,6.0,0.0
+2015-07-19,55215,6.0,6.0,0.0
+2015-07-20,85913,0.0,6.0,0.0
+2015-07-21,89034,1.0,6.0,0.0
+2015-07-22,89449,2.0,6.0,0.0
+2015-07-23,89039,3.0,6.0,0.0
+2015-07-24,82762,4.0,6.0,0.0
+2015-07-25,53435,5.0,6.0,0.0
+2015-07-26,57851,6.0,6.0,0.0
+2015-07-27,87111,0.0,6.0,0.0
+2015-07-28,89813,1.0,6.0,0.0
+2015-07-29,89080,2.0,6.0,0.0
+2015-07-30,86852,3.0,6.0,0.0
+2015-07-31,80715,4.0,6.0,0.0
+2015-08-01,49693,5.0,7.0,0.0
+2015-08-02,51980,6.0,7.0,0.0
+2015-08-03,83065,0.0,7.0,0.0
+2015-08-04,87753,1.0,7.0,0.0
+2015-08-05,87047,2.0,7.0,0.0
+2015-08-06,85675,3.0,7.0,0.0
+2015-08-07,79329,4.0,7.0,0.0
+2015-08-08,50372,5.0,7.0,0.0
+2015-08-09,53900,6.0,7.0,0.0
+2015-08-10,84498,0.0,7.0,0.0
+2015-08-11,88065,1.0,7.0,0.0
+2015-08-12,88003,2.0,7.0,0.0
+2015-08-13,86159,3.0,7.0,0.0
+2015-08-14,80407,4.0,7.0,0.0
+2015-08-15,52148,5.0,7.0,0.0
+2015-08-16,55563,6.0,7.0,0.0
+2015-08-17,85716,0.0,7.0,0.0
+2015-08-18,90098,1.0,7.0,0.0
+2015-08-19,90311,2.0,7.0,0.0
+2015-08-20,89112,3.0,7.0,0.0
+2015-08-21,83607,4.0,7.0,0.0
+2015-08-22,54685,5.0,7.0,0.0
+2015-08-23,59679,6.0,7.0,0.0
+2015-08-24,87916,0.0,7.0,0.0
+2015-08-25,89785,1.0,7.0,0.0
+2015-08-26,90842,2.0,7.0,0.0
+2015-08-27,89589,3.0,7.0,0.0
+2015-08-28,84012,4.0,7.0,0.0
+2015-08-29,52998,5.0,7.0,0.0
+2015-08-30,55886,6.0,7.0,0.0
+2015-08-31,86983,0.0,7.0,0.0
+2015-09-01,91295,1.0,8.0,0.0
+2015-09-02,91046,2.0,8.0,0.0
+2015-09-03,87017,3.0,8.0,0.0
+2015-09-04,80813,4.0,8.0,0.0
+2015-09-05,54463,5.0,8.0,0.0
+2015-09-06,59864,6.0,8.0,0.0
+2015-09-07,80617,0.0,8.0,1.0
+2015-09-08,93446,1.0,8.0,0.0
+2015-09-09,94640,2.0,8.0,0.0
+2015-09-10,94089,3.0,8.0,0.0
+2015-09-11,88287,4.0,8.0,0.0
+2015-09-12,57236,5.0,8.0,0.0
+2015-09-13,61339,6.0,8.0,0.0
+2015-09-14,94100,0.0,8.0,0.0
+2015-09-15,97210,1.0,8.0,0.0
+2015-09-16,97520,2.0,8.0,0.0
+2015-09-17,95561,3.0,8.0,0.0
+2015-09-18,90210,4.0,8.0,0.0
+2015-09-19,58521,5.0,8.0,0.0
+2015-09-20,62414,6.0,8.0,0.0
+2015-09-21,96432,0.0,8.0,0.0
+2015-09-22,99956,1.0,8.0,0.0
+2015-09-23,99207,2.0,8.0,0.0
+2015-09-24,97696,3.0,8.0,0.0
+2015-09-25,90619,4.0,8.0,0.0
+2015-09-26,59733,5.0,8.0,0.0
+2015-09-27,64337,6.0,8.0,0.0
+2015-09-28,95277,0.0,8.0,0.0
+2015-09-29,99909,1.0,8.0,0.0
+2015-09-30,98496,2.0,8.0,0.0
+2015-10-01,93111,3.0,9.0,0.0
+2015-10-02,86753,4.0,9.0,0.0
+2015-10-03,58268,5.0,9.0,0.0
+2015-10-04,62592,6.0,9.0,0.0
+2015-10-05,95603,0.0,9.0,0.0
+2015-10-06,99837,1.0,9.0,0.0
+2015-10-07,100860,2.0,9.0,0.0
+2015-10-08,102409,3.0,9.0,0.0
+2015-10-09,95631,4.0,9.0,0.0
+2015-10-10,66043,5.0,9.0,0.0
+2015-10-11,66601,6.0,9.0,0.0
+2015-10-12,98066,0.0,9.0,1.0
+2015-10-13,106570,1.0,9.0,0.0
+2015-10-14,105415,2.0,9.0,0.0
+2015-10-15,104366,3.0,9.0,0.0
+2015-10-16,97556,4.0,9.0,0.0
+2015-10-17,64064,5.0,9.0,0.0
+2015-10-18,69221,6.0,9.0,0.0
+2015-10-19,105710,0.0,9.0,0.0
+2015-10-20,108226,1.0,9.0,0.0
+2015-10-21,107216,2.0,9.0,0.0
+2015-10-22,106180,3.0,9.0,0.0
+2015-10-23,99348,4.0,9.0,0.0
+2015-10-24,67090,5.0,9.0,0.0
+2015-10-25,73283,6.0,9.0,0.0
+2015-10-26,104805,0.0,9.0,0.0
+2015-10-27,111076,1.0,9.0,0.0
+2015-10-28,110991,2.0,9.0,0.0
+2015-10-29,109068,3.0,9.0,0.0
+2015-10-30,100655,4.0,9.0,0.0
+2015-10-31,63910,5.0,9.0,0.0
+2015-11-01,67454,6.0,10.0,0.0
+2015-11-02,106405,0.0,10.0,0.0
+2015-11-03,113189,1.0,10.0,0.0
+2015-11-04,112399,2.0,10.0,0.0
+2015-11-05,112257,3.0,10.0,0.0
+2015-11-06,105629,4.0,10.0,0.0
+2015-11-07,70570,5.0,10.0,0.0
+2015-11-08,75161,6.0,10.0,0.0
+2015-11-09,110784,0.0,10.0,0.0
+2015-11-10,112978,1.0,10.0,0.0
+2015-11-11,107347,2.0,10.0,1.0
+2015-11-12,111293,3.0,10.0,0.0
+2015-11-13,104493,4.0,10.0,0.0
+2015-11-14,68039,5.0,10.0,0.0
+2015-11-15,73945,6.0,10.0,0.0
+2015-11-16,111285,0.0,10.0,0.0
+2015-11-17,115457,1.0,10.0,0.0
+2015-11-18,115393,2.0,10.0,0.0
+2015-11-19,115387,3.0,10.0,0.0
+2015-11-20,107008,4.0,10.0,0.0
+2015-11-21,71677,5.0,10.0,0.0
+2015-11-22,77702,6.0,10.0,0.0
+2015-11-23,113226,0.0,10.0,0.0
+2015-11-24,114841,1.0,10.0,0.0
+2015-11-25,109386,2.0,10.0,0.0
+2015-11-26,96620,3.0,10.0,1.0
+2015-11-27,88369,4.0,10.0,0.0
+2015-11-28,66696,5.0,10.0,0.0
+2015-11-29,74591,6.0,10.0,0.0
+2015-11-30,114424,0.0,10.0,0.0
+2015-12-01,117806,1.0,11.0,0.0
+2015-12-02,118201,2.0,11.0,0.0
+2015-12-03,117780,3.0,11.0,0.0
+2015-12-04,108975,4.0,11.0,0.0
+2015-12-05,72662,5.0,11.0,0.0
+2015-12-06,76360,6.0,11.0,0.0
+2015-12-07,113903,0.0,11.0,0.0
+2015-12-08,115911,1.0,11.0,0.0
+2015-12-09,115324,2.0,11.0,0.0
+2015-12-10,113844,3.0,11.0,0.0
+2015-12-11,105420,4.0,11.0,0.0
+2015-12-12,70442,5.0,11.0,0.0
+2015-12-13,74537,6.0,11.0,0.0
+2015-12-14,110352,0.0,11.0,0.0
+2015-12-15,111033,1.0,11.0,0.0
+2015-12-16,107508,2.0,11.0,0.0
+2015-12-17,103108,3.0,11.0,0.0
+2015-12-18,93664,4.0,11.0,0.0
+2015-12-19,60441,5.0,11.0,0.0
+2015-12-20,62608,6.0,11.0,0.0
+2015-12-21,91916,0.0,11.0,0.0
+2015-12-22,91125,1.0,11.0,0.0
+2015-12-23,84466,2.0,11.0,0.0
+2015-12-24,66672,3.0,11.0,0.0
+2015-12-25,50812,4.0,11.0,1.0
+2015-12-26,49720,5.0,11.0,0.0
+2015-12-27,57018,6.0,11.0,0.0
+2015-12-28,76983,0.0,11.0,0.0
+2015-12-29,80256,1.0,11.0,0.0
+2015-12-30,78067,2.0,11.0,0.0
+2016-01-01,46109,4.0,0.0,1.0
+2016-01-02,56771,5.0,0.0,0.0
+2016-01-03,63608,6.0,0.0,0.0
+2016-01-04,96670,0.0,0.0,0.0
+2016-01-05,102054,1.0,0.0,0.0
+2016-01-06,101968,2.0,0.0,0.0
+2016-01-07,103695,3.0,0.0,0.0
+2016-01-08,99226,4.0,0.0,0.0
+2016-01-09,68617,5.0,0.0,0.0
+2016-01-10,73313,6.0,0.0,0.0
+2016-01-11,107882,0.0,0.0,0.0
+2016-01-12,111240,1.0,0.0,0.0
+2016-01-13,111346,2.0,0.0,0.0
+2016-01-14,110350,3.0,0.0,0.0
+2016-01-15,103836,4.0,0.0,0.0
+2016-01-16,69762,5.0,0.0,0.0
+2016-01-17,73548,6.0,0.0,0.0
+2016-01-18,106252,0.0,0.0,1.0
+2016-01-19,114235,1.0,0.0,0.0
+2016-01-20,114520,2.0,0.0,0.0
+2016-01-21,113333,3.0,0.0,0.0
+2016-01-22,106865,4.0,0.0,0.0
+2016-01-23,74103,5.0,0.0,0.0
+2016-01-24,78655,6.0,0.0,0.0
+2016-01-25,114045,0.0,0.0,0.0
+2016-01-26,116293,1.0,0.0,0.0
+2016-01-27,117360,2.0,0.0,0.0
+2016-01-28,112890,3.0,0.0,0.0
+2016-01-29,110408,4.0,0.0,0.0
+2016-01-30,77881,5.0,0.0,0.0
+2016-01-31,81804,6.0,0.0,0.0
+2016-02-01,115705,0.0,1.0,0.0
+2016-02-02,117639,1.0,1.0,0.0
+2016-02-03,118168,2.0,1.0,0.0
+2016-02-04,115485,3.0,1.0,0.0
+2016-02-05,106779,4.0,1.0,0.0
+2016-02-06,72602,5.0,1.0,0.0
+2016-02-07,73299,6.0,1.0,0.0
+2016-02-08,103308,0.0,1.0,0.0
+2016-02-09,110246,1.0,1.0,0.0
+2016-02-10,111835,2.0,1.0,0.0
+2016-02-11,112118,3.0,1.0,0.0
+2016-02-12,105677,4.0,1.0,0.0
+2016-02-13,74145,5.0,1.0,0.0
+2016-02-14,76379,6.0,1.0,0.0
+2016-02-15,111654,0.0,1.0,1.0
+2016-02-16,121528,1.0,1.0,0.0
+2016-02-17,122884,2.0,1.0,0.0
+2016-02-18,123112,3.0,1.0,0.0
+2016-02-19,117492,4.0,1.0,0.0
+2016-02-20,81509,5.0,1.0,0.0
+2016-02-21,86026,6.0,1.0,0.0
+2016-02-22,124960,0.0,1.0,0.0
+2016-02-23,128025,1.0,1.0,0.0
+2016-02-24,128860,2.0,1.0,0.0
+2016-02-25,126574,3.0,1.0,0.0
+2016-02-26,119158,4.0,1.0,0.0
+2016-02-27,81761,5.0,1.0,0.0
+2016-02-28,86421,6.0,1.0,0.0
+2016-02-29,125898,0.0,1.0,0.0
+2016-03-01,128020,1.0,2.0,0.0
+2016-03-02,130518,2.0,2.0,0.0
+2016-03-03,129859,3.0,2.0,0.0
+2016-03-04,121636,4.0,2.0,0.0
+2016-03-05,83814,5.0,2.0,0.0
+2016-03-06,86859,6.0,2.0,0.0
+2016-03-07,127229,0.0,2.0,0.0
+2016-03-08,129281,1.0,2.0,0.0
+2016-03-09,131505,2.0,2.0,0.0
+2016-03-10,126847,3.0,2.0,0.0
+2016-03-11,121670,4.0,2.0,0.0
+2016-03-12,82209,5.0,2.0,0.0
+2016-03-13,87358,6.0,2.0,0.0
+2016-03-14,129607,0.0,2.0,0.0
+2016-03-15,132397,1.0,2.0,0.0
+2016-03-16,132666,2.0,2.0,0.0
+2016-03-17,129579,3.0,2.0,0.0
+2016-03-18,120239,4.0,2.0,0.0
+2016-03-19,81427,5.0,2.0,0.0
+2016-03-20,86878,6.0,2.0,0.0
+2016-03-21,128245,0.0,2.0,0.0
+2016-03-22,130351,1.0,2.0,0.0
+2016-03-23,128611,2.0,2.0,0.0
+2016-03-24,122141,3.0,2.0,0.0
+2016-03-25,105815,4.0,2.0,0.0
+2016-03-26,78197,5.0,2.0,0.0
+2016-03-27,78675,6.0,2.0,0.0
+2016-03-28,116328,0.0,2.0,0.0
+2016-03-29,131001,1.0,2.0,0.0
+2016-03-30,133101,2.0,2.0,0.0
+2016-03-31,130283,3.0,2.0,0.0
+2016-04-01,119257,4.0,3.0,0.0
+2016-04-02,81281,5.0,3.0,0.0
+2016-04-03,87360,6.0,3.0,0.0
+2016-04-04,126389,0.0,3.0,0.0
+2016-04-05,133803,1.0,3.0,0.0
+2016-04-06,135934,2.0,3.0,0.0
+2016-04-07,134653,3.0,3.0,0.0
+2016-04-08,125221,4.0,3.0,0.0
+2016-04-09,85645,5.0,3.0,0.0
+2016-04-10,91857,6.0,3.0,0.0
+2016-04-11,136700,0.0,3.0,0.0
+2016-04-12,138801,1.0,3.0,0.0
+2016-04-13,137409,2.0,3.0,0.0
+2016-04-14,134651,3.0,3.0,0.0
+2016-04-15,125713,4.0,3.0,0.0
+2016-04-16,84789,5.0,3.0,0.0
+2016-04-17,90514,6.0,3.0,0.0
+2016-04-18,135770,0.0,3.0,0.0
+2016-04-19,140338,1.0,3.0,0.0
+2016-04-20,138994,2.0,3.0,0.0
+2016-04-21,134338,3.0,3.0,0.0
+2016-04-22,125713,4.0,3.0,0.0
+2016-04-23,85348,5.0,3.0,0.0
+2016-04-24,91963,6.0,3.0,0.0
+2016-04-25,135422,0.0,3.0,0.0
+2016-04-26,141059,1.0,3.0,0.0
+2016-04-27,138390,2.0,3.0,0.0
+2016-04-28,134493,3.0,3.0,0.0
+2016-04-29,123089,4.0,3.0,0.0
+2016-04-30,78081,5.0,3.0,0.0
+2016-05-01,80160,6.0,4.0,0.0
+2016-05-02,118508,0.0,4.0,0.0
+2016-05-03,131204,1.0,4.0,0.0
+2016-05-04,132146,2.0,4.0,0.0
+2016-05-05,123214,3.0,4.0,0.0
+2016-05-06,117566,4.0,4.0,0.0
+2016-05-07,78005,5.0,4.0,0.0
+2016-05-08,81871,6.0,4.0,0.0
+2016-05-09,127489,0.0,4.0,0.0
+2016-05-10,136121,1.0,4.0,0.0
+2016-05-11,135402,2.0,4.0,0.0
+2016-05-12,132926,3.0,4.0,0.0
+2016-05-13,123555,4.0,4.0,0.0
+2016-05-14,80533,5.0,4.0,0.0
+2016-05-15,84697,6.0,4.0,0.0
+2016-05-16,125306,0.0,4.0,0.0
+2016-05-17,135812,1.0,4.0,0.0
+2016-05-18,135197,2.0,4.0,0.0
+2016-05-19,131924,3.0,4.0,0.0
+2016-05-20,122504,4.0,4.0,0.0
+2016-05-21,79192,5.0,4.0,0.0
+2016-05-22,84851,6.0,4.0,0.0
+2016-05-23,127438,0.0,4.0,0.0
+2016-05-24,133972,1.0,4.0,0.0
+2016-05-25,131697,2.0,4.0,0.0
+2016-05-26,126174,3.0,4.0,0.0
+2016-05-27,117773,4.0,4.0,0.0
+2016-05-28,74793,5.0,4.0,0.0
+2016-05-29,79262,6.0,4.0,0.0
+2016-05-30,113390,0.0,4.0,1.0
+2016-05-31,129636,1.0,4.0,0.0
+2016-06-01,129838,2.0,5.0,0.0
+2016-06-02,127650,3.0,5.0,0.0
+2016-06-03,119107,4.0,5.0,0.0
+2016-06-04,76582,5.0,5.0,0.0
+2016-06-05,80829,6.0,5.0,0.0
+2016-06-06,123175,0.0,5.0,0.0
+2016-06-07,128655,1.0,5.0,0.0
+2016-06-08,126728,2.0,5.0,0.0
+2016-06-09,116963,3.0,5.0,0.0
+2016-06-10,108602,4.0,5.0,0.0
+2016-06-11,73541,5.0,5.0,0.0
+2016-06-12,82245,6.0,5.0,0.0
+2016-06-13,119977,0.0,5.0,0.0
+2016-06-14,125678,1.0,5.0,0.0
+2016-06-15,125977,2.0,5.0,0.0
+2016-06-16,122900,3.0,5.0,0.0
+2016-06-17,113905,4.0,5.0,0.0
+2016-06-18,71738,5.0,5.0,0.0
+2016-06-19,74376,6.0,5.0,0.0
+2016-06-20,93499,0.0,5.0,0.0
+2016-06-21,124257,1.0,5.0,0.0
+2016-06-22,122793,2.0,5.0,0.0
+2016-06-23,120902,3.0,5.0,0.0
+2016-06-24,108118,4.0,5.0,0.0
+2016-06-25,69170,5.0,5.0,0.0
+2016-06-26,72480,6.0,5.0,0.0
+2016-06-27,115501,0.0,5.0,0.0
+2016-06-28,121523,1.0,5.0,0.0
+2016-06-29,121456,2.0,5.0,0.0
+2016-06-30,119093,3.0,5.0,0.0
+2016-07-01,107813,4.0,6.0,0.0
+2016-07-02,66427,5.0,6.0,0.0
+2016-07-03,68168,6.0,6.0,0.0
+2016-07-04,101448,0.0,6.0,1.0
+2016-07-05,114130,1.0,6.0,0.0
+2016-07-06,118196,2.0,6.0,0.0
+2016-07-07,116360,3.0,6.0,0.0
+2016-07-08,109588,4.0,6.0,0.0
+2016-07-09,68949,5.0,6.0,0.0
+2016-07-10,71387,6.0,6.0,0.0
+2016-07-11,116802,0.0,6.0,0.0
+2016-07-12,119864,1.0,6.0,0.0
+2016-07-13,120468,2.0,6.0,0.0
+2016-07-14,117523,3.0,6.0,0.0
+2016-07-15,108681,4.0,6.0,0.0
+2016-07-16,67189,5.0,6.0,0.0
+2016-07-17,71085,6.0,6.0,0.0
+2016-07-18,116616,0.0,6.0,0.0
+2016-07-19,121000,1.0,6.0,0.0
+2016-07-20,119165,2.0,6.0,0.0
+2016-07-21,117941,3.0,6.0,0.0
+2016-07-22,110570,4.0,6.0,0.0
+2016-07-23,68398,5.0,6.0,0.0
+2016-07-24,71980,6.0,6.0,0.0
+2016-07-25,116361,0.0,6.0,0.0
+2016-07-26,120986,1.0,6.0,0.0
+2016-07-27,120932,2.0,6.0,0.0
+2016-07-28,118101,3.0,6.0,0.0
+2016-07-29,110240,4.0,6.0,0.0
+2016-07-30,69022,5.0,6.0,0.0
+2016-07-31,71959,6.0,6.0,0.0
+2016-08-01,114920,0.0,7.0,0.0
+2016-08-02,120783,1.0,7.0,0.0
+2016-08-03,119825,2.0,7.0,0.0
+2016-08-04,117712,3.0,7.0,0.0
+2016-08-05,109966,4.0,7.0,0.0
+2016-08-06,67755,5.0,7.0,0.0
+2016-08-07,70693,6.0,7.0,0.0
+2016-08-08,115440,0.0,7.0,0.0
+2016-08-09,118682,1.0,7.0,0.0
+2016-08-10,119555,2.0,7.0,0.0
+2016-08-11,117924,3.0,7.0,0.0
+2016-08-12,110083,4.0,7.0,0.0
+2016-08-13,68028,5.0,7.0,0.0
+2016-08-14,69705,6.0,7.0,0.0
+2016-08-15,109543,0.0,7.0,0.0
+2016-08-16,120896,1.0,7.0,0.0
+2016-08-17,121107,2.0,7.0,0.0
+2016-08-18,119516,3.0,7.0,0.0
+2016-08-19,112999,4.0,7.0,0.0
+2016-08-20,71603,5.0,7.0,0.0
+2016-08-21,74724,6.0,7.0,0.0
+2016-08-22,120374,0.0,7.0,0.0
+2016-08-23,125253,1.0,7.0,0.0
+2016-08-24,124546,2.0,7.0,0.0
+2016-08-25,123134,3.0,7.0,0.0
+2016-08-26,115443,4.0,7.0,0.0
+2016-08-27,73510,5.0,7.0,0.0
+2016-08-28,77456,6.0,7.0,0.0
+2016-08-29,122370,0.0,7.0,0.0
+2016-08-30,128081,1.0,7.0,0.0
+2016-08-31,127520,2.0,7.0,0.0
+2016-09-01,124829,3.0,8.0,0.0
+2016-09-02,115659,4.0,8.0,0.0
+2016-09-03,71772,5.0,8.0,0.0
+2016-09-04,76164,6.0,8.0,0.0
+2016-09-05,109751,0.0,8.0,1.0
+2016-09-06,127745,1.0,8.0,0.0
+2016-09-07,128145,2.0,8.0,0.0
+2016-09-08,127996,3.0,8.0,0.0
+2016-09-09,120314,4.0,8.0,0.0
+2016-09-10,77719,5.0,8.0,0.0
+2016-09-11,81649,6.0,8.0,0.0
+2016-09-12,127325,0.0,8.0,0.0
+2016-09-13,131451,1.0,8.0,0.0
+2016-09-14,128826,2.0,8.0,0.0
+2016-09-15,120041,3.0,8.0,0.0
+2016-09-16,113989,4.0,8.0,0.0
+2016-09-17,80862,5.0,8.0,0.0
+2016-09-18,91832,6.0,8.0,0.0
+2016-09-19,131871,0.0,8.0,0.0
+2016-09-20,138590,1.0,8.0,0.0
+2016-09-21,138146,2.0,8.0,0.0
+2016-09-22,136479,3.0,8.0,0.0
+2016-09-23,127803,4.0,8.0,0.0
+2016-09-24,81861,5.0,8.0,0.0
+2016-09-25,86861,6.0,8.0,0.0
+2016-09-26,137176,0.0,8.0,0.0
+2016-09-27,139433,1.0,8.0,0.0
+2016-09-28,140373,2.0,8.0,0.0
+2016-09-29,138011,3.0,8.0,0.0
+2016-09-30,127044,4.0,8.0,0.0
+2016-10-01,78726,5.0,9.0,0.0
+2016-10-02,82758,6.0,9.0,0.0
+2016-10-03,125866,0.0,9.0,0.0
+2016-10-04,132182,1.0,9.0,0.0
+2016-10-05,131995,2.0,9.0,0.0
+2016-10-06,132759,3.0,9.0,0.0
+2016-10-07,124588,4.0,9.0,0.0
+2016-10-08,90358,5.0,9.0,0.0
+2016-10-09,96542,6.0,9.0,0.0
+2016-10-10,135850,0.0,9.0,1.0
+2016-10-11,144073,1.0,9.0,0.0
+2016-10-12,143248,2.0,9.0,0.0
+2016-10-13,144176,3.0,9.0,0.0
+2016-10-14,134423,4.0,9.0,0.0
+2016-10-15,88312,5.0,9.0,0.0
+2016-10-16,94694,6.0,9.0,0.0
+2016-10-17,140981,0.0,9.0,0.0
+2016-10-18,150758,1.0,9.0,0.0
+2016-10-19,148760,2.0,9.0,0.0
+2016-10-20,145021,3.0,9.0,0.0
+2016-10-21,123991,4.0,9.0,0.0
+2016-10-22,90117,5.0,9.0,0.0
+2016-10-23,95498,6.0,9.0,0.0
+2016-10-24,146136,0.0,9.0,0.0
+2016-10-25,150283,1.0,9.0,0.0
+2016-10-26,149086,2.0,9.0,0.0
+2016-10-27,146600,3.0,9.0,0.0
+2016-10-28,134101,4.0,9.0,0.0
+2016-10-29,85873,5.0,9.0,0.0
+2016-10-30,91905,6.0,9.0,0.0
+2016-10-31,141022,0.0,9.0,0.0
+2016-11-01,142467,1.0,10.0,0.0
+2016-11-02,148404,2.0,10.0,0.0
+2016-11-03,149540,3.0,10.0,0.0
+2016-11-04,138040,4.0,10.0,0.0
+2016-11-05,93128,5.0,10.0,0.0
+2016-11-06,99820,6.0,10.0,0.0
+2016-11-07,150788,0.0,10.0,0.0
+2016-11-08,150053,1.0,10.0,0.0
+2016-11-09,140674,2.0,10.0,0.0
+2016-11-10,146301,3.0,10.0,0.0
+2016-11-11,132609,4.0,10.0,1.0
+2016-11-12,93843,5.0,10.0,0.0
+2016-11-13,100633,6.0,10.0,0.0
+2016-11-14,150935,0.0,10.0,0.0
+2016-11-15,156066,1.0,10.0,0.0
+2016-11-16,156273,2.0,10.0,0.0
+2016-11-17,154473,3.0,10.0,0.0
+2016-11-18,144040,4.0,10.0,0.0
+2016-11-19,95853,5.0,10.0,0.0
+2016-11-20,103220,6.0,10.0,0.0
+2016-11-21,154232,0.0,10.0,0.0
+2016-11-22,156131,1.0,10.0,0.0
+2016-11-23,149146,2.0,10.0,0.0
+2016-11-24,133080,3.0,10.0,1.0
+2016-11-25,120535,4.0,10.0,0.0
+2016-11-26,90022,5.0,10.0,0.0
+2016-11-27,100373,6.0,10.0,0.0
+2016-11-28,154971,0.0,10.0,0.0
+2016-11-29,161691,1.0,10.0,0.0
+2016-11-30,159450,2.0,10.0,0.0
+2016-12-01,157196,3.0,11.0,0.0
+2016-12-02,147743,4.0,11.0,0.0
+2016-12-03,98102,5.0,11.0,0.0
+2016-12-04,104400,6.0,11.0,0.0
+2016-12-05,156268,0.0,11.0,0.0
+2016-12-06,158169,1.0,11.0,0.0
+2016-12-07,158758,2.0,11.0,0.0
+2016-12-08,152258,3.0,11.0,0.0
+2016-12-09,142222,4.0,11.0,0.0
+2016-12-10,95665,5.0,11.0,0.0
+2016-12-11,100707,6.0,11.0,0.0
+2016-12-12,148783,0.0,11.0,0.0
+2016-12-13,152591,1.0,11.0,0.0
+2016-12-14,149908,2.0,11.0,0.0
+2016-12-15,145085,3.0,11.0,0.0
+2016-12-16,131580,4.0,11.0,0.0
+2016-12-17,84443,5.0,11.0,0.0
+2016-12-18,88845,6.0,11.0,0.0
+2016-12-19,134794,0.0,11.0,0.0
+2016-12-20,136427,1.0,11.0,0.0
+2016-12-21,131770,2.0,11.0,0.0
+2016-12-22,124751,3.0,11.0,0.0
+2016-12-23,105776,4.0,11.0,0.0
+2016-12-24,66740,5.0,11.0,0.0
+2016-12-25,60535,6.0,11.0,0.0
+2016-12-26,86775,0.0,11.0,1.0
+2016-12-27,102574,1.0,11.0,0.0
+2016-12-28,106393,2.0,11.0,0.0
+2016-12-29,105158,3.0,11.0,0.0
+2016-12-30,98098,4.0,11.0,0.0
+2016-12-31,64696,5.0,11.0,0.0
+2017-01-01,59005,6.0,0.0,0.0
+2017-01-02,95818,0.0,0.0,1.0
+2017-01-03,127728,1.0,0.0,0.0
+2017-01-04,133210,2.0,0.0,0.0
+2017-01-05,128376,3.0,0.0,0.0
+2017-01-06,125230,4.0,0.0,0.0
+2017-01-07,71521,5.0,0.0,0.0
+2017-01-08,94736,6.0,0.0,0.0
+2017-01-09,140861,0.0,0.0,0.0
+2017-01-10,145521,1.0,0.0,0.0
+2017-01-11,145604,2.0,0.0,0.0
+2017-01-12,144985,3.0,0.0,0.0
+2017-01-13,135657,4.0,0.0,0.0
+2017-01-14,91791,5.0,0.0,0.0
+2017-01-15,97570,6.0,0.0,0.0
+2017-01-16,140046,0.0,0.0,1.0
+2017-01-17,151455,1.0,0.0,0.0
+2017-01-18,151122,2.0,0.0,0.0
+2017-01-19,149733,3.0,0.0,0.0
+2017-01-20,140506,4.0,0.0,0.0
+2017-01-21,97774,5.0,0.0,0.0
+2017-01-22,106965,6.0,0.0,0.0
+2017-01-23,147843,0.0,0.0,0.0
+2017-01-24,149039,1.0,0.0,0.0
+2017-01-25,144802,2.0,0.0,0.0
+2017-01-26,138288,3.0,0.0,0.0
+2017-01-27,127738,4.0,0.0,0.0
+2017-01-28,88164,5.0,0.0,0.0
+2017-01-29,92052,6.0,0.0,0.0
+2017-01-30,137919,0.0,0.0,0.0
+2017-01-31,143069,1.0,0.0,0.0
+2017-02-01,143529,2.0,1.0,0.0
+2017-02-02,145011,3.0,1.0,0.0
+2017-02-03,139875,4.0,1.0,0.0
+2017-02-04,101218,5.0,1.0,0.0
+2017-02-05,104585,6.0,1.0,0.0
+2017-02-06,152808,0.0,1.0,0.0
+2017-02-07,161273,1.0,1.0,0.0
+2017-02-08,162144,2.0,1.0,0.0
+2017-02-09,159440,3.0,1.0,0.0
+2017-02-10,149755,4.0,1.0,0.0
+2017-02-11,100746,5.0,1.0,0.0
+2017-02-12,106434,6.0,1.0,0.0
+2017-02-13,160474,0.0,1.0,0.0
+2017-02-14,159982,1.0,1.0,0.0
+2017-02-15,161897,2.0,1.0,0.0
+2017-02-16,164364,3.0,1.0,0.0
+2017-02-17,153956,4.0,1.0,0.0
+2017-02-18,104661,5.0,1.0,0.0
+2017-02-19,109589,6.0,1.0,0.0
+2017-02-20,158043,0.0,1.0,1.0
+2017-02-21,170265,1.0,1.0,0.0
+2017-02-22,170559,2.0,1.0,0.0
+2017-02-23,163711,3.0,1.0,0.0
+2017-02-24,154537,4.0,1.0,0.0
+2017-02-25,106039,5.0,1.0,0.0
+2017-02-26,111816,6.0,1.0,0.0
+2017-02-27,163119,0.0,1.0,0.0
+2017-02-28,165643,1.0,1.0,0.0
+2017-03-01,167480,2.0,2.0,0.0
+2017-03-02,168730,3.0,2.0,0.0
+2017-03-03,158171,4.0,2.0,0.0
+2017-03-04,106739,5.0,2.0,0.0
+2017-03-05,114464,6.0,2.0,0.0
+2017-03-06,169538,0.0,2.0,0.0
+2017-03-07,173736,1.0,2.0,0.0
+2017-03-08,168734,2.0,2.0,0.0
+2017-03-09,171452,3.0,2.0,0.0
+2017-03-10,159470,4.0,2.0,0.0
+2017-03-11,107371,5.0,2.0,0.0
+2017-03-12,114907,6.0,2.0,0.0
+2017-03-13,170043,0.0,2.0,0.0
+2017-03-14,174748,1.0,2.0,0.0
+2017-03-15,171274,2.0,2.0,0.0
+2017-03-16,172067,3.0,2.0,0.0
+2017-03-17,159312,4.0,2.0,0.0
+2017-03-18,107141,5.0,2.0,0.0
+2017-03-19,116705,6.0,2.0,0.0
+2017-03-20,173053,0.0,2.0,0.0
+2017-03-21,179270,1.0,2.0,0.0
+2017-03-22,178776,2.0,2.0,0.0
+2017-03-23,175353,3.0,2.0,0.0
+2017-03-24,155802,4.0,2.0,0.0
+2017-03-25,107862,5.0,2.0,0.0
+2017-03-26,114867,6.0,2.0,0.0
+2017-03-27,174989,0.0,2.0,0.0
+2017-03-28,177936,1.0,2.0,0.0
+2017-03-29,177053,2.0,2.0,0.0
+2017-03-30,174951,3.0,2.0,0.0
+2017-03-31,161692,4.0,2.0,0.0
+2017-04-01,111982,5.0,3.0,0.0
+2017-04-02,109185,6.0,3.0,0.0
+2017-04-03,159117,0.0,3.0,0.0
+2017-04-04,162855,1.0,3.0,0.0
+2017-04-05,176611,2.0,3.0,0.0
+2017-04-06,174519,3.0,3.0,0.0
+2017-04-07,161085,4.0,3.0,0.0
+2017-04-08,106383,5.0,3.0,0.0
+2017-04-09,112315,6.0,3.0,0.0
+2017-04-10,169584,0.0,3.0,0.0
+2017-04-11,171826,1.0,3.0,0.0
+2017-04-12,168847,2.0,3.0,0.0
+2017-04-13,160786,3.0,3.0,0.0
+2017-04-14,137040,4.0,3.0,0.0
+2017-04-15,100190,5.0,3.0,0.0
+2017-04-16,100898,6.0,3.0,0.0
+2017-04-17,152066,0.0,3.0,0.0
+2017-04-18,174171,1.0,3.0,0.0
+2017-04-19,175620,2.0,3.0,0.0
+2017-04-20,173856,3.0,3.0,0.0
+2017-04-21,160574,4.0,3.0,0.0
+2017-04-22,110084,5.0,3.0,0.0
+2017-04-23,117159,6.0,3.0,0.0
+2017-04-24,174875,0.0,3.0,0.0
+2017-04-25,179750,1.0,3.0,0.0
+2017-04-26,179115,2.0,3.0,0.0
+2017-04-27,172230,3.0,3.0,0.0
+2017-04-28,157630,4.0,3.0,0.0
+2017-04-29,99513,5.0,3.0,0.0
+2017-04-30,100849,6.0,3.0,0.0
+2017-05-01,137413,0.0,4.0,0.0
+2017-05-02,169970,1.0,4.0,0.0
+2017-05-03,173007,2.0,4.0,0.0
+2017-05-04,171814,3.0,4.0,0.0
+2017-05-05,158556,4.0,4.0,0.0
+2017-05-06,104891,5.0,4.0,0.0
+2017-05-07,111184,6.0,4.0,0.0
+2017-05-08,167207,0.0,4.0,0.0
+2017-05-09,174139,1.0,4.0,0.0
+2017-05-10,173376,2.0,4.0,0.0
+2017-05-11,170399,3.0,4.0,0.0
+2017-05-12,159003,4.0,4.0,0.0
+2017-05-13,104441,5.0,4.0,0.0
+2017-05-14,108658,6.0,4.0,0.0
+2017-05-15,169555,0.0,4.0,0.0
+2017-05-16,174468,1.0,4.0,0.0
+2017-05-17,172630,2.0,4.0,0.0
+2017-05-18,168885,3.0,4.0,0.0
+2017-05-19,158328,4.0,4.0,0.0
+2017-05-20,101883,5.0,4.0,0.0
+2017-05-21,108279,6.0,4.0,0.0
+2017-05-22,167274,0.0,4.0,0.0
+2017-05-23,173357,1.0,4.0,0.0
+2017-05-24,170350,2.0,4.0,0.0
+2017-05-25,157737,3.0,4.0,0.0
+2017-05-26,150028,4.0,4.0,0.0
+2017-05-27,103856,5.0,4.0,0.0
+2017-05-28,99612,6.0,4.0,0.0
+2017-05-29,138303,0.0,4.0,1.0
+2017-05-30,159403,1.0,4.0,0.0
+2017-05-31,167107,2.0,4.0,0.0
+2017-06-01,165586,3.0,5.0,0.0
+2017-06-02,154671,4.0,5.0,0.0
+2017-06-03,99082,5.0,5.0,0.0
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/helper.py b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/helper.py
new file mode 100644
index 000000000..5b78e0ba4
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/helper.py
@@ -0,0 +1,183 @@
+import pandas as pd
+from azureml.core import Environment
+from azureml.core.conda_dependencies import CondaDependencies
+from azureml.train.estimator import Estimator
+from azureml.core.run import Run
+from azureml.automl.core.shared import constants
+
+
+def split_fraction_by_grain(df, fraction, time_column_name, grain_column_names=None):
+    """Group df by grain and split off the last `fraction` of rows in each group."""
+    if not grain_column_names:
+        # No grain columns given: treat the whole frame as a single series.
+        df["tmp_grain_column"] = "grain"
+        grain_column_names = ["tmp_grain_column"]
+
+    df_grouped = df.sort_values(time_column_name).groupby(
+        grain_column_names, group_keys=False
+    )
+
+ df_head = df_grouped.apply(
+ lambda dfg: dfg.iloc[: -int(len(dfg) * fraction)] if fraction > 0 else dfg
+ )
+
+ df_tail = df_grouped.apply(
+ lambda dfg: dfg.iloc[-int(len(dfg) * fraction) :] if fraction > 0 else dfg[:0]
+ )
+
+ if "tmp_grain_column" in grain_column_names:
+ for df2 in (df, df_head, df_tail):
+ df2.drop("tmp_grain_column", axis=1, inplace=True)
+
+ grain_column_names.remove("tmp_grain_column")
+
+ return df_head, df_tail
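+
+# Example (illustrative values, not from the sample dataset): for a 10-row
+# daily series with fraction=0.2 and no grain columns, df_head keeps the
+# first 8 rows and df_tail the last 2:
+#
+#     df = pd.DataFrame({"date": pd.date_range("2017-01-01", periods=10),
+#                        "count": range(10)})
+#     df_head, df_tail = split_fraction_by_grain(df, 0.2, "date")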
+
+
+def split_full_for_forecasting(
+ df, time_column_name, grain_column_names=None, test_split=0.2
+):
+ index_name = df.index.name
+
+ # Assumes that there isn't already a column called tmpindex
+
+ df["tmpindex"] = df.index
+
+ train_df, test_df = split_fraction_by_grain(
+ df, test_split, time_column_name, grain_column_names
+ )
+
+ train_df = train_df.set_index("tmpindex")
+ train_df.index.name = index_name
+
+ test_df = test_df.set_index("tmpindex")
+ test_df.index.name = index_name
+
+ df.drop("tmpindex", axis=1, inplace=True)
+
+ return train_df, test_df
+
+
+def get_result_df(remote_run):
+ children = list(remote_run.get_children(recursive=True))
+ summary_df = pd.DataFrame(
+ index=["run_id", "run_algorithm", "primary_metric", "Score"]
+ )
+ goal_minimize = False
+ for run in children:
+ if (
+ run.get_status().lower() == constants.RunState.COMPLETE_RUN
+ and "run_algorithm" in run.properties
+ and "score" in run.properties
+ ):
+            # Only count completed child runs.
+ summary_df[run.id] = [
+ run.id,
+ run.properties["run_algorithm"],
+ run.properties["primary_metric"],
+ float(run.properties["score"]),
+ ]
+ if "goal" in run.properties:
+ goal_minimize = run.properties["goal"].split("_")[-1] == "min"
+
+ summary_df = summary_df.T.sort_values(
+ "Score", ascending=goal_minimize
+ ).drop_duplicates(["run_algorithm"])
+ summary_df = summary_df.set_index("run_algorithm")
+ return summary_df
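+
+# get_result_df returns one row per algorithm, indexed by run_algorithm and
+# keeping the best-scoring child run for each -- e.g. (values illustrative):
+#
+#                run_id            primary_metric     Score
+# run_algorithm
+# TCNForecaster  AutoML_<guid>_42  normalized_rmse    0.054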
+
+
+def run_inference(
+ test_experiment,
+ compute_target,
+ script_folder,
+ train_run,
+ test_dataset,
+ lookback_dataset,
+ max_horizon,
+ target_column_name,
+ time_column_name,
+ freq,
+):
+ model_base_name = "model.pkl"
+ if "model_data_location" in train_run.properties:
+ model_location = train_run.properties["model_data_location"]
+ _, model_base_name = model_location.rsplit("/", 1)
+ train_run.download_file(
+ "outputs/{}".format(model_base_name), "inference/{}".format(model_base_name)
+ )
+ train_run.download_file("outputs/conda_env_v_1_0_0.yml", "inference/condafile.yml")
+
+ inference_env = Environment("myenv")
+ inference_env.docker.enabled = True
+ inference_env.python.conda_dependencies = CondaDependencies(
+ conda_dependencies_file_path="inference/condafile.yml"
+ )
+
+ est = Estimator(
+ source_directory=script_folder,
+ entry_script="infer.py",
+ script_params={
+ "--max_horizon": max_horizon,
+ "--target_column_name": target_column_name,
+ "--time_column_name": time_column_name,
+ "--frequency": freq,
+ "--model_path": model_base_name,
+ },
+ inputs=[
+ test_dataset.as_named_input("test_data"),
+ lookback_dataset.as_named_input("lookback_data"),
+ ],
+ compute_target=compute_target,
+ environment_definition=inference_env,
+ )
+
+ run = test_experiment.submit(
+ est,
+ tags={
+ "training_run_id": train_run.id,
+ "run_algorithm": train_run.properties["run_algorithm"],
+ "valid_score": train_run.properties["score"],
+ "primary_metric": train_run.properties["primary_metric"],
+ },
+ )
+
+ run.log("run_algorithm", run.tags["run_algorithm"])
+ return run
+
+
+def run_multiple_inferences(
+ summary_df,
+ train_experiment,
+ test_experiment,
+ compute_target,
+ script_folder,
+ test_dataset,
+ lookback_dataset,
+ max_horizon,
+ target_column_name,
+ time_column_name,
+ freq,
+):
+ for run_name, run_summary in summary_df.iterrows():
+ print(run_name)
+ print(run_summary)
+ run_id = run_summary.run_id
+ train_run = Run(train_experiment, run_id)
+
+ test_run = run_inference(
+ test_experiment,
+ compute_target,
+ script_folder,
+ train_run,
+ test_dataset,
+ lookback_dataset,
+ max_horizon,
+ target_column_name,
+ time_column_name,
+ freq,
+ )
+
+ print(test_run)
+ summary_df.loc[summary_df.run_id == run_id, "test_run_id"] = test_run.id
+
+ return summary_df
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/infer.py b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/infer.py
new file mode 100644
index 000000000..7b2f1eee4
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-github-dau/infer.py
@@ -0,0 +1,386 @@
+import argparse
+import os
+
+import numpy as np
+import pandas as pd
+
+from pandas.tseries.frequencies import to_offset
+from sklearn.externals import joblib
+from sklearn.metrics import mean_absolute_error, mean_squared_error
+
+from azureml.automl.runtime.shared.score import scoring, constants
+from azureml.core import Run
+
+try:
+ import torch
+
+ _torch_present = True
+except ImportError:
+ _torch_present = False
+
+
+def align_outputs(
+ y_predicted,
+ X_trans,
+ X_test,
+ y_test,
+ predicted_column_name="predicted",
+ horizon_colname="horizon_origin",
+):
+ """
+ Demonstrates how to get the output aligned to the inputs
+ using pandas indexes. Helps understand what happened if
+ the output's shape differs from the input shape, or if
+ the data got re-sorted by time and grain during forecasting.
+
+ Typical causes of misalignment are:
+ * we predicted some periods that were missing in actuals -> drop from eval
+ * model was asked to predict past max_horizon -> increase max horizon
+ * data at start of X_test was needed for lags -> provide previous periods
+ """
+ if horizon_colname in X_trans:
+ df_fcst = pd.DataFrame(
+ {
+ predicted_column_name: y_predicted,
+ horizon_colname: X_trans[horizon_colname],
+ }
+ )
+ else:
+ df_fcst = pd.DataFrame({predicted_column_name: y_predicted})
+
+ # y and X outputs are aligned by forecast() function contract
+ df_fcst.index = X_trans.index
+
+ # align original X_test to y_test
+ X_test_full = X_test.copy()
+ X_test_full[target_column_name] = y_test
+
+ # X_test_full's index does not include origin, so reset for merge
+ df_fcst.reset_index(inplace=True)
+ X_test_full = X_test_full.reset_index().drop(columns="index")
+ together = df_fcst.merge(X_test_full, how="right")
+
+ # drop rows where prediction or actuals are nan
+ # happens because of missing actuals
+ # or at edges of time due to lags/rolling windows
+ clean = together[
+ together[[target_column_name, predicted_column_name]].notnull().all(axis=1)
+ ]
+ return clean
+
+
+def do_rolling_forecast_with_lookback(
+ fitted_model, X_test, y_test, max_horizon, X_lookback, y_lookback, freq="D"
+):
+ """
+ Produce forecasts on a rolling origin over the given test set.
+
+ Each iteration makes a forecast for the next 'max_horizon' periods
+ with respect to the current origin, then advances the origin by the
+ horizon time duration. The prediction context for each forecast is set so
+ that the forecaster uses the actual target values prior to the current
+ origin time for constructing lag features.
+
+ This function returns a concatenated DataFrame of rolling forecasts.
+ """
+ print("Using lookback of size: ", y_lookback.size)
+ df_list = []
+ origin_time = X_test[time_column_name].min()
+    X = pd.concat([X_lookback, X_test])
+ y = np.concatenate((y_lookback, y_test), axis=0)
+ while origin_time <= X_test[time_column_name].max():
+ # Set the horizon time - end date of the forecast
+ horizon_time = origin_time + max_horizon * to_offset(freq)
+
+ # Extract test data from an expanding window up-to the horizon
+ expand_wind = X[time_column_name] < horizon_time
+ X_test_expand = X[expand_wind]
+        y_query_expand = np.zeros(len(X_test_expand)).astype(float)
+        y_query_expand.fill(np.nan)
+
+ if origin_time != X[time_column_name].min():
+ # Set the context by including actuals up-to the origin time
+ test_context_expand_wind = X[time_column_name] < origin_time
+ context_expand_wind = X_test_expand[time_column_name] < origin_time
+ y_query_expand[context_expand_wind] = y[test_context_expand_wind]
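+        # y_query_expand now holds actual values before origin_time and NaN
+        # afterwards; the forecaster treats the NaN entries as the periods to
+        # predict and the non-NaN entries as known context for lag features.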
+
+ # Print some debug info
+ print(
+ "Horizon_time:",
+ horizon_time,
+ " origin_time: ",
+ origin_time,
+ " max_horizon: ",
+ max_horizon,
+ " freq: ",
+ freq,
+ )
+ print("expand_wind: ", expand_wind)
+ print("y_query_expand")
+ print(y_query_expand)
+ print("X_test")
+ print(X)
+ print("X_test_expand")
+ print(X_test_expand)
+ print("Type of X_test_expand: ", type(X_test_expand))
+ print("Type of y_query_expand: ", type(y_query_expand))
+
+        # Make a forecast out to the maximum horizon
+        y_fcst, X_trans = fitted_model.forecast(X_test_expand, y_query_expand)
+
+ print("y_fcst")
+ print(y_fcst)
+
+ # Align forecast with test set for dates within
+ # the current rolling window
+ trans_tindex = X_trans.index.get_level_values(time_column_name)
+ trans_roll_wind = (trans_tindex >= origin_time) & (trans_tindex < horizon_time)
+ test_roll_wind = expand_wind & (X[time_column_name] >= origin_time)
+ df_list.append(
+ align_outputs(
+ y_fcst[trans_roll_wind],
+ X_trans[trans_roll_wind],
+ X[test_roll_wind],
+ y[test_roll_wind],
+ )
+ )
+
+ # Advance the origin time
+ origin_time = horizon_time
+
+ return pd.concat(df_list, ignore_index=True)
+
+
+def do_rolling_forecast(fitted_model, X_test, y_test, max_horizon, freq="D"):
+ """
+ Produce forecasts on a rolling origin over the given test set.
+
+ Each iteration makes a forecast for the next 'max_horizon' periods
+ with respect to the current origin, then advances the origin by the
+ horizon time duration. The prediction context for each forecast is set so
+ that the forecaster uses the actual target values prior to the current
+ origin time for constructing lag features.
+
+ This function returns a concatenated DataFrame of rolling forecasts.
+ """
+ df_list = []
+ origin_time = X_test[time_column_name].min()
+ while origin_time <= X_test[time_column_name].max():
+ # Set the horizon time - end date of the forecast
+ horizon_time = origin_time + max_horizon * to_offset(freq)
+
+ # Extract test data from an expanding window up-to the horizon
+ expand_wind = X_test[time_column_name] < horizon_time
+ X_test_expand = X_test[expand_wind]
+        y_query_expand = np.zeros(len(X_test_expand)).astype(float)
+        y_query_expand.fill(np.nan)
+
+ if origin_time != X_test[time_column_name].min():
+ # Set the context by including actuals up-to the origin time
+ test_context_expand_wind = X_test[time_column_name] < origin_time
+ context_expand_wind = X_test_expand[time_column_name] < origin_time
+ y_query_expand[context_expand_wind] = y_test[test_context_expand_wind]
+
+ # Print some debug info
+ print(
+ "Horizon_time:",
+ horizon_time,
+ " origin_time: ",
+ origin_time,
+ " max_horizon: ",
+ max_horizon,
+ " freq: ",
+ freq,
+ )
+ print("expand_wind: ", expand_wind)
+ print("y_query_expand")
+ print(y_query_expand)
+ print("X_test")
+ print(X_test)
+ print("X_test_expand")
+ print(X_test_expand)
+ print("Type of X_test_expand: ", type(X_test_expand))
+ print("Type of y_query_expand: ", type(y_query_expand))
+ print("y_query_expand")
+ print(y_query_expand)
+
+ # Make a forecast out to the maximum horizon
+ y_fcst, X_trans = fitted_model.forecast(X_test_expand, y_query_expand)
+
+ print("y_fcst")
+ print(y_fcst)
+
+ # Align forecast with test set for dates within the
+ # current rolling window
+ trans_tindex = X_trans.index.get_level_values(time_column_name)
+ trans_roll_wind = (trans_tindex >= origin_time) & (trans_tindex < horizon_time)
+ test_roll_wind = expand_wind & (X_test[time_column_name] >= origin_time)
+ df_list.append(
+ align_outputs(
+ y_fcst[trans_roll_wind],
+ X_trans[trans_roll_wind],
+ X_test[test_roll_wind],
+ y_test[test_roll_wind],
+ )
+ )
+
+ # Advance the origin time
+ origin_time = horizon_time
+
+ return pd.concat(df_list, ignore_index=True)
+
+
+def APE(actual, pred):
+ """
+ Calculate absolute percentage error.
+ Returns a vector of APE values with same length as actual/pred.
+ """
+ return 100 * np.abs((actual - pred) / actual)
+
+
+def MAPE(actual, pred):
+ """
+ Calculate mean absolute percentage error.
+    Removes NA values and entries where actual is close to zero.
+ """
+ not_na = ~(np.isnan(actual) | np.isnan(pred))
+ not_zero = ~np.isclose(actual, 0.0)
+ actual_safe = actual[not_na & not_zero]
+ pred_safe = pred[not_na & not_zero]
+ return np.mean(APE(actual_safe, pred_safe))
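+
+# Worked example (illustrative): actual = [100, 200], pred = [110, 190]
+# gives APE = [10.0, 5.0] and MAPE = 7.5 (in percent).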
+
+
+def map_location_cuda(storage, loc):
+ return storage.cuda()
+
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+ "--max_horizon",
+ type=int,
+ dest="max_horizon",
+ default=10,
+ help="Max Horizon for forecasting",
+)
+parser.add_argument(
+ "--target_column_name",
+ type=str,
+ dest="target_column_name",
+ help="Target Column Name",
+)
+parser.add_argument(
+ "--time_column_name", type=str, dest="time_column_name", help="Time Column Name"
+)
+parser.add_argument(
+ "--frequency", type=str, dest="freq", help="Frequency of prediction"
+)
+parser.add_argument(
+ "--model_path",
+ type=str,
+ dest="model_path",
+ default="model.pkl",
+ help="Filename of model to be loaded",
+)
+
+args = parser.parse_args()
+max_horizon = args.max_horizon
+target_column_name = args.target_column_name
+time_column_name = args.time_column_name
+freq = args.freq
+model_path = args.model_path
+
+print("args passed are: ")
+print(max_horizon)
+print(target_column_name)
+print(time_column_name)
+print(freq)
+print(model_path)
+
+run = Run.get_context()
+# get input dataset by name
+test_dataset = run.input_datasets["test_data"]
+lookback_dataset = run.input_datasets["lookback_data"]
+
+grain_column_names = []
+
+df = test_dataset.to_pandas_dataframe()
+
+print("Read df")
+print(df)
+
+X_test_df = test_dataset.drop_columns(columns=[target_column_name])
+y_test_df = test_dataset.with_timestamp_columns(None).keep_columns(
+ columns=[target_column_name]
+)
+
+X_lookback_df = lookback_dataset.drop_columns(columns=[target_column_name])
+y_lookback_df = lookback_dataset.with_timestamp_columns(None).keep_columns(
+ columns=[target_column_name]
+)
+
+_, ext = os.path.splitext(model_path)
+if ext == ".pt":
+ # Load the fc-tcn torch model.
+    assert _torch_present, "Loading a .pt model requires torch to be installed."
+ if torch.cuda.is_available():
+ map_location = map_location_cuda
+ else:
+ map_location = "cpu"
+ with open(model_path, "rb") as fh:
+ fitted_model = torch.load(fh, map_location=map_location)
+else:
+ # Load the sklearn pipeline.
+ fitted_model = joblib.load(model_path)
+
+if hasattr(fitted_model, "get_lookback"):
+ lookback = fitted_model.get_lookback()
+ df_all = do_rolling_forecast_with_lookback(
+ fitted_model,
+ X_test_df.to_pandas_dataframe(),
+ y_test_df.to_pandas_dataframe().values.T[0],
+ max_horizon,
+ X_lookback_df.to_pandas_dataframe()[-lookback:],
+ y_lookback_df.to_pandas_dataframe().values.T[0][-lookback:],
+ freq,
+ )
+else:
+ df_all = do_rolling_forecast(
+ fitted_model,
+ X_test_df.to_pandas_dataframe(),
+ y_test_df.to_pandas_dataframe().values.T[0],
+ max_horizon,
+ freq,
+ )
+
+print(df_all)
+
+print("target values:::")
+print(df_all[target_column_name])
+print("predicted values:::")
+print(df_all["predicted"])
+
+# Use the AutoML scoring module
+regression_metrics = list(constants.REGRESSION_SCALAR_SET)
+y_test = np.array(df_all[target_column_name])
+y_pred = np.array(df_all["predicted"])
+scores = scoring.score_regression(y_test, y_pred, regression_metrics)
+
+print("scores:")
+print(scores)
+
+for key, value in scores.items():
+ run.log(key, value)
+
+print("Simple forecasting model")
+rmse = np.sqrt(mean_squared_error(df_all[target_column_name], df_all["predicted"]))
+print("[Test Data] \nRoot Mean squared error: %.2f" % rmse)
+mae = mean_absolute_error(df_all[target_column_name], df_all["predicted"])
+print("mean_absolute_error score: %.2f" % mae)
+print("MAPE: %.2f" % MAPE(df_all[target_column_name], df_all["predicted"]))
+
+run.log("rmse", rmse)
+run.log("mae", mae)
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/README.md b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/README.md
new file mode 100644
index 000000000..735e348d4
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/README.md
@@ -0,0 +1,94 @@
+---
+page_type: sample
+languages:
+- python
+products:
+- azure-machine-learning
+description: Tutorial showing how to solve complex machine learning time series forecasting problems at scale by using Azure Automated ML and the hierarchical time series accelerator.
+---
+
+## Microsoft Solution Accelerator: Hierarchical Time Series Forecasting
+
+In most applications, customers need to understand their forecasts at both a macro and a micro level of the business. Whether predicting sales of products at different geographic locations or understanding the expected workforce demand for different organizations at a company, the ability to train a machine learning model to intelligently forecast on hierarchical data is essential.
+
+This business pattern is common across a wide variety of industries and applicable to many real world use cases. Below are some examples of where the hierarchical time series pattern is useful.
+
+| Industry | Scenario |
+|----------------|--------------------------------------------|
+| *Restaurant Chain* | Building demand forecasting models across thousands of restaurants and several countries. |
+| *Retail Organization* | Building workforce optimization models for thousands of stores. |
+| *Retail Organization*| Price optimization models for hundreds of thousands of products available. |
+
+
+### Technical Summary
+
+A hierarchical time series is a structure in which each of the unique series is arranged into a hierarchy based on dimensions such as geography or product type. The table below shows an example of data whose unique attributes form a hierarchy. Our hierarchy is defined by the `product type` (such as headphones or tablets), the `product category`, which splits product types into accessories and devices, and the `region` the products are sold in. The table shows the first entry of each unique series in the hierarchy.
+
+![data-table](./media/data-table.png)
+
+To further visualize this, the leaf levels of the hierarchy contain all the time series with unique combinations of attribute values. Each higher level in the hierarchy considers one fewer dimension for defining the time series and aggregates each set of `child nodes` from the lower level into a `parent node`.
+
+![hierachy-sample](./media/hierarchy-sample-ms.PNG)
+
+> **Note:** If no unique root level exists in the data, Automated Machine Learning will create a node `automl_top_level` so that users can train on or forecast totals.
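+
+To make the aggregation concrete, below is a minimal pandas sketch (column names follow the example above; the data values are illustrative, not from the sample dataset):
+
+```python
+import pandas as pd
+
+# Leaf level: one series per unique (region, product_category, product_type).
+leaf = pd.DataFrame(
+    {
+        "date": pd.to_datetime(["2020-01-01"] * 4),
+        "region": ["East", "West", "East", "West"],
+        "product_category": ["Accessory", "Accessory", "Device", "Device"],
+        "product_type": ["Headphones", "Headphones", "Tablets", "Tablets"],
+        "quantity": [10, 7, 4, 3],
+    }
+)
+
+# One level up: drop the `region` dimension and aggregate each set of
+# child nodes into its parent node by summing the target.
+parent = leaf.groupby(
+    ["date", "product_category", "product_type"], as_index=False
+)["quantity"].sum()
+print(parent)  # Headphones total -> 17, Tablets total -> 7
+```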
+
+## Prerequisites
+
+To use this solution accelerator, all you need is access to an [Azure subscription](https://azure.microsoft.com/free/) and an [Azure Machine Learning Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace) that you'll create below.
+
+A basic understanding of Azure Machine Learning and hierarchical time series concepts will be helpful for understanding the solution. The following resources can help introduce you to these concepts:
+
+1. [Azure Machine Learning Overview](https://azure.microsoft.com/services/machine-learning/)
+2. [Azure Machine Learning Tutorials](https://docs.microsoft.com/azure/machine-learning/tutorial-1st-experiment-sdk-setup)
+3. [Azure Machine Learning Sample Notebooks on Github](https://github.com/Azure/azureml-examples/)
+4. [Forecasting: Principles and Practice, Hierarchical time series](https://otexts.com/fpp2/hts.html)
+
+## Getting started
+
+### 1. Set up the Compute Instance
+Please create a [Compute Instance](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-instance#create) and clone the git repo to your workspace.
+
+### 2. Run the Notebook
+
+Once your environment is set up, go to JupyterLab and run the notebook auto-ml-forecasting-hierarchical-timeseries.ipynb on the Compute Instance you created. It runs through the steps outlined sequentially. By the end, you'll know how to train, score, and make predictions using the hierarchical time series model pattern on Azure Machine Learning.
+
+| Notebook | Description |
+|----------------|--------------------------------------------|
+| `auto-ml-forecasting-hierarchical-timeseries.ipynb`|Creates a pipeline to train machine learning models for the defined hierarchy and forecast at the desired hierarchy level using Automated ML. |
+
+
+![Work Flow](./media/workflow.PNG)
+
+## Key Concepts
+
+### Automated Machine Learning
+
+[Automated Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-automated-ml), also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.
+
+### Pipelines
+
+[Pipelines](https://docs.microsoft.com/azure/machine-learning/concept-ml-pipelines) allow you to create workflows in your machine learning projects. These workflows have a number of benefits, including speed, simplicity, repeatability, and modularity.
+
+### ParallelRunStep
+
+[ParallelRunStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.parallel_run_step.parallelrunstep?view=azure-ml-py) enables the parallel training of models and is commonly used for batch inferencing. This [document](https://docs.microsoft.com/azure/machine-learning/how-to-use-parallel-run-step) walks through some of the key concepts around ParallelRunStep.
+
+### Other Concepts
+
+In addition to ParallelRunStep, Pipelines, and Automated Machine Learning, you'll also be working with the following concepts: [workspace](https://docs.microsoft.com/azure/machine-learning/concept-workspace), [datasets](https://docs.microsoft.com/azure/machine-learning/concept-data#datasets), [compute targets](https://docs.microsoft.com/azure/machine-learning/concept-compute-target#train), [python script steps](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py), and [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/).
+
+## Contributing
+
+This project welcomes contributions and suggestions. To learn more visit the [contributing](CONTRIBUTING.md) section.
+
+Most contributions require you to agree to a Contributor License Agreement (CLA)
+declaring that you have the right to, and actually do, grant us
+the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
+
+When you submit a pull request, a CLA bot will automatically determine whether you need to provide
+a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
+provided by the bot. You will only need to do this once across all repos using our CLA.
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
+For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
+contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb
index e2ab133f9..ebbf8b04b 100644
--- a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.ipynb
@@ -1,639 +1,639 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Copyright (c) Microsoft Corporation. All rights reserved.\n",
- "\n",
- "Licensed under the MIT License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Hierarchical Time Series - Automated ML\n",
- "**_Generate hierarchical time series forecasts with Automated Machine Learning_**\n",
- "\n",
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For this notebook we are using a synthetic dataset portraying sales data to predict the the quantity of a vartiety of product skus across several states, stores, and product categories.\n",
- "\n",
- "**NOTE: There are limits on how many runs we can do in parallel per workspace, and we currently recommend to set the parallelism to maximum of 320 runs per experiment per workspace. If users want to have more parallelism and increase this limit they might encounter Too Many Requests errors (HTTP 429).**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Prerequisites\n",
- "You'll need to create a compute Instance by following the instructions in the [EnvironmentSetup.md](../Setup_Resources/EnvironmentSetup.md)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 1.0 Set up workspace, datastore, experiment"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613003526897
- }
- },
- "outputs": [],
- "source": [
- "import azureml.core\n",
- "from azureml.core import Workspace, Datastore\n",
- "import pandas as pd\n",
- "\n",
- "# Set up your workspace\n",
- "ws = Workspace.from_config()\n",
- "ws.get_details()\n",
- "\n",
- "# Set up your datastores\n",
- "dstore = ws.get_default_datastore()\n",
- "\n",
- "output = {}\n",
- "output[\"SDK version\"] = azureml.core.VERSION\n",
- "output[\"Subscription ID\"] = ws.subscription_id\n",
- "output[\"Workspace\"] = ws.name\n",
- "output[\"Resource Group\"] = ws.resource_group\n",
- "output[\"Location\"] = ws.location\n",
- "output[\"Default datastore name\"] = dstore.name\n",
- "pd.set_option(\"display.max_colwidth\", -1)\n",
- "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
- "outputDf.T"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Choose an experiment"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613003540729
- }
- },
- "outputs": [],
- "source": [
- "from azureml.core import Experiment\n",
- "\n",
- "experiment = Experiment(ws, \"automl-hts\")\n",
- "\n",
- "print(\"Experiment name: \" + experiment.name)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 2.0 Data\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "nteract": {
- "transient": {
- "deleting": false
- }
- }
- },
- "source": [
- "### Upload local csv files to datastore\n",
- "You can upload your train and inference csv files to the default datastore in your workspace. \n",
- "\n",
- "A Datastore is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target.\n",
- "Please refer to [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) documentation on how to access data from Datastore."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "datastore_path = \"hts-sample\""
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "datastore = ws.get_default_datastore()\n",
- "datastore"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Create the TabularDatasets \n",
- "\n",
- "Datasets in Azure Machine Learning are references to specific data in a Datastore. The data can be retrieved as a [TabularDatasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py). We will read in the data as a pandas DataFrame, upload to the data store and register them to your Workspace using ```register_pandas_dataframe``` so they can be called as an input into the training pipeline. We will use the inference dataset as part of the forecasting pipeline. The step need only be completed once."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613007017296
- }
- },
- "outputs": [],
- "source": [
- "from azureml.data.dataset_factory import TabularDatasetFactory\n",
- "\n",
- "registered_train = TabularDatasetFactory.register_pandas_dataframe(\n",
- " pd.read_csv(\"Data/hts-sample-train.csv\"),\n",
- " target=(datastore, \"hts-sample\"),\n",
- " name=\"hts-sales-train\",\n",
- ")\n",
- "registered_inference = TabularDatasetFactory.register_pandas_dataframe(\n",
- " pd.read_csv(\"Data/hts-sample-test.csv\"),\n",
- " target=(datastore, \"hts-sample\"),\n",
- " name=\"hts-sales-test\",\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3.0 Build the training pipeline\n",
- "Now that the dataset, WorkSpace, and datastore are set up, we can put together a pipeline for training.\n",
- "\n",
- "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Choose a compute target\n",
- "\n",
- "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
- "\n",
- "\\*\\*Creation of AmlCompute takes approximately 5 minutes.**\n",
- "\n",
- "If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613007037308
- }
- },
- "outputs": [],
- "source": [
- "from azureml.core.compute import ComputeTarget, AmlCompute\n",
- "\n",
- "# Name your cluster\n",
- "compute_name = \"hts-compute\"\n",
- "\n",
- "\n",
- "if compute_name in ws.compute_targets:\n",
- " compute_target = ws.compute_targets[compute_name]\n",
- " if compute_target and type(compute_target) is AmlCompute:\n",
- " print(\"Found compute target: \" + compute_name)\n",
- "else:\n",
- " print(\"Creating a new compute target...\")\n",
- " provisioning_config = AmlCompute.provisioning_configuration(\n",
- " vm_size=\"STANDARD_D16S_V3\", max_nodes=20\n",
- " )\n",
- " # Create the compute target\n",
- " compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
- "\n",
- " # Can poll for a minimum number of nodes and for a specific timeout.\n",
- " # If no min node count is provided it will use the scale settings for the cluster\n",
- " compute_target.wait_for_completion(\n",
- " show_output=True, min_node_count=None, timeout_in_minutes=20\n",
- " )\n",
- "\n",
- " # For a more detailed view of current cluster status, use the 'status' property\n",
- " print(compute_target.status.serialize())"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Set up training parameters\n",
- "\n",
- "This dictionary defines the AutoML and hierarchy settings. For this forecasting task we need to define several settings inncluding the name of the time column, the maximum forecast horizon, the hierarchy definition, and the level of the hierarchy at which to train.\n",
- "\n",
- "| Property | Description|\n",
- "| :--------------- | :------------------- |\n",
- "| **task** | forecasting |\n",
- "| **primary_metric** | This is the metric that you want to optimize.
Forecasting supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error |\n",
- "| **blocked_models** | Blocked models won't be used by AutoML. |\n",
- "| **iteration_timeout_minutes** | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
- "| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
- "| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
- "| **label_column_name** | The name of the label column. |\n",
- "| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
- "| **n_cross_validations** | Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
- "| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
- "| **time_column_name** | The name of your time column. |\n",
- "| **hierarchy_column_names** | The names of columns that define the hierarchical structure of the data from highest level to most granular. |\n",
- "| **training_level** | The level of the hierarchy to be used for training models. |\n",
- "| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
- "| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
- "| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
- "| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
- "| **model_explainability** | Flag to disable explaining the best automated ML model at the end of all training iterations. The default is True and will block non-explainable models which may impact the forecast accuracy. For more information, see [Interpretability: model explanations in automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl). |"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613007061544
- }
- },
- "outputs": [],
- "source": [
- "from azureml.train.automl.runtime._hts.hts_parameters import HTSTrainParameters\n",
- "\n",
- "model_explainability = True\n",
- "\n",
- "engineered_explanations = False\n",
- "# Define your hierarchy. Adjust the settings below based on your dataset.\n",
- "hierarchy = [\"state\", \"store_id\", \"product_category\", \"SKU\"]\n",
- "training_level = \"SKU\"\n",
- "\n",
- "# Set your forecast parameters. Adjust the settings below based on your dataset.\n",
- "time_column_name = \"date\"\n",
- "label_column_name = \"quantity\"\n",
- "forecast_horizon = 7\n",
- "\n",
- "\n",
- "automl_settings = {\n",
- " \"task\": \"forecasting\",\n",
- " \"primary_metric\": \"normalized_root_mean_squared_error\",\n",
- " \"label_column_name\": label_column_name,\n",
- " \"time_column_name\": time_column_name,\n",
- " \"forecast_horizon\": forecast_horizon,\n",
- " \"hierarchy_column_names\": hierarchy,\n",
- " \"hierarchy_training_level\": training_level,\n",
- " \"track_child_runs\": False,\n",
- " \"pipeline_fetch_max_batch_size\": 15,\n",
- " \"model_explainability\": model_explainability,\n",
- " # The following settings are specific to this sample and should be adjusted according to your own needs.\n",
- " \"iteration_timeout_minutes\": 10,\n",
- " \"iterations\": 10,\n",
- " \"n_cross_validations\": 2,\n",
- "}\n",
- "\n",
- "hts_parameters = HTSTrainParameters(\n",
- " automl_settings=automl_settings,\n",
- " hierarchy_column_names=hierarchy,\n",
- " training_level=training_level,\n",
- " enable_engineered_explanations=engineered_explanations,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Set up hierarchy training pipeline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Parallel run step is leveraged to train the hierarchy. To configure the ParallelRunConfig you will need to determine the appropriate number of workers and nodes for your use case. The `process_count_per_node` is based off the number of cores of the compute VM. The node_count will determine the number of master nodes to use, increasing the node count will speed up the training process.\n",
- "\n",
- "* **experiment:** The experiment used for training.\n",
- "* **train_data:** The tabular dataset to be used as input to the training run.\n",
- "* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long.\n",
- "* **process_count_per_node:** Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance.\n",
- "* **train_pipeline_parameters:** The set of configuration parameters defined in the previous section. \n",
- "\n",
- "Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
- "\n",
- "\n",
- "training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
- " experiment=experiment,\n",
- " train_data=registered_train,\n",
- " compute_target=compute_target,\n",
- " node_count=2,\n",
- " process_count_per_node=8,\n",
- " train_pipeline_parameters=hts_parameters,\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.pipeline.core import Pipeline\n",
- "\n",
- "training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Submit the pipeline to run\n",
- "Next we submit our pipeline to run. The whole training pipeline takes about 1h 11m using a Standard_D12_V2 VM with our current ParallelRunConfig setting."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "training_run = experiment.submit(training_pipeline)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "training_run.wait_for_completion(show_output=False)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Check the run status, if training_run is in completed state, continue to forecasting. If training_run is in another state, check the portal for failures."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### [Optional] Get the explanations\n",
- "First we need to download the explanations to the local disk."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "if model_explainability:\n",
- " expl_output = training_run.get_pipeline_output(\"explanations\")\n",
- " expl_output.download(\"training_explanations\")\n",
- "else:\n",
- " print(\n",
- " \"Model explanations are available only if model_explainability is set to True.\"\n",
- " )"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The explanations are downloaded to the \"training_explanations/azureml\" directory."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import os\n",
- "\n",
- "if model_explainability:\n",
- " explanations_dirrectory = os.listdir(\n",
- " os.path.join(\"training_explanations\", \"azureml\")\n",
- " )\n",
- " if len(explanations_dirrectory) > 1:\n",
- " print(\n",
- " \"Warning! The directory contains multiple explanations, only the first one will be displayed.\"\n",
- " )\n",
- " print(\"The explanations are located at {}.\".format(explanations_dirrectory[0]))\n",
- " # Now we will list all the explanations.\n",
- " explanation_path = os.path.join(\n",
- " \"training_explanations\",\n",
- " \"azureml\",\n",
- " explanations_dirrectory[0],\n",
- " \"training_explanations\",\n",
- " )\n",
- " print(\"Available explanations\")\n",
- " print(\"==============================\")\n",
- " print(\"\\n\".join(os.listdir(explanation_path)))\n",
- "else:\n",
- " print(\n",
- " \"Model explanations are available only if model_explainability is set to True.\"\n",
- " )"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "View the explanations on \"state\" level."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from IPython.display import display\n",
- "\n",
- "explanation_type = \"raw\"\n",
- "level = \"state\"\n",
- "\n",
- "if model_explainability:\n",
- " display(\n",
- " pd.read_csv(\n",
- " os.path.join(explanation_path, \"{}_explanations_{}.csv\").format(\n",
- " explanation_type, level\n",
- " )\n",
- " )\n",
- " )"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 5.0 Forecasting\n",
- "For hierarchical forecasting we need to provide the HTSInferenceParameters object.\n",
- "#### HTSInferenceParameters arguments\n",
- "* **hierarchy_forecast_level:** The default level of the hierarchy to produce prediction/forecast on.\n",
- "* **allocation_method:** \\[Optional] The disaggregation method to use if the hierarchy forecast level specified is below the define hierarchy training level.
(average historical proportions) 'average_historical_proportions'
(proportions of the historical averages) 'proportions_of_historical_average'\n",
- "\n",
- "#### get_many_models_batch_inference_steps arguments\n",
- "* **experiment:** The experiment used for inference run.\n",
- "* **inference_data:** The data to use for inferencing. It should be the same schema as used for training.\n",
- "* **compute_target:** The compute target that runs the inference pipeline.\n",
- "* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku).\n",
- "* **process_count_per_node:** The number of processes per node.\n",
- "* **train_run_id:** \\[Optional] The run id of the hierarchy training, by default it is the latest successful training hts run in the experiment.\n",
- "* **train_experiment_name:** \\[Optional] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline.\n",
- "* **process_count_per_node:** \\[Optional] The number of processes per node, by default it's 4."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.train.automl.runtime._hts.hts_parameters import HTSInferenceParameters\n",
- "\n",
- "inference_parameters = HTSInferenceParameters(\n",
- " hierarchy_forecast_level=\"store_id\", # The setting is specific to this dataset and should be changed based on your dataset.\n",
- " allocation_method=\"proportions_of_historical_average\",\n",
- ")\n",
- "\n",
- "steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
- " experiment=experiment,\n",
- " inference_data=registered_inference,\n",
- " compute_target=compute_target,\n",
- " inference_pipeline_parameters=inference_parameters,\n",
- " node_count=2,\n",
- " process_count_per_node=8,\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.pipeline.core import Pipeline\n",
- "\n",
- "inference_pipeline = Pipeline(ws, steps=steps)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "inference_run = experiment.submit(inference_pipeline)\n",
- "inference_run.wait_for_completion(show_output=False)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Retrieve results\n",
- "\n",
- "Forecast results can be retrieved through the following code. The prediction results summary and the actual predictions are downloaded the \"forecast_results\" folder"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "forecasts = inference_run.get_pipeline_output(\"forecasts\")\n",
- "forecasts.download(\"forecast_results\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Resbumit the Pipeline\n",
- "\n",
- "The inference pipeline can be submitted with different configurations."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "inference_run = experiment.submit(\n",
- " inference_pipeline, pipeline_parameters={\"hierarchy_forecast_level\": \"state\"}\n",
- ")\n",
- "inference_run.wait_for_completion(show_output=False)"
- ]
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Hierarchical Time Series - Automated ML\n",
+ "**_Generate hierarchical time series forecasts with Automated Machine Learning_**\n",
+ "\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For this notebook we are using a synthetic dataset portraying sales data to predict the the quantity of a vartiety of product skus across several states, stores, and product categories.\n",
+ "\n",
+ "**NOTE: There are limits on how many runs we can do in parallel per workspace, and we currently recommend to set the parallelism to maximum of 320 runs per experiment per workspace. If users want to have more parallelism and increase this limit they might encounter Too Many Requests errors (HTTP 429).**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prerequisites\n",
+ "You'll need to create a compute Instance by following the instructions in the [EnvironmentSetup.md](../Setup_Resources/EnvironmentSetup.md)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1.0 Set up workspace, datastore, experiment"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613003526897
}
- ],
- "metadata": {
- "authors": [
- {
- "name": "jialiu"
- }
- ],
- "categories": [
- "how-to-use-azureml",
- "automated-machine-learning"
- ],
- "kernelspec": {
- "display_name": "Python 3.6",
- "language": "python",
- "name": "python36"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
+ },
+ "outputs": [],
+ "source": [
+ "import azureml.core\n",
+ "from azureml.core import Workspace, Datastore\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Set up your workspace\n",
+ "ws = Workspace.from_config()\n",
+ "ws.get_details()\n",
+ "\n",
+ "# Set up your datastores\n",
+ "dstore = ws.get_default_datastore()\n",
+ "\n",
+ "output = {}\n",
+ "output[\"SDK version\"] = azureml.core.VERSION\n",
+ "output[\"Subscription ID\"] = ws.subscription_id\n",
+ "output[\"Workspace\"] = ws.name\n",
+ "output[\"Resource Group\"] = ws.resource_group\n",
+ "output[\"Location\"] = ws.location\n",
+ "output[\"Default datastore name\"] = dstore.name\n",
+ "pd.set_option(\"display.max_colwidth\", -1)\n",
+ "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
+ "outputDf.T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Choose an experiment"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613003540729
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core import Experiment\n",
+ "\n",
+ "experiment = Experiment(ws, \"automl-hts\")\n",
+ "\n",
+ "print(\"Experiment name: \" + experiment.name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.0 Data\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "### Upload local csv files to datastore\n",
+ "You can upload your train and inference csv files to the default datastore in your workspace. \n",
+ "\n",
+ "A Datastore is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target.\n",
+ "Please refer to [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py) documentation on how to access data from Datastore."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datastore_path = \"hts-sample\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "datastore = ws.get_default_datastore()\n",
+ "datastore"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create the TabularDatasets \n",
+ "\n",
+ "Datasets in Azure Machine Learning are references to specific data in a Datastore. The data can be retrieved as a [TabularDatasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py). We will read in the data as a pandas DataFrame, upload to the data store and register them to your Workspace using ```register_pandas_dataframe``` so they can be called as an input into the training pipeline. We will use the inference dataset as part of the forecasting pipeline. The step need only be completed once."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613007017296
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.data.dataset_factory import TabularDatasetFactory\n",
+ "\n",
+ "registered_train = TabularDatasetFactory.register_pandas_dataframe(\n",
+ " pd.read_csv(\"Data/hts-sample-train.csv\"),\n",
+ " target=(datastore, \"hts-sample\"),\n",
+ " name=\"hts-sales-train\",\n",
+ ")\n",
+ "registered_inference = TabularDatasetFactory.register_pandas_dataframe(\n",
+ " pd.read_csv(\"Data/hts-sample-test.csv\"),\n",
+ " target=(datastore, \"hts-sample\"),\n",
+ " name=\"hts-sales-test\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3.0 Build the training pipeline\n",
+ "Now that the dataset, WorkSpace, and datastore are set up, we can put together a pipeline for training.\n",
+ "\n",
+ "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Choose a compute target\n",
+ "\n",
+ "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
+ "\n",
+ "\\*\\*Creation of AmlCompute takes approximately 5 minutes.**\n",
+ "\n",
+ "If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613007037308
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core.compute import ComputeTarget, AmlCompute\n",
+ "\n",
+ "# Name your cluster\n",
+ "compute_name = \"hts-compute\"\n",
+ "\n",
+ "\n",
+ "if compute_name in ws.compute_targets:\n",
+ " compute_target = ws.compute_targets[compute_name]\n",
+ " if compute_target and type(compute_target) is AmlCompute:\n",
+ " print(\"Found compute target: \" + compute_name)\n",
+ "else:\n",
+ " print(\"Creating a new compute target...\")\n",
+ " provisioning_config = AmlCompute.provisioning_configuration(\n",
+ " vm_size=\"STANDARD_D16S_V3\", max_nodes=20\n",
+ " )\n",
+ " # Create the compute target\n",
+ " compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
+ "\n",
+ " # Can poll for a minimum number of nodes and for a specific timeout.\n",
+ " # If no min node count is provided it will use the scale settings for the cluster\n",
+ " compute_target.wait_for_completion(\n",
+ " show_output=True, min_node_count=None, timeout_in_minutes=20\n",
+ " )\n",
+ "\n",
+ " # For a more detailed view of current cluster status, use the 'status' property\n",
+ " print(compute_target.status.serialize())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up training parameters\n",
+ "\n",
+ "This dictionary defines the AutoML and hierarchy settings. For this forecasting task we need to define several settings inncluding the name of the time column, the maximum forecast horizon, the hierarchy definition, and the level of the hierarchy at which to train.\n",
+ "\n",
+ "| Property | Description|\n",
+ "| :--------------- | :------------------- |\n",
+ "| **task** | forecasting |\n",
+ "| **primary_metric** | This is the metric that you want to optimize.
Forecasting supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error |\n",
+ "| **blocked_models** | Blocked models won't be used by AutoML. |\n",
+ "| **iteration_timeout_minutes** | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
+ "| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
+ "| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
+ "| **label_column_name** | The name of the label column. |\n",
+ "| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
+ "| **n_cross_validations** | Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
+ "| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
+ "| **time_column_name** | The name of your time column. |\n",
+ "| **hierarchy_column_names** | The names of columns that define the hierarchical structure of the data from highest level to most granular. |\n",
+ "| **training_level** | The level of the hierarchy to be used for training models. |\n",
+ "| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
+ "| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
+ "| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
+ "| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
+ "| **model_explainability** | Flag to disable explaining the best automated ML model at the end of all training iterations. The default is True and will block non-explainable models which may impact the forecast accuracy. For more information, see [Interpretability: model explanations in automated machine learning](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-machine-learning-interpretability-automl). |"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613007061544
}
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.train.automl.runtime._hts.hts_parameters import HTSTrainParameters\n",
+ "\n",
+ "model_explainability = True\n",
+ "\n",
+ "engineered_explanations = False\n",
+ "# Define your hierarchy. Adjust the settings below based on your dataset.\n",
+ "hierarchy = [\"state\", \"store_id\", \"product_category\", \"SKU\"]\n",
+ "training_level = \"SKU\"\n",
+ "\n",
+ "# Set your forecast parameters. Adjust the settings below based on your dataset.\n",
+ "time_column_name = \"date\"\n",
+ "label_column_name = \"quantity\"\n",
+ "forecast_horizon = 7\n",
+ "\n",
+ "\n",
+ "automl_settings = {\n",
+ " \"task\": \"forecasting\",\n",
+ " \"primary_metric\": \"normalized_root_mean_squared_error\",\n",
+ " \"label_column_name\": label_column_name,\n",
+ " \"time_column_name\": time_column_name,\n",
+ " \"forecast_horizon\": forecast_horizon,\n",
+ " \"hierarchy_column_names\": hierarchy,\n",
+ " \"hierarchy_training_level\": training_level,\n",
+ " \"track_child_runs\": False,\n",
+ " \"pipeline_fetch_max_batch_size\": 15,\n",
+ " \"model_explainability\": model_explainability,\n",
+ " # The following settings are specific to this sample and should be adjusted according to your own needs.\n",
+ " \"iteration_timeout_minutes\": 10,\n",
+ " \"iterations\": 10,\n",
+ " \"n_cross_validations\": 2,\n",
+ "}\n",
+ "\n",
+ "hts_parameters = HTSTrainParameters(\n",
+ " automl_settings=automl_settings,\n",
+ " hierarchy_column_names=hierarchy,\n",
+ " training_level=training_level,\n",
+ " enable_engineered_explanations=engineered_explanations,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up hierarchy training pipeline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Parallel run step is leveraged to train the hierarchy. To configure the ParallelRunConfig you will need to determine the appropriate number of workers and nodes for your use case. The `process_count_per_node` is based off the number of cores of the compute VM. The node_count will determine the number of master nodes to use, increasing the node count will speed up the training process.\n",
+ "\n",
+ "* **experiment:** The experiment used for training.\n",
+ "* **train_data:** The tabular dataset to be used as input to the training run.\n",
+ "* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long.\n",
+ "* **process_count_per_node:** Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance.\n",
+ "* **train_pipeline_parameters:** The set of configuration parameters defined in the previous section. \n",
+ "\n",
+ "Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
+ "\n",
+ "\n",
+ "training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
+ " experiment=experiment,\n",
+ " train_data=registered_train,\n",
+ " compute_target=compute_target,\n",
+ " node_count=2,\n",
+ " process_count_per_node=8,\n",
+ " train_pipeline_parameters=hts_parameters,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.pipeline.core import Pipeline\n",
+ "\n",
+ "training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Submit the pipeline to run\n",
+ "Next we submit our pipeline to run. The whole training pipeline takes about 1h using a Standard_D16_V3 VM with our current ParallelRunConfig setting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "training_run = experiment.submit(training_pipeline)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "training_run.wait_for_completion(show_output=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Check the run status, if training_run is in completed state, continue to forecasting. If training_run is in another state, check the portal for failures."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### [Optional] Get the explanations\n",
+ "First we need to download the explanations to the local disk."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "if model_explainability:\n",
+ " expl_output = training_run.get_pipeline_output(\"explanations\")\n",
+ " expl_output.download(\"training_explanations\")\n",
+ "else:\n",
+ " print(\n",
+ " \"Model explanations are available only if model_explainability is set to True.\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The explanations are downloaded to the \"training_explanations/azureml\" directory."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "if model_explainability:\n",
+ " explanations_dirrectory = os.listdir(\n",
+ " os.path.join(\"training_explanations\", \"azureml\")\n",
+ " )\n",
+ " if len(explanations_dirrectory) > 1:\n",
+ " print(\n",
+ " \"Warning! The directory contains multiple explanations, only the first one will be displayed.\"\n",
+ " )\n",
+ " print(\"The explanations are located at {}.\".format(explanations_dirrectory[0]))\n",
+ " # Now we will list all the explanations.\n",
+ " explanation_path = os.path.join(\n",
+ " \"training_explanations\",\n",
+ " \"azureml\",\n",
+ " explanations_dirrectory[0],\n",
+ " \"training_explanations\",\n",
+ " )\n",
+ " print(\"Available explanations\")\n",
+ " print(\"==============================\")\n",
+ " print(\"\\n\".join(os.listdir(explanation_path)))\n",
+ "else:\n",
+ " print(\n",
+ " \"Model explanations are available only if model_explainability is set to True.\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "View the explanations on \"state\" level."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from IPython.display import display\n",
+ "\n",
+ "explanation_type = \"raw\"\n",
+ "level = \"state\"\n",
+ "\n",
+ "if model_explainability:\n",
+ " display(\n",
+ " pd.read_csv(\n",
+ " os.path.join(explanation_path, \"{}_explanations_{}.csv\").format(\n",
+ " explanation_type, level\n",
+ " )\n",
+ " )\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5.0 Forecasting\n",
+ "For hierarchical forecasting we need to provide the HTSInferenceParameters object.\n",
+ "#### HTSInferenceParameters arguments\n",
+ "* **hierarchy_forecast_level:** The default level of the hierarchy to produce prediction/forecast on.\n",
+ "* **allocation_method:** \\[Optional] The disaggregation method to use if the hierarchy forecast level specified is below the define hierarchy training level.
(average historical proportions) 'average_historical_proportions'
(proportions of the historical averages) 'proportions_of_historical_average'\n",
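+ "\n",
+ "As a reference point, [Forecasting: Principles and Practice](https://otexts.com/fpp2/top-down.html) defines these two top-down methods as $p_j = \\frac{1}{T}\\sum_{t=1}^{T} \\frac{y_{j,t}}{y_t}$ (average historical proportions) and $p_j = \\left(\\sum_{t=1}^{T} \\frac{y_{j,t}}{T}\\right) \\Big/ \\left(\\sum_{t=1}^{T} \\frac{y_{t}}{T}\\right)$ (proportions of the historical averages), where $y_{j,t}$ is a bottom-level series and $y_t$ is the aggregate series.\n",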
+ "\n",
+ "#### get_many_models_batch_inference_steps arguments\n",
+ "* **experiment:** The experiment used for inference run.\n",
+ "* **inference_data:** The data to use for inferencing. It should be the same schema as used for training.\n",
+ "* **compute_target:** The compute target that runs the inference pipeline.\n",
+ "* **node_count:** The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku).\n",
+ "* **process_count_per_node:** The number of processes per node.\n",
+ "* **train_run_id:** \\[Optional] The run id of the hierarchy training, by default it is the latest successful training hts run in the experiment.\n",
+ "* **train_experiment_name:** \\[Optional] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline.\n",
+ "* **process_count_per_node:** \\[Optional] The number of processes per node, by default it's 4."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.train.automl.runtime._hts.hts_parameters import HTSInferenceParameters\n",
+ "\n",
+ "inference_parameters = HTSInferenceParameters(\n",
+ " hierarchy_forecast_level=\"store_id\", # The setting is specific to this dataset and should be changed based on your dataset.\n",
+ " allocation_method=\"proportions_of_historical_average\",\n",
+ ")\n",
+ "\n",
+ "steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
+ " experiment=experiment,\n",
+ " inference_data=registered_inference,\n",
+ " compute_target=compute_target,\n",
+ " inference_pipeline_parameters=inference_parameters,\n",
+ " node_count=2,\n",
+ " process_count_per_node=8,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.pipeline.core import Pipeline\n",
+ "\n",
+ "inference_pipeline = Pipeline(ws, steps=steps)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "inference_run = experiment.submit(inference_pipeline)\n",
+ "inference_run.wait_for_completion(show_output=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Retrieve results\n",
+ "\n",
+ "Forecast results can be retrieved through the following code. The prediction results summary and the actual predictions are downloaded in forecast_results folder"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "forecasts = inference_run.get_pipeline_output(\"forecasts\")\n",
+ "forecasts.download(\"forecast_results\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Resbumit the Pipeline\n",
+ "\n",
+ "The inference pipeline can be submitted with different configurations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "inference_run = experiment.submit(\n",
+ " inference_pipeline, pipeline_parameters={\"hierarchy_forecast_level\": \"state\"}\n",
+ ")\n",
+ "inference_run.wait_for_completion(show_output=False)"
+ ]
+ }
+ ],
+ "metadata": {
+ "authors": [
+ {
+ "name": "jialiu"
+ }
+ ],
+ "categories": [
+ "how-to-use-azureml",
+ "automated-machine-learning"
+ ],
+ "kernelspec": {
+ "display_name": "Python 3.6 - AzureML",
+ "language": "python",
+ "name": "python3-azureml"
},
- "nbformat": 4,
- "nbformat_minor": 4
-}
\ No newline at end of file
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/data-table.png b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/data-table.png
new file mode 100644
index 000000000..193d9c06b
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/data-table.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/deploy-button.png b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/deploy-button.png
new file mode 100644
index 000000000..e81f2c1c5
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/deploy-button.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/food-chain.PNG b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/food-chain.PNG
new file mode 100644
index 000000000..7f46d6d9f
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/food-chain.PNG differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/hierarchy-sample-ms.PNG b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/hierarchy-sample-ms.PNG
new file mode 100644
index 000000000..82bb14f7f
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/hierarchy-sample-ms.PNG differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/retail-org-2.PNG b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/retail-org-2.PNG
new file mode 100644
index 000000000..70f48b053
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/retail-org-2.PNG differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/retail-org.PNG b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/retail-org.PNG
new file mode 100644
index 000000000..0b4f84b6f
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/retail-org.PNG differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/workflow.PNG b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/workflow.PNG
new file mode 100644
index 000000000..7f161548d
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/media/workflow.PNG differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/update_env.yml b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/update_env.yml
new file mode 100644
index 000000000..d0b193dab
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/update_env.yml
@@ -0,0 +1,3 @@
+dependencies:
+- pip:
+ - azureml-contrib-automl-pipeline-steps
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/README.md b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/README.md
new file mode 100644
index 000000000..681528e33
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/README.md
@@ -0,0 +1,122 @@
+---
+page_type: sample
+languages:
+- python
+products:
+- azure-machine-learning
+description: Tutorial showing how to solve complex machine learning time series forecasting problems at scale by using Azure Automated ML and the Many Models solution accelerator.
+---
+
+![Many Models Solution Accelerator Banner](images/mmsa.png)
+# Many Models Solution Accelerator
+
+
+
+In the real world, many problems can be too complex to be solved by a single machine learning model. Whether that means predicting sales for each individual store, building a predictive maintenance model for hundreds of oil wells, or tailoring an experience to individual users, building a model for each instance can lead to improved results on many machine learning problems.
+
+This pattern is very common across a wide variety of industries and applicable to many real-world use cases. Below are some examples where this pattern is used:
+
+- Energy and utility companies building predictive maintenance models for thousands of oil wells, hundreds of wind turbines, or hundreds of smart meters
+
+- Retail organizations building workforce optimization models for thousands of stores, campaign promotion propensity models, and price optimization models for the hundreds of thousands of products they sell
+
+- Restaurant chains building demand forecasting models across thousands of restaurants
+
+- Banks and financial institutions building cash replenishment models for ATMs and personalized models for individuals
+
+- Enterprises building revenue forecasting models at each division level
+
+- Document management companies building text analytics and legal document search models for each state
+
+Azure Machine Learning (AML) makes it easy to train, operate, and manage hundreds or even thousands of models. This repo will walk you through the end-to-end process of creating a many models solution, from training to scoring to monitoring.
+
+## Prerequisites
+
+To use this solution accelerator, all you need is access to an [Azure subscription](https://azure.microsoft.com/free/) and an [Azure Machine Learning Workspace](https://docs.microsoft.com/azure/machine-learning/how-to-manage-workspace) that you'll create below.
+
+While it's not required, a basic understanding of Azure Machine Learning will be helpful for understanding the solution. The following resources can help introduce you to AML:
+
+1. [Azure Machine Learning Overview](https://azure.microsoft.com/services/machine-learning/)
+2. [Azure Machine Learning Tutorials](https://docs.microsoft.com/azure/machine-learning/tutorial-1st-experiment-sdk-setup)
+3. [Azure Machine Learning Sample Notebooks on Github](https://github.com/Azure/azureml-examples)
+
+## Getting started
+
+### 1. Deploy Resources
+
+Start by deploying the resources to Azure. The button below will deploy Azure Machine Learning and its related resources:
+
+
+
+
+
+### 2. Configure Development Environment
+
+Next you'll need to configure your [development environment](https://docs.microsoft.com/azure/machine-learning/how-to-configure-environment) for Azure Machine Learning. We recommend using a [Compute Instance](https://docs.microsoft.com/azure/machine-learning/how-to-configure-environment#compute-instance) as it's the fastest way to get up and running.
+
+### 3. Run Notebooks
+
+Once your development environment is set up, run through the Jupyter Notebooks sequentially following the steps outlined. By the end, you'll know how to train, score, and make predictions using the many models pattern on Azure Machine Learning.
+
+![Sequence of Notebooks](./images/mmsa-overview.png)
+
+
+## Contents
+
+In this repo, you'll train and score a forecasting model for each orange juice brand and for each store at a (simulated) grocery chain. By the end, you'll have forecasted sales by using up to 11,973 models to predict sales for the next few weeks.
+
+The data used in this sample is simulated based on the [Dominick's Orange Juice Dataset](http://www.cs.unitn.it/~taufer/QMMA/L10-OJ-Data.html#(1)), sales data from a Chicago-area grocery chain.
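+
+To make the one-model-per-group idea concrete, here is a toy, purely illustrative pandas/scikit-learn sketch; the accelerator itself scales this pattern out with Automated ML and ParallelRunStep:
+
+```python
+import pandas as pd
+from sklearn.linear_model import LinearRegression
+
+# Tiny synthetic sales frame; the real sample uses the OJ dataset.
+sales = pd.DataFrame(
+    {
+        "store": [1, 1, 2, 2],
+        "brand": ["oj_a", "oj_a", "oj_b", "oj_b"],
+        "week": [1, 2, 1, 2],
+        "quantity": [100, 110, 80, 85],
+    }
+)
+
+# One model per (store, brand) group: the essence of the many models pattern.
+models = {
+    key: LinearRegression().fit(group[["week"]], group["quantity"])
+    for key, group in sales.groupby(["store", "brand"])
+}
+print(models[(1, "oj_a")].predict(pd.DataFrame({"week": [3]})))
+```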
+
+
+
+### Using Automated ML to train the models:
+
+The [`auto-ml-forecasting-many-models.ipynb`](./auto-ml-forecasting-many-models.ipynb) notebook is a guided solution accelerator that demonstrates the steps from data preparation to model training and forecasting on trained models, as well as operationalizing the solution.
+
+## How-to-videos
+
+Watch these how-to videos for a step-by-step walk-through of the many models solution accelerator to learn how to set up your models using Automated ML.
+
+### Automated ML
+
+[![Watch the video](https://media.giphy.com/media/dWUKfameudyNGRnp1t/giphy.gif)](https://channel9.msdn.com/Shows/Docs-AI/Building-Large-Scale-Machine-Learning-Forecasting-Models-using-Azure-Machine-Learnings-Automated-ML)
+
+## Key concepts
+
+### ParallelRunStep
+
+[ParallelRunStep](https://docs.microsoft.com/en-us/python/api/azureml-pipeline-steps/azureml.pipeline.steps.parallel_run_step.parallelrunstep?view=azure-ml-py) enables the parallel training of models and is commonly used for batch inferencing. This [document](https://docs.microsoft.com/azure/machine-learning/how-to-use-parallel-run-step) walks through some of the key concepts around ParallelRunStep.
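+
+The unit of work in a ParallelRunStep is an entry script that implements `init()` and `run(mini_batch)`; a minimal, illustrative skeleton (with a trivial `run` body) looks like this:
+
+```python
+# score.py: a minimal ParallelRunStep entry script skeleton (illustrative).
+import pandas as pd
+
+
+def init():
+    # Runs once per worker process; load models or other shared state here.
+    pass
+
+
+def run(mini_batch):
+    # Runs once per mini-batch. With a TabularDataset input, mini_batch is a
+    # pandas DataFrame; return a list or DataFrame of per-batch results.
+    return pd.DataFrame({"rows_scored": [len(mini_batch)]})
+```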
+
+### Pipelines
+
+[Pipelines](https://docs.microsoft.com/azure/machine-learning/concept-ml-pipelines) allow you to create workflows in your machine learning projects. These workflows have a number of benefits, including speed, simplicity, repeatability, and modularity.
+
+### Automated Machine Learning
+
+[Automated Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-automated-ml), also referred to as automated ML or AutoML, is the process of automating the time-consuming, iterative tasks of machine learning model development. It allows data scientists, analysts, and developers to build ML models with high scale, efficiency, and productivity, all while sustaining model quality.
+
+### Other Concepts
+
+In addition to ParallelRunStep, Pipelines, and Automated Machine Learning, you'll also work with the following concepts: [workspace](https://docs.microsoft.com/azure/machine-learning/concept-workspace), [datasets](https://docs.microsoft.com/azure/machine-learning/concept-data#datasets), [compute targets](https://docs.microsoft.com/azure/machine-learning/concept-compute-target#train), [python script steps](https://docs.microsoft.com/python/api/azureml-pipeline-steps/azureml.pipeline.steps.python_script_step.pythonscriptstep?view=azure-ml-py), and [Azure Open Datasets](https://azure.microsoft.com/services/open-datasets/).
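+
+For orientation, fetching these objects from an existing workspace typically looks like the following minimal sketch (the dataset and compute names are placeholders):
+
+```python
+from azureml.core import Dataset, Workspace
+
+ws = Workspace.from_config()                     # reads config.json
+dataset = Dataset.get_by_name(ws, "my-dataset")  # a previously registered dataset
+compute = ws.compute_targets["cpu-cluster"]      # an existing compute target
+print(ws.name, dataset.name, compute.name)
+```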
+
+## Contributing
+
+This project welcomes contributions and suggestions. To learn more visit the [contributing](../../../CONTRIBUTING.md) section.
+
+Most contributions require you to agree to a Contributor License Agreement (CLA)
+declaring that you have the right to, and actually do, grant us
+the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
+
+When you submit a pull request, a CLA bot will automatically determine whether you need to provide
+a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
+provided by the bot. You will only need to do this once across all repos using our CLA.
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
+For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
+contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/auto-ml-forecasting-many-models.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/auto-ml-forecasting-many-models.ipynb
index 75caf8596..686b8aeb2 100644
--- a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/auto-ml-forecasting-many-models.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/auto-ml-forecasting-many-models.ipynb
@@ -1,746 +1,746 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Copyright (c) Microsoft Corporation. All rights reserved.\n",
- "\n",
- "Licensed under the MIT License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Many Models - Automated ML\n",
- "**_Generate many models time series forecasts with Automated Machine Learning_**\n",
- "\n",
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For this notebook we are using a synthetic dataset portraying sales data to predict the quantity of a vartiety of product SKUs across several states, stores, and product categories.\n",
- "\n",
- "**NOTE: There are limits on how many runs we can do in parallel per workspace, and we currently recommend to set the parallelism to maximum of 320 runs per experiment per workspace. If users want to have more parallelism and increase this limit they might encounter Too Many Requests errors (HTTP 429).**"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Prerequisites\n",
- "You'll need to create a compute Instance by following the instructions in the [EnvironmentSetup.md](../Setup_Resources/EnvironmentSetup.md)."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 1.0 Set up workspace, datastore, experiment"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613003526897
- }
- },
- "outputs": [],
- "source": [
- "import azureml.core\n",
- "from azureml.core import Workspace, Datastore\n",
- "import pandas as pd\n",
- "\n",
- "# Set up your workspace\n",
- "ws = Workspace.from_config()\n",
- "ws.get_details()\n",
- "\n",
- "# Set up your datastores\n",
- "dstore = ws.get_default_datastore()\n",
- "\n",
- "output = {}\n",
- "output[\"SDK version\"] = azureml.core.VERSION\n",
- "output[\"Subscription ID\"] = ws.subscription_id\n",
- "output[\"Workspace\"] = ws.name\n",
- "output[\"Resource Group\"] = ws.resource_group\n",
- "output[\"Location\"] = ws.location\n",
- "output[\"Default datastore name\"] = dstore.name\n",
- "pd.set_option(\"display.max_colwidth\", -1)\n",
- "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
- "outputDf.T"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Choose an experiment"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613003540729
- }
- },
- "outputs": [],
- "source": [
- "from azureml.core import Experiment\n",
- "\n",
- "experiment = Experiment(ws, \"automl-many-models\")\n",
- "\n",
- "print(\"Experiment name: \" + experiment.name)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 2.0 Data\n",
- "\n",
- "This notebook uses simulated orange juice sales data to walk you through the process of training many models on Azure Machine Learning using Automated ML. \n",
- "\n",
- "The time series data used in this example was simulated based on the University of Chicago's Dominick's Finer Foods dataset which featured two years of sales of 3 different orange juice brands for individual stores. The full simulated dataset includes 3,991 stores with 3 orange juice brands each thus allowing 11,973 models to be trained to showcase the power of the many models pattern.\n",
- "\n",
- " \n",
- "In this notebook, two datasets will be created: one with all 11,973 files and one with only 10 files that can be used to quickly test and debug. For each dataset, you'll be walked through the process of:\n",
- "\n",
- "1. Registering the blob container as a Datastore to the Workspace\n",
- "2. Registering a tabular dataset to the Workspace"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "nteract": {
- "transient": {
- "deleting": false
- }
- }
- },
- "source": [
- "### 2.1 Data Preparation\n",
- "The OJ data is available in the public blob container. The data is split to be used for training and for inferencing. For the current dataset, the data was split on time column ('WeekStarting') before and after '1992-5-28' .\n",
- "\n",
- "The container has\n",
- "\n",
- " - 'oj-data-tabular' and 'oj-inference-tabular' folders that contains training and inference data respectively for the 11,973 models.
\n",
- " - It also has 'oj-data-small-tabular' and 'oj-inference-small-tabular' folders that has training and inference data for 10 models.
\n",
- "
\n",
- "\n",
- "To create the [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py) needed for the ParallelRunStep, you first need to register the blob container to the workspace."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "nteract": {
- "transient": {
- "deleting": false
- }
- }
- },
- "source": [
- " To use your own data, put your own data in a blobstore folder. As shown it can be one file or multiple files. We can then register datastore using that blob as shown below.\n",
- " \n",
- " How sample data in blob store looks like
\n",
- "\n",
- "['oj-data-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
- "![image-4.png](mm-1.png)\n",
- "\n",
- "['oj-inference-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
- "![image-3.png](mm-2.png)\n",
- "\n",
- "['oj-data-small-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
- "\n",
- "![image-5.png](mm-3.png)\n",
- "\n",
- "['oj-inference-small-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
- "![image-6.png](mm-4.png)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### 2.2 Register the blob container as DataStore\n",
- "\n",
- "A Datastore is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target.\n",
- "\n",
- "Please refer to [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) documentation on how to access data from Datastore.\n",
- "\n",
- "In this next step, we will be registering blob storage as datastore to the Workspace."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.core import Datastore\n",
- "\n",
- "# Please change the following to point to your own blob container and pass in account_key\n",
- "blob_datastore_name = \"automl_many_models\"\n",
- "container_name = \"automl-sample-notebook-data\"\n",
- "account_name = \"automlsamplenotebookdata\"\n",
- "\n",
- "oj_datastore = Datastore.register_azure_blob_container(\n",
- " workspace=ws,\n",
- " datastore_name=blob_datastore_name,\n",
- " container_name=container_name,\n",
- " account_name=account_name,\n",
- " create_if_not_exists=True,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "#### 2.3 Using tabular datasets \n",
- "\n",
- "Now that the datastore is available from the Workspace, [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py) can be created. Datasets in Azure Machine Learning are references to specific data in a Datastore. We are using TabularDataset, so that users who have their data which can be in one or many files (*.parquet or *.csv) and have not split up data according to group columns needed for training, can do so using out of box support for 'partiion_by' feature of TabularDataset shown in section 5.0 below."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613007017296
- }
- },
- "outputs": [],
- "source": [
- "from azureml.core import Dataset\n",
- "\n",
- "ds_name_small = \"oj-data-small-tabular\"\n",
- "input_ds_small = Dataset.Tabular.from_delimited_files(\n",
- " path=oj_datastore.path(ds_name_small + \"/\"), validate=False\n",
- ")\n",
- "\n",
- "inference_name_small = \"oj-inference-small-tabular\"\n",
- "inference_ds_small = Dataset.Tabular.from_delimited_files(\n",
- " path=oj_datastore.path(inference_name_small + \"/\"), validate=False\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 3.0 Build the training pipeline\n",
- "Now that the dataset, WorkSpace, and datastore are set up, we can put together a pipeline for training.\n",
- "\n",
- "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Choose a compute target\n",
- "\n",
- "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
- "\n",
- "\\*\\*Creation of AmlCompute takes approximately 5 minutes.**\n",
- "\n",
- "If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613007037308
- }
- },
- "outputs": [],
- "source": [
- "from azureml.core.compute import ComputeTarget, AmlCompute\n",
- "\n",
- "# Name your cluster\n",
- "compute_name = \"mm-compute\"\n",
- "\n",
- "\n",
- "if compute_name in ws.compute_targets:\n",
- " compute_target = ws.compute_targets[compute_name]\n",
- " if compute_target and type(compute_target) is AmlCompute:\n",
- " print(\"Found compute target: \" + compute_name)\n",
- "else:\n",
- " print(\"Creating a new compute target...\")\n",
- " provisioning_config = AmlCompute.provisioning_configuration(\n",
- " vm_size=\"STANDARD_D16S_V3\", max_nodes=20\n",
- " )\n",
- " # Create the compute target\n",
- " compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
- "\n",
- " # Can poll for a minimum number of nodes and for a specific timeout.\n",
- " # If no min node count is provided it will use the scale settings for the cluster\n",
- " compute_target.wait_for_completion(\n",
- " show_output=True, min_node_count=None, timeout_in_minutes=20\n",
- " )\n",
- "\n",
- " # For a more detailed view of current cluster status, use the 'status' property\n",
- " print(compute_target.status.serialize())"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Set up training parameters\n",
- "\n",
- "This dictionary defines the AutoML and many models settings. For this forecasting task we need to define several settings including the name of the time column, the maximum forecast horizon, and the partition column name definition.\n",
- "\n",
- "| Property | Description|\n",
- "| :--------------- | :------------------- |\n",
- "| **task** | forecasting |\n",
- "| **primary_metric** | This is the metric that you want to optimize.
Forecasting supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error |\n",
- "| **blocked_models** | Blocked models won't be used by AutoML. |\n",
- "| **iteration_timeout_minutes** | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
- "| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
- "| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
- "| **label_column_name** | The name of the label column. |\n",
- "| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
- "| **n_cross_validations** | Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
- "| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
- "| **time_column_name** | The name of your time column. |\n",
- "| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
- "| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
- "| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
- "| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
- "| **partition_column_names** | The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. |"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "gather": {
- "logged": 1613007061544
- }
- },
- "outputs": [],
- "source": [
- "from azureml.train.automl.runtime._many_models.many_models_parameters import (\n",
- " ManyModelsTrainParameters,\n",
- ")\n",
- "\n",
- "partition_column_names = [\"Store\", \"Brand\"]\n",
- "automl_settings = {\n",
- " \"task\": \"forecasting\",\n",
- " \"primary_metric\": \"normalized_root_mean_squared_error\",\n",
- " \"iteration_timeout_minutes\": 10, # This needs to be changed based on the dataset. We ask customer to explore how long training is taking before settings this value\n",
- " \"iterations\": 15,\n",
- " \"experiment_timeout_hours\": 0.25,\n",
- " \"label_column_name\": \"Quantity\",\n",
- " \"n_cross_validations\": 3,\n",
- " \"time_column_name\": \"WeekStarting\",\n",
- " \"drop_column_names\": \"Revenue\",\n",
- " \"max_horizon\": 6,\n",
- " \"grain_column_names\": partition_column_names,\n",
- " \"track_child_runs\": False,\n",
- "}\n",
- "\n",
- "mm_paramters = ManyModelsTrainParameters(\n",
- " automl_settings=automl_settings, partition_column_names=partition_column_names\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Set up many models pipeline"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Parallel run step is leveraged to train multiple models at once. To configure the ParallelRunConfig you will need to determine the appropriate number of workers and nodes for your use case. The process_count_per_node is based off the number of cores of the compute VM. The node_count will determine the number of master nodes to use, increasing the node count will speed up the training process.\n",
- "\n",
- "| Property | Description|\n",
- "| :--------------- | :------------------- |\n",
- "| **experiment** | The experiment used for training. |\n",
- "| **train_data** | The file dataset to be used as input to the training run. |\n",
- "| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long. |\n",
- "| **process_count_per_node** | Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance. |\n",
- "| **train_pipeline_parameters** | The set of configuration parameters defined in the previous section. |\n",
- "\n",
- "Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
- "\n",
- "\n",
- "training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
- " experiment=experiment,\n",
- " train_data=input_ds_small,\n",
- " compute_target=compute_target,\n",
- " node_count=2,\n",
- " process_count_per_node=8,\n",
- " run_invocation_timeout=920,\n",
- " train_pipeline_parameters=mm_paramters,\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.pipeline.core import Pipeline\n",
- "\n",
- "training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Submit the pipeline to run\n",
- "Next we submit our pipeline to run. The whole training pipeline takes about 40m using a STANDARD_D16S_V3 VM with our current ParallelRunConfig setting."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "training_run = experiment.submit(training_pipeline)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "training_run.wait_for_completion(show_output=False)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Check the run status, if training_run is in completed state, continue to forecasting. If training_run is in another state, check the portal for failures."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 5.0 Publish and schedule the train pipeline (Optional)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### 5.1 Publish the pipeline\n",
- "\n",
- "Once you have a pipeline you're happy with, you can publish a pipeline so you can call it programmatically later on. See this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline#publish-a-pipeline) for additional information on publishing and calling pipelines."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# published_pipeline = training_pipeline.publish(name = 'automl_train_many_models',\n",
- "# description = 'train many models',\n",
- "# version = '1',\n",
- "# continue_on_step_failure = False)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### 7.2 Schedule the pipeline\n",
- "You can also [schedule the pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines) to run on a time-based or change-based schedule. This could be used to automatically retrain models every month or based on another trigger such as data drift."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# from azureml.pipeline.core import Schedule, ScheduleRecurrence\n",
- "\n",
- "# training_pipeline_id = published_pipeline.id\n",
- "\n",
- "# recurrence = ScheduleRecurrence(frequency=\"Month\", interval=1, start_time=\"2020-01-01T09:00:00\")\n",
- "# recurring_schedule = Schedule.create(ws, name=\"automl_training_recurring_schedule\",\n",
- "# description=\"Schedule Training Pipeline to run on the first day of every month\",\n",
- "# pipeline_id=training_pipeline_id,\n",
- "# experiment_name=experiment.name,\n",
- "# recurrence=recurrence)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 6.0 Forecasting"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Set up output dataset for inference data\n",
- "Output of inference can be represented as [OutputFileDatasetConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.output_dataset_config.outputdatasetconfig?view=azure-ml-py) object and OutputFileDatasetConfig can be registered as a dataset. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.data import OutputFileDatasetConfig\n",
- "\n",
- "output_inference_data_ds = OutputFileDatasetConfig(\n",
- " name=\"many_models_inference_output\", destination=(dstore, \"oj/inference_data/\")\n",
- ").register_on_complete(name=\"oj_inference_data_ds\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For many models we need to provide the ManyModelsInferenceParameters object.\n",
- "\n",
- "#### ManyModelsInferenceParameters arguments\n",
- "| Property | Description|\n",
- "| :--------------- | :------------------- |\n",
- "| **partition_column_names** | List of column names that identifies groups. |\n",
- "| **target_column_name** | \\[Optional] Column name only if the inference dataset has the target. |\n",
- "| **time_column_name** | \\[Optional] Column name only if it is timeseries. |\n",
- "| **many_models_run_id** | \\[Optional] Many models run id where models were trained. |\n",
- "\n",
- "#### get_many_models_batch_inference_steps arguments\n",
- "| Property | Description|\n",
- "| :--------------- | :------------------- |\n",
- "| **experiment** | The experiment used for inference run. |\n",
- "| **inference_data** | The data to use for inferencing. It should be the same schema as used for training.\n",
- "| **compute_target** | The compute target that runs the inference pipeline.|\n",
- "| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku). |\n",
- "| **process_count_per_node** | The number of processes per node.\n",
- "| **train_run_id** | \\[Optional\\] The run id of the hierarchy training, by default it is the latest successful training many model run in the experiment. |\n",
- "| **train_experiment_name** | \\[Optional\\] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline. |\n",
- "| **process_count_per_node** | \\[Optional\\] The number of processes per node, by default it's 4. |"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
- "from azureml.train.automl.runtime._many_models.many_models_parameters import (\n",
- " ManyModelsInferenceParameters,\n",
- ")\n",
- "\n",
- "mm_parameters = ManyModelsInferenceParameters(\n",
- " partition_column_names=[\"Store\", \"Brand\"],\n",
- " time_column_name=\"WeekStarting\",\n",
- " target_column_name=\"Quantity\",\n",
- ")\n",
- "\n",
- "inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
- " experiment=experiment,\n",
- " inference_data=inference_ds_small,\n",
- " node_count=2,\n",
- " process_count_per_node=8,\n",
- " compute_target=compute_target,\n",
- " run_invocation_timeout=300,\n",
- " output_datastore=output_inference_data_ds,\n",
- " train_run_id=training_run.id,\n",
- " train_experiment_name=training_run.experiment.name,\n",
- " inference_pipeline_parameters=mm_parameters,\n",
- ")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.pipeline.core import Pipeline\n",
- "\n",
- "inference_pipeline = Pipeline(ws, steps=inference_steps)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "inference_run = experiment.submit(inference_pipeline)\n",
- "inference_run.wait_for_completion(show_output=False)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Retrieve results\n",
- "\n",
- "The forecasting pipeline forecasts the orange juice quantity for a Store by Brand. The pipeline returns one file with the predictions for each store and outputs the result to the forecasting_output Blob container. The details of the blob container is listed in 'forecasting_output.txt' under Outputs+logs. \n",
- "\n",
- "The following code snippet:\n",
- "1. Downloads the contents of the output folder that is passed in the parallel run step \n",
- "2. Reads the parallel_run_step.txt file that has the predictions as pandas dataframe and \n",
- "3. Displays the top 10 rows of the predictions"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.contrib.automl.pipeline.steps.utilities import get_output_from_mm_pipeline\n",
- "\n",
- "forecasting_results_name = \"forecasting_results\"\n",
- "forecasting_output_name = \"many_models_inference_output\"\n",
- "forecast_file = get_output_from_mm_pipeline(\n",
- " inference_run, forecasting_results_name, forecasting_output_name\n",
- ")\n",
- "df = pd.read_csv(forecast_file, delimiter=\" \", header=None)\n",
- "df.columns = [\n",
- " \"Week Starting\",\n",
- " \"Store\",\n",
- " \"Brand\",\n",
- " \"Quantity\",\n",
- " \"Advert\",\n",
- " \"Price\",\n",
- " \"Revenue\",\n",
- " \"Predicted\",\n",
- "]\n",
- "print(\n",
- " \"Prediction has \", df.shape[0], \" rows. Here the first 10 rows are being displayed.\"\n",
- ")\n",
- "df.head(10)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## 7.0 Publish and schedule the inference pipeline (Optional)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### 7.1 Publish the pipeline\n",
- "\n",
- "Once you have a pipeline you're happy with, you can publish a pipeline so you can call it programmatically later on. See this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline#publish-a-pipeline) for additional information on publishing and calling pipelines."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# published_pipeline_inf = inference_pipeline.publish(name = 'automl_forecast_many_models',\n",
- "# description = 'forecast many models',\n",
- "# version = '1',\n",
- "# continue_on_step_failure = False)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### 7.2 Schedule the pipeline\n",
- "You can also [schedule the pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines) to run on a time-based or change-based schedule. This could be used to automatically retrain or forecast models every month or based on another trigger such as data drift."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# from azureml.pipeline.core import Schedule, ScheduleRecurrence\n",
- "\n",
- "# forecasting_pipeline_id = published_pipeline.id\n",
- "\n",
- "# recurrence = ScheduleRecurrence(frequency=\"Month\", interval=1, start_time=\"2020-01-01T09:00:00\")\n",
- "# recurring_schedule = Schedule.create(ws, name=\"automl_forecasting_recurring_schedule\",\n",
- "# description=\"Schedule Forecasting Pipeline to run on the first day of every week\",\n",
- "# pipeline_id=forecasting_pipeline_id,\n",
- "# experiment_name=experiment.name,\n",
- "# recurrence=recurrence)"
- ]
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-hierarchical-timeseries/auto-ml-forecasting-hierarchical-timeseries.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Many Models - Automated ML\n",
+ "**_Generate many models time series forecasts with Automated Machine Learning_**\n",
+ "\n",
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For this notebook we are using a synthetic dataset portraying sales data to predict the the quantity of a vartiety of product skus across several states, stores, and product categories.\n",
+ "\n",
+ "**NOTE: There are limits on how many runs we can do in parallel per workspace, and we currently recommend to set the parallelism to maximum of 320 runs per experiment per workspace. If users want to have more parallelism and increase this limit they might encounter Too Many Requests errors (HTTP 429).**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Prerequisites\n",
+ "You'll need to create a compute Instance by following the instructions in the [EnvironmentSetup.md](../Setup_Resources/EnvironmentSetup.md)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 1.0 Set up workspace, datastore, experiment"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613003526897
}
- ],
- "metadata": {
- "authors": [
- {
- "name": "jialiu"
- }
- ],
- "categories": [
- "how-to-use-azureml",
- "automated-machine-learning"
- ],
- "kernelspec": {
- "display_name": "Python 3.6",
- "language": "python",
- "name": "python36"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.8"
+ },
+ "outputs": [],
+ "source": [
+ "import azureml.core\n",
+ "from azureml.core import Workspace, Datastore\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Set up your workspace\n",
+ "ws = Workspace.from_config()\n",
+ "ws.get_details()\n",
+ "\n",
+ "# Set up your datastores\n",
+ "dstore = ws.get_default_datastore()\n",
+ "\n",
+ "output = {}\n",
+ "output[\"SDK version\"] = azureml.core.VERSION\n",
+ "output[\"Subscription ID\"] = ws.subscription_id\n",
+ "output[\"Workspace\"] = ws.name\n",
+ "output[\"Resource Group\"] = ws.resource_group\n",
+ "output[\"Location\"] = ws.location\n",
+ "output[\"Default datastore name\"] = dstore.name\n",
+ "pd.set_option(\"display.max_colwidth\", -1)\n",
+ "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
+ "outputDf.T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Choose an experiment"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613003540729
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core import Experiment\n",
+ "\n",
+ "experiment = Experiment(ws, \"automl-many-models\")\n",
+ "\n",
+ "print(\"Experiment name: \" + experiment.name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 2.0 Data\n",
+ "\n",
+ "This notebook uses simulated orange juice sales data to walk you through the process of training many models on Azure Machine Learning using Automated ML. \n",
+ "\n",
+ "The time series data used in this example was simulated based on the University of Chicago's Dominick's Finer Foods dataset which featured two years of sales of 3 different orange juice brands for individual stores. The full simulated dataset includes 3,991 stores with 3 orange juice brands each thus allowing 11,973 models to be trained to showcase the power of the many models pattern.\n",
+ "\n",
+ " \n",
+ "In this notebook, two datasets will be created: one with all 11,973 files and one with only 10 files that can be used to quickly test and debug. For each dataset, you'll be walked through the process of:\n",
+ "\n",
+ "1. Registering the blob container as a Datastore to the Workspace\n",
+ "2. Registering a tabular dataset to the Workspace"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
+ }
+ },
+ "source": [
+ "### 2.1 Data Preparation\n",
+ "The OJ data is available in the public blob container. The data is split to be used for training and for inferencing. For the current dataset, the data was split on time column ('WeekStarting') before and after '1992-5-28' .\n",
+ "\n",
+ "The container has\n",
+ "\n",
+ " - 'oj-data-tabular' and 'oj-inference-tabular' folders that contains training and inference data respectively for the 11,973 models.
\n",
+ " - It also has 'oj-data-small-tabular' and 'oj-inference-small-tabular' folders that has training and inference data for 10 models.
\n",
+ "
\n",
+ "\n",
+ "To create the [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py) needed for the ParallelRunStep, you first need to register the blob container to the workspace."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "nteract": {
+ "transient": {
+ "deleting": false
+ }
}
+ },
+ "source": [
+ " To use your own data, put your own data in a blobstore folder. As shown it can be one file or multiple files. We can then register datastore using that blob as shown below.\n",
+ " \n",
+ " How sample data in blob store looks like
\n",
+ "\n",
+ "['oj-data-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
+ "![image-4.png](mm-1.png)\n",
+ "\n",
+ "['oj-inference-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
+ "![image-3.png](mm-2.png)\n",
+ "\n",
+ "['oj-data-small-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
+ "\n",
+ "![image-5.png](mm-3.png)\n",
+ "\n",
+ "['oj-inference-small-tabular'](https://ms.portal.azure.com/#blade/Microsoft_Azure_Storage/ContainerMenuBlade/overview/storageAccountId/%2Fsubscriptions%2F102a16c3-37d3-48a8-9237-4c9b1e8e80e0%2FresourceGroups%2FAutoMLSampleNotebooksData%2Fproviders%2FMicrosoft.Storage%2FstorageAccounts%2Fautomlsamplenotebookdata/path/automl-sample-notebook-data/etag/%220x8D84EAA65DE50B7%22/defaultEncryptionScope/%24account-encryption-key/denyEncryptionScopeOverride//defaultId//publicAccessVal/Container)\n",
+ "![image-6.png](mm-4.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2.2 Register the blob container as DataStore\n",
+ "\n",
+ "A Datastore is a place where data can be stored that is then made accessible to a compute either by means of mounting or copying the data to the compute target.\n",
+ "\n",
+ "Please refer to [Datastore](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py) documentation on how to access data from Datastore.\n",
+ "\n",
+ "In this next step, we will be registering blob storage as datastore to the Workspace."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.core import Datastore\n",
+ "\n",
+ "# Please change the following to point to your own blob container and pass in account_key\n",
+ "blob_datastore_name = \"automl_many_models\"\n",
+ "container_name = \"automl-sample-notebook-data\"\n",
+ "account_name = \"automlsamplenotebookdata\"\n",
+ "\n",
+ "oj_datastore = Datastore.register_azure_blob_container(\n",
+ " workspace=ws,\n",
+ " datastore_name=blob_datastore_name,\n",
+ " container_name=container_name,\n",
+ " account_name=account_name,\n",
+ " create_if_not_exists=True,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "#### 2.3 Using tabular datasets \n",
+ "\n",
+ "Now that the datastore is available from the Workspace, [TabularDataset](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset?view=azure-ml-py) can be created. Datasets in Azure Machine Learning are references to specific data in a Datastore. We are using TabularDataset, so that users who have their data which can be in one or many files (*.parquet or *.csv) and have not split up data according to group columns needed for training, can do so using out of box support for 'partiion_by' feature of TabularDataset shown in section 5.0 below."
+ ]
+ },
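+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a hedged sketch of the 'partition_by' feature mentioned above (the dataset name and output path here are placeholders, and the partitions are written back to the default datastore):\n",
+ "\n",
+ "```python\n",
+ "from azureml.data.datapath import DataPath\n",
+ "\n",
+ "# Hypothetical example: split one TabularDataset by the model grouping columns.\n",
+ "partitioned_ds = input_ds_small.partition_by(\n",
+ "    partition_keys=[\"Store\", \"Brand\"],\n",
+ "    target=DataPath(dstore, \"oj/partitioned\"),\n",
+ "    name=\"oj_partitioned_ds\",\n",
+ ")\n",
+ "```"
+ ]
+ },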
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613007017296
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core import Dataset\n",
+ "\n",
+ "ds_name_small = \"oj-data-small-tabular\"\n",
+ "input_ds_small = Dataset.Tabular.from_delimited_files(\n",
+ " path=oj_datastore.path(ds_name_small + \"/\"), validate=False\n",
+ ")\n",
+ "\n",
+ "inference_name_small = \"oj-inference-small-tabular\"\n",
+ "inference_ds_small = Dataset.Tabular.from_delimited_files(\n",
+ " path=oj_datastore.path(inference_name_small + \"/\"), validate=False\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 3.0 Build the training pipeline\n",
+ "Now that the dataset, WorkSpace, and datastore are set up, we can put together a pipeline for training.\n",
+ "\n",
+ "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Choose a compute target\n",
+ "\n",
+ "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
+ "\n",
+ "\\*\\*Creation of AmlCompute takes approximately 5 minutes.**\n",
+ "\n",
+ "If the AmlCompute with that name is already in your workspace this code will skip the creation process. As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read this [article](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas) on the default limits and how to request more quota."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613007037308
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.core.compute import ComputeTarget, AmlCompute\n",
+ "\n",
+ "# Name your cluster\n",
+ "compute_name = \"mm-compute\"\n",
+ "\n",
+ "\n",
+ "if compute_name in ws.compute_targets:\n",
+ " compute_target = ws.compute_targets[compute_name]\n",
+ " if compute_target and type(compute_target) is AmlCompute:\n",
+ " print(\"Found compute target: \" + compute_name)\n",
+ "else:\n",
+ " print(\"Creating a new compute target...\")\n",
+ " provisioning_config = AmlCompute.provisioning_configuration(\n",
+ " vm_size=\"STANDARD_D16S_V3\", max_nodes=20\n",
+ " )\n",
+ " # Create the compute target\n",
+ " compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)\n",
+ "\n",
+ " # Can poll for a minimum number of nodes and for a specific timeout.\n",
+ " # If no min node count is provided it will use the scale settings for the cluster\n",
+ " compute_target.wait_for_completion(\n",
+ " show_output=True, min_node_count=None, timeout_in_minutes=20\n",
+ " )\n",
+ "\n",
+ " # For a more detailed view of current cluster status, use the 'status' property\n",
+ " print(compute_target.status.serialize())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up training parameters\n",
+ "\n",
+ "This dictionary defines the AutoML and many models settings. For this forecasting task we need to define several settings inncluding the name of the time column, the maximum forecast horizon, and the partition column name definition.\n",
+ "\n",
+ "| Property | Description|\n",
+ "| :--------------- | :------------------- |\n",
+ "| **task** | forecasting |\n",
+ "| **primary_metric** | This is the metric that you want to optimize.
Forecasting supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error |\n",
+ "| **blocked_models** | Blocked models won't be used by AutoML. |\n",
+ "| **iteration_timeout_minutes** | Maximum amount of time in minutes that the model can train. This is optional but provides customers with greater control on exit criteria. |\n",
+ "| **iterations** | Number of models to train. This is optional but provides customers with greater control on exit criteria. |\n",
+ "| **experiment_timeout_hours** | Maximum amount of time in hours that the experiment can take before it terminates. This is optional but provides customers with greater control on exit criteria. |\n",
+ "| **label_column_name** | The name of the label column. |\n",
+ "| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). Periods are inferred from your data. |\n",
+ "| **n_cross_validations** | Number of cross validation splits. Rolling Origin Validation is used to split time-series in a temporally consistent way. |\n",
+ "| **enable_early_stopping** | Flag to enable early termination if the score is not improving in the short term. |\n",
+ "| **time_column_name** | The name of your time column. |\n",
+ "| **enable_engineered_explanations** | Engineered feature explanations will be downloaded if enable_engineered_explanations flag is set to True. By default it is set to False to save storage space. |\n",
+ "| **time_series_id_column_name** | The column names used to uniquely identify timeseries in data that has multiple rows with the same timestamp. |\n",
+ "| **track_child_runs** | Flag to disable tracking of child runs. Only best run is tracked if the flag is set to False (this includes the model and metrics of the run). |\n",
+ "| **pipeline_fetch_max_batch_size** | Determines how many pipelines (training algorithms) to fetch at a time for training, this helps reduce throttling when training at large scale. |\n",
+ "| **partition_column_names** | The names of columns used to group your models. For timeseries, the groups must not split up individual time-series. That is, each group must contain one or more whole time-series. |"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "gather": {
+ "logged": 1613007061544
+ }
+ },
+ "outputs": [],
+ "source": [
+ "from azureml.train.automl.runtime._many_models.many_models_parameters import (\n",
+ " ManyModelsTrainParameters,\n",
+ ")\n",
+ "\n",
+ "partition_column_names = [\"Store\", \"Brand\"]\n",
+ "automl_settings = {\n",
+ " \"task\": \"forecasting\",\n",
+ " \"primary_metric\": \"normalized_root_mean_squared_error\",\n",
+ " \"iteration_timeout_minutes\": 10, # This needs to be changed based on the dataset. We ask customer to explore how long training is taking before settings this value\n",
+ " \"iterations\": 15,\n",
+ " \"experiment_timeout_hours\": 0.25,\n",
+ " \"label_column_name\": \"Quantity\",\n",
+ " \"n_cross_validations\": 3,\n",
+ " \"time_column_name\": \"WeekStarting\",\n",
+ " \"drop_column_names\": \"Revenue\",\n",
+ " \"max_horizon\": 6,\n",
+ " \"grain_column_names\": partition_column_names,\n",
+ " \"track_child_runs\": False,\n",
+ "}\n",
+ "\n",
+ "mm_paramters = ManyModelsTrainParameters(\n",
+ " automl_settings=automl_settings, partition_column_names=partition_column_names\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up many models pipeline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Parallel run step is leveraged to train multiple models at once. To configure the ParallelRunConfig you will need to determine the appropriate number of workers and nodes for your use case. The process_count_per_node is based off the number of cores of the compute VM. The node_count will determine the number of master nodes to use, increasing the node count will speed up the training process.\n",
+ "\n",
+ "| Property | Description|\n",
+ "| :--------------- | :------------------- |\n",
+ "| **experiment** | The experiment used for training. |\n",
+ "| **train_data** | The file dataset to be used as input to the training run. |\n",
+ "| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with 3 and increase the node_count if the training time is taking too long. |\n",
+ "| **process_count_per_node** | Process count per node, we recommend 2:1 ratio for number of cores: number of processes per node. eg. If node has 16 cores then configure 8 or less process count per node or optimal performance. |\n",
+ "| **train_pipeline_parameters** | The set of configuration parameters defined in the previous section. |\n",
+ "\n",
+ "Calling this method will create a new aggregated dataset which is generated dynamically on pipeline execution."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
+ "\n",
+ "\n",
+ "training_pipeline_steps = AutoMLPipelineBuilder.get_many_models_train_steps(\n",
+ " experiment=experiment,\n",
+ " train_data=input_ds_small,\n",
+ " compute_target=compute_target,\n",
+ " node_count=2,\n",
+ " process_count_per_node=8,\n",
+ " run_invocation_timeout=920,\n",
+ " train_pipeline_parameters=mm_paramters,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.pipeline.core import Pipeline\n",
+ "\n",
+ "training_pipeline = Pipeline(ws, steps=training_pipeline_steps)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Submit the pipeline to run\n",
+ "Next we submit our pipeline to run. The whole training pipeline takes about 40m using a STANDARD_D16S_V3 VM with our current ParallelRunConfig setting."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "training_run = experiment.submit(training_pipeline)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "training_run.wait_for_completion(show_output=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Check the run status, if training_run is in completed state, continue to forecasting. If training_run is in another state, check the portal for failures."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 5.0 Publish and schedule the train pipeline (Optional)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 5.1 Publish the pipeline\n",
+ "\n",
+ "Once you have a pipeline you're happy with, you can publish a pipeline so you can call it programmatically later on. See this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline#publish-a-pipeline) for additional information on publishing and calling pipelines."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# published_pipeline = training_pipeline.publish(name = 'automl_train_many_models',\n",
+ "# description = 'train many models',\n",
+ "# version = '1',\n",
+ "# continue_on_step_failure = False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 7.2 Schedule the pipeline\n",
+ "You can also [schedule the pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines) to run on a time-based or change-based schedule. This could be used to automatically retrain models every month or based on another trigger such as data drift."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# from azureml.pipeline.core import Schedule, ScheduleRecurrence\n",
+ "\n",
+ "# training_pipeline_id = published_pipeline.id\n",
+ "\n",
+ "# recurrence = ScheduleRecurrence(frequency=\"Month\", interval=1, start_time=\"2020-01-01T09:00:00\")\n",
+ "# recurring_schedule = Schedule.create(ws, name=\"automl_training_recurring_schedule\",\n",
+ "# description=\"Schedule Training Pipeline to run on the first day of every month\",\n",
+ "# pipeline_id=training_pipeline_id,\n",
+ "# experiment_name=experiment.name,\n",
+ "# recurrence=recurrence)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 6.0 Forecasting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Set up output dataset for inference data\n",
+ "Output of inference can be represented as [OutputFileDatasetConfig](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.output_dataset_config.outputdatasetconfig?view=azure-ml-py) object and OutputFileDatasetConfig can be registered as a dataset. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.data import OutputFileDatasetConfig\n",
+ "\n",
+ "output_inference_data_ds = OutputFileDatasetConfig(\n",
+ " name=\"many_models_inference_output\", destination=(dstore, \"oj/inference_data/\")\n",
+ ").register_on_complete(name=\"oj_inference_data_ds\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For many models we need to provide the ManyModelsInferenceParameters object.\n",
+ "\n",
+ "#### ManyModelsInferenceParameters arguments\n",
+ "| Property | Description|\n",
+ "| :--------------- | :------------------- |\n",
+ "| **partition_column_names** | List of column names that identifies groups. |\n",
+ "| **target_column_name** | \\[Optional] Column name only if the inference dataset has the target. |\n",
+ "| **time_column_name** | \\[Optional] Column name only if it is timeseries. |\n",
+ "| **many_models_run_id** | \\[Optional] Many models run id where models were trained. |\n",
+ "\n",
+ "#### get_many_models_batch_inference_steps arguments\n",
+ "| Property | Description|\n",
+ "| :--------------- | :------------------- |\n",
+ "| **experiment** | The experiment used for inference run. |\n",
+ "| **inference_data** | The data to use for inferencing. It should be the same schema as used for training.\n",
+ "| **compute_target** The compute target that runs the inference pipeline.|\n",
+ "| **node_count** | The number of compute nodes to be used for running the user script. We recommend to start with the number of cores per node (varies by compute sku). |\n",
+ "| **process_count_per_node** The number of processes per node.\n",
+ "| **train_run_id** | \\[Optional] The run id of the hierarchy training, by default it is the latest successful training many model run in the experiment. |\n",
+ "| **train_experiment_name** | \\[Optional] The train experiment that contains the train pipeline. This one is only needed when the train pipeline is not in the same experiement as the inference pipeline. |\n",
+ "| **process_count_per_node** | \\[Optional] The number of processes per node, by default it's 4. |"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.contrib.automl.pipeline.steps import AutoMLPipelineBuilder\n",
+ "from azureml.train.automl.runtime._many_models.many_models_parameters import (\n",
+ " ManyModelsInferenceParameters,\n",
+ ")\n",
+ "\n",
+ "mm_parameters = ManyModelsInferenceParameters(\n",
+ " partition_column_names=[\"Store\", \"Brand\"],\n",
+ " time_column_name=\"WeekStarting\",\n",
+ " target_column_name=\"Quantity\",\n",
+ ")\n",
+ "\n",
+ "inference_steps = AutoMLPipelineBuilder.get_many_models_batch_inference_steps(\n",
+ " experiment=experiment,\n",
+ " inference_data=inference_ds_small,\n",
+ " node_count=2,\n",
+ " process_count_per_node=8,\n",
+ " compute_target=compute_target,\n",
+ " run_invocation_timeout=300,\n",
+ " output_datastore=output_inference_data_ds,\n",
+ " train_run_id=training_run.id,\n",
+ " train_experiment_name=training_run.experiment.name,\n",
+ " inference_pipeline_parameters=mm_parameters,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.pipeline.core import Pipeline\n",
+ "\n",
+ "inference_pipeline = Pipeline(ws, steps=inference_steps)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "inference_run = experiment.submit(inference_pipeline)\n",
+ "inference_run.wait_for_completion(show_output=False)"
+ ]
+ },
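+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optionally, you can inspect the inference pipeline run with the RunDetails widget. This is a small aside assuming the azureml-widgets package is installed, so it is left commented out:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# from azureml.widgets import RunDetails\n",
+ "\n",
+ "# RunDetails(inference_run).show()"
+ ]
+ },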
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Retrieve results\n",
+ "\n",
+ "The forecasting pipeline forecasts the orange juice quantity for a Store by Brand. The pipeline returns one file with the predictions for each store and outputs the result to the forecasting_output Blob container. The details of the blob container is listed in 'forecasting_output.txt' under Outputs+logs. \n",
+ "\n",
+ "The following code snippet:\n",
+ "1. Downloads the contents of the output folder that is passed in the parallel run step \n",
+ "2. Reads the parallel_run_step.txt file that has the predictions as pandas dataframe and \n",
+ "3. Displays the top 10 rows of the predictions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.contrib.automl.pipeline.steps.utilities import get_output_from_mm_pipeline\n",
+ "\n",
+ "forecasting_results_name = \"forecasting_results\"\n",
+ "forecasting_output_name = \"many_models_inference_output\"\n",
+ "forecast_file = get_output_from_mm_pipeline(\n",
+ " inference_run, forecasting_results_name, forecasting_output_name\n",
+ ")\n",
+ "df = pd.read_csv(forecast_file, delimiter=\" \", header=None)\n",
+ "df.columns = [\n",
+ " \"Week Starting\",\n",
+ " \"Store\",\n",
+ " \"Brand\",\n",
+ " \"Quantity\",\n",
+ " \"Advert\",\n",
+ " \"Price\",\n",
+ " \"Revenue\",\n",
+ " \"Predicted\",\n",
+ "]\n",
+ "print(\n",
+ " \"Prediction has \", df.shape[0], \" rows. Here the first 10 rows are being displayed.\"\n",
+ ")\n",
+ "df.head(10)"
+ ]
+ },
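+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick, optional sanity check (not part of the original pipeline output handling), we can compute a simple accuracy metric such as MAPE directly from the results frame, using the `Quantity` and `Predicted` columns assembled above:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "# Mean absolute percentage error over all series, ignoring zero actuals.\n",
+ "actuals = df[\"Quantity\"].to_numpy(dtype=float)\n",
+ "preds = df[\"Predicted\"].to_numpy(dtype=float)\n",
+ "nonzero = actuals != 0\n",
+ "mape = np.mean(np.abs((actuals[nonzero] - preds[nonzero]) / actuals[nonzero])) * 100\n",
+ "print(\"MAPE across all forecasted series: {0:.2f}%\".format(mape))"
+ ]
+ },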
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.0 Publish and schedule the inference pipeline (Optional)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 7.1 Publish the pipeline\n",
+ "\n",
+ "Once you have a pipeline you're happy with, you can publish a pipeline so you can call it programmatically later on. See this [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline#publish-a-pipeline) for additional information on publishing and calling pipelines."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# published_pipeline_inf = inference_pipeline.publish(name = 'automl_forecast_many_models',\n",
+ "# description = 'forecast many models',\n",
+ "# version = '1',\n",
+ "# continue_on_step_failure = False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### 7.2 Schedule the pipeline\n",
+ "You can also [schedule the pipeline](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-schedule-pipelines) to run on a time-based or change-based schedule. This could be used to automatically retrain or forecast models every month or based on another trigger such as data drift."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# from azureml.pipeline.core import Schedule, ScheduleRecurrence\n",
+ "\n",
+ "# forecasting_pipeline_id = published_pipeline.id\n",
+ "\n",
+ "# recurrence = ScheduleRecurrence(frequency=\"Month\", interval=1, start_time=\"2020-01-01T09:00:00\")\n",
+ "# recurring_schedule = Schedule.create(ws, name=\"automl_forecasting_recurring_schedule\",\n",
+ "# description=\"Schedule Forecasting Pipeline to run on the first day of every week\",\n",
+ "# pipeline_id=forecasting_pipeline_id,\n",
+ "# experiment_name=experiment.name,\n",
+ "# recurrence=recurrence)"
+ ]
+ }
+ ],
+ "metadata": {
+ "authors": [
+ {
+ "name": "jialiu"
+ }
+ ],
+ "categories": [
+ "how-to-use-azureml",
+ "automated-machine-learning"
+ ],
+ "kernelspec": {
+ "display_name": "Python 3.6 - AzureML",
+ "language": "python",
+ "name": "python3-azureml"
},
- "nbformat": 4,
- "nbformat_minor": 4
-}
\ No newline at end of file
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.8"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/01_userfilesupdate.PNG b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/01_userfilesupdate.PNG
new file mode 100644
index 000000000..6b46a9c02
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/01_userfilesupdate.PNG differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/Flow_map.png b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/Flow_map.png
new file mode 100644
index 000000000..e895d0bcc
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/Flow_map.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/ai show.gif b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/ai show.gif
new file mode 100644
index 000000000..98d280ae0
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/ai show.gif differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/computes_view.png b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/computes_view.png
new file mode 100644
index 000000000..634ab83cb
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/computes_view.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/create_notebook_vm.png b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/create_notebook_vm.png
new file mode 100644
index 000000000..59f632920
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/create_notebook_vm.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/mmsa-overview.png b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/mmsa-overview.png
new file mode 100644
index 000000000..d95817c80
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/mmsa-overview.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/mmsa.png b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/mmsa.png
new file mode 100644
index 000000000..2e0f12f7e
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/mmsa.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/terminal.png b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/terminal.png
new file mode 100644
index 000000000..d0d342db8
Binary files /dev/null and b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/images/terminal.png differ
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-many-models/update_env.yml b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/update_env.yml
new file mode 100644
index 000000000..d0b193dab
--- /dev/null
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-many-models/update_env.yml
@@ -0,0 +1,3 @@
+dependencies:
+- pip:
+ - azureml-contrib-automl-pipeline-steps
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
index d41a93bc0..6f4967b9c 100644
--- a/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.ipynb
@@ -1,834 +1,844 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Copyright (c) Microsoft Corporation. All rights reserved.\n",
- "\n",
- "Licensed under the MIT License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.png)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Automated Machine Learning\n",
- "_**Orange Juice Sales Forecasting**_\n",
- "\n",
- "## Contents\n",
- "1. [Introduction](#introduction)\n",
- "1. [Setup](#setup)\n",
- "1. [Compute](#compute)\n",
- "1. [Data](#data)\n",
- "1. [Train](#train)\n",
- "1. [Forecast](#forecast)\n",
- "1. [Operationalize](#operationalize)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Introduction\n",
- "In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
- "\n",
- "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
- "\n",
- "The examples in the follow code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Setup"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import json\n",
- "import logging\n",
- "\n",
- "import azureml.core\n",
- "import pandas as pd\n",
- "from azureml.automl.core.featurization import FeaturizationConfig\n",
- "from azureml.core.experiment import Experiment\n",
- "from azureml.core.workspace import Workspace\n",
- "from azureml.train.automl import AutoMLConfig\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This sample notebook may use features that are not available in previous versions of the Azure ML SDK."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "print(\"This notebook was created using version 1.38.0 of the Azure ML SDK\")\n",
- "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "As part of the setup you have already created a Workspace. To run AutoML, you also need to create an Experiment. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "ws = Workspace.from_config()\n",
- "\n",
- "# choose a name for the run history container in the workspace\n",
- "experiment_name = \"automl-ojforecasting\"\n",
- "\n",
- "experiment = Experiment(ws, experiment_name)\n",
- "\n",
- "output = {}\n",
- "output[\"Subscription ID\"] = ws.subscription_id\n",
- "output[\"Workspace\"] = ws.name\n",
- "output[\"SKU\"] = ws.sku\n",
- "output[\"Resource Group\"] = ws.resource_group\n",
- "output[\"Location\"] = ws.location\n",
- "output[\"Run History Name\"] = experiment_name\n",
- "pd.set_option(\"display.max_colwidth\", -1)\n",
- "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
- "outputDf.T"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Compute\n",
- "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
- "\n",
- "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
- "\n",
- "#### Creation of AmlCompute takes approximately 5 minutes. \n",
- "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
- "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.core.compute import ComputeTarget, AmlCompute\n",
- "from azureml.core.compute_target import ComputeTargetException\n",
- "\n",
- "# Choose a name for your CPU cluster\n",
- "amlcompute_cluster_name = \"oj-cluster\"\n",
- "\n",
- "# Verify that cluster does not exist already\n",
- "try:\n",
- " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
- " print(\"Found existing cluster, use it.\")\n",
- "except ComputeTargetException:\n",
- " compute_config = AmlCompute.provisioning_configuration(\n",
- " vm_size=\"STANDARD_D12_V2\", max_nodes=6\n",
- " )\n",
- " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
- "\n",
- "compute_target.wait_for_completion(show_output=True)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Data\n",
- "You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "time_column_name = \"WeekStarting\"\n",
- "data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
- "\n",
- "# Drop the columns 'logQuantity' as it is a leaky feature.\n",
- "data.drop(\"logQuantity\", axis=1, inplace=True)\n",
- "\n",
- "data.head()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
- "\n",
- "The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we define the **time_series_id_column_names** - the columns whose values determine the boundaries between time-series: "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "time_series_id_column_names = [\"Store\", \"Brand\"]\n",
- "nseries = data.groupby(time_series_id_column_names).ngroups\n",
- "print(\"Data contains {0} individual time-series.\".format(nseries))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For demonstration purposes, we extract sales time-series for just a few of the stores:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "use_stores = [2, 5, 8]\n",
- "data_subset = data[data.Store.isin(use_stores)]\n",
- "nseries = data_subset.groupby(time_series_id_column_names).ngroups\n",
- "print(\"Data subset contains {0} individual time-series.\".format(nseries))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Data Splitting\n",
- "We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the time series identifier columns."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "n_test_periods = 20\n",
- "\n",
- "\n",
- "def split_last_n_by_series_id(df, n):\n",
- " \"\"\"Group df by series identifiers and split on last n rows for each group.\"\"\"\n",
- " df_grouped = df.sort_values(time_column_name).groupby( # Sort by ascending time\n",
- " time_series_id_column_names, group_keys=False\n",
- " )\n",
- " df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
- " df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
- " return df_head, df_tail\n",
- "\n",
- "\n",
- "train, test = split_last_n_by_series_id(data_subset, n_test_periods)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Upload data to datastore\n",
- "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace), is paired with the storage account, which contains the default data store. We will use it to upload the train and test data and create [tabular datasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training and testing. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.data.dataset_factory import TabularDatasetFactory\n",
- "\n",
- "datastore = ws.get_default_datastore()\n",
- "train_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
- " train, target=(datastore, \"dataset/\"), name=\"dominicks_OJ_train\"\n",
- ")\n",
- "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
- " test, target=(datastore, \"dataset/\"), name=\"dominicks_OJ_test\"\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Create dataset for training"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "train_dataset.to_pandas_dataframe().tail()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Modeling\n",
- "\n",
- "For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
- "* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span \n",
- "* Impute missing values in the target (via forward-fill) and feature columns (using median column values) \n",
- "* Create features based on time series identifiers to enable fixed effects across different series\n",
- "* Create time-based features to assist in learning seasonal patterns\n",
- "* Encode categorical variables to numeric quantities\n",
- "\n",
- "In this notebook, AutoML will train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series. If you're looking for training multiple models for different time-series, please see the many-models notebook.\n",
- "\n",
- "You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "target_column_name = \"Quantity\""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Customization\n",
- "\n",
- "The featurization customization in forecasting is an advanced feature in AutoML which allows our customers to change the default forecasting featurization behaviors and column types through `FeaturizationConfig`. The supported scenarios include:\n",
- "\n",
- "1. Column purposes update: Override feature type for the specified column. Currently supports DateTime, Categorical and Numeric. This customization can be used in the scenario that the type of the column cannot correctly reflect its purpose. Some numerical columns, for instance, can be treated as Categorical columns which need to be converted to categorical while some can be treated as epoch timestamp which need to be converted to datetime. To tell our SDK to correctly preprocess these columns, a configuration need to be add with the columns and their desired types.\n",
- "2. Transformer parameters update: Currently supports parameter change for Imputer only. User can customize imputation methods. The supported imputing methods for target column are constant and ffill (forward fill). The supported imputing methods for feature columns are mean, median, most frequent, constant and ffill (forward fill). This customization can be used for the scenario that our customers know which imputation methods fit best to the input data. For instance, some datasets use NaN to represent 0 which the correct behavior should impute all the missing value with 0. To achieve this behavior, these columns need to be configured as constant imputation with `fill_value` 0.\n",
- "3. Drop columns: Columns to drop from being featurized. These usually are the columns which are leaky or the columns contain no useful data."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "tags": [
- "sample-featurizationconfig-remarks"
- ]
- },
- "outputs": [],
- "source": [
- "featurization_config = FeaturizationConfig()\n",
- "# Force the CPWVOL5 feature to be numeric type.\n",
- "featurization_config.add_column_purpose(\"CPWVOL5\", \"Numeric\")\n",
- "# Fill missing values in the target column, Quantity, with zeros.\n",
- "featurization_config.add_transformer_params(\n",
- " \"Imputer\", [\"Quantity\"], {\"strategy\": \"constant\", \"fill_value\": 0}\n",
- ")\n",
- "# Fill missing values in the INCOME column with median value.\n",
- "featurization_config.add_transformer_params(\n",
- " \"Imputer\", [\"INCOME\"], {\"strategy\": \"median\"}\n",
- ")\n",
- "# Fill missing values in the Price column with forward fill (last value carried forward).\n",
- "featurization_config.add_transformer_params(\"Imputer\", [\"Price\"], {\"strategy\": \"ffill\"})"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Forecasting Parameters\n",
- "To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
- "\n",
- "\n",
- "|Property|Description|\n",
- "|-|-|\n",
- "|**time_column_name**|The name of your time column.|\n",
- "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
- "|**time_series_id_column_names**|This optional parameter represents the column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If the time series identifiers are not defined or incorrectly defined, time series identifiers will be created automatically if they exist.|\n",
- "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter for the correction of time series containing irregular data points or for padding of short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Train\n",
- "\n",
- "The [AutoMLConfig](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters.\n",
- "\n",
- "For forecasting tasks, there are some additional parameters that can be set in the `ForecastingParameters` class: the name of the column holding the date/time, the timeseries id column names, and the maximum forecast horizon. A time column is required for forecasting, while the time_series_id is optional. If time_series_id columns are not given or incorrectly given, AutoML automatically creates time_series_id columns if they exist. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
- "\n",
- "The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the forecast horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning application that estimates the next month of sales should set the horizon according to suitable planning time-scales. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
- "\n",
- "We note here that AutoML can sweep over two types of time-series models:\n",
- "* Models that are trained for each series such as ARIMA and Facebook's Prophet.\n",
- "* Models trained across multiple time-series using a regression approach.\n",
- "\n",
- "In the first case, AutoML loops over all time-series in your dataset and trains one model (e.g. AutoArima or Prophet, as the case may be) for each series. This can result in long runtimes to train these models if there are a lot of series in the data. One way to mitigate this problem is to fit models for different series in parallel if you have multiple compute cores available. To enable this behavior, set the `max_cores_per_iteration` parameter in your AutoMLConfig as shown in the example in the next cell. \n",
- "\n",
- "\n",
- "Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.\n",
- "\n",
- "Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
- "\n",
- "|Property|Description|\n",
- "|-|-|\n",
- "|**task**|forecasting|\n",
- "|**primary_metric**|This is the metric that you want to optimize.
Forecasting supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error\n",
- "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
- "|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
- "|**training_data**|Input dataset, containing both features and label column.|\n",
- "|**label_column_name**|The name of the label column.|\n",
- "|**compute_target**|The remote compute for training.|\n",
- "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
- "|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models|\n",
- "|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models|\n",
- "|**debug_log**|Log file path for writing debugging information|\n",
- "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
- "|**max_cores_per_iteration**|Maximum number of cores to utilize per iteration. A value of -1 indicates all available cores should be used"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
- "\n",
- "forecasting_parameters = ForecastingParameters(\n",
- " time_column_name=time_column_name,\n",
- " forecast_horizon=n_test_periods,\n",
- " freq=\"W-THU\", # Set the forecast frequency to be weekly (start on each Thursday)\n",
- ")\n",
- "\n",
- "automl_config = AutoMLConfig(\n",
- " task=\"forecasting\",\n",
- " debug_log=\"automl_oj_sales_errors.log\",\n",
- " primary_metric=\"normalized_mean_absolute_error\",\n",
- " experiment_timeout_hours=0.25,\n",
- " training_data=train_dataset,\n",
- " label_column_name=target_column_name,\n",
- " compute_target=compute_target,\n",
- " enable_early_stopping=True,\n",
- " featurization=featurization_config,\n",
- " n_cross_validations=3,\n",
- " verbosity=logging.INFO,\n",
- " max_cores_per_iteration=-1,\n",
- " forecasting_parameters=forecasting_parameters,\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "You can now submit a new training run. Depending on the data and number of iterations this operation may take several minutes.\n",
- "Information from each iteration will be printed to the console. Validation errors and current status will be shown when setting `show_output=True` and the execution will be synchronous."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "remote_run = experiment.submit(automl_config, show_output=False)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "remote_run.wait_for_completion()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Retrieve the Best Run details\n",
- "Below we retrieve the best Run object from among all the runs in the experiment."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "best_run = remote_run.get_best_child()\n",
- "model_name = best_run.properties[\"model_name\"]\n",
- "best_run"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "## Transparency\n",
- "\n",
- "View updated featurization summary"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Download the featurization summary JSON file locally\n",
- "best_run.download_file(\"outputs/featurization_summary.json\", \"featurization_summary.json\")\n",
- "\n",
- "# Render the JSON as a pandas DataFrame\n",
- "with open(\"featurization_summary.json\", \"r\") as f:\n",
- " records = json.load(f)\n",
- "fs = pd.DataFrame.from_records(records)\n",
- "\n",
- "# View a summary of the featurization \n",
- "fs[[\"RawFeatureName\", \"TypeDetected\", \"Dropped\", \"EngineeredFeatureCount\", \"Transformations\"]]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Forecast\n",
- "\n",
- "Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
- "\n",
- "The inference will run on a remote compute. In this example, it will re-use the training compute."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "test_experiment = Experiment(ws, experiment_name + \"_inference\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Retreiving forecasts from the model\n",
- "We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from run_forecast import run_remote_inference\n",
- "\n",
- "remote_run_infer = run_remote_inference(\n",
- " test_experiment=test_experiment,\n",
- " compute_target=compute_target,\n",
- " train_run=best_run,\n",
- " test_dataset=test_dataset,\n",
- " target_column_name=target_column_name,\n",
- ")\n",
- "remote_run_infer.wait_for_completion(show_output=False)\n",
- "\n",
- "# download the forecast file to the local machine\n",
- "remote_run_infer.download_file(\"outputs/predictions.csv\", \"predictions.csv\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Evaluate\n",
- "\n",
- "To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals).\n",
- "\n",
- "We'll add predictions and actuals into a single dataframe for convenience in calculating the metrics."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# load forecast data frame\n",
- "fcst_df = pd.read_csv(\"predictions.csv\", parse_dates=[time_column_name])\n",
- "fcst_df.head()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.automl.core.shared import constants\n",
- "from azureml.automl.runtime.shared.score import scoring\n",
- "from matplotlib import pyplot as plt\n",
- "\n",
- "# use automl scoring module\n",
- "scores = scoring.score_regression(\n",
- " y_test=fcst_df[target_column_name],\n",
- " y_pred=fcst_df[\"predicted\"],\n",
- " metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
- ")\n",
- "\n",
- "print(\"[Test data scores]\\n\")\n",
- "for key, value in scores.items():\n",
- " print(\"{}: {:.3f}\".format(key, value))\n",
- "\n",
- "# Plot outputs\n",
- "%matplotlib inline\n",
- "test_pred = plt.scatter(fcst_df[target_column_name], fcst_df[\"predicted\"], color=\"b\")\n",
- "test_test = plt.scatter(\n",
- " fcst_df[target_column_name], fcst_df[target_column_name], color=\"g\"\n",
- ")\n",
- "plt.legend(\n",
- " (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n",
- ")\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Operationalize"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "_Operationalization_ means getting the model into the cloud so that other can run it after you close the notebook. We will create a docker running on Azure Container Instances with the model."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "description = \"AutoML OJ forecaster\"\n",
- "tags = None\n",
- "model = remote_run.register_model(\n",
- " model_name=model_name, description=description, tags=tags\n",
- ")\n",
- "\n",
- "print(remote_run.model_id)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Develop the scoring script\n",
- "\n",
- "For the deployment we need a function which will run the forecast on serialized data. It can be obtained from the best_run."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "script_file_name = \"score_fcast.py\"\n",
- "best_run.download_file(\"outputs/scoring_file_v_1_0_0.py\", script_file_name)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Deploy the model as a Web Service on Azure Container Instance"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "from azureml.core.model import InferenceConfig\n",
- "from azureml.core.webservice import AciWebservice\n",
- "from azureml.core.webservice import Webservice\n",
- "from azureml.core.model import Model\n",
- "\n",
- "inference_config = InferenceConfig(\n",
- " environment=best_run.get_environment(), entry_script=script_file_name\n",
- ")\n",
- "\n",
- "aciconfig = AciWebservice.deploy_configuration(\n",
- " cpu_cores=2,\n",
- " memory_gb=4,\n",
- " tags={\"type\": \"automl-forecasting\"},\n",
- " description=\"Automl forecasting sample service\",\n",
- ")\n",
- "\n",
- "aci_service_name = \"automl-oj-forecast-01\"\n",
- "print(aci_service_name)\n",
- "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
- "aci_service.wait_for_deployment(True)\n",
- "print(aci_service.state)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "aci_service.get_logs()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Call the service"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import json\n",
- "\n",
- "X_query = test.copy()\n",
- "X_query.pop(target_column_name)\n",
- "# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.\n",
- "X_query[time_column_name] = X_query[time_column_name].astype(str)\n",
- "# The Service object accept the complex dictionary, which is internally converted to JSON string.\n",
- "# The section 'data' contains the data frame in the form of dictionary.\n",
- "sample_quantiles = [0.025, 0.975]\n",
- "test_sample = json.dumps(\n",
- " {\"data\": X_query.to_dict(orient=\"records\"), \"quantiles\": sample_quantiles}\n",
- ")\n",
- "response = aci_service.run(input_data=test_sample)\n",
- "# translate from networkese to datascientese\n",
- "try:\n",
- " res_dict = json.loads(response)\n",
- " y_fcst_all = pd.DataFrame(res_dict[\"index\"])\n",
- " y_fcst_all[time_column_name] = pd.to_datetime(\n",
- " y_fcst_all[time_column_name], unit=\"ms\"\n",
- " )\n",
- " y_fcst_all[\"forecast\"] = res_dict[\"forecast\"]\n",
- " y_fcst_all[\"prediction_interval\"] = res_dict[\"prediction_interval\"]\n",
- "except:\n",
- " print(res_dict)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "y_fcst_all.head()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "### Delete the web service if desired"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "serv = Webservice(ws, \"automl-oj-forecast-01\")\n",
- "serv.delete() # don't do it accidentally"
- ]
- }
- ],
- "metadata": {
- "authors": [
- {
- "name": "jialiu"
- }
- ],
- "category": "tutorial",
- "celltoolbar": "Raw Cell Format",
- "compute": [
- "Remote"
- ],
- "datasets": [
- "Orange Juice Sales"
- ],
- "deployment": [
- "Azure Container Instance"
- ],
- "exclude_from_index": false,
- "framework": [
- "Azure ML AutoML"
- ],
- "friendly_name": "Forecasting orange juice sales with deployment",
- "index_order": 1,
- "kernelspec": {
- "display_name": "Python 3.6",
- "language": "python",
- "name": "python36"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.6.9"
- },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-orange-juice-sales/auto-ml-forecasting-orange-juice-sales.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Automated Machine Learning\n",
+ "_**Orange Juice Sales Forecasting**_\n",
+ "\n",
+ "## Contents\n",
+ "1. [Introduction](#introduction)\n",
+ "1. [Setup](#setup)\n",
+ "1. [Compute](#compute)\n",
+ "1. [Data](#data)\n",
+ "1. [Train](#train)\n",
+ "1. [Forecast](#forecast)\n",
+ "1. [Operationalize](#operationalize)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Introduction\n",
+ "In this example, we use AutoML to train, select, and operationalize a time-series forecasting model for multiple time-series.\n",
+ "\n",
+ "Make sure you have executed the [configuration notebook](../../../configuration.ipynb) before running this notebook.\n",
+ "\n",
+ "The examples in the follow code samples use the University of Chicago's Dominick's Finer Foods dataset to forecast orange juice sales. Dominick's was a grocery chain in the Chicago metropolitan area."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Setup"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "import logging\n",
+ "\n",
+ "import azureml.core\n",
+ "import pandas as pd\n",
+ "from azureml.automl.core.featurization import FeaturizationConfig\n",
+ "from azureml.core.experiment import Experiment\n",
+ "from azureml.core.workspace import Workspace\n",
+ "from azureml.train.automl import AutoMLConfig"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This notebook is compatible with Azure ML SDK version 1.35.0 or later."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As part of the setup you have already created a Workspace. To run AutoML, you also need to create an Experiment. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ws = Workspace.from_config()\n",
+ "\n",
+ "# choose a name for the run history container in the workspace\n",
+ "experiment_name = \"automl-ojforecasting\"\n",
+ "\n",
+ "experiment = Experiment(ws, experiment_name)\n",
+ "\n",
+ "output = {}\n",
+ "output[\"Subscription ID\"] = ws.subscription_id\n",
+ "output[\"Workspace\"] = ws.name\n",
+ "output[\"SKU\"] = ws.sku\n",
+ "output[\"Resource Group\"] = ws.resource_group\n",
+ "output[\"Location\"] = ws.location\n",
+ "output[\"Run History Name\"] = experiment_name\n",
+ "pd.set_option(\"display.max_colwidth\", -1)\n",
+ "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
+ "outputDf.T"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Compute\n",
+ "You will need to create a [compute target](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute) for your AutoML run. In this tutorial, you create AmlCompute as your training compute resource.\n",
+ "\n",
+ "> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.\n",
+ "\n",
+ "#### Creation of AmlCompute takes approximately 5 minutes. \n",
+ "If the AmlCompute with that name is already in your workspace this code will skip the creation process.\n",
+ "As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.core.compute import ComputeTarget, AmlCompute\n",
+ "from azureml.core.compute_target import ComputeTargetException\n",
+ "\n",
+ "# Choose a name for your CPU cluster\n",
+ "amlcompute_cluster_name = \"oj-cluster\"\n",
+ "\n",
+ "# Verify that cluster does not exist already\n",
+ "try:\n",
+ " compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)\n",
+ " print(\"Found existing cluster, use it.\")\n",
+ "except ComputeTargetException:\n",
+ " compute_config = AmlCompute.provisioning_configuration(\n",
+ " vm_size=\"STANDARD_D12_V2\", max_nodes=6\n",
+ " )\n",
+ " compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)\n",
+ "\n",
+ "compute_target.wait_for_completion(show_output=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Data\n",
+ "You are now ready to load the historical orange juice sales data. We will load the CSV file into a plain pandas DataFrame; the time column in the CSV is called _WeekStarting_, so it will be specially parsed into the datetime type."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "time_column_name = \"WeekStarting\"\n",
+ "data = pd.read_csv(\"dominicks_OJ.csv\", parse_dates=[time_column_name])\n",
+ "\n",
+ "# Drop the columns 'logQuantity' as it is a leaky feature.\n",
+ "data.drop(\"logQuantity\", axis=1, inplace=True)\n",
+ "\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Each row in the DataFrame holds a quantity of weekly sales for an OJ brand at a single store. The data also includes the sales price, a flag indicating if the OJ brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also include the logarithm of the sales quantity. The Dominick's grocery data is commonly used to illustrate econometric modeling techniques where logarithms of quantities are generally preferred. \n",
+ "\n",
+ "The task is now to build a time-series model for the _Quantity_ column. It is important to note that this dataset is comprised of many individual time-series - one for each unique combination of _Store_ and _Brand_. To distinguish the individual time-series, we define the **time_series_id_column_names** - the columns whose values determine the boundaries between time-series: "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "time_series_id_column_names = [\"Store\", \"Brand\"]\n",
+ "nseries = data.groupby(time_series_id_column_names).ngroups\n",
+ "print(\"Data contains {0} individual time-series.\".format(nseries))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For demonstration purposes, we extract sales time-series for just a few of the stores:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "use_stores = [2, 5, 8]\n",
+ "data_subset = data[data.Store.isin(use_stores)]\n",
+ "nseries = data_subset.groupby(time_series_id_column_names).ngroups\n",
+ "print(\"Data subset contains {0} individual time-series.\".format(nseries))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Data Splitting\n",
+ "We now split the data into a training and a testing set for later forecast evaluation. The test set will contain the final 20 weeks of observed sales for each time-series. The splits should be stratified by series, so we use a group-by statement on the time series identifier columns."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "n_test_periods = 20\n",
+ "\n",
+ "\n",
+ "def split_last_n_by_series_id(df, n):\n",
+ " \"\"\"Group df by series identifiers and split on last n rows for each group.\"\"\"\n",
+ " df_grouped = df.sort_values(time_column_name).groupby( # Sort by ascending time\n",
+ " time_series_id_column_names, group_keys=False\n",
+ " )\n",
+ " df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
+ " df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
+ " return df_head, df_tail\n",
+ "\n",
+ "\n",
+ "train, test = split_last_n_by_series_id(data_subset, n_test_periods)"
+ ]
+ },
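+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check (an optional aside, not part of the original workflow), we can verify that every series contributes exactly `n_test_periods` rows to the test set and that the training data ends before the test data begins for each series:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Each series should contribute exactly n_test_periods rows to the test set.\n",
+ "assert (test.groupby(time_series_id_column_names).size() == n_test_periods).all()\n",
+ "\n",
+ "# The training data must end before the test data begins for every series.\n",
+ "last_train = train.groupby(time_series_id_column_names)[time_column_name].max()\n",
+ "first_test = test.groupby(time_series_id_column_names)[time_column_name].min()\n",
+ "assert (last_train < first_test).all()\n",
+ "print(\"Split verified for {0} series.\".format(len(first_test)))"
+ ]
+ },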
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Upload data to datastore\n",
+ "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace), is paired with the storage account, which contains the default data store. We will use it to upload the train and test data and create [tabular datasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training and testing. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.data.dataset_factory import TabularDatasetFactory\n",
+ "\n",
+ "datastore = ws.get_default_datastore()\n",
+ "train_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
+ " train, target=(datastore, \"dataset/\"), name=\"dominicks_OJ_train\"\n",
+ ")\n",
+ "test_dataset = TabularDatasetFactory.register_pandas_dataframe(\n",
+ " test, target=(datastore, \"dataset/\"), name=\"dominicks_OJ_test\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Create dataset for training"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_dataset.to_pandas_dataframe().tail()"
+ ]
+ },
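+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Because the datasets were registered in the workspace, they can also be retrieved by name in later sessions. This is a small optional aside, assuming the registration cell above has already run:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.core import Dataset\n",
+ "\n",
+ "# Registered datasets can be fetched by name in any later session.\n",
+ "train_dataset = Dataset.get_by_name(ws, name=\"dominicks_OJ_train\")\n",
+ "print(train_dataset.name, train_dataset.version)"
+ ]
+ },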
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Modeling\n",
+ "\n",
+ "For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
+ "* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span \n",
+ "* Impute missing values in the target (via forward-fill) and feature columns (using median column values) \n",
+ "* Create features based on time series identifiers to enable fixed effects across different series\n",
+ "* Create time-based features to assist in learning seasonal patterns\n",
+ "* Encode categorical variables to numeric quantities\n",
+ "\n",
+ "In this notebook, AutoML will train a single, regression-type model across **all** time-series in a given training set. This allows the model to generalize across related series. If you're looking for training multiple models for different time-series, please see the many-models notebook.\n",
+ "\n",
+ "You are almost ready to start an AutoML training job. First, we need to separate the target column from the rest of the DataFrame: "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "target_column_name = \"Quantity\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Customization\n",
+ "\n",
+ "The featurization customization in forecasting is an advanced feature in AutoML which allows our customers to change the default forecasting featurization behaviors and column types through `FeaturizationConfig`. The supported scenarios include:\n",
+ "\n",
+ "1. Column purposes update: Override feature type for the specified column. Currently supports DateTime, Categorical and Numeric. This customization can be used in the scenario that the type of the column cannot correctly reflect its purpose. Some numerical columns, for instance, can be treated as Categorical columns which need to be converted to categorical while some can be treated as epoch timestamp which need to be converted to datetime. To tell our SDK to correctly preprocess these columns, a configuration need to be add with the columns and their desired types.\n",
+ "2. Transformer parameters update: Currently supports parameter change for Imputer only. User can customize imputation methods. The supported imputing methods for target column are constant and ffill (forward fill). The supported imputing methods for feature columns are mean, median, most frequent, constant and ffill (forward fill). This customization can be used for the scenario that our customers know which imputation methods fit best to the input data. For instance, some datasets use NaN to represent 0 which the correct behavior should impute all the missing value with 0. To achieve this behavior, these columns need to be configured as constant imputation with `fill_value` 0.\n",
+ "3. Drop columns: Columns to drop from being featurized. These usually are the columns which are leaky or the columns contain no useful data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
"tags": [
- "None"
- ],
- "task": "Forecasting"
+ "sample-featurizationconfig-remarks"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "featurization_config = FeaturizationConfig()\n",
+ "# Force the CPWVOL5 feature to be numeric type.\n",
+ "featurization_config.add_column_purpose(\"CPWVOL5\", \"Numeric\")\n",
+ "# Fill missing values in the target column, Quantity, with zeros.\n",
+ "featurization_config.add_transformer_params(\n",
+ " \"Imputer\", [\"Quantity\"], {\"strategy\": \"constant\", \"fill_value\": 0}\n",
+ ")\n",
+ "# Fill missing values in the INCOME column with median value.\n",
+ "featurization_config.add_transformer_params(\n",
+ " \"Imputer\", [\"INCOME\"], {\"strategy\": \"median\"}\n",
+ ")\n",
+ "# Fill missing values in the Price column with forward fill (last value carried forward).\n",
+ "featurization_config.add_transformer_params(\"Imputer\", [\"Price\"], {\"strategy\": \"ffill\"})"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Forecasting Parameters\n",
+ "To define forecasting parameters for your experiment training, you can leverage the ForecastingParameters class. The table below details the forecasting parameter we will be passing into our experiment.\n",
+ "\n",
+ "\n",
+ "|Property|Description|\n",
+ "|-|-|\n",
+ "|**time_column_name**|The name of your time column.|\n",
+ "|**forecast_horizon**|The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly).|\n",
+ "|**time_series_id_column_names**|The column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If the time series identifiers are not defined, the data set is assumed to be one time series.|\n",
+ "|**freq**|Forecast frequency. This optional parameter represents the period with which the forecast is desired, for example, daily, weekly, yearly, etc. Use this parameter for the correction of time series containing irregular data points or for padding of short time series. The frequency needs to be a pandas offset alias. Please refer to [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects) for more information."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Train\n",
+ "\n",
+ "The [AutoMLConfig](https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py) object defines the settings and data for an AutoML training job. Here, we set necessary inputs like the task type, the number of AutoML iterations to try, the training data, and cross-validation parameters.\n",
+ "\n",
+ "For forecasting tasks, there are some additional parameters that can be set in the `ForecastingParameters` class: the name of the column holding the date/time, the timeseries id column names, and the maximum forecast horizon. A time column is required for forecasting, while the time_series_id is optional. If time_series_id columns are not given, AutoML assumes that the whole dataset is a single time-series. We also pass a list of columns to drop prior to modeling. The _logQuantity_ column is completely correlated with the target quantity, so it must be removed to prevent a target leak.\n",
+ "\n",
+ "The forecast horizon is given in units of the time-series frequency; for instance, the OJ series frequency is weekly, so a horizon of 20 means that a trained model will estimate sales up to 20 weeks beyond the latest date in the training data for each series. In this example, we set the forecast horizon to the number of samples per series in the test set (n_test_periods). Generally, the value of this parameter will be dictated by business needs. For example, a demand planning application that estimates the next month of sales should set the horizon according to suitable planning time-scales. Please see the [energy_demand notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand) for more discussion of forecast horizon.\n",
+ "\n",
+ "We note here that AutoML can sweep over two types of time-series models:\n",
+ "* Models that are trained for each series such as ARIMA and Facebook's Prophet.\n",
+ "* Models trained across multiple time-series using a regression approach.\n",
+ "\n",
+ "In the first case, AutoML loops over all time-series in your dataset and trains one model (e.g. AutoArima or Prophet, as the case may be) for each series. This can result in long runtimes to train these models if there are a lot of series in the data. One way to mitigate this problem is to fit models for different series in parallel if you have multiple compute cores available. To enable this behavior, set the `max_cores_per_iteration` parameter in your AutoMLConfig as shown in the example in the next cell. \n",
+ "\n",
+ "\n",
+ "Finally, a note about the cross-validation (CV) procedure for time-series data. AutoML uses out-of-sample error estimates to select a best pipeline/model, so it is important that the CV fold splitting is done correctly. Time-series can violate the basic statistical assumptions of the canonical K-Fold CV strategy, so AutoML implements a [rolling origin validation](https://robjhyndman.com/hyndsight/tscv/) procedure to create CV folds for time-series data. To use this procedure, you just need to specify the desired number of CV folds in the AutoMLConfig object. It is also possible to bypass CV and use your own validation set by setting the *validation_data* parameter of AutoMLConfig.\n",
+ "\n",
+ "Here is a summary of AutoMLConfig parameters used for training the OJ model:\n",
+ "\n",
+ "|Property|Description|\n",
+ "|-|-|\n",
+ "|**task**|forecasting|\n",
+ "|**primary_metric**|This is the metric that you want to optimize.
Forecasting supports the following primary metrics
spearman_correlation
normalized_root_mean_squared_error
r2_score
normalized_mean_absolute_error\n",
+ "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
+ "|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
+ "|**training_data**|Input dataset, containing both features and label column.|\n",
+ "|**label_column_name**|The name of the label column.|\n",
+ "|**compute_target**|The remote compute for training.|\n",
+ "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
+ "|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models|\n",
+ "|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models|\n",
+ "|**debug_log**|Log file path for writing debugging information|\n",
+ "|**featurization**| 'auto' / 'off' / FeaturizationConfig Indicator for whether featurization step should be done automatically or not, or whether customized featurization should be used. Setting this enables AutoML to perform featurization on the input to handle *missing data*, and to perform some common *feature extraction*.|\n",
+ "|**max_cores_per_iteration**|Maximum number of cores to utilize per iteration. A value of -1 indicates all available cores should be used"
+ ]
+ },
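+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before configuring the run, here is a toy sketch of the rolling origin validation idea referenced above: each fold trains on a prefix of the series and validates on the window that immediately follows it. This is illustrative only; AutoML's internal fold construction may differ in its details."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Toy rolling-origin splits over 12 time points, 3 folds, horizon of 2.\n",
+ "n_points, n_folds, horizon = 12, 3, 2\n",
+ "for fold in range(n_folds):\n",
+ "    train_end = n_points - (n_folds - fold) * horizon\n",
+ "    valid_idx = list(range(train_end, train_end + horizon))\n",
+ "    print(\"fold {}: train on t < {}, validate on t in {}\".format(fold, train_end, valid_idx))"
+ ]
+ },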
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.automl.core.forecasting_parameters import ForecastingParameters\n",
+ "\n",
+ "forecasting_parameters = ForecastingParameters(\n",
+ " time_column_name=time_column_name,\n",
+ " forecast_horizon=n_test_periods,\n",
+ " time_series_id_column_names=time_series_id_column_names,\n",
+ " freq=\"W-THU\", # Set the forecast frequency to be weekly (start on each Thursday)\n",
+ ")\n",
+ "\n",
+ "automl_config = AutoMLConfig(\n",
+ " task=\"forecasting\",\n",
+ " debug_log=\"automl_oj_sales_errors.log\",\n",
+ " primary_metric=\"normalized_mean_absolute_error\",\n",
+ " experiment_timeout_hours=0.25,\n",
+ " training_data=train_dataset,\n",
+ " label_column_name=target_column_name,\n",
+ " compute_target=compute_target,\n",
+ " enable_early_stopping=True,\n",
+ " featurization=featurization_config,\n",
+ " n_cross_validations=3,\n",
+ " verbosity=logging.INFO,\n",
+ " max_cores_per_iteration=-1,\n",
+ " forecasting_parameters=forecasting_parameters,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can now submit a new training run. Depending on the data and number of iterations this operation may take several minutes.\n",
+ "Information from each iteration will be printed to the console. Validation errors and current status will be shown when setting `show_output=True` and the execution will be synchronous."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "remote_run = experiment.submit(automl_config, show_output=False)"
+ ]
},
- "nbformat": 4,
- "nbformat_minor": 4
-}
\ No newline at end of file
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "remote_run.wait_for_completion()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Retrieve the Best Run details\n",
+ "Below we retrieve the best Run object from among all the runs in the experiment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "best_run = remote_run.get_best_child()\n",
+ "model_name = best_run.properties[\"model_name\"]\n",
+ "best_run"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Transparency\n",
+ "\n",
+ "View updated featurization summary"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Download the featurization summary JSON file locally\n",
+ "best_run.download_file(\n",
+ " \"outputs/featurization_summary.json\", \"featurization_summary.json\"\n",
+ ")\n",
+ "\n",
+ "# Render the JSON as a pandas DataFrame\n",
+ "with open(\"featurization_summary.json\", \"r\") as f:\n",
+ " records = json.load(f)\n",
+ "fs = pd.DataFrame.from_records(records)\n",
+ "\n",
+ "# View a summary of the featurization\n",
+ "fs[\n",
+ " [\n",
+ " \"RawFeatureName\",\n",
+ " \"TypeDetected\",\n",
+ " \"Dropped\",\n",
+ " \"EngineeredFeatureCount\",\n",
+ " \"Transformations\",\n",
+ " ]\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Forecast\n",
+ "\n",
+ "Now that we have retrieved the best pipeline/model, it can be used to make predictions on test data. We will do batch scoring on the test dataset which should have the same schema as training dataset.\n",
+ "\n",
+ "The inference will run on a remote compute. In this example, it will re-use the training compute."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "test_experiment = Experiment(ws, experiment_name + \"_inference\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Retrieving forecasts from the model\n",
+ "We have created a function called `run_forecast` that submits the test data to the best model determined during the training run and retrieves forecasts. This function uses a helper script `forecasting_script` which is uploaded and expecuted on the remote compute."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To produce predictions on the test set, we need to know the feature values at all dates in the test set. This requirement is somewhat reasonable for the OJ sales data since the features mainly consist of price, which is usually set in advance, and customer demographics which are approximately constant for each store over the 20 week forecast horizon in the testing data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from run_forecast import run_remote_inference\n",
+ "\n",
+ "remote_run_infer = run_remote_inference(\n",
+ " test_experiment=test_experiment,\n",
+ " compute_target=compute_target,\n",
+ " train_run=best_run,\n",
+ " test_dataset=test_dataset,\n",
+ " target_column_name=target_column_name,\n",
+ ")\n",
+ "remote_run_infer.wait_for_completion(show_output=False)\n",
+ "\n",
+ "# download the forecast file to the local machine\n",
+ "remote_run_infer.download_file(\"outputs/predictions.csv\", \"predictions.csv\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Evaluate\n",
+ "\n",
+ "To evaluate the accuracy of the forecast, we'll compare against the actual sales quantities for some select metrics, included the mean absolute percentage error (MAPE). For more metrics that can be used for evaluation after training, please see [supported metrics](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#regressionforecasting-metrics), and [how to calculate residuals](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#residuals).\n",
+ "\n",
+ "We'll add predictions and actuals into a single dataframe for convenience in calculating the metrics."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# load forecast data frame\n",
+ "fcst_df = pd.read_csv(\"predictions.csv\", parse_dates=[time_column_name])\n",
+ "fcst_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.automl.core.shared import constants\n",
+ "from azureml.automl.runtime.shared.score import scoring\n",
+ "from matplotlib import pyplot as plt\n",
+ "\n",
+ "# use automl scoring module\n",
+ "scores = scoring.score_regression(\n",
+ " y_test=fcst_df[target_column_name],\n",
+ " y_pred=fcst_df[\"predicted\"],\n",
+ " metrics=list(constants.Metric.SCALAR_REGRESSION_SET),\n",
+ ")\n",
+ "\n",
+ "print(\"[Test data scores]\\n\")\n",
+ "for key, value in scores.items():\n",
+ " print(\"{}: {:.3f}\".format(key, value))\n",
+ "\n",
+ "# Plot outputs\n",
+ "%matplotlib inline\n",
+ "test_pred = plt.scatter(fcst_df[target_column_name], fcst_df[\"predicted\"], color=\"b\")\n",
+ "test_test = plt.scatter(\n",
+ " fcst_df[target_column_name], fcst_df[target_column_name], color=\"g\"\n",
+ ")\n",
+ "plt.legend(\n",
+ " (test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8\n",
+ ")\n",
+ "plt.show()"
+ ]
+ },
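+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a sanity check on the scores above, we can compute the MAPE directly. This is a minimal sketch (not part of the AutoML scoring module) that assumes `fcst_df` holds the actual and predicted values; rows with zero actuals are excluded to avoid division by zero."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "actuals = fcst_df[target_column_name].to_numpy(dtype=float)\n",
+ "preds = fcst_df[\"predicted\"].to_numpy(dtype=float)\n",
+ "nonzero = actuals != 0  # guard against dividing by zero actuals\n",
+ "mape = np.mean(np.abs((actuals[nonzero] - preds[nonzero]) / actuals[nonzero])) * 100\n",
+ "print(\"MAPE: {:.2f}%\".format(mape))"
+ ]
+ },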
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Operationalize"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "_Operationalization_ means getting the model into the cloud so that other can run it after you close the notebook. We will create a docker running on Azure Container Instances with the model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "description = \"AutoML OJ forecaster\"\n",
+ "tags = None\n",
+ "model = remote_run.register_model(\n",
+ " model_name=model_name, description=description, tags=tags\n",
+ ")\n",
+ "\n",
+ "print(remote_run.model_id)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Develop the scoring script\n",
+ "\n",
+ "For the deployment we need a function which will run the forecast on serialized data. It can be obtained from the best_run."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "script_file_name = \"score_fcast.py\"\n",
+ "best_run.download_file(\"outputs/scoring_file_v_1_0_0.py\", script_file_name)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Deploy the model as a Web Service on Azure Container Instance"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from azureml.core.model import InferenceConfig\n",
+ "from azureml.core.webservice import AciWebservice\n",
+ "from azureml.core.webservice import Webservice\n",
+ "from azureml.core.model import Model\n",
+ "\n",
+ "inference_config = InferenceConfig(\n",
+ " environment=best_run.get_environment(), entry_script=script_file_name\n",
+ ")\n",
+ "\n",
+ "aciconfig = AciWebservice.deploy_configuration(\n",
+ " cpu_cores=2,\n",
+ " memory_gb=4,\n",
+ " tags={\"type\": \"automl-forecasting\"},\n",
+ " description=\"Automl forecasting sample service\",\n",
+ ")\n",
+ "\n",
+ "aci_service_name = \"automl-oj-forecast-01\"\n",
+ "print(aci_service_name)\n",
+ "aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
+ "aci_service.wait_for_deployment(True)\n",
+ "print(aci_service.state)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "aci_service.get_logs()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Call the service"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "X_query = test.copy()\n",
+ "X_query.pop(target_column_name)\n",
+ "# We have to convert datetime to string, because Timestamps cannot be serialized to JSON.\n",
+ "X_query[time_column_name] = X_query[time_column_name].astype(str)\n",
+ "# The Service object accept the complex dictionary, which is internally converted to JSON string.\n",
+ "# The section 'data' contains the data frame in the form of dictionary.\n",
+ "sample_quantiles = [0.025, 0.975]\n",
+ "test_sample = json.dumps(\n",
+ " {\"data\": X_query.to_dict(orient=\"records\"), \"quantiles\": sample_quantiles}\n",
+ ")\n",
+ "response = aci_service.run(input_data=test_sample)\n",
+ "# translate from networkese to datascientese\n",
+ "try:\n",
+ " res_dict = json.loads(response)\n",
+ " y_fcst_all = pd.DataFrame(res_dict[\"index\"])\n",
+ " y_fcst_all[time_column_name] = pd.to_datetime(\n",
+ " y_fcst_all[time_column_name], unit=\"ms\"\n",
+ " )\n",
+ " y_fcst_all[\"forecast\"] = res_dict[\"forecast\"]\n",
+ " y_fcst_all[\"prediction_interval\"] = res_dict[\"prediction_interval\"]\n",
+ "except:\n",
+ " print(res_dict)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y_fcst_all.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Delete the web service if desired"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "serv = Webservice(ws, \"automl-oj-forecast-01\")\n",
+ "serv.delete() # don't do it accidentally"
+ ]
+ }
+ ],
+ "metadata": {
+ "authors": [
+ {
+ "name": "jialiu"
+ }
+ ],
+ "category": "tutorial",
+ "celltoolbar": "Raw Cell Format",
+ "compute": [
+ "Remote"
+ ],
+ "datasets": [
+ "Orange Juice Sales"
+ ],
+ "deployment": [
+ "Azure Container Instance"
+ ],
+ "exclude_from_index": false,
+ "framework": [
+ "Azure ML AutoML"
+ ],
+ "friendly_name": "Forecasting orange juice sales with deployment",
+ "index_order": 1,
+ "kernelspec": {
+ "display_name": "Python 3.6 - AzureML",
+ "language": "python",
+ "name": "python3-azureml"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.6.9"
+ },
+ "tags": [
+ "None"
+ ],
+ "task": "Forecasting"
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/auto-ml-forecasting-univariate-recipe-experiment-settings.ipynb b/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/auto-ml-forecasting-univariate-recipe-experiment-settings.ipynb
index 2e773fbdf..a5f18de72 100644
--- a/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/auto-ml-forecasting-univariate-recipe-experiment-settings.ipynb
+++ b/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/auto-ml-forecasting-univariate-recipe-experiment-settings.ipynb
@@ -1,494 +1,494 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Copyright (c) Microsoft Corporation. All rights reserved.\n",
- "\n",
- "Licensed under the MIT License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/1_determine_experiment_settings.png)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "In this notebook we will explore the univaraite time-series data to determine the settings for an automated ML experiment. We will follow the thought process depicted in the following diagram:
\n",
- "![Forecasting after training](figures/univariate_settings_map_20210408.jpg)\n",
- "\n",
- "The objective is to answer the following questions:\n",
- "\n",
- "\n",
- " - Is there a seasonal pattern in the data?
\n",
- " \n",
- " - Importance: If we are able to detect regular seasonal patterns, the forecast accuracy may be improved by extracting these patterns and including them as features into the model.
\n",
- "
\n",
- " - Is the data stationary?
\n",
- " \n",
- " - Importance: In the absense of features that capture trend behavior, ML models (regression and tree based) are not well equiped to predict stochastic trends. Working with stationary data solves this problem.
\n",
- "
\n",
- " - Is there a detectable auto-regressive pattern in the stationary data?
\n",
- " \n",
- " - Importance: The accuracy of ML models can be improved if serial correlation is modeled by including lags of the dependent/target varaible as features. Including target lags in every experiment by default will result in a regression in accuracy scores if such setting is not warranted.
\n",
- "
\n",
- "
\n",
- "\n",
- "The answers to these questions will help determine the appropriate settings for the automated ML experiment.\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "import os\n",
- "import warnings\n",
- "import pandas as pd\n",
- "\n",
- "from statsmodels.graphics.tsaplots import plot_acf, plot_pacf\n",
- "import matplotlib.pyplot as plt\n",
- "from pandas.plotting import register_matplotlib_converters\n",
- "\n",
- "register_matplotlib_converters() # fixes the future warning issue\n",
- "\n",
- "from helper_functions import unit_root_test_wrapper\n",
- "from statsmodels.tools.sm_exceptions import InterpolationWarning\n",
- "\n",
- "warnings.simplefilter(\"ignore\", InterpolationWarning)\n",
- "\n",
- "\n",
- "# set printing options\n",
- "pd.set_option(\"display.max_columns\", 500)\n",
- "pd.set_option(\"display.width\", 1000)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# load data\n",
- "main_data_loc = \"data\"\n",
- "train_file_name = \"S4248SM144SCEN.csv\"\n",
- "\n",
- "TARGET_COLNAME = \"S4248SM144SCEN\"\n",
- "TIME_COLNAME = \"observation_date\"\n",
- "COVID_PERIOD_START = \"2020-03-01\"\n",
- "\n",
- "df = pd.read_csv(os.path.join(main_data_loc, train_file_name))\n",
- "df[TIME_COLNAME] = pd.to_datetime(df[TIME_COLNAME], format=\"%Y-%m-%d\")\n",
- "df.sort_values(by=TIME_COLNAME, inplace=True)\n",
- "df.set_index(TIME_COLNAME, inplace=True)\n",
- "df.head(2)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# plot the entire dataset\n",
- "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
- "ax.plot(df)\n",
- "ax.title.set_text(\"Original Data Series\")\n",
- "locs, labels = plt.xticks()\n",
- "plt.xticks(rotation=45)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "The graph plots the alcohol sales in the United States. Because the data is trending, it can be difficult to see cycles, seasonality or other interestng behaviors due to the scaling issues. For example, if there is a seasonal pattern, which we will discuss later, we cannot see them on the trending data. In such case, it is worth plotting the same data in first differences."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# plot the entire dataset in first differences\n",
- "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
- "ax.plot(df.diff().dropna())\n",
- "ax.title.set_text(\"Data in first differences\")\n",
- "locs, labels = plt.xticks()\n",
- "plt.xticks(rotation=45)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "In the previous plot we observe that the data is more volatile towards the end of the series. This period coincides with the Covid-19 period, so we will exclude it from our experiment. Since in this example there are no user-provided features it is hard to make an argument that a model trained on the less volatile pre-covid data will be able to accurately predict the covid period."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# 1. Seasonality\n",
- "\n",
- "#### Questions that need to be answered in this section:\n",
- "1. Is there a seasonality?\n",
- "2. If it's seasonal, does the data exhibit a trend (up or down)?\n",
- "\n",
- "It is hard to visually detect seasonality when the data is trending. The reason being is scale of seasonal fluctuations is dwarfed by the range of the trend in the data. One way to deal with this is to de-trend the data by taking the first differences. We will discuss this in more detail in the next section."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# plot the entire dataset in first differences\n",
- "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
- "ax.plot(df.diff().dropna())\n",
- "ax.title.set_text(\"Data in first differences\")\n",
- "locs, labels = plt.xticks()\n",
- "plt.xticks(rotation=45)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "For the next plot, we will exclude the Covid period again. We will also shorten the length of data because plotting a very long time series may prevent us from seeing seasonal patterns, if there are any, because the plot may look like a random walk."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# remove COVID period\n",
- "df = df[:COVID_PERIOD_START]\n",
- "\n",
- "# plot the entire dataset in first differences\n",
- "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
- "ax.plot(df[\"2015-01-01\":].diff().dropna())\n",
- "ax.title.set_text(\"Data in first differences\")\n",
- "locs, labels = plt.xticks()\n",
- "plt.xticks(rotation=45)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- " Conclusion
\n",
- "\n",
- "Visual examination does not suggest clear seasonal patterns. We will set the STL_TYPE = None, and we will move to the next section that examines stationarity. \n",
- "\n",
- "\n",
- "Say, we are working with a different data set that shows clear patterns of seasonality, we have several options for setting the settings:is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts. \n",
- "\n",
- " - If the data does not appear to be trending, set DIFFERENCE_SERIES=False, TARGET_LAGS=None and STL_TYPE = \"season\"
\n",
- " - If the data appears to be trending, consider one of the following two settings:\n",
- "
\n",
- " \n",
- " - DIFFERENCE_SERIES=True, TARGET_LAGS=None and STL_TYPE = \"season\", or
\n",
- " - DIFFERENCE_SERIES=False, TARGET_LAGS=None and STL_TYPE = \"trend_season\"
\n",
- "
\n",
- " - In the first case, by taking first differences we are removing stochastic trend, but we do not remove seasonal patterns. In the second case, we do not remove the stochastic trend and it can be captured by the trend component of the STL decomposition. It is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts.
\n",
- "
\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# 2. Stationarity\n",
- "If the data does not exhibit seasonal patterns, we would like to see if the data is non-stationary. Particularly, we want to see if there is a clear trending behavior. If such behavior is observed, we would like to first difference the data and examine the plot of an auto-correlation function (ACF) known as correlogram. If the data is seasonal, differencing it will not get rid off the seasonality and this will be shown on the correlogram as well.\n",
- "\n",
- "\n",
- " - Question: What is stationarity and how to we detect it?
\n",
- " \n",
- " - This is a fairly complex topic. Please read the following link for a high level discussion on this subject.
\n",
- " - Simply put, we are looking for scenario when examining the time series plots the mean of the series is roughly the same, regardless which time interval you pick to compute it. Thus, trending and seasonal data are examples of non-stationary series.
\n",
- "
\n",
- "
\n",
- "\n",
- "\n",
- "\n",
- " - Question: Why do want to work with stationary data?
\n",
- " \n",
- " - In the absence of features that capture stochastic trends, the ML models that use (deterministic) time based features (hour of the day, day of the week, month of the year, etc) cannot capture such trends, and will over or under predict depending on the behavior of the time series. By working with stationary data, we eliminate the need to predict such trends, which improves the forecast accuracy. Classical time series models such as Arima and Exponential Smoothing handle non-stationary series by design and do not need such transformations. By differencing the data we are still able to run the same family of models.
\n",
- "
\n",
- "
\n",
- "\n",
- "#### Questions that need to be answered in this section:\n",
- " \n",
- " - Is the data stationary?
\n",
- " - Does the stationarized data (either the original or the differenced series) exhibit a clear auto-regressive pattern?
\n",
- "
\n",
- "\n",
- "To answer the first question, we run a series of tests (we call them unit root tests)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# unit root tests\n",
- "test = unit_root_test_wrapper(df[TARGET_COLNAME])\n",
- "print(\"---------------\", \"\\n\")\n",
- "print(\"Summary table\", \"\\n\", test[\"summary\"], \"\\n\")\n",
- "print(\"Is the {} series stationary?: {}\".format(TARGET_COLNAME, test[\"stationary\"]))\n",
- "print(\"---------------\", \"\\n\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "In the previous cell, we ran a series of unit root tests. The summary table contains the following columns:\n",
- " \n",
- " - test_name is the name of the test.\n",
- "
\n",
- " - ADF: Augmented Dickey-Fuller test
\n",
- " - KPSS: Kwiatkowski-Phillips\u00e2\u20ac\u201cSchmidt\u00e2\u20ac\u201cShin test
\n",
- " - PP: Phillips-Perron test\n",
- "
- ADF GLS: Augmented Dickey-Fuller using generalized least squares method
\n",
- " - AZ: Andrews-Zivot test
\n",
- "
\n",
- " - statistic: test statistic
\n",
- " - crit_val: critical value of the test statistic
\n",
- " - p_val: p-value of the test statistic. If the p-val is less than 0.05, the null hypothesis is rejected.
\n",
- " - stationary: is the series stationary based on the test result?
\n",
- " - Null hypothesis: what is being tested. Notice, some test such as ADF and PP assume the process has a unit root and looks for evidence to reject this hypothesis. Other tests, ex.g: KPSS, assumes the process is stationary and looks for evidence to reject such claim.\n",
- "
\n",
- "\n",
- "Each of the tests shows that the original time series is non-stationary. The final decision is based on the majority rule. If, there is a split decision, the algorithm will claim it is stationary. We run a series of tests because each test by itself may not be accurate. In many cases when there are conflicting test results, the user needs to make determination if the series is stationary or not.\n",
- "\n",
- "Since we found the series to be non-stationary, we will difference it and then test if the differenced series is stationary."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# unit root tests\n",
- "test = unit_root_test_wrapper(df[TARGET_COLNAME].diff().dropna())\n",
- "print(\"---------------\", \"\\n\")\n",
- "print(\"Summary table\", \"\\n\", test[\"summary\"], \"\\n\")\n",
- "print(\"Is the {} series stationary?: {}\".format(TARGET_COLNAME, test[\"stationary\"]))\n",
- "print(\"---------------\", \"\\n\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Four out of five tests show that the series in first differences is stationary. Notice that this decision is not unanimous. Next, let's plot the original series in first-differences to illustrate the difference between non-stationary (unit root) process vs the stationary one."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# plot original and stationary data\n",
- "fig = plt.figure(figsize=(10, 10))\n",
- "ax1 = fig.add_subplot(211)\n",
- "ax1.plot(df[TARGET_COLNAME], \"-b\")\n",
- "ax2 = fig.add_subplot(212)\n",
- "ax2.plot(df[TARGET_COLNAME].diff().dropna(), \"-b\")\n",
- "ax1.title.set_text(\"Original data\")\n",
- "ax2.title.set_text(\"Data in first differences\")"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "If you were asked a question \"What is the mean of the series before and after 2008?\", for the series titled \"Original data\" the mean values will be significantly different. This implies that the first moment of the series (in this case, it is the mean) is time dependent, i.e., mean changes depending on the interval one is looking at. Thus, the series is deemed to be non-stationary. On the other hand, for the series titled \"Data in first differences\" the means for both periods are roughly the same. Hence, the first moment is time invariant; meaning it does not depend on the interval of time one is looking at. In this example it is easy to visually distinguish between stationary and non-stationary data. Often this distinction is not easy to make, therefore we rely on the statistical tests described above to help us make an informed decision. "
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- " Conclusion
\n",
- "Since we found the original process to be non-stationary (contains unit root), we will have to model the data in first differences. As a result, we will set the DIFFERENCE_SERIES parameter to True."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# 3 Check if there is a clear autoregressive pattern\n",
- "We need to determine if we should include lags of the target variable as features in order to improve forecast accuracy. To do this, we will examine the ACF and partial ACF (PACF) plots of the stationary series. In our case, it is a series in first diffrences.\n",
- "\n",
- "\n",
- " - Question: What is an Auto-regressive pattern? What are we looking for?
\n",
- " \n",
- " - We are looking for a classical profiles for an AR(p) process such as an exponential decay of an ACF and a the first $p$ significant lags of the PACF. For a more detailed explanation of ACF and PACF please refer to the appendix at the end of this notebook. For illustration purposes, let's examine the ACF/PACF profiles of the simulated data that follows a second order auto-regressive process, abbreviated as an AR(2).
\n",
- " - \n",
- "
\n",
- " The lag order is on the x-axis while the auto- and partial-correlation coefficients are on the y-axis. Vertical lines that are outside the shaded area represent statistically significant lags. Notice, the ACF function decays to zero and the PACF shows 2 significant spikes (we ignore the first spike for lag 0 in both plots since the linear relationship of any series with itself is always 1). \n",
- "
\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- " - Question: What do I do if I observe an auto-regressive behavior?
\n",
- " \n",
- " - If such behavior is observed, we might improve the forecast accuracy by enabling the target lags feature in AutoML. There are a few options of doing this
\n",
- " \n",
- " - Set the target lags parameter to 'auto', or
\n",
- " - Specify the list of lags you want to include. Ex.g: target_lags = [1,2,5]
\n",
- "
\n",
- "
\n",
- "
\n",
- " - Next, let's examine the ACF and PACF plots of the stationary target variable (depicted below). Here, we do not see a decay in the ACF, instead we see a decay in PACF. It is hard to make an argument the the target variable exhibits auto-regressive behavior.
\n",
- "
"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": [
- "# Plot the ACF/PACF for the series in differences\n",
- "fig, ax = plt.subplots(1, 2, figsize=(10, 5))\n",
- "plot_acf(df[TARGET_COLNAME].diff().dropna().values.squeeze(), ax=ax[0])\n",
- "plot_pacf(df[TARGET_COLNAME].diff().dropna().values.squeeze(), ax=ax[1])\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- " Conclusion
\n",
- "Since we do not see a clear indication of an AR(p) process, we will not be using target lags and will set the TARGET_LAGS parameter to None."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- " AutoML Experiment Settings
\n",
- "Based on the analysis performed, we should try the following settings for the AutoML experiment and use them in the \"2_run_experiment\" notebook.\n",
- "\n",
- " - STL_TYPE=None
\n",
- " - DIFFERENCE_SERIES=True
\n",
- " - TARGET_LAGS=None
\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "# Appendix: ACF, PACF and Lag Selection\n",
- "To do this, we will examine the ACF and partial ACF (PACF) plots of the differenced series. \n",
- "\n",
- "\n",
- " - Question: What is the ACF?
\n",
- " \n",
- " - To understand the ACF, first let's look at the correlation coefficient $\\rho_{xz}$\n",
- " \\begin{equation}\n",
- " \\rho_{xz} = \\frac{\\sigma_{xz}}{\\sigma_{x} \\sigma_{zy}}\n",
- " \\end{equation}\n",
- "
\n",
- " where $\\sigma_{xzy}$ is the covariance between two random variables $X$ and $Z$; $\\sigma_x$ and $\\sigma_z$ is the variance for $X$ and $Z$, respectively. The correlation coefficient measures the strength of linear relationship between two random variables. This metric can take any value from -1 to 1. \n",
- "
\n",
- " - The auto-correlation coefficient $\\rho_{Y_{t} Y_{t-k}}$ is the time series equivalent of the correlation coefficient, except instead of measuring linear association between two random variables $X$ and $Z$, it measures the strength of a linear relationship between a random variable $Y_t$ and its lag $Y_{t-k}$ for any positive interger value of $k$.
\n",
- "
\n",
- " - To visualize the ACF for a particular lag, say lag 2, plot the second lag of a series $y_{t-2}$ on the x-axis, and plot the series itself $y_t$ on the y-axis. The autocorrelation coefficient is the slope of the best fitted regression line and can be interpreted as follows. A one unit increase in the lag of a variable one period ago leads to a $\\rho_{Y_{t} Y_{t-2}}$ units change in the variable in the current period. This interpreation can be applied to any lag.
\n",
- "
\n",
- " - In the interpretation posted above we need to be careful not to confuse the word \"leads\" with \"causes\" since these are not the same thing. We do not know the lagged value of the varaible causes it to change. Afterall, there are probably many other features that may explain the movement in $Y_t$. All we are trying to do in this section is to identify situations when the variable contains the strong auto-regressive components that needs to be included in the model to improve forecast accuracy.
\n",
- "
\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- " - Question: What is the PACF?
\n",
- " \n",
- " - When describing the ACF we essentially running a regression between a partigular lag of a series, say, lag 4, and the series itself. What this implies is the regression coefficient for lag 4 captures the impact of everything that happens in lags 1, 2 and 3. In other words, if lag 1 is the most important lag and we exclude it from the regression, naturally, the regression model will assign the importance of the 1st lag to the 4th one. Partial auto-correlation function fixes this problem since it measures the contribution of each lag accounting for the information added by the intermediary lags. If we were to illustrate ACF and PACF for the fourth lag using the regression analogy, the difference is a follows: \n",
- " \\begin{align}\n",
- " Y_{t} &= a_{0} + a_{4} Y_{t-4} + e_{t} \\\\\n",
- " Y_{t} &= b_{0} + b_{1} Y_{t-1} + b_{2} Y_{t-2} + b_{3} Y_{t-3} + b_{4} Y_{t-4} + \\varepsilon_{t} \\\\\n",
- " \\end{align}\n",
- "
\n",
- "
\n",
- " - \n",
- " Here, you can think of $a_4$ and $b_{4}$ as the auto- and partial auto-correlation coefficients for lag 4. Notice, in the second equation we explicitely accounting for the intermediate lags by adding them as regrerssors.\n",
- "
\n",
- "
\n",
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "\n",
- " - Question: Auto-regressive pattern? What are we looking for?
\n",
- " \n",
- " - We are looking for a classical profiles for an AR(p) process such as an exponential decay of an ACF and a the first $p$ significant lags of the PACF. Let's examine the ACF/PACF profiles of the same simulated AR(2) shown in Section 3, and check if the ACF/PACF explanation are refelcted in these plots.
\n",
- " - \n",
- "
- The autocorrelation coefficient for the 3rd lag is 0.6, which can be interpreted that a one unit increase in the value of the target varaible three periods ago leads to 0.6 units increase in the current period. However, the PACF plot shows that the partial autocorrealtion coefficient is zero (from a statistical point of view since it lies within the shaded region). This is happening because the 1st and 2nd lags are good predictors of the target variable. Ommiting these two lags from the regression results in the misleading conclusion that the third lag is a good prediciton.
\n",
- "
\n",
- " - This is why it is important to examine both the ACF and the PACF plots when tring to determine the auto regressive order for the variable in question.
\n",
- "
\n",
- "
"
- ]
- }
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Copyright (c) Microsoft Corporation. All rights reserved.\n",
+ "\n",
+ "Licensed under the MIT License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "![Impressions](https://PixelServer20190423114238.azurewebsites.net/api/impressions/MachineLearningNotebooks/how-to-use-azureml/automated-machine-learning/forecasting-recipes-univariate/1_determine_experiment_settings.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this notebook we will explore the univaraite time-series data to determine the settings for an automated ML experiment. We will follow the thought process depicted in the following diagram:
\n",
+ "![Forecasting after training](figures/univariate_settings_map_20210408.jpg)\n",
+ "\n",
+ "The objective is to answer the following questions:\n",
+ "\n",
+ "\n",
+ " - Is there a seasonal pattern in the data?
\n",
+ " \n",
+ " - Importance: If we are able to detect regular seasonal patterns, the forecast accuracy may be improved by extracting these patterns and including them as features into the model.
\n",
+ "
\n",
+ " - Is the data stationary?
\n",
+ " \n",
+ " - Importance: In the absense of features that capture trend behavior, ML models (regression and tree based) are not well equiped to predict stochastic trends. Working with stationary data solves this problem.
\n",
+ "
\n",
+ " - Is there a detectable auto-regressive pattern in the stationary data?
\n",
+ " \n",
+ " - Importance: The accuracy of ML models can be improved if serial correlation is modeled by including lags of the dependent/target varaible as features. Including target lags in every experiment by default will result in a regression in accuracy scores if such setting is not warranted.
\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "The answers to these questions will help determine the appropriate settings for the automated ML experiment.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import warnings\n",
+ "import pandas as pd\n",
+ "\n",
+ "from statsmodels.graphics.tsaplots import plot_acf, plot_pacf\n",
+ "import matplotlib.pyplot as plt\n",
+ "from pandas.plotting import register_matplotlib_converters\n",
+ "\n",
+ "register_matplotlib_converters() # fixes the future warning issue\n",
+ "\n",
+ "from helper_functions import unit_root_test_wrapper\n",
+ "from statsmodels.tools.sm_exceptions import InterpolationWarning\n",
+ "\n",
+ "warnings.simplefilter(\"ignore\", InterpolationWarning)\n",
+ "\n",
+ "\n",
+ "# set printing options\n",
+ "pd.set_option(\"display.max_columns\", 500)\n",
+ "pd.set_option(\"display.width\", 1000)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# load data\n",
+ "main_data_loc = \"data\"\n",
+ "train_file_name = \"S4248SM144SCEN.csv\"\n",
+ "\n",
+ "TARGET_COLNAME = \"S4248SM144SCEN\"\n",
+ "TIME_COLNAME = \"observation_date\"\n",
+ "COVID_PERIOD_START = \"2020-03-01\"\n",
+ "\n",
+ "df = pd.read_csv(os.path.join(main_data_loc, train_file_name))\n",
+ "df[TIME_COLNAME] = pd.to_datetime(df[TIME_COLNAME], format=\"%Y-%m-%d\")\n",
+ "df.sort_values(by=TIME_COLNAME, inplace=True)\n",
+ "df.set_index(TIME_COLNAME, inplace=True)\n",
+ "df.head(2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# plot the entire dataset\n",
+ "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
+ "ax.plot(df)\n",
+ "ax.title.set_text(\"Original Data Series\")\n",
+ "locs, labels = plt.xticks()\n",
+ "plt.xticks(rotation=45)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The graph plots the alcohol sales in the United States. Because the data is trending, it can be difficult to see cycles, seasonality or other interestng behaviors due to the scaling issues. For example, if there is a seasonal pattern, which we will discuss later, we cannot see them on the trending data. In such case, it is worth plotting the same data in first differences."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# plot the entire dataset in first differences\n",
+ "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
+ "ax.plot(df.diff().dropna())\n",
+ "ax.title.set_text(\"Data in first differences\")\n",
+ "locs, labels = plt.xticks()\n",
+ "plt.xticks(rotation=45)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the previous plot we observe that the data is more volatile towards the end of the series. This period coincides with the Covid-19 period, so we will exclude it from our experiment. Since in this example there are no user-provided features it is hard to make an argument that a model trained on the less volatile pre-covid data will be able to accurately predict the covid period."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 1. Seasonality\n",
+ "\n",
+ "#### Questions that need to be answered in this section:\n",
+ "1. Is there a seasonality?\n",
+ "2. If it's seasonal, does the data exhibit a trend (up or down)?\n",
+ "\n",
+ "It is hard to visually detect seasonality when the data is trending. The reason being is scale of seasonal fluctuations is dwarfed by the range of the trend in the data. One way to deal with this is to de-trend the data by taking the first differences. We will discuss this in more detail in the next section."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# plot the entire dataset in first differences\n",
+ "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
+ "ax.plot(df.diff().dropna())\n",
+ "ax.title.set_text(\"Data in first differences\")\n",
+ "locs, labels = plt.xticks()\n",
+ "plt.xticks(rotation=45)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For the next plot, we will exclude the Covid period again. We will also shorten the length of data because plotting a very long time series may prevent us from seeing seasonal patterns, if there are any, because the plot may look like a random walk."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# remove COVID period\n",
+ "df = df[:COVID_PERIOD_START]\n",
+ "\n",
+ "# plot the entire dataset in first differences\n",
+ "fig, ax = plt.subplots(figsize=(6, 2), dpi=180)\n",
+ "ax.plot(df[\"2015-01-01\":].diff().dropna())\n",
+ "ax.title.set_text(\"Data in first differences\")\n",
+ "locs, labels = plt.xticks()\n",
+ "plt.xticks(rotation=45)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " Conclusion
\n",
+ "\n",
+ "Visual examination does not suggest clear seasonal patterns. We will set the STL_TYPE = None, and we will move to the next section that examines stationarity. \n",
+ "\n",
+ "\n",
+ "Say, we are working with a different data set that shows clear patterns of seasonality, we have several options for setting the settings:is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts. \n",
+ "\n",
+ " - If the data does not appear to be trending, set DIFFERENCE_SERIES=False, TARGET_LAGS=None and STL_TYPE = \"season\"
\n",
+ " - If the data appears to be trending, consider one of the following two settings:\n",
+ "
\n",
+ " \n",
+ " - DIFFERENCE_SERIES=True, TARGET_LAGS=None and STL_TYPE = \"season\", or
\n",
+ " - DIFFERENCE_SERIES=False, TARGET_LAGS=None and STL_TYPE = \"trend_season\"
\n",
+ "
\n",
+ " - In the first case, by taking first differences we are removing stochastic trend, but we do not remove seasonal patterns. In the second case, we do not remove the stochastic trend and it can be captured by the trend component of the STL decomposition. It is hard to say which option will work best in your case, hence you will need to run both options to see which one results in more accurate forecasts.
\n",
+ "
\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 2. Stationarity\n",
+ "If the data does not exhibit seasonal patterns, we would like to see if the data is non-stationary. Particularly, we want to see if there is a clear trending behavior. If such behavior is observed, we would like to first difference the data and examine the plot of an auto-correlation function (ACF) known as correlogram. If the data is seasonal, differencing it will not get rid off the seasonality and this will be shown on the correlogram as well.\n",
+ "\n",
+ "\n",
+ " - Question: What is stationarity and how to we detect it?
\n",
+ " \n",
+ " - This is a fairly complex topic. Please read the following link for a high level discussion on this subject.
\n",
+ " - Simply put, we are looking for scenario when examining the time series plots the mean of the series is roughly the same, regardless which time interval you pick to compute it. Thus, trending and seasonal data are examples of non-stationary series.
\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "\n",
+ "\n",
+ " - Question: Why do want to work with stationary data?
\n",
+ " \n",
+ " - In the absence of features that capture stochastic trends, the ML models that use (deterministic) time based features (hour of the day, day of the week, month of the year, etc) cannot capture such trends, and will over or under predict depending on the behavior of the time series. By working with stationary data, we eliminate the need to predict such trends, which improves the forecast accuracy. Classical time series models such as Arima and Exponential Smoothing handle non-stationary series by design and do not need such transformations. By differencing the data we are still able to run the same family of models.
\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "#### Questions that need to be answered in this section:\n",
+ " \n",
+ " - Is the data stationary?
\n",
+ " - Does the stationarized data (either the original or the differenced series) exhibit a clear auto-regressive pattern?
\n",
+ "
\n",
+ "\n",
+ "To answer the first question, we run a series of tests (we call them unit root tests)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# unit root tests\n",
+ "test = unit_root_test_wrapper(df[TARGET_COLNAME])\n",
+ "print(\"---------------\", \"\\n\")\n",
+ "print(\"Summary table\", \"\\n\", test[\"summary\"], \"\\n\")\n",
+ "print(\"Is the {} series stationary?: {}\".format(TARGET_COLNAME, test[\"stationary\"]))\n",
+ "print(\"---------------\", \"\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In the previous cell, we ran a series of unit root tests. The summary table contains the following columns:\n",
+ " \n",
+ " - test_name is the name of the test.\n",
+ "
\n",
+ " - ADF: Augmented Dickey-Fuller test
\n",
+ " - KPSS: Kwiatkowski-Phillips–Schmidt–Shin test
\n",
+ " - PP: Phillips-Perron test\n",
+ "
- ADF GLS: Augmented Dickey-Fuller using generalized least squares method
\n",
+ " - AZ: Andrews-Zivot test
\n",
+ "
\n",
+ " - statistic: test statistic
\n",
+ " - crit_val: critical value of the test statistic
\n",
+ " - p_val: p-value of the test statistic. If the p-val is less than 0.05, the null hypothesis is rejected.
\n",
+ " - stationary: is the series stationary based on the test result?
\n",
+ " - Null hypothesis: what is being tested. Notice, some test such as ADF and PP assume the process has a unit root and looks for evidence to reject this hypothesis. Other tests, ex.g: KPSS, assumes the process is stationary and looks for evidence to reject such claim.\n",
+ "
\n",
+ "\n",
+ "Each of the tests shows that the original time series is non-stationary. The final decision is based on the majority rule. If, there is a split decision, the algorithm will claim it is stationary. We run a series of tests because each test by itself may not be accurate. In many cases when there are conflicting test results, the user needs to make determination if the series is stationary or not.\n",
+ "\n",
+ "Since we found the series to be non-stationary, we will difference it and then test if the differenced series is stationary."
+ ]
+ },
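+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The `unit_root_test_wrapper` helper bundles these tests for us. For intuition only, here is a minimal sketch (not the helper's actual implementation) that runs two of the tests available in `statsmodels` and compares their votes; note that KPSS reverses the null hypothesis, so a *high* p-value there points toward stationarity."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from statsmodels.tsa.stattools import adfuller, kpss\n",
+ "\n",
+ "series = df[TARGET_COLNAME].dropna()\n",
+ "\n",
+ "# ADF: the null hypothesis is a unit root, so p < 0.05 suggests stationarity.\n",
+ "adf_p = adfuller(series)[1]\n",
+ "# KPSS: the null hypothesis is stationarity, so p >= 0.05 suggests stationarity.\n",
+ "kpss_p = kpss(series, regression=\"c\", nlags=\"auto\")[1]\n",
+ "\n",
+ "print(\"ADF p-value: {:.4f} -> stationary: {}\".format(adf_p, adf_p < 0.05))\n",
+ "print(\"KPSS p-value: {:.4f} -> stationary: {}\".format(kpss_p, kpss_p >= 0.05))\n",
+ "print(\"Tests agree on stationarity: {}\".format((adf_p < 0.05) == (kpss_p >= 0.05)))"
+ ]
+ },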
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# unit root tests\n",
+ "test = unit_root_test_wrapper(df[TARGET_COLNAME].diff().dropna())\n",
+ "print(\"---------------\", \"\\n\")\n",
+ "print(\"Summary table\", \"\\n\", test[\"summary\"], \"\\n\")\n",
+ "print(\"Is the {} series stationary?: {}\".format(TARGET_COLNAME, test[\"stationary\"]))\n",
+ "print(\"---------------\", \"\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Four out of five tests show that the series in first differences is stationary. Notice that this decision is not unanimous. Next, let's plot the original series in first-differences to illustrate the difference between non-stationary (unit root) process vs the stationary one."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# plot original and stationary data\n",
+ "fig = plt.figure(figsize=(10, 10))\n",
+ "ax1 = fig.add_subplot(211)\n",
+ "ax1.plot(df[TARGET_COLNAME], \"-b\")\n",
+ "ax2 = fig.add_subplot(212)\n",
+ "ax2.plot(df[TARGET_COLNAME].diff().dropna(), \"-b\")\n",
+ "ax1.title.set_text(\"Original data\")\n",
+ "ax2.title.set_text(\"Data in first differences\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you were asked a question \"What is the mean of the series before and after 2008?\", for the series titled \"Original data\" the mean values will be significantly different. This implies that the first moment of the series (in this case, it is the mean) is time dependent, i.e., mean changes depending on the interval one is looking at. Thus, the series is deemed to be non-stationary. On the other hand, for the series titled \"Data in first differences\" the means for both periods are roughly the same. Hence, the first moment is time invariant; meaning it does not depend on the interval of time one is looking at. In this example it is easy to visually distinguish between stationary and non-stationary data. Often this distinction is not easy to make, therefore we rely on the statistical tests described above to help us make an informed decision. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " Conclusion
\n",
+ "Since we found the original process to be non-stationary (contains unit root), we will have to model the data in first differences. As a result, we will set the DIFFERENCE_SERIES parameter to True."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 3 Check if there is a clear autoregressive pattern\n",
+ "We need to determine if we should include lags of the target variable as features in order to improve forecast accuracy. To do this, we will examine the ACF and partial ACF (PACF) plots of the stationary series. In our case, it is a series in first diffrences.\n",
+ "\n",
+ "