Merge pull request #1183 from facebookexperimental/robynpy_release

Introducing Robynpy - Robyn powered by Python.
facebookexperimental · Dec 9, 2024 · 368806e
2 parents c8bb458 + 988913d
commit 368806e
Showing 104 changed files with 116,079 additions and 0 deletions.
diff --git a/.github/workflows/python-app.yml b/.github/workflows/python-app.yml
@@ -0,0 +1,63 @@
+# This workflow will install Python dependencies, run tests and lint with a single version of Python
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
+# A continuous integration (CI) workflow to build and test the Robyn Python project
+
+name: Robyn Python application
+
+on:
+  push:
+    branches: ['robynpy_release']
+  pull_request:
+    branches: ['robynpy_release']
+
+permissions:
+  contents: read
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+
+    strategy:
+      matrix:
+        python-version: ['3.10']
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'pip'
+      - name: Display Python version
+        run: python -c "import sys; print(sys.version)"
+
+      - name: Update PYTHONPATH to enable importing robyn modules
+        run: |
+          echo "PYTHONPATH=$PYTHONPATH:$(pwd)/python/src" >> $GITHUB_ENV
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install flake8 pytest pytest-cov
+          if [ -f python/requirements.txt ]; then pip install -r python/requirements.txt; fi
+      - name: Lint with flake8
+        run: |
+          # stop the build if there are Python syntax errors or undefined names
+          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics --exclude=./robyn_api/*.py
+          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics --exclude=./robyn_api/*.py
+      - name: Test with pytest. Enable this once first set of tests are written
+        run: |
+          pytest ./python/tests --doctest-modules --junitxml=junit/test-results.xml --cov=robyn --cov-report=html
+      - name: 'Upload Unit Test Results'
+        uses: actions/upload-artifact@v4
+        with:
+          name: robynpy-output-artifact
+          path: junit/test-results.xml
+          retention-days: 30
+      - name: Upload Coverage Report
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-report
+          path: htmlcov
+          retention-days: 30
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,8 @@
 .DS_Store
 .Rproj.user
 .Rhistory
+.venv/
+robynpy.egg-info/
 node_modules/
 RobynApp.Rcheck/00_pkg_src/RobynApp/R/ui.R
 RobynApp.Rcheck/00_pkg_src/RobynApp/README.md
@@ -31,3 +33,22 @@ RobynApp.Rcheck/RobynApp/R/RobynApp.rdb
 RobynApp.Rcheck/RobynApp/R/RobynApp.rdx
 RobynApp_1.0.0.tar.gz
 Robyn_Fork.Rproj
+python/src/tutorials/demo.py
+python/.venv
+python/**/.venv
+python/dist
+python/**/__pycache__
+python/.vscode
+python/src/robynpy.egg-info*
+python/oldportedcode
+python/src/tutorials/mytestenv
+*.log
+python/src/tutorials/test_modeling.py
+python/src/tutorials/data/*
+python/src/tutorials/test_modeling.py
+python/src/tutorials/data/R/*
+python/src/tutorials/data/*
+*.pkl
+python/src/robyn/_deprecate/*
+python/src/robyn/tutorials/output/*
+python/src/robyn/debug/*
diff --git a/python/LICENSE b/python/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) Meta Platforms, Inc. and its affiliates.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/python/README.md b/python/README.md
@@ -0,0 +1,70 @@
+# Robyn: Continuous & Semi-Automated MMM <img src='R/man/figures/logo.png' align="right" height="139px" />
+### The Open Source Marketing Mix Model Package from Meta Marketing Science
+
+<!-- [![Pypi\_Status\_Badge](https://www.r-pkg.org/badges/version/Robyn)](https://cran.r-project.org/package=Robyn) [![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/Robyn?color=green)](https://cranlogs.r-pkg.org/badges/grand-total/Robyn?color=green) [![Site](https://img.shields.io/badge/site-Robyn-blue.svg)](https://facebookexperimental.github.io/Robyn/) [![Facebook](https://img.shields.io/badge/group-Facebook-blue.svg)](https://www.facebook.com/groups/robynmmm/) [![CodeFactor](https://www.codefactor.io/repository/github/facebookexperimental/robyn/badge)](https://www.codefactor.io/repository/github/facebookexperimental/robyn) -->
+---
+
+## Introduction
+
+  * **What is Robyn?**: Robyn is an experimental, semi-automated, open-source Marketing Mix Modeling (MMM) package from Meta Marketing Science. It uses various machine learning techniques (Ridge regression, multi-objective evolutionary algorithm for hyperparameter optimization, time-series decomposition for trend & season, gradient-based optimization for budget allocation, clustering, etc.) to measure media channel efficiency and effectiveness and to explore adstock rates and saturation curves. It's built for granular datasets with many independent variables and is therefore especially suitable for digital and direct response advertisers with rich data sources.
+
+  * **Why are we doing this?**: MMM used to be a resource-intensive technique that was only affordable for "big players". As the privacy needs of the measurement landscape evolve, there's a clear trend of increasing demand for modern MMM as a privacy-safe solution. At Meta Marketing Science, our mission is to help all businesses grow by transforming marketing practices grounded in data and science. Democratizing MMM and making it accessible to advertisers of all sizes is closely aligned with that mission. With Project Robyn, we want to contribute to the measurement landscape, inspire the industry and build a community for exchange and innovation around the future of MMM and Marketing Science in general.
+
+## Quick start for Python (Beta)
+
+  The Python version of Robyn is a rewrite of version `3.11.1` of Robyn's R package, using object-oriented programming principles and a modular architecture for a robust solution. It was developed with the help of various LLMs and AI workflows. As with any AI-assisted work, translating code from one language to another can introduce problems.
+  We anticipate that some issues remain in the translation from R to Python. However, we believe in the power of community collaboration and open-source contribution, so we are opening this project to the community to participate and contribute.
+  Together, we can address and resolve any issues that may arise, enhancing the functionality and efficiency of the Python version of Robyn. We look forward to your contributions and to the continuous improvement of this project.
+
+**1. Installing the package**
+
+  * Install the latest Robyn package version:
+```bash
+## PyPI
+pip3 install robynpy
+
+## DEV VERSION
+# if you are pulling source from github, install dependencies using requirements.txt
+pip3 install -r requirements.txt
+```
+
+**2. Getting started**
+
+  * `python/src/robyn/tutorials` contains tutorials for the most common scenarios. The tutorials use the simulated dataset provided in the package.
+
+  * There are two ways to run the Python version of Robyn: `tutorial1.ipynb` and `tutorial1_src.ipynb`.
+
+**3. Running end-to-end**
+
+Option 1:
+  * `tutorial1.ipynb` is the main notebook that runs the end-to-end flow. It is designed for the majority of users, who prefer a one-click solution that runs the Robyn flow end-to-end with minimal knowledge of the underlying logic. To test with the simulated dataset, it should run without any changes.
+
+  * This notebook uses the APIs available in `python/src/robyn/robyn.py` to set the configs, run feature engineering, run model training, evaluate models with clustering, generate one-pagers and perform budget allocation.
+
+  * Change configs directly in the notebook, and avoid editing `robyn.py` for anything that can be configured from the notebook.
+
+Option 2:
+  * `tutorial1_src.ipynb` also runs the end-to-end Robyn Python flow, but with much more flexibility. It is designed for users who want more control over which modules run (e.g., skipping clustering, one-pager plots, or budget allocation). To test with the simulated dataset, it should run without any changes.
+
+  * This notebook doesn't use the APIs in `python/src/robyn/robyn.py`; instead, it calls the modules directly with the appropriate parameters. This makes it more flexible, but it expects users to understand the underlying logic, whose behavior may change with different parameter values.
+
+## Helpful Links
+
+  * Visit our [website](https://facebookexperimental.github.io/Robyn/) to explore more details about Project Robyn.
+
+  * Join our [public group](https://www.facebook.com/groups/robyn/) to exchange with other users and interact with team Robyn.
+
+  * Take Meta's [official Robyn blueprint course](https://www.facebookblueprint.com/student/path/253121-marketing-mix-models?utm_source=readme) online.
+
+## License
+
+Meta's Robyn is MIT licensed, as found in the LICENSE file.
+
+- Terms of Use - https://opensource.facebook.com/legal/terms 
+- Privacy Policy - https://opensource.facebook.com/legal/privacy
+- Defensive Publication - https://www.tdcommons.org/dpubs_series/4627/
+
+## Contact
+
+* gufeng@meta.com, Gufeng Zhou, Marketing Science, Robyn creator
+* igorskokan@meta.com, Igor Skokan, Marketing Science Director, open source
diff --git a/python/docs/robyn/modeling/feature_engineering.md b/python/docs/robyn/modeling/feature_engineering.md
@@ -0,0 +1,181 @@
+# CLASS
+## FeaturizedMMMData
+* This class is a data container specifically used to store the results of feature engineering for Marketing Mix Modeling (MMM) data.
+* It holds the modulated data, rolling window modulated data, and the results from non-linear models.
+* This class is located in the main file for feature engineering.
+
+# CONSTRUCTORS
+## FeaturizedMMMData `(dt_mod: pd.DataFrame, dt_modRollWind: pd.DataFrame, modNLS: Dict[str, Any])`
+* **dt_mod**: A pandas DataFrame that contains the modulated data after feature engineering.
+* **dt_modRollWind**: A pandas DataFrame representing the rolling window modulated data.
+* **modNLS**: A dictionary that holds the results of non-linear model fitting, with keys as model names and values as model outcomes.
+
+### USAGE
+* The constructor is used to instantiate a `FeaturizedMMMData` object, encapsulating the results of feature engineering for further analysis or modeling.
+
+### IMPL
+* The `@dataclass` decorator is employed, which automatically provides an `__init__` method that initializes class attributes based on the provided parameters.
+
+# CLASS
+## FeatureEngineering
+* This class is designed to carry out feature engineering specifically for Marketing Mix Modeling (MMM) data.
+* It incorporates external data such as holidays and utilizes statistical models to transform and prepare data for analysis.
+* The class is located in the main file dedicated to feature engineering tasks.
+
+# CONSTRUCTORS
+## FeatureEngineering `(mmm_data: MMMData, hyperparameters: Hyperparameters, holidays_data: Optional[HolidaysData] = None)`
+* **mmm_data**: An instance of `MMMData` that includes the dataset and specifications required for MMM.
+* **hyperparameters**: An instance of `Hyperparameters` that configures the feature engineering process.
+* **holidays_data**: An optional instance of `HolidaysData`, used to include holiday effects via Prophet decomposition.
+
+### USAGE
+* Instantiate this class when there is a need to perform feature engineering on MMM data using specific hyperparameters, with the option to factor in holidays data.
+
+### IMPL
+* The constructor initializes class variables such as `mmm_data`, `hyperparameters`, `holidays_data`, and a logger instance.
+* The logger is set up using Python's `logging` library to log information and warnings throughout the feature engineering process.
+
+# METHODS
+## `perform_feature_engineering(quiet: bool = False) -> FeaturizedMMMData`
+### USAGE
+* **quiet**: A boolean flag to indicate whether logging output should be suppressed. Defaults to `False`.
+* This method orchestrates the entire feature engineering process and returns a `FeaturizedMMMData` object containing the results.
+
+### IMPL
+* The method begins by preparing the initial dataset through `_prepare_data()`.
+* It checks for the presence of Prophet variables and performs decomposition if required, logging the process unless `quiet` is set to `True`.
+* Collects all relevant independent variables and transforms the dataset.
+* Generates rolling window data and computes the media cost factor.
+* Runs models using `_run_models()`.
+* Filters columns to retain necessary data in the resulting DataFrames and addresses any missing values.
+* Logs the completion of feature engineering if `quiet` is `False`.
+* Finally, returns an instance of `FeaturizedMMMData` containing the processed data and model results.
+
+## `_prepare_data() -> pd.DataFrame`
+### USAGE
+* Prepares the dataset by transforming the date and dependent variable columns for further processing.
+
+### IMPL
+* Copies the original data to avoid modifying the input directly.
+* Converts the date column to a standardized `YYYY-MM-DD` format.
+* Sets the dependent variable column for easier access and transformations.
+* Ensures specific variable types, such as converting `competitor_sales_B` to `int64`.
+
+## `_create_rolling_window_data(dt_transform: pd.DataFrame) -> pd.DataFrame`
+### USAGE
+* **dt_transform**: A pandas DataFrame representing the transformed dataset.
+* Creates a rolling window dataset based on specified start and end dates for analysis.
+
+### IMPL
+* Filters the dataset according to the window start and end specifications provided in `mmm_data`.
+* Raises a `ValueError` if the window specifications are inconsistent with the dataset, ensuring logical integrity.
+
+## `_calculate_media_cost_factor(dt_input_roll_wind: pd.DataFrame) -> pd.Series`
+### USAGE
+* **dt_input_roll_wind**: A pandas DataFrame of the rolling window input data.
+* Calculates the media cost factor for the given rolling window dataset.
+
+### IMPL
+* Computes the total spend from the specified paid media spends.
+* Returns the media cost factor as a pandas Series, representing the proportion of spend for each media type.
+
+## `_run_models(dt_modRollWind: pd.DataFrame, media_cost_factor: float) -> Dict[str, Dict[str, Any]]`
+### USAGE
+* **dt_modRollWind**: A pandas DataFrame containing rolling window modulated data.
+* **media_cost_factor**: A float representing the media cost factor.
+* Runs statistical models for each paid media variable and returns the results.
+
+### IMPL
+* Initializes a dictionary `modNLS` to store model results, yhat predictions, and plots.
+* Iterates over each paid media variable, calling `_fit_spend_exposure()` to fit models.
+* Aggregates model results into `modNLS` and returns it.
+
+## `_fit_spend_exposure(dt_modRollWind: pd.DataFrame, paid_media_var: str, media_cost_factor: float) -> Dict[str, Any]`
+### USAGE
+* **dt_modRollWind**: A pandas DataFrame of rolling window modulated data.
+* **paid_media_var**: A string representing the paid media variable.
+* **media_cost_factor**: A float representing the media cost factor.
+* Fits spend-exposure models for a given paid media variable and returns the results.
+
+### IMPL
+* Logs the processing of the paid media variable.
+* Attempts to fit data using the Michaelis-Menten and linear regression models.
+* Computes R-squared values to assess model fit and selects the better-performing model.
+* Handles exceptions by defaulting to a linear model and logs warnings if necessary.
+* Returns a dictionary containing model results, plots, and predicted values.
+
+## `_hill_function(x, alpha, gamma)`
+### USAGE
+* Static method to apply the Hill function transformation to a dataset.
+
+### IMPL
+* Computes the Hill function using the mathematical formula: `x^alpha / (x^alpha + gamma^alpha)`.
+* This transformation is used in the feature engineering process to model saturating relationships, where the response flattens out as `x` grows.
+
+## `_prophet_decomposition(dt_mod: pd.DataFrame) -> pd.DataFrame`
+### USAGE
+* **dt_mod**: A pandas DataFrame representing the modulated data.
+* Performs Prophet decomposition on the dataset and returns the transformed data with additional features.
+
+### IMPL
+* Configures and fits a Prophet model using available holiday and seasonal data.
+* Incorporates custom parameters if available and manages multiple regressors.
+* Logs warnings for known Prophet issues to prevent unexpected errors.
+* Updates the dataset with trends, seasonalities, and holidays information.
+
+## `_set_holidays(dt_transform: pd.DataFrame, dt_holidays: pd.DataFrame, interval_type: str) -> pd.DataFrame`
+### USAGE
+* **dt_transform**: A pandas DataFrame representing the transformed dataset.
+* **dt_holidays**: A pandas DataFrame containing holiday data.
+* **interval_type**: A string indicating the data interval type ("day", "week", or "month").
+* Sets holidays in the dataset based on the specified interval type.
+
+### IMPL
+* Ensures date columns are in datetime format for consistency.
+* Adjusts holidays according to the interval type and raises a `ValueError` for invalid types.
+* Handles aggregation for weekly and monthly intervals to ensure accurate holiday representation.
+
+## `_apply_transformations(x: pd.Series, params: ChannelHyperparameters) -> pd.Series`
+### USAGE
+* **x**: A pandas Series representing the data to be transformed.
+* **params**: An instance of `ChannelHyperparameters` that contains transformation parameters.
+* Applies adstock and saturation transformations to the given series.
+
+### IMPL
+* Calls `_apply_adstock()` and `_apply_saturation()` to perform the necessary transformations.
+* Returns the transformed series for further analysis or modeling.
+
+## `_apply_adstock(x: pd.Series, params: ChannelHyperparameters) -> pd.Series`
+### USAGE
+* **x**: A pandas Series representing the data to be adstocked.
+* **params**: An instance of `ChannelHyperparameters` containing adstock parameters.
+* Applies the specified adstock transformation to the given series.
+
+### IMPL
+* Selects the appropriate adstock function based on the hyperparameter type.
+* Supports both geometric and Weibull adstock types.
+* Raises a `ValueError` for unsupported adstock types to ensure proper error handling.
+
+## `_geometric_adstock(x: pd.Series, theta: float) -> pd.Series`
+### USAGE
+* Static method to apply geometric adstock transformation to a dataset.
+
+### IMPL
+* Utilizes an exponentially weighted moving average with a specified `theta`.
+* Computes the geometric adstocked series for the provided data.
+
+## `_weibull_adstock(x: pd.Series, shape: float, scale: float) -> pd.Series`
+### USAGE
+* Static method to apply Weibull adstock transformation.
+
+### IMPL
+* Computes the Weibull probability density function to generate weights for the transformation.
+* Convolves the weights with the input series to produce the adstocked series, capturing delayed effects.
+
+## `_apply_saturation(x: pd.Series, params: ChannelHyperparameters) -> pd.Series`
+### USAGE
+* Static method to apply saturation transformation to a dataset.
+
+### IMPL
+* Computes the saturation transformation using a formula involving `alpha` and `gamma`.
+* Returns the transformed series, modeling diminishing returns or saturation effects.
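
Note on the transformations documented in `feature_engineering.md` above: the adstock and saturation steps can be illustrated in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the code shipped in this PR. The function names below are hypothetical stand-ins for `_geometric_adstock`, `_weibull_adstock` and `_hill_function`. Geometric adstock is written in the common recursive carry-over form, whereas the document only says it uses an exponentially weighted moving average. The Weibull weights use the raw `scale` value and are normalized to sum to 1, which the document does not specify. Only the Hill formula, `x^alpha / (x^alpha + gamma^alpha)`, is taken directly from the document.

```python
import numpy as np


def geometric_adstock(x: np.ndarray, theta: float) -> np.ndarray:
    """Recursive carry-over: out[t] = x[t] + theta * out[t-1] (assumed form)."""
    out = np.zeros(len(x), dtype=float)
    carry = 0.0
    for t, value in enumerate(x):
        carry = float(value) + theta * carry
        out[t] = carry
    return out


def weibull_adstock(x: np.ndarray, shape: float, scale: float) -> np.ndarray:
    """Convolve x with Weibull-PDF weights (normalization is an assumption)."""
    x = np.asarray(x, dtype=float)
    lags = np.arange(1, len(x) + 1, dtype=float)
    # Weibull probability density evaluated at lags 1..n
    weights = (shape / scale) * (lags / scale) ** (shape - 1) * np.exp(-((lags / scale) ** shape))
    weights = weights / weights.sum()
    # Causal convolution: each period only receives effect from itself and earlier periods
    return np.convolve(x, weights)[: len(x)]


def hill_saturation(x: np.ndarray, alpha: float, gamma: float) -> np.ndarray:
    """Hill function from the document: x^alpha / (x^alpha + gamma^alpha)."""
    x = np.asarray(x, dtype=float)
    return x**alpha / (x**alpha + gamma**alpha)


if __name__ == "__main__":
    spend = np.array([100.0, 0.0, 0.0])
    adstocked = geometric_adstock(spend, theta=0.5)
    print(adstocked)  # [100.  50.  25.]
    # Saturation is applied after adstock; gamma here is an arbitrary illustrative value
    print(hill_saturation(adstocked, alpha=2.0, gamma=50.0))
```

In this sketch, a single burst of spend decays geometrically over the following periods, and the Hill function returns exactly 0.5 where the adstocked value equals `gamma`. Chaining the two calls mirrors the order described for `_apply_transformations`, which applies adstock and then saturation.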