Skip to content

Oriane-O/Marketing-Mix-Modeling

Repository files navigation

Implementation of Marketing Mix Modeling

This project is based on the pymc implementation of the MMM presented in the paper Jin, Yuxue, et al. “Bayesian methods for media mix modeling with carryover and shape effects.” (2017). Here we work on simulated data to set the parameters ourselves and allow us to conduct a parameter recovery exercise.The data generation process is as an adaptation of the blog posts “Media Effect Estimation with PyMC: Adstock, Saturation & Diminishing Returns” and Mastering Marketing Mix Modelling In Python. It also uses as references this other sources:

🔎 https://github.com/sibylhe/mmm_stan/tree/main
🔎 https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_example.html

1. Introduction

Marketing mix models or Media mix models are used to understand how media spending affects sales, so as to optimize future budget allocation.

These models are usually based on weekly or monthly aggregated national or geo level data. The data may include sales, price, product distribution, media spend in different channels, and external factors such as macroeconomic forces, weather, seasonality, and market competition.

ROAS (return on ad spend) and mROAS (marginal ROAS) are the key metrics to look at. High ROAS indicates the channel is efficient, high mROAS means increasing spend in the channel will yield a high return based on current spending level.

The ultimate goal is to create the best funnel to have the best ROI (return on investment).

2.Business Case Study

2.1 Problem Definition

Let's first define the business problem we are trying to solve. We want to optimize the marketing budget allocation of our client with the following characteristics:

  • Sales data: weekly sales of the client.
  • Media spend data: weekly spend on 5 different media channels
  • Domain knowledge:
    • We know that there has a been an positive sales trend which we believe comes from a strong economic growth.
    • We also know that there is a yearly seasonality effect.

There is a causal relationship between marketing and sales, but what is the nature of that relationship? We have to take into account that there is :

  • a carry-over effect (adstock). Meaning, the effect of spend on sales is not instantaneous but accumulates over time.
  • a saturation effect. Meaning, the effect of spend on sales is not linear but saturates at some point

2.2 Data Generation

📄 Find all the Generation process in the Notebook 1-Data_generation.ipynb
📄 Find a simplified Data_generation function in the Script data_generator_function.py

As described in the section above, we want a dataset with:

  • Sales variables: Sales ( the target variable)

  • Media Variables:

    • ooh (Out of home spend)
    • tv (Television spend)
    • print (Print media spend)
    • facebook ( Facebook ads spend)
    • search (Google search ads spend)
    • facebook_I (facebook impressions)
    • search_clicks_P (Google search ads performence,number of clicks)
  • Control Variables: competitor_sales_B (competitor sales baseline)

To construct our dataset we considered 4 years of weekly data.

From what we know from the domain knowledge, we have described the demand with an increasing trend for organic growth, with a seasonality (oscillation) in the demand each year.

alt text

We also created a proxy for demand as in reality, the true demand is never observable, but we can find proxies.

alt text

After that, we created synthetic data for each marketing channel.The different channel spends are correlated with demand, and also are designed by different marketing strategy (for example high budget, bursty campaigns for TV, or relatively consistent with moderate noise for out-of-home).

alt text

Next, we pass the raw signal through the two transformations: first the geometric adstock (carryover effect) and then the logistic saturation. For the adstock, we set a maximum lag effect of 8 weeks, and we chose our alpha parameter accordingly to the media:

Channel Type alpha Justification
tv Offline, mass media 0.5 – 0.8 Strong long-term effect (brand awareness, memorability)
ooh Offline, visual 0.4 – 0.7 Moderately lasting impact, repeated exposure in public spaces
print Offline, print media 0.2 – 0.5 Lower memorability, short-lived effect, rarely drives direct action
facebook Digital, paid social 0.1 – 0.4 Short-term performance focus, quick decay of impact
search Digital, intent-based 0.0 – 0.2 No carryover effect: impact is immediate (direct response channel)

Same for the saturation:

Channel Type λ Justification
tv Offline, mass media 0.5 – 1.5 Strong saturation: TV reach saturates quickly (broad audience)
ooh Offline, visual 1.0 – 2.0 Moderate saturation, especially in high-exposure urban areas
print Offline, print media 1.5 – 3.0 Low saturation (narrow audience); hard to reach saturation point
facebook Digital, paid social 0.5 – 1.5 Can saturate fast with high budget, algorithmically optimized
search Digital, intent-based 2.0 – 4.0 Very low saturation: conversion effectiveness remains linear longer (pull channel)

alt text

And we can visualize the effect signal for each channel after each transformation:

alt text

We then add the Facebook impressions and google search clicks, and then the control variable that is the competitor sales baseline ( that also follows trend and seasonality)

Finally, we create our target value, the sales, that we assume it is a linear combination of the effect signal, the trend and the seasonal components, plus the two events and an intercept. We also add some Gaussian noise.

alt text

2.2 Exploratory Data Analysis and Feature Engineering

📄 Find more details on the EDA and FE in 2-EDA and FE.ipynb

Now that we have our data let's do the exploratory data analysis. (This one will be faster than the usuals because we already know well our dataset given that we constructed it).

We reduced our dataset to only the columns that would be available from real raw data delivered by the company(sales,spend per channel,sales of competitor,impressions and clicks), and start by visualizing them.

alt text

We also compare the monthly stats on ad spends:

tv_s ooh_s print_s facebook_s search_s
count 209 209 209 209 209
mean $6,342 $1,657 $142 $6,886 $3,328
std $11,416 $436 $476 $2,470 $809
min $0 $810 $0 $2,052 $1,589
25% $0 $1,326 $0 $4,927 $2,673
50% $0 $1,611 $0 $6,642 $3,322
75% $12,002 $1,959 $0 $8,313 $3,834
max $49,311 $2,779 $2,839 $15,080 $5,387

NB : Media contribution du modele vs de nos données generees à comparer

We can see the current allocations on the different channels but now the question is if this is the most profitable split and spendings.(Always remembering that the ad spends are not the only source explaining the sales).

And now we do a quick feature engineering step befor modeling the process.For this step, we add more layers to describe temporality and we include a trend feature (that will help us see the seasonality as 4 Fourier modes).

date sales tv_s ooh_s print_s facebook_s search_s trend year month dayofyear
0 2021-01-04 162277.109282 0.0 963.639807 0.0 4296.052070 2182.363211 0 2021 1 4
1 2021-01-11 170493.217138 0.0 1015.604279 0.0 4324.637643 2151.854375 1 2021 1 11
2 2021-01-18 144523.455074 0.0 1102.396049 0.0 4926.757799 2168.730601 2 2021 1 18
3 2021-01-25 239399.578756 0.0 1371.427530 0.0 7538.774028 3306.440497 3 2021 1 25
4 2021-02-01 195422.511307 0.0 1015.958803 0.0 5212.689979 2378.653779 4 2021 2 32

About

pymc implementation of the MMM with simulated data

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published