Mathematical Formalism of Synthetic Controls in GeoLift

This document sets out the econometric and statistical foundations of the methods implemented in GeoLift. It is intended for data scientists and statisticians.

1. The Potential Outcomes Framework

Let $Y_{it}$ denote the outcome of interest for region $i \in \{1, \dots, N\}$ at time period $t \in \{1, \dots, T\}$. We observe a pre-treatment period $t \in \{1, \dots, T_0\}$ and a post-treatment period $t \in \{T_0+1, \dots, T\}$.

Without loss of generality, let unit $i=1$ be the treated unit, and units $i \in \{2, \dots, N\}$ be the donor pool (control units).

Following the Rubin Causal Model, we define potential outcomes:

$Y_{it}^N$: The outcome that would be observed for unit $i$ at time $t$ absent the intervention.
$Y_{it}^I$: The outcome that would be observed for unit $i$ at time $t$ exposed to the intervention.

The observed outcome is:

$$ Y_{it} = Y_{it}^N + \alpha_{it} D_{it} $$

where $D_{it}$ is an indicator equal to 1 if unit $i$ is exposed to the intervention at time $t$ and 0 otherwise (here, $D_{it} = 1$ only for $i = 1$ and $t > T_0$), and $\alpha_{it} = Y_{it}^I - Y_{it}^N$ is the treatment effect for unit $i$ at time $t$.

Our goal is to estimate the Average Treatment Effect on the Treated (ATT) during the post-treatment period:

$$ \tau = \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \alpha_{1t} = \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} (Y_{1t}^I - Y_{1t}^N) $$

Since $Y_{1t}^I$ is observed in the post-treatment period (it equals $Y_{1t}$ for $t > T_0$), estimating $\tau$ reduces to estimating the unobserved counterfactual $Y_{1t}^N$. This missing counterfactual is the fundamental problem of causal inference.
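To make the estimand concrete, the sketch below computes $\hat{\tau}$ as the average post-period gap between the observed series and an estimated counterfactual. The function name `att` and the toy numbers are illustrative assumptions, not part of GeoLift.

```python
import numpy as np


def att(y_treated, y_counterfactual, t0):
    """Average post-period gap between observed and counterfactual outcomes.

    t0 is the number of pre-treatment periods, so index t0 is the first
    post-treatment observation (0-based indexing).
    """
    y_treated = np.asarray(y_treated, dtype=float)
    y_counterfactual = np.asarray(y_counterfactual, dtype=float)
    gaps = y_treated[t0:] - y_counterfactual[t0:]  # estimated alpha_{1t}, t > T0
    return float(gaps.mean())


# Toy data (hypothetical): three pre-treatment and two post-treatment periods.
y_obs = [10.0, 11.0, 12.0, 15.0, 16.0]
y_cf = [10.0, 11.0, 12.0, 13.0, 14.0]
tau_hat = att(y_obs, y_cf, t0=3)
```

In this toy series both post-period gaps equal 2, so the estimated ATT is 2.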

2. The Standard Synthetic Control Estimator

The Synthetic Control Method (Abadie, Diamond, and Hainmueller, 2010) estimates the counterfactual $Y_{1t}^N$ as a weighted combination of the donor pool:

$$ \hat{Y}_{1t}^N = \sum_{j=2}^{N} w_j Y_{jt} $$

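where $\mathbf{W} = (w_2, \dots, w_N)'$ is a vector of weights satisfying:

$w_j \geq 0 \quad \forall j$ (Non-negativity)
$\sum_{j=2}^{N} w_j = 1$ (Adding-up)

Together, these two constraints restrict $\mathbf{W}$ to the unit simplex, so the synthetic control is a convex combination of the donors and never extrapolates beyond their range.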

The weights are chosen to minimise the pre-treatment discrepancy between the treated unit and the synthetic control, typically measured by:

$$ || \mathbf{X}_1 - \mathbf{X}_0 \mathbf{W} ||_V = \sqrt{(\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W})' V (\mathbf{X}_1 - \mathbf{X}_0 \mathbf{W})} $$

where $\mathbf{X}_1$ is a $(K \times 1)$ vector of pre-intervention characteristics for the treated unit, $\mathbf{X}_0$ is a $(K \times J)$ matrix of the same characteristics for the $J = N - 1$ donor units, and $V$ is a $(K \times K)$ positive semi-definite weighting matrix that sets the relative importance of each characteristic.
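The optimisation above can be sketched with a general-purpose constrained solver. The example below is a minimal illustration, assuming SciPy's SLSQP routine and an identity $V$ by default; the function name `sc_weights` and the simulated data are hypothetical and do not describe how GeoLift solves the problem internally.

```python
import numpy as np
from scipy.optimize import minimize


def sc_weights(x_treated, x_donors, v=None):
    """Minimise (X1 - X0 w)' V (X1 - X0 w) over the unit simplex."""
    x_treated = np.asarray(x_treated, dtype=float)
    x_donors = np.asarray(x_donors, dtype=float)
    k, j = x_donors.shape
    # Assumption for this sketch: V defaults to the identity matrix.
    v = np.eye(k) if v is None else np.asarray(v, dtype=float)

    def loss(w):
        resid = x_treated - x_donors @ w
        return resid @ v @ resid

    result = minimize(
        loss,
        np.full(j, 1.0 / j),                # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * j,            # non-negativity
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # adding-up
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


# Simulated check: the treated unit is an exact convex combination of donors.
rng = np.random.default_rng(0)
x_donors = rng.normal(size=(8, 3))          # K = 8 characteristics, J = 3 donors
true_w = np.array([0.6, 0.4, 0.0])
x_treated = x_donors @ true_w
w_hat = sc_weights(x_treated, x_donors)
```

Because the simulated treated unit lies inside the convex hull of the donors, the recovered weights should be close to `true_w`, including the zero weight on the third donor.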

3. Why SparseSC over Synthetic Difference-in-Differences

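Synthetic Difference-in-Differences (SDiD) (Arkhangelsky et al., 2021) combines unit weights $\hat{w}_j$ with time weights $\hat{\lambda}_t$ on the pre-treatment periods (distinct from the regularisation parameter $\lambda$ of Section 4), and estimates the effect from a weighted regression with unit and time fixed effects. With a single treated unit, the estimator can be written as a weighted double difference:

$$ \hat{\tau}^{SDiD} = \left( \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} Y_{1t} - \sum_{t=1}^{T_0} \hat{\lambda}_t Y_{1t} \right) - \sum_{j=2}^{N} \hat{w}_j \left( \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} Y_{jt} - \sum_{t=1}^{T_0} \hat{\lambda}_t Y_{jt} \right) $$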

While theoretically appealing, SDiD still requires parallel trends to hold between the treated unit and the reweighted donor pool. In modern media-mix modelling and retail geo-experiments, geographic markets often exhibit highly nonlinear local trends (e.g., localised COVID recovery, regional promotional overlap, weather shocks) that violate parallel trends even after reweighting.

GeoLift employs SparseSC (Sparse Synthetic Controls), which regularises the fit with L1/L2 (lasso/ridge) penalties, the L1 term being the one that encourages sparsity, rather than relying on intercepts to correct for poor fit.

By forcing the algorithm to select a small, interpretable subset of donors and penalising complex weights, SparseSC reduces interpolation bias (Abadie and L’Hour, 2021) and yields a counterfactual that mimics the treated unit’s data-generating process (DGP) without requiring a strict parallel trends assumption.

4. SparseSC Regularisation Mechanics

Let $\mathbf{Y}_0^{pre}$ be the $(T_0 \times J)$ matrix of pre-treatment outcomes for the $J = N - 1$ donors, and $\mathbf{Y}_1^{pre}$ be the $(T_0 \times 1)$ vector of pre-treatment outcomes for the treated unit.

SparseSC modifies the standard SC optimisation by introducing a regularisation parameter $\lambda$:

$$ \hat{\mathbf{W}} = \arg \min_{\mathbf{W}} || \mathbf{Y}_1^{pre} - \mathbf{Y}_0^{pre} \mathbf{W} ||_2^2 + \lambda ||\mathbf{W}||_1 $$

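Subject to the simplex constraints $\mathbf{1}'\mathbf{W} = 1, \mathbf{W} \geq 0$. Note that on the simplex $||\mathbf{W}||_1 = \mathbf{1}'\mathbf{W} = 1$, so the penalty term is constant there; it changes the minimiser only if the adding-up constraint is relaxed or a different penalty (for example, a ridge term $\lambda ||\mathbf{W}||_2^2$) is used.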

The regularisation parameter $\lambda$ is tuned by rolling-origin cross-validation over the pre-treatment period, choosing the value that minimises the out-of-sample mean squared prediction error (MSPE). This guards against overfitting to pre-treatment noise, a common failure mode of standard SCM when $T_0$ is small relative to $J$.
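The tuning procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not GeoLift's or SparseSC's implementation: the function names, the expanding-window scheme, the forecast horizon and the grid are all hypothetical. The sketch also swaps in a ridge penalty $\lambda ||\mathbf{W}||_2^2$, because the L1 norm equals 1 for every point on the simplex and so cannot discriminate between candidate weights there; note that a ridge penalty on the simplex pulls weights towards equal shares rather than towards sparsity.

```python
import numpy as np
from scipy.optimize import minimize


def fit_weights(y_treated, y_donors, lam):
    """Simplex-constrained least squares with a ridge penalty (an assumption)."""
    j = y_donors.shape[1]

    def loss(w):
        resid = y_treated - y_donors @ w
        return resid @ resid + lam * (w @ w)

    result = minimize(
        loss,
        np.full(j, 1.0 / j),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * j,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


def rolling_origin_mspe(y_treated, y_donors, lam, min_train, horizon):
    """Fit on periods [0, end), score on the next `horizon` periods, then roll."""
    errors = []
    for end in range(min_train, len(y_treated) - horizon + 1):
        w = fit_weights(y_treated[:end], y_donors[:end], lam)
        pred = y_donors[end:end + horizon] @ w
        errors.append(np.mean((y_treated[end:end + horizon] - pred) ** 2))
    return float(np.mean(errors))


def select_lambda(y_treated, y_donors, grid, min_train=10, horizon=3):
    """Return the grid value with the lowest rolling-origin MSPE, plus all scores."""
    scores = {
        lam: rolling_origin_mspe(y_treated, y_donors, lam, min_train, horizon)
        for lam in grid
    }
    return min(scores, key=scores.get), scores


# Simulated pre-treatment panel (hypothetical): T0 = 30 periods, J = 5 donors.
rng = np.random.default_rng(1)
y_donors = rng.normal(size=(30, 5)).cumsum(axis=0)
y_treated = y_donors @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + rng.normal(scale=0.1, size=30)
grid = [0.0, 0.1, 1.0, 10.0]
best_lam, scores = select_lambda(y_treated, y_donors, grid)
```

Each fold trains only on periods that precede its validation window, which preserves the time ordering that ordinary k-fold cross-validation would break.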

5. Placebo Inference and Exact P-Values

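Because geo-experiments often have only one or a handful of treated units ($N_1 = 1$ or a small integer, where $N_1$ denotes the number of treated units), asymptotic inference is unreliable. GeoLift instead calculates exact permutation p-values using in-space placebo tests.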

We iteratively reassign the treatment status to every donor unit $j \in \{2, \dots, N\}$, calculate a placebo synthetic control $\hat{Y}_{jt}^N$, and derive a placebo effect $\hat{\alpha}_{jt}$.

The ratio of post-treatment MSPE to pre-treatment MSPE is calculated for the actual treated unit and all placebos:

$$ r_i = \frac{\frac{1}{T-T_0} \sum_{t=T_0+1}^T (Y_{it} - \hat{Y}_{it}^N)^2}{\frac{1}{T_0} \sum_{t=1}^{T_0} (Y_{it} - \hat{Y}_{it}^N)^2} $$

The exact p-value is the share of units, the treated unit included, whose ratio is at least as large as the treated unit’s ratio:

$$ p = \frac{1}{N} \sum_{j=1}^N \mathbf{I}(r_j \geq r_1) $$
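The two formulas above translate directly into code. The sketch below assumes the placebo counterfactuals have already been fitted; the function names and the example ratios are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np


def mspe_ratio(y, y_hat, t0):
    """Post-treatment MSPE divided by pre-treatment MSPE for one unit.

    t0 is the number of pre-treatment periods (0-based slicing).
    """
    sq_gap = (np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)) ** 2
    return float(sq_gap[t0:].mean() / sq_gap[:t0].mean())


def placebo_p_value(ratios, treated_index=0):
    """Share of units whose ratio is at least as large as the treated unit's."""
    ratios = np.asarray(ratios, dtype=float)
    return float(np.mean(ratios >= ratios[treated_index]))


# One unit: pre-period gaps of +/-0.5, post-period gaps of 3.
ratio_example = mspe_ratio(
    y=[1.0, 2.0, 3.0, 6.0, 7.0],
    y_hat=[1.5, 1.5, 3.5, 3.0, 4.0],
    t0=3,
)

# Hypothetical ratios for N = 5 units; the treated unit comes first.
ratios = [9.0, 1.2, 0.8, 2.5, 1.1]
p_value = placebo_p_value(ratios)
```

In the example, the pre-treatment MSPE is 0.25 and the post-treatment MSPE is 9, giving a ratio of 36. Only the treated unit reaches its own ratio among the five units, so $p = 1/5 = 0.2$. Because the sum includes the treated unit itself, $1/N$ is the smallest p-value attainable, which limits the achievable significance level when the donor pool is small.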