generate_hlr_data

causalpy.data.simulate_data.generate_hlr_data(seed=42, n_groups=12, n_obs_per_group=40, beta_fixed_true=(2.0, 1.4), sigma_random_true=(0.7, 0.5), sigma_noise=0.8)

Generate grouped synthetic data for hierarchical linear regression.

Simulates outcomes from a hierarchical data-generating process with fixed and group-level random effects:

\[\mu_i = x_i^\top\beta_{\text{fixed}} + z_i^\top\beta_{\text{random}, g(i)}\]
\[y_i = \mu_i + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma_{\text{noise}})\]
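The two equations above can be sketched directly in NumPy. This is a minimal illustration, not CausalPy's implementation: the function name `simulate_hlr_sketch`, the single standard-normal covariate, the choice of `Z` equal to the fixed-effect design, the zero-mean independent normal random effects, and the column and key names are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def simulate_hlr_sketch(
    seed=42,
    n_groups=12,
    n_obs_per_group=40,
    beta_fixed_true=(2.0, 1.4),
    sigma_random_true=(0.7, 0.5),
    sigma_noise=0.8,
):
    """Hypothetical sketch of the data-generating process (not CausalPy's code)."""
    rng = np.random.default_rng(seed)
    n = n_groups * n_obs_per_group

    # g(i): group index of each observation, e.g. 0,0,...,1,1,...
    group = np.repeat(np.arange(n_groups), n_obs_per_group)

    # Assumption: one standard-normal covariate plus an intercept column.
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    # Assumption: the random intercept and slope use the same design as X.
    Z = X.copy()

    beta_fixed = np.asarray(beta_fixed_true, dtype=float)
    # Assumption: zero-mean, independent normal random effects, one row per group.
    beta_random = rng.normal(0.0, sigma_random_true, size=(n_groups, 2))

    # mu_i = x_i^T beta_fixed + z_i^T beta_random[g(i)]
    mu = X @ beta_fixed + np.sum(Z * beta_random[group], axis=1)
    # y_i = mu_i + eps_i, with sigma_noise used as a standard deviation
    y = mu + rng.normal(0.0, sigma_noise, size=n)

    XY = pd.DataFrame({"x": x, "group": group, "y": y})
    Z_df = pd.DataFrame(Z, columns=["intercept", "x"])
    params = {"beta_fixed": beta_fixed, "beta_random": beta_random}
    return XY, Z_df, params


XY, Z, params = simulate_hlr_sketch()
```

With the default arguments the sketch produces `n_groups * n_obs_per_group` rows, and each observation picks up the random-effect row of its own group through the `beta_random[group]` lookup.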
Parameters:
  • seed (int) – Seed used to initialize the NumPy random number generator.

  • n_groups (int) – Number of groups in the grouped structure.

  • n_obs_per_group (int) – Number of observations generated per group.

  • beta_fixed_true (tuple[float, float]) – True fixed-effect coefficients used in the simulation.

  • sigma_random_true (tuple[float, float]) – Group-level standard deviations for the random intercept and slope.

  • sigma_noise (float) – Scale (standard deviation) of the observation-level noise \(\epsilon_i\).

Returns:

A tuple (XY, Z, params). XY contains the fixed-effect inputs, the group index, and the observed outcome. Z contains the random-effect design matrix. params contains the true values used in the simulation.

Return type:

tuple[pd.DataFrame, pd.DataFrame, dict[str, np.ndarray]]