When a GLS problem involves hundreds of equations, the
Install the library from PyPI:
pip install alsglsFor local development, clone the repo and use an editable install:
pip install -e .from alsgls import ALSGLS, ALSGLSSystem, simulate_sur
Xs_tr, Y_tr, Xs_te, Y_te = simulate_sur(N_tr=240, N_te=120, K=60, p=3, k=4)
# Scikit-learn style estimator
est = ALSGLS(rank="auto", max_sweeps=12)
est.fit(Xs_tr, Y_tr)
test_score = est.score(Xs_te, Y_te) # negative test NLL per observation
# Statsmodels-style system interface
system = {f"eq{j}": (Y_tr[:, j], Xs_tr[j]) for j in range(Y_tr.shape[1])}
sys_model = ALSGLSSystem(system, rank="auto")
sys_results = sys_model.fit()
params = sys_results.params_as_series() # pandas optionalThe benchmarks/compare_sur.py script contrasts ALS-GLS with statsmodels and
linearmodels SUR implementations on matched simulation grids while recording
peak memory (via Memray, Fil, or the POSIX RSS high-water mark).
Background material and reproducible experiments are available in the notebooks under als_sim/, such as als_sim/als_comparison.ipynb and als_sim/als_sur.ipynb.
This package provides a modern, type-safe implementation of Alternating-Least-Squares (ALS) for low-rank GLS problems. The Woodbury identity reduces the expensive inverse to a tiny k × k system, and the β-update can be written without explicitly forming dense matrices.
Key features in v1.0:
- Full type safety with mypy compliance and comprehensive type hints
- Numerically stable implementation using Cholesky factorization throughout
- Clean API with single computational path and enhanced error messages
- Memory efficient with O(K k) complexity, converging in 5–6 sweeps
- Breaking changes for a cleaner, more maintainable codebase
Rule of thumb: if your GLS routine keeps looping between
Random‑effects models, feasible GLS with estimated heteroskedastic weights, optimal‑weight GMM, and spatial autoregressive GLS all iterate β ↔ Σ̂. Each can adopt the same ALS trick: treat the weight matrix as low‑rank + diagonal, invert only the k × k core, and avoid the dense K × K algebra. Memory savings in published examples range from 5× to 20×, depending on k.
To demonstrate performance, we benchmark ALS against traditional methods with N = 300 observations, three regressors, rank‑3 factors, and K ranging from 50 to 120 equations. The largest array that traditional methods need is the dense Σ⁻¹ (K×K), whereas ALS's largest is the skinny factor matrix F (K×k).
| K | β‑RMSE Traditional | β‑RMSE ALS | Peak MB Traditional | Peak MB ALS | Memory ratio |
|---|---|---|---|---|---|
| 50 | 0.021 | 0.021 | 0.020 | 0.002 | 10× |
| 80 | 0.020 | 0.020 | 0.051 | 0.003 | 17× |
| 120 | 0.020 | 0.020 | 0.115 | 0.004 | 29× |
The ALS implementation achieves the same statistical performance while using only a few megabytes of memory, providing substantial computational advantages for large systems.
- Rank (
k) – By default the high-level APIs pickmin(8, ceil(K / 10)), a conservative fraction of the number of equations. Increaserankif the cross-equation correlation matrix is slow to decay; decrease it when the diagonal dominates. - ALS ridge terms (
lam_F,lam_B) – Defaults to1e-3for both the latent-factor and regression updates; raise them slightly (e.g.1e-2) if CG struggles to converge or the NLL trace plateaus early. - Noise floor (
d_floor) – Keeps the diagonal component positive; the default1e-8protects against breakdowns when an equation is nearly deterministic. Increase it in highly ill-conditioned settings. - Stopping criteria – ALS stops when the relative drop in NLL per sweep is
below
1e-6(configurable viarel_tol) or aftermax_sweeps. Inspectinfo["nll_trace"]to diagnose stagnation. - Possible failures – Large condition numbers or nearly-collinear regressors
can make the β-step CG solve slow; adjust
cg_tol/cg_maxit, add stronger ridge, or re-scale predictors. Ifinfo["accept_t"]stays at zero and the NLL does not improve, the factor rank may be too large relative to the sample size.