A domain-specific programming language designed for statistical analysis and reporting.
Statica provides a clean, English-like syntax for performing statistical tests, regression modeling, and data visualization with automatic human-readable conclusions. The language interprets user-written .sta scripts through a custom parser and execution engine built in Python, making statistical analysis approachable, transparent, and expressive.
- Natural Language Syntax: Simple, intuitive commands that read like English
- Built-in Statistical Commands: One-sample t-tests, linear regression modeling, and scatter plot generation
- Automatic Interpretation: Human-readable conclusions generated automatically from statistical results
- Interactive Analysis: Context-aware system that prompts for missing statistical table values when needed
- Extensible Architecture: Modular design allows easy addition of new commands and statistical tests
data = load "study.csv"
t = test ttest mean of data.score = 75
m = regress score ~ age + group on data
plot data.age vs data.score scatter
conclude t
conclude m
Expected Output:
[Loaded study.csv]
[Ran t-test on score]
[Ran regression: score ~ age + group]
[Plot displayed: age vs score]
The mean (72.34) is significantly different from 75 (t = -2.14, p = 0.037).
Model Summary:
==========================
Dep. Variable: score
R-squared: 0.742
P > |t|: 0.001
==========================
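For reference, the t value reported in the conclusion is a one-sample t statistic. The following is a minimal sketch of that calculation using only the Python standard library. The function name `one_sample_t` and the sample values are made up for illustration; according to the description further down, Statica itself relies on SciPy and StatsModels for its tests, and the p-value step (which needs the t distribution) is left out here.

```python
import math
import statistics

def one_sample_t(sample, popmean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test.

    Hypothetical helper for illustration; not part of Statica.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    if std_err == 0:
        raise ValueError("sample has zero variance")
    return (mean - popmean) / std_err, n - 1

# Made-up scores, tested against a hypothesized mean of 75.
t_stat, df = one_sample_t([70, 72, 74, 76, 78], 75)
print(f"t = {t_stat:.2f}, df = {df}")
```

A negative t value, as in the example output, means the sample mean lies below the hypothesized value.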
- Python 3.7 or higher
- pip package manager
- Clone the Repository
git clone https://github.com/your-username/statica.git
cd statica
- Create Virtual Environment (Recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Dependencies
pip install -r requirements.txt
Required Dependencies:
- lark-parser
- pandas
- matplotlib
- scipy
- statsmodels
statica-proto/
├── README.md
├── requirements.txt
├── examples/
│ ├── study.csv
│ └── example.sta
└── statica/
├── __init__.py
├── grammar.lark
├── parser.py
├── runtime.py
├── nlg.py
└── cli.py
From source:
python -m src.cli examples/program.sta
After installation:
statica examples/program.sta
data = load "filename.csv"
t = test ttest mean of data.column = value
m = regress dependent ~ independent1 + independent2 on data
plot data.x vs data.y scatter
conclude t
conclude m
Statica uses the Lark parser to define its grammar and interpret .sta code. The parser reads source code and converts it into an Abstract Syntax Tree (AST) for execution.
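To illustrate the source-to-tree step, here is a small sketch that turns .sta lines into tuple-shaped nodes. It deliberately swaps Lark for standard-library regular expressions so that it runs without dependencies; the patterns, the `parse` function, and the node layout are assumptions for illustration and do not reflect the contents of grammar.lark.

```python
import re
from typing import List, Tuple

# Hypothetical statement patterns; the project's real rules live in grammar.lark.
PATTERNS = [
    ("load", re.compile(r'^(\w+)\s*=\s*load\s+"([^"]+)"$')),
    ("ttest", re.compile(
        r'^(\w+)\s*=\s*test\s+ttest\s+mean\s+of\s+(\w+)\.(\w+)'
        r'\s*=\s*([-+]?\d+(?:\.\d+)?)$')),
    ("regress", re.compile(r'^(\w+)\s*=\s*regress\s+(.+?)\s+on\s+(\w+)$')),
    ("plot", re.compile(r'^plot\s+(\w+)\.(\w+)\s+vs\s+(\w+)\.(\w+)\s+(\w+)$')),
    ("conclude", re.compile(r'^conclude\s+(\w+)$')),
]

def parse(source: str) -> List[Tuple[str, ...]]:
    """Convert each non-empty line into a (command, *arguments) tuple."""
    nodes = []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        for kind, pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                nodes.append((kind, *match.groups()))
                break
        else:
            # No pattern matched this line.
            raise SyntaxError(f"line {lineno}: cannot parse {line!r}")
    return nodes

program = '''data = load "study.csv"
t = test ttest mean of data.score = 75
m = regress score ~ age + group on data
plot data.age vs data.score scatter
conclude t'''

ast = parse(program)
for node in ast:
    print(node)
```

A grammar-based parser such as Lark handles nesting and error reporting far better than line-by-line patterns; the sketch only shows the shape of the output an execution engine could consume.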
The StaticaExec engine walks through the AST and executes each instruction sequentially:
- Loads CSV data into Pandas DataFrames
- Performs statistical tests using SciPy and StatsModels
- Generates visualizations using Matplotlib
- Produces human-readable conclusions
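The sequential walk described above could be structured roughly as follows. `MiniExec`, its handler names, the tuple-shaped nodes, and the `[Concluded ...]` status line are hypothetical; this is a sketch of the dispatch idea, not the StaticaExec source. The load handler only records the file path, where a real engine would read the CSV into a DataFrame.

```python
from typing import Any, Callable, Dict, List, Tuple

Node = Tuple[str, ...]  # hypothetical node shape: (command, *arguments)

class MiniExec:
    """Illustrative stand-in for StaticaExec; not the project's actual class."""

    def __init__(self) -> None:
        self.env: Dict[str, Any] = {}   # variables bound by the script
        self.log: List[str] = []        # one status line per executed node
        self.handlers: Dict[str, Callable[..., None]] = {
            "load": self.do_load,
            "conclude": self.do_conclude,
        }

    def run(self, program: List[Node]) -> None:
        # Execute nodes strictly in source order.
        for kind, *args in program:
            handler = self.handlers.get(kind)
            if handler is None:
                raise ValueError(f"unknown command: {kind}")
            handler(*args)

    def do_load(self, target: str, path: str) -> None:
        # A real engine would read the CSV into a pandas DataFrame here.
        self.env[target] = {"source": path}
        self.log.append(f"[Loaded {path}]")

    def do_conclude(self, name: str) -> None:
        if name not in self.env:
            raise NameError(f"undefined variable: {name}")
        self.log.append(f"[Concluded {name}]")

engine = MiniExec()
engine.run([("load", "data", "study.csv"), ("conclude", "data")])
print("\n".join(engine.log))
```

Keeping the handlers in a dictionary means a new command needs only one new method and one new entry, which fits the modular design the project describes.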
The conclude command analyzes test or model results and generates natural language interpretations:
Significant Result:
The mean (72.34) is significantly different from 75 (t = -2.14, p = 0.037).
Non-Significant Result:
No significant difference between the sample mean (74.56) and 75 (t = -1.02, p = 0.32).
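The two sentence templates above can be produced by a small formatting function. The sketch below is one possible approach: the name `conclude_ttest` and the 0.05 significance threshold are assumptions, since the README does not state which cutoff nlg.py uses.

```python
def conclude_ttest(mean: float, popmean: float, t: float, p: float,
                   alpha: float = 0.05) -> str:
    """Turn one-sample t-test results into a one-sentence conclusion.

    Hypothetical helper; alpha = 0.05 is an assumed default.
    """
    stats = f"(t = {t:.2f}, p = {p:.3g})"
    if p < alpha:
        return (f"The mean ({mean:.2f}) is significantly different "
                f"from {popmean:g} {stats}.")
    return (f"No significant difference between the sample mean "
            f"({mean:.2f}) and {popmean:g} {stats}.")

print(conclude_ttest(72.34, 75, -2.14, 0.037))
print(conclude_ttest(74.56, 75, -1.02, 0.32))
```

Called with the numbers from the two examples, the function reproduces both sentences.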
Statica's modular architecture supports easy extension:
- Update Grammar: Modify grammar.lark to define new syntax rules
- Implement Logic: Add corresponding Python logic in interpreter.py
- Test: Create test cases for the new functionality
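As a rough picture of the "implement logic" and "test" steps, the sketch below registers a new command handler and checks it on a tiny input. The `COMMANDS` registry, the `command` decorator, and the `correlate` keyword are hypothetical; Statica's actual extension points may look different, and the matching grammar rule is not shown.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical registry mapping script keywords to Python handlers.
COMMANDS: Dict[str, Callable[..., object]] = {}

def command(name: str):
    """Register a handler under the keyword used in .sta scripts."""
    def decorator(func):
        COMMANDS[name] = func
        return func
    return decorator

@command("correlate")
def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)

# A minimal test case: perfectly linear data should give r close to 1.
r = COMMANDS["correlate"]([1, 2, 3, 4], [2, 4, 6, 8])
print(f"r = {r:.3f}")
```

In a real extension the handler would more likely delegate to SciPy rather than compute the coefficient by hand; the manual version is used here only to keep the example dependency-free.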
- ANOVA: One-way and multi-way analysis of variance
- Chi-Square Tests: Categorical data analysis
- Correlation Analysis: Pearson and Spearman correlation
- Additional Plots: Boxplots, histograms, heatmaps
- Data Preprocessing: Filtering, transformation, aggregation
- Enhanced Statistical Library: ANOVA, chi-square, correlation tests
- Interactive Mode: REPL for exploratory analysis
- Report Generation: Export results to PDF or Markdown
- Advanced Error Handling: Detailed diagnostics and suggestions
- Data Validation: Automatic checking for statistical assumptions
- Multiple Dataset Support: Work with multiple data sources simultaneously
Contributions are welcome. Please follow these guidelines:
- Fork the repository
- Create a feature branch
- Commit your changes with clear messages
- Submit a pull request with a detailed description
This project is licensed under the MIT License. You are free to use, modify, and distribute Statica for personal or commercial projects.
Hirula
AI Engineering Student, SLIIT
Built with:
- Lark Parser - Parsing framework
- Pandas - Data manipulation
- SciPy - Scientific computing
- StatsModels - Statistical modeling
- Matplotlib - Data visualization
For issues, questions, or suggestions, please open an issue on the GitHub repository.