What is a statistical model?
A simplification of reality that uses random variables and probability to describe data.
Definition. A statistical model replaces a complex real-world situation with a small set of random variables and a probability distribution. The model captures what we care about (often the variation between observations) and ignores everything else.
Why model? Reality is messy: every light bulb has its own lifetime, every leaf its own length, every commute its own duration. We cannot describe each individual case. Instead we ask: 'what is the typical pattern, and how does it vary?' A statistical model is the answer.
Examples.
- Lifetimes of light bulbs modelled by a continuous distribution on the positive reals (e.g. exponential).
- Number of phone calls per hour at a help desk modelled as Poisson with mean $\lambda$.
- Outcome of a coin toss modelled as Bernoulli.
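
As a rough illustration, the three example models can be simulated with Python's standard library. This is a minimal sketch under stated assumptions: the exponential distribution for bulb lifetimes is an assumed choice, and the numeric values (`mean_hours`, `mean_calls`, `p_heads`) and all function names are hypothetical, picked only for the demonstration.

```python
import math
import random


def draw_lifetime(rng, mean_hours=1000.0):
    """One bulb lifetime in hours, ASSUMING an exponential model."""
    # expovariate takes the rate, i.e. 1 / mean.
    return rng.expovariate(1.0 / mean_hours)


def draw_call_count(rng, mean_calls=4.0):
    """One hourly call count from a Poisson model (Knuth's multiplication method)."""
    threshold = math.exp(-mean_calls)
    count, product = 0, rng.random()
    # Multiply uniforms until the product drops to the threshold or below.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_coin_toss(rng, p_heads=0.5):
    """One Bernoulli outcome: 1 for heads, 0 for tails."""
    return 1 if rng.random() < p_heads else 0


rng = random.Random(0)  # fixed seed so the run is reproducible
lifetimes = [draw_lifetime(rng) for _ in range(5)]
calls = [draw_call_count(rng) for _ in range(5)]
tosses = [draw_coin_toss(rng) for _ in range(5)]

print("lifetimes (h):", [round(x, 1) for x in lifetimes])
print("calls per hour:", calls)
print("coin tosses:", tosses)
```

Each function returns one realisation of a random variable. The model is the distribution together with its parameter values, not any single draw.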
Key vocabulary.
| Term | Meaning |
|---|---|
| Population | the entire set of items/events we care about |
| Sample | a finite subset of the population we observe |
| Parameter | a number that fixes the model (e.g. $\mu$, $\sigma$, $\lambda$, $p$) |
| Statistic | a number computed from a sample (e.g. $\bar{x}$, $s$), used to estimate parameters |
Summary.
- Models simplify reality using probability distributions.
- Population vs. sample; parameter vs. statistic.
- We use sample statistics to estimate population parameters.
- Choice of model is itself a modelling assumption.
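
The distinction between parameter and statistic can be made concrete with a small simulation. The sketch below assumes, purely for illustration, that leaf lengths follow a normal model with hypothetical parameters `MU` and `SIGMA`. In practice these are unknown, and only the sample statistics can be computed.

```python
import random
import statistics

# Hypothetical model parameters (unknown in a real study).
MU = 6.0      # population mean leaf length, cm
SIGMA = 1.5   # population standard deviation, cm


def draw_sample(rng, n):
    """Observe n leaves: a finite sample from the modelled population."""
    return [rng.gauss(MU, SIGMA) for _ in range(n)]


def summarise(sample):
    """Return the statistics (sample mean, sample standard deviation)."""
    return statistics.mean(sample), statistics.stdev(sample)


rng = random.Random(42)  # fixed seed so the run is reproducible
for n in (10, 100, 10_000):
    x_bar, s = summarise(draw_sample(rng, n))
    print(f"n={n:>6}: x_bar={x_bar:.3f} (mu={MU}), s={s:.3f} (sigma={SIGMA})")
```

The parameters stay fixed while the statistics change from sample to sample. With larger samples the statistics tend to land closer to the parameters they estimate.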