The 5 evaluation categories
Use this checklist for every fieldwork evaluation. Pearson examiners credit students who address all five.
(1) SAMPLE.
Was the sample large enough + well-distributed + representative?
- Size. Was the number of sites + repeats + respondents adequate? Small samples reduce confidence.
- Spatial coverage. Were sites evenly distributed across the study area? Were any zones missed?
- Temporal coverage. Was the data collected over enough time periods (days, seasons) to capture variation?
- Sampling strategy. Was it random, systematic, stratified, or opportunistic? Was the choice justified?
- Representativeness. Was the sample typical of the population it claims to describe?
Example evaluation: 'My 5 sites were chosen by opportunistic sampling along one transect on one Saturday afternoon. The sample size is small + temporal coverage is narrow (one snapshot). Increasing to 10+ sites, spaced systematically every 200 m, across 3 different days would give better spatial + temporal coverage.'
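If you process fieldwork data on a computer, the sampling strategies above can be sketched in Python. This is an illustrative sketch only: the function names, the 2 km transect, the zone boundaries and the seed are hypothetical assumptions, not part of any Pearson requirement. Site positions are measured in metres along a single transect.

```python
import random

def systematic_sites(length_m, interval_m, start_m=0):
    """Systematic sampling: one site every interval_m metres along the transect."""
    count = int((length_m - start_m) // interval_m) + 1
    return [start_m + i * interval_m for i in range(count)]

def random_sites(length_m, n, seed=None):
    """Random sampling: n sites placed anywhere along the transect."""
    rng = random.Random(seed)  # a recorded seed lets another team regenerate the same sites
    return sorted(rng.uniform(0, length_m) for _ in range(n))

def stratified_sites(zones, per_zone, seed=None):
    """Stratified sampling: per_zone random sites inside each (start_m, end_m) zone,
    so that no zone of the study area is missed."""
    rng = random.Random(seed)
    sites = []
    for start_m, end_m in zones:
        sites.extend(rng.uniform(start_m, end_m) for _ in range(per_zone))
    return sorted(sites)

# Hypothetical 2 km transect: a site every 200 m gives 11 sites (0 m to 2000 m).
print(systematic_sites(2000, 200))
print(stratified_sites([(0, 500), (500, 1500), (1500, 2000)], per_zone=3, seed=1))
```

Systematic spacing is the easiest to replicate, but a fixed interval can line up with a regular pattern in the landscape; random and stratified placement reduce that risk, and recording the seed keeps random choices reproducible.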
(2) SOURCES.
Were primary + secondary data sources reliable?
- Primary. Was equipment calibrated? Were methods standardised? Were repeats made?
- Secondary. Were sources named + cited? Were they current? Were they reliable?
- Uncertainty. Was the inherent uncertainty in each source acknowledged?
- Currency. Are sources still relevant? Census data is decadal; news is immediate but biased.
- Bias. Each source has its own biases: government data may have political bias, news may have narrative bias, advocacy may have ideological bias.
Example evaluation: 'My pH meter was not calibrated on the field day, introducing uncertainty into the readings. UK ONS Census 2021 data is high-quality but 5 years old by 2026 + its output-area aggregation may mask within-area variation. Improvements: calibrate all instruments at the start of the day; triangulate Census data with more recent council data.'
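Calibration can be thought of as fitting a correction line. The sketch below assumes a two-point calibration against buffer solutions of pH 4 and pH 7 and a meter error that is linear between them; the function name and the buffer readings are hypothetical. Many digital pH meters carry out this adjustment themselves when calibrated, so this only illustrates the idea.

```python
def two_point_correction(raw_low, raw_high, true_low=4.0, true_high=7.0):
    """Build a function that converts raw meter readings into corrected pH values.

    raw_low / raw_high are what the uncalibrated meter showed in the two
    buffer solutions; true_low / true_high are the buffers' known pH values.
    Assumes the meter's error is linear across this range.
    """
    if raw_high == raw_low:
        raise ValueError("the two buffer readings must differ")
    slope = (true_high - true_low) / (raw_high - raw_low)

    def correct(raw):
        # Shift and rescale so the buffer readings land on their known values
        return true_low + slope * (raw - raw_low)

    return correct

# Hypothetical check: the meter read 4.2 in the pH 4 buffer and 7.3 in the pH 7 buffer.
correct = two_point_correction(4.2, 7.3)
print(round(correct(6.1), 2))
```

The size of the correction is itself evidence for the evaluation: a large gap between raw and corrected values shows how much uncertainty an uncalibrated meter introduced.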
(3) METHODOLOGY.
Were methods replicable + accurate + standardised?
- Replicability. Could another team replicate the study at the same sites + reach similar conclusions?
- Accuracy. Were measurements precise? Were repeat readings taken + averaged?
- Standardisation. Same observer, same equipment, same conditions across sites?
- Pilot study. Was a pilot conducted before the main field day?
- Risk assessment. Was it documented as required by Pearson 4GE1 Appendix 5?
- Observer bias. Could the observer have systematically affected measurements?
Example evaluation: 'My beach-width measurements were taken by a single observer whose judgement of the swash line may have drifted between sites. No pilot study was conducted; recording sheets had ambiguous columns. Improvements: standardise the swash-line definition; use two observers per site with an inter-observer reliability check; run a pilot study at one site before the main fieldwork.'
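Averaging repeat readings and comparing two observers are both simple calculations. A minimal sketch, assuming each observer takes repeat beach-width readings in metres at each site; the function names and the 1 m tolerance are hypothetical choices, and the mean absolute difference used here is only a basic agreement check, not a formal reliability statistic.

```python
from statistics import mean

def site_means(repeats_by_site):
    """Average the repeat readings taken at each site."""
    return [mean(readings) for readings in repeats_by_site]

def inter_observer_check(obs_a, obs_b, tolerance):
    """Compare two observers' readings site by site.

    Returns the mean absolute difference between the observers and the
    indices (counting from 0) of sites where they differ by more than tolerance.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must record every site")
    diffs = [abs(a - b) for a, b in zip(obs_a, obs_b)]
    flagged = [i for i, d in enumerate(diffs) if d > tolerance]
    return mean(diffs), flagged

# Hypothetical repeat readings (metres) at three sites, three repeats each.
observer_a = site_means([[10.0, 10.4, 9.8], [12.0, 12.2, 11.9], [15.0, 14.6, 15.2]])
observer_b = site_means([[10.2, 10.5, 10.1], [12.1, 11.8, 12.0], [13.1, 13.4, 12.9]])
print(inter_observer_check(observer_a, observer_b, tolerance=1.0))
```

Flagged sites are the ones where the swash-line definition most needs tightening before the next visit.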
(4) CONFOUNDING VARIABLES.
What else might have affected the results?
- Weather + season. River flow + pedestrian flow + visibility vary with weather + season.
- Recent events. Storm damage, road closures, sales events, sports matches affect data.
- Geological + topographic variation. Sites may have different underlying conditions.
- Demographic + socioeconomic variation. Wards differ in age, ethnicity + income: all potential confounders.
- Historic + cultural context. Past management decisions + development shape current patterns.
Example evaluation: 'My EQS scores may be affected by recent road repairs (visible at 2 of 5 sites) + a major festival the previous weekend that increased litter. My fast-food + obesity correlation does not control for DEPRIVATION, a confounding variable that may drive both. Improvements: document recent events affecting sites; stratify analysis by deprivation tier using the Index of Multiple Deprivation (IMD); match comparable sites.'
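Spearman's rank correlation is widely taught for GCSE fieldwork, and stratifying by deprivation means calculating it separately within each tier. A minimal sketch, assuming each site or ward has already been tagged with a deprivation tier (for example IMD deciles grouped into bands); the function names and tier labels are hypothetical. It uses the standard formula rs = 1 - 6Σd²/(n(n²-1)), which is exact only when there are no tied values; ties are given their average rank here, so with ties the result is an approximation.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation: rs = 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

def rs_by_tier(sites):
    """sites: (tier, x, y) tuples. Returns {tier: rs} for every tier with 3+ sites."""
    groups = {}
    for tier, x, y in sites:
        xs, ys = groups.setdefault(tier, ([], []))
        xs.append(x)
        ys.append(y)
    return {tier: spearman_rs(xs, ys) for tier, (xs, ys) in groups.items() if len(xs) >= 3}
```

Run spearman_rs on all sites first, then rs_by_tier on the same data. If a strong overall correlation weakens noticeably within each tier, deprivation may be driving much of the relationship. With only a few sites per tier, rs values are unreliable, so interpret them cautiously.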
(5) REPRODUCIBILITY.
Could another team reproduce the study + reach similar conclusions?
- Documentation. Are equipment, sites, methods documented enough to repeat?
- GPS coordinates. Were site locations recorded precisely enough for another team to find them?
- Standardised methods. Same procedures across sites + days.
- Materials available. Could another team source the same equipment?
- Conclusions defensible. Are conclusions cautious + theory-grounded enough that another team would reach similar ones?
Example evaluation: 'My methodology is partially replicable: equipment + GPS coordinates are documented, but observer-skill effects + weather variability mean another team might reach slightly different specific conclusions while supporting the same general finding. Improvements: more detailed observer-training documentation; standardised photographs as a cross-reference.'
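Reproducibility depends on recording the same details for every site. A minimal sketch of a site record with a completeness check; the field names and example values are hypothetical and would be adapted to the study, and the coordinate check only confirms that latitude and longitude are present and within valid ranges, not that they are accurate.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class SiteRecord:
    site_id: str
    latitude: Optional[float] = None   # decimal degrees
    longitude: Optional[float] = None  # decimal degrees
    date: str = ""
    equipment: str = ""
    method: str = ""
    observer: str = ""
    photo_ref: str = ""                # standardised photograph used as a cross-reference

    def missing_fields(self):
        """Names of fields left blank: details another team would need to repeat the study."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "")]

    def coordinates_valid(self):
        """True if both coordinates are present and within the valid ranges."""
        return (
            self.latitude is not None
            and self.longitude is not None
            and -90 <= self.latitude <= 90
            and -180 <= self.longitude <= 180
        )

# Hypothetical record with the date and photo reference still to be filled in.
site = SiteRecord("S1", latitude=50.6, longitude=-1.9, equipment="tape measure",
                  method="beach width", observer="A")
print(site.missing_fields())
print(site.coordinates_valid())
```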
Using the 5 categories.
Pearson 4GE1 mark schemes specifically credit students who address ALL FIVE categories. Use the checklist:
- Have I evaluated SAMPLE size + coverage + sampling strategy?
- Have I evaluated SOURCES, both primary + secondary?
- Have I evaluated METHODOLOGY: replicability + accuracy + standardisation?
- Have I identified CONFOUNDING VARIABLES not controlled for?
- Have I addressed REPRODUCIBILITY?
A complete evaluation hits all five.
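The checklist can also be kept as a simple tick-list while drafting. A minimal sketch; the function name is hypothetical, and it only reports which categories you have ticked off yourself, so it cannot judge the quality of what you wrote.

```python
# The five categories, in the order used in these notes.
CATEGORIES = [
    "SAMPLE",
    "SOURCES",
    "METHODOLOGY",
    "CONFOUNDING VARIABLES",
    "REPRODUCIBILITY",
]

def missing_categories(ticked):
    """Return the categories not yet ticked off, in checklist order (case-insensitive)."""
    ticked_upper = {t.strip().upper() for t in ticked}
    return [c for c in CATEGORIES if c not in ticked_upper]

print(missing_categories({"sample", "Sources", "METHODOLOGY"}))
```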
Examiner tip. Use the 5 categories as your evaluation skeleton. Then expand each with the specific limitations, biases + improvements you have identified. This structure consistently produces top-band evaluations.
- (1) SAMPLE: size + coverage + sampling strategy + representativeness.
- (2) SOURCES: primary calibration + secondary reliability + currency + bias.
- (3) METHODOLOGY: replicability + accuracy + standardisation + pilot + risk.
- (4) CONFOUNDING VARIABLES: weather, events, geological, demographic, historic.
- (5) REPRODUCIBILITY: documentation + GPS + standardisation + cautious conclusions.
- Pearson mark schemes credit all 5 categories: use them as a checklist.