Why is "Vague evaluation like 'I had no time' or 'it rained'" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: Students don't think evaluation through. How to avoid it: Vague excuses score nothing. Pearson examiners reward SPECIFIC + QUANTIFIED critique addressing the 5 categories (sample / sources / methodology / confounding / reproducibility) + improvements. Source: Pearson 4GE1 Examiner Reports — Paper 2 Section B

Why is "Listing limitations without proposing improvements" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: Evaluation feels like the end. How to avoid it: Every limitation should be paired with a SPECIFIC + QUANTIFIED improvement. 'Sample was 5 sites' → 'Increase to 10+ sites with systematic sampling at 200 m intervals'. Improvements earn marks alongside limitations. Source: Pearson 4GE1 Examiner Reports — Paper 2 Section B

Why is "Conflating limitations with biases" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: The two feel similar. How to avoid it: Distinguish: LIMITATION = what wasn't done (sample, time, methods); BIAS = systematic distortion (observer, sampling, confirmation). Limitations are reducible by doing more; biases need methodological change.

Why is "Ignoring confounding variables" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: Students focus on the variables they measured. How to avoid it: Always ask: what ELSE could have affected my results? Weather, season, recent events, demographic composition, geological variation, storm history. Pearson 4GE1 mark schemes specifically credit confounding-variable identification. Source: Pearson 4GE1 Examiner Reports — Paper 2 Section B

Why is "Forgetting to address replicability / reproducibility" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: Feels separate from data quality. How to avoid it: Ask: could another team repeat this study at the same sites + reach similar conclusions? Replicability is the public face of reliability. Pearson 4GE1 mark schemes specifically credit it.

Why is "Using secondary sources without evaluating reliability" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: Government data feels authoritative. How to avoid it: Every secondary source has strengths + weaknesses. 'ONS Census 2021 is high-quality but decadal, so by 2026 the data is 5 years out of date.' Pearson examiners credit students who evaluate sources.

Why is "Evaluating only positives + ignoring weaknesses" a common mistake in Evaluation on data, methods and conclusions?

Why it happens: Students don't want to undermine their work. How to avoid it: Honest evaluation acknowledges BOTH strengths AND limitations. Stating limitations is NOT failure — it's the mark of methodological maturity. Pearson 4GE1 mark schemes reward this honesty consistently. Source: Pearson 4GE1 Examiner Reports — Paper 2 Section B

Edexcel IGCSE Geography (4GE1)

Evaluation on data, methods and conclusions

Detailed Study Notes

Detailed notes on evaluation on data, methods and conclusions for Edexcel IGCSE Geography, covering key concepts, explanations, examples, and exam-focused revision points.

Evaluation on Data, Methods + Conclusions — Pearson Edexcel International GCSE Geography (4GE1) Study Notes

Step 6 of the Pearson 4GE1 enquiry process is HONEST REFLECTION on enquiry quality. It covers five evaluation categories: SAMPLE, SOURCES, METHODOLOGY, CONFOUNDING VARIABLES, REPRODUCIBILITY. A strong evaluation distinguishes LIMITATIONS (what wasn't done) from BIASES (systematic distortion) and avoids weak excuses ('I had no time'). Pearson rewards thoughtful self-critique paired with specific quantified improvements. Honest acknowledgement of uncertainty = methodological maturity = top-band marks.

At a glance

5 evaluation categories: Sample / Sources / Methodology / Confounding variables / Reproducibility.
Limitation = what wasn't done (sample, time, methods). Reducible by doing more.
Bias = systematic distortion (observer, sampling, self-selection, confirmation). Requires methodological change.
Avoid weak excuses: 'I had no time', 'It rained' — these score nothing alone.
Quantify improvements: '10+ sites every 200 m', not 'more sites'.
Distinguish strengths + limitations explicitly.
Pair every limitation with a specific improvement.
Honesty = methodological maturity — Pearson examiners credit this discipline.

What you’ll learn

Mapped to the Edexcel IGCSE 4GE1 syllabus (2026 onwards).

Evaluate sample size + spatial + temporal + representativeness of a fieldwork enquiry.
Evaluate the reliability + uncertainty + currency of primary + secondary data sources.
Evaluate methodology — replicability, accuracy, observer bias, weather + time.
Identify confounding variables not controlled for in the enquiry.
Evaluate reproducibility — could another team replicate the study?
Distinguish limitations from biases.
Propose specific + quantified improvements paired with limitations.
Acknowledge uncertainty + strengths together in a balanced evaluation.

The 5 evaluation categories

Use this checklist for every fieldwork evaluation. Pearson examiners credit students who address all five.

The 5 evaluation categories.

(1) SAMPLE.

Was the sample large enough + well-distributed + representative?

Size. Was the number of sites + repeats + respondents adequate? Small samples reduce confidence.
Spatial coverage. Were sites evenly distributed across the study area? Were any zones missed?
Temporal coverage. Was the data collected over enough time periods (days, seasons) to capture variation?
Sampling strategy. Was it random, systematic, stratified, or opportunistic? Was the choice justified?
Representativeness. Was the sample typical of the population it claims to describe?

Example evaluation: 'My 5 sites were chosen by opportunistic sampling along one transect on one Saturday afternoon. Sample size is small + temporal coverage narrow (one snapshot). Increasing to 10+ sites sampled systematically every 200 m across 3 different days would give better spatial + temporal coverage.'

(2) SOURCES.

Were primary + secondary data sources reliable?

Primary. Was equipment calibrated? Were methods standardised? Were repeats made?
Secondary. Were sources named + cited? Were they current? Were they reliable?
Uncertainty. Was the inherent uncertainty in each source acknowledged?
Currency. Are sources still current? Census data is published only every ten years; news is immediate but may be biased.
Bias. Each source has its own biases — government data may have political bias, news may have narrative bias, advocacy may have ideological bias.

Example evaluation: 'My pH meter was not calibrated on the field day, introducing uncertainty in the readings. UK ONS Census 2021 data is high-quality but 5 years old by 2026 + at output-area level may mask within-area variation. Improvements: calibrate all instruments at the start of the day; triangulate Census data with more recent council data.'

(3) METHODOLOGY.

Were methods replicable + accurate + standardised?

Replicability. Could another team replicate the study at the same sites + reach similar conclusions?
Accuracy. Were measurements precise? Were repeat readings taken + averaged?
Standardisation. Same observer, same equipment, same conditions across sites?
Pilot study. Was a pilot conducted before the main field day?
Risk assessment. Was it documented as required by Pearson 4GE1 Appendix 5?
Observer bias. Could the observer have systematically affected measurements?

Example evaluation: 'My beach-width measurements were taken by a single observer who may have drifted in defining the swash line. No pilot study was conducted; recording sheets had ambiguous columns. Improvements: standardise the swash-line definition; two observers per site with inter-observer reliability check; pilot study at one site before main fieldwork.'

(4) CONFOUNDING VARIABLES.

What else might have affected the results?

Weather + season. River flow + pedestrian flow + visibility vary with weather.
Recent events. Storm damage, road closures, sales events, sports matches affect data.
Geological + topographic variation. Sites may have different underlying conditions.
Demographic + socioeconomic variation. Wards differ in age, ethnicity, income — all confounders.
Historic + cultural context. Past management decisions + development shape current patterns.

Example evaluation: 'My EQS scores may be affected by recent road repairs (visible at 2 of 5 sites) + a major festival the previous weekend that increased litter. My fast-food + obesity correlation does not control for DEPRIVATION — a confounding variable that drives both. Improvements: document recent events affecting sites; stratify analysis by deprivation tier using IMD; match comparable sites.'

(5) REPRODUCIBILITY.

Could another team reproduce the study + reach similar conclusions?

Documentation. Are equipment, sites, methods documented enough to repeat?
GPS coordinates for sites.
Standardised methods. Same procedures across sites + days.
Materials available. Could another team source the same equipment?
Conclusions defensible. Are conclusions cautious + theory-grounded enough that another team would reach similar ones?

Example evaluation: 'My methodology is partially replicable — equipment + GPS coordinates documented but observer-skill effects + weather variability mean another team might reach slightly different specific conclusions while supporting the same general finding. Improvements: more detailed observer-training documentation; standardised photographs as cross-reference.'

Using the 5 categories.

Pearson 4GE1 mark schemes specifically credit students who address ALL FIVE categories. Use the checklist:

Have I evaluated SAMPLE size + coverage + sampling strategy?
Have I evaluated SOURCES — primary + secondary?
Have I evaluated METHODOLOGY — replicability + accuracy + standardisation?
Have I identified CONFOUNDING VARIABLES not controlled for?
Have I addressed REPRODUCIBILITY?

A complete evaluation hits all five.

Examiner tip. Use the 5 categories as your evaluation skeleton. Then expand each with specific limitations + improvements + biases identified. This structure consistently produces top-band evaluations.

(1) SAMPLE: size + coverage + sampling strategy + representativeness.
(2) SOURCES: primary calibration + secondary reliability + currency + bias.
(3) METHODOLOGY: replicability + accuracy + standardisation + pilot + risk.
(4) CONFOUNDING VARIABLES: weather, events, geological, demographic, historic.
(5) REPRODUCIBILITY: documentation + GPS + standardisation + cautious conclusions.
Pearson mark schemes credit all 5 categories — use as checklist.

Limitations vs biases

Different problems with different solutions. Distinguishing them is methodological maturity.

Limitations are what WASN'T DONE or COULDN'T BE DONE in the enquiry.

Examples:

Sample size limited to 5 sites because of time.
Only one field day because of school schedule.
No secondary data because of internet issues at school.
Equipment unavailable for proper measurement.
Statistical tests not calculated because of skill / software gap.

Reducible by doing more. Larger sample, more days, additional sources, training in statistics.

Biases are SYSTEMATIC DISTORTIONS in how data was collected or interpreted.

Examples:

Observer bias — scoring EQS more harshly at certain sites.
Sampling bias — opportunistic sampling that misses certain areas.
Self-selection bias — questionnaire respondents are those willing to participate.
Measurement bias — uncalibrated equipment giving systematically wrong readings.
Confirmation bias — interpreting data in favour of the hypothesis.

Requires methodological change — not just more of the same. To address bias, you need different methods.

Distinguishing — worked example.

A student measures river velocity at 5 sites with 1 reading per site using a paper float. They conclude that velocity rises downstream.

Limitations:

5 sites is a small sample (LIMITATION: fix by adding more sites).
1 reading per site is too few (LIMITATION: fix by repeating + averaging).
No statistical test (LIMITATION: fix by calculating Spearman's ρ).

Biases:

Paper float too light + wind-blown (BIAS: fix by using a proper float such as an orange or cork; requires methodological change).
Observer judging when the float passes the marker (BIAS: fix by having one observer at each marker + a clear visual line).
Expecting the hypothesis to be confirmed (BIAS: fix by blind analysis or peer review).

Both can be present. Distinguishing them shows methodological maturity.
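The 'repeat + average' fix for the single-reading limitation can be sketched in a few lines of Python. This is only an illustration: the 10 m reach, the three timings and the function name are hypothetical, and the 0.85 surface-to-mean correction is the factor quoted in the improvements table of these notes, not a measured value.

```python
from statistics import mean

# Correction factor quoted in these notes for converting surface
# velocity to a mean-velocity estimate (assumed, not measured).
SURFACE_TO_MEAN_FACTOR = 0.85

def mean_velocity(distance_m, float_times_s, factor=SURFACE_TO_MEAN_FACTOR):
    """Estimate mean channel velocity (m/s) from repeated float timings."""
    if len(float_times_s) < 3:
        raise ValueError("take at least 3 timings per site")
    # Average the repeat timings first, then convert to a velocity.
    surface_velocity = distance_m / mean(float_times_s)
    return surface_velocity * factor

# Hypothetical timings (seconds) for a float over a 10 m reach at one site.
site_timings = [20.0, 25.0, 15.0]
velocity = mean_velocity(10.0, site_timings)
print(round(velocity, 3))
```

Refusing fewer than 3 timings builds the limitation fix into the method itself; it does nothing about the biases (float type, observer judgement), which still need a change of method.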

Why distinguishing matters.

Improvement plan. Limitations are addressed by doing more of the same; biases need methodological change, so the improvement plan differs for each.
Conclusion strength. Biases threaten the validity of the conclusion + can systematically push it in a wrong direction. Limitations weaken confidence but not direction.
Replicability. Biased methods reproduce the bias when repeated; limited methods improve with more.
Marks. Pearson 4GE1 mark schemes specifically credit distinguishing them.

Types of bias to learn:

Type	Description	Example	Mitigation
Observer bias	Different observers score differently	EQS scores drift between observers	Anchored descriptors + inter-observer reliability
Sampling bias	Sample not representative of population	Opportunistic sampling	Random / systematic sampling
Self-selection bias	Respondents not typical of those approached	Volunteer questionnaire respondents	Stratified sampling + minimum response rate
Measurement bias	Equipment systematic error	Uncalibrated pH meter	Calibration + cross-validation
Confirmation bias	Interpreting data to favour hypothesis	Cherry-picking supportive measurements	Pre-registered methods + blind analysis
Hindsight bias	Reinterpreting data after seeing results	Reframing original hypothesis	Pre-registered hypotheses
Aggregation bias	Average masks within-unit variation	Census ward-level masks street-level	Use finer spatial unit

Pearson 4GE1 examiner-credited evaluations identify BOTH limitations + biases + propose specific solutions for each.
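An inter-observer reliability check (the mitigation listed for observer bias) can be made concrete with a short Python sketch. The EQS scores, the 1-point tolerance and the function name are all hypothetical, chosen only to show the arithmetic; a real enquiry would justify its own threshold.

```python
from statistics import mean

def inter_observer_agreement(obs_a, obs_b, tolerance):
    """Return (mean absolute difference, share of sites agreeing within tolerance)."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("both observers need one reading per site")
    # Site-by-site gap between the two observers' scores.
    diffs = [abs(a - b) for a, b in zip(obs_a, obs_b)]
    within = sum(1 for d in diffs if d <= tolerance)
    return mean(diffs), within / len(diffs)

# Hypothetical EQS totals from two observers scoring the same 5 sites.
observer_a = [12, 18, 9, 15, 20]
observer_b = [13, 18, 12, 14, 20]
mad, share = inter_observer_agreement(observer_a, observer_b, tolerance=1)
print(mad, share)
```

A large gap at one site (here the third) points to where the anchored descriptors need tightening before the main field day.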

Examiner tip. Use the LIMITATION vs BIAS distinction explicitly in your evaluation. 'I had several LIMITATIONS — only 5 sites, one day. I also had BIASES — observer bias in EQS scoring + sampling bias in choosing accessible sites only.' This earns marks Pearson examiners specifically reward.

Limitation = what wasn't done; reducible by doing more.
Bias = systematic distortion; requires methodological change.
Distinguishing them shows methodological maturity.
Types of bias: observer, sampling, self-selection, measurement, confirmation, hindsight, aggregation.
Each bias has specific mitigation strategies.

Improvement plans

Every limitation should have a specific + quantified improvement. Vague suggestions score nothing.

Improvement plans are essential for top-band evaluations. Pearson 4GE1 mark schemes credit students who pair every limitation with a SPECIFIC + QUANTIFIED improvement.

Format for an improvement.

'LIMITATION: [specific problem]. IMPROVEMENT: [specific solution with numbers + justification].'

Examples.

Limitation	Vague (low marks)	Specific (high marks)
5 sites	'More sites'	'10+ sites at systematic intervals every 200 m to give better spatial coverage along a 2 km transect'
One field day	'More days'	'Repeat across 3 days (May / August / November) + at low + mid tide to capture seasonal + tide-stage variation'
1 velocity reading per site	'More readings'	'3+ velocity readings per site + take mean; use proper float (orange / cork) + apply 0.85 correction factor for mean-velocity estimate'
No statistical test	'Add statistics'	'Calculate Spearman's rank correlation coefficient (ρ) between distance + variable; expected ρ > +0.7 = strong support'
No secondary data	'Add secondary'	'Integrate UK Environment Agency historic discharge records + Met Office antecedent rainfall + OS historic maps for the catchment'
Single observer	'Multiple observers'	'Two observers measure independently at each site + calculate inter-observer agreement; refine method if differences > acceptable threshold'
Subjective Likert scoring	'Standardise'	'Anchored descriptors at each Likert level (e.g. 1 = streets piled with rubbish; 5 = pristine); 2 observers + inter-observer check'
No pilot	'Run a pilot'	'Pilot study at 1 site 1 week before main field day; test recording sheets + timings + equipment; refine method based on pilot findings'

Why specificity matters.

A vague improvement ('more sites') is not actionable. A specific improvement ('10+ sites every 200 m') tells the reader exactly what would strengthen the study + can be followed up.

Pearson 4GE1 examiners credit specificity — naming techniques, quantifying numbers, justifying choices.
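To see what '10+ sites every 200 m along a 2 km transect' means in practice, the site positions can be generated with a minimal Python sketch. The function name and the choice to start at 0 m are assumptions for illustration.

```python
def systematic_sites(transect_length_m, interval_m, start_m=0):
    """Distances (m) along a transect at a fixed sampling interval."""
    if interval_m <= 0:
        raise ValueError("interval must be positive")
    # Include the far end of the transect if it falls on an interval.
    return list(range(start_m, transect_length_m + 1, interval_m))

sites = systematic_sites(2000, 200)
print(len(sites), sites[:3], sites[-1])
```

Writing the positions down before the field day also supports reproducibility, because another team can revisit the same distances.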

Common improvements to memorise.

For sampling:

Larger sample size.
Systematic / stratified / random sampling.
Multi-site, multi-day, multi-season repetition.
Pilot study.

For methodology:

Calibrated equipment.
Repeats + means (3+ × per measurement).
Standardised procedures (same observer, same time, same conditions).
Inter-observer reliability check.
Anchored Likert descriptors.

For sources:

Multiple named secondary sources (Environment Agency, ONS, OS, USGS, World Bank).
Triangulation primary + secondary.
Academic literature integration.

For analysis:

Statistical tests (Spearman's ρ, mean comparisons, line of best fit).
Outlier identification + explanation.
Apply geographical theory.
Distinguish correlation from causation.

For conclusion + evaluation:

4-part conclusion structure.
Cautious causal language.
Honest acknowledgement of limitations + biases.

For reproducibility:

Detailed methods documentation.
GPS coordinates for sites.
Standardised photographs.

Worked improvement plan.

For a 5-site, 1-day coastal-fieldwork enquiry on Mappleton beach width:

Sample size + sampling strategy. Increase to 10+ sites systematic every 200 m. Why: even spatial coverage; reproducibility.
Temporal coverage. Repeat across 3 different field days (different tide stages + seasons). Why: capture variation + check whether snapshot is representative.
Observer reliability. Two observers measure beach width independently at each site; calculate inter-observer agreement. Why: identify + reduce observer bias.
Methods triangulation. Add beach profile + sediment sampling + cliff retreat to complement beach-width measurement. Why: multiple methods strengthen conclusions through triangulation.
Secondary data. Integrate UK Environment Agency historic Mappleton records + Channel Coastal Observatory aerial photographs + historic OS maps. Why: provides multi-decade context primary fieldwork cannot.
Statistical analysis. Calculate Spearman's rank between distance from groyne + beach width; compute confidence intervals. Why: quantifies relationship strength.
Confounding variable control. Document recent storm history + tide stage on field day + sediment-supply context. Compare with similar protected sites (e.g. Sea Palling, Norfolk). Why: isolates engineering effect from natural variability.
Wider context. Note Withernsea downdrift consequence (~10-15 m additional retreat over 30 years). Why: places local conclusion in regional system.
Replicability documentation. GPS-mark all sites; document observer + equipment + procedures + conditions. Why: enables another team to replicate.
Acknowledge limitations openly. State what the data CAN'T support; use 'consistent with' language.

This 10-point plan addresses all 5 evaluation categories + multiple biases + specific improvements. Pearson 4GE1 examiners would credit it as gold-standard.
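The statistical-analysis step in the plan can be sketched in Python using the standard classroom formula ρ = 1 − 6Σd² / (n(n² − 1)). The distances and beach widths below are hypothetical illustration data, not Mappleton measurements, and the formula is exact only when there are no tied ranks.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rank via 1 - 6*sum(d^2) / (n(n^2 - 1)); exact only without ties."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data from at least 3 sites")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical data: distance from groyne (m) and beach width (m) at 6 sites.
distance_m = [0, 200, 400, 600, 800, 1000]
beach_width_m = [47, 40, 42, 30, 25, 18]
rho = spearman_rho(distance_m, beach_width_m)
print(round(rho, 3))
```

A value close to −1 or +1 suggests a strong relationship, but it should still be checked against a critical-value table for the number of sites before claiming significance.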

Examiner tip. The pairing 'LIMITATION → IMPROVEMENT' should be visible in every evaluation paragraph. Memorise the common improvement framings + adapt to each specific enquiry.

Every limitation paired with SPECIFIC + QUANTIFIED improvement.
Specific = numbers + named techniques + justified.
Vague ('more sites') = low marks; specific ('10+ sites every 200 m') = top-band.
Common improvements: sampling + methodology + sources + analysis + conclusion + reproducibility.
Use 10-point checklist for complex evaluations.

Honest evaluation as methodological maturity

Honest acknowledgement of limitations + biases = the mark of scholarly maturity. Pearson rewards it consistently.

Why honest evaluation matters.

Many students fear that evaluation will weaken their conclusion. The opposite is true. Pearson 4GE1 mark schemes consistently reward students who:

Acknowledge LIMITATIONS explicitly.
Distinguish them from BIASES.
Propose specific + quantified IMPROVEMENTS.
Use CAUTIOUS CAUSAL LANGUAGE.
Apply 5-CATEGORY thinking systematically.

Why this is methodological maturity.

Reliability + trust. An honest evaluation tells the reader EXACTLY what to trust + EXACTLY where to be sceptical. The reader can calibrate confidence to evidence.
Improvement guidance. Specific limitations + improvements enable the next study to address them. Science advances cumulatively.
Distinguishing 'consistent with' from 'proves'. Fieldwork data SUPPORTS or REJECTS hypotheses; it does not PROVE them. The honest evaluator says 'consistent with longshore-drift trapping, with the caveats that the sample is small...' The dishonest one says 'proves engineering works.'
Real-world stakes. In coastal management, engineering, public-health policy + climate adaptation, OVER-CONFIDENT conclusions can lead to catastrophe. The 1953 East Anglian flood + 2005 Hurricane Katrina + 2011 Tohoku tsunami all involved engineering over-confidence. Honest evaluation produces SAFER decisions.
Pedagogical value. Students who learn honest self-evaluation develop intellectual habits that serve them across disciplines + life.

What honest evaluation looks like in practice.

Compare two conclusions to the same coastal-fieldwork question:

DISHONEST. 'I PROVED that engineering protects the coast. My data shows beach width is 47 m updrift vs 18 m downdrift. The hypothesis is true.'

HONEST. 'My data SUPPORT the hypothesis that beach width is greater on the updrift side of Mappleton groynes than the downdrift side — mean updrift was 47 m, mean downdrift was 18 m. This is CONSISTENT WITH longshore-drift sediment trapping (coastal sediment budget theory). LIMITATIONS: 5 sites + one field day; cliff-edge definition subjective; recent storm history not controlled. BIASES: observer bias possible; selection bias in choosing accessible sites. The engineering protects LOCALLY but, as Environment Agency multi-decade records show, has caused ~10-15 m additional cliff retreat at Withernsea over 30 years. Improvements: 10+ sites systematic every 200 m, multi-day repetition, integration with secondary EA data, inter-observer reliability checks. The conclusion is robust but with explicit caveats. Wider implication: whole-coast Shoreline Management Plans should consider downdrift consequences.'

The honest version is more credible + earns more marks.

Wider epistemological lesson.

Modern science RECOGNISES uncertainty + acknowledges it transparently:

IPCC reports include confidence levels (high, medium, low) for each statement.
Medical trials report confidence intervals alongside point estimates.
UK National Risk Register publishes likelihoods + impacts as ranges.
COVID-19 modelling included explicit uncertainty bounds.

This is the same discipline Pearson 4GE1 is training students to apply.

Counter — does excessive evaluation undermine the conclusion?

Yes, if taken too far. A student who lists 20 limitations + concludes 'nothing can be said' is unhelpful. The mature response is BALANCED: acknowledge significant limitations + propose specific improvements + still draw the best supportable conclusion from the data.

PROPORTIONALITY matters. A 5% measurement bias is much less serious than 'I didn't sample systematically'. The first can be corrected after the fact; the second can only be fixed by collecting the data again.

The strongest evaluations are:

PROPORTIONATE — significant limitations matter more than minor ones.
SPECIFIC — named limitations + named improvements.
QUANTIFIED — numbers + techniques + justifications.
DISTINGUISHED — limitations vs biases.
BALANCED — strengths AND limitations.
IMPROVED-UPON — every limitation paired with specific improvement.
THEORY-GROUNDED — connect to wider geographic + scientific epistemology.

Examiner tip. Embrace honest evaluation. Pearson 4GE1 mark schemes credit it consistently. The strongest fieldwork enquiries are not those with the cleanest results but the most HONEST EVALUATIONS. Honesty is the foundation of credible geography.

Honest acknowledgement = methodological maturity.
Pearson examiners credit honesty consistently.
Use cautious causal language: 'consistent with' not 'proves'.
Balanced: strengths + limitations + biases + improvements.
Proportionate: significant matters more than minor.
Connect to scientific epistemology (IPCC confidence levels, medical CIs).
Honest evaluation is the foundation of credible geography.

Evaluation practice + checklist

Use this 11-point checklist to write thorough evaluations. Drill with each enquiry type.

Evaluation-writing checklist.

For every evaluation, ensure you have addressed:

✓	Element
1	Stated the 5 evaluation categories (sample, sources, methodology, confounding, reproducibility)?
2	Identified specific LIMITATIONS in each category?
3	Distinguished LIMITATIONS from BIASES?
4	Named specific BIASES (observer, sampling, self-selection, measurement, confirmation)?
5	Identified specific CONFOUNDING VARIABLES not controlled?
6	Paired every limitation with a SPECIFIC + QUANTIFIED improvement?
7	Suggested STATISTICAL improvements (Spearman's ρ, confidence intervals, line of best fit)?
8	Acknowledged SOURCES STRENGTHS + WEAKNESSES + specific alternatives?
9	Suggested REPLICABILITY improvements (GPS + documentation + standardisation)?
10	Maintained PROPORTIONALITY (significant > minor)?
11	Balanced strengths + limitations + biases + improvements?

Drill practice.

River fieldwork evaluation:

'My river-fieldwork enquiry has both strengths + limitations. STRENGTHS: I measured 5 sites systematically along the river; I used standard techniques (tape, ranging poles, float). LIMITATIONS: SAMPLE — 5 sites + one day is a snapshot; SOURCES — primary only, no Environment Agency triangulation; METHODOLOGY — float method had only 1 reading per site; CONFOUNDING — antecedent rainfall not controlled. BIASES: observer bias in stop-watch timing; sampling bias in choosing accessible sites. IMPROVEMENTS: 10+ sites systematic every 1 km; 3 days across 3 seasons; 3+ float readings per site; integrate EA historic discharge records + Met Office antecedent rainfall; calculate Spearman's ρ between distance + velocity; document GPS coordinates for replicability. The conclusion is consistent with the Bradshaw model but should be triangulated with multi-decade data + acknowledge snapshot nature.'

Coastal fieldwork evaluation:

'My Mappleton coastal-fieldwork enquiry has STRENGTHS: systematic sampling, multiple methods (beach width, profile, sediment, cliff retreat), triangulation with EA records. LIMITATIONS: SAMPLE — 5 sites + one day; SOURCES — partial secondary integration; METHODOLOGY — single observer, no pilot study; CONFOUNDING — recent storm history not documented; REPRODUCIBILITY — sites not GPS-marked. BIASES: observer bias in cliff-edge definition; selection bias in choosing accessible cliff faces. IMPROVEMENTS: 10+ sites systematic every 200 m; 3 days across tide stages + seasons; pilot study; 2 observers + inter-observer reliability; integrate Channel Coastal Observatory aerial photos; calculate Spearman's ρ; document GPS coordinates; consider wider Withernsea downdrift consequence. Strong, theory-grounded conclusion (longshore-drift trapping) with explicit caveats.'

Urban fieldwork evaluation:

'My urban-EQS-fieldwork enquiry has STRENGTHS: clear hypothesis, multiple variables (EQS, pedestrian count, photographs), choropleth + bar charts presentation. LIMITATIONS: SAMPLE — 5 sites along one transect + one day; SOURCES — primary only, no Census triangulation; METHODOLOGY — subjective Likert scoring + single observer; CONFOUNDING — weather + recent events not controlled; REPRODUCIBILITY — sites not detailed enough for replication. BIASES: observer bias in EQS subjectivity; selection bias in choosing accessible sites; confirmation bias in interpreting Site 4 outlier. IMPROVEMENTS: 12+ sites across 4 transects systematic every 500 m; multiple time slots + days; anchored Likert descriptors + 2 observers + inter-observer reliability; integrate UK ONS Census + IMD + DfT data; calculate Spearman's ρ; pilot study + GPS-marking. Interpretation of Site 4 outlier (shopping centre) using multi-nuclei model is theoretically grounded; conclusion qualifies the broad CBD-to-suburb gradient.'

Hazard fieldwork evaluation:

'My earthquake-hazard analysis has STRENGTHS: USGS magnitude + UN OCHA humanitarian data triangulation, application of hazard-risk framework. LIMITATIONS: SAMPLE — 20 earthquakes is small for global generalisation; SOURCES — newspaper reports for Haiti are biased (death-toll range 100k-316k); METHODOLOGY — cross-event comparison treats different events as comparable; CONFOUNDING — soil + building quality + tsunami not controlled. BIASES: media coverage bias (over-reported Western events); selection bias (using only major newspapers). IMPROVEMENTS: larger event sample; triangulate with peer-reviewed academic literature; control for soil + building quality + tsunami; calculate Spearman's ρ with confounders included. Conclusion: outliers like Christchurch reveal that magnitude alone is insufficient; HAZARD-RISK FRAMEWORK (Risk = Hazard × Exposure × Vulnerability) explains variation better. The Christchurch 2011 outlier is geographically informative + supports multi-variable analysis.'

Key insight. All four evaluations follow the same structure: STRENGTHS + LIMITATIONS by 5 categories + BIASES + SPECIFIC IMPROVEMENTS + THEORY-GROUNDED CONCLUSION. With practice, this becomes automatic.

Examiner tip. Practise each evaluation type until the 11-point checklist is fluent. Top-band Pearson 4GE1 evaluations are not difficult: they're systematic + honest + specific.

Use 11-point checklist for every evaluation.
Practise on river / coast / urban / hazard fieldwork.
Structure: STRENGTHS + LIMITATIONS by 5 categories + BIASES + IMPROVEMENTS.
Specific + quantified + balanced.
Top-band evaluations are systematic + honest + theory-grounded.

Quick recap

5 evaluation categories: Sample / Sources / Methodology / Confounding / Reproducibility.
Limitation = what wasn't done (reducible by doing more).
Bias = systematic distortion (requires methodological change).
Distinguish limitations from biases explicitly.
Identify confounding variables specifically.
Pair every limitation with specific + quantified improvement.
Avoid weak excuses ('I had no time', 'It rained').
Use cautious causal language ('consistent with' not 'proves').
Honest evaluation = methodological maturity.
Proportionate, specific, quantified, distinguished, balanced, improved-upon, theory-grounded.

Memorise this

Verbatim phrases and definitions Edexcel mark schemes credit.

5 evaluation categories: Sample, Sources, Methodology, Confounding, Reproducibility.
Limitation = what wasn't done; Bias = systematic distortion.
Bias types: observer / sampling / self-selection / measurement / confirmation / hindsight / aggregation.
Improvement format: limitation → specific + quantified solution.
Avoid weak excuses; use specific + quantified language.
Pearson examiners reward honesty + specificity + balanced evaluation.

How it’s examined

Spec Step 6 (Evaluation) is tested in Paper 2 Section B fieldwork (out of 70 marks).

Typical Paper 2 Section B question styles:

Evaluate methodology (3-4 marks) — assess one aspect of the enquiry method.
Evaluate sample (3-4 marks) — assess sample size + sampling strategy.
Evaluate sources (3-4 marks) — assess primary + secondary source reliability.
Evaluate conclusion (4-6 marks) — assess the conclusion drawn + suggest improvements.
Comprehensive evaluation (6-8 marks) — evaluate the enquiry across all 5 categories.
Identify confounding variables (3-4 marks) — name + explain confounding variables not controlled.

Mark-scheme keywords examiners credit: evaluation, sample size, spatial coverage, temporal coverage, sampling strategy, reliability, validity, accuracy, calibration, repeats, mean, inter-observer reliability, observer bias, sampling bias, self-selection bias, measurement bias, confirmation bias, confounding variable, deprivation, weather, season, recent events, replicability, reproducibility, triangulation, secondary data, Environment Agency, ONS, USGS, World Bank, improvement, larger sample, more days, additional sources, statistical tests, Spearman's rank, line of best fit, pilot study, anchored descriptors, standardisation.

Pearson 4GE1 Appendix 5 explicitly requires students to evaluate their enquiry. Pearson examiners credit students who address ALL 5 categories + DISTINGUISH limitations from biases + PROPOSE SPECIFIC + QUANTIFIED IMPROVEMENTS + USE CAUTIOUS CAUSAL LANGUAGE.

Sources: Pearson Edexcel International GCSE Geography (4GE1) Specification — Issue 4, February 2026, Appendix 5 + Appendix 7; Field Studies Council — fieldwork evaluation guidance + exemplar evaluations; Royal Geographical Society — Evaluation + Reflection skills; IPCC — confidence levels methodology (epistemological framework); Pearson 4GE1 Examiner Reports — Paper 2, Section B. Last reviewed 2026-06-03.

Step-by-step worked examples — Evaluation on data, methods and conclusions

Step-by-step solutions to past-paper-style questions on evaluation on data, methods and conclusions, written exactly the way a tutor would explain them at the board.

1. The 5 evaluation categories
Getting started • AO3
Question
Name + describe the FIVE main evaluation categories for a fieldwork enquiry. [5]
Step-by-step solution
1. Step 1
  SAMPLE (1 mark). Size + spatial coverage + temporal coverage + representativeness. Was the sample large enough? Was it random / systematic / stratified or biased?
2. Step 2
  SOURCES (1 mark). Reliability + uncertainty + currency of data sources. Were primary methods well-calibrated? Were secondary sources current + reliable + appropriate?
3. Step 3
  METHODOLOGY (1 mark). Replicability + accuracy + observer bias + weather + time. Were methods well-documented enough for replication?
4. Step 4
  CONFOUNDING VARIABLES (1 mark). What other factors might have affected results? Have we controlled for them?
5. Step 5
  REPRODUCIBILITY (1 mark). Could another team replicate the study + reach similar conclusions?
Answer
(1) SAMPLE (size + bias + coverage). (2) SOURCES (reliability + currency). (3) METHODOLOGY (replicability + accuracy). (4) CONFOUNDING VARIABLES (uncontrolled influences). (5) REPRODUCIBILITY (could another team repeat?).
Examiner tip
Pearson 4GE1 mark schemes specifically credit students who address ALL five categories. Use this as a checklist for every evaluation answer.
2. Limitation vs bias
Getting started • AO3
Question
Distinguish a LIMITATION from a BIAS in fieldwork enquiry. Give one example of each. [3]
Step-by-step solution
1. Step 1
  Limitation (1 mark). Something that WASN'T DONE or COULDN'T BE DONE that reduces the strength of conclusions — e.g. sample size, time available, equipment unavailable, depth of analysis.
2. Step 2
  Limitation example (1 mark). 'Only 5 sites were measured because of time constraints; 10+ would have provided better spatial coverage.'
3. Step 3
  Bias (1 mark). SYSTEMATIC DISTORTION in how data was collected or interpreted that makes the data point reliably in one direction — e.g. observer bias, self-selection bias, measurement bias, sampling bias.
Answer
Limitation = what wasn't done (sample, time, methods). Bias = systematic distortion (observer, sampling, measurement). Example: 'Only 5 sites' = limitation; 'Observer scored EQS more harshly at urban sites' = bias.
Examiner tip
Limitations are reducible by doing more; biases require methodological change. Pearson 4GE1 mark schemes credit students who distinguish the two.
3. Evaluating sample
Building confidence • AO3
Question
A student counted pedestrians at 4 sites for 10 minutes each on a single Saturday afternoon to test 'pedestrian footfall is higher in the CBD than the inner suburbs'. EVALUATE the sample. [4]
Step-by-step solution
1. Step 1
  Sample SIZE (1 mark). 4 sites is small — each represents a wide area + may be unrepresentative. INCREASE to 6+ sites per area (12+ total) with systematic sampling every 200 m.
2. Step 2
  Sample TIME (1 mark). 10 minutes per site is a snapshot. Pedestrian flow varies hugely by time of day, day of week, weather, events. EXTEND to 15 mins × 3 time slots × 2 days (weekday + weekend) = 6 counts per site for averaging.
3. Step 3
  Sample TIMING (1 mark). Saturday afternoon is one specific time. Saturday afternoon CBD is typically peak shopping time, weighted toward leisure. Different from weekday-morning CBD use. Sample MULTIPLE days + times to capture variation.
4. Step 4
  Sample COVERAGE (1 mark). 4 sites only along the transect. Need sites spread across the FULL transect with even spacing. SYSTEMATIC sampling every 200 m provides this.
Answer
Sample is too small + too short + too narrow. (1) Increase to 12+ sites (systematic every 200 m). (2) 15 mins × 3 time slots × 2 days. (3) Vary days + times to capture variation. (4) Even spatial coverage.
Examiner tip
Top-band 4-mark answer addresses SIZE + TIME + COVERAGE + JUSTIFICATION. Pearson 4GE1 mark schemes specifically credit sample-size critique.
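The improved sampling plan can be checked with a few lines of arithmetic. The sketch below is illustrative only: the 2.2 km transect length and the function names are assumptions, not part of the question. It lists systematic site positions every 200 m and multiplies out sites × time slots × days.

```python
def systematic_sites(transect_length_m, interval_m):
    """Site positions (metres from the transect start) at a fixed interval."""
    return list(range(0, transect_length_m + 1, interval_m))


def total_counts(n_sites, slots_per_day, n_days):
    """Total number of pedestrian counts in the whole plan."""
    return n_sites * slots_per_day * n_days


# Hypothetical 2.2 km CBD-to-suburb transect, sampled every 200 m.
sites = systematic_sites(2200, 200)
print(len(sites), "sites at", sites)

# 3 time slots x 2 days (weekday + weekend) = 6 counts per site.
print(total_counts(len(sites), slots_per_day=3, n_days=2), "counts in total")
```

With these assumed figures the plan gives 12 sites and 72 counts, against 4 counts in the original design. That is the kind of quantified comparison an evaluation answer can quote.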
4. Evaluating methodology
Building confidence • AO3
Question
EVALUATE the methodology of: 'a student measured river depth at 5 sites using a metre rule, but only at the channel CENTRE; used a stopwatch + paper float for velocity, with 1 reading per site; took photographs but no annotated sketches.' [4]
Step-by-step solution
1. Step 1
  Depth measurement (1 mark). Only at the CENTRE — ignores cross-channel variation. The CENTRE is typically deepest; mean depth across the channel would be lower. Should measure at 5 points across the channel + take mean.
2. Step 2
  Velocity (1 mark). Only 1 reading per site = no estimate of measurement uncertainty. Repeat 3+ times + take mean. Use a PROPER float (not paper, which is too light + easily blown by wind). Multiply the surface velocity by a 0.85 correction factor to estimate mean velocity.
3. Step 3
  Qualitative evidence (1 mark). Photographs without ANNOTATED SKETCHES + observations lose qualitative depth. Sketches force observation; photos can be passive. Add annotated sketches at every site.
4. Step 4
  Standardisation + reliability (1 mark). No mention of same observer / consistent metre rule / pilot study. Standardise: same observer; same equipment; pre-designed recording sheets; pilot before main field day. Pearson 4GE1 mark schemes credit replicability.
Answer
(1) Depth: measure at 5 points across channel + mean (not just centre). (2) Velocity: repeat 3+ × + use proper float + × 0.85 correction. (3) Add annotated sketches alongside photos. (4) Standardise observer + equipment + pilot study.
Examiner tip
Pearson 4GE1 mark schemes specifically credit replicability + standardisation + repeat measurements + use of correction factors (× 0.85 for float method).
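The two numerical fixes in this answer (a cross-channel mean depth, and repeated float timings multiplied by 0.85) can be sketched as follows. All readings are invented for illustration, and the 10 m float distance is an assumption, since the question does not state one.

```python
from statistics import mean

SURFACE_TO_MEAN = 0.85  # correction factor quoted above for the float method


def mean_depth(depths_m):
    """Mean of depth readings taken at several points across the channel."""
    return mean(depths_m)


def mean_velocity(distance_m, float_times_s, correction=SURFACE_TO_MEAN):
    """Average the surface velocities from repeated timings, then apply the correction."""
    surface_velocities = [distance_m / t for t in float_times_s]
    return mean(surface_velocities) * correction


# Invented readings at one site: 5 depths across the channel, centre reading in the middle.
depths = [0.12, 0.25, 0.40, 0.28, 0.10]
print("centre depth:", depths[2], "m; cross-channel mean:", round(mean_depth(depths), 2), "m")

# Invented float timings over an assumed 10 m stretch, repeated 3 times.
times = [20.0, 25.0, 20.0]
print("corrected mean velocity:", round(mean_velocity(10, times), 2), "m/s")
```

In this made-up case the centre-only reading (0.40 m) overstates the cross-channel mean (0.23 m), which is the distortion Step 1 describes.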
5. Identifying + addressing confounding variables
Stretch • AO3
Question
A student tested 'fast-food density correlates with obesity' across 10 UK wards and found ρ = +0.78. Identify TWO confounding variables + suggest how to control for them in an improved study. [4]
Step-by-step solution
1. Step 1
  Confounding 1 — DEPRIVATION (1 mark). Deprivation drives BOTH fast-food density (cheaper food on cheaper land) AND obesity (lower healthy-food spending). The correlation is partly explained by deprivation, not direct causation.
2. Step 2
  Control for deprivation (1 mark). Stratify analysis by DEPRIVATION GROUP. Compare fast-food density + obesity WITHIN each deprivation tier. If fast-food still correlates with obesity AT THE SAME DEPRIVATION LEVEL, the correlation is more likely to reflect a causal link.
3. Step 3
  Confounding 2 — POPULATION COMPOSITION (1 mark). Wards with younger populations have different fast-food consumption + obesity patterns. Wards with more ethnically diverse populations have different food cultures. These compositional factors could drive both variables.
4. Step 4
  Control for population (1 mark). Use age-standardised obesity rates; analyse by age cohort; control for ethnic-cultural variation. Use INDEX OF MULTIPLE DEPRIVATION (IMD) to capture multiple compositional factors.
Answer
(1) DEPRIVATION (cheaper food on cheaper land + lower healthy spending) — stratify analysis by deprivation tier. (2) POPULATION composition (age + ethnicity affect both variables) — use age-standardised rates + IMD. Controlling these isolates true causation from confounded correlation.
Examiner tip
Top-band 4-mark answer NAMES specific confounding variables + suggests SPECIFIC controls (stratification, age-standardisation, IMD). The 'fast food causes obesity' example is the canonical 4GE1 causation question.
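Stratifying by deprivation tier, as in Step 2, can be sketched in code. The ward figures below are invented purely to show the effect, and the helper names are hypothetical. Spearman's rank is computed here as the correlation between the two sets of ranks, first across all wards and then within each tier.

```python
from collections import defaultdict


def ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def rho_by_tier(wards):
    """Group wards by deprivation tier and compute rho within each tier."""
    groups = defaultdict(list)
    for tier, outlets, obesity in wards:
        groups[tier].append((outlets, obesity))
    return {tier: spearman(*zip(*rows)) for tier, rows in groups.items()}


# Invented data: (deprivation tier, fast-food outlets per 10,000 people, obesity %).
wards = [
    ("low", 2, 20), ("low", 3, 19), ("low", 4, 21), ("low", 5, 18),
    ("high", 8, 30), ("high", 9, 29), ("high", 10, 31), ("high", 11, 28),
]

overall = spearman([w[1] for w in wards], [w[2] for w in wards])
print("all wards:", round(overall, 2))
for tier, rho in rho_by_tier(wards).items():
    print(tier, "deprivation:", round(rho, 2))
```

In this invented dataset the all-ward correlation is clearly positive, but it disappears inside each tier. That pattern would suggest deprivation, not fast-food density, is doing the work.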
6. Improvement plan
Stretch • AO3
Question
Design an IMPROVEMENT plan for a 5-site, 1-day coastal-fieldwork enquiry on the Mappleton beach-width hypothesis. List the specific improvements + why each one strengthens the conclusion. [6]
Step-by-step solution
1. Step 1
  Improvement 1 — Sample size (1 mark). Increase to 10+ sites along ~2 km of coast with systematic sampling every 200 m. Why: better spatial coverage; reduces influence of any single atypical site; reveals fine-scale pattern detail.
2. Step 2
  Improvement 2 — Temporal coverage (1 mark). Repeat the entire transect across 3 different days (e.g. May, August, November) + at different tide stages (low + mid). Why: captures seasonal + tide-stage variation; reveals whether the snapshot is representative.
3. Step 3
  Improvement 3 — Reliability checks (1 mark). Two observers measure beach width independently at each site; calculate inter-observer agreement. Why: reveals measurement bias + improves data quality.
4. Step 4
  Improvement 4 — Multiple methods (1 mark). Add beach profile, sediment sampling, cliff retreat measurements; integrate with secondary data (EA historic records, Channel Coastal Observatory aerial photos). Why: triangulation strengthens the conclusion.
5. Step 5
  Improvement 5 — Statistical testing (1 mark). Calculate Spearman's rank correlation between distance from groyne + beach width; compute confidence interval if possible. Why: quantifies the relationship strength + statistical confidence.
6. Step 6
  Improvement 6 — Confounding variable control (1 mark). Note recent storm history + sediment-supply context. Compare with similar protected sites (e.g. Sea Palling) for cross-validation. Why: isolates the groyne effect from natural variability.
Answer
(1) 10+ sites systematic every 200 m. (2) Repeat across 3 days + tide stages. (3) Inter-observer reliability checks. (4) Add multiple methods + secondary data triangulation. (5) Calculate Spearman's ρ + statistical confidence. (6) Note storm history + cross-validate with other protected sites.
Examiner tip
Top-band 6-mark answer addresses MULTIPLE evaluation categories (sample + time + observer + methods + statistics + confounding). Each improvement has a JUSTIFIED REASON. Pearson 4GE1 mark schemes specifically credit specific + justified improvement plans.
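Improvement 5 can be made concrete with the standard textbook formula for Spearman's rank, ρ = 1 − 6Σd² / (n(n² − 1)). The sketch below assumes no tied values. The distances and beach widths are invented for illustration and are not real Mappleton measurements.

```python
def spearman_rho(x, y):
    """Spearman's rank via 1 - 6*sum(d^2) / (n*(n^2 - 1)); assumes no tied values."""
    n = len(x)
    rank_x = {value: rank for rank, value in enumerate(sorted(x), start=1)}
    rank_y = {value: rank for rank, value in enumerate(sorted(y), start=1)}
    sum_d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Invented readings: distance from the groyne (m) and beach width (m) at 6 sites.
distance_m = [0, 200, 400, 600, 800, 1000]
beach_width_m = [42, 38, 35, 36, 30, 26]

rho = spearman_rho(distance_m, beach_width_m)
print("Spearman's rho:", round(rho, 2))
```

For these made-up figures Σd² = 68 and n = 6, so ρ = 1 − 408/210 ≈ −0.94: a strong negative relationship. With only 6 sites the result should still be compared against a critical-value table before claiming significance.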

Model Answers — Evaluation on data, methods and conclusions

High-scoring sample answers for evaluation on data, methods and conclusions on the Pearson Edexcel International GCSE (4GE1) paper, with examiner-style notes mapping each response to the mark scheme and assessment objectives.

Question 1
2 marks
[EASY] What is the PURPOSE of evaluation in a fieldwork enquiry? [2]
Model answer
The purpose of evaluation is to HONESTLY REFLECT on the quality of the enquiry + identify limitations + biases + improvements.

Why this matters:

Conclusions are only reliable INSOFAR AS the data + methods are reliable. Evaluation tests this.

Identifying limitations leads to specific improvements for future research.

Honest acknowledgement of bias + uncertainty builds intellectual credibility.

Pearson 4GE1 mark schemes specifically credit students who evaluate.

A study without evaluation OVER-CLAIMS its conclusions. A study with thorough evaluation acknowledges what it doesn't know — this is the mark of academic maturity.
Why this scores
Mark scheme: 1 mark for 'honest reflection on enquiry quality'; 1 mark for explaining WHY (over-claim risk, improvements, credibility).
Question 2
3 marks
[EASY] Name the FIVE main evaluation categories for fieldwork. [3]
Model answer
The 5 evaluation categories:

SAMPLE — size + spatial coverage + temporal coverage + representativeness.

SOURCES — reliability of primary + secondary data; uncertainty + currency.

METHODOLOGY — replicability + equipment accuracy + observer bias + standardisation.

CONFOUNDING VARIABLES — other factors that could have affected results.

REPRODUCIBILITY — could another team replicate the study + reach similar conclusions?

Pearson 4GE1 mark schemes specifically credit students who address ALL five categories in any evaluation answer.
Why this scores
Mark scheme: 1 mark per 3 categories named. The 5 together are the standard checklist for any fieldwork evaluation.
Question 3
4 marks
[MEDIUM] A student counted pedestrians at 4 sites for 10 minutes each on a single Saturday afternoon. EVALUATE the sample + suggest improvements. [4]
Model answer
Evaluation of the sample.

(1) Sample SIZE is small. Only 4 sites + 10 minutes each = limited data points. Each site has to represent a wide area + each 10-min slot has to represent the whole day. With this small a sample, any unusual event (a flash crowd, a sports match, a closed road) could distort the conclusions disproportionately.

(2) Temporal coverage is narrow. Saturday afternoon is one specific time. Pedestrian flow varies enormously with:

Day of week (weekday rush hour vs weekend leisure).

Time of day (morning vs lunch vs afternoon vs evening).

Weather (sunny vs rainy).

Events (sales, festivals, sports matches).

A Saturday afternoon sample captures one slice of one demographic — unrepresentative of pedestrian flow generally.

(3) Spatial coverage may be uneven. 4 sites may not be evenly distributed along the transect. Some sites may cluster + some areas may be missed. Without systematic sampling, the spatial pattern is incomplete.

(4) The data may be highly seasonal / event-specific. A Saturday in summer is different from a Saturday in winter; Christmas shopping period skews results.

Improvements.

More sites. Increase from 4 to 12+ sites along the transect, chosen by SYSTEMATIC SAMPLING at every 200 m. This ensures even spatial coverage + reduces influence of any single atypical site.

More time slots. Count at each site for 15 mins × 3 time slots per day (08:30 rush hour, 12:30 lunch, 16:30 rush hour). This captures within-day variation.

Multiple days. Repeat across at least 2 days — one weekday + one weekend. This captures within-week variation.

Multiple weeks. If feasible, repeat across 4 weeks to capture month-level variation + average out events. This is the gold standard.

Standardisation. Same observers; same definition of 'pedestrian' (does a cyclist count? a child? a runner?); use mechanical tally counters to avoid mis-counting.

Triangulation. Compare counts with secondary data — UK Department for Transport pedestrian / vehicle counts; council retail footfall data; ONS Census journey-to-work data.

Evaluation by inter-observer reliability. Two counters at one site; if counts differ by > 10%, retrain.

Pearson 4GE1 mark schemes specifically credit students who quantify their sample-size critique + propose SPECIFIC improvements with numbers. 'More sites' alone is partial; '12+ sites every 200 m × 3 time slots × 2 days = 72 counts total (6 per site)' is full-mark territory.
Why this scores
Mark scheme: 1 mark for sample-size critique; 1 mark for temporal critique; 1 mark for spatial critique; 1 mark for specific improvements with numbers. Top-band 4-mark answer integrates limitations + specific quantified improvements.
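The inter-observer check in the model answer ("if counts differ by > 10%, retrain") is easy to apply on the day. A minimal sketch follows. It assumes the difference is expressed as a percentage of the two observers' mean count, which the answer does not specify, and the function names are hypothetical.

```python
def percent_difference(count_a, count_b):
    """Difference between two observers' counts as a percentage of their mean."""
    mean_count = (count_a + count_b) / 2
    return abs(count_a - count_b) / mean_count * 100


def needs_retraining(count_a, count_b, threshold_pct=10.0):
    """True when the two counts disagree by more than the threshold."""
    return percent_difference(count_a, count_b) > threshold_pct


# Invented example counts from two observers at the same site + time slot.
print(needs_retraining(100, 104))  # small disagreement
print(needs_retraining(100, 125))  # large disagreement
```

Recording the percentage itself, not just pass / fail, gives a quantified reliability figure to quote in the evaluation.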
Question 4
4 marks
[MEDIUM] A student uses NEWSPAPER REPORTS as a secondary source for an enquiry into the 2010 Haiti earthquake. EVALUATE the reliability of this source + suggest alternatives. [4]
Model answer
Evaluation of newspaper-report reliability.

Strengths.

Newspapers capture HUMAN IMPACTS + IMMEDIATE CONTEXT that technical sources miss.

Eyewitness accounts + photographs + named individuals provide qualitative texture.

Widely accessible + free + searchable archives.

Useful for capturing the public + political narrative of an event.

Weaknesses.

(1) Early-stage uncertainty. Death tolls were ~100,000 in early reports vs ~316,000 in the final Haitian government figure — a 3× range. Newspaper reports often cite the figures available AT TIME OF WRITING + may not be updated. Reliability is highest weeks-to-months after the event.

(2) Narrative bias. Newspapers select dramatic stories. They UNDER-REPORT recovery + over-report disaster. The narrative shaped global aid response but may distort historical understanding.

(3) National perspective. US + UK papers emphasise their own response (US military aid; UK aid). Haitian + Caribbean papers offer different perspectives. A student using only Western papers gets a culturally narrow view.

(4) Inaccurate technical details. Magnitudes, distances, casualty figures often imprecise. Newspapers are not technical sources.

(5) Sources within reports. Newspapers cite officials, NGOs, eyewitnesses. Each carries its own bias + uncertainty.

Reliability verdict. Newspaper reports are USEFUL for the human + immediate context of disasters but should NEVER be relied on for technical claims (magnitudes, casualties, mechanisms) without triangulation.

Alternative + complementary sources.

USGS — earthquake catalogue, magnitude (M7.0), epicentre, ShakeMap, technical detail. AUTHORITATIVE for technical claims.

UN OCHA ReliefWeb — official humanitarian response data + casualties + displaced + funding. CONSISTENT methodology across disasters.

Haitian government final reports — final official death toll (~316,000) + reconstruction data. ESSENTIAL for the most accurate national figure.

World Bank + IMF reports — economic impact + reconstruction financing + debt context. ESSENTIAL for economic implications.

Academic literature — peer-reviewed analyses of the disaster + recovery. Provides theoretical framing + multi-disciplinary perspectives.

NGO reports — Red Cross + Médecins Sans Frontières + Oxfam — provide humanitarian perspective with named workers + sites.

Insurance + reinsurance reports — Munich Re catastrophe analysis quantifies economic impact.

Triangulation strategy. Use newspapers for human context; USGS for technical magnitude; Haitian government + UN OCHA for casualties; World Bank for economy; academic literature for theoretical framing. When all sources agree → conclusion robust. When they diverge → investigate why.

Pearson 4GE1 mark schemes specifically credit students who NAME alternative sources + recognise the bias-uncertainty profile of each.
Why this scores
Mark scheme: 1 mark for newspaper strengths; 1 mark for narrative + uncertainty + national-perspective weaknesses; 1 mark for SPECIFIC alternative sources (USGS, UN OCHA, Haitian government, World Bank); 1 mark for triangulation strategy. The Haiti 100k-316k example is the canonical 4GE1 source-reliability illustration.
Question 5
4GE1 Paper 2 Section B style (AO3)
6 marks
[HARD] A student tested 'cliff retreat is faster at unprotected Holderness coast than at engineered Mappleton' over a single 1-day field visit at 5 sites. EVALUATE the conclusion they reached (it was supported) + suggest improvements. [6]
Model answer
The conclusion — 'data SUPPORT the hypothesis' — at face value.

The student measured cliff retreat at 5 sites + found higher retreat rates at unprotected sites than at Mappleton. The conclusion is consistent with established geographical theory + Environment Agency multi-decade data.

But the conclusion's confidence depends on the QUALITY of the enquiry.

Limitation 1 — Sample size + spatial coverage.

5 sites along a ~5 km transect is small. Each site represents a wide stretch + may be atypical. INCREASE to 10+ sites systematic every 200 m. The 5-site sample could miss spatial heterogeneity within the unprotected sections + within the Mappleton section.

Limitation 2 — Temporal coverage.

ONE field day is a snapshot. Cliff retreat occurs in EPISODES — often during major storms. The annual mean retreat is the integral of many small events + occasional large ones. A snapshot measurement of cliff position cannot capture the temporal dynamics.

Improvement: combine primary measurements with SECONDARY EA HISTORIC RECORDS for the same sites going back 20-30 years. This captures multi-year + multi-decade trends + storm-related episodes.

Limitation 3 — Method accuracy.

How was cliff retreat measured? Distance from a fixed reference (a road, a stake, a GPS mark) at a single snapshot. The 'cliff edge' definition matters: where rock meets vegetation? Where the visible face is? Different observers may define it differently. Improvement: standardise the cliff-edge definition; use a SINGLE OBSERVER; calibrate measurements.

Limitation 4 — Confounding variables.

The student claims 'engineering protects cliffs' but other variables differ:

Geology. Different sites have different rock + soil resistance.

Storm history. Recent storms may have hit one section harder than the other.

Tide stage. Some sites may be more exposed to high tide than others.

Vegetation + drainage. Above-ground processes affect cliff stability.

Improvement: choose comparable sites (similar geology, similar exposure); document storm history; note tide stage; control for vegetation differences.

Limitation 5 — Reproducibility.

Could another team measure the same sites + reach similar conclusions? The methods need detailed documentation: GPS coordinates, observer name + qualifications, equipment, methodology, conditions on the day. Pearson 4GE1 mark schemes specifically credit replicability.

Limitation 6 — Wider context.

The 'engineering protects coast' conclusion ignores the WIDER COASTAL CONTEXT — Mappleton's engineering creates DOWNDRIFT STARVATION at Withernsea (~10-15 m additional retreat over 30 years). A local conclusion that engineering protects is true; the wider system shows engineering has SHIFTED THE PROBLEM. The conclusion should acknowledge this regional cost.

Bias considerations.

Observer bias. The student may have unconsciously measured in a way that favoured the hypothesis. An inter-observer reliability check would help identify this.

Selection bias. The 5 sites chosen may be those most consistent with the hypothesis. Use SYSTEMATIC sampling at fixed intervals.

Confirmation bias. The student should weight evidence equally — not favour data supporting the hypothesis.

Improvements to the enquiry.

More sites + systematic sampling. 10+ sites every 200 m along ~5 km.

Multiple field days. Repeat 3+ times across the year + after major storms.

Secondary data triangulation. UK Environment Agency historic records; Channel Coastal Observatory aerial photographs.

Multiple observers. Two observers measure independently; calculate inter-observer agreement.

Confounding variable control. Match sites for geology + exposure; document storm history; control for tide stage.

Statistical testing. Calculate Spearman's ρ + confidence interval; formal test.

Wider context. Note Withernsea downdrift consequence + UK Shoreline Management Plan implications.

Replicability. Document methods thoroughly so another team could replicate.

A reliable conclusion would read:

'My fieldwork SUPPORTS the hypothesis that cliff retreat is faster at unprotected Holderness coast than at engineered Mappleton, with the caveats that: (1) sample size is small (5 sites); (2) data is from one day; (3) confounding variables (geology, storm history, tide stage) were not fully controlled; (4) the engineering protects LOCALLY but creates DOWNDRIFT STARVATION at Withernsea (~10-15 m additional retreat over 30 years). The conclusion is consistent with Environment Agency multi-decade data. Improvements would include more sites, multiple seasons, formal statistical testing + wider system context.'

This is the standard of evaluation Pearson 4GE1 mark schemes credit at top-band.
Why this scores
Mark scheme: 1 mark per creditworthy point, up to the 6-mark maximum, drawn from the evaluation categories covered (sample / time / method / confounding / reproducibility / wider context), bias considerations, and a specific improvement plan. Top-band 6-mark answer demonstrates ALL 5 evaluation categories + bias + improvements + acknowledges the wider Holderness system.
Question 6
6 marks
[HARD] A student finds correlation between fast-food density + obesity (ρ = +0.78) + concludes 'fast-food causes obesity'. EVALUATE this conclusion + suggest improvements. [6]
Model answer
Evaluation of the causal claim.

The student finds a strong correlation (ρ = +0.78) between fast-food density + obesity. They conclude 'fast-food causes obesity'. This is an OVER-CLAIM that fails to acknowledge multiple alternative interpretations.

Reasons the conclusion is unsound.

(1) Correlation ≠ causation. Strong correlation indicates association but not causal direction. The student has shown that two variables MOVE TOGETHER, not that one DRIVES the other.

(2) Reverse causation. Fast-food outlets may LOCATE in high-obesity wards (companies target neighbourhoods where the product sells). The causation could run from obesity → outlets, not outlets → obesity.

(3) Confounding variable — DEPRIVATION. Both fast-food density + obesity may be CAUSED by a third variable: deprivation. Lower-income wards have:

More fast-food outlets (cheaper food on cheaper land).

Higher obesity (less spending on healthy food + fewer healthy-food retailers + less time for cooking).

Lower physical activity levels (fewer gyms + parks).

Higher stress + healthcare-access barriers.

The deprivation context drives BOTH variables. The correlation is NOT evidence of direct fast-food → obesity causation.

(4) Confounding variable — COMPOSITION. Wards differ in:

Age structure (older populations have higher obesity but different fast-food consumption).

Ethnicity (different food cultures + obesity prevalence).

Education levels (correlate with health knowledge).

Housing density + working hours.

Without controlling for these, the correlation reflects compositional differences as much as direct causation.

(5) Selection bias. 10 wards is a small + non-random sample. The wards CHOSEN may be those most consistent with the hypothesis. Use a larger + systematic sample.

(6) Aggregation problem. Ward-level data MASKS within-ward variation. Different streets within a ward may have very different fast-food density + obesity patterns. The ward-level correlation is informative but doesn't reach individual-level causation.

(7) Temporal direction. The student's data are a snapshot. To show causation, you need temporal ordering: outlets opened BEFORE obesity rose. Time-series data would help.

(8) Mechanism. The student doesn't propose a specific MECHANISM connecting outlets to obesity. Mechanisms could include: increased access to high-calorie food; price effects; advertising exposure; cultural normalisation. Without specifying + testing the mechanism, causation is speculative.

Improvements to the enquiry.

State conclusion cautiously. 'Fast-food density + obesity are STRONGLY ASSOCIATED, possibly mediated by deprivation. Direct causation requires further investigation.'

Larger sample. Increase from 10 wards to 100+ wards. Reduces influence of chance + atypical wards.

Control for deprivation. STRATIFY analysis — calculate ρ separately for low / middle / high deprivation wards. If the correlation persists WITHIN strata, the causal claim is stronger.

Control for composition. Use AGE-STANDARDISED obesity rates; control for ethnicity + education levels.

Use Index of Multiple Deprivation (IMD). IMD captures multiple compositional + deprivation factors. Compare wards at similar IMD levels.

Time-series data. Track changes in fast-food density + obesity over 5-10 years. Did outlets open before obesity rose, or vice versa?

Mechanism analysis. Survey residents on fast-food consumption + relate to obesity outcomes at individual level.

Triangulation. Compare findings with academic literature, public-health studies, NHS data, World Health Organisation evidence.

Cite academic literature. Multiple peer-reviewed studies have investigated this relationship — cite + integrate findings.

Acknowledge limitations explicitly. State what the data CAN'T support + what would strengthen the conclusion.

A reliable conclusion would read:

'My data show a strong correlation (ρ = +0.78) between fast-food density + obesity across 10 UK wards. This suggests an ASSOCIATION between the two variables. However, correlation does NOT prove causation: alternative explanations include reverse causation (outlets locate in high-obesity wards), confounding variables (deprivation drives both), composition effects (age + ethnicity + education), and aggregation problems (ward-level masks individual-level causation). Improvements would include larger sample, control for deprivation + composition, time-series data, and individual-level survey. The conclusion should be stated as: fast-food density + obesity are STRONGLY ASSOCIATED, possibly mediated by deprivation. Direct causation requires further multi-method investigation.'

Pearson 4GE1 mark schemes specifically credit students who distinguish correlation from causation + identify confounding variables + state conclusions cautiously.
Why this scores
Mark scheme: 1 mark for correlation ≠ causation; 1 mark for reverse-causation possibility; 1 mark for confounding variable (deprivation); 1 mark for specific improvements (stratification, age-standardisation, IMD, time-series); 1 mark for cautious conclusion wording; 1 mark for mechanism + literature integration. The fast-food + obesity causation question is canonical 4GE1.
Question 7
4GE1 Paper 2 Section B style (AO3/AO4)
8 marks
[CHALLENGE] A complete fieldwork enquiry on 'urban inequality' was carried out with 5 sites + one field day + primary data only. EVALUATE the enquiry comprehensively + PROPOSE a redesigned version that addresses all five evaluation categories. [8]
Model answer
Context. The student's enquiry on 'urban inequality' used 5 sites along a CBD-to-suburb transect, collected primary data on one field day, and reached conclusions about spatial inequality.

Comprehensive evaluation across 5 categories.

(1) SAMPLE.

Sites. 5 sites is small for a CBD-to-suburb transect potentially 5-10 km. Each site represents a wide area. Improvement: 10+ sites with systematic sampling every 500 m.

Spatial coverage. Single transect may not be representative of the city as a whole. Improvement: multiple transects through different city districts (north, south, east, west).

Temporal coverage. One field day is a snapshot. Pedestrian + traffic + EQS vary with day of week + season + weather + events. Improvement: repeat across multiple weeks + seasons.

Sample size for questionnaires. If questionnaires were used, 20+ respondents per site is the minimum. Improvement: 30-50 respondents per site for statistical power.

(2) SOURCES.

Primary methods + equipment. What methods? Calibrated? Recently checked? Improvement: document equipment + calibration + cite standardisation.

Secondary data — TOTALLY MISSING. A primary-only enquiry is INCOMPLETE for urban inequality. UK Census 2021 (ONS) + UK Index of Multiple Deprivation (IMD) + council planning data + Department for Transport traffic data + ONS journey-to-work data are essential context for ANY urban-inequality enquiry. Improvement: integrate 4-6 named secondary sources.

Reliability of own measurements. EQS uses 1-5 Likert scale — subjective. Improvement: anchor descriptors + inter-observer reliability checks.

(3) METHODOLOGY.

Replicability. Could another team replicate the study + reach similar conclusions? Need detailed documentation. Improvement: GPS-mark sites + document observer + equipment + standardisation procedures.

Pilot study. Was a pilot run? Probably not. Improvement: conduct a pilot at one site before main field day to test recording sheets + timings.

Repeats. Were measurements repeated 3+ times? Probably not for all variables. Improvement: 3+ repeats with mean for every quantitative measure.

Standardisation. Same observer? Same time of day? Same weather? Improvement: standardise observer + time of day + weather conditions across all sites.

Triangulation. Multiple methods (quant + qual) used? If primary-only, qualitative methods missed. Improvement: add fieldwork sketches + photos + questionnaire alongside quantitative.
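The '3+ repeats with mean' improvement above can be sketched in a few lines of Python. The site names and pedestrian counts are hypothetical; the point is only that each quantitative measure is reported as a mean with its range, so a reader can see how much the repeats disagree.

```python
from statistics import mean

# Hypothetical pedestrian counts: three repeat counts at each site.
repeats = {
    "Site 1": [42, 47, 45],
    "Site 2": [18, 30, 21],
}


def summarise(counts):
    """Return the mean (1 d.p.) and the range of repeated measurements."""
    return {"mean": round(mean(counts), 1), "range": max(counts) - min(counts)}


summary = {site: summarise(counts) for site, counts in repeats.items()}
for site, stats in summary.items():
    print(site, stats)
```

A wide range relative to the mean (as at the hypothetical Site 2) is itself evidence for the evaluation: it shows the single-count method was unreliable at that site.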

(4) CONFOUNDING VARIABLES.

Day of week + weather would affect pedestrian counts + EQS scores.

Major events (festivals, sales, road closures, sports) would skew data.

Demographic composition of each site (age, ethnicity, income) would affect both EQS perception + objective measurements. Improvement: triangulate with Census data per ward.

City history — gentrification, decline, regeneration — shapes current inequality patterns. Improvement: integrate historic OS maps + literature on urban change.

(5) REPRODUCIBILITY.

Could another team repeat? Depends on documentation. Improvement: detailed methodology write-up.

Would another team reach similar conclusions? Depends on representative sample. Single-snapshot + single-transect = unlikely.

Bias considerations.

Selection bias. The 5 sites chosen may favour the hypothesis. Use systematic sampling.

Observer bias. EQS scoring is subjective. Anchor descriptors + use multiple observers.

Questionnaire response bias. Self-selected respondents tend to be available + verbal. Improvement: random or stratified sampling within sites.

Confirmation bias. Student may interpret data in line with hypothesis.
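One way to make the observer-bias check concrete is to have two observers score the same site independently and compare. The sketch below uses hypothetical EQS scores and two deliberately simple measures (percentage of identical scores and mean absolute difference); these are an illustrative assumption, not a formal reliability statistic.

```python
# Hypothetical EQS scores (1-5 scale) given independently by two observers
# for the same five criteria at one site.
observer_a = [4, 3, 5, 2, 4]
observer_b = [4, 2, 5, 3, 4]


def agreement(a, b):
    """Return (percentage of identical scores, mean absolute difference)."""
    if len(a) != len(b):
        raise ValueError("both observers must score the same criteria")
    diffs = [abs(x - y) for x, y in zip(a, b)]
    exact = sum(1 for d in diffs if d == 0)
    return 100 * exact / len(diffs), sum(diffs) / len(diffs)


percent_exact, mean_abs_diff = agreement(observer_a, observer_b)
print(percent_exact, mean_abs_diff)  # 60.0 0.4
```

Low agreement would suggest the descriptors need tighter anchoring before the main field day.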

REDESIGNED ENQUIRY.

Hypothesis: 'Urban inequality (measured by EQS, deprivation, pedestrian access + retail mix) varies systematically across [city] in line with multi-nuclei urban model + concentric-zone deprivation pattern.'

Sites. 12 sites across the city — 3 sites along each of 4 different transects (north, south, east, west) — chosen by SYSTEMATIC sampling at 500 m intervals along each transect.

Primary data.

EQS at each site with anchored Likert 1-5 descriptors; 2 observers per site; inter-observer reliability check.

Pedestrian + traffic counts at 3 time slots per day (08:30, 12:30, 16:30) on 2 days (weekday + Saturday) — 6 counts per site.

Annotated field sketches at each site.

Fixed-point photographs at each site, looking same direction.

Structured questionnaire (5 questions, Likert + open) with 30+ respondents per site, stratified by age + ethnicity.

Secondary data.

UK Census 2021 (ONS) — population, ethnicity, housing tenure, deprivation by output area.

UK Index of Multiple Deprivation (IMD) by lower-layer super output area.

Council planning portal — development applications + density.

UK Department for Transport — traffic data on main roads.

ONS journey-to-work data — commuter flows.

Historic OS maps — urban change over past century.

Academic literature on urban inequality.

Methodology.

Pilot study at 1 site before main field day.

All sites visited in same week (consistency).

2 observers per site; inter-observer reliability checks.

Calibrated equipment; pre-designed recording sheets.

Repeat measurements 3+ × per variable.

Risk assessment documented + signed by teacher.

Processing + presentation.

Tables with means, ranges, IQRs per site.

Stacked bar chart of EQS scores by site.

Choropleth map of IMD across the city with study transects overlaid (sequential colour, 5 class intervals, 5 map elements).

Scatter graph of EQS vs IMD (per site).

Line graphs of variables along each transect.

Annotated photos + sketches.

Analysis.

Spearman's rank between distance from CBD + EQS, IMD, pedestrian count.

Identify + explain outliers (e.g. shopping centres = multi-nuclei nodes).

Apply theory: BURGESS + MULTI-NUCLEI + central-place + environmental injustice.

Integrate primary + secondary findings.

Conclusion.

4-part structure: verdict + figures + theory + implications.

Multi-hypothesis if multiple hypotheses tested.

Cautious causal language.

Wider implications for urban planning + policy + equity.

Evaluation.

Acknowledge limitations (sample size, time, observer, confounding).

Propose specific further improvements (more weeks, multi-city replication, mobile-phone movement data).

Distinguish limitations from biases.

Triangulation.

Primary + secondary + statistical + qualitative + theoretical.

This redesign addresses ALL 5 evaluation categories + 5+ confounding variables + adopts inter-observer reliability + pilot study + systematic sampling + comprehensive secondary data. Pearson 4GE1 mark schemes would credit this as gold-standard fieldwork design.
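The sampling frame in the redesign can be written out as a short Python sketch to check the arithmetic (4 transects × 3 sites = 12 sites; 3 time slots × 2 days = 6 counts per site). Placing the first site 500 m from the CBD is an assumption; the model answer only specifies the 500 m interval.

```python
TRANSECTS = ["north", "south", "east", "west"]
SITES_PER_TRANSECT = 3
INTERVAL_M = 500  # systematic sampling interval along each transect
TIME_SLOTS = ["08:30", "12:30", "16:30"]
DAYS = ["weekday", "Saturday"]


def site_plan():
    """List every site as a transect name plus distance from the CBD."""
    plan = []
    for transect in TRANSECTS:
        for k in range(1, SITES_PER_TRANSECT + 1):
            # Assumed: first site sits one interval out from the CBD.
            plan.append({"transect": transect, "distance_m": k * INTERVAL_M})
    return plan


sites = site_plan()
counts_per_site = len(TIME_SLOTS) * len(DAYS)
total_counts = len(sites) * counts_per_site

print(len(sites), counts_per_site, total_counts)  # 12 6 72
```

Writing the plan down before the field day also supports replicability, because another team can visit exactly the same distances along each transect.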
Why this scores
Mark scheme: 1 mark per evaluation category critique (5 needed); 1 mark for bias considerations; 1 mark for redesigned sample + sampling; 1 mark for redesigned methods (calibration, pilot, reliability checks, repeats); 1 mark for secondary-data integration; 1 mark for analysis improvements; 1 mark for conclusion + evaluation improvements; 1 mark for triangulation. Top-band 8-mark answer demonstrates COMPREHENSIVE evaluation + COMPLETE redesign that addresses all 5 categories systematically.
Question 8
4GE1 Paper 2 Section B style (AO3 evaluation) 10 marks
[EXTENDED] 'The MARK OF METHODOLOGICAL MATURITY is the honesty with which one EVALUATES one's own enquiry.' Discuss this statement with reference to fieldwork in geography. [10]
Model answer
Thesis. The statement captures a fundamental epistemological insight: SCIENCE PROGRESSES THROUGH HONEST SELF-CRITIQUE. A study that ACKNOWLEDGES its limitations is more reliable than one that hides them — even though the acknowledgement may seem to weaken the conclusion. Pearson 4GE1 mark schemes consistently credit this discipline; the strongest fieldwork enquiries are not those with the cleanest results but those with the most HONEST EVALUATION.

Why honest evaluation matters — five reasons.

(1) Reliability + trust. A conclusion grounded in HONEST evaluation tells the reader EXACTLY what to trust + EXACTLY where to be sceptical. The reader can calibrate confidence to the evidence. A conclusion that hides limitations forces the reader to take it on faith.

(2) Improvement guidance. Honest evaluation IDENTIFIES specific weaknesses + their solutions. 'My sample was 5 sites; future research should use 10+ systematic at 200 m intervals' is concrete + actionable. Hidden limitations leave the field stuck.

(3) Replication + cumulative knowledge. Science advances when studies BUILD on each other. A study that acknowledges limitations enables the next study to ADDRESS them. A study that hides limitations stalls progress.

(4) Distinguishing 'consistent with' from 'proves'. Fieldwork data SUPPORTS or REJECTS hypotheses; it does not PROVE them. The honest evaluator says 'consistent with longshore-drift trapping, with the caveats that ...' — the dishonest one says 'proves engineering works.' The honest framing is more accurate.

(5) Real-world stakes. In coastal management, dam design, beach replenishment + flood-defence planning, OVER-CONFIDENT conclusions can lead to catastrophe. Hurricane Katrina (2005), the 1953 East Anglian flood + the 2011 Tohoku tsunami all involved engineering over-confidence in protective systems. Honest evaluation acknowledging uncertainty produces SAFER decisions.

The 5 categories of honest evaluation.

(1) SAMPLE. Was the sample large enough + spatially + temporally + representatively sampled? Most school fieldwork has small samples + one-day timing — acknowledge.

(2) SOURCES. Were primary methods well-calibrated? Secondary sources reliable + current? Many fieldwork enquiries rely on uncalibrated equipment + outdated secondary data — acknowledge.

(3) METHODOLOGY. Were methods replicable + standardised + repeated? Were observers consistent? Most fieldwork has methodological gaps — acknowledge.

(4) CONFOUNDING VARIABLES. What ELSE could have affected the results? Weather, recent events, demographic composition, geological variation, storm history — acknowledge.

(5) REPRODUCIBILITY. Could another team replicate the study + reach similar conclusions? Most school fieldwork has documentation gaps — acknowledge.

Distinguishing LIMITATIONS from BIASES.

A LIMITATION is something that WASN'T DONE or COULDN'T BE DONE (e.g. sample size, time, methods). Limitations are REDUCIBLE by doing more.

A BIAS is a SYSTEMATIC DISTORTION that affects how data was collected or interpreted (e.g. observer bias, self-selection bias, confirmation bias). Biases require METHODOLOGICAL CHANGE — not just more of the same.

Distinguishing them is itself a sign of methodological maturity.

Examples from 4GE1 fieldwork.

(a) Holderness coastal fieldwork. A study finds beach width 47 m updrift + 18 m downdrift, supporting longshore-drift trapping. Honest evaluation:

'My fieldwork SUPPORTS the hypothesis. LIMITATIONS: 5 sites + one day; cliff-edge definition subjective; recent storm history not controlled. BIASES: I was measuring with the hypothesis already in mind, so I may have favoured measurements supporting the prediction. Improvements: more sites sampled systematically every 200 m, multi-day repetition, two observers + inter-observer reliability, integration with EA historic data. The conclusion is consistent with multi-decade evidence + theory, but my fieldwork is a snapshot. Wider context: the local engineering creates downdrift starvation at Withernsea, a regional cost that the local conclusion doesn't capture.'

This is the gold standard.

(b) Urban EQS enquiry. A student finds EQS rises with distance from CBD. Honest evaluation:

'PARTIALLY SUPPORTED with Site 4 (shopping centre) outlier explained by multi-nuclei model. LIMITATIONS: 5 sites + one transect + 1 day; questionnaire respondents self-selected. BIASES: EQS scoring is subjective; one observer may have drifted in scoring. Improvements: more transects across the city, multi-day repetition, inter-observer reliability checks, secondary-data triangulation with Census + IMD + Department for Transport data. Conclusion is consistent with multi-nuclei urban theory; further research should test multi-city replication.'

(c) Earthquake impact analysis. A student plots earthquake deaths vs magnitude + finds that Christchurch 2011 (M6.3, 185 deaths) is an outlier. Honest evaluation:

'The pattern is BROADLY consistent with deaths increasing with magnitude, but the Christchurch outlier reveals that magnitude alone is insufficient: liquefaction + building quality matter. LIMITATIONS: cross-event comparison treats different earthquakes as comparable when they are not. BIASES: media coverage shapes what we know about each event; some earthquakes (e.g. Bam 2003, Iran) are under-reported. Improvements: include confounding variables (soil, building quality, tsunami) in the analysis; integrate USGS technical + UN OCHA humanitarian + national government final data. Wider implication: the HAZARD-RISK FRAMEWORK (Risk = Hazard × Exposure × Vulnerability) explains the variation better than magnitude alone.'

The counter — does excessive evaluation undermine the conclusion?

Yes, if taken too far. A student who lists 20 limitations + concludes 'I cannot say anything with certainty' is being unhelpful. The mature response is BALANCED: acknowledge significant limitations + propose specific improvements + still draw the best supportable conclusion from the data.

PROPORTIONALITY matters. A 5% systematic measurement bias is much less serious than 'I didn't sample systematically'. The first can be corrected; the second cannot.
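To illustrate why a known systematic measurement bias is the less serious problem, the sketch below takes the 47 m and 18 m beach widths from the Holderness example and assumes, purely for illustration, that the tape over-read every distance by a constant 5%. Under that assumption the bias can be removed after the event, and the updrift-to-downdrift comparison is unchanged.

```python
OVER_READ = 0.05  # assumed: the tape over-reads every distance by 5%

# Beach widths in metres, as quoted in the Holderness example.
measured = {"updrift": 47.0, "downdrift": 18.0}

# Undo the proportional error on each reading.
corrected = {site: width / (1 + OVER_READ) for site, width in measured.items()}

ratio_measured = measured["updrift"] / measured["downdrift"]
ratio_corrected = corrected["updrift"] / corrected["downdrift"]

print({site: round(width, 1) for site, width in corrected.items()})
print(round(ratio_measured, 2), round(ratio_corrected, 2))
```

No equivalent after-the-fact fix exists for unsystematic site selection, because the missing or unrepresentative sites were never measured.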

The wider epistemological lesson.

Modern science RECOGNISES uncertainty + acknowledges it transparently.

IPCC climate reports attach CONFIDENCE LEVELS (on a scale from very low to very high) to their key statements.

Medical trials report CONFIDENCE INTERVALS alongside point estimates.

UK National Risk Register publishes likelihoods + impacts as ranges.

COVID-19 modelling included explicit uncertainty bounds.

This epistemic discipline is what distinguishes credible science from over-confident assertion. Pearson 4GE1 is training students in the SAME discipline.

Applied to school fieldwork.

School fieldwork has FAR more constraints than professional research — single days, small samples, low-cost equipment, untrained observers. The HONEST response is not to PRETEND these constraints don't matter — it's to ACKNOWLEDGE them + propose improvements + draw cautious + theory-grounded conclusions.

A student who writes 'I PROVED my hypothesis' demonstrates methodological IMMATURITY. A student who writes 'My data are CONSISTENT WITH longshore-drift trapping, with the caveats that sample size is small + the conclusion should be triangulated with EA historic data' demonstrates methodological MATURITY.

Pearson 4GE1 mark schemes consistently reward the second. This is the discipline of credible geographical enquiry.

Judgement.

The statement is BROADLY CORRECT + cuts to the heart of methodological credibility. The MARK OF METHODOLOGICAL MATURITY IS the honesty with which one evaluates one's own enquiry. Honest evaluation:

Acknowledges limitations + biases explicitly.

Distinguishes limitations from biases.

Identifies confounding variables + alternative interpretations.

Suggests specific + quantified improvements.

Maintains proportionality (significant limitations matter more than minor ones).

Frames uncertainty within an overall justified conclusion.

The strongest fieldwork enquiries are those that are HONEST + JUSTIFIED + CAUTIOUS + IMPROVED-UPON. Pearson 4GE1 mark schemes credit this consistently. The HABITS of honest self-critique — calibration, triangulation, replicability, inter-observer reliability, multi-method, secondary-data integration — are the same habits that ENABLE professional geographical research to advance our understanding of the world. Practising them in school fieldwork prepares students for the same epistemic discipline that real science applies. Honest evaluation is not weakness — it is the foundation of credible knowledge.

Conclusion of this essay. Methodological maturity manifests in the honesty + thoroughness of self-evaluation. Pearson 4GE1 mark schemes credit it consistently. The strongest 4GE1 students learn this discipline + bring it to every enquiry. Honest evaluation is the highest-leverage skill in fieldwork — it lifts conclusions from over-claimed to defensible + advances geographical knowledge through cumulative improvement.
Why this scores
Mark scheme: 2 marks for thesis + framing of honesty + reliability; 2 marks for 5 categories of evaluation explained; 2 marks for distinguishing limitations + biases with examples; 2 marks for 4GE1 worked examples (coastal, urban, hazard) demonstrating honest evaluation; 1 mark for counter-argument + proportionality; 1 mark for wider epistemological framing (IPCC, medical trials, UK risk register) + judgement. Top-band 10-mark answer integrates fieldwork evaluation practice with scientific epistemology.

Key Definitions and Keywords — Evaluation on data, methods and conclusions

Definitions to memorise and the exact keywords mark schemes credit for evaluation on data, methods and conclusions answers, sharpened from recent Pearson 4GE1 examiner reports for the 2026 sitting.

Evaluation
Examiner keyword
Honest assessment of the strengths AND limitations of methods, data + conclusions in a fieldwork enquiry.
Limitation
Examiner keyword
Something that WASN'T DONE or COULDN'T BE DONE that reduces the strength of conclusions — e.g. sample size, time, methods.
Bias
Examiner keyword
A SYSTEMATIC DISTORTION in how data was collected or interpreted — makes the data point reliably in one direction (observer bias, sampling bias, self-selection bias, confirmation bias).
Confounding variable
Examiner keyword
A third variable that affects both X + Y — can create apparent correlation without true causation.
Reproducibility
The ability of another team to repeat the study at the same sites + reach similar conclusions.
Replicability
The ability of methods to be repeated by another researcher with similar equipment + procedures.
Sample size
The number of measurements + sites + repeats. Larger samples reduce random error; they do not by themselves remove selection bias, which needs a change of sampling method.
Observer bias
Systematic distortion in how an observer collects or interprets data — reduced by anchored descriptors + multiple observers + inter-observer reliability checks.
Self-selection bias
Distortion in questionnaire / interview responses because those willing to respond are not representative of the population.
Confirmation bias
Tendency to interpret data in favour of one's hypothesis — reduced by pre-registered methods + double-blind procedures.
Pilot study
A small-scale trial of the method before the main field day to test recording sheets + timings + equipment.
Inter-observer reliability
Examiner keyword
A reliability check where two observers score the same site independently + compare; differences indicate the method needs refining.
Triangulation
Examiner keyword
Use of multiple methods, sources or perspectives to investigate the same question — agreement = robust; disagreement = informative.
Improvement plan
Specific + quantified proposals for strengthening a future study — more sites, more repeats, additional sources, statistical tests, control of confounders.

Common Mistakes and Misconceptions — Evaluation on data, methods and conclusions

The traps other students keep falling into on evaluation on data, methods and conclusions questions, taken from recent Pearson 4GE1 examiner reports and mark schemes, and how to avoid them.

✕ Vague evaluation like 'I had no time' or 'it rained'
Pearson 4GE1 Examiner Reports — Paper 2 Section B
Why it happens
Students don't think evaluation through.
How to avoid it
Vague excuses score nothing. Pearson examiners reward SPECIFIC + QUANTIFIED critique addressing the 5 categories (sample / sources / methodology / confounding / reproducibility) + improvements.
✕ Listing limitations without proposing improvements
Pearson 4GE1 Examiner Reports — Paper 2 Section B
Why it happens
Evaluation feels like the end.
How to avoid it
Every limitation should be paired with a SPECIFIC + QUANTIFIED improvement. 'Sample was 5 sites' → 'Increase to 10+ sites with systematic sampling at 200 m intervals'. Improvements earn marks alongside limitations.
✕ Conflating limitations with biases
Why it happens
They feel similar.
How to avoid it
Distinguish: LIMITATION = what wasn't done (sample, time, methods); BIAS = systematic distortion (observer, sampling, confirmation). Limitations reducible by doing more; biases need methodological change.
✕ Ignoring confounding variables
Pearson 4GE1 Examiner Reports — Paper 2 Section B
Why it happens
Students focus on the variables they measured.
How to avoid it
Always ask: what ELSE could have affected my results? Weather, season, recent events, demographic composition, geological variation, storm history. Pearson 4GE1 mark schemes specifically credit confounding-variable identification.
✕ Forgetting to address replicability / reproducibility
Why it happens
Feels separate from data quality.
How to avoid it
Ask: could another team repeat this study at the same sites + reach similar conclusions? Replicability is the public face of reliability. Pearson 4GE1 mark schemes specifically credit it.
✕ Using secondary sources without evaluating reliability
Why it happens
Government data feels authoritative.
How to avoid it
Every secondary source has strengths + weaknesses. 'ONS Census 2021 is high-quality but decadal: by 2026 the data are 5 years out of date.' Pearson examiners credit students who evaluate sources.
✕ Evaluating only positives + ignoring weaknesses
Pearson 4GE1 Examiner Reports — Paper 2 Section B
Why it happens
Students don't want to undermine their work.
How to avoid it
Honest evaluation acknowledges BOTH strengths AND limitations. Stating limitations is NOT failure — it's the mark of methodological maturity. Pearson 4GE1 mark schemes reward this honesty consistently.

Evaluation on data, methods and conclusions

Detailed Study Notes

At a glance

What you’ll learn

1. The 5 evaluation categories
2. Limitation vs bias
3. Evaluating sample
4. Evaluating methodology
5. Identifying + addressing confounding variables
6. Improvement plan