NHS Number De-duplication
Overview
A patient may have multiple registration records across different organisations, or multiple registrations within the same organisation (for example, when they de-register and re-register, or when historical records have been migrated). This document explains how the OpenSafely pipeline selects a single canonical patient record per NHS number for use in downstream analysis.
What is deduplicated?
Deduplication is applied globally across all qualifying GP organisations in England. If a patient is registered at two different GP practices with the same NHS number, only one record, the most recently registered, is selected as the canonical record for that patient.
The result is: one row per NHS number globally, representing the patient’s single most recent registration regardless of which organisation it is at.

    ROW_NUMBER() OVER (
        PARTITION BY nhs_number_raw  /* One global winner per NHS number across all organisations */
        ORDER BY
            registration_start_datetime DESC,
            model_updated_datetime DESC,
            patient_id DESC  /* Tiebreak: most recently active practice, then patient_id for determinism */
    ) AS rn

Eligibility criteria
Before deduplication, records must pass the following filters. Records that do not meet these criteria are excluded entirely.
| Criterion | Rule |
|---|---|
| NHS number | Must be present and valid (is_valid = TRUE) |
| Patient type | Regular patients only (is_regular = TRUE) |
| Registration status | Currently registered only (is_registered = TRUE) |
| Sensitive / confidential | Must not be flagged (is_sensitive = FALSE, is_confidential = FALSE) |
| Death / registration end date | Patients who died before 2009, or whose registration ended before 2009, are excluded: (p.datetime_of_death IS NULL OR YEAR(p.datetime_of_death) >= 2009) AND (p.registration_end_datetime IS NULL OR YEAR(p.registration_end_datetime) >= 2009) |
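
The criteria in the table can be read as a single predicate over one registration record. The following is a minimal sketch in Python, not the pipeline's own code: the `Registration` class and the `is_eligible` function are hypothetical names introduced for illustration, and only the field names are taken from the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Registration:
    # Hypothetical container; field names mirror the columns named in the table.
    nhs_number_raw: str
    is_valid: bool
    is_regular: bool
    is_registered: bool
    is_sensitive: bool
    is_confidential: bool
    datetime_of_death: Optional[datetime] = None
    registration_end_datetime: Optional[datetime] = None


def is_eligible(r: Registration) -> bool:
    """Return True only when the record passes every criterion in the table."""
    flags_ok = (
        bool(r.nhs_number_raw)       # NHS number must be present...
        and r.is_valid               # ...and valid
        and r.is_regular             # regular patients only
        and r.is_registered          # currently registered only
        and not r.is_sensitive       # must not be flagged sensitive
        and not r.is_confidential    # must not be flagged confidential
    )
    # A missing date passes; a date in 2008 or earlier excludes the record.
    death_ok = r.datetime_of_death is None or r.datetime_of_death.year >= 2009
    end_ok = (
        r.registration_end_datetime is None
        or r.registration_end_datetime.year >= 2009
    )
    return flags_ok and death_ok and end_ok
```

Because the date rules compare calendar years, a death or registration end on 31 December 2008 excludes the record, while one on 1 January 2009 does not.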
How the winner is selected
Where multiple eligible registrations exist for the same NHS number, across any number of organisations, the record with the most recent registration_start_datetime is selected as the global canonical record. Ties are broken by the most recent model_updated_datetime and then by the highest patient_id, so the result is deterministic.
This is evaluated across all historical registrations at all organisations, not just those updated in a given processing window. This ensures that if any registration for an NHS number is updated, the globally correct winner is always re-evaluated and maintained.
If a patient moves from one GP to another (the NHS number “moves” organisation), the newer registration becomes the winner and the previous organisation’s row is naturally overwritten on the next batch.
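
The selection rule can be tried out on a few sample rows. The sketch below uses Python's built-in `sqlite3` module with an in-memory database (window functions need SQLite 3.25 or later). The table name `eligible_registrations`, the `organisation` column, and the sample data are assumptions made for illustration. Only the `ROW_NUMBER()` expression follows the pipeline's ordering, and the pipeline's real SQL dialect may differ.

```python
import sqlite3

# Hypothetical table and rows; only the window expression mirrors the pipeline.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE eligible_registrations (
        patient_id INTEGER,
        organisation TEXT,
        nhs_number_raw TEXT,
        registration_start_datetime TEXT,  -- ISO 8601 text sorts chronologically
        model_updated_datetime TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO eligible_registrations VALUES (?, ?, ?, ?, ?)",
    [
        # NHS-A moved from practice_a to practice_b: the newer registration wins.
        (1, "practice_a", "NHS-A", "2015-03-01 00:00:00", "2015-03-02 00:00:00"),
        (2, "practice_b", "NHS-A", "2021-06-10 00:00:00", "2021-06-11 00:00:00"),
        # NHS-B has two rows tied on both datetimes: the higher patient_id wins.
        (3, "practice_a", "NHS-B", "2018-01-01 00:00:00", "2018-01-02 00:00:00"),
        (4, "practice_a", "NHS-B", "2018-01-01 00:00:00", "2018-01-02 00:00:00"),
    ],
)

winners = conn.execute(
    """
    SELECT nhs_number_raw, patient_id, organisation
    FROM (
        SELECT
            *,
            ROW_NUMBER() OVER (
                PARTITION BY nhs_number_raw
                ORDER BY registration_start_datetime DESC,
                         model_updated_datetime DESC,
                         patient_id DESC
            ) AS rn
        FROM eligible_registrations
    )
    WHERE rn = 1
    ORDER BY nhs_number_raw
    """
).fetchall()

print(winners)
```

With these rows, the winner for `NHS-A` is patient 2 at `practice_b` (latest registration start) and the winner for `NHS-B` is patient 4 (final tiebreak on `patient_id`).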
What consumers receive
The patient model exposes one row per NHS number globally, containing the most recent registration details for the winning patient. All patient identifiers (patient_id, nhs_number) are pseudonymised before delivery.
nhs_number is a stable pseudonymous hash derived solely from the raw NHS number (not from the organisation). This means the same patient will always have the same nhs_number value in patient, regardless of which organisation their current winning registration is at.
Consumers should not need to perform any further deduplication on nhs_number.
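
To illustrate why a hash derived solely from the raw NHS number stays stable when a patient changes organisation, here is a minimal sketch using a keyed SHA-256 hash from Python's standard library. The actual hashing scheme and key management used by the pipeline are not described in this document, so `SECRET_KEY`, `pseudonymise_nhs_number`, and the choice of HMAC-SHA256 are all assumptions.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; the real scheme is not documented here.
SECRET_KEY = b"example-secret-not-for-production"


def pseudonymise_nhs_number(nhs_number_raw: str) -> str:
    """Keyed hash of the raw NHS number; the organisation is deliberately not an input."""
    return hmac.new(
        SECRET_KEY, nhs_number_raw.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# The same raw NHS number yields the same pseudonym wherever the patient is registered.
while_at_practice_a = pseudonymise_nhs_number("NHS-A")
after_move_to_practice_b = pseudonymise_nhs_number("NHS-A")
other_patient = pseudonymise_nhs_number("NHS-B")
```

Since the organisation never enters the hash, joins on nhs_number remain valid across batches even when the winning registration switches practice.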

