Partner Developer Portal

Glossary

The following concepts will appear frequently when discussing iPCV.

  • DSA: Data Sharing Agreement.
  • EMIS Now: Knowledge base and customer support platform.
  • EMIS Web: Proprietary clinical software from EMIS, used by GPs to store patient data. It is also the name of the database that stores this data and that EXA imports.
  • MKB: Medical Knowledge Base.
  • Index: A central database used to track what is going on in the EMIS Web estate.
  • EXA: EMIS-X Analytics, which provides scalable access to data for analytic applications.
  • NDOO: National Data Opt-Out. A service that allows patients to opt out of their data being used for research and planning.
  • Primary Care Views (PCV): A set of views that denormalises data from EMIS Web into concepts that are easier to use.
  • Incremental Primary Care Views (iPCV): An application that pre-computes PCV to improve read times.
  • Execution Date: (_execution_date) ID of an iPCV execution. Use it to identify new data.
  • Replicator: EMIS-written software that pulls data out of the source systems.
  • Ingestor: The EMIS-written process that takes the replicated data and merges it into the lake.
  • NDOP: National Data Opt-Out Patients. When a patient has an NDOP flag, their data is excluded from secondary uses in accordance with national policy.
  • ERD: Entity Relationship Diagram, a visual representation of the relationships between data entities in a database.
  • SQL: Structured Query Language. Modern technologies have produced many flavours of SQL, but they share a common foundation, the ANSI standard.
  • Flavours: Different configurations of the same model.
  • Table formats: The way files are organised into tables. Originally Hive; other available formats include Iceberg, Delta Lake and Hudi.
  • File formats: Typically referring to one of Parquet, ORC or Avro.
  • Compression: Files can be compressed with a range of algorithms. Popular ones include ZSTD, Snappy, LZ4 and gzip.
  • Partitioning: In object storage (S3, GCS etc.), the partitioning scheme physically splits the data into different prefixes. This can improve performance, but there is a balance to strike: too many small partitions can slow reads down rather than speed them up.
  • Backfilling: The act of populating either all or a section of data historically. Usually requires a monumental amount of resources.
  • Grain: A synonym for unique primary key. The term comes from Kimball dimensional modelling.
  • Soft delete: A technique for excluding data without removing it from a data system.
  • Hard delete: As above, but actually removes the data rather than excluding it.
  • DAG: Directed Acyclic Graph. A mathematical term that is used frequently in orchestration: a series of tasks is processed in a single direction, without any loops (cycles).
  • Orchestration: Synonym for scheduling. Running tasks at a specified time in a specified order.
  • Lookup table: A table that exists to exchange one value for another. For example, it may map an identifier to its description, or one identifier to another. Typically narrow in design.
  • Bridge table: A common technique to help resolve many-to-many relationships.
  • Normalised data: A method of structuring a database to reduce redundancy and improve data integrity. It involves organising data into tables and defining relationships between them, so that duplicate data is eliminated and each piece of data is stored in only one place.
  • Denormalised data: The opposite of normalisation. In the world of analytics, compute (joins) is expensive and storage is cheap, so some duplication and redundancy is added to improve performance.
  • Natural duplicates: Rows that are exact copies of one another across the whole row. They can be removed by using DISTINCT or GROUP BY SQL syntax.
  • Unnatural duplicates: Multiple rows that share the same primary key but differ in their data. There is no simple fix for this, and the fix should be pushed upstream to the source system.
  • Time travel: The ability to see what data was in a table at a point in time. Depends on whether the table format implements it and how much history is kept.
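
The difference between natural and unnatural duplicates can be illustrated with a minimal Python sketch. The rows, the column layout and the use of the first column as the primary key are all hypothetical and are not taken from any iPCV table.

```python
from collections import defaultdict

# Hypothetical rows: (primary_key, code, value). Not real iPCV data.
rows = [
    (1, "A", 10),
    (1, "A", 10),  # natural duplicate: the whole row is identical
    (2, "B", 20),
    (2, "B", 25),  # unnatural duplicate: same key, different data
    (3, "C", 30),
]


def drop_natural_duplicates(rows):
    """Rough equivalent of SELECT DISTINCT: keep one copy of each identical row."""
    return list(dict.fromkeys(rows))


def find_unnatural_duplicates(rows, key_index=0):
    """Return keys that still have more than one row after exact copies are removed."""
    by_key = defaultdict(list)
    for row in drop_natural_duplicates(rows):
        by_key[row[key_index]].append(row)
    return {key: group for key, group in by_key.items() if len(group) > 1}


deduplicated = drop_natural_duplicates(rows)
conflicts = find_unnatural_duplicates(rows)
```

Removing exact copies is mechanical, whereas the rows left in `conflicts` cannot be resolved without knowing which version is correct, which is why that fix belongs in the source system.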
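
To make the DAG and orchestration entries concrete, the sketch below orders a small task graph with Python's standard `graphlib` module (Python 3.9+). The task names and their dependencies are invented for illustration and do not describe the actual iPCV pipeline.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
dependencies = {
    "replicate": set(),
    "ingest": {"replicate"},
    "build_views": {"ingest"},
    "publish": {"build_views"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """A graph with a loop is not a DAG and cannot be scheduled."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
```

An orchestrator does essentially this before running anything: it works out a valid order from the declared dependencies and rejects a graph that contains a cycle.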
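
The soft delete and hard delete entries can be contrasted in a few lines of Python. The record shape and the `is_deleted` flag name are assumptions made for this sketch, not the names used by any EMIS system.

```python
# Hypothetical records; the "is_deleted" flag name is an assumption.
records = [
    {"id": 1, "is_deleted": False},
    {"id": 2, "is_deleted": False},
    {"id": 3, "is_deleted": False},
]


def soft_delete(records, record_id):
    """Flag the record as deleted but keep it in the data set."""
    return [
        {**record, "is_deleted": True} if record["id"] == record_id else record
        for record in records
    ]


def hard_delete(records, record_id):
    """Physically remove the record from the data set."""
    return [record for record in records if record["id"] != record_id]


def visible(records):
    """What a consumer should read: everything not flagged as deleted."""
    return [record for record in records if not record["is_deleted"]]


after_soft = soft_delete(records, 2)
after_hard = hard_delete(records, 2)
```

After a soft delete the row is still stored and consumers must filter it out themselves; after a hard delete it is gone entirely.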