Definition

The Data Freshness model provides visibility into the currency and timeliness of data across the iPCV platform. This model tracks the start and end times for data processing, along with when data was last updated. This enables users to understand the recency of information and make informed decisions about data reliability and usage.

Information

The data model captures freshness metrics for data across various datasets and tables. It provides timestamps and indicators that help users assess whether the data meets their timeliness requirements for analysis and reporting.

In the Data Freshness model, several indicators are available to assess the currency of data:

start_datetime - Indicates when the data processing period begins, representing the start of the data range included in the refresh.
end_datetime - Shows when the data processing period ends, representing the end of the data range included in the refresh.
last_updated_datetime - Indicates when the data processing was completed and the data was last refreshed or updated in the system.

What’s changing in iPCV v2

Renaming run_status to data_freshness for Enhanced Data Quality Insights

The data model previously named run_status has been renamed to data_freshness.
This new name better reflects its purpose: providing visibility into the freshness and timeliness of your data.

Why?

The previous name, run_status, was technical and didn’t clearly convey the purpose of the model.
We renamed it to data_freshness because freshness is a recognized data quality dimension and a key pillar of data observability.
Data freshness measures how recently data has been updated for consumption, which directly impacts its accuracy and value. Stale data can lead to poor decisions, operational delays, and even financial loss.
By adopting this naming convention, we align with industry best practices and make it clear that this model helps monitor timeliness and reliability of data, enabling customers to trust their dashboards and analytics

Customer benefit

Clearer Understanding of Data Quality: The new name, data_freshness, makes it obvious that this model tracks how up-to-date your data is, improving transparency.
Better Decision-Making: Fresh data ensures dashboards and analytics reflect the latest information, reducing the risk of acting on stale data.
Alignment with Industry Standards: Using a recognized data quality dimension helps customers adopt best practices for monitoring and governance.
Improved Trust in Data: By focusing on freshness, customers can quickly identify delays or issues in data pipelines and take corrective action

Customer action

Update References: Replace any usage of run_status in queries, ETL processes, or dashboards with data_freshness.

Example:

SELECT last_updated FROM data_freshness;

Review Data Quality Checks: If you have automated checks or alerts based on run_status, update them to use data_freshness fields.

Removal of Irrelevant Columns

The MKB fields and the status column have been removed from the run status data model. In iPCV v1, these columns were included but did not provide meaningful or actionable information for customers. The status column, for example, was typically hardcoded to “complete” and did not reflect any real-time or process-specific status changes.

Why?

The removal streamlines the data model, eliminating unnecessary clutter and focusing only on columns that provide value. This change was made because the columns did not serve a practical purpose and could potentially cause confusion or require customers to handle redundant data.

Customer benefit

Simplified Data Model: Cleaner and more focused, making it easier to use and understand.
Reduced Confusion: Less risk of misinterpreting irrelevant or unused columns.
Easier Maintenance: Improved maintainability for future updates.

Customer action

Update Integrations: Revise queries, reports, or integrations that referenced the removed MKB fields or status column.
Review Logic: Check data processing logic for compatibility with the streamlined model.

Addition of Organization Column

The organization column is being added to the run status data model & will contain the organization identifier, allowing data to be filtered and accessed based on organizational permissions. In cases where the organization is not directly relevant, the column may be defaulted (e.g., to “EMIS”), but it must still be present.

Why?

To comply with Optum’s access requirements, which mandate that every table includes an organization column for authorization filtering.
To ensure customers can only access data for organizations they are authorized for, supporting secure and compliant data access.
To help customers with multiple organizations in their DSA distinguish and segment data by organization, reducing confusion and supporting clear data management.
To maintain consistency across the data model, as all other tables also use organization-based filtering.

Customer benefit

Access Control Compliance: Ensures customers can query data only for authorized organizations, supporting secure and compliant data access.
Clear Data Segmentation: Makes it easy to identify which data belongs to which organization, reducing confusion for customers with multiple organizations.
Consistent Model Structure: Aligns with other tables that use organization-based filtering, simplifying integration and maintenance.

Customer action

Update Queries: Adjust queries and reports to include the organization column for filtering and joining data.
Review Permissions: Confirm that data access aligns with organizational permissions and update any logic as needed.
Integrate with Existing Systems: Ensure that systems using the data model can handle the organization column for proper data segmentation.

Updated Timestamp Columns

The run status table now includes three timestamp columns: start date time, end date time, and last updated date time. These columns represent the range of data processed (start and end) and the completion time of the data processing (last updated). The execution date time column used previously is replaced by last updated date time, which is not present in the data models themselves but indicates when processing finished.

Why?

To provide a clear and accurate record of the data processing window for each table, ensuring transparency about what data has been processed.
To help customers understand exactly when data processing started, ended, and was last updated, supporting better data governance and auditability.
To avoid confusion from the previous execution date time, which was sometimes used for joins but is no longer suitable for that purpose.

Customer benefit

Clear Data Processing Window: Improved clarity on what data is included in each run, making it easier to track processing periods.
Enhanced Auditability: Greater transparency and traceability for compliance and troubleshooting.
Reduced Join Errors: Minimizes risk of incorrect joins by clarifying the purpose of each timestamp column

Customer action

Update Queries: Revise any queries or integrations that used execution date time to now use the new timestamp columns.
Use Last Updated for Completion Only: Rely on last updated date time solely to check when processing is finished, not for joining tables.
Reference Start/End for Data Range: Use start and end date time columns to determine the range of data processed in each run.

Examples

Get freshness status for all current datasets

SELECT
  *
FROM hive.explorer_ipcv.srv_data_freshness
WHERE is_current = true

Check freshness for a specific table

SELECT
  customer_table_name,
  last_updated_datetime,
  start_datetime,
  end_datetime
FROM hive.explorer_ipcv.srv_data_freshness
WHERE customer_table_name = 'user_in_role'