Data Quality Statement

Maintaining high data quality to enable health research is a central concern of the OneFlorida+ Data Trust team. An appointed team member is responsible for ensuring that established data quality standards are followed. These standards include procedural guidelines, technical protocols, and a system of recurring quality-checking phases.

Procedural Guidelines

Proper quality procedures include ensuring that no unauthorized protected health information (PHI) is included in a dataset. Before submitting data to the OneFlorida+ Data Trust team, partner health care institutions must complete an honest broker review of the data and provide formal sign-off via an approved form attesting that the dataset meets HIPAA requirements and the current approved IRB protocol (IRB201500466). The form is submitted along with a dataset each time changes are made to the parameters or process for data extraction on the partner side. The OneFlorida+ Data Trust team has built a software package that tracks changes in data file headers and compares them to canonical historical headers to verify that no unauthorized data elements have been added to the data feed. Additionally, an appointed member of the OneFlorida+ Data Trust team reviews each new dataset before releasing it to the team for processing.
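
The header comparison described above can be illustrated with a minimal sketch. This is not the Data Trust team's actual software package; the function names, the case-insensitive matching, and the sample column names are assumptions made for illustration only.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV document."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def compare_headers(canonical: list[str], incoming: list[str]) -> dict[str, list[str]]:
    """Report columns added to, or missing from, the incoming header.

    Names are compared case-insensitively after trimming whitespace
    (an assumption; a real feed may require exact matching).
    """
    canonical_set = {name.strip().upper() for name in canonical}
    incoming_set = {name.strip().upper() for name in incoming}
    return {
        "added": sorted(incoming_set - canonical_set),
        "missing": sorted(canonical_set - incoming_set),
    }


# Hypothetical canonical header and incoming partner file.
canonical_header = ["PATID", "BIRTH_DATE", "SEX"]
incoming_file = "patid,birth_date,sex,street_address\nP1,1980-01-01,F,123 Main St\n"

report = compare_headers(canonical_header, read_header(incoming_file))
if report["added"]:
    print("Unexpected columns, hold for review:", report["added"])
```

In this sketch, any column that appears in the incoming file but not in the canonical header is flagged so that the file can be held for manual review before processing.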

Technical Protocols

Data received from partner health care institutions comes from a heterogeneous mix of electronic health record systems and in a mixture of formats, but it is all transformed to the PCORnet Common Data Model (CDM) via OneFlorida+’s custom extract/transform/load (ETL) software. Raw fields are included in the final output data as a cross-check to verify the accuracy of the transformed data. In addition, data coded using external data standards such as ICD-9, ICD-10, LOINC, and CPT is validated by the ETL software and corrected where necessary. Dates and test values are standardized. The dataset must maintain referential integrity, which is enforced by disallowing orphaned keys and inconsistent values in replicated fields.
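
As an illustration of the referential integrity requirement, the following minimal sketch finds orphaned keys, that is, child rows whose key has no matching parent row. It is not the OneFlorida+ ETL software; the function name and the sample rows are hypothetical, and the tables are represented as simple lists of dictionaries.

```python
def find_orphaned_keys(child_rows, parent_rows, key):
    """Return child rows whose `key` value does not appear in any parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


# Hypothetical sample data: every encounter should refer to a known patient.
demographics = [
    {"PATID": "P1"},
    {"PATID": "P2"},
]
encounters = [
    {"ENCOUNTERID": "E1", "PATID": "P1"},
    {"ENCOUNTERID": "E2", "PATID": "P2"},
    {"ENCOUNTERID": "E3", "PATID": "P9"},  # no matching patient record
]

orphans = find_orphaned_keys(encounters, demographics, "PATID")
for row in orphans:
    print("Orphaned encounter:", row["ENCOUNTERID"], "references", row["PATID"])
```

A load that disallows orphaned keys would stop, or route the affected rows for correction, whenever this list is not empty.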

Quality-Checking Phases

Quality-checking phases occur at every step of the data pipeline. During the ETL process, team members verify data counts and track row-count deltas across data refreshes against historical counts to confirm that data has been loaded correctly. Once the data is loaded, it must be curated to the current standards established by PCORnet and submitted for review and certification on a quarterly basis. To submit a dataset, the OneFlorida+ Data Trust team runs SAS software that produces an Empirical Data Characterization (EDC) report once per refresh cycle for each partner and for the dataset as a whole. These reports are reviewed to find exceptions or issues requiring investigation under PCORnet’s curation checks, which are then resolved by the Data Trust team. The reports examine internal consistency in the dataset, real-world consistency (prescribing dates before a birth date, for example), and adherence to the CDM (no values outside the allowed CDM values), and they display changes in the dataset over time. All EDC exceptions must be resolved before the dataset is submitted. Once the dataset is successfully submitted and certified, the database is set to read-only, ensuring no further changes will be made, and the database is ready for authorized research staff to run queries against it.
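
Two of the checks described above, row-count deltas across refreshes and a real-world consistency check on dates, can be sketched as follows. This is an illustrative example only and is unrelated to the SAS software that produces the EDC report; the function names, the 10% tolerance, and the sample values are assumptions.

```python
from datetime import date


def row_count_deltas(previous, current, tolerance=0.10):
    """Return tables whose row count changed by more than `tolerance`.

    `previous` and `current` map table names to row counts. The result maps
    each flagged table to its fractional change (for example, -0.3 for a 30% drop).
    """
    flagged = {}
    for table in sorted(set(previous) | set(current)):
        old = previous.get(table, 0)
        new = current.get(table, 0)
        if old == 0:
            # A table with no prior rows is flagged only if rows have appeared.
            change = float("inf") if new > 0 else 0.0
        else:
            change = (new - old) / old
        if abs(change) > tolerance:
            flagged[table] = change
    return flagged


def prescriptions_before_birth(prescriptions, birth_dates):
    """Return prescribing rows dated before the patient's known birth date."""
    return [
        rx
        for rx in prescriptions
        if rx["PATID"] in birth_dates
        and rx["RX_ORDER_DATE"] < birth_dates[rx["PATID"]]
    ]


# Hypothetical counts from the previous and current refresh.
previous_counts = {"DEMOGRAPHIC": 1000, "ENCOUNTER": 5000}
current_counts = {"DEMOGRAPHIC": 1020, "ENCOUNTER": 3500}
flagged_tables = row_count_deltas(previous_counts, current_counts)

# Hypothetical prescribing rows checked against birth dates.
birth_dates = {"P1": date(1980, 1, 1)}
prescriptions = [
    {"PATID": "P1", "RX_ORDER_DATE": date(2020, 5, 1)},
    {"PATID": "P1", "RX_ORDER_DATE": date(1975, 5, 1)},  # before birth date
]
implausible = prescriptions_before_birth(prescriptions, birth_dates)
```

With these sample values, the ENCOUNTER table is flagged for a 30% drop in rows, and one prescribing row is flagged because it predates the patient's birth date.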

Relationships With Partners

In addition to internal quality phases, the OneFlorida+ Data Trust team meets regularly with data producers (partner health care institutions) and data consumers (researchers, staff running queries on the data, and those using the data in secondary systems) to correct issues and answer questions about how results are represented in the dataset. The EDC reports for each partner, along with summary statistics and deviations noted while processing the data, are delivered to the authorized technical and leadership staff at each site. This bidirectional link with those who produce and use the data ensures that data quality issues are progressively resolved with every data refresh cycle.

Ensuring data quality is a part of every step of the data life cycle, and through continuous monitoring, feedback, and improvement the OneFlorida+ team maintains its high standards.