-
An SQL database was approved for data storage, management, and analysis during the expansion. The IT system administrator physically transferred the data and the codebook to the new location on a flash drive and then promptly imported them into the new system.
Data quality is compromised and data loss may occur, biasing any further analysis. -
The original analyst examined the 2017 data to determine whether a lack of medical care was an issue in Alabama and the reasons behind it.
Results are potentially biased due to the data subset and the inherent response bias of the survey. -
BRFSS (https://www.cdc.gov/brfss/index.html) collected nationwide health data via phone surveys. An Alabama-specific subset of heart health risks was then extracted and saved as a CSV. Note: January 2018 data is an outlier/follow-up.
Sampling and response biases are inherent in BRFSS methodology and may impact reliability.
Data Types: Numerical, Categorical, and Character
Initial Data Quality: High -
A new data analyst was tasked with reviewing the master data set to identify the top three health concerns that the new facility should prioritize for heart health. During the data migration process, it was determined that the CSV file contained unreadable characters that the SQL database could not process. As a result, some rows of data failed to import correctly, leading to data loss.
Severe data loss, reduced sample size and statistical power, and an inability to reliably generalize findings. -
The data analyst found discrepancies in the import quantity and sought our assistance in identifying the missing information, analyzing the data's journey, and assessing the impact of data loss on quality. Potential ethical considerations, such as HIPAA compliance, will apply.
Note: Timeline dates are approximate due to scenario/data inconsistencies.
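
The import failure and row-count discrepancy described above could be investigated with a pre-import check along the following lines. This is a minimal sketch, not the method used in the scenario: the function name `find_unreadable_rows`, the assumed UTF-8 encoding, and the sample column names are all hypothetical, and it assumes one record per line (no quoted fields containing line breaks).

```python
import csv

def find_unreadable_rows(raw: bytes, encoding: str = "utf-8"):
    """Split CSV bytes into parsed rows and the line numbers that look unreadable.

    Hypothetical helper: a line is flagged if it cannot be decoded with the
    assumed encoding or if it contains control characters other than tab.
    """
    good_rows = []
    bad_lines = []
    for line_no, raw_line in enumerate(raw.splitlines(), start=1):
        if not raw_line.strip():
            continue  # skip blank lines
        try:
            text = raw_line.decode(encoding)
        except UnicodeDecodeError:
            bad_lines.append(line_no)
            continue
        if any(ord(ch) < 32 and ch != "\t" for ch in text):
            bad_lines.append(line_no)
            continue
        good_rows.append(next(csv.reader([text])))
    return good_rows, bad_lines

# Illustrative data only; these columns are not taken from the BRFSS file.
sample = (
    b"id,state,answer\n"
    b"1,AL,Yes\n"
    b"2,AL,\xff\xfeNo\n"   # bytes that are not valid UTF-8
    b"3,AL,No\x00\n"       # embedded NUL character
    b"4,AL,Yes\n"
)

rows, bad = find_unreadable_rows(sample)
print(f"lines that would import: {len(rows)}")
print(f"lines flagged as unreadable: {bad}")
```

Comparing the flagged line numbers with the row count reported by the database after import gives a first estimate of how many records were lost and which ones to recover from the source file.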