Notes: Making Data Science work for Clinical Reporting - Part 1
Clinical trial
Data science
Reporting
These notes cover Part 1 of a four-part course on Coursera. This part introduces the clinical trial phases and the motivation for sharing data, along with some key terms (such as the CDISC standards).
Author
Chi Zhang
Published
February 6, 2023
This is a course provided by Genentech (part of Roche) on Coursera.
Clinical trial: aim to demonstrate that drug is safe and effective (safety, efficacy)
phase 1: 10-20 people, focus on safety (healthy volunteers)
phase 2: 100, study of side effects, determine best dose
phase 3: 1000, demonstrate drug efficacy, fuller safety profile (common across multiple regions, ethnicities)
collecting data from different hospitals, hence important to ensure standards are being followed
evidence must be submitted to health authorities (FDA, EMA European medicines agency)
health authorities determine whether the drug is admitted to the market
Submit the analysis plan in advance
Why share data
regulatory requirements
scientific community interest
company internal research interest
marketing materials
Data and results sharing
Regulatory requirements (e.g. EMA requires sharing clinical trial results to gain marketing authorization for pharma products; FDA requires sharing data)
scientific community (peer review to check accuracy, perform additional analyses, derive new hypotheses)
CDISC standards
CDASH (clinical data acquisition standards harmonization)
SDTM (study data tabulation model)
format for ‘raw’ data, define datasets, structures, contents, variable attributes
SEND (standard for exchange of nonclinical data)
ADaM (analysis data model)
data format for data processed for analysis (e.g. converted, imputed, derived)
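To make the SDTM vs ADaM distinction concrete, here is a minimal sketch of deriving analysis-ready rows from 'raw' tabulated records. It is only an illustration, not how the course or any CDISC tool does it: the records are made up, the function name `derive_analysis_rows` is hypothetical, and the variable names (USUBJID, VSTESTCD, VSSTRESN, PARAMCD, AVAL, BASE, CHG) are borrowed from CDISC conventions in a simplified way.

```python
# Illustrative SDTM-like vital signs records (made-up values, not real trial data).
sdtm_vs = [
    {"USUBJID": "STUDY1-001", "VSTESTCD": "SYSBP", "VISIT": "BASELINE", "VSSTRESN": 130.0},
    {"USUBJID": "STUDY1-001", "VSTESTCD": "SYSBP", "VISIT": "WEEK 4", "VSSTRESN": 122.0},
    {"USUBJID": "STUDY1-002", "VSTESTCD": "SYSBP", "VISIT": "BASELINE", "VSSTRESN": 141.0},
    {"USUBJID": "STUDY1-002", "VSTESTCD": "SYSBP", "VISIT": "WEEK 4", "VSSTRESN": 145.0},
]


def derive_analysis_rows(records):
    """Add a baseline value (BASE) and change from baseline (CHG) to each record.

    Hypothetical helper: a simplified stand-in for an ADaM-style derivation.
    """
    # Look up each subject's baseline value per test.
    baseline = {
        (r["USUBJID"], r["VSTESTCD"]): r["VSSTRESN"]
        for r in records
        if r["VISIT"] == "BASELINE"
    }
    analysis = []
    for r in records:
        base = baseline.get((r["USUBJID"], r["VSTESTCD"]))
        aval = r["VSSTRESN"]
        analysis.append({
            "USUBJID": r["USUBJID"],
            "PARAMCD": r["VSTESTCD"],
            "AVISIT": r["VISIT"],
            "AVAL": aval,
            "BASE": base,
            # Derived variable: left empty when no baseline record exists.
            "CHG": None if base is None else aval - base,
        })
    return analysis


adam_like = derive_analysis_rows(sdtm_vs)
for row in adam_like:
    print(row)
```

The input keeps the values as collected, while the output carries derived columns ready for analysis. Real ADaM datasets follow the implementation guide and the study's programming specification, which this sketch does not attempt to reproduce.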
Dictionary
e.g. nose congestion, stuffy nose, … need to be standardized
MedDRA: standard dictionary for medical conditions, events and procedures
WHO drug dictionary (for pharma agents)
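The idea of coding free-text terms against a dictionary can be sketched as a simple lookup. This is a toy example under stated assumptions: the `SYNONYMS` table and the `code_term` function are hypothetical, the standardized term is a placeholder for illustration, and real coding against MedDRA or the WHO drug dictionary uses licensed dictionaries and dedicated tooling.

```python
# Hypothetical lookup table: entries are placeholders for illustration,
# not taken from a licensed dictionary.
SYNONYMS = {
    "nose congestion": "Nasal congestion",
    "stuffy nose": "Nasal congestion",
    "blocked nose": "Nasal congestion",
}


def code_term(verbatim):
    """Map a reported (verbatim) term to a standardized term, or None if unknown."""
    # Normalize case and whitespace before the lookup.
    key = " ".join(verbatim.lower().split())
    return SYNONYMS.get(key)


reported = ["Stuffy nose", "nose  congestion", "headache"]
coded = [(term, code_term(term)) for term in reported]
print(coded)
```

Terms that return `None` would be flagged for manual review rather than guessed.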
SAP statistical analysis plan
based on study protocol, focus on statistical methodology, is regulated
Programming specification
based on SAP, provides additional details on datasets and tables, listings and figures (TLFs) required for statistical analysis. Focus on programming details. Not regulated
Quality assurance
Good clinical practice (GCP), issued by ICH
purpose: prevent mistakes, reduce inefficiencies/waste in a process, increase reliability/trustworthiness of the product of a process
Clinical monitoring: performed by a clinical research associate (CRA) at investigator sites, checks that study protocols are executed as intended, and that site processes result in accurate data capture. Focus on trial subjects’ safety. Traditional goal: 100% source data verification
Data quality checks (more relevant for data scientists). Checks data for technical conformance, and data plausibility. Focus on data quality. Traditional goal: 100% accurate and format compliant data
Code review
Double programming
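Double programming means two programmers derive the same result independently and the outputs are then compared. A minimal sketch of the comparison step, assuming each output is a flat dictionary of numeric results; the function name `compare_outputs`, the tolerance, and the example values are all hypothetical.

```python
import math


def compare_outputs(primary, qc, tol=1e-8):
    """Return a list of discrepancies between two independently derived results."""
    issues = []
    for key in sorted(set(primary) | set(qc)):
        if key not in primary:
            issues.append(f"{key}: missing from primary output")
        elif key not in qc:
            issues.append(f"{key}: missing from QC output")
        elif not math.isclose(primary[key], qc[key], rel_tol=0.0, abs_tol=tol):
            issues.append(f"{key}: primary={primary[key]} qc={qc[key]}")
    return issues


# Made-up summary statistics from the two programmers.
primary_result = {"n": 120, "mean_age": 54.2}
qc_result = {"n": 119, "mean_age": 54.2}

for issue in compare_outputs(primary_result, qc_result):
    print(issue)
```

An empty list means the two programs agree within tolerance; any discrepancy is investigated until both sides match.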
Data access restrictions
reasons
data collected is very sensitive (health data), need data protection
clinical trial data is a key asset and revenue predictor for pharma companies, high confidentiality levels
scientific validity: data is ideally double blinded, so no one should know which treatment a subject receives as long as the data is still being collected
pseudonymization: data de-identification
use pseudonym (ID), link is recorded to allow re-identification
anonymization: limit the risk of re-identification
remove variables, remove values, replace more precise values with more general categories, replace personal identifiers with random identifiers
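The two ideas above can be sketched in a few lines: pseudonymization replaces the identifier but keeps a link table for re-identification, while one anonymization technique replaces precise values with broader categories. This is only an illustration under assumptions: the field names (`patient_name`, `subject_id`), the `SUBJ-` prefix, and both function names are hypothetical, and real de-identification follows a formal risk assessment.

```python
import secrets


def pseudonymize(records, id_field="patient_name"):
    """Replace the identifier with a random pseudonym and return the link table."""
    link = {}  # original identifier -> pseudonym; to be stored under restricted access
    out = []
    for r in records:
        original = r[id_field]
        if original not in link:
            link[original] = "SUBJ-" + secrets.token_hex(4)
        # Copy every field except the identifier, then attach the pseudonym.
        new = {k: v for k, v in r.items() if k != id_field}
        new["subject_id"] = link[original]
        out.append(new)
    return out, link


def generalize_age(age, width=10):
    """Replace a precise age with an age band, e.g. 47 becomes '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Made-up records for illustration.
raw = [
    {"patient_name": "Jane Doe", "age": 47},
    {"patient_name": "Jane Doe", "age": 47},
    {"patient_name": "John Roe", "age": 63},
]
pseudo, link_table = pseudonymize(raw)
banded = [{**r, "age": generalize_age(r["age"])} for r in pseudo]
print(banded)
```

Keeping `link_table` makes the data pseudonymized rather than anonymized; discarding it and generalizing further lowers the re-identification risk.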
FSP (functional service provider), CRO (contract research organization): with outsourcing, personnel require data access at different levels
Unblinding
Citation
BibTeX citation:
@online{zhang2023,
author = {Zhang, Chi},
title = {Notes: {Making} {Data} {Science} Work for {Clinical}
{Reporting} - {Part} 1},
date = {2023-02-06},
url = {https://kundan-kumarr.github.io/blog/talks/technotes_20230205_clinreport_part1/},
langid = {en}
}