Abstract 2088: gCNV-Seeker: A Comprehensive Germline CNV Calling Pipeline Based on Whole Genome Sequencing Data

Publication
Cancer Research, 83(7 Supplement) 2088–2088. AACR https://doi.org/10.1158/1538-7445.AM2023-2088

Abstract: Germline Copy Number Variation (gCNV) is a type of genomic structural alteration comprising deletions or duplications of small genomic regions (50 bp to 1 Mb) and can play an important role in cancer etiology. Whole genome sequencing (WGS) is considered the most effective technology for genome-wide identification of gCNVs. However, an easy-to-use gCNV calling pipeline based on WGS data is still lacking. Here, we present gCNV-Seeker, a user-friendly, rigorous computational pipeline that detects gCNV events from WGS data, with standardized quality control and data visualization features. gCNV-Seeker first uses GATK probabilistic algorithms to detect a set of raw gCNV events and then applies the Binary Segmentation and Pruned Exact Linear Time (PELT) algorithms for re-segmentation and boundary revision of the gCNV candidates, respectively. In addition, gCNV-Seeker includes several functionalities, including quality control, filtration, annotation and visualization, to identify the gCNV regions (gCNVRs) of interest. We applied gCNV-Seeker to WGS data from 872 lung cancers in never smokers from the Sherlock-Lung study and to WGS data from 3202 samples in the 1000 Genomes Project (1KGP) as a reference, and identified gCNVRs associated with lung cancer risk in never smokers. For example, in comparison to the 1KGP WGS data, we identified several gCNVR candidates overlapping known lung cancer (LC) susceptibility genes, e.g., the GSTM1/2 homozygous deletion (OR = 1.64, 95% CI = 1.43-1.88, P < 0.01) and the CYP2A6/7 homozygous or heterozygous deletion (OR = 1.59, 95% CI = 1.24-2.03, P < 0.01). To evaluate its performance, we also carried out a comprehensive comparison of gCNV calling results between gCNV-Seeker and the 1KGP structural variant calling pipelines (PMID: 36055201) for common gCNVRs in the 1KGP WGS data. gCNV-Seeker showed a significant improvement in specificity and sensitivity for gCNVR detection compared with the 1KGP pipeline.
In terms of specificity, 1315 of the 2692 GSTM1/2 heterozygous deletion events (HGSV_9692) originally identified by 1KGP (48.85%) were called as homozygous deletions by gCNV-Seeker. These results were further confirmed by manual inspection. In terms of sensitivity, gCNV-Seeker detected a total of 299 deletion events in the CYP2A6/7 region, which were also manually confirmed. This corresponds to an increase of 48.02% over the 202 deletion events (HGSV_232739 & HGSV_232740) originally identified by the 1KGP pipeline. gCNV-Seeker will be a publicly available cross-platform (Linux and IOS) pipeline accessible on GitHub.
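
As a quick arithmetic check, the two percentages quoted above can be recomputed from the event counts given in the abstract. The Python sketch below does only that; the helper names percent and percent_increase are illustrative and are not part of gCNV-Seeker.

```python
def percent(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


def percent_increase(new: int, old: int) -> float:
    """Return the relative increase from old to new, as a percentage."""
    return 100.0 * (new - old) / old


# Event counts as quoted in the abstract.
# GSTM1/2: heterozygous deletions in 1KGP that were re-called as homozygous.
reclassified = percent(1315, 2692)
# CYP2A6/7: deletion events detected by gCNV-Seeker (299) versus 1KGP (202).
gain = percent_increase(299, 202)

print(f"GSTM1/2 re-called as homozygous: {reclassified:.2f}%")
print(f"CYP2A6/7 additional deletion events: {gain:.2f}%")
```

Both values agree with the figures reported in the abstract (48.85% and 48.02%).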
