Open Access Journal of Genome Biology and Bioinformatics
From Oncogenetic Pedigrees to Family Profiles: A Necessary Step to Enable Statistics
Published on: 2018-09-03
Background: Cancer has always been a major domain requiring progress in statistics, methodology and bioinformatics. Oncogenetics, which focuses on the relationship between genetics and cancer, is particularly concerned with “big data” issues, including genealogical pedigrees: their special structure, made of relations between members and possible clinical annotations, is too complex to be used directly for statistical purposes. This article describes a way to condense pedigrees so that they can be handled more easily and compared with one another.
Method: Our approach aggregates the genealogical and clinical information of pedigrees containing many generations. Condensed pedigrees, called “subtrees”, are built from basic 2- or 3-generation pedigrees: for one whole pedigree, the subtree is computed as the mean of all the basic pedigrees it contains. These subtrees can then be grouped together for different subsets of families (for example, breast/ovarian cancer families with or without a BRCA mutation carrier). Such a grouping, named a “profile”, is of particular interest because, in addition to its reduced structure, it provides a mean and a standard deviation for each studied characteristic. Moreover, distances between each subtree and various profiles can be calculated and used as a discriminant index.
Results: Subtrees and profiles were validated using a subset of 454 families (22,348 members) with Lynch syndrome; in 84 of these families, at least one member carried a deleterious MMR mutation. Two profiles were computed, one for families with an MMR mutation and one for families without. A ROC analysis showed that the distances between each family subtree and the two profiles were significant predictors of MMR mutations.
Conclusion: Subtrees and profiles show interesting discriminant properties for the study of pedigree data. This method seems suitable for searching for population differences between monogenic and multigenic cancer risk models.
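The aggregation described in the Method section can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: each basic pedigree is taken to be already encoded as a fixed-length numeric vector (the encoding and the toy values are hypothetical), the profile uses the population standard deviation, and the distance is a standardized Euclidean distance chosen only for illustration, since the abstract does not specify the metric. The function names are illustrative and are not the authors' implementation.

```python
from math import sqrt
from statistics import mean, pstdev


def subtree(basic_pedigrees):
    """Condense one family: element-wise mean of its basic pedigree vectors."""
    return [mean(column) for column in zip(*basic_pedigrees)]


def profile(subtrees):
    """Group several subtrees: per-characteristic mean and standard deviation."""
    columns = list(zip(*subtrees))
    means = [mean(column) for column in columns]
    sds = [pstdev(column) for column in columns]
    return means, sds


def distance(sub, prof):
    """Standardized Euclidean distance between a subtree and a profile.

    Assumed metric, for illustration only. A characteristic with zero
    spread in the profile falls back to the unscaled difference.
    """
    means, sds = prof
    total = 0.0
    for x, m, s in zip(sub, means, sds):
        diff = (x - m) / s if s > 0 else (x - m)
        total += diff ** 2
    return sqrt(total)


# Hypothetical toy data: two characteristics per basic pedigree.
family_a = [[1.0, 2.0], [3.0, 4.0]]   # two basic pedigrees
family_b = [[4.0, 3.0]]               # one basic pedigree

subtree_a = subtree(family_a)
subtree_b = subtree(family_b)
group_profile = profile([subtree_a, subtree_b])
index_a = distance(subtree_a, group_profile)
```

In a setting like the one described in the Results, one profile would be built per group of families (for example, with and without a mutation), and the distances from a family's subtree to each profile would serve as the discriminant index.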