Biostatistics Unit
The research mandate of the Biostatistics Unit (BU) is:
"To advance the health of the nation through the application, development and promotion of statistical methods in the clinical and health research conducted by the Medical Research Council."
This mandate extends to the following institutions and individuals in order of priority:
- MRC Centres, Units, Groups and Lead Programmes
- MRC supported research
- Department of Health (National and Provincial), and national and international health agencies, e.g. Health Systems Trust, WHO
- Health researchers at tertiary institutions and NGOs
- Pharmaceutical and Private Companies
By giving priority to the MRC Units, the Biostatistics Unit is aligned with the strategic health research activities of the MRC.
The following are short summaries of selected projects that featured prominently in the research effort of the BU.
Research objectives
Collaborative research
The Biostatistics Unit aligns its collaborative research objectives with the research priorities of the Lead Programmes and Units of the MRC. The research objective of the Unit is to make an innovative scientific contribution to all statistical aspects of a project. The aim of the Unit is to allocate more than 80% of its resources to collaborative research.
A selection from the portfolio of projects and collaborations of the Unit has been made to reflect the diversity of study objectives, designs and statistical methods. The projects that have been highlighted in this 2010 report fall into the following research designs/disciplines:
- clinical trials
- pragmatic trials
- surveys
- statistical genetics
- surveillance
- data management
- other
Methodological research
In 2010 the following methodological areas were targeted to support the collaborative research effort of the Unit.
• D-optimal designs
• Joint Spatial Statistical Models
• Methods for the analysis of genetic data
• Multi-criteria decision making
• Latent variable modeling
• Miscellaneous
Optimal Designs for Drug Synergy
The objective of the PhD thesis of Gaetan Kabera was to construct designs for the precise estimation of the parameters of models useful for detecting synergy between two drugs. Because drug effects are generally modelled using logistic models, the design problems were solved using the two-variable binary logistic model (1) without interaction and (2) with interaction. Designs for the precise estimation of model parameters, i.e. designs that minimize the dispersion of the parameter estimates, are known in the statistical literature as D-optimal designs. In this research D-optimal designs were constructed analytically in simple design cases and numerically in complicated cases. Real-world examples were used to illustrate the theory and to show that optimal designs are more efficient than the classical designs often used in experimentation. The results can readily be extended to the construction of optimal designs for detecting interaction between more than two drugs, using either logistic models or several other models briefly discussed in the thesis. Part of the work was published in brief in the Contributed Paper Meetings (CPMs), ISI, Durban 2009 (see http://www.statssa.gov.za/isi2009/ScientificProgramme/IPMs/0951.pdf). Dr Kabera was awarded his doctorate in 2010 by the University of KwaZulu-Natal (UKZN). He is continuing his work on the analytic construction of D-optimal designs for the two-variable binary logistic models. This involves writing publishable papers and exploring extensions and applications in medical fields of the theoretical results that emerged from the thesis. This research is being conducted in collaboration with Professor Principal Ndlovu of the University of South Africa and Professor Linda Haines of the University of Cape Town.
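As an illustration of the D-criterion described above, the following minimal sketch computes the determinant of the Fisher information matrix for the two-variable logistic model without interaction and compares two candidate designs. The parameter values, dose levels and function names are hypothetical and are not taken from the thesis; a larger determinant corresponds to a smaller generalized variance of the parameter estimates.

```python
import math

def logistic(eta):
    """Response probability for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def information_matrix(design, beta):
    """Fisher information for logit(p) = b0 + b1*x1 + b2*x2.

    design: list of (x1, x2, weight) tuples, weights summing to 1.
    beta:   assumed (hypothetical) parameter values (b0, b1, b2).
    """
    M = [[0.0] * 3 for _ in range(3)]
    for x1, x2, w in design:
        f = (1.0, x1, x2)
        p = logistic(sum(b * v for b, v in zip(beta, f)))
        for i in range(3):
            for j in range(3):
                M[i][j] += w * p * (1.0 - p) * f[i] * f[j]
    return M

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

beta = (-2.0, 1.0, 1.0)  # hypothetical parameter values

# Hypothetical designs: doses spread over the region vs. doses bunched together.
spread = [(0, 0, 0.25), (0, 2, 0.25), (2, 0, 0.25), (2, 2, 0.25)]
clustered = [(0.9, 1.0, 1 / 3), (1.0, 0.9, 1 / 3), (1.1, 1.1, 1 / 3)]

det_spread = det3(information_matrix(spread, beta))
det_clustered = det3(information_matrix(clustered, beta))
print(det_spread, det_clustered)
```

Under these assumed parameter values the spread design has the larger determinant. A D-optimal design is the one that maximizes this determinant over all candidate designs; the sketch only compares two fixed candidates and does not perform that search.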
Joint Spatial Statistical Models
- Dr Samuel Manda is actively involved in this area. He has been developing and applying novel statistical models and methods for fitting multivariate spatial data (e.g. counts for two cancers in geographical areas). Most of the data are supplied by epidemiologists and public health scientists at the University of Leeds, United Kingdom.
- In collaboration with Dr K Zuma (HSRC), Dr Manda is now developing these models in the context of multivariate discrete outcomes to fit HIV/TB/STD co-infections at the individual level. A PhD student will be working in this area.
- Dr Samuel Manda led a study of the geographic variation in the risk of Human Immunodeficiency Virus (HIV) infection in South Africa using joint spatial modelling of multiple diseases. The data were from the South African National Antenatal Sentinel HIV and Syphilis Prevalence surveys conducted between 2007 and 2009, and were aggregated to prevalence rates at the health district level. Two contextual factors that have previously been linked to sexually transmitted infections (STIs) were selected and extracted for each of the 52 districts, viz. (1) material and social deprivation and (2) population density. Syphilis prevalence was used first, with deprivation, as a covariate in a Bayesian ecological regression model of HIV prevalence and, secondly, in a bivariate binomial spatial model treating HIV and syphilis prevalence rates as a bivariate outcome. The district map of excess spatial variation in HIV prevalence produced from the modelling points to other spatially structured lurking variables. Understanding this variation is essential for determining the districts in which resources for prevention and treatment programmes should be focussed. The work was done in collaboration with the Department of Health and has been submitted for publication. Collaborator: Dr Carl Lombard
Methods for the analysis of genetic data
Ushma Galal
MSc Thesis
Title: The Statistical Theory Underlying Human Genetic Linkage Analysis based on Quantitative Data from Extended Families
Supervisors: Professor Lize van der Merwe and Professor Renette Blignaut
Background:
Traditionally in human genetic linkage analysis, extended families were only used in the analysis of dichotomous traits, such as Disease/No Disease. For quantitative traits, analyses initially focused on data from family trios (for example, mother, father, and child) or sib-pairs.
Recently, however, there have been two very important developments in genetics. First, it became clear that if the disease status of several generations of a family is known and their genetic information is obtained, researchers can pinpoint which pieces of genetic material are linked to the disease or trait. Second, it became evident that if a trait is quantitative (numerical), as blood pressure or viral load is, rather than dichotomous, one has much more power for the same sample size. This led to the development of statistical mixed models that can incorporate all the features of the data, including the degree of relationship between each pair of family members. This is necessary because a parent-child pair definitely shares half their genetic material, whereas a pair of first cousins shares, on average, only an eighth. These statistical methods have, however, been developed by geneticists for their specific studies, so there does not seem to be a unified and general description of the underlying theory.
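The sharing fractions quoted above can be reproduced with a short path-counting calculation. This is a minimal sketch, assuming no inbreeding in the pedigree; the function name and the way paths are encoded are illustrative and are not taken from the thesis. Each path linking two relatives contributes one half raised to the number of parent-child steps on that path.

```python
def expected_sharing(paths):
    """Expected fraction of genetic material shared by two relatives.

    paths: the number of parent-child steps (meioses) on each path that
    links the two relatives, either directly (parent and child) or
    through a common ancestor. Assumes no inbreeding.
    """
    return sum(0.5 ** steps for steps in paths)

parent_child = expected_sharing([1])      # one direct step
full_sibs = expected_sharing([2, 2])      # one path via each parent
first_cousins = expected_sharing([4, 4])  # one path via each shared grandparent

print(parent_child, full_sibs, first_cousins)
```

In a mixed model for an extended family, values of this kind populate the relationship matrix that scales the covariance between each pair of relatives.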
Final thesis:
The aim of this dissertation was to explain, in a unified and statistically comprehensive manner, the theory involved in the analysis of quantitative trait genetic data from extended families. The focus is on linkage analysis: what it is and what it aims to do. There is a step-by-step build-up to it, starting with an introduction to genetic epidemiology that includes an explanation of the relevant genetic terminology. There is also an application section in which an appropriate human genetic family dataset is analysed to illustrate the methods explained in the theory sections.
Multi-criteria decision making
Modelling Human Judgment
Methodology was developed by Dr Piet Becker for constructing a measuring instrument (metric) to evaluate subjects from a population that cannot provide data to facilitate the development of such a metric, e.g. pre-term infants in a neonatal intensive care unit. Central to this methodology is the reliance on an expert group to decide on the items to be included in the metric, and on an expert group to provide values, using the visual analogue scale, for the pairwise importance between items. The geometric mean method, a well-established procedure in multi-criteria decision analysis, is applied to establish standardized item weights that contribute to a more reliable composite score for items that were assessed on a Likert scale. This methodology was applied, and results were published, across a range of research areas: stress levels of pre-term infants in ICU, a computerized tutorial performance evaluation instrument, the cost and benefit of prophylaxis against deep-vein thrombosis in elective hip replacement, and a simple scoring method to screen children for tuberculosis. In these projects the Biostatistics Unit facilitated the expert group workshops to gather data on the pairwise importance of items, after which these data were modelled and standardized item weights were determined.
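The geometric mean step can be sketched as follows. This is a minimal illustration, assuming the expert judgments have already been converted into a reciprocal matrix of importance ratios; how the visual analogue scale readings are mapped to ratios is not described here, and the matrix below is hypothetical. Each item's weight is the geometric mean of its row, rescaled so that the weights sum to one.

```python
import math

def geometric_mean_weights(pairwise):
    """Standardized item weights from a reciprocal pairwise-importance matrix.

    pairwise[i][j] is how many times more important item i is judged to be
    than item j, so pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    row_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(row_means)
    return [g / total for g in row_means]

# Hypothetical, perfectly consistent judgments for three items whose
# underlying weights are 0.5, 0.3 and 0.2 (entries are the ratios w_i / w_j).
true_weights = [0.5, 0.3, 0.2]
pairwise = [[wi / wj for wj in true_weights] for wi in true_weights]

weights = geometric_mean_weights(pairwise)
print(weights)
```

With perfectly consistent judgments the method recovers the underlying weights; with real expert data the judgments are inconsistent to some degree and the geometric mean acts as an averaging step.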
The Analytical Hierarchy Process
The Analytical Hierarchy Process (AHP) is a multi-criteria decision making procedure introduced in the 1970s by Thomas Saaty for ranking items from the most important to the least important. The methods generally used lack the capability of measuring variability. Dr Gaetan Kabera, in collaboration with Professor Linda Haines of the University of Cape Town, is investigating statistical approaches to the AHP and seeking real-world statistical applications, mainly in medical fields. They have already published a peer-reviewed conference proceedings paper (SASA 2010) using statistical methods based on the logistic and log-logistic distributions. Currently, he is investigating the analysis of judgment matrices in the AHP based on the Kullback-Leibler distance.
Latent variable modeling
The idea of latent variables captures a wide variety of statistical concepts, including random effects, missing data, sources of variation in hierarchical data, finite mixtures, latent classes and clusters. The paragraphs below mention some of the concepts in the latent variable modeling framework Esme Jordaan has investigated.
As a start, she attended courses on latent curve models by K Bollen as well as an advanced Structural Equation Modelling (SEM) course by Randall Schumacker in 2008.
Following up on these two courses, she attended a discussion at the 57th International Statistical Institute Conference last year on Partial Least Squares path modeling (PLS-PM) by L Trinchera from the University of Naples, Italy. The talk was about a new method for detecting unobserved heterogeneity in PLS-PM (Response Based procedure for detecting Unit Segments in PLS-PM).
Also at the ISI conference, she gave a talk related to latent variable modeling titled “A Nonlinear Mixed Model Approach to Item Response Modelling: A Rating scale for non-verbal communication”. This work was done for a project in collaboration with Dana Niehaus, from the Department of Psychiatry, US. As a follow-up she presented a SEM course in Durban to participants from UKZN, the Biostatistics Unit and CAPRISA. The course covered the model formulation, basic and more advanced topics, and various examples, as well as programs to analyze SEMs.
A special application of SEM involved the validation of dietary questionnaire assessments using the method of triads formulated in terms of SEM (Kaaks). The method was used to assess the validity of a short questionnaire to measure iron intake among young women. The method of triads refers to the triangular comparison among the short questionnaire, the food frequency questionnaire and a biomarker. The analysis formed part of the PhD of Ernesta Kunneke from UWC. This analysis was also presented at the seminar series at UCT.
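The triangular comparison can be written down directly. The sketch below uses the usual formulation of the method of triads, in which the validity coefficient of each measurement against the unobserved true intake is obtained from the three pairwise correlations, assuming positive correlations and independent measurement errors. The correlation values and names are hypothetical and are not results from the study.

```python
import math

def triads_validity(r_qr, r_qb, r_rb):
    """Validity coefficients from the three pairwise correlations.

    Q = short questionnaire, R = food frequency questionnaire,
    B = biomarker. Returns (rho_Q, rho_R, rho_B), the correlation of each
    measurement with the unobserved true intake.
    """
    rho_q = math.sqrt(r_qr * r_qb / r_rb)
    rho_r = math.sqrt(r_qr * r_rb / r_qb)
    rho_b = math.sqrt(r_qb * r_rb / r_qr)
    return rho_q, rho_r, rho_b

# Hypothetical pairwise correlations between the three measurements.
r_qr, r_qb, r_rb = 0.5, 0.3, 0.4
rho_q, rho_r, rho_b = triads_validity(r_qr, r_qb, r_rb)
print(rho_q, rho_r, rho_b)
```

Each pairwise correlation is the product of the two corresponding validity coefficients, which is the structure a one-factor SEM with three indicators imposes. With real data an estimated coefficient can exceed 1, which signals that the assumptions do not hold for that sample.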
To implement the various techniques, she investigated various statistical procedures in SAS, SPSS and LISREL.
Miscellaneous
Multistate Markov Models
Tarylee Reddy
- She has been approached to work on a study assessing the progression of HIV infection in infants, in collaboration with Professor Anna Coutsoudis and Dr Gurpreet Kindra of the Department of Paediatrics and Child Health, University of KwaZulu-Natal. This study is of particular interest to her as it necessitates the application of multistate Markov models. The states of the model are the four WHO-defined HIV stages, with death as a fifth, absorbing state. The effect of gender, maternal age at birth and feeding on the rate of progression will be determined using an extension of the Cox proportional hazards model.
- Collaborative research with the HIV Pathogenesis Programme (HPP) aimed to describe immune deterioration in treatment-naïve HIV-positive adults enrolled in the Sinikithemba study. HIV disease progression was investigated by applying a five-state Markov model, accommodating reverse transitions, to data on HIV-infected individuals in KwaZulu-Natal.
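The structure of such a five-state model can be illustrated with a small discrete-time example. This is only a sketch: the transition probabilities are hypothetical, the time step is arbitrary, and the studies above are more likely to use continuous-time transition intensities with covariate effects, which are not shown here. The example has four disease stages with forward and reverse transitions and death as an absorbing state.

```python
STATES = ["stage 1", "stage 2", "stage 3", "stage 4", "death"]

# Hypothetical one-step transition probabilities (each row sums to 1).
# Entries below the diagonal are reverse transitions to a milder stage.
P = [
    [0.85, 0.10, 0.03, 0.01, 0.01],
    [0.05, 0.80, 0.10, 0.03, 0.02],
    [0.01, 0.06, 0.78, 0.10, 0.05],
    [0.00, 0.02, 0.08, 0.75, 0.15],
    [0.00, 0.00, 0.00, 0.00, 1.00],  # death is absorbing
]

def step(dist, P):
    """Advance a state distribution by one time step."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def occupancy(start, P, n_steps):
    """State distribution after n_steps, starting from distribution start."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist

start = [1.0, 0.0, 0.0, 0.0, 0.0]  # everyone begins in stage 1
after_1 = occupancy(start, P, 1)
after_12 = occupancy(start, P, 12)

for name, prob in zip(STATES, after_12):
    print(name, round(prob, 3))
```

Because death cannot be left once entered, its occupancy probability can only grow over time, while the probabilities for the four stages move in both directions.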
Linear combinations of multiple diagnostic markers
ROC analysis by Cathy Connolly in collaboration with Dr V Patel of the Dept of Neurology, UKZN, and K Dheda, UCT.
The development of new rapid TB tests has created interest in developing statistical techniques for combining clinical and laboratory data to improve the diagnosis of TB meningitis. The area under the ROC curve was used to select optimum cut points for the new TB test. A weighted clinical score was generated from the coefficients of a logistic regression model. Using a likelihood ratio test, the incremental value of adding the laboratory results to clinical findings was tested and showed a significant improvement in diagnostic accuracy. Ongoing work is exploring ways of combining several TB screening results using linear discriminant analysis, a distribution-free method and logistic regression, and assessing whether test performance differs by patient covariates such as CD4 count. Two papers have been published and one has been submitted.
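The two ROC quantities mentioned above can be sketched in a few lines. The scores below are hypothetical, and the use of Youden's index to pick the cut point is an assumption made for illustration; the criterion actually used in the study is not stated. The area under the curve is computed as the proportion of case-control pairs in which the case has the higher score, with ties counted as one half.

```python
def auc(cases, controls):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def best_cut_point(cases, controls):
    """Cut point maximizing Youden's J = sensitivity + specificity - 1.

    A subject tests positive when the score is >= the cut point.
    Returns (cut, J); the lowest cut is kept if several tie.
    """
    best = None
    for cut in sorted(set(cases) | set(controls)):
        sens = sum(c >= cut for c in cases) / len(cases)
        spec = sum(k < cut for k in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cut, j)
    return best

# Hypothetical test scores for patients with and without the disease.
cases = [0.9, 0.8, 0.7, 0.6, 0.4]
controls = [0.5, 0.45, 0.3, 0.2, 0.1]

area = auc(cases, controls)
cut, youden = best_cut_point(cases, controls)
print(area, cut, youden)
```

The same two functions apply unchanged to a combined score, for example the linear predictor from a logistic regression on clinical and laboratory variables, which is one way of comparing a combined marker with its components.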
Flexible Modelling of Clustered and Stratified National Sample Survey Data (e.g. SA-DHS, HSRC data sets)
Dr Manda, independently or in collaboration with researchers from the HSRC, New Zealand and the USA, has investigated (and continues to investigate) novel models that flexibly capture the inherent clustering and stratification of these data (e.g. transition to teenage motherhood in SA; child survival in SA; clustering effects on child survival in Malawi), in addition to accounting for the weighting of the sampled data.
Relative Survival Using HIV and TB cohorts in South Africa
With colleagues in the BSU and researchers from the TB Epidemiology and Intervention Research Unit (MRC) and the University of Limpopo (MEDUNSA), Dr Manda is introducing novel statistical methods to measure excess mortality among HIV or TB patient cohorts compared to general mortality in South Africa.
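The basic quantities behind relative survival can be illustrated as follows. This sketch shows only the textbook definitions, not the novel methods referred to above, and all numbers are hypothetical: interval-specific survival observed in a patient cohort is divided by the survival expected in the general population, and the excess hazard per interval is the negative logarithm of that ratio.

```python
import math

# Hypothetical yearly survival proportions for three follow-up intervals.
observed = [0.80, 0.85, 0.90]  # patient cohort
expected = [0.98, 0.98, 0.97]  # matched general population (life tables)

def cumulative_relative_survival(observed, expected):
    """Cumulative relative survival at the end of each interval."""
    cumulative, out = 1.0, []
    for p_obs, p_exp in zip(observed, expected):
        cumulative *= p_obs / p_exp
        out.append(cumulative)
    return out

def excess_hazard(observed, expected):
    """Excess mortality rate per interval (interval length taken as 1)."""
    return [-math.log(p_obs / p_exp) for p_obs, p_exp in zip(observed, expected)]

rel_surv = cumulative_relative_survival(observed, expected)
excess = excess_hazard(observed, expected)
print(rel_surv)
print(excess)
```

A relative survival below 1 and a positive excess hazard both indicate mortality in the cohort over and above what would be expected from general mortality.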
Statistics for Profiling Health Providers
Dr Manda is investigating the issues and problems in constructing reliable and valid league tables of health institution and hospital performance, with a view to developing robust performance statistics. These issues are being examined using ARV success indicators in South Africa.
Advanced Techniques for Modelling Maternal and Child Health in Africa
Dr Manda worked on two chapters for a manuscript with the above-mentioned title:
Manda SOM, Meyer R, Cai B (2010). A Semi-parametric Hierarchical Stratified Model of Determinants on Transition to Teenage Motherhood in South Africa.
Manda SOM (2010). Macro Determinants of Spatial Variation in Childhood Mortality in South Africa using Flexible Bayesian Gaussian Mixture
In Ngianga-Bakwin Kandala, Khaled Khatab (eds), Springer, Forthcoming.
Modern Methods for Epidemiology
Dr Manda worked on chapters in another manuscript with the above-stated title:
Manda SOM, Feltbower RG (2010). Bivariate Frailty Model for Spatially Dependent Survival Data.
Feltbower RG, Manda SOM (2010). Bayesian Bivariate Disease Mapping.
In Yu-Kang Tu, Darren C Greenwood (eds), Springer, Forthcoming.