  1. Pathophysiological significance and therapeutic targeting of germinal center kinase in diffuse large B-cell lymphoma.

    Blood 128(2):239 (2016) PMID 27151888 PMCID PMC4946202

    Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma, yet 40% to 50% of patients will eventually succumb to their disease, demonstrating a pressing need for novel therapeutic options. Gene expression profiling has identified messenger RNAs that lead to transfo...
  2. Sparse regression and marginal testing using cluster prototypes.

    Biostatistics 17(2):364 (2016) PMID 26614384

    We propose a new approach for sparse regression and marginal testing, for data with correlated features. Our procedure first clusters the features, and then chooses as the cluster prototype the most informative feature in that cluster. Then we apply either sparse regression (lasso) or marginal s...
  3. QnAs with Robert Tibshirani.

    PNAS 112(25):7621 (2015) PMID 26100896 PMCID PMC4485129

  4. Collaborative regression.

    Biostatistics 16(2):326 (2015) PMID 25406332 PMCID PMC4441100

    We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with this type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of th...
  5. Molecular subtyping for clinically defined breast cancer subgroups.

    Breast Cancer Research (Online Edition) 17(1):29 (2015) PMID 25849221 PMCID PMC4365540

    Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the ...
  6. Pancancer analysis of DNA methylation-driven genes using MethylMix.

    Genome Biology 16(1):17 (2015) PMID 25631659 PMCID PMC4365533

    Aberrant DNA methylation is an important mechanism that contributes to oncogenesis. Yet, few algorithms exist that exploit this vast dataset to identify hypo- and hypermethylated genes in cancer. We developed a novel computational algorithm called MethylMix to identify differentially methylated ...
  7. Quantitative SD-OCT Imaging Biomarkers as Indicators of Age-Related Macular Degeneration Progression.

    Investigative Ophthalmology & Visual Science 55(11):7093 (2014) PMID 25301882

    We developed a statistical model based on quantitative characteristics of drusen to estimate the likelihood of conversion from early and intermediate age-related macular degeneration (AMD) to its advanced exudative form (AMD progression) in the short term (less than 5 years), a crucial task to e...
  8. Active idiotypic vaccination versus control immunotherapy for follicular lymphoma.

    Journal of Clinical Oncology 32(17):1797 (2014) PMID 24799467 PMCID PMC4039868

    Idiotypes (Ids), the unique portions of tumor immunoglobulins, can serve as targets for passive and active immunotherapies for lymphoma. We performed a multicenter, randomized trial comparing a specific vaccine (MyVax), comprising Id chemically coupled to keyhole limpet hemocyanin (KLH) plus gra...
  9. A multicentre study of primary breast diffuse large B-cell lymphoma in the rituximab era.

    British Journal of Haematology 165(3):358 (2014) PMID 24467658 PMCID PMC3990235

    Primary breast diffuse large B-cell lymphoma (DLBCL) is a rare subtype of non-Hodgkin lymphoma (NHL) with limited data on pathology and outcome. A multicentre retrospective study was undertaken to determine prognostic factors and the incidence of central nervous system (CNS) relapses. Data was r...
  10. Increasing value and reducing waste in research design, conduct, and analysis.

    The Lancet 383(9912):166 (2014) PMID 24411645 PMCID PMC4697939

    Correctable weaknesses in the design, conduct, and analysis of biomedical and public health research studies can produce misleading results and waste valuable resources. Small effects can be difficult to distinguish from bias introduced by study design and analyses. An absence of detailed writte...
  11. A shared transcriptional program in early breast neoplasias despite genetic and clinical distinctions.

    Genome Biology 15(5):R71 (2014) PMID 24887547 PMCID PMC4072957

    The earliest recognizable stages of breast neoplasia are lesions that represent a heterogeneous collection of epithelial proliferations currently classified based on morphology. Their role in the development of breast cancer is not well understood but insight into the critical events at this ear...
  12. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data.

    Statistical Methods in Medical Research 22(5):519 (2013) PMID 22127579 PMCID PMC4605138

    We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especia...
  13. Classification of patients from time-course gene expression.

    Biostatistics 14(1):87 (2013) PMID 22926914 PMCID PMC3520502

    Classifying patients into different risk groups based on their genomic measurements can help clinicians design appropriate clinical treatment plans. To produce such a classification, gene expression data were collected on a cohort of burn patients, who were monitored across multiple time points....
  14. Scientific research in the age of omics: the good, the bad, and the sloppy.

    Journal of the American Medical Informatics Ass... 20(1):125 (2013) PMID 23037799 PMCID PMC3555320

    It has been claimed that most research findings are false, and it is known that large-scale studies involving omics data are especially prone to errors in design, execution, and analysis. The situation is alarming because taxpayer dollars fund a substantial amount of biomedical research, and bec...
  15. Genome-wide measurement of RNA folding energies.

    Molecular Cell 48(2):169 (2012) PMID 22981864 PMCID PMC3483374

    RNA structural transitions are important in the function and regulation of RNAs. Here, we reveal a layer of transcriptome organization in the form of RNA folding energies. By probing yeast RNA structures at different temperatures, we obtained relative melting temperatures (Tm) for RNA structures...
  16. Normalization, testing, and false discovery rate estimation for RNA-sequencing data.

    Biostatistics 13(3):523 (2012) PMID 22003245 PMCID PMC3372940

    We discuss the identification of genes that are associated with an outcome in RNA sequencing and other sequence-based comparative genomic experiments. RNA-sequencing data take the form of counts, so models based on the Gaussian distribution are unsuitable. Moreover, normalization is challenging ...
  17. Transcriptional profiling of long non-coding RNAs and novel transcribed regions across a diverse panel of archived human cancers.

    Genome Biology 13(8):R75 (2012) PMID 22929540 PMCID PMC4053743

    Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs, as a new, relatively unstudied class of transcripts, provide a rich opportunity to identify both functional drivers...
  18. A fused lasso latent feature model for analyzing multi-sample aCGH data.

    Biostatistics 12(4):776 (2011) PMID 21642389 PMCID PMC3169672

    Array-based comparative genomic hybridization (aCGH) enables the measurement of DNA copy number across thousands of locations in a genome. The main goals of analyzing aCGH data are to identify the regions of copy number variation (CNV) and to quantify the amount of CNV. Although there are many m...
  19. Adaptive index models for marker-based risk stratification.

    Biostatistics 12(1):68 (2011) PMID 20663850 PMCID PMC3006126

    We use the term "index predictor" to denote a score that consists of K binary rules such as "age > 60" or "blood pressure > 120 mm Hg." The index predictor is the sum of these binary scores, yielding a value from 0 to K. Such indices are often used in clinical studies to stratify population risk:...
  20. Supervised multidimensional scaling for visualization, classification, and bipartite ranking.

    Computational Statistics & Data Analysis 55(1):789 (2011)

    Least squares multidimensional scaling (MDS) is a classical method for representing an n × n dissimilarity matrix D. One seeks a set of configuration points z ...
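
The "index predictor" described in entry 19 (a sum of K binary rules, giving a score from 0 to K) can be illustrated with a minimal sketch. It only computes the score for rules that are already given. It does not attempt the adaptive selection of rules and cutpoints that the paper itself is about. The function name `index_predictor`, the dictionary keys, and the example patient values are hypothetical, chosen for illustration; the two thresholds are the ones quoted in the abstract.

```python
from typing import Callable, Mapping, Sequence

# A rule maps one patient's measurements to True/False.
Rule = Callable[[Mapping[str, float]], bool]


def index_predictor(patient: Mapping[str, float], rules: Sequence[Rule]) -> int:
    """Return the number of binary rules the patient satisfies (0 to K)."""
    return sum(1 for rule in rules if rule(patient))


# The two example rules quoted in the abstract (K = 2).
# The keys "age" and "blood_pressure" are assumed names for this sketch.
rules = [
    lambda p: p["age"] > 60,
    lambda p: p["blood_pressure"] > 120,  # mm Hg
]

# Hypothetical patient: satisfies the age rule but not the blood pressure rule.
patient = {"age": 67, "blood_pressure": 115}
score = index_predictor(patient, rules)
print(score)
```

Higher scores mean more rules are met, so patients can be grouped into risk strata by their score.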