Statistics Seminar
Victoria Stodden, University of Illinois at Urbana-Champaign
Structuring machine learning research in data driven science
Wednesday, November 1, 2017 - 4:15pm
Biotech G01
Statistical discovery is increasingly taking place using data not collected by the discoverers and often completely in silico. This calls for new consideration of the methods and computational infrastructure that support statistical pipelines. In this talk I present CompareML, a novel framework for the statistical analysis of "organic data" as opposed to "designed data" (Kreuter & Peng 2014), which permits the direct comparison of findings that purport to answer the same statistical question. Using an example from genomics (acute leukemia; Golub et al. 1999) in which traditional approaches, surprisingly, fail at scale, I will argue that such computational frameworks are crucial to reproducible science.