John J. Gosink
I have worked in the field of bioinformatics for over 10 years. In 2011 I decided to improve my quantitative and analytic skills by taking a break from my job and enrolling in Texas A&M’s online master’s program in statistics. Consider it an extended sabbatical. A deeper understanding of statistical theory and modeling will allow me to work on interesting questions, potentially in a variety of different fields. In the meantime, I’m also working through a number of side projects in computer science and data analysis. I plan to start doing limited consulting as my classwork winds down.
My doctoral work explored the ecology, phylogeny and biogeography of a group of Arctic and Antarctic sea ice bacteria. In addition to field research, I applied phylogenetic techniques to evaluate the relationships among bacterial isolates from both poles. Subsequently, I obtained a post-doctoral fellowship investigating HIV and SIV evolution and its correlation with molecular determinants of infectivity and tissue compartmentalization. It was during this time that I realized that I could leverage my laboratory experience, DNA sequence analysis skills, and previous undergraduate computer science courses to land a position in the new and exciting field of bioinformatics. With some searching and a bit of luck I found a second post-doctoral position through Novo Nordisk in the bioinformatics group at ZymoGenetics.
Until recently I was a Senior Scientist in the Computational Biology group at Amgen. I analyzed a range of data types from a variety of biomarker platforms used in phase 1 and phase 2 clinical trials of inflammation and oncology drugs. We were interested in finding molecular markers, or panels of such markers, that predict patient response to the drugs, the pharmacodynamic effect of the drugs in a given patient, or both. A recent, particularly interesting project involved designing graphical, statistical and programmatic methods to model, in a nonparametric way, the accuracy and precision of a number of biological assays and to evaluate how efficiently they report on the clinical endpoints in the study. I was also part of a small team at Amgen that built an infrastructure to collect, annotate, marshal, quality-check, clean, and analyze large sets of high-dimensional flow cytometry data. I used this system, along with additional tools I built in the statistical programming language R, to detect both subtle and profound multidimensional changes within the data of complex flow cytometry experiments. These results were used both to improve the experimental design and to look for pharmacokinetic biomarkers in the form of temporally ordered shifts in a variety of cell populations.
My current goal is to broaden my understanding of the statistical models used to analyze large and complex data sets. Several classes offered by the Texas A&M statistics department will be particularly useful in this endeavor, including multivariate analysis, Bayesian and Markov models, and classification techniques. I also have an interest in nonparametric techniques, perhaps stemming from past encounters with unusually distributed and sparse data or from my efforts to create my own methods to analyze them. Beyond that, there are a variety of topics I want to learn more about, such as image analysis, natural language processing, machine learning in general, Python, relational database design, graph theory, and many others. My sabbatical gives me the space to learn about the things that have interested me for a long time.