Statistician / Machine Learning specialist
Researcher at Fred Hutch Cancer Center.
Teaching data science and machine learning courses in the UW Professional & Continuing Education certificate program.
Enthusiastic statistician with expertise in implementing machine learning tools (resampling, ensembles, tuning, benchmarking) for high-dimensional multiplex data structures.
Leveraging high-level-scope tools to analyze complex nested datasets, including multi-layer fused data, N-table dimension reduction methods, and integration of multiple annotation domains.
Expert in building data engineering pipelines with scalable object-oriented tools.
Designing analytical tools for complex nested experimental designs at the meta-analysis level.
Over 15 years of experience in advanced R and Python.
• Machine learning: Use of a wide range of machine learning models together with meta-aggregator tools for high-level-scope analysis: tuning, benchmarking, ensembles, and resampling (cross-validation, bootstrap, etc.), covering both specific models (xgboost, neural networks, etc.) and automated multi-model tools.
• Meta-analysis / high-level-scope tools: Using accessors to navigate nested datasets of non-atomic objects.
• Object-oriented programming: Expertise in Bioconductor S4 object-oriented classes designed for multi-assay genomic data. Developed methods and workflows for complex data integration: vertical/horizontal integration methods, gene-set enrichment, and meta-analysis.
• Cross-language interface tools: Python’s scikit-learn, Spark, pandas, and others; R/Python interoperability via reticulate, rpy2, and others.
• Git version control and reproducible research (Markdown), e.g., https://drorberel.github.io/
• Web application deployment via REST APIs and Shiny, e.g., https://dror.shinyapps.io/nomogram/
• Developed R packages: Bioc2mlr (https://github.com/drorberel/Bioc2mlr), utility functions that transform Bioconductor’s S4 object-oriented genomic classes for use with meta-aggregator machine learning packages (caret, mlr, …), with customized feature engineering pipelines built in a composable, ‘monad’-style fashion.
• Also familiar with BUGS/Stan, SAS, JMP, SPSS, Visual Basic, Java, MATLAB, SQL, and UNIX.
• Longtime member of the American Statistical Association (ASA); blogger; active member of local meetup groups (R, data science); Vice President of Education at a Toastmasters club (an international public-speaking organization). Teaching applied statistics and machine learning in the UW professional certificate program.