
Sidney Le

Data Scientist

Dascena, Inc.

Biography

Sidney Le is a data scientist and applied statistician, most recently at Dascena, Inc., developing clinical decision support systems that utilize machine learning, domain adaptation, and NLP.

Interests

  • Machine learning, with applications in the sociological space
  • NLP and its predictive capabilities
  • Social impact

Education

  • BA in Statistics, 2018

    University of California, Berkeley

Experience


Data Scientist

Dascena, Inc.

Jan 2019 – Feb 2020 Oakland, California

Designed and implemented experiments based on the analysis of predictive models, applying machine learning and deep learning in Python to large-scale clinical EHR data, including unstructured text, to drive novel health research. Consulted on statistical matters across the company, particularly clinical trial design and hypothesis testing, including A/B testing. Built data processing and analysis functions into the company's Python codebase.

Wrote and published technical papers demonstrating the novelty and significance of experimental results; developed the technical sections of grant proposals to fund large scientific and engineering projects:

  • Worked cross-functionally to construct research plans and timelines
  • Managed and processed large-scale clinical EHR data for use in analysis using a Linux machine on the AWS cloud computing platform, MongoDB, and PostgreSQL
  • Generated analysis reports and visualizations using Matplotlib/Seaborn
  • Collaborated with other teams, including engineering and sales, to communicate data needs and uses

Applied techniques include:

  • transfer and semi-supervised learning
  • deep learning (RNN, CNN, feature extraction, GAN) (implemented in Keras, TensorFlow, PyTorch)
  • NLP (Doc2Vec)
  • supervised learning (tree ensemble, generalized linear models, regression analysis, SVM)
  • unsupervised learning (clustering, dimensionality reduction)

Undergraduate Student Instructor (Data 8)

Department of Statistics at UC Berkeley

Aug 2018 – Dec 2018 Berkeley, California
Taught introductory data science as an undergraduate student instructor for one of the first, and fastest-growing, introductory data science courses at any university in the country. Responsibilities included developing course material; conducting discussion and lab sections, office hours, and exam prep sessions; and advising students on data science as an interdisciplinary field of study.

Research Associate

Goodly Labs

Feb 2018 – Jan 2019 Berkeley, California
Worked with teams of sociologists and students to develop research and social good products. Led project development and determined technical goals and timelines for Demo Watch, formerly Deciding Force. Developed a machine learning pipeline using clustering and NLP to extract sociological insight from user-generated data.

Data Consultant

Statistics Undergraduate Student Association at UC Berkeley

Aug 2017 – May 2018 Berkeley, California
Built predictive models for food need in Alameda County in collaboration with the Alameda County Community Food Bank. Participated in the CTSP Data for Good competition; evaluated affordable housing policy initiatives and research in California and produced unique tools and indexes in order to recommend solutions for the affordable housing crisis.

Skills

R

Fluent

Python

Fluent

Machine/Deep Learning

Fluent

Statistics

Fluent

SQL and PostgreSQL

Familiar

Spark and PySpark

Familiar

NLP

Familiar

Tableau

Familiar

Accomplishments

Data Scientist Certification

Workera evaluates skills used by machine learning engineers, data scientists, and software engineers in their work. It is designed to assess a candidate’s ability to perform tasks such as data engineering, modeling, deployment, business analysis, and AI infrastructure rather than test for knowledge.

Sidney placed in the:

  • 96th percentile for Machine Learning
  • 88th percentile for Data Science
  • 94th percentile for Algorithmic Coding
  • 60th percentile for Deep Learning
  • 98th percentile for Mathematics

Star Research Achievement Award

As lead author, wrote an abstract on novel sepsis prediction technology developed at Dascena, which was submitted to the Society of Critical Care Medicine. The abstract received the Star Research Achievement Award, and we were invited to present the work at the Critical Care Congress.

Data For Good Competition – Runner Up

In collaboration with Facebook, the Center for Technology, Society & Policy at UC Berkeley hosted a data science competition aimed at building data-driven solutions for social good issues. Teams of undergraduates, graduate students, and members of industry submitted proposals, four of which were selected for seed grants. My team's proposal, along with three others, was selected to compete for the final prize. Our project, which analyzed past affordable housing initiatives in California and built on existing research to create policy directives, was presented to a panel of judges that included the Public Policy Research Manager at Facebook. Our work was awarded second place.

Design Showcase

Selected by Professor Robert J. Full to present conceptual product design work based on biologically derived processes. All student-designed projects from any class at the Jacobs Institute were considered, and only about 8 of 100 groups were given spots in the design showcase. Attendees included faculty, members of industry, and members of the public.

Publications

Effect of a Sepsis Prediction Algorithm on Patient Mortality, Length of Stay and Readmission: A Prospective Multicentre Clinical Outcomes Evaluation of Real-World Patient Data From US Hospitals

The purpose of this study was to evaluate the effect of a machine learning algorithm for severe sepsis prediction on in-hospital …

24: EFFECTS OF MONOCYTE DISTRIBUTION WIDTH AND WHITE BLOOD CELL COUNT ON A SEPSIS PREDICTION ALGORITHM

Severe sepsis and septic shock are among the leading causes of death in the US, and early prediction can reduce adverse patient …

Mortality, disease progression, and disease burden of acute kidney injury in alcohol use disorder subpopulation

Objective: The objective of this study is to quantify the relationship between acute kidney injury (AKI) and alcohol use disorder …

Multicenter validation of a machine-learning algorithm for 48-h all-cause mortality prediction

In order to evaluate mortality predictions based on boosted trees, this retrospective study uses electronic medical record data from …

Pediatric Severe Sepsis Prediction Using Machine Learning

Early detection of pediatric severe sepsis is necessary in order to optimize effective treatment, and new methods are needed to …

Evaluation of a machine learning algorithm for up to 48-hour advance prediction of sepsis using six vital signs

Sepsis remains a costly and prevalent syndrome in hospitals; however, machine learning systems can increase timely sepsis detection …

Contact