Scenario: You have come up with an exciting hypothesis, and now you are keen to find and analyze as much data as possible to prove (or refute) it. There are many datasets that might be applicable, but they have been created at different times by different people and don’t conform to any common standard. They use different names for variables that mean the same thing and the same names for variables that mean different things. They use different units of measurement and different categories. Some have more variables than others. And they all have data quality issues.

In this session, we explore some of the challenges involved in doing research across multiple datasets. We offer an architecture to support dataset harmonization, search, analysis, and sharing of results or insights, using a combination of managed and serverless services such as Jupyter Notebooks and Apache Spark on Amazon EMR, Amazon Elasticsearch Service, Amazon S3, Amazon Athena, and Amazon QuickSight.
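
To make the harmonization problem concrete, here is a minimal sketch in plain Python of mapping two differently structured datasets onto one shared schema. It is not taken from the session or from the AHA platform. The dataset names (study_a, study_b), column names, category codes, and the harmonize function are all hypothetical, chosen only to illustrate renamed variables, unit conversion, category mapping, and handling of bad values.

```python
# Illustrative only: every dataset name, column name, and code below is hypothetical.
LB_TO_KG = 0.45359237  # exact definition of the international pound in kilograms

# One mapping per source dataset: source column -> (canonical name, converter).
SCHEMAS = {
    "study_a": {
        # Weight recorded in pounds, sex recorded as letters.
        "wt_lb": ("weight_kg", lambda v: float(v) * LB_TO_KG),
        "sex": ("sex", lambda v: {"M": "male", "F": "female"}.get(v, "unknown")),
    },
    "study_b": {
        # Weight already in kilograms, sex recorded as numeric codes.
        "weight": ("weight_kg", float),
        "gender": ("sex", lambda v: {"1": "male", "2": "female"}.get(str(v), "unknown")),
    },
}


def harmonize(record, dataset):
    """Translate one source record into the shared schema."""
    schema = SCHEMAS[dataset]
    out = {"source": dataset}
    for column, value in record.items():
        if column not in schema:
            continue  # variables with no canonical mapping are dropped
        name, convert = schema[column]
        try:
            out[name] = convert(value)
        except (TypeError, ValueError):
            out[name] = None  # unparseable value: keep the field, mark it missing
    return out


if __name__ == "__main__":
    print(harmonize({"wt_lb": "150", "sex": "F"}, "study_a"))
    print(harmonize({"weight": 70, "gender": 2, "notes": "x"}, "study_b"))
```

Keeping the mappings as data rather than code means a new dataset can be added by writing one more schema entry. At larger scale the same idea could be applied per row in a Spark job, though that is an assumption about how one might use the services named above, not a description of the presented architecture.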

Dr. Taha Kass-Hout, representing the American Heart Association (AHA), will take the podium to describe how AHA and AWS have worked together to implement these techniques in the recently launched AHA Precision Medicine Platform (https://precision.heart.org). The initiative brings together researchers and practitioners from around the globe to access, analyze, and share large volumes of cardiovascular and stroke data. Its aim is to accelerate clinical and population health research and to generate evidence around the care of patients at risk of cardiovascular disease, the number one killer in the United States and a leading global health threat.

This video is from the fine folks at Amazon Web Services (AWS).