Multimodal Data Analysis with AWS Health and Machine Learning Services

via aws.amazon.com

In this blog post, we show how you can use AWS purpose-built healthcare and life sciences (HCLS), machine learning (ML), and analytics services to simplify storage and analysis across genomic, health record, and medical imaging data for precision health use cases. The included reference architecture is built on AWS HealthOmics, AWS HealthImaging, and AWS HealthLake, which let you store these data modalities with a few clicks. You can also create governed databases and tables with AWS Lake Formation, which allow you to query across multiple modalities using Amazon Athena. You can then build, train, and deploy ML models with Amazon SageMaker to make real-time, personalized inferences on patient outcomes. Finally, you can build custom, interactive dashboards with Amazon QuickSight to visualize multimodal data across individual patients and cohorts.
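To make the cross-modality query step more concrete, here is a minimal Python sketch of what such a query could look like when submitted to Amazon Athena with boto3. The database, table, and column names (multimodal, patients, variants, imaging_studies, patient_id, gene, modality, age) and both helper functions are hypothetical placeholders chosen for illustration, not names from the reference architecture. The only AWS API call used is the Athena client's start_query_execution.

```python
def build_cohort_query(database: str, min_age: int) -> str:
    """Return an Athena SQL query that joins three modality tables.

    All table and column names are illustrative assumptions; replace
    them with the names registered in your own data catalog.
    """
    # Reject anything that is not a plain identifier, since the name is
    # interpolated directly into the SQL text.
    if not database.isidentifier():
        raise ValueError(f"unexpected database name: {database!r}")
    return (
        "SELECT p.patient_id, v.gene, i.modality\n"
        f"FROM {database}.patients AS p\n"
        f"JOIN {database}.variants AS v ON v.patient_id = p.patient_id\n"
        f"JOIN {database}.imaging_studies AS i ON i.patient_id = p.patient_id\n"
        f"WHERE p.age >= {int(min_age)}"
    )


def run_query(sql: str, database: str, output_s3_uri: str) -> str:
    """Submit the query to Athena and return its query execution ID."""
    # Imported here so the query builder can be used without AWS access.
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3_uri},
    )
    return response["QueryExecutionId"]
```

Calling build_cohort_query("multimodal", 50) returns the SQL text only. Passing that text to run_query requires AWS credentials and an S3 location for results that you supply. Athena runs queries asynchronously, so the returned execution ID would then be used to poll for completion and fetch results.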

HCLS customers are seeing rapid growth in patient-level data. This data is increasing in both size and diversity, with modalities that include genomic, clinical, medical imaging, medical claims, and sensor data. While multimodal data offers a comprehensive view that can improve patient outcomes and care, analyzing multiple modalities at scale to build precision health applications is challenging. First, each modality requires distinct storage infrastructure, such as Fast Healthcare Interoperability Resources (FHIR) data stores for clinical records, Digital Imaging and Communications in Medicine (DICOM) archives for medical imaging, and custom databases for genomic variant and annotation data in Variant Call Format (VCF) files. Second, not all storage modalities are accessible through common query languages like SQL, which makes it difficult to run analytical queries across data types. Third, tooling for data science and machine learning is typically not built to handle the domain-specific data infrastructures or data types these modalities present, which hinders comprehensive analytics. Finally, customers wishing to pilot precision health initiatives have difficulty accessing a coherent dataset across all modalities with enough data points to support ML development and benchmarking.