spark | Engineering Health

Scalable Data Science with FHIR

July 2, 2018

The FHIR standard started as a better way to exchange healthcare data, but it also provides a solid basis for deep analytics and machine learning at scale. This post walks through an example from the recent FHIR DevDays conference that does just that. You can also run the interactive FHIR data engineering tutorial from the conference yourself. Our first step is to bring FHIR data into a data lake, a computational environment where our analysis can efficiently work through petabytes of data.

Announcing Bunsen: FHIR Data with Apache Spark

November 27, 2017

We’re excited to open source Bunsen, a library to make analyzing FHIR data with Apache Spark simple and scalable. Bunsen encodes FHIR resources directly into Apache Spark’s native data structures. This lets users leverage well-defined FHIR data models directly within Spark SQL. Here’s a simple query against a table of FHIR observations that produces a table of heart rate values:

    spark.sql("""
      select subject.reference person_id,
             effectiveDateTime date_time,
             valueQuantity.value value
      from observations
      where in_valueset(code, 'heart_rate')
    """)
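To make the shape of that query concrete, here is a minimal pure-Python sketch of what it appears to compute: keep the observations whose code falls in a named value set, then project three fields. It does not use Spark or Bunsen, and it is not how Bunsen implements `in_valueset`. The `VALUE_SETS` table, the sample observations, and the `heart_rate_rows` helper are all hypothetical, introduced only for illustration; the one external fact assumed is that LOINC code 8867-4 denotes heart rate.

```python
# Illustration only: plain Python standing in for the Spark SQL query.
LOINC = "http://loinc.org"

# Hypothetical value set table: name -> set of (system, code) pairs.
VALUE_SETS = {"heart_rate": {(LOINC, "8867-4")}}


def in_valueset(codeable_concept, name):
    """Return True if any coding in the concept belongs to the named value set."""
    members = VALUE_SETS[name]
    return any(
        (coding.get("system"), coding.get("code")) in members
        for coding in codeable_concept.get("coding", [])
    )


# Made-up observations shaped like FHIR Observation resources.
observations = [
    {
        "subject": {"reference": "Patient/1"},
        "effectiveDateTime": "2017-11-01T08:30:00Z",
        "code": {"coding": [{"system": LOINC, "code": "8867-4"}]},
        "valueQuantity": {"value": 72, "unit": "beats/minute"},
    },
    {
        "subject": {"reference": "Patient/1"},
        "effectiveDateTime": "2017-11-01T08:31:00Z",
        "code": {"coding": [{"system": LOINC, "code": "8480-6"}]},
        "valueQuantity": {"value": 120, "unit": "mmHg"},
    },
    {
        "subject": {"reference": "Patient/2"},
        "effectiveDateTime": "2017-11-02T14:05:00Z",
        "code": {"coding": [{"system": LOINC, "code": "8867-4"}]},
        "valueQuantity": {"value": 64, "unit": "beats/minute"},
    },
]


def heart_rate_rows(obs_list):
    """Filter to heart rate observations and project the three queried columns."""
    return [
        {
            "person_id": obs["subject"]["reference"],
            "date_time": obs["effectiveDateTime"],
            "value": obs["valueQuantity"]["value"],
        }
        for obs in obs_list
        if in_valueset(obs["code"], "heart_rate")
    ]


rows = heart_rate_rows(observations)
for row in rows:
    print(row)
```

The blood pressure observation is dropped because its code is not in the `heart_rate` value set, which mirrors the `where` clause of the query. In Spark the same filter and projection would run across partitions of the `observations` table rather than over an in-memory list.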