Core Python APIs

Bunsen offers Python APIs for PySpark users working with FHIR datasets. These include basic functionality for working with FHIR concept maps, bundles, and valuesets.

FHIR Bundles

Bunsen supports loading FHIR bundles, with the following features:

  • Load bundles from a given location
  • Convert bundle entries into Spark DataFrames
  • Save all entities within a bundle collection to a distinct table for each resource type (e.g., an observation table, a condition table, and so on)
  • Convert the results of a Bunsen query back into bundles that can then be used elsewhere

See the methods below for details.

bunsen.bundles.extract_entry(sparkSession, javaRDD, resourceName)

Returns a dataset for the given entry type from the bundles.

bunsen.bundles.load_from_directory(sparkSession, path, minPartitions=1)

Returns a Java RDD of bundles loaded from the given path. Note that this RDD contains Bundle records that aren’t serializable in Python, so users should treat it only as a parameter to pass to other methods in this module, such as extract_entry.
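A minimal sketch of how load_from_directory and extract_entry might be combined, assuming Bunsen is installed and its jars are available to the Spark session. The helper name load_resource is hypothetical, and the resource name and path shown in the usage note are illustrative assumptions rather than values taken from this documentation.

```python
def load_resource(spark, bundle_dir, resource_name, min_partitions=1):
    """Load FHIR bundles from bundle_dir and return one resource type as a dataset.

    Hypothetical helper: only load_from_directory and extract_entry come
    from Bunsen; the wrapper itself is illustrative.
    """
    # Imported lazily so this function can be defined without Bunsen installed.
    from bunsen.bundles import load_from_directory, extract_entry

    # The returned Java RDD is not serializable in Python, so it is passed
    # straight through to extract_entry and never collected or inspected.
    bundle_rdd = load_from_directory(spark, bundle_dir, minPartitions=min_partitions)
    return extract_entry(spark, bundle_rdd, resource_name)
```

With an active SparkSession named spark, a call such as load_resource(spark, '/path/to/bundles', 'observation') would be expected to return the observation dataset; both arguments here are placeholders.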

bunsen.bundles.save_as_database(sparkSession, path, databaseName, *resourceNames, **kwargs)

Loads the bundles in the path and saves them to a database, where each table in the database has the same name as the resource it represents.
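As a rough illustration of the varargs signature, the sketch below wraps save_as_database and returns the table names one would expect from the description above. The wrapper name save_bundles, the database name, and the resource names in the usage note are hypothetical; the optional keyword arguments are not documented here and are left unused.

```python
def save_bundles(spark, bundle_dir, database_name, resource_names):
    """Save the given resource types from bundle_dir into database_name.

    Hypothetical wrapper around bunsen.bundles.save_as_database.
    """
    # Imported lazily so this function can be defined without Bunsen installed.
    from bunsen.bundles import save_as_database

    # Resource names are passed as positional varargs; per the description,
    # each one is saved to a table named after the resource.
    save_as_database(spark, bundle_dir, database_name, *resource_names)

    # Expected fully qualified table names, derived from that description.
    return ['{}.{}'.format(database_name, name) for name in resource_names]
```

For example, save_bundles(spark, '/path/to/bundles', 'fhir_db', ['observation', 'condition']) would be expected to produce the tables fhir_db.observation and fhir_db.condition, assuming those resource names are accepted.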

bunsen.bundles.to_bundle(sparkSession, dataset)

Converts a dataset of FHIR resources to a bundle containing those resources. Use with caution against large datasets.
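One way to act on the caution about large datasets is to check the row count before converting. This is only a sketch: the helper name to_bundle_checked and the default limit of 10000 rows are arbitrary assumptions, not Bunsen recommendations, and count() itself triggers a Spark job.

```python
def to_bundle_checked(spark, dataset, max_rows=10000):
    """Convert dataset to a bundle, refusing datasets larger than max_rows.

    Hypothetical guard around bunsen.bundles.to_bundle; max_rows is an
    arbitrary illustrative limit.
    """
    # Imported lazily so this function can be defined without Bunsen installed.
    from bunsen.bundles import to_bundle

    # DataFrame.count() runs a Spark job, so this check has a cost of its own.
    row_count = dataset.count()
    if row_count > max_rows:
        raise ValueError(
            'dataset has {} rows, above the limit of {}'.format(row_count, max_rows))

    return to_bundle(spark, dataset)
```

Filtering or limiting the dataset before calling the helper keeps the resulting bundle to a manageable size.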

FHIR Valuesets

Bunsen supports broadcasting valuesets and using them in user-defined functions in Spark queries.


Returns the current valuesets in the same form that is accepted by the push_valuesets function; that is, the structure follows this pattern: {referenceName: [(codeset, codevalue), (codeset, codevalue)]}

bunsen.valuesets.isa_loinc(code_value, loinc_version=None)

Returns a hierarchy placeholder that will load all values that are descendants of a given LOINC code.

bunsen.valuesets.isa_snomed(code_value, snomed_version=None)

Returns a hierarchy placeholder that will load all values that are descendants of a given SNOMED code.


Pops the current valuesets from the stack, returning true if there remains an active valueset, or false otherwise.

bunsen.valuesets.push_valuesets(spark_session, valueset_map, database='ontologies')

Pushes valuesets onto a stack and registers an in_valueset user-defined function that uses this content.

The valueset_map takes the form of {referenceName: [(codeset, codevalue), (codeset, codevalue)]} to specify which codesets/values are used for the given valueset reference name.
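The literal form of valueset_map can be sketched in plain Python. The example below assumes that codesets are identified by their FHIR system URIs (an assumption, not stated in this documentation); the reference names are illustrative, and check_valueset_map is a hypothetical helper that only validates the explicit tuple form.

```python
# Standard FHIR system URIs for LOINC and SNOMED CT.
LOINC = 'http://loinc.org'
SNOMED = 'http://snomed.info/sct'

# Reference name -> list of (codeset, codevalue) tuples.
valueset_map = {
    'blood_pressure': [(LOINC, '8480-6'), (LOINC, '8462-4')],
    'diabetes': [(SNOMED, '73211009')],
}


def check_valueset_map(mapping):
    """Raise ValueError unless every entry is a list of (codeset, codevalue) pairs.

    Hypothetical helper; it does not handle placeholder values.
    """
    for name, codes in mapping.items():
        for pair in codes:
            is_pair = isinstance(pair, tuple) and len(pair) == 2
            if not (is_pair and all(isinstance(part, str) for part in pair)):
                raise ValueError(
                    '{}: expected (codeset, codevalue), got {!r}'.format(name, pair))
    return mapping
```

A validated map could then be passed on, for example push_valuesets(spark, check_valueset_map(valueset_map)), given an active SparkSession named spark.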

Rather than explicitly passing a list of (codeset, codevalue) tuples, users may instead load particular value sets or particular hierarchies by providing a ValueSetPlaceholder or HierarchyPlaceholder that instructs the system to load codes belonging to a particular value set or hierarchical system, respectively. See the isa_loinc and isa_snomed functions above for details.
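The placeholder form might look like the sketch below, where each reference name maps to a hierarchy placeholder instead of a list of tuples. The wrapper name push_hierarchy_valuesets, the reference names, and the specific codes are illustrative assumptions; the call also assumes the relevant ontologies have already been loaded into the given database.

```python
def push_hierarchy_valuesets(spark, database='ontologies'):
    """Push two hierarchy-based valuesets and return the map that was pushed.

    Hypothetical wrapper; the codes below are illustrative examples.
    """
    # Imported lazily so this function can be defined without Bunsen installed.
    from bunsen.valuesets import push_valuesets, isa_loinc, isa_snomed

    valueset_map = {
        # All descendants of an (assumed) LOINC hierarchy code.
        'leukocytes': isa_loinc('LP14419-3'),
        # All descendants of the SNOMED CT concept for diabetes mellitus.
        'diabetes': isa_snomed('73211009'),
    }

    push_valuesets(spark, valueset_map, database=database)
    return valueset_map
```

Passing a different database name selects a customized ontology database instead of the default.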

Finally, ontology information is assumed to be stored in the ‘ontologies’ database by default, but users can specify another database name if they have customized ontologies that are separated from the default ontologies database.