STU3 Python APIs

Bunsen offers Python APIs for PySpark users working with FHIR datasets. This includes basic functionality for working with FHIR Concept Maps, Bundles and Valuesets.

FHIR Bundles

Support for loading FHIR bundles into Bunsen. This includes the following features:

  • Load bundles from a given location
  • Convert bundle entries into Spark DataFrames
  • Save all entities within a bundle collection to a distinct table for each (e.g., an observation table, a condition table, and so on)
  • Convert the results of a Bunsen query back into bundles that can then be used elsewhere

See the methods below for details.

bunsen.stu3.bundles.extract_entry(sparkSession, javaRDD, resourceName)

Returns a dataset for the given entry type from the bundles.

bunsen.stu3.bundles.from_json(df, column)

Takes a dataframe with JSON-encoded bundles in the given column and returns a Java RDD of Bundle records. Note that this RDD contains Bundle records that aren’t serializable in Python, so it should be used merely as a parameter to other functions in this module, such as extract_entry.

bunsen.stu3.bundles.from_xml(df, column)

Takes a dataframe with XML-encoded bundles in the given column and returns a Java RDD of Bundle records. Note that this RDD contains Bundle records that aren’t serializable in Python, so it should be used merely as a parameter to other functions in this module, such as extract_entry.

bunsen.stu3.bundles.load_from_directory(sparkSession, path, minPartitions=1)

Returns a Java RDD of bundles loaded from the given path. Note that this RDD contains Bundle records that aren’t serializable in Python, so it should be used merely as a parameter to other functions in this module, such as extract_entry.

bunsen.stu3.bundles.save_as_database(sparkSession, path, databaseName, *resourceNames, **kwargs)

Loads the bundles in the path and saves them to a database, where each table in the database has the same name as the resource it represents.

bunsen.stu3.bundles.to_bundle(sparkSession, dataset)

Converts a dataset of FHIR resources to a bundle containing those resources. Use with caution against large datasets.
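As a sketch of how these functions fit together (the directory path and resource name below are hypothetical placeholders, not part of the Bunsen API):

```python
def extract_conditions(spark):
    """Load FHIR bundles from a directory and extract Condition resources.

    The path below is a hypothetical example location.
    """
    from bunsen.stu3.bundles import load_from_directory, extract_entry

    # load_from_directory returns a Java RDD of Bundle records, which is
    # not serializable in Python; pass it only to functions in this module.
    bundles = load_from_directory(spark, '/data/fhir/bundles')

    # extract_entry converts one entry type into a Spark DataFrame.
    return extract_entry(spark, bundles, 'Condition')
```

save_as_database follows the same pattern but persists each named resource to its own table, e.g. save_as_database(spark, '/data/fhir/bundles', 'fhir_db', 'Condition', 'Observation'), where the path and database name are again hypothetical.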

FHIR Valuesets

Support for broadcasting valuesets and using them in user-defined functions in Spark queries.

bunsen.stu3.valuesets.get_current_valuesets(spark_session)

Returns the current valuesets in the same form that is accepted by the push_valuesets function; that is, the structure follows this pattern: {referenceName: [(codeset, codevalue), (codeset, codevalue)]}

bunsen.stu3.valuesets.isa_loinc(code_value, loinc_version=None)

Returns a hierarchy placeholder that will load all values that are descendants of a given LOINC code.

bunsen.stu3.valuesets.isa_snomed(code_value, snomed_version=None)

Returns a hierarchy placeholder that will load all values that are descendants of a given SNOMED code.

bunsen.stu3.valuesets.pop_valuesets(spark_session)

Pops the current valuesets from the stack, returning true if there remains an active valueset, or false otherwise.

bunsen.stu3.valuesets.push_valuesets(spark_session, valueset_map, database='ontologies')

Pushes valuesets onto a stack and registers an in_valueset user-defined function that uses this content.

The valueset_map takes the form of {referenceName: [(codeset, codevalue), (codeset, codevalue)]} to specify which codesets/values are used for the given valueset reference name.

Rather than explicitly passing a list of (codeset, codevalue) tuples, users may instead load particular value sets or particular hierarchies by providing a ValueSetPlaceholder or HierarchyPlaceholder that instructs the system to load codes belonging to a particular value set or hierarchical system, respectively. See the isa_loinc and isa_snomed functions above for details.

Finally, ontology information is assumed to be stored in the ‘ontologies’ database by default, but users can specify another database name if they have customized ontologies that are separated from the default ontologies database.
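A sketch of this workflow, mixing explicit code tuples with a hierarchy placeholder (the reference names, codes, and the queried table are example values, and the query assumes a condition table previously saved via save_as_database):

```python
def asthma_valuesets():
    """A valueset_map in the {referenceName: [(codeset, codevalue)]} form."""
    return {
        'asthma': [('http://snomed.info/sct', '195967001')],
        'leukocytes': [('http://loinc.org', '6690-2')],
    }

def register_and_query(spark):
    """Push valuesets and use the registered in_valueset UDF in a query."""
    from bunsen.stu3.valuesets import push_valuesets, isa_snomed

    # isa_snomed expands to all descendants of the given SNOMED code.
    valueset_map = dict(asthma_valuesets(),
                        diabetes=isa_snomed('73211009'))
    push_valuesets(spark, valueset_map)

    # push_valuesets registers an in_valueset UDF usable in Spark SQL.
    return spark.sql(
        "SELECT subject.reference FROM condition "
        "WHERE in_valueset(code, 'asthma')")
```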

APIs for Loading ValueSets and ConceptMaps

Bunsen Python API for working with Code Systems.

bunsen.stu3.codes.create_concept_maps(spark_session)

Creates a new, empty bunsen.codes.ConceptMaps instance.

bunsen.stu3.codes.create_hierarchies(spark_session)

Creates a new, empty bunsen.codes.Hierarchies instance.

bunsen.stu3.codes.create_value_sets(spark_session)

Creates a new, empty bunsen.codes.ValueSets instance.

bunsen.stu3.codes.get_concept_maps(spark_session, database='ontologies')

Returns a bunsen.codes.ConceptMaps instance for the given database.

bunsen.stu3.codes.get_hierarchies(spark_session, database='ontologies')

Returns a bunsen.codes.Hierarchies instance for the given database.

bunsen.stu3.codes.get_value_sets(spark_session, database='ontologies')

Returns a bunsen.codes.ValueSets instance for the given database.
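For instance, retrieving previously stored content might look like the following sketch (the URLs are hypothetical examples):

```python
def lookup_latest_versions(spark):
    """Query ontology content stored in the default 'ontologies' database."""
    from bunsen.stu3.codes import get_concept_maps, get_value_sets

    concept_maps = get_concept_maps(spark)  # database='ontologies' by default
    value_sets = get_value_sets(spark)

    # latest_version returns None when nothing is stored under the URL.
    return (concept_maps.latest_version('urn:example:genders'),
            value_sets.latest_version('urn:example:asthma'))
```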

Core library for working with Concept Maps and Value Sets, and hierarchical code systems in Bunsen. See the ConceptMaps class, ValueSets class, and Hierarchies class for details.

class bunsen.codes.ConceptMaps(spark_session, jconcept_maps, jfunctions, java_package)

An immutable collection of FHIR Concept Maps to be used to map value sets. These instances are typically created via the bunsen.stu3.codes module.

add_mappings(url, version, mappings)

Returns a new ConceptMaps instance with the given mappings added to an existing map. The mappings parameter must be a list of tuples of the form [(source_system, source_value, target_system, target_value, equivalence)].

get_map_as_xml(url, version)

Returns an XML string containing the specified concept map.

get_mappings(url=None, version=None)

Returns a dataset of all mappings, which may be filtered by an optional concept map url and concept map version.

get_maps()

Returns a dataset of FHIR ConceptMaps without the nested mapping content, allowing users to explore mapping metadata.

The mappings themselves are excluded because they can become quite large, so users should use the get_mappings method to explore a table of them.

latest_version(url)

Returns the latest version of a map, or None if there is none.

with_disjoint_maps_from_directory(path, database='ontologies')

Returns a new ConceptMaps instance with all concept maps read from the given directory path that are disjoint with concept maps stored in the given database. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.

with_maps_from_directory(path)

Returns a new ConceptMaps instance with all maps read from the given directory path. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.

with_new_map(url, version, source, target, experimental=True, mappings=[])

Returns a new ConceptMaps instance with the given map added. Callers may include a list of mappings tuples in the form of [(source_system, source_value, target_system, target_value, equivalence)].
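Putting the pieces together, creating and persisting a small map might look like this sketch (the URLs, systems, and database name below are example values):

```python
def example_mappings():
    """(source_system, source_value, target_system, target_value, equivalence) tuples."""
    return [('urn:example:local-genders', 'F',
             'http://hl7.org/fhir/administrative-gender', 'female',
             'equivalent')]

def create_gender_map(spark):
    """Build a ConceptMaps collection with one new map and persist it."""
    from bunsen.stu3.codes import create_concept_maps

    maps = create_concept_maps(spark).with_new_map(
        url='urn:example:genders',
        version='0.1',
        source='urn:example:local-genders',
        target='http://hl7.org/fhir/administrative-gender',
        mappings=example_mappings())

    # Persists to the mappings and conceptmaps tables in the database.
    maps.write_to_database('ontologies')
```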

write_to_database(database)

Writes the mapping content to the given database, creating mappings and conceptmaps tables if they don’t exist.

class bunsen.codes.Hierarchies(spark_session, jhierarchies)

An immutable collection of values from hierarchical code systems to be used for ontologically-based queries.

get_ancestors(url=None, version=None)

Returns a dataset of ancestor values representing the transitive closure of codes in this Hierarchies instance, filtered by an optional hierarchy URI and version.
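A sketch of exploring a stored hierarchy (the hierarchy URI below is a hypothetical placeholder):

```python
def descendant_pairs(spark):
    """Fetch the ancestor/descendant closure for one hierarchy version."""
    from bunsen.stu3.codes import get_hierarchies

    hierarchies = get_hierarchies(spark)  # reads the 'ontologies' database
    uri = 'urn:example:loinc-hierarchy'   # hypothetical hierarchy URI
    version = hierarchies.latest_version(uri)
    return hierarchies.get_ancestors(url=uri, version=version)
```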

latest_version(uri)

Returns the latest version of a hierarchy, or None if there is none.

write_to_database(database)

Writes the ancestor content to the given database, creating an ancestors table if it doesn’t exist.

class bunsen.codes.ValueSets(spark_session, jvalue_sets, jfunctions, java_package)

An immutable collection of FHIR Value Sets to be used for ontologically-based queries.

add_values(url, version, values)

Returns a new ValueSets instance with the given values added to an existing value set. The values parameter must be a list of the form [(system, value)].

get_value_set_as_xml(url, version)

Returns an XML string containing the specified value set.

get_value_sets()

Returns a dataset of FHIR ValueSets without the nested value content, allowing users to explore value set metadata.

The values themselves are excluded because they can become quite large, so users should use the get_values method to explore them.

get_values(url=None, version=None)

Returns a dataset of all values which may be filtered by an optional value set url and value set version.

latest_version(url)

Returns the latest version of a value set, or None if there is none.

with_disjoint_value_sets_from_directory(path, database='ontologies')

Returns a new ValueSets instance with all value sets read from the given directory path that are disjoint with value sets stored in the given database. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.

with_new_value_set(url, version, experimental=True, values=[])

Returns a new ValueSets instance with the given value set added. Callers may include a list of value tuples in the form of [(system, value)].
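For example, creating and persisting a small value set might look like this sketch (the URL, code system, and code are example values):

```python
def example_values():
    """(system, value) tuples as expected by with_new_value_set and add_values."""
    return [('http://snomed.info/sct', '195967001')]

def create_asthma_value_set(spark):
    """Build a ValueSets collection with one new value set and persist it."""
    from bunsen.stu3.codes import create_value_sets

    value_sets = create_value_sets(spark).with_new_value_set(
        url='urn:example:asthma',
        version='0.1',
        values=example_values())

    # Persists to the values and valuesets tables in the database.
    value_sets.write_to_database('ontologies')
```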

with_value_sets_from_directory(path)

Returns a new ValueSets instance with all value sets read from the given directory path. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.

write_to_database(database)

Writes the value set content to the given database, creating values and valuesets tables if they don’t exist.