STU3 Python APIs¶
Bunsen offers Python APIs for PySpark users working with FHIR datasets. This includes basic functionality for working with FHIR Concept Maps, Bundles and Valuesets.
FHIR Bundles¶
Support for loading FHIR bundles into Bunsen. This includes the following features:
Load bundles from a given location
Convert bundle entries into Spark DataFrames
Save all entities in a bundle collection to a distinct table per resource type (e.g., an observation table, a condition table, and so on)
Convert the results of a Bunsen query back into bundles that can then be used elsewhere
See the methods below for details.
-
bunsen.stu3.bundles.
extract_entry
(sparkSession, javaRDD, resourceTypeUrl)¶ Returns a dataset for the given entry type from the bundles.
- Parameters
sparkSession – the SparkSession instance
javaRDD – the RDD produced by load_from_directory() or other methods in this package
resourceTypeUrl – the type of the FHIR resource to extract (Condition, Observation, etc., for the base profile, or the URL of the structure definition)
- Returns
a DataFrame containing the given resource encoded into Spark columns
-
bunsen.stu3.bundles.
from_json
(df, column)¶ Takes a dataframe with JSON-encoded bundles in the given column and returns a Java RDD of Bundle records. Note this RDD contains Bundle records that aren’t serializable in Python, so users should treat it merely as a parameter to other methods in this module, such as extract_entry.
- Parameters
df – a DataFrame containing bundles to decode
column – the column in which the bundles to decode are stored
- Returns
a Java RDD of bundles for use with
extract_entry()
-
bunsen.stu3.bundles.
from_xml
(df, column)¶ Takes a dataframe with XML-encoded bundles in the given column and returns a Java RDD of Bundle records. Note this RDD contains Bundle records that aren’t serializable in Python, so users should treat it merely as a parameter to other methods in this module, such as extract_entry.
- Parameters
df – a DataFrame containing bundles to decode
column – the column in which the bundles to decode are stored
- Returns
a Java RDD of bundles for use with
extract_entry()
-
bunsen.stu3.bundles.
load_from_directory
(sparkSession, path, minPartitions=1)¶ Returns a Java RDD of bundles loaded from the given path. Note this RDD contains Bundle records that aren’t serializable in Python, so users should treat it merely as a parameter to other methods in this module, such as extract_entry().
- Parameters
sparkSession – the SparkSession instance
path – path to directory of FHIR bundles to load
- Returns
a Java RDD of bundles for use with
extract_entry()
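A minimal sketch of how load_from_directory() and extract_entry() might be chained, using only the signatures documented above. The helper name load_resources and its default resource type are hypothetical, and the import is deferred so the snippet can be read and imported without Bunsen installed.

```python
def load_resources(spark, bundle_dir, resource_type='Condition'):
    """Load FHIR bundles from a directory and extract one resource type.

    `load_resources` is a hypothetical helper; only load_from_directory and
    extract_entry come from the documented bunsen.stu3.bundles module.
    """
    # Deferred import: needs Bunsen (and its JVM classes) available to Spark.
    from bunsen.stu3.bundles import load_from_directory, extract_entry

    # The Java RDD is opaque to Python, so it is only passed along.
    bundles = load_from_directory(spark, bundle_dir)

    # Returns a DataFrame with the resource encoded into Spark columns.
    return extract_entry(spark, bundles, resource_type)
```

With an active SparkSession named spark, a call such as load_resources(spark, '/path/to/bundles', 'Observation') would return the Observation entries as a DataFrame; the path here is a placeholder.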
-
bunsen.stu3.bundles.
to_bundle
(sparkSession, dataset, resourceTypeUrl)¶ Converts a dataset of FHIR resources to a bundle containing those resources. Use with caution against large datasets.
- Parameters
sparkSession – the SparkSession instance
dataset – a DataFrame of encoded FHIR Resources
resourceTypeUrl – the type of the FHIR resource to extract (Condition, Observation, etc, for the base profile, or the URL of the structure definition)
- Returns
a JSON bundle of the dataset contents
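Because to_bundle() should be used with caution on large datasets, one option is to cap the row count first. The sketch below assumes that approach; preview_bundle and max_rows are hypothetical names, while DataFrame.limit is the standard PySpark method.

```python
def preview_bundle(spark, dataset, resource_type, max_rows=100):
    """Convert at most `max_rows` rows of `dataset` into a FHIR bundle.

    `preview_bundle` and `max_rows` are hypothetical; to_bundle is the
    documented function from bunsen.stu3.bundles.
    """
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.bundles import to_bundle

    # Cap the rows before conversion, following the documented advice to be
    # careful with large datasets.
    limited = dataset.limit(max_rows)
    return to_bundle(spark, limited, resource_type)
```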
-
bunsen.stu3.bundles.
write_to_database
(sparkSession, javaRDD, databaseName, resourceTypeUrls)¶ Writes the bundles in the given RDD to a database, where each table in the database has the same name as the resource it represents.
- Parameters
sparkSession – the SparkSession instance
javaRDD – the RDD produced by load_from_directory() or other methods in this package
databaseName – name of the database to write the resources to
resourceTypeUrls – the types of the FHIR resource to extract (Condition, Observation, etc, for the base profile, or the URL of the structure definition)
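The following sketch shows one way the load and write steps could be combined. It assumes resourceTypeUrls accepts a Python list of type names, which the parameter description suggests but does not state outright; bundles_to_tables is a hypothetical helper name.

```python
def bundles_to_tables(spark, bundle_dir, database, resource_types):
    """Load bundles from `bundle_dir` and write one table per resource type.

    `bundles_to_tables` is a hypothetical helper built on the documented
    load_from_directory and write_to_database functions.
    """
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.bundles import load_from_directory, write_to_database

    bundles = load_from_directory(spark, bundle_dir)

    # Assumption: the resource types are passed as a plain Python list.
    write_to_database(spark, bundles, database, list(resource_types))
```

A call such as bundles_to_tables(spark, '/path/to/bundles', 'fhir_db', ['Condition', 'Observation']) would then be expected to produce a table per listed resource type; the path and database name are placeholders.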
FHIR Valuesets¶
Support for broadcasting valuesets and using them in user-defined functions in Spark queries.
-
bunsen.stu3.valuesets.
get_current_valuesets
(spark_session)¶ Returns the current valuesets in the same form that is accepted by the push_valuesets function; that is, the structure follows this pattern: {referenceName: [(codeset, codevalue), (codeset, codevalue)]}
- Parameters
spark_session – the SparkSession instance
- Returns
a map containing the valuesets currently published to the cluster
-
bunsen.stu3.valuesets.
isa_loinc
(code_value, loinc_version=None)¶ Returns a hierarchy placeholder that will load all values that are descendants of a given LOINC code.
- Parameters
code_value – the parent code value
loinc_version – the version of LOINC to use (uses latest if None is given)
- Returns
a placeholder for use with
push_valuesets()
-
bunsen.stu3.valuesets.
isa_snomed
(code_value, snomed_version=None)¶ Returns a hierarchy placeholder that will load all values that are descendants of a given SNOMED code.
- Parameters
code_value – the parent code value
snomed_version – the version of SNOMED to use (uses latest if None is given)
- Returns
a placeholder for use with
push_valuesets()
-
bunsen.stu3.valuesets.
pop_valuesets
(spark_session)¶ Pops the current valuesets from the stack, returning true if there remains an active valueset, or false otherwise.
- Parameters
spark_session – the SparkSession instance
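Since valuesets live on a stack, every push_valuesets() call is naturally paired with a pop_valuesets() call. One way to keep the two balanced is a context manager, sketched below; pushed_valuesets is a hypothetical wrapper and not part of Bunsen.

```python
from contextlib import contextmanager


@contextmanager
def pushed_valuesets(spark, valueset_map, database='ontologies'):
    """Push valuesets for the duration of a `with` block, then pop them.

    `pushed_valuesets` is a hypothetical convenience wrapper around the
    documented push_valuesets and pop_valuesets functions.
    """
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.valuesets import push_valuesets, pop_valuesets

    push_valuesets(spark, valueset_map, database=database)
    try:
        yield
    finally:
        # Pop even if the body raised, so the stack stays balanced.
        pop_valuesets(spark)
```

Queries that rely on the pushed valuesets would then run inside a with pushed_valuesets(spark, valueset_map): block.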
-
bunsen.stu3.valuesets.
push_valuesets
(spark_session, valueset_map, database='ontologies')¶ Pushes valuesets onto a stack and registers an in_valueset user-defined function that uses this content.
The valueset_map takes the form of {referenceName: [(codeset, codevalue), (codeset, codevalue)]} to specify which codesets/values are used for the given valueset reference name.
Rather than explicitly passing a list of (codeset, codevalue) tuples, users may instead load particular value sets or particular hierarchies by providing a ValueSetPlaceholder or HierarchyPlaceholder that instructs the system to load codes belonging to a particular value set or hierarchical system, respectively. See the isa_loinc and isa_snomed functions above for details.
Finally, ontology information is assumed to be stored in the ‘ontologies’ database by default, but users can specify another database name if they have customized ontologies that are separated from the default ontologies database.
- Parameters
spark_session – the SparkSession instance
valueset_map – a map containing value set structures to publish
database – the database from which value set data is loaded
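As an illustration of the valueset_map structure, the sketch below builds a map from literal (codeset, codevalue) tuples and publishes it. The reference names, the choice of codes, and the helper names are illustrative assumptions; the system URIs are the ones FHIR uses for LOINC and SNOMED CT.

```python
LOINC = 'http://loinc.org'
SNOMED = 'http://snomed.info/sct'


def build_valueset_map():
    """Return {referenceName: [(codeset, codevalue), ...]} with literal codes.

    The reference names and codes are illustrative only.
    """
    return {
        'hemoglobin': [(LOINC, '718-7')],
        'diabetes': [(SNOMED, '73211009')],
    }


def publish(spark, valueset_map, database='ontologies'):
    """Push the map so in_valueset can use it (hypothetical helper)."""
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.valuesets import push_valuesets

    push_valuesets(spark, valueset_map, database=database)
```

Once published, the registered in_valueset function can be referenced from Spark SQL. A filter along the lines of where in_valueset(code, 'hemoglobin') is an assumption about its argument order, which this page does not spell out.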
-
bunsen.stu3.valuesets.
valueset
(valueset_uri, valueset_version)¶ Creates a placeholder specifying a specific valueset for use with push_valuesets().
- Parameters
valueset_uri – the URI of the valueset
valueset_version – the version of the valueset
- Returns
a placeholder for use with
push_valuesets()
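Placeholders can stand in for explicit code lists in the map given to push_valuesets(). The sketch below mixes the three placeholder functions in one map; the reference names and the helper build_placeholder_map are hypothetical, and the codes and URIs are left as arguments rather than guessed.

```python
def build_placeholder_map(loinc_parent, snomed_parent,
                          valueset_uri, valueset_version):
    """Build a valueset_map made of placeholders instead of literal tuples.

    `build_placeholder_map` and the reference names are hypothetical;
    isa_loinc, isa_snomed and valueset are the documented functions.
    """
    # Deferred import: needs Bunsen installed.
    from bunsen.stu3.valuesets import isa_loinc, isa_snomed, valueset

    return {
        # All descendants of the given LOINC code (latest LOINC version).
        'loinc_family': isa_loinc(loinc_parent),
        # All descendants of the given SNOMED code (latest SNOMED version).
        'snomed_family': isa_snomed(snomed_parent),
        # Codes belonging to one specific version of a value set.
        'curated': valueset(valueset_uri, valueset_version),
    }
```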
APIs for Loading ValueSets and ConceptMaps¶
Bunsen Python API for working with Code Systems.
-
bunsen.stu3.codes.
create_concept_maps
(spark_session)¶ Creates a new, empty bunsen.codes.ConceptMaps instance.
- Returns
an empty bunsen.codes.ConceptMaps instance
-
bunsen.stu3.codes.
create_hierarchies
(spark_session)¶ Creates a new, empty bunsen.codes.Hierarchies instance.
- Returns
an empty bunsen.codes.Hierarchies instance
-
bunsen.stu3.codes.
create_value_sets
(spark_session)¶ Creates a new, empty bunsen.codes.ValueSets instance.
- Returns
an empty bunsen.codes.ValueSets instance
-
bunsen.stu3.codes.
get_concept_maps
(spark_session, database='ontologies')¶ Returns a bunsen.codes.ConceptMaps instance for the given database.
- Parameters
database – the database containing the concept maps to load
- Returns
a bunsen.codes.ConceptMaps with the loaded maps
-
bunsen.stu3.codes.
get_hierarchies
(spark_session, database='ontologies')¶ Returns a bunsen.codes.Hierarchies instance for the given database.
- Parameters
database – the database containing the hierarchies to load
- Returns
a bunsen.codes.Hierarchies with the loaded hierarchies
-
bunsen.stu3.codes.
get_value_sets
(spark_session, database='ontologies')¶ Returns a bunsen.codes.ValueSets instance for the given database.
- Parameters
database – the database containing the value sets to load
- Returns
a bunsen.codes.ValueSets with the loaded value sets
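A short sketch of how a loaded ValueSets instance might be queried for the newest version of one value set. It relies on the latest_version() and get_values() methods of bunsen.codes.ValueSets documented on this page; latest_values is a hypothetical helper name.

```python
def latest_values(spark, url, database='ontologies'):
    """Return a DataFrame of values for the latest version of a value set.

    `latest_values` is a hypothetical helper. Returns None when the database
    holds no version of the value set identified by `url`.
    """
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.codes import get_value_sets

    value_sets = get_value_sets(spark, database)

    # latest_version is documented to return None when nothing matches.
    version = value_sets.latest_version(url)
    if version is None:
        return None

    return value_sets.get_values(url=url, version=version)
```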
Core library for working with Concept Maps, Value Sets, and hierarchical code systems in Bunsen. See the ConceptMaps class, ValueSets class, and Hierarchies class for details.
-
class
bunsen.codes.
ConceptMaps
(spark_session, jconcept_maps, jfunctions, java_package)¶ An immutable collection of FHIR Concept Maps to be used to map value sets. These instances are typically created via the bunsen.stu3.codes module.
-
add_mappings
(url, version, new_version, mappings)¶ Returns a new ConceptMaps instance with the given mappings added to an existing map. The mappings parameter must be a list of tuples of the form [(source_system, source_value, target_system, target_value, equivalence)].
- Parameters
url – URL of the ConceptMap to add mappings to
version – Version of the ConceptMap to add mappings to
new_version – Version of the updated ConceptMap to which new mappings have been added
mappings – A list of tuples representing the mappings to add
- Returns
a
ConceptMaps
instance with the added mappings
-
get_map_as_xml
(url, version)¶ Returns an XML string containing the specified concept map.
- Parameters
url – URL of the ConceptMap to return
version – Version of the ConceptMap to return
- Returns
a string containing the ConceptMap in XML form
-
get_mappings
(url=None, version=None)¶ Returns a dataset of all mappings which may be filtered by an optional concept map url and concept map version.
- Parameters
url – Optional URL of the mappings to return
version – Optional version of the mappings to return
- Returns
a DataFrame of mapping records
-
get_maps
()¶ Returns a dataset of FHIR ConceptMaps without the nested mapping content, allowing users to explore mapping metadata.
The mappings themselves are excluded because they can become quite large, so users should use the get_mappings method to explore a table of them.
- Returns
a DataFrame of FHIR ConceptMap resources managed by this object
-
latest_version
(url)¶ Returns the latest version of a map, or None if there is none.
- Parameters
url – the URL identifying a given concept map
- Returns
the version of the given map
-
with_disjoint_maps_from_directory
(path, database='ontologies')¶ Returns a new ConceptMaps instance with all maps read from the given directory path that are disjoint with maps stored in the given database. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.
- Parameters
path – Path to directory containing FHIR ConceptMap resources
database – The database in which existing concept maps are stored
- Returns
a
ConceptMaps
instance with the added maps
-
with_maps_from_directory
(path)¶ Returns a new ConceptMaps instance with all maps read from the given directory path. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.
- Parameters
path – Path to directory containing FHIR ConceptMap resources
- Returns
a
ConceptMaps
instance with the added maps
-
with_new_map
(url, version, source, target, experimental=True, mappings=[])¶ Returns a new ConceptMaps instance with the given map added. Callers may include a list of mappings tuples in the form of [(source_system, source_value, target_system, target_value, equivalence)].
- Parameters
url – URL of the ConceptMap to add
version – Version of the ConceptMap to add
source – source URI of the ConceptMap
target – target URI of the ConceptMap
experimental – a Boolean variable indicating whether the map should be labeled as experimental
mappings – A list of tuples representing the mappings to add
- Returns
a
ConceptMaps
instance with the added map
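The sketch below illustrates the five-element mapping tuples and the fact that with_new_map() returns a new instance rather than modifying the existing one. All URIs, codes, and helper names are hypothetical, and passing the equivalence as a FHIR code string such as 'equivalent' is an assumption.

```python
def example_mappings():
    """Tuples of (source_system, source_value, target_system, target_value,
    equivalence). All values here are made up for illustration."""
    return [
        ('urn:example:local', 'A1', 'urn:example:standard', 'X1', 'equivalent'),
        ('urn:example:local', 'A2', 'urn:example:standard', 'X2', 'equivalent'),
    ]


def build_and_store_map(spark, database):
    """Create a concept map and write it to `database` (hypothetical helper)."""
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.codes import create_concept_maps

    # ConceptMaps is immutable, so keep the instance that with_new_map returns.
    concept_maps = create_concept_maps(spark).with_new_map(
        url='urn:example:map:local-to-standard',
        version='0.1',
        source='urn:example:valueset:local',
        target='urn:example:valueset:standard',
        experimental=True,
        mappings=example_mappings())

    concept_maps.write_to_database(database)
    return concept_maps
```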
-
write_to_database
(database)¶ Writes the mapping content to the given database, creating a mappings and conceptmaps table if they don’t exist.
- Parameters
database – the database to write the concept maps to
-
-
class
bunsen.codes.
Hierarchies
(spark_session, jhierarchies)¶ An immutable collection of values from hierarchical code systems to be used for ontologically-based queries.
-
get_ancestors
(url=None, version=None)¶ Returns a dataset of ancestor values representing the transitive closure of codes in this Hierarchies instance filtered by an optional hierarchy uri and version.
- Parameters
url – Optional URL of hierarchy to return
version – Optional version of the hierarchy to return
- Returns
a DataFrame of ancestor records
-
latest_version
(uri)¶ Returns the latest version of a hierarchy, or None if there is none.
- Parameters
uri – URI of the concept hierarchy to return
- Returns
the version of the hierarchy, or None if there is none
-
write_to_database
(database)¶ Writes the ancestor content to the given database, creating an ancestors table if it doesn’t exist.
- Parameters
database – the database to write the hierarchies to
-
-
class
bunsen.codes.
ValueSets
(spark_session, jvalue_sets, jfunctions, java_package)¶ An immutable collection of FHIR Value Sets to be used for ontologically-based queries.
-
add_values
(url, version, new_version, values)¶ Returns a new ValueSets instance with the given values added to an existing value set. The values parameter must be a list of the form [(system, value)].
- Parameters
url – URL of the ValueSet to add values to
version – Version of the ValueSet to add values to
new_version – Version of the updated ValueSet to which new values have been added
values – A list of tuples representing the values to add
- Returns
a
ValueSets
instance with the added values
-
get_value_set_as_xml
(url, version)¶ Returns an XML string containing the specified value set.
- Parameters
url – URL of the ValueSet to return
version – Version of the ValueSet to return
- Returns
a string containing the ValueSet in XML form
-
get_value_sets
()¶ Returns a dataset of FHIR ValueSets without the nested value content, allowing users to explore value set metadata.
The values themselves are excluded because they can become quite large, so users should use the get_values method to explore them.
- Returns
a dataframe of FHIR ValueSets
-
get_values
(url=None, version=None)¶ Returns a dataset of all values which may be filtered by an optional value set url and value set version.
- Parameters
url – Optional URL of ValueSet to return
version – Optional version of the ValueSet to return
- Returns
a DataFrame of values
-
latest_version
(url)¶ Returns the latest version of a value set, or None if there is none.
- Parameters
url – URL of the ValueSet to return
- Returns
the version of the ValueSet, or None if there is none
-
with_disjoint_value_sets_from_directory
(path, database='ontologies')¶ Returns a new ValueSets instance with all value sets read from the given directory path that are disjoint with value sets stored in the given database. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.
- Parameters
path – Path to directory containing FHIR ValueSet resources
database – The database in which existing value sets are stored
- Returns
a
ValueSets
instance with the added value sets
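One plausible use of the disjoint loader is an incremental import: read a directory, keep only the value sets the database does not already hold, and write those. The sketch below assumes that workflow; import_new_value_sets is a hypothetical helper, and how write_to_database() treats value sets that already exist is not described on this page.

```python
def import_new_value_sets(spark, path, database='ontologies'):
    """Write only the value sets under `path` that `database` lacks.

    `import_new_value_sets` is a hypothetical helper built on the documented
    create_value_sets, with_disjoint_value_sets_from_directory and
    write_to_database calls.
    """
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.codes import create_value_sets

    # Start from an empty collection and add only the disjoint value sets.
    new_sets = create_value_sets(spark) \
        .with_disjoint_value_sets_from_directory(path, database)

    new_sets.write_to_database(database)
    return new_sets
```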
-
with_new_value_set
(url, version, experimental=True, values=[])¶ Returns a new ValueSets instance with the given value set added. Callers may include a list of value tuples in the form of [(system, value)].
- Parameters
url – URL of the ValueSet to add
version – Version of the ValueSet to add
experimental – a Boolean variable indicating whether the ValueSet should be labeled as experimental
values – A list of tuples representing the values to add
- Returns
a
ValueSets
instance with the added value set.
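A minimal sketch of building a value set from (system, value) tuples. Dropping duplicate tuples up front is a choice made for this example, not a documented requirement, and curated_value_set is a hypothetical helper name.

```python
def unique_codes(codes):
    """Drop duplicate (system, value) tuples while preserving order."""
    return list(dict.fromkeys(codes))


def curated_value_set(spark, url, version, codes):
    """Return a ValueSets instance holding one new value set.

    `curated_value_set` is a hypothetical helper around the documented
    create_value_sets and with_new_value_set calls.
    """
    # Deferred import: needs Bunsen available to Spark.
    from bunsen.stu3.codes import create_value_sets

    # ValueSets is immutable, so the returned instance is the one to keep.
    return create_value_sets(spark).with_new_value_set(
        url=url,
        version=version,
        experimental=True,
        values=unique_codes(codes))
```

The returned instance can then be inspected with get_values() or persisted with write_to_database().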
-
with_value_sets
(df)¶ Returns a new ValueSets instance that includes the ValueSet FHIR resources encoded in the given Spark DataFrame.
- Parameters
df – A Spark DataFrame containing the valueset FHIR resource
- Returns
a
ValueSets
instance with the added value sets
-
with_value_sets_from_directory
(path)¶ Returns a new ValueSets instance with all value sets read from the given directory path. The directory may be anything readable from a Spark path, including local filesystems, HDFS, S3, or others.
- Parameters
path – Path to directory containing FHIR ValueSet resources
- Returns
a
ValueSets
instance with the added value sets
-
write_to_database
(database)¶ Writes the value set content to the given database, creating a values and valuesets table if they don’t exist.
- Parameters
database – the database to write the value sets to
-