Provenance
The Activity object represents the source of a data set or the data processing steps used to produce it. Using W3C provenance ontology terms, a result is generated by a combination of data, which is used, and code, which is executed.
Imports
from synapseclient import Activity
Creating an activity object
act = Activity(name='clustering',
               description='whizzy clustering',
               used=['syn1234', 'syn1235'],
               executed='syn4567')
Here, syn1234 and syn1235 might be two types of measurements on a common set of samples, and syn4567 might refer to some whizzy clustering code. The used and executed arguments can reference entities in Synapse or URLs.
Alternatively, you can build an activity up piecemeal:
act = Activity(name='clustering', description='whizzy clustering')
act.used(['syn12345', 'syn12346'])
act.executed(
    'https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/develop/tests/unit/unit_test_client.py')
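One way to picture the piecemeal calls above is as entries appended to a single list, with executed code distinguished from input data by a wasExecuted flag. The sketch below models that idea with plain dictionaries. It is only an illustration under that assumption, not the synapseclient implementation, and the helper names new_activity, add_used, and add_executed are hypothetical.

```python
# Illustrative model only: plain dictionaries, not synapseclient objects.
# All helper names here are hypothetical.

def new_activity(name, description):
    """Create an empty activity record."""
    return {"name": name, "description": description, "used": []}


def add_used(activity, resource, was_executed=False):
    """Record a resource (Synapse ID or URL string) consumed by the activity."""
    activity["used"].append({"resource": resource, "wasExecuted": was_executed})


def add_executed(activity, resource):
    """Record executed code: same entry shape, flagged as executed."""
    add_used(activity, resource, was_executed=True)


act = new_activity("clustering", "whizzy clustering")
add_used(act, "syn12345")
add_used(act, "syn12346")
add_executed(act, "syn4567")

# Split the single list back into inputs and executed code.
inputs = [u["resource"] for u in act["used"] if not u["wasExecuted"]]
executed = [u["resource"] for u in act["used"] if u["wasExecuted"]]
print(inputs, executed)
```

Keeping one list with a flag means the same validation and storage path can serve both kinds of resource.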
Storing entities with provenance
The activity can be passed in when storing an Entity to set the Entity’s provenance:
clustered_samples = syn.store(clustered_samples, activity=act)
We’ve now recorded that clustered_samples is the output of our whizzy clustering algorithm applied to the data stored in syn1234 and syn1235.
Recording data source
The synapseclient.Synapse.store() method has shortcuts for specifying the used and executed lists directly.
For example, when storing a data entity, it’s a good idea to record its source:
excellent_data = syn.store(excellent_data,
                           activityName='data-r-us',
                           activityDescription='downloaded from data-r-us',
                           used='http://data-r-us.com/excellent/data.xyz')
Activity
class synapseclient.activity.Activity(name=None, description=None, used=None, executed=None, data={})
Represents the provenance of a Synapse Entity.
- Parameters
name – name of the Activity
description – a short text description of the Activity
used – Any of the following:
- a list of reference objects (e.g. [{'targetId':'syn123456', 'targetVersionNumber':1}])
- a list of Synapse Entities or Entity IDs
- a list of URLs
executed – A code resource that was executed to generate the Entity.
data – A dictionary representation of an Activity, with fields ‘name’, ‘description’ and ‘used’ (a list of reference objects)
See also: The W3C’s provenance ontology
executed(target=None, targetVersion=None, url=None, name=None)
Add a code resource that was executed during the activity. See synapseclient.activity.Activity.used().
used(target=None, targetVersion=None, wasExecuted=None, url=None, name=None)
Add a resource used by the activity.
This method tries to be as permissive as possible. It accepts a string, which might be a Synapse ID or a URL; a Synapse entity; a UsedEntity or UsedURL dictionary; or a list containing any combination of these.
In addition, named parameters can be used to specify the fields of either a UsedEntity or a UsedURL. If target and optionally targetVersion are specified, create a UsedEntity. If url and optionally name are specified, create a UsedURL.
It is an error to specify both target/targetVersion parameters and url/name parameters in the same call. To add multiple UsedEntities and UsedURLs, make a separate call for each or pass in a list.
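The named-parameter rule above can be sketched as a small validation step. This is a hypothetical helper, not the library's code: the name build_used_record and the choice of ValueError are assumptions, and the dictionary field names simply mirror the example dictionaries shown in this section.

```python
def build_used_record(target=None, targetVersion=None, url=None, name=None):
    """Hypothetical helper: build a UsedEntity-like or UsedURL-like dict."""
    has_entity_args = target is not None or targetVersion is not None
    has_url_args = url is not None or name is not None

    # Mixing the two parameter groups in one call is rejected.
    if has_entity_args and has_url_args:
        raise ValueError("Specify either target/targetVersion or url/name, not both")

    if target is not None:
        reference = {"target": target}
        if targetVersion is not None:
            reference["targetVersion"] = targetVersion
        return {"reference": reference}

    if url is not None:
        record = {"url": url}
        if name is not None:
            record["name"] = name
        return record

    raise ValueError("Nothing to add: give a target or a url")


entity_record = build_used_record(target="syn12345", targetVersion=2)
url_record = build_used_record(url="http://mydomain.com/my/awesome/data.RData",
                               name="Awesome Data")

try:
    build_used_record(target="syn12345", url="http://mydomain.com/my/awesome/data.RData")
    conflict_rejected = False
except ValueError:
    conflict_rejected = True
```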
If wasExecuted is set both inside an object and as a parameter, and the two settings conflict, the parameter wins. For example, this UsedURL will have wasExecuted set to False:
activity.used({'url':'http://google.com', 'name':'Goog', 'wasExecuted':True}, wasExecuted=False)
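The precedence rule can be expressed in a few lines. The function below is a minimal sketch with a hypothetical name, resolve_was_executed. Falling back to False when neither the object nor the parameter sets the flag is an assumption made here for illustration.

```python
def resolve_was_executed(record, wasExecuted=None):
    """Return a copy of record in which an explicit parameter overrides the stored flag."""
    resolved = dict(record)  # leave the caller's dictionary untouched
    if wasExecuted is not None:
        # An explicit parameter takes precedence over the value inside the object.
        resolved["wasExecuted"] = wasExecuted
    else:
        # Assumption: default to False when nothing was specified anywhere.
        resolved.setdefault("wasExecuted", False)
    return resolved


original = {"url": "http://google.com", "name": "Goog", "wasExecuted": True}
overridden = resolve_was_executed(original, wasExecuted=False)
kept = resolve_was_executed(original)
defaulted = resolve_was_executed({"url": "http://google.com"})
```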
Entity examples:
activity.used('syn12345')
activity.used(entity)
activity.used(target=entity, targetVersion=2)
activity.used(codeEntity, wasExecuted=True)
activity.used({'reference':{'target':'syn12345', 'targetVersion':1}, 'wasExecuted':False})
URL examples:
activity.used('http://mydomain.com/my/awesome/data.RData')
activity.used(url='http://mydomain.com/my/awesome/data.RData', name='Awesome Data')
activity.used(url='https://github.com/joe_hacker/code_repo', name='Gnarly hacks', wasExecuted=True)
activity.used({'url':'https://github.com/joe_hacker/code_repo', 'name':'Gnarly hacks'}, wasExecuted=True)
List example:
activity.used(['syn12345', 'syn23456', entity,
               {'reference':{'target':'syn100009', 'targetVersion':2}, 'wasExecuted':True},
               'http://mydomain.com/my/awesome/data.RData'])
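To illustrate what accepting a mixed list might involve, the sketch below sorts each item into a UsedEntity or UsedURL category. It is an assumption-laden illustration rather than the library's logic: the names classify and classify_all are hypothetical, the Synapse ID pattern is inferred from the IDs in the examples above, and Entity objects are left out to keep the sketch free of dependencies.

```python
import re
from urllib.parse import urlparse

# Assumed ID shape ("syn" followed by digits), based on the examples in this section.
SYNAPSE_ID = re.compile(r"^syn\d+$")


def classify(item):
    """Hypothetical helper: decide whether an item describes an entity or a URL."""
    if isinstance(item, dict):
        if "url" in item:
            return "UsedURL"
        if "reference" in item:
            return "UsedEntity"
    elif isinstance(item, str):
        if SYNAPSE_ID.match(item):
            return "UsedEntity"
        if urlparse(item).scheme in ("http", "https"):
            return "UsedURL"
    raise ValueError("Cannot interpret %r as a Synapse ID, URL, or known dictionary" % (item,))


def classify_all(items):
    """Accept a single item or a list of items, as the permissive API does."""
    if not isinstance(items, (list, tuple)):
        items = [items]
    return [classify(item) for item in items]


kinds = classify_all([
    'syn12345',
    'syn23456',
    {'reference': {'target': 'syn100009', 'targetVersion': 2}, 'wasExecuted': True},
    'http://mydomain.com/my/awesome/data.RData',
])
single = classify_all('syn12345')
```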