Provenance

The Activity object represents the source of a data set or the data processing steps used to produce it. In W3C provenance ontology terms, a result is generated by a combination of data, which is used, and code, which is executed.

Imports

from synapseclient import Activity

Creating an activity object

act = Activity(name='clustering',
               description='whizzy clustering',
               used=['syn1234','syn1235'],
               executed='syn4567')

Here, syn1234 and syn1235 might be two types of measurements on a common set of samples, and syn4567 might refer to some whizzy clustering code. Both used and executed can reference either entities in Synapse or URLs.

Alternatively, you can build an activity up piecemeal:

act = Activity(name='clustering', description='whizzy clustering')
act.used(['syn12345', 'syn12346'])
act.executed(
    'https://raw.githubusercontent.com/Sage-Bionetworks/synapsePythonClient/develop/tests/unit/unit_test_client.py')

Storing entities with provenance

The activity can be passed in when storing an Entity to set the Entity’s provenance:

clustered_samples = syn.store(clustered_samples, activity=act)

We’ve now recorded that clustered_samples is the output of our whizzy clustering algorithm applied to the data stored in syn1234 and syn1235.
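To check what was recorded, the provenance can be read back and split into its used and executed parts. The following is a minimal sketch, not part of the client: summarize_provenance is a hypothetical helper, and it assumes that syn is a logged-in synapseclient.Synapse whose getProvenance() returns a dict-like Activity in which each 'used' item carries a 'wasExecuted' flag plus either a 'reference' (with a 'targetId') or a 'url'.

```python
def summarize_provenance(syn, entity):
    """Return (activity name, used labels, executed labels) for an entity.

    Hypothetical helper. `syn` only needs a getProvenance(entity) method
    that returns a dict-like activity record.
    """
    activity = syn.getProvenance(entity)
    used, executed = [], []
    for item in activity.get('used', []):
        # Assumption: entity items carry a 'reference', URL items a 'url'.
        if 'reference' in item:
            label = item['reference']['targetId']
        else:
            label = item['url']
        # Items flagged wasExecuted are code; everything else is data.
        if item.get('wasExecuted'):
            executed.append(label)
        else:
            used.append(label)
    return activity.get('name'), used, executed
```

For the example above, summarize_provenance(syn, clustered_samples) would be expected to list syn1234 and syn1235 as used and syn4567 as executed.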

Recording data source

synapseclient.Synapse.store() has shortcuts for specifying the used and executed lists directly, along with the activity's name and description. For example, when storing a data entity, it’s a good idea to record its source:

excellent_data = syn.store(excellent_data,
                           activityName='data-r-us',
                           activityDescription='downloaded from data-r-us',
                           used='http://data-r-us.com/excellent/data.xyz')

Activity

class synapseclient.activity.Activity(name=None, description=None, used=None, executed=None, data={})

Represents the provenance of a Synapse Entity.

Parameters
  • name – name of the Activity

  • description – a short text description of the Activity

  • used – any of the following:

    • a list of reference objects (e.g. [{'targetId':'syn123456', 'targetVersionNumber':1}])

    • a list of Synapse Entities or Entity IDs

    • a list of URLs

  • executed – A code resource that was executed to generate the Entity.

  • data – A dictionary representation of an Activity, with fields ‘name’, ‘description’ and ‘used’ (a list of reference objects)

See also: The W3C’s provenance ontology

executed(target=None, targetVersion=None, url=None, name=None)

Add a code resource that was executed during the activity. See synapseclient.activity.Activity.used()

used(target=None, targetVersion=None, wasExecuted=None, url=None, name=None)

Add a resource used by the activity.

This method tries to be as permissive as possible. It accepts a string, which might be a Synapse ID or a URL; a Synapse entity; a UsedEntity or UsedURL dictionary; or a list containing any combination of these.

In addition, named parameters can be used to specify the fields of either a UsedEntity or a UsedURL. If target and, optionally, targetVersion are specified, a UsedEntity is created. If url and, optionally, name are specified, a UsedURL is created.

It is an error to specify both target/targetVersion parameters and url/name parameters in the same call. To add multiple UsedEntities and UsedURLs, make a separate call for each or pass in a list.
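The named-parameter rules can be sketched in plain Python. The function below is an illustration under stated assumptions, not the client's implementation: build_used_item and its snake_case parameter names are hypothetical, and the output layout ('reference' with 'targetId' and 'targetVersionNumber' for a UsedEntity; 'url' and 'name' for a UsedURL) is assumed to follow the reference-object format shown for the used parameter.

```python
def build_used_item(target=None, target_version=None, url=None, name=None,
                    was_executed=False):
    """Build a UsedEntity-like or UsedURL-like dict from named parameters.

    Hypothetical sketch; not the synapseclient implementation.
    """
    entity_args = target is not None or target_version is not None
    url_args = url is not None or name is not None
    # Mixing the two parameter groups in one call is an error.
    if entity_args and url_args:
        raise ValueError('specify target/targetVersion or url/name, not both')
    if target is not None:
        reference = {'targetId': target}
        if target_version is not None:
            reference['targetVersionNumber'] = target_version
        return {'reference': reference, 'wasExecuted': was_executed}
    if url is not None:
        item = {'url': url, 'wasExecuted': was_executed}
        if name is not None:
            item['name'] = name
        return item
    raise ValueError('either target or url is required')
```

Raising on mixed arguments, rather than silently preferring one group, keeps each call unambiguous about which kind of resource it adds.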

In case of conflicting settings for wasExecuted both inside an object and with a parameter, the parameter wins. For example, this UsedURL will have wasExecuted set to False:

activity.used({'url':'http://google.com', 'name':'Goog', 'wasExecuted':True}, wasExecuted=False)
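The precedence rule can be illustrated without the client. In this sketch, resolve_was_executed is a hypothetical helper, and defaulting a missing flag to False is an assumption rather than documented behaviour.

```python
def resolve_was_executed(item, was_executed=None):
    """Return a copy of `item` with its final wasExecuted flag.

    Hypothetical sketch of "the parameter wins"; not the client's code.
    """
    resolved = dict(item)  # copy so the caller's dictionary is untouched
    if was_executed is not None:
        # An explicit parameter overrides whatever the object says.
        resolved['wasExecuted'] = was_executed
    else:
        # Assumption: a missing flag defaults to False.
        resolved.setdefault('wasExecuted', False)
    return resolved
```

Using None as the parameter default distinguishes "not specified" from an explicit False, which is what allows an explicit False to override a True stored in the object.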

Entity examples:

activity.used('syn12345')
activity.used(entity)
activity.used(target=entity, targetVersion=2)
activity.used(codeEntity, wasExecuted=True)
activity.used({'reference':{'targetId':'syn12345', 'targetVersionNumber':1}, 'wasExecuted':False})

URL examples:

activity.used('http://mydomain.com/my/awesome/data.RData')
activity.used(url='http://mydomain.com/my/awesome/data.RData', name='Awesome Data')
activity.used(url='https://github.com/joe_hacker/code_repo', name='Gnarly hacks', wasExecuted=True)
activity.used({'url':'https://github.com/joe_hacker/code_repo', 'name':'Gnarly hacks'}, wasExecuted=True)

List example:

activity.used(['syn12345', 'syn23456', entity,
               {'reference':{'targetId':'syn100009', 'targetVersionNumber':2}, 'wasExecuted':True},
               'http://mydomain.com/my/awesome/data.RData'])
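How a mixed list of strings and dictionaries might be sorted into entities and URLs can be sketched as follows. This is a simplified illustration, not the client's logic: classify_resource and classify_all are hypothetical names, the Synapse ID pattern (syn followed by digits) is an assumption, and Entity objects are left out to keep the sketch self-contained.

```python
import re
from urllib.parse import urlparse

# Assumption: a Synapse ID is "syn" followed by one or more digits.
SYNAPSE_ID = re.compile(r'^syn\d+$')


def classify_resource(resource):
    """Label one resource as 'UsedEntity' or 'UsedURL' (hypothetical)."""
    if isinstance(resource, dict):
        if 'reference' in resource:
            return 'UsedEntity'
        if 'url' in resource:
            return 'UsedURL'
    elif isinstance(resource, str):
        if SYNAPSE_ID.match(resource):
            return 'UsedEntity'
        parsed = urlparse(resource)
        # Treat anything with both a scheme and a host as a URL.
        if parsed.scheme and parsed.netloc:
            return 'UsedURL'
    raise ValueError('cannot classify resource: %r' % (resource,))


def classify_all(resources):
    """Accept a single resource or a list and classify each item."""
    if not isinstance(resources, (list, tuple)):
        resources = [resources]
    return [classify_resource(r) for r in resources]
```

Wrapping a single item in a list first means both call styles share one code path, which mirrors the permissive interface described above.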