Overview

The synapseutils package provides both higher-level functions and utilities for interacting with Synapse. These include:

copy

synapseutils.copy_functions.changeFileMetaData(syn, entity, downloadAs=None, contentType=None)
Parameters
  • entity – Synapse entity Id or object

  • contentType – Specify content type to change the content type of a filehandle

  • downloadAs – Specify filename to change the filename of a filehandle

Returns

Synapse Entity

Can be used to change the filename or the file content-type without downloading:

import os

file_entity = syn.get(synid)
print(os.path.basename(file_entity.path))  ## prints, e.g., "my_file.txt"
file_entity = synapseutils.changeFileMetaData(syn, file_entity, "my_new_name_file.txt")

synapseutils.copy_functions.copy(syn, entity, destinationId, skipCopyWikiPage=False, skipCopyAnnotations=False, **kwargs)
  • This function will assist users in copying entities (Tables, Links, Files, Folders, Projects), and will recursively copy everything in directories.

  • A mapping of the old entities to the new entities will be created, all the wikis of each entity will also be copied over, and links to Synapse IDs will be updated.

Parameters
  • syn – A Synapse object, e.g. syn = synapseclient.login(). Must be logged into Synapse.

  • entity – A Synapse entity ID

  • destinationId – Synapse ID of the folder/project that the entity is being copied to

  • skipCopyWikiPage – Skip copying the wiki pages. Defaults to False.

  • skipCopyAnnotations – Skip copying the annotations. Defaults to False.

Examples::

import synapseutils
import synapseclient
syn = synapseclient.login()
synapseutils.copy(syn, …)

Examples and extra parameters unique to each copy function – COPYING FILES

Parameters
  • version – Can specify the version of a file. Defaults to None.

  • updateExisting – When the destination has an entity with the same name, users can choose to update that entity. It must be the same entity type. Defaults to False.

  • setProvenance – Sets the provenance of the copied entity. Takes one of three values:

    • traceback – Sets to the source entity

    • existing – Sets to the source entity’s original provenance (if it exists)

    • None – No provenance is set

Examples::

synapseutils.copy(syn, "syn12345", "syn45678", updateExisting=False, setProvenance="traceback", version=None)

– COPYING FOLDERS/PROJECTS

Parameters
  • excludeTypes – Accepts a list of entity types (file, table, link) which determines which entity types to not copy. Defaults to an empty list.

Examples::

# This will copy everything in the project into the destinationId except files and tables.
synapseutils.copy(syn, "syn123450", "syn345678", excludeTypes=["file", "table"])

Returns

A mapping between the original and copied entity IDs, e.g. {'syn1234': 'syn33455'}

synapseutils.copy_functions.copyFileHandles(syn, fileHandles, associateObjectTypes, associateObjectIds, newContentTypes=None, newFileNames=None)

Given a list of fileHandle Ids or Objects, copy the fileHandles

Parameters
  • syn – A Synapse object with user’s login, e.g. syn = synapseclient.login()

  • fileHandles – List of fileHandle Ids or Objects

  • associateObjectTypes – List of associated object types: FileEntity, TableEntity, WikiAttachment, UserProfileAttachment, MessageAttachment, TeamAttachment, SubmissionAttachment, VerificationSubmission (Must be the same length as fileHandles)

  • associateObjectIds – List of associated object Ids: If copying a file, the objectId is the synapse id, and if copying a wiki attachment, the object id is the wiki subpage id. (Must be the same length as fileHandles)

  • newContentTypes – (Optional) List of content types. Set each item to a new content type for each file handle, or leave the item as None to keep the original content type. Default None, which keeps all original content types.

  • newFileNames – (Optional) List of filenames. Set each item to a new filename for each file handle, or leave the item as None to keep the original name. Default None, which keeps all original file names.

Returns

List of batch filehandle copy results. Results can include the failureCodes UNAUTHORIZED and NOT_FOUND.

Raises

ValueError – If the input lists are not all the same length
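
The length requirement can be checked before making the call. The following is a minimal sketch of such a check, not the library's own validation; the helper name check_copy_args and the sample file handle ids are hypothetical.

```python
def check_copy_args(fileHandles, associateObjectTypes, associateObjectIds,
                    newContentTypes=None, newFileNames=None):
    """Raise ValueError unless every supplied list matches fileHandles in length.

    Hypothetical helper for illustration; it is not part of synapseutils.
    """
    expected = len(fileHandles)
    named_lists = {
        "associateObjectTypes": associateObjectTypes,
        "associateObjectIds": associateObjectIds,
        "newContentTypes": newContentTypes,
        "newFileNames": newFileNames,
    }
    for name, values in named_lists.items():
        # The optional lists may be None, meaning "keep every original value".
        if values is not None and len(values) != expected:
            raise ValueError(f"{name} has {len(values)} items; expected {expected}")


# Two file handles: keep the first as-is, change type and name of the second.
check_copy_args(
    fileHandles=["111", "222"],
    associateObjectTypes=["FileEntity", "FileEntity"],
    associateObjectIds=["syn1234", "syn5678"],
    newContentTypes=[None, "text/plain"],
    newFileNames=[None, "renamed.txt"],
)

# A mismatched list is reported by name.
try:
    check_copy_args(["111", "222"], ["FileEntity"], ["syn1234", "syn5678"])
except ValueError as error:
    mismatch_message = str(error)
    print(mismatch_message)
```

Passing None inside newContentTypes or newFileNames keeps the original value for that file handle, so those lists still need one entry per file handle.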

synapseutils.copy_functions.copyWiki(syn, entity, destinationId, entitySubPageId=None, destinationSubPageId=None, updateLinks=True, updateSynIds=True, entityMap=None)

Copies wikis and updates internal links

Parameters
  • syn – A Synapse object, e.g. syn = synapseclient.login(). Must be logged into Synapse.

  • entity – A Synapse ID of an entity whose wiki you want to copy

  • destinationId – Synapse ID of the folder/project to which the wiki will be copied

  • updateLinks – Update all the internal links (e.g. syn1234/wiki/34345 becomes syn3345/wiki/49508). Defaults to True.

  • updateSynIds – Update all the Synapse IDs referenced in the wikis (e.g. syn1234 becomes syn2345). Defaults to True, but needs an entityMap.

  • entityMap – An entity map {'oldSynId': 'newSynId'} used to update the Synapse IDs referenced in the wiki. Defaults to None.

  • entitySubPageId – Can specify a subPageId and copy all of its subwikis. Defaults to None, which copies the entire wiki. The subPageId can be found in the wiki URL: in https://www.synapse.org/#!Synapse:syn123/wiki/1234, the subPageId is 1234.

  • destinationSubPageId – Can specify a destination subPageId to copy wikis to. Defaults to None.

Returns

A list of Objects with three fields: id, title and parentId.
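
To illustrate what updating Synapse IDs with an entity map means, here is a minimal sketch that rewrites IDs in a piece of wiki text. It is not copyWiki's implementation; the function name remap_synapse_ids and the sample text are made up for this example.

```python
import re


def remap_synapse_ids(markdown, entity_map):
    """Replace each whole Synapse ID that appears in entity_map with its new ID."""
    def swap(match):
        old_id = match.group(0)
        # IDs that are not in the map are left unchanged.
        return entity_map.get(old_id, old_id)

    # Matching the full run of digits keeps syn1234 from matching inside syn12345.
    return re.sub(r"\bsyn\d+\b", swap, markdown)


entity_map = {"syn1234": "syn2345"}
text = "See syn1234 and syn12345 for details."
remapped = remap_synapse_ids(text, entity_map)
print(remapped)
```

The mapping has the same shape as the one returned by synapseutils.copy, which is why the two functions are typically used together.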

walk

synapseutils.walk.walk(syn, synId)

Traverse through the hierarchy of files and folders stored under the synId. Has the same behavior as os.walk()

Parameters
  • syn – A Synapse object, e.g. syn = synapseclient.login(). Must be logged into Synapse.

  • synId – A synapse ID of a folder or project

Example:

import synapseutils

walkedPath = synapseutils.walk(syn, "syn1234")

for dirpath, dirname, filename in walkedPath:
    print(dirpath)
    print(dirname)  # All the folders in the directory path
    print(filename)  # All the files in the directory path
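
Since walk is described as behaving like os.walk(), the traversal pattern can be tried locally without a Synapse login. The sketch below runs os.walk() over a temporary directory; the folder and file names are invented for the example, and it says nothing about the exact values synapseutils.walk yields.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small hierarchy: root/readme.txt, root/raw/data.csv, root/raw/batch1/
    os.makedirs(os.path.join(root, "raw", "batch1"))
    for rel in ("readme.txt", os.path.join("raw", "data.csv")):
        with open(os.path.join(root, rel), "w") as handle:
            handle.write("placeholder\n")

    walked = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order deterministic
        walked.append(
            (os.path.relpath(dirpath, root), list(dirnames), sorted(filenames))
        )

# One tuple per directory: (path, sub-folders, files)
for entry in walked:
    print(entry)
```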

sync

synapseutils.sync.generateManifest(syn, allFiles, filename, provenance_cache=None)

Generates a manifest file based on a list of entity objects.

Parameters
  • allFiles – A list of File Entities

  • filename – file where manifest will be written

  • provenance_cache – an optional dict of known provenance dicts keyed by entity ids

synapseutils.sync.readManifestFile(syn, manifestFile)

Verifies a file manifest and returns a reordered dataframe ready for upload.

Parameters
  • syn – A synapse object as obtained with syn = synapseclient.login()

  • manifestFile – A tsv file with file locations and metadata to be pushed to Synapse. See below for details

Returns

A pandas dataframe if the manifest is validated.

See synapseutils.sync.syncToSynapse() for a description of the file format.

synapseutils.sync.syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFiles=None, followLink=False, manifest='all', downloadFile=True)

Synchronizes all the files in a folder (including subfolders) from Synapse and adds a readme manifest with file metadata.

Parameters
  • syn – A synapse object as obtained with syn = synapseclient.login()

  • entity – A Synapse ID, a Synapse Entity object of type file, folder or project.

  • path – An optional path where the file hierarchy will be reproduced. If not specified the files will by default be placed in the synapseCache.

  • ifcollision – Determines how to handle file collisions. May be “overwrite.local”, “keep.local”, or “keep.both”. Defaults to “overwrite.local”.

  • followLink – Determines whether the link returns the target Entity. Defaults to False.

  • manifest – Determines whether to create the manifest file automatically. Allowed values are “all”, “root”, and “suppress”. Defaults to “all”.

  • downloadFile – Determines whether to download the files. Defaults to True.

Returns

list of entities (files, tables, links)

This function will crawl all subfolders of the project/folder specified by entity and download all files that have not already been downloaded. If there are newer files in Synapse (or a local file has been edited outside of the cache) since the last download, then the local file will be replaced by the new file unless “ifcollision” is changed.

If the files are being downloaded to a specific location outside of the Synapse cache, a file (SYNAPSE_METADATA_MANIFEST.tsv) will also be added in the path that contains the metadata (annotations, storage location and provenance) of all downloaded files.

See also: synapseutils.sync.syncToSynapse()

Example: Download and print the paths of all downloaded files:

import synapseutils

entities = synapseutils.syncFromSynapse(syn, "syn1234")
for f in entities:
    print(f.path)

synapseutils.sync.syncToSynapse(syn, manifestFile, dryRun=False, sendMessages=True, retries=4)

Synchronizes files specified in the manifest file to Synapse

Parameters
  • syn – A synapse object as obtained with syn = synapseclient.login()

  • manifestFile – A tsv file with file locations and metadata to be pushed to Synapse. See below for details

  • dryRun – Performs validation without uploading if set to True (default is False)

Given a file describing all of the uploads, uploads the content to Synapse and optionally notifies you via Synapse messaging (email) at specific intervals, on errors and on completion.

Manifest file format

The manifest is a tab-delimited file with one row per file to upload and columns describing the file. The minimum required columns are path and parent, where path is the local file path and parent is the Synapse Id of the project or folder where the file is uploaded to. In addition to these columns you can specify any of the parameters to the File constructor (name, synapseStore, contentType) as well as parameters to the syn.store command (used, executed, activityName, activityDescription, forceVersion). Used and executed can be semi-colon (“;”) separated lists of Synapse ids, urls and/or local filepaths of files already stored in Synapse (or being stored in Synapse by the manifest). Any additional columns will be added as annotations.
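
As a rough illustration of this layout, the sketch below builds a small manifest with Python's csv module and reads it back. It writes to an in-memory buffer rather than to disk; the column tissue is a made-up annotation column, and the paths and Synapse ids are placeholders.

```python
import csv
import io

columns = ["path", "parent", "name", "used", "tissue"]
rows = [
    {"path": "/path/file1.txt", "parent": "syn1243", "name": "Example_file",
     "used": "syn124;/path/file2.txt", "tissue": "liver"},
    {"path": "/path/file2.txt", "parent": "syn1243", "name": "",
     "used": "", "tissue": "kidney"},
]

# Write a tab-delimited manifest: one header row, then one row per file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                        lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
manifest_text = buffer.getvalue()
print(manifest_text)

# Read it back to confirm that the required columns survive the round trip.
parsed = list(csv.DictReader(io.StringIO(manifest_text), delimiter="\t"))
```

To produce a file that can be passed as manifestFile, the same writer can be pointed at a file opened with open(..., "w", newline="") instead of the buffer.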

Required fields:

Field    Meaning                  Example
path     local file path or URL   /path/to/local/file.txt
parent   synapse id               syn1235

Common fields:

Field          Meaning                     Example
name           name of file in Synapse     Example_file
forceVersion   whether to update version   False

Provenance fields:

Field                 Meaning                               Example
used                  List of items used to generate file   syn1235; /path/to_local/file.txt
executed              List of items executed                https://github.org/; /path/to_local/code.py
activityName          Name of activity in provenance        “Ran normalization”
activityDescription   Text description on what was done     “Ran algorithm xyx with parameters…”
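
The used and executed cells hold semicolon-separated items. The sketch below splits such a cell and guesses what kind of item each one is. The classification rules are an assumption made for illustration and are not how syncToSynapse resolves these values; both function names are hypothetical.

```python
import re


def split_provenance(cell):
    """Split a semicolon-separated manifest cell into trimmed, non-empty items."""
    return [item.strip() for item in cell.split(";") if item.strip()]


def classify(item):
    """Heuristically label an item as a Synapse id, a URL, or a local path."""
    if re.fullmatch(r"syn\d+", item):
        return "synapse_id"
    if item.startswith(("http://", "https://")):
        return "url"
    return "local_path"


used_items = split_provenance("syn1235; /path/to_local/file.txt")
executed_items = split_provenance("https://github.org/; /path/to_local/code.py")

for item in used_items + executed_items:
    print(item, "->", classify(item))
```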

Annotations:

Any columns that are not in the reserved names described above will be interpreted as annotations of the file
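
A quick way to see which columns of a manifest would be treated as annotations is to subtract the reserved names from the header. The reserved set below is assembled from the fields listed in this section and may not be exhaustive; the function name is hypothetical.

```python
# Reserved column names collected from the tables in this section (assumed complete
# only for the purpose of this example).
RESERVED_COLUMNS = {
    "path", "parent",
    "name", "forceVersion",
    "used", "executed", "activityName", "activityDescription",
    "synapseStore", "contentType",
}


def annotation_columns(header):
    """Return the header columns that are not reserved, in their original order."""
    return [column for column in header if column not in RESERVED_COLUMNS]


header = ["path", "parent", "annot1", "annot2", "used", "executed"]
annotations = annotation_columns(header)
print(annotations)
```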

Other optional fields:

Field          Meaning                                      Example
synapseStore   Boolean describing whether to upload files   True
contentType    content type of file to overload defaults    text/html

Example manifest file

path              parent     annot1   annot2   used                        executed
/path/file1.txt   syn1243    “bar”    3.1415   “syn124; /path/file2.txt”   https://github.org/foo/bar
/path/file2.txt   syn12433   “baz”    2.71     “”                          https://github.org/foo/baz

monitor

synapseutils.monitor.notifyMe(syn, messageSubject='', retries=0)

Function decorator that notifies you via email whenever a function completes running or there is a failure.

Parameters
  • syn – A synapse object as obtained with syn = synapseclient.login()

  • messageSubject – A string with subject line for sent out messages.

  • retries – Number of retries to attempt on failure (default=0)

Example:

# to decorate a function that you define
from synapseutils import notifyMe
import synapseclient
syn = synapseclient.login()

@notifyMe(syn, 'Long running function', retries=2)
def my_function(x):
    doing_something()
    return long_runtime_func(x)

my_function(123)

#############################
# to wrap a function that already exists
from synapseutils import notifyMe
import synapseclient
syn = synapseclient.login()

notify_decorator = notifyMe(syn, 'Long running query', retries=2)
my_query = notify_decorator(syn.tableQuery)
results = my_query("select id from syn1223")

#############################
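
For readers curious how a decorator of this kind can be put together, here is a minimal standalone sketch of a notify-and-retry decorator. It does not reproduce notifyMe: it takes a hypothetical send(subject, body) callback instead of a Synapse object, and it reports only the final outcome, which is an assumption about behavior this page does not describe.

```python
import functools


def notify_on_completion(send, subject="", retries=0):
    """Decorator factory: call send(subject, body) on success or on final failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = retries + 1  # the first call plus the allowed retries
            for attempt in range(1, attempts + 1):
                try:
                    result = func(*args, **kwargs)
                except Exception as error:
                    if attempt == attempts:
                        send(subject, f"{func.__name__} failed: {error!r}")
                        raise
                else:
                    send(subject, f"{func.__name__} completed")
                    return result
        return wrapper
    return decorator


# Collect messages in a list instead of sending email.
messages = []


def record(subject, body):
    messages.append((subject, body))


calls = {"count": 0}


@notify_on_completion(record, "Flaky job", retries=2)
def flaky(x):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient")  # fail on the first two attempts
    return x * 2


result = flaky(21)
print(result, messages)
```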
synapseutils.monitor.with_progress_bar(func, totalCalls, prefix='', postfix='', isBytes=False)

Wraps a function to add a progress bar based on the number of calls to that function.

Parameters
  • func – Function being wrapped with progress bar

  • totalCalls – Total number of items/bytes when completed

  • prefix – String printed before progress bar

  • postfix – String printed after progress bar

  • isBytes – A boolean indicating whether to convert bytes to kB, MB, GB etc.

Returns

a wrapped function that contains a progress bar
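
The idea of counting calls to drive a progress bar can be sketched in a few lines. This is an illustration of the technique only, not the library's implementation; the name with_call_counter, the report callback and the bar format are all invented here.

```python
import functools


def with_call_counter(func, total_calls, prefix="", postfix="", report=print):
    """Wrap func so that every call advances and reports a simple text bar."""
    state = {"done": 0}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        state["done"] += 1
        fraction = state["done"] / total_calls
        filled = int(round(20 * fraction))  # the bar is 20 characters wide
        bar = "#" * filled + "-" * (20 - filled)
        report(f"{prefix} [{bar}] {fraction:.0%} {postfix}")
        return result

    return wrapper


# Capture the bar lines in a list so they can be inspected afterwards.
lines = []
square = with_call_counter(lambda x: x * x, total_calls=4,
                           prefix="Squaring", postfix="done",
                           report=lines.append)
results = [square(n) for n in range(4)]
print("\n".join(lines))
```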

migrate

class synapseutils.migrate_functions.MigrationResult(syn, db_path)

A MigrationResult is a proxy object to the underlying sqlite db. It provides a programmatic interface that allows the caller to iterate over the file handles that were migrated without having to connect to or know the schema of the sqlite db. It also avoids loading everything into an in-memory data structure, which could be a memory liability when migrating a huge project of hundreds of thousands or millions of entities.

This proxy object is not thread safe, since it accesses an underlying sqlite db.

as_csv(path)

Output a flat csv file of the contents of the Migration index. Its columns are as follows:

  • id - the Synapse id

  • type - the concrete type of the entity

  • version - the version of the file entity (if applicable)

  • row_id - the row of the table attached file (if applicable)

  • col_name - the column name of the column the table attached file resides in (if applicable)

  • from_storage_location_id - the previous storage location id where the file/version was stored

  • from_file_handle_id - the file handle id of the existing file/version

  • to_file_handle_id - if migrated, the new file handle id

  • status - one of INDEXED, MIGRATED, ALREADY_MIGRATED, ERRORED indicating the status of the file/version

  • exception - if an error was encountered indexing/migrating the file/version, its stack is here

get_counts_by_status()

Returns a dictionary of counts by the migration status of each indexed file/version. Keys are as follows:

  • INDEXED - the file/version has been indexed and will be migrated on a call to migrate_indexed_files

  • MIGRATED - the file/version has been migrated

  • ALREADY_MIGRATED - the file/version was already stored at the target storage location and no migration is needed

  • ERRORED - an error occurred while indexing or migrating the file/version
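
A dictionary of this shape is easy to condense into a short progress summary. The sketch below assumes only the four keys listed above and treats a missing key as zero; the function name and the sample counts are made up.

```python
def summarize_counts(counts):
    """Condense a status-count dictionary into pending/finished/errored totals."""
    pending = counts.get("INDEXED", 0)
    errored = counts.get("ERRORED", 0)
    finished = counts.get("MIGRATED", 0) + counts.get("ALREADY_MIGRATED", 0)
    total = pending + errored + finished
    return {
        "total": total,
        "pending": pending,
        "finished": finished,
        "errored": errored,
        # Complete means something was indexed and nothing is pending or errored.
        "complete": total > 0 and pending == 0 and errored == 0,
    }


example_counts = {"INDEXED": 0, "MIGRATED": 8, "ALREADY_MIGRATED": 2, "ERRORED": 0}
summary = summarize_counts(example_counts)
print(summary)
```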

get_migrations()

A generator yielding each file/version in the migration index. A dictionary of the properties of the migration row is yielded as follows:

  • id - the Synapse id

  • type - the concrete type of the entity

  • version - the version of the file entity (if applicable)

  • row_id - the row of the table attached file (if applicable)

  • col_id - the column id of the table attached file (if applicable)

  • from_storage_location_id - the previous storage location id where the file/version was stored

  • from_file_handle_id - the file handle id of the existing file/version

  • to_file_handle_id - if migrated, the new file handle id

  • status - one of INDEXED, MIGRATED, ALREADY_MIGRATED, ERRORED indicating the status of the file/version

  • exception - if an error was encountered indexing/migrating the file/version, its stack is here
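
Because each yielded item is a plain dictionary, ordinary Python tools are enough to filter or tally them. The sketch below works on made-up rows with the keys described here; the values, including the type strings, are placeholders rather than real output.

```python
from collections import Counter


def failed_migrations(rows):
    """Return (id, version, exception) for every row whose status is ERRORED."""
    return [
        (row["id"], row.get("version"), row.get("exception"))
        for row in rows
        if row["status"] == "ERRORED"
    ]


# Made-up rows shaped like the dictionaries described above.
sample_rows = [
    {"id": "syn101", "type": "file", "version": 1,
     "status": "MIGRATED", "exception": None},
    {"id": "syn102", "type": "file", "version": 3,
     "status": "ERRORED", "exception": "Traceback ..."},
    {"id": "syn103", "type": "file", "version": 1,
     "status": "ALREADY_MIGRATED", "exception": None},
]

status_counts = Counter(row["status"] for row in sample_rows)
failures = failed_migrations(sample_rows)
print(dict(status_counts))
print(failures)
```

Both helpers accept any iterable, so the generator could be passed directly; since a generator can be consumed only once, collect it into a list first if more than one pass is needed.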

synapseutils.migrate_functions.index_files_for_migration(syn: synapseclient.client.Synapse, entity, dest_storage_location_id: str, db_path: str, source_storage_location_ids: Optional[Iterable[str]] = None, file_version_strategy='new', include_table_files=False, continue_on_error=False)

Index the given entity for migration to a new storage location. This is the first step in migrating an entity to a new storage location using synapseutils.

This function will create a sqlite database at the given db_path that can be subsequently passed to the migrate_indexed_files function for actual migration. This function itself does not modify the given entity in any way.

Parameters
  • syn – A Synapse object with user’s login, e.g. syn = synapseclient.login()

  • entity – A Synapse entity whose files should be migrated. Can be a Project, Folder, File entity, or Table entity. If it is a container (a Project or Folder) its contents will be recursively indexed.

  • dest_storage_location_id – The id of the new storage location to be migrated to.

  • db_path – A path on disk where a sqlite db can be created to store the contents of the created index.

  • source_storage_location_ids – An optional iterable of storage location ids that will be migrated. If provided, files outside of the listed storage locations will not be indexed for migration. If not provided, then all files not already in the destination storage location will be indexed for migration.

  • file_version_strategy

    One of “new” (default), “all”, “latest”, “skip” as follows:

    • “new” - will create a new version of file entities in the new storage location, leaving existing versions unchanged

    • “all” - all existing versions will be migrated in place to the new storage location

    • “latest” - the latest version will be migrated in place to the new storage location

    • “skip” - skip migrating file entities. Use this if, e.g., you want to migrate table attached files in a container while leaving the files unchanged

  • include_table_files – Whether to migrate files attached to tables. If False (default) then e.g. only file entities in the container will be migrated and tables will be untouched.

  • continue_on_error – Whether any errors encountered while indexing an entity (access etc) will be raised or instead just recorded in the index while allowing the index creation to continue. Default is False (any errors are raised).

Returns

A MigrationResult object that can be used to inspect the contents of the index or output the index to a CSV for manual inspection.

synapseutils.migrate_functions.migrate_indexed_files(syn: synapseclient.client.Synapse, db_path: str, create_table_snapshots=True, continue_on_error=False, force=False)

Migrate files previously indexed in a sqlite database at the given db_path using the separate index_files_for_migration function. The files listed in the index will be migrated according to the configuration of that index.

Parameters
  • syn – A Synapse object with user’s login, e.g. syn = synapseclient.login()

  • db_path – A path on disk where a sqlite db was created using the index_files_for_migration function.

  • create_table_snapshots – When updating the files in any table, whether a snapshot of the table is first created (default True).

  • continue_on_error – Whether any errors encountered while migrating will be raised or instead just recorded in the sqlite database while allowing the migration to continue. Default is False (any errors are raised).

  • force – If running in an interactive shell, migration requires an interactive confirmation. This can be bypassed by using the force=True option.

Returns

A MigrationResult object that can be used to inspect the results of the migration.