Overview
The synapseutils package provides both higher-level functions and utilities for interacting with Synapse. These include:
copy
-
synapseutils.copy_functions.changeFileMetaData(syn, entity, downloadAs=None, contentType=None)
- Parameters
entity – Synapse entity Id or object
contentType – Specify content type to change the content type of a filehandle
downloadAs – Specify filename to change the filename of a filehandle
- Returns
Synapse Entity
Can be used to change the filename or the file content-type without downloading:
    file_entity = syn.get(synid)
    print(os.path.basename(file_entity.path))  ## prints, e.g., "my_file.txt"
    file_entity = synapseutils.changeFileMetaData(syn, file_entity, "my_new_name_file.txt")
-
synapseutils.copy_functions.copy(syn, entity, destinationId, skipCopyWikiPage=False, skipCopyAnnotations=False, **kwargs)
Copies entities (Tables, Links, Files, Folders, Projects), recursively copying everything inside Folders and Projects.
A mapping of the old entities to the new entities is created. The wikis of each entity are also copied, and links to Synapse IDs are updated.
- Parameters
syn – A Synapse object, e.g. syn = synapseclient.login(). Must be logged into Synapse.
entity – A Synapse entity ID
destinationId – Synapse ID of the folder/project that the entity is copied to
skipCopyWikiPage – Skip copying the wiki pages. Default is False.
skipCopyAnnotations – Skip copying the annotations. Default is False.
Examples::
    import synapseutils
    import synapseclient
    syn = synapseclient.login()
    synapseutils.copy(syn, …)
Examples and extra parameters unique to each copy function
– COPYING FILES
- Parameters
version – Version of the file to copy. Defaults to None.
updateExisting – When the destination already has an entity with the same name, update that entity instead. It must be the same entity type. Defaults to False.
setProvenance – Sets the provenance of the copied entity. Takes one of three values:
    traceback: Sets to the source entity
    existing: Sets to the source entity's original provenance (if it exists)
    None: No provenance is set
- Examples::
    synapseutils.copy(syn, "syn12345", "syn45678", updateExisting=False, setProvenance="traceback", version=None)
– COPYING FOLDERS/PROJECTS
- Parameters
excludeTypes – A list of entity types (file, table, link) that should not be copied. Defaults to an empty list.
Examples::
    # This will copy everything in the project into the destinationId except files and tables.
    synapseutils.copy(syn, "syn123450", "syn345678", excludeTypes=["file", "table"])
- Returns
A mapping between the original and copied entity: {'syn1234': 'syn33455'}
-
synapseutils.copy_functions.copyFileHandles(syn, fileHandles, associateObjectTypes, associateObjectIds, newContentTypes=None, newFileNames=None)
Given a list of fileHandle Ids or Objects, copy the fileHandles.
- Parameters
syn – A Synapse object with user’s login, e.g. syn = synapseclient.login()
fileHandles – List of fileHandle Ids or Objects
associateObjectTypes – List of associated object types: FileEntity, TableEntity, WikiAttachment, UserProfileAttachment, MessageAttachment, TeamAttachment, SubmissionAttachment, VerificationSubmission (Must be the same length as fileHandles)
associateObjectIds – List of associated object Ids: If copying a file, the objectId is the synapse id, and if copying a wiki attachment, the object id is the wiki subpage id. (Must be the same length as fileHandles)
newContentTypes – (Optional) List of content types. Set each item to a new content type for each file handle, or leave the item as None to keep the original content type. Default None, which keeps all original content types.
newFileNames – (Optional) List of filenames. Set each item to a new filename for each file handle, or leave the item as None to keep the original name. Default None, which keeps all original file names.
- Returns
List of batch filehandle copy results. Results can include the failureCodes UNAUTHORIZED and NOT_FOUND.
- Raises
ValueError – If the input lists do not all have the same length
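The length requirement can be checked before calling copyFileHandles. The helper below is a hypothetical sketch (check_parallel_lists is not part of synapseutils) that mirrors the documented rule; the library's own validation may differ in detail.

```python
def check_parallel_lists(fileHandles, associateObjectTypes, associateObjectIds,
                         newContentTypes=None, newFileNames=None):
    """Raise ValueError unless every provided list matches len(fileHandles).

    Hypothetical helper for illustration; not part of synapseutils.
    """
    expected = len(fileHandles)
    named_lists = {
        "associateObjectTypes": associateObjectTypes,
        "associateObjectIds": associateObjectIds,
        "newContentTypes": newContentTypes,
        "newFileNames": newFileNames,
    }
    for name, values in named_lists.items():
        # Optional lists that were left as None are skipped.
        if values is not None and len(values) != expected:
            raise ValueError(
                f"{name} has {len(values)} items, expected {expected}"
            )


# Two file handles, so every list supplied has two entries.
check_parallel_lists(
    ["111", "222"],
    ["FileEntity", "WikiAttachment"],
    ["syn1234", "34345"],
    newFileNames=["renamed.txt", None],  # None keeps the original name
)
```

The file handle ids and wiki subpage id used here are placeholder values.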
-
synapseutils.copy_functions.copyWiki(syn, entity, destinationId, entitySubPageId=None, destinationSubPageId=None, updateLinks=True, updateSynIds=True, entityMap=None)
Copies wikis and updates internal links.
- Parameters
syn – A Synapse object, e.g. syn = synapseclient.login(). Must be logged into Synapse.
entity – Synapse ID of the entity whose wiki you want to copy
destinationId – Synapse ID of the folder/project that the wiki is copied to
updateLinks – Update all the internal links (e.g. syn1234/wiki/34345 becomes syn3345/wiki/49508). Defaults to True.
updateSynIds – Update all the Synapse IDs referenced in the wikis (e.g. syn1234 becomes syn2345). Defaults to True, but requires an entityMap.
entityMap – An entity map {'oldSynId': 'newSynId'} used to update the Synapse IDs referenced in the wiki. Defaults to None.
entitySubPageId – A subPageId whose wiki page and all of its subwikis are copied. Defaults to None, which copies the entire wiki. The subPageId can be found in the wiki URL: in https://www.synapse.org/#!Synapse:syn123/wiki/1234, the subPageId is 1234.
destinationSubPageId – A destination subPageId to copy the wikis to. Defaults to None.
- Returns
A list of Objects with three fields: id, title and parentId.
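To make the role of entityMap concrete, here is a minimal sketch of how Synapse IDs in wiki text could be rewritten from such a mapping. The function update_syn_ids and the regular expression are assumptions for illustration only; they are not copyWiki's actual implementation.

```python
import re


def update_syn_ids(wiki_markdown, entity_map):
    """Replace each Synapse ID found in entity_map; leave other IDs untouched."""
    def swap(match):
        old_id = match.group(0)
        return entity_map.get(old_id, old_id)

    # \d+ is greedy, so "syn12345" is matched as a whole and is never
    # mistaken for "syn1234" followed by "5".
    return re.sub(r"\bsyn\d+\b", swap, wiki_markdown)


entity_map = {"syn1234": "syn2345"}
text = "Data lives in syn1234; see also syn12345."
updated = update_syn_ids(text, entity_map)
print(updated)
```

Only IDs present as keys in the mapping change, which is why an entityMap is needed for updateSynIds to have any effect.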
walk
-
synapseutils.walk.walk(syn, synId)
Traverses the hierarchy of files and folders stored under the synId. Has the same behavior as os.walk().
- Parameters
syn – A Synapse object, e.g. syn = synapseclient.login(). Must be logged into Synapse.
synId – A synapse ID of a folder or project
Example:
    walkedPath = walk(syn, "syn1234")
    for dirpath, dirname, filename in walkedPath:
        print(dirpath)
        print(dirname)  # All the folders in the directory path
        print(filename)  # All the files in the directory path
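Since walk is described as behaving like os.walk(), the local equivalent may help show the shape of what each iteration yields. This sketch uses only the standard library on a temporary directory; list_tree is a hypothetical helper, and the element types that synapseutils.walk yields may differ from the plain strings seen here.

```python
import os
import tempfile


def list_tree(root):
    """Return (relative dirpath, dirnames, filenames) triples in os.walk order."""
    triples = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort keeps the traversal order deterministic
        triples.append(
            (os.path.relpath(dirpath, root), list(dirnames), sorted(filenames))
        )
    return triples


# Build a tiny hierarchy: one file at the top and one file in a subfolder.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "raw"))
    open(os.path.join(root, "readme.txt"), "w").close()
    open(os.path.join(root, "raw", "data.csv"), "w").close()
    tree = list_tree(root)

for dirpath, dirnames, filenames in tree:
    print(dirpath, dirnames, filenames)
```

Each triple describes one directory: its path, the folders directly inside it, and the files directly inside it.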
sync
-
synapseutils.sync.generateManifest(syn, allFiles, filename, provenance_cache=None)
Generates a manifest file based on a list of entity objects.
- Parameters
allFiles – A list of File Entities
filename – file where manifest will be written
provenance_cache – an optional dict of known provenance dicts keyed by entity ids
-
synapseutils.sync.readManifestFile(syn, manifestFile)
Verifies a file manifest and returns a reordered dataframe ready for upload.
- Parameters
syn – A synapse object as obtained with syn = synapseclient.login()
manifestFile – A tsv file with file locations and metadata to be pushed to Synapse. See below for details
- Returns
A pandas dataframe if the manifest is validated.
See syncToSynapse for a description of the file format.
-
synapseutils.sync.syncFromSynapse(syn, entity, path=None, ifcollision='overwrite.local', allFiles=None, followLink=False, manifest='all', downloadFile=True)
Synchronizes all the files in a folder (including subfolders) from Synapse and adds a readme manifest with file metadata.
- Parameters
syn – A synapse object as obtained with syn = synapseclient.login()
entity – A Synapse ID or a Synapse Entity object of type file, folder or project.
path – An optional path where the file hierarchy will be reproduced. If not specified, the files are placed in the synapseCache by default.
ifcollision – Determines how to handle file collisions. May be “overwrite.local”, “keep.local”, or “keep.both”. Defaults to “overwrite.local”.
followLink – Determines whether the link returns the target Entity. Defaults to False.
manifest – Determines whether a manifest file is created automatically. Accepted values are “all”, “root” and “suppress”.
downloadFile – Determines whether the files are downloaded. Defaults to True.
- Returns
list of entities (files, tables, links)
This function will crawl all subfolders of the project/folder specified by entity and download all files that have not already been downloaded. If there are newer files in Synapse (or a local file has been edited outside of the cache) since the last download, then the local file will be replaced by the new file unless “ifcollision” is changed.
If the files are being downloaded to a specific location outside of the Synapse cache, a file (SYNAPSE_METADATA_MANIFEST.tsv) will also be added in the path. It contains the metadata (annotations, storage location and provenance) of all downloaded files.
See also: synapseutils.sync.syncToSynapse()
Example: Download and print the paths of all downloaded files:
    entities = syncFromSynapse(syn, "syn1234")
    for f in entities:
        print(f.path)
-
synapseutils.sync.syncToSynapse(syn, manifestFile, dryRun=False, sendMessages=True, retries=4)
Synchronizes files specified in the manifest file to Synapse.
- Parameters
syn – A synapse object as obtained with syn = synapseclient.login()
manifestFile – A tsv file with file locations and metadata to be pushed to Synapse. See below for details
dryRun – Performs validation without uploading if set to True (default is False)
Given a file describing all of the uploads, uploads the content to Synapse and optionally notifies you via Synapse messaging (email) at specific intervals, on errors and on completion.
Manifest file format
The format of the manifest file is a tab delimited file with one row per file to upload and columns describing the file. The minimum required columns are path and parent where path is the local file path and parent is the Synapse Id of the project or folder where the file is uploaded to. In addition to these columns you can specify any of the parameters to the File constructor (name, synapseStore, contentType) as well as parameters to the syn.store command (used, executed, activityName, activityDescription, forceVersion). Used and executed can be semi-colon (“;”) separated lists of Synapse ids, urls and/or local filepaths of files already stored in Synapse (or being stored in Synapse by the manifest). Any additional columns will be added as annotations.
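Required fields:

======  ======================  =======================
Field   Meaning                 Example
======  ======================  =======================
path    local file path or URL  /path/to/local/file.txt
parent  synapse id              syn1235
======  ======================  =======================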
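Common fields:

============  =========================  ============
Field         Meaning                    Example
============  =========================  ============
name          name of file in Synapse    Example_file
forceVersion  whether to update version  False
============  =========================  ============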
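Provenance fields:

===================  ===================================  ===========================================
Field                Meaning                              Example
===================  ===================================  ===========================================
used                 List of items used to generate file  syn1235; /path/to_local/file.txt
executed             List of items executed               https://github.org/; /path/to_local/code.py
activityName         Name of activity in provenance       “Ran normalization”
activityDescription  Text description on what was done    “Ran algorithm xyx with parameters…”
===================  ===================================  ===========================================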
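As an illustration of the semicolon-separated format of the used and executed columns, the sketch below splits one such cell into its items. split_provenance is a hypothetical helper, not a synapseutils function, and the library's own parsing may handle edge cases differently.

```python
def split_provenance(cell):
    """Split a semicolon-separated provenance cell into a list of items."""
    # Surrounding whitespace is dropped and empty items are ignored.
    return [item.strip() for item in cell.split(";") if item.strip()]


used = split_provenance("syn1235; /path/to_local/file.txt")
print(used)
```

An empty cell yields an empty list, which corresponds to a file with no recorded inputs.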
Annotations:
Any columns that are not in the reserved names described above will be interpreted as annotations of the file
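Other optional fields:

============  ==========================================  =========
Field         Meaning                                     Example
============  ==========================================  =========
synapseStore  Boolean describing whether to upload files  True
contentType   content type of file to overload defaults   text/html
============  ==========================================  =========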
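Example manifest file

===============  ========  ======  ======  =========================  ========
path             parent    annot1  annot2  used                       executed
===============  ========  ======  ======  =========================  ========
/path/file1.txt  syn1243   “bar”   3.1415  “syn124; /path/file2.txt”
/path/file2.txt  syn12433  “baz”   2.71    “”
===============  ========  ======  ======  =========================  ========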
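A manifest like the one above can be produced with Python's standard csv module. This is a minimal sketch that writes the tab-delimited text to an in-memory buffer; write_manifest is a hypothetical helper, the executed column is left out, and the values are the illustrative ones from the example.

```python
import csv
import io

columns = ["path", "parent", "annot1", "annot2", "used"]
rows = [
    {"path": "/path/file1.txt", "parent": "syn1243", "annot1": "bar",
     "annot2": 3.1415, "used": "syn124; /path/file2.txt"},
    {"path": "/path/file2.txt", "parent": "syn12433", "annot1": "baz",
     "annot2": 2.71, "used": ""},
]


def write_manifest(handle, columns, rows):
    """Write one header line and one tab-delimited line per file."""
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_manifest(buffer, columns, rows)
manifest_text = buffer.getvalue()
print(manifest_text)
```

To write to disk instead, the same function can be given a file opened with open(filename, "w", newline=""), and the resulting path can then be passed as manifestFile.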
monitor
-
synapseutils.monitor.notifyMe(syn, messageSubject='', retries=0)
Function decorator that notifies you via email whenever a function completes running or there is a failure.
- Parameters
syn – A synapse object as obtained with syn = synapseclient.login()
messageSubject – A string with subject line for sent out messages.
retries – Number of retries to attempt on failure (default=0)
Example:
    # to decorate a function that you define
    from synapseutils import notifyMe
    import synapseclient
    syn = synapseclient.login()

    @notifyMe(syn, 'Long running function', retries=2)
    def my_function(x):
        doing_something()
        return long_runtime_func(x)

    my_function(123)

    #############################
    # to wrap a function that already exists
    from synapseutils import notifyMe
    import synapseclient
    syn = synapseclient.login()

    notify_decorator = notifyMe(syn, 'Long running query', retries=2)
    my_query = notify_decorator(syn.tableQuery)
    results = my_query("select id from syn1223")

    #############################
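The decorator pattern used by notifyMe can be sketched without a Synapse connection. The code below is an illustrative assumption, not the library's implementation: notify_on_finish is a hypothetical name, send stands in for whatever delivers the message, and retries is taken to mean extra attempts after the first failure.

```python
import functools


def notify_on_finish(send, subject, retries=0):
    """Decorator factory: call send(subject, body) on success and on each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts_left = retries + 1  # the first call plus the allowed retries
            while True:
                try:
                    result = func(*args, **kwargs)
                except Exception as error:
                    attempts_left -= 1
                    send(subject, f"failed: {error!r}")
                    if attempts_left == 0:
                        raise  # out of attempts, surface the last error
                else:
                    send(subject, "completed")
                    return result
        return wrapper
    return decorator


messages = []          # collected here instead of being emailed
state = {"calls": 0}


@notify_on_finish(lambda subject, body: messages.append((subject, body)),
                  "Long running function", retries=2)
def flaky(x):
    """Fail on the first two calls, then succeed."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary problem")
    return x * 2


result = flaky(21)
print(result)
print(messages)
```

Passing the sender in as a callable keeps the retry logic testable; in real use the callable would send a Synapse message rather than append to a list.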
-
synapseutils.monitor.with_progress_bar(func, totalCalls, prefix='', postfix='', isBytes=False)
Wraps a function to add a progress bar based on the number of calls to that function.
- Parameters
func – Function being wrapped with progress bar
totalCalls – total number of items/bytes when completed
prefix – String printed before progress bar
postfix – String printed after progress bar
isBytes – A boolean indicating whether to convert bytes to kB, MB, GB etc.
- Returns
a wrapped function that contains a progress bar
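The idea of counting calls to drive a progress display can be sketched as follows. with_call_progress and its stream parameter are hypothetical and exist only for this illustration; the output format is an assumption and does not reproduce what with_progress_bar prints.

```python
import functools
import io
import sys


def with_call_progress(func, total_calls, prefix="", postfix="", stream=None):
    """Wrap func so that each completed call writes a progress line to stream."""
    out = stream if stream is not None else sys.stdout
    completed = 0

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal completed
        result = func(*args, **kwargs)
        completed += 1
        percent = 100 * completed // total_calls
        out.write(f"{prefix} {completed}/{total_calls} ({percent}%) {postfix}\n")
        return result

    return wrapper


log = io.StringIO()  # capture the progress lines instead of printing them
square = with_call_progress(lambda x: x * x, 4,
                            prefix="squaring", postfix="done", stream=log)
results = [square(n) for n in range(4)]
print(log.getvalue())
```

The wrapped function returns the same values as the original, so it can be dropped into existing code that calls func a known number of times.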
migrate
-
class synapseutils.migrate_functions.MigrationResult(syn, db_path)
A MigrationResult is a proxy object to the underlying sqlite db. It provides a programmatic interface that lets the caller iterate over the file handles that were migrated without having to connect to, or know the schema of, the sqlite db. It also avoids the potential memory cost of loading everything into an in-memory data structure, which could be a liability when migrating a huge project of hundreds of thousands or millions of entities.
This proxy object is not thread safe, since it accesses an underlying sqlite db.
-
as_csv(path)
Output a flat csv file of the contents of the Migration index. Its columns are as follows:
id - the Synapse id
type - the concrete type of the entity
version - the version of the file entity (if applicable)
row_id - the row of the table attached file (if applicable)
col_name - the column name of the column the table attached file resides in (if applicable)
from_storage_location_id - the previous storage location id where the file/version was stored
from_file_handle_id - the file handle id of the existing file/version
to_file_handle_id - if migrated, the new file handle id
status - one of INDEXED, MIGRATED, ALREADY_MIGRATED, ERRORED indicating the status of the file/version
exception - if an error was encountered indexing/migrating the file/version, its stack is here
-
get_counts_by_status()
Returns a dictionary of counts by the migration status of each indexed file/version. Keys are as follows:
INDEXED - the file/version has been indexed and will be migrated on a call to migrate_indexed_files
MIGRATED - the file/version has been migrated
ALREADY_MIGRATED - the file/version was already stored at the target storage location and no migration is needed
ERRORED - an error occurred while indexing or migrating the file/version
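A count keyed by status is a natural fit for a SQL GROUP BY. The sketch below shows that idea against an in-memory sqlite database; the migrations table and its two columns are a hypothetical schema invented for this example, since the real index schema is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-table schema; the real index layout is not documented here.
conn.execute("CREATE TABLE migrations (id TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO migrations VALUES (?, ?)",
    [("syn1", "INDEXED"), ("syn2", "INDEXED"), ("syn3", "MIGRATED")],
)


def counts_by_status(connection):
    """Return {status: number of rows with that status}."""
    cursor = connection.execute(
        "SELECT status, COUNT(*) FROM migrations GROUP BY status"
    )
    return dict(cursor.fetchall())


counts = counts_by_status(conn)
print(counts)
```

Statuses with no rows simply do not appear in the result, so callers may prefer counts.get("ERRORED", 0) over direct indexing.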
-
get_migrations()
A generator yielding each file/version in the migration index. A dictionary of the properties of the migration row is yielded as follows:
id - the Synapse id
type - the concrete type of the entity
version - the version of the file entity (if applicable)
row_id - the row of the table attached file (if applicable)
col_id - the column id of the table attached file (if applicable)
from_storage_location_id - the previous storage location id where the file/version was stored
from_file_handle_id - the file handle id of the existing file/version
to_file_handle_id - if migrated, the new file handle id
status - one of INDEXED, MIGRATED, ALREADY_MIGRATED, ERRORED indicating the status of the file/version
exception - if an error was encountered indexing/migrating the file/version, its stack is here
-
-
synapseutils.migrate_functions.index_files_for_migration(syn: synapseclient.client.Synapse, entity, dest_storage_location_id: str, db_path: str, source_storage_location_ids: Optional[Iterable[str]] = None, file_version_strategy='new', include_table_files=False, continue_on_error=False)
Index the given entity for migration to a new storage location. This is the first step in migrating an entity to a new storage location using synapseutils.
This function will create a sqlite database at the given db_path that can be subsequently passed to the migrate_indexed_files function for actual migration. This function itself does not modify the given entity in any way.
- Parameters
syn – A Synapse object with user’s login, e.g. syn = synapseclient.login()
entity – A Synapse entity whose files should be migrated. Can be a Project, Folder, File entity, or Table entity. If it is a container (a Project or Folder) its contents will be recursively indexed.
dest_storage_location_id – The id of the new storage location to be migrated to.
db_path – A path on disk where a sqlite db can be created to store the contents of the created index.
source_storage_location_ids – An optional iterable of storage location ids that will be migrated. If provided, files outside of the listed storage locations will not be indexed for migration. If not provided, then all files not already in the destination storage location will be indexed for migration.
file_version_strategy –
One of “new” (default), “all”, “latest”, “skip” as follows:
“new” - will create a new version of file entities in the new storage location, leaving existing versions unchanged
“all” - all existing versions will be migrated in place to the new storage location
“latest” - the latest version will be migrated in place to the new storage location
“skip” - skip migrating file entities. Use this, for example, to migrate table attached files in a container while leaving the files unchanged
include_table_files – Whether to migrate files attached to tables. If False (default) then e.g. only file entities in the container will be migrated and tables will be untouched.
continue_on_error – Whether any errors encountered while indexing an entity (access etc) will be raised or instead just recorded in the index while allowing the index creation to continue. Default is False (any errors are raised).
- Returns
A MigrationResult object that can be used to inspect the contents of the index or output the index to a CSV for manual inspection.
-
synapseutils.migrate_functions.migrate_indexed_files(syn: synapseclient.client.Synapse, db_path: str, create_table_snapshots=True, continue_on_error=False, force=False)
Migrate files previously indexed in a sqlite database at the given db_path using the separate index_files_for_migration function. The files listed in the index will be migrated according to the configuration of that index.
- Parameters
syn – A Synapse object with user’s login, e.g. syn = synapseclient.login()
db_path – A path on disk where a sqlite db was created using the index_files_for_migration function.
create_table_snapshots – When updating the files in any table, whether a snapshot of the table is first created (default True).
continue_on_error – Whether any errors encountered while migrating will be raised or instead just recorded in the sqlite database while allowing the migration to continue. Default is False (any errors are raised).
force – If running in an interactive shell, migration requires an interactive confirmation. This can be bypassed by using the force=True option.
- Returns
A MigrationResult object that can be used to inspect the results of the migration.
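The two migration steps can be combined into a small workflow. The sketch below is one possible arrangement under stated assumptions: migrate_with_review and csv_path are hypothetical names, the choice to stop when any entry is ERRORED is a policy chosen for this example, and only the functions, parameters and methods documented above are called.

```python
def migrate_with_review(syn, entity, dest_storage_location_id, db_path, csv_path):
    """Index first, write the index to CSV for review, then migrate.

    Returns the index MigrationResult if indexing recorded errors,
    otherwise the MigrationResult of the migration itself.
    """
    # Imported inside the function so the sketch can be defined
    # even where synapseutils is not installed.
    from synapseutils.migrate_functions import (
        index_files_for_migration,
        migrate_indexed_files,
    )

    # Step 1: build the index. The entity itself is not modified.
    index_result = index_files_for_migration(
        syn, entity, dest_storage_location_id, db_path,
        file_version_strategy="new",
        continue_on_error=True,  # record indexing errors instead of raising
    )
    index_result.as_csv(csv_path)  # flat file for manual inspection

    counts = index_result.get_counts_by_status()
    if counts.get("ERRORED", 0):
        # Assumed policy: do not migrate until indexing errors are resolved.
        return index_result

    # Step 2: migrate what was indexed. force=True skips the interactive
    # confirmation, so only use it once the CSV has been reviewed.
    return migrate_indexed_files(syn, db_path, continue_on_error=True, force=True)
```

A call would look like migrate_with_review(syn, "syn1234", "12345", "/tmp/migration.db", "/tmp/index.csv"), where the ids and paths are placeholders.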