Starfish external API (v1.0)

Note: This describes the Starfish API

Authentication

Starfish uses Bearer authentication.

Although Bearer authentication is commonly used with OAuth 2.0, Starfish API doesn't use OAuth 2.0.
Bearer authentication is also called token authentication.

To use the API, you first need to obtain a token from the auth endpoint.
By default the token is valid for 16 hours.
The validity period can be changed with the auth.auth_token_timeout_secs config option.

The token consists of 3 parts separated by colons: token_ver:token_id:token_secret,
for example sf-api-v1:lr9Cnex0za:AVj8w19TMhMjjVbEHse3EPeeq1TnuLuCXK6IHxzCzls.

  • token_ver: version of the API; for now, it is sf-api-v1.
  • token_id: identifies the token; in the above example it is lr9Cnex0za.
  • token_secret: in this case it is AVj8w19TMhMjjVbEHse3EPeeq1TnuLuCXK6IHxzCzls.
    In clear form, token_secret is known only to the owner of the token.
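
The three-part layout can be illustrated with a minimal Python sketch. The names StarfishToken, parse_token and redact are hypothetical helpers, not part of the Starfish API, and the sketch assumes that none of the three parts contains a colon.

```python
from typing import NamedTuple


class StarfishToken(NamedTuple):
    """The three colon-separated parts of a Starfish API token."""
    token_ver: str
    token_id: str
    token_secret: str


def parse_token(token: str) -> StarfishToken:
    """Split a token of the form token_ver:token_id:token_secret."""
    parts = token.split(":")
    # Assumption: no part contains a colon, so exactly three pieces are expected.
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected token_ver:token_id:token_secret")
    return StarfishToken(*parts)


def redact(token: str) -> str:
    """Return a log-safe form that keeps token_id but hides token_secret."""
    parsed = parse_token(token)
    return f"{parsed.token_ver}:{parsed.token_id}:***"


example = "sf-api-v1:lr9Cnex0za:AVj8w19TMhMjjVbEHse3EPeeq1TnuLuCXK6IHxzCzls"
print(parse_token(example).token_id)
print(redact(example))
```

Keeping token_id visible while hiding token_secret lets a log line still identify which token was used.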

The token must then be sent in the Authorization header of every API request:

POST /api/... HTTP/1.1
Host: starfish.com
Authorization: Bearer TOKEN
Content-Type: application/json
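
As a rough sketch, the same headers can be prepared in Python with the standard library. The helper build_request is hypothetical, the https scheme is an assumption, and the path /api/archive/job/123 is borrowed from the href shown in the response samples purely for illustration. The request is only constructed here, not sent.

```python
import urllib.request
from typing import Optional


def build_request(token: str, method: str, path: str,
                  body: Optional[bytes] = None,
                  host: str = "starfish.com") -> urllib.request.Request:
    """Prepare (but do not send) an API request carrying the Bearer token."""
    return urllib.request.Request(
        url=f"https://{host}{path}",  # assumption: the API is served over https
        data=body,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# TOKEN stands for the full token_ver:token_id:token_secret string.
request = build_request("TOKEN", "GET", "/api/archive/job/123")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Passing the prepared object to urllib.request.urlopen would perform the actual call.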

archive

Archive files

Create and start an archive job

Returns archive job object. Single archive job may contain multiple low-level jobs.

Authorizations:
starfishAuth
Request Body schema: application/json
volume_and_path
string

Volume and path in the form volume:path

archive_target_name
string

Archive target name defined with /api/archive/target API

dest_path
string

Destination dir path appended to archive_target dest_path

migrate
boolean
Default: false

Remove files from source after copy to archive. If alias remove_source is given, the result is calculated as migrate OR remove_source. If the result is False, the option is ignored.

options
object (job.options.response)

remove_source
boolean
Default: false

Alias to migrate.

remove_empty_dirs
boolean
Default: false

For each removed file, also remove its parent directory; when that succeeds, removal continues recursively up to the job root (the job root is not removed if it is the volume root). Using this option without migrate is prohibited.

generate_manifest
boolean or null

Generate a manifest file for an upload/copy job. If not defined, the value for upload/copy command will be used (true by default). If set, it overwrites settings per upload/copy command and global default set by dispatcher.generate_manifest.

query
string

If defined here will overwrite query filters from url params

compression_type
string
Enum: "gzip" "xz"

During upload to object store compress file contents.

compression_level
integer

Set compression level. Defaults to 1 for xz and 6 for gzip. Ignored if used without compression_type.

overwrite
string
Enum: "never" "success_if_identical" "older" "always"

Allowed only when archiving to another volume. See overwrite option from restore start command_options for details.

storage_class
string

Allowed only for S3 and Azure archive targets. Override 'storage_class' parameter from archive targets.

parallel_upload_count
integer

When the size of the object being uploaded exceeds the Starfish multi-part upload size (default ~100MB), uploads will be split across simultaneous threads (default: 4). Example: for --workers-per-agent 8 and --parallel-upload-count 4, there will be up to 32 distinct uploads happening per agent, with 4-way parallelism internal to each worker.

no_sparse
boolean

Allowed only for volume archive targets. Restore sparse files as non-sparse. This is the default on Windows volumes, because restoring sparse files is not supported there.

inplace
boolean

Allowed only for volume archive targets. See inplace option from restore start command_options for details.

command_verbose
boolean

Allowed only for object store archive target. Run low level command (upload) with debug log level.

dedup
boolean
Default: false

Deduplicate files with the same contents in the object store (uses the md5 of the file content as the uploaded object name). This option works for cloud storage only and is mutually exclusive with 'tar'.

part_size
string

Minimum part size for multipart uploads to object store. Files larger than this value are uploaded using multipart upload. Files of size up to "min_part_size" may be uploaded in a single request and, depending on other config parameters, the whole file may be loaded into memory.

min_part_size
string

Alias for 'part_size', kept for backward compatibility. Please use 'part_size' instead.

tar
boolean
Default: false

Upload a tar.gz archive of the input files to object store instead of individual files. This option is mutually exclusive with 'dedup'.

verbose
boolean
Default: false

run job command with DEBUG log level

prescan_enabled
boolean
Default: true

Enable filesystem prescanning

prescan_type
string
Default: "diff"
Enum: "diff" "sync" "mtime"

Change prescan type

from_scratch
boolean
Default: false

Force job to run archive on all matching entries, even if they are already archived

job_name
string

Use this name for the copy job. A copy job with the same job_name will run only on changed entries unless --from-scratch is given. New results will override results from the previous job with the same name.

workers_per_agent
Array of strings

Number of workers to run on each agent that can run this job. An element of the list can be a plain number, which then applies as the default to all agents, or a string of the form agent_address=number.

entries_from_file
string or null (entries_from_file_enum)
Enum: "paths" "sfids"

Determines the type of entries passed in the file. Supported by restore jobs. Other jobs expect paths only.

hard_links
boolean
Default: false

Preserve hardlinks when copying files between volumes (ignored for other types of archive targets). Linux only.

Responses

Request samples

Content type
application/json
{
  • "volume_and_path": "usr:path",
  • "archive_target_name": "fake-s3",
  • "dest_path": "my/dir",
  • "migrate": true,
  • "options": {
    },
  • "remove_source": false,
  • "remove_empty_dirs": false,
  • "generate_manifest": true,
  • "query": "string",
  • "compression_type": "gzip",
  • "compression_level": 0,
  • "overwrite": "never",
  • "storage_class": "string",
  • "parallel_upload_count": 0,
  • "no_sparse": true,
  • "inplace": true,
  • "command_verbose": true,
  • "dedup": false,
  • "part_size": "10MiB",
  • "min_part_size": "string",
  • "tar": false,
  • "verbose": false,
  • "prescan_enabled": true,
  • "prescan_type": "diff",
  • "from_scratch": false,
  • "job_name": "string",
  • "workers_per_agent": [],
  • "entries_from_file": "paths",
  • "hard_links": false
}
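
Some of the option constraints described above can be checked on the client before the request is sent. The following Python sketch is one possible reading of those rules, not the service's own validation: build_archive_job_body is a hypothetical helper, and it assumes that remove_empty_dirs is acceptable whenever migrate OR remove_source is true.

```python
import json
from typing import Any, Dict


def build_archive_job_body(volume_and_path: str, archive_target_name: str,
                           **options: Any) -> Dict[str, Any]:
    """Assemble an archive job request body and check documented constraints."""
    body: Dict[str, Any] = {
        "volume_and_path": volume_and_path,
        "archive_target_name": archive_target_name,
        **options,
    }
    # The effective value is documented as "migrate OR remove_source".
    migrate = bool(body.get("migrate")) or bool(body.get("remove_source"))
    if body.get("remove_empty_dirs") and not migrate:
        raise ValueError("remove_empty_dirs cannot be used without migrate")
    if body.get("dedup") and body.get("tar"):
        raise ValueError("dedup and tar are mutually exclusive")
    compression = body.get("compression_type")
    if compression is not None and compression not in ("gzip", "xz"):
        raise ValueError("compression_type must be 'gzip' or 'xz'")
    return body


body = build_archive_job_body(
    "usr:path", "fake-s3",
    dest_path="my/dir", migrate=True, remove_empty_dirs=True,
    compression_type="gzip",
)
print(json.dumps(body, indent=2))
```

Failing early on the client avoids starting a job that the service would reject or that would silently ignore an option.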

Response samples

Content type
application/json
{
  • "href": "/api/archive/job/123",
  • "volume_and_path": "projects:dir1/dir2",
  • "target_id": 0,
  • "target_name": "string",
  • "target_info": {
    },
  • "query": "string",
  • "archiving_options": {
    },
  • "low_level_jobs": {
    },
  • "stats": {
    },
  • "id": 0,
  • "status": "STARTING",
  • "state": {
    },
  • "creation_time": 1593093530.123456,
  • "creation_time_hum": "2020-06-25 15:58:50",
  • "end_time": 1593093600.123456,
  • "end_time_hum": "2020-06-25 16:00:00",
  • "duration": 70,
  • "duration_hum": "1m10s",
  • "created_by_id": 1,
  • "created_by": {
    },
  • "created_by_hum": "Alice (uid=12)"
}

List archive jobs

Authorizations:
starfishAuth
query Parameters
status
Array of strings

Job status(es), either a list of statuses or a single status as a string. Cannot be used together with running.

running
boolean

if set to true, lists only running jobs; if set to false, lists only not-running (completed) jobs; cannot be used together with status

requested_by
Array of strings

created by a given entity, either a list of entities or a single entity as a string. For example 'gui', 'client', 'scheduler' etc.

creation_time
string

Supports FROM-TO and RELATIVE formats;

FROM-TO:

'# hour|day|week|month|year(s) ago' or 'YYYYMMDD[HHMM[SS]]' or 'now' or 'inf', for example:

  • 'inf-3 months ago'
  • '9 months ago-now'
  • '20200101-202003251314'

RELATIVE:

'[+|-]N[y|m|w|d|h]', meaning a number of years, months, weeks, days (default) or hours, for example:

  • '+4h': more (older) than 5 hours ago
  • '1d' (or '1'): between 2 days ago (48 hours) and 1 day (24 hours) ago
  • '-2y': less (newer) than 2 years ago

end_time
string

the same as creation_time
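
The RELATIVE syntax '[+|-]N[y|m|w|d|h]' can be checked with a small parser. This Python sketch only splits a value into its sign, number and unit; it makes no claim about how the service interprets the resulting range. The names RelativeTime and parse_relative are hypothetical.

```python
import re
from typing import NamedTuple


class RelativeTime(NamedTuple):
    sign: str    # "+", "-" or "" when no sign is given
    amount: int  # the number N
    unit: str    # one of y, m, w, d, h


_RELATIVE = re.compile(r"^([+-]?)(\d+)([ymwdh]?)$")


def parse_relative(value: str) -> RelativeTime:
    """Split a RELATIVE time filter into sign, amount and unit."""
    match = _RELATIVE.match(value)
    if match is None:
        raise ValueError(f"not a RELATIVE time value: {value!r}")
    sign, amount, unit = match.groups()
    # Days are the documented default unit when none is given.
    return RelativeTime(sign, int(amount), unit or "d")


for text in ("+4h", "1d", "1", "-2y"):
    print(text, parse_relative(text))
```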

sort_by
string
Enum: "archive_target_name" "command" "creation_time" "dest_path" "end_time" "id" "path" "phase" "query" "reason" "requested_by" "status" "target_name" "volume_and_path" "volume_id" "volume_name"
Example: sort_by=archive_target_name +command

Sort by given fields. Multiple fields should be separated with some whitespace or comma. Each field could be prefixed with '+' or '-' to sort ascending or descending (default is ascending). By default, results are sorted by id, but the limit is applied descending. If limit is also specified, results are sorted first and then the limit is applied.

limit
integer

Maximum number of returned jobs

paging_offset
integer

Parameter that describes paging offset. It should be equal to number of entries that have been already printed on the previous pages.

For example:

  • paging_offset=0 and limit=2: two first entries will be printed,
  • paging_offset=4 and limit=2: 5th and 6th entry will be printed,
  • paging_offset=7 and limit=3: 8th, 9th and 10th entry will be printed.

A paged result includes the field next_page_params, which describes the parameters to pass with the next request to retrieve the next page.

add_paging_params_to_response
boolean
Default: false

A flag specifying whether to include paging params in response.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

created_by_username
Array of strings

Only jobs created by the user with the given username are taken into account. The request may specify more than one name.

created_by_uid
Array of strings

Only jobs created by the user with the given UID are taken into account. The request may specify more than one user id.

Responses

Response samples

Content type
application/json
{
  • "archive_jobs": [
    ],
  • "next_page_params": {
    },
  • "matched_archive_jobs_count": 70
}
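
Paging with limit and paging_offset can be sketched as a simple loop in which the offset always equals the number of entries already received. In this Python sketch, fetch_page is a hypothetical callable standing in for the real HTTP call, and the response is assumed to have the archive_jobs list shown in the sample above.

```python
from typing import Any, Callable, Dict, Iterator, List


def iter_archive_jobs(fetch_page: Callable[[Dict[str, int]], Dict[str, Any]],
                      limit: int = 2) -> Iterator[Dict[str, Any]]:
    """Yield jobs page by page, advancing paging_offset by the entries seen."""
    offset = 0
    while True:
        page = fetch_page({"paging_offset": offset, "limit": limit})
        jobs = page.get("archive_jobs", [])
        if not jobs:
            return
        yield from jobs
        offset += len(jobs)
        if len(jobs) < limit:  # a short page means there is nothing left
            return


# Stand-in for the service: five jobs served from memory.
_ALL_JOBS: List[Dict[str, Any]] = [{"id": n} for n in range(1, 6)]
requested_offsets: List[int] = []


def fake_fetch(params: Dict[str, int]) -> Dict[str, Any]:
    requested_offsets.append(params["paging_offset"])
    start = params["paging_offset"]
    return {"archive_jobs": _ALL_JOBS[start:start + params["limit"]],
            "matched_archive_jobs_count": len(_ALL_JOBS)}


job_ids = [job["id"] for job in iter_archive_jobs(fake_fetch, limit=2)]
print(job_ids, requested_offsets)
```

A client could equally follow next_page_params from each response; the sketch computes the offset itself because the shape of that field is not shown here.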

Delete archive jobs matching filter

Authorizations:
starfishAuth
query Parameters
volume_id
required
integer

Volume id that archive jobs are related to

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Get archive job by id

Authorizations:
starfishAuth
path Parameters
archive_job_id
required
integer

ID of the archive job

Responses

Response samples

Content type
application/json
{
  • "href": "/api/archive/job/123",
  • "volume_and_path": "projects:dir1/dir2",
  • "target_id": 0,
  • "target_name": "string",
  • "target_info": {
    },
  • "query": "string",
  • "archiving_options": {
    },
  • "low_level_jobs": {
    },
  • "stats": {
    },
  • "id": 0,
  • "status": "STARTING",
  • "state": {
    },
  • "creation_time": 1593093530.123456,
  • "creation_time_hum": "2020-06-25 15:58:50",
  • "end_time": 1593093600.123456,
  • "end_time_hum": "2020-06-25 16:00:00",
  • "duration": 70,
  • "duration_hum": "1m10s",
  • "created_by_id": 1,
  • "created_by": {
    },
  • "created_by_hum": "Alice (uid=12)"
}

Stop archive job with given id

Authorizations:
starfishAuth
path Parameters
archive_job_id
required
integer

ID of the archive job

Responses

Delete all archive jobs matching filter

Deletes all archive jobs that correspond to any of the specified filters.

Authorizations:
starfishAuth
query Parameters
volume_id
integer

numerical id of volume

Responses

Upload file with entries for archive job with given id

The file should consist of an arbitrary number of file/dir paths separated with \0. Paths should be relative to the archive job root_path. The entries_from_file argument passed in the archive job start request must be paths. The job will start only after the entries file is uploaded; its status is set to timeout if the file was not uploaded or the upload took too long. Note that SFIDs are not yet supported in this case.
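
A minimal Python sketch of building such a file body follows. The helper build_entries_payload is hypothetical, and UTF-8 encoding of the paths is an assumption, since the encoding is not stated here.

```python
from typing import Iterable, List


def build_entries_payload(paths: Iterable[str]) -> bytes:
    """Join relative paths with NUL bytes for the entries file upload."""
    encoded: List[bytes] = []
    for path in paths:
        if "\0" in path:
            raise ValueError("a path cannot contain the \\0 separator")
        if path.startswith("/"):
            raise ValueError("paths must be relative to the job root_path")
        encoded.append(path.encode("utf-8"))  # assumption: UTF-8 paths
    return b"\0".join(encoded)


payload = build_entries_payload(["dir1/a.txt", "dir1/sub/b.txt", "dir2"])
print(payload)
```

The NUL separator allows paths that contain newlines or other whitespace to be passed unambiguously.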

Authorizations:
starfishAuth
path Parameters
archive_job_id
required
integer

ID of the archive job

Request Body schema: application/binary
object (job_entries_file)

Responses

Create a new archive target

Returns id of the created target

Authorizations:
starfishAuth
Request Body schema: application/json
name
required
string
type
required
string (target_type)
Enum: "azure" "s3" "swift" "volume"

Type of storage

dst_path
string

Dest path under which files will be stored

required
storage_azure (object) or storage_s3 (object) or storage_swift (object) or (storage_volume (storage_volume (object) or storage_volume (object))) (one_of_storages)
verify
boolean
Default: true

Responses

Request samples

Content type
application/json
{
  • "verify": false
}

Response samples

Content type
application/json
{
  • "id": 71,
  • "href": "/api/archive/target/71"
}

List archive targets

Authorizations:
starfishAuth
query Parameters
obfuscate
boolean
Default: true

If enabled then fields like passwords or secret keys will be replaced with *** (see do_not_obfuscate_fields parameter for exceptions). If requested with obfuscate=False but without permissions then request will fail with 403 HTTP code.

Note that for backwards compatibility reasons if deprecated Basic HTTP authorization is used, obfuscate=False is the default.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

name
string
Default: ""

Name filter. If provided only archive target with exact name will be returned.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Get archive target by id

Authorizations:
starfishAuth
path Parameters
target_id
required
integer

ID of the archive target

query Parameters
obfuscate
boolean
Default: true

If enabled then fields like passwords or secret keys will be replaced with *** (see do_not_obfuscate_fields parameter for exceptions). If requested with obfuscate=False but without permissions then request will fail with 403 HTTP code.

Note that for backwards compatibility reasons if deprecated Basic HTTP authorization is used, obfuscate=False is the default.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

Responses

Response samples

Content type
application/json
{
  • "id": 71,
  • "href": "/api/archive/target/71"
}

Replace the parameters of archive target

This call is dangerous, as changing storage parameters can make already archived data impossible to restore.

Authorizations:
starfishAuth
path Parameters
target_id
required
integer

ID of the archive target

Request Body schema: application/json
verify
boolean
Default: true

Whether the service should verify the connection to the object store

params
required
storage_azure (object) or storage_s3 (object) or storage_swift (object) or (storage_volume (storage_volume (object) or storage_volume (object))) (one_of_storages)

Responses

Request samples

Content type
application/json
{
  • "verify": true,
  • "params": {
    }
}

Response samples

Content type
application/json
{
  • "id": 71,
  • "href": "/api/archive/target/71"
}

Delete archive target by id

Deletes the archive target. Archive jobs and job results related to the specified target are deleted as well. Data archived to the deleted archive target is not deleted but is no longer accessible by Starfish (data archived to a volume is an exception: it remains restorable as long as the volume exists).

Authorizations:
starfishAuth
path Parameters
target_id
required
integer

ID of the archive target

Responses

auth

Authenticate users

List all tokens

Authorizations:
starfishAuth
query Parameters
username
string

Username that exists in the Starfish DB. If specified, only matches tokens for this user.

show_auto_generated
boolean
Default: false

If true, also displays auto-generated tokens (not displayed by default).

show_expired
boolean
Default: false

If true, also displays expired tokens (not displayed by default).

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Authenticate user in Starfish

Credentials are verified against PAM. To get access, the user has to be a member of the group specified by the auth.user_group_name or auth.admin_group_name config property (defaults: starfish-users and starfish, respectively). Members of the latter group are given superuser access to Starfish. The token is valid for 16 hours. The validity period can be configured with the auth.auth_token_timeout_secs config option.

Request Body schema: application/json
username
required
string
password
required
string
token_timeout_secs
integer
token_description
string

Responses

Request samples

Content type
application/json
{
  • "username": "string",
  • "password": "string",
  • "token_timeout_secs": 0,
  • "token_description": "string"
}

Response samples

Content type
application/json
{
  • "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9",
  • "superuser": true,
  • "zone_manager": true
}
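
The request and response shapes above can be handled with a few lines of Python. This is a sketch only: build_auth_body and read_token are hypothetical helpers, the endpoint path is deliberately left out because it is not given here, and leaving optional fields out when they are not set is an assumption about what the service accepts.

```python
import json
from typing import Any, Dict, Optional

# 16 hours, the documented default validity period, expressed in seconds.
DEFAULT_TOKEN_TIMEOUT_SECS = 16 * 60 * 60


def build_auth_body(username: str, password: str,
                    token_timeout_secs: Optional[int] = None,
                    token_description: Optional[str] = None) -> Dict[str, Any]:
    """Build the JSON body for the authentication request."""
    body: Dict[str, Any] = {"username": username, "password": password}
    # Assumption: optional fields may simply be omitted when not needed.
    if token_timeout_secs is not None:
        body["token_timeout_secs"] = token_timeout_secs
    if token_description is not None:
        body["token_description"] = token_description
    return body


def read_token(raw_response: str) -> str:
    """Extract the token from a response shaped like the sample above."""
    return json.loads(raw_response)["token"]


print(json.dumps(build_auth_body("alice", "s3cret", token_description="ci job")))
print(DEFAULT_TOKEN_TIMEOUT_SECS)
```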

Deletes tokens from DB matching given filters

Currently, only deletion by username is supported, and the username is mandatory.

Authorizations:
starfishAuth
query Parameters
username
required
string

Responses

Response samples

Content type
application/json
{
  • "deleted_tokens_count": 10
}

Get details of a token

Authorizations:
starfishAuth
path Parameters
token_id
required
string
Example: FGzQiO4leR

Token ID / public key

Responses

Response samples

Content type
application/json
{
  • "auto_generated": false,
  • "creation_time": 1656979234,
  • "creation_time_hum": "2022-07-05 02:00:34",
  • "description": "alice's api key",
  • "id": 62,
  • "public_key": "FGzQiO4leR",
  • "user": {
    },
  • "valid_until": 1703977203,
  • "valid_until_hum": "2023-12-31 00:00:03"
}

Deletes given token from DB.

Authorizations:
starfishAuth
path Parameters
token_id
required
string
Example: FGzQiO4leR

Token ID / public key

Responses

Update user token with new validity period.

Authorizations:
starfishAuth
path Parameters
token_id
required
string
Example: FGzQiO4leR

Token ID / public key

Request Body schema: application/json
valid_until
number <float>

Responses

Request samples

Content type
application/json
{
  • "valid_until": 1656949329.642527
}

Response samples

Content type
application/json
{
  • "valid_until": 1656949329.642527
}
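
The valid_until value in these samples appears to be a Unix timestamp in seconds with a fractional part, like the creation_time fields elsewhere in this document. Under that assumption, a new expiry time can be computed as follows; valid_until_after is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def valid_until_after(delta: timedelta, now: datetime) -> float:
    """Return the Unix timestamp (float seconds) for now + delta."""
    return (now + delta).timestamp()


# A fixed, timezone-aware "now" keeps the example reproducible.
now = datetime(2022, 7, 5, 0, 0, 0, tzinfo=timezone.utc)
body = {"valid_until": valid_until_after(timedelta(hours=16), now)}
print(body)
```

Using a timezone-aware datetime avoids the result depending on the local timezone of the machine that builds the request.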

Get information about authenticated user

Authorizations:
starfishAuth

Responses

Response samples

Content type
application/json
{
  • "id": 5000,
  • "username": "bob"
}

List all user parameters

Authorizations:
starfishAuth

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Get user parameter

Authorizations:
starfishAuth
path Parameters
user_param_name
required
string
Example: favourite_paths

User parameter name.

Responses

Response samples

Content type
application/json
{
  • "name": "favourite_paths",
  • "value": {
    }
}

Update user parameter.

JSON Patch is described in RFC 6902 (https://datatracker.ietf.org/doc/html/rfc6902/). The copy and move operations have not been implemented.

A JSON Patch request body is a JSON document that represents an array of objects. Each object represents a single operation to be applied to the target JSON document.

Authorizations:
starfishAuth
path Parameters
user_param_name
required
string
Example: favourite_paths

User parameter name.

Request Body schema: application/json-patch+json
Array
op
string

Operation, one of 'replace', 'add', 'remove' or 'test'

path
string

A string containing a JSON-Pointer value described in RFC 6901 (https://datatracker.ietf.org/doc/html/rfc6901/) that references a location within the target document (the "target location") where the operation is performed.

Allowed characters are reduced to the upper and lower case letters, numbers and the underscore sign.

The single character "-" is used to append new elements to JSON array.

value
object

Object relevant to json path

Responses

Request samples

Content type
application/json-patch+json
[
  • {
    },
  • {
    },
  • {
    },
  • {
    }
]
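
A patch document of this kind can be assembled and checked on the client side. The Python sketch below is an interpretation of the rules stated above, not the service's validator: patch_op is a hypothetical helper, the key name paths inside the favourite_paths value is invented for illustration, and the sketch requires every pointer to start with "/".

```python
import json
import re
from typing import Any, Dict

ALLOWED_OPS = {"replace", "add", "remove", "test"}  # copy and move are not implemented
_SEGMENT = re.compile(r"^(?:[A-Za-z0-9_]+|-)$")     # "-" appends to an array


def patch_op(op: str, path: str, value: Any = None) -> Dict[str, Any]:
    """Build one JSON Patch operation after basic client-side checks."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operation: {op}")
    if not path.startswith("/"):
        raise ValueError("path must be a JSON Pointer starting with '/'")
    for segment in path[1:].split("/"):
        if not _SEGMENT.match(segment):
            raise ValueError(f"disallowed characters in path segment: {segment!r}")
    operation: Dict[str, Any] = {"op": op, "path": path}
    if op != "remove":  # RFC 6902: remove carries no value
        operation["value"] = value
    return operation


# "paths" is a made-up key used only to show the shape of a patch.
patch = [
    patch_op("add", "/paths/-", "home:user/projects"),
    patch_op("remove", "/paths/0"),
]
print(json.dumps(patch, indent=2))
```

The resulting list is what would be sent with the application/json-patch+json content type.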

Response samples

Content type
application/json
{
  • "name": "favourite_paths",
  • "value": {
    }
}

Get user parameter json schema

JSON Schema is the vocabulary that enables JSON data consistency, validity, and interoperability (https://json-schema.org/).

Authorizations:
starfishAuth
path Parameters
user_param_name
required
string
Example: favourite_paths

User parameter name.

Responses

Response samples

Content type
application/json
{
  • "name": "gui_general",
  • "json_schema": {
    }
}

config

Starfish configuration info

Get starfish configuration

Authorizations:
starfishAuth
query Parameters
obfuscate
boolean
Default: true

If enabled then fields like passwords or secret keys will be replaced with *** (see do_not_obfuscate_fields parameter for exceptions). If requested with obfuscate=False but without permissions then request will fail with 403 HTTP code.

Note that for backwards compatibility reasons if deprecated Basic HTTP authorization is used, obfuscate=False is the default.

do_not_obfuscate_fields
string
Default: "loki_password"
Example: do_not_obfuscate_fields=loki_password

Comma-separated list of fields to exclude from obfuscation. If the user has no permission to obtain a listed field, it will be obfuscated anyway.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

Responses

Response samples

Content type
application/json
{
  • "config_service_url": "https://starfish_host",
  • "secret_key": "password",
  • "internal_logrotate_backup_days_count": 2,
  • "temp": { },
  • "agent": {
    },
  • "crawler": {
    },
  • "config": { },
  • "volumes": { },
  • "scans": { },
  • "dispatcher": { },
  • "cron": { },
  • "gateway": {
    },
  • "client": {
    },
  • "archive": { },
  • "archive:alt_metadata:XXX": {
    },
  • "auth": {
    },
  • "pg": { },
  • "pgloader": { },
  • "GUI": { },
  • "templates": {
    },
  • "default_agent_service_url": "https://sfagent:30002",
  • "starfish_url_prefix": "http://localhost",
  • "log_level": "DEBUG"
}

Get starfish public configuration (mainly URLs to service)

Responses

Response samples

Content type
application/json
{}

dir_metadata

Set metadata attached to directory like excludes or tree_out_of_sync_time

Change excludes assigned to directory entry

Excludes are divided into list of directory name patterns and list of file name patterns

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

path
required
string
Request Body schema: application/vnd.sf.dir_metadata.exclude+json
dir_excludes.add
Array of strings

list of name patterns which should be added to list of blocked directory names

dir_excludes.set
Array of strings

list of name patterns which should overwrite current list of blocked directory names

dir_excludes.delete
Array of strings

list of name patterns which should be removed from list of blocked directory names

file_excludes.add
Array of strings

list of name patterns which should be added to list of blocked file names

file_excludes.set
Array of strings

list of name patterns which should overwrite current list of blocked file names

file_excludes.delete
Array of strings

list of name patterns which should be removed from list of blocked file names

Responses

Request samples

Content type
application/vnd.sf.dir_metadata.exclude+json
{
  • "dir_excludes.add": [
    ],
  • "dir_excludes.set": [
    ],
  • "dir_excludes.delete": [
    ],
  • "file_excludes.add": [
    ],
  • "file_excludes.set": [
    ],
  • "file_excludes.delete": [
    ]
}
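
The dotted key names in this body are easy to mistype, so a small builder can help. In this Python sketch, excludes_change is a hypothetical helper, and the patterns shown are invented examples, since the pattern syntax is not described here.

```python
import json
from typing import Dict, Iterable, List

CONTENT_TYPE = "application/vnd.sf.dir_metadata.exclude+json"


def excludes_change(kind: str, action: str,
                    patterns: Iterable[str]) -> Dict[str, List[str]]:
    """Return one '<kind>_excludes.<action>' entry for the request body."""
    if kind not in ("dir", "file"):
        raise ValueError("kind must be 'dir' or 'file'")
    if action not in ("add", "set", "delete"):
        raise ValueError("action must be 'add', 'set' or 'delete'")
    return {f"{kind}_excludes.{action}": list(patterns)}


# Invented example patterns: block one directory name, unblock one file pattern.
body = {
    **excludes_change("dir", "add", [".snapshot"]),
    **excludes_change("file", "delete", ["*.tmp"]),
}
print(CONTENT_TYPE)
print(json.dumps(body))
```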

Response samples

Content type
application/json
{ }

job

Run low-level jobs on volumes

Start new job

Authorizations:
starfishAuth
Request Body schema: application/json

Job and first incarnation definition

incarnation
object (incarnation)
job
object (job.input)

Responses

Request samples

Content type
application/json
{
  • "incarnation": {
    },
  • "job": {
    }
}

Response samples

Content type
application/json
{
  • "id": 1,
  • "long_id": "j_cmd_generating_f_20181010_1052_3",
  • "heartbeat": 1593093530.123456,
  • "ended_at": 1593093530.123456,
  • "started_at": 1593093421.123456,
  • "created_at": 1593093420.123456,
  • "duration": 10,
  • "workers_per_agent": [],
  • "duration_hum": "1m10s",
  • "avg_file_size": 8191.8,
  • "avg_file_size_hum": "8K",
  • "bandwidth": "string",
  • "batches_in_progress": [
    ],
  • "incarnations": [
    ],
  • "options": {
    },
  • "status": "starting",
  • "state": {
    },
  • "runtime": 0,
  • "runtime_hum": "string",
  • "est_total_bytes": 0,
  • "est_total_entries_num": 0,
  • "last_dispatcher_operation": "string",
  • "pause_interval": 0,
  • "reason_code": "string",
  • "reason_msg": "string",
  • "tag_results_with": "string",
  • "volume_id": 0,
  • "rerun_from_job_id": 0,
  • "query": [
    ],
  • "throttle_interval": 0,
  • "retry_count": 0,
  • "throttle_cmd": "string",
  • "prefetch_size": 0,
  • "prefetch_parallel_threads": 0,
  • "items_in_batch": 0,
  • "batch_size_bytes": 0,
  • "prescan_id": 0,
  • "pre_verify_ctime": true,
  • "post_verify_ctime": true,
  • "manifest_status": "not-requested",
  • "manifest_query_ids": [
    ],
  • "manifest_loc": "string",
  • "cmd_name": "string",
  • "current_incarnation": {
    },
  • "first_incarnation": {
    },
  • "fs_bytes_done": 69870585,
  • "fs_entries_pushed": 200,
  • "fs_entries_done": 150,
  • "fs_entries_failed": 50,
  • "fs_entries_temp_error": 250,
  • "fs_entries_timedout": 1,
  • "fs_entries_unprocessed": 0,
  • "bandwidth_seconds": 16399744.619680852,
  • "source": "home:user/projects",
  • "snapshot": "string",
  • "name": "string",
  • "allow_overlapping_job": false,
  • "batch_per_dir": false,
  • "batch_fields": {
    },
  • "cmd_output_format": "text",
  • "command": [
    ],
  • "entries_from_file": "paths",
  • "generate_manifest": false,
  • "ignore_results": false,
  • "post_verification": true,
  • "pre_verification": true,
  • "prescan_type": "diff",
  • "query_str": "ext jpg not uid 5",
  • "path_passing_method": "arg",
  • "requested_by": "string",
  • "root_path": "string",
  • "volume": "string",
  • "agent_fail_fast": true,
  • "agent_fail_fast_min_batches": 100,
  • "agent_fail_fast_threshold": 100,
  • "snapshot_glob": "string",
  • "sort_by": [
    ],
  • "group_by": [
    ],
  • "retry_entries": "grouped",
  • "limit": 0
}

Get list of jobs

Authorizations:
starfishAuth
query Parameters
name
string

List only jobs with the specified job name. If the name contains * or ?, all jobs matching it as a basic shell pattern are returned (* matches any substring, ? matches a single character).

status
Array of strings

Job status(es), either a list of statuses or a single status as a string. Cannot be used together with running.

running
boolean

if set to true, lists only running jobs; if set to false, lists only not-running (completed) jobs; cannot be used together with status

requested_by
Array of strings

created by a given entity, either a list of entities or a single entity as a string. 'archive' ('restore') returns all low-level jobs started by archive (restore) jobs. To return low-level jobs started by a specific archive (restore) job with id=4, 'archive#4' ('restore#4') should be passed. Cannot be used together with requested_by_archive_or_restore.

requested_by_archive_or_restore
boolean

if set to true, lists only jobs created by either archive or restore jobs; if set to false, lists all other jobs, i.e. requested by for example 'gui', 'client' etc. Cannot be used together with requested_by.

created_at
string

Supports FROM-TO and RELATIVE formats;

FROM-TO:

'# hour|day|week|month|year(s) ago' or 'YYYYMMDD[HHMM[SS]]' or 'now' or 'inf', for example:

  • 'inf-3 months ago'
  • '9 months ago-now'
  • '20200101-202003251314'

RELATIVE:

'[+|-]N[y|m|w|d|h]', meaning a number of years, months, weeks, days (default) or hours, for example:

  • '+4h': more (older) than 5 hours ago
  • '1d' (or '1'): between 2 days ago (48 hours) and 1 day (24 hours) ago
  • '-2y': less (newer) than 2 years ago

ended_at
string

the same as created_at

long_id
Array of strings

long id of the job

num_id
Array of integers

numeric id of the job

string or Array of strings

source volume(s) name(s)

src_volume_id
integer

ID of job's source volume

root_path
string

root path on source volume

sort_by
string
Enum: "cmd_line" "command" "created_at" "current_incarnation_id" "dst_volume_id" "estimated_total_bytes" "estimated_total_entries_num" "heartbeat" "id" "incarnation_id" "name" "reason_code" "reason_msg" "requested_by" "started_at" "status" "tag_entry_failure_count" "volume_id" "volume_name"
Example: sort_by=created_at -name

Sort by given fields. Multiple fields should be separated with some whitespace or comma. Each field could be prefixed with '+' or '-' to sort ascending or descending (default is ascending). By default results are sorted by 'created_at' but the limit is applied descending. If limit is also specified, results are sorted first and then the limit is applied.

limit
integer

Maximum number of returned jobs

paging_offset
integer

Parameter that describes paging offset. It should be equal to number of entries that have been already printed on the previous pages.

For example:

  • paging_offset=0 and limit=2: two first entries will be printed,
  • paging_offset=4 and limit=2: 5th and 6th entry will be printed,
  • paging_offset=7 and limit=3: 8th, 9th and 10th entry will be printed.

A paged result includes the field next_page_params, which describes the parameters to pass with the next request to retrieve the next page.

add_paging_params_to_response
boolean
Default: false

A flag specifying whether to include paging params in response.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

created_by_username
Array of strings

Only jobs created by the user with the given username are taken into account. The request may specify more than one name.

created_by_uid
Array of strings

Only jobs created by the user with the given UID are taken into account. The request may specify more than one user id.

force_cache_reload
boolean

forces cached volumes to be reloaded

Responses

Response samples

Content type
application/json
{
  • "jobs": [
    ],
  • "next_page_params": {
    },
  • "matched_jobs_count": 70
}
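
Several of the query parameters for listing jobs exclude each other, which a client can check before sending the request. The Python sketch below covers only a few parameters. The helpers are hypothetical, and two encoding choices are assumptions: array parameters are sent as repeated keys, and booleans as lowercase true/false.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def jobs_started_by(kind: str, job_id: int) -> str:
    """Build a requested_by value such as 'archive#4' or 'restore#4'."""
    if kind not in ("archive", "restore"):
        raise ValueError("kind must be 'archive' or 'restore'")
    return f"{kind}#{job_id}"


def build_job_list_query(status: Optional[List[str]] = None,
                         running: Optional[bool] = None,
                         requested_by: Optional[List[str]] = None,
                         requested_by_archive_or_restore: Optional[bool] = None,
                         limit: Optional[int] = None) -> str:
    """Encode job list filters, rejecting the documented exclusive pairs."""
    if status and running is not None:
        raise ValueError("status cannot be used together with running")
    if requested_by and requested_by_archive_or_restore is not None:
        raise ValueError("requested_by cannot be used together with "
                         "requested_by_archive_or_restore")
    params: List[Tuple[str, str]] = []
    params += [("status", item) for item in status or []]  # assumption: repeated keys
    if running is not None:
        params.append(("running", "true" if running else "false"))  # assumption
    params += [("requested_by", item) for item in requested_by or []]
    if requested_by_archive_or_restore is not None:
        params.append(("requested_by_archive_or_restore",
                       "true" if requested_by_archive_or_restore else "false"))
    if limit is not None:
        params.append(("limit", str(limit)))
    return urlencode(params)


query = build_job_list_query(requested_by=[jobs_started_by("archive", 4)], limit=10)
print(query)
```

Note that the '#' in 'archive#4' must be percent-encoded, otherwise it would be read as the start of a URL fragment.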

Delete all low-level jobs matching filters. At least one filter is required.

Authorizations:
starfishAuth
query Parameters
target_id
integer

ID of the archive target

src_volume_id
integer

ID of job's source volume

dst_volume_id
integer

ID of job's destination volume

Responses

Response samples

Content type
application/json
{ }

Get single job by long id

Authorizations:
starfishAuth
path Parameters
job_id
required
string
Example: j_hash_20170821_1254_117

Id of the job

Responses

Response samples

Content type
application/json
{
  • "id": 1,
  • "long_id": "j_cmd_generating_f_20181010_1052_3",
  • "heartbeat": 1593093530.123456,
  • "ended_at": 1593093530.123456,
  • "started_at": 1593093421.123456,
  • "created_at": 1593093420.123456,
  • "duration": 10,
  • "workers_per_agent": [],
  • "duration_hum": "1m10s",
  • "avg_file_size": 8191.8,
  • "avg_file_size_hum": "8K",
  • "bandwidth": "string",
  • "batches_in_progress": [
    ],
  • "incarnations": [
    ],
  • "options": {
    },
  • "status": "starting",
  • "state": {
    },
  • "runtime": 0,
  • "runtime_hum": "string",
  • "est_total_bytes": 0,
  • "est_total_entries_num": 0,
  • "last_dispatcher_operation": "string",
  • "pause_interval": 0,
  • "reason_code": "string",
  • "reason_msg": "string",
  • "tag_results_with": "string",
  • "volume_id": 0,
  • "rerun_from_job_id": 0,
  • "query": [
    ],
  • "throttle_interval": 0,
  • "retry_count": 0,
  • "throttle_cmd": "string",
  • "prefetch_size": 0,
  • "prefetch_parallel_threads": 0,
  • "items_in_batch": 0,
  • "batch_size_bytes": 0,
  • "prescan_id": 0,
  • "pre_verify_ctime": true,
  • "post_verify_ctime": true,
  • "manifest_status": "not-requested",
  • "manifest_query_ids": [
    ],
  • "manifest_loc": "string",
  • "cmd_name": "string",
  • "current_incarnation": {
    },
  • "first_incarnation": {
    },
  • "fs_bytes_done": 69870585,
  • "fs_entries_pushed": 200,
  • "fs_entries_done": 150,
  • "fs_entries_failed": 50,
  • "fs_entries_temp_error": 250,
  • "fs_entries_timedout": 1,
  • "fs_entries_unprocessed": 0,
  • "bandwidth_seconds": 16399744.619680852,
  • "source": "home:user/projects",
  • "snapshot": "string",
  • "name": "string",
  • "allow_overlapping_job": false,
  • "batch_per_dir": false,
  • "batch_fields": {
    },
  • "cmd_output_format": "text",
  • "command": [
    ],
  • "entries_from_file": "paths",
  • "generate_manifest": false,
  • "ignore_results": false,
  • "post_verification": true,
  • "pre_verification": true,
  • "prescan_type": "diff",
  • "query_str": "ext jpg not uid 5",
  • "path_passing_method": "arg",
  • "requested_by": "string",
  • "root_path": "string",
  • "volume": "string",
  • "agent_fail_fast": true,
  • "agent_fail_fast_min_batches": 100,
  • "agent_fail_fast_threshold": 100,
  • "snapshot_glob": "string",
  • "sort_by": [
    ],
  • "group_by": [
    ],
  • "retry_entries": "grouped",
  • "limit": 0
}

Update job definition

Authorizations:
starfishAuth
path Parameters
job_id
required
string
Example: j_hash_20170821_1254_117

Id of the job

Request Body schema: application/json
remove_agents
Array of strings
id
integer

Job id generated by DB

long_id
string

More verbose version of job id.

heartbeat
number

Job last update timestamp

ended_at
number

job's heartbeat if job has final status, otherwise null

started_at
number or null

equal to started_at of the first incarnation which has this field set

created_at
number

equal to first incarnation's created_at

duration
integer

difference between heartbeat and started_at rounded to the nearest integer if heartbeat and started_at are not null, otherwise 0

workers_per_agent
Array of strings or integers[ items ]

Number of workers to run on each agent that can run the job. Each element is a pair of worker name and a number. Value for default worker is declared explicitly.

duration_hum
string

job duration as a humanized string, '-' if job is still running

avg_file_size
number

Average size of completed entries when there is at least one completed entry, 0 otherwise.

avg_file_size_hum
string

Human readable form of avg_file_size. It translates 8192 to '8K' and 1048576 to '1M'
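
The two documented conversions (8192 to '8K', 1048576 to '1M') are consistent with dividing by 1024 and rounding to a whole number. The function below is only a sketch of that idea; the exact rounding rule, the 'B' suffix for small values and the list of larger units are assumptions, not documented Starfish behavior.

```python
def humanize_size(num_bytes: float) -> str:
    """Render a byte count with a binary unit suffix, e.g. 8192 -> '8K'."""
    units = ["B", "K", "M", "G", "T", "P"]  # assumed unit letters
    value = float(num_bytes)
    idx = 0
    # Move to the next unit while the rounded value still needs 4+ digits.
    while round(value) >= 1024 and idx < len(units) - 1:
        value /= 1024
        idx += 1
    return f"{round(value)}{units[idx]}"
```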

bandwidth
string

Computed average bandwidth of running job.

Array of objects (batch_in_progress)
Array of objects (incarnation.output)
object (job.options.response)
status
string
Enum: "starting" "resuming" "tagging" "prescan" "preparing" "in_progress" "stopping" "done" "timeout" "stopped" "failed"

job status

object (job_state)

state for the corresponding status

runtime
integer

difference between heartbeat and created_at rounded to the nearest integer if created_at is not null, otherwise 0. If heartbeat is null, the current time is used in place of heartbeat

runtime_hum
string

Human readable form of runtime

est_total_bytes
number or null

Estimated number of bytes (sum of files sizes) to be processed by job.

est_total_entries_num
number or null

Estimated number of files to be processed by job.

last_dispatcher_operation
string

Last executed operation by dispatcher service for this job.

pause_interval
number

Total number of seconds job was paused.

reason_code
string or null

Reason why the job has failed.

reason_msg
string or null

Human readable message associated with job failure.

tag_results_with
string or null

Name of tag to be assigned to successful entries.

volume_id
number

Id of source volume.

rerun_from_job_id
number or null

Id of rerun job (started with "sf job rerun"). NULL if job was not rerun.

query
Array of strings[ items ]

Parsed query that was used to run this job.

throttle_interval
number

Number of seconds the job was throttled for.

retry_count
number

How many times a single entry should be retried. Value 1 means the command will be executed up to 2 times per entry.

throttle_cmd
string or null

Command used to check if job should be throttled. By default jobs are not throttled.

prefetch_size
number

Number of bytes to prefetch from disk before executing a command.

prefetch_parallel_threads
number

How many threads will be used for prefetching.

items_in_batch
number

Maximum number of files and directories in batch, set by "--batch-size-entries" cli option.

batch_size_bytes
number

Maximum size of batch in bytes, set by "--batch-size-bytes" cli option.

prescan_id
number or null

Id of a scan that was run as prescan for this job or null if there was no prescan.

pre_verify_ctime
boolean or null

Set to False if job ignores ctime of entries during pre verification.

post_verify_ctime
boolean or null

Set to False if job ignores ctime of entries during post verification.

manifest_status
string
Enum: "not-requested" "generating" "uploading" "upload-failed" "failed" "done"

Status of job manifest. By default job manifests are not generated in which case the value is "not-requested".

manifest_query_ids
Array of strings

Ids of queries run in order to generate job manifests.

manifest_loc
string or null

Location where manifest file will be stored on agent running the job.

cmd_name
string

First part of command job parameter.

object (incarnation.output)
object (incarnation.output)
fs_bytes_done
number

Total number of bytes done for given job

fs_entries_pushed
number

Total number entries pushed to job entry queue.

fs_entries_done
number

Total number of files and dirs successfully processed for given job

fs_entries_failed
number

Total number of files and dirs failed for given job.

fs_entries_temp_error
number

Total number of errors while processing the job. There can be multiple errors per single entry if retries are enabled.

fs_entries_timedout
number

Total number of entries that have timed out.

fs_entries_unprocessed
number

Total number of entries that have not been processed. This can happen when the job was stopped while in progress.

bandwidth_seconds
number

Time in seconds used to calculate job bandwidth.

source
string

Source volpath for this job.

snapshot
string

Snapshot path selected for a job. If snapshot_glob contains pattern that matches many directories, then this field contains the one selected for the job.

name
required
string

Jobs with the same name will be run only on changed entries unless --from-scratch is given. When --from-scratch is given new results will override results from previous job with the same name. Entries can be queried based on job names. Each job has to have a name. Job name can be provided in command config in which case there is no need to provide it using API.

allow_overlapping_job
boolean
Default: false

When a job starts, the dispatcher checks that no other running job shares a common subdirectory. This option disables that check. It is highly recommended to also disable prescan, because the job will fail if the prescan fails with an overlapping scan error.

batch_per_dir
boolean
Default: false

When enabled, the dispatcher creates one batch per directory, containing all of that directory's entries (in that case 'batch_size_entries' and 'batch_size_bytes' are ignored). With batch_per_dir, query filters are not allowed, which ensures that the worker receives all entries from a directory. This option is used by jobs that create tar archives. It is mutually exclusive with the ignore_results option, because combining both would result in running a job n times, where n is the number of retries of the job.

batch_fields
object

This field contains mapping dictionary of params which should be read from FsEntry and passed to command. For example value {'jobs.hash.result.md5': 'md5', 'jobs.hash.mt': 'md5_mtime'} means that field jobs.hash.result.md5 should be passed to cmd_executor as md5 param and jobs.hash.mt should be passed as md5_mtime param.

cmd_output_format
string
Enum: "text" "json"
command
required
Array of strings
entries_from_file
string or null (entries_from_file_enum)
Enum: "paths" "sfids"

Determines the type of entries passed in the file. Supported by the restore job. Other jobs expect paths only.

generate_manifest
boolean or null

Generate a manifest file for a job. This value overwrites settings per command and global default set by dispatcher.generate_manifest.

ignore_results
boolean
Default: false

When a job is started with this option, its results are not stored in the database. This option is mutually exclusive with the batch_per_dir option, because combining both would result in running a job n times, where n is the number of retries of the job.

post_verification
boolean
Default: true

Turns on verification that the entry on which the command was executed is still the same version as before the command was run. When enabled, prevents attaching job results to an entry that has changed since the command was executed on it.

pre_verification
boolean
Default: true

Turns on verification that the entry on which the command is going to be executed is the same version as in the database. When enabled, prevents running the command on an entry whose version on the filesystem differs from the version in the database.

prescan_type
string
Enum: "diff" "mtime" "sync"
query_str
string
path_passing_method
string
Enum: "arg" "stdin" "stdin_json"

The way command receives entries:

  • arg - as an argument; command will be executed once per each entry (default)
  • stdin - through stdin, entries separated by null chars
  • stdin_json - through stdin, entries as JSON with data retrieved from DB. JSON is a list of dictionaries. Each dictionary contains local_path which is path to be processed and fields specified in batch_fields.
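
A command that receives entries through stdin has to parse one of the two formats above. The sketch below shows both parsers under stated assumptions: paths are decoded as UTF-8 (with surrogateescape so arbitrary bytes survive), and the function names are hypothetical. Only the null separator, the JSON list-of-dictionaries shape and the local_path key come from this reference.

```python
import json
from typing import Any, Dict, List


def parse_stdin_paths(raw: bytes) -> List[str]:
    """Parse the 'stdin' format: entries separated by null characters."""
    # Skip empty chunks so a trailing separator is harmless.
    return [p.decode("utf-8", "surrogateescape") for p in raw.split(b"\0") if p]


def parse_stdin_json(raw: str) -> List[Dict[str, Any]]:
    """Parse the 'stdin_json' format: a JSON list of dictionaries."""
    entries = json.loads(raw)
    for entry in entries:
        # Every dictionary carries local_path plus the batch_fields params.
        if "local_path" not in entry:
            raise ValueError("entry without local_path")
    return entries
```

With batch_fields set to {'jobs.hash.result.md5': 'md5'}, each parsed dictionary would be expected to carry an md5 key next to local_path.
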
requested_by
string
root_path
required
string
volume
required
string
agent_fail_fast
boolean or null

Should the job fail before processing all entries in case some entries fail.

agent_fail_fast_min_batches
number or null

How many batches of entries must fail before the whole job is marked as failed and aborted.

agent_fail_fast_threshold
number or null

The percentage of entries that must fail before the whole job is marked as failed and aborted.

snapshot_glob
string or null

Path to the snapshot, relative to the volume root, may contain '*' or '.' that will be expanded. If many snapshots match the pattern, the last in the alphabetic order is used. Note that prescan is disabled by default when using that option.

sort_by
Array of strings
Items Enum: "ino" "parent_id" "parent_path" "fn" "ext" "depth" "mode" "uid" "gid" "username" "groupname" "usersid" "groupsid" "ct" "mt" "at" "size" "blck" "volume_id" "volume" "jobs" "fs"

List of columns that will be used to sort entries. To sort by a JSON field (e.g. job result or custom argument), the column can be defined using dot notation, for example "jobs.hash.result.md5" or "fs.win.dacl".

group_by
Array of strings
Items Enum: "ino" "parent_id" "parent_path" "fn" "ext" "depth" "mode" "uid" "gid" "username" "groupname" "usersid" "groupsid" "ct" "mt" "at" "size" "blck" "volume_id" "volume" "jobs" "fs"

The set of fields for which entries will be grouped into a single batch. For example, if group_by is set to ['ino'], all entries with the same inode will be processed in the same batch. If used together with sort_by then needs to be a prefix of the sort_by parameter.
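
The prefix rule can be checked on the client before a job is submitted. This is a small sketch with a hypothetical function name; it treats an empty sort_by as "not used together", which is an interpretation of the sentence above.

```python
from typing import Sequence


def group_by_is_valid(group_by: Sequence[str], sort_by: Sequence[str]) -> bool:
    """Return True if group_by is a prefix of sort_by (or sort_by is unused)."""
    if not sort_by:
        return True
    # Compare the leading sort_by fields with group_by, element by element.
    return list(sort_by[: len(group_by)]) == list(group_by)
```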

retry_entries
string
Enum: "grouped" "only_failed" "whole_batch"

Decides how entries are retried.

  1. only_failed - retries only entries for which the command or verification failed. This is the default value if neither batch_per_dir nor group_by options are used.
  2. whole_batch - retries all entries in the batch if the command or verification failed for any entry in the batch. This is the default value if batch_per_dir option is also used.
  3. grouped - retries all entries in a single group if any entry from the group failed. Can be used only along with group_by option and it is automatically used as the default value when group_by option is used. Please note that a single batch may contain multiple groups.
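
The default selection described in the list above can be summarized in a few lines. The sketch below is illustrative only: the function name is hypothetical, and the case where batch_per_dir and group_by are both set is not covered by this reference, so checking group_by first is an assumption.

```python
from typing import Sequence


def default_retry_entries(batch_per_dir: bool = False, group_by: Sequence[str] = ()) -> str:
    """Return the documented default for retry_entries."""
    if group_by:
        # Assumption: group_by wins if both options are set.
        return "grouped"
    if batch_per_dir:
        return "whole_batch"
    return "only_failed"
```
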
limit
integer or null

Limits entries processed by the job to N elements. NOTE: causes non-deterministic order when run multiple times (due to the nature of the underlying database).

Responses

Request samples

Content type
application/json
{
  • "remove_agents": [
    ],
  • "id": 1,
  • "long_id": "j_cmd_generating_f_20181010_1052_3",
  • "heartbeat": 1593093530.123456,
  • "ended_at": 1593093530.123456,
  • "started_at": 1593093421.123456,
  • "created_at": 1593093420.123456,
  • "duration": 10,
  • "workers_per_agent": [],
  • "duration_hum": "1m10s",
  • "avg_file_size": 8191.8,
  • "avg_file_size_hum": "8K",
  • "bandwidth": "string",
  • "batches_in_progress": [
    ],
  • "incarnations": [
    ],
  • "options": {
    },
  • "status": "starting",
  • "state": {
    },
  • "runtime": 0,
  • "runtime_hum": "string",
  • "est_total_bytes": 0,
  • "est_total_entries_num": 0,
  • "last_dispatcher_operation": "string",
  • "pause_interval": 0,
  • "reason_code": "string",
  • "reason_msg": "string",
  • "tag_results_with": "string",
  • "volume_id": 0,
  • "rerun_from_job_id": 0,
  • "query": [
    ],
  • "throttle_interval": 0,
  • "retry_count": 0,
  • "throttle_cmd": "string",
  • "prefetch_size": 0,
  • "prefetch_parallel_threads": 0,
  • "items_in_batch": 0,
  • "batch_size_bytes": 0,
  • "prescan_id": 0,
  • "pre_verify_ctime": true,
  • "post_verify_ctime": true,
  • "manifest_status": "not-requested",
  • "manifest_query_ids": [
    ],
  • "manifest_loc": "string",
  • "cmd_name": "string",
  • "current_incarnation": {
    },
  • "first_incarnation": {
    },
  • "fs_bytes_done": 69870585,
  • "fs_entries_pushed": 200,
  • "fs_entries_done": 150,
  • "fs_entries_failed": 50,
  • "fs_entries_temp_error": 250,
  • "fs_entries_timedout": 1,
  • "fs_entries_unprocessed": 0,
  • "bandwidth_seconds": 16399744.619680852,
  • "source": "home:user/projects",
  • "snapshot": "string",
  • "name": "string",
  • "allow_overlapping_job": false,
  • "batch_per_dir": false,
  • "batch_fields": {
    },
  • "cmd_output_format": "text",
  • "command": [
    ],
  • "entries_from_file": "paths",
  • "generate_manifest": false,
  • "ignore_results": false,
  • "post_verification": true,
  • "pre_verification": true,
  • "prescan_type": "diff",
  • "query_str": "ext jpg not uid 5",
  • "path_passing_method": "arg",
  • "requested_by": "string",
  • "root_path": "string",
  • "volume": "string",
  • "agent_fail_fast": true,
  • "agent_fail_fast_min_batches": 100,
  • "agent_fail_fast_threshold": 100,
  • "snapshot_glob": "string",
  • "sort_by": [
    ],
  • "group_by": [
    ],
  • "retry_entries": "grouped",
  • "limit": 0
}

Response samples

Content type
application/json
{
  • "id": 1,
  • "long_id": "j_cmd_generating_f_20181010_1052_3",
  • "heartbeat": 1593093530.123456,
  • "ended_at": 1593093530.123456,
  • "started_at": 1593093421.123456,
  • "created_at": 1593093420.123456,
  • "duration": 10,
  • "workers_per_agent": [],
  • "duration_hum": "1m10s",
  • "avg_file_size": 8191.8,
  • "avg_file_size_hum": "8K",
  • "bandwidth": "string",
  • "batches_in_progress": [
    ],
  • "incarnations": [
    ],
  • "options": {
    },
  • "status": "starting",
  • "state": {
    },
  • "runtime": 0,
  • "runtime_hum": "string",
  • "est_total_bytes": 0,
  • "est_total_entries_num": 0,
  • "last_dispatcher_operation": "string",
  • "pause_interval": 0,
  • "reason_code": "string",
  • "reason_msg": "string",
  • "tag_results_with": "string",
  • "volume_id": 0,
  • "rerun_from_job_id": 0,
  • "query": [
    ],
  • "throttle_interval": 0,
  • "retry_count": 0,
  • "throttle_cmd": "string",
  • "prefetch_size": 0,
  • "prefetch_parallel_threads": 0,
  • "items_in_batch": 0,
  • "batch_size_bytes": 0,
  • "prescan_id": 0,
  • "pre_verify_ctime": true,
  • "post_verify_ctime": true,
  • "manifest_status": "not-requested",
  • "manifest_query_ids": [
    ],
  • "manifest_loc": "string",
  • "cmd_name": "string",
  • "current_incarnation": {
    },
  • "first_incarnation": {
    },
  • "fs_bytes_done": 69870585,
  • "fs_entries_pushed": 200,
  • "fs_entries_done": 150,
  • "fs_entries_failed": 50,
  • "fs_entries_temp_error": 250,
  • "fs_entries_timedout": 1,
  • "fs_entries_unprocessed": 0,
  • "bandwidth_seconds": 16399744.619680852,
  • "source": "home:user/projects",
  • "snapshot": "string",
  • "name": "string",
  • "allow_overlapping_job": false,
  • "batch_per_dir": false,
  • "batch_fields": {
    },
  • "cmd_output_format": "text",
  • "command": [
    ],
  • "entries_from_file": "paths",
  • "generate_manifest": false,
  • "ignore_results": false,
  • "post_verification": true,
  • "pre_verification": true,
  • "prescan_type": "diff",
  • "query_str": "ext jpg not uid 5",
  • "path_passing_method": "arg",
  • "requested_by": "string",
  • "root_path": "string",
  • "volume": "string",
  • "agent_fail_fast": true,
  • "agent_fail_fast_min_batches": 100,
  • "agent_fail_fast_threshold": 100,
  • "snapshot_glob": "string",
  • "sort_by": [
    ],
  • "group_by": [
    ],
  • "retry_entries": "grouped",
  • "limit": 0
}

Stop job with given id

Authorizations:
starfishAuth
path Parameters
job_id
required
string
Example: j_hash_20170821_1254_117

Id of the job

Responses

Stop all jobs or jobs matching given filters.

Stops all jobs or jobs matching given criteria. If no filters are provided, all jobs are stopped. Returns the list of those jobs.

Authorizations:
starfishAuth
query Parameters
created_by_username
Array of strings

Only jobs created by a user with the given username will be taken into account. Request may specify more than one name.

created_by_uid
Array of strings

Only jobs created by a user with the given UID will be taken into account. Request may specify more than one user id.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Resume job with given id

Authorizations:
starfishAuth
path Parameters
job_id
required
string
Example: j_hash_20170821_1254_117

Id of the job

query Parameters
force
boolean

Force the job to run even if it is already done (the database will be scanned again and any uncompleted entries will be retried)

Responses

Upload file with entries for job with given id

The file should consist of an arbitrary number of file/dir paths separated with \0. Paths should be relative to the job root_path. The job starts only after the entries file is uploaded; if the file is not uploaded or the upload takes too long, the job status is set to timeout. In order to receive job entries, the job should be started with entries_from_file = paths. Note that SFIDs are not yet supported in this case.

Authorizations:
starfishAuth
path Parameters
job_id
required
string
Example: j_hash_20170821_1254_117

Id of the job

query Parameters
is_data_completed
boolean
Default: true
Example: is_data_completed=true

If set to false, the dispatcher will keep waiting for more entries after this HTTP call finishes.

Request Body schema: application/binary
object (job_entries_file)
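
The request body is simply the paths joined with a null byte. The helper below is a minimal sketch with a hypothetical name; UTF-8 encoding of the paths is an assumption, while the \0 separator and the requirement that paths are relative to root_path come from the description above.

```python
from typing import Iterable


def build_entries_payload(paths: Iterable[str]) -> bytes:
    """Join relative paths with null bytes for the entries file upload."""
    encoded = []
    for path in paths:
        if "\0" in path:
            raise ValueError("path contains the separator character")
        if path.startswith("/"):
            raise ValueError("paths must be relative to the job root_path")
        encoded.append(path.encode("utf-8"))  # assumed encoding
    return b"\0".join(encoded)
```

The resulting bytes would be sent as the application/binary body, with is_data_completed left at its default of true when the whole list is sent in one call.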

Responses

Start async query to generate manifest for archive job or low-level job

Authorizations:
starfishAuth
path Parameters
job_id
required
integer

Job id for which manifest should be created or fetched.

query Parameters
archive_job
boolean
Default: false

Value true means that job_id must be an archive job id. May not be used with parameter low_level_job=true

low_level_job
boolean
Default: false

Value true means that job_id must be a low-level job id. May not be used with parameter archive_job=true

csv
boolean
Default: false

When true then result will be prepared in CSV format.

Responses

Response samples

Content type
application/json
{
  • "query_id": "20180606_100701_7359bb_volume_name_",
  • "location": "/api/v1/async/query/20180606_100701_7359bb_volume_name_"
}
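
Since archive_job=true and low_level_job=true may not be combined, a client can validate the flags before starting the manifest query. This is a sketch with a hypothetical helper name; serializing booleans as lowercase true/false follows the is_data_completed=true example elsewhere in this reference and is otherwise an assumption.

```python
from typing import Dict


def manifest_query_params(
    archive_job: bool = False,
    low_level_job: bool = False,
    csv: bool = False,
) -> Dict[str, str]:
    """Build query parameters for the manifest endpoints."""
    if archive_job and low_level_job:
        raise ValueError("archive_job and low_level_job are mutually exclusive")
    flags = {"archive_job": archive_job, "low_level_job": low_level_job, "csv": csv}
    # Assumed wire format: lowercase 'true' / 'false'.
    return {name: "true" if value else "false" for name, value in flags.items()}
```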

Download already generated manifest in a given format for archive job or low-level job

Authorizations:
starfishAuth
path Parameters
job_id
required
integer

Job id for which manifest should be created or fetched.

query Parameters
archive_job
boolean
Default: false

Value true means that job_id must be an archive job id. May not be used with parameter low_level_job=true

low_level_job
boolean
Default: false

Value true means that job_id must be a low-level job id. May not be used with parameter archive_job=true

csv
boolean
Default: false

When true then result will be prepared in CSV format.

Responses

Get list of job names

Authorizations:
starfishAuth

Responses

Response samples

Content type
application/json
[
  • "archivizer",
  • "hash",
  • "hash-quick",
  • "upload-builds"
]

license

License info

Get license info and license status per feature

Authorizations:
starfishAuth

Responses

Response samples

Content type
application/json
{
  • "license": {
    },
  • "features": {
    },
  • "comment": "string",
  • "licenses": {
    },
  • "features_expirations": {
    }
}

mapping

Retrieve information about users, groups and their memberships

List user name, uid and volume matching given filter predicates

Results are based on entries found when crawling a volume; it means that only users that ever owned an entry are guaranteed to be listed.

Authorizations:
starfishAuth
query Parameters
volume_name
string

Volume name. Response will contain only users/groups for a given volume.

user
string

Response will contain only users with the given name(s). Request may specify more than one user.

uid
string

Response will contain only users with the given user id(s). Request may specify more than one user id.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

List users with group ids they belong to

Results are based on entries found when crawling a volume; it means that only users and groups that ever owned an entry are guaranteed to be mapped.

Authorizations:
starfishAuth
query Parameters
volume_name
string

Volume name. Response will contain only users/groups for a given volume.

user
string

Response will contain only users with the given name(s). Request may specify more than one user.

uid
string

Response will contain only users with the given user id(s). Request may specify more than one user id.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

List group name, gid and volume matching given filter predicates

Results are based on entries found when crawling a volume; it means that only groups that ever owned an entry are guaranteed to be listed.

Authorizations:
starfishAuth
query Parameters
volume_name
string

Volume name. Response will contain only users/groups for a given volume.

group
string

Response will contain only groups with the given name(s). Request may specify more than one group.

gid
string

Response will contain only groups with the given group id(s). Request may specify more than one group id.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

List groups with user ids of their members

Results are based on entries found when crawling a volume; it means that only groups and users that ever owned an entry are guaranteed to be mapped.

Authorizations:
starfishAuth
query Parameters
volume_name
string

Volume name. Response will contain only users/groups for a given volume.

group
string

Response will contain only groups with the given name(s). Request may specify more than one group.

gid
string

Response will contain only groups with the given group id(s). Request may specify more than one group id.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

query

Find file entries by various criteria

Get entries from all volumes matching given filter predicates, via synchronous call

NOTE

The asynchronous version of this query (/async/query) should be used in most cases, because the synchronous query is limited by HTTP timeouts. The synchronous query is appropriate only for queries that return in 30 seconds or less.

Starfish uses async query internally, including the sf query command.

Authorizations:
starfishAuth
query Parameters
volumes_and_paths
string
Example: volumes_and_paths=home:projects/starfish

Name of the volume and path as volume:path. Multiple volumes with paths are not supported in query string. Multiple volumes and paths can be passed in request body as volumes_and_paths array.

zones
Array of arrays
Example: zones=sample_zone_name&zones=other_zone_name

Names of the zones. Multiple zones are supported in query strings

query
string
Example: query=type=f size=0-1024

Query filters. All filters supported by sf query command are supported. For example if command is:

  • sf query --type f --size 0-1024, it should be: type=f size=0-1024
  • sf query --type d --depth 2 --name *a*, it should be: type=d depth=2 name=*a*
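
The two examples above follow one pattern: each `--flag value` pair becomes `flag=value`, and the pairs are joined with spaces. The sketch below implements only that pattern; the function names are hypothetical, and flags that take no value or several values are not handled because this reference does not show how they map.

```python
from typing import List
from urllib.parse import urlencode


def cli_filters_to_query(args: List[str]) -> str:
    """Turn ['--type', 'f', '--size', '0-1024'] into 'type=f size=0-1024'."""
    if len(args) % 2 != 0:
        raise ValueError("expected --flag value pairs")
    parts = []
    for flag, value in zip(args[0::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"not a long option: {flag}")
        parts.append(f"{flag[2:]}={value}")
    return " ".join(parts)


def to_query_string(args: List[str]) -> str:
    """URL-encode the filters as the 'query' parameter."""
    return urlencode({"query": cli_filters_to_query(args)})
```
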
format
string
Default: "parent_path fn type size blck ct mt at uid gid mode tags_explicit tags_inherited"

Space-separated list of fields that should be returned. Note: for output_format 'json' the result may contain additional values that are not specified here; for 'csv' this is the list of columns.

sort_by
string
Example: sort_by=parent_path,-ct,+size

Sort by given fields. Multiple fields should be comma-separated; you can prefix each field with + or - to sort ascending or descending. Without prefix it defaults to sorting ascending.

Allowed keys:

  • aggrs
  • ancestor_ids
  • at
  • blck
  • ct
  • depth
  • ext
  • fn
  • gid
  • groupname
  • groupsid
  • id
  • ino
  • mode
  • mt
  • parent_id
  • parent_path
  • rec_aggrs
  • size
  • uid
  • username
  • usersid
  • volume_id
  • volume
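
The sort_by syntax (comma-separated fields with an optional + or - prefix) can be parsed and checked against the allowed keys before the request is sent. This is a client-side sketch with hypothetical names; the server performs its own validation, and its behavior on invalid keys is not described here.

```python
from typing import List, Tuple

ALLOWED_SORT_KEYS = {
    "aggrs", "ancestor_ids", "at", "blck", "ct", "depth", "ext", "fn",
    "gid", "groupname", "groupsid", "id", "ino", "mode", "mt", "parent_id",
    "parent_path", "rec_aggrs", "size", "uid", "username", "usersid",
    "volume_id", "volume",
}


def parse_sort_by(value: str) -> List[Tuple[str, bool]]:
    """Parse 'parent_path,-ct,+size' into (field, ascending) pairs."""
    result = []
    for item in value.split(","):
        item = item.strip()
        if item.startswith(("+", "-")):
            ascending = item[0] == "+"
            field = item[1:]
        else:
            # No prefix means ascending order.
            ascending, field = True, item
        if field not in ALLOWED_SORT_KEYS:
            raise ValueError(f"unsupported sort key: {field!r}")
        result.append((field, ascending))
    return result
```
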
group_by
string
Example: group_by=volume,username

Group result by given fields. Multiple fields should be comma-separated. Allowed keys:

  • at
  • blck
  • ct
  • depth
  • ext
  • fn
  • gid
  • groupname
  • groupsid
  • mode
  • mt
  • parent_id
  • parent_path
  • aggrs
  • aggrs.min.atime
  • aggrs.total.blocks
  • aggrs.total.files
  • aggrs.total.size
  • rec_aggrs
  • rec_aggrs.size
  • rec_aggrs.files
  • rec_aggrs.dirs
  • rec_aggrs.max_sync_time
  • rec_aggrs.blocks
  • size
  • uid
  • username
  • usersid
  • volume
  • volume_id
limit
integer
Default: 1000

Limit the number of returned entries

force_tag_inherit
boolean
Default: false

Inherit tags even if they are in non-inheritable tagset

size_unit
string
Default: "B"

Determines size unit in which size-related fields will be returned. Allowed values: B,K,Ki,M,Mi,G,Gi,T,Ti,P,Pi,E,Ei,Z,Zi,Y,Yi.

size_unit_precision
integer

Determines number of decimal places returned in size-related fields. Works only if also 'size_unit' option is used. By default, size is rounded to 1 decimal place.

hum_size_precision
integer

Determines number of decimal places in human-readable size-related fields. By default, human-readable sizes are rounded to 1 decimal place.

type_hum_format
string

Determines what type_hum values are returned. The value of this parameter should be pairs of <filetype>=<value> separated by ";" where "filetype" is d for directory, f for regular file, l for symbolic link, b for block device, c for character device, s for socket, p for FIFO pipe and "value" is the desired type_hum value for that filetype. Example: "f=regular file;d=directory;l=symbolic link" For filetypes not listed in the format, default value is used.
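
A type_hum_format value can be assembled from a mapping of filetype letters to labels. The helper below is a sketch with a hypothetical name; rejecting labels that contain ";" is an assumption, made because no escaping mechanism is described.

```python
from typing import Dict

FILETYPES = ("d", "f", "l", "b", "c", "s", "p")


def build_type_hum_format(labels: Dict[str, str]) -> str:
    """Build a value such as 'f=regular file;d=directory'."""
    pairs = []
    for filetype, label in labels.items():
        if filetype not in FILETYPES:
            raise ValueError(f"unknown filetype: {filetype!r}")
        if ";" in label:
            # Assumption: ';' cannot be escaped inside a label.
            raise ValueError("label must not contain ';'")
        pairs.append(f"{filetype}={label}")
    return ";".join(pairs)
```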

humanize_nested
boolean
Default: false

Show nested fields such as aggrs, rec_aggrs or jobs with additional human-readable fields.

without_private_tags
boolean
Default: false

Do not show tags from private tagsets when this flag is set.

mount_agent
string
Default: "None"

Show mount path of volume for specified agent address. If an agent is specified that is not associated with a given volume, the API will return the mount_path of the default agent. This option supports single and multiple volume queries.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Get entries from all volumes matching given filter predicates, via synchronous call

See the description for GET /query/ endpoint. Parameters sent in the body which are not described have the same meaning as in the case of the GET version of the endpoint.

Authorizations:
starfishAuth
Request Body schema: application/json
volumes_and_paths
Array of strings

List of volume names and paths as volume:path.

zones
Array of strings

Names of the zones. Multiple zones are supported in query strings

query
string
format
string
sort_by
string
group_by
string
limit
integer
force_tag_inherit
boolean
size_unit
string
size_unit_precision
integer
hum_size_precision
integer
type_hum_format
string
humanize_nested
boolean
without_private_tags
boolean
mount_agent
string

Responses

Request samples

Content type
application/json
{
  • "volumes_and_paths": [
    ],
  • "zones": [
    ],
  • "query": "string",
  • "format": "string",
  • "sort_by": "string",
  • "group_by": "string",
  • "limit": 0,
  • "force_tag_inherit": true,
  • "size_unit": "string",
  • "size_unit_precision": 0,
  • "hum_size_precision": 0,
  • "type_hum_format": "string",
  • "humanize_nested": true,
  • "without_private_tags": true,
  • "mount_agent": "string"
}

Response samples

Content type
application/json
[
  • {
    }
]
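
As an illustration of the POST variant, the sketch below prepares (but does not send) a request with a JSON body and the Bearer token header. The host name, the `/api/query/` path and the function name are assumptions made for the example; only the body field names and the Bearer scheme come from this reference.

```python
import json
import urllib.request
from typing import Sequence


def build_query_request(
    base_url: str,
    token: str,
    volumes_and_paths: Sequence[str],
    query: str,
    limit: int = 1000,
) -> urllib.request.Request:
    """Prepare a synchronous query request without sending it."""
    body = {
        "volumes_and_paths": list(volumes_and_paths),
        "query": query,
        "limit": limit,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/query/",  # assumed path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; for long-running queries the asynchronous endpoint remains the better choice.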

Get entries from given volumes and paths matching given filter predicates, via synchronous call

NOTE

The asynchronous version of this query (/async/query) should be used in most cases, because the synchronous query is limited by HTTP timeouts. The synchronous query is appropriate only for queries that return in 30 seconds or less.

Starfish uses async query internally, including the sf query command.

Authorizations:
starfishAuth
path Parameters
volumes_and_paths
required
string
Example: home:projects%2Fstarfish/backup:projects%2Fstarfish/

Name of the volume and path as volume:path. / characters in path have to be replaced with %2F. Multiple volumes with paths can be given, separated with / (see example). When used in the interactive documentation, multiple volumes and paths cannot be passed. Use POST in this case.
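
The encoding rule for this path parameter can be sketched with the standard library: percent-encode each volume:path so that "/" becomes %2F, then append "/" after each item. The function name is hypothetical, and the trailing slash simply mirrors the example above.

```python
from typing import Sequence
from urllib.parse import quote


def encode_volumes_and_paths(items: Sequence[str]) -> str:
    """Encode ['home:projects/starfish', ...] for use in the URL path."""
    # Keep ':' readable; '/' inside each item is encoded as %2F.
    return "".join(quote(item, safe=":") + "/" for item in items)
```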

query Parameters
zones
Array of arrays
Example: zones=sample_zone_name&zones=other_zone_name

Names of the zones. Multiple zones are supported in query strings

query
string
Example: query=type=f size=0-1024

Query filters. All filters supported by sf query command are supported. For example if command is:

  • sf query --type f --size 0-1024, it should be: type=f size=0-1024
  • sf query --type d --depth 2 --name *a*, it should be: type=d depth=2 name=*a*
format
string
Default: "parent_path fn type size blck ct mt at uid gid mode tags_explicit tags_inherited"

Space-separated list of fields that should be returned. Note: for output_format 'json' the result may contain additional values that are not specified here; for 'csv' this is the list of columns.

sort_by
string
Example: sort_by=parent_path,-ct,+size

Sort by given fields. Multiple fields should be comma-separated; you can prefix each field with + or - to sort ascending or descending. Without prefix it defaults to sorting ascending.

Allowed keys:

  • aggrs
  • ancestor_ids
  • at
  • blck
  • ct
  • depth
  • ext
  • fn
  • gid
  • groupname
  • groupsid
  • id
  • ino
  • mode
  • mt
  • parent_id
  • parent_path
  • rec_aggrs
  • size
  • uid
  • username
  • usersid
  • volume_id
  • volume
group_by
string
Example: group_by=volume,username

Group result by given fields. Multiple fields should be comma-separated. Allowed keys:

  • at
  • blck
  • ct
  • depth
  • ext
  • fn
  • gid
  • groupname
  • groupsid
  • mode
  • mt
  • parent_id
  • parent_path
  • aggrs
  • aggrs.min.atime
  • aggrs.total.blocks
  • aggrs.total.files
  • aggrs.total.size
  • rec_aggrs
  • rec_aggrs.size
  • rec_aggrs.files
  • rec_aggrs.dirs
  • rec_aggrs.max_sync_time
  • rec_aggrs.blocks
  • size
  • uid
  • username
  • usersid
  • volume
  • volume_id
limit
integer
Default: 1000

Limit the number of returned entries

force_tag_inherit
boolean
Default: false

Inherit tags even if they are in a non-inheritable tagset.

size_unit
string
Default: "B"

Determines size unit in which size-related fields will be returned. Allowed values: B,K,Ki,M,Mi,G,Gi,T,Ti,P,Pi,E,Ei,Z,Zi,Y,Yi.

size_unit_precision
integer

Determines number of decimal places returned in size-related fields. Works only if also 'size_unit' option is used. By default, size is rounded to 1 decimal place.

hum_size_precision
integer

Determines number of decimal places in human-readable size-related fields. By default, human-readable sizes are rounded to 1 decimal place.

type_hum_format
string

Determines what type_hum values are returned. The value of this parameter should be pairs of <filetype>=<value> separated by ";" where "filetype" is d for directory, f for regular file, l for symbolic link, b for block device, c for character device, s for socket, p for FIFO pipe and "value" is the desired type_hum value for that filetype. Example: "f=regular file;d=directory;l=symbolic link" For filetypes not listed in the format, default value is used.
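The type_hum_format value described above can be assembled from a mapping, as in this sketch. The helper build_type_hum_format is hypothetical. Since the description does not say how a ";" inside a value could be escaped, the sketch simply rejects such values.

```python
FILETYPES = {"d", "f", "l", "b", "c", "s", "p"}


def build_type_hum_format(labels):
    """Join {filetype: value} pairs as "<filetype>=<value>;...".

    Hypothetical helper; rejects unknown filetypes and values with ";".
    """
    parts = []
    for filetype, label in labels.items():
        if filetype not in FILETYPES:
            raise ValueError(f"unknown filetype: {filetype!r}")
        if ";" in label:
            raise ValueError("value must not contain ';'")
        parts.append(f"{filetype}={label}")
    return ";".join(parts)


print(build_type_hum_format(
    {"f": "regular file", "d": "directory", "l": "symbolic link"}
))
```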

humanize_nested
boolean
Default: false

Show nested fields such as aggrs, rec_aggrs or jobs with additional human-readable fields.

without_private_tags
boolean
Default: false

Do not show tags from private tagsets when this flag is set.

mount_agent
string
Default: "None"

Show mount path of volume for specified agent address. If an agent is specified that is not associated with a given volume, the API will return the mount_path of the default agent. This option supports single and multiple volume queries.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Start gathering entries from given volumes and paths matching given filter predicates, via asynchronous call.

Use this method instead of the synchronous one for most operations, since the synchronous query is limited by HTTP timeouts. The async_after_sec parameter is strongly recommended, since it shortens the feedback loop, especially for small queries.

Authorizations:
starfishAuth
query Parameters
volumes_and_paths
string
Example: volumes_and_paths=home:projects/starfish

Name of the volume and path as volume:path. Multiple volumes with paths are not supported in query string. Multiple volumes and paths can be passed in request body as volumes_and_paths array.

zones
Array of arrays
Example: zones=sample_zone_name&zones=other_zone_name

Names of the zones. Multiple zones are supported in query strings

queries
Array of strings
Default: []
Example: queries=type=d rec_aggrs.size=0-1024&queries=type=f size=0-1024

List of separate queries.

Each query is processed separately, so entries are not sorted across queries. The result is the sorted output of the first query, followed by the sorted output of the next one, and so on; with multiple queries the overall result is therefore only partially sorted. By design, an entry that satisfies the filters of multiple queries appears more than once.

EXAMPLE:

Assume a database with 4 entries: [0-file, 1-dir, 2-file, 3-dir]

Query with

  • queries=["type=d", "type=f"] sort_by="fn" will return [1-dir, 3-dir, 0-file, 2-file]
  • queries=["type=f", "type=d"] sort_by="fn" will return [0-file, 2-file, 1-dir, 3-dir]
  • queries=["type=f name=*i*", "type=f name=*e*"], sort_by="fn" will return [0-file, 2-file, 0-file, 2-file]
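The ordering rule in the example above can be reproduced with a small local simulation. This is only a sketch of the described behaviour, not the server implementation: the four entries are modelled as dictionaries, and only the type and name filters are supported.

```python
from fnmatch import fnmatchcase

ENTRIES = [
    {"fn": "0-file", "type": "f"},
    {"fn": "1-dir", "type": "d"},
    {"fn": "2-file", "type": "f"},
    {"fn": "3-dir", "type": "d"},
]


def matches(entry, query):
    """Check one entry against a query such as "type=f name=*i*".

    Only the "type" and "name" filters are modelled here.
    """
    for term in query.split():
        key, _, value = term.partition("=")
        if key == "type" and entry["type"] != value:
            return False
        if key == "name" and not fnmatchcase(entry["fn"], value):
            return False
    return True


def run_queries(queries, sort_by="fn"):
    """Sort the matches of each query separately, then concatenate them."""
    output = []
    for query in queries:
        hits = [e for e in ENTRIES if matches(e, query)]
        output.extend(sorted(hits, key=lambda e: e[sort_by]))
    return [e["fn"] for e in output]


print(run_queries(["type=d", "type=f"]))
print(run_queries(["type=f name=*i*", "type=f name=*e*"]))
```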
format
string
Default: "parent_path fn type size blck ct mt at uid gid mode tags_explicit tags_inherited"

Space-separated list of fields that should be returned. Note: with output_format 'json' the output may contain additional fields not specified here; with 'csv' this is the list of columns.

sort_by
string
Example: sort_by=parent_path,-ct,+size

Sort by given fields. Multiple fields should be comma-separated; you can prefix each field with + or - to sort ascending or descending. Without prefix it defaults to sorting ascending.

Allowed keys:

  • aggrs
  • ancestor_ids
  • at
  • blck
  • ct
  • depth
  • ext
  • fn
  • gid
  • groupname
  • groupsid
  • id
  • ino
  • mode
  • mt
  • parent_id
  • parent_path
  • rec_aggrs
  • size
  • uid
  • username
  • usersid
  • volume_id
  • volume
group_by
string
Example: group_by=volume,username

Group result by given fields. Multiple fields should be comma-separated. Allowed keys:

  • at
  • blck
  • ct
  • depth
  • ext
  • fn
  • gid
  • groupname
  • groupsid
  • mode
  • mt
  • parent_id
  • parent_path
  • aggrs
  • aggrs.min.atime
  • aggrs.total.blocks
  • aggrs.total.files
  • aggrs.total.size
  • rec_aggrs
  • rec_aggrs.size
  • rec_aggrs.files
  • rec_aggrs.dirs
  • rec_aggrs.max_sync_time
  • rec_aggrs.blocks
  • size
  • uid
  • username
  • usersid
  • volume
  • volume_id
force_tag_inherit
boolean
Default: false

Inherit tags even if they are in a non-inheritable tagset.

output_format
string
Default: "json"
Enum: "csv" "json" "txt"

Changes the output format from the default json, for example: output_format=csv

delimiter
string
Default: ","

Column delimiter if output_format is csv

escape_paths
boolean
Default: false

Escape \t and \n characters if output_format is csv

print_headers
boolean
Default: true

Print column names in the first line if output_format is csv

line_delimiter
string
Example: line_delimiter=

Line delimiter if output_format is txt (\n by default and \u0000 for null character.)

size_unit
string
Default: "B"

Determines size unit in which size-related fields will be returned. Allowed values: B,K,Ki,M,Mi,G,Gi,T,Ti,P,Pi,E,Ei,Z,Zi,Y,Yi.

size_unit_precision
integer

Determines number of decimal places returned in size-related fields. Works only if also 'size_unit' option is used. By default, size is rounded to 1 decimal place.

hum_size_precision
integer

Determines number of decimal places in human-readable size-related fields. By default, human-readable sizes are rounded to 1 decimal place.

type_hum_format
string

Determines what type_hum values are returned. The value of this parameter should be pairs of <filetype>=<value> separated by ";" where "filetype" is d for directory, f for regular file, l for symbolic link, b for block device, c for character device, s for socket, p for FIFO pipe and "value" is the desired type_hum value for that filetype. Example: "f=regular file;d=directory;l=symbolic link" For filetypes not listed in the format, default value is used.

humanize_nested
boolean
Default: false

Show nested fields such as aggrs, rec_aggrs or jobs with additional human-readable fields.

without_private_tags
boolean
Default: false

Do not show tags from private tagsets when this flag is set.

mount_agent
string
Default: "None"

Show mount path of volume for specified agent address. If an agent is specified that is not associated with a given volume, the API will return the mount_path of the default agent. This option supports single and multiple volume queries.

limit
integer

Limit the number of returned entries

async_after_sec
number <float>
Default: 5

If passed, the async query waits up to async_after_sec seconds before switching to asynchronous mode. If the query finishes within async_after_sec, the result is returned immediately with HTTP status code 200.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Check status of gathering entries

Authorizations:
starfishAuth
path Parameters
query_id
required
string

Query ID that was returned by start async query operation

Responses

Response samples

Content type
application/json
{
  • "is_done": true,
  • "query_id": "string",
  • "location": "/api/v1/async/query/20180606_100701_7359bb_volume_name:"
}
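A client typically polls this status until is_done becomes true and then downloads the output. The sketch below shows one way to structure such a loop. The names wait_until_done and get_status are hypothetical: get_status stands for a caller-supplied function that performs the authenticated HTTP request and returns the decoded JSON status object. The interval and timeout defaults are arbitrary assumptions.

```python
import time


def wait_until_done(get_status, query_id, interval_sec=1.0, timeout_sec=60.0,
                    sleep=time.sleep):
    """Poll get_status(query_id) until the response reports is_done.

    get_status is a caller-supplied callable (hypothetical) that performs
    the HTTP request and returns the decoded JSON status object.
    """
    waited = 0.0
    while True:
        status = get_status(query_id)
        if status.get("is_done"):
            return status
        if waited >= timeout_sec:
            raise TimeoutError(f"query {query_id} not done after {waited}s")
        sleep(interval_sec)
        waited += interval_sec
```

Passing the sleep function as an argument keeps the loop easy to test without real delays.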

Cancel query if it's still running and delete query result

Authorizations:
starfishAuth
path Parameters
query_id
required
string

Query ID that was returned by start async query operation

Responses

Download query output

Authorizations:
starfishAuth
path Parameters
query_id
required
string

Query ID that was returned by start async query operation

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Cancel query if it's still running and delete query result

Authorizations:
starfishAuth
path Parameters
query_id
required
string

Query ID that was returned by start async query operation

Responses

restore

Restore files

Create and start a restore job

Returns a restore job object. A single restore job may contain multiple low-level jobs. A restore job also restores the permissions of files and directories. Permissions of existing directories are not overridden unless the restore_permissions option is provided.

Authorizations:
starfishAuth
query Parameters
query
string
Example: query=type=f size=0-1024

Query filters. All filters supported by sf query command are supported. For example if command is:

  • sf query --type f --size 0-1024, it should be: type=f size=0-1024
  • sf query --type d --depth 2 --name *a*, it should be: type=d depth=2 name=*a*
Request Body schema: application/json
src_volume_and_path
string

The original location of the files before archiving. This parameter will become optional once restoring from multiple volumes in a single job is supported.

dst_volume_and_path
string

volume:path to which files are to be restored

query
string

If defined here, it overrides the query filters from the URL parameters.

object

All of these options can also be defined in top level dictionary

entries_from_file
string or null (entries_from_file_enum)
Enum: "paths" "sfids"

Determines the type of entries passed in the file. Supported by restore jobs. Other jobs expect paths only.

object

Properties dedicated for restore job. All of these options can also be defined in top level dictionary.

Responses

Request samples

Content type
application/json
{
  • "src_volume_and_path": "src_volume_name:path/to/dir",
  • "dst_volume_and_path": "dst_volume_name:path/to/dir",
  • "query": "latest-copied-version",
  • "command_options": {
    },
  • "entries_from_file": "paths",
  • "restore_options": {
    }
}

Response samples

Content type
application/json
{
  • "href": "/api/restore/job/123",
  • "src_volume_and_path": "projects:dir1/dir2",
  • "dst_volume_and_path": "projects:dir1/dir2",
  • "entries_from_file": "paths",
  • "low_level_jobs": {
    },
  • "stats": {
    },
  • "id": 0,
  • "status": "STARTING",
  • "state": {
    },
  • "creation_time": 1593093530.123456,
  • "creation_time_hum": "2020-06-25 15:58:50",
  • "end_time": 1593093600.123456,
  • "end_time_hum": "2020-06-25 16:00:00",
  • "duration": 70,
  • "duration_hum": "1m10s",
  • "created_by_id": 1,
  • "created_by": {
    },
  • "created_by_hum": "Alice (uid=12)"
}

List restore jobs

Authorizations:
starfishAuth
query Parameters
status
Array of strings

Job status(es), either a list of statuses or a single status as a string. Cannot be used together with running.

running
boolean

if set to true, lists only running jobs; if set to false, lists only not-running (completed) jobs; cannot be used together with status

requested_by
Array of strings

created by a given entity, either a list of entities or a single entity as a string. For example 'gui', 'client', 'scheduler' etc.

creation_time
string

Supports FROM-TO and RELATIVE formats;

FROM-TO:

'# hour|day|week|month|year(s) ago' or 'YYYYMMDD[HHMM[SS]]' or 'now' or 'inf', for example:

  • 'inf-3 months ago'
  • '9 months ago-now'
  • '20200101-202003251314'

RELATIVE:

'[+|-]N[y|m|w|d|h]', meaning a number of years, months, weeks, days (default) or hours, for example:

  • '+4h': more (older) than 5 hours ago
  • '1d' (or '1'): between 2 days ago (48 hours) and 1 day (24 hours) ago
  • '-2y': less (newer) than 2 years ago
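The RELATIVE syntax above can be parsed with a regular expression, as in this sketch. The function name parse_relative is hypothetical. It only splits a value into sign, number and unit (defaulting to days) and does not attempt to compute the resulting time window.

```python
import re

# Optional sign, a number, and an optional unit (y, m, w, d or h).
_RELATIVE = re.compile(r"([+-]?)(\d+)([ymwdh]?)")


def parse_relative(value):
    """Split a RELATIVE value such as "+4h" into (sign, number, unit)."""
    match = _RELATIVE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a RELATIVE time value: {value!r}")
    sign, number, unit = match.groups()
    # Days are the default unit when none is given.
    return sign, int(number), unit or "d"


print(parse_relative("+4h"))
print(parse_relative("1"))
```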
end_time
string

the same as creation_time

sort_by
string
Enum: "creation_time" "dst_path" "dst_volume_id" "end_time" "id" "phase" "query" "reason" "requested_by" "src_path" "src_volume_id" "status"
Example: sort_by=creation_time -dst_path

Sort by given fields. Multiple fields should be separated with some whitespace or comma. Each field could be prefixed with '+' or '-' to sort ascending or descending (default is ascending). By default, results are sorted by id, but the limit is applied descending. If limit is also specified, results are sorted first and then the limit is applied.

limit
integer

Maximum number of returned jobs

paging_offset
integer

Parameter that describes paging offset. It should be equal to number of entries that have been already printed on the previous pages.

For example:

  • paging_offset=0 and limit=2: two first entries will be printed,
  • paging_offset=4 and limit=2: 5th and 6th entry will be printed,
  • paging_offset=7 and limit=3: 8th, 9th and 10th entry will be printed.

A paged result includes the field next_page_params, which describes the parameters that should be passed with the next request to retrieve the next page.
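The paging_offset arithmetic described above can be sketched as a generator. The names iter_pages and fetch_page are hypothetical: fetch_page stands for a caller-supplied function that requests one page with the given paging_offset and limit and returns the list of entries. The sketch assumes that a page shorter than limit is the last one; a real client could rely on next_page_params instead.

```python
def iter_pages(fetch_page, limit):
    """Yield pages until fetch_page returns fewer than limit entries.

    fetch_page(paging_offset=..., limit=...) is a hypothetical callable
    that returns the list of entries for one page.
    """
    offset = 0
    while True:
        page = fetch_page(paging_offset=offset, limit=limit)
        if page:
            yield page
        if len(page) < limit:
            break
        # The next offset is the number of entries already retrieved.
        offset += len(page)


jobs = list(range(7))
for page in iter_pages(lambda paging_offset, limit:
                       jobs[paging_offset:paging_offset + limit], limit=3):
    print(page)
```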

add_paging_params_to_response
boolean
Default: false

A flag specifying whether to include paging params in response.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

created_by_username
Array of strings

Only jobs created by a user with the given username will be taken into account. The request may specify more than one name.

created_by_uid
Array of strings

Only jobs created by a user with the given UID will be taken into account. The request may specify more than one user id.

Responses

Response samples

Content type
application/json
{
  • "restore_jobs": [
    ],
  • "next_page_params": {
    },
  • "matched_restore_jobs_count": 70
}

Get restore job by id

Authorizations:
starfishAuth
path Parameters
restore_job_id
required
integer

Id of the restore job

Responses

Response samples

Content type
application/json
{
  • "href": "/api/restore/job/123",
  • "src_volume_and_path": "projects:dir1/dir2",
  • "dst_volume_and_path": "projects:dir1/dir2",
  • "entries_from_file": "paths",
  • "low_level_jobs": {
    },
  • "stats": {
    },
  • "id": 0,
  • "status": "STARTING",
  • "state": {
    },
  • "creation_time": 1593093530.123456,
  • "creation_time_hum": "2020-06-25 15:58:50",
  • "end_time": 1593093600.123456,
  • "end_time_hum": "2020-06-25 16:00:00",
  • "duration": 70,
  • "duration_hum": "1m10s",
  • "created_by_id": 1,
  • "created_by": {
    },
  • "created_by_hum": "Alice (uid=12)"
}

Stop restore job with given id

Authorizations:
starfishAuth
path Parameters
restore_job_id
required
integer

Id of the restore job

Responses

Upload file with entries for restore job with given id

The file should consist of an arbitrary number of file/dir paths or SFIDs separated with \0. Paths should be relative to the restore job root_path. The choice of paths or SFIDs should match the entries_from_file argument passed in the restore job start request. The job starts only after the entries file is uploaded; its status is set to timeout if the file is not uploaded or the upload takes too long.
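A request body in this \0-separated format can be built as in the following sketch. The helper build_entries_payload is hypothetical. UTF-8 encoding is an assumption, and the description does not state whether a trailing separator is expected, so none is added here.

```python
def build_entries_payload(entries):
    """Join paths (or SFIDs) with NUL bytes for the entries file upload.

    Hypothetical helper; assumes UTF-8 and no trailing separator.
    """
    for entry in entries:
        if "\0" in entry:
            raise ValueError("entries must not contain NUL characters")
    return "\0".join(entries).encode("utf-8")


payload = build_entries_payload(["dir/a.txt", "dir/b.txt"])
print(payload)
```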

Authorizations:
starfishAuth
path Parameters
restore_job_id
required
integer

Id of the restore job

Request Body schema: application/binary
object (job_entries_file)

Responses

Delete all restore jobs matching filter

Deletes all the restore jobs that correspond to any of the filters specified.

Authorizations:
starfishAuth
query Parameters
volume_id
integer

numerical id of volume

Responses

scan

Scan volumes

Start new scan

Authorizations:
starfishAuth
Request Body schema: application/json
agent_address
string
Default: "https://localhost:30002"

Agent address where volume is present. Default agent address is stored in Starfish configuration file.

object
loading_priority
integer
Default: 0

Priority to load chunks. Value 100 means pgloader should load chunks immediately.

overlapping_check_disabled
boolean
Default: false

If set, the scan is started without checking whether it overlaps an already pending scan. In rare cases, using this parameter can lead to a broken tree structure in the database, which can be fixed only by performing a sync scan. If really necessary, it is recommended to use this option for small jobs (e.g. refreshing directories in the UI: depth = 0).

requested_by
required
string
Enum: "gui" "client" "scheduler" "dispatcher" "internal" "monitor"
type
required
string
Enum: "diff" "mtime" "sync"
volume
required
string

Responses

Request samples

Content type
application/json
{
  • "agent_address": "https://localhost:30002",
  • "crawler_options": {
    },
  • "loading_priority": 0,
  • "overlapping_check_disabled": false,
  • "requested_by": "gui",
  • "type": "diff",
  • "volume": "volume_name"
}
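A request body like the sample above can be assembled and checked on the client side before it is sent. In this sketch the function build_scan_request and its default for requested_by are hypothetical; the enum values are taken from the schema above.

```python
import json

SCAN_TYPES = {"diff", "mtime", "sync"}
REQUESTED_BY = {"gui", "client", "scheduler", "dispatcher", "internal",
                "monitor"}


def build_scan_request(volume, scan_type, requested_by="client", **optional):
    """Return a scan request body after checking the enum fields.

    Hypothetical helper; optional fields such as loading_priority are
    passed through unchanged.
    """
    if scan_type not in SCAN_TYPES:
        raise ValueError(f"unsupported scan type: {scan_type!r}")
    if requested_by not in REQUESTED_BY:
        raise ValueError(f"unsupported requested_by: {requested_by!r}")
    body = {"volume": volume, "type": scan_type, "requested_by": requested_by}
    body.update(optional)
    return body


print(json.dumps(build_scan_request("volume_name", "diff",
                                    loading_priority=0)))
```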

Response samples

Content type
application/json
{
  • "agent_state": "active",
  • "agent_heard": 0,
  • "status": "done",
  • "crawler": {
    },
  • "state": {
    },
  • "agent_address": "https://localhost:30002",
  • "crawler_options": {
    },
  • "loading_priority": 0,
  • "overlapping_check_disabled": false,
  • "requested_by": "gui",
  • "type": "diff",
  • "volume": "volume_name"
}

Get list of scans

Authorizations:
starfishAuth
query Parameters
sort_order
integer
Enum: -1 1

Use -1 for descending order and 1 for ascending

limit
integer

Max number of returned scans. Use "0" for unlimited response, by default 10000.

paging_offset
integer

Parameter that describes paging offset. It should be equal to number of entries that have been already printed on the previous pages.

For example:

  • paging_offset=0 and limit=2: two first entries will be printed,
  • paging_offset=4 and limit=2: 5th and 6th entry will be printed,
  • paging_offset=7 and limit=3: 8th, 9th and 10th entry will be printed.

A paged result includes the field next_page_params, which describes the parameters that should be passed with the next request to retrieve the next page.

running
boolean

if set to true, lists only running scans; if set to false, lists only not-running (completed) scans; cannot be used together with state

volume
string

name of the volume

long_id
Array of strings

long id of the scan

num_id
Array of integers

numeric id of the scan

vol_paths
Array of strings

list volume names and paths joined with ':' (ex. "vol1", "vol2:path"), will list scans that have at least one startpoint equal to any of given vol:path

overlapping_vol_paths
Array of strings

list volume names and paths joined with ':' (ex. "vol1", "vol2:path"), will list scans that potentially overlap with any of given vol:path

state
Array of strings

Scan state(s), either a list of states or a single state as a string. Cannot be used together with running.

type
string

Can be used more than once in the query; in that case scans of any of the given types are returned.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

Responses

Response samples

Content type
application/json
{
  • "scans": [
    ],
  • "next_page_params": {
    },
  • "matched_scans_count": 70
}

Stop a running scan

Authorizations:
starfishAuth
path Parameters
scan_id
required
integer
Request Body schema: application/json
reason
string
Enum: "some_chunks_not_applied" "invalid_chunk_file" "invalid_mapping_file" "invalid_result_file" "volume_deleted" "crawler_failed" "event_monitor_failed" "timed_out" "unknown_scan_type" "unexpected_exception" "job_stopped" "service_stop" "manual"

the reason for scan stop

ignore_unavailable_agent
boolean
Default: false

stop scan even if the running agent is not available

Responses

Request samples

Content type
application/json
{
  • "reason": "service_stop",
  • "ignore_unavailable_agent": false
}

Response samples

Content type
application/json
{
  • "agent_state": "active",
  • "agent_heard": 0,
  • "status": "done",
  • "crawler": {
    },
  • "state": {
    },
  • "agent_address": "https://localhost:30002",
  • "crawler_options": {
    },
  • "loading_priority": 0,
  • "overlapping_check_disabled": false,
  • "requested_by": "gui",
  • "type": "diff",
  • "volume": "volume_name"
}

Get single scan by id

Authorizations:
starfishAuth
path Parameters
scan_id
required
integer

Responses

Response samples

Content type
application/json
{
  • "agent_state": "active",
  • "agent_heard": 0,
  • "status": "done",
  • "crawler": {
    },
  • "state": {
    },
  • "agent_address": "https://localhost:30002",
  • "crawler_options": {
    },
  • "loading_priority": 0,
  • "overlapping_check_disabled": false,
  • "requested_by": "gui",
  • "type": "diff",
  • "volume": "volume_name"
}

Get running event monitor scan by volume name

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

Responses

Start a new event monitor scan

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

Request Body schema: application/vnd.sf.fs_monitor.start+json
agent_address
string
disable_full_scan
boolean
extra_monitor_args
Array of strings
start_scan_on_agent
boolean

Responses

Request samples

Content type
application/vnd.sf.fs_monitor.start+json
{
  • "agent_address": "string",
  • "disable_full_scan": true,
  • "extra_monitor_args": [
    ],
  • "start_scan_on_agent": true
}

Change the state of an event monitor scan

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

Request Body schema: application/vnd.sf.fs_monitor.change+json
action
required
string
Enum: "resume" "pause" "stop"
agent_address
string

Responses

Request samples

Content type
application/vnd.sf.fs_monitor.change+json
{
  • "action": "resume",
  • "agent_address": "string"
}

schedule

Scans and jobs schedules

List of cron entries for all volumes

Authorizations:
starfishAuth
query Parameters
sort_by
string
Enum: "cron" "next_run_timestamp" "path" "template" "volume_name"
Example: sort_by=cron

Sort by given fields. Multiple fields should be separated with some whitespace or comma. Each field could be prefixed with '+' or '-' to sort ascending or descending (default is ascending).

vol_and_path_list
string
Example: vol_and_path_list=vol1:/path/on/vol1,vol2:/path/on/vol2

Filter the output by returning only the entries with matching volumes and paths

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

Responses

Response samples

Content type
application/json
[
  • {
    }
]

List of cron entries for single volume

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

query Parameters
path
string

Path to directory for which we want to list cron entries. By default set to '' which means root of the volume. Entries from whole subtree will be returned.

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Set cron job for directory in volume

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

query Parameters
template
required
string

Name of the Starfish template that should be started at the scheduled time.

cron
required
string

Cron expression which describes when templates should be run.

path
string

Path to the directory for which we want to set the cron entry. Default is set to '' which is root of the volume.

Responses

Response samples

Content type
application/json
{
  • "template": "diff",
  • "next_run_timestamp": 1558724400,
  • "cron": "0 21 * * *"
}
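Because a cron expression contains spaces and * characters, it has to be URL-encoded when passed as a query parameter. The sketch below uses urllib.parse.urlencode for this. The function build_schedule_query is hypothetical, and leaving out an empty path relies on the default described above.

```python
from urllib.parse import parse_qs, urlencode


def build_schedule_query(template, cron, path=""):
    """Encode the template, cron and optional path as a URL query string."""
    params = {"template": template, "cron": cron}
    if path:
        # An empty path means the volume root, so it is simply left out.
        params["path"] = path
    return urlencode(params)


query_string = build_schedule_query("diff", "0 21 * * *")
# Decoding the string again recovers the original values.
print(parse_qs(query_string))
```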

Remove schedule for directory

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

query Parameters
path
string

Path to directory for which we want to remove cron entries. Default set to '' which is root of the volume. Will not remove schedules for any subdirectory of this path.

template
string

Name of template that should be removed. If more entries have template with the same name this call will remove all of them.

Responses

status

Report detailed system status

Report detailed system status

Authorizations:
starfishAuth
query Parameters
full_config
boolean
Default: false

If false, configuration for queries, commands and upload is trimmed from reported config.

no_obfuscation
boolean
Default: false

If no_obfuscation=False, fields like passwords or secret keys will be replaced with ***. If no_obfuscation=True and the user does not have permissions, HTTP code 401 is returned.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

custom_timeout
integer or null

If defined, status_pool_timeout, request_timeout and status_timeout will be replaced.

Responses

Response samples

Content type
application/json
{
  • "now_ts": 1679573885.295482,
  • "now_hum": "2023-03-23 13:18:05 +0100",
  • "sfhome": "string",
  • "client_version": "string",
  • "license": {
    },
  • "loaded_config": {
    },
  • "services": {
    },
  • "loki": {
    },
  • "agents": {
    },
  • "volumes": {
    },
  • "disk": {
    },
  • "host": {
    },
  • "postgres": {
    }
}

storage

Find and update file entries by path

See list of volumes with links to each volume content

Authorizations:
starfishAuth

Responses

Response samples

Content type
application/json
{}

Get information about top level entries of given volume

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

Responses

Response samples

Content type
application/json
{
  • "version": "1.0",
  • "items": [
    ]
}

Same as the GET method, but does not return data; only the HTTP headers are filled with entry metadata

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

Responses

Response samples

Content type
application/json
{ }

/storage/{volume_name}/{path}

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

path
required
string

Responses

Response samples

Content type
application/json
{
  • "version": "1.0",
  • "items": [
    ]
}

Change tags assigned to entry

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

path
required
string
Request Body schema: application/vnd.sf.entry.patch+json
tag.add
Array of strings

list of tags which will be added to entry

tag.set
Array of strings

list of tags which will overwrite tags currently attached to entry

tag.delete
Array of strings

list of tags which will be removed from entry

Responses

Request samples

Content type
application/vnd.sf.entry.patch+json
{
  • "tag.add": [
    ],
  • "tag.set": [
    ],
  • "tag.delete": [
    ]
}

Response samples

Content type
application/json
{ }

summarize

Retrieve summary for a given volume and paths grouped by given column for each defined action_tag

Start calculating summarize query

Authorizations:
starfishAuth
Request Body schema: application/json
volumes_and_paths
Array of strings
Default: []

Name of the volume and path as volume:path. If missing or empty then all volumes are taken. It could be a string for single volume and path or an array with multiple volumes with paths.

query
string
Default: ""

List of filters which should be applied to summarize query

group_by
string (group_by_single_simple_key)
Enum: "at" "blck" "ct" "depth" "ext" "fn" "gid" "groupname" "groupsid" "ino" "mode" "mt" "parent_id" "parent_path" "size" "uid" "username" "usersid" "volume" "volume_id"

Group result by a single field.

Responses

Request samples

Content type
application/json
{
  • "volumes_and_paths": [
    ],
  • "query": "type=f size=10K-999P",
  • "group_by": "username"
}

Response samples

Content type
application/json
{
  • "summarize_id": "summarize_19700101_010000"
}

Check if summarize query is already done.

Authorizations:
starfishAuth
path Parameters
summarize_id
required
string

Id of started summarize query

Responses

Response samples

Content type
application/json
{
  • "summarize_id": "summarize_19700101_010000",
  • "is_done": true
}

Get results of finished summarize query

Authorizations:
starfishAuth
path Parameters
summarize_id
required
string

Id of started summarize query

Responses

Response samples

Content type
application/json
[
  • {
    }
]

tag

Manage tags and tagsets

List available tags

Authorizations:
starfishAuth
query Parameters
with_tagset
boolean

If set to true, list of tags will be returned with tagsets in format {tagset}:{tag} (default tagset will be printed as an empty string, so all tags in default tagset will be printed as ':{tag}'). By default tags are returned without tagset. In such case tags with same name are returned as one.

in_tagset
string

List only tags from given tagset. To list tags in default tagset provide empty string. If not provided, tags from all tagsets are returned.

with_private
string

List also tags that are in private tagsets (which means tagset with name starting with '__'). This flag is true by default.

limit
integer

Maximum number of returned tags.

Responses

Response samples

Content type
application/json
{
  • "tags": [
    ]
}

Add tags in bulk

Authorizations:
starfishAuth
Request Body schema: application/vnd.sf.tag.bulk+json

Which tags to add to which paths. If the list of paths is empty, the tag is created without attaching it to any path.

paths
Array of strings (volume_and_path)

list of paths as volume:path; if paths is an empty list tags are added without linking them to any entry

tags
Array of strings

list of tags

strict
boolean
Default: false

If set to true - request fails if any of the requested paths to tag is nonexistent. If set to false and some of the requested paths to tag are not found, the number of paths that were found in a database is returned in 'existing_paths_count'.

Responses

Request samples

Content type
application/vnd.sf.tag.bulk+json
{
  • "paths": [
    ],
  • "tags": [
    ],
  • "strict": false
}

Response samples

Content type
application/json
{
  • "errors": [
    ],
  • "existing_paths_count": 5,
  • "added_tags_count": 7
}

Detach tags from files in bulk

Authorizations:
starfishAuth
Request Body schema: application/vnd.sf.tag.detach+json

Set of tags to be detached and set of paths from which to detach. Both sets should be non-empty.

paths
Array of strings (volume_and_path)

list of paths as 'volume:path'

tags
Array of strings

list of tags

Responses

Request samples

Content type
application/vnd.sf.tag.detach+json
{
  • "paths": [
    ],
  • "tags": [
    ]
}

Response samples

Content type
application/json
{
  • "errors": [
    ],
  • "untagged_count": 5
}

Purge given tag from whole volume

If volume name is not given, or is null, tag is removed from entries on all volumes. Also, if volume name is not given, or is null, tag will no longer be available unless reintroduced. If tag name is not given, or is null, all tags in a given volume are removed.

Authorizations:
starfishAuth
Request Body schema: application/vnd.sf.tag.purge+json

What tags to remove from which volumes. Both volume and tag cannot be null at the same time.

volume
string or null

volume name

tag
string or null

tag name

Responses

Request samples

Content type
application/vnd.sf.tag.purge+json
{
  • "volume": "projects",
  • "tag": "tag1"
}

Response samples

Content type
application/json
{
  • "purged_count": 3,
  • "removed_tags_count": 1
}

Add explicit tag for each inherited tag for all entries in given subtree

Note that POST /tag/ on some/dir/ adds an explicit tag only to some/dir/ itself, and all entries of some/dir/ inherit that tag. In contrast, POST /tag/pin on some/dir/ ensures that each entry in some/dir/ receives an explicit tag for every inherited tag it had before. Therefore, after POST /tag/ a_tag on some/dir/ followed by POST /tag/pin on some/dir/, a query on explicit tags for some/dir/some/file returns a_tag; without pinning it does not.

Authorizations:
starfishAuth
Request Body schema: application/vnd.sf.tag.pin+json
volume
required
string
path
required
string

path to subtree

set_tagset
string

name of the tagset that new explicit tags should receive

query
string

query used to filter entries which should have tags pinned

Responses

Request samples

Content type
application/vnd.sf.tag.pin+json
{
  • "volume": "vol1",
  • "path": "path/to/some/dir",
  • "set_tagset": "some_new_tagset",
  • "query": "string"
}

Response samples

Content type
application/json
12
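The difference between inherited and pinned tags can be illustrated with a small local model. This is a sketch of the behaviour described for this endpoint, assuming tags are inherited from every ancestor directory; the functions `inherited_tags` and `pin` are hypothetical and do not call the Starfish API.

```python
from pathlib import PurePosixPath

def inherited_tags(explicit, path):
    """Tags that `path` inherits from the explicit tags of its ancestors."""
    tags = set()
    for parent in PurePosixPath(path).parents:
        tags |= explicit.get(str(parent), set())
    return tags

def pin(explicit, entries, subtree):
    """Give every entry under `subtree` an explicit copy of each inherited tag."""
    root = PurePosixPath(subtree)
    pinned = {p: set(t) for p, t in explicit.items()}
    for entry in entries:
        if root in PurePosixPath(entry).parents:
            pinned.setdefault(entry, set()).update(inherited_tags(explicit, entry))
    return pinned

entries = ["some/dir", "some/dir/some", "some/dir/some/file"]
explicit = {"some/dir": {"a_tag"}}   # state after POST /tag/ a_tag on some/dir/
pinned = pin(explicit, entries, "some/dir")
```

Before pinning, `explicit` holds no entry for some/dir/some/file; after pinning, `pinned` lists a_tag as an explicit tag of that file.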

Rename a given tag or assign it to a different tagset

If the tagset is not provided, the default one will be assumed. Currently, combining tags is not implemented, i.e. the new tag must not already exist.

Authorizations:
starfishAuth
Request Body schema: application/vnd.sf.tag.rename+json

Tag to be renamed and a new tag name

tag
required
string

Tag to be renamed in {tagset}:{tag_name} format. If tagset is not provided, the tag in the default tagset will be renamed.

new_tag
required
string

New tag name in {tagset}:{tag_name} format. If tagset is not provided, the renamed tag will belong to the default tagset. If the new tagset does not exist, it is created.

Responses

Request samples

Content type
application/vnd.sf.tag.rename+json
{
  • "tag": "tag1",
  • "new_tag": "tagset:tag2"
}

Response samples

Content type
application/json
{
  • "new_tag": "tagset:tag2"
}
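The {tagset}:{tag_name} format can be sketched as a small parser. It assumes that the first colon separates the tagset from the tag name and that a missing tagset means the default tagset (named ""); `split_tag` and `rename_effect` are illustrative names, not API functions.

```python
def split_tag(spec):
    """Split '{tagset}:{tag_name}'; a missing tagset means the default tagset ('')."""
    if ":" in spec:
        tagset, name = spec.split(":", 1)
    else:
        tagset, name = "", spec
    return tagset, name

def rename_effect(tag, new_tag):
    """Describe whether a rename changes the tag name, the tagset, or both."""
    old_set, old_name = split_tag(tag)
    new_set, new_name = split_tag(new_tag)
    return {"renamed": old_name != new_name, "moved_tagset": old_set != new_set}

effect = rename_effect("tag1", "tagset:tag2")
```

For the request sample above, the tag is both renamed and moved out of the default tagset.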

Get list of all tagsets or a single tagset by id

Get a list of all tagsets if "id" is not provided. Get a single tagset if "id" is provided.

Authorizations:
starfishAuth
query Parameters
id
integer

Numerical id of tagset

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Create new tagset

Authorizations:
starfishAuth
Request Body schema: application/json

Tagset definition

name
string (tagset.name)

Non-empty tagset name. May contain alphanumeric characters and the symbols _ . / ? - +

zone_names
Array of strings

list of zone names or identifiers

inheritable
boolean (tagset.inheritable)

Should tags in this tagset be displayed alongside items (files, directories) in tagged subtree?

pinnable
boolean (tagset.pinnable)

Should tags in this tagset be pinned when archiving?

action
string (tagset.action)
Enum: "archive" "delete" "restore" "none"

Responses

Request samples

Content type
application/json
{
  • "name": "archive_to_s3",
  • "zone_names": [
    ],
  • "inheritable": true,
  • "pinnable": true,
  • "action": "archive"
}

Response samples

Content type
application/json
{
  • "name": "archive_to_s3",
  • "zone_ids": [
    ],
  • "zones": [
    ],
  • "inheritable": true,
  • "pinnable": true,
  • "action": "archive",
  • "tags": [
    ]
}
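A client-side check of the tagset definition might look like the sketch below. It assumes "alphanumeric" means ASCII letters and digits, which the specification does not state; `is_valid_tagset_name` and `tagset_body` are hypothetical helpers.

```python
import re

# Letters, digits and the symbols _ . / ? - + (ASCII-only is an assumption).
TAGSET_NAME_RE = re.compile(r"[A-Za-z0-9_./?+-]+")
ACTIONS = {"archive", "delete", "restore", "none"}

def is_valid_tagset_name(name):
    """True if `name` is non-empty and uses only the documented characters."""
    return TAGSET_NAME_RE.fullmatch(name) is not None

def tagset_body(name, action, **fields):
    """Build a tagset definition, rejecting bad names and unknown actions."""
    if not is_valid_tagset_name(name):
        raise ValueError(f"invalid tagset name: {name!r}")
    if action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    return {"name": name, "action": action, **fields}

body = tagset_body("archive_to_s3", "archive", inheritable=True, pinnable=True)
```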

Get a single tagset by name

Authorizations:
starfishAuth
path Parameters
tagset_name
required
string

Tagset name. In case of default tagset (named "") use ":" like this: /tagset/:/.

Responses

Response samples

Content type
application/json
{
  • "name": "archive_to_s3",
  • "zone_ids": [
    ],
  • "zones": [
    ],
  • "inheritable": true,
  • "pinnable": true,
  • "action": "archive",
  • "tags": [
    ]
}
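The addressing rules for tagsets (by name, by id, and the ":" form for the default tagset) can be sketched as a path builder. Percent-encoding of the name is an assumption made here because tagset names may contain / and ?; the function `tagset_path` is hypothetical and returns only the part of the URL shown in this reference.

```python
from urllib.parse import quote

def tagset_path(name=None, tagset_id=None):
    """Return the /tagset/ path for a tagset addressed by name or by numeric id."""
    if tagset_id is not None:
        return f"/tagset/?id={int(tagset_id)}"
    if name is None:
        raise ValueError("either name or tagset_id is required")
    if name == "":
        # The default tagset is named "" and is addressed with ":".
        return "/tagset/:/"
    # Assumption: special characters in the name are percent-encoded.
    return f"/tagset/{quote(name, safe='')}/"

default_path = tagset_path("")
by_id_path = tagset_path(tagset_id=0)
```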

Update existing tagset

Update existing tagset by name. ?id= with tagset id may be provided instead of tagset_name, for example /tagset/?id=0.

Authorizations:
starfishAuth
path Parameters
tagset_name
required
string

Tagset name. In case of default tagset (named "") use ":" like this: /tagset/:/.

Request Body schema: application/json

Updated tagset properties

name
string
zone_id
integer or null

Bind that tagset to zone

drop_zone_id
integer or null

Unbind zone from that tagset

zone_ids
Array of integers or null

Set new list of bound zones. All previously attached zones will be detached.

inheritable
boolean
pinnable
boolean
action
string
Enum: "archive" "delete" "restore" "none"

Responses

Request samples

Content type
application/json
{
  • "name": "archive_to_s3",
  • "zone_id": 0,
  • "drop_zone_id": 0,
  • "zone_ids": [
    ],
  • "inheritable": true,
  • "pinnable": true,
  • "action": "archive"
}

Response samples

Content type
application/json
{
  • "name": "archive_to_s3",
  • "zone_ids": [
    ],
  • "zones": [
    ],
  • "inheritable": true,
  • "pinnable": true,
  • "action": "archive",
  • "tags": [
    ]
}

Delete single tagset

Delete single tagset by name. This will also delete all tags under that tagset. ?id= with tagset id may be provided instead of tagset_name, for example /tagset/?id=0.

Authorizations:
starfishAuth
path Parameters
tagset_name
required
string

Tagset name. In case of default tagset (named "") use ":" like this: /tagset/:/.

Responses

volume

Manipulate Starfish volumes

Get list of volumes or a single volume by numerical id

If id is not provided, returns a list of all volumes with detailed info about crons and size. If an id is provided, returns a single volume.

Authorizations:
starfishAuth
query Parameters
id
integer

numerical id of volume

add_cron_info
boolean
Default: false

If enabled and querying a list of volumes, also adds the "cron" field with detailed information about cron entries attached to the volume.

confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

sort_by
string
Enum: "display_name" "free_space" "name" "path" "volume_id"
Example: sort_by=display_name

Sort by the given fields. Multiple fields should be separated with whitespace or a comma. Each field may be prefixed with '+' or '-' to sort ascending or descending (default is ascending).

with_disk_usage
boolean
Default: false

If enabled and querying a list of volumes then add also "volume_size_info" field with details about disk size.
Note that to get proper "volume_size_info" data the volume must be scanned at least once.

with_mount_opts
boolean
Default: false

If enabled and querying a single volume then update the "mount_opts" field with details about mount options of the volume.

Responses

Response samples

Content type
application/json
[
  • {
    }
]
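The query parameters above, in particular the sort_by syntax, can be sketched as a query-string builder. Serialising booleans as "true"/"false" is an assumption, and `volume_list_query` is a hypothetical helper; note that a leading '+' must be percent-encoded, which `urlencode` does automatically.

```python
from urllib.parse import urlencode

SORTABLE = {"display_name", "free_space", "name", "path", "volume_id"}

def volume_list_query(sort_by=(), **flags):
    """Build the query string for listing volumes."""
    for field in sort_by:
        # Strip one optional '+' (ascending) or '-' (descending) prefix.
        bare = field[1:] if field.startswith(("+", "-")) else field
        if bare not in SORTABLE:
            raise ValueError(f"cannot sort by {field!r}")
    params = {}
    if sort_by:
        params["sort_by"] = ",".join(sort_by)
    for name, value in flags.items():
        params[name] = "true" if value else "false"   # assumed boolean encoding
    return urlencode(params)

query = volume_list_query(["-free_space", "+name"], with_disk_usage=True)
```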

Save new volume

Authorizations:
starfishAuth
Request Body schema: application/json
vol
string

New volume name. Updating a volume with a new name is possible only when no scan and no job is pending on this volume; also, redash reports cannot be calculated at that time.

agent_address
string

Agent address to be added to the volume (required when adding a new volume). When adding a new agent or modifying an existing one, the address will be normalized (possibly extended with a scheme and/or port number). When removing an agent, a part of the address is also accepted as long as it uniquely identifies the agent.

root
string

Path where the volume is mounted on the agent (required when adding new volume)

no_cron
boolean
Default: false

If set to true, the default daily scan cron job will not be added for the volume

display_name
string

User-friendly name that may also contain characters forbidden in the vol field; it should be used only for display.

default_agent_address
string

The agent that will be used to scan the volume when no agent is provided in the scan request.

dir_excludes
Array of strings

directories (glob patterns allowed) to be excluded during scanning

file_excludes
Array of strings

filenames (glob patterns are allowed) to be excluded during scanning

ignored_dir_stat_fields
Array of strings

If only fields from this list differ between db and fs, CHANGE event is not triggered. Applies to directories. These values are set per given volume, in addition to the global values in config, which apply to all volumes. Possible fields are: st_mode, st_uid, st_gid, st_size, st_atime, st_mtime, st_ctime, st_blocks, st_nlink, st_ino

ignored_file_stat_fields
Array of strings

If only fields from this list differ between db and fs, CHANGE event is not triggered. Applies to non-directories. These values are set per given volume, in addition to the global values in config, which apply to all volumes. Possible fields are: st_mode, st_uid, st_gid, st_size, st_atime, st_mtime, st_ctime, st_blocks, st_nlink, st_ino

store_win_acl
boolean
Default: true

Only applies to Windows volumes; cannot be set on Linux volumes. If enabled, Windows access control lists are also stored when scanning this volume.

store_win_attr
boolean
Default: false

Only applies to Windows volumes; cannot be set on Linux volumes. If enabled, Windows file attributes (read-only, hidden, etc.) are also stored when scanning this volume.

store_posix_acl
boolean
Default: false

Store also POSIX access control lists when scanning this volume. This may be slow.

total_capacity
number

Capacity of the volume. This will be ignored if capacity_set_manually is false.

capacity_set_manually
boolean

If set to false then total_capacity will be automatically set by agent.

free_space
number

Free space of the volume. This will be ignored if free_space_set_manually is false.

free_space_set_manually
boolean

If set to false then free_space will be automatically set by agent.

user_params
object

Map of user parameters, useful for storing properties of the volume. Any string-to-string pair may be defined. Some keys are used by SF internally (example: cost_per_gb). In an update request, reusing a key overwrites its value; to remove a key, set its value to null.

type
string

OS on which volume should be mounted ('Linux' or 'Windows')

Responses

Request samples

Content type
application/json
{
  • "vol": "foo",
  • "agent_address": "string",
  • "root": "string",
  • "no_cron": false,
  • "display_name": "string",
  • "default_agent_address": "string",
  • "dir_excludes": [
    ],
  • "file_excludes": [
    ],
  • "ignored_dir_stat_fields": [
    ],
  • "ignored_file_stat_fields": [
    ],
  • "store_win_acl": true,
  • "store_win_attr": false,
  • "store_posix_acl": false,
  • "total_capacity": 0,
  • "capacity_set_manually": true,
  • "free_space": 0,
  • "free_space_set_manually": true,
  • "user_params": {
    },
  • "type": "string"
}

Response samples

Content type
application/json
{
  • "volume": {
    },
  • "created": true
}
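The constraints scattered through the field descriptions above can be collected in one client-side helper. This is a sketch under the assumption that it is useful to drop values the server would ignore; `new_volume_body` is a hypothetical name and the checks mirror only what the descriptions state.

```python
STAT_FIELDS = {
    "st_mode", "st_uid", "st_gid", "st_size", "st_atime",
    "st_mtime", "st_ctime", "st_blocks", "st_nlink", "st_ino",
}

def new_volume_body(vol, agent_address, root, **options):
    """Build the JSON body for adding a volume, applying the documented rules."""
    if not agent_address or not root:
        raise ValueError("agent_address and root are required when adding a new volume")
    for key in ("ignored_dir_stat_fields", "ignored_file_stat_fields"):
        unknown = set(options.get(key, ())) - STAT_FIELDS
        if unknown:
            raise ValueError(f"{key}: unknown stat fields {sorted(unknown)}")
    body = {"vol": vol, "agent_address": agent_address, "root": root, **options}
    # total_capacity and free_space are ignored when the matching
    # *_set_manually flag is false, so drop them in that case.
    if body.get("capacity_set_manually") is False:
        body.pop("total_capacity", None)
    if body.get("free_space_set_manually") is False:
        body.pop("free_space", None)
    return body

body = new_volume_body("foo", "agent1", "/mnt/foo",
                       total_capacity=10, capacity_set_manually=False,
                       ignored_dir_stat_fields=["st_atime"])
```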

Get single volume by name

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

query Parameters
add_cron_info
boolean
Default: false

If enabled and querying a list of volumes, also adds the "cron" field with detailed information about cron entries attached to the volume.

with_disk_usage
boolean
Default: false

If enabled and querying a list of volumes then add also "volume_size_info" field with details about disk size.
Note that to get proper "volume_size_info" data the volume must be scanned at least once.

with_mount_opts
boolean
Default: false

If enabled and querying a single volume then update the "mount_opts" field with details about mount options of the volume.

Responses

Response samples

Content type
application/json
{
  • "id": 5,
  • "vol": "foo",
  • "display_name": "/mnt/foo/",
  • "inode": 657,
  • "inode_str": "657",
  • "store_win_acl": null,
  • "store_posix_acl": false,
  • "total_capacity": 31231231237654,
  • "capacity_set_manually": false,
  • "free_space": 333222111000,
  • "free_space_set_manually": true,
  • "default_agent_address": "http://agent1:30002",
  • "cron": [
    ],
  • "mounts": {
    },
  • "mount_opts": {
    },
  • "dir_excludes": [
    ],
  • "file_excludes": [ ],
  • "ignored_dir_stat_fields": [
    ],
  • "ignored_file_stat_fields": [
    ],
  • "user_params": { },
  • "type": "Linux",
  • "volume_size_info": {
    },
  • "cron_service_up": true,
  • "number_of_files": 0,
  • "number_of_dirs": 1,
  • "sum_of_logical_sizes_div_nlinks": 4096,
  • "sum_of_logical_sizes_no_nlinks": 4096,
  • "sum_of_physical_sizes_div_nlinks": 4096,
  • "sum_of_blocks_div_nlinks": 8,
  • "sum_of_physical_sizes_no_nlinks": 4096,
  • "sum_of_blocks": 8,
  • "sum_of_logical_sizes": 4096,
  • "sum_of_physical_sizes": 4096
}

Update single volume by name

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

Request Body schema: application/json
skip_check_on_agent
boolean
Default: false

Do not request agent to verify if agent_address and root is valid.

vol
string

New volume name. Updating a volume with a new name is possible only when no scan and no job is pending on this volume; also, redash reports cannot be calculated at that time.

agent_address
string

Agent address to be added to the volume (required when adding a new volume). When adding a new agent or modifying an existing one, the address will be normalized (possibly extended with a scheme and/or port number). When removing an agent, a part of the address is also accepted as long as it uniquely identifies the agent.

root
string

Path where the volume is mounted on the agent (required when adding new volume)

no_cron
boolean
Default: false

If set to true, the default daily scan cron job will not be added for the volume

display_name
string

User-friendly name that may also contain characters forbidden in the vol field; it should be used only for display.

default_agent_address
string

The agent that will be used to scan the volume when no agent is provided in the scan request.

dir_excludes
Array of strings

directories (glob patterns allowed) to be excluded during scanning

file_excludes
Array of strings

filenames (glob patterns are allowed) to be excluded during scanning

ignored_dir_stat_fields
Array of strings

If only fields from this list differ between db and fs, CHANGE event is not triggered. Applies to directories. These values are set per given volume, in addition to the global values in config, which apply to all volumes. Possible fields are: st_mode, st_uid, st_gid, st_size, st_atime, st_mtime, st_ctime, st_blocks, st_nlink, st_ino

ignored_file_stat_fields
Array of strings

If only fields from this list differ between db and fs, CHANGE event is not triggered. Applies to non-directories. These values are set per given volume, in addition to the global values in config, which apply to all volumes. Possible fields are: st_mode, st_uid, st_gid, st_size, st_atime, st_mtime, st_ctime, st_blocks, st_nlink, st_ino

store_win_acl
boolean
Default: true

Only applies to Windows volumes; cannot be set on Linux volumes. If enabled, Windows access control lists are also stored when scanning this volume.

store_win_attr
boolean
Default: false

Only applies to Windows volumes; cannot be set on Linux volumes. If enabled, Windows file attributes (read-only, hidden, etc.) are also stored when scanning this volume.

store_posix_acl
boolean
Default: false

Store also POSIX access control lists when scanning this volume. This may be slow.

total_capacity
number

Capacity of the volume. This will be ignored if capacity_set_manually is false.

capacity_set_manually
boolean

If set to false then total_capacity will be automatically set by agent.

free_space
number

Free space of the volume. This will be ignored if free_space_set_manually is false.

free_space_set_manually
boolean

If set to false then free_space will be automatically set by agent.

user_params
object

Map of user parameters, useful for storing properties of the volume. Any string-to-string pair may be defined. Some keys are used by SF internally (example: cost_per_gb). In an update request, reusing a key overwrites its value; to remove a key, set its value to null.

type
string

OS on which volume should be mounted ('Linux' or 'Windows')

Responses

Request samples

Content type
application/json
{
  • "skip_check_on_agent": true,
  • "vol": "foo",
  • "agent_address": "string",
  • "root": "string",
  • "no_cron": false,
  • "display_name": "string",
  • "default_agent_address": "string",
  • "dir_excludes": [
    ],
  • "file_excludes": [
    ],
  • "ignored_dir_stat_fields": [
    ],
  • "ignored_file_stat_fields": [
    ],
  • "store_win_acl": true,
  • "store_win_attr": false,
  • "store_posix_acl": false,
  • "total_capacity": 0,
  • "capacity_set_manually": true,
  • "free_space": 0,
  • "free_space_set_manually": true,
  • "user_params": {
    },
  • "type": "string"
}

Response samples

Content type
application/json
{
  • "id": 1,
  • "vol": "foo",
  • "display_name": "/mnt/foo/",
  • "inode": 657,
  • "inode_str": "657",
  • "store_win_acl": null,
  • "store_posix_acl": false,
  • "total_capacity": 31231231237654,
  • "capacity_set_manually": false,
  • "free_space": 333222111000,
  • "free_space_set_manually": true,
  • "default_agent_address": "http://agent1:30002",
  • "mounts": {
    },
  • "mount_opts": {
    },
  • "dir_excludes": [
    ],
  • "file_excludes": [ ],
  • "ignored_dir_stat_fields": [
    ],
  • "ignored_file_stat_fields": [
    ],
  • "user_params": { },
  • "type": "Linux"
}
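The update semantics of user_params (a reused key overwrites the value, a null value removes the key) can be modelled locally as below. This is only an illustration of the documented behaviour, not the server's implementation; `apply_user_params` and the owner key are hypothetical.

```python
def apply_user_params(current, update):
    """Return the user_params expected after sending `update` in an update request."""
    merged = dict(current)
    for key, value in update.items():
        if value is None:
            merged.pop(key, None)     # null removes the key
        else:
            merged[key] = value       # a reused key overwrites, a new key is added
    return merged

before = {"cost_per_gb": "0.0123", "owner": "lab-a"}
after = apply_user_params(before, {"cost_per_gb": "0.02", "owner": None})
```

In JSON the removal is expressed as `"owner": null`; Python's `None` serialises to that value.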

Delete volume by name

Authorizations:
starfishAuth
path Parameters
volume_name
required
string

name of volume

query Parameters
remove_reports
boolean
Default: false

Flag that determines if redash reports data should be deleted together with volume

Responses

Response samples

Content type
application/json
{ }

zone

Manage Starfish zones

Create new zone

Returns zone object.

Authorizations:
starfishAuth
Request Body schema: application/json
name
string

Zone name: a non-empty string that may contain only letters A-Za-z, digits 0-9, underscore _ or dash -, and must not start with a dash.

managers
Array of objects (user)

list of managers

managing_groups
Array of objects (zonegroup)

list of managing groups

paths
Array of strings (volume_and_path)

list of paths as 'volume:path'

user_params
object
Default: {}

Optional dictionary of user-defined key-value pairs

Responses

Request samples

Content type
application/json
{
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "paths": [
    ],
  • "user_params": {
    }
}

Response samples

Content type
application/json
{
  • "id": 0,
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "restore_managers": [
    ],
  • "restore_managing_groups": [
    ],
  • "paths": [
    ],
  • "tagsets": [
    ],
  • "user_params": {
    }
}
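The zone name rule can be expressed as a regular expression for client-side validation before a zone is created. This is a minimal sketch; `is_valid_zone_name` is a hypothetical helper and the server remains the authority on what it accepts.

```python
import re

# First character: letter, digit or underscore; the rest may also include dashes.
ZONE_NAME_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")

def is_valid_zone_name(name):
    """True if `name` is non-empty, uses only allowed characters, and has no leading dash."""
    return ZONE_NAME_RE.fullmatch(name) is not None

checks = {name: is_valid_zone_name(name) for name in ["zone_name", "zone-1", "-zone", ""]}
```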

List all zones

Returns list of zone objects.

Authorizations:
starfishAuth
query Parameters
confidential
boolean
Default: false

If enabled then fields that may contain confidential info will be replaced either with *** or a hash

Responses

Response samples

Content type
application/json
[
  • {
    }
]

Update zone

Updates zone object. See "Request body" for list of fields that can be updated.

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

Request Body schema: application/json
name
string

Zone name: a non-empty string that may contain only letters A-Za-z, digits 0-9, underscore _ or dash -, and must not start with a dash.

managers
Array of objects (user)

list of managers

managing_groups
Array of objects (zonegroup)

list of managing groups

paths
Array of strings (volume_and_path)

list of paths as 'volume:path'

user_params
object
Default: {}

Optional dictionary of user-defined key-value pairs

Responses

Request samples

Content type
application/json
{
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "paths": [
    ],
  • "user_params": {
    }
}

Response samples

Content type
application/json
{
  • "id": 0,
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "restore_managers": [
    ],
  • "restore_managing_groups": [
    ],
  • "paths": [
    ],
  • "tagsets": [
    ],
  • "user_params": {
    }
}

Show given zone

Returns zone object.

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

Responses

Response samples

Content type
application/json
{
  • "id": 0,
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "restore_managers": [
    ],
  • "restore_managing_groups": [
    ],
  • "paths": [
    ],
  • "tagsets": [
    ],
  • "user_params": {
    }
}

Delete zone

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

query Parameters
force
boolean

Remove zone even if it's used by some tagset

Responses

Updates user params of the zone.

Updates the user params in a zone object.

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

Request Body schema: application/json
object (user_params)
Default: {}

Dictionary of user-defined key-value pairs

Responses

Request samples

Content type
application/json
{
  • "cost_per_gb": "0.0123",
  • "purpose": "keep users data",
  • "location": "2.3.b"
}

Response samples

Content type
application/json
{
  • "id": 0,
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "restore_managers": [
    ],
  • "restore_managing_groups": [
    ],
  • "paths": [
    ],
  • "tagsets": [
    ],
  • "user_params": {
    }
}

Add/update a single user param of the zone.

Adds/updates a single user param to a zone object.

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

user_param_name
required
string

name (key) of the zone user parameter

Request Body schema: application/text
value
required
string

Responses

Response samples

Content type
application/json
{
  • "id": 0,
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "restore_managers": [
    ],
  • "restore_managing_groups": [
    ],
  • "paths": [
    ],
  • "tagsets": [
    ],
  • "user_params": {
    }
}

Delete a single user param from the zone.

Deletes a single user param from a zone object.

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

user_param_name
required
string

name (key) of the zone user parameter

Responses

Response samples

Content type
application/json
{
  • "id": 0,
  • "name": "zone_name",
  • "managers": [
    ],
  • "managing_groups": [
    ],
  • "restore_managers": [
    ],
  • "restore_managing_groups": [
    ],
  • "paths": [
    ],
  • "tagsets": [
    ],
  • "user_params": {
    }
}

Update restoring zone managers and restoring managing groups

Add, set or delete zone managers and/or managing groups that can restore within the zone. To allow/forbid only selected zone managers or managing groups to restore, add user: or group: prefix to username and groupname, respectively. To allow/forbid all zone managers and managing groups to restore within the zone, pass zone-managers constant as a list element. When none of the zone managers or managing groups can restore within the zone, only SF admin can restore inside it.

Authorizations:
starfishAuth
path Parameters
zone_id
required
integer

ID of zone

Request Body schema: application/zone.authz.restore.patch+json
restoring_managers.add
Array of strings

Add selected (or all) zone managers/managing groups as restoring managers/managing groups. Cannot be used with restoring_managers.set. If zone-managers is passed as a list element, all zone managers and managing groups will be able to restore within the zone.

restoring_managers.delete
Array of strings

Delete selected (or all) zone managers/managing groups as restoring managers/managing groups. Cannot be used with restoring_managers.set. When all zone managers and managing groups can restore within the zone, deleting selected zone managers/managing groups is not allowed. If zone-managers is passed as a list element, only SF admin will be able to restore inside the zone.

restoring_managers.set
Array of strings

Set selected (or all) zone managers/managing groups as restoring managers/managing groups. If zone-managers is passed as a list element, all zone managers and managing groups will be able to restore within the zone.

Responses

Request samples

Content type
application/zone.authz.restore.patch+json
{
  • "restoring_managers.add": [
    ],
  • "restoring_managers.delete": [
    ]
}

Response samples

Content type
application/json
[
  • "user:alice",
  • "user:bob",
  • "group:admins"
]
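The rules for the restore patch body (user: or group: prefixes, the zone-managers constant, and set being incompatible with add and delete) can be sketched as a builder. The helper names are hypothetical, and rejecting unprefixed entries on the client side is an assumption about what the server expects.

```python
ZONE_MANAGERS = "zone-managers"

def _check(principals):
    """Require each entry to be 'zone-managers' or carry a user:/group: prefix."""
    for p in principals:
        if p != ZONE_MANAGERS and not p.startswith(("user:", "group:")):
            raise ValueError(f"expected 'user:', 'group:' or {ZONE_MANAGERS!r}: {p!r}")
    return list(principals)

def restore_patch(add=None, delete=None, set_=None):
    """Build the body for updating restoring managers of a zone."""
    if set_ is not None and (add is not None or delete is not None):
        raise ValueError("restoring_managers.set cannot be combined with add/delete")
    body = {}
    for key, value in (("add", add), ("delete", delete), ("set", set_)):
        if value is not None:
            body[f"restoring_managers.{key}"] = _check(value)
    return body

body = restore_patch(add=["user:alice", "group:admins"], delete=["user:bob"])
```

The body would be sent with the Content-Type application/zone.authz.restore.patch+json shown in the request sample.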