Metadata

Onedata provides extensive support for metadata management, which can be used to describe all kinds of resources in Onedata, including files, folders, spaces and users.

Metadata types in Onedata

Metadata in Onedata are organized into 3 levels:

  • Filesystem attributes - basic metadata related to file system operations, such as file size, creation and modification timestamps, POSIX access rights, etc.,
  • Extended attributes - these attributes enable assigning custom key-value pairs to resources in Onedata. They can include, for instance, information about the owner and creators of the file, Access Control Lists, license, etc.,
  • Custom metadata - this level provides the most flexibility, and Onedata itself does not assume any schema for these metadata. For each resource, a user can assign a separate document in one of the supported metadata formats (currently JSON and RDF).

The filesystem and extended level attributes are accessible directly via POSIX and CDMI protocols.

Filesystem attributes

This section describes typical filesystem metadata attributes. The list of attributes at this level is closed (i.e. users cannot add new attributes) and most of them are read-only, which means their values cannot be directly modified (cdmi_ attributes). Other attributes (currently only posix_mode) can be modified by the user using the REST API.

| Attribute | Sample value | Description |
|---|---|---|
| size | 1024 | File size in bytes |
| mode | 0666 | POSIX access mode in octal form (i.e. 4 digits starting with 0) |
| atime | 1470304148 | Unix last access timestamp |
| mtime | 1470304148 | Unix last modification timestamp |
| ctime | 1470304148 | Unix last status change timestamp |
| storage_group_id | 1470304148 | Gid of the storage group owner of this file (the same Gid is displayed via oneclient) |
| storage_user_id | 1470304148 | Uid of the storage owner of this file |
| name | file.txt | The name of the object (space, folder or file) |
| owner_id | 79c0ed35-f32e-4db3-a87f-76a588c9b2f9 | ID of the file owner |
| shares | ["b3-a87f-76a588c9b279c0ed35-f32e-4db", ...] | Array of share IDs associated with this file or folder |
| type | 'reg' | Specifies whether the resource is a regular file (reg), a directory (dir) or a link (lnk) |
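
As a small illustration of how these values can be interpreted on the client side, the sketch below decodes two of the sample values in JavaScript. The `attrs` object is a hypothetical example assembled from the sample values above, not an actual API response. In particular, it assumes that `mode` arrives as an octal string and that timestamps are expressed in seconds.

```javascript
// Hypothetical attribute set built from the sample values listed above.
const attrs = {
    size: 1024,
    mode: "0666",
    mtime: 1470304148,
    type: "reg"
};

// Interpret the octal mode string as a number and test the owner-write bit.
function isOwnerWritable(mode) {
    return (parseInt(mode, 8) & 0o200) !== 0;
}

// Unix timestamps are in seconds, while JavaScript Date expects milliseconds.
function toDate(unixSeconds) {
    return new Date(unixSeconds * 1000);
}

console.log(isOwnerWritable(attrs.mode));
console.log(toDate(attrs.mtime).toISOString());
```

The helper names `isOwnerWritable` and `toDate` are made up for this example and are not part of any Onedata library.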

Extended attributes

In general, extended attributes are platform agnostic, and users can choose any keys and values for attributes at this level.

One restriction is that keys beginning with the onedata_ prefix should be avoided, as they are used by the Onedata platform for special purposes, in particular for presentation in the Graphical User Interface and for Open Data publishing and management.
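
A client that accepts attribute keys from users may want to check them against this restriction before sending them. The following is a minimal sketch of such a check. The function name `findReservedKeys` and the sample object are hypothetical and only serve the illustration.

```javascript
// Prefix reserved by the Onedata platform, as stated above.
const RESERVED_PREFIX = "onedata_";

// Returns the keys of `attrs` that start with the reserved prefix.
function findReservedKeys(attrs) {
    return Object.keys(attrs).filter(function (key) {
        return key.startsWith(RESERVED_PREFIX);
    });
}

// Hypothetical set of attributes proposed by a user.
const candidate = { license: "CC-0", onedata_custom: "x" };
console.log(findReservedKeys(candidate));
```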

Setting extended attributes using Graphical User Interface

The Graphical User Interface provides means for editing extended attributes in the form of key-value pairs, as presented in the figure below.

The extended metadata values can be assigned to either files or folders.

Setting extended attributes using REST API

Extended attributes can be modified from the Graphical User Interface, from the command line, and via the REST API. Below are some examples:

Set the extended attribute "license" to "CC-0" using REST API

curl --tlsv1.2 -X PUT -H "X-Auth-Token: $TOKEN" \
-H 'Content-type: application/json' -d '{ "license": "CC-0" }' \
"https://$HOST/api/v3/oneprovider/attributes/MySpace1/File2.txt?extended=true"

List all extended attributes using REST API

curl --tlsv1.2 -X GET -H "X-Auth-Token: $TOKEN" \
"https://$HOST/api/v3/oneprovider/attributes/MySpace1/File2.txt?extended=true"

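The same PUT request can also be prepared from JavaScript. The sketch below only builds the URL and the request options, mirroring the curl example above; the function name `buildSetXattrRequest` and the placeholder host and token values are assumptions made for this illustration. Sending the request requires a runtime that provides fetch().

```javascript
// Builds the URL and fetch() options for setting extended attributes.
// `host`, `token` and `filePath` are placeholders supplied by the caller.
function buildSetXattrRequest(host, token, filePath, attrs) {
    // Encode each path segment separately so that "/" separators are preserved.
    const encodedPath = filePath.split("/").map(encodeURIComponent).join("/");
    return {
        url: "https://" + host + "/api/v3/oneprovider/attributes/" + encodedPath + "?extended=true",
        options: {
            method: "PUT",
            headers: {
                "X-Auth-Token": token,
                "Content-Type": "application/json"
            },
            body: JSON.stringify(attrs)
        }
    };
}

const request = buildSetXattrRequest("HOST", "TOKEN", "MySpace1/File2.txt", { license: "CC-0" });
console.log(request.url);

// To actually send the request:
// fetch(request.url, request.options).then(function (res) { console.log(res.status); });
```
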
Setting extended attributes using command line

Spaces mounted with Oneclient support the extended attribute (xattr) feature, which can be accessed and manipulated using tools such as xattr, or getfattr and setfattr. For instance, to list the attributes of a file and then set one of them:

[/mnt/oneclient/MySpace1]$ ls
File2.txt

[/mnt/oneclient/MySpace1]$ xattr -l File2.txt
license: CC-0
org.onedata.uuid: Z3VpZCM1NzMwZjNjNjRjYmI0Y2M1MjllZjRlYWVhNDJkNDY4MyNmNzMzMzA1ZjdhMGE4MWRjZTM5NjY2NzEzYTUxNmYwYg
org.onedata.space_id: f733305f7a0a81dce39666713a516f0b
org.onedata.file_id: /f733305f7a0a81dce39666713a516f0b/5/7/3/5730f3c64cbb4cc529ef4eaea42d4683
org.onedata.storage_id: 55e4475e8dc60dc3ebd070f8dd424f24
org.onedata.access_type: unknown
org.onedata.file_blocks: [##################################################]
org.onedata.file_blocks_count: 1
org.onedata.replication_progress: 100%

[/mnt/oneclient/MySpace1]$ xattr -w license CC-1 File2.txt

Custom metadata

In addition to filesystem-level and extended attributes, Onedata allows arbitrary metadata documents to be assigned to each resource. These documents are stored in separate metadata backends. Currently supported backends include:

  • JSON
  • RDF

In each of these backends, a user can store any properly formatted metadata document, which can be modified and retrieved using the REST API or, in the future, in the Graphical User Interface.

Advanced metadata queries

Onedata supports creation of custom views on files' metadata. They can be used for:

  • efficient querying for files
  • producing tables and lists of information based on files' metadata
  • extracting or filtering information from files' metadata
  • calculating, summarizing or reducing the information on the stored metadata

Views are the result of continuous indexing of documents. Documents are mapped using a user-defined mapping function. Optionally, the results of the mapping can also be reduced using a reduce function, if one is provided by the user. Internally, views are based on Couchbase Views. Please visit this site for a more comprehensive explanation of the concepts used in this documentation.

There are two types of views that can be created:

  • map-reduce views
  • spatial views - spatial views are similar to map-reduce views. They are suited for querying multi-dimensional data. The main difference is that they don't have a reduce function.

Currently, views can be created on the following models:

  • file_meta
  • times
  • custom_metadata
  • file_popularity

Concepts

Mapping function

All information presented in this section applies to both map-reduce and spatial views. In the Couchbase documentation, the function used by spatial views is called a spatial function. For simplicity, this documentation uses the term mapping function for both, as they must comply with the same rules (with one exception, emphasised below).

To create a view, it is necessary to write a simple JavaScript mapping function. It is used to map the data stored in a document to the value that should be indexed. Mapping is performed using the emit() function. Each call to emit() results in a new row of data in the view result. More information on mapping function concepts can be found here.

In the Onedata views API, the mapping function submitted by the user is wrapped in additional JavaScript code in order to comply with the Couchbase API.

The mapping function should accept 4 arguments:

  • id - CDMI object id of a file,
  • type - type of the document that is being mapped by the function. One of:
    • "file_meta"
    • "times"
    • "custom_metadata"
    • "file_popularity"
  • meta - values stored in the document being mapped,
  • ctx - context object used for storing helpful information. Currently it stores:
    • providerId,
    • spaceId.

The mapping function should return the (key, value) pair or pairs that are to be emitted to the view via the emit() function.

If one document should be mapped to exactly one row in the view, the mapping function should return a 2-element list [key, value], where key and value can be any JS object.

If one document should be mapped to many rows in the view, the mapping function should return an object with the key 'list'. Its value should be a list of 2-element lists [key, value]. The emit() function will be called for each 2-element list in the top-level list.
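
To try out a mapping function locally before submitting it, the two return formats can be simulated with a small helper. The sketch below is not the actual wrapper code used by Onedata. The helper `collectRows` is a hypothetical stand-in that collects the rows that would be passed to emit(), and ignoring any other return value is an assumption of this sketch.

```javascript
// Local stand-in for the wrapper: calls the mapping function and collects
// the rows that would be passed to emit(). Not the actual Onedata code.
function collectRows(mapFn, id, type, meta, ctx) {
    const rows = [];
    const emit = function (key, value) {
        rows.push({ key: key, value: value });
    };
    const result = mapFn(id, type, meta, ctx);

    if (Array.isArray(result) && result.length === 2) {
        // Single [key, value] pair: one row.
        emit(result[0], result[1]);
    } else if (result && Array.isArray(result.list)) {
        // {'list': [[key, value], ...]}: one row per pair.
        result.list.forEach(function (pair) {
            emit(pair[0], pair[1]);
        });
    }
    // Any other result (e.g. undefined) emits nothing in this sketch.
    return rows;
}

// Example mapping functions in the two supported formats.
const single = function (id, type, meta, ctx) {
    return [meta['license'], id];
};
const multiple = function (id, type, meta, ctx) {
    return {'list': [["a", id], ["b", id]]};
};

console.log(collectRows(single, "file1", "custom_metadata", { license: "CC-0" }, {}));
console.log(collectRows(multiple, "file1", "custom_metadata", {}, {}));
```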

NOTE: Spatial view key format

The mapping function defined for a spatial view must return the key as a multidimensional bounding box. There are 3 accepted ways of defining a key in a spatial function:

  • single values - list of numerical values, each of which will be expanded to a collapsed range. For example, the list [1.0, 2, 3.5] will be internally expanded to the list of ranges [[1.0, 1.0], [2, 2], [3.5, 3.5]]
  • ranges - list of ranges. For example: [[1.0, 2.0], [100, 1000]]
  • GeoJSON geometry - the following GeoJSON objects are supported:
    • Point
    • MultiPoint
    • LineString
    • MultiLineString
    • MultiPolygon
    • GeometryCollection

The above formats of defining keys can be combined. The only constraint is that a GeoJSON object must be the first element of the list.

Defining spatial view keys is thoroughly described here.
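
The expansion of single values into collapsed ranges can be illustrated as follows. This is only a sketch of the rule described above; the actual expansion is performed internally, and the function name `expandSpatialKey` is made up for this example. Elements that are not plain numbers (ranges, GeoJSON objects) are passed through unchanged.

```javascript
// Expands plain numbers to collapsed ranges [v, v]; other elements are kept as they are.
function expandSpatialKey(key) {
    return key.map(function (dim) {
        return typeof dim === "number" ? [dim, dim] : dim;
    });
}

console.log(JSON.stringify(expandSpatialKey([1.0, 2, 3.5])));
// prints [[1,1],[2,2],[3.5,3.5]]

console.log(JSON.stringify(expandSpatialKey([[1.0, 2.0], 7])));
// prints [[1,2],[7,7]]
```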

Valid formats of the mapping function are presented below. key and value can be any valid JSON objects:

  • returning a single view row
     function (id, type, meta, ctx) {
         var key = ...
         var value = ...
         return [key, value];
     }
    
  • returning multiple view rows

       function (id, type, meta, ctx) {
           var key1 = ...
           var value1 = ...
           var key2 = ...
           var value2 = ...
           .
           .
           .
           var keyN = ...
           var valueN = ...
    
           return {'list': [
               [key1, value1],
               [key2, value2],
               .
               .
               .
               [keyN, valueN]
           ]};
       }
    

A few examples of the mapping function are presented here.

Reduce function (optional)

The reduce function is optional and can be used only for map-reduce views. Typical uses for a reduce function are to produce a summarized count of the input data, or to compute a sum or other calculations on the input data.
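
As an illustration, a counting reduce function could look like the sketch below. It follows the (key, values, rereduce) signature used by Couchbase view reduce functions. The name `countReduce` and the local calls that follow it are only there to exercise the function outside the database and are not part of what would be submitted.

```javascript
// Counts rows. On rereduce, `values` holds partial counts that must be summed.
var countReduce = function (key, values, rereduce) {
    if (rereduce) {
        var total = 0;
        for (var i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }
    return values.length;
};

// Local simulation only: reduce two batches, then combine the partial results.
var partialA = countReduce(null, ["id1", "id2"], false);
var partialB = countReduce(null, ["id3"], false);
console.log(countReduce(null, [partialA, partialB], true)); // prints 3
```

Handling the rereduce case matters because the reduce function may be applied to its own partial results rather than to the original values.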

Contrary to the mapping function, the reduce function is not wrapped in any additional JavaScript code. It is passed as is to Couchbase, and therefore all information and notices presented here are relevant. In particular:

Metadata models that can be indexed

File meta model

Model that stores basic file metadata, such as:

  • name - name of the file
  • type - specifies whether the resource is a regular file (REG) or a directory (DIR)
  • mode - POSIX access mode in octal form (i.e. 4 digits starting with 0)
  • owner - ID of the owner of the file
  • group_owner - ID of the group owner of the file
  • provider_id - ID of the provider on which the file was created
  • shares - list of share IDs associated with this file or folder
  • deleted - flag informing that file was marked to be deleted

Times model

This model was extracted from file_meta for efficiency reasons. It stores classical Unix timestamps:

  • atime - Unix last access timestamp
  • mtime - Unix last modification timestamp
  • ctime - Unix last status change timestamp

Custom metadata model

Model used for storing extended attributes and custom metadata. Currently, views can operate on both extended attributes and JSON metadata; indexing of the RDF metadata backend is not yet supported. The model has the following fields:

  • onedata_json - which stores map of JSON metadata values
  • onedata_rdf - which stores RDF metadata
  • extended attributes set by users

File popularity model

Model used for tracking file popularity. These documents are available only if collecting file popularity statistics is turned on in the given space, which can be done only by a space admin via Onepanel. The file popularity document is available only for a file that has been opened at least once on a given provider.
It stores:

  • size - total size of the file's blocks stored on the given provider
  • open_count - number of open operations on the file
  • last_open - timestamp of the last open operation on the file
  • hr_hist - hourly histogram of number of open operations on the file per hour, in the last 24 hours
  • dy_hist - daily histogram of number of open operations on the file per day, in the last 30 days
  • mth_hist - monthly histogram of number of open operations on the file per month, in the last 12 months
  • hr_mov_avg - moving average of number of open operations on the file per hour
  • dy_mov_avg - moving average of number of open operations on the file per day
  • mth_mov_avg - moving average of number of open operations on the file per month
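
A view over this model could, for instance, be keyed by the number of open operations. The sketch below is one possible mapping function for that purpose. The variable name `popularityMap` and the sample document passed to it are hypothetical and exist only so that the function can be called locally; when creating a view, only the function itself would be submitted.

```javascript
// Emits one row per file_popularity document, keyed by open_count.
var popularityMap = function (id, type, meta, ctx) {
    if (type === "file_popularity") {
        // Check the type explicitly instead of relying on truthiness.
        if (typeof meta['open_count'] === "number") {
            return [meta['open_count'], id];
        }
    }
};

// Local call with a made-up document, for illustration only.
console.log(popularityMap("file1", "file_popularity", { open_count: 12, size: 1024 }, {}));
```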

REST API

All operations on views are listed in the table below, with links to comprehensive descriptions of the corresponding requests and their parameters.

| Request | Link to API |
|---|---|
| Create view | API |
| Get view | API |
| Update view | API |
| Remove view | API |
| Update view reduce function | API |
| Remove view reduce function | API |
| List views | API |
| Query view | API |

Mapping function examples

View based on single attribute

The example below presents a simple function which creates a view over a license extended attribute.

function (id, type, meta, ctx) {
    if(type === "custom_metadata"){
        if(meta['license']) {
            return [meta['license'], id];
        }
    }
}

View based on multiple attributes

It is possible to create custom views, based on multiple attribute fields, e.g.:

function(id, type, meta, ctx) {
    if(type === "custom_metadata"){
        if(meta['license'] && meta['year']) {
            return [[meta['license'], meta['year']], id];
        }
    }
}

View based on file name

The example below presents a function which can be used to create a view over file's name (name attribute of the file_meta model).

function (id, type, meta, ctx) {
    if(type === "file_meta"){
        if(meta['name']) {
            return [meta['name'], id];
        }
    }
}

View based on JSON metadata

To create views over user JSON metadata, the attribute path used in the function must start from the onedata_json key, a special attribute that provides access to the user-defined JSON document attached to a resource, e.g.:

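function(id, type, meta, ctx) {
    if (type === "custom_metadata"){
        // Check that onedata_json exists before reading its fields;
        // files without JSON metadata would otherwise cause a TypeError.
        if(meta['onedata_json'] && meta['onedata_json']['title']) {
            return [meta['onedata_json']['title'], id];
        }
    }
}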
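
The onedata_json access can also be combined with the 'list' return format described in the Mapping function section, so that a single document produces several rows. The sketch below assumes a hypothetical `keywords` array in the user's JSON metadata. Both the field name and the variable name `keywordsMap` are made up for this illustration.

```javascript
// Emits one row per keyword found in the JSON metadata of a file.
var keywordsMap = function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        var json = meta['onedata_json'];
        if (json && Array.isArray(json['keywords'])) {
            var rows = [];
            for (var i = 0; i < json['keywords'].length; i++) {
                rows.push([json['keywords'][i], id]);
            }
            return {'list': rows};
        }
    }
};

// Local call with a made-up document, for illustration only.
var sample = { onedata_json: { keywords: ["climate", "ocean"] } };
console.log(JSON.stringify(keywordsMap("file1", "custom_metadata", sample, {})));
```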

Spatial view with list of values as a key

The example below presents a function which can be used to create a spatial view over 2 extended attributes: 'jobPriority' and 'jobScheduleTime'. Such a view can be queried for files whose attribute values fall within the range passed to the query.

function(id, type, meta, ctx) {
    if(type === "custom_metadata"){
        if (meta['jobPriority'] && meta['jobScheduleTime']){
            return [
                [meta['jobPriority'], meta['jobScheduleTime']], // key 
                id                                              // value 
            ];
        }
    }
}

Spatial view with list of ranges as a key

The example below presents a function which can be used to create a spatial view over ranges of 2 extended attributes: 'jobMaxExecutionTime' and 'jobMaxIterations'. Such a view can be queried for files whose attribute ranges fall within the range passed to the query.

function(id, type, meta, ctx) {
    if(type === "custom_metadata"){
        if (meta['jobMaxExecutionTime'] && meta['jobMaxIterations']){
            return [
                [[0, meta['jobMaxExecutionTime']], [0, meta['jobMaxIterations']]], // key
                id                                                                 // value
            ];
        }
    }
}

Spatial view with GeoJSON as a key

The example below presents a function which returns a GeoJSON object as a key.

function(id, type, meta, ctx) {
    if(type === "custom_metadata"){
        if (meta['latitude'] && meta['longitude']){
            return [
                [{
                    "type": "Point",
                    // GeoJSON positions are ordered [longitude, latitude]
                    "coordinates": [meta['longitude'], meta['latitude']]
                }],                                                         // key
                id                                                          // value
            ];
        }
    }
}