# Views
Onedata supports creation of custom database views for indexing file metadata. They can be used for:
- efficient querying for files
- producing tables and lists of information based on file metadata
- extracting or filtering information from file metadata
- calculating, summarizing or reducing the information from file metadata
Views are the result of continuous indexing of documents. Documents are mapped using a user-defined mapping function. Optionally, the results of the mapping can be reduced using a user-provided reduce function. Internally, views are based on Couchbase Views. Please refer to the Couchbase Views documentation for a more detailed explanation of the concepts used here.
There are two types of views that can be created:
- map-reduce views: a perspective on the data stored in a database, in a format that can be used to represent the data in a specific way, define and filter the information, and provide a basis for searching or querying the data based on its content.
- spatial views: similar to map-reduce views, but suited for querying multi-dimensional data. The main difference is that they do not have a reduce function.
Currently, views can be created on the following models storing file metadata:
- `file_meta`
- `times`
- `custom_metadata`
- `file_popularity`
# Concepts
# Mapping function
All information presented in this chapter is relevant to both map-reduce and spatial views. The function used by spatial views is called a spatial function in the Couchbase documentation. For simplicity, this documentation uses the term mapping function for both, as they must comply with the same rules (with one exception, emphasised below).
In order to create a view, it is necessary to provide a mapping function in JavaScript. It is used to map the data stored in a document to the value which should be indexed. Mapping is performed using the `emit()` function. Each call to `emit()` results in a new row of data in the view result. More information on mapping functions can be found in the Couchbase Views documentation.
In the views API, the mapping function submitted by the user is wrapped inside additional JavaScript code in order to comply with the Couchbase API.
The mapping function must accept 4 arguments:
- `id`: ID of the file (string)
- `type`: type of the document that is being mapped by the function, one of:
  - `"file_meta"`
  - `"times"`
  - `"custom_metadata"`
  - `"file_popularity"`
- `meta`: values stored in the document being mapped (formats are described further on)
- `ctx`: additional information that might be helpful during indexing:
  - `providerId`
  - `spaceId`
ctx = {
    "providerId": "b705b0f664645a2c69934d9043b33c2207228257",
    "spaceId": "86b1daff220b40c235e6af1c235769a4ec4fe91a"
}
NOTE: The mapping function will be called for each file-related document (as listed in the `type` argument above). For example, `emit()` will be called separately for the same file when its name changes (`file_meta`), its content is modified (`times`) and an extended attribute is set (`custom_metadata`). It is important to check the type of the indexed document to avoid duplicate mappings.
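To illustrate the note above, the following sketch calls a mapping function once per document type for the same file and checks the `type` argument so that only the `file_meta` call yields a row. The `documents` array, the file ID and the loop are a hypothetical local harness, not part of Onedata; the `ctx` values are copied from the example above.

```javascript
// Mapping function: reacts only to "file_meta" documents, so each file
// produces at most one row even though the function is called for every
// document type. Like the other examples in this documentation, it returns
// nothing for documents it does not index.
function mapByName(id, type, meta, ctx) {
  if (type === "file_meta" && meta["name"]) {
    return [meta["name"], id];
  }
}

// Hypothetical harness simulating the calls made for a single file.
const ctx = {
  providerId: "b705b0f664645a2c69934d9043b33c2207228257",
  spaceId: "86b1daff220b40c235e6af1c235769a4ec4fe91a"
};
const documents = [
  { type: "file_meta", meta: { name: "results.txt" } },
  { type: "times", meta: { atime: 1582374672, mtime: 1582374672, ctime: 1582374672 } },
  { type: "custom_metadata", meta: { licence: "MIT" } }
];

const rows = documents
  .map((doc) => mapByName("exampleFileId", doc.type, doc.meta, ctx))
  .filter((row) => row !== undefined);

console.log(rows); // a single row keyed by the file name
```

Without the `type` check, the same file could contribute up to one row per document type.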
The mapping function must return the `(key, value)` pair or pairs that are to be emitted to the view via the `emit()` function.
If one document should be mapped to exactly one row in the view, the mapping function must return a 2-element list `[key, value]`, where `key` and `value` can be any JS term.
If one document should be mapped to many rows in the view, the mapping function must return an object with the key `'list'`. The value must be a list of 2-element lists `[key, value]`. The `emit()` function is called for each 2-element list in the top-level list.
Valid formats of the mapping function are presented below. `key` and `value` can be any valid JSON values:
- returning a single view row
function (id, type, meta, ctx) {
    var key = ...
    var value = ...
    return [key, value];
}
- returning multiple view rows
function (id, type, meta, ctx) {
    var key1 = ...
    var value1 = ...
    var key2 = ...
    var value2 = ...
    .
    .
    .
    var keyN = ...
    var valueN = ...
    return {'list': [
        [key1, value1],
        [key2, value2],
        .
        .
        .
        [keyN, valueN]
    ]};
}
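The two return formats can be pictured with the following sketch of a wrapper that turns the returned value into `emit()` calls. It is only an illustration under assumptions: `wrapMapping` and the array-collecting `collect` function are hypothetical names, and the actual wrapper code used by Onedata is not shown in this documentation.

```javascript
// Hypothetical wrapper: calls the user mapping function and forwards
// its result to emit(), once per [key, value] pair.
function wrapMapping(userFn, emit) {
  return function (id, type, meta, ctx) {
    const result = userFn(id, type, meta, ctx);
    if (Array.isArray(result) && result.length === 2) {
      // Single row: [key, value]
      emit(result[0], result[1]);
    } else if (result && Array.isArray(result.list)) {
      // Many rows: {'list': [[key1, value1], ...]}
      for (const [key, value] of result.list) {
        emit(key, value);
      }
    }
    // Any other result (for example undefined) emits nothing in this sketch.
  };
}

// Usage: collect emitted rows in an array instead of a real view.
const emitted = [];
const collect = (key, value) => emitted.push([key, value]);

const single = wrapMapping((id) => ["a", id], collect);
const many = wrapMapping((id) => ({ list: [["b", id], ["c", id]] }), collect);

single("file1", "file_meta", {}, {});
many("file2", "file_meta", {}, {});
console.log(emitted); // three rows: one from `single`, two from `many`
```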
A few examples of the mapping function are presented in the Mapping function examples section.
# Spatial view key format
The mapping function defined for a spatial view must return the key as a multidimensional bounding box. There are 3 accepted ways of defining a key in a spatial function:
- single values: a list of numerical values, each of which is expanded to a collapsed range. For example, the list `[1.0, 2, 3.5]` is internally expanded to the list of ranges `[[1.0, 1.0], [2, 2], [3.5, 3.5]]`
- ranges: a list of ranges. For example: `[[1.0, 2.0], [100, 1000]]`
- GeoJSON geometry: the following GeoJSON objects are supported:
  - Point
  - MultiPoint
  - LineString
  - MultiLineString
  - MultiPolygon
  - GeometryCollection
The above formats of defining keys can be combined. The only constraint is that a GeoJSON object must be the first element of the list. Defining spatial view keys is described in detail in the Couchbase documentation.
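The expansion of single values into collapsed ranges can be sketched as follows. The expansion itself happens internally, so `expandSpatialKey` is a hypothetical helper written only to show the resulting shape of the key.

```javascript
// Expands plain numbers in a spatial key to collapsed ranges [v, v].
// Ranges (2-element lists) and GeoJSON objects are left untouched.
function expandSpatialKey(key) {
  return key.map(function (element) {
    return typeof element === "number" ? [element, element] : element;
  });
}

// Single values only
console.log(JSON.stringify(expandSpatialKey([1.0, 2, 3.5])));
// prints [[1,1],[2,2],[3.5,3.5]]

// A range combined with a single value
console.log(JSON.stringify(expandSpatialKey([[1.0, 2.0], 100])));
// prints [[1,2],[100,100]]
```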
# Reduce function (optional)
The reduce function is optional and can be used only for map-reduce views. Typical uses of a reduce function are to produce a summarized count of the input data, or to calculate a sum or other aggregate over the input data.
Contrary to the mapping function, the reduce function is not wrapped in any additional JavaScript code. It is passed as is to Couchbase, and therefore all information and notices presented in the Couchbase documentation on reduce functions are relevant, in particular:
- built-in reduce functions (`_count`, `_sum`, `_stats`)
- writing custom reduce functions
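As a minimal sketch of a custom reduce function, the example below counts rows using the `(key, values, rereduce)` signature of Couchbase custom reduce functions. The function is named `countRows` here only so it can be called locally, and the calls below it are a hypothetical simulation of a reduce followed by a rereduce, not how Couchbase actually schedules them.

```javascript
// Custom reduce in the Couchbase style: counts the rows of a view.
function countRows(key, values, rereduce) {
  if (rereduce) {
    // values holds partial counts produced by earlier reduce calls
    return values.reduce(function (sum, partial) {
      return sum + partial;
    }, 0);
  }
  // values holds the values emitted by the mapping function
  return values.length;
}

// Local simulation (hypothetical): two partial reductions, then a rereduce.
const partialA = countRows(null, ["id1", "id2", "id3"], false);
const partialB = countRows(null, ["id4", "id5"], false);
const total = countRows(null, [partialA, partialB], true);
console.log(total); // 5
```

For a plain count, the built-in `_count` function is the simpler choice; a custom function is useful mainly when the aggregate is not covered by the built-ins.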
# Indexable metadata models
# File meta model
Indexed by the `emit(id, type, meta, ctx)` function where `type === "file_meta"`.
Model that stores basic file metadata:
- `name`: name of the file
- `type`: type of the file. One of: regular file (`REG`), directory (`DIR`)
- `mode`: POSIX access mode as a decimal integer
- `acl`: access control list
- `owner`: ID of the owner of the file
- `provider_id`: ID of the provider on which the file was created
- `deleted`: flag informing that the file was marked to be deleted
- other fields that are hardly useful in views: `shares`, `is_scope`, `parent_uuid`
file_meta = {
    "name": "results.txt",
    "type": "REG",
    "mode": 436, // "0664" in octal
    "acl": [{
        "acetype": 0,
        "aceflags": 0,
        "identifier": "bd3ae5725fd0348d8fbd97beafd5d3d1f23e1fb6",
        "name": "OWNER@",
        "acemask": 0
    }],
    "owner": "191417070ff9f0ad36d99065c9034b23d1ca799e",
    "provider_id": "bbdeee4b71842378f7834a12ddf04b68dd61d1c1",
    "deleted": false
}
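As an illustration of using these fields, the following sketch indexes regular, non-deleted files by their access mode converted to an octal string. The function name `mapByMode`, the file ID and the trimmed-down sample document are assumptions made for this example.

```javascript
// Index regular, non-deleted files by their POSIX mode in octal notation.
function mapByMode(id, type, meta, ctx) {
  if (type === "file_meta" && meta["type"] === "REG" && !meta["deleted"]) {
    // mode is stored as a decimal integer, e.g. 436 is "664" in octal
    return [meta["mode"].toString(8), id];
  }
}

// Local check with a shortened version of the sample document above.
var sampleFileMeta = { name: "results.txt", type: "REG", mode: 436, deleted: false };
var modeRow = mapByMode("exampleFileId", "file_meta", sampleFileMeta, {});
console.log(modeRow[0]); // 664
```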
# Times model
Indexed by the `emit(id, type, meta, ctx)` function where `type === "times"`.
This model was extracted from `file_meta` for efficiency reasons.
It stores classical Unix timestamps (in seconds since the Epoch):
- `atime`: Unix last access timestamp
- `mtime`: Unix last modification timestamp
- `ctime`: Unix last status change timestamp
times = {
    "atime": 1582374672,
    "mtime": 1582374672,
    "ctime": 1582374672
}
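Because the timestamps are stored in seconds, they need to be multiplied by 1000 before being passed to the JavaScript `Date` constructor. The sketch below indexes files by their modification date in UTC as a `[year, month, day]` key; the function name `mapByModificationDate` and the file ID are assumptions made for this example.

```javascript
// Index files by modification date (UTC) as [year, month, day].
function mapByModificationDate(id, type, meta, ctx) {
  if (type === "times" && meta["mtime"] != null) {
    var date = new Date(meta["mtime"] * 1000); // seconds to milliseconds
    return [
      [date.getUTCFullYear(), date.getUTCMonth() + 1, date.getUTCDate()],
      id
    ];
  }
}

// Local check with the sample document above.
var sampleTimes = { atime: 1582374672, mtime: 1582374672, ctime: 1582374672 };
var dateRow = mapByModificationDate("exampleFileId", "times", sampleTimes, {});
console.log(JSON.stringify(dateRow[0])); // [2020,2,22]
```

A key of this shape sorts chronologically, which makes it convenient for range queries over dates.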
# Custom metadata model
Indexed by the `emit(id, type, meta, ctx)` function where `type === "custom_metadata"`.
Model used for storing extended attributes and custom metadata. Currently, views can operate on both extended attributes and JSON metadata. Indexing of the RDF metadata backend is not yet supported. The model has the following fields:
- `onedata_json`: map of JSON metadata values
- `onedata_rdf`: RDF metadata in plain text
- extended attributes set by users: a key-value map on the top level of the object
custom_metadata = {
    "onedata_json": {
        "country": "FR",
        "year": 2020
    },
    "onedata_rdf": "<?xml version=\"1.0\"?>\n\n<rdf:RDF\nxmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\nxmlns:si=\"https://www.w3schools.com/rdf/\">\n\n<rdf:Description rdf:about=\"https://www.w3schools.com\">\n <si:title>W3Schools</si:title>\n <si:author>Jan Egil Refsnes</si:author>\n</rdf:Description>\n\n</rdf:RDF>",
    "colour": "red",
    "licence": "MIT"
}
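Since extended attributes sit on the top level of the object next to `onedata_json` and `onedata_rdf`, a mapping function that walks over them has to skip those two fields. The sketch below emits one row per extended attribute using the `'list'` return format; the function name `mapExtendedAttributes` and the file ID are assumptions made for this example.

```javascript
// Emit one row per extended attribute, keyed by [name, value].
function mapExtendedAttributes(id, type, meta, ctx) {
  if (type !== "custom_metadata") {
    return;
  }
  var rows = [];
  Object.keys(meta).forEach(function (name) {
    // Skip the JSON and RDF fields; the remaining top-level keys
    // are the extended attributes set by users.
    if (name !== "onedata_json" && name !== "onedata_rdf") {
      rows.push([[name, meta[name]], id]);
    }
  });
  if (rows.length > 0) {
    return { list: rows };
  }
}

// Local check with a shortened version of the sample document above.
var sampleCustomMetadata = {
  onedata_json: { country: "FR", year: 2020 },
  onedata_rdf: "<rdf:RDF></rdf:RDF>",
  colour: "red",
  licence: "MIT"
};
var xattrRows = mapExtendedAttributes("exampleFileId", "custom_metadata", sampleCustomMetadata, {});
console.log(JSON.stringify(xattrRows.list));
```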
# File popularity model
Indexed by the `emit(id, type, meta, ctx)` function where `type === "file_popularity"`.
Model used for tracking file popularity.
These documents are available only if the collection of file popularity statistics is enabled in the given space by a Oneprovider admin.
It can be turned on only by a space admin via Onepanel.
The file popularity document is available only for files which have been opened at least once on a given provider.
It stores:
- `size`: total size of the file's blocks stored on the given provider
- `open_count`: number of `open` operations on the file
- `last_open`: timestamp of the last `open` on the file
- `hr_hist`: hourly histogram of the number of `open` operations on the file per hour, in the last 24 hours, represented as a list of 24 integers
- `dy_hist`: daily histogram of the number of `open` operations on the file per day, in the last 30 days, represented as a list of 30 integers
- `mth_hist`: monthly histogram of the number of `open` operations on the file per month, in the last 12 months, represented as a list of 12 integers
- `hr_mov_avg`: moving average of the number of `open` operations on the file per hour
- `dy_mov_avg`: moving average of the number of `open` operations on the file per day
- `mth_mov_avg`: moving average of the number of `open` operations on the file per month
file_popularity = {
    "size": 102423541,
    "open_count": 123,
    "last_open": 1601889434,
    "hr_hist": [15, 0, 0, 0, 0, 1, 25, 125, 11, 3, 4, 0, 0, 0, 11, 0, 1, 0, 13, 0, 0, 0, 2, 1],
    "dy_hist": [212, 0, 0, 10, 0, 0, 10, 0, 150, 0, 0, 0, 1250, 0, 20, 0, 40, 280, 30, 0, 10, 110, 0, 0, 90, 250, 110, 130, 110, 0],
    "mth_hist": [2812, 300, 0, 3750, 0, 3300, 0, 0, 2700, 0, 0, 0],
    "hr_mov_avg": 8.83,
    "dy_mov_avg": 93.73,
    "mth_mov_avg": 1071.83
}
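As an illustration of using the histograms, the following sketch indexes files by the number of `open` operations in the last 24 hours, computed as the sum of `hr_hist`. The function name `mapByRecentOpens` and the file ID are assumptions made for this example.

```javascript
// Index files by the number of open operations in the last 24 hours.
function mapByRecentOpens(id, type, meta, ctx) {
  if (type === "file_popularity" && Array.isArray(meta["hr_hist"])) {
    var opensLastDay = meta["hr_hist"].reduce(function (sum, n) {
      return sum + n;
    }, 0);
    return [opensLastDay, id];
  }
}

// Local check with the hr_hist values from the sample document above.
var samplePopularity = {
  hr_hist: [15, 0, 0, 0, 0, 1, 25, 125, 11, 3, 4, 0, 0, 0, 11, 0, 1, 0, 13, 0, 0, 0, 2, 1]
};
var popularityRow = mapByRecentOpens("exampleFileId", "file_popularity", samplePopularity, {});
console.log(popularityRow[0]); // 212
```

A view built this way could be queried with a key range to find files opened more than a given number of times during the last day.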
# REST API
All operations on views can be performed using the REST API. Refer to the linked API documentation for detailed information and examples.
Request | Link to API |
---|---|
Create view | API |
Get view | API |
Update view | API |
Remove view | API |
Update view reduce function | API |
Remove view reduce function | API |
List views | API |
Query view | API |
# Mapping function examples
# View based on single attribute
Index files by the value of the `licence` field:
function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        // the attribute name must match the stored key exactly ("licence")
        if (meta['licence']) {
            return [meta['licence'], id];
        }
    }
}
# View based on multiple attributes
Index files by the values of the `licence` and `year` fields:
function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        // the composite key is a list of both attribute values
        if (meta['licence'] && meta['year'] != null) {
            return [[meta['licence'], meta['year']], id];
        }
    }
}
# View based on file name
Index files by their names (the `name` attribute of the `file_meta` model):
function (id, type, meta, ctx) {
    if (type === "file_meta") {
        if (meta['name']) {
            return [meta['name'], id];
        }
    }
}
# View based on JSON metadata
Index files by the value of the `title` field stored in user-defined JSON custom metadata:
function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        // check that JSON metadata exists before reading a field from it
        if (meta['onedata_json'] && meta['onedata_json']['title']) {
            return [meta['onedata_json']['title'], id];
        }
    }
}
# Spatial view with list of values as a key
Create a spatial view over two extended attributes: `'jobPriority'` and `'jobScheduleTime'`.
Such a view can be queried for files whose attribute values fall within the range passed to the query.
function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        if (meta['jobPriority'] != null && meta['jobScheduleTime'] != null) {
            return [
                [meta['jobPriority'], meta['jobScheduleTime']], // key
                id // value
            ];
        }
    }
}
# Spatial view with list of ranges as a key
Create a spatial view over ranges of two extended attributes: `'jobMaxExecutionTime'` and `'jobMaxIterations'`.
Such a view can be queried for files whose attribute ranges fall within the range passed to the query.
function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        if (meta['jobMaxExecutionTime'] != null && meta['jobMaxIterations'] != null) {
            return [
                [[0, meta['jobMaxExecutionTime']], [0, meta['jobMaxIterations']]], // key
                id // value
            ];
        }
    }
}
# Spatial view with GeoJSON as a key
Create a view which has a GeoJSON object as a key.
function (id, type, meta, ctx) {
    if (type === "custom_metadata") {
        if (meta['latitude'] != null && meta['longitude'] != null) {
            return [
                [{
                    "type": "Point",
                    // GeoJSON positions are ordered [longitude, latitude]
                    "coordinates": [meta['longitude'], meta['latitude']]
                }], // key
                id // value
            ];
        }
    }
}