# File popularity
As a prerequisite for understanding this chapter, we recommend that you familiarize yourself with the concept of views.
The file popularity mechanism enables tracking of usage statistics for files in a space. It allows listing file IDs sorted in ascending order by the value of the popularity function, so that the least popular files are at the beginning of the list.
NOTE: Usage statistics can be collected only for local storage supporting the space. It is impossible to obtain file popularity statistics gathered by a remote provider.
The mechanism can be enabled for a chosen space in the Spaces -> "Space Name" -> File popularity tab of the Oneprovider panel GUI (as shown below), or using the REST API.
Internally, the mechanism creates the file popularity view. All notes presented in the Views chapter apply also to the file popularity view.
NOTE: The file popularity view is a special view, so it is forbidden to create a view with the same name. Furthermore, the view cannot be modified or deleted using the Views API.
# Querying the file popularity view
The file popularity view can be queried using the following request:
curl -sS -k -H "X-Auth-Token:$TOKEN" -X GET https://$HOST/api/v3/oneprovider/spaces/$SPACE_ID/views/file-popularity/query
An example of such a request is presented in the file popularity configuration tab of the Onepanel GUI. The example request returns the 10 least popular files in the space.
For more information on querying views, see the Views chapter.
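The curl request above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name `build_popularity_query`, the host `provider.example.com`, and the placeholder space ID and token are hypothetical, and the `limit` query parameter is an assumption that should be checked against the Views chapter before use.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_popularity_query(host, space_id, token, limit=None):
    """Build (but do not send) a GET request for the file popularity view."""
    url = (
        f"https://{host}/api/v3/oneprovider/spaces/"
        f"{quote(space_id, safe='')}/views/file-popularity/query"
    )
    if limit is not None:
        # Assumption: the query endpoint accepts a `limit` parameter
        # that caps the number of returned entries.
        url += "?" + urlencode({"limit": limit})
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


# Hypothetical placeholder values, for illustration only.
req = build_popularity_query("provider.example.com", "SPACE_ID", "TOKEN", limit=10)
print(req.full_url)
```

The request can then be passed to `urllib.request.urlopen`. Note that the `-k` flag in the curl command disables TLS certificate verification, which `urlopen` does not do by default.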
# Advanced topics
# The popularity function
The key that is emitted to the file popularity view is the value of the popularity function for a given file. The function is defined as follows:
P(lastOpenHour, avgOpenCountPerDay) = w1 * lastOpenHour + w2 * min(avgOpenCountPerDay, MAX_AVG_OPEN_COUNT_PER_DAY)
where:
- `lastOpenHour`: parameter equal to the timestamp (in hours since 01.01.1970) of the last open operation on the file,
- `w1`: the weight of the `lastOpenHour` parameter,
- `avgOpenCountPerDay`: parameter equal to the moving average of the number of open operations on the file per day; the value is calculated over the last 30 days,
- `w2`: the weight of the `avgOpenCountPerDay` parameter,
- `MAX_AVG_OPEN_COUNT_PER_DAY`: the upper boundary for the `avgOpenCountPerDay` parameter.
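The definition above translates directly into code. The following Python sketch mirrors the formula only; it is not the mapping function used internally by the view. The function name, the snake_case argument names, and the sample values are illustrative assumptions.

```python
def popularity(last_open_hour, avg_open_count_per_day,
               w1, w2, max_avg_open_count_per_day):
    """P = w1 * lastOpenHour + w2 * min(avgOpenCountPerDay, MAX_AVG_OPEN_COUNT_PER_DAY)."""
    # The moving average is capped before it is weighted.
    capped_avg = min(avg_open_count_per_day, max_avg_open_count_per_day)
    return w1 * last_open_hour + w2 * capped_avg


# Illustrative values: last opened at hour 480000, 5 opens per day on average.
print(popularity(480_000, 5.0, w1=1.0, w2=20.0, max_avg_open_count_per_day=100))
```

Because of the `min`, any average above the cap contributes the same amount to the result as the cap itself.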
Entries in the views are modified only when the associated document in the database is
modified. This means that an entry in the file popularity view is modified only when the
file popularity model document is updated, which happens on each file close
operation. The downside of this approach is that `avgOpenCountPerDay` may not be
recalculated in certain circumstances, and the file may be indexed as "popular" forever,
contrary to its actual popularity. This is possible when the file was used intensively
for some time but has not been opened since, so no recalculation could be
triggered to update its popularity. This is why the `lastOpenHour` parameter is used in
the popularity function: it balances the importance of the `avgOpenCountPerDay`
parameter.
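The balancing effect can be illustrated with two hypothetical files. The sketch below is a simplified model, not the view implementation: the `FileStats` class, the file names, the reference hour, and the weights (1.0, 20.0, cap 100) are assumptions chosen for illustration. One file keeps a stale, maximal average from a year ago, the other was opened once just now.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    last_open_hour: int
    avg_open_count_per_day: float  # value stored at the last close; may be stale


def popularity(stats, w1=1.0, w2=20.0, cap=100):
    """Popularity function with illustrative weights."""
    return w1 * stats.last_open_hour + w2 * min(stats.avg_open_count_per_day, cap)


now_hour = 480_000  # hypothetical "current" timestamp in hours

files = [
    # Heavily used a year ago and never opened since: the average is never recalculated.
    FileStats("stale_hot", now_hour - 365 * 24, 100.0),
    # Opened exactly once, just now.
    FileStats("recent_cold", now_hour, 1 / 30),
]

# Ascending order: the least popular file comes first.
ranked = sorted(files, key=popularity)
print([f.name for f in ranked])
```

Even though the average of `stale_hot` is frozen at the cap, its old `lastOpenHour` pulls its popularity below that of the recently opened file.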
# Default parameters
The default values of the popularity function parameters are as follows:
- `w1 = 1.0`
- `w2 = 20.0`
- `MAX_AVG_OPEN_COUNT_PER_DAY = 100`
With the default value of `MAX_AVG_OPEN_COUNT_PER_DAY`, all files with `avgOpenCountPerDay > 100` are treated as equally popular with respect to this parameter.
With the above values of `w1` and `w2`, the following two files have similar calculated popularity:
- a file that has been opened just once,
- a file that was opened about 1000 times in the month preceding its last open, where that last open took place a month before the first file was opened.
These weights were estimated using the following approach:
Assume that we have 2 files: F1 and F2.
F1 was opened at timestamp (in hours) T1.
F1:
- lastOpenHour1 = T1
- number of opens in the month preceding last open: opensCount1 = 1
- avgOpenCountPerDay1 = avg1 = opensCount1 / 30 = 1 / 30
F2 was opened a month earlier than T1 for the last time.
F2:
- lastOpenHour2 = T2 = T1 - 30 * 24
- number of opens in the month preceding last open: opensCount2 = 1000
- avgOpenCountPerDay2 = avg2 = opensCount2 / 30 = 1000 / 30
Calculate popularity for both files:
P1 = P(lastOpenHour1, avgOpenCountPerDay1)
P1 = w1 * T1 + w2 * min(avg1, 100)
P1 = w1 * T1 + w2 * avg1
P2 = P(lastOpenHour2, avgOpenCountPerDay2)
P2 = w1 * T2 + w2 * min(avg2, 100)
P2 = w1 * T2 + w2 * avg2
P2 = w1 * (T1 - 720) + w2 * avg2
We want P1 = P2:
w1 * T1 + w2 * avg1 = w1 * (T1 - 720) + w2 * avg2
w1 * T1 + w2 * avg1 = w1 * T1 - w1 * 720 + w2 * avg2
w1 * 720 = w2 * (avg2 - avg1)
w1 / w2 = (avg2 - avg1)/720
w1 / w2 = 999 / 21600
We can set w1 := 1 and therefore we have:
w2 = 21600 / 999 ~= 21.62
Finally, to make it simpler, we set:
w1 := 1.0
w2 := 20.0
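The derivation can be checked numerically. The Python sketch below recomputes the exact weight and then shows how far apart the two files end up once `w2` is rounded to 20.0. The timestamp `t1` is an arbitrary illustrative value (it cancels out of the difference), and the function and variable names are assumptions.

```python
HOURS_PER_MONTH = 30 * 24  # 720, the gap between T1 and T2

avg1 = 1 / 30     # F1: one open in the preceding month
avg2 = 1000 / 30  # F2: 1000 opens in the preceding month

w1 = 1.0
# From w1 * 720 = w2 * (avg2 - avg1):
w2_exact = w1 * HOURS_PER_MONTH / (avg2 - avg1)


def popularity(last_open_hour, avg, w1, w2, cap=100):
    return w1 * last_open_hour + w2 * min(avg, cap)


t1 = 480_000  # arbitrary illustrative timestamp in hours

# With the simplified weight w2 = 20.0 the two files are close, but not equal.
p1 = popularity(t1, avg1, w1, 20.0)
p2 = popularity(t1 - HOURS_PER_MONTH, avg2, w1, 20.0)

print(round(w2_exact, 2))
print(round(p1 - p2, 2))
```

With the exact weight the two popularity values coincide; rounding `w2` down to 20.0 leaves F1 ahead of F2 by about 54 units, that is, 54 hours of `lastOpenHour` when `w1 = 1`.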
# Tuning the file popularity function
The three parameters of the function, `w1`, `w2`, and `MAX_AVG_OPEN_COUNT_PER_DAY`, can be
modified in the file popularity configuration panel.
NOTE: Modifying the popularity function parameters changes the mapping function of the file popularity view. This means that all already indexed files need to be re-indexed. Such an operation can be very time-consuming, depending on the number of files in the space.
NOTE: The same applies to disabling and re-enabling the mechanism. Disabling the mechanism deletes the view, so re-enabling it results in re-indexing of all files in the space.
WARNING: Modification of the popularity function must be performed with care!
# REST API
All operations related to file popularity can be performed using the REST API. Refer to the linked API documentation for detailed information and examples.
Request | Link to API |
---|---|
Get file popularity configuration | API |
Update file popularity configuration | API |