Retrieval Data Structure
From Wikipedia, the free encyclopedia
In computer science, a retrieval data structure, also known as static function, is a space-efficient dictionary-like data type composed of a collection of (key, value) pairs that allows the following operations:[1]
- Construction from a collection of (key, value) pairs
- Retrieve the value associated with the given key or anything if the key is not contained in the collection
- Update the value associated with a key (optional)
They can also be thought of as a function for a universe and the set of keys where retrieve has to return for any value and an arbitrary value from otherwise.
In contrast to static functions, AMQ-filters support (probabilistic) membership queries and dictionaries additionally allow operations like listing keys or looking up the value associated with a key and returning some other symbol if the key is not contained.
As can be derived from the operations, this data structure does not need to store the keys at all and may actually use less space than would be needed for a simple list of the key value pairs. This makes it attractive in situations where the associated data is small (e.g. a few bits) compared to the keys because we can save a lot by reducing the space used by keys.
To give a simple example suppose video game names annotated with a boolean indicating whether the game contains a dog that can be petted are given. A static function built from this database can reproduce the associated flag for all names contained in the original set and an arbitrary one for other names. The size of this static function can be made to be only bits for a small which is obviously much less than any pair based representation.[1]
Given a set of key-value pairs, where each value is bits, a retrieval data structure that uses bits is said to have redundancy . An ideal retrieval data structure should have small redundancy, while supporting fast retrieval. [2]
In the static setting, where the only operations are Construct and Retrieve, it is possible to construct solutions with redundancy .[2][3] However, depending on the parameter regime, it is not always possible to achieve such a small redundancy while also supporting constant-time retrieval.[4] For example, if and assuming machine words of length bits, any solution with constant-time retrieval queries must incur redundancy .[4]
The optimal redundancy changes significantly if one considers non-static versions of the problem.
- In the value-dynamic retrieval problem, where the data structure must support an Update operation, the optimal redundancy is no matter the value of and regardless of time efficiency.[5] Moreover, when , it is believed that the optimal redundancy, up to low order terms, is achieved by a minimal perfect hash function.[5]
- In the dynamic retrieval problem, the data structure must support Insert and Delete operations on the set of key-value pairs, with a maximum capacity on the size of the set at any given moment. Assuming keys from a large polynomial-size universe, the optimal redundancy for this version of the problem is bits, even if .[6][7] Moreover, if the capacity constraint of is removed, then it is still possible to achieve redundancy , where is the largest size the set has had so far.[8]
- In the incremental retrieval problem, the data structure must support Insert operations on the set of key-value pairs, with a maximum capacity on the size of the set. In this setting, the optimal redundancy is bits, as in the static setting.[7]

