Collections
When deciding on data structures it is important to understand their tradeoffs. Choosing the wrong structure can create a bottleneck as the application scales, and migrating the state to the new data structures will come at a cost.
You can choose between two types of collections:
- Native collections (e.g.
Array
,Map
,Set
), provided by the the language - SDK collections (e.g.
IterableMap
,Vector
), provided by the NEAR SDK
Understanding how the contract stores and loads both types of collections is crucial to decide which one to use.
Use native collections for small amounts of data that need to be accessed all together, and SDK collections for large amounts of data that do not need to be accessed all together.
If your collection has up to 100 entries, it's acceptable to use the native collection. For larger ones, prefer to use SDK collection. For comparison please refer to this benchmark.
How the State is Handled
Each time the contract is executed, the first thing it will do is to read the values and deserialize them into memory, and after the function finishes, it will serialize and write the values back to the database.
For native collections, the contract will fully load the collection into memory before any method executes. This happens even if the method you invoke does not use the collection. Know that this will have impact on GAS you spend for methods in your contract.
Native Collectionsโ
Native collections are those provided by the language:
- JS:
Array
,Set
,Map
,Object
... - Rust:
Vector
,HashMap
,Set
...
All entries in a native collection are serialized into a single value and stored together into the state. This means that every time a function execute, the SDK will read and deserialize all entries in the native collection.
Serialization & Storage Example
The array [1,2,3,4]
will be serialized into the JSON string "[1,2,3,4]"
in Javascript, and the Borsh byte-stream [0,0,0,4,1,2,3,4]
in Rust before being stored
Native collections are useful if you are planning to store smalls amounts of data that need to be accessed all together
As the native collection grows, deserializing it from memory will cost more and more gas. If the collections grows too large, your contract might expend all the gas trying to read its state, making it fail on each function call
SDK Collectionsโ
The NEAR SDKs expose collections that are optimized for random access of large amounts of data. SDK collections are instantiated using a "prefix", which is used as an index to split the data into chunks. This way, SDK collections can defer reading and writing to the store until needed.
These collections are built to have an interface similar to native collections.
Serialization & Storage Example
The sdk array [1,2,3,4]
with prefix "p"
will be stored as the string "p"
in the contract's attribute, and create four entries in the contract's storage: p-0:1
, p-1:2
...
SDK collections are useful when you are planning to store large amounts of data that do not need to be accessed all together
Exposed Collectionsโ
- ๐ JavaScript
- ๐ฆ Rust
- ๐ฆ Rust (legacy)
SDK Collection | Native Equivalent | Description |
---|---|---|
Vector | Array | A growable array type. The values are sharded in memory and can be used for iterable and indexable values that are dynamically sized. |
LookupSet | Set | A set, which is similar to LookupMap but without storing values, can be used for checking the unique existence of values. This structure is not iterable and can only be used for lookups. |
UnorderedSet | Set | An iterable equivalent of LookupSet which stores additional metadata for the elements contained in the set. |
LookupMap | Map | This structure behaves as a thin wrapper around the key-value storage available to contracts. This structure does not contain any metadata about the elements in the map, so it is not iterable. |
UnorderedMap | Map | Similar to LookupMap , except that it stores additional data to be able to iterate through elements in the data structure. |
SDK collection | std ย equivalent | Description |
---|---|---|
store::Vector<T> | Vec<T> | A growable array type. The values are sharded in memory and can be used for iterable and indexable values that are dynamically sized. |
store::LookupMap | HashMap | This structure behaves as a thin wrapper around the key-value storage available to contracts. This structure does not contairn any metadata about the elements in the map, so it is not iterable. |
store::IterableMap | HashMap | Similar to LookupMap , except that it stores additional data to be able to iterate through elements in the data structure. |
store::UnorderedMap | HashMap | Similar to LookupMap , except that it stores additional data to be able to iterate through elements in the data structure. |
store::LookupSet<T> | HashSet<T> | A set, which is similar to LookupMap but without storing values, can be used for checking the unique existence of values. This structure is not iterable and can only be used for lookups. |
store::IterableSet<T> | HashSet<T> | An iterable equivalent of LookupSet which stores additional metadata for the elements contained in the set. |
store::UnorderedSet<T> | HashSet<T> | An iterable equivalent of LookupSet which stores additional metadata for the elements contained in the set. |
The near_sdk::collections
is now deprecated in favor of near_sdk::store
. To use near_sdk::collections
you will have to use the legacy
feature.
SDK collection | std ย equivalent | Description |
---|---|---|
collections::Vector<T> | Vec<T> | A growable array type. The values are sharded in memory and can be used for iterable and indexable values that are dynamically sized. |
collections::LookupMap | HashMap | This structure behaves as a thin wrapper around the key-value storage available to contracts. This structure does not contairn any metadata about the elements in the map, so it is not iterable. |
collections::UnorderedMap | HashMap | Similar to LookupMap , except that it stores additional data to be able to iterate through elements in the data structure. |
collections::TreeMap | BTreeMap | An ordered equivalent of UnorderedMap . The underlying implementation is based on an AVL tree. This structure should be used when a consistent order is needed or accessing the min/max keys is needed. |
collections::LookupSet<T> | HashSet<T> | A set, which is similar to LookupMap but without storing values, can be used for checking the unique existence of values. This structure is not iterable and can only be used for lookups. |
collections::UnorderedSet<T> | HashSet<T> | An iterable equivalent of LookupSet which stores additional metadata for the elements contained in the set. |
collections::LazyOption<T> | Option<T> | Optional value in storage. This value will only be read from storage when interacted with. This value will be Some<T> when the value is saved in storage, and None if the value at the prefix does not exist. |
Featuresโ
Type | Iterable | Clear All Values | Preserves Insertion Order | Range Selection |
---|---|---|---|---|
Vector | โ | โ | โ | โ |
LookupSet | ||||
UnorderedSet | โ | โ | โ | |
IterableSet | โ | โ | โ | |
LookupMap | ||||
UnorderedMap | โ | โ | โ | |
IterableMap | โ | โ | โ | |
TreeMap | โ | โ | โ | โ |
Complexityโ
Type | Access | Insert | Delete | Search | Traverse | Clear |
---|---|---|---|---|---|---|
Vector | O(1) | O(1)* | O(1)** | O(n) | O(n) | O(n) |
LookupSet | O(1) | O(1) | O(1) | O(1) | N/A | N/A |
UnorderedSet | O(1) | O(1) | O(1) | O(1) | O(n) | O(n) |
IterableSet | O(1) | O(1) | O(1) | O(1) | O(n) | O(n) |
LookupMap | O(1) | O(1) | O(1) | O(1) | N/A | N/A |
IterableMap | O(1) | O(1) | O(1) | O(1) | O(n) | O(n) |
TreeMap | O(1) | O(log n) | O(log n) | O(log n) | O(n) | O(n) |
* - to insert at the end of the vector using push_back
(or push_front
for deque)
** - to delete from the end of the vector using pop
(or pop_front
for deque), or delete using swap_remove
which swaps the element with the last element of the vector and then removes it.
SDK Collections Cookbookโ
Let's see how to use the SDK collections in practice
Instantiationโ
All structures need to be initialized using a unique prefix
, which will be used to index the collection's values in the account's state
- ๐ JavaScript
- ๐ฆ Rust
- ๐ฆ Rust (legacy)
Loading...
Do not forget to use the schema
to define how your contract's state is structured
Loading...
Notice how we use enums
to ensure all collections have a different prefix. Another advantage of using enums
is that they are serialized into a single byte
prefix.
Loading...
Notice how we use enums
to ensure all collections have a different prefix. Another advantage of using enums
is that they are serialized into a single byte
prefix.
Be careful of not using the same prefix in two collections, otherwise, their storage space will collide, and you might overwrite information from one collection when writing in the other
Vectorโ
Implements a vector/array which persists in the contract's storage. Please refer to the Rust and JS SDK's for a full reference on their interfaces.
- ๐ JavaScript
- ๐ฆ Rust
- ๐ฆ Rust (legacy)
Loading...
Loading...
Loading...
LookupMapโ
Implements a map/dictionary which persists in the contract's storage. Please refer to the Rust and JS SDK's for a full reference on their interfaces.
- ๐ JavaScript
- ๐ฆ Rust
Loading...
Loading...
UnorderedMap / IterableMapโ
Implements a map/dictionary which persists in the contract's storage. Please refer to the Rust and JS SDK's for a full reference on their interfaces.
- ๐ JavaScript
- ๐ฆ Rust
Loading...
Loading...
LookupSetโ
Implements a set which persists in the contract's storage. Please refer to the Rust and JS SDK's for a full reference on their interfaces.
- ๐ JavaScript
- ๐ฆ Rust
Loading...
Loading...
UnorderedSet / IterableSetโ
Implements a map/dictionary which persists in the contract's storage. Please refer to the Rust and JS SDK's for a full reference on their interfaces.
- ๐ JavaScript
- ๐ฆ Rust
Loading...
Loading...
Treeโ
An ordered equivalent of Map. The underlying implementation is based on an AVL. You should use this structure when you need to: have a consistent order, or access the min/max keys.
- ๐ฆ Rust
Loading...
LazyOption (Legacy)โ
LazyOptions are great to store large values (i.e. a wasm file), since its value will not be read from storage until it is interacted with.
It acts like an Option
that can either hold a value or not and also requires a unique prefix (a key in this case)
like other persistent collections.
Compared to other collections, LazyOption
only allows you to initialize the value during initialization.
Nesting Collectionsโ
When nesting SDK collections, be careful to use different prefixes for all collections, including the nested ones.
- ๐ JavaScript
- ๐ฆ Rust
Loading...
Loading...
Notice how we use enums
that take a String
argument to ensure all collections have a different prefix
Error prone patternsโ
Because the values are not kept in memory and are lazily loaded from storage, it's important to make sure if a collection is replaced or removed, that the storage is cleared. In addition, it is important that if the collection is modified, the collection itself is updated in state because most collections will store some metadata.
Some error-prone patterns to avoid that cannot be restricted at the type level are:
- ๐ฆ Rust
use near_sdk::store::UnorderedMap;
let mut m = UnorderedMap::<u8, String>::new(b"m");
m.insert(1, "test".to_string());
assert_eq!(m.len(), 1);
assert_eq!(m.get(&1), Some(&"test".to_string()));
// Bug 1: Should not replace any collections without clearing state, this will reset any
// metadata, such as the number of elements, leading to bugs. If you replace the collection
// with something with a different prefix, it will be functional, but you will lose any
// previous data and the old values will not be removed from storage.
m = UnorderedMap::new(b"m");
assert!(m.is_empty());
assert_eq!(m.get(&1), Some(&"test".to_string()));
// Bug 2: Should not use the same prefix as another collection
// or there will be unexpected side effects.
let m2 = UnorderedMap::<u8, String>::new(b"m");
assert!(m2.is_empty());
assert_eq!(m2.get(&1), Some(&"test".to_string()));
// Bug 3: forgetting to save the collection in storage. When the collection is attached to
// the contract state (`self` in `#[near]`) this will be done automatically, but if
// interacting with storage manually or working with nested collections, this is relevant.
use near_sdk::store::Vector;
// Simulate roughly what happens during a function call that initializes state.
{
let v = Vector::<u8>::new(b"v");
near_sdk::env::state_write(&v);
}
// Simulate what happens during a function call that just modifies the collection
// but does not store the collection itself.
{
let mut v: Vector<u8> = near_sdk::env::state_read().unwrap();
v.push(1);
// The bug is here that the collection itself if not written back
}
let v: Vector<u8> = near_sdk::env::state_read().unwrap();
// This will report as if the collection is empty, even though the element exists
assert!(v.get(0).is_none());
assert!(
near_sdk::env::storage_read(&[b"v".as_slice(), &0u32.to_le_bytes()].concat()).is_some()
);
// Bug 4 (only relevant for `near_sdk::store`): These collections will cache writes as well
// as reads, and the writes are performed on [`Drop`](https://doc.rust-lang.org/std/ops/trait.Drop.html)
// so if the collection is kept in static memory or something like `std::mem::forget` is used,
// the changes will not be persisted.
use near_sdk::store::IterableSet;
let mut m: IterableSet<u8> = IterableSet::new(b"l");
m.insert(1);
assert!(m.contains(&1));
// This would be the fix, manually flushing the intermediate changes to storage.
// m.flush();
std::mem::forget(m);
m = IterableSet::new(b"l");
assert!(!m.contains(&1));
Nesting Errorsโ
By extension of the error-prone patterns to avoid mentioned in the collections section, it is important to keep in mind how these bugs can easily be introduced into a contract when using nested collections.
Some issues for more context:
The following cases are the most commonly encountered bugs that cannot be restricted at the type level (only relevant for near_sdk::collections
, not near_sdk::store
):
use near_sdk::borsh::{self, BorshSerialize};
use near_sdk::collections::{LookupMap, UnorderedSet};
use near_sdk::BorshStorageKey;
#[derive(BorshStorageKey, BorshSerialize)]
pub enum StorageKey {
Root,
Nested(u8),
}
// Bug 1: Nested collection is removed without clearing it's own state.
let mut root: LookupMap<u8, UnorderedSet<String>> = LookupMap::new(StorageKey::Root);
let mut nested = UnorderedSet::new(StorageKey::Nested(1));
nested.insert(&"test".to_string());
root.insert(&1, &nested);
// Remove inserted collection without clearing it's sub-state.
let mut _removed = root.remove(&1).unwrap();
// This line would fix the bug:
// _removed.clear();
// This collection will now be in an inconsistent state if an empty UnorderedSet is put
// in the same entry of `root`.
root.insert(&1, &UnorderedSet::new(StorageKey::Nested(1)));
let n = root.get(&1).unwrap();
assert!(n.is_empty());
assert!(n.contains(&"test".to_string()));
// Bug 2: Nested collection is modified without updating the collection itself in the outer collection.
//
// This is fixed at the type level in `near_sdk::store` because the values are modified
// in-place and guarded by regular Rust borrow-checker rules.
root.insert(&2, &UnorderedSet::new(StorageKey::Nested(2)));
let mut nested = root.get(&2).unwrap();
nested.insert(&"some value".to_string());
// This line would fix the bug:
// root.insert(&2, &nested);
let n = root.get(&2).unwrap();
assert!(n.is_empty());
assert!(n.contains(&"some value".to_string()));
Paginationโ
Persistent collections such as IterableMap/UnorderedMap
, IterableSet/UnorderedSet
and Vector
may
contain more elements than the amount of gas available to read them all.
In order to expose them all through view calls, we can use pagination.
- ๐ JavaScript
- ๐ฆ Rust
With JavaScript this can be done using iterators with toArray
and slice
.
/// Returns multiple elements from the `UnorderedMap`.
/// - `from_index` is the index to start from.
/// - `limit` is the maximum number of elements to return.
@view({})
get_updates({ from_index, limit }: { from_index: number, limit:number }) {
return this.status_updates.toArray().slice(from_index, limit);
}
With Rust this can be done using iterators with Skip
and Take
. This will only load elements from storage within the range.
#[near(contract_state)]
#[derive(PanicOnDefault)]
pub struct Contract {
pub status_updates: IterableMap<AccountId, String>,
}
#[near]
impl Contract {
/// Retrieves multiple elements from the `IterableMap`.
/// - `from_index` is the index to start from.
/// - `limit` is the maximum number of elements to return.
pub fn get_updates(&self, from_index: usize, limit: usize) -> Vec<(AccountId, String)> {
self.status_updates
.iter()
.skip(from_index)
.take(limit)
.collect()
}
}
Storage Costโ
Your contract needs to lock a portion of their balance proportional to the amount of data they stored in the blockchain. This means that:
- If more data is added and the storage increases โ, then your contract's balance decreases โ.
- If data is deleted and the storage decreases โ, then your contract's balance increases โ.
Currently, it cost approximately 1 โ to store 100kb of data.
You can save on smart contract storage if using NEAR Account IDs by encoding them using base32. Since they consist of [a-z.-_]
characters with a maximum length of 64 characters, they can be encoded using 5 bits per character, with terminal \0
. Going to a size of 65 * 5 = 325 bits from the original (64 + 4) * 8 = 544 bits. This is a 40% reduction in storage costs
Your contract will panic if you try to store data but don't have NEAR to cover its storage cost
Be mindful of potential small deposit attacks
Storage Constraints on NEARโ
For storing data on-chain itโs important to keep in mind the following:
- There is a 4mb limit on how much you can upload at once
Letโs say for example, someone wants to put an NFT purely on-chain (rather than IPFS or some other decentralized storage solution) youโll have almost an unlimited amount of storage but will have to pay 1 $NEAR per 100kb of storage used.
Users will be limited to 4MB per contract call upload due to MAX_GAS constraints. The maximum amount of gas one can attach to a given functionCall is 300TGas.