It avoids DDoS attacks on the group of vaults managing a piece of data. Everything that is free will be abused by attackers. As GET commands are free, we can be sure that someday an army of bots will try to disrupt a section by using this operation. Imagine N1 clients controlled by an attacker, each client able to issue N2 GET commands per second. Without caching, the load on the target section is N1 * N2 requests per second. With caching, this load is removed from the target section and split among the vaults directly connected to the attacker's clients, each one handling only N2 requests per second.
Popular data is returned more rapidly to clients. This is the inverse of standard web sites on the legacy internet, where the more popular a site is, the slower it becomes, due to the heavy load on the web server. On the Safe Network this load is removed from the sections holding the data and split among the caching nodes.
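The arithmetic above can be sketched with illustrative numbers (N1 and N2 are hypothetical attacker parameters, not protocol constants):

```python
# Toy sketch of the DDoS load calculation described above.
N1 = 1000   # clients controlled by the attacker
N2 = 50     # GET commands each client can issue per second

# Without caching: the whole load lands on the target section.
load_without_caching = N1 * N2           # 50000 requests/s on one section

# With caching: each vault directly connected to the attacker's clients
# absorbs only the requests of its own clients.
load_per_caching_vault = N2              # 50 requests/s per vault

print(load_without_caching, load_per_caching_vault)
```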
The problem is that caching is applied only to Immutable Data, not to Mutable Data nor to non-existing data.
Mutable data
At first glance it seems that caching cannot be applied to this kind of data, precisely because it is mutable: a copy in a vault may no longer be in sync with the source data. But sometimes this data doesn't change very often (for example the directory containing the files of a static web site), and it is a waste of resources to fetch it from the source vault every time.
I propose to add mutable data to the cache of vaults and define a global expiration delay for the cached copies of mutable data, say for example 10 minutes.
Most of the time when browsing a site, the user doesn't need information that is up to date to the minute. For the specific cases where more accurate information is needed, I propose a new kind of GET that is not free: the user must pay a small fee to ensure that the request fetches the information from the source group storing the mutable data.
This way the DDoS attack is avoided in both cases:
With free GETs, because the data is cached.
With paid GETs, because the small fee multiplied by the N1 * N2 requests per second is too costly for the attacker to sustain.
To be clear, what I propose is not 2 classes of mutable data but 2 kinds of GET commands that can be applied to any mutable data; choosing free or paid GETs is up to the user.
Remark: I am only talking about mutable data; immutable data remains free and DDoS attacks on it are mitigated with caching.
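A minimal sketch of this proposal, assuming a hypothetical `CachingVault` type and a `source` lookup standing in for the group holding the data:

```python
import time

# Illustrative sketch only: cached mutable data expires after a global delay,
# and a paid GET bypasses the cache to reach the source group. All names
# (CachingVault, EXPIRATION_DELAY) are assumptions, not actual vault APIs.

EXPIRATION_DELAY = 600  # seconds (the "10 minutes" from the proposal)

class CachingVault:
    def __init__(self, source):
        self.source = source          # stand-in for the source group
        self.cache = {}               # address -> (value, cached_at)

    def get(self, address, paid=False):
        now = time.time()
        if not paid:
            entry = self.cache.get(address)
            if entry is not None and now - entry[1] < EXPIRATION_DELAY:
                return entry[0]       # serve the cached copy for free
        # Paid GET (or cache miss): fetch from the source group and re-cache.
        value = self.source[address]
        self.cache[address] = (value, now)
        return value
```

With `source = {"site": "v1"}`, a free `get("site")` keeps returning the cached "v1" until the delay expires, while `get("site", paid=True)` always reaches the source.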
Non-existing data
Another kind of DDoS attack is to request a chunk of data that doesn't exist. To mitigate it, the information that an address contains no data should be cached: not only this address but also the biggest range of free space containing it (to also mitigate an attacker targeting a group by trying millions or billions of addresses in a small range). This special structure is similar to a prefix.
The global expiration delay mentioned for mutable data is also applicable to these cached prefixes. The 2 kinds of GETs are also applicable when a cached prefix indicates that data doesn't exist:
With free GETs, the fetch stops at the caching vault.
With paid GETs, the fetch resumes towards the group managing the address.
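A rough sketch of such a negative cache, modelling addresses as bit strings (all names are illustrative assumptions):

```python
# Illustrative sketch of caching "no data here" for a whole address prefix,
# so a free GET for any address in that range can be answered at the
# caching vault without reaching the target group.

class NegativeCache:
    def __init__(self):
        self.empty_prefixes = set()   # prefixes known to contain no data

    def record_empty(self, prefix):
        self.empty_prefixes.add(prefix)

    def known_empty(self, address):
        # A free GET stops here if the address falls under any cached
        # empty prefix; a paid GET would skip this check.
        return any(address.startswith(p) for p in self.empty_prefixes)

cache = NegativeCache()
cache.record_empty("1011")            # the whole 1011... range is free space
```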
Ah nice post, glad to see this discussion and a possible option about MD caching; caching the info/error for non-existing data is also interesting. Of course rate limiting and such, if not done purely via a single proxy for clients, can also offer some options, but these are all decent options to consider.
Hmm, it could vary a bit depending on the size of the MD itself of course, but it would be saving the actual payload (entries map) if only returning the version. Cost-wise and from a caching point of view it'd be a bit more tricky, because caching would not even involve all the hops it'd require to get to the actual DM group for the object, so it would immediately be better; but of course just getting a small amount of anything rather than a large amount is certainly going to be better in that comparison.
I propose to add mutable data to the cache of vaults and define a global expiration delay for the cached copies of mutable data, say for example 10 minutes.
I assume that when that expiration delay starts and stops won't/can't be synchronized between different vaults, because there is no concept of time in the Safe Network? So not all caches have the same version at a certain time. I'm not suggesting that it is a problem, I just want to understand the proposal better.
Ok, about the "concept of time", that's possible, I don't know the finer details.
But I should have thought a bit more before making the sync remark: if all vaults that have a cache of a certain MD would GET the new version at the same time, you maybe risk reintroducing a DDoS attack I think…
Although 1 vault with a cached copy can serve a lot of clients, so the DDoS is reduced by a big factor.
The idea for this previously was as follows (just for info). We have 2 cache types:
Opportunistic Caching
This is currently what we do with immutable data, or any self-validating type that does not change through time: a FIFO cache in each vault. It is very efficient.
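A minimal sketch of such an opportunistic FIFO cache (capacity and names are illustrative assumptions):

```python
from collections import OrderedDict

# Illustrative sketch of an opportunistic FIFO cache for self-validating
# chunks: new chunks push the oldest ones out when the cache is full.

class FifoCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()    # chunk address -> chunk data

    def insert(self, address, chunk):
        if address in self.items:
            return                    # already cached; FIFO order unchanged
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)   # evict the oldest entry
        self.items[address] = chunk

    def get(self, address):
        return self.items.get(address)
```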
Deterministic Caching
Where data changes, like mutable data, we must do our best to ensure a client receives the latest copy. However, as you point out, this means the data location must be known and the data cannot just be spread about the place in a disconnected manner.
Deterministic caching is therefore used when there is a large number of GETs on a mutable data chunk. The notion is that the group replies with the chunk, but the first neighbor group also caches it. This is registered with the origin group, so on any change the origin group also pushes it to the neighbors that hold the data. When a neighbor group drops the data (as it is no longer popular), the pushed changes are ignored.
This is a summary, but I hope you can see the pattern: spread data outwards from its location in a deterministic manner. As a group supplies the data, the neighbor group caches it and will supply the next request. The outside group holding the data will hold it until it cannot (the chunk falls out of its cache) and will tell its neighbor (closer to the origin). Its neighbor will push that chunk to the top of its FIFO and the whole thing continues back to the origin, in time.
So the deterministic cache is like a cache that grows but is always connected back to the origin group. The origin group will push updates to the outer groups holding the data (and possibly some of those in between, due to XOR paths).
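The push pattern described above could be sketched roughly like this (all names are illustrative assumptions, not actual vault APIs):

```python
# Rough sketch of deterministic caching: the origin group tracks which
# neighbor groups cache a mutable chunk and pushes updates to them; a
# neighbor that has dropped the chunk ignores further pushes.

class NeighborGroup:
    def __init__(self):
        self.cached = None
        self.holding = False

    def cache(self, value):
        self.cached, self.holding = value, True

    def drop(self):                   # chunk is no longer popular
        self.cached, self.holding = None, False

    def push(self, value):
        if self.holding:              # dropped caches ignore pushes
            self.cached = value

class OriginGroup:
    def __init__(self, value):
        self.value = value
        self.registered = []          # neighbor groups caching this chunk

    def register(self, neighbor):
        self.registered.append(neighbor)

    def update(self, new_value):
        self.value = new_value
        for neighbor in self.registered:
            neighbor.push(new_value)  # push the change outward
```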
I hope you can understand this pattern; it is not in an RFC yet and should be, so anyone should please feel free to contribute, as there are details to work out. It is likely not required for launch but will be for the growth of the network.
This seems complex to implement and my caching proposal doesn't try to do that. It just ensures that the data is not older than 10 minutes (or any other chosen value).
I think that the vast majority of GET requests for mutable data concern data that changes infrequently, and it is a waste of resources not to cache these data chunks.
For the minority of cases where the user wants to be sure to get the latest value from the source vault, a small fee must be paid, which should be negligible for the user but enough to deter DDoS attacks.
I agree, but we do need to be careful about the time (let's call the 10 mins T). We need to be sure we have got the data from a reputable source (a vault, I mean) and that it is not already more than T old. This is not as simple as it may sound if the data origin is "far away" from us. Confirming that may cause as much traffic as a client asking for it, and probably much more in reality.
Agreed, especially if the data is popular, and also to stop section spamming etc.
For safecoin etc., and anything we wish to have consensus on the latest value of, a secondary scheme would be required in your proposal.
I suppose the balance is between 2 schemes to handle mutable data caching, or 1 slightly more complex one? Let's keep probing this one, as it would be magnificent to have a solid answer to it.
Thanks for sharing this! I assumed something was planned for MD but hadn't read the details before.
Having reliable 2-way comms for cache management sounds like it gives some powerful options. I was actually discussing a similar solution to sync cascading caches on a project I was working on recently, which I find reassuring! We required that the caches were never stale too.
I am assuming that the design will need to be robust enough to handle situations where cache invalidation requests are not delivered? With continued development of the routing libraries, I assume there is much in place to ensure this is a rarity too?
Yes for sure this has to be via PARSEC consensus and secure message relay. A few details to iron out and RFC needed, but the pattern seems solid enough.
Have there been thoughts on whether MD writes will wait for caches to invalidate before being updated? This would slow writes, but result in caches never returning stale data. Perhaps having the option to wait or not would be useful for app developers, as there will be use cases for both.
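The two write behaviours being asked about could be sketched like this (hypothetical names, not actual Safe Network APIs):

```python
# Illustrative sketch only: a write can either wait until every caching
# group has invalidated its copy (slower write, no stale reads) or return
# immediately (faster write, caches briefly stale).

class Cache:
    def __init__(self, value):
        self.value = value

    def invalidate(self):
        self.value = None             # stale copy discarded

class MutableChunk:
    def __init__(self, value):
        self.value = value
        self.caches = []              # caching groups holding a copy

    def write(self, new_value, wait_for_invalidation=True):
        self.value = new_value
        if wait_for_invalidation:
            # Synchronous path: invalidate every cache before returning,
            # so no cache can serve the old value after the write completes.
            for cache in self.caches:
                cache.invalidate()
        # Asynchronous path: invalidations would be sent later, so caches
        # may briefly keep serving the previous value.
```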
This is an absolutely brilliant and elegant idea! I mean, the concept is perfect; I hope it is also implementable without detrimental drawbacks that are not yet known.
Edit: Maybe just cache the mutable data like immutable data? If a cache hit happens, the user is served the cached version and notified of this. Then the user can decide if, and at what age, they want to fetch the latest version. This way every paid fetch of the original "refreshes" the cache for everyone else to use for free.
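That edit could be sketched as a cache that returns the value together with its age, letting the client decide whether to pay for a refresh (all names are illustrative assumptions):

```python
import time

# Illustrative sketch: a cache hit returns the value plus its age, and a
# paid refresh fetches from the origin and updates the cache, which then
# serves later free requests from everyone.

class AgeAwareCache:
    def __init__(self, source):
        self.source = source          # stand-in for the origin group
        self.cache = {}               # address -> (value, cached_at)

    def get(self, address):
        entry = self.cache.get(address)
        if entry is not None:
            value, cached_at = entry
            return value, time.time() - cached_at   # value plus its age
        return self.refresh(address), 0.0

    def refresh(self, address):
        # Paid fetch from the origin; the refreshed copy is cached for
        # everyone else to use for free.
        value = self.source[address]
        self.cache[address] = (value, time.time())
        return value
```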