But if an attacker wanted one specific file to stay offline, could they hammer that file with requests? Depending on how the protocol works this could be extremely asymmetric, i.e. if it takes only 100 bytes to request the file and the serving node then has to send 10 MB back, that's an amplification factor of roughly 100,000x.
Roger - I feel like a better example is Backblaze backup. With their service you pay $8 a month for unlimited storage. The way it works out for them is that they price it to be profitable at ~1 TB; most users store less, and that subsidizes those who store more. And Apple doesn't have any ongoing costs once you buy something from the store, but hosting a file (and providing unlimited? access to reads) definitely does.
I'm still concerned about this. A normal cloud architecture would want the user to add a caching layer to keep reads down, but since reads are free? there's no incentive for app developers to do that, and I'm worried that could overload the network. If not in cost, then at the very least in availability.
They could try, but in the end more nodes would end up holding the file and serving its chunks.
Also it would only be for a limited time, as it's costly to run a bot farm to do this. The attack's file requests would also have to come in fast enough to prevent other requests from being serviced.
If this ever proves to be a problem then the current caching algo can be tuned to improve the situation. For instance, every node on the routing path to the target node could start requesting the chunk, keeping it in cache and serving it up too. Thus tens of thousands of nodes could end up caching the chunks while the DDoS is on.
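As a hypothetical sketch of that tuning (invented names, not the actual node code): every relay that forwards a chunk keeps a copy, so during an attack repeat requests get answered further and further from the responsible node.

```rust
use std::collections::HashMap;

/// Hypothetical content address (e.g. the hash of the chunk).
type ChunkAddr = [u8; 32];

/// A relay node's opportunistic cache. A real implementation would be
/// bounded (e.g. LRU) and shared behind a lock; this keeps it simple.
struct RelayCache {
    cache: HashMap<ChunkAddr, Vec<u8>>,
}

impl RelayCache {
    fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Called when this node forwards a chunk response along the route.
    /// It keeps a copy so it can answer the next request itself,
    /// soaking up load before it ever reaches the responsible node.
    fn on_forward(&mut self, addr: ChunkAddr, chunk: &[u8]) {
        self.cache.entry(addr).or_insert_with(|| chunk.to_vec());
    }

    /// Called when this node receives a request. A cache hit means the
    /// request is answered here and never travels further.
    fn try_serve(&self, addr: &ChunkAddr) -> Option<&[u8]> {
        self.cache.get(addr).map(|v| v.as_slice())
    }
}
```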
Maybe we should focus on the goal of a potential DDoS attack.
If the goal is to take a chunk or file offline, this should be very difficult. The network will act as a swarm to host popular data, through caching. Immutable data in particular is highly cacheable, as it never changes and so can be confidently cached indefinitely.
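The reason indefinite caching is safe is content addressing: a chunk's address is derived from the hash of its bytes, so anyone can verify a cached copy before serving it. A minimal sketch, assuming a SHA-256-style address purely for illustration (the network's exact hashing may differ):

```rust
// Content addressing makes immutable chunks safe to cache forever.
// Uses the `sha2` crate; the address scheme here is an assumption.
use sha2::{Digest, Sha256};

type ChunkAddr = [u8; 32];

/// A cached chunk is genuine iff its bytes hash to its address, so a
/// cached copy can never be stale or forged, no matter who holds it.
fn is_valid_cached_chunk(addr: &ChunkAddr, content: &[u8]) -> bool {
    let hash: ChunkAddr = Sha256::digest(content).into();
    &hash == addr
}
```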
So, for immutable data, a DDoS would need to be an attack on the whole network, i.e. tie all the nodes up serving nonsense, so they cannot serve genuine requests. I suspect the resources to do that would be high, as there will be many nodes.
For mutable data, DDoS may be more feasible. Being able to overload the nodes which, say, provide the maps to immutable data could have an impact.
I'm not sure what the current caching policy is for mutable data, but short-term caching would go a long way. It wouldn't be as effective as immutable data caching, but it could be sufficient to fend off high demand.
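To illustrate what I mean, a hypothetical short-term cache (I don't know the actual policy, so this is just a sketch): each node trusts a cached mutable entry only for a fixed TTL, after which requests fall through to the responsible nodes again, so staleness is bounded.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Short-lived cache for mutable entries: addr -> (stored_at, value).
struct TtlCache {
    ttl: Duration,
    entries: HashMap<Vec<u8>, (Instant, Vec<u8>)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn put(&mut self, addr: Vec<u8>, value: Vec<u8>) {
        self.entries.insert(addr, (Instant::now(), value));
    }

    /// Serve from cache only while the entry is fresh; after the TTL
    /// the request goes back to the responsible nodes, so a burst of
    /// repeat reads is absorbed without unbounded staleness.
    fn get(&self, addr: &[u8]) -> Option<&Vec<u8>> {
        self.entries
            .get(addr)
            .filter(|(stored_at, _)| stored_at.elapsed() < self.ttl)
            .map(|(_, value)| value)
    }
}
```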
I've not dived into the code around the caching though, so I'm referring to discussions from memory. I'd be interested to know what the current state of the art is too.
We are working on a few items right now, such as a Pointer data type that links with Transactions to always allow the HEAD to be found and history to be guaranteed. It's a use of Transactions to give everyone what they thought Registers had.
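Purely as an illustrative guess at the shape (these are not the actual data types, just one way to read the description): an append-only chain of Transactions carries the history, while a small mutable Pointer is updated each time so the current HEAD can always be found.

```rust
// Hypothetical shapes only, inferred from the description above;
// the real Pointer/Transaction types will differ.
type Hash = [u8; 32];

/// Immutable history: each Transaction names its parent, so the full
/// history is guaranteed once you can find the HEAD.
struct Transaction {
    parent: Option<Hash>, // previous transaction, None at genesis
    content: Hash,        // e.g. address of this version's data
}

/// Mutable, tiny, and cheap to update: always names the HEAD
/// transaction. A monotonic counter lets nodes reject stale updates.
struct Pointer {
    target: Hash, // hash of the HEAD transaction
    counter: u64, // bumped on every update; highest valid counter wins
}
```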
Also some self_encryption changes for huge file handling.
That is all going well.
On caching, though, we have not really put anything formal in place, although there is certainly some lazy caching, where nodes hold data they are not directly responsible for.
However, there is also RBS (range based search), which does a few things but mainly prevents injection attacks. It is caching if you look at it sideways. How it works is that it uses the median DISTANCE from an address to define the close_group. So instead of a fixed number of close group members (normally 5), in times of Sybil node injection to kill data, for example, this DISTANCE measure ensures we speak to at least the honest nodes in that range. We will also speak to Sybil nodes, but their impact is zeroed out.
So it's more a matter of ignoring Sybil nodes and knowing we have honest nodes, instead of spreading the data much further and wider. The only way Sybil nodes could beat this is to have so many in each address range that the DISTANCE gets small enough over time to exclude any normal nodes. However, this attack would require a massive percentage of the network's nodes to ensure that not even a single honest node is seen in any lookup.
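A simplified sketch of the range idea (not the real code, which differs in detail): take the median XOR distance of the candidate nodes to the target address, and talk to everyone within that range rather than a fixed 5 closest. Injected Sybil nodes then crowd in alongside the honest nodes instead of displacing them.

```rust
type Addr = [u8; 32];

/// XOR distance between two addresses, compared lexicographically.
fn xor_distance(a: &Addr, b: &Addr) -> Addr {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// Return every candidate within the median distance of `target`.
/// Adding Sybil nodes near the target widens the set we talk to; it
/// does not push the honest nodes out of it.
fn close_group_by_range(target: &Addr, candidates: &[Addr]) -> Vec<Addr> {
    if candidates.is_empty() {
        return Vec::new();
    }
    let mut dists: Vec<Addr> = candidates
        .iter()
        .map(|c| xor_distance(target, c))
        .collect();
    dists.sort();
    let median = dists[dists.len() / 2];
    candidates
        .iter()
        .filter(|c| xor_distance(target, c) <= median)
        .copied()
        .collect()
}
```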
I suspect you mean: did the data produced change? No, the basic SE algorithm for creating a chunk did not change, so it produces the same chunks. But the overall SE algorithm changed to handle streaming of data to chunks and chunks to data. It also allows streaming parallel decryption, so folk can get many chunks in parallel and stream them to a file.
In addition, there is an extra step where the encryption functions take the data_map produced and encrypt that until the len() of the data map == 3, so we don't get massive data_maps back.
The decryption side first gets the root data map back before decrypting the file.
So there are 2 new methods, shrink_data_map and get_root_data_map. These are used internally, but also exposed in the API for folk who want to mess around.
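To show the shape of the shrinking step (a toy sketch only; the real shrink_data_map runs actual self-encryption over the serialized map and has a different signature):

```rust
/// Toy stand-in for self_encryption's DataMap: one id per chunk.
struct DataMap {
    chunk_ids: Vec<u64>,
}

/// Toy stand-in for "serialize the map and self-encrypt those bytes".
/// Here every group of up to 3 old entries becomes one new chunk, so
/// each pass shrinks the map by roughly 3x; the real crate runs the
/// actual SE chunking over the serialized map instead.
fn encrypt_map(map: &DataMap) -> (DataMap, Vec<u64>) {
    let new_ids: Vec<u64> = map
        .chunk_ids
        .chunks(3)
        .map(|group| group.iter().fold(0u64, |acc, id| acc.wrapping_add(*id)))
        .collect();
    (DataMap { chunk_ids: new_ids.clone() }, new_ids)
}

/// Shrink until the root map has no more than 3 entries, collecting
/// the extra chunks produced along the way (they get stored too).
fn shrink_data_map(mut map: DataMap, stored: &mut Vec<u64>) -> DataMap {
    while map.chunk_ids.len() > 3 {
        let (smaller, new_chunks) = encrypt_map(&map);
        stored.extend(new_chunks);
        map = smaller;
    }
    map
}

fn main() {
    // A "file" of 1,000 chunks shrinks to a tiny root map.
    let big = DataMap { chunk_ids: (0u64..1000).collect() };
    let mut stored = Vec::new();
    let root = shrink_data_map(big, &mut stored);
    assert!(root.chunk_ids.len() <= 3);
    println!("root entries: {}", root.chunk_ids.len());
}
```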
No, this is still the same; otherwise the chunks would be different and the data would be different.
The last chunk still exists when the stream closes.
Yes, all data.
Happening now; I think recent tests have this in place, though @qi_ma knows more. We had issues getting downstream PRs into libp2p and then published, so we did have to do extra work while waiting.
Ah, I was taking streaming to mean live streaming, where the last chunk doesn't exist when the live stream starts being recorded. I guess live streams will have to be a series of "files" of a few chunks each. The number of chunks in each "file" will determine the minimum delay the live stream experiences. Probably 8 chunks; maybe 4 for low res.
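To put rough numbers on that (assuming, purely for illustration, 1 MiB chunks and a 5 Mbit/s stream): a "file" of 8 chunks holds 8 MiB ≈ 67 Mbit, i.e. about 13 seconds of video, so the viewer would sit at least that far behind live; 4 chunks would halve it to roughly 7 seconds. Smaller chunks or a lower bitrate shift those numbers accordingly.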