Make many tiny nodes/vaults, abandon them when full?

Please explain to me why a user isn’t incentivized by the network to split their storage into many small nodes, retire the ones that fill up, and free that space for new nodes to earn from. I’ve already read past threads on this exact topic, as I know it has been brought up before, like here:

https://forum.autonomi.community/t/qq-about-how-safe-will-protect-against-this-in-the-network-farming-related/

But I found no satisfactory answer. That thread seemed to beat around the bush on the question. No response given explained why doing this is not advantageous. If my node has reached the point where I have to expand its size to gain any more rewards, I’m going to be motivated to use that disk space for another purpose. If I keep nodes small, then they will fill up more quickly so I can repurpose the disk space associated with that node without affecting the performance of other nodes.

I can think of some potential solutions to the problem, but I’m not aware of any that have been implemented. At the time that the above thread was being discussed in 2018, @happybeing stated [quote]
I believe that the advantage of age tails off, so being around for twelve months may not be that much better than six months
[/quote]

When I read that, I thought that having nodes gain value through age could be a motivating factor to keep a node around and expand its size. But if the advantage tails off with age, I might still be motivated to drop an aged, full node rather than expand it, since I could add that disk space to a younger node that has already matured to six months. Now it turns out that we no longer apply an advantage to aged nodes at all. According to @neo: [quote=“neo, post:91, topic:40573”]
That whole concept has gone and nodes are individuals living in the world of XOR addresses. They have neighbours and so on, and they make decisions on their own. By following “the rules” the other nodes will continue to talk to your node and live in happy connection with others and be productive. Occasionally one node will see another as not working right and “shun” it, not talk to it any more and your node may do that to other nodes.
[/quote]

So given this, I see no motivation at all to keep a node around once it is full when I could instead use that disk space for something else. If I use the space to start a new node, perhaps that new node is in a much less favourable situation than the full node I dropped, but at least the space can now potentially earn me rewards at some point instead of being tied up. And since firing up many small nodes has left me with plenty of other nodes in good standing with their respective neighbours, I see no real harm in dropping the full node to free up its disk space rather than expanding it. Should we somehow reward nodes based on size? Give larger nodes a better reputation so that they are more likely to receive new data and rewards? Or is there something I am missing?

Another somewhat related question: if a lot of nodes/vaults are dropped regularly, doesn’t that mean that existing nodes/vaults are filling up without being rewarded? Is there anything built into the algorithm for reassigning this data, perhaps giving it to younger or smaller nodes in order to incentivize aged or larger nodes?

Thanks for any clarification on this topic. I’m still struggling to convince myself that the network will work based on what I’ve learned of it over the years. For a long time the implementation seemed so open-ended and unsettled that I decided to just wait and see how everything turned out before trying too hard to understand it all, but now that it seems we are launching, I need it to make sense to me before I can fully embrace it.

1 Like

Sorry for not completely reading your post.

For the title question though, nodes should never get full of records they are responsible for. Data shifts between nodes as new nodes join and existing nodes leave. This results in all nodes having around the same number of records they are responsible for.

Nodes have two sets of records: the ones they are responsible for, and the ones they keep just in case they are needed.

The first set contains the records the node is responsible for, meaning the records for which it is one of the closest 5 nodes to the record’s XOR address. This set may shrink as new nodes join that are closer to some of those records, and it may grow as other nodes leave and the node becomes one of the closest 5 to additional records.

The second set contains records for which the node is not one of the 5 closest, but is still close enough that keeping them is helpful. It helps with caching, and it helps when other nodes leave, because it saves having to fetch records the node suddenly becomes responsible for.

Thus a node can be “full” in the sense that it holds a full complement of records, yet not full in a network sense, because the records it is responsible for are only a subset of the records it holds. As new nodes join, the number of records each node is responsible for goes down; if a lot of nodes leave without new ones joining, that figure goes up.
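
To make the two sets concrete, here is a minimal sketch in Rust of the idea described above. The 32-byte addresses, the “closest 5” rule as a hard-coded constant, and the function names are just my illustration of the description, not Autonomi’s actual code:

```rust
/// Illustrative only: a 32-byte XOR address, used here for both node IDs and record keys.
type XorAddr = [u8; 32];

/// Byte-wise XOR distance between two addresses; a smaller result
/// (compared lexicographically) means "closer" in XOR space.
fn xor_distance(a: &XorAddr, b: &XorAddr) -> XorAddr {
    let mut d = [0u8; 32];
    for i in 0..32 {
        d[i] = a[i] ^ b[i];
    }
    d
}

/// A node is "responsible" for a record if it is one of the 5 closest known
/// nodes to the record's address; anything else it holds falls into the
/// second, "keep just in case" set.
fn is_responsible(me: &XorAddr, record: &XorAddr, known_nodes: &[XorAddr]) -> bool {
    let mut nodes: Vec<&XorAddr> = known_nodes.iter().collect();
    nodes.push(me);
    nodes.sort_by_key(|n| xor_distance(n, record));
    nodes.iter().take(5).any(|n| *n == me)
}
```

So when a new node joins closer to a record’s address, an existing holder may drop out of the closest 5 for that record; the record then simply moves into its “keep just in case” set rather than the node ever becoming full.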

tl;dr: Nodes are never full unless the network is full, and stopping nodes will not help.

And I didn’t fully read the post since the premise is faulty and it’s so long.

8 Likes

So if a new node joins, data is immediately shifted to it from other nodes?

6 Likes

You got it.

I’m just jumping in with an analogy: data on the network is like an ever-moving river, moving from old nodes to new nodes, so your nodes’ ability to earn never ends.



2 Likes

Along with what has been said already about nodes never ‘filling up’, nodes will only earn if they are doing what they’re supposed to be doing, which includes storing and delivering all of the data they’re responsible for when they should.

If a node is shut down before it is storing what it’s responsible for, it won’t earn anything, or at least not at its full potential, so it won’t be an advantage for anyone trying to maximise their earnings.

The scenario you picked up on in discussions elsewhere may relate to recent discussions about maximising rewards in the beta test networks. At times it did make sense to ‘over-provision’ nodes by not actually providing the disk space the nodes would require if full, because it was clear that in those tests the nodes were not getting anywhere close to full. But this was just a quirk of the beta testing / rewards setup, and it isn’t going to be the case once the network is properly up and running.

9 Likes

Yeah, this is great. The idea that nodes can monitor each other to ensure they are serving requests seems very important. I’m very interested to learn how this is done… data proxied between nodes so they can verify one another? I look forward to digging into the code at some point.

2 Likes

To be clear, this is across the network, not between nodes on the same machine. The nodes on a machine have no knowledge of each other.

The checking is along the lines of “are you responding correctly” to messages, including a health-check message, plus checks on whether other nodes respond correctly to record checks, and so on.
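
I don’t know the exact messages or thresholds, but the “keep checking, and shun repeat offenders” behaviour described above could be tracked along these lines; every name and number below is made up purely for illustration:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical failure tolerance before a peer is shunned.
const MAX_FAILURES: u32 = 3;

#[derive(Default)]
struct PeerChecks {
    /// Consecutive failed checks per peer (peer id as an opaque string here).
    failures: HashMap<String, u32>,
    /// Peers this node has decided, on its own, to stop talking to.
    shunned: HashSet<String>,
}

impl PeerChecks {
    /// Record the outcome of a health check or record check on a peer.
    fn record_check(&mut self, peer: &str, passed: bool) {
        if passed {
            self.failures.insert(peer.to_string(), 0);
            return;
        }
        let count = self.failures.entry(peer.to_string()).or_insert(0);
        *count += 1;
        if *count >= MAX_FAILURES {
            // Stop talking to a peer that keeps failing checks.
            self.shunned.insert(peer.to_string());
        }
    }

    fn is_shunned(&self, peer: &str) -> bool {
        self.shunned.contains(peer)
    }
}
```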

3 Likes

Yes, I get it. The purpose is to ensure no nodes are trying to cheat the system. My only remaining concern is that it may be rather difficult to ensure/prove that the algorithms used to vet other nodes are foolproof and that there is no way to trick them and somehow game the network. The weakness in Autonomi seems to me to be its complexity. In networks like Bitcoin, work can be proven mathematically, while proving disk space allocation is not so mathematically straightforward, and it may be difficult to prove that the model for vetting nodes has no security weaknesses that can be gamed. Up until now, there has been little incentive for anyone to try to game the system for profit. I wonder if some bounty program might be useful to incentivize people to try before the Autonomi network becomes more popular.

1 Like

Nodes do not prove disk space allocation, but rather that they have stored the correct data they are responsible for, which is as mathematically provable as the calculations in Bitcoin’s blocks.
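
My understanding (an assumption on my part, not something stated above) is that for immutable chunks this amounts to content addressing: the record’s address is derived from the hash of its content, so anyone can re-check a record they receive without trusting the node that served it. A minimal sketch using the `sha2` crate:

```rust
use sha2::{Digest, Sha256};

/// A record as a (claimed) address plus the bytes a node returned for it.
struct Record {
    address: [u8; 32],
    content: Vec<u8>,
}

/// If the address really is the SHA-256 of the content, verification is just
/// re-hashing: wrong or missing data cannot produce a matching digest.
fn verify(record: &Record) -> bool {
    let digest = Sha256::digest(&record.content);
    digest.as_slice() == record.address.as_slice()
}
```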



4 Likes

Really? To my knowledge, the only way to prove that a node is storing all the data it is responsible for at a given moment is to request all of that data and run a validation routine over it, and that would have to be repeated at every moment to validate it for all time. Presumably the network couldn’t possibly do that, as it would be far too computationally and bandwidth intensive, so instead it resorts to more complex algorithms that are more practical. You’re saying we have a model whereby nodes intermittently vet one another that has been mathematically proven to be impossible to circumvent? Presumably this mathematical proof has been published? Could you provide a link?

I’m not familiar with the specifics, but the general principle is that everything is relative.

For example, with Bitcoin you can assume with reasonable confidence that you are on the longest chain, but you can’t prove it at any given point until enough time has passed.

It’s the same with Autonomi: it is known that every single piece of data will be moved at certain times as nodes enter and exit, and when enough time passes it will be shown to have been valid. There is nothing complex about it.



1 Like

Here is a more comprehensive consideration of the matter of trust and gameability of such storage networks (i.e. Filecoin, Arweave, Storj, Sia, Ochain):

I think not just storage networks but many non-PoW, non-PoS decentralized networks suffer from the same issue: greater complexity means more possibility for the network to be gamed by some unconsidered tactic or future technology. With PoW, it is very helpful that there is one single blockchain that everyone can check to ensure the system is working as expected. Without that, if nodes were, for instance, vulnerable to some attack that allowed them to be compromised or manipulated, it might not be so easy to detect the foul play.

1 Like

Yes, at some point you require a higher level of validation than we have in most storage systems. You can implement stronger checks at higher cost, but they are still not perfect. Once a certain level of validation is achieved, it’s considered enough.

For instance, the home computer relies on the disk drive/SSD to validate the data coming off the storage media with error-correction/checking codes. Drives used to only have CRC-style codes added to sectors to give a confidence level in the accuracy of the data. Now drives have ECC to correct a certain level of errors, and they need it, since the raw error rate on drives has risen over the last 20 years, but the corrected error rate is much better.

NAS has its RAID, with various levels of detection/correction on top of the drive-level error correction/detection. This has been expanded on, and other storage mechanisms have been introduced over the years. But even with all this, there is still the potential for errors to creep in undetected and/or unrecoverable.

People used MD5, and then various SHA-level hashes, to validate that data has come through correctly, which provides a check beyond the storage mechanism. Or even ‘n’ of ‘m’ encoding of data to provide detection and repair of files.

Having said that, it’s a game of costs (time, effort, money) versus what is acceptable. Autonomi has always been a probabilistic network, and even Bitcoin is that in some respects, as @Dimitar pointed out with respect to knowing the correct blockchain and/or blocks. With that in mind, checking that other nodes hold the data they claim to be storing is one such check, along with using SHA-256 hashes of record contents to know if they have changed (disk errors/validation), and so on.

One such check on a close neighbour is possible because at least 5 nodes hold each record. A node that is one of the closest 5 to a record can hash the record with a salt, then give the salt to a neighbour that is also one of the closest 5 and ask it to hash its copy with that salt. If the results do not match, the other node has failed the validation check. Of course, the error could be on the asking node’s side, so the other nodes in the closest 5 can then be checked to build a confidence level about which one is in error. A node must actually be storing the record in order to give a valid response, and the check is done without transferring the whole record across the wire.
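
A minimal sketch of that salted spot check, using the `sha2` crate; the function names and message flow are only illustrative of the description above, not the actual protocol:

```rust
use sha2::{Digest, Sha256};

/// Hash a record's content together with a challenge salt.
fn salted_hash(salt: &[u8], content: &[u8]) -> Vec<u8> {
    let mut hasher = Sha256::new();
    hasher.update(salt);
    hasher.update(content);
    hasher.finalize().to_vec()
}

/// Run by the challenger, which also holds the record: compare its own
/// salted hash with the answer the neighbour sent back. The neighbour can
/// only produce a match if it really holds the record's content, yet only a
/// small hash, never the record itself, crosses the wire.
fn neighbour_passed(my_copy: &[u8], salt: &[u8], neighbour_answer: &[u8]) -> bool {
    salted_hash(salt, my_copy).as_slice() == neighbour_answer
}
```

Using a fresh random salt for each challenge matters, since it stops a node from keeping only a precomputed hash of the record instead of the record itself.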

3 Likes