I’ve seen bit rot mentioned as a potential problem (use ECC RAM etc.) for farmers.
Since the whole concept of MAID is a drive array, could not the chunks be compared every so often and if 3 agree and one doesn’t, the network drops it and dupes one of the other good chunks back so there’s still 4?
When a vault shuts down and a new copy of a chunk is made, the copy used as the source must have a valid hash. Otherwise the network drops that copy as well, chooses another copy of the chunk as the source, and makes 2 new copies.
So it's like a disk drive's error-checking system combined with the concept of RAID.
@neo answers part of this, but also, for immutable data at least, the hash of the stored data is the network address of the data. So the immutable data can also be checked for integrity at any point at rest or in transit, independent of other nodes. If the new hash of the stored or in-transit data does not match the network address, it is corrupt and will be replaced by a valid chunk.
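As a rough illustration of the two posts above, here is a minimal sketch of content-addressed checking and repair. It is not the actual vault code: the function names, the use of SHA-256, and the target of 4 copies (taken from the question's example) are all assumptions made for the example.

```python
import hashlib

TARGET_COPIES = 4  # assumption: copy count borrowed from the example in this thread


def chunk_address(data: bytes) -> str:
    """Address of an immutable chunk is the hash of its content (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def is_valid(address: str, data: bytes) -> bool:
    """A copy is valid only if re-hashing it gives back its own address."""
    return chunk_address(data) == address


def repair(address: str, copies: list[bytes]) -> list[bytes]:
    """Drop corrupt copies, then re-copy from a valid one until the target is met."""
    good = [c for c in copies if is_valid(address, c)]
    if not good:
        raise ValueError("no valid copy left to replicate from")
    while len(good) < TARGET_COPIES:
        good.append(good[0])  # only a copy that passed the hash check is used as source
    return good


original = b"some immutable chunk"
addr = chunk_address(original)
rotted = bytes([original[0] ^ 0x01]) + original[1:]  # simulate a single flipped bit
copies = repair(addr, [original, original, rotted, original])
```

Note that no voting between copies is needed here: each copy can be judged on its own against the address, which is why the check works at rest or in transit.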
If you are a farmer then I expect it would be an advantage to reduce bitrot on your own hardware. That way, your vault would be more valuable to the network. The question is whether the extra cost of ECC-supporting hardware is worth it. A good ECC setup will cost an extra £300+ over a decent non-ECC NAS build, and way more than a simple single board computer setup.
There are so many questions now regarding the most efficient hardware arrangement. Surely there must be some hints as to what works best. ECC RAM is only one issue.
Do we run multiple VMs on a large machine, many small independent machines, a single large machine, and what size/type RAM, how powerful the CPU etc.? I asked elsewhere and understand I must wait and see.
It’s a series of max-min problems, and I just wish there was some small amount of guidance on this, and can’t understand why there isn’t a rough idea from internal testing already done; I’d like to be ready day 1.
I have about 50TB standing by (tiny compared to some of you I’m sure), but once I know the best config I’ll massively increase that to hopefully saturate my internet connection. Of course, these are all comments meant for other threads – but I had to get 'em out. I just can’t wait to help turn the internet paradigm on its head!
In all seriousness I have run computers for 40+ years. Hard drive technology sees little “bitrot” apart from drive failures.
Memory systems for the most part have extremely small error rates, and very large time between errors.
The inbuilt error detection of the SAFE network is designed to overcome any individual's errors, and even if you eventually get one chunk in error on your disks, the network would simply reject your copy of the chunk as bad.
You are many thousands of times (10,000 to 100,000 times) more likely to have communications errors that cause a chunk to be in error when serving it up than any unknown disk errors (bitrot).
ECC RAM would be overkill for the purpose of vaults because of the error detection inbuilt into the SAFE network operations. ECC might save you 1 in a billion chunks being in error (if any errors in the lifetime of your disk drive).
[quote=“Mick, post:6, topic:12212”]
I have about 50TB
[/quote]
If you have the connection bandwidth to the internet to match then you might serve 1 billion chunks in a relatively small amount of time. But would avoiding the few bad chunks you might serve up every 3-12 months justify the cost of fault-tolerant hardware? The network will make new copies for any bad chunks that appear in your storage systems.
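How long "1 billion chunks" takes depends heavily on chunk size and uplink speed. A back-of-envelope sketch, where the 1 MB chunk size and the 1 Gb/s uplink are assumptions chosen only for illustration and the function name is hypothetical:

```python
def days_to_serve(chunks: int, chunk_bytes: int, uplink_bits_per_s: float) -> float:
    """Days needed to serve `chunks` chunks if the uplink is fully saturated."""
    seconds = chunks * chunk_bytes * 8 / uplink_bits_per_s
    return seconds / 86_400  # seconds in a day


# Assumed values: 1 MB chunks, 1 Gb/s uplink used 100% of the time.
days = days_to_serve(1_000_000_000, 1_000_000, 1e9)
print(f"{days:.1f} days")
```

A slower link or a partly idle one stretches this proportionally, so plug in your own numbers.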
tl;dr
The SAFE network is designed to use consumer grade hardware, which is one of the reasons it keeps so many copies of each chunk on the network.
Well, if my tests during a previous testnet when we ran vaults are anything to go by, then the choice will be yours. I ran a vault on a C.H.I.P. computer: a single ARM core running at 1GHz with 512MB RAM.
The vault used very little processor time (a few %), ran without issue, and was connected over WiFi.
As to the number of machines (real/virtual) then simple maths gives you a broad answer - many vaults.
If you use small machines then run a small number of vaults per machine; if you use large machines then run many vaults (per VM if using VMs).
But again the bandwidth of your internet connection will be a limiting factor on the number of vaults you can run.
The maths is about maximising the GET requests from the network to your vaults. Vaults fill at a certain average rate, so one large vault will fill at “x” chunks per day, while 100 vaults will each fill at “x” chunks per day, giving 100x chunks stored per day overall. Each chunk stored creates a potential GET request, so more chunks stored means more GET requests (on average).
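The arithmetic above can be sketched as follows. The fill rate `x` is a made-up number, the function name is hypothetical, and the sketch ignores the bandwidth limit mentioned earlier, so treat it as the idealised case only.

```python
def chunks_stored(vaults: int, fill_rate_per_vault: float, days: float) -> float:
    """Total chunks held after `days`, assuming every vault fills at the same average rate."""
    return vaults * fill_rate_per_vault * days


x = 1000.0  # hypothetical average fill rate, in chunks per vault per day

one_vault = chunks_stored(1, x, 30)
hundred_vaults = chunks_stored(100, x, 30)
ratio = hundred_vaults / one_vault  # how many times more chunks (and potential GETs)
```

In this model the size of each vault does not matter, only the number of vaults, which is why the broad answer is "many vaults" until your connection becomes the bottleneck.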
Awesome response neo. I have 150Mb/s up and down, unlimited. Unlimited gigabit up/down is pending, but probably not for 6-12 months unfortunately. I will happily take any advice you can give regarding vault configuration. Must one run VMs to get many vaults, or can many instances of the software be run in parallel? For simplicity I’m really hoping not to have to set up a plethora of SBCs, as much as I love them, but rather a few hefty servers. Cheers mate!