More Young People Engage in Unprotected Farming

We’ve had questions (and attempts to answer them) about “best” ways to organize storage on which to place vaults.

At the moment there’s a mini rush going on related to the other well-known decentralized storage project because they’ve started running their Alpha 2 (I wouldn’t call it Beta 1) network. They call it Test B.

One thing that caught my attention was repeated discussion about “best” media, RAID levels and so on. While it is normal that people are concerned about that, and similar questions were asked on this forum, I wanted to put together my thoughts regarding RAID levels.

First, despite the “rock bottom” cheap capacity peers can provide, there is no escape from the laws of economics. Some (let’s not name names, except Warren’s) have declared the end of artificial resource scarcity, but what we’re witnessing is the same old economizing, which confirms my earlier expectation that we can’t ever stop economizing (unsurprising, really, except to socialists). We can’t, and won’t be able to, afford to use resources uneconomically on a large scale.

Second, some expect ridiculously low storage prices that would be brought about by creative individuals serving data off cheap desktop drives without “bloated” enterprise storage hardware.

But instead the uncertainty (of whether a miner will be online tomorrow) that is a consequence of decentralization results in the need to replicate data multiple times, and - this is important - punishments have to be meted out for extended or permanent downtime, as well as for data corruption (where data is destroyed/corrupted on a per-contract basis). While the default replication level provides sufficient protection for the network as a whole, you can’t benefit from it individually as a farmer.

If you lose a drive together with the vaults that live on it, the network will survive but your reputation (and wallet) won’t. If you have just 1 HDD, you have no choice but to go unprotected, but if you’re a serious small or medium farmer (“SMF”), you’ll have 4 or more disk drives. What then?

You can set them up in a non-RAID-ed configuration (4 separate disks with a separate file system on each). Normally that means that a failure of any single drive doesn’t impact the rest. But with these systems the network must still punish you individually, so if you run these four drives under one account and lose one, your reputation will suffer even though the other 3 survive, and perhaps significantly so (that is, it may turn out to be cheaper to RAID5 them than to lose 1 disk out of 4, because the associated reputation risk may be too high relative to the cost of 1 HDD).

Or you can set up 4 accounts (one per drive) to prevent that, but you can’t tell in advance which is worse: a 100% reputation wipe-out of 1 account, or a partial loss of 1/4th of your data (the first scenario, above). As I commented last year, we won’t be able to know which approach is better until the way the algos work is finalized, and even then there will be different versions of “better” depending on where you live, how much you pay for power, HDDs, etc.

My current conclusion is that I want to use RAID1 (for mixed personal + mining use) or RAID5 (for mining only), but because of the uncertainty (I don’t know how the system will punish and reward farmers) I did the opposite - I prepared a farming rig with a single disk.

Once we have more details on how the system works I think SMFs will use some sort of RAID protection (probably RAID5 or RAID6, because RAID1 is quite inefficient). But, as I argued here, this will make it harder for SMFs to compete.

Summary

  • Because the network has to work with untrusted farmers and because individual participants want to protect their reputation, data efficiency will be very low (multiple copies on RAID5/RAID6 or RAID1)

  • It will probably make economic sense for casual farmers to engage in “unprotected farming” (this is where I am starting, although I didn’t see myself doing this)

  • I think we’ll see a lot of rigs with 2 disks and one identity per disk drive. That way it’ll be easier to split risk and even engage in “merged farming”.

  • At the moment I guesstimate the risk of farmer centralization is very low; even specialization will be hard!


I don’t think that running R5 or R6 will make it harder to compete in the long run. I think it will actually make SMFs much more efficient long term. With R5 you are only losing a single drive of capacity per setup. I currently have three R5 setups in my house: one straight R5, and two hardware R5 arrays in a software R0 (striping) for my media server. Between the two setups, I’ve had about 6-7 drives go bad over 6 years and have yet to lose a single byte of data. That would not be possible with any non-RAID setup. In an R1 setup I would only have about half the space (~9 TB instead of the current ~17.5TB between the two). R5 has protected my data just fine.

TL;DR: R5 is more than adequate to protect user-level data, keep constant uptime (if your OS/hardware supports hotswap), and protect/build your reputation, which is worth more than the single drive’s worth of space you give up across the setup.


For both my archives and my farming setup, I am planning to go BTRFS raid 6.


I don’t think there will be that much punishment for going offline. David stated somewhere that people who go online with their laptop for 30 min. should be able to farm a little Safecoin.

You can set them up in a non-RAID-ed configuration (4 separate disks and with a separate file system on each).

I think most people won’t. Non-persistent vaults will only be online as long as RAM and HD are okay. How many times a month or year do you lose data on your own computer? I think my computer runs about 13 hours a day. So when the network is live, and I want to earn some Safecoin, I’ll just run a vault for those hours. If my computer goes offline for whatever reason, no problem. It only takes my rank down a bit.

This is a security and simplicity proposition. There will be persistent data via archive nodes in due course. I continually strive to ensure we reduce account info and ensure all people get a chance to farm on even the smallest devices. This also means you do not have a persistent vault ID and it’s recalculated at every reboot. It has very interesting implications and will move data much faster across the network. To be persistent we had to store the vault private key locally (there are some options). This way there is nothing/nada to steal from your computer that is going to help any attacker. So it is pretty nice in many ways.

In essence it allows much smaller fast/off/on nodes to take part in the network easily. As a vault stays on longer it holds data for a while and grows rank (automatically) per session. It makes rank calculation very simple and also makes ‘gaming’ the system significantly harder (it was already very hard).


I think you missed some of the changes that came with the decision to make vaults non-persistent by default:

Your punishment for going offline is that your vault is essentially wiped, and you’ll have to build it up again over time.


So btrfs or zfs would be suitable for this case. You can hotswap, and continue to pool more into maidsafe.

BTRFS raid 5 and 6 have become usable in recent months. BTRFS is much more flexible than ZFS. You can easily grow the pool and even change raid mode on the fly.


And it has snapshots. Easy rollback in case of data corruption.

But not sure how this could be used in maidsafe.

Just to avoid disconnection from pool failure, or loss of personal data.

A blog post about the latest BTRFS

http://markmcb.com/2015/07/19/migration-from-luks-encrypted-hardware-raid-10-to-luks-encrypted-btrfs-raid-6/


Maybe it won’t, but at this point we just don’t know. If we both pay the same for raw capacity, I’ll be able to provide 50% more usable capacity than you with RAID5 (I use 1 HDD; you have 4+1 and 1 hot spare, so only 4 of your 6 disks hold data: 66% * 5TB = 3.3 TB, whereas I provide 5 TB). So I would be able to sell my capacity below your cost of h/w.

I meant un-protected, not non-persistent. For example, I could have a PC with 4 disks, and run 4 VMs, each placing its vaults on one of the disks. Each of the 4 farms would be on its own, and die if the disk it uses dies.

Okay, but where do I make more money: mining from 4 separate VMs with 1 HDD attached to each, and then suffering a complete wipe-out on one of them, or mining from one VM with a 3+1 RAID5?
I argue that if my chance of losing a disk is 5%, then I’m getting 95% of my non-RAID-ed capacity, whereas you’re getting 75% (or less, if you have a hot spare handy).
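
To put rough numbers on that comparison, here is a minimal Python sketch. It rests on assumptions that are mine, not the network's: expected capacity is the only thing compared, a RAID5 array is treated as never losing data, and reputation penalties (the real unknown in this thread) are ignored. The function names are made up for illustration.

```python
def unprotected_fraction(p_disk_loss: float) -> float:
    """Expected share of raw capacity still farming when every disk is its own farm."""
    if not 0.0 <= p_disk_loss <= 1.0:
        raise ValueError("p_disk_loss must be between 0 and 1")
    return 1.0 - p_disk_loss


def raid5_fraction(array_disks: int, hot_spares: int = 0) -> float:
    """Share of purchased capacity usable in one RAID5 array (one disk's worth of parity)."""
    if array_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (array_disks - 1) / (array_disks + hot_spares)


def break_even_loss_probability(array_disks: int, hot_spares: int = 0) -> float:
    """Disk-loss probability at which both setups deliver the same expected capacity."""
    return 1.0 - raid5_fraction(array_disks, hot_spares)


if __name__ == "__main__":
    print(f"4 unprotected disks, 5% loss: {unprotected_fraction(0.05):.0%}")
    print(f"3+1 RAID5:                    {raid5_fraction(4):.0%}")
    print(f"3+1 RAID5 + hot spare:        {raid5_fraction(4, hot_spares=1):.0%}")
    print(f"break-even loss probability:  {break_even_loss_probability(4):.0%}")
```

Under these assumptions the unprotected setup only falls behind on capacity once the chance of losing a disk exceeds the break-even value. What the sketch cannot capture is the reputation cost of a wipe-out, which is exactly what we don't know yet.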

If you have 1x5TB drive and I have 5x1TB drives, you spent around $200 for it (average on Newegg) and I spent around $60x5 = $300. Now it boils down to SAFEcoin economics. Is my extra $100 going to be paid for by my constant uptime? I don’t think anyone, even the dev team, knows just yet. I would hope so? We’ll see.
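
For what it's worth, the hardware side of that question can be sketched in a few lines of Python. The prices are the ones quoted above; the assumption that the five 1TB drives run as a single RAID5 array (one disk's worth of parity) is mine, as is the helper name `cost_per_usable_tb`.

```python
def cost_per_usable_tb(disks: int, tb_each: float, price_each: float,
                       parity_disks: int = 0) -> float:
    """Hardware cost divided by the capacity that can actually hold vault data."""
    if disks <= 0 or parity_disks < 0 or parity_disks >= disks:
        raise ValueError("need at least one data disk and a non-negative parity count")
    usable_tb = (disks - parity_disks) * tb_each
    return disks * price_each / usable_tb


if __name__ == "__main__":
    single = cost_per_usable_tb(disks=1, tb_each=5.0, price_each=200.0)
    raid5 = cost_per_usable_tb(disks=5, tb_each=1.0, price_each=60.0, parity_disks=1)
    print(f"1x5TB unprotected: ${single:.2f} per usable TB")
    print(f"5x1TB RAID5:       ${raid5:.2f} per usable TB")
```

This only prices the disks. Whether the uptime and reputation gained from RAID5 pay back the difference depends on reward rules nobody in this thread knows yet.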

The only difference (as far as I can tell) is that I claim this uncertainty will prompt people to farm unprotected.

For example, once we enter the stage at which vaulted data won’t be discarded, I can start farming with a single 5TB drive and - depending on how the economics turn out - convert my setup to a 3-drive RAID5 array later. I think this approach provides more flexibility in the face of uncertainty.
The same logic may not apply to someone who starts later once all the parameters are more or less fixed.

This debate is a bit narrow and ignores many relevant factors, which to me make the whole idea of farming with large purpose-built setups questionable in itself.

A more interesting question to me than which purpose-built setup will be more or less profitable is: at what point do they become profitable at all?

The question arises (but is ultimately unanswerable without live testing) because the network favours small, non-purpose-built setups that are constantly switched off and on. These will have the effect of lowering farming rewards. We don’t know by how much, but it could be enough to make purpose-built systems unprofitable.

This doesn’t mean purpose built will not be profitable, but it certainly raises the question.

I don’t see much point in debating the outcome of this, even though it’s a more interesting question to me. It just makes me very excited to see the outcome for real.

I don’t remember reading anywhere that it would prefer systems to be turned off. I also specifically remember David talking about “if a node reaches archive status” meaning it would not have been turned off for a very long time (they are wiped on reboot)

The network will favor vaults slightly over the network average size, but that doesn’t mean it won’t help to have a “purpose built” setup. For me, it sure will be profitable. My server stays on all day anyway, and I already pay for my internet. Any income at all will be profit.

Please note, I’m not saying this discussion is not pointless to debate before we can see the network for real… Just a bored Sunday.

Well, it depends on your costs. If your system would be bought and powered regardless, then you can ignore those costs. But this is unusual.

When I say the network favours small vaults that will be turned off and on frequently I don’t mean they are rewarded more than ones that are large and permanently on, I mean that the network ensures those vaults have advantages - when you consider the overall costs and rewards for each kind of approach.


I think what’s especially relevant is how quickly your vault fills up again after being wiped. If it’s a matter of days, your only loss is that you don’t farm at full capacity during those days. If it takes weeks or months to get back to full capacity, the loss is more significant.
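
A back-of-the-envelope way to see how much the refill time matters, sketched in Python. Everything here is an assumption for illustration: the vault is taken to refill linearly from empty to full, earnings are taken to be proportional to how full the vault is, and `average_fill` is a made-up helper, not anything from the network's code.

```python
def average_fill(session_days: float, refill_days: float) -> float:
    """Average fraction of capacity holding data over one session between wipes,
    assuming a linear ramp from empty to full over refill_days."""
    if session_days <= 0 or refill_days < 0:
        raise ValueError("session_days must be positive and refill_days non-negative")
    if refill_days == 0:
        return 1.0
    if session_days >= refill_days:
        # Ramp for refill_days (average half full), then full for the rest.
        return 1.0 - refill_days / (2.0 * session_days)
    # Wiped again before the vault ever fills up.
    return session_days / (2.0 * refill_days)


if __name__ == "__main__":
    for refill in (3, 14, 60):
        fill = average_fill(session_days=30, refill_days=refill)
        print(f"30-day session, {refill:>2}-day refill: {fill:.0%} average fill")
```

With a refill measured in days the loss from a monthly wipe is small. Once the refill takes longer than the session itself, the vault spends its whole life mostly empty.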

Also, fresh data might produce more GETs.

Probably, but I think most of the “new” data you’ll get after a wipe will be from churn, not from new uploads.


Interesting scenario. Churn gains don’t pay, right?