Will there be farming pools on SAFE network?

vtnerd · August 10, 2014, 7:06pm

Others in my group, or others in the network? The latter is irrelevant for what I am proposing.

EDIT:
Or are you saying that it won’t matter much statistically if I crowd out the others in my group since it will be small compared to the entire network? I’m not sure of the math behind this particular claim, so this might be an idea that won’t work in practice.

wes · August 10, 2014, 7:51pm

Sorry, I was sitting at my desktop at the time and currently can’t use one hand, so typing is tedious. I should have grabbed my phone from the start. I apologize.

Farming of coins happens on “get” requests. (The paper does make it sound like puts; maybe I’m reading it wrong, or maybe it changed, but currently it relies on gets.) That changes the whole question, so I’ll let you ask whatever questions it leads you to rather than try to guess.

vtnerd · August 10, 2014, 11:53pm

This whitepaper says in several places that the put requests are the triggers for the mine requests. Based on some other forum posts, I think this algorithm is still in flux, so is there any resource that describes the current intended implementation? I could look at the code, but knowing the intended implementation would be nice (allows comparison). I will list my assumptions below:

Assuming mine interval is based on successful GETs (first response + accuracy):
This wouldn’t make my proposed situation impossible, just more difficult. Several people in the pool buy a decent netbook with an SSD and connect to each other over a fast connection (LAN). SSDs connected over a low-latency link, with an efficient distribution server, should still get responses out quicker than the average rotational disk. Jeff Atwood has this lovely chart on response times from various sources. SSDs are absurdly quick, so they are an obvious choice if response time is heavily weighted in mining. The part you aren’t seeing in his chart is the local network access time. Currently my laptop has a <2ms response time to the internet gateway (laptop --100mbit ethernet--> linksys dd-wrt --802.11--> netgear gateway). Machines with SSDs connected over ethernet should easily outpace any rotational disk, because they should be hitting sub-millisecond response times. Upgraded networks (gigabit or 10 Gbit) should have lower response times still, because the faster clocks on those network cards mean less time until the next packet gets put on the wire.
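
A rough back-of-the-envelope sketch of the argument above. Every figure is an illustrative assumption (approximate orders of magnitude for an SSD random read, a rotational-disk seek, and a wired LAN hop), not a measurement from the SAFE network, and `response_time_ms` is a hypothetical helper.

```python
# Illustrative latency budget; every number here is an assumption.
SSD_READ_MS = 0.1    # roughly 100 microseconds for an SSD random read
HDD_SEEK_MS = 10.0   # roughly 10 ms seek + rotation for a rotational disk
LAN_HOP_MS = 0.5     # one wired LAN round trip inside the pool


def response_time_ms(disk_ms, lan_hops=0, wan_ms=0.0):
    """Total time to serve a chunk: disk read + LAN hops + WAN transit."""
    return disk_ms + lan_hops * LAN_HOP_MS + wan_ms


pooled_ssd = response_time_ms(SSD_READ_MS, lan_hops=1)
solo_hdd = response_time_ms(HDD_SEEK_MS)
print(f"pooled SSD: {pooled_ssd:.1f} ms, solo HDD: {solo_hdd:.1f} ms")
```

Passing the same `wan_ms` to both calls shows how the relative advantage shrinks once internet transit dominates the total.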

Assuming the merkle tree isn’t verified by others in the network
Is the signed merkle tree request for a vault ever verified by others? The whitepaper never mentions it, and I don’t see any posts on the forum so …

I would then play “loose” with the rules of the MaidSafe network. Each person in the pool could create their own vault and oversubscribe the hard disk space (have multiple claims to the disk space via the pool). Over-subscription would mean dropping the least frequently accessed files first. This would be tricky, because if you aren’t careful with the over-subscription you would lower your POR quickly. The healthy_space calculation would be used to determine the amount of over-subscription that would yield a high POR across all vaults. This should be the way to maximize the likelihood of getting COIN and storage space.
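
A minimal sketch of the "drop the least frequently accessed files first" step, assuming equally sized chunks and a simple per-chunk access counter. The function name and parameters are hypothetical, and the healthy_space calculation is not modeled here.

```python
def chunks_to_drop(access_counts, chunk_size_mb, claimed_mb, physical_mb):
    """Pick the least frequently accessed chunks to drop until the
    claimed data fits on the physical disk. Returns the chunk names."""
    overflow_mb = claimed_mb - physical_mb
    dropped = []
    # Visit chunks from least to most frequently accessed.
    for name, _count in sorted(access_counts.items(), key=lambda kv: kv[1]):
        if overflow_mb <= 0:
            break
        dropped.append(name)
        overflow_mb -= chunk_size_mb
    return dropped


counts = {"a": 40, "b": 1, "c": 7, "d": 0, "e": 15}
print(chunks_to_drop(counts, chunk_size_mb=1, claimed_mb=5, physical_mb=3))
```

Each dropped chunk is a potential failed GET later, so the size of the overflow is exactly the risk being traded against POR.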

Some background
I think colleges are moving to all WiFi which may kill this idea, but Virginia Tech had ethernet for all dorm rooms when I was there (2003-2009). So I think Universities would be a good place to run this pool because they had an extremely fast connection between peers, and a fat pipe to the internet to handle the multiple vaults running at a time. Probably get banned from the network due to the bandwidth usage though.

The second thing that comes to mind is HFT firms. Assuming they have some fast pipes between them and secondary pipes to the internet, they could pool their resources (CPU, memory, disk, and network). The secondary network would allow them to do the pooling with minimal trust too.

As Usual
Information on the design of the network is difficult for me to piece together, since it’s scattered everywhere. So point out faults and better documentation resources! I’m overdue at code grokking though - so that as well.

fergish · August 11, 2014, 2:10am

I’m not in there on some of the deeper tech details here, but I still don’t see what you’re trying to do by pooling, and it seems I should be able to get it even without the higher tech data. Take the college dorm circumstance you lay out, where everybody has fast connections between each other and the web. What is the benefit to any vault to be connected to others in some interdependent way? Wouldn’t they each be able to pass their gets out directly at least as well as through some sort of pool?

vtnerd · August 11, 2014, 3:53am

The fundamentals of bitcoin and safecoin seem no different to me. Bitcoin is about proving CPU time, and safecoin is about proving hard-disk storage. Mining in both systems is based on probability, and should get more difficult over time. Since safecoin wants storage (like bitcoin wants CPU power), the payout probability is designed to increase when someone is providing large amounts of reliable storage. Like with bitcoin, eventually the odds of ever receiving coin should become small, at which time it makes sense to pool resources. Each person gets a lower payout (a fraction of a safecoin) but gets a higher probability of getting some coin from the network (since the payout probability is based on the amount of reliable storage).
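
The trade-off described above (smaller payout, higher chance of seeing any payout) can be sketched with basic probability. This assumes every vault in the pool wins a round independently with the same probability `p_win` and that winnings are split equally; both are simplifying assumptions, not the actual safecoin rules.

```python
def chance_of_any_payout(p_win, pool_size):
    """Probability that at least one of `pool_size` vaults wins this round."""
    return 1.0 - (1.0 - p_win) ** pool_size


def expected_income_per_member(p_win, reward, pool_size):
    """Expected per-round income when pool winnings are split equally."""
    expected_pool_income = pool_size * p_win * reward
    return expected_pool_income / pool_size


p, reward = 0.001, 1.0
for n in (1, 10, 100):
    print(n, round(chance_of_any_payout(p, n), 4),
          expected_income_per_member(p, reward, n))
```

Under these assumptions the expected income per member does not change with pool size; only the frequency (and size) of payouts does, which is the same variance-reduction argument made for bitcoin pools.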

I can come up with two differences right now:

People cannot store data on the network without POR.
It could be economical to not store infrequently accessed data.

People cannot store data on the network without POR
This feature might make pools unnecessary because people could lease their extra POR. So individual farmers should care less about mining than on the bitcoin network. However, if safecoin gains in value like people hope AND storage costs remain cheap, then mining coin could still be hugely profitable. At which point pooling your available storage in intelligent ways with others could make sense.

It could be economical to not store infrequently accessed data
If the only penalty for not being able to provide a resource is a hit to the POR, then it would make sense to over-subscribe your storage (claim more than you have) and take the infrequent hits to the POR. Pooling with multiple partners amplifies this effect because then you run multiple vaults. Additional vaults mean a higher probability of getting coin (more PUT/GET requests). More importantly, there is a larger data set for identifying infrequently accessed data. For example, if you had 10 vaults banded together, vault 9 might be storing a higher amount of infrequently requested data. It could therefore erase that data, and provide storage for vaults 0-8 instead. This would increase the POR on the vaults that are paying out, and decrease the POR on the vaults that are not paying out. It should still be an advantage. This is somewhat dependent on networking between these vaults and the internet, but it seems plausible to me.

Now, the whitepaper (which is only partially trustable apparently) mentions a merkle hash on the data stored in the vault. It’s not mentioned explicitly anywhere in the documentation, but I assume the vaults storing the duplicated data are supposed to verify a valid mined request. It would be nice to see how the merkle hash is constructed (probably yet another thing I have to find in the code), because if the tree is static each round then you could prune some leaves (delete the data but keep the hash) and still get a valid root hash. I think you would have to mix the structure of the tree so that it had to be re-generated each time. These things may have been done already, but it’s hard to find information on it.
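
To make the pruning concern concrete, here is a minimal sketch with a generic SHA-256 binary Merkle tree. This is not MaidSafe's actual construction (which I have not verified); the chunk contents and helper names are made up for illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaf_hashes):
    """Fold a non-empty list of leaf hashes pairwise into one root hash."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3"]
honest_root = merkle_root([h(c) for c in chunks])

# A cheating vault deletes chunk-2 but keeps its 32-byte hash.
kept_hash = h(chunks[2])
stored = [chunks[0], chunks[1], None, chunks[3]]
cheat_root = merkle_root([kept_hash if c is None else h(c) for c in stored])

print(honest_root == cheat_root)
```

If the leaves never change between rounds, the root alone cannot distinguish a vault holding the data from one holding only the hashes.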

janitor · August 11, 2014, 10:09am

What does?
At the end of my paragraph you quoted I said “but as far as I know that’s not how SAFE works”.

I know… I think it’s irrelevant. Why would I want to refuse to store my data on a cheaper or better node?

I doubt this factor will matter.
You quote your fastest (to the Internet gateway) latency as 2 milliseconds. For anywhere useful (like some other SAFE node on the Net) it’s probably more than 10ms (20ms?), which is much slower than the access speed of a HDD.

That seems very risky, because by false reporting you’re almost guaranteed to be discovered (even if the owner attempts to delete their file, can you confirm it was deleted when you don’t have it?). And from the WP:

If people were to try and game the system by providing farmers to store data and then switch them off, they will simply remove their ability to earn. At some time in the future it is envisaged the network will be able to detect such data and remove it from the network.

While this “future” hasn’t arrived yet, it probably will, so by false reporting and without the ability to regenerate that data at will, I don’t see how one possibly can win in the end…
One could occasionally read random selected bytes from his files stored on the SAFE network to make sure they’re there. I think cheaters would quickly be discovered and penalized.
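
A minimal sketch of that spot-check idea, assuming the owner still has a local copy to compare against. `fetch_range` is a hypothetical callback standing in for a network read; it is not a real SAFE API.

```python
import random


def spot_check(local_copy, fetch_range, samples=8, width=16, seed=None):
    """Compare random byte ranges of a local copy against a remote store.

    `fetch_range(offset, length)` is a hypothetical callback that returns
    bytes from the network, or None if the data is missing.
    """
    rng = random.Random(seed)
    for _ in range(samples):
        offset = rng.randrange(0, max(1, len(local_copy) - width + 1))
        expected = local_copy[offset:offset + width]
        if fetch_range(offset, width) != expected:
            return False
    return True


data = bytes(range(256)) * 4
honest = lambda off, n: data[off:off + n]
cheater = lambda off, n: None
print(spot_check(data, honest, seed=1), spot_check(data, cheater, seed=1))
```

In practice an owner who wants to free local space would have to keep hashes of selected ranges rather than the file itself; that variant is not shown here.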

OT: If SAFE and similar approaches become at all successful - which I think they will - telcos will start charging for bandwidth/QoS or else networks will get very slow (while telcos won’t want to invest without being able to charge for that), which would relatively quickly change the economics of distributed storage.

fergish · August 11, 2014, 2:23pm

True. Apologies.

janitor · August 11, 2014, 3:48pm

NP. Thanks for that link, by the way, it’s very useful!

dyamanaka · August 11, 2014, 4:49pm

I plan to be a professional farmer on the SAFE Network. But it is very hard to predict public behavior of a system that has not been launched yet. We have not defined rules for: vault ranking, farming rate, penalties, storage tampering, etc. I appreciate @vtnerd’s ideas on how to make a pool possible.

Here’s what I do know.
Blockchain pooling offers a more stable rate of return. It was the natural evolution because of the blockchain payment structure. IMO, this pay structure eliminates smaller resource contributions. Regardless of the reason behind it, this is the reality of the result. A blockchain miner is left with… go solo and probably never see any coin, or join a pool and see a tiny portion of coin.

ANT pooling may be possible and could improve SAFE Network latency. If this brings a better user experience, it should be encouraged. My one concern is dominating other farmers to the point they are not able to earn Safecoin for resources provided. I’ve always liked the motto “equal pay for equal work.”

Safecoin Pay Structure
According to my understanding, Safecoin distribution involves only 4 vaults competing for every 1 MB chunk. If this remains true, then an individual farmer should remain competitive because he/she is only competing against 3 other vaults, not the pool. If 1 of the 4 vaults belongs to a large pool, the pool’s resources should not increase that vault’s ability to dominate the other 3 non-pool vaults. It still remains a race between 4 vaults. Depending on the 4 vaults, geographic proximity, drive response time, bandwidth availability, ISP latency, and any other factors that determine a successful GET response, each individual farmer should get a chance to provide resources.
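
A toy simulation of that 4-vault race, assuming the fastest response to each GET wins and that latency is a fixed mean plus uniform jitter. The numbers and the function name are illustrative assumptions only.

```python
import random


def race_wins(mean_latency_ms, rounds=10_000, jitter_ms=5.0, seed=0):
    """Count wins per vault when the fastest response to each GET wins."""
    rng = random.Random(seed)
    wins = [0] * len(mean_latency_ms)
    for _ in range(rounds):
        samples = [m + rng.uniform(0, jitter_ms) for m in mean_latency_ms]
        wins[samples.index(min(samples))] += 1
    return wins


print(race_wins([20, 20, 20, 20]))   # four comparable vaults
print(race_wins([10, 20, 20, 20]))   # one vault consistently 10 ms faster
```

With comparable latencies each vault wins about a quarter of the races regardless of who owns it. The concern only materializes when one vault is faster than the others by more than the jitter, which is the "exclude the other 3 vaults" case.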

If a pool farmer is able to exclude the other 3 vaults on every GET request, regardless of where the request originated, I would be concerned. I did write a theory on how one might do this, but it involves a large amount of resources, multiple geographic locations and storage tampering. I hope it is very unlikely if not impossible.

happybeing · August 11, 2014, 6:28pm

This is only true for one particular chunk. Farmers are competing against each other for rank, and it is rank that will determine who gets to store the most data, and the most farming opportunities.

Rank will be based on a number of factors according to some fiendish MaidSafe algorithm, likely to incorporate: uptime (percentage availability), reliability (percent of requests responded to which will depend on several other factors such as connectivity and uptime), trustability (adherence to expected behaviour as judged by its neighbours), and response time.
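
Purely to illustrate how such factors might combine, here is a hypothetical weighted score. The weights, the `slow_ms` cutoff, and the formula itself are all assumptions for the sake of example; the real ranking algorithm has not been published.

```python
def rank_score(uptime, reliability, trust, response_ms,
               weights=(0.3, 0.3, 0.3, 0.1), slow_ms=1000.0):
    """Hypothetical rank score in [0, 1]; NOT the MaidSafe algorithm.

    uptime, reliability and trust are fractions in [0, 1];
    response_ms is mapped to a speed factor that reaches 0 at `slow_ms`.
    """
    speed = max(0.0, 1.0 - response_ms / slow_ms)
    w_up, w_rel, w_trust, w_speed = weights
    return w_up * uptime + w_rel * reliability + w_trust * trust + w_speed * speed


solo = rank_score(0.95, 0.98, 1.0, response_ms=200)
pooled = rank_score(0.95, 0.98, 1.0, response_ms=20)   # only latency improves
print(round(solo, 3), round(pooled, 3))
```

In a scheme like this, pooling only helps through the terms it can actually move (here, response time), so its effect depends entirely on how heavily those terms are weighted.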

So if factors such as these, which affect rank, can be improved by pooling, we can expect to see it. If not, woo hoo.

dyamanaka · August 11, 2014, 6:57pm

I thought “available_space” determines how much data can be stored on a vault? This amount is stated by the vault owner.

Ranking
Unless the Network is born with rank 4+ vaults from the start, it must be able to store data in vaults at the lowest rank (rank 1). I have not seen any published rules on ranking so, it’s based on discussions only. I would imagine a vault rank 1 should be able to store up to the max amount of their “available_space.”

Because Vaults may also serve other personas (roles), their higher ranking grants them more responsibilities. Again, I have no idea what those are at each rank level. But it should not hinder a new vault from being able to store data from the start.

happybeing · August 11, 2014, 7:39pm

I think this is probably a wrong assumption - at least once the network is established.

I haven’t collated all the snippets I’ve picked up from discussions, but I recall David Irvine on this one, and think he suggested that new vaults joining the network won’t be trusted immediately and that they would have to serve time before being “one of the four” in charge of any chunks. I’m not sure, but this might involve them being given data, but still not one of the four.

Anyway, I am largely speculating. I’m just responding because I think you might be making assumptions that from memory, I think are not valid. I agree we won’t know until the algo is decided (although I suspect David has at least the main points of this well established pending testing), and of course not even David will know how it performs until it goes live.

BTW there may well be quite a few high ranking nodes at launch. MaidSafe have purchased capacity to help seed the network, and since they will be able to vouch for the trustworthiness and performance of the capacity they manage, it makes sense they bump rank there at least while things get going. I may be wrong of course, again, speculating

dyamanaka · August 11, 2014, 8:01pm

I agree a node can not be trusted immediately, hence they start with the lowest rank, having the least amount of responsibilities. But the most basic responsibility of the vault is to store data.

We are in disagreement over the function of vaults in relation to their ranks, which is fine. I don’t know what exactly they are either. I’ll leave it at that.

happybeing · August 11, 2014, 8:24pm

And there will be vaults that are not trusted to store data, or which are only used to store what David calls “offline copies”. To start with, or during phases of high growth, it will no doubt be easier for a vault to obtain some “online chunks” and compete with its three partners for farming opportunities. But even so, a higher rank node will end up with more “online chunks” than a lower ranked node, until it is limited by its own capacity.

There may be a balance that helps lower ranked nodes get a look in though. Imagine if all the data could be handled by max rank nodes without any others getting a look in. I’m not sure if the network would still try to spread the love around, but intuitively it seems necessary and better for the health of the network and for encouraging more small farmers.

fergish · August 12, 2014, 12:36am

Okay, some of the traffic generated on this thread has gotten under my mental skin and required me to hone my analytical pencil.

So I’m going to go out on a limb here and make a very bold, absolute statement and then set about proving it. (I’m hoping @dirvine will speak my correctness/error here.)

First, though, I need to make a couple things clear:

a. I’m not a programmer or systems person. I’m only marginally computer savvy.

b. Nevertheless, I’ve become pretty well versed with the guiding philosophy and a lot of the particulars behind the Safe network and have developed some good conceptual models of how it will have to work.

So the question before us is “Will there be farming pools on the SAFE network?”

The answer is clearly, unequivocally, absolutely “No, not in any meaningful way. In fact, any effort to create a pool of farming nodes will make it impossible to farm at all.”

Here’s why: The whole philosophy of the SAFE network is that each node follows a very clear and relatively simple (if complex) set of instructions. A certain type of input comes in, the node does a specific action regarding it. There is NO discretion. Every action is predictable according to the core programming. What’s more, all other nodes are operating on the same set of instructions. What’s more, all actions of a node are being monitored by a group of other nodes, which will note any departure from the expected actions and downgrade the offending node in trust level so that it is less and less trusted, and more and more marginalized, until it is ignored completely. Long before it is bumped completely, it will have lost any opportunity to earn safecoin.

While the individual user will be enabled to do fantastic things using his or her discretion, the nodes composing the network have no latitude to do anything unpredictable.

Therefore, for farming pools to be possible, they would have to be provided for at the base-level core programming, which they are not. (Though, in a way, the entire network IS already designed as ONE giant pool and awards are directly proportional to contribution. But sub-pools are not accounted for in the core programming.) Any group of nodes that exhibits human influence will be marginalized, by design. If this were not the case, the network as envisioned would not be possible. So nodes cannot collaborate to do anything different than any other nodes.

Therefore, anything like pooling could only be possible at least one level up from the core—i.e., AFTER farming has already occurred. For it to be otherwise would be a violation of the basic philosophical design of the network and it could not function in an autonomous way.

So, if those who create nodes wish to direct any safecoin they may earn to a specific address to be divided up according to an agreed-upon scheme, cheers. But that’s not a farming pool; that’s income redistribution, which is absolutely fine if done voluntarily. One can also run charity nodes (not a bad idea!).

People might wish to co-own nodes or share a super-fast internet connection, or . . . or . . . , but that’s at a different level and has no connection to a pool in the sense we’ve come to understand the term in the cryptocurrency space.

Nuf said.

vtnerd · August 12, 2014, 2:10am

Are you suggesting that the biggest factor in performance will be the time it takes to deliver to the end user? That would affect things, because crossing an oceanic cable is in the 200ms range (I think). So my proposed pool might race to beat a rotational disk, but it wouldn’t matter if that disk is 50 miles from the actual target while I’m thousands of miles away.

If the formula rewards quick access (likely), then it will make sense to have a quick response time regardless. If my group of low-latency machines can produce chunks faster on average than peers, I should still have a better chance at mining coin. That said @janitor and @dyamanaka are pushing me to think of creative ways to have decentralized pools to handle the geographic issue. I’ve got a few more ideas. I will type up some more thoughts on this after I’ve thought about it more.

I was proposing taking hits to my calculated score by not using space for infrequently accessed data. If a tossed chunk were to become frequently accessed for some reason (I think this is the major risk you are talking about), I could retrieve it from another node on the network that stored it. This is assuming I could replay an existing GET request with a different return address. Might be worth preventing that if it hasn’t been done already OR requiring a merkle hash be generated in a specific way each time for a mine request. The merkle hash would have to be generated in such a way that pre-pruning of leaves would not be possible.
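
A minimal sketch of the "generated in a specific way each time" idea, assuming the challenger sends a fresh nonce per round that is mixed into every leaf. The single outer digest stands in for a full Merkle root, and the nonce values and helper names are made up for illustration.

```python
import hashlib


def salted_leaf(nonce: bytes, chunk: bytes) -> bytes:
    """Leaf hash that depends on this round's nonce and the chunk bytes."""
    return hashlib.sha256(nonce + chunk).digest()


def respond(nonce, chunks):
    """Digest over all salted leaves; stands in for a full Merkle root."""
    outer = hashlib.sha256()
    for chunk in chunks:
        outer.update(salted_leaf(nonce, chunk))
    return outer.digest()


chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
round1 = respond(b"nonce-round-1", chunks)
round2 = respond(b"nonce-round-2", chunks)

# A leaf hash cached before the data was deleted is useless in a new round.
stale = hashlib.sha256(chunks[2]).digest()
print(round1 != round2, stale != salted_leaf(b"nonce-round-2", chunks[2]))
```

This only forces the responder to have the bytes at challenge time; it does not stop a vault from fetching a tossed chunk from another node first, which is the replay issue raised above.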

vtnerd · August 12, 2014, 2:13am

Well, hopefully I’m not wasting bytes typing all of this crazy stuff up then.

I’ve seen ANT mentioned by you ( @dyamanaka ) several times now. Is this the new acronym for the autonomous network described in David Irvine’s paper (from maidsafe.net), or is it something different?

fergish · August 12, 2014, 2:32am

vtnerd,

Before waxing too verbose, I highly recommend looking over forum topics for answers. One post or whitepaper is not enough to get an educated view of what the network design is, and so throwing out lots of ideas based upon partial data and other models just tends to get yourself and others confused. Check this forum for ANT tech and you can get the whole scoop. Look above to my post about pooling and note David Irvine’s like and make of it what you will, but don’t ignore what passes. Too much raw erudition without real interaction starts to look like trolling. Not an accusation, just an observation from a long-term denizen.

dyamanaka · August 12, 2014, 3:40am

It’s the same, just enhanced. The original reference to Ant Technology is from MaidSafe. AFAIK, it came about while they were doing press interviews, talking about the Ant Lady and her research on ant (the insect) behavior. I created a topic called Ant Technology to see if we (the community) could make an acronym and start using it as our reference as to how the SAFE Network functions. It eventually led to a poll and concluded with the majority vote for: Autonomous Network Technology. I linked the topic so you can read up on it.

We have a lot of brilliant people on this forum with crazy ideas, myself included. I would not ask anyone to waste their time exploring if they weren’t convinced their idea was possible. My personal desire is to help innovate, evolve, and enhance Project SAFE. Sometimes that means challenging current ideas. I see your ideas as beneficial because, like it or not, others will try to make a pool, and if I understood how it’s possible, it may also help me protect the Network from possible abuses and/or exploits. On a positive note, it may even increase Network performance.

I’ve spent a good amount of time talking to everyone, and debating with them. As long as the conversation remains civil, I think it’s healthy to hash it out. We all have a common goal but differ in ways to accomplish it.

If you want to explore your ideas with me, you’re welcome to do so.

fergish · August 12, 2014, 3:45am

I feel properly (and appropriately) chastised.
