Wouldn’t it be easier to require a payment that is the sum of all bids in the close group?
I suppose one question this raises is what is to stop the system becoming one where the nodes set the price and we end up with what the market will bear, instead of the lowest reasonable pricing that enables access to storage for ALL and not just those rich enough. And it allows one or more nodes to profiteer off people with, say, 10 times the price of the others.
To me we need at least the group agreeing on a price that all nodes charge for that chunk. Be it an average price, or a larger group regularly agreeing on a price that holds until the next pricing consensus (be it a delay or a set number of events). That way one node trying to profiteer will not succeed the way it wants to.
Much harder to have all 8 nodes in the group trying to profiteer, and a lot less likely for all nodes in a larger group. If all node operators want to charge too much then let it be, but the network will obviously die from no one being willing to pay the price.
We should have made this more clear. The client takes the best average (cheapest) from a majority of nodes. So outliers can be ignored.
Fantastic update Maid team! Thanks for the hard yards this week. Good to see some progress on payments too.
Seems/feels we are closer than ever.
Cheers
It’s still early doors here. We’re very much just getting something in place which we can then have at. So Qs here will be good!
We’ll be sampling across the network, so folk could artificially inflate prices, but the overall trend would need to be in that direction for it to have an impact. (Each close group atm supplies 8 quotes, and we take the 5th in ascending order… so 3 of the nodes would have to be at it for it to even register, e.g.)
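To illustrate the quote selection described above: a minimal sketch, assuming the client simply sorts the 8 quotes it receives and pays the 5th-lowest. The function name and numbers are illustrative, not the actual implementation.

```python
def quote_price(quotes):
    """Pick the price to pay from a close group's quotes.

    Sort ascending and take the 5th quote (index 4), so at least 3
    of the 8 nodes must inflate their quotes before the chosen
    price moves at all.
    """
    assert len(quotes) >= 5
    return sorted(quotes)[4]

# One profiteering node quoting 10x the others has no effect:
print(quote_price([2, 3, 3, 4, 4, 5, 5, 40]))  # -> 4
```

A single outlier (the 40 above) never changes the result; only a coordinated majority of the group can push the price up.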
No, atm, we’re just going for simplicity to get us off the ground.
If the price is one they’ll accept, yeh.
Right now, it’s by a close group off of a dbc id. PR is pending to do sampling at startup, and intermittently update it. We will of course be tweaking how often that needs to happen etc to get closer to the optimum. And if we can get something simple to go for a lower bound, then we’ll do that. (Atm though, that involves nodes signing, getting PKs, thresholds and a lot of wee moving parts that would probably make it more unreliable than useful, I think).
Close groups change, and then we’d have to be verifying sigs and bids. Right now we just say something like “will you accept this?”, and that’s it. They can store it, or choose not to. If enough other nodes take it, then it’ll get replicated anyway and they’ll be forced to store it. (“Force” here imagines we have fault detection, which we don’t yet, but will.)
Okay this may sound like a stupid question but I have never understood XOR space or “close groups” or any of that so… here goes. I assume those addresses are IP addresses. If not then please specify and redefine. If so, how are those IP addresses acquired and sorted into a range, and how does that relate to the IP scrubbing that is used in SAFE security protocols? Basically I’m confused: if IPs are scrubbed from the network, how are they then used in sorting data costs? I’m sure there is a logical explanation but… it wasn’t clearly explained here and I didn’t get the memo.
They are not IP addresses but 256 bit randomly allocated binary addresses, one for each node, set when they start up.
xor space is a way of calculating a distance between those addresses (nodes).
Another would be to count how many bits two addresses have in common, for example. Forty bits the same, distance equals forty, and so on.
But a much better measure is to calculate the ‘xor’ (exclusive OR) value of the two addresses, which gives you another 256 bit value which is the ‘xor distance’ between the two addresses.
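A tiny sketch of that calculation (illustrative only, with short addresses instead of 256-bit ones): XOR-ing two addresses gives a third value, and treating that value as a number is the “xor distance”. Note the useful properties: a node is at distance 0 from itself, and the distance is symmetric.

```python
def xor_distance(a: int, b: int) -> int:
    """XOR distance between two node addresses (here small ints
    stand in for 256-bit addresses)."""
    return a ^ b

a = 0b1100
b = 0b1010
print(bin(xor_distance(a, b)))   # -> 0b110
print(xor_distance(a, a))        # -> 0 (every address is closest to itself)
print(xor_distance(a, b) == xor_distance(b, a))  # -> True (symmetric)
```

Unlike the bits-in-common measure, xor distance gives every pair of addresses a unique distance, which is what makes sorting nodes “closest to an address” well defined.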
If you want more detail search. There will be much more detailed explanations available including examples to illustrate what ‘xor’ does in practice.
If I read it correctly he was asking how does the client know the IP address of the nodes that need to store the record.
Basically the client knows the XOR address of the record, but how does the client find the IP addresses of the nodes in the group.
Yes the question wasn’t clear but I think that is the essence of the question, the knowledge he was after
Will the data still be spread randomly enough around the network so it doesn’t get skewed and end up at a few entities, which, if they go down, will risk the network?
I like the proposed approach because it is supply-and-demand based, doesn’t need artificial incentives, human tinkering or tricky algorithms, and it will handle price shocks well. Just worried about data getting spread evenly across the network and to many different entities. But it feels like if you take an average of a group, excluding outliers, then that would give a good random spread of data across the network.
With such a robust approach it now gives me the feeling that nothing will stop the Safe Network from succeeding. My only further wish is that there were some way of accepting nodes which might temporarily lose connection. The only scenarios I am a little worried about are things like solar flares, or countries like Russia cutting their internet off from the rest of the world, but hopefully the network will survive such events.
I don’t think there is any scrubbing. At one point, there were going to be ‘hops’ like Tor uses to hide IP, but that’s out of the picture for now as it slows everything down.
This is good because not everyone will want to use a hop layer of anonymity, and it would slow down the network as a whole by a considerable amount; for many users that would dramatically reduce its utility.
Such hops can be added later though and could conceivably be added within the network by a third party proxy provider too … or possibly you could operate your client or node via a regular proxy (VPN) of some sort to achieve the same effect.
Also XOR addresses are linked to IP addresses on the network - they must be or there’d be no way to communicate.
– all just my opinion, I’m not an expert on the details here, but my understanding. Hope that clarifies things for you.
Idk. Pretty sure I’d seen mention of relays for nodes connecting from home through a NAT. Maybe that is unrelated to IP scrubbing though. Perhaps I’m thinking of the proxy nodes you mention, and that doesn’t relate to relays.
Yes, you are right. The relay is more a hole punch internal. We can ignore that and just imagine all nodes can connect to any other node. Clients cannot (yet) connect to any client.
Proxies though are possible, even service providers who perform vpn and or tor type connectivity will certainly be possible.
Essentially yes, though I wasn’t clear if it was the XOR address or IP address being referred to. This is also why I ask these questions: to point out these clarification issues. In short I want to know how the info gets from A to B.
I am unsure too
Back in the day it did it by hopping. That is the client sent the chunk (record) to the node closest to the final set of nodes that will store it. Repeat node sending chunk to a node closer, until final nodes reached and chunk is stored.
The client connects to many nodes (20 I think now) and when hopping was the method then one of those 20 is closer to the target. And by hopping the chunk will finally reach the group of nodes to store it.
But without hopping then I also do not know the method used.
The connection between a XOR address and a node’s IP address is fairly simple. Each node is known in the Safe Network by a XOR ID, and for internet routing it has a normal IP address. In order for the client to talk to the nodes it still has to use IP addresses, and the nodes a client is connected to are stored in a table (XOR ID, IP address) within the client.
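A minimal sketch of that table, under the assumptions above (names, IDs and endpoints are all made up for illustration): a plain mapping from XOR ID to IP endpoint, with XOR distance used to decide which known peers to contact for a given target address.

```python
# Hypothetical peer table: XOR ID -> IP:port endpoint.
peers = {
    0b0011: "203.0.113.5:12000",
    0b0110: "198.51.100.9:12000",
    0b1100: "192.0.2.44:12000",
}

def closest_peers(target: int, k: int = 2):
    """Return the k known peers whose XOR IDs are closest to target."""
    return sorted(peers, key=lambda nid: nid ^ target)[:k]

# Look up who to contact for a record at XOR address 0b0111:
for nid in closest_peers(0b0111):
    print(bin(nid), "->", peers[nid])
```

So XOR IDs drive the routing decision, while the stored IP endpoints are what the client actually opens connections to.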
This is roughly how I think it used to work, but not any more (see David’s reply below):

- each node has a 256 bit address and knows the 256 bit addresses and IP addresses of its close neighbours
- when data is sent it has an ultimate destination, which is a 256 bit address
- the data is passed from one node to another until it reaches the destination, with each node passing the data to the node from its list which is closest to the destination
So we don’t hop any more.
The client or node will ask a group of nodes it knows are “xor closest” to the destination. They reply with a bunch of node IDs (xor names) and IP:port combinations. From those it then sorts by closeness to the destination and chooses the top X (8) nodes.
This repeats until the results are the same in 2 iterations, i.e. we cannot find any closer nodes. At that point we are speaking to the correct close group.
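The loop described above can be sketched like this. It is a simplified illustration, not the actual implementation: `query(peer)` stands in for asking a peer which node IDs it knows near the target, and small ints stand in for 256-bit addresses.

```python
def iterative_lookup(target, seed_peers, query, k=8):
    """Iterative Kademlia-style lookup, simplified.

    Keep the k XOR-closest candidates seen so far; ask each of them
    for closer nodes; stop when two iterations in a row return the
    same set, i.e. no closer nodes exist.
    """
    candidates = sorted(seed_peers, key=lambda n: n ^ target)[:k]
    while True:
        learned = set(candidates)
        for peer in candidates:
            learned.update(query(peer))
        closest = sorted(learned, key=lambda n: n ^ target)[:k]
        if closest == candidates:   # same result twice: converged
            return closest          # this is the close group
        candidates = closest

# Toy network of 16 nodes where every node knows every other node:
nodes = list(range(16))
group = iterative_lookup(9, seed_peers=[0, 15], query=lambda peer: nodes, k=4)
print(group)  # -> [9, 8, 11, 10]
```

Note the close group of address 9 here is {9, 8, 11, 10}, not the numerically nearest IDs: closeness is measured by XOR, as described earlier in the thread.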
###############
Why did we used to use recursive (hop) approach instead of this (kad default) iterative approach?
- This was mainly because kad as a uni experiment was great, however it had no NAT traversal and every node was considered directly connected.
- Nodes connected, communicated and then disconnected. The reason was open sockets: we could not keep more than 500 sockets open at once or many modern OSs would crap out.
So we had this approach of recursive hops. The reasons were:
- We could hole punch a few nodes and connect primarily to our close group
- We could get away with fewer connections, thereby allowing nodes to connect to just enough nodes (and no more)
Now though, with stream multiplexing we can connect to thousands of nodes at a time. Also we can hole punch just one connection and be OK (mostly; networking nonsense, and it gets quite deep).
So with stream mux we can use default kad behaviour, which is much much simpler than a recursive network.
i.e. The tech was not there to do real default kad and now it exists. So we don’t need sections and prefix-based network groups. We can be much more efficient.
Is a side effect of this to expose IP addresses much more widely?
Because with hopping, a node only gets to know about the IP of the nodes it forwards messages to, whereas now a node can rapidly discover large numbers of IP addresses by querying one group, then the next and so on.
If so, doesn’t this make censorship of the network by state firewalls or ISPs easy?
EDIT: it feels like it makes a number of attacks much easier. For example, targeted DDoS of files by identifying the groups close to a single chunk of a targeted file.
A related targeting risk is infection by malware of IPs hosting a chunk from critical files. Remote malware infection seems to be getting much easier and with the ability to target all the node hosts of a given chunk, seems to be a cheap way to sabotage a large number of files selected for a given purpose at a crucial time.
It does.
It can do, but we can use tor etc.
This cannot happen, as clients and nodes check the chunk is valid (i.e. hash of content == name they requested).
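A sketch of that self-validation check, assuming (for illustration only) a SHA-256 content hash; the real network may use a different hash or encoding, but the principle is the same: a chunk’s name is derived from its content, so tampered content can’t masquerade under the requested name.

```python
import hashlib

def chunk_name(content: bytes) -> str:
    """A chunk's address is the hash of its content (self-validating).
    SHA-256 is an assumption here, used only to illustrate the idea."""
    return hashlib.sha256(content).hexdigest()

def verify_chunk(requested_name: str, content: bytes) -> bool:
    """Accept a received chunk only if its hash matches the name asked for."""
    return chunk_name(content) == requested_name

data = b"some chunk content"
name = chunk_name(data)
print(verify_chunk(name, data))          # -> True
print(verify_chunk(name, b"tampered"))   # -> False
```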
It’s kind of not a very good safety measure though. i.e.
Say we have 1 million groups.
An attacker can create 1 million keys and request to join each group.
Bottom line is we need better than obscurity to protect nodes, but tor and many others are also working on this one.
This file storage payment mechanism appears to be inconsistent with reality. A pay once for indefinite storage is an SFN (something for nothing) system due to the indefiniteness of data life.

Since this is not supposed to be a big-data venture, it would make a lot more sense to be free if acceptable-transfer-rate storage is provided by the requester based on their desired redundancy. If only 4 copies are desired, then 1/4 the amount of storage provided will be available. If they cannot provide this directly, they could use an internet service to accomplish this for them. Otherwise, some kind of subscription would be required to store data based on desired redundancy to compensate for the perpetual nature of the storage.

To prevent commercial data farmers from dominating the system as their business (centralization of storage), a limit on the amount of provided storage vs. requested storage could be imposed per account (this is a major self-regulatory contribution), or restrict them to handling subscription storage only (data warehouse accounts, so they are known to “be in the data business”).

There needs to be a quid pro quo for the life of the data. Failing to maintain available storage to compensate for used storage would need to revert to subscription fees, with some grace period. Access charges will not work in this scenario since data can be stored then abandoned, which incurs perpetual costs. The system must be sustainable.

This kind of scenario allows everyone on the network to make a contribution to controlling storage costs, which cannot be avoided (though some may be offered without compensation), and also to controlling profiteering and storage centralization. A provide-to-use mechanism allows the system to self-regulate and seems to be (IMHO) consistent with a distributed system. The subscription part becomes the difficult component for equitable distribution of subscription fees, which could only be available for actual storage, not capacity.
This then self-regulates too much unused capacity being for sale (no compensation). Maybe I’m missing something here, but a system cannot run on altruism indefinitely and experience large growth (possibly exponential).