Safe Network Dev Update - August 27, 2020

JPL · September 2, 2020, 12:01pm

So the hash of the name is the address? How do you avoid collisions?

dirvine · September 2, 2020, 12:08pm

If it exists you cannot create it. A bit like DNS.

There are other approaches as well; hashing owner + type, for example, is much harder to collide. But collision avoidance is as simple as trying to store.
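A minimal sketch of "collision avoidance is as simple as trying to store", assuming an in-memory dict stands in for the network and SHA3-256 stands in for however the network actually derives a 32-byte XorName. `ToyNetwork`, `put_if_absent` and `address_for` are hypothetical names for illustration, not Safe Network APIs.

```python
import hashlib


class AddressTaken(Exception):
    """Raised when something is already stored at the requested address."""


class ToyNetwork:
    """Hypothetical stand-in for the network: a map from 32-byte address to data."""

    def __init__(self) -> None:
        self._store: dict[bytes, bytes] = {}

    def put_if_absent(self, address: bytes, data: bytes) -> None:
        # Like DNS registration: if the address already exists, creation fails.
        if address in self._store:
            raise AddressTaken(address.hex())
        self._store[address] = data

    def get(self, address: bytes) -> bytes:
        return self._store[address]


def address_for(owner: str, type_tag: int, name: str) -> bytes:
    # Assumption: hash owner + type tag + name together, so two different
    # owners choosing the same name land at different addresses.
    material = f"{owner}|{type_tag}|{name}".encode()
    return hashlib.sha3_256(material).digest()


net = ToyNetwork()
addr = address_for("alice", 1500, "notes")
net.put_if_absent(addr, b"first")
try:
    net.put_if_absent(addr, b"second")  # same address again: rejected
except AddressTaken:
    print("collision detected, pick another name")
```

The caller learns about a clash only by attempting the write, which is the whole of the avoidance strategy described above.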

JPL · September 2, 2020, 1:03pm

I get that with say a person selecting a URL for a website, but what if an app wants to generate a mutable data item? Does it just have to keep on trying til it finds an address that hasn’t been used, or must app developers be persuaded to always use very long identifiers for data they create to lessen the risk of collisions?

david-beinn · September 2, 2020, 1:06pm

Another quick question on this subject. Is there always just one level of xor addresses, or can there be multiple levels differentiated by type-tag?

So, for example, is it possible to have

123…xyz : type-tag 1500 and
123…xyz : type-tag 1501

as two separate addresses on the network? Or is the type-tag just to give additional information about how the data should be processed?

bochaco · September 2, 2020, 1:09pm

Currently the name is the address in types like Sequence (as it also was in the previous AppendOnlyData), and it’s chosen randomly if the user doesn’t provide a xorname. E.g. at the moment in the CLI, providing such an optional xorname is only possible for Sequence creation: $ safe seq store --xorname <xorname hex> <data>. You should get an error if you have a collision; eventually we could have the CLI retry with a new address in such scenarios.
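The behaviour described here (use the supplied xorname if there is one, otherwise pick one at random, error on collision, possibly retry later) could look roughly like this. `store_at` and `store_with_retry` are hypothetical helpers over a plain dict, not the actual `safe` CLI or its API.

```python
import secrets
from typing import Optional


class AddressTaken(Exception):
    pass


def store_at(store: dict, xorname: bytes, data: bytes) -> None:
    # Hypothetical store primitive: refuses to overwrite an existing address.
    if xorname in store:
        raise AddressTaken(xorname.hex())
    store[xorname] = data


def store_with_retry(store: dict, data: bytes,
                     xorname: Optional[bytes] = None,
                     max_attempts: int = 5) -> bytes:
    """Store at the caller's xorname if given, otherwise at a random 32-byte one.

    Only randomly chosen names are retried: if the user asked for a specific
    xorname and it is taken, the error is surfaced rather than silently
    storing the data somewhere else.
    """
    if xorname is not None:
        store_at(store, xorname, data)
        return xorname
    for _ in range(max_attempts):
        candidate = secrets.token_bytes(32)
        try:
            store_at(store, candidate, data)
            return candidate
        except AddressTaken:
            continue  # astronomically unlikely with 256-bit random names
    raise AddressTaken("gave up after repeated collisions")
```

Retrying only in the random case keeps a user-chosen address meaningful: a clash there is a real conflict, not bad luck.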

happybeing · September 2, 2020, 1:51pm

This is how it used to be (yes, each tag refers to a different XOR address space), but we’ll have to wait and see how the new data types will work.
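To illustrate the old model described here, where the type tag is part of the address, a sketch can key storage on the pair (xorname, type tag). `Address` is a hypothetical type, and as noted above the new data types may not work this way.

```python
from typing import NamedTuple


class Address(NamedTuple):
    """Hypothetical address: the same 32-byte name under two different type
    tags is treated as two distinct locations."""
    xorname: bytes
    type_tag: int


store: dict[Address, bytes] = {}
name = bytes.fromhex("12" * 32)  # illustrative name, identical for both entries
store[Address(name, 1500)] = b"app A data"
store[Address(name, 1501)] = b"app B data"

print(len(store))  # 2: no collision, because the tag is part of the key
```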

Yes, but it isn’t as bad as it sounds:

you can convert short names into long, effectively random strings (cf. DNS)
the address space is so large that if the addresses are reasonably random, clashes will be so rare you are unlikely to ever encounter them (although all good programmers should cater for them!)
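The "clashes will be so rare" point can be made concrete with the birthday bound. Assuming addresses are uniformly random 256-bit values (the size of a 32-byte XorName), the probability that any two of n stored items collide is

```latex
P_{\text{collision}}(n) \;\approx\; 1 - e^{-n(n-1)/2^{b+1}} \;\le\; \frac{n(n-1)}{2^{b+1}},
\qquad b = 256

n = 10^{12}:\quad \frac{n(n-1)}{2^{257}} \;<\; \frac{10^{24}}{2.3 \times 10^{77}} \;\approx\; 4 \times 10^{-54}
```

So even a trillion randomly addressed items leave a vanishingly small chance of a single clash, which is why a simple "try to store, retry on error" approach is usually enough.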

JPL · September 2, 2020, 2:03pm

E.g. by adding a salt or random number and hashing?
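A minimal sketch of salting and hashing a short name into an effectively random address, assuming SHA3-256 as the hash and a 16-byte random salt. `salted_address` is a hypothetical helper, not a Safe Network function.

```python
import hashlib
import secrets
from typing import Optional, Tuple


def salted_address(short_name: str,
                   salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Turn a short, human-chosen name into a 32-byte, effectively random address.

    Returns (address, salt). The salt must be kept somewhere (e.g. in an index
    the app controls), otherwise the address cannot be recomputed later.
    """
    if salt is None:
        salt = secrets.token_bytes(16)
    address = hashlib.sha3_256(salt + short_name.encode()).digest()
    return address, salt


addr1, salt1 = salted_address("todo-list")
addr2, _ = salted_address("todo-list")         # new random salt, new address
addr3, _ = salted_address("todo-list", salt1)  # same salt reproduces addr1
```

The trade-off: a random salt makes clashes between apps using the same short name practically impossible, but the address can no longer be derived from the name alone, so some index (much like a DNS lookup) has to remember the salt.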

david-beinn · September 2, 2020, 2:30pm

Thanks again Mark,

Just brainstorming, but that might theoretically allow a type tag where the first part of the address is derived from a person’s public name (and is thus reserved for that public name only), and the second part is derived from a file name. This could allow a kind of ‘flat’ filesystem that could be accessed without ever downloading an index of pointers.

So if you had a public file called ‘useful-information’ at {hash of ‘Mark’} + {hash of ‘useful-information’}, anybody could access it by knowing the formula, but nobody could squat it either.
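The address formula in this proposal can be sketched as follows, assuming SHA3-256 and assuming each half is truncated to 16 bytes so the result is still 32 bytes long (the post does not say how the two hashes are combined). `composite_address` is hypothetical.

```python
import hashlib


def composite_address(public_name: str, file_name: str) -> bytes:
    # Hypothetical scheme: the first half identifies the owner, the second
    # half identifies the file. Both halves are truncated to 16 bytes so the
    # result is the same size as a 32-byte XorName.
    owner_part = hashlib.sha3_256(public_name.encode()).digest()[:16]
    file_part = hashlib.sha3_256(file_name.encode()).digest()[:16]
    return owner_part + file_part


addr = composite_address("Mark", "useful-information")
print(addr.hex())  # anyone who knows both names can recompute this
```

Because the derivation is public, this alone does not stop squatting; every file owned by ‘Mark’ merely shares the same 16-byte prefix, which is what a network-side rule would need to check.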

happybeing · September 2, 2020, 2:52pm

If people can discover your method they can squat, at a cost, so it isn’t a solution for that. Only for creating a way to look data up in a very large address space. If you want to prevent squatting you either have to keep the mechanism secret (not possible) or have some kind of gatekeeping (e.g. lookup in an index which a squatter cannot write to).

david-beinn · September 2, 2020, 3:14pm

The idea is that with a particular type tag, the network would prevent anybody except ‘Mark’ from storing information at an address beginning with (hash of ‘Mark’). Same for addresses beginning with (hash of ‘David’) on that type tag.

Whether that’s desirable for the network to do I don’t know, but I can’t see that it’s necessarily impossible.
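The network-side rule being proposed could be expressed as a validation check that a node runs before accepting a write. Everything here is hypothetical: the tag number 1500, the 16-byte prefix length, and the assumption that the requester’s ownership of the public name has already been verified (e.g. via a signature) elsewhere.

```python
import hashlib

OWNER_PREFIX_TAG = 1500  # hypothetical type tag with prefix-ownership rules
PREFIX_LEN = 16          # hypothetical: first half of a 32-byte address


def owner_prefix(public_name: str) -> bytes:
    return hashlib.sha3_256(public_name.encode()).digest()[:PREFIX_LEN]


def may_store(requester_public_name: str, address: bytes, type_tag: int) -> bool:
    """Hypothetical write validation.

    Assumes the requester has already proven ownership of
    `requester_public_name`; that proof is out of scope here.
    """
    if type_tag != OWNER_PREFIX_TAG:
        return True  # ordinary tags: first come, first served
    return address[:PREFIX_LEN] == owner_prefix(requester_public_name)


marks_address = owner_prefix("Mark") + bytes(16)
```

This is why, as noted below, it would have to be built into the network: the check only helps if every node enforces it.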

happybeing · September 2, 2020, 3:29pm

It’s not impossible, but it would require this to be built into the network. This isn’t what the old API allowed, so you could create a topic to advocate for it.

david-beinn · September 2, 2020, 3:34pm

Yes, maybe it would be worth making a proper suggestion.

For a while there I was thinking type tags didn’t have that power to open up a whole new address scheme.

danda · September 2, 2020, 5:04pm

For a first implementation, I’m just planning to keep the design as simple as possible. After that is working, we can think of possible optimizations.

Do you mean immutable data files? The current thinking is that inode metadata entries in the tree will store an XorName representing a file. This is actually the same as, or similar to, the current FilesContainer design. We can’t hash the XorName to a smaller (e.g. 16-bit) value, because a) that loses data, so we couldn’t actually find the file, and b) of the chance of hash collisions.
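Point b) can be quantified with the birthday problem. The sketch below computes the exact probability of at least one duplicate among n uniformly random values, and a union-bound upper bound for spaces too large to iterate over. Both functions are illustrative helpers, not part of any Safe code.

```python
def collision_probability(n: int, bits: int) -> float:
    """Exact probability that n uniformly random `bits`-bit values
    contain at least one duplicate (birthday problem)."""
    space = 2 ** bits
    if n > space:
        return 1.0  # pigeonhole: a duplicate is guaranteed
    p_unique = 1.0
    for i in range(n):
        p_unique *= (space - i) / space
    return 1.0 - p_unique


def approx_collision_probability(n: int, bits: int) -> float:
    """Union-bound upper bound n(n-1)/2^(bits+1); usable for huge spaces."""
    return n * (n - 1) / 2 ** (bits + 1)


print(collision_probability(300, 16))             # roughly 0.5 for only 300 files
print(approx_collision_probability(10**12, 256))  # vanishingly small
```

With a 16-bit value, a directory tree of a few hundred files already has roughly even odds of a clash, whereas the full 256-bit XorName makes clashes negligible.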

dirvine · September 2, 2020, 5:13pm

It “could” store the data_map itself, which can represent a dir or a file. It will be unique (hash of content), so it would be stored like a blob. We just need to watch out for files smaller than 3 KB, though. We have a few options, but maybe simply encrypting those is enough? Needs some thought.

danda · September 2, 2020, 5:45pm

Yes, definitely more thought/design will be needed here, but we know we have some options. I plan to look at that in more detail once basic fuse+file_api+crdt_tree is demonstrated.

Content of very small files could even be stored directly in the inode. In this case, they would not be directly accessible on the network via SafeUrl, so that’s a tradeoff.
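A sketch of the inline-versus-pointer trade-off, under loud assumptions: `Inode` and `make_inode` are hypothetical, the 3072-byte threshold only echoes the "smaller than 3 KB" remark above (no threshold has been decided), and hashing the content with SHA3-256 stands in for the real self-encryption and data map path.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Assumption: files below this size live inside the inode itself.
INLINE_THRESHOLD = 3 * 1024


@dataclass
class Inode:
    """Hypothetical inode metadata entry in the directory tree."""
    name: str
    size: int
    inline_data: Optional[bytes] = None      # set for tiny files
    content_address: Optional[bytes] = None  # set for files stored on the network


def make_inode(name: str, content: bytes) -> Inode:
    if len(content) < INLINE_THRESHOLD:
        # Kept in the tree: no separate network object, hence no SafeUrl for it.
        return Inode(name, len(content), inline_data=content)
    # Stand-in for self-encryption + data map: a content-addressed blob.
    address = hashlib.sha3_256(content).digest()
    return Inode(name, len(content), content_address=address)


small = make_inode("notes.txt", b"hi")
large = make_inode("photo.jpg", bytes(5000))
```

Inlining saves a network round trip for tiny files, at the cost of those files having no address of their own.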

davidpbrown · September 2, 2020, 5:57pm

ooh… that just sparked a thought.

Could the network handle a URL type that leads nowhere but simply either exists or does not? XOR-URLs are almost long enough to be useful as data in and of themselves. I don’t know about making them per-user to avoid clashes, but they could be very useful if something of the order of 64 bytes/characters were available as simple data points.

There is a risk, perhaps, of those becoming dust and spam… it depends how it was implemented, assuming that it could be.

david-beinn · September 2, 2020, 6:16pm

No, I meant mutable specifically, on the basis that we can choose the address of mutable data when we create it, as opposed to immutable where it’s obviously dictated by the content.

The requirements of the use case I was thinking of would be quite different I suppose from the filesystem usage, which needs to be pretty flexible. I was thinking for the example of a search index (inverted index) the tree could be quite static, but point to mutable containers where all the action would take place.

If you’ll bear with me I’ll try and explain what I meant, though I may be hopelessly oversimplifying the idea of trees - I’ve not quite got my head round the different types yet! But here goes, to store the words A, AN, ANT and AND:

        A (27)
          |
        N (16)
        /    \
   T (34)    D (58)

I’ve just used two-digit numbers here, but in theory 2 bytes would give 256 × 256 = 65,536 possible random values for each node (I think).

When the new file is created, we create its address by hashing e.g. AND + 58 + public name of owner, so there is no information loss, at least not in quite the sense you were thinking.
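The scheme above can be sketched as a trie whose nodes each carry a random 2-byte value, with each word’s file address derived as hash(word + node value + owner’s public name). `TrieNode`, `insert` and `file_address` are hypothetical names, and SHA3-256 is an assumed stand-in for the network’s hash.

```python
import hashlib
import secrets


class TrieNode:
    """Hypothetical 'signpost' node holding a random 2-byte value."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.value = int.from_bytes(secrets.token_bytes(2), "big")  # 0..65535


def insert(root: TrieNode, word: str) -> TrieNode:
    """Walk or extend the trie letter by letter; return the word's node."""
    node = root
    for letter in word:
        if letter not in node.children:
            node.children[letter] = TrieNode()
        node = node.children[letter]
    return node


def file_address(word: str, node_value: int, owner: str) -> bytes:
    # hash(word + node value + owner's public name), as described in the post
    material = f"{word}|{node_value}|{owner}".encode()
    return hashlib.sha3_256(material).digest()


root = TrieNode()  # empty root sitting above 'A'
for w in ["A", "AN", "ANT", "AND"]:
    insert(root, w)

and_node = root.children["A"].children["N"].children["D"]
print(file_address("AND", and_node.value, "david").hex())
```

As the post notes, the file’s location depends on its node’s value, so the value must never change once the file exists.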

To finish off, I was then thinking that after the file gets to a certain size, new nodes would point to new trees, instead of straight to the file, to keep the download size down.

Probably nothing in it as an idea, but thought I might as well explain what I meant at least!

danda · September 2, 2020, 7:53pm

@david-beinn thx for the explanation.

So, iiuc, you are contemplating using a crdt_tree to represent the actual content of the file. Thus far, our design only uses it for the directory structure and file metadata. Partly this is because SAFE already has infrastructure and design for immutable data, so it seems best to leverage that.

In a fully clean-room design, all avenues for storing file content could be explored. There are various algorithms designed for collaborative text editing that might work, for ASCII files at least, e.g. Logoot, LSEQ, Treedoc, WOOT, RGA.

I’m not ruling anything out at this point.

david-beinn · September 2, 2020, 9:14pm

I guess you could look at it like that, but the crdt tree ‘file’ is only being used to point to other files, in the same way that a directory structure is. The files it’s pointing to could be any kind of (mutable) data.

Perhaps I confused things by saying store the ‘words’ when these are really more akin to filenames in the way I’m thinking. In the search index example above, hash of (ANT + 34 + publicname) leads us to an address where the (mutable) file ‘ANT’ has a list of all the websites containing references to ants.

The obvious problem for a lot of use cases is that ‘34’ always has to be the value of that node, because the location of the mutable data file stays constant: if it were a file structure, you would never be able to move a file within it. To put it another way, the location of a file is intrinsically derived from, and linked to, the ‘signpost’ file.

I was generating this idea when I’d given up on being able to get to addresses more directly by hashing. I think perhaps for the kind of use cases I have in mind, the following idea might be more promising: (quoted from my post above)

JPL · September 3, 2020, 7:54am

Thinking further on this, does the fact that every data object requires a unique address on the network mean that using namespaces becomes impossible, because effectively every variable (or at least every one written to disk rather than held in memory) is global?
