From a very, very simplistic perspective (one that appeals to authority), 512-bit hashes have use cases as outlined by NIST. A lot of smart folks have looked at the situation and deemed more than 256 bits necessary to cover edge cases.
Yes, this is true: tag types do allow human-meaningful identifiers, but I wonder about it a lot. It sounds great, but it does have some consequences that I feel we are not seeing just yet.
It’s really
Thinking about perpetual data: that data is stored forever and will use up its address space over time. It's a fear of repeating the IPv4 mistakes, perhaps?
Hmm, yes
Lots of ideas come to mind here, but most of them remove the hash(content) == name property or add another separator (tag-like) to the struct.
I think it does, if the addressing were able to grow dynamically. Perhaps 64-bit to start, then increasing as the space got full or there was a collision (how do we mark data as being in collision?) or similar? i.e. all nodes hold data with 64-bit addresses; on any collision, they rehash with 128 bits, etc. The original 64-bit name is network-signed, so there is likely no need to re-sign; we just alter the addressing scheme (even in real time)?
Interesting issue: instead of a magic number/size, we work on a way to make it dynamic but retain its original security (hash(content) == name, plus a network-signed record of the hash size when it was stored).
[Edit: As data is addressed by its hash, it won't matter if we use a 64/128-bit etc. address. If the XOR address can be of any size (which it can, though that breaks some properties we would perhaps not need, such as the triangle inequality), then this could work. I feel, though, that the minimum size will need to be 32 bytes, as the nodes holding data are on that network, and to retain the guarantee that the closest nodes hold the data we probably cannot have smaller addresses.]
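To make that idea a bit more concrete, here is a rough Python sketch of "store under a truncated hash, widen on collision". All the names here are hypothetical, not anything from the actual codebase; it just shows that the full hash (and any signature over it) never changes, only the prefix length used as the address:

```python
import hashlib

# Hypothetical sketch: data is addressed by a truncated prefix of its full
# SHA-512 hash. If two different contents ever collide at the current prefix
# width, everything is re-addressed at the next width up.
WIDTHS = [8, 16, 32, 64]  # address widths in bytes (64-bit up to 512-bit)

def address(content: bytes, width: int) -> bytes:
    """Address = first `width` bytes of the full hash; hash(content) == name is kept."""
    return hashlib.sha512(content).digest()[:width]

class Store:
    def __init__(self):
        self.width = WIDTHS[0]
        self.data = {}  # address -> content

    def put(self, content: bytes):
        addr = address(content, self.width)
        existing = self.data.get(addr)
        if existing is not None and existing != content:
            self._widen()                        # collision: grow the address space
            addr = address(content, self.width)
        self.data[addr] = content

    def _widen(self):
        """Re-address everything at the next width; contents (and any signature
        over the full hash) are unchanged, only the prefix length grows."""
        if self.width == WIDTHS[-1]:
            raise RuntimeError("already at maximum address width")
        self.width = WIDTHS[WIDTHS.index(self.width) + 1]
        self.data = {address(c, self.width): c for c in self.data.values()}
```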
That's a clever mechanism, but it seems like a premature optimization right now, no? Are you sure hashing is where the majority of time will be spent? Whether it's 64-bit or 512-bit, the time to hash seems insignificant compared to other network latencies, and this will be even more true as SIMD hardware support becomes more prevalent. KISS? Why not just go with 512 bits now for simplicity and stability and not worry about it again for 50 to 150 years? The only tag you would need is which 512-bit hash function was used (SHA-2, Keccak, Whirlpool, etc.).
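For what it's worth, a minimal sketch of that tagging idea, using the 512-bit functions Python's `hashlib` happens to ship (Whirlpool would need an extra library); the names and layout are just illustrative:

```python
import hashlib

# Sketch of "one 512-bit digest plus a tag saying which function produced it".
# These are simply the 512-bit hashes available in the standard library.
HASHERS = {
    "sha2-512": hashlib.sha512,
    "sha3-512": hashlib.sha3_512,
    "blake2b-512": hashlib.blake2b,  # defaults to a 64-byte digest
}

def tagged_hash(content: bytes, tag: str = "sha2-512") -> tuple[str, bytes]:
    """Return (tag, 64-byte digest); the tag travels with the name."""
    return tag, HASHERS[tag](content).digest()

tag, digest = tagged_hash(b"some immutable chunk")
assert len(digest) == 64
```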
We might even need more than 512 bits at launch, according to these guys.
It is premature; what I am trying to poke at here is longer-term "fixes" for running out of address space. The hash time is not really an issue; it's more the address space.
Yeah, a way to dynamically upgrade these things would be nice. There are things like multihash, etc., but to me it's not the mechanics of representing the data (hash); it's more the "can we do this with no side effects?"
It seems like the most efficient/automatic way to determine a collision for a 64-bit hash is to hash again at 128 bits and compare. The same applies on up the hash sizes, using 512 bits to check for a 256-bit collision. In the end, the highest available hash wins, or some other routine/authority needs to step in and determine byte by byte whether the inputs are in fact different.
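Roughly what I mean, as a sketch (using blake2b's variable digest size to stand in for a 64/128/256/512-bit hash family; not meant as the real mechanism):

```python
import hashlib

# Escalation sketch: if two inputs collide at one digest size, re-hash both at
# the next size up; fall back to a byte-by-byte comparison only if they still
# match at the largest size.
SIZES = [8, 16, 32, 64]  # digest sizes in bytes

def h(content: bytes, size: int) -> bytes:
    return hashlib.blake2b(content, digest_size=size).digest()

def same_content(a: bytes, b: bytes) -> bool:
    """Decide whether a and b are the same data, escalating on collisions."""
    for size in SIZES:
        if h(a, size) != h(b, size):
            return False      # digests differ -> contents differ
    return a == b             # collided at every size -> compare the bytes
```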
If that's the case, a proactive network upgrade policy may achieve what you are after in an easier manner. SHA-512 is the highest available now, so it's easy to just go with it. When and if a SHA-1024 hash becomes available 20 to 50 years from now, the network nodes can be upgraded to use that.
Focusing on efficient network upgrades can pay dividends and generate a lot of excitement.
To me this indicates a need for flexibility (multihash, as you say) rather than a larger address space. Or, to really get to the heart of the matter, a need for change management (which multihash handles fairly elegantly, imo).
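In case it helps, this is roughly what the multihash self-describing format looks like: `<function-code><digest-length><digest>`, with codes taken from the multiformats table. A simplified encoder (not a full varint implementation; small codes and lengths happen to fit in one byte here):

```python
import hashlib

# Simplified multihash-style encoding. 0x12 = sha2-256, 0x13 = sha2-512 in the
# multiformats code table; both the code and a 32/64-byte length fit in a
# single byte, so no full varint handling is needed for this sketch.
CODES = {"sha2-256": (0x12, hashlib.sha256), "sha2-512": (0x13, hashlib.sha512)}

def multihash(content: bytes, name: str = "sha2-512") -> bytes:
    code, fn = CODES[name]
    digest = fn(content).digest()
    return bytes([code, len(digest)]) + digest

mh = multihash(b"hello")
assert mh[0] == 0x13 and mh[1] == 64  # readers can tell which hash was used
```

The nice property is exactly the change management one: a new hash function later is just a new code in the table, and old names stay valid and readable alongside new ones.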
Haha let me save a click: sha512(April Fool’s day joke) == sha512(No, we didn’t find a collision)