Pre-Dev-Update Thread! Yay! :D

Looks like a simple subset of 30 of the standard confusables, which is nice to see.

I wonder whether at some later time there could be an option in NRS versioning to let users choose how those resolve. The default would perhaps assume what is wanted… a minor aesthetic wrinkle… so long as permanence is retained for future resolving of what was… if domain name ownership can be passed to new owners they might have different interests than the default, but file under OCD for now. :grin:

5 Likes

Still, NRS is basically app-level data and this restriction would be in the applications. So user choice can be built into applications like the browser.

3 Likes

Yes, that’s easier… one point of reference for each confusable, and apps can accommodate their own interpretation if they want.

1 Like

I would expect that there would be a reference data object/file (for people to look up) with the standard table in it and libraries for various languages to process the name. Applications can then change the standard behaviour as they see fit.

It is not a core network issue, which I guess I want to ensure anyone reading realises, but an application layer issue.

1 Like

The change in PR #699 is mostly about characters that should never reasonably appear.

It started with just the space char (See #696).

I tried creating an nrs name with a non-breaking space to see if it would work, and it did.

Then I tried with tab char, it worked.

I tried with newline, it worked.

So this needed exclusion for all char::is_whitespace, no big deal, easy to do.

Then we also have control characters, easy to exclude using char::is_control and there’s no reason to have control characters in any normal url, so that was obvious to include.

But then there’s also zero width space, which is neither whitespace nor a control character. So we need some special exceptions. There are 30 of those that are pretty obviously never going to be part of any url. These were manually selected from the Unicode character database.

This is the point the PR gets to. It has the ‘obviously not gonna be in a url’ chars.
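The three layers of exclusion described above can be sketched roughly as below. This is only an illustration, not the code in the PR: the function name `first_invalid_char` and the `EXTRA_INVALID` list are made up for the example, and the three characters listed are a guess at the kind of entry the real list of 30 contains.

```rust
/// Hypothetical sample of "invisible" characters that are neither
/// whitespace nor control characters. Illustrative only: this is not
/// the list of 30 used in the PR.
const EXTRA_INVALID: [char; 3] = [
    '\u{200B}', // zero width space
    '\u{2060}', // word joiner
    '\u{FEFF}', // zero width no-break space (byte order mark)
];

/// Returns the first character of `name` that is whitespace, a control
/// character, or in the extra list above; `None` if there is none.
fn first_invalid_char(name: &str) -> Option<char> {
    name.chars()
        .find(|c| c.is_whitespace() || c.is_control() || EXTRA_INVALID.contains(c))
}
```

Zero width space passes both `char::is_whitespace` and `char::is_control`, which is why the explicit list is still needed in this sketch.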

But we can (and probably will need to) go further.

There’s the character U+00C0 Latin Capital Letter A with grave À vs U+0041 + U+0300 Latin Capital Letter A + Combining grave accent À. They look identical but hash to different xorurls. But both are linguistically valid and useful.

There are also homoglyphs, U+006F Latin Small Letter o vs U+03BF Greek Small Letter omicron ο. They look identical, but hash to different xorurls. But both are linguistically valid and useful.
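To see why these pairs end up at different xorurls, it is enough to look at their bytes. The sketch below assumes names are hashed from their UTF-8 bytes (an assumption, the hashing step itself is not shown), and the helper names `same_bytes` and `hex_bytes` are invented for the example.

```rust
/// True when two names have exactly the same UTF-8 byte sequence.
fn same_bytes(a: &str, b: &str) -> bool {
    a.as_bytes() == b.as_bytes()
}

/// Formats the UTF-8 bytes of `s` as space-separated uppercase hex.
fn hex_bytes(s: &str) -> String {
    s.bytes()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, `hex_bytes("\u{00C0}")` gives `C3 80` while `hex_bytes("A\u{0300}")` gives `41 CC 80`, and `same_bytes("o", "\u{03BF}")` is false, so any hash over the bytes will differ.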

Some questions are

  • how are people entering these variations?
  • how do we display these variations to people?
  • how do we convert the unicode to bytes?
    • percent encode?
    • punycode?
    • upper or lowercase?
    • utf-8?
    • utf-16?
    • something else?
    • do we use unicode normalization? which one?
  • how do we convert these bytes to an xorurl? (this is already solved)
  • is it possible to convert to and fro each representation?
  • what are the risks and benefits of allowing confusables / homoglyphs / symbols / emoji etc? I always remind myself “I can only read English so would never be able to use Kanji urls, please do not limit your software to Kanji only”, then flip the two character sets to see that ascii is not adequate.
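On the "to and fro" question, here is a minimal sketch of one of the options listed, percent encoding over UTF-8 bytes, showing that it can round-trip. It is hand-rolled for illustration only: `percent_encode` and `percent_decode` are hypothetical names, not an existing API, and the choice of which bytes to leave unencoded is an assumption.

```rust
/// Percent-encodes every byte except A-Z, a-z, 0-9, '-', '.', '_' and '~'.
fn percent_encode(name: &str) -> String {
    let mut out = String::new();
    for b in name.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Reverses `percent_encode`. Returns `None` if a '%' is not followed by
/// two hex digits or if the decoded bytes are not valid UTF-8.
fn percent_decode(encoded: &str) -> Option<String> {
    let bytes = encoded.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // `get` fails if fewer than two bytes follow the '%'.
            let hex = encoded.get(i + 1..i + 3)?;
            if !hex.bytes().all(|h| h.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```

Round-tripping only preserves the exact code points that went in; it does nothing about normalization or confusables, which remain separate questions.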

There’s a lot written elsewhere on this topic. This post is a bit of an intro to see that nrs beyond A-Za-z0-9 has some tricky situations. It could easily be 10x longer…

17 Likes

@joshuef it looks as though the push to adapt the new lexicon into the code is underway. There is the recent change from Money → Token but there is still mention of “safecoin” and MAX_COINS_VALUE etc.

// The maximum amount of safecoin that can be represented by a single Token
const MAX_COINS_VALUE: u64 = (u32::max_value() as u64 + 1) * COIN_TO_RAW_CONVERSION - 1;

Do all of these need to be changed? Would seem like a tedious and hefty overhaul.

What made me wonder in the first place is it would seem reasonable to change “safecoin” to “Safe Network Token” along with the transition from money to token.

4 Likes

These are v quick changes with the rust toolset. Basically in vscode or similar, select the symbol and rename. So we’re OK renaming symbols; the small pita is it needs done in several libs at once, but you’re looking at an hour max of work to do that.

3 Likes

As @dirvine said, shouldn’t be too much bother. And where there’s no link it can be done piecemeal :+1:

5 Likes

SNT already exists :slight_smile: It’s the Status project. IMHO SAFE is a cool ticker for the new network.

4 Likes

For fork sake

We need to acknowledge what exists.
Unicode confusables are formally defined and resolved with https://www.unicode.org/Public/security/8.0.0/confusables.txt

I wonder whether, necessarily, NRS registration should follow the Unicode confusables table regardless and force many to one instance… and then everything else is an app-level aesthetic option for the future.

Having just suggested “should” then …

Is an assumption - so approach with caution…

This is fair for urls

and the newline character too.

However, beyond those I would hesitate, even with whitespace! All characters in Unicode exist for a reason?.. so, U+205F [Medium Mathematical Space] might matter to some just for the aesthetic?

Some questions were:

  • how are people entering these variations?

does not matter.

  • how do we display these variations to people?

as the owner of the url prefers… as the user of the url prefers (follows from suggestion below; and nicely in keeping with empowering users)… and/or a default, if there’s not an option to lookup that preference.

So, I wonder perhaps

safe:/fr-ca/ᑶeᑶᑶers
safe:/en/peppers

or similar could be suggested at app level, for whichever context is preferred, or better perhaps some block level unique reference https://unicode-table.com/en/blocks/
… and safe:// perhaps the usual unicode default… available as an option for all.

  • what are the risks and benefits of allowing confusables / homoglyphs / symbols / emoji etc?

So long as each string resolves to one instance at registration via NRS, there is no confusion - it’s just difference of opinion on how that string is presented.
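The many-to-one idea can be sketched as a lookup that folds each character to a single prototype before registration. This is a simplification under stated assumptions: the three entries are a hand-picked sample rather than the full Unicode table, the real table can map a character to a sequence of characters rather than just one, and `sample_confusables` and `skeleton` are names invented for the example.

```rust
use std::collections::HashMap;

/// Builds a tiny illustrative confusables map (source char -> prototype).
/// Hand-picked sample entries only, not the full Unicode table.
fn sample_confusables() -> HashMap<char, char> {
    HashMap::from([
        ('\u{03BF}', 'o'), // Greek small letter omicron
        ('\u{043E}', 'o'), // Cyrillic small letter o
        ('\u{0430}', 'a'), // Cyrillic small letter a
    ])
}

/// Maps each character to its prototype; characters not in the table
/// are left unchanged.
fn skeleton(name: &str, table: &HashMap<char, char>) -> String {
    name.chars().map(|c| *table.get(&c).unwrap_or(&c)).collect()
}
```

With this, `skeleton("pepp\u{03BF}rs", &sample_confusables())` and `skeleton("peppors", &sample_confusables())` both come out as `peppors`, so only one of the two look-alikes could be registered; how the name is displayed afterwards would be up to the app.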

:thinking: … undecided about capital letters… always preferred lowercase myself but the old unSafe internet has a legacy of confusion on that, dependent on the host registrar… there is confusion in English even for I(i) and l(L).

Yeah, we need an organic alternative. SAFE is available as a ticker? SN is another viable suggestion as well. People revert to SNT but, like you said, that’s Status Network, so I twinge a little bit every time I see that disseminated.

Probably best not to announce the ticker at all until formalised! :sweat_smile:

12 Likes

The boys are working late today.
No predictions from me though, gave that pastime up.

11 Likes

Lots of commits all around. Sure sign of progress, but then again there may not be enough time to check that the whole is stable enough for release. But maybe there is a stable point somewhere earlier and things happening now are already improvements for the future? Who knows.

2 Likes

Come on @Josh third time lucky :wink:

4 Likes

Maybe if @happybeing goes double or nothing :grin:

3 Likes

You’re on. Just hope you can pay up :muscle:

5 Likes

I’ll be holding @scottefc86 liable, I was clean for 2 weeks!

5 Likes