Data, information and knowledge: does AI change the game?

Well, then… what if I consider it invaluable to store raw data? :stuck_out_tongue:

But seriously, what if I want to store the sequenced genomes of every species on Earth, as a backup in case we need to repopulate the planet?

It would be invaluable to have raw data stored in a decentralized storage network that will safekeep the digital ark of life.

I would seriously want to back up myself in my private storage, and it may take 860MB compressed.

Even having public raw data would be fascinating combined with the power of linked data and the semantic web. I thought Solid and the Safenetwork were a match made in heaven, and I really hope we don’t deviate from that vision.

3 Likes

I could imagine a future SAFE-AI that you can feed raw data, which it then incorporates but does not store. It could then give you either the derived knowledge upon request or even an estimated copy of the original data based on the knowledge it derived from the data.

So it could be a major advance in ‘data compression’, to use that term loosely.

edit: the data can be anything also - doesn’t have to be scientific data, could be even a fantasy book series. What you’d get back upon request would simply be the AI’s representation of it.

Given that the range of all data that could be stored in a data-only network is effectively all data one can EVER imagine as society moves ever onward … it seems that we may want to adopt the knowledge-only-store model in the future. Even if we can store every bit of data, how useful will that be relative to the knowledge-only-store?

Once we are multi-planetary duplicating all data will become a chore, but duplicating a knowledge-base should be relatively easy.

edit2: in constructing a knowledge-only-store model … are we creating a god?

2 Likes

This is a key to my thinking right now. I see AI, and in particular truly personal AI, as inevitable, but I feel this is a massive moment for SAFE, which is more important now than ever. This focus is, in my opinion, much bigger than a massive data store. Not that a massive data store is not also important.

All about focus and quick to market really.

I think humanity will certainly create the new source of truth, and we need to make sure it’s our God and not a God as defined by some corporation.

8 Likes

We have come full circle here. Human memory is lossy and unreliable, so we made computers to store everything perfectly 1:1, but the amount of data is too much, so we made AI, which does lossy compression over the data.

I agree there should be some mechanism like this. There are huge amounts of data that are worth storing for a while but become useless after a short amount of time (logs, security videos, …). Storing that forever is not wrong, but the ability to “forget” will make the network more resilient in case it ever comes close to full, and it could also play a role in the economics by making it even cheaper to store new data.

4 Likes

I think that should perhaps be archive instead of forget? The ability for the network to split data into current and tertiary (archive nodes) may become a requirement. Data storage tech is also still improving a lot, and there are many technologies that talk of storing the world’s data on a thumbnail-size device. So archive could be a powerful mechanism to keep the data we don’t currently know is valuable?

All interesting stuff

5 Likes

Still brainstorming means not too much value in critique yet, so I won’t respond to the what-data-is-valuable aspect now.

Thinking for a moment about what this change might look like (who decides, how to charge etc) and how this changes the nature and philosophy, and perhaps even the fundamentals of SN. This change raises a lot of questions and creates some tricky problems, technical and non technical. So to do this now also creates risks.

Which leads me to ask why. What is the problem this is intended to solve, and how serious is it? Because it seems much harder to do this, and more work before launch, including implementation.

Maybe the problem is economics and or ease of adoption (competition?), but I’d like to understand whatever it is because it seems that it must be serious.

If it’s not important for a successful launch then perhaps it should be pushed down the road. All data stored before such a change is implemented would stay perpetual, and only data stored after that change could be non-perpetual.

2 Likes

One of the reasons is psychological: I would kind of feel bad for storing some temporary files forever, even if I have paid for it :smiley:

The other thing that comes to mind is that all the people saying the economic model of perpetual storage is not viable could be answered with “It will work, but we have a failsafe even for this unlikely case.”

What is the current state? Does a node have some absolute or relative way to tell how old a record is, how many times it was accessed, or when was the last time?

2 Likes

I can’t imagine that much if any of this is pre-launch. I suspect David is speculating on what comes after launch.

1 Like

It would look like the safe network as of now, but with no NRS. So folk store what they want and share links to it, but registers cannot be stored at an address we choose.

Right now we can have NRS where say safe://internet.archive is stored. But that means

  • We create a publishing network (regulators all over us)
  • We have a situation where that address is a target for malice (takeover and replace)
  • Those registers are not cryptographically secured (i.e. the name is not the pub key or similar but instead a mapped name)

So far we are saying data is perpetual if stored on the network. The thought that some of it can be offloaded to archive is not saying it’s deleted, only that it is perhaps not in primary storage.

If we make ALL data addresses deterministic and self validating then we are a stronger network in my opinion. What that means is

  • People store what they want and it’s private and secure
  • Folk can share xor url’s to whoever they wish
  • If 2 folk store the internet archive, the network deduplicates that
  • If people share links to data, we must ensure it can be done anonymously (this is where simplex may help us a lot)
  • The data shared (xor url’s) should not be able to be linked to another user unless they explicitly want to prove it was theirs (signed).
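
To illustrate what “deterministic and self validating” addresses buy, here is a minimal sketch of a content-addressed store. It assumes the address is simply the SHA-256 of the stored bytes; that is a stand-in for illustration, not the network’s actual XOR addressing or self-encryption, and `ContentStore` is a hypothetical name.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the address is the SHA-256 of the bytes."""

    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical bytes always map to the same address, so a second
        # upload of the same data adds nothing new (deduplication).
        self._chunks.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._chunks[address]
        # Self-validating: anyone can recompute the hash and compare it
        # with the address they asked for.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._chunks)


store = ContentStore()
first = store.put(b"internet archive snapshot")
second = store.put(b"internet archive snapshot")  # same data, second uploader
```

Both uploads return the same address and the store holds a single copy, which is the deduplication point in the list above. Nothing in the address says who uploaded the data.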

Mainly confusion and removing the confusion. What I see is conversations about the network itself in some way being the publisher of vast amounts of data, but in reality it is a network of humans who share the data they wish in a secure and perpetual manner that cannot easily be broken. Also, from a marketing or messaging viewpoint, I would prefer us to be extremely clear: this is not a network where MaidSafe or company X publish data. It’s a network of humans who together protect the world’s data.

It’s subtle, but instead of “MaidSafe stored X zettabytes of data”, we have “SAFE network users have stored X zettabytes”.

Technically it may be that all we do is remove NRS and let folks create something like that, if they wish.

Politically or message wise, we focus on the user private data and API, making that the almost sole focus of the dev API work.

Sorry if I am confusing matters here, it is still brainstorming, but I think these conversations are important.

6 Likes

I should say, the recent AI work has definitely made this focus on users much more of an issue. I am certain we want easy access to correct information and I am positive that is happening with AI. However, I feel it’s now well beyond critical that users’ own private data is kept well away from any place where a global AI can ingest it.

If we think advertisers and bots influence us now, then I fear we are going to see the total manipulation of the human race in the not too distant future with AI. So the focus on that private data being private and users being anonymous is probably much more important now than ever before.

However, to balance this, personal AI that does ingest your private data in a vault owned and controlled by you is a very powerful device.

So we protect users’ data from global and corporate AI data ingestion, but promote and enhance the use of personal AI tools.

It’s a changing game here and it’s one I think we need to be fully aware of and not only ready for, but solidly in front of it.

6 Likes

Right now, no, but if nodes hold data in an LRU cache or similar, then when they fill to 90% or so they can offload the bottom 10% to archive. If they all do that then we are likely in a good place.
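
A rough sketch of that LRU idea, assuming a node counts records rather than bytes and that “archive” is just a second dictionary. The class name, the thresholds as constructor parameters and the record-count capacity are all illustrative assumptions, not how nodes work today.

```python
from collections import OrderedDict


class ArchivingNode:
    """Toy node: offloads least recently used records once nearly full."""

    def __init__(self, capacity, high_water=0.9, offload_fraction=0.1):
        self.capacity = capacity                  # max records in primary storage
        self.high_water = high_water              # fill level that triggers offload
        self.offload_fraction = offload_fraction  # share of records to move out
        self.primary = OrderedDict()              # least recently used first
        self.archive = {}                         # offloaded, not deleted

    def put(self, address, data):
        self.primary[address] = data
        self.primary.move_to_end(address)         # mark as most recently used
        if len(self.primary) >= self.capacity * self.high_water:
            self._offload()

    def get(self, address):
        if address in self.primary:
            self.primary.move_to_end(address)     # reading counts as use
            return self.primary[address]
        return self.archive[address]              # archived data is still retrievable

    def _offload(self):
        count = max(1, int(len(self.primary) * self.offload_fraction))
        for _ in range(count):
            address, data = self.primary.popitem(last=False)  # oldest first
            self.archive[address] = data


node = ArchivingNode(capacity=10)
for i in range(9):
    node.put(f"k{i}", b"x")
```

After the ninth record the node reaches the 90% mark and moves its least recently used record into the archive; a later `get` for that address still returns the data.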

4 Likes

Interesting point about Maidsafe, NRS, and being seen as a publisher. I hadn’t considered that before. Personally I’d been in favor of NRS being a user-level browser plugin in the past, so that seems to be what we individual users can do for ourselves down the track.

1 Like

This seems different to what you were proposing earlier, which sounded like some data would be perpetual, that is private data, and some not, that is public data. Hence much confusion in early replies - might be worth clarifying the earlier posts.

So:

  1. all uploaded data is perpetual, private and public, though at some point archiving of some kind will probably be introduced (ie no change here).
  2. NRS is not implemented to avoid potential political issues for MaidSafe and anyone else using them.
  3. Added focus on anonymous communications to protect those wanting to share links to data, related to the risks behind 2.

Consequences:

  • network responsibilities are reduced to hopefully apolitical storage and communications reducing risks for Noderunners :tm: :stuck_out_tongue_winking_eye:, MaidSafe et al. I wonder how realistic that is, or if the network is seen as a threat which it surely will be, whether this has effect.
  • removal of NRS is designed to shepherd users away from risk and to reduce chances of sabotage/takeover of valuable human readable addresses, but also has the effect of making the network harder and less attractive to use and less functional (no browser, no publishing standard, no websites for the time being, and probably no versioning/accessible archive).

I imagine the implications are bigger than what I’ve written, as those are just first thoughts bashed out on my mobile, but I hope it helps to clarify what the issue, solution and implications are.

It’s still a massive change, less clear whether it adds work (chat) or reduces it (NRS/Browser). It simplifies the offer but reduces the value of the offer and risks dramatically reducing general appeal. For example, sharing xor links is going to alienate almost everybody IMO.

So still big risks and not yet clear if the problem it is intended to solve justifies this, or indeed would be solved. For example, won’t governments come after this base network just as much as one with NRS?

Is the issue that the network as designed cannot protect the envisaged NRS?

2 Likes

I think all of these are still in play. Folk will share (publish) to addresses that are not so much human readable, but are perpetual and cannot be taken over.

I see it more of a solid move to ensure integrity of what we are addressing (in terms of web address).

It clarifies what you are thinking for sure :+1:

Will it though? I mean how many folk actually care these days about a particular web address when sharing content? i.e. you see in browsers the link shows only the TLD and not the whole address. Folk share short urls and similar too. I am not sure the whole human name addressable web is how the majority are finding stuff these days.

No answer, I am saying I don’t really know, it feels like it would be worse, but I suspect in reality it’s not as important as we think.

Security wise it’s a bit of a nightmare (phishing / takeover etc.) that we don’t seem to easily fix. So maybe not doing it means not fixing it.

The question will be, can you find what you are looking for if you are using the old school reading through links and papers, or will folk just ask their AI? Or perhaps even without AI do the newer generations share web links as names they say to each other, or do they just share links from app to app?

I think governments might still come after companies and developers, but maybe not so readily, but it’s really a secondary concern of mine at least.

4 Likes

Which browsers? All the ones I use show the whole URL. Regardless, it isn’t comparable, because you are talking about the current web, where search engines do most of the work for most people, when Safe Network won’t have that for some time. When it does, it needs to be decentralised or we’re broken (same for “AI” in that role).

Browsers can help with this once you’ve visited a site, but until then you are stuck with something you can’t type in manually which means links have to be clickable or copy/pasteable and open in SN without leaking data, which means only from well built SN apps such as a Safe browser or Safe chat.

I also think you underestimate the value of memorable type-able URLs. Most use doesn’t involve this but an awful lot of sharing still does. I use it a lot when sharing information or sites. That can also be helped in the browser (e.g. I type a name and it translates from my ‘petname’ to the URL when posted).
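
A minimal sketch of that petname translation, assuming a hypothetical `pet:name` syntax in the text being posted and a purely local name-to-URL table. The syntax, the function name and the example address are made up for illustration.

```python
import re

# Hypothetical syntax: "pet:" followed by a locally chosen name.
PETNAME = re.compile(r"\bpet:([a-z0-9-]+)")


def expand_petnames(message, petnames):
    """Replace each known pet:name token with the URL it maps to."""

    def replace(match):
        name = match.group(1)
        # Unknown names are left untouched rather than guessed at.
        return petnames.get(name, match.group(0))

    return PETNAME.sub(replace, message)


my_petnames = {"docs": "safe://abc123"}  # made-up address
posted = expand_petnames("Have a look at pet:docs, it is good", my_petnames)
```

The table never leaves the user’s device, so the memorable name is personal to the sender while the recipient only ever sees the underlying URL.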

If I see an xor URL in Safe chat or on Safe social media I have no idea what it is and whether I want to click on it.

One day maybe but we’re talking about from day one so it will be a hard sell, affect adoption badly and introduces powerful ways to capture and centralise.

Much of what you describe is valid for the current centralised web rather than Safe Network, which must be self contained and decentralised or people won’t be able to use it, and centralised services (e.g. Google) will tempt them into using it in an un-Safe way.

2 Likes

I agree with you 100% @happybeing

We need it full featured, and make it user friendly, otherwise I don’t see adoption happening anytime soon.

I think that @dirvine is worrying about use cases and risks of AI that are not really immediately relevant. They may become relevant in a decade, but they won’t really matter for the launch of the Safenetwork. The worst part is that even if those risks are serious, I don’t think it will move the needle in terms of fueling adoption if we focus on building the narrative that it will “protect you from AI”.

We are falling again into the same problem of selling security as the main selling point, a value proposition that unfortunately people stopped caring about a long time ago. What most people care about is convenience of use. If the side effect is that everyone who uses it gets the benefit of being ultra secure, it is an added value, but if it’s not, oh well, so be it. That is how most people think. They will never sacrifice convenience for the sake of stronger security.

Let’s see the case of WhatsApp.
WhatsApp got massively successful when their chat was still sending messages in plain text. People clearly didn’t care about security; they were downloading it because it was practical, fun, useful and because other people used it. The news was reporting that WhatsApp was extremely insecure and that people could use Wireshark to sniff WhatsApp messages on any public wifi and read them all. People still didn’t care, and adoption still skyrocketed.
It was almost a decade after launch that they implemented the encrypted Signal protocol. Most people still don’t fully understand the significance of that and don’t care.
Moxie’s mission was to make security a standard and he made deals with the biggest companies out there because he was on a mission of securing communications.

On the other hand we have the actual Signal Private Messenger, which was really struggling to get any market share with its own app. Its main selling point was security above everything else, the chat app was quite dry, and it was such a niche that only security enthusiasts were using it. The tide started to change just slightly when Brian Acton quit Facebook (now Meta) and joined Signal; that’s when some “cute”, non-essential, user-friendly features started to appear in the app, such as posting stickers and giphy gifs (without metadata).
But adoption didn’t increase because they announced that they stripped metadata, it was because they were stickers, cute and expressive. The lack of metadata is just a cherry on the top (a cherry that most users are not even aware of).

Security does not drive adoption, but whatever drives adoption can be exploited to bring security to everyone.
And the thing that drives adoption is convenience, usefulness, practicality, intuitive UX, network effect and yeah, “cuteness” as well.

Using only XOR addresses to share files for the sake of privacy is a nightmare scenario for adoption; no normal person will use the Safenetwork, only security nerds like us.
It not only reduces the usefulness of the network, it makes it extremely user unfriendly.

Saying that this impractical network will save them from the evil data mining of AI companies won’t promote adoption; no one will care.

On the other hand, if you promoted it as a fun and easy way to share pictures and stories with your friends and family which, by the way, are secure and private (among other features), that would be perceived differently.

Practicality and ease of use must never lose their priority; that is the only way to sideload security to everyone.

15 Likes

Perhaps if a separate person or group made a browser with built-in NRS, then the concern there goes away.

I get David’s concern that if Maidsafe is managing an NRS system then they are acting as publishers and that will invite regulation, complexity, and who knows what other troubles.

Also SafeNetwork isn’t going to just be Maidsafe’s baby in the future - people and orgs everywhere can build and develop on it, so not every feature needs to be on Maidsafe’s plate.

In any case, the team needs to be aiming for MVP and launch and afterwards additional features and tradeoffs can be considered.

How could it be said that MaidSafe are “managing” this any more than any other feature?

I’m not saying this doesn’t make them more vulnerable but calling it “managing” doesn’t make any sense to me, and David has said that this isn’t really a significant concern.

1 Like

Very interesting convo :+1: (bouncing about today but some thoughts, not for or against, but more thoughts)

Just to keep this moving along: are we saying that to share a file I need NRS, and a made-up NRS name for the file, for it to be considered a simple thing to share?

I find with URLs to share files I end up with https://mega.com/34095w4nbjs892ur4sdfhqsdf/ anyway? (I cannot see the filename)

Or do we mean to get to a “trusted” source of info like twitter.com or facebook.com etc.? (I use trusted sarcastically btw). Or perhaps even https://research.og (say an actual trusted site). Then if the link to the file has that TLD it is more trusted? (in a decentralised network)

Just looking for a deeper view of what the current DNS based “trust” actually looks like and what it actually means in a decentralised network.

If we consider this utility of human readable names as a good simple way of sharing a file, will anyone actually type in the whole filename or will we copy/paste it?

If it’s a way to say it’s cute (I agree with the cute part above btw), what makes long filenames cute? We know the name does not need to represent the content.

Now if we used an XOR url and that was deterministic of the data, then when you download alksdlskfdjsfd894843rnjsl and do the same many years later, it’s guaranteed whp to be the same content. It won’t have moved, it won’t have been taken over etc. It’s certain to be that content, or in terms of registers the same root.

4 Likes

I think that the name can give one a sense of trust about the quality of content.

For example, if I have (somehow) established to my satisfaction that disney.usa is run by Disney, Inc then I can have reasonable assurance that any disney.usa/<foo> will not contain adult content. Likewise, I can create a filter that limits my children’s viewing to such trusted names.

Whereas if I just receive a random <xorurl> from a global namespace, I have no a-priori indicator of the content, other than the source from which I received it.

So the distinction here is that NRS establishes ownership over a namespace, and third parties place trust in the owner to curate the namespace in a way they approve of. Without NRS, this publisher/trust element is missing.
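
A small sketch of the kind of filter described above, assuming NRS links look like `safe://name/path` and that the viewer keeps their own allow-list of names they have already decided to trust. `TRUSTED_NAMES` and `is_allowed` are hypothetical names.

```python
from urllib.parse import urlsplit

# Names the viewer has (somehow) satisfied themselves about.
TRUSTED_NAMES = {"disney.usa"}


def is_allowed(url, trusted=TRUSTED_NAMES):
    """Allow a link only if its name is on the viewer's allow-list."""
    host = urlsplit(url).hostname
    return host is not None and host in trusted


named = is_allowed("safe://disney.usa/foo")
raw = is_allowed("safe://alksdlskfdjsfd894843rnjsl")
```

The filter can only act on the name: a raw xor URL carries nothing to match against, so it is rejected regardless of what the content actually is.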

10 Likes