Data, information and knowledge: does AI change the game?

There are a great many changes happening in data handling globally right now, and AI-type devices and platforms are changing the game too. There is also, I believe, a distinction to be drawn between data and knowledge.

So do we want to save humanity's data or humanity's knowledge? They are not the same thing. It's deep and interesting.

However, we have data handling working and we can protect people's personal data; that has always been a goal and we can do it now. So a restricted network is unlikely, and in the next few testnets we will see data payments and hopefully node rewards etc. The picture is almost complete.

Then we need to have a conversation about public data versus humanity's knowledge.

My feeling now, and it's a strong feeling backed by empirical evidence from these testnets, is that we no longer have a technical barrier to launch. The team are breaking all the barriers in an iterative and concrete manner. What we will have to finalise is:

  • A clear market strategy (what are we providing, and to whom)
  • An understanding of the difference between data and knowledge
  • A much larger focus on personal and group data

A lot of this has to do with AI and the advances there; like it or not, it's happening, and happening very, very quickly. I am hopeful, but that is another conversation.

I believe that soon much of our knowledge will come from AI, and again, I'm not debating the rights and wrongs of that, but seeing the reality of it is important. AI will assimilate and regurgitate all of humanity's knowledge in whatever form it chooses, hopefully honestly and without bias, but …

However, we cannot have AI soak up and encode YOUR data, or your friends' data, or your children's data. That has to be between humans, always between humans, and unfettered. The Safe network needs to make a world SAFE for human collaboration and creativity, and it has to keep human interaction free from manipulation of any kind, good or bad. That's my feeling now after many, many months of deep thought.

I see public knowledge as something that is not the massive, unreadable and overwhelming tomes of files and directories; it is much more likely something small (a few GB of some AI / LLM), and that is probably already happening right now. Personal data, and the data of groups of humans, is just that: raw data, messages, photos, videos and more. That is where our creativity comes from; it is our weapon, one that no AI can come close to understanding or enhancing. It is our own foothold on this planet, and that is something worth saving.

So Google, Apple, OpenAI, Facebook and the rest cannot have our personal data in their silos for any AI to consume, and we cannot have our personal information censored, manipulated or forgotten. The ability to save humanity's creativity for humanity alone is, I believe, a fight we must not only embrace but win.

Sorry for the rant, but I have been up to my neck in trying to understand the impact of AI and the speed at which it will almost certainly improve with regard to public data and discourse. To me the bottom line is this: we must not focus on storing every corpus of public data we can; we must focus on the individual, on human groups of creative endeavour, on the family, on the friend network, and most of all we must make that impenetrable to manipulation of any kind.

I feel we have always been a few steps in front, and what I say above is not extra work; it is likely less work, it is easier for consumer-grade computers to handle, and it fits with small nodes and many nodes per large computer. It fits SAFE, and I think it is more essential for humanity now than it has ever been.

We are close, we are so close, but we need to get laser focussed on this launch of a network for the people, one that enhances the creativity and knowledge of the people who can do what no AI will likely ever do: be creative, inventive and understanding of the evolutionary needs of humanity to move forward as a species.

/end of early morning rant :smiley:

36 Likes

My initial thought is that you need data stored before you can have knowledge.

Data is just that: data.
Knowledge is the result of processing data.
Wisdom is learnt: the correct application of knowledge.

As you point out, personal data needs to remain personal, safe from AI as well as other entities. Safe cannot come soon enough.

13 Likes

I went into this a lot. I think knowledge exists in only a subset of data. It's a bit weird, but I went back to thinking of the old days when we printed out thousands of pages on a dot-matrix, tractor-fed printer; the boss came along, tore off the last page (the summary) and the rest eventually went in the bin. That is a bit like how AI encodes data into knowledge. It more or less compresses all the data into a format (i.e. a vector database) and effectively de-duplicates it into knowledge. (I use 'knowledge' loosely here for current AI, but perhaps not so loosely in the next few months or years.)
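To make the compression / de-duplication picture concrete, here is a toy sketch. It is not a real embedding model and not anything from the network's codebase: each document is reduced to a small numeric vector, and near-duplicate vectors are collapsed so that many documents boil down to far fewer "knowledge" vectors. Real systems use learned embeddings and a proper vector database, but the shape of the idea is the same.

```rust
// Toy sketch only: fake "embeddings" plus cosine-similarity de-duplication.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Keep only vectors that are not near-duplicates of one already kept.
fn deduplicate(vectors: &[Vec<f64>], threshold: f64) -> Vec<Vec<f64>> {
    let mut kept: Vec<Vec<f64>> = Vec::new();
    for v in vectors {
        if !kept.iter().any(|k| cosine(k, v) > threshold) {
            kept.push(v.clone());
        }
    }
    kept
}

fn main() {
    // Three "documents": the first two say roughly the same thing.
    let docs = vec![
        vec![0.90, 0.10, 0.00],
        vec![0.85, 0.15, 0.00],
        vec![0.00, 0.20, 0.90],
    ];
    let knowledge = deduplicate(&docs, 0.95);
    println!("{} documents compressed to {} vectors", docs.len(), knowledge.len());
}
```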

I should add: I think human creativity eventually publishes data publicly, to be consumed as part of the corpus of shared human knowledge.

15 Likes

:star_struck:

This quote needs to be shouted from the rooftops.

16 Likes

:baby: I’m ready to go

9 Likes

I think I understand what you’re saying but you haven’t convinced me yet. Food for thought.

Firstly, you need the raw data to produce the knowledge. And to verify that knowledge, you need to test it against the raw data.

If you only preserve the 'summary', it could be garbage or propaganda and the rest of us won't know.

The reason your boss could just read the summary and use that is that he trusted you to gather, validate, analyse and summarise.

He is a bit daft if he doesn't have ways to ensure that. Some do it by reading the whole thing; others by looking at the summary and challenging those who produced it to justify its key elements, and so on.

In some areas it is vital to keep the raw data for follow-up analysis: revision when assumptions change, errors found in the gathering or the methodology, etc.

I don't think it's a good idea to just trust that AI did a good job on that and not have the data on which to perform checks or make comparisons.

Being able to compare the output of AIs on the same input is a simple and very useful test of veracity. If models give different 'summary' output for the same input, you have identified a problem with the model and its output.

I don’t see a reason not to archive public data. In fact it still seems vital to me even if AI develops as you expect.

The perpetual web will be a great selling point, because many key people already see the value of the Internet Archive.

So I see a lot of value there.

Is there really a downside? Does it really get in the way of protecting private data?

It sounds like you see a problem with the technology, or its adoption being hampered by the goal of archiving humanity’s data?

7 Likes

I reckon you could write a very nice manifesto for the first SAFE site. Something along the lines of Satoshi’s white paper. Could inspire a lot of folks I believe.

14 Likes

This sounds great, and it also feels right based on observing the testnets. But I also feel that much thought needs to go into the economic system of the network. The project's economics have always felt like a weaker spot, but they are very important to making a successful network; they are the oil that makes the cogwheels turn.

The economic system of Bitcoin, apart from its energy use, is one of its biggest flaws: it can't survive on low transaction fees and is dependent on mining new coins and the inflation that follows.

The dream would be a network that is not reliant on rewards but can survive and operate on the cost of storage alone, where clients and providers meet in an equilibrium based on pure supply and demand. That would give an efficient and fair system that benefits both providers and clients through competition.
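As a purely hypothetical illustration of that equilibrium idea (not the network's actual pricing algorithm, and all names here are made up), a store cost could be derived from how full the network currently is: cheap when spare capacity is plentiful, expensive when space is scarce, which in turn attracts new providers.

```rust
// Hypothetical supply-and-demand store cost, for illustration only.
// `used` and `capacity` are chunk counts; `base_cost` is the price when half full.
fn store_cost(used: u64, capacity: u64, base_cost: f64) -> f64 {
    let fill = used as f64 / capacity as f64; // 0.0 = empty, 1.0 = full
    // Cheap while space is plentiful, rising steeply as the network fills up.
    base_cost * fill / (1.0 - fill).max(0.01)
}

fn main() {
    for &used in &[100u64, 500, 900] {
        println!("fill {:>2}%: cost {:.2}", used / 10, store_cost(used, 1000, 1.0));
    }
}
```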

6 Likes

Many factors might play a role, perhaps more so if you own a company server farm and some regions are cheaper than home servers for warehousing, electricity, hardware, bandwidth and so on. There is also the effect that you could run servers from home with no profit as the motivation, just to store for free or to support the network. Barriers to entry also seem low compared with GPU mining and the like. Low hardware and bandwidth requirements will open the door to large-scale decentralisation, with a huge number of participants joining for various reasons.

It would be nice to see something like Safe Pis: Raspberry Pis preloaded with Safe, combined with a simple monitoring app for phones so that a Safe Pi becomes almost install-once-and-forget. With a Safe Pi you could upload to and use the Safe Network at no cost, and maybe even at a profit. The problem might be that home servers get occasional internet-connection problems due to weather, ISP maintenance and so on. Just some thoughts, thinking out loud.

4 Likes

Where I am right now is kinda in agreement. The raw data will need to be protected, and there are many public data stores doing that. AI is not really summarising documents in any way, but it is compressing the knowledge. It's a bit weird, but it works to a very high level right now and will likely only get better.

So here is our choice: we could focus on storing and publishing public data, private data and all data, or we could narrow that focus to ensuring the ultimate protection of personal private data, which can still be shared, and of the shared data of groups of humans.

Even with no NRS etc. we can still share websites, blogs and Twitter-type things from our personally protected vaults, just like sharing WhatsApp links or similar. So it's not a cop-out on that kind of thing; the same goes for whistleblowers etc., still all good.

What I don't think we need to do is look to store the world's data sets on the primary layer of the network. I suspect many archive nodes will store old raw data, but I feel the chance of AI becoming the go-to place for knowledge is not just non-zero, it really has to be in the high-percentage bracket: maybe a 75% or greater chance that public knowledge will come from AI(s).

Right now, though, I believe protecting personal data from these AI(s) is vitally important, and instead of fighting on every front we can focus on this with absolute vigour and be extremely narrow in our efforts. Get away from storing zettabytes of data sets to begin with and be really focussed on personal data: BLS and shared keys, threshold sigs and encryption. Make great apps for individuals to use, including personal AI fine-tuned on the user's own data (ingestion) and much more. With that focus we carve out a niche that is underserved right now: secure, encrypted-by-default personal data, outside the reach of global and corporate AIs.
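As a rough illustration of the "BLS and shared keys, threshold sigs" part, here is a minimal sketch assuming the blsttc crate (MaidSafe's fork of threshold_crypto) and the rand crate. It only shows the cryptographic core: a group shares one public key, no single member ever holds the full secret key, and any two of three members can authorise an action over the group's data. How this would plug into the network's data types is not shown.

```rust
// Sketch of group control via BLS threshold signatures.
// Assumes the blsttc and rand crates; not the network's actual integration.
use blsttc::SecretKeySet;

fn main() {
    let mut rng = rand::thread_rng();

    // threshold = 1 means any 2 of the issued key shares must cooperate to sign.
    let sk_set = SecretKeySet::random(1, &mut rng);
    let pk_set = sk_set.public_keys();

    // Hand one secret key share to each of three group members.
    let shares: Vec<_> = (0..3).map(|i| sk_set.secret_key_share(i)).collect();

    // Two members sign an action over the group's data; one alone is not enough.
    let msg = b"grant read access to new member";
    let sig_shares: Vec<_> = [0usize, 2]
        .iter()
        .map(|&i| (i, shares[i].sign(msg)))
        .collect();

    // Anyone can combine a quorum of signature shares and verify against the
    // single group public key, without ever reconstructing the full secret key.
    let sig = pk_set
        .combine_signatures(sig_shares.iter().map(|(i, s)| (*i, s)))
        .expect("enough valid shares");
    assert!(pk_set.public_key().verify(&sig, msg));
    println!("group-authorised action verified");
}
```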

It's a developing story, as is the AI shift that is happening now at lightning speed. Folk showed they did not care that Google et al. had their data; I think that changes with AI, as it won't just want to sell you advertising, it could be much more nefarious.

19 Likes

This is a key issue and one that will be super interesting to measure in the next few weeks.

5 Likes

AI may be the fuel that feeds the need for SAFE, safe from AI!

8 Likes

I think it will be easier to start focussed and then grow into anything we want in time. I don't think we need to worry right now about large data sets; they are not the threat we face at this moment.

So: faster to launch, more focussed, easier to explain, and something to build on once the network is up and running and the economics are proven at a smaller scale.

Instead of being all things to all people: we have the tech to handle massive amounts of data, if the economics are correct. The tech is fine, and I would love to see it get out there asap, stabilise, grow and then extend to all valuable human knowledge. If there is to be a global AI, let's work to make that a SAFE AI too: decentralise it and put its ownership firmly in the hands of us all, not of a corporation.

19 Likes

Fully agree, the time is ripe. Positioning SAFE as the safe harbour at a time when many are already wary of AI is a market fit made in heaven. Love it.

Edit: there is a premium here; who else can offer that?

11 Likes

I’m not seeing why there needs to be a choice between securing public data and private data.

ChatGPT is deteriorating, but I expect LLMs to improve all the same. Without the raw data, though, we won't know.

So my point is twofold:

  • if the tech is good, why can’t it secure public and private data?
  • isn’t it necessary to have the public data in order to know that LLMs are improving, or even accurate rather than broken or seeded with propaganda, ads, misinformation, bias and so on?

Leaving public data to be secured by centralised archives just leaves it vulnerable, so I don’t think that’s a satisfactory solution.

7 Likes

If you think in terms of focus and speed to launch, then you might see more of what I am saying (brainstorming here). We can do it all on day one and lengthen the time to launch, or focus on a current need.

I will explain a wee bit more (we are in brainstorm/debate mode, and that's good).

There are some projects out there that started after us and launched before us. They are racing to store huge corpora of data at reduced rates, sometimes at a significant loss; it's a race to fill hard drives. These are not centralised per se, but decentralised.

They have farmers who need to invest tens of thousands of pounds in hardware to do it, some closer to hundreds of thousands. But they are doing it.

They have regulatory issues and don’t encrypt data by default, adding to the woes.

We can do better, I believe, but it will take a large network of small nodes to do it, a very large network.

It can, but we cannot wait for it all right now, IMO. I think we need a very strong and succinct offering that folk can understand and that satisfies an immediate need.

Actually, it's not necessary. There are a lot of ways to share parameters (weights, vector databases etc.), but even then it's not all of the raw data (video streams, old photos, sensor data and so on).

I think of it in many ways, such as not caring who wrote a paper or about the blurb in it, but being keen on the actual parts that mean something tangible. So I could strip out a lot, and so on.

However, humanity's knowledge, i.e. papers, books, blogs, websites, songs, poems and so on, does need to be saved. I have recently come to the conclusion, though, that not all data is knowledge, and not all knowledge is even in data (much of it is in our heads and thoughts).

Therefore, a way to get out fast, measure, grow and fit the needs of us all might well be to focus on the minimum right now, but make that minimum extra special: a solid API, good apps, secure vaults and an economy we can test and ensure works.

It in no way means we can't add to this quite easily as we get growth in node numbers and also in where nodes can run, like tiny devices, mobile phones and so on. If we can get small nodes that don't need to transfer lots of data on churn within a single group, then we can let the network grow and handle more and more knowledge.

i.e. save the people's knowledge first, let them communicate and share information, get it all running, and then extend it to smaller devices and larger data sets over time.

16 Likes

Thanks David. I'm not clear on several things, so I still don't understand the need, or really the practical implications of what you are suggesting.

Maybe when it is appropriate we could thrash it out in more detail in a separate topic. Though I hear you say this is brainstorming as well, so not fully formed.

BTW, AFAIK Safe Network is the only project attempting perpetual storage for a single fee, so other projects' archiving remains vulnerable, whether technically decentralised or not. Or is there a solution to funding perpetual archiving other than SN?

7 Likes

Vast quantities of scientific research are being thrown away because there is nowhere to store them. It seems that Safe Network could help out in this area as well.

10 Likes

Personally I'm not so impressed with "AI" as implemented by ChatGPT etc. If anything, to me it seems like another gatekeeper for status-quo thinking, like Wikipedia, and largely over-hyped. But I'm likely in the minority.

I wonder if it could be useful for Safe Network though…

I've wondered in the past whether it will be a problem for Safe Network if people start storing huge but low-value data on it: web server logs, for example, or even just random data. I'm not certain how the economics play out if the uploader pays a single fee and that very low-value data is supposed to be stored forever by others.

So I wonder if AI could be leveraged to generate some kind of content score based on analysing whether the content appears to be computer-generated or human-generated, its quality, and its uniqueness. Maybe it could even auto-tag content into some kind of RDF/Solid ontology.

A higher score might mean that the content gets replicated to more nodes, or that it costs less to store initially.
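As a purely hypothetical sketch of that last step (leaving aside how, or whether, such a score should ever be produced), a content score could feed a price multiplier and a replica count something like this:

```rust
// Hypothetical only: map an externally supplied content score
// (0.0 = likely junk/generated, 1.0 = unique, human, high value)
// to a storage price multiplier and a replication count.
fn price_multiplier(score: f64) -> f64 {
    // Low-value data pays up to 4x the base price; high-value data pays half.
    let s = score.clamp(0.0, 1.0);
    4.0 - 3.5 * s
}

fn replica_count(score: f64) -> u32 {
    // Everything gets a safe minimum; valued content gets extra copies.
    let s = score.clamp(0.0, 1.0);
    4 + (s * 4.0).round() as u32
}

fn main() {
    for &s in &[0.1, 0.5, 0.95] {
        println!("score {:.2}: pay {:.2}x base, {} replicas",
                 s, price_multiplier(s), replica_count(s));
    }
}
```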

Of course the devil would be in the details, and this seems like a potential vector for all kinds of politics and gaming of the system.

So I’m pretty sure it’s a bad idea, but throwing it out there anyway. :wink:

13 Likes

It definitely is over-hyped to some extent. When you use 'AI' in scare quotes, I think that's right: at the moment it's still just a sophisticated prediction engine. I've personally started referring to it as an "LLM", to make a distinction between what we have now and what will actually be real AI, whenever that happens.

Having said that, though, even if LLMs are over-hyped, ChatGPT, and increasingly the emerging plugin ecosystem, is still an absolutely amazing tool. I'm using it on a daily basis, more than Google now. It has just become so convenient to be able to ask a specific question and get a specific answer (yes, even with the caveats about 'hallucination' and so on), in addition to many other things.

My concern is not about what 'AI' is now, but rather about LLMs as the base for the progression to real AI over the next five to twenty years. I am genuinely concerned that we are going to blunder into that world much too quickly and unleash things we are not ready for.

10 Likes