Appendable Data discussion

Wow, this has been quite a passionate thread! I suppose I might as well give some IMO/0.02$.

Implementation details aside, last time I read over the evolution of proposed data structures for the network I felt like MaidSafe had ideally/perfectly distilled things down to two fundamental building blocks (Mutable and Immutable data). I also really like how they align with Rust paradigms. From an HPC perspective, and looking forward to a view of SAFE as a general world computer (which may be getting ahead of ourselves and a bit off track), the lack of a simple unversioned mutable datatype would be rather detrimental. Analogous to how you would program in Rust with no mut?

Appendable data also offers some nice features and is something I see good use cases for. It might also help SAFE get better industry adoption and get certified for different compliance needs. However, wouldn’t it be better left for the App layer, like NTP timestamps, rather than network core? It seems like a good design principle would be to construct it from a combination of mutable and immutable datatypes with the multi-sig features and whatever else added in to complete the appendable datastructure and/or other future datatypes.

I think some of the community frustration found in this thread comes from everyone having a different set of expectations/views/wishlists/hopes/dreams on what the functionality of each datatype offers best. I’ll readily admit that my conceptual view of how it all fits together is rather limited.

It might be fun to brainstorm a list of what everyone sees as their understanding/preconceptions about MD, ImD, and ApD. Then dirvine or a core dev can tell us how unrealistic we are being, or maybe we’ll give them some ideas to chew on.

5 Likes

I generally like the idea of having an Appendable data type. But I don’t understand why we can’t have both: One data type which is Immutable and one data type which allows modifications / deletions. Looking forward there will then be services which harness immutable data and others will use mutable ones. Let the people decide which services they want to use. E.g.: I can imagine that there will be two versions of a video sharing platform. One allows real deletion of content and the other not. You as a user then have the choice which one you want to use for your use-case.

Am I missing something here?

2 Likes

I think the SAFE network at launch will not be anything like a SQL database. It is a huge discussion: replicating SQL on a server is unlikely, but replicating the function SQL provides on a server is likely. SQL on a server is generally faster as you do not worry about security and can use data locality etc. The cost of that is security and scalability. So then you look at Amazon etc. They do not use SQL servers, but decentralised systems like Dynamo, which is more CRDT-like as opposed to consensus-driven ordering (PARSEC), but works at huge scale and is secured behind a firewall. We do not need the firewall as SAFE has secured data. So yes, this can be done, and done at scale with the security of SAFE. It will not be SQL, but it can provide the same end-user results SQL does.

No, this will not be possible. A safecoin is a data element with an owner. The owner changes when the coin changes hands. No history. Early versions had the last owner so that we could have receipts, but it is simpler to have a single owner. That is metadata, i.e. not perpetual, so no tracking.

I hesitate with safecoin though as it is not finalised, but I “feel” it is possible to exist purely in client accounts, backed by PARSEC. So not even a data element at all, that means a very fast transfer of millions of coins and very simple divisibility, but there you go the cat is out the bag of my thinking there.

I think yes discussion is great it helps us all.

I think the RFC process is good for this. We need to be aware every data type is a data subset, so easier to guess or a smaller catch group for data. Then if you make some chunks read only and some mutable, the network needs to identify those and apps will need to be able to read and say what of all the content of a thing is mutable and not mutable. That makes apps harder and user experience harder, say a video has a billion chunks and 1 is mutable? There are many more edge cases. What if you can mutate stuff, would you want history, if so appendable data does exactly that. So what we are looking for/asking for here is 2 things as far as I can see.

  1. Mutable data that scrubs history
  2. Deletable data

I suspect they are 2 very distinct types for different purposes and both with side effects on the network, so RFCs are good. I worry Devs will be taken from launch though to work on all of these parts, whereas we have alpha 2, apps got created, it expands the API and more apps happen, it’s moving to RDF/SOLID integration and all with pseudo appendable data. All of that still happens, but more efficiently with appendable data. If you see what I mean anyway?

11 Likes

Phew. I honestly figured but hearing the details is always relieving.

I won’t hold your feet to the fire but :exploding_head: mind blown. That would be next level for sure and a helluva way to show off the power of a little ABFT consensus protocol called PARSEC.

Interesting about Amazon. I thought they had some distributed redundancy but did not know that. Also reassuring to me personally.

Just an aside if you don’t mind. How far along is the integration of threshold crypto into PARSEC? I noticed you and @anon86652309 forked it quite awhile ago but don’t see it in the Maidsafe repo. :yum:

7 Likes

It is all happening, some tests already working :wink:

10 Likes

So, if all data is essentially immutable (appendable or otherwise), does this mean that caching is similar and as effective in all cases?

5 Likes

I know that you and the team don’t leave things up to chance and that there is a well thought out master plan behind it all. I just hope that we will have the coolest network that the world has ever seen and that it gives people as much freedom as possible, and that people will be in control of their data, as much as technology allows in a fair development time perspective.

I hope that ideology never compromises the functions of the Network, or how cool it will be, or its ability to give people control over their data. To give the world security and privacy, and to let people own their data, is what the network should do.

To end dictatorship in countries, or other such things, I hope will be an effect of people using the network, but never what it was built for if that compromises functionality. Just promise to give us the coolest network the world has ever seen, that is all I wish and hope for. Facebook was never written to overthrow dictators, but the ability for people to connect and start groups allowed dictators to be overthrown.

If it is possible for people to choose whether their data should be forgotten or not, I believe very strongly that would be a good thing. If it is possible, don’t only let people own their data, let them also be in full control over their data, if it doesn’t compromise functionality or security or other things of higher importance to the network.

3 Likes

I don’t agree with or even really understand this “once public - up forever” principle. It sounds very unforgiving and cruel to me - and I think the world needs a network that allows forgiveness and kindness.

Sure, I would like to hold bad actors accountable, but I would also like to give them the opportunity to limit the scope of their bad acts if they come to regret their actions. Or to give people a chance to try to limit the consequences of the mistakes they make when young, drunk, or stupid - or all these at the same time.

I think it is important that you can publish stuff anonymously and not be forced to take it down, thinking about whistleblowers here. But I really don’t see a reason to be prevented from taking down what you want - thinking about ex-schoolbullies, young girls seeking attention etc. here.

Of course once you publish something, there is a chance that it will be public forever, because someone else can copy and republish it. And you just have to live with that. But that is not necessarily the case, and it actually should be less the case in SAFE Network, because no one else is owning the platform where you publish your stupid stuff.

I also expect that there would be people or organizations working as watchdogs, keeping book of the stuff the powerful and influential people say and do.

I know that there are some smart people that think that people should not be protected from their own stupidity, but - they are smart. It’s like powerful people saying that weak should not be protected. And accidents can happen to anyone. Anyone can accidentally publish something that was meant to stay private. Why not give us a chance to correct our mistakes?

Ok, just the existence of public immutable data is something that I see as a risk, but I’m willing to accept that. But I’m not willing to accept that all the public data should be undeletable. Now I’m uncertain of the technical details, but if it is the case that the datamap must be public and thus public data becomes undeletable, would it be possible to make a public site so that there is a public data map pointing to another “map” (or something like that) that I actually can retract? So that if the basic layer of public data is permanent, there might be another layer of doors that you can point to from the permanent layer, but where I can choose to lose the keys?

3 Likes

If it is public - owned by everyone - what gives anyone the right to unilaterally delete it?

5 Likes

Hmm… if there is a public data in a forest, but no-one has seen it, is it really public? :wink: I mean I can publish something by accident, but that doesn’t mean it is yet public, if no-one has seen it, and I think I should have a possibility to try to correct my mistake.

And on the other hand, if I publish something and someone else sees it, why should it be anyone else’s - or the network’s - responsibility to keep a copy of it for them?

4 Likes

I am struggling to understand how deleting something once it has been made public changes that copies have most likely been made if it is even of the slightest interest to another person.

2 Likes

If there is no data element … what happens when my node drops offline … obviously I don’t lose my coins … so where are they stored - how is there no data element? Sorry for my stupidity, I just don’t get it. Also if you can do this magic with safecoin, then why not temp data?

In the end I don’t have enough understanding to get a feeling on what is better. However @neo has raised some points that I feel haven’t been addressed – maybe they can’t be addressed until the code is written and tested? For instance: speed (reconstruction cost), data growth (and data storage cost) of worthless data – essentially and IMO I think we all hope that the Safe Network can compete with the clear-net overall in the end (with cost, speed, and security issues all taken together into account).

I believe the concern here is that without some sort of ephemeral data storage we won’t be able to compete and we will miss out on a lot of growth … of course there is no way to know this, so all just gut feelings on one side versus gut feelings on the other – but that isn’t to say that rational arguments aren’t being presented here or that we shouldn’t do all that we can to close a perceived gap (again, in overall cost, speed, and security) between the Safe Network and the clear-net.

Yeah - great idea …

So,

  • Immutable Data for me is also the Perpetual Data (I’ve thought these two were the same thing - am I wrong?). I imagine this data type being used to store really valuable information - family photos, diaries, historical info.

  • Appendable data I sort of assumed was just an offshoot of Immutable data allowing for version-control.

  • Mutable data (or what I thought was mutable data, which it seems I was wrong about): IMO, this should be an ephemeral data store that may or may not be private, could be used for storing temp data, and is accessible to all sharing it. Similar to Appendable data but deletable, and would hopefully have less overhead in both speed and storage cost.

Cheers

1 Like

IMO, all data is ‘public’ on the Safe Network, but you’d never find it unless it was shared with you. So effectively all data is private. So deletion should still be a thing.

EDIT: never say never Tyler … given enough time, all data is NOT private! So again, being able to delete seems important for some people.

FORGET SQL. It seems the focus is on SQL which really would be converted into another database type for SAFE anyhow.

The very fundamental processing of data by only having appendable data means that ANY collection of data records will have increasingly long access times as the data is mutated, because you can only append the changes. The implications of that are

  • to reconstruct the data you either
    • have to “nullify” the previous record and search through the appended data for the actual record. This may be many network requests away since it is too big for one AD object
    • OR append each change to any field(s) of the record and reconstruct as you process the AD and subsequent ADs (as the record changes are too big for one AD)
  • This constitutes a forever increasing processing time and number of network accesses for each and every record as they change.
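
To make the cost argument above concrete, here is a minimal Python sketch of reading one record back from append-only storage. `AppendOnlyLog` and its methods are hypothetical names for illustration only, not a SAFE API, and a plain in-memory list stands in for what would be one or more AD objects fetched over the network.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical stand-in for a chain of AD objects (not a SAFE API)."""
    entries: list = field(default_factory=list)  # (record_id, changed_fields)

    def append(self, record_id, changes):
        # Changes are only ever added; nothing is modified in place.
        self.entries.append((record_id, dict(changes)))

    def read(self, record_id):
        """Rebuild the current record by replaying every entry in order."""
        record, scanned = {}, 0
        for rid, changes in self.entries:
            scanned += 1
            if rid == record_id:
                record.update(changes)
        return record, scanned


log = AppendOnlyLog()
log.append("user:1", {"name": "Ann", "city": "Troon"})
for i in range(100):
    log.append("user:1", {"visits": i + 1})  # 100 later changes to one field

record, scanned = log.read("user:1")
print(record, scanned)
```

Each read replays every entry written so far, so the work grows with the number of changes. In a real network every overflow AD would presumably add a round trip on top of that.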

Solution
Just have the promised MDs with mutation (modify == change in place) and append functions.

  • version-kept. A flag in the MD denotes whether a copy of each version is kept (perpetual data) or the MD is temporary/changing. 90+% of APPs will reject MDs that do not keep a copy of each version, treating them as temporary data. (<-- this could be at the API level, defaulting to retrieving only perpetual MDs but allowing other APPs like text editors, databases etc. to opt in to the other kind)
    • Thus application temporary files (eg text editors) can reuse MDs without adding (notes on paper as you called it) extra MDs containing encrypted (once only keys) data unreadable after the editing session. And the actual files (previous & new) are kept thus keeping to perpetual data.
  • Browsers and most (90+%) APPs will take note of the version-copy-keep flag and appropriately deal with it.
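
A rough Python sketch of how such a flag could be honoured at the API level. Everything here is an assumption for illustration: `MutableData`, `version_kept`, `get_md` and `allow_temp` are made-up names, and a dict stands in for the network.

```python
from dataclasses import dataclass


@dataclass
class MutableData:
    """Hypothetical MD carrying the proposed version-kept flag."""
    address: str
    payload: bytes
    version_kept: bool = True  # True: every version is retained (perpetual)


class TemporaryDataRejected(Exception):
    """Raised when a caller has not opted in to temporary MDs."""


def get_md(store, address, allow_temp=False):
    """Fetch an MD; temporary MDs stay hidden unless allow_temp is set."""
    md = store[address]
    if not md.version_kept and not allow_temp:
        raise TemporaryDataRejected(address)
    return md


store = {
    "blog/post-1": MutableData("blog/post-1", b"published text"),
    "editor/tmp-1": MutableData("editor/tmp-1", b"undo buffer", version_kept=False),
}

published = get_md(store, "blog/post-1")                  # default: perpetual only
scratch = get_md(store, "editor/tmp-1", allow_temp=True)  # editor opts in
try:
    get_md(store, "editor/tmp-1")                         # browser-style default call
    browser_saw_temp = True
except TemporaryDataRejected:
    browser_saw_temp = False
```

With the default call a browser never sees the temporary MD, while an editor that explicitly opts in can still use it for scratch data.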

Isn’t this trying to have your cake and eat it too?

The owner of data (any AD data object) and the history of ownership is just as important as the data itself.

For instance comment fields in a forum. You can completely change the flow of comments if you change the owners of the comment ADs.

For instance on a blog site

  • you make a blog entry about privacy and
  • some authoritarian makes a blog about the authorities must have ALL knowledge of its citizens,
  • Now the owner of the AD containing the “authorities should have all knowledge” entry changes the owner of his/her AD to you.
  • Now the blog site has two entries owned by you with very contrary views.

Ownership information is definitely a part of perpetual data

Thus if you make allowances for safecoin then you have already broken the flawed model of appendable data only.

I can tell you that having done data mining for 3 years in a job, taught me a bit about what is history and what is not. And ownership is very definitely an important aspect of perpetual data

Again that is a very flawed argument.

Videos are immutable files and no chunk can be mutable.

So what are you implying here, that immutable chunks are being brought into the AD type and we no longer have the specific immutable chunks (immutable files)?

Except the argument had a flaw (see above)

It’s stored with your account information.

4 Likes

So is that on the network or locally on my computer? Must be on my computer I guess otherwise would be in some sort of data type right?

Personally I don’t see anything wrong in having permanent data. In fact, I welcome it, and from a consumer point of view, keeping a data history is what initially interested me in this project. Classically, history is always edited, and is written by the victor. Having data history means a lot to historians, because they will have uncensored sources. Way back in the early 90s, I had one of the few listed websites on the WWW, called The Naked Truth, on Yahoo, alongside the Captain Kirk Sing-a-long page and the Evil Clown page. Today that site is gone, and so are the original files. So my point: I wish I could have preserved history.

everything is on the network.

Yes in a data field of your account record that only the network can modify.

Mind you append only ADs means that this set of ADs holding your account info will keep growing and growing because every time you do something account wise the account record has data appended and next time those appendices have to be processed to recreate your account information.

1 Like

Yes that is fine. As long as it is not using appendable data objects.

I suggested keeping each version of the MD as it is mutated. This speeds things up by not requiring the reconstruction of the data after processing the appends. And makes a simple network access since the copy is just like an MD that cannot be modified. (ie read only).

4 Likes

@dirvine David, I understand the keeping of files, data and web data/sites in perpetuity and fundamentally fully agree with this in principle and practice. The issue that most concerns me is the use of Appendable Data as the ONLY way to achieve this.

You mentioned an issue of perspectives and how we are potentially misreading yours. This may well be true, but if it’s a fact that the direction is to implement perpetuity via ADs then it is a direction change from the MD that has been discussed many times in the past. It’s OK to change things but I feel it is a change for the worse in a practical sense.

Can you confirm if the following questions are true or not and please provide some thoughts on the issues/questions contained in each point.

  1. immutable file chunks and AD are somehow being combined as one as per your implication further up, or is this a misunderstanding? You were talking of having only one type of data and that immutable video files are somehow also able to contain the MD (now becoming AD) data types. This is very confusing when you make such arguments. And if I am confused as to what is happening then I am sure others will be too

  2. The new AD (appendable data) type which is replacing the MD (mutable data) type means that whenever a change is made to one or more bytes then a new field is created with the changes in that field. How are the changes to be stored?

    • will the change be something along the lines of bytes xxx-yyy are changed to whatever OR
    • The previous field is flagged as old and the complete data is stored in the new field.
    • And I imagine that just adding (appending) to the data results in the new field just holding the new data to be appended.
    • What happens when the changes exceed the 1MB size of the AD?
    • What happens when there have been hundreds of changes made, e.g. to a web site or document?
      • will the system have to add an AD when a particular change cannot fit in the current AD?
      • how will it determine the address of the continuing AD? what if the ADs at the calculated addresses are in use? How will the linkage be done.
      • What happens when the web page is 900KB long and it is changed 50 or 100 times? How many ADs will be linked for this one? How long will the processing take to get the actual page to display? In other words will the browser have to retrieve the 50 or 100 ADs that this one AD has grown to?
  3. In your talk with @fergish, you say at 45:36-> that

when we talk of data in perpetuity we don’t mean every single scrap of nonsense that you write on a scrap piece of paper or draw something in a window in your brain or thought … when we … talking about data we are talking of data you want to keep and you value that data enough that you’re prepared to make a tiny tiny tiny small payment - almost like - if you get out the good writing paper ?? want to keep and make data last forever - just not something we have as a fluffy goal that would be ?? This is something that we feel is for humanity we must do that must be able …

  • so why are the temporary files that editors use (either via safe-drive or native safe app) not considered as digital scraps of paper?
    • the temp file is a jumble to anything other than that editing session. The context to the editor is lost once you start a new session or start on a new file
    • The original file is still there in immutable data (or in your new one combined ADs)
    • the new version of the file is still there in immutable data (or …
  • I put it to you that the temp file is that digital version of the scrap of paper that is used to keep the changes for undos and unexpected app close, till the new file is written. After the new file is written the temp file loses all context and its content cannot be used again even by the editor since there is no context for it.
  4. As I alluded to before - By using append only data types (AD) there has to be a method to provide the data as it now is.
  • how is that process to run? Especially once the changes (appendixes) cause the AD to overflow into one or more additional ADs
    • is it done by some manager network side?
    • is it done by the client code?
    • Is it done by the application?
  • how will the implementation prevent the access time of a particular AD increasing as more and more appendixes are added due to changes and multiple ADs need to be read to get the data for that original AD? I am especially thinking of changes adding appendixes that cannot fit on the original AD and new additional ADs are needed to store the appendixes.
    • what will the additional time (lag time etc.) to access these additional appendix ADs do to performance? For example the 900KB web page that undergoes multiple changes, each adding an appendix of maybe 300KB to 900KB, so that the AD is now spread across many ADs.
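
As a back-of-envelope illustration of the 900KB page example, the Python sketch below counts how many ADs a reader would have to fetch. It rests on assumptions that are not confirmed anywhere in this thread: each AD holds 1MB (taken as 1024KB), an appended entry is never split across ADs, and a new AD is started whenever the next entry does not fit. `ads_needed` is a hypothetical helper.

```python
def ads_needed(entry_sizes_kb, capacity_kb=1024):
    """Count ADs used when whole entries are packed in order (assumed model)."""
    ads, used = 1, 0
    for size in entry_sizes_kb:
        if size > capacity_kb:
            raise ValueError("entry larger than one AD")
        if used + size > capacity_kb:
            ads += 1   # entry does not fit: start a new linked AD
            used = 0
        used += size
    return ads


original_page = [900]  # the 900KB page itself

full_rewrites_50 = ads_needed(original_page + [900] * 50)
full_rewrites_100 = ads_needed(original_page + [900] * 100)
partial_edits_100 = ads_needed(original_page + [300] * 100)

print(full_rewrites_50, full_rewrites_100, partial_edits_100)
```

Under these assumptions, full 900KB rewrites cost one extra AD per change (51 and 101 ADs), while 300KB appendixes pack three to an AD (35 ADs after 100 changes). Either way the number of fetches needed to rebuild the page keeps growing with its edit history.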

Also can you comment on my proposal above to keep the essentials of the MDs as outlined when MDs were being introduced. You know that an MD can be mutated/appended to/deleted

  • The proposal mainly introduces keeping the previous version of an MD as a history. The changes to an MD causes a new version of the MD which essentially is a new data object to be written to the vaults.
    • this means that history is kept - ie perpetuity of data
    • Owner is also kept which is an essential part of the data. (who wrote what is important)
    • Effectively the version # becomes the lower bits in the final address of the actual data object to be read. This can be done by either reducing the # of bits in an MD address or increasing the address space of these objects.
  • The one exception is to allow a “temp” MD where the “keep-a-copy-of-old-versions” flag is set to false.
    • This allows apps and web browsers to know if the data from the MD is temporary and can either flag it or the app running in the browser can reject it or not.
    • OR the API to retrieve an MD can have an “Allow-Temp” option which defaults to no. This allows the 90-99% of APPs that do not trust these MDs to not even see them. And the other APPs (utility or say data handling APPs) can access them if they want. So then editors can have their temp files if they wish.
  • even data handling APPs or database APPs can use the version-keeping MDs without the penalty of access speed slowdowns due to expanding appendixes to trawl through.

So by using this proposal you can prevent any loss of speed due to trawling through appendixes, which over time would cause continually increasing access times for any data that is changing. But keep the benefit of perpetual data.
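
A minimal Python sketch of the versioned-MD idea, to show why a read stays a single lookup. All names (`VERSION_BITS`, `versioned_address`, `VersionedStore`) and the 16-bit version width are assumptions for illustration; a dict stands in for the vaults.

```python
VERSION_BITS = 16  # assumed width of the version field in the address


def versioned_address(base, version):
    """Place the version number in the low bits of the object address."""
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version out of range")
    return (base << VERSION_BITS) | version


class VersionedStore:
    """Hypothetical store: each mutation writes a new read-only object."""

    def __init__(self):
        self.objects = {}  # full address -> payload
        self.latest = {}   # base address -> latest version number

    def mutate(self, base, payload):
        version = self.latest.get(base, -1) + 1
        self.objects[versioned_address(base, version)] = payload
        self.latest[base] = version
        return version

    def read(self, base, version=None):
        # One lookup regardless of how many versions exist.
        if version is None:
            version = self.latest[base]
        return self.objects[versioned_address(base, version)]


vault = VersionedStore()
vault.mutate(0xABCD, b"first draft")
vault.mutate(0xABCD, b"second draft")

current = vault.read(0xABCD)              # latest version
history = vault.read(0xABCD, version=0)   # older version is still there
```

Reading the latest version does not depend on how many earlier versions exist, and the old versions remain addressable, which is the perpetuity property the proposal is after.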

If you decide that making the keeping of copies optional (for temp files/data) is bad or can be abused, then the APPs using temp files, temp data or database-type apps would not be affected by losing the ability to turn off history, since they are just accessing the latest version of the MD.

11 Likes

An additional question:

Is this correct?:
An MD always has been append only, in that there are ulong.MaxValue number of entries in each of the 1k MD entries (one for every version), but that the API has only exposed the most recent version (and calculated size only for the most recent versions? Otherwise you’d soon have no more space, even though size of visible data was not that big).

4 Likes