Discussion topic for RFC 54.
From Unresolved Questions:
It is not clear if the special OwnerGet RPC is worthwhile. The data is stored publicly by the vaults, so it is retrievable by them anyway. The Network doing extra checks and work to make certain data get-able only by the owner(s) (or keys authorised by the owner(s)), for something that is stored in public by the vaults and can be worked around with little effort (e.g., published on clearnet), seems like an artificial constraint. Also, there isn't much incentive for the vaults to adhere to this behaviour. Could it in fact, over time, become the norm for vaults to just return data irrespective of who asks (which is the case now) for GETs, and earn rewards for doing that work anyway?
Yeah this is tricky!
I reckon the check adds a good "extra level" of security, which is certainly imperfect but still probably quite robust in practice.
One thing that niggles me is that more complex ownership structures in the future may cause a lot of friction here.
For example, time-based locks, hierarchical multisig or oracle-based checks may impose a lot of work on ownership checking.
This might reduce the usefulness of the datatype or introduce attacks on the verifiers.
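To make the cost concern concrete, here is a minimal Rust sketch of what a vault-side OwnerGet check might look like once ownership goes beyond a single key. Every name in it (`PublicKey` as a plain `u64`, `Ownership`, `Proof`, `owner_get`) is hypothetical and not the RFC's API, and real signature verification is replaced by simple key comparison. It only illustrates that each richer policy adds work to every OwnerGet.

```rust
/// Hypothetical stand-in for a real public key (no cryptography here).
type PublicKey = u64;

/// Hypothetical ownership policies, from the simplest to richer ones.
enum Ownership {
    /// A single owner key.
    Single(PublicKey),
    /// At least `threshold` of `keys` must take part.
    Multisig { keys: Vec<PublicKey>, threshold: usize },
    /// The inner policy applies only once `unlock_at` (e.g. a Unix time) has passed.
    TimeLock { unlock_at: u64, inner: Box<Ownership> },
}

/// What a requester presents: the keys that (we assume) signed the request,
/// plus the current time as seen by the verifying vault.
struct Proof {
    signers: Vec<PublicKey>,
    now: u64,
}

impl Ownership {
    /// Returns true if `proof` satisfies this policy. Nested policies recurse,
    /// so the work per OwnerGet grows with the policy's complexity.
    fn is_satisfied_by(&self, proof: &Proof) -> bool {
        match self {
            Ownership::Single(key) => proof.signers.contains(key),
            Ownership::Multisig { keys, threshold } => {
                let signed = keys.iter().filter(|k| proof.signers.contains(*k)).count();
                signed >= *threshold
            }
            Ownership::TimeLock { unlock_at, inner } => {
                proof.now >= *unlock_at && inner.is_satisfied_by(proof)
            }
        }
    }
}

/// Hypothetical OwnerGet handler: the data is returned only to owners,
/// even though the vault itself holds the bytes in the clear.
fn owner_get<'a>(data: &'a [u8], owner: &Ownership, proof: &Proof) -> Option<&'a [u8]> {
    if owner.is_satisfied_by(proof) {
        Some(data)
    } else {
        None
    }
}
```

Note that nothing in the sketch stops a vault from skipping `owner_get` and returning `data` directly, which is exactly the incentive problem raised in the question.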
I don't like to over-optimise for merely a "maybe" future problem, but this seems worth further reflection.
My current thinking is that the "protection" of OWNER-GET is not enough to justify the possible future difficulties it will bring, or the additional network load caused by longer routes.
Still gotta ruminate on it for a while.
I'm wondering if the "Unpublished" part of UnpublishedSequencedMutableData and UnpublishedUnsequencedMutableData isn't redundant.
The distinction between "Unpublished" and "Published" only applies to AppendOnlyData and ImmutableData, as MutableData will never be published. So "Unpublished Mutable Data" is clearly redundant and can confuse people into thinking there is a published version of the type; on the other hand, it is a bit more explicit.
What do people think? Should the redundancy be removed?
Yes, and as Lionel mentioned, ImmutableData is the opposite: always published. I agree with you here @marcin, let's remove the redundancy and go explicit.
Once you get this working in Maxwell, it will be an amazingly flexible and powerful structure!!
Yup!
I also have a feeling there might be a simpler scheme for all of these names. Not sure, but
UnpublishedAppendOnlyData
PublishedAppendOnlyData
UnpublishedImmutableData
PublishedImmutableData
SequencedMutableData
UnsequencedMutableData
is a lot of text…
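One way to see that these long names are really a few orthogonal properties is to generate them. Below is a minimal Rust sketch, assuming only the properties discussed in this thread: kind, published/unpublished, and sequenced/unsequenced. Per the RFC text quoted later in the thread, Append Only Data has both axes (e.g. PublishedSequencedAppendOnlyData), which the list above leaves out. The `Kind` enum and `long_name` function are hypothetical; the sketch deliberately emits no "Unpublished" prefix for Mutable Data, which is never published, and no sequencing for Immutable Data.

```rust
#[derive(Clone, Copy)]
enum Kind {
    AppendOnly,
    Immutable,
    Mutable,
}

/// Builds the long descriptive name from the properties of a data type.
/// `published` is ignored for Mutable Data (it is never published) and
/// `sequenced` is ignored for Immutable Data (it has no versions).
fn long_name(kind: Kind, published: bool, sequenced: bool) -> String {
    let publication = if published { "Published" } else { "Unpublished" };
    let sequencing = if sequenced { "Sequenced" } else { "Unsequenced" };
    match kind {
        Kind::AppendOnly => format!("{}{}AppendOnlyData", publication, sequencing),
        Kind::Immutable => format!("{}ImmutableData", publication),
        Kind::Mutable => format!("{}MutableData", sequencing),
    }
}
```

Whatever short names are eventually chosen, they would map onto the same property combinations; the naming question is only about how much of this the name should spell out.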
I am wondering if the descriptive names are worth it here. I myself am very keen on descriptive, explicit names. But I think these components are so fundamental that they might deserve something shorter, like a unique name, where their use and capabilities are implicit knowledge.
For example, within the event sourcing ubiquitous language, a "stream" is an append-only sequence of events. This is perfectly well known, and as normal to any "event sourcer" (uhm, sorcerer…) as "stream" is in a disk IO context, or "array" in a data structures context.
So the append-only-ness of it isn't needed in the name when sorcerers talk to each other, or in the names in code.
Just spitballing here:
OpenStream
HiddenStream / ClosedStream
OpenData
HiddenData / ClosedData
Set
SequencedSet
(I haven't managed to bring it all the way home with this example, I'm just exploring here. Maybe someone can pick it up.)
The SAFENetwork is its own context, and data storage within it is its own sub-context. The ubiquitous language of this context allows for this approach, I would say.
We would here be saying: Open means it can be accessed (said a bit more directly; "published" is a more technical way of saying the same).
Hidden or Closed means it can not be accessed.
Just saying "Data" establishes that data on the network is immutable. We don't need to call it immutable; it is the norm.
Calling it a stream also signals that this is not the actual data, but rather a stream of pointers to data.
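A minimal Rust sketch of this "stream" reading of Append Only Data: an append-only list of pointers (XOR names) to immutable chunks. `XorName` as a 32-byte array and the `Stream` type are assumptions made for illustration; the only operations offered are append and read, so existing entries can never be rewritten.

```rust
/// Hypothetical 32-byte XOR address of an immutable chunk.
type XorName = [u8; 32];

/// An append-only "stream": pointers can be added and read, never changed.
#[derive(Default)]
struct Stream {
    entries: Vec<XorName>,
}

impl Stream {
    /// Appends a pointer and returns its index in the stream.
    fn append(&mut self, pointer: XorName) -> usize {
        self.entries.push(pointer);
        self.entries.len() - 1
    }

    /// Reads the pointer at `index`, if it exists.
    fn get(&self, index: usize) -> Option<&XorName> {
        self.entries.get(index)
    }

    /// Number of pointers appended so far.
    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The chunks themselves live elsewhere as (immutable) Data; the stream only orders references to them.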
With this, we're saying: we are creating the language here, we are setting the norm. We decide what this will mean, and since this will be widely used, everyone will follow and it will be established [within this context].
Using these long, unwieldy names is short-sighted IMO.
OK, so this is not a complete idea/answer; I'm trying to inch over to a perceived possible something, not yet known.
What does hidden mean? Hidden from whom?
I'll clarify.
ClosedData / HiddenData (merely two examples of names) would replace UnpublishedImmutableData.
"Closed" means it is not accessible by others, in the same way as unpublished means. "Hidden" is the same thing, just a slightly different word (and, I might say, closer to what it really is).
The difference here is that we are defining new names which carry the implicit knowledge of use and capabilities, instead of technically describing what something is and does. This, I am arguing, is supported by the fact that these are such fundamental parts; assuming widespread use (as we surely are), it is quite natural to do so, as that is usually what happens. We load new words with meaning in our created contexts.
Unpublished Data could still be shared with others, though; it's just not accessible to everyone.
Hence the distinction between something that is Public/Published or not.
Hidden data could also be shared with others; it's just not accessible to everyone by default.
The act of sharing it is in fact "showing" the address (edit: err… the data), just expressed less technically.
I'd find it useful to have a list of types without names (e.g. just type 1, 2, etc.) alongside a description of the properties and uses of each.
It might make it easier to see which naming schemes are suited, because at the moment these descriptive names are obscuring rather than revealing IMO.
I can see why sequencing would be necessary for a BTreeMap, which is the underlying structure of Mutable Data, especially if in the future it is to be published, thus transforming it into Append Only.
Why do we need sequencing for Append Only, the underlying data structure being a Vec?
My only thought is that this may have to do with ownership, which can change over time.
Something like, "At data version X, it was owned by owner(s) Y, and the state was Z"?
Anyway, this would be good to add to the RFC.
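The guess above can be sketched as follows: if each owner change is recorded together with the data index at which it took effect, a verifier can answer "who owned this at data version X?". This is a minimal Rust sketch of that hypothesis, not the RFC's definition; `OwnerEntry`, `AppendOnly`, `change_owner`, `owner_at` and the `u64` key type are all hypothetical.

```rust
/// Hypothetical stand-in for a real public key.
type PublicKey = u64;

/// An owner change, stamped with the data index from which it applies.
struct OwnerEntry {
    owner: PublicKey,
    data_index: usize,
}

struct AppendOnly {
    data: Vec<Vec<u8>>,
    owners: Vec<OwnerEntry>,
}

impl AppendOnly {
    fn append_data(&mut self, value: Vec<u8>) {
        self.data.push(value);
    }

    /// Records a new owner, effective from the current data index onward.
    fn change_owner(&mut self, owner: PublicKey) {
        let data_index = self.data.len();
        self.owners.push(OwnerEntry { owner, data_index });
    }

    /// Returns the owner in effect at data index `index`: the most recent
    /// owner change recorded at or before that index.
    fn owner_at(&self, index: usize) -> Option<PublicKey> {
        self.owners
            .iter()
            .rev()
            .find(|entry| entry.data_index <= index)
            .map(|entry| entry.owner)
    }
}
```

If this is the intent, the data index effectively acts as the sequence number that ties ownership history to the data history.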
I.e., if some user stored a PublishedSequencedAppendOnlyData object at an XOR address X and type tag Y, and another user wants to store UnpublishedUnsequencedAppendOnlyData at the same XOR address X and type tag Y, there will be a conflict and an error will be returned.
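The rule quoted above amounts to a store keyed by the pair (XOR address, type tag): one object per pair, whatever its kind, while the same address with a different tag is a different slot. A minimal Rust sketch under that assumption; `Store`, `DataKind` and `PutError` are hypothetical names, and real vaults do much more (permissions, payment, replication).

```rust
use std::collections::HashMap;

/// Hypothetical 32-byte XOR address.
type XorName = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataKind {
    PubSeqAppendOnly,
    UnpubUnseqAppendOnly,
}

#[derive(Debug, PartialEq)]
enum PutError {
    /// Something already lives at this (address, type tag) pair.
    DataExists,
}

#[derive(Default)]
struct Store {
    /// One object per (address, type tag) pair, whatever its kind.
    objects: HashMap<(XorName, u64), DataKind>,
}

impl Store {
    fn put(&mut self, name: XorName, tag: u64, kind: DataKind) -> Result<(), PutError> {
        let key = (name, tag);
        if self.objects.contains_key(&key) {
            // Same address and same tag: conflict, regardless of the kind.
            return Err(PutError::DataExists);
        }
        self.objects.insert(key, kind);
        Ok(())
    }
}
```

For example, with an arbitrary tag 15000, a second put of any kind at the same address fails, while the same address under tag 15001 succeeds.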
This seems problematic to me. Correct me if I'm wrong, but users don't normally have a choice of XOR address, right? Isn't the address derived from the original file contents? I've read "What is self-encryption?" on the FAQs page and watched the video, but some things are still unclear to me.
For instance, say:
- User Y backs up a directory of files as UnpublishedUnsequencedAppendOnlyData. They are doing this for backup purposes and have no intention of publishing public data on the SAFE network. One of the files is jquery-3.4.1.min.js.
- User X decides to create a public website available on the SAFE network.
- User X wants to upload the asset files as PublishedSequencedAppendOnlyData (for example, they want to upload jquery-3.4.1.min.js).
- User X gets an error. I'm not sure what happens next:
(a) they aren't informed why they can't upload their file,
(b) the SAFE network discloses that the data exists as someone else's private UnpublishedUnsequencedAppendOnlyData, or
(c) they are able to use the file. But if (c) is the case, doesn't User X run the risk of User Y deleting their data?
What is a "type tag"? I can see that it is an unsigned 64-bit integer, but what is it for, and what are valid values for it? Is there some documentation that I should have read prior to reading this RFC?
This is only true for Immutable Data. For published chunks, the address is derived from the content; for unpublished ones, it is derived from the content and the owner. This means that a chunk first created as unpublished can later be published as well.
For all other data structures (Mutable Data and Appendable Data), the address is chosen by the user. For those, if the user wants to publish a chunk they first created as unpublished, they will have to either delete it or change its name.
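A minimal Rust sketch of the Immutable Data address rules just described: a published address depends only on the content, an unpublished one on the content plus the owner. It uses the standard library's `DefaultHasher` purely as a stand-in (it is not cryptographic, whereas the network uses a cryptographic hash), and the `u64` owner key and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a real public key.
type PublicKey = u64;

/// Stand-in address function: NOT cryptographic, for illustration only.
fn address(parts: &[&[u8]]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for part in parts {
        part.hash(&mut hasher);
    }
    hasher.finish()
}

/// Published Immutable Data: the address comes from the content alone,
/// so identical content always lands at the same address.
fn pub_immutable_address(content: &[u8]) -> u64 {
    address(&[content])
}

/// Unpublished Immutable Data: the owner is mixed in, so two owners storing
/// the same file get different addresses.
fn unpub_immutable_address(content: &[u8], owner: PublicKey) -> u64 {
    let owner_bytes = owner.to_be_bytes();
    address(&[content, &owner_bytes[..]])
}
```

Under this sketch, User Y's unpublished backup of jquery-3.4.1.min.js and User X's published copy of the same file end up at different addresses, so for Immutable Data the scenario in the question does not produce a conflict.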
Hey @bytes, @tfa perfectly answered most of your questions, so I will chime in on this one:
You can think about this as a file extension: you can store two files with the same name but with different extensions, like report.pdf and report.xls, and they may or may not represent the same thing. Same idea with type tags: you can store different MutableData / AppendOnlyData objects under the same name but with different type tags, and there will be no conflicts.
In addition to that, certain type tags can be treated in a special way on the Vaults side. For example, type tag 0 is reserved for user accounts, and such data is handled accordingly on both the Clients and Vaults side.
This RFC is largely based on RFC 47 - Mutable Data, so reading that one might help.
I like the implementations, but the naming is clunky to read. IMO the following renaming scheme flows easier:
Published*Data → Public*Data
Unpublished*Data → Shared*Data
Private*Data → Private*Data
*Sequenced*Data → *Versioned*Data
*Unsequenced*Data → *Data
In the safe-nd crate I see this idea has been implemented, and "sequence" has been shortened to "seq" for even shorter names.
All this is OK and Mutable data naming conventions are logical: there are 2 structs (SeqMutableData and UnseqMutableData) and one trait (MutableData) that groups common functions in both structs.
For appendable data this is a little more complex, with published/unpublished sub-cases, and so there are 4 structs (SeqAppendOnlyData<PubPermissions>, UnseqAppendOnlyData<PubPermissions>, SeqAppendOnlyData<UnpubPermissions> and UnseqAppendOnlyData<UnpubPermissions>).
But for immutable data the naming doesn't follow these conventions, with 2 structs (UnpubImmutableData and ImmutableData) and one enum (Kind) that encapsulates one of these structs.
"Kind" doesn't mean anything, and the "Pub" prefix seems to be missing from the second struct. Following the same logic as the conventions, I would have called:
- the 2 structs: UnpubImmutableData and PubImmutableData
- the common enum: ImmutableData
Wouldnāt this be more sensible?
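The naming proposed above could look like this in Rust: two structs with symmetric prefixes and a common `ImmutableData` enum in place of `Kind`. This is a hypothetical sketch of the suggestion, not the actual safe-nd code; the fields (`value`, `owner`), the `u64` key type and the `value`/`is_pub` helpers are assumptions.

```rust
/// Hypothetical stand-in for a real public key.
type PublicKey = u64;

/// Published immutable chunk: readable by anyone, no owner.
struct PubImmutableData {
    value: Vec<u8>,
}

/// Unpublished immutable chunk: tied to an owner.
struct UnpubImmutableData {
    value: Vec<u8>,
    owner: PublicKey,
}

/// The common enum, named after the data type rather than `Kind`.
enum ImmutableData {
    Pub(PubImmutableData),
    Unpub(UnpubImmutableData),
}

impl ImmutableData {
    /// Content shared by both variants.
    fn value(&self) -> &[u8] {
        match self {
            ImmutableData::Pub(data) => data.value.as_slice(),
            ImmutableData::Unpub(data) => data.value.as_slice(),
        }
    }

    /// True for the published variant.
    fn is_pub(&self) -> bool {
        matches!(self, ImmutableData::Pub(_))
    }
}
```

With this layout the three families read alike: a prefixed struct per variant, and one unprefixed name (a trait for Mutable Data, an enum here) for the family as a whole.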
Yes, it would be, I think. Moving fast means we will miss some of these points, so nice catch @tfa! Let's ping @ustulation and @nbaksalyar to comment here. Seems sensible to me, though.
Yes, we discussed this problem internally and we'll be making the naming consistent soon. Thanks for the suggestions!
The following changes have recently been made to this RFC:
- The Requester field has been removed from the Requests.
- BLS-PublicKey has been replaced with the PublicKey enum.
- Missing RPCs have been added for AppendOnly Data.
- A missing index field has been added for AData owners and permissions manipulation.
- A common response type will be used for mutations.
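The last two changes can be sketched together: an owner-manipulation request that carries the index the client expects to write at, and a single response shape shared by every mutation. This is a minimal Rust sketch; the names `Response::Mutation`, `Error::InvalidSuccessor` and `append_owner`, and the rule that the index must equal the current number of owners, are assumptions about how the index field is meant to work, not quotes from the RFC.

```rust
/// Hypothetical stand-in for a real public key.
type PublicKey = u64;

#[derive(Debug, PartialEq)]
enum Error {
    /// The supplied index was stale; carries the index that would be valid.
    InvalidSuccessor(u64),
}

/// One response shape shared by all mutations (put, delete, append, ...).
#[derive(Debug, PartialEq)]
enum Response {
    Mutation(Result<(), Error>),
}

struct AData {
    owners: Vec<PublicKey>,
}

impl AData {
    /// Appends a new owner only if `index` is exactly the next free slot,
    /// so two clients racing to change owners cannot both succeed.
    fn append_owner(&mut self, owner: PublicKey, index: u64) -> Response {
        let expected = self.owners.len() as u64;
        if index != expected {
            return Response::Mutation(Err(Error::InvalidSuccessor(expected)));
        }
        self.owners.push(owner);
        Response::Mutation(Ok(()))
    }
}
```

The same index check would apply to permission changes, and because every mutation returns `Response::Mutation`, a client can handle success and failure uniformly.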