Decentralized Metadata - Calling all developers

Hey everyone,

So for my desktop app I will be implementing a decentralized file metadata system.

I had previously planned to allow the user to specify tracker servers in the application, to which they would upload metadata during the file upload process. On reflection I decided to instead have users upload this metadata as another file to the autonomi network, and link the two via a smart contract.

This ensures the information stays decentralized and accessible to everyone. Indexers can still pull this data from the blockchain if they wish to run their own private computation or search algorithms over it.

The main selling point of this method is that client applications can query the blockchain via ethers.js or web3.js and get the metadata without the need for any servers. The smart contract simply ties two files together: the file itself and its metadata file. The cost of storing both xornames depends on the blockchain chosen, but on an L2 it should be low. Each xorname is 32 bytes.

Storing all the metadata in a file avoids the cost of keeping it in the smart contract itself; only the two 32-byte xornames go on chain.

This is all optional, so applications can decide whether to use it, or just upload the metadata to their own servers instead.

Here is a minimal sketch of the smart contract (the actual contract may differ; `MetadataRegistry` and its function names are just illustrative). It simply ties the two xornames together:
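```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Sketch only: links a file's xorname on autonomi to the xorname of
// its metadata file. Both are 32-byte values.
contract MetadataRegistry {
    // file xorname => metadata file xorname
    mapping(bytes32 => bytes32) public metadataOf;

    event MetadataLinked(bytes32 indexed fileXorname, bytes32 metadataXorname);

    function link(bytes32 fileXorname, bytes32 metadataXorname) external {
        metadataOf[fileXorname] = metadataXorname;
        emit MetadataLinked(fileXorname, metadataXorname);
    }
}
```

Clients can then read `metadataOf` (or watch `MetadataLinked` events) with ethers.js or web3.js, and fetch both files straight from autonomi with no servers involved.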

As for the metadata files, it would be good for applications that use this method to agree on a standard, so metadata created by one application can be read by all the others.

My proposal would be something along the lines of a JSON object with the following fields (example below):

name: string
description: string
extension: string
size (bytes): int
hash: string
additional: array of key/value pairs
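For example (all values here are hypothetical):

```json
{
  "name": "holiday-photos",
  "description": "Photos from a 2024 trip",
  "extension": "zip",
  "size": 10485760,
  "hash": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
  "additional": [
    { "key": "license", "value": "CC-BY-4.0" }
  ]
}
```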

Would love some feedback.

11 Likes

This tech on the Base layer 2 is doing some interesting stuff with metadata for the ERC20 standard. It is called ERC20i and is completely interoperable.

I collect all the coins on Base that have it, because of the unique art autonomously generated on-chain.

3 Likes

Unfortunately this would only survive while the ERC20 L2 token remains. With the native token planned to replace the ERC20 (1:1, with burn), it would perhaps be better to look at another way.

Having looked at the datamap, it seems the perfect place to put the metadata. It is a simple table of chunk address/size/prehash entries plus one overall size value, and it is required since it defines the locations of the chunks.

I would look at extending the code that generates the datamap so it can also carry the metadata, ideally in a way that can coexist with clients not expecting it.

Another way is to look at a type of directory structure, since that is how metadata is stored on disks.

4 Likes

My development plan is to use the chain as a decentralized clock/timestamp for specific data and to sync AI dApps.

1 Like

Personally I feel it should all be on the network infrastructure.

What happens to the chain when there are a billion files, and growing, in the coming years? Say 1,000 files for each of a billion people storing their photos, etc.

3 Likes

Although Autonomi only plans to use the L2 for a limited time until the native token is complete, this system would still work, as it is only for storing data in a way where everyone knows where to look for it. I don’t see a way of that working on a DHT.

An option could let the user decide which chains to publish this information to, and perhaps several at once if they were willing to pay the different fees required by those chains.

You mentioned the datamap for storing the metadata itself. For files I would like to store titles, descriptions, the file hash, etc. All of these take up a lot of bytes, and even on an L2 this adds up to high fees. With just the xornames of the file and the metadata file, it’s only 64 bytes, which should be manageable.

As LOvisWaTer also mentioned, there is the added benefit of a timestamp with a chain, which enables some cool features.

I know the plan is to eventually stop using an L2, but it would be really cool for nodes to be able to run a chain node, keeping this L2 for Autonomi metadata and clock/timestamp-related features. Of course the token itself should be part of the top layer and not blockchain-related.

2 Likes

In the datamap, that is married to the file. That is the place.

Also, by putting it on the blockchain you violate people’s desire for privacy over their private files.

2 Likes

In my application I have a checkbox for metadata that is disabled by default. Should the user choose a private upload, or a public upload with no metadata, it will just upload to Autonomi with no metadata. It’s all optional and up to the user.

I’m confused by the line:
In the datamap, that is married to the file. That is the place.

Do you mean the datamap in the Solidity contract?
Or some datamap in Autonomi itself?

2 Likes

Clear separation, with the metadata stored as a file and stitched together by a smart contract, is imo conceptually the right way forward in order to scale up any type of capability, and in this case to keep the capability distributed. As usual with software, success is always in the details. :wink:

What are your thoughts on reducing the attack surface posed by just one Smart Contract running on an EVM to do this?

2 Likes

Initially I envisaged an L1 blockchain that Autonomi node runners could opt in to run: basically a way to ensure ownership and a focus on what would benefit the Autonomi network itself.

I know the community has a distaste for blockchains, so should that not happen, then at least at the application level we can add the ability to upload this metadata linkage to whatever chains the user wants. As long as the application itself can interact with these chains via web3.js or ethers.js or whatever tooling the chain provides, it should be possible to keep it distributed without the need for centralized servers.

The more chains used the better of course, but the cost increases.

One concern I had is people uploading files and then seeding fake metadata. I’m weighing up two solutions to this.

1: For this I plan to allow ‘comments’: basically an array of comments that users can submit for each file, so users can flag metadata that seems ‘off’.

2: For each file xorname, allow multiple metadata objects, so users can provide their own metadata for existing files (see the sketch below).
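A rough sketch of what option 2 could look like, extending the earlier contract so one file xorname can accumulate many metadata entries (again, the contract and function names are just illustrative):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Sketch of option 2: anyone can attach a metadata xorname to a file,
// and clients fetch the whole list and judge the entries for themselves.
contract MultiMetadataRegistry {
    // file xorname => all metadata xornames submitted for it
    mapping(bytes32 => bytes32[]) private entries;

    event MetadataAdded(bytes32 indexed fileXorname, bytes32 metadataXorname, address submitter);

    function add(bytes32 fileXorname, bytes32 metadataXorname) external {
        entries[fileXorname].push(metadataXorname);
        emit MetadataAdded(fileXorname, metadataXorname, msg.sender);
    }

    function getAll(bytes32 fileXorname) external view returns (bytes32[] memory) {
        return entries[fileXorname];
    }
}
```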

As part of the file metadata I will be adding the file’s SHA-256 hash, but this of course could be set to anything by a user. On the client side I simply hash a file when it’s downloaded and compare it to the metadata to inform the user whether it matches, but it’s not ideal. One way around this, if the Autonomi node runners did actually run chain nodes, would be to have them all hash file uploads and verify the SHA-256 of each uploaded file via consensus.

1 Like

Take a close look at spiderchain and how Botanix uses multisig rotation per epoch to get some ideas.

A similar ‘rotation’ concept could be employed for Autonomi smart contracts running in an EVM, where the uploading client keeps changing and applying different multi-sigs they control. At a really high level it’s randomized: the user might have a six-pack of multi-sigs set up, but they are randomly applied, say two per upload, per upload ‘epoch’ that the user determines (i.e. hourly, daily), to reduce the exposure to only, say, one hour or one day of data uploads having the possibility of being compromised by some hacker breaking the key encryption for that ‘epoch’ of files.

3 Likes

Ethereum and smart contracts running the ERC20 protocol on the EVM are really a distributed proof-of-stake financial services layer, not really a layer 1 store of value; the latter is better equipped to do big-deal/balance-of-settlement, a development direction in which a lot of the BTC community wants to see Bitcoin go.

As such, an EVM running on Autonomi using the ERC protocol imo needs to be architected as a layer 2 financial services capability, running on top of what will be the Autonomi native token, with L1 settlements stored on the DAG.

Such a design would support multiple L2 side chains with their own VM types and SCs, protected by L1 security (perhaps using rotating multi-sigs per epoch, deriving their security from hashes of L1 BTC blocks per L2 epoch for dPoS, the way Botanix is doing it over BTC), handling many different forms of financial services, each with their own fee structure, where inter-chain settlement happens over L1.

Imo it’s not an ‘either/or’ software architecture decision facing Maidsafe and the Autonomi community.

The software architecture decision is imo a ‘how do we do both’ challenge: getting the architecture right so that Autonomi does indeed, in the future, support many L2 financial service chains with their VMs and SCs running on top of an Autonomi network L1 store of value, facilitated by its own native token, in a private, truly distributed way.

That is a design the bitcoiners will never accomplish, as they serve private interests who WILL want to be in charge of any audit in a permissioned way.

Frankly, BTC’s fate is sealed (and has been sealed from the start). The incoming USA administration, post Nov 5th 2024, is all over it, ready to take over the BTC experiment, make it the public L1 settlement layer, place Eth and Solana over that as L2 financial services, and re-rig the US $stablecoins as layer 2 over Bitcoin as well.

The BTC blockchain can be audited with AI precisely, in public, at great speed.

One can see that, if they pay close attention to the BTC chatter all over the web these days, the above is BTC’s fate, and it is sealed. :wink:

Autonomi Network’s ‘fate’ and future prospects are not yet sealed. Therefore, for Autonomi Network and Maidsafe, designing the ERC20 as a layer 2 financial services layer, one which can bridge easily into the new, fast-growing world of the BTC DeFi layer also running ERC20 on the EVM, is an easy decision to make imo.

Doing the above means the Autonomi network keeps its ERC20 gateway ‘firewall’ bridge preserved, keeping the autonomy of the Autonomi project in place, especially if different types of VMs and SCs are also enabled by an L2/L1 architecture which runs L2 VMs/SCs over the Autonomi L1 private, ‘permissioned-audit-capable’ DAG.

These L2 services would use the latter to do inter-L2 settlement with ‘permissioned audit’ capabilities (by default everything is private), keeping the operators of these L2 services in control and separating their ‘money from borderless state intervention’.

3 Likes

Still, the big question to me is scale. What happens when 100 million or a billion people want to use it for their 1,000 to tens of thousands of files each? What will happen to the blockchain?

Seems to me that its scalability will not match Autonomi’s.

2 Likes

Well, as the metadata storage is optional, I doubt a user would decide to pay to upload metadata for thousands of files, as I’d imagine the majority of these files would be private. However, if they would like their public files to be searchable, then I think it’s something they wouldn’t mind paying the small additional fee for.

Smart contracts already store a lot of information for other projects that have tested high throughput, like NFTs, etc. So I’m not sure storing 64 bytes of information (32 per xorname) for each file would be too crazy. And as stated it’s all optional, so it will be on applications to opt in to using a standard.

I just don’t see another alternative for now that can compete in terms of being distributed and visible to everyone. Should we run into scaling problems, then it means Autonomi has succeeded, and I’m sure other solutions will emerge then.

My primary goal in this post is for other application developers to come together and agree on a standard if they are to use this method of metadata storage. This way we can ensure uploaded metadata can be read by all applications.

4 Likes

Why would one want to store parts of the files externally and introduce additional costs due to a smart contract needing execution?
→ Okay, I assume data discovery is the issue that is being solved here :smiley:

guess since you’re the one doing something @safemedia you’re right in what you’re doing and I’m just a confused idiot standing at the sideline xD

1 Like

Yup, you got it. It’s primarily about metadata discovery, and knowing which file it’s associated with.

The additional costs should be a few cents, depending on the L2. Questions are good!

1 Like

Have you looked at Registers? I’m not getting into the detail of what you are trying to do, but don’t see anything that should not be feasible using the data types on the network.

It may need some creative thinking, but as I say I don’t want to get into detail. I can though tell you that I’ve already created a metadata format for storing websites, and extended it slightly to allow for a simple filesystem. This uses registers to give a fixed entry point, which can be updated by writing a new metadata entry (e.g. for a directory or in my case a directory tree) which includes all the metadata you might want. This provides a history of updates so you never lose a file and can browse back and forward. It uses one chunk per update at the moment, so not expensive to store a new directory listing or directory tree, provided the metadata fits in a single chunk.

Using this you can already publish and view websites on the network. Using the awe CLI you can also inspect the registers that it uses and display the files stored along with the (basic) metadata (see awe --help and awe registers --help, etc.).

I have not published an update of awe for the current network because it is only up for a week. But if you are interested you can clone it, inspect the code, and run it with a local test network: GitHub - happybeing/awe: A Website Publisher/Browser for Autonomi (demo)

Registers are also still buggy, which is another reason I’m not spending time on that atm.

7 Likes

From my limited understanding of registers, I thought they were a system for having a single address for a file to share with users; should the file be updated, it would create a new file and update the link to point at the latest file address, almost like a versioning system.

I’m not sure people are understanding why this smart contract metadata storage system would be useful. The main thing is discoverability: a central yet decentralized place where everyone can see all the metadata. This would enable search engines and decentralized metadata retrieval.

Because of the way DHT/Kademlia works, searching all this metadata would not be feasible, as it’s all distributed over countless nodes and would require DHT indexers/crawlers that still couldn’t ensure everything is found; even then, these services would need to run on people’s private servers.

2 Likes

But don’t you just need a list of register addresses somewhere?

Indeed, although with the current design you can’t store a register address in a register. So to do that you need to create a chunk of metadata to store the address and point the register at that.

Things like that make it hard to argue against storing data outside the network for some use cases. So @safemedia has a point even though I’d really struggle to support such a hybrid approach.

Autonomi aren’t even looking at issues like this, even as they prepare to release the API, which is years old and yet still virtually untested in the world of app development.

2 Likes