Below you will find a breakdown of what we’ve been focused on this week, with additional context for clarity where needed. On that note, alongside the continuing dialogue around emissions, some questions have been raised about the automatic node upgrade work stream that’s currently underway. As we hope you already appreciate, the team are aiming to reach a stage, as soon as is feasible, where MaidSafe no longer has to do upgrades (automatic or otherwise).
For the network to be truly autonomous it simply cannot have a dependency on a specific group or entity, no matter who they are, no matter their motivations. But until we arrive at that place (which we will), the team at MaidSafe want to ensure that smaller changes and updates can be easily and readily deployed, with as little disruption to node operators and the network as possible (these changes will all of course have release notes in order for them to be openly tracked).
At this early stage of the network’s life, large node operators hold a lot of influence over the network’s performance, so automatically staging their updates (as opposed to them all happening at once via a script) will go a long way towards supporting ongoing network stability. You can read more about the node upgrade process and design in the relevant section below.
Core Network (Performance Improvement)
Following last week’s release, we have observed that the performance of the network has improved, and that is with only circa 15% of nodes having upgraded (at the time of publication).
These performance improvements mean that upload speed has increased. More generally, the network is also responding faster across the board - the exception being quoting, which appears to have slowed due to the small number of nodes running the latest version.
While the reduction in chunk size has had a direct impact on performance, the team also note that a number of factors are contributing to the improved speed - from the earlier bug fixes to the replication range changes, all now in place and working together.
With the above in mind, we would be grateful if those who have yet to upgrade would consider doing so over the coming few days - noting that emissions (for now) can still be received by the previous node version.
Node Running (Data Hosting)
Testing for the automatic node upgrade moved to a larger scale this week, with the team running testnets of 1,000 and 5,000 nodes. The results from both fully satisfied the requirements/criteria that have been put in place to protect the network.
Just to ensure we’re aligned with the community on how auto node upgrading will work, the design is as follows:
Nodes will check for updates every 3 days. If one is available, they will use a deterministic but random distribution to stagger their update over a period of 3 days (see the sketch after this list).
This will help the network upgrade smoothly to newer versions, keeping churn to a minimum.
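For anyone who would like to picture how a deterministic-but-random stagger might look, here is a minimal illustrative sketch (our own simplification, not the actual implementation): each node derives its delay within the 3-day rollout window from a hash of its peer ID and the target version, so the schedule is spread across the network but reproducible per node.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Spread upgrades over a 3-day window (in seconds).
const ROLLOUT_WINDOW_SECS: u64 = 3 * 24 * 60 * 60;

/// Derive a per-node delay inside the rollout window from the node's peer id and the
/// target version: random-looking across the network, but deterministic for each node.
fn upgrade_delay(peer_id: &str, target_version: &str) -> Duration {
    let mut hasher = DefaultHasher::new();
    peer_id.hash(&mut hasher);
    target_version.hash(&mut hasher);
    Duration::from_secs(hasher.finish() % ROLLOUT_WINDOW_SECS)
}

fn main() {
    // Hypothetical peer id and version, for illustration only.
    let delay = upgrade_delay("example-peer-id", "0.4.2");
    println!("upgrade scheduled in roughly {} hours", delay.as_secs() / 3600);
}
```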
Merkle Tree (Data Upload Payments)
This week we created a local dev net in order to test Merkle payments live on the blockchain (as referenced last week). The team were able to make and track a Merkle tree payment and confirm that it was successful - a real contract and a real payment.
Previously we had a limit of 4GB per tree, which meant files were restricted to 4GB. We have now updated the architecture so that many trees can work alongside each other. This means file size is now effectively limitless: a user pays per tree, i.e. for each 4GB. By way of example, an 8GB file would equate to a total cost equivalent to 2x trees (gas payments).
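As a rough illustration of that per-tree pricing (the 4GB-per-tree figure comes from the description above; the helper function itself is hypothetical):

```rust
/// Each Merkle tree covers up to 4GB of data.
const TREE_CAPACITY_BYTES: u64 = 4 * 1024 * 1024 * 1024;

/// Number of trees (and therefore separate gas payments) a file of this size needs.
fn trees_needed(file_bytes: u64) -> u64 {
    file_bytes.div_ceil(TREE_CAPACITY_BYTES)
}

fn main() {
    let eight_gb: u64 = 8 * 1024 * 1024 * 1024;
    assert_eq!(trees_needed(eight_gb), 2); // an 8GB file pays for 2 trees
    assert_eq!(trees_needed(1), 1);        // even a tiny file still needs 1 tree
}
```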
We are now moving on to our penultimate testing phase. This will be conducted on a testnet of circa 10,000 nodes, using the Sepolia chain, and is being deployed to confirm that:
Merkle payments work at scale
the upload error rate is extremely low (we will collect errors as/if they occur)
the node cpu/mem load is similar after the upgrade
large files (> 4GB) can be uploaded
gas fees/payments have been eviscerated
Indelible (Organisational Tool for Data Uploads)
User account management is progressing well; this covers all access to the Indelible application, including the UI and API.
The configuration API is still underway, and a security audit will be scheduled in the coming week or so (after which we will be able to release a demo). The generation of documentation (OpenAPI spec) is also being worked on and is currently in review.
Dave (Prototype Product for Development Updates)
The latest version (0.4.1) was released on Tuesday, December 02. You can find the GitHub link HERE.
Mobile Bindings (Mobile Application Building)
While the first phase has been completed (please see last week’s demo), the team is continuing to work at pace on data streaming, archives and vaults (file directory upload and download). Another notable element underway is the push to get the work ready for release, which includes automating deployment to a package manager for Android (JitPack). iOS deployments will be worked on once the Android version is available.
The network has no centralised concept of time, but individual nodes do use time: intervals, delays, callbacks, timeouts. All of these rely on time for internal operations, yet there is nothing syncing time across nodes.
I came up with `thread::sleep(Duration::from_secs(259200));` but this caused the devs to scream, so I left them to do it properly.
Oh, interesting! What size are they being reduced to?
Presumably this is because lots of small chunks don’t impact the cost of the Merkle tree, with the gas prices remaining constant? I’m assuming smaller chunks have more general network performance benefits too (I remember pre-launch testnets implying this as well, but I’m not sure it was conclusive)?
This could also benefit the upload costs for smaller files, as nodes start to fill up (and charge more, relative to gas prices).
Sounds like a lot of progress on many important fronts.
The fact that network performance has been improving despite only 15% of nodes upgrading is certainly promising… let’s hope it keeps improving as more nodes upgrade.
Lots to look forward to with Merkle Tree payments, Indelible, and mobile bindings all on their way in the coming weeks/months.
I am now confused, since last week you said (or implied) that record (chunk) size was not being affected. Now you say it’s being reduced? Can you please provide a brief explanation of what stream size is and how it relates to record size? Chunks are records, after all. If it is being reduced then that’s a good thing in my book - it can be increased in 5 or 10 years when the run-of-the-mill hardware the ISPs supply can better handle it.
This still has the issue of a very common and perfectly normal practice: file linking, where the binary only exists once on a machine and all the nodes use that single copy rather than a copy in each node’s directory. The link makes it appear as if there is a separate copy, but it’s all the one physical copy. Thus when one node updates, all will actually update, and because a running node’s binary on disk is no longer consistent, it will likely crash within a shortish time of the physical binary changing. Unintended consequences - you could see maybe 5K nodes crash within a shortish time of each other.
SOLUTION (trying to bloody well help here)
Solution: the binary has the version as part of its name, so on update there will be 2 copies - the new one that updated nodes will use, while the nodes not yet updated continue to use the old physical copy. And if the operator wants to keep using file linking, they can simply relink it and all is good: everyone happy and no large numbers going offline super quick.
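Something like this, roughly (my own sketch, assuming a Unix host, made-up paths and “antnode” as a stand-in binary name - not actual updater code): each release is written to a version-suffixed file and a stable symlink is repointed at it, so any hard links an operator made to the previous binary keep referencing the old inode until they choose to relink.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Install `new_binary` under a version-suffixed name and atomically repoint a
/// stable symlink at it. Hard links to the previous versioned file keep pointing
/// at the old inode, so already-running nodes are untouched until relinked.
fn install_versioned(dir: &Path, new_binary: &[u8], version: &str) -> io::Result<()> {
    // e.g. /opt/nodes/antnode-0.4.2
    let versioned = dir.join(format!("antnode-{version}"));
    fs::write(&versioned, new_binary)?;
    fs::set_permissions(&versioned, fs::Permissions::from_mode(0o755))?;

    // Build the new symlink beside the old one, then rename over it: rename is
    // atomic on the same filesystem, so there is never a moment without a binary.
    let link = dir.join("antnode");
    let tmp_link = dir.join(format!(".antnode-{version}.tmp"));
    let _ = fs::remove_file(&tmp_link); // ignore "not found"
    std::os::unix::fs::symlink(&versioned, &tmp_link)?;
    fs::rename(&tmp_link, &link)?;
    Ok(())
}
```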
I still support good practices; forced upgrading no matter what is bad practice and will become a problem at some point in the future. Better to have disaster recovery plans, such as when a node comes back up (probably on the previous version) it can recover its data to the network.
I wonder about the names of all the people who said 4MB was a tad too large for a hell of a lot of home routers. Hmmmm.
@Nic_Dorman Can I ask the status of having nodes declared bad if the version of the Autonomi protocols they are using is too old? That used to be something that was going to be done, but I have not heard anything for some time.
The idea could be to upgrade to a version, and once enough nodes have upgraded to keep the network viable, push another update that declares the version before it as bad.
EG
version x.01 - the old version that you want gone
version x.02 - the version with important fixes that the network needs to upgrade to
some days/weeks later, change emissions so that version x.01 no longer receives any
when enough people have upgraded to version x.02 to have a viable network, push out version x.03
version x.03’s only change is to consider any node running version x.01 or earlier as a bad node, after making sure to replicate/churn the data off it if its records are in the close neighbourhood of the node. This process would of course take time, doing any replication/churn needed before marking it as bad.
Remember, even forced upgrades can be bypassed - by code changes, hosts-file changes to block the host serving the upgrade binary, and other tricks, which are many and only one LLM request away for those who don’t know them.
This is a 2-stage process, and once the code/procedure is in place it will be easy to apply to future upgrades. It also ensures no data is lost, because of the replication/churn done before an old version is considered bad.
It does not need to be one version apart; it could be a normal procedure built into nodes for all upgrades, where there is a declared minimum protocol version required to be considered a good node.
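For what it’s worth, a minimal sketch of the kind of check I mean (entirely hypothetical names, nothing from the Autonomi codebase): nodes carry a protocol version, and a peer below the declared minimum is only treated as bad once its close-group records have been replicated away.

```rust
/// Declared minimum protocol version a peer must run to be considered good,
/// e.g. the "x.03" step in the example above.
const MIN_PROTOCOL_VERSION: u32 = 3;

#[derive(Debug, PartialEq)]
enum PeerStatus {
    Good,
    /// Too old, but its close-group records haven't been replicated away yet.
    PendingReplication,
    /// Too old and its data is safe elsewhere: treat as a bad node.
    Bad,
}

fn assess_peer(peer_protocol_version: u32, records_replicated: bool) -> PeerStatus {
    if peer_protocol_version >= MIN_PROTOCOL_VERSION {
        PeerStatus::Good
    } else if records_replicated {
        PeerStatus::Bad
    } else {
        // Don't shun the peer yet: churn/replicate its records first so no data is lost.
        PeerStatus::PendingReplication
    }
}

fn main() {
    assert_eq!(assess_peer(3, false), PeerStatus::Good);
    assert_eq!(assess_peer(1, false), PeerStatus::PendingReplication);
    assert_eq!(assess_peer(1, true), PeerStatus::Bad);
}
```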
I already have quite a lot of data chunks that have been forced offline, meaning the chunks uploaded to my server are already gone (data lost) - maybe around 1-2TB of chunks (I haven’t counted). And that’s despite them being good nodes with plenty of space. I would like to build hardware that simplifies node operation, so more users can run nodes behind well-designed routers that are much cheaper, low power consumption and able to run 24/7 without much hassle. It’s just hard to work with these changes.
Get your head out of the butt, stop playing around with RISC-V, which won’t be ready for several more years, and do something that is really productive for the network.
I wonder if reducing the max window size would not also have had a much wider benefit than just for streaming.
It’s a lot to do with ISP routers not having enough memory to handle multiple 4MB record transfers at once. For some, 3 records of 4MB at the same time will cause transfer failures, and it will affect every transfer, including file downloads.
Reducing the max window size would prevent those initial errors and the consequential retries. As data fills the network this will become more pronounced, with a worse effect than what they saw with streaming.
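To put some rough numbers on the memory concern (a back-of-the-envelope illustration; the window sizes and transfer counts are figures I picked, not measured router specs):

```rust
/// Rough estimate of the buffer memory a router needs to hold in-flight data
/// for a number of concurrent transfers at a given max window size.
fn buffer_needed_mb(max_window_bytes: u64, concurrent_transfers: u64) -> u64 {
    (max_window_bytes * concurrent_transfers) / (1024 * 1024)
}

fn main() {
    // Three 4MB records in flight at once vs the same three with a 1MB window.
    println!("4MB window x 3 transfers ~ {} MB", buffer_needed_mb(4 * 1024 * 1024, 3)); // ~12 MB
    println!("1MB window x 3 transfers ~ {} MB", buffer_needed_mb(1024 * 1024, 3));     // ~3 MB
}
```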
Shame no one listens. This issue was pointed out a long time ago from experience, and since the max window size was never changed, streaming got hit - and in the future only high-performance routers will be able to effectively download files or have nodes behind them.
On the slightly brighter side, it seems it’s now OK to have various maximum window sizes on the same network. So in the future it might be possible to tune this parameter and distribute separate node versions - that’s good for everyone.