Uploads to the network

Success after uploading with -r persistent šŸ™‚

safe files upload -p linuxmint-21.2-cinnamon-64bit.iso -r persistent

Would anyone else care to have another bash at it?

safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 -r persistent

From my main PC at home, behind a router:

:~$ safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 -r persistent
Logging to directory: "/home/patosh/.local/share/safe/client/logs/log_2024-09-20_08-27-34"
safe client built with git version: 08b0a49 / stable / 08b0a49 / 2024-09-09
Instantiating a SAFE client...
Connecting to the network with 25 peers
šŸ”— Connected to the Network
Downloading "linuxmint-21.2-cinnamon-64bit.iso" from 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 with batch-size 16
Error downloading "linuxmint-21.2-cinnamon-64bit.iso": Chunks error Chunk could not be retrieved from the network: e1b28c(11100001)...
Completed with Ok(()) of execute "Files(Download { file_name: Some(\"linuxmint-21.2-cinnamon-64bit.iso\"), file_addr: Some(\"36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1\"), show_holders: false, batch_size: 16, retry_strategy: Persistent })"

Let me try with a lower batch size.
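For anyone following along, re-running the download with a smaller batch would look like this (the --batch-size flag appears later in this thread; the value 4 here is just an example):

safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 -r persistent --batch-size 4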


-p is public, iirc, which would be needed for others to see the chunks.


Yes, it was uploaded with -p to make it public.

Download is now failing for me 🙁

Tried to download 4 times:
3x failed on 5771/5773 and chunk e1b28c(11100001)
1x failed on 5772/5773 and chunk fdf561(11111101)


Tried lower batch sizes; it always fails on the same chunk, e1b28c, for me.


sha256sum → 116578dda0e03f1421c214acdd66043b586e7afc7474e0796c150ac164a90a2a

is the image I’m currently downloading, to try it and add the failing chunks … since I thought the sha256sum is the network address, I’m not sure it’s the same image you uploaded …?

md5sum linuxmint-21.2-cinnamon-64bit.iso
98213481fef82b9337de96469edb5089  linuxmint-21.2-cinnamon-64bit.iso

And then it stops for a couple of minutes, but it eventually completed and I got a good md5 out of it:

safe@ubuntu-16gb-nbg1-1:/var/safenode-manager/services$ md5sum linuxmint-21.2-cinnamon-64bit.iso 
98213481fef82b9337de96469edb5089  linuxmint-21.2-cinnamon-64bit.iso

That’s on a Hetzner VPS.

On my dedicated Hetzner box it works as well now:


safe@wave1-bigbox:~$ md5sum linuxmint-21.2-cinnamon-64bit.iso 
98213481fef82b9337de96469edb5089  linuxmint-21.2-cinnamon-64bit.iso

And also from home šŸ™‚

willie@gagarin:~/Downloads$ md5sum linuxmint-21.2-cinnamon-64bit.iso 
98213481fef82b9337de96469edb5089  linuxmint-21.2-cinnamon-64bit.iso

Thank you for your persistence with this @aatonnomicc; hopefully we will see the same rigour applied to exploring just why it was failing over the past day or so?

Cos problems like these simply cannot be waved away 5 weeks before ā€œlaunchā€.

@chriso @joshuef we see that chunks e1b28c(11100001) and 529e72(01010010) feature most prominently in the error reports above.
What strings should we grep for to best assist you in getting to the bottom of this?

šŸ”— Connected to the Network
"linuxmint-21.2-cinnamon-64bit.iso" will be made public and linkable
Splitting and uploading "linuxmint-21.2-cinnamon-64bit.iso" into 2491 chunks
**************************************
*          Uploaded Files            *
**************************************
Uploaded "linuxmint-21.2-cinnamon-64bit.iso" to address 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1
Among 2491 chunks, found 2432 already existed in network, uploaded the leftover 59 chunks in 74 minutes 0 seconds
**************************************
*          Payment Details           *
**************************************
Made payment of NanoTokens(62) for 59 chunks
Made payment of NanoTokens(61) for royalties fees

…the nano eater stole around 250 nanos from me 😮 … for a file that should have been online anyway …


Can somebody ELI5 just why, after successfully downloading this fabled Mint ISO and checking it with md5sum, then uploading it again, there was a chunk missing?


**************************************
*          Uploaded Files            *
**************************************
Uploaded "linuxmint-21.2-cinnamon-64bit.iso" to address 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1
Among 5777 chunks, found 5776 already existed in network, uploaded the leftover 1 chunks in 72 minutes 34 seconds
**************************************
*          Payment Details           *
**************************************
Made payment of NanoTokens(1) for 1 chunks
Made payment of NanoTokens(1) for royalties fees
New wallet balance: 0.000004157
Completed with Ok(()) of execute "Files(Upload { file_path: \"linuxmint-21.2-cinnamon-64bit.iso\", batch_size: 4, make_data_public: true, retry_strategy: Persistent })"

If you have any nanos and recently downloaded (and checked with md5sum) the Mint iso, please try uploading it again.
It should say all chunks are already on the network, but if not I expect only a couple will be missing and it should not be an expensive exercise. It would be good to get some other data points on this problem.
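For anyone wanting to try, the upload command from earlier in the thread was:

safe files upload -p linuxmint-21.2-cinnamon-64bit.iso -r persistent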

PS - chunking a 3 GB ISO took the best part of 72 minutes on a 24-CPU Ryzen 9300 with 128 GB RAM. Though to be fair, it did have 370+ nodes running at the same time.


Not so much, in my opinion. If a couple of thousand zealous node operators can get around 100K nodes running (which in real terms, at an average of 20 nodes at home, is around 5,000 home users), then as the network grows, a 100K-node network will come to be seen as very tiny.

The reason that is important is that with most operators running 20 to 100 nodes, and a growing network, the issue will reduce to a minuscule amount, with the probability of it happening reducing dramatically.

At this time, someone running 1,000 nodes (1% of a 100K-node network) has a 0.01^5 (0.0000000001) chance that all 5 closest nodes for a chunk sit behind their one NAT router; for someone running 5,000 nodes it is 0.05^5 (0.0000003125), which is significantly higher. For small files, the chance of the file being affected if something goes wrong is tiny. Only for larger files, like 4 GB (8,000 chunks), does the chance get significant.

Now let's factor in the upgrade (to be introduced, or perhaps already in place) to use any node around the 5 closest that holds the chunk to supply it: the above probability reduces a lot more. Instead of a power of 5, it becomes anywhere up to a power of 10 or 20, because for many chunks more nodes will be holding that chunk. This becomes more true as the network churns.

Also, as the network grows past 100K to, say, 1 million nodes, those running 1,000 nodes will be an even smaller percentage of the network. A 1,000-node operator in a 1-million-node network is only 0.1%, giving a 0.001^5 chance of owning the whole 5-node close group of a chunk, i.e. 100,000 times less. Factoring in the recent upgrade, it becomes such a small chance that winning the lottery (or lotto) is more likely than any node operator having all 5 nodes of a close group behind a single NAT router.
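A minimal sketch (Rust, since that is the project's language) double-checking the arithmetic above; the close-group size of 5 and the operator/network sizes come from the post, while the assumption of independent, uniform node placement is a simplification:

// Sanity-check the close-group capture probabilities quoted above.
fn main() {
    const CLOSE_GROUP: i32 = 5;
    let cases = [
        (1_000.0_f64, 100_000.0_f64), // 1% of a 100K-node network
        (5_000.0, 100_000.0),         // 5% of a 100K-node network
        (1_000.0, 1_000_000.0),       // 0.1% of a 1M-node network
    ];
    for (operator, network) in cases {
        let p = operator / network;
        // Chance that all CLOSE_GROUP closest nodes for a given chunk
        // belong to this one operator.
        println!("{} of {} nodes: p^5 = {:e}", operator, network, p.powi(CLOSE_GROUP));
    }
}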


Thought I would try a test; Mint downloaded fine in 3 minutes 4 seconds with batch size 64, and the checksum is good:

ubuntu@t1:~$ safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 -r persistent --batch-size 64
Logging to directory: "/home/ubuntu/.local/share/safe/client/logs/log_2024-09-23_20-35-37"
safe client built with git version: 08b0a49 / stable / 08b0a49 / 2024-09-09
Instantiating a SAFE client...
Connecting to the network with 25 peers
šŸ”— Connected to the Network
Downloading "linuxmint-21.2-cinnamon-64bit.iso" from 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 with batch-size 64
Saved "linuxmint-21.2-cinnamon-64bit.iso" at /home/ubuntu/linuxmint-21.2-cinnamon-64bit.iso
File downloaded in 3 minutes 4 seconds 658 milliseconds
Completed with Ok(()) of execute "Files(Download { file_name: Some(\"linuxmint-21.2-cinnamon-64bit.iso\"), file_addr: Some(\"36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1\"), show_holders: false, batch_size: 64, retry_strategy: Persistent })"

ubuntu@t1:~$ md5sum linuxmint-21.2-cinnamon-64bit.iso
98213481fef82b9337de96469edb5089  linuxmint-21.2-cinnamon-64bit.iso


With so many chunks, there’s a chance that on restarting a client (which gets a new address and forms a new view of the network), its view of the network differs, and due to some churn it doesn’t see the data there.

This should be improved with the GetRange work, which will hopefully be in main soon. It will sample a larger network space for replication and storage: so not just "minimum 5 peers", but actually covering a percentage of the network's address space, and asking that larger space too.
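As a rough illustration of that idea (the function name and the toy 32-bit address space are mine, not the actual GetRange code):

// Hypothetical range-based peer selection: instead of taking a fixed 5
// closest peers, take every known peer whose XOR distance to the target
// falls within a fraction of the address space. Toy u32 addresses here;
// the real network uses 256-bit XOR addresses.
fn peers_in_range(peers: &[u32], target: u32, fraction: f64) -> Vec<u32> {
    let range = (u32::MAX as f64 * fraction) as u32;
    peers.iter().copied().filter(|p| (p ^ target) <= range).collect()
}

fn main() {
    // Spread-out toy peers across the u32 space.
    let peers: Vec<u32> = (0..10_000u32).map(|i| i * 429_497).collect();
    let target = 123_456_789;
    let close = peers_in_range(&peers, target, 0.01); // cover 1% of the space
    println!("{} peers fall within 1% of the address space", close.len());
}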

There are quite a lot of reasons we might not find a chunk at a specific time (packets can be dropped all the …). If it’s been uploaded, the chances are we’ll be able to get it after some replication or retries. If there’s been heavy churn, it may take a bit.

(The unwanted šŸ‘‹, I know! But I don’t see this as a fundamental network issue so much as a tweaking of tolerances, both network and client side; we’ll be firing up more tough churn tests with home nodes too, and sampling things to check and assert tolerances over such things. So more rigor here is inbound.)

Not necessarily; see this thread for more.


The Linux Mint iso that @aatonnomicc uploaded is still working! I didn’t even have to use the -r persistent option.

Even though this network has been abused, and clearly from the spiralling costs a lot of nodes have been stopped, the data seems to be surviving. This is very nice to see!

time safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1
Logging to directory: "/home/ubuntu/.local/share/safe/client/logs/log_2024-09-28_15-04-50"
safe client built with git version: ba09d62 / stable / ba09d62 / 2024-09-24
Instantiating a SAFE client...
Connecting to the network with 25 peers
šŸ”— Connected to the Network
Downloading "linuxmint-21.2-cinnamon-64bit.iso" from 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 with batch-size 16
Saved "linuxmint-21.2-cinnamon-64bit.iso" at /home/ubuntu/testfiles/linuxmint-21.2-cinnamon-64bit.iso
File downloaded in 8 minutes 41 seconds 955 milliseconds
Completed with Ok(()) of execute "Files(Download { file_name: Some(\"linuxmint-21.2-cinnamon-64bit.iso\"), file_addr: Some(\"36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1\"), show_holders: false, batch_size: 16, retry_strategy: Quick })"

real	8m43.627s
user	9m45.416s
sys	3m13.776s

Another couple of datapoints.

From a Hetzner datacentre box

safe@beta2nodes:~$ time safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1
Logging to directory: "/home/safe/.local/share/safe/client/logs/log_2024-09-28_15-27-58"
safe client built with git version: ba09d62 / stable / ba09d62 / 2024-09-24
Instantiating a SAFE client...
Connecting to the network with 25 peers
šŸ”— Connected to the Network
Downloading "linuxmint-21.2-cinnamon-64bit.iso" from 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 with batch-size 16
Saved "linuxmint-21.2-cinnamon-64bit.iso" at /home/safe/linuxmint-21.2-cinnamon-64bit.iso
File downloaded in 2 minutes 37 seconds 959 milliseconds
Completed with Ok(()) of execute "Files(Download { file_name: Some(\"linuxmint-21.2-cinnamon-64bit.iso\"), file_addr: Some(\"36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1\"), show_holders: false, batch_size: 16, retry_strategy: Quick })"

real    2m38.873s
user    6m34.629s
sys     3m6.365s

and from home

willie@gagarin:~$ time safe files download "linuxmint-21.2-cinnamon-64bit.iso" 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1
Logging to directory: "/home/willie/.local/share/safe/client/logs/log_2024-09-28_16-28-10"
safe client built with git version: 08b0a49 / stable / 08b0a49 / 2024-09-09
Instantiating a SAFE client...
Connecting to the network with 25 peers
šŸ”— Connected to the Network
Downloading "linuxmint-21.2-cinnamon-64bit.iso" from 36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1 with batch-size 16
Saved "linuxmint-21.2-cinnamon-64bit.iso" at /home/willie/linuxmint-21.2-cinnamon-64bit.iso
File downloaded in 5 minutes 20 seconds 104 milliseconds
Completed with Ok(()) of execute "Files(Download { file_name: Some(\"linuxmint-21.2-cinnamon-64bit.iso\"), file_addr: Some(\"36dcf903ccd7d4ff67c097eb4d62063bed6be8448baead8d86c3fa0548bb8aa1\"), show_holders: false, batch_size: 16, retry_strategy: Quick })"

real	5m20.737s
user	8m26.652s
sys	3m12.311s

@chriso, per @southside’s latest upload challenges and the ERROR throw just posted on Discord…

Here is a bit of logic below that perhaps might help.

The root cause of the ERROR throw (panic, code not there) might be the antnode operator’s OS swap configuration (virtual memory), perhaps used in a bid to oversubscribe disk space so as to get more nodes operating in RAM and win more rewards.

Perhaps MaidSafe should consider not throwing an error in this upload instance, as it appears it could be a swap-induced ā€˜wait for the code I need to load’ moment.

A possible workaround: add to the CLI API (and echo in the UI) a ā€˜Waiting to Upload’ message from the close-group antnode gossip consensus process that responds to an upload client’s request for a quote. That is, the close-group consensus process must check whether enough antnodes are present and ā€˜ready’ to provide a quote. If not, the close group signals a gossip ā€˜wait’ message to all upload clients (CLI or Dave UI app) currently requesting quotes, while the ā€˜quote consensus forum’ is formed among the close-group nodes.

The proposed gossip message would be received through the client CLI API, causing the local client to echo it at the CLI prompt, with a wait state added to accept a choice (CLI, or Dave UI API), triggering the local uploader logic to:

IF (above msg received):
    echo the message and wait for a response,
    then branch to the uploader choices:
    ā€˜Wait 1 minute, Retry from upload start now, or Stop and upload later. Please enter W, R or S, then confirm with Y or N.’

with the same messaging passed up the stack to a UI client API.
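A minimal sketch of what that client-side prompt could look like, purely hypothetical (the Choice type and prompt flow are the proposal above, not an existing safe/autonomi API):

use std::io::{self, Write};

// Hypothetical uploader choices from the proposal above.
enum Choice { Wait, Retry, Stop }

// Prompt the uploader when a (hypothetical) 'wait' gossip message arrives.
fn prompt_on_wait() -> Choice {
    loop {
        print!("Close group busy. Wait 1 minute, Retry from start, or Stop? [W/R/S]: ");
        io::stdout().flush().unwrap();
        let mut line = String::new();
        io::stdin().read_line(&mut line).unwrap();
        match line.trim().to_uppercase().as_str() {
            "W" => return Choice::Wait,
            "R" => return Choice::Retry,
            "S" => return Choice::Stop,
            _ => println!("Please enter W, R or S."),
        }
    }
}

fn main() {
    match prompt_on_wait() {
        Choice::Wait => println!("Waiting 60 s before asking for quotes again..."),
        Choice::Retry => println!("Restarting the upload from the beginning..."),
        Choice::Stop => println!("Stopping; upload later."),
    }
}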

Just ā€˜thinking out loud’ here, as it is likely that a lack of quote-request activity has left one or more antnodes in the close group slow to respond: more than one of them may have its Linux OS configured to push the quote object out to swap, meaning fewer ā€˜quote forum’ close-group antnodes (fewer than 5) respond within the given time to provide the five quotes necessary.

That said, it might be a programmatic consideration for MaidSafe as well to keep that quote object in memory by prioritizing it (making it immune to OS swap actions), so that all antnodes remain responsive to providing quotes as their first priority, ahead of every other antnode process and function, including shuffling chunks around, or writing and reading copies of chunks to local storage (slightly lower priority), etc.

As a general observation of node operator configuration behaviour, one can see how some node operators might make deliberate use of OS swap to cram more antnodes into system memory and win more rewards, by further oversubscribing their disks in this swap-assisted manner, which could be contributing to the above ERROR behaviour facing upload clients.

And/or there is another exception use case that comes to mind as a contributing factor, causing the same ERROR: consider the case where members are leaving the close group and there is a temporary shortage of antnodes providing quotes, caused by rapid antnode joins and leaves on the Autonomi network. If, at the same time, there is a surge of upload client requests for quotes within the affected close group, the count of responsive ā€˜quote forum’ members drops while antnodes leaving their old close group are re-ordered into a new one, before newly assigned members join the old close group quickly enough to respond to the upload request.

In either exception use case, the bit of logic expressed above will handle both, to better serve the CLI API or UI client API handling the upload action, and to provide better overall UI/UX to the uploader.

The other alternative, in the latter exception use case, is to make quote-forum formation and response the main priority, before shuffling close-group antnode members to another existing close group or creating a new one.

Anyway, food for thought while trying to troubleshoot this latest upload ā€˜ERROR’. (IMO the current ā€˜ERROR’ condition shown to the uploader indicates that the prioritization of system functions facing the upload client needs to be reordered, to 100% always serve uploads first, which would make this upload ā€˜ERROR’ problem go away while also dealing effectively with the OS swap node-operator configuration behaviour.)

I hope the above helps.


There is a lot of text there.

I’d suggest a good shunning if a node can’t divvy up the data in time. Oversubscribed nodes need to be ejected and replaced by better nodes.

Tbh though, I’ve not seen these sorts of issues in a long time. The posts above are from when the CLI was still called ā€˜safe’, even.

For upload issues, that seems more local router related than anything else.


Yeah, it’s hard to describe these two exception use cases that might be causing the problem ā€˜in shorthand’. Part of the challenge is that the taxonomy used to describe the network behaviour is, well, transforming fast from a long-running research project into a commercial effort, thus straddling both worlds; plus how the network really works (lately) is strewn over many trailing posts… šŸ˜‘

Yeah, that’s a pretty severe remedy IMO, when the root causes (and I think there is more than one exception use case contributing to this ā€˜ERROR’) need to be clearly identified.

This is more about prioritizing uploads in the code, IMO, before any other function, as that is the main purpose and value-add of the network.