I am running it on one of your standard GitHub releases, so no, but it is heavier on resources than heaptrack (I think).
Is there something that prevents us from building our own safenode? I wanted to run --release with debug = true but no go. I also tried a regular debug build and a regular --release build; none would join.
I used the source for sn_node-v0.91.16. No issues with the build, just none of them would join. First try with the GitHub release and no problem.
Anyway, in the kerfuffle I killed the first one I had running for a few hours, in case you wanted to see the output. @chriso valgrind_output.zip (37.9 KB)
Interesting. Looks like the new version errors out earlier than it should!
Okay, no worries. Interesting. Will be digging in here over my cuppa.
edit: Exactly the same error here @happybeing. You are definitely on the new version, right? Really odd. I wonder what’s up here, then.
If it’s erroring out, then it’s a transient error we should be ignoring/retrying. The progress bar rejig has got us erroring when we shouldn’t.
Each chunk has its own address on the network, uncorrelated to the addresses of the other chunks in the same file. So they go all over the network.
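To illustrate the point: a chunk’s address is derived from its content, so neighbouring chunks of one file land at unrelated network addresses. This is only a rough sketch; the real network uses self-encryption and XOR-space addresses, and `DefaultHasher` here is just a stdlib stand-in, not the actual addressing scheme.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in: derive a chunk's "address" from its content.
// Two chunks with different bytes hash to unrelated addresses, which is
// why a file's chunks scatter across the whole network.
fn chunk_address(chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    chunk.hash(&mut h);
    h.finish()
}

fn main() {
    let file = b"some file content, split into fixed-size chunks...";
    // Adjacent chunks of the same file get uncorrelated addresses.
    for chunk in file.chunks(8) {
        println!("{:016x}", chunk_address(chunk));
    }
}
```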
Ah nice, are you seeing anything thus far? Heaptrack is indicating there’s no real leak, so much as that connections require a lot of memory (i.e. perhaps we’re connecting too much?). But no overt leak thus far. I’ll grab an updated sample file and post it here in case anyone fancies a poke about.
I am able to upload small files (<100MB) no problem on clients 0.83.26 and 0.83.27, but nothing bigger. Also faucet works for me fine.
Memory consumption on nodes looks more stable than in previous testnets, but I also see a lot less traffic, probably not many people are able to upload.
Aye, it’s a file concurrency issue we have. Exacerbated by removing the redundant concurrency semaphore, it seems.
I can repro locally and am working on a fix here.
(This is something we’ll need to tackle in a more comprehensive way for multiple terminal sessions and using existing CNs etc too). Right now though I’m just hoping to unblock single client large uploads.
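The general shape of the fix being described — bounding how many chunk uploads are in flight at once — can be sketched with a simple permit pool. This is not the actual sn_cli code (the thread doesn’t show it); names, the permit-channel approach, and the permit count are all illustrative.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

// Hypothetical sketch: cap concurrent chunk uploads by handing out a
// fixed number of permits over a channel. A worker takes a permit
// before starting and returns it when done, so at most `max_in_flight`
// uploads ever run at once.
fn upload_all(chunks: Vec<Vec<u8>>, max_in_flight: usize) -> usize {
    let (permit_tx, permit_rx) = mpsc::channel();
    for _ in 0..max_in_flight {
        permit_tx.send(()).unwrap(); // seed the permit pool
    }
    let uploaded = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for chunk in chunks {
        permit_rx.recv().unwrap(); // block until a permit is free
        let tx = permit_tx.clone();
        let uploaded = Arc::clone(&uploaded);
        handles.push(thread::spawn(move || {
            // ... the real client would PUT `chunk` to the network here ...
            let _ = chunk.len();
            uploaded.fetch_add(1, Ordering::SeqCst);
            tx.send(()).unwrap(); // hand the permit back
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    uploaded.load(Ordering::SeqCst)
}
```

Without such a cap, a 53k-chunk upload can try to open far too many concurrent network operations at once, which matches the large-file failures described above.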
You probably know this, but if safenode frees memory on exit, heaptrack won’t report it as a leak. So killing the node may be necessary, or doing multiple runs of different durations to compare where memory allocations may be accumulating over time (but which are released on exit).
I found that with vdash. Another lesson was to review all the TODO markers in my code before deciding to use heaptrack: when I wrote the code I had made a note to limit memory use in the very area heaptrack later helped me locate.
I’m sure I’m using the new version of the client but will check. During the night I woke and realised I had forgotten to use SN_LOG=all, so would you like me to repeat with that?
So it looks like one assumption we’ve made w/r/t RecordNotFound coming from libp2p is off. We thought that meant there was no record at all (vs an error showing how many copies were found). But it looks like that’s not the case.
That would be why the faucet is occasionally failing with could-not-find errors where it used to be more stable, and it looks like this is one reason we’re bailing during uploads (bigger files are just more likely to hit this).
I’ve some code looking a fair bit more stable on this front going up for review now.
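The pattern being described is classifying record-not-found as potentially transient and retrying with backoff, rather than bailing out of the upload. A minimal sketch of that idea follows; the error variants and function names are hypothetical stand-ins, not the actual sn_client or libp2p API.

```rust
use std::time::Duration;

// Hypothetical error type standing in for the network layer's errors.
#[derive(Debug)]
enum NetError {
    RecordNotFound,      // may be transient: replicas still propagating
    NotEnoughCopies(u8), // partial result, also worth retrying
    Fatal(String),       // genuinely unrecoverable
}

fn is_transient(e: &NetError) -> bool {
    matches!(e, NetError::RecordNotFound | NetError::NotEnoughCopies(_))
}

// Retry a fetch up to `max_tries` times, backing off a little between
// attempts, but fail fast on genuinely fatal errors.
fn get_with_retry<F>(mut attempt: F, max_tries: u32) -> Result<Vec<u8>, NetError>
where
    F: FnMut() -> Result<Vec<u8>, NetError>,
{
    let mut tries = 0;
    loop {
        match attempt() {
            Ok(record) => return Ok(record),
            Err(e) if is_transient(&e) && tries + 1 < max_tries => {
                tries += 1;
                std::thread::sleep(Duration::from_millis(10 * u64::from(tries)));
            }
            Err(e) => return Err(e),
        }
    }
}
```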
Yeh, the leaks reported are likely not leaks, as it’s just ongoing comms. It’s only ~30 MB, which is what’s reported both on startup and after a couple of days, as we see (it climbs a bit, but not that much).
We’ll need much longer timeframes to establish a real leak here vs just increasing network knowledge/topology, I feel.
Right now, as the mem seemingly has reduced vs last testnet (again, due to trace instrumenting I think), I’m not overly concerned.
Yesterday was the correct client, confirmed today and trying again:
$ safe -V
sn_cli 0.83.27
$ export SN_LOG=all
$ time safe files upload Videos/
Logging to directory: "/home/mrh/.local/share/safe/client/logs/log_2023-10-06_11-49-56"
Using SN_LOG=all
Built with git version: 3d130db / main / 3d130db
Instantiating a SAFE client...
[snip]
Chunking 129 files...
Input was split into 53451 chunks
Will now attempt to upload them...
⠠ [00:01:54] [>---------------------------------------] 40/53451 Cannot get store cost for NetworkAddress::ChunkAddress(c511b0(11000101).. - - c511b0c7633a16066017633c598da0e9f12b3cdc9025c682362904b30bfae547) with error CouldNotSendMoney("Network Error Not enough store cost prices returned from the network to ensure a valid fee is paid.")
⠠ [00:02:00] [>---------------------------------------] 40/53451 Cannot get store cost for NetworkAddress::ChunkAddress(4ff0c3(01001111).. - - 4ff0c33e4ef49abadd3ffd08a961a7fbcfcb3992c7782bf8f8664a974d4eed09) with error CouldNotSendMoney("Network Error Not enough store cost prices returned from the network to ensure a valid fee is paid.")
⠈ [00:02:51] [>---------------------------------------] 58/53451 Cannot get store cost for NetworkAddress::ChunkAddress(17a955(00010111).. - - 17a9557b928f929e952aceec50327a848e0c98d8bc671c41a0ab779dd8ff37e0) with error CouldNotSendMoney("Network Error Not enough store cost prices returned from the network to ensure a valid fee is paid.")
Error:
0: Transfer Error Failed to send tokens due to Network Error Could not retrieve the record after storing it: fd1c3b3124ede5c312e7be6935760476ddef60431b385ec23ee4c8f35761ba31..
1: Failed to send tokens due to Network Error Could not retrieve the record after storing it: fd1c3b3124ede5c312e7be6935760476ddef60431b385ec23ee4c8f35761ba31.
Location:
sn_cli/src/subcommands/files.rs:210
The above took 16 minutes but is not the issue I had before, so I’ll start it again.
The above is also the first time I used SN_LOG, so it’s worth noting the different error might be due to that. Previously it hung rather than exiting.
Going again…
EDIT: @joshuef same result after 18 minutes so I’ll wait for 808 and try again then.
Error:
0: Failed to repay for record storage for {failed_chunks_batch:?}.
1: Failed to send tokens due to Network Error Could not retrieve the record after storing it: ed98bbd8669f4a5e1d9df76138523aad3a103c481d8b728369bb30a344840dcf.