[MemDebugNet] [4/10/23 Testnet] [Offline]

I am running it on one of your standard GitHub releases, so no, but it is heavier on resources than heaptrack (I think).

Is there something that prevents us from building our own safenode? I wanted to run --release with debug = true, but no go. I also tried a regular debug build and a regular --release build, and none would join.
I used the source for sn_node-v0.91.16 with no issues during the build, just none of the resulting nodes would join. First try with the GitHub release and no problem.
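For anyone trying the same thing: debug symbols in a release build are normally switched on via a Cargo profile in Cargo.toml rather than a command-line flag. This is standard Cargo behaviour (whether the resulting binary then joins the network is a separate question):

```toml
# In the workspace Cargo.toml: keep release optimisations, add debug info
[profile.release]
debug = true
```

Then build with plain `cargo build --release` as usual.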

Anyway, in the kerfuffle I killed the first one I had running for a few hours, in case you wanted to see the output. @chriso
valgrind_output.zip (37.9 KB)

1 Like

It’s always the same address :slight_smile:

80744b3d25bab269cab54e8baccf4f54f1aa01615230b99171bc3576c1ca7230
4 Likes

I can supply one:-

time safe files download BegBlagandSteal.mp3 80744b3d25bab269cab54e8baccf4f54f1aa01615230b99171bc3576c1ca7230

and it is the same address every time.

Interesting. Looks like the new version errors out earlier than it should! :eyes:

Okay, no worries. Interesttting. Will be digging in here over my cuppa.

edit: Exactly the same error here @happybeing. You are definitely on the new version, right? Really odd. I wonder what’s up here then :thinking: :male_detective:

If it’s erroring out, then it’s a transient error we should be ignoring/retrying. The progress bar rejig has got us erroring when we shouldn’t.
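For anyone following along, "ignoring/retrying" a transient error usually looks something like the sketch below. This is not the actual sn_client code; the error type, function names, and backoff parameters are all made up for illustration:

```python
import time

class TransientNetworkError(Exception):
    """Stand-in for errors (e.g. a record not found *yet*) that may resolve on retry."""

def with_retries(operation, attempts=3, base_delay=0.1):
    """Retry a flaky operation with exponential backoff instead of bailing."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientNetworkError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)

# Toy demonstration: the operation fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientNetworkError("record not found yet")
    return "chunk data"

print(with_retries(flaky_fetch))  # chunk data
```

The key design point is that only errors classified as transient are retried; a genuine failure still surfaces once the attempts are exhausted.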

Each chunk has its own address on the network, uncorrelated with the addresses of other chunks in the same file. So they go all over the network.
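The principle can be sketched with plain content hashing. Note this is illustrative only: the real network self-encrypts chunks before deriving their XOR addresses, but the effect described above is the same, since each address is derived from the chunk's own content:

```python
import hashlib

def chunk_address(chunk: bytes) -> str:
    # Illustrative scheme: address = hash of chunk content. (The real
    # network self-encrypts chunks first, but addressing is still
    # content-derived, hence deterministic.)
    return hashlib.sha256(chunk).hexdigest()

chunks = [b"first chunk", b"second chunk", b"third chunk"]
addrs = [chunk_address(c) for c in chunks]

# Deterministic: the same content always maps to the same address,
# which is why the same file downloads from the same address every time.
assert chunk_address(b"first chunk") == addrs[0]

# Neighbouring chunks of one file get unrelated addresses, so they
# scatter across the whole address space.
assert len(set(addrs)) == len(chunks)
```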

Ah nice, are you seeing anything thus far? Heaptrack is indicating there’s no real leak, so much as that connections require a lot of mem (i.e. perhaps we’re connecting too much?). But no overt leak thus far. I’ll grab an updated sample file and post it here in case anyone fancies a poke about.

2 Likes

Yesterday, with CLI 0.83.27, I was able to upload some relatively small files (less than 80-100 chunks) and the faucet worked.

Today any upload of any size fails.

1 Like

I am able to upload small files (<100MB) no problem on clients 0.83.26 and 0.83.27, but nothing bigger. Also, the faucet works fine for me.

Memory consumption on nodes looks more stable than in previous testnets, but I also see a lot less traffic, probably because not many people are able to upload.

1 Like

Aye, it’s a file concurrency issue we have, exacerbated by removing the redundant concurrency semaphore, it seems.

I can repro locally and am working on a fix here.

(This is something we’ll need to tackle in a more comprehensive way for multiple terminal sessions and using existing CNs etc too). Right now though I’m just hoping to unblock single client large uploads.
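As a generic sketch of what a concurrency semaphore does here (this is not the sn_cli code; the names and the limit are invented for illustration): it bounds how many chunk uploads are in flight at once, so a large file can't flood the client with thousands of simultaneous requests.

```python
import asyncio

async def upload_chunk(i: int, sem: asyncio.Semaphore) -> int:
    # Hypothetical stand-in for a single chunk upload; the semaphore
    # caps how many of these run concurrently.
    async with sem:
        await asyncio.sleep(0.01)  # simulate network I/O
        return i

async def upload_all(n_chunks: int, max_in_flight: int = 8):
    sem = asyncio.Semaphore(max_in_flight)
    # gather() preserves input order, so results line up with chunk indices.
    return await asyncio.gather(*(upload_chunk(i, sem) for i in range(n_chunks)))

results = asyncio.run(upload_all(40))
print(len(results))  # 40
```

Without the semaphore, all 40 tasks would hit the "network" at once; with it, at most 8 are active at any moment.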

3 Likes

Not sure if it is relevant, but it fails with various --batch-size settings, even with --batch-size 1.

2 Likes

You probably know this, but if safenode frees memory on exit, heaptrack won’t report it as a leak, so killing the node may be necessary, or doing multiple runs of different durations to compare where memory allocations may be accumulating over time (but which are released on exit).

I found that with vdash. Another lesson was to review all TODO markers in my code before deciding to use heaptrack, because when I wrote the code I had made a note to limit memory use in the very area heaptrack helped me locate :man_facepalming:

I’m sure I’m using the new version of the client but will check. During the night I woke and realised I had forgotten to use SN_LOG=All, so would you like me to repeat with that?

1 Like

So it looks like one assumption we’ve made w/r/t RecordNotFound coming from libp2p is off. We thought it meant there was no record at all (vs an error showing how many copies were found). But it looks like that’s not the case.

That would be why the faucet is occasionally failing with "could not find" errors where it used to be more stable, and it looks like this is one reason we’re bailing during uploads (bigger files are just more likely to hit this).

I’ve some code looking a fair bit more stable on this front going up for review now.

4 Likes

Yes please.

Yeh, the leaks reported are likely not leaks, as it’s just ongoing comms. It’s only ~30 MB, which is what’s reported both on startup and after a couple of days, as we see (it climbs a bit, but not that much).

We’ll need much longer timeframes to establish a real leak here vs just increasing network knowledge/topology, I feel.

Right now, as the mem usage has seemingly reduced vs the last testnet (again, due to trace instrumenting, I think), I’m not overly concerned.

5 Likes

Yesterday was the correct client, confirmed today and trying again:

$ safe -V
sn_cli 0.83.27
$ export SN_LOG=all
$ time safe files upload Videos/
Logging to directory: "/home/mrh/.local/share/safe/client/logs/log_2023-10-06_11-49-56"
Using SN_LOG=all
Built with git version: 3d130db / main / 3d130db
Instantiating a SAFE client...
[snip]
Chunking 129 files...
Input was split into 53451 chunks
Will now attempt to upload them...
⠠ [00:01:54] [>---------------------------------------] 40/53451                Cannot get store cost for NetworkAddress::ChunkAddress(c511b0(11000101).. -  - c511b0c7633a16066017633c598da0e9f12b3cdc9025c682362904b30bfae547) with error CouldNotSendMoney("Network Error Not enough store cost prices returned from the network to ensure a valid fee is paid.")
⠠ [00:02:00] [>---------------------------------------] 40/53451                Cannot get store cost for NetworkAddress::ChunkAddress(4ff0c3(01001111).. -  - 4ff0c33e4ef49abadd3ffd08a961a7fbcfcb3992c7782bf8f8664a974d4eed09) with error CouldNotSendMoney("Network Error Not enough store cost prices returned from the network to ensure a valid fee is paid.")
⠈ [00:02:51] [>---------------------------------------] 58/53451                Cannot get store cost for NetworkAddress::ChunkAddress(17a955(00010111).. -  - 17a9557b928f929e952aceec50327a848e0c98d8bc671c41a0ab779dd8ff37e0) with error CouldNotSendMoney("Network Error Not enough store cost prices returned from the network to ensure a valid fee is paid.")
Error: 
   0: Transfer Error Failed to send tokens due to Network Error Could not retrieve the record after storing it: fd1c3b3124ede5c312e7be6935760476ddef60431b385ec23ee4c8f35761ba31..
   1: Failed to send tokens due to Network Error Could not retrieve the record after storing it: fd1c3b3124ede5c312e7be6935760476ddef60431b385ec23ee4c8f35761ba31.

Location:
   sn_cli/src/subcommands/files.rs:210

The above took 16 minutes but is not the issue I had before, so I’ll start it again.

The above is also the first time I used SN_LOG, so it’s worth noting the different error might be due to that. Previously it hung rather than exiting.

Going again…

EDIT: @joshuef same result after 18 minutes so I’ll wait for 808 and try again then.

1 Like

PR is looking healthier:

Uploaded ubuntu-18.04-desktop-amd64.iso to 6cfa28d385d5af711893744362aaa32e9116aacce06287614163c20e1b5064df

9 Likes

That one shooould be fixed in 808 :crossed_fingers:

4 Likes

Hey, thanks a lot, they are super helpful, especially for us CLI noobs.

Maybe the posts should be split into their own topic, so that they are easy to find, @neo?

1 Like

For just one, I’d suggest someone start the topic as a wiki so anyone can add their one-liner. I’d suggest Community as the category.

4 Likes

Loving how the client is able to evolve, forced by the node simplicity, but also enabled by the same quality … go ants!

4 Likes

Error even on repay…

Error:
0: Failed to repay for record storage for {failed_chunks_batch:?}.
1: Failed to send tokens due to Network Error Could not retrieve the record after storing it: ed98bbd8669f4a5e1d9df76138523aad3a103c481d8b728369bb30a344840dcf.

Location:
sn_cli/src/subcommands/files.rs:430

Of course it is! What magnitude of brainfart was I suffering when I asked that question?

Note that although I was being stupid, it was not a stupid question.

Now the fact that a certain piece of data will always have the same XOR address is reinforced for many <— A Good Thing

1 Like

PR 808 has just been merged; the next client release should improve things here (in ~45 mins, perhaps).

8 Likes