I’ve not tried this command but think it will do the trick:
grep -e "5a1c9e\|73b703\|6be651\|63594e\|ab8031" /tmp/safenode/safenode.log
Although really you will need to decompress all the log files and change the .log to *.
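If the rotated logs turn out to be gzip-compressed, zgrep saves the manual decompression step. A rough sketch, assuming the logs live under ~/.safe/node/safenode*/ as in the output further down; adjust the path to wherever your node writes them:
# zgrep reads plain and .gz logs alike
zgrep -e "5a1c9e\|73b703\|6be651\|63594e\|ab8031" ~/.safe/node/safenode*/safenode.log*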
Can anyone explain the shape of the memory graph for my node?
This looks suspicious; I doubt it can be explained by “chunk names”:
[Charts: memory_used_mb, total_mb_written, total_mb_read]
Raw data: snmetr.zip (246.0 KB)
I have an idea regarding the memory leak:
Here is a chart for OutgoingConnectionError:
Looks like there is a correlation with the amount of RAM.
Probably the initial nodes started crashing, which led to a memory leak and further crashing. It is more of a guess, but worth checking.
Raw data: sn_oce.zip (9.2 KB)
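For anyone who wants to check the correlation against their own logs, a minimal sketch, assuming the error string appears verbatim and the timestamp is the first bracketed field (as in the log excerpts below); the log path is a guess:
# count OutgoingConnectionError occurrences per minute
grep -h "OutgoingConnectionError" ~/.safe/node/safenode*/safenode.log* | cut -c2-17 | sort | uniq -c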
KISS
grep -re "5a1c9e\|73b703\|6be651\|63594e\|ab8031" .
WFM
safe@ubuntu-2gb-nbg1-1:~/.safe/node$ grep -re "5a1c9e\|73b703\|6be651\|63594e\|ab8031" .
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:50:45.720153Z TRACE sn_networking::msg] Received request with id: RequestId(40229), req: Cmd(StoreChunk { chunk: Chunk { address: ChunkAddress(63594e(01100011)..) }, payment: None })
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:50:45.724630Z TRACE sn_node::api] Handling request: Cmd(StoreChunk { chunk: Chunk { address: ChunkAddress(63594e(01100011)..) }, payment: None })
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:50:45.728729Z DEBUG sn_node::api] That's a store chunk in for :63594e(01100011)..
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:50:45.728906Z INFO safenode] Currently ignored node event ChunkStored(ChunkAddress(63594e(01100011)..))
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:50:45.742260Z TRACE sn_record_store] Wrote record to disk! filename: 63594e7024e857853eca6b68cc13ce04f16c96557af88e59495fbd78fc62e6ce
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:51:32.298346Z TRACE sn_record_store] Retrieved record from disk! filename: 63594e7024e857853eca6b68cc13ce04f16c96557af88e59495fbd78fc62e6ce
./safenode_7/safenode.log.20230609T161523:[2023-06-09T16:13:35.301860Z TRACE sn_record_store] Wrote record to disk! filename: 99b1676629f6ac7384b20afcc2226063594e2f6514febfc
So am I the baddie storing (or not) one of the missing chunks (63594e(01100011)…)?
./safenode_20/safenode.log.20230609T165646:[2023-06-09T16:50:45.728729Z DEBUG sn_node::api] That’s a store chunk in for :63594e(01100011)…
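One quick way to tell whether that record is still on disk (searching under the same directory the grep above was run from; the exact record_store layout is an assumption) is:
# the filename is the one reported in the "Wrote record to disk!" line
find ~/.safe/node -type f -name 63594e7024e857853eca6b68cc13ce04f16c96557af88e59495fbd78fc62e6ce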
Think I'll be the one to jinx it!!
Well done to @dirvine and all the team; this testnet has taken everything I had to throw at it!!
40 GB of MP3s and movies at 2 GB have all succeeded, so it has surpassed anything we had on the old code.
But there is a “but” here, as I am testing on a few different connections:
Oracle Cloud ARM instance
40 GB of MP3s and the 2 GB Crow movie: 100% success
Retrieving speedtest.net configuration...
Testing from Oracle Cloud (xx.xx.xx.xx)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by M247 Ltd (Manchester) [240.72 km]: 7.228 ms
Testing download speed................................................................................
Download: 1524.03 Mbit/s
Testing upload speed......................................................................................................
Upload: 1239.16 Mbit/s
Virgin Media fibre broadband
MP3s were a 100% success for what I tried (not the full 40 GB), but the Crow movie at 2 GB was a fail.
Retrieving speedtest.net configuration...
Testing from Virgin Media (xx.xx.xxx.xx)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Structured Communications (London) [6.96 km]: 31.257 ms
Testing download speed................................................................................
Download: 364.39 Mbit/s
Testing upload speed......................................................................................................
Upload: 52.27 Mbit/s
Vodafone broadband
MP3s were succeeding at around a 10% success rate, and the Crow movie was a fail.
Retrieving speedtest.net configuration...
Testing from Vodafone UK (xx.xx.xx.xx)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Wildcard Networks (Newcastle upon Tyne) [399.19 km]: 32.864 ms
Testing download speed................................................................................
Download: 53.49 Mbit/s
Testing upload speed......................................................................................................
Upload: 13.07 Mbit/s
So it looks like things are very internet-connection dependent. From my side, I'd like to see the ability to increase the timeout for the next iteration.
Well done to the team; the old code took years to reach the point that you have now surpassed!!
Updated the charts to include the following:
- record_store folder for each file
- record_store over time
- safenode pid spawn up

Observations:
- record_store in past 24 hrs than the first 24 hrs
- record_store are the same even after few hrs since last modified (record_store directory)
- safenode pid have risen, but that is expected (explained earlier by the Maidsafe team)

I haven't had time to look for odd errors or other interesting messages in the logs yet.
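For anyone who wants to collect similar numbers, a rough sketch, assuming each node keeps its chunks in a record_store directory somewhere under its ~/.safe/node/safenode_N folder (adjust to your layout):
# file count and on-disk size per record_store folder
for d in ~/.safe/node/safenode*/record_store; do
  printf '%s\t%s files\t%s\n' "$d" "$(find "$d" -type f | wc -l)" "$(du -sh "$d" | cut -f1)"
done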
Thanks chap, that’s very kind and I do believe it to be the case.
Yes, this timeout is interesting; in my book it should not exist in our code. It's really a client-app thing, if it exists at all. I will explain more.
Super helpful
Again, this is why being open is so important. We could not apply the resources needed for this kind of detail. It's massively helpful.
We are all building Safe here, every one of us. That is how it always should have been
For my results I was meaning uploading; I'll do more testing over the weekend to see how I get on with downloading on the relatively slower connections.
Now I have a good bag of uploads to play with, thanks to my Oracle instance that was able to upload it all.
Seriously impressed with how this is going since the latest reboot.
Yes, I believe we should not time out on uploading either. Basically, if you have paid (eventually), then the network must save the data; it may take time, but it will save it, so we should not time out but wait as long as it takes. We probably need to analyse the upload code in much more detail and take that into account, now that we seem to have found stability here, IMO.
It started to oscillate:
The correlation with OutgoingConnectionError is not so obvious now, but I still think it may be related to memory usage:
I wonder what the fate of nodes like 12D3KooWHs2FuFcuSHtkt1KdCAKDnXp35EDbkrcx559rp7TMrj9n is.
Will the developers share such information, or is it a secret?
@Shu, it is interesting that your node missed the events mentioned above.
I added tracking of the record_store folder's total file count & total bytes in real time now.
I also decided to go back to some LXC-level TCP stats, and it's pretty fascinating just how many TCP connections are established (min/max/mean) even over the course of 90 minutes (the oscillations).
No idea if ~250 TCP connections on average in an established state is considered okay or not for the current size of testnet with its X # of total nodes… either way, what a dynamic environment!
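Sampling the established-connection count is cheap if anyone else wants to compare; a minimal sketch, assuming it runs inside the same container/host as the node (ss is part of iproute2):
# log the number of ESTABLISHED TCP connections once a minute
while true; do echo "$(date -u +%FT%TZ) $(ss -Htn state established | wc -l)"; sleep 60; done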
The number of connections is OK, I think.
A Tor node has ~6k, an I2P node has ~4k.
What does not look OK is how often connections are dropped and re-established.
Such noise also happens when downloading and uploading files with the client.
But it's probably just not optimised yet.
Assuming the time scale is UTC in the images above, yes, it seems my node did not experience the rapid, spiky oscillations in memory during the same time window.
I have never tried running one of those services at home, so good to know for future reference.
The last memory chart is from 2023-06-09T10:38:07.613266Z to 2023-06-10T05:57:14.654755Z. So x is not exactly time, but the conclusion will be the same anyway, I think.
I've had a node running since nearly the start and I can confirm it was a slow start for getting chunks. Then I got a few and they just dribbled in. Things really picked up about this time yesterday, and last night it went bananas! There were hundreds at 03:40 UTC today, so maybe there was a big upload or a big disconnection?
I now have 734 chunks taking up 310 MB.
I'm amazed at the recent progress!
I am still uploading MP3s, going since yesterday early afternoon; total uploaded successfully: 329 GB.
Today, my 600 MB file:
Not all chunks were retrieved, expected 1243, retrieved 1241
And the 1.2 GB file:
expected 2233, retrieved 2229
I'm gonna give them another try.
And a couple of smaller files that I uploaded on Wednesday downloaded just fine.
Does it mean that for some chunks all 8 nodes holding copies crashed?
Also, such a crash would have to happen at the same time, leaving no time for the nodes to make fresh copies of the data.
Two of my nodes now have 1024 chunks. Coincidence, or is that a hard limit?