Updates:
- Added labels for PUT validation and store chunk stages:
  - VALIDATE_AND_STORE_CHUNK
  - STORING_CHUNK
  - VALIDATED_AND_STORED
- Added a label for CHUNK_DELETED
- Added a panel for generic duration per Request ID where the entry count == 2 (last entry minus first)
- Added a panel for generic duration per Request ID where the entry count > 2 (last entry minus first); durations here are high for the reason explained below
- Ignore the 2nd half of the file count panel (data was not being gathered for that panel for a few hours); however, the first half does show oscillations in the total # of chunks in the record_store folder during the first half of the node’s lifespan, even after hitting the 2048 max record entries
Observations:
- It seems certain Request IDs are being generated again, likely from a different component within the safenode pid with its own auto-incrementing counter. Overlapping #s within a small time span would make it more difficult to triage or trace something, unless of course this is all part of a single request chain, in which case it took ~66,000 seconds to finish, which I don’t think is the case here. Or the ID originated from an external node for the first batch, and the same # was later generated within the current safenode as part of a new request.
[2023-07-09T13:26:35.513186Z TRACE sn_networking::msg] Received request with id: RequestId(73737), req: Cmd(RequestReplication(NetworkAddress::PeerId([0, 36, 8, 1, 18, 32, 97, 141, 115, 255, 181, 244, 206, 81, 142, 146, 231, 153, 104, 10, 213, 212, 241, 10, 163, 2, 198, 13, 195, 190, 63, 153, 71, 255, 59, 67, 8, 71])))
[2023-07-09T13:26:35.666447Z TRACE sn_networking::msg] ResponseSent for request_id: RequestId(73737) and peer: PeerId("12D3KooWGPApMpUPuyjxo2Qzz2kKy2SLgrhtZ7k2sdBFQ7aujH82")
...
[2023-07-10T07:47:40.049038Z TRACE sn_networking::cmd] Sending request RequestId(73737) to peer PeerId("12D3KooWGXuo2vzYHWwbwHg45BuLstegGcyHHyWEzruSQYqRUN6x")
[2023-07-10T07:47:40.722538Z TRACE sn_networking::msg] Got response for id: RequestId(73737), res: Cmd(Replicate(Ok(()))).
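The per-Request-ID durations (last entry minus first) behind the two panels above can be reproduced offline from log lines like these. A minimal sketch of my own (not part of safenode), assuming each line starts with a `[<timestamp>Z ...]` prefix and mentions `RequestId(N)`:

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)Z")
ID_RE = re.compile(r"RequestId\((\d+)\)")

def request_durations(lines):
    """Map each RequestId to (entry count, last-minus-first seconds)."""
    spans = {}  # rid -> (first_ts, last_ts, count)
    for line in lines:
        ts_m, id_m = TS_RE.match(line), ID_RE.search(line)
        if not (ts_m and id_m):
            continue  # skip lines without a timestamp or RequestId
        t = datetime.strptime(ts_m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        rid = int(id_m.group(1))
        first, last, n = spans.get(rid, (t, t, 0))
        spans[rid] = (min(first, t), max(last, t), n + 1)
    return {rid: (n, (last - first).total_seconds())
            for rid, (first, last, n) in spans.items()}
```

Feeding it the four RequestId(73737) lines above gives a count of 4 and a span of roughly 66,065 seconds, consistent with the ~66,000-second chain mentioned; splitting the result on count == 2 vs count > 2 reproduces the two panels.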
- VALIDATE_AND_STORE_CHUNK (53454) → STORING_CHUNK (53448) → VALIDATED_AND_STORED (53332) → CHUNK_WRITTEN (48538). Should some of these have been exactly equal #s?
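One way to examine whether those #s should match is to count label occurrences and look at the drop-off between consecutive stages. A minimal sketch of my own (it simply substring-matches the label names listed above):

```python
from collections import Counter

# Stage labels in pipeline order, as listed in the Updates section above.
STAGES = ("VALIDATE_AND_STORE_CHUNK", "STORING_CHUNK",
          "VALIDATED_AND_STORED", "CHUNK_WRITTEN")

def stage_funnel(lines):
    """Count lines mentioning each stage, plus the drop-off between stages."""
    counts = Counter()
    for line in lines:
        for stage in STAGES:
            if stage in line:
                counts[stage] += 1
    dropoff = [(a, b, counts[a] - counts[b])
               for a, b in zip(STAGES, STAGES[1:])]
    return counts, dropoff
```

With the counts above, the drop-offs are 6, 116, and 4794; the small gaps at the front may just be requests in flight at snapshot time, while the 4794 gap before CHUNK_WRITTEN seems worth a closer look.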
- There were some immediate "no such file or directory" failures following a PUT verified record entry (291 times). Is that expected?:
[2023-07-09T02:05:55.981021Z TRACE sn_networking::msg] Got response for id: RequestId(172), res: Query(GetReplicatedData(Ok((NetworkAddress::PeerId([0, 36, 8, 1, 18, 32, 9, 200, 155, 91, 60, 252, 64, 90, 20, 62, 248, 218, 201, 14, 128, 12, 150, 40, 221, 152, 168, 38, 15, 29, 138, 78, 224, 3, 225, 173, 57, 105]), Chunk(ChunkWithPayment { chunk: Chunk { address: ChunkAddress(004193(00000000)..) }, payment: None }))))).
[2023-07-09T02:05:55.981150Z TRACE sn_node::api] NetworkEvent::ResponseReceived Query(GetReplicatedData(Ok((NetworkAddress::PeerId([0, 36, 8, 1, 18, 32, 9, 200, 155, 91, 60, 252, 64, 90, 20, 62, 248, 218, 201, 14, 128, 12, 150, 40, 221, 152, 168, 38, 15, 29, 138, 78, 224, 3, 225, 173, 57, 105]), Chunk(ChunkWithPayment { chunk: Chunk { address: ChunkAddress(004193(00000000)..) }, payment: None })))))
[2023-07-09T02:05:55.981180Z DEBUG sn_node::api] Chunk received for replication: 004193(00000000)..
[2023-07-09T02:05:55.981190Z DEBUG sn_node::put_validation] validating and storing chunk 004193(00000000)..
[2023-07-09T02:05:55.981243Z DEBUG sn_node::put_validation] Storing chunk 004193(00000000).. as Record locally
[2023-07-09T02:05:55.981258Z DEBUG sn_networking] Writing Record locally, for Key(b"\0A\x93\x06\x85\xa7x\x8au\xc3\xf59\xb3\x8a\x13\xaf\xca\xeaU\xf0\x95\x90 \xd7J\xf1\xfdO\x88\xc7\x150") - length 2055
[2023-07-09T02:05:55.981288Z TRACE sn_node::api] ReplicatedData::Chunk with ChunkAddress(004193(00000000)..) has been validated and stored. StoredSuccessfully
[2023-07-09T02:05:55.981319Z TRACE sn_networking::record_store] PUT a verified Record: Key(b"\0A\x93\x06\x85\xa7x\x8au\xc3\xf59\xb3\x8a\x13\xaf\xca\xeaU\xf0\x95\x90 \xd7J\xf1\xfdO\x88\xc7\x150")
[2023-07-09T02:05:55.981405Z ERROR sn_networking::record_store] Error writing file. filename: 0041930685a7788a75c3f539b38a13afcaea55f0959020d74af1fd4f88c71530, error: Os { code: 2, kind: NotFound, message: "No such file or directory" }
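The 291 count can be cross-checked by scanning for write errors that immediately follow a verified-record PUT, as in the excerpt above. A minimal sketch of my own, matching on the message substrings shown in these logs:

```python
def put_then_write_errors(lines):
    """Count 'Error writing file' lines that directly follow a
    'PUT a verified Record' line."""
    count = 0
    prev_was_put = False
    for line in lines:
        if prev_was_put and "Error writing file" in line:
            count += 1
        prev_was_put = "PUT a verified Record" in line
    return count
```

If other log lines can interleave between the PUT and the error, the adjacency check would need to be relaxed to a small lookahead window instead.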
- For all Request IDs with exactly 2 entries per unique Request ID, the duration ranged from 1 ms to 342 seconds. That is a wide range, but it would have to be broken down further by request type to see whether there is an underlying performance issue here or not (TBD).
- Connection refused and operation timeout errors are expected on and off given the dynamic nature of this network, so I don’t see anything of concern here (the #s seem pretty low in general)

- CHUNK_WRITTEN minus CHUNK_DELETED roughly equals the *Last* file count under record_store: ~674+

Note: Will dig in further in a few hours from now, but the above was a first pass on my current safenode logs.