Sorry, what I meant was, when we had --json-logging
(or whatever the flag was) on the old node, was this the format of the logs?
I’m just asking because it doesn’t seem very useful to just have one message
field in a JSON document.
Oh no! First error I’ve seen with this testnet:-
time safe files upload testfile
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Built with git version: ef92331 / main / ef92331
Instantiating a SAFE client...
🔗 Connected to the Network
Storing file "testfile" of 19 bytes..
Did not store file "testfile" to all nodes in the close group! Network Error Outbound Error.
Writing 8 bytes to "/home/ubuntu/.safe/client/uploaded_files/file_names_2023-06-20_19-38-17"
real 0m16.102s
user 0m0.488s
sys 0m0.080s
Tried it 3 times with the same result.
Just tried to get a previously working file and got this:-
time safe files download 1.jpeg 361adff80ea7efb05cbf865803a7bfa134b9137fd02efec724f6cad2cf20414d
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Built with git version: ef92331 / main / ef92331
Instantiating a SAFE client...
🔗 Connected to the Network
Downloading file "1.jpeg" with address 361adff80ea7efb05cbf865803a7bfa134b9137fd02efec724f6cad2cf20414d
Did not get file "1.jpeg" from the network! Protocol error Could not Serialize/Deserialize RecordHeader to/from Record.
real 0m16.692s
user 0m0.516s
sys 0m0.138s
Have we broken it?
No, I didn’t enable --json-output or any --json flag in the past few weeks. The only parameters I added to safenode when spinning them up were --port, --rpc, --log-dir, and --root-dir, along with SN_LOG=all.
The format of the logs generated over the past 2 weeks, for me at least, never contained a "message": field; this is the first time I am seeing the above.
Hope I understood your question properly.
Sorry, what I’m referring to is this.
I’ve interpreted this as meaning “the old node had --json-output, will this get added to the new node?”
If so, I wanted to know if the output I just showed you was the same as the output from the old node when you used --json-output.
Have I misunderstood things here?
I have been getting the same.
I tried 4 files and all of them downloaded correctly for me.
Yes, I think there is a misunderstanding. I am currently not telling the safenode process to produce native JSON.
For the ReplicationNet I never used --json-output, and for this InstallNet I never used --json-output or any variant flag for forcing JSON output from safenode.
My initial post on this topic, about extracting a single JSON document, was only tied to the process metrics line in the existing log format, where the embedded JSON had a "PID:" prefix and a "," suffix on the same line as the embedded JSON, causing my extraction process to break on that specific line only.
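For what it’s worth, here is a minimal Rust sketch of the kind of cleanup I mean; the helper name and the exact line shape are just illustrative assumptions, not the real code:

// Hypothetical sketch: recover the embedded JSON metrics from a log line
// that wraps it as, e.g., `... PID: {"cpu_usage": 1.2, ...},`.
fn extract_metrics_json(line: &str) -> Option<&str> {
    // Find the start of the embedded JSON after the "PID:" marker.
    let start = line.find("PID:")? + "PID:".len();
    let rest = line[start..].trim_start();
    // Drop the trailing "," (and anything else after the closing brace).
    let end = rest.rfind('}')?;
    Some(&rest[..=end])
}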
This entire inquiry started with re-using the same dashboard and existing ETL for ReplicationNet logs against InstallNet logs, without changing anything on my end other than the safenode version. The only thing I did was adjust the parsing code on my end to extract the JSON metrics from the normal/default log format, because it had broken earlier (reason noted above).
While letting the safenode process run for 12+ hrs and more, when it eventually stored a chunk (a successful disk write), I was expecting to see other lines in the logs indicating the RequestId, ChunkAddress, the StoreChunk command, etc., and I haven’t been able to find any of them.
The code above seems to indicate that those are under a TRACE level, but I haven’t dug further into why the existing raw logs on disk don’t show them, and therefore the dashboard is also empty for those other panels.
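If the node’s logging is driven by tracing-subscriber’s EnvFilter (an assumption on my part; I haven’t checked how SN_LOG=all is mapped internally), surfacing those TRACE lines would need a trace-level directive, something like:

use tracing_subscriber::EnvFilter;

fn main() {
    // Assumed sketch: per-crate trace-level directives would be needed for
    // TRACE events (e.g. the RequestId/ChunkAddress lines) to be emitted.
    let filter = EnvFilter::new("safenode=trace,sn_networking=trace");
    tracing_subscriber::fmt().with_env_filter(filter).init();

    tracing::trace!(target: "sn_networking", "trace-level events now visible");
}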
OK, thanks for the post.
However, it still doesn’t quite clear things up for me. Where did you originally use --json-output? Your post implies that it existed at some point, then was removed.
Was this an argument on the current safenode or on the old sn_node?
Oh, I was just referring to bringing that specific feature, --json-output, to safenode, but I believe I only saw it before on the old sn_node, and that’s why I recall the feature from back in the day.
From my understanding, the current safenode doesn’t support --json-output at the moment, so I never tried it with the current safenode versions.
FWIW, I believe the current testnet binary mentions another option called --json-logs, though I never tried it, as it is likely an invalid parameter at this time (I am not sure):
./testnet --help | grep json
Specify any additional arguments to pass to safenode on launch, e.g., --json-logs
I have now run out of things to upload and am very impressed so far.
Increasing the default timeout from 10s to 30s seems to have fixed all the problems I was having on the previous testnet.
Just downloading 10k MP3s now to see if I can download them.
Cool, thanks.
Yeah, you’re right in saying that the current safenode does not have --json-output. I have a branch that’s just added the argument, and the output I posted earlier is a result of that.
What I wanted to know was: when you used the --json-output argument on the old node, was the output format roughly the same as what I just posted? I was wondering because it doesn’t really seem very useful to have a JSON document with just one message field in it. However, if you do find that useful, that’s cool. I just wondered if we needed some extra configuration to produce something that matched the format of the old node.
I hope that makes sense.
I surprisingly found some old JSON logs from the 01-28-2023 and 02-25-2023 timeframe, and depending on the line, they have one base key, fields, along with sub-keys such as message and target, etc.:
{"timestamp":"2023-01-28T17:22:17.316901Z","level":"DEBUG","fields":{"message":"updated for section change"},"target":"sn_node::node::messaging::anti_entropy","span":{"name":"handle_anti_entropy_msg"},"spans":[{"origin":"Peer { name: f55020(11110101).., addr: 206.189.27.71:12000 }","send_stream":"None","wire_msg":"WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MsgId(4898..41ef), kind: Node { name: f55020(11110101).., is_join: false } } }, dst: Dst { name: ad36fd(10101101).., section_key: PublicKey(187f..833d) } }","name":"handle_msg"},{"name":"handle_node_msg_with_ae_check"},{"name":"handle_anti_entropy_msg"}]}
{"timestamp":"2023-01-28T17:22:17.316910Z","level":"DEBUG","fields":{"message":"[NODE READ] Latest context read"},"target":"sn_node::node::messaging::anti_entropy","span":{"name":"handle_anti_entropy_msg"},"spans":[{"origin":"Peer { name: f55020(11110101).., addr: 206.189.27.71:12000 }","send_stream":"None","wire_msg":"WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MsgId(4898..41ef), kind: Node { name: f55020(11110101).., is_join: false } } }, dst: Dst { name: ad36fd(10101101).., section_key: PublicKey(187f..833d) } }","name":"handle_msg"},{"name":"handle_node_msg_with_ae_check"},{"name":"handle_anti_entropy_msg"}]}
{"timestamp":"2023-02-25T20:54:40.637803Z","level":"TRACE","fields":{"message":"[NODE READ]: flowctrl node context lock got"},"target":"sn_node::node::flow_ctrl"}
{"timestamp":"2023-02-25T20:54:40.639243Z","level":"TRACE","fields":{"message":"Taking cmd off stack: TryJoinNetwork"},"target":"sn_node::node::flow_ctrl"}
{"timestamp":"2023-02-25T20:54:40.639260Z","level":"TRACE","fields":{"message":"Processing for TryJoinNetwork, id: [0]"},"target":"sn_node::node::flow_ctrl::cmd_ctrl"}
{"timestamp":"2023-02-25T20:54:40.640041Z","level":"INFO","fields":{"message":"[NODE READ]: getting lock for try_join_section"},"target":"sn_node::node::flow_ctrl::dispatcher"}
{"timestamp":"2023-02-25T20:54:40.640059Z","level":"INFO","fields":{"message":"[NODE READ]: got lock for try_join_section"},"target":"sn_node::node::flow_ctrl::dispatcher"}
{"timestamp":"2023-02-25T20:54:40.661534Z","level":"TRACE","fields":{"message":"Cmd \"TryJoinNetwork\" took 21.502869ms"},"target":"sn_node::node::flow_ctrl::dispatcher"}
Note: this is slightly different from your sample JSON with just message; here the entire line is itself valid JSON, with timestamp, level, and fields all part of one parent JSON object per line (which I think makes it easier to extract and ingest). Granted, the free-floating text in areas such as message would need more input on further classification into sub key/value fields, but it all depends on the overall organizational hierarchy to target (in the long run).
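As an illustration of why the one-parent-JSON-per-line shape is easy to ingest, here is a minimal serde_json sketch against the structure shown above (the function itself is hypothetical):

use serde_json::Value;

// Minimal sketch: each log line is a self-contained JSON object, so
// extraction is a straight parse plus key lookups.
fn parse_line(line: &str) -> Option<(String, String, String)> {
    let v: Value = serde_json::from_str(line).ok()?;
    let ts = v["timestamp"].as_str()?.to_string();
    let level = v["level"].as_str()?.to_string();
    // `message` is a sub-key under the base `fields` key.
    let msg = v["fields"]["message"].as_str()?.to_string();
    Some((ts, level, msg))
}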
Thanks.
When I enabled JSON in the tracing, that was more what I was expecting to see. A JSON document with all those fields seems far more useful.
It might just be a case of configuring the tracing library further.
I haven’t actually been able to locate the configuration in the old code yet though.
Edit: these also look like OpenTelemetry traces.
I fully agree, and the more organized the JSON is, with minimal free-floating text strings (some of them could be turned into an organized set of enum values, depending on the data being recorded, under a proper set of key fields/labels), the easier and faster it will be to ingest into an ELK stack, Influx/Grafana, or any type of collector/forwarder such as Telegraf.
FWIW, the 01-28-2023 logging came from sn_node v0.72.38, in case the branch or history still exists there. Thanks!
Did they download?
Downloading now on two separate boxes. One is home-based and one is cloud-based.
So far, 100% success on the downloads, but they are still in progress.
Just about to start a movie-downloading binge so I have more content to test with.
You flatten the entry to get the whole log as a single JSON document (see the configuration sketch at the end of this post). Here’s what I have now:
{"timestamp":"2023-06-20T21:42:11.782133Z","level":"WARN","message":"No peers given. As `local-discovery` feature is disabled, we will not be able to connect to the network.","target":"safenode"}
{"timestamp":"2023-06-20T21:42:11.782446Z","level":"INFO","message":"\nRunning safenode v0.83.40\n=========================","target":"safenode"}
{"timestamp":"2023-06-20T21:42:11.786261Z","level":"INFO","message":"Starting node ...","target":"safenode"}
{"timestamp":"2023-06-20T21:42:11.787000Z","level":"INFO","message":"Generating a new keypair","target":"sn_networking"}
{"timestamp":"2023-06-20T21:42:11.787759Z","level":"INFO","message":"Node (PID: 232352) with PeerId: 12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX","target":"sn_networking"}
{"timestamp":"2023-06-20T21:42:11.788005Z","level":"INFO","message":"PeerId: 12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX has replication interval of 412.94886394s","target":"sn_networking"}
{"timestamp":"2023-06-20T21:42:11.794507Z","level":"INFO","message":"Local node is listening on \"/ip4/127.0.0.1/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:11.794908Z","level":"INFO","message":"Local node is listening on \"/ip4/192.168.1.86/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:11.795097Z","level":"INFO","message":"Local node is listening on \"/ip4/172.18.0.1/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:11.795269Z","level":"INFO","message":"Local node is listening on \"/ip4/172.17.0.1/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:47.801631Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:43:16.804728Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:43:52.806788Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:44:15.808815Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:44:52.811693Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:45:13.815161Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:45:35.818037Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:46:11.821652Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
I’ll PR this.
The more elaborate data in the snippet you posted is, I think, actually because those are OpenTelemetry traces, which are a bit of a different thing.
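For reference, a minimal sketch of the kind of tracing-subscriber configuration that yields this flattened, one-object-per-line output; the actual wiring in my branch may differ slightly:

fn main() {
    // Sketch: with the JSON formatter, flatten_event(true) lifts the
    // event's fields ("message", ...) to the top level of each JSON
    // line, matching the flattened output above.
    tracing_subscriber::fmt()
        .json()
        .flatten_event(true)
        .with_target(true)
        .init();

    tracing::info!("Starting node ...");
}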
Had my first failed download, and it was due to running out of hard drive space, so I’d say that was a successful run at it.
Failed to create file "New Order - The Rest Of - Confusion [Pump Panel Reconstruction Mix].mp3" with error Os { code: 28, kind: StorageFull, message: "No space left on ^X^C
ubuntu@oracle:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 6075232 0 6075232 0% /dev
tmpfs 1222692 9612 1213080 1% /run
/dev/sda1 101425200 101408820 0 100% /
tmpfs 6113460 0 6113460 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 6113460 0 6113460 0% /sys/fs/cgroup
/dev/loop1 50304 50304 0 100% /snap/core18/2788
/dev/loop0 50304 50304 0 100% /snap/core18/2755
/dev/loop2 60672 60672 0 100% /snap/core20/1895
/dev/loop3 60672 60672 0 100% /snap/core20/1953
/dev/loop4 6400 6400 0 100% /snap/curl/1758
/dev/loop5 94208 94208 0 100% /snap/lxd/23999
/dev/loop8 23296 23296 0 100% /snap/oracle-cloud-agent/55
/dev/loop9 23296 23296 0 100% /snap/oracle-cloud-agent/57
/dev/loop6 6400 6400 0 100% /snap/curl/1583
/dev/loop7 94208 94208 0 100% /snap/lxd/24065
/dev/loop10 47616 47616 0 100% /snap/snapd/19365
/dev/loop11 47616 47616 0 100% /snap/snapd/19459
/dev/sda15 99801 298 99503 1% /boot/efi
tmpfs 1222692 0 1222692 0% /run/user/1001
I tried on a Windows 10 machine. The install was OK, but I always get the Outbound error on upload:
Storing file "lauren.txt" of 945 bytes..
Did not store file "lauren.txt" to all nodes in the close group! Network Error Outbound Error.
We previously had some LogMarkers: basically a type that could be used to programmatically ensure we were matching the log string (and one place to keep all those important log strings that may be useful). This is probably something we should bring back, as it makes it clearer which log messages are ‘used’ by third parties and gives a more concrete place to tweak things.
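A rough sketch of the shape, purely illustrative (the variant names are not the original sn_node definitions):

use std::fmt;

// One canonical place for the log strings that third parties grep for,
// so they can't silently drift.
#[derive(Debug, Clone, Copy)]
pub enum LogMarker {
    StoringChunk,
    StoredChunk,
    ReplicationTriggered,
}

impl fmt::Display for LogMarker {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // The Debug form doubles as the canonical log string.
        write!(f, "{self:?}")
    }
}

// Usage at a log site: tracing::info!("{} {addr:?}", LogMarker::StoredChunk);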
Hmm, what’s your git commit hash? I wonder if the RecordHeader work (separating the address space for chunks/DBCs etc.) might have come in here (as we’re always installing the latest version above, and that was added after we started the testnet). (cc @roland, what logs do we expect now?)
Also worth noting: if we do have that, a put could come from DBCs now as well as chunks, and soon registers too. Records will be all of those things.