InstallNet [2023-06-19 Testnet] [Offline]

Sorry, what I meant was, when we had --json-logging (or whatever the flag was) on the old node, was this the format of the logs?

I’m just asking because it doesn’t seem very useful to just have one message field in a JSON document.

1 Like

Oh no! First error I’ve seen with this testnet:-

time safe files upload testfile
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Built with git version: ef92331 / main / ef92331
Instantiating a SAFE client...
🔗 Connected to the Network
Storing file "testfile" of 19 bytes..
Did not store file "testfile" to all nodes in the close group! Network Error Outbound Error.
Writing 8 bytes to "/home/ubuntu/.safe/client/uploaded_files/file_names_2023-06-20_19-38-17"

real	0m16.102s
user	0m0.488s
sys	0m0.080s

Tried it 3 times with the same result.

Just tried to get a previously working file and got this:-

time safe files download 1.jpeg 361adff80ea7efb05cbf865803a7bfa134b9137fd02efec724f6cad2cf20414d
Removed old logs from directory: "/tmp/safe-client"
Logging to directory: "/tmp/safe-client"
Built with git version: ef92331 / main / ef92331
Instantiating a SAFE client...
🔗 Connected to the Network
Downloading file "1.jpeg" with address 361adff80ea7efb05cbf865803a7bfa134b9137fd02efec724f6cad2cf20414d
Did not get file "1.jpeg" from the network! Protocol error Could not Serialize/Deserialize RecordHeader to/from Record.

real	0m16.692s
user	0m0.516s
sys	0m0.138s

Have we broked it?

No, I didn’t enable --json-output or any --json flag in the past few weeks. The only parameters I added to safenode were --port, --rpc, --log-dir, and --root-dir when spinning them up, along with SN_LOG=all.

The format of the logs generated over the past 2 weeks, for me at least, never contained "message": — this is the first time I am seeing the above.

Hope I understood your question properly :smiley: .

1 Like

Sorry, what I’m referring to is this.

I’ve interpreted this as meaning “the old node had --json-output, will this get added to the new node?”

If so, I wanted to know if the output I just showed you was the same as the output from the old node if you used --json-output.

Have I misunderstood things here?

I have been getting the same.

I tried 4 files and all of them downloaded correctly for me.

Yes, I think there is a misunderstanding. I am currently not telling the safenode process to produce native JSON.

I never used --json-output for the ReplicationNet, and I haven’t used it (or any variant flag for forcing JSON output from safenode) for this InstallNet either.

My initial post on this topic, regarding extracting a single JSON document, was only about the process metrics line in the existing log format: the embedded JSON had a PID: prefix and a , suffix on the same line, which caused my extraction process to break on that specific line only.
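For illustration, a minimal Python sketch of working around that: the line shape below is hypothetical (the actual metrics format isn’t quoted here), but it shows stripping a `PID:` prefix and trailing comma so the embedded JSON parses cleanly.

```python
import json
import re

# Hypothetical example line: the embedded metrics JSON is wrapped with a
# "PID: " prefix and a trailing "," on the same line (shape assumed, not
# copied from the actual safenode logs).
line = 'PID: {"cpu_percent": 1.2, "memory_kb": 20480},'

# Grab everything between the first "{" and the last "}" so the prefix
# and suffix no longer break json.loads().
match = re.search(r'\{.*\}', line)
metrics = json.loads(match.group(0)) if match else None
print(metrics)
```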

This entire inquiry started with re-using the same dashboard + existing ETL for ReplicationNet logs against InstallNet logs without changing anything on my end other than the safenode version. The only thing I did was adjust the parsing code on my end to extract the JSON metrics from the normal/default log format, because it had broken earlier (reason noted above).

While letting the safenode process run for 12+ hours, when it eventually stored a chunk (a successful disk write), I was expecting to see other lines in the logs indicating the RequestId, ChunkAddress, the StoreChunk command, etc., but I haven’t been able to find any of this.

The code above seems to indicate that those are at TRACE level, but I haven’t dug further into why the existing raw logs on disk don’t show them; as a result, the dashboard is also empty for those other panels.

1 Like

OK, thanks for the post.

However, it still doesn’t quite clear things up for me. Where did you originally use --json-output? Your post implies that it existed at some point, then was removed.

Was this an argument on the current safenode or on the old sn_node?

Oh, I was just referring to bringing that specific feature, --json-output, to safenode. I believe I only saw it on the old sn_node, which is why I recall the feature from back in the day.

From my understanding, the current safenode doesn’t support --json-output at the moment, so I never tried it with the current safenode versions :smiley: .

FWIW, the current testnet binary mentions another option called --json-logs, though I never tried it, as it is likely an invalid parameter at this time (I am not sure):

./testnet --help | grep json
            Specify any additional arguments to pass to safenode on launch, e.g., --json-logs

I have now run out of things to upload and am very impressed so far.

Increasing the default timeout from 10s to 30s seems to have fixed all the problems I was having on the previous testnet.

Just downloading 10k MP3s now to see if I can download them.

11 Likes

Cool, thanks.

Yeah, you’re right in saying that the current safenode does not have --json-output. I have a branch that’s just added in the argument, and the output I posted earlier is a result of that.

What I wanted to know was, when you used the --json-output argument on the old node, was the output format roughly the same as what I just posted? I was wondering because it doesn’t really seem very useful to have a JSON document with just one message field in it. However, if you do find that useful, it’s cool. I just wondered if we needed some extra configuration to produce something else that matched the format of the old node.

I hope that makes sense.

I surprisingly found some old JSON logs from the 01-28-2023 and 02-25-2023 timeframe. Depending on the line, they have a set of base key fields, along with sub-keys such as message and target:

{"timestamp":"2023-01-28T17:22:17.316901Z","level":"DEBUG","fields":{"message":"updated for section change"},"target":"sn_node::node::messaging::anti_entropy","span":{"name":"handle_anti_entropy_msg"},"spans":[{"origin":"Peer { name: f55020(11110101).., addr: 206.189.27.71:12000 }","send_stream":"None","wire_msg":"WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MsgId(4898..41ef), kind: Node { name: f55020(11110101).., is_join: false } } }, dst: Dst { name: ad36fd(10101101).., section_key: PublicKey(187f..833d) } }","name":"handle_msg"},{"name":"handle_node_msg_with_ae_check"},{"name":"handle_anti_entropy_msg"}]}
{"timestamp":"2023-01-28T17:22:17.316910Z","level":"DEBUG","fields":{"message":"[NODE READ] Latest context read"},"target":"sn_node::node::messaging::anti_entropy","span":{"name":"handle_anti_entropy_msg"},"spans":[{"origin":"Peer { name: f55020(11110101).., addr: 206.189.27.71:12000 }","send_stream":"None","wire_msg":"WireMsg { header: WireMsgHeader { version: 1, msg_envelope: MsgEnvelope { msg_id: MsgId(4898..41ef), kind: Node { name: f55020(11110101).., is_join: false } } }, dst: Dst { name: ad36fd(10101101).., section_key: PublicKey(187f..833d) } }","name":"handle_msg"},{"name":"handle_node_msg_with_ae_check"},{"name":"handle_anti_entropy_msg"}]}

{"timestamp":"2023-02-25T20:54:40.637803Z","level":"TRACE","fields":{"message":"[NODE READ]: flowctrl node context lock got"},"target":"sn_node::node::flow_ctrl"}
{"timestamp":"2023-02-25T20:54:40.639243Z","level":"TRACE","fields":{"message":"Taking cmd off stack: TryJoinNetwork"},"target":"sn_node::node::flow_ctrl"}
{"timestamp":"2023-02-25T20:54:40.639260Z","level":"TRACE","fields":{"message":"Processing for TryJoinNetwork, id: [0]"},"target":"sn_node::node::flow_ctrl::cmd_ctrl"}
{"timestamp":"2023-02-25T20:54:40.640041Z","level":"INFO","fields":{"message":"[NODE READ]: getting lock for try_join_section"},"target":"sn_node::node::flow_ctrl::dispatcher"}
{"timestamp":"2023-02-25T20:54:40.640059Z","level":"INFO","fields":{"message":"[NODE READ]: got lock for try_join_section"},"target":"sn_node::node::flow_ctrl::dispatcher"}
{"timestamp":"2023-02-25T20:54:40.661534Z","level":"TRACE","fields":{"message":"Cmd \"TryJoinNetwork\" took 21.502869ms"},"target":"sn_node::node::flow_ctrl::dispatcher"}

Note: this is slightly different from your sample JSON with just message. Here each entire line is itself valid JSON, with timestamp, level, and fields all part of one larger parent JSON object per line, which I think makes it easier to extract and ingest. Granted, the free-floating text in areas such as message would need further classification into sub key/value fields, but that all depends on the overall organizational hierarchy to target in the long run.
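The "each line is valid JSON" property is what makes ingestion simple: a consumer just calls `json.loads` per line. A minimal sketch, using one of the sn_node lines quoted above:

```python
import json

# One of the old sn_node log lines quoted above: the whole line is a
# self-contained JSON document, so no regex extraction is needed.
line = ('{"timestamp":"2023-02-25T20:54:40.639243Z","level":"TRACE",'
        '"fields":{"message":"Taking cmd off stack: TryJoinNetwork"},'
        '"target":"sn_node::node::flow_ctrl"}')

entry = json.loads(line)
print(entry["level"], entry["fields"]["message"])
```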

Thanks.

When I enabled JSON in the tracing, that was more what I was expecting to see. A JSON document with all those fields seems far more useful.

It might just be a case of configuring the tracing library further.

I haven’t actually been able to locate the configuration in the old code yet though.

Edit: these look like Open Telemetry traces too.

I fully agree. The more organized the JSON is, with as little free-floating text as possible (some of it could become an organized set of enum values under a proper set of key fields/labels, depending on the data being recorded), the easier and faster it will be to ingest into an ELK stack, InfluxDB/Grafana, or any type of collector/forwarder such as Telegraf.

FWIW, the 01-28-2023 logging came from sn_node v0.72.38, in case the branch or history still exists there :smiley: . Thanks!

1 Like

Did they download?

1 Like

Downloading now on two separate boxes.

One is home based and one is cloud based

So far 100% success on the downloads, but they are still in progress.

Just about to start on a movie-downloading binge so I have more content to test with.

4 Likes

You flatten the entry to get the whole log as a JSON document. Here’s what I have now:

{"timestamp":"2023-06-20T21:42:11.782133Z","level":"WARN","message":"No peers given. As `local-discovery` feature is disabled, we will not be able to connect to the network.","target":"safenode"}
{"timestamp":"2023-06-20T21:42:11.782446Z","level":"INFO","message":"\nRunning safenode v0.83.40\n=========================","target":"safenode"}
{"timestamp":"2023-06-20T21:42:11.786261Z","level":"INFO","message":"Starting node ...","target":"safenode"}
{"timestamp":"2023-06-20T21:42:11.787000Z","level":"INFO","message":"Generating a new keypair","target":"sn_networking"}
{"timestamp":"2023-06-20T21:42:11.787759Z","level":"INFO","message":"Node (PID: 232352) with PeerId: 12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX","target":"sn_networking"}
{"timestamp":"2023-06-20T21:42:11.788005Z","level":"INFO","message":"PeerId: 12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX has replication interval of 412.94886394s","target":"sn_networking"}
{"timestamp":"2023-06-20T21:42:11.794507Z","level":"INFO","message":"Local node is listening on \"/ip4/127.0.0.1/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:11.794908Z","level":"INFO","message":"Local node is listening on \"/ip4/192.168.1.86/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:11.795097Z","level":"INFO","message":"Local node is listening on \"/ip4/172.18.0.1/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:11.795269Z","level":"INFO","message":"Local node is listening on \"/ip4/172.17.0.1/tcp/42079/p2p/12D3KooWL3dYbhftJqpPKXgXia3UfeJnc4Jht2oPWeKoMHocJ7AX\"","target":"sn_networking::event"}
{"timestamp":"2023-06-20T21:42:47.801631Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:43:16.804728Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:43:52.806788Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:44:15.808815Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:44:52.811693Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:45:13.815161Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:45:35.818037Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}
{"timestamp":"2023-06-20T21:46:11.821652Z","level":"WARN","message":"Not enough peers in the k-bucket to satisfy the request","target":"sn_networking"}

I’ll PR this.

I think the more elaborate data in the snippet you posted is actually because those are Open Telemetry traces, which are a bit of a different thing.
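For context on what the flattening does, here is a rough Python sketch of hoisting the nested `fields` keys to the top level, producing the flattened shape shown above. The actual change is in the Rust tracing configuration; this is only an illustration of the transformation.

```python
import json

# Old nested tracing format: the message lives under "fields".
nested = {"timestamp": "2023-06-20T21:42:11.786261Z", "level": "INFO",
          "fields": {"message": "Starting node ..."}, "target": "safenode"}

# Flatten: copy every top-level key except "fields", then hoist the
# keys inside "fields" up to the top level.
flat = {k: v for k, v in nested.items() if k != "fields"}
flat.update(nested.get("fields", {}))
print(json.dumps(flat, sort_keys=True))
```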

1 Like

Had my first failed download, and it was due to running out of hard drive space, so I’d say that was a successful run at it.

Failed to create file "New Order - The Rest Of - Confusion [Pump Panel Reconstruction Mix].mp3" with error Os { code: 28, kind: StorageFull, message: "No space left on ^X^C
ubuntu@oracle:~$ df
Filesystem     1K-blocks      Used Available Use% Mounted on
udev             6075232         0   6075232   0% /dev
tmpfs            1222692      9612   1213080   1% /run
/dev/sda1      101425200 101408820         0 100% /
tmpfs            6113460         0   6113460   0% /dev/shm
tmpfs               5120         0      5120   0% /run/lock
tmpfs            6113460         0   6113460   0% /sys/fs/cgroup
/dev/loop1         50304     50304         0 100% /snap/core18/2788
/dev/loop0         50304     50304         0 100% /snap/core18/2755
/dev/loop2         60672     60672         0 100% /snap/core20/1895
/dev/loop3         60672     60672         0 100% /snap/core20/1953
/dev/loop4          6400      6400         0 100% /snap/curl/1758
/dev/loop5         94208     94208         0 100% /snap/lxd/23999
/dev/loop8         23296     23296         0 100% /snap/oracle-cloud-agent/55
/dev/loop9         23296     23296         0 100% /snap/oracle-cloud-agent/57
/dev/loop6          6400      6400         0 100% /snap/curl/1583
/dev/loop7         94208     94208         0 100% /snap/lxd/24065
/dev/loop10        47616     47616         0 100% /snap/snapd/19365
/dev/loop11        47616     47616         0 100% /snap/snapd/19459
/dev/sda15         99801       298     99503   1% /boot/efi
tmpfs            1222692         0   1222692   0% /run/user/1001

8 Likes

I tried on a Windows 10 machine. The install was OK, but I always get the Outbound error on upload:

Storing file "lauren.txt" of 945 bytes..
Did not store file "lauren.txt" to all nodes in the close group! Network Error Outbound Error.

2 Likes

We previously had some LogMarkers: basically a type that could be used to programmatically ensure we were matching the log string (and one place to keep all the important log strings that may be useful). This is probably something we should bring back, as it makes it clearer which log messages are ‘used’ by third parties and gives a more concrete place to tweak things.
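The LogMarker idea, sketched from the log-consumer side in Python: a single enum of well-known marker strings, so both producer and parser reference one definition instead of scattered literals. Marker names here are illustrative, not the actual sn_node identifiers.

```python
from enum import Enum

class LogMarker(Enum):
    """Illustrative well-known log marker strings (names assumed)."""
    STORE_CHUNK = "StoreChunk"
    TRY_JOIN_NETWORK = "TryJoinNetwork"

def find_markers(line: str) -> list[LogMarker]:
    # Return every known marker that appears in a raw log line.
    return [m for m in LogMarker if m.value in line]

print(find_markers('Processing for TryJoinNetwork, id: [0]'))
```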


Hmm, what’s your git commit hash? I wonder if the RecordHeader work (separating the address space for chunks/DBCs etc.) might have come in here, as we’re always installing the latest version above, and that was added after we started the testnet. (cc @roland, what logs do we expect now?)

Also worth noting: if we do have that, a put could now come from DBCs as well as chunks, and soon registers too. Records will be all of those things.

6 Likes