Vdash - dashboard for Autonomi nodes

Super! Thank you. RAM is still missing. Not a disaster as it’s easy to see in ‘top’ etc.


Did someone, maybe you Rob (@neo), say that RAM was still in the logs? I can’t find it, so if it is there, could someone please post a copy of the logfile entry.


Actually that was what to search for.

It is still in the /metrics though

{"physical_cpu_threads":4,"system_cpu_usage_percent":6.2774954,"process":{"cpu_usage_percent":0.67681897,"memory_used_mb":115,"bytes_read":0,"bytes_written":8192,"total_mb_read":584,"total_mb_written":907}}

I believe this line is no longer written to disk by production binaries.
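
For anyone who still has older logs or a debug build, here is a minimal sketch of pulling RAM and CPU out of an entry like the one above. The field names come from that sample; the helper name `parse_resource_entry` is made up for illustration, and the handling of leading text before the JSON is an assumption about how the line appears in the logfile.

```python
import json

# Sample entry copied from the post above.
SAMPLE = (
    '{"physical_cpu_threads":4,"system_cpu_usage_percent":6.2774954,'
    '"process":{"cpu_usage_percent":0.67681897,"memory_used_mb":115,'
    '"bytes_read":0,"bytes_written":8192,"total_mb_read":584,'
    '"total_mb_written":907}}'
)


def parse_resource_entry(line):
    """Return (memory_used_mb, process_cpu_percent), or None if the line
    does not contain a JSON object with a "process" section.

    Assumption: any timestamp or level prefix comes before the first "{"
    and nothing follows the closing "}".
    """
    start = line.find("{")
    if start < 0:
        return None
    try:
        data = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    process = data.get("process")
    if not isinstance(process, dict):
        return None
    return process.get("memory_used_mb"), process.get("cpu_usage_percent")


print(parse_resource_entry(SAMPLE))
```

Lines without a matching JSON object return `None`, so the function can be applied to every line of a logfile and the misses skipped.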

CPU and memory data for the antnode process, along with the libp2p bandwidth metrics, are available at the /metrics endpoint.

In addition, a future release will increase the decimal precision from 0 to 4 for the CPU and memory counters found at the /metrics endpoint.


It would be great (in my opinion) if these log entries could be kept in the code and made available for inclusion in a custom build.

It will be a shame if this rich debug info is lost over time because it is not included in the production builds.


Thanks @Shu, I appreciate you taking the time. The idea of vdash is to only use the logs as a simpler alternative to the endpoint.

I always anticipated it would eventually be superseded, but for the time being people are still finding vdash an easy way to monitor nodes outside launchpad, so I’m trying to keep it showing whatever it can from the logs.

If there’s a way to keep those performance metrics available, and ideally also the storage cost, that would I expect be appreciated by die-hard vdash users! Maybe gated by an environment setting on antnode?


That particular logging line accounted for 20% of all logging activity produced per antnode, and CPU cycles along with it.

That, along with it being a metric that really should be consumed from the /metrics endpoint, is why it was decided to remove it. I am not really sure how long the team wants to maintain backwards compatibility here, as the long term goal was to reduce logging activity and shift certain data points into concrete, accessible metrics via /metrics.

@roland would need to comment on whether the log line still shows up when running non-production binaries, say debug builds. I haven’t confirmed this.

I think the team in general is trying to reduce the number of feature flags supported as well, and turn them into runtime settings, if still required.


There’s no need for it to be a frequent output so the system load shouldn’t be an issue if there’s a suitably infrequent output it can be coupled to.

I’m not suggesting a feature flag but an environment setting that ideally would work with the production binary, at least for the time being. The point here is an alternative to the metrics endpoint, since that isn’t easily available outside launchpad. Being able to run a CLI like vdash still has a role, and until a similar but metrics-based alternative is created I think it is useful to maintain.

Someone could fork vdash to use the endpoint, but I have other priorities so that’s unlikely to be me.

BTW Does/will metrics include storecost via antnode?


I can’t find the post on the forum, but Qi gave @neo a reason why the storecost metric has disappeared (it is not in use); neo asked me the same question a while back.

/metrics included storecost, but it’s not populated as it’s no longer in use.


My own feeling is that logs will die out in production, but I am more than happy to help with a switch here, @happybeing. Logs can still be available via an env var, but mostly for debugging. So we can try to accommodate wherever possible, but my desire is that we get dash to use metrics if possible. In the python modules I was able to read the routing table entries in real time, and much more, so we can make that available as well. We won’t leave dash behind; we should evolve it with the codebase.

It’s a busy time right now but we will get there. Vdash should have a long term lifetime IMO; it’s been great for folk here for sure. However, the logs do need to go back into debug land, as you seem to be saying as well, or at least expecting. Another thing I want to make happen is that the ant-node library is available, so it should be easier for dash to also run nodes directly, like antctl etc.

Anyway, we are getting close to being very focused on devs, better late than never, but I am very keen we get involved and help cut over not only dash, but to the new data types that will help other projects like awe/jams etc. We do need to get to that place quickly.


The logs still provide a lot of useful info. If all that was in /metrics then all good, but one thing the logs give is a historical record of what happened. Is the history needed? Well, only when I was doing some stats on things like contacted node xor addresses and other such things. Also, my check that contacts a node to see if it’s alive and could give a quote used logs from the client as well.

@Shu I can understand trying to reduce the feature flags, and it’s a good thing. There are classes of features, though: debug features such as logs are one sort, and node functional features are another. Debug types vs functional types.

Debug types of “functions” are separate to functional types. Having them as a flag for build allows the code to be there only when required for testing or as a special binary used for network testing/monitoring.

@happybeing Using /metrics is much like reading a file. You run curl against the node (127.0.0.1:metrics-port/metrics) and either store the results in a file and read it, or process the curl output directly. At the moment you continuously read the log file, whereas metrics means an extra step: run the curl, then read the metrics output once. My script greps for lines matching ^sn_ (well, now ^ant_).

One downside, though, is that the person must start the node with metrics enabled.
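
A rough sketch of that curl-then-grep step in Python, using only the standard library. It assumes the endpoint returns plain text with one `name value` pair per line and no trailing timestamp; the function names and the metric names in the comments are placeholders for illustration, and the port is whatever the node was started with.

```python
import urllib.request


def fetch_metrics(port, host="127.0.0.1", timeout=5.0):
    """Fetch the raw text from the node's /metrics endpoint.

    Equivalent to: curl http://127.0.0.1:<metrics-port>/metrics
    """
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def select_metrics(text, prefix="ant_"):
    """Keep only lines starting with `prefix` (like grep ^ant_) and
    return them as a {name: value} dict.

    Assumption: each kept line ends with a single numeric value after
    the last space. Lines that do not fit are skipped.
    """
    selected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        name, _, value = line.rpartition(" ")
        try:
            selected[name] = float(value)
        except ValueError:
            continue
    return selected


# Usage (needs a running node with metrics enabled; the port is yours):
#   metrics = select_metrics(fetch_metrics(port=YOUR_METRICS_PORT))
```

Keeping the fetch and the filter separate means the filter also works on a file saved earlier with curl.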


I have never disagreed with that, but I fear people do not understand the cost of logs. They are very expensive. So the issue is how much are we prepared to pay for logs, they do mean nodes are less efficient, disk space is used and possibly security information leaked.

As a debug tool then great, but as a production thing I really dislike them a lot. If folk want to use RUST_LOG and run nodes with it on then it’s fine, but I would imagine they get shunned much more quickly. That cost must be paid for sure.


Agree with that. Having a build flag solves that.


@neo - my initial concern is the type of data being logged vs going to /metrics. Some is actually metrics and measurement key/value pairs, and some is un-formatted and unstructured lines. I do not think we need to duplicate structured data in both places especially for metrics and measurements between logs and /metrics endpoints for production.

I only stated these are not found on production binaries, but a debug build will likely still have them (to be confirmed by Roland).

As for the time series, and historical needs, other tools should be used to process and capture the data from /metrics to store and forward it. Taking shortcuts to put all that data from /metrics and log it at certain frequency in logs for production binaries is definitely not the right route in my opinion.

I agree with David’s response above: while logs have their purpose, they should be used sparingly and appropriately.


I also agree, which is why I only ask that they are not removed but put behind a “feature flag” so they can be built in. Just as long as the log entries are not removed completely, unless they make no sense anymore.

Wasn’t there another “/” endpoint to get some unformatted data like port numbers etc? I forget the URL for it. Maybe the routing table could be included in that; it would be good for stats and is something I currently get from the logs.

While I am not arguing for anything here, I’ll just note that polling /metrics only gives snapshots unless one is polling at a very small interval. Logs record each event when it happens, so all changes are captured. It also takes more processing to continually poll /metrics just in case something has changed.
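
To make the snapshot point concrete, here is a small sketch of comparing two successive polls. It only assumes each poll has already been turned into a dict of metric name to value; the helper name `diff_snapshots` and the metric names are made up for the example.

```python
def diff_snapshots(previous, current):
    """Return {name: (old, new)} for every metric that changed,
    appeared, or vanished between two polls.

    A missing metric is reported with None on the side where it is
    absent.
    """
    changes = {}
    for name in previous.keys() | current.keys():
        old = previous.get(name)
        new = current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Two polls taken some interval apart (values are illustrative only).
first_poll = {"example_peers": 10, "example_records": 200}
second_poll = {"example_peers": 10, "example_records": 203}

print(diff_snapshots(first_poll, second_poll))
```

If `example_peers` went from 10 to 12 and back to 10 between the two polls, the diff would not show it, which is the gap that event-based logs do not have.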

The build flag to allow logs solves that anyhow if required.

The current routing table and couple of other things would be enough for my needs at the moment. I am sure I am forgetting something, but since this is a longer term thing then I can ask later.


I think we will have better ways of doing that, at least I hope so. i.e. real time query of the routing table.

As I see the discussion, it seems to me there are some things handy to expose for different reasons or apps. Good to see everyone poking though. The issue will always be that humans want more stats though :smiley: :smiley:


Engineers gobble up stats like it’s heaven’s manna.


Short term that’s not the best approach - long term maybe.

Short term we don’t want people having to do custom builds for the convenience of using vdash; one cancels out the benefit of the other (hence my suggesting this be controlled by an environment variable).

Should vdash be migrated to support metrics then it would no longer be an issue (and it could still maintain the option of logs if that was deemed worthwhile).


Sorry, yes, I was thinking of the future of it all.

And I agree that the writing is on the wall that vdash will need to include reading /metrics. Maybe a hybrid of /metrics and logs if need be. I’d add metrics and see if it matches logs as an interim step, which will either show up bugs or confirm it’s right.
