Looks like we still have some memory leaks. After 12 days of running, my 32 GB RAM machines are barely managing 100-130 nodes.
It's a problem I highlighted a while ago. I found that as I start up nodes, each new one uses more and more memory.
Also, since I have over 100 GB of memory, the first nodes are using something like 300-400 MB, yet on my SBC with 8 GB the nodes are only using between 100 and 200 MB (closer to 100 MB) each.
Back then we worked out that the most likely reason the first nodes on the large machine were using 300-odd MB was that with more overall system RAM, the initial allocation of RAM by the OS (or the Rust startup) was higher, significantly higher.
The reason the memory used increased when there were many nodes already running was never worked out; it was something they were going to look into after everything else was sorted out. We are talking first nodes around 300 MB, and by node 150 it's 800 MB.
Then a few versions ago there was a bug where there was a real memory leak and occasionally a node would start using well over 1 GB of memory. I have not seen that bug for a while now.
I know about those problems, but this is different. It was fine for a few days after startup; then I think nodes grow on network activity peaks and never shrink back, but I don't have enough data to document it.
All the more reason to have a proper test schedule before TGE.
If all looks fine and then 3-4 weeks in we start running out of memory, it's not going to look good at all.
In fact its going to look downright amateurish.
I have been running the safenodes since at least Nov 18th (400 nodes) on a 512 GB RAM Alpine LXC.
Memory usage has stayed flat between 350 GB and 365 GB, and so have CPU and network traffic.
Note: However, I do switch out the default memory allocator within the safenode binary and switch to mimalloc
(GitHub - microsoft/mimalloc: mimalloc is a compact general purpose allocator with excellent performance.) for home use, by recompiling the source code off the production branch with each release (personal preference so far).
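For anyone wanting to try the same thing, here is a minimal sketch of the allocator swap (assuming the mimalloc crate; the actual safenode patch may look different):

```rust
// Cargo.toml (assumed): mimalloc = "0.1"
use mimalloc::MiMalloc;

// Replace Rust's default system allocator with mimalloc for the whole binary.
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;

fn main() {
    // Every heap allocation below now goes through mimalloc.
    let buf: Vec<u8> = vec![0; 1024 * 1024];
    println!("allocated {} bytes via mimalloc", buf.len());
}
```

In principle, adding the crate to Cargo.toml and dropping those few lines into main.rs is all the recompile should need.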
I also haven't updated to the latest production branch at home yet, though I should; I just haven't had time to do so.
Memory leak in Rust? Hmmm, typically such malloc programming errors are a C (or C++) thing… I get the explicit memory allocation thing; just leaving it up to Rust or the OS, well, both are piggish in default mode.
It'd be more like a subprocess not terminating itself and keeping the memory it was allocated.
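As a purely hypothetical illustration (not actual safenode code), memory can grow without bound in perfectly safe Rust when a long-lived task keeps everything it has ever buffered reachable:

```rust
use std::{thread, time::Duration};

fn main() {
    // Hypothetical background worker: it buffers data on every "event"
    // and never trims or exits early, so nothing is technically leaked --
    // the Vec is still reachable -- but resident memory only ever grows.
    let worker = thread::spawn(|| {
        let mut backlog: Vec<Vec<u8>> = Vec::new();
        for _ in 0..1_000 {
            backlog.push(vec![0u8; 64 * 1024]); // +64 KiB per "event"
            thread::sleep(Duration::from_millis(10));
        }
        backlog.len() // everything stays held until the thread finally ends
    });
    println!("events buffered: {}", worker.join().unwrap());
}
```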
This is a huge surprise to me. I can't speak to RAM, because I need to routinely restart nodes due to CPU creep long before I would need to be concerned with RAM.
You should limit the number of monitors to 3. The monitors are constantly fighting for consensus, and having that many monitors causes significant CPU and networking overhead. It will get exponentially worse when the cluster is in trouble.
CephFS is not HA. When the MDS fails, all your CephFS filesystems will be down until the newly elected MDS has combed through the filesystem and loaded all metadata into RAM.
You'd be better off mounting RBD images. They're automatically sort-of thin-provisioned.
I disagree. If the 3 of the 13 machines that contained the fixed monitors go down, then there are no monitors running.
This is why there is more than 1 MDS, including one or more on standby.
HA in the sense that as long as the majority of machines remain up (with manager/metadata daemons and monitors), Ceph as a service will remain up, including RBD and FS access (over the long run).
Obviously, if I am hit with a power outage, and battery backups don't last forever, then there is no HA. The idea was to be able to horizontally scale capacity if needed and handle machine-level failures, which is why I went with Ceph.
Agreed, 13 monitors is too many; 3 would give me sleepless nights; 5 is the sweet spot on that few hosts. New releases of Ceph will refuse to upgrade with 3 - you have to run it in non-production mode.
Also, the best way to manage mons is to use 'tags' on hosts - you tag a host as a 'mon' target, then you write a rule that says 'max 5 mons on mon:target', and Ceph will manage the placement dynamically for you.
Same for managers: you need a master and a backup, and using tags is the way there again…
never again
Sure, somewhere between 5 and 13 I am sure there is a sweet spot. It's easy to scale down, but I am on an older version at the moment, and where they run needs to be picked ahead of time. I also have to ensure enough run on different circuit breakers (physically), in case one circuit breaker trips and wipes out a third of the machines even with battery backup.
Thanks for the suggestion on the tags; however, I'm not too worried about the current assignments of the different daemons to different machines ahead of time.
The setup has been working well for me for many years.
Anyhow, let's stick to discussing the different setups folks have for safenodes on this thread, as opposed to Ceph optimizations.
What is the current procedure for specifying an external drive for running a node?
I am on older Intel E5 v4 CPUs and am getting about 325 watts for 585 antnodes on one host, and 350 watts for 400 antnodes on another; the second host is identical in hardware and OS configuration but is running a few other workloads too.
So basically, 325 / 585 = 0.55 watts per antnode (that's the lowest I have been able to get to), and I'm pretty satisfied with those statistics at the moment given the CPU dates back a few years.
Just curious: if any folks are running on the newer 5 nm CPUs such as the Ryzen or EPYC CPUs, what's the watts per antnode like on those systems?
Made a quick chart of power usage (watts) vs antnodes (# running) on a per-host basis for home use.
Update:
Managed to make more tweaks and confirmed a steady state at 335 watts for ~600 nodes = 0.55 watts per node.
Though without any antnodes or anything else running on the system, it already consumes 235 watts as is (I will need to look into disabling other features in the hardware to reduce this a bit more, TBD).
Basically, with antnodes running, the watts used jump from 235 to 335 (+100 watts) to support the 600 antnodes.
Early on-paper calcs on our 2x Geekom Intel Core i7 12th-gen lab NUCs suggest it's possible, in a 64 GB RAM setup with 4x 512 GB NVMe SSDs, to get below 0.4 watts per antnode… running 288 nodes with a claimed peak draw of 110 watts on Ubuntu Linux 24.04, without the onboard Intel GPU doing anything (running headless, SSH-in config).
We are going to fire this config up next week and see what is actually happening with an in-wall watt meter, to see if that sort of <0.4 watts/antnode level is achievable while keeping all nodes healthy/unshunned.
We are still working on some related LXC container provisioning and backup configuration/integration stuff for what is a headless-access test build connected to the ISP router/modem at 1 GbE via a 4-port local 1 GbE switch, plus some other ISP and IP address related stuff, which includes our own 'low write amp' in-memory FTL LKM…
It's a journey; we will post results here as it happens…
Very cool!
I am hitting about 0.49 watts per antnode now (345 watts for 700 nodes), and this is on an older CPU (14 nm) than your target Intel i7 12th gen (10 nm Enhanced SuperFin, aka Intel 7), so I suspect you should be able to hit that target!
Roughly 4-5% of my 700 nodes have a shunned count > 0 (no idea why yet), but I'm keeping an eye on it.
So we are talking about 167 mW of incremental power to run a node on your system (the extra 100 watts divided by 600 nodes).
If we use that for a family's PC running 5 nodes, it's still under 1 watt, or about 1,200 hours (50 days) per kWh. Even at today's prices that is like 1 token every 50 days to pay the incremental costs on older h/w.
Now I am going to have to measure the current going into my SBCs when I get them up again (doing a lot of house rearrangements ATM) and see what the incremental increase is. The SBCs are also good for streaming services or general browsing/word processing etc., so they do not need to be dedicated to node running, and will likely run HA in addition to nodes. Seeing as they draw <7.5 W running nodes, I suspect they too will be low incremental wattage for 10 nodes running. If things improve, then maybe 20 nodes for that power.