Can't get my MikroTik router to run any more than 150 nodes

Hey, my internet keeps cutting out or bogging down when I have 150 nodes running, and I know my MikroTik RB5009 should be able to handle a lot more nodes. I changed udp-timeout and udp-stream-timeout to 40 seconds, as storage guy said in the advanced routers thread, but I don't know what else to do. I upgraded my ISP speed to 1200 Mbps download / 200 Mbps upload and it still bogs down and cuts out. Maybe someone will see this and have a solution?

Thanks


So, 150 nodes is a lot. I think that’s in the region of 50k-100k sessions, which is definitely more than “homelab” territory.

Each node is roughly 3 Mbit/s, but not much is happening… From playing on the previous testnets, it can go a lot higher if you have hot records; watch the GETS in vdash.

So 150 nodes is roughly 450 Mbit/s.

Ideally you would need 500 Mbit/s symmetric (up/down) or greater for that number of nodes, leaving something left over for basic internet use. 200 Mbps up seems like not enough for 150 nodes, although my math could be wrong.

To stop the internet bogging down, I would VLAN off the nodes and apply QoS at the VLAN level. Have separate VLANs for home and stuff, and give them a guaranteed 20% of the link, burstable to 80%. You will probably find some of your nodes get shunned though…

In my view, 80-100 nodes feels like a good place for that link. Have you tried reducing the number by 10 at a time to see at what point the internet feels stable?
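The arithmetic above can be written out as a quick sketch. It assumes the rough 3 Mbit/s per node figure from this post (an estimate, not a measurement) and that the smaller of the two link directions is the one that matters; the function names are just for illustration.

```python
# Back-of-envelope bandwidth check using the thread's estimate of
# ~3 Mbit/s per node. All figures are assumptions, not measurements.
MBPS_PER_NODE = 3.0


def required_mbps(nodes, mbps_per_node=MBPS_PER_NODE):
    """Bandwidth needed in each direction for a given node count."""
    return nodes * mbps_per_node


def max_nodes(link_down_mbps, link_up_mbps, mbps_per_node=MBPS_PER_NODE):
    """Node count that fits in the slower direction, with no headroom."""
    usable = min(link_down_mbps, link_up_mbps)
    return int(usable // mbps_per_node)


print(required_mbps(150))    # 450.0
print(max_nodes(1200, 200))  # 66
```

That ceiling leaves nothing for other internet use, and it moves a lot if the real per-node rate is lower or higher than 3 Mbit/s.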


Ok thank you for the tips :slight_smile:

The thing is, I upgraded my ISP modem yesterday because I upgraded my plan from 800/100 to 1200/200 Mbps. On the “worse” modem, 150 nodes was actually somewhat stable: the internet was bogged down, yes, but I was able to watch movies and use the internet. With the new ISP modem (Xfinity calls it the x8) the internet is completely unusable with 150 nodes. I also have the modem in bridge mode, so everything should be running through the MikroTik.

But as far as VLANing off the nodes and applying QoS goes, I’m not really sure how to do that. Can you explain, or is there a tutorial anywhere on how to set that up?

Thanks

Remember that the lower of the two figures (down/up) determines the bandwidth available. Nodes have approximately equal up and down bandwidth requirements.

I thought you had 1200/1200, but since it’s only 200 up, 150 is pushing it, honestly. Especially when the nodes are fuller, since GETS will exceed PUTS and GETS are upload. So if you can run 100 nodes today with that bandwidth, you won’t be able to run anywhere near 100 when the network gets going and people are downloading stuff.

At the most idle time I am seeing 0.3 Mbps per node, but any node can go up to 1 or 1.5 Mbps, and I don’t know what it spikes to when a GET is done on a 1/2 MB chunk, since only tiny chunks are being uploaded at this time.

As @Jadkin said, the connection count for 150 nodes can be very high, like over 75,000 simultaneous connections, and then there are the connections still waiting to time out. The connections waiting to time out are what the timeout settings were meant to help reduce.

The people running 1000 nodes are using things like pfSense, where they can set the connection limit to over 1,000,000.

I have yet to explore the RB5009 to see if the NAT table (connections) can be increased beyond its current setting.

And segmenting the network is the way to go, helping to stop your nodes from hogging every bit of connections/bandwidth. Typically they will survive, unless you have too many nodes, in which case they will cascade into failure (i.e. shunned by too many).
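A rough sketch of why shorter timeouts shrink the table, assuming a steady rate of new flows: by Little's law, the average number of entries waiting to expire is about the arrival rate times the timeout. The flow rate and the 180-second comparison value below are hypothetical, picked only to show the shape of the effect; the 75,000 and 40-second figures are from this thread.

```python
# Rough connection-table estimate. The per-second flow rate is a
# hypothetical placeholder, not a measured value.

def connections_per_node(total_connections, nodes):
    """Average simultaneous connections each node contributes."""
    return total_connections / nodes


def lingering_entries(new_flows_per_sec, timeout_s):
    """Little's law: average entries still waiting to time out."""
    return new_flows_per_sec * timeout_s


print(connections_per_node(75_000, 150))  # 500.0

HYPOTHETICAL_FLOWS_PER_SEC = 100  # assumption for illustration only
for timeout_s in (180, 40):       # 180 s is an arbitrary comparison value
    print(timeout_s, lingering_entries(HYPOTHETICAL_FLOWS_PER_SEC, timeout_s))
```

Under that assumption, dropping the timeout from 180 s to 40 s cuts the lingering entries from 18,000 to 4,000, but it does nothing for the live connections.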

BTW, did you look at the specs and note what the routing ability of the RB5009 is? I only have 40 Mbps upload speed, which is why I selected it, since it can route way more than I can use. But at 200 Mbps it won’t be as terrific. The routing packets-per-second figure determines the absolute maximum number of packets it can route, and running nodes is asking the router to route packets through it.

[EDIT 2] For large packets you’ll be fine, but once the packets are small, like the ones nodes send/receive all the time in large quantities, the absolute maximum tested rate is around 400-500 Mbps with 25 simple queues. 150 nodes will mean it’s even slower. So you may be exceeding the capabilities of the RB5009 and be better off going to a commercial-grade router or getting someone to build you a pfSense box to use.

If your nodes are asking the router to route at its maximum then other devices will suffer. Again, moving the nodes off to their own network with a bridge/VLAN will help with this.
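To see why small packets hit the router harder than the Mbps number suggests, here is a minimal sketch converting bandwidth into packets per second. The 64-byte and 1500-byte sizes are illustrative assumptions (I have not measured what nodes actually send), and framing overhead is ignored.

```python
# Same bandwidth, very different packet rates depending on packet size.
# Packet sizes are illustrative assumptions; framing overhead is ignored.

def packets_per_second(mbps, packet_bytes):
    """Packets per second needed to carry `mbps` at a given packet size."""
    return mbps * 1_000_000 / (packet_bytes * 8)


for size in (1500, 64):
    print(size, round(packets_per_second(200, size)))
```

At 200 Mbps that is roughly 16,667 packets per second with 1500-byte packets but 390,625 with 64-byte packets, and it is the packet count the router has to keep up with.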

The RB5009 says it has a limit of over a million entries in its NAT table:-

[admin@MikroTik] /ip/firewall/nat> /ip/firewall/connection/tracking/print
                   enabled: auto
               active-ipv4: yes
               active-ipv6: yes
      tcp-syn-sent-timeout: 5s
  tcp-syn-received-timeout: 5s
   tcp-established-timeout: 1d
      tcp-fin-wait-timeout: 10s
    tcp-close-wait-timeout: 10s
      tcp-last-ack-timeout: 10s
     tcp-time-wait-timeout: 10s
         tcp-close-timeout: 10s
   tcp-max-retrans-timeout: 5m
       tcp-unacked-timeout: 5m
        loose-tcp-tracking: yes
               udp-timeout: 40s
        udp-stream-timeout: 40s
              icmp-timeout: 10s
           generic-timeout: 10m
               max-entries: 1015808
             total-entries: 5866

But I suspect it would fold up before that, and even if it could handle that many entries, the number of packets from that many safenode connections would overwhelm its routing ability.

But anyway, I think @neo is completely right that your 200 Mb/s upload is the limiting factor. There are spikes in node activity that will try to go above that with 150 nodes.
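For a sense of scale, here is a quick sketch of how full that table is, using the max-entries and total-entries values from the printout above and the 75,000 connection estimate quoted earlier in the thread for 150 nodes. It only measures table occupancy and says nothing about packet-routing load.

```python
# Connection-tracking table occupancy, from the printout above.
MAX_ENTRIES = 1_015_808   # max-entries
TOTAL_ENTRIES = 5_866     # total-entries at the time of the printout


def table_utilisation(entries, max_entries=MAX_ENTRIES):
    """Fraction of the connection-tracking table in use."""
    return entries / max_entries


print(f"{table_utilisation(TOTAL_ENTRIES):.2%}")  # 0.58%
print(f"{table_utilisation(75_000):.2%}")         # 7.38%
```

So on these numbers the table size itself is nowhere near the limit.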

This is what the port looks like on the RB5009 for the RPi4 with 10 nodes on it:-

Spiky.

If I were running a few more, the spikes would be over the upload limit and safenodes would suffer. Before that, general internet use in the house would suffer for things that involve a lot of sending, like Zoom or Teams for work for me and the missus, and I’d be in the doghouse.

@moderators Suggest moving this topic to

Then that should not be the problem. So I would say it’s the routing performance for small packets, with so many local nodes queueing up very small packets to send (messages are small), and this below

As I said, I chose that router because my upload speed would never allow me to get close to exceeding its abilities. But that routing speed on small packets is what catches out many people dazzled by the other headline metrics.

Ah, alright, so I’m really limited here by my upload speed then. Would VLANs and/or bridging really help my situation, or am I just kind of stuck because of my upload speed?

From what others said, bridging (whatever this is in context) or a VLAN would help.

It would allow the rest of the house to be set up to have priority over your nodes. This makes their computers/TVs more responsive even when the nodes are using a lot of the link or most of the router’s small-packet capability.

How much it helps is unknown, but should be noticeable.

As an aside I think people in general need to tone down their expectations of how many nodes they can run without having issues. I love pushing technology limits and have done so for over 5 decades and even I have to admit that what I thought I could do is not really feasible, especially if I want quality internet for other things.

A certain portion of the troubles we have to help with comes from this expectation that the internet will work perfectly until some magical point where it doesn’t, and that people can just use up to that magic point or even sit at it. But the reality is that quality degrades when you approach 70-80% of the packet switching capacity and bandwidth of the Internet connection.

We always aim to never use more than 50% of expected capacity when doing specs; then, when usage gets to 75% after some growth, upgrading becomes a priority. Planning starts before that.
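That rule of thumb can be sketched as a simple check. The 50% and 75% thresholds are the ones described above; the function name and the wording of the statuses are just for illustration.

```python
# Capacity-planning rule of thumb: design for <= 50% utilisation,
# treat 75% as the point where upgrading becomes a priority.

def planning_status(used, capacity):
    """Classify link or router utilisation against the 50% / 75% marks."""
    utilisation = used / capacity
    if utilisation <= 0.50:
        return "within design target"
    if utilisation < 0.75:
        return "past design target: plan the upgrade"
    return "upgrade is a priority"


# Example against a 200 Mbps upload link
for used_mbps in (100, 120, 150):
    print(used_mbps, planning_status(used_mbps, 200))
```

The same check works for packets per second against the router's rated figure, not just for bandwidth.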
