Optimal firewall rules for port-forwarded nodes

I have port forwarding set up, and now I’d like to lock down my firewall.

What needs to be open for nodes to have optimal network access? I’m asking because it appears, though I could be mistaken, that nodes require more than UDP to the ports being forwarded to them. I’m seeing many HandshakeTimedOut errors in node logs, and a lot of blocked inbound traffic to UDP ports outside the range being forwarded to the nodes.

Only UDP is used.

Handshake errors happen from time to time. Unless you’re seeing hundreds or thousands in a short time, you’ll be fine as long as the nodes are earning.

If you see a lot of blocked traffic to other ports, check that port forwarding is set up correctly, both in the router and in how you start the nodes.

That doesn’t answer the question of whether or not UDP ports outside of the ports I’ve forwarded are used. Do you know?

Read the post: I go on to say that if you have too much traffic to other ports, you should check things over. That IMPLIES it should not be happening.

Yes, a node only needs its listening port. Absolutely right, and it’s only UDP.

The RPC & metrics ports are supposed to be local to the machine.

Notes to self, likely not helpful to anyone else.

Running with port forwarding and no firewall.

  1. Kept port forwarding enabled, disabled firewall rules.

  2. Started one node with port forwarding, ran it for five minutes, then stopped it.

  3. Node’s antnode.log is 562K in size.

  4. Phrase “Relay manager is disabled for this node.” in antnode.log, from line 573 in file autonomi/ant-networking/src/driver.rs of the source code, confirms node was running with port forwarding.

  5. Phrase “HandshakeTimedOut” found 246 times. (grep -o 'HandshakeTimedOut' antnode.log | wc -l)

  6. Word “error” (case insensitive) found 1,612 times. (grep -io 'error' antnode.log | wc -l)
    Example:
    [2025-02-10T08:24:03.999445Z WARN ant_networking::event::swarm 423] OutgoingConnectionError to PeerId(“12D3KooWRby5SLZ1UzShbTdSmFPQFPAo1SuRAYcf6XzibpBhNUYD”) on ConnectionId(532) - Transport([(/ip4/167.160.92.130/udp/7205/quic-v1/p2p/12D3KooWJK4KzXTZHwxNcYzFC7ntgxRf5WSErcRLmo4dNYbKGVko/p2p-circuit/p2p/12D3KooWRby5SLZ1UzShbTdSmFPQFPAo1SuRAYcf6XzibpBhNUYD, Other(Custom { kind: Other, error: Left(Left(Left(Connect(NoReservation)))) }))])

  7. Word “failed” (case insensitive) found 201 times. (grep -io 'failed' antnode.log | wc -l)
    Example:
    [2025-02-10T08:24:01.571930Z ERROR ant_node::node 465] Failed to dial /ip4/94.130.14.52/udp/52042/quic-v1/p2p/12D3KooWF9fYrUhUmVqbmoznEw12LVpm9tFerRXjHK3VbvLVeb2q: DialError(DialPeerConditionFalse(NotDialing))

  8. Accumulated 271 peers. (“now we have #271 connected peers”)
    Interestingly, addresses that logged “Failed to dial” were then added as peers: (?)
    [2025-02-10T08:24:01.592909Z ERROR ant_node::node 465] Failed to dial /ip4/173.234.27.250/udp/6946/quic-v1/p2p/12D3KooWETFyNNzDynNU18inPRMG1Qx6NJsSYwgP1RmzhSz1q7WY: DialError(DialPeerConditionFalse(NotDialing))
    [2025-02-10T08:24:01.744421Z INFO ant_networking::event 238] New peer added to routing table: PeerId(“12D3KooWETFyNNzDynNU18inPRMG1Qx6NJsSYwgP1RmzhSz1q7WY”), now we have #5 connected peers

General note, why do nodes listen on localhost?
Example:
[2025-02-10T08:24:01.566884Z INFO ant_networking::event::swarm 319] Local node is listening ListenerId(1) on /ip4/127.0.0.1/udp/41000/quic-v1
[2025-02-10T08:24:01.566982Z INFO ant_networking::event::swarm 319] Local node is listening ListenerId(1) on /ip4/420.420.420.420/udp/41000/quic-v1

Running with port forwarding and firewall.

  1. Kept port forwarding enabled, enabled firewall rules. All blocks were logged, and none referenced the interface the node ran on, in or out.

  2. Started one node with port forwarding, ran it for five minutes, then stopped it.

  3. Node’s antnode.log is 940K in size.

  4. Phrase “Relay manager is disabled for this node.” in antnode.log, from line 573 in file autonomi/ant-networking/src/driver.rs of the source code, confirms node was running with port forwarding.

  5. Phrase “HandshakeTimedOut” found 1,155 times. (grep -o 'HandshakeTimedOut' antnode.log | wc -l)

  6. Word “error” (case insensitive) found 6,069 times. (grep -io 'error' antnode.log | wc -l)

  7. Word “failed” (case insensitive) found 201 times. (grep -io 'failed' antnode.log | wc -l)

  8. Accumulated 293 peers. (“now we have #293 connected peers”)

General note, why is “failed” found 201 times in both scenarios?
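
For repeatable comparisons between runs, the three grep counts above can be gathered in one pass. This is a minimal Python sketch, not part of antnode or antctl; the names `tally_log` and `tally_file` are made up for illustration. Like `grep -o ... | wc -l`, it counts occurrences rather than lines, so a single log line containing both `OutgoingConnectionError` and `error:` adds two to the “error” count.

```python
import re
from pathlib import Path

# Same three patterns as the grep commands in the notes above.
PATTERNS = {
    "HandshakeTimedOut": re.compile(r"HandshakeTimedOut"),
    "error": re.compile(r"error", re.IGNORECASE),
    "failed": re.compile(r"failed", re.IGNORECASE),
}


def tally_log(text: str) -> dict[str, int]:
    """Count non-overlapping occurrences of each pattern in the log text."""
    return {name: len(rx.findall(text)) for name, rx in PATTERNS.items()}


def tally_file(path: str) -> dict[str, int]:
    """Read a log file (hypothetical helper) and tally it."""
    return tally_log(Path(path).read_text(errors="replace"))
```

Calling `tally_file("antnode.log")` after each run gives one dictionary per run, which makes it easier to compare several runs side by side.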

You will have to run this test a number of times, since handshake errors depend heavily on the particular peers the node is connecting to and are also affected by random errors in UDP comms.

Do you know why nodes listen on localhost?

I also observe that nodes listen on, and send traffic on, every interface (IP address). It would be really good to be able to control that. Any insight?

They listen on that port for any other node/client that wants to contact them unsolicited. The node tells the peers it contacts what its listening port is, and that port makes up part of its global network address. Each node has its own unique global address made up of IP, port and peer ID.

A node contacts other nodes on their IP:port and gives each of them, your node included, a reply address built from its own listening IP:port.

Incoming packets carry the sender’s IP:port as their source and are sent to your node’s IP:listening port.

Thus outgoing traffic goes to the other node’s listening port.

By default the node listens on all interfaces, or only on specific ones if the user specifies them on the CLI.
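
The addressing described above can be tried out with two plain UDP sockets on loopback. This is only a sketch of the source/destination behaviour: real nodes speak QUIC over UDP via libp2p, which this does not attempt to reproduce, and the function name `udp_round_trip` is made up for illustration. The OS picks both port numbers here.

```python
import socket


def udp_round_trip():
    """Send one datagram to a listener and one reply back, on loopback only."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as peer:
        # The "node": bound to a listening port (0 lets the OS choose one).
        listener.bind(("127.0.0.1", 0))
        listen_addr = listener.getsockname()

        # The "remote peer": also bound, so its source port is known.
        peer.bind(("127.0.0.1", 0))
        peer_addr = peer.getsockname()

        # The peer sends TO the node's listening IP:port.
        peer.sendto(b"hello", listen_addr)
        listener.settimeout(2.0)
        _, seen_src = listener.recvfrom(1024)

        # The node replies to the source address it saw on the packet.
        listener.sendto(b"reply", seen_src)
        peer.settimeout(2.0)
        _, reply_src = peer.recvfrom(1024)

    return listen_addr, peer_addr, seen_src, reply_src
```

In this sketch `seen_src` equals the peer's own bound address and `reply_src` equals the listener's address, which is the pattern a firewall rule for a listening port has to allow in both directions.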

I’ve looked through antctl’s available CLI arguments for add and start but haven’t seen it. How’s it done?

Edit: Found --ip on antnode, and from there antctl add --node-ip, thank you!

If you get --ip or --node-ip to work well, let me know.

For me the node would not function (communicate). I tried many forms, from the actual IP address of the NIC in the PC, to the gateway IP, to the network (e.g. 192.168.1.0/24), and a few other guesses. Only the NIC IP got the node to appear to function, but with no real communications.

I was waiting until after the TGE launch to raise this as a support issue. In the meantime I set the default gateway on my PC so that the 2nd NIC can operate on a local LAN. I do this for my Starlink SBCs:

sudo ip route replace default via 192.168.0.1
(this needs to be executed on each boot)

Here 192.168.0.1 is the Starlink gateway (i.e. the router’s IP address).

The 2nd NIC is connected to the local LAN. The local LAN router (not Starlink’s router) gives that NIC a static DHCP lease, with the gateway option set to the NIC’s own static IP address. That is, DHCP maps the NIC’s MAC address to, e.g., 192.168.2.101 and sends DHCP option 3 as 192.168.2.101. This prevents the local LAN from being used for internet access as well.

So the internet is only reachable via the Starlink router, and the nodes will use that. The SBC can then access the NAS devices etc. on the local LAN via the other NIC without trying to use that NIC for internet.
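
To confirm that only one default route is in effect after a setup like this, the output of `ip route show` can be checked. The following is a rough Python sketch; the function name `default_routes` is made up, and the device names in the usage note are hypothetical. It only reads routes and does not change any configuration.

```python
def default_routes(ip_route_output: str) -> list[tuple[str, str]]:
    """Return (gateway, device) for each 'default' line of `ip route show` output."""
    routes = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default":
            continue
        # The gateway follows "via" and the interface follows "dev".
        gateway = tokens[tokens.index("via") + 1] if "via" in tokens else ""
        device = tokens[tokens.index("dev") + 1] if "dev" in tokens else ""
        routes.append((gateway, device))
    return routes
```

The text can be obtained with `subprocess.run(["ip", "route", "show"], capture_output=True, text=True).stdout`. With the setup described, one would expect a single entry such as `("192.168.0.1", "eth0")`, where `eth0` stands in for whatever the internet-facing NIC is actually called.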

Could you characterize that more specifically? What communication wasn’t taking place that I could look for and report back if it’s any different for me? For example, were there no log messages created? No network traffic? No significant CPU usage?

No network traffic after initial burst.

I was moving through the different settings relatively quickly, and when one looked like it wasn’t working I moved on. I wasn’t particularly interested in documenting just how far it got. Sorry.

Basically, if you get records and good traffic through the NIC, that’d be more than I got after a few minutes.

I went for the default gateway method as it was a lot easier and more reliable.

Ok noted. How long does it typically take for nodes you start to receive records?

You will typically see some records within 5 minutes. Sometimes it takes longer, and often it is quicker.

As @neo says, you get records very quickly, within minutes. These will be records your node is responsible for that were already on the network.

But records that are new to the network, the ones you are paid for, could take hours or days to reach an individual node. Or minutes. Start 100 nodes and you should see some after a couple of hours at the current rate.

To be clear, that’s records my nodes are paid for, correct?

Yes. With the network as it is at the moment, I think you’d get some earnings from at least a couple of nodes out of 100 within an hour.

That was with the old network of course. With the new one we don’t know how big it will be or how many uploads will run or when they’ll start. Rule of thumb: When someone with a similar number of nodes as you is reporting earnings you should be getting them as well.

I’m experimenting with several machines running nodes with different configs: home net with port forwarding and no firewall, and through a VPN with a firewall (pf).

The only machine that hasn’t stored any records or earned any ANT is the one with port forwarding through a VPN with a firewall. The two machines with port forwarding, no VPN intermediary, and no firewall have both stored records and earned ANT.

Investigating the blocked traffic, I saw a great many interesting blocks that look like:

block in on vtnet0: 151.53.80.83.13092 > 103.36.7.190.55657: UDP.

103.36.7.190 is my VPN’s IP. I searched antnode logs and sure enough, 151.53.80.83 shows up in many “Failed to dial” errors.

The nodes are added with:

antctl add --count 500 --node-port 49000-49499 --metrics-port 33000-33499 --rewards-address … evm-arbitrum-one

Traffic from foreign nodes is being blocked because it isn’t exclusively addressed to the 49000-49499 range that I’m forwarding (e.g., port 55657). Is that expected behavior? Shouldn’t nodes on the network only attempt to communicate with port-forwarded peers using the ports they explicitly expose?
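
To see how much of the blocked traffic really falls outside the forwarded range, the block lines can be classified by destination port. This is a hedged Python sketch: it assumes every block line has the same shape as the single example quoted above, and the names `parse_port_range` and `classify_block` are invented for illustration. The default range mirrors the --node-port value used in the antctl add command.

```python
import re

# Matches lines shaped like the one example block line quoted above.
BLOCK_RE = re.compile(
    r"block in on (?P<iface>\S+): "
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+): UDP"
)


def parse_port_range(spec: str) -> tuple[int, int]:
    """Turn a 'low-high' string such as '49000-49499' into two ints."""
    low, high = (int(part) for part in spec.split("-"))
    return low, high


def classify_block(line: str, forwarded: str = "49000-49499"):
    """Return details of a blocked UDP packet, or None if the line does not match."""
    match = BLOCK_RE.search(line)
    if match is None:
        return None
    low, high = parse_port_range(forwarded)
    dst_port = int(match["dst_port"])
    return {
        "src_ip": match["src_ip"],
        "src_port": int(match["src_port"]),
        "dst_port": dst_port,
        "in_forwarded_range": low <= dst_port <= high,
    }
```

For the example line, the destination port is 55657 and `in_forwarded_range` is `False`. Grouping the results by `src_ip` would show which of the blocked senders also appear in the “Failed to dial” errors.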

Another observation from running antctl status on the various machines: this machine, the one with no stored records and no earnings, reports 0 connected peers for half or more of its nodes. All nodes on all other machines report >0 connected peers.

FWIW, my VPN’s firewall rules are:

nat on vtnet0 inet from 103.36.7.190 to any -> vtnet0

rdr pass on vtnet inet proto udp from any to 103.36.7.190 port 49000:49499 -> 192.168.0.40 port 49000:*

block all

pass in quick on wg0 from 103.36.7.190 to any
pass out quick on wg0 from any to 103.36.7.190
pass out quick on vtnet0 all

I could remove the firewall rules; however, my belief is that more secure node machines contribute to a more secure network overall. I’d appreciate any help getting this resolved so we can have known-good firewall rulesets to share.

Update: I removed the block all rule to observe network behavior and see whether peer connectivity would be established automatically. Predictably, that began to generate many “Limiting icmp unreach response …” system log messages, since the UDP traffic is no longer blocked but is still addressed to ports that aren’t open. Next I’ll try forwarding everything to 192.168.0.40, and not a specific range.
