NAT solution with QOS

: preg_replace(): The /e modifier is deprecated, use preg_replace_callback instead in /var/www/virtual/rlogix/includes/unicode.inc on line 311.

15.10. Example of a full nat solution with QoS

Here I'm describing a common set up where we have lots of users in a private network connected to the Internet trough a Linux router with a public ip address that is doing network address translation (NAT). I use this QoS setup to give access to the Internet to 198 users in a university dorm, in which I live and I'm netadmin of. The users here do heavy use of peer to peer programs, so proper traffic control is a must. I hope this serves as a practical example for all interested lartc readers.

At first I make a practical approach with step by step configuration, and in the end I explain how to make the process automatic at bootime. The network to which this example applies is a private LAN connected to the Internet through a Linux router which has one public ip address. Extending it to several public ip address should be very easy, a couple of iptables rules should be added.
In order to get things working we need:

Linux 2.4.18 or higher kernel version installed

If you use 2.4.18 you will have to apply HTB patch available here.

iproute

Also ensure the "tc" binary is HTB ready, a precompiled binary is distributed with HTB.

iptables

Let's begin optimizing that scarce bandwidth

First we set up some qdiscs in which we will classify the traffic. We create a htb qdisc with 6 classes with ascending priority. Then we have classes that will always get allocated rate, but can use the unused bandwidth that other classes don't need. Recall that classes with higher priority ( i.e with a lower prio number ) will get excess of bandwith allocated first. Our connection is 2Mb down 300kbits/s up Adsl. I use 240kbit/s as ceil rate just because it's the higher I can set it before latency starts to grow, due to buffer filling in whatever place between us and remote hosts. This parameter should be timed experimentally, raising and lowering it while observing latency between some near hosts.

Adjust CEIL to 75% of your upstream bandwith limit by now, and where I use eth0, you should use the interface which has a public Internet address. To begin our example execute the following in a root shell:

CEIL=240
tc qdisc add dev eth0 root handle 1: htb default 15
tc class add dev eth0 parent 1: classid 1:1 htb rate ${CEIL}kbit ceil ${CEIL}kbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 80kbit ceil 80kbit prio 0
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 80kbit ceil ${CEIL}kbit prio 1
tc class add dev eth0 parent 1:1 classid 1:12 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:13 htb rate 20kbit ceil ${CEIL}kbit prio 2
tc class add dev eth0 parent 1:1 classid 1:14 htb rate 10kbit ceil ${CEIL}kbit prio 3
tc class add dev eth0 parent 1:1 classid 1:15 htb rate 30kbit ceil ${CEIL}kbit prio 3
tc qdisc add dev eth0 parent 1:12 handle 120: sfq perturb 10
tc qdisc add dev eth0 parent 1:13 handle 130: sfq perturb 10
tc qdisc add dev eth0 parent 1:14 handle 140: sfq perturb 10
tc qdisc add dev eth0 parent 1:15 handle 150: sfq perturb 10
			

We have just created a htb tree with one level depth. Something like this:

+---------+
| root 1: |
+---------+
     |
+---------------------------------------+
| class 1:1                             |
+---------------------------------------+
  |      |      |      |      |      |      
+----+ +----+ +----+ +----+ +----+ +----+
|1:10| |1:11| |1:12| |1:13| |1:14| |1:15| 
+----+ +----+ +----+ +----+ +----+ +----+ 
			

15.10.2. Classifying packets

We have created the qdisc setup but no packet classification has been made, so now all outgoing packets are going out in class 1:15 ( because we used: tc qdisc add dev eth0 root handle 1: htb default 15 ). Now we need to tell which packets go where. This is the most important part.

Now we set the filters so we can classify the packets with iptables. I really prefer to do it with iptables, because they are very flexible and you have packet count for each rule. Also with the RETURN target packets don't need to traverse all rules. We execute the following commands:

tc filter add dev eth0 parent 1:0 protocol ip prio 1 handle 1 fw classid 1:10
tc filter add dev eth0 parent 1:0 protocol ip prio 2 handle 2 fw classid 1:11
tc filter add dev eth0 parent 1:0 protocol ip prio 3 handle 3 fw classid 1:12
tc filter add dev eth0 parent 1:0 protocol ip prio 4 handle 4 fw classid 1:13
tc filter add dev eth0 parent 1:0 protocol ip prio 5 handle 5 fw classid 1:14
tc filter add dev eth0 parent 1:0 protocol ip prio 6 handle 6 fw classid 1:15
			

We have just told the kernel that packets that have a specific FWMARK value ( handle x fw ) go in the specified class ( classid x:x). Next you will see how to mark packets with iptables.

First you have to understand how packet traverse the filters with iptables:

        +------------+                +---------+               +-------------+
Packet -| PREROUTING |--- routing-----| FORWARD |-------+-------| POSTROUTING |- Packets
input   +------------+    decision    +-­-------+       |       +-------------+    out
                             |                          |
                        +-------+                    +--------+   
                        | INPUT |---- Local process -| OUTPUT |
                        +-------+                    +--------+

			

I assume you have all your tables created and with default policy ACCEPT ( -P ACCEPT ) if you haven't poked with iptables yet, It should be ok by default. Ours private network is a class B with address 172.17.0.0/16 and public ip is 212.170.21.172

Next we instruct the kernel to actually do NAT, so clients in the private network can start talking to the outside.

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 172.17.0.0/255.255.0.0 -o eth0 -j SNAT --to-source 212.170.21.172
			


Now check that packets are flowing through 1:15:

tc -s class show dev eth0
			


You can start marking packets adding rules to the PREROUTING chain in the mangle table.

iptables -t mangle -A PREROUTING -p icmp -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p icmp -j RETURN
			


Now you should be able to see packet count increasing when pinging from machines within the private network to some site on the Internet. Check packet count increasing in 1:10

tc -s class show dev eth0
			

We have done a -j RETURN so packets don't traverse all rules. Icmp packets won't match other rules below RETURN. Keep that in mind.
Now we can start adding more rules, lets do proper TOS handling:

iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Delay -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j MARK --set-mark 0x5
iptables -t mangle -A PREROUTING -m tos --tos Minimize-Cost -j RETURN
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j MARK --set-mark 0x6
iptables -t mangle -A PREROUTING -m tos --tos Maximize-Throughput -j RETURN
			

Now prioritize ssh packets:

iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp -m tcp --sport 22 -j RETURN
			

A good idea is to prioritize packets to begin tcp connections, those with SYN flag set:

iptables -t mangle -I PREROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j MARK --set-mark 0x1
iptables -t mangle -I PREROUTING -p tcp -m tcp --tcp-flags SYN,RST,ACK SYN -j RETURN
			

And so on.

When we are done adding rules to PREROUTING in mangle, we terminate the PREROUTING table with:

iptables -t mangle -A PREROUTING -j MARK --set-mark 0x6
			

So previously unmarked traffic goes in 1:15. In fact this last step is unnecessary since default class was 1:15, but I will mark them in order to be consistent with the whole setup, and furthermore it's useful to see the counter in that rule.

It will be a good idea to do the same in the OUTPUT rule, so repeat those commands with -A OUTPUT instead of PREROUTING. ( s/PREROUTING/OUTPUT/ ). Then traffic generated locally (on the Linux router) will also be classified. I finish OUTPUT chain with -j MARK --set-mark 0x3 so local traffic has higher priority.