Traffic Control Overview
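On Linux, the tc (traffic control) framework, together with its kernel implementations and user-space tools, controls egress traffic and, to a lesser degree, ingress traffic. "Control" here means delaying packets, dropping them, corrupting them, limiting bandwidth, restricting traffic that matches a particular IP rule, and so on, in order to emulate abnormal network conditions, prioritize packets, or achieve other goals.
According to the manual, tc provides the following kinds of control:
SHAPING: smooths bursts of traffic, for example by limiting the transmission rate to below the available bandwidth; applies on egress.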
When traffic is shaped, its rate of transmission is under
control. Shaping may be more than lowering the available
bandwidth - it is also used to smooth out bursts in
traffic for better network behaviour. Shaping occurs on
egress.
SCHEDULING: applies on egress; schedules the transmission of packets, for example by priority.
By scheduling the transmission of packets it is possible
to improve interactivity for traffic that needs it while
still guaranteeing bandwidth to bulk transfers. Reordering
is also called prioritizing, and happens only on egress.
POLICING: applies to ingress traffic.
Whereas shaping deals with transmission of traffic,
policing pertains to traffic arriving. Policing thus
occurs on ingress.
DROPPING: drops packets when traffic exceeds a threshold; applies on both ingress and egress.
Traffic exceeding a set bandwidth may also be dropped
forthwith, both on ingress and on egress.
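The difference between shaping (delay the packet) and policing or dropping (discard it) can be illustrated with a token bucket. The following is a minimal user-space sketch, not the kernel's implementation: every name in it (token_bucket, tb_police, tb_shape_delay_ns) is hypothetical, time is passed in by the caller in nanoseconds, and fractional tokens are simply truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space token bucket; not the kernel's code. */
struct token_bucket {
    uint64_t rate_bytes_per_sec; /* refill rate */
    uint64_t burst_bytes;        /* bucket depth: largest burst allowed */
    uint64_t tokens;             /* bytes that may be sent right now */
    uint64_t last_ns;            /* time of the last refill */
};

static void tb_init(struct token_bucket *tb, uint64_t rate,
                    uint64_t burst, uint64_t now_ns)
{
    assert(rate > 0 && burst > 0);
    tb->rate_bytes_per_sec = rate;
    tb->burst_bytes = burst;
    tb->tokens = burst; /* start with a full bucket */
    tb->last_ns = now_ns;
}

/* Add the tokens earned since the last refill, capped at the depth.
 * Sub-byte remainders are dropped, which is fine for a sketch. */
static void tb_refill(struct token_bucket *tb, uint64_t now_ns)
{
    uint64_t earned = (now_ns - tb->last_ns) * tb->rate_bytes_per_sec
                      / 1000000000ULL;
    tb->tokens += earned;
    if (tb->tokens > tb->burst_bytes)
        tb->tokens = tb->burst_bytes;
    tb->last_ns = now_ns;
}

/* Policing: a packet either conforms (true) or is dropped (false). */
static bool tb_police(struct token_bucket *tb, uint64_t len, uint64_t now_ns)
{
    tb_refill(tb, now_ns);
    if (tb->tokens < len)
        return false;
    tb->tokens -= len;
    return true;
}

/* Shaping: return 0 and consume tokens if the packet may go now,
 * otherwise return how long to wait before retrying. A packet larger
 * than the bucket depth can never conform, so report UINT64_MAX. */
static uint64_t tb_shape_delay_ns(struct token_bucket *tb, uint64_t len,
                                  uint64_t now_ns)
{
    if (len > tb->burst_bytes)
        return UINT64_MAX;
    tb_refill(tb, now_ns);
    if (tb->tokens >= len) {
        tb->tokens -= len;
        return 0;
    }
    uint64_t missing = len - tb->tokens;
    return (missing * 1000000000ULL + tb->rate_bytes_per_sec - 1)
           / tb->rate_bytes_per_sec;
}
```

With a rate of 1000 bytes/s and a 1500-byte bucket, a second back-to-back 1000-byte packet is rejected by tb_police, while tb_shape_delay_ns asks the caller to wait half a second instead.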
Example:
ref: https://netbeez.net/blog/how-to-use-the-linux-traffic-control/
Inspect: at this point only the default first-in, first-out rule is present
- Explanation of the command tc qdisc add dev eth0 root netem delay 200ms, token by token:
qdisc: modify the scheduler (aka queuing discipline); in other words, tc relies on the qdisc mechanism
add: add a new rule (a new queuing discipline)
dev eth0: rules will be applied on device eth0; a qdisc is normally attached to a network interface
root: modify the outbound traffic scheduler (aka the egress qdisc)
netem: use the network emulator to emulate a WAN property
delay: the network property that is modified
200ms: introduce delay of 200 ms
tc is the user-space command that systems such as Linux provide for this; the commands used here are shell commands.
More: https://man7.org/linux/man-pages/man8/tc.8.html
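Basic Implementation Principle of Traffic Control
In the Linux kernel, traffic control is the QoS implementation, and it is built on qdisc queues, primarily the egress queue.
At the link layer, every packet that has passed through the neighbour subsystem (that is, has left the protocol stack) is handed to dev_queue_xmit (net/core/dev.c), which goes on to call the transmit function of the device driver.
The qdisc queue and its queuing rules sit between the two: after dev_queue_xmit has been entered and before the driver's transmit function is called.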
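To make that position concrete, here is a small user-space model of the egress path. It is only a sketch under simplifying assumptions: toy_dev_queue_xmit, toy_qdisc_run and toy_driver_xmit are hypothetical stand-ins (not kernel functions), the qdisc is a fixed four-slot FIFO ring, and the "driver" accepts a limited number of packets before reporting busy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QLEN 4   /* capacity of the toy qdisc */
#define TOY_LOG  16  /* how many transmitted ids we remember */

struct toy_pkt { int id; };

struct toy_dev {
    struct toy_pkt *queue[TOY_QLEN]; /* the qdisc: a FIFO ring */
    size_t head, count;
    int tx_budget;                   /* packets the driver takes before it is busy */
    int sent[TOY_LOG];               /* ids handed to the driver, in order */
    size_t nsent;
};

/* Stand-in for the driver's transmit function. */
static bool toy_driver_xmit(struct toy_dev *dev, struct toy_pkt *p)
{
    if (dev->tx_budget == 0)
        return false; /* hardware busy: the packet stays in the qdisc */
    dev->tx_budget--;
    if (dev->nsent < TOY_LOG)
        dev->sent[dev->nsent++] = p->id;
    return true;
}

/* Drain the qdisc toward the driver until it is empty or the driver is busy. */
static void toy_qdisc_run(struct toy_dev *dev)
{
    while (dev->count > 0) {
        struct toy_pkt *p = dev->queue[dev->head];
        if (!toy_driver_xmit(dev, p))
            break;
        dev->head = (dev->head + 1) % TOY_QLEN;
        dev->count--;
    }
}

/* Stand-in for dev_queue_xmit: enqueue first, then try to drain.
 * Returns false when the queue is full and the packet is dropped. */
static bool toy_dev_queue_xmit(struct toy_dev *dev, struct toy_pkt *p)
{
    assert(dev != NULL && p != NULL);
    if (dev->count == TOY_QLEN)
        return false;
    dev->queue[(dev->head + dev->count) % TOY_QLEN] = p;
    dev->count++;
    toy_qdisc_run(dev);
    return true;
}
```

The point of the model is the ordering: a packet always enters the queue first, and it reaches the driver only when the dequeue loop gets to it and the driver can take it.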
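Implementation and basic flow of traffic control:
Relevant code:
sch_generic.c
Flow
The overall processing flow in the kernel, and where it takes place:
net/core/dev.c
Structure of traffic control:
Traffic control is built from three basic elements: queuing disciplines (qdiscs), classes, and filters.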
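Queuing disciplines:
When traffic control is enabled, every network device has at least one qdisc configured. Qdiscs range from a simple FIFO buffer to a token bucket, and the more precise ones usually have to manage several queues.
Common qdiscs include fifo and the token bucket filter (tbf). Classification of qdiscs:
A qdisc has at least one queue. It may be simple, like the fifo qdisc, or more complex, like the token bucket. Qdiscs are usually divided into classless and classful ones. A classless qdisc is simple and cannot contain configurable subclasses or inner qdiscs,
whereas a classful qdisc can contain several classes, and each class can in turn contain a qdisc. That qdisc is called an inner qdisc, and it may itself be classful or classless.
A classless qdisc cannot be subdivided by the user (it still accepts its own parameters), while a classful one can.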
Implementations are split into classful and classless qdiscs:
Classless: pfifo, pfifo_fast, red, sfq, tbf
Classful: cbq, htb, prio
For example, the default:
qdisc pfifo_fast 0: dev eth0 root refcnt 2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
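The priomap in that output maps a packet priority (0 to 15) to one of the three bands, and lower-numbered bands are served first. A minimal sketch of that lookup, assuming the priomap values printed above; the toy_ names are hypothetical and this is not the kernel's pfifo_fast code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PRIO_MAX 15 /* priorities are masked to the range 0..15 */
#define TOY_BANDS    3

/* The priomap printed by tc above: priority (0..15) -> band (0..2). */
static const uint8_t toy_priomap[TOY_PRIO_MAX + 1] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Band a packet with the given priority is queued in. */
static int toy_band_for_priority(uint32_t priority)
{
    return toy_priomap[priority & TOY_PRIO_MAX];
}

/* pending[b] is the number of packets waiting in band b.
 * Returns the band to dequeue from next, or -1 if all are empty.
 * Band 0 always wins over band 1, which wins over band 2. */
static int toy_next_band(const unsigned pending[TOY_BANDS])
{
    assert(pending != NULL);
    for (int band = 0; band < TOY_BANDS; band++)
        if (pending[band] > 0)
            return band;
    return -1;
}
```

With this map, priority 0 lands in band 1, priority 6 in band 0, and priority 1 in band 2, so the latter is only sent when the other two bands are empty.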
In the tc command syntax shown below, the trailing qdisc [ qdisc specific parameters ] selects the concrete qdisc type and its parameters.
Creating a qdisc:
tc [ OPTIONS ] qdisc [ add | change | replace | link | delete ]
dev DEV [ parent qdisc-id | root ] [ handle qdisc-id ] [
ingress_block BLOCK_INDEX ] [ egress_block BLOCK_INDEX ] qdisc [
qdisc specific parameters ]
eg:
Currently supported qdiscs. Classless:
The classless qdiscs are:
choke CHOKe (CHOose and Keep for responsive flows, CHOose and
Kill for unresponsive flows) is a classless qdisc designed
to both identify and penalize flows that monopolize the
queue. CHOKe is a variation of RED, and the configuration
is similar to RED.
codel CoDel (pronounced "coddle") is an adaptive "no-knobs"
active queue management algorithm (AQM) scheme that was
developed to address the shortcomings of RED and its
variants.
[p|b]fifo
Simplest usable qdisc, pure First In, First Out behaviour.
Limited in packets or in bytes.
fq Fair Queue Scheduler realises TCP pacing and scales to
millions of concurrent flows per qdisc.
fq_codel
Fair Queuing Controlled Delay is queuing discipline that
combines Fair Queuing with the CoDel AQM scheme. FQ_Codel
uses a stochastic model to classify incoming packets into
different flows and is used to provide a fair share of the
bandwidth to all the flows using the queue. Each such flow
is managed by the CoDel queuing discipline. Reordering
within a flow is avoided since Codel internally uses a
FIFO queue.
fq_pie FQ-PIE (Flow Queuing with Proportional Integral controller
Enhanced) is a queuing discipline that combines Flow
Queuing with the PIE AQM scheme. FQ-PIE uses a Jenkins
hash function to classify incoming packets into different
flows and is used to provide a fair share of the bandwidth
to all the flows using the qdisc. Each such flow is
managed by the PIE algorithm.
gred Generalized Random Early Detection combines multiple RED
queues in order to achieve multiple drop priorities. This
is required to realize Assured Forwarding (RFC 2597).
hhf Heavy-Hitter Filter differentiates between small flows and
the opposite, heavy-hitters. The goal is to catch the
heavy-hitters and move them to a separate queue with less
priority so that bulk traffic does not affect the latency
of critical traffic.
ingress
This is a special qdisc as it applies to incoming traffic
on an interface, allowing for it to be filtered and
policed.
mqprio The Multiqueue Priority Qdisc is a simple queuing
discipline that allows mapping traffic flows to hardware
queue ranges using priorities and a configurable priority
to traffic class mapping. A traffic class in this context
is a set of contiguous qdisc classes which map 1:1 to a
set of hardware exposed queues.
multiq Multiqueue is a qdisc optimized for devices with multiple
Tx queues. It has been added for hardware that wishes to
avoid head-of-line blocking. It will cycle though the
bands and verify that the hardware queue associated with
the band is not stopped prior to dequeuing a packet.
netem Network Emulator is an enhancement of the Linux traffic
control facilities that allow to add delay, packet loss,
duplication and more other characteristics to packets
outgoing from a selected network interface.
pfifo_fast
Standard qdisc for 'Advanced Router' enabled kernels.
Consists of a three-band queue which honors Type of
Service flags, as well as the priority that may be
assigned to a packet.
pie Proportional Integral controller-Enhanced (PIE) is a
control theoretic active queue management scheme. It is
based on the proportional integral controller but aims to
control delay.
red Random Early Detection simulates physical congestion by
randomly dropping packets when nearing configured
bandwidth allocation. Well suited to very large bandwidth
applications.
rr Round-Robin qdisc with support for multiqueue network
devices. Removed from Linux since kernel version 2.6.27.
sfb Stochastic Fair Blue is a classless qdisc to manage
congestion based on packet loss and link utilization
history while trying to prevent non-responsive flows (i.e.
flows that do not react to congestion marking or dropped
packets) from impacting performance of responsive flows.
Unlike RED, where the marking probability has to be
configured, BLUE tries to determine the ideal marking
probability automatically.
sfq Stochastic Fairness Queueing reorders queued traffic so
each 'session' gets to send a packet in turn.
tbf The Token Bucket Filter is suited for slowing traffic down
to a precisely configured rate. Scales well to large
bandwidths.
Note the following when adding a classless qdisc:
In the absence of classful qdiscs, classless qdiscs can only be
attached at the root of a device. Full syntax:
tc qdisc add dev DEV root QDISC QDISC-PARAMETERS
To remove, issue
tc qdisc del dev DEV root
The pfifo_fast qdisc is the automatic default in the absence of a
configured qdisc.
Classful:
ATM Map flows to virtual circuits of an underlying
asynchronous transfer mode device.
CBQ Class Based Queueing implements a rich linksharing
hierarchy of classes. It contains shaping elements as
well as prioritizing capabilities. Shaping is performed
using link idle time calculations based on average packet
size and underlying link bandwidth. The latter may be ill-
defined for some interfaces.
DRR The Deficit Round Robin Scheduler is a more flexible
replacement for Stochastic Fairness Queuing. Unlike SFQ,
there are no built-in queues -- you need to add classes
and then set up filters to classify packets accordingly.
This can be useful e.g. for using RED qdiscs with
different settings for particular traffic. There is no
default class -- if a packet cannot be classified, it is
dropped.
DSMARK Classify packets based on TOS field, change TOS field of
packets based on classification.
ETS The ETS qdisc is a queuing discipline that merges
functionality of PRIO and DRR qdiscs in one scheduler. ETS
makes it easy to configure a set of strict and bandwidth-
sharing bands to implement the transmission selection
described in 802.1Qaz.
HFSC Hierarchical Fair Service Curve guarantees precise
bandwidth and delay allocation for leaf classes and
allocates excess bandwidth fairly. Unlike HTB, it makes
use of packet dropping to achieve low delays which
interactive sessions benefit from.
HTB The Hierarchy Token Bucket implements a rich linksharing
hierarchy of classes with an emphasis on conforming to
existing practices. HTB facilitates guaranteeing bandwidth
to classes, while also allowing specification of upper
limits to inter-class sharing. It contains shaping
elements, based on TBF and can prioritize classes.
PRIO The PRIO qdisc is a non-shaping container for a
configurable number of classes which are dequeued in
order. This allows for easy prioritization of traffic,
where lower classes are only able to send if higher ones
have no packets available. To facilitate configuration,
Type Of Service bits are honored by default.
QFQ Quick Fair Queueing is an O(1) scheduler that provides
near-optimal guarantees, and is the first to achieve that
goal with a constant cost also with respect to the number
of groups and the packet length. The QFQ algorithm has no
loops, and uses very simple instructions and data
structures that lend themselves very well to a hardware
implementation.
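# Create a qdisc:
tc qdisc add dev eth0 root handle 1:0 htb default 1
# Add an htb qdisc as the root qdisc of eth0, with handle 1:0; unclassified traffic goes to class 1:1 by default
# handle: names the qdisc, or refers to an existing one
Representation of a qdisc in the kernel: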
Structures that describe a qdisc:
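struct Qdisc {
	int (*enqueue)(struct sk_buff *skb, // enqueue and dequeue: the two core functions of a qdisc
	               struct Qdisc *sch,
	               struct sk_buff **to_free);
	struct sk_buff * (*dequeue)(struct Qdisc *sch);
	unsigned int flags;
	const struct Qdisc_ops *ops; // operations interface; every qdisc (pfifo, tbf, ...) must implement it
	struct qdisc_size_table __rcu *stab;
	struct list_head list;
	u32 handle; // matches the tc handle; qdiscs, classes and filters are each identified by a 32-bit handle
	u32 parent; // handle of the parent
	void *u32_node;
	struct netdev_queue *dev_queue; // ties the qdisc to the net_device
	struct sk_buff_head q; // the queue of packets held by this qdisc (q.qlen is the current packet count)
	..
};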
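The 32-bit handle is what tc writes as major:minor, with both halves in hexadecimal: the upper 16 bits are the major number and the lower 16 bits the minor number. The helpers below are a hypothetical user-space sketch of that packing (the toy_ names are not kernel APIs), so that 1:0 and 1:1 from the tc examples can be related to the u32 stored in the structure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack major:minor into one 32-bit handle (major in the upper 16 bits). */
static uint32_t toy_handle_make(uint16_t major, uint16_t minor)
{
    return ((uint32_t)major << 16) | minor;
}

static uint16_t toy_handle_major(uint32_t h)
{
    return (uint16_t)(h >> 16);
}

static uint16_t toy_handle_minor(uint32_t h)
{
    return (uint16_t)(h & 0xFFFFu);
}

/* Write the handle as hexadecimal "major:minor" into buf.
 * Returns the number of characters snprintf wanted to write. */
static int toy_handle_format(uint32_t h, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%x:%x",
                    (unsigned)toy_handle_major(h),
                    (unsigned)toy_handle_minor(h));
}
```

Under this layout the root qdisc 1:0 is 0x00010000 and its class 1:1 is 0x00010001, which is why a class and its owning qdisc always share the same major number.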
struct Qdisc_ops {
	struct Qdisc_ops *next; // links the operations interfaces of all registered qdiscs
	const struct Qdisc_class_ops *cl_ops; // class operations provided by this qdisc
	char id[IFNAMSIZ];
	int priv_size;
	int (*enqueue)(struct sk_buff *skb, // adds a packet to the qdisc
	               struct Qdisc *sch,
	               struct sk_buff **to_free);
	struct sk_buff * (*dequeue)(struct Qdisc *);
	struct sk_buff * (*peek)(struct Qdisc *);
	int (*init)(struct Qdisc *, struct nlattr *arg); // initializes the qdisc
	void (*reset)(struct Qdisc *);
	void (*destroy)(struct Qdisc *);
	int (*change)(struct Qdisc *, struct nlattr *arg);
	void (*attach)(struct Qdisc *);
	int (*dump)(struct Qdisc *, struct sk_buff *);
	int (*dump_stats)(struct Qdisc *, struct gnet_dump *);
	struct module *owner;
};
Classes:
A class is defined inside a qdisc. Packets pass through the filters and are assigned to different classes. A qdisc may have no classes at all (for example fifo, plain first in, first out) or it may have several.
A class can also hold an inner qdisc: once a filter has assigned a packet to a class, the packet leaves through that class's qdisc, which may be fifo or something else. That qdisc is the inner qdisc.
Creating a class:
tc [ OPTIONS ] class [ add | change | replace | delete ] dev DEV
parent qdisc-id [ classid class-id ] qdisc [ qdisc specific
parameters ]
eg:
# Create a class
tc class add dev eth0 parent 1:0 classid 1:1 htb rate 10Mbit burst 15k
# Add a class named 1:1 under the root qdisc 1:0 of eth0, of type htb, with a rate of 10 Mbit/s
Representation of a class:
In Linux a class is represented by a per-qdisc xxx_class structure, for example for htb:
struct htb_class {
struct Qdisc_class_common common;
struct psched_ratecfg rate;
struct psched_ratecfg ceil;
s64 buffer, cbuffer;/* token bucket depth/rate */
s64 mbuffer; /* max wait time */
u32 prio; /* these two are used only by leaves... */
int quantum; /* but stored for parent-to-leaf return */
struct tcf_proto __rcu *filter_list; /* class attached filters: the filter chain of this class */
int filter_cnt;
int refcnt; /* usage count of this class */
int level; /* our level (see above) */
unsigned int children;
struct htb_class *parent; /* parent class */
struct gnet_stats_rate_est64 rate_est;
/*
* Written often fields
*/
struct gnet_stats_basic_packed bstats;
struct tc_htb_xstats xstats; /* our special stats */
..
};
Filters
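A filter is a concrete matching rule used for classification. It contains a number of match conditions, and a packet that satisfies them is assigned to a specific class.
A class normally has at least one filter pointing at it, and may have several.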
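As an illustration of how a list of filters picks a class, here is a minimal sketch. It assumes a very reduced notion of a filter (destination IP prefix plus optional destination port) and a default class for packets nothing matches; the toy_ types and toy_classify are hypothetical and far simpler than the kernel's tcf_proto chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The only packet fields this toy classifier looks at. */
struct toy_pkt_hdr {
    uint32_t dst_ip;   /* host byte order, e.g. 10.1.2.3 = 0x0A010203 */
    uint16_t dst_port;
};

/* One filter: match conditions plus the class chosen on a match. */
struct toy_filter {
    uint32_t dst_ip;   /* matches if (pkt.dst_ip & dst_mask) == dst_ip */
    uint32_t dst_mask; /* 0 matches any address (with dst_ip == 0) */
    uint16_t dst_port; /* 0 matches any port */
    uint32_t classid;  /* 32-bit class handle, e.g. 1:2 = 0x00010002 */
};

/* Walk the filters in order; the first match decides the class.
 * Packets that match nothing fall back to default_classid. */
static uint32_t toy_classify(const struct toy_filter *filters, size_t n,
                             const struct toy_pkt_hdr *pkt,
                             uint32_t default_classid)
{
    assert(pkt != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct toy_filter *f = &filters[i];
        if ((pkt->dst_ip & f->dst_mask) != f->dst_ip)
            continue;
        if (f->dst_port != 0 && f->dst_port != pkt->dst_port)
            continue;
        return f->classid;
    }
    return default_classid;
}
```

Because the first match wins, the order in which filters are listed matters: a broad filter placed first hides the more specific ones after it.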
tc command:
tc [ OPTIONS ] filter [ add | change | replace | delete | get ]
Structure in the kernel:
struct tcf_proto { .. };
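Qdisc examples: pfifo, tbf, etc.
The fifo qdisc is a good way to learn how a qdisc is implemented.
By default the qdisc is pfifo_fast, which is attached to the device through dev_open. When a different qdisc is needed, tc sends a request over netlink, and the kernel then updates the dev structure and related state.
struct Qdisc noop_qdisc = { .. };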
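The shape of such an implementation, an operations table with enqueue and dequeue plus a packet limit, can be sketched in user space as follows. This is an illustration only: the toy_ types are hypothetical and much smaller than struct Qdisc and struct Qdisc_ops, and the behaviour (drop when the limit is reached) is the assumption this sketch makes about a packet-limited fifo.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb {
    int id;
    struct toy_skb *next;
};

struct toy_qdisc;

/* Reduced counterpart of an operations table: each qdisc type fills one in. */
struct toy_qdisc_ops {
    const char *id;
    int (*enqueue)(struct toy_skb *skb, struct toy_qdisc *sch);
    struct toy_skb *(*dequeue)(struct toy_qdisc *sch);
};

struct toy_qdisc {
    const struct toy_qdisc_ops *ops;
    struct toy_skb *head, *tail; /* singly linked FIFO */
    unsigned qlen;               /* packets currently queued */
    unsigned limit;              /* maximum packets allowed */
    unsigned drops;              /* packets rejected because the queue was full */
};

/* Append at the tail; reject the packet when the limit is reached. */
static int toy_pfifo_enqueue(struct toy_skb *skb, struct toy_qdisc *sch)
{
    if (sch->qlen >= sch->limit) {
        sch->drops++;
        return -1;
    }
    skb->next = NULL;
    if (sch->tail)
        sch->tail->next = skb;
    else
        sch->head = skb;
    sch->tail = skb;
    sch->qlen++;
    return 0;
}

/* Remove from the head; NULL when the queue is empty. */
static struct toy_skb *toy_pfifo_dequeue(struct toy_qdisc *sch)
{
    struct toy_skb *skb = sch->head;
    if (!skb)
        return NULL;
    sch->head = skb->next;
    if (!sch->head)
        sch->tail = NULL;
    skb->next = NULL;
    sch->qlen--;
    return skb;
}

static const struct toy_qdisc_ops toy_pfifo_ops = {
    .id = "toy_pfifo",
    .enqueue = toy_pfifo_enqueue,
    .dequeue = toy_pfifo_dequeue,
};

static void toy_qdisc_init(struct toy_qdisc *sch,
                           const struct toy_qdisc_ops *ops, unsigned limit)
{
    assert(sch != NULL && ops != NULL && limit > 0);
    sch->ops = ops;
    sch->head = sch->tail = NULL;
    sch->qlen = 0;
    sch->limit = limit;
    sch->drops = 0;
}
```

Callers only go through sch->ops->enqueue and sch->ops->dequeue, so swapping in a different discipline means supplying a different operations table rather than changing the caller.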
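The netlink interface of the tc tool
It is defined in sch_api.c and mainly handles qdiscs and the classes inside them; filter requests are handled in cls_api.c.
tc communicates with the kernel over netlink, and that is how it creates and modifies QoS settings.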
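To give a feel for what travels over that netlink socket, the sketch below only builds an RTM_NEWQDISC request in memory and never sends it. It assumes the Linux UAPI headers are available; toy_qdisc_req, toy_add_attr and toy_build_newqdisc are hypothetical helpers written for this illustration (tc has its own netlink library), and a real request for a qdisc such as htb would also need a TCA_OPTIONS attribute carrying its parameters.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>

/* Netlink header + traffic-control header + room for a few attributes. */
struct toy_qdisc_req {
    struct nlmsghdr nlh;
    struct tcmsg tcm;
    char attrs[64];
};

/* Append one attribute to the message.
 * Returns 0 on success, -1 if it would not fit in maxlen bytes. */
static int toy_add_attr(struct nlmsghdr *nlh, size_t maxlen,
                        unsigned short type, const void *data, size_t len)
{
    size_t rta_len = RTA_LENGTH(len);
    size_t offset = NLMSG_ALIGN(nlh->nlmsg_len);
    size_t new_len = offset + RTA_ALIGN(rta_len);
    if (new_len > maxlen)
        return -1;
    struct rtattr *rta = (struct rtattr *)((char *)nlh + offset);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)rta_len;
    memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = (unsigned int)new_len;
    return 0;
}

/* Fill in a "create root qdisc of this kind on this interface" request. */
static int toy_build_newqdisc(struct toy_qdisc_req *req, int ifindex,
                              unsigned int handle, const char *kind)
{
    assert(req != NULL && kind != NULL);
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
    req->nlh.nlmsg_type = RTM_NEWQDISC;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
    req->tcm.tcm_family = AF_UNSPEC;
    req->tcm.tcm_ifindex = ifindex;  /* which device, e.g. eth0's index */
    req->tcm.tcm_handle = handle;    /* e.g. 1:0 = 0x00010000 */
    req->tcm.tcm_parent = TC_H_ROOT; /* attach at the root of the device */
    /* The qdisc type travels as a NUL-terminated string attribute. */
    return toy_add_attr(&req->nlh, sizeof(*req), TCA_KIND,
                        kind, strlen(kind) + 1);
}
```

The fields line up with the tc command line: dev becomes tcm_ifindex, root becomes tcm_parent, handle becomes tcm_handle, and the qdisc name becomes the TCA_KIND attribute.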
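This article only outlines the flow and gives a concrete picture of the pieces involved. Its aim is to convey the general principle and framework of tc, as a convenience and a starting point for further study.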