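Kernel version: 2.6.34
Implementation approach:
How a packet moves through the network protocol stack depends on direction. On receive, the packet is peeled layer by layer: because the packet is a known input, each layer only has to parse the protocol number and hand the packet to the matching handler. On transmit, the send functions of the layers call one another in a nested fashion: because there is no known input, the packet has to be built up layer by layer according to the protocol design. Either way, the core of the flow is the change of the packet's device (skb->dev), which acts as the baton passed between layers.
Following this idea, receive handling for the brcm protocol only needs a brcm_packet_type added to ptype_base. Transmit is somewhat more complicated: the nested send calls are driven entirely by devices, so a new kind of device X has to be created and inserted between the vlan device and the NIC device.
So at a minimum we need brcm_packet_type, to be added to ptype_base, and register_brcm_dev(), to register device X with the system. Going further, device X is stored in the global init_net, but we also need to know how device X relates to the vlan devices and the NIC devices, so brcm_group_hash is designed to store that relationship. To respond to device events of interest, the module adds its own notifier to netdev_chain. To give user space some control (such as creating and deleting devices), brcm-specific ioctl calls are also needed. Finally, for completeness, a new kind of device should have entries in proc for debugging and inspecting the device.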
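Starting with the simplest case
Getting the network stack to receive a new protocol is simple. Because the packet already exists as input, all we have to do is write brcm_packet_type and, when the module is registered, do one thing: call dev_add_pack().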
- static int __init brcm_proto_init(void)
- {
- dev_add_pack(&brcm_packet_type);
- return 0;
- }
-
- static struct packet_type brcm_packet_type __read_mostly = {
- .type = cpu_to_be16(ETH_P_BRCM),
- .func = brcm_skb_recv,
- };
-
- int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *ptype, struct net_device *orig_dev)
- {
- struct brcm_hdr *bhdr;
- struct brcm_rx_stats *rx_stats;
-
- skb = skb_share_check(skb, GFP_ATOMIC);
- if(!skb)
- goto err_free;
- bhdr = (struct brcm_hdr *)skb->data;
-
- rcu_read_lock();
- skb_pull_rcsum(skb, BRCM_HLEN);
-
- skb->protocol = bhdr->brcm_encapsulated_proto;
-
- skb = brcm_check_reorder_header(skb);
- if (!skb)
- goto err_unlock;
-
- netif_rx(skb);
- rcu_read_unlock();
- return NET_RX_SUCCESS;
-
- err_unlock:
- rcu_read_unlock();
-
- err_free:
- kfree_skb(skb);
- return NET_RX_DROP;
- }
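Once this module is registered, the stack can receive packets that carry a brcm header. In the code, ETH_P_BRCM is the brcm protocol number and BRCM_HLEN is the length of the brcm header. Receiving is this simple precisely because the packet is available as input.
But this only makes reception possible. Transmitted packets still carry no brcm header, and the receive code is rough: it does not change the skb's device, does not record traffic, and does nothing meaningful with the brcm header. The following sections add these one at a time.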
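To make the "peeling" step concrete, here is a minimal user-space sketch of what the receive path does to the packet buffer. It is not kernel code: `struct fake_skb` and `brcm_peel()` are hypothetical stand-ins, and the header layout (a 4-byte tag followed by a 2-byte encapsulated protocol, so `BRCM_HLEN` is 6) is an assumption, since the article never gives the header definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed: 4-byte tag + 2-byte encapsulated protocol. */
#define BRCM_HLEN 6u

/* Assumed header layout, modeled on how brcm_skb_recv() reads brcm_hdr. */
struct brcm_hdr {
    uint8_t tag[4];                 /* opaque tag bytes */
    uint8_t encapsulated_proto[2];  /* inner protocol, network byte order */
};

/* Hypothetical stand-in for the few sk_buff fields this sketch needs. */
struct fake_skb {
    const uint8_t *data;  /* start of the current layer's header */
    size_t len;           /* bytes remaining from data */
    uint16_t protocol;    /* protocol of the current layer (host order here) */
};

/* Strip the brcm header. Returns 0 on success, -1 if the packet is too short. */
int brcm_peel(struct fake_skb *skb)
{
    struct brcm_hdr hdr;

    assert(skb != NULL);
    if (skb->len < BRCM_HLEN)
        return -1;

    memcpy(&hdr, skb->data, sizeof(hdr));
    skb->data += BRCM_HLEN;   /* the pointer move that skb_pull_rcsum() performs */
    skb->len  -= BRCM_HLEN;
    /* The inner protocol number tells the next layer how to keep peeling. */
    skb->protocol = (uint16_t)((hdr.encapsulated_proto[0] << 8) |
                               hdr.encapsulated_proto[1]);
    return 0;
}
```

After the call, `data` points at the inner payload and `protocol` holds the inner protocol number, which is all the next layer needs to continue.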
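Device definitions
Every device is a net_device, and each kind of device has its own private data, stored at the end of the net_device. The definition is shown below. real_dev points to the lower-layer device and is the one essential field; the rest can be defined as needed. brcm_rx_stats holds the device's receive statistics: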
- struct brcm_dev_info{
- struct net_device *real_dev;
- u16 brcm_port;
- unsigned char real_dev_addr[ETH_ALEN];
- struct proc_dir_entry *dent;
- struct brcm_rx_stats __percpu *brcm_rx_stats;
- };
- struct brcm_rx_stats {
- unsigned long rx_packets;
- unsigned long rx_bytes;
- unsigned long multicast;
- unsigned long rx_errors;
- };
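The idea of private data living at the end of the device structure can be sketched in user space as follows. This is only an illustration: `struct fake_net_device`, `fake_alloc_netdev()` and `PRIV_ALIGN` are hypothetical names, and the 32-byte alignment is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PRIV_ALIGN 32u  /* assumed alignment of the private area */

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct fake_net_device {
    char name[16];
    unsigned int mtu;
};

/* Reduced version of the article's private data. */
struct brcm_dev_info {
    struct fake_net_device *real_dev;  /* lower-layer device */
    unsigned short brcm_port;
};

/* Offset of the private area: sizeof(device) rounded up to PRIV_ALIGN. */
size_t priv_offset(void)
{
    return (sizeof(struct fake_net_device) + PRIV_ALIGN - 1) &
           ~(size_t)(PRIV_ALIGN - 1);
}

/* One allocation holds the device followed by its private data. */
struct fake_net_device *fake_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, priv_offset() + sizeof_priv);
}

/* Accessor in the style of brcm_dev_info(dev) used throughout the article. */
struct brcm_dev_info *brcm_dev_info(struct fake_net_device *dev)
{
    assert(dev != NULL);
    return (struct brcm_dev_info *)((char *)dev + priv_offset());
}
```

Because the private area is part of the same allocation, freeing the device frees its private data too, and the accessor needs nothing but pointer arithmetic.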
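Relationships between devices
If there were only a single brcm device, no data structure would be needed to store the relationship; one global variable brcm_dev would do. The design here handles the complex case: there may be multiple lower-layer devices and multiple brcm devices, with no fixed relationship between them. A data structure, brcm_group_hash, is therefore needed to store the relationship.
[Figure: a simple diagram of this relationship]
The data structures are defined as follows: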
- static struct hlist_head brcm_group_hash[BRCM_GRP_HASH_SIZE];
- struct brcm_group {
- struct hlist_node hlist;
- struct net_device *real_dev;
- int nr_ports;
- int killall;
- struct net_device *brcm_devices_array[BRCM_GROUP_ARRAY_LEN];
- struct rcu_head rcu;
- };
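brcm_group_hash is a global variable organized as a hash table, and each brcm_group is inserted into it. A brcm_group stores the relationship between a lower-layer device and its brcm devices (eth and brcm): real_dev points to the lower-layer device, and the brcm devices are stored in the brcm_devices_array array.
Next comes the function that maps a lower-layer device to a brcm device. brcm_port is a value taken from the header, and you can define its meaning yourself; here it indicates which port the packet came from.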
- struct net_device *find_brcm_dev(struct net_device *real_dev, u16 brcm_port)
- {
- struct brcm_group *grp = brcm_find_group(real_dev);
- if (grp)
- return grp->brcm_devices_array[brcm_port];
- return NULL;
- }
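On receive, when a packet reaches the brcm layer, skb->dev still points to the lower-layer device. The code uses skb->dev to find the hash entry whose brcm_group->real_dev matches, and then uses the information in the packet's brcm header to decide which brcm device in brcm_group->brcm_devices_array becomes the skb's new device.
On transmit, when a packet reaches the brcm layer, skb->dev points to the brcm device. To pass the packet further down, skb->dev has to be changed to the lower-layer device. The private data of a net_device usually holds a pointer to its lower-layer device, so skb->dev only needs to be set to brcm_dev_info(dev)->real_dev.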
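The article does not show brcm_find_group(), so the following user-space sketch is only one plausible shape for the lookup described above. It replaces hlist with a plain singly linked chain, hashes on an interface index, and uses assumed table sizes; `struct fake_dev`, `brcm_hashfn()` and `brcm_group_insert()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define BRCM_GRP_HASH_SIZE   32  /* assumed size */
#define BRCM_GROUP_ARRAY_LEN 32  /* assumed size */

/* Hypothetical stand-in for struct net_device. */
struct fake_dev { int ifindex; };

struct brcm_group {
    struct brcm_group *next;     /* simple chain instead of hlist_node */
    struct fake_dev *real_dev;   /* lower-layer device this group belongs to */
    struct fake_dev *brcm_devices_array[BRCM_GROUP_ARRAY_LEN];
};

static struct brcm_group *brcm_group_hash[BRCM_GRP_HASH_SIZE];

/* Assumed hash: bucket chosen from the lower device's interface index. */
unsigned int brcm_hashfn(int ifindex)
{
    return (unsigned int)ifindex % BRCM_GRP_HASH_SIZE;
}

void brcm_group_insert(struct brcm_group *grp)
{
    unsigned int b;

    assert(grp != NULL && grp->real_dev != NULL);
    b = brcm_hashfn(grp->real_dev->ifindex);
    grp->next = brcm_group_hash[b];
    brcm_group_hash[b] = grp;
}

/* Walk the bucket until the group for this lower device is found. */
struct brcm_group *brcm_find_group(struct fake_dev *real_dev)
{
    struct brcm_group *grp = brcm_group_hash[brcm_hashfn(real_dev->ifindex)];

    for (; grp != NULL; grp = grp->next)
        if (grp->real_dev == real_dev)
            return grp;
    return NULL;
}

/* Receive direction: lower device + port from the header -> brcm device. */
struct fake_dev *find_brcm_dev(struct fake_dev *real_dev, unsigned int brcm_port)
{
    struct brcm_group *grp = brcm_find_group(real_dev);

    if (grp != NULL && brcm_port < BRCM_GROUP_ARRAY_LEN)
        return grp->brcm_devices_array[brcm_port];
    return NULL;
}
```

The sketch adds a bounds check on the port index that the article's version leaves to its callers.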
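Traffic statistics
In the data structures, brcm_rx_stats in the brcm device's private data brcm_dev_info records receive statistics, while dev->_tx[index] records transmit statistics.
The receive function brcm_skb_recv() increments the statistics for each successfully received packet: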
- rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,
- smp_processor_id());
- rx_stats->rx_packets++;
- rx_stats->rx_bytes += skb->len;
The transmit function brcm_dev_hard_start_xmit() increments the corresponding statistics for each packet it sends:
- if (likely(ret == NET_XMIT_SUCCESS)) {
- txq->tx_packets++;
- txq->tx_bytes += len;
- } else
- txq->tx_dropped++;
brcm_netdev_ops->ndo_get_stats(), that is, brcm_dev_get_stats(), aggregates the transmit and receive statistics recorded in the brcm device into the generic net_device_stats format. Commands such as ifconfig display the resulting net_device_stats.
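brcm_dev_get_stats() itself is not listed in the article, so the following is a hedged user-space sketch of the aggregation it describes: per-CPU receive counters and per-queue transmit counters are summed into one generic record. `struct fake_txq`, `struct fake_net_device_stats` and `brcm_fold_stats()` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Same fields as the article's brcm_rx_stats; one instance per CPU. */
struct brcm_rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long multicast;
    unsigned long rx_errors;
};

/* Hypothetical stand-in for the counters kept in each dev->_tx[i]. */
struct fake_txq {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

/* Hypothetical, reduced stand-in for struct net_device_stats. */
struct fake_net_device_stats {
    unsigned long rx_packets, rx_bytes, multicast, rx_errors;
    unsigned long tx_packets, tx_bytes, tx_dropped;
};

/* Sum every per-CPU and per-queue counter into one generic record. */
void brcm_fold_stats(const struct brcm_rx_stats *percpu, int ncpus,
                     const struct fake_txq *txq, int nqueues,
                     struct fake_net_device_stats *out)
{
    int i;

    assert(out != NULL);
    memset(out, 0, sizeof(*out));
    for (i = 0; i < ncpus; i++) {
        out->rx_packets += percpu[i].rx_packets;
        out->rx_bytes   += percpu[i].rx_bytes;
        out->multicast  += percpu[i].multicast;
        out->rx_errors  += percpu[i].rx_errors;
    }
    for (i = 0; i < nqueues; i++) {
        out->tx_packets += txq[i].tx_packets;
        out->tx_bytes   += txq[i].tx_bytes;
        out->tx_dropped += txq[i].tx_dropped;
    }
}
```

Keeping one counter set per CPU lets the receive path update its counters without sharing them with other CPUs; the cost is paid only when someone asks for the totals.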
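Complete receive and transmit functions
With these pieces in place, the receive function brcm_skb_recv() can be completed. The handling of the brcm_hdr header can be skipped over: since this is a made-up protocol, its fields mean whatever you define them to mean: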
- int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
- struct packet_type *ptype, struct net_device *orig_dev)
- {
- struct brcm_hdr *bhdr;
- struct brcm_rx_stats *rx_stats;
- int op, brcm_port;
-
- skb = skb_share_check(skb, GFP_ATOMIC);
- if(!skb)
- goto err_free;
- bhdr = (struct brcm_hdr *)skb->data;
- op = bhdr->brcm_tag.brcm_53242_op;
- brcm_port = bhdr->brcm_tag.brcm_53242_src_portid - 23;
-
- rcu_read_lock();
-
-
- if (op != BRCM_RCV_OP || brcm_port < 1
- || brcm_port > 27)
- goto err_unlock;
-
- skb->dev = find_brcm_dev(dev, brcm_port);
- if (!skb->dev) {
- goto err_unlock;
- }
-
- rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,
- smp_processor_id());
- rx_stats->rx_packets++;
- rx_stats->rx_bytes += skb->len;
- skb_pull_rcsum(skb, BRCM_HLEN);
-
- switch (skb->pkt_type) {
- case PACKET_BROADCAST:
-
- break;
-
- case PACKET_MULTICAST:
- rx_stats->multicast++;
- break;
-
- case PACKET_OTHERHOST:
-
-
-
-
- if (!compare_ether_addr(eth_hdr(skb)->h_dest,
- skb->dev->dev_addr))
- skb->pkt_type = PACKET_HOST;
- break;
- default:
- break;
- }
-
-
- skb->protocol = bhdr->brcm_encapsulated_proto;
-
-
- skb = brcm_check_reorder_header(skb);
- if (!skb) {
- rx_stats->rx_errors++;
- goto err_unlock;
- }
-
- netif_rx(skb);
- rcu_read_unlock();
- return NET_RX_SUCCESS;
-
- err_unlock:
- rcu_read_unlock();
-
- err_free:
- kfree_skb(skb);
- return NET_RX_DROP;
- }
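The receive function subtracts 23 from the port id found in the tag and accepts only results from 1 to 27, and the transmit function adds 23 back. A small sketch of that mapping, with hypothetical helper and macro names, makes the accepted range explicit:

```c
#include <assert.h>

/* Values taken from the article's code; the macro names are hypothetical. */
#define BRCM_PORT_OFFSET 23
#define BRCM_PORT_MIN     1
#define BRCM_PORT_MAX    27

/* Port id as carried in the tag -> brcm_port, or -1 if outside 1..27. */
int brcm_port_from_wire(unsigned int src_portid)
{
    int port = (int)src_portid - BRCM_PORT_OFFSET;

    if (port < BRCM_PORT_MIN || port > BRCM_PORT_MAX)
        return -1;
    return port;
}

/* brcm_port -> port id to place in the tag on transmit. */
unsigned int brcm_port_to_wire(int brcm_port)
{
    assert(brcm_port >= BRCM_PORT_MIN && brcm_port <= BRCM_PORT_MAX);
    return (unsigned int)(brcm_port + BRCM_PORT_OFFSET);
}
```

With these values, tag port ids 24 through 50 map to brcm ports 1 through 27, and everything else is dropped.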
Likewise, the transmit function brcm_dev_hard_start_xmit() can now be completed. Again, the handling of brcm_hdr can be skipped over:
- static netdev_tx_t brcm_dev_hard_start_xmit(struct sk_buff *skb,
- struct net_device *dev)
- {
- int i = skb_get_queue_mapping(skb);
- struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
- struct brcm_ethhdr *beth = (struct brcm_ethhdr *)(skb->data);
- unsigned int len;
- u16 brcm_port;
- int ret;
-
-
-
-
-
-
- if (beth->h_brcm_proto != htons(ETH_P_BRCM)){
-
- brcm_t brcm_tag;
- brcm_port = brcm_dev_info(dev)->brcm_port;
- if (brcm_port == BRCM_ANY_PORT) {
- brcm_tag.brcm_op_53242 = 0;
- brcm_tag.brcm_tq_53242 = 0;
- brcm_tag.brcm_te_53242 = 0;
- brcm_tag.brcm_dst_53242 = 0;
- }else {
- brcm_tag.brcm_op_53242 = BRCM_SND_OP;
- brcm_tag.brcm_tq_53242 = 0;
- brcm_tag.brcm_te_53242 = 0;
- brcm_tag.brcm_dst_53242 = brcm_port + 23;
- }
-
- skb = brcm_put_tag(skb, *(u32 *)(&brcm_tag));
- if (!skb) {
- txq->tx_dropped++;
- return NETDEV_TX_OK;
- }
- }
-
- skb_set_dev(skb, brcm_dev_info(dev)->real_dev);
- len = skb->len;
- ret = dev_queue_xmit(skb);
-
- if (likely(ret == NET_XMIT_SUCCESS)) {
- txq->tx_packets++;
- txq->tx_bytes += len;
- } else
- txq->tx_dropped++;
-
- return ret;
- }
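brcm_put_tag() is called above but not listed. The sketch below shows one way such an insertion could work on a flat frame buffer, under an assumed layout: the 2-byte brcm EtherType and the 4-byte tag are inserted right after the two MAC addresses, so the original EtherType ends up as the encapsulated protocol. The value used for `ETH_P_BRCM` and the function name `brcm_put_tag_buf()` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN   6
#define BRCM_HLEN  6        /* assumed: 2-byte EtherType + 4-byte tag inserted */
#define ETH_P_BRCM 0x8874   /* assumed value; the article does not give it */

/*
 * Insert the brcm EtherType and tag after the destination and source MACs.
 * frame holds len valid bytes in a buffer of cap bytes.
 * Returns the new frame length, or 0 if the frame is too short or
 * the buffer has no room for the extra BRCM_HLEN bytes.
 */
size_t brcm_put_tag_buf(uint8_t *frame, size_t len, size_t cap, uint32_t tag)
{
    const size_t off = 2 * ETH_ALEN;  /* first byte after the MAC addresses */

    assert(frame != NULL);
    if (len < off + 2 || len + BRCM_HLEN > cap)
        return 0;

    /* Shift the original EtherType and payload back to open a 6-byte gap. */
    memmove(frame + off + BRCM_HLEN, frame + off, len - off);

    /* Fill the gap in network byte order. */
    frame[off]     = (uint8_t)(ETH_P_BRCM >> 8);
    frame[off + 1] = (uint8_t)(ETH_P_BRCM & 0xff);
    frame[off + 2] = (uint8_t)(tag >> 24);
    frame[off + 3] = (uint8_t)(tag >> 16);
    frame[off + 4] = (uint8_t)(tag >> 8);
    frame[off + 5] = (uint8_t)(tag);
    return len + BRCM_HLEN;
}
```

The kernel version works on an sk_buff and has to make headroom first; the byte shuffling it would need under this layout is the same.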
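Registering the device
Receive joins the protocol stack through dev_add_pack(); earlier articles in this series covered how ptype_base is used to peel packets. What remains is to hook in transmit. The function itself is already written. Since transmit is a chain of nested calls driven by dev, the transmit function must be hooked in when the device is registered, as one of the device's methods.
Creating a device always involves an XXX_setup() initializer. Most devices call ether_setup() and then adjust the result as needed. Here is brcm_setup():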
- void brcm_setup(struct net_device *dev)
- {
- ether_setup(dev);
-
- dev->priv_flags |= IFF_BRCM_TAG;
- dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
- dev->tx_queue_len = 0;
-
- dev->netdev_ops = &brcm_netdev_ops;
- dev->destructor = free_netdev;
- dev->ethtool_ops = &brcm_ethtool_ops;
-
- memset(dev->broadcast, 0, ETH_ALEN);
- }
The transmit function lives in brcm_netdev_ops, and every layer's device is invoked the same way: dev->netdev_ops->ndo_start_xmit().
- static const struct net_device_ops brcm_netdev_ops = {
- .ndo_change_mtu = brcm_dev_change_mtu,
- .ndo_init = brcm_dev_init,
- .ndo_uninit = brcm_dev_uninit,
- .ndo_open = brcm_dev_open,
- .ndo_stop = brcm_dev_stop,
- .ndo_start_xmit = brcm_dev_hard_start_xmit,
- .ndo_validate_addr = eth_validate_addr,
- .ndo_set_mac_address = brcm_dev_set_mac_address,
- .ndo_set_rx_mode = brcm_dev_set_rx_mode,
- .ndo_set_multicast_list = brcm_dev_set_rx_mode,
- .ndo_change_rx_flags = brcm_dev_change_rx_flags,
-
- .ndo_neigh_setup = brcm_dev_neigh_setup,
- .ndo_get_stats = brcm_dev_get_stats,
- };
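The way an ops table drives the nested transmit calls can be shown with a small user-space model. All names here (`fake_dev`, `fake_skb`, `fake_dev_queue_xmit()`, and so on) are hypothetical; the model keeps only the two things the article relies on, namely a per-device transmit method and a real_dev pointer to the device below.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev;

/* Hypothetical stand-in for sk_buff: just the device "baton" and a hop count. */
struct fake_skb {
    struct fake_dev *dev;
    int hops;
};

/* Reduced ops table with only the transmit method. */
struct fake_ops {
    int (*ndo_start_xmit)(struct fake_skb *skb, struct fake_dev *dev);
};

struct fake_dev {
    const char *name;
    const struct fake_ops *ops;
    struct fake_dev *real_dev;   /* device one layer down, NULL for the NIC */
    unsigned long tx_packets;
};

/* Stand-in for dev_queue_xmit(): call whatever device the skb now points at. */
int fake_dev_queue_xmit(struct fake_skb *skb)
{
    assert(skb != NULL && skb->dev != NULL);
    skb->hops++;
    return skb->dev->ops->ndo_start_xmit(skb, skb->dev);
}

/* Stacked device (vlan or brcm in the article): hand the skb to real_dev. */
int stacked_xmit(struct fake_skb *skb, struct fake_dev *dev)
{
    int ret;

    skb->dev = dev->real_dev;        /* pass the baton down one layer */
    ret = fake_dev_queue_xmit(skb);  /* nested call into the lower device */
    if (ret == 0)
        dev->tx_packets++;
    return ret;
}

/* Bottom device: pretend the frame went out on the wire. */
int nic_xmit(struct fake_skb *skb, struct fake_dev *dev)
{
    (void)skb;
    dev->tx_packets++;
    return 0;
}

const struct fake_ops stacked_ops = { stacked_xmit };
const struct fake_ops nic_ops = { nic_xmit };
```

Stacking a vlan-like device on a brcm-like device on a NIC and sending one skb through the top visits all three transmit methods in turn, with skb->dev ending up at the NIC.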
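Device initialization should happen when the device is created, that is, when it is registered with the network stack in register_brcm_dev(). Registering a new device requires its lower-layer device real_dev and the brcm_port that uniquely identifies the brcm device. The function first makes sure the device has not already been created, then allocates the new device new_dev with alloc_netdev_mq() and sets its attributes, in particular its private data brcm_dev_info(new_dev). It then finds or allocates the device's group in brcm_group_hash, performs the actual registration with register_netdevice(), and finally records the new device in the group.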
- static int register_brcm_dev(struct net_device *real_dev, u16 brcm_port)
- {
- struct net_device *new_dev;
- struct net *net = dev_net(real_dev);
- struct brcm_group *grp;
- char name[IFNAMSIZ];
- int err;
-
- if(brcm_port >= BRCM_PORT_MASK)
- return -ERANGE;
-
-
- if (find_brcm_dev(real_dev, brcm_port) != NULL)
- return -EEXIST;
-
- snprintf(name, IFNAMSIZ, "brcm%i", brcm_port);
- new_dev = alloc_netdev_mq(sizeof(struct brcm_dev_info), name,
- brcm_setup, 1);
- if (new_dev == NULL)
- return -ENOBUFS;
- new_dev->real_num_tx_queues = real_dev->real_num_tx_queues;
- dev_net_set(new_dev, net);
- new_dev->mtu = real_dev->mtu;
-
- brcm_dev_info(new_dev)->brcm_port = brcm_port;
- brcm_dev_info(new_dev)->real_dev = real_dev;
- brcm_dev_info(new_dev)->dent = NULL;
-
-
- grp = brcm_find_group(real_dev);
- if (!grp)
- grp = brcm_group_alloc(real_dev);
- if (!grp) {
- err = -ENOBUFS;
- goto out_free_newdev;
- }
-
- err = register_netdevice(new_dev);
- if (err < 0)
- goto out_free_newdev;
-
-
- dev_hold(real_dev);
- brcm_group_set_device(grp, brcm_port, new_dev);
-
- return 0;
-
- out_free_newdev:
- free_netdev(new_dev);
- return err;
- }
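ioctl
Since there can be multiple brcm devices with no fixed mapping to lower-layer devices, their creation should be under manual control, so devices are created by the user through ioctl. Only two operations are provided for brcm: add and delete. A new device is always tied to a lower-layer device, so the lower-layer device must be named explicitly when adding. It is looked up in the network namespace with __dev_get_by_name(), after which register_brcm_dev() completes the registration. Deletion is direct: it simply calls unregister_brcm_dev().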
- static int brcm_ioctl_handler(struct net *net, void __user *arg)
- {
- int err;
- struct brcm_ioctl_args args;
- struct net_device *dev = NULL;
-
- if (copy_from_user(&args, arg, sizeof(struct brcm_ioctl_args)))
- return -EFAULT;
-
-
- args.device1[23] = 0;
- args.u.device2[23] = 0;
-
- rtnl_lock();
-
- switch (args.cmd) {
- case ADD_BRCM_CMD:
- case DEL_BRCM_CMD:
- err = -ENODEV;
- dev = __dev_get_by_name(net, args.device1);
- if (!dev)
- goto out;
-
- err = -EINVAL;
- if (args.cmd != ADD_BRCM_CMD && !is_brcm_dev(dev))
- goto out;
- }
-
- switch (args.cmd) {
- case ADD_BRCM_CMD:
- err = -EPERM;
- if (!capable(CAP_NET_ADMIN))
- break;
- err = register_brcm_dev(dev, args.u.port);
- break;
-
- case DEL_BRCM_CMD:
- err = -EPERM;
- if (!capable(CAP_NET_ADMIN))
- break;
- unregister_brcm_dev(dev, NULL);
- err = 0;
- break;
-
- default:
- err = -EOPNOTSUPP;
- break;
- }
- out:
- rtnl_unlock();
- return err;
- }
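On the user-space side, the caller has to fill in a brcm_ioctl_args before issuing the ioctl. The article does not list that structure or the command values, so the layout and constants below are assumptions inferred from the fields the handler touches (cmd, device1, u.device2, u.port, and the [23] index). `brcm_fill_add_args()` is a hypothetical helper, and the ioctl request number is not shown because the article does not give it.

```c
#include <assert.h>
#include <string.h>

/* Assumed command values; the article only names the constants. */
enum { ADD_BRCM_CMD = 0, DEL_BRCM_CMD = 1 };

/* 24 matches the handler forcing a terminator at index 23. */
#define BRCM_IFNAME_LEN 24

/* Assumed layout, inferred from the fields used by brcm_ioctl_handler(). */
struct brcm_ioctl_args {
    int cmd;
    char device1[BRCM_IFNAME_LEN];       /* lower device (add) or brcm device (delete) */
    union {
        char device2[BRCM_IFNAME_LEN];
        int port;                        /* brcm_port for ADD_BRCM_CMD */
    } u;
};

/*
 * Prepare an "add" request for lower device `lower` and port `port`.
 * Returns 0 on success, -1 if the name is missing or does not fit.
 */
int brcm_fill_add_args(struct brcm_ioctl_args *args, const char *lower, int port)
{
    assert(args != NULL);
    if (lower == NULL || strlen(lower) >= BRCM_IFNAME_LEN)
        return -1;

    memset(args, 0, sizeof(*args));
    args->cmd = ADD_BRCM_CMD;
    memcpy(args->device1, lower, strlen(lower) + 1);  /* includes the terminator */
    args->u.port = port;
    return 0;
}
```

Rejecting over-long names in user space is a convenience only; the kernel handler still terminates both name buffers itself because it cannot trust what user space passes in.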
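This is the main body of the brcm protocol module. It is not complete yet: the next article continues adding the brcm protocol and fills in details such as the proc file system and the notifier mechanism, along with writing the kernel Makefile and, of course, testing the protocol. The source code is packaged and uploaded with the next article.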