news 2026/6/14 3:28:59

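YOLOv26 Improvements | Neck Innovations | An original HFPN that uses the Hierarchical Feature Fusion Block (HFFB) to fuse multi-level features and improve YOLOv26 (exclusive original design)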


张小明

Front-end developer


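1. Introduction

This article presents a new improvement to the YOLOv26 neck built on the Hierarchical Feature Fusion Block (HFFB); I call the resulting neck HFPN. The module fuses local, global, and intermediate features to assist YOLOv26 detection. I designed three variants, which handle large, small, and standard-sized objects differently, so you can pick the one that suits your dataset. The design described here is my own.

Column link: the YOLOv26 improvement column covers Conv, attention mechanisms, backbone, loss functions, optimizers, post-processing, and other improvements.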


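Contents

1. Introduction

2. Principle

3. Core Code

4. How to Add the Module

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

4.5 Modification 5

5. Training

5.1 YAML file

5.2 Training code

5.3 Training screenshot

6. Summary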


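2. Principle

Official paper: click here to open the official paper

Official code: click here to open the official code


HiFuse uses a three-branch hierarchical multi-scale feature fusion network that combines the strengths of CNNs and Transformers:

Local branch (Local Feature Block): extracts local features with a 3×3 depthwise separable convolution.

Global branch (Global Feature Block): based on Swin Transformer, it uses window-based multi-head self-attention (W-MSA) to extract global information.

Adaptive hierarchical feature fusion block (HFF Block): fuses local and global features from different levels. It consists of:

Spatial attention (SA): strengthens local detail.

Channel attention (CA): emphasizes specific semantic features.

Inverted residual MLP (IRMLP): guards against vanishing gradients and improves information flow.

Shortcut connection: improves the quality of the fused features.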
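As a rough illustration of the two attention paths listed above, the sketch below applies spatial attention to a local feature map and channel attention to a global one. It is a minimal sketch under my own assumptions, not the HiFuse implementation: the class names `SpatialAttention` and `ChannelAttention`, the reduction ratio of 4, and the tensor sizes are hypothetical choices made for the example.

```python
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Weights each spatial position from channel-wise max and mean maps (hypothetical helper)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_map, _ = torch.max(x, dim=1, keepdim=True)  # [N, 1, H, W]
        avg_map = torch.mean(x, dim=1, keepdim=True)    # [N, 1, H, W]
        weights = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return weights * x  # one weight per position, broadcast over channels


class ChannelAttention(nn.Module):
    """Weights each channel from globally pooled descriptors (hypothetical helper)."""

    def __init__(self, channels: int, reduction: int = 4):  # reduction=4 is an assumed value
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_desc = torch.amax(x, dim=(2, 3), keepdim=True)  # [N, C, 1, 1]
        avg_desc = torch.mean(x, dim=(2, 3), keepdim=True)  # [N, C, 1, 1]
        weights = torch.sigmoid(self.mlp(max_desc) + self.mlp(avg_desc))
        return weights * x  # one weight per channel, broadcast over positions


# Assumed toy inputs: batch 2, 16 channels, 20x20 feature maps.
local_feat = torch.randn(2, 16, 20, 20)
global_feat = torch.randn(2, 16, 20, 20)
local_out = SpatialAttention()(local_feat)
global_out = ChannelAttention(16)(global_feat)
print(local_out.shape, global_out.shape)
```

Both paths only rescale their input by a sigmoid weight, so the output shape always matches the input shape, which is what lets the fusion block concatenate the results afterwards.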


3. Core Code

See Section 4 for how to use the core code.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Conv(nn.Module):
    def __init__(self, inp_dim, out_dim, kernel_size=3, stride=1, bn=False, relu=True, bias=True, group=1):
        super(Conv, self).__init__()
        self.inp_dim = inp_dim
        # NOTE: `group` is accepted but not forwarded to nn.Conv2d, so every Conv here is a dense convolution.
        self.conv = nn.Conv2d(inp_dim, out_dim, kernel_size, stride, padding=(kernel_size-1)//2, bias=bias)
        self.relu = None
        self.bn = None
        if relu:
            self.relu = nn.ReLU(inplace=True)
        if bn:
            self.bn = nn.BatchNorm2d(out_dim)

    def forward(self, x):
        assert x.size()[1] == self.inp_dim, "{} {}".format(x.size()[1], self.inp_dim)
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


def drop_path_f(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    """
    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path_f(x, self.drop_prob, self.training)


##### Local Feature Block Component #####
class LayerNorm(nn.Module):
    r""" LayerNorm that supports two data formats: channels_last (default) or channels_first.
    The ordering of the dimensions in the inputs. channels_last corresponds to inputs with
    shape (batch_size, height, width, channels) while channels_first corresponds to inputs
    with shape (batch_size, channels, height, width).
    """
    def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape), requires_grad=True)
        self.bias = nn.Parameter(torch.zeros(normalized_shape), requires_grad=True)
        self.eps = eps
        self.data_format = data_format
        if self.data_format not in ["channels_last", "channels_first"]:
            raise ValueError(f"not support data format '{self.data_format}'")
        self.normalized_shape = (normalized_shape,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.data_format == "channels_last":
            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        elif self.data_format == "channels_first":
            # [batch_size, channels, height, width]
            mean = x.mean(1, keepdim=True)
            var = (x - mean).pow(2).mean(1, keepdim=True)
            x = (x - mean) / torch.sqrt(var + self.eps)
            x = self.weight[:, None, None] * x + self.bias[:, None, None]
            return x


class Local_block(nn.Module):
    r""" Local Feature Block. There are two equivalent implementations:
    (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)
    (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
    We use (2) as we find it slightly faster in PyTorch

    Args:
        dim (int): Number of input channels.
        drop_rate (float): Stochastic depth rate. Default: 0.0
    """
    def __init__(self, dim, drop_rate=0.):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depthwise conv
        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_last")
        self.pwconv = nn.Linear(dim, dim)  # pointwise/1x1 convs, implemented with linear layers
        self.act = nn.GELU()
        self.drop_path = DropPath(drop_rate) if drop_rate > 0. else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # [N, C, H, W] -> [N, H, W, C]
        x = self.norm(x)
        x = self.pwconv(x)
        x = self.act(x)
        x = x.permute(0, 3, 1, 2)  # [N, H, W, C] -> [N, C, H, W]
        x = shortcut + self.drop_path(x)
        return x


class IRMLP(nn.Module):
    def __init__(self, inp_dim, out_dim):
        super(IRMLP, self).__init__()
        self.conv1 = Conv(inp_dim, inp_dim, 3, relu=False, bias=False, group=inp_dim)
        self.conv2 = Conv(inp_dim, inp_dim * 4, 1, relu=False, bias=False)
        self.conv3 = Conv(inp_dim * 4, out_dim, 1, relu=False, bias=False, bn=True)
        self.gelu = nn.GELU()
        self.bn1 = nn.BatchNorm2d(inp_dim)

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.gelu(out)
        out += residual
        out = self.bn1(out)
        out = self.conv2(out)
        out = self.gelu(out)
        out = self.conv3(out)
        return out


# Hierarchical Feature Fusion Block
class HFFB(nn.Module):
    def __init__(self, ch_1, r_2=16, drop_rate=0.):
        super(HFFB, self).__init__()
        ch_2 = ch_1
        ch_int = ch_1
        ch_out = ch_2
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.se = nn.Sequential(
            nn.Conv2d(ch_2, ch_2 // r_2, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(ch_2 // r_2, ch_2, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()
        self.spatial = Conv(2, 1, 7, bn=True, relu=False, bias=False)
        self.W_l = Conv(ch_1, ch_int, 1, bn=True, relu=False)
        self.W_g = Conv(ch_2, ch_int, 1, bn=True, relu=False)
        self.Avg = nn.AvgPool2d(2, stride=2)
        self.Updim = Conv(ch_int//2, ch_int, 1, bn=True, relu=True)
        self.norm1 = LayerNorm(ch_int * 3, eps=1e-6, data_format="channels_first")
        self.norm2 = LayerNorm(ch_int * 2, eps=1e-6, data_format="channels_first")
        self.norm3 = LayerNorm(ch_1 + ch_2 + ch_int, eps=1e-6, data_format="channels_first")
        self.W3 = Conv(ch_int * 3, ch_int, 1, bn=True, relu=False)
        self.W = Conv(ch_int * 2, ch_int, 1, bn=True, relu=False)
        self.gelu = nn.GELU()
        self.residual = IRMLP(ch_1 + ch_2 + ch_int, ch_out)
        self.drop_path = DropPath(drop_rate) if drop_rate > 0. else nn.Identity()

    def forward(self, x):
        # x is a list of three feature maps: local, global, and the higher-resolution intermediate feature
        l, g, f = x
        W_local = self.W_l(l)   # local feature from Local Feature Block
        W_global = self.W_g(g)  # global feature from Global Feature Block
        if f is not None:
            W_f = self.Updim(f)
            W_f = self.Avg(W_f)
            shortcut = W_f
            X_f = torch.cat([W_f, W_local, W_global], 1)
            X_f = self.norm1(X_f)
            X_f = self.W3(X_f)
            X_f = self.gelu(X_f)
        else:
            shortcut = 0
            X_f = torch.cat([W_local, W_global], 1)
            X_f = self.norm2(X_f)
            X_f = self.W(X_f)
            X_f = self.gelu(X_f)

        # spatial attention for ConvNeXt branch
        l_jump = l
        max_result, _ = torch.max(l, dim=1, keepdim=True)
        avg_result = torch.mean(l, dim=1, keepdim=True)
        result = torch.cat([max_result, avg_result], 1)
        l = self.spatial(result)
        l = self.sigmoid(l) * l_jump

        # channel attention for transformer branch
        g_jump = g
        max_result = self.maxpool(g)
        avg_result = self.avgpool(g)
        max_out = self.se(max_result)
        avg_out = self.se(avg_result)
        g = self.sigmoid(max_out + avg_out) * g_jump

        fuse = torch.cat([g, l, X_f], 1)
        fuse = self.norm3(fuse)
        fuse = self.residual(fuse)
        fuse = shortcut + self.drop_path(fuse)
        return fuse
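To make the tensor bookkeeping in `HFFB.forward` easier to follow, the sketch below traces shapes only, without PyTorch. `trace_hffb_shapes` is a hypothetical helper written for this note, and the input sizes in the example are assumed rather than taken from a real run. It encodes what the code above appears to require when `f` is given: `l` and `g` share channels and resolution, while `f` has half the channels and twice the resolution.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def trace_hffb_shapes(l: Shape, g: Shape, f: Shape) -> Dict[str, Shape]:
    """Trace (C, H, W) through the HFFB forward pass (hypothetical helper, shapes only)."""
    ch_1, h, w = l
    if g != l:
        raise ValueError(f"l and g must match, got {l} and {g}")
    if f != (ch_1 // 2, h * 2, w * 2):
        raise ValueError(f"f must be (C/2, 2H, 2W) relative to l, got {f}")

    shapes: Dict[str, Shape] = {}
    shapes["W_local"] = (ch_1, h, w)       # W_l: 1x1 conv, channels unchanged
    shapes["W_global"] = (ch_1, h, w)      # W_g: 1x1 conv, channels unchanged
    shapes["W_f"] = (ch_1, h, w)           # Updim doubles channels, Avg halves H and W
    shapes["X_f_cat"] = (3 * ch_1, h, w)   # cat([W_f, W_local, W_global]) -> norm1
    shapes["X_f"] = (ch_1, h, w)           # W3: 1x1 conv back to ch_1
    shapes["l_att"] = (ch_1, h, w)         # spatial attention rescales l
    shapes["g_att"] = (ch_1, h, w)         # channel attention rescales g
    shapes["fuse_cat"] = (3 * ch_1, h, w)  # cat([g, l, X_f]) -> norm3
    shapes["fuse"] = (ch_1, h, w)          # IRMLP maps 3*ch_1 -> ch_1, then adds shortcut W_f
    return shapes


# Assumed example: 64-channel 80x80 maps for l and g, 32-channel 160x160 map for f.
result = trace_hffb_shapes((64, 80, 80), (64, 80, 80), (32, 160, 160))
for name, shape in result.items():
    print(f"{name:9s} {shape}")
```

The output of the block has the same shape as `l`, so it can replace that layer as an input to the detection head without further changes.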

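4. How to Add the Module

If you are unsure how to do the steps below, or would rather skip them, you can contact the author for the column's source code with all project files already added, ready to train.

4.1 Modification 1

First, create a directory named 'Addmodules' under the ultralytics/nn folder.


4.2 Modification 2

Then create a new .py file inside the Addmodules folder and paste in the core code from Section 3.


4.3 Modification 3

Next, create a new file named '__init__.py' in the same directory and import our file inside it (the original post shows this step in a screenshot).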
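Steps 4.1 to 4.3 can be rehearsed in a throwaway directory before touching a real checkout. The sketch below is only an illustration under stated assumptions: the package name `nn_demo` stands in for `ultralytics/nn`, the module file name `hffb.py` is my own choice (the article does not name the file), and the class body is a stub rather than the core code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the project root; in practice this is your ultralytics checkout.
root = Path(tempfile.mkdtemp())
pkg = root / "nn_demo" / "Addmodules"

# Step 4.1: create the Addmodules directory.
pkg.mkdir(parents=True)
(root / "nn_demo" / "__init__.py").write_text("")

# Step 4.2: a module file that would hold the core code (file name assumed, stub body).
(pkg / "hffb.py").write_text("class HFFB:\n    pass\n")

# Step 4.3: __init__.py re-exports everything from that module file.
(pkg / "__init__.py").write_text("from .hffb import *\n")

# Check that the class is reachable through the package, as tasks.py would need.
sys.path.insert(0, str(root))
addmodules = importlib.import_module("nn_demo.Addmodules")
print(addmodules.HFFB)
```

With this layout, a star import from the package exposes `HFFB` directly, so any later module placed in the same folder only needs one more line in `__init__.py`.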


4.4 Modification 4

Third, open 'ultralytics/nn/tasks.py' and import and register our module. This only needs to be done once; if you use my other improvements, you do not need to repeat it.



4.5 Modification 5

Inside the parse_model function in 'ultralytics/nn/tasks.py' (at roughly line 1500 or later), add the following branch.

        # ------------------------------HFFB--------------------------------
        elif m is HFFB:
            c2 = ch[f[0]]
            args = [c2, *args]
        # ------------------------------HFFB--------------------------------
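The branch above can be exercised in isolation to see what it hands to the module constructor. The sketch below is a simplified stand-in, not the real `parse_model`: `resolve_hffb_args` is a hypothetical helper, the `HFFB` class is a stub, and the channel values in `ch` are assumed numbers for three earlier layers.

```python
class HFFB:
    """Stub standing in for the real module; only records its first argument (assumption)."""

    def __init__(self, ch_1, r_2=16, drop_rate=0.0):
        self.ch_1 = ch_1


def resolve_hffb_args(m, f, args, ch):
    """Mimic the HFFB branch: take the channel count of the first `from` layer and prepend it."""
    if m is HFFB:
        c2 = ch[f[0]]        # output channels equal the channels of the first input (l)
        args = [c2, *args]   # the YAML passes no args, so the result is [c2]
        return c2, args
    raise NotImplementedError("only the HFFB branch is sketched here")


# Assumed output channels of earlier layers, keyed by layer index.
ch = {1: 32, 3: 64, 16: 64}
f = [16, 3, 1]  # the `from` list of the small-object variant
c2, args = resolve_hffb_args(HFFB, f, [], ch)
module = HFFB(*args)
print(c2, args, module.ch_1)
```

Because only `ch[f[0]]` is read, the order of the `from` list matters: the first index must be the layer whose channel count the block should output.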


5. Training


5.1 YAML file

Training info: YOLO26-Neck-HFFB summary: 291 layers, 3,068,352 parameters, 3,068,352 gradients, 13.0 GFLOPs

# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

# Ultralytics YOLO26 object detection model with P3/8 - P5/32 outputs
# Model docs: https://docs.ultralytics.com/models/yolo26
# Task docs: https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
end2end: True # whether to use end-to-end mode
reg_max: 1 # DFL bins
scales: # model compound scaling constants, i.e. 'model=yolo26n.yaml' will call yolo26.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 260 layers, 2,572,280 parameters, 2,572,280 gradients, 6.1 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 260 layers, 10,009,784 parameters, 10,009,784 gradients, 22.8 GFLOPs
  m: [0.50, 1.00, 512] # summary: 280 layers, 21,896,248 parameters, 21,896,248 gradients, 75.4 GFLOPs
  l: [1.00, 1.00, 512] # summary: 392 layers, 26,299,704 parameters, 26,299,704 gradients, 93.8 GFLOPs
  x: [1.00, 1.50, 512] # summary: 392 layers, 58,993,368 parameters, 58,993,368 gradients, 209.5 GFLOPs

# YOLO26n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5, 3, True]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO26n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, True]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, True]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, True]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, C3k2, [1024, True, 0.5, True]] # 22 (P5/32-large)

  # The three groups below target different object sizes, in the order large, medium, small.
  # Uncomment the group you want; only one group can be active at a time. The default is small.
  # - [[22, 10, 19], 1, HFFB, []] # 23 (P5/32-large)
  # - [[16, 19, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)

  # - [[19, 6, 16], 1, HFFB, []] # 23 (P4/16-medium)
  # - [[16, 23, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

  - [[16, 3, 1], 1, HFFB, []] # 23 (P3/8-small)
  - [[23, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
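Before training, it can help to check that each HFFB variant receives inputs the block can accept at a given model scale. The sketch below is a pure-Python check under an assumed width-scaling rule (cap at max_channels, multiply by width, round up to a multiple of 8); `scaled_channels` and `variant_is_consistent` are hypothetical helpers, and the nominal channels and strides are read off the YAML above. Treat the result as a hint to verify against a real model build, not as a guarantee.

```python
import math

# Nominal output channels and strides of the layers used by the three HFFB variants.
LAYERS = {1: (128, 4), 3: (256, 8), 6: (512, 16), 10: (1024, 32),
          16: (256, 8), 19: (512, 16), 22: (1024, 32)}
# `from` lists of the three variants, in the order (l, g, f).
VARIANTS = {"large": (22, 10, 19), "medium": (19, 6, 16), "small": (16, 3, 1)}
# (width, max_channels) per scale, taken from the `scales` table.
SCALES = {"n": (0.25, 1024), "s": (0.50, 1024), "m": (1.00, 512),
          "l": (1.00, 512), "x": (1.50, 512)}


def scaled_channels(c: int, width: float, max_channels: int) -> int:
    """Assumed scaling rule: cap, multiply by width, round up to a multiple of 8."""
    return math.ceil(min(c, max_channels) * width / 8) * 8


def variant_is_consistent(variant: str, scale: str) -> bool:
    """True if l and g match and f has half the channels and half the stride of l."""
    width, max_channels = SCALES[scale]
    (lc, ls), (gc, gs), (fc, fs) = (
        (scaled_channels(LAYERS[i][0], width, max_channels), LAYERS[i][1])
        for i in VARIANTS[variant]
    )
    return lc == gc and ls == gs and fc * 2 == lc and fs * 2 == ls


for scale in SCALES:
    print(scale, {v: variant_is_consistent(v, scale) for v in VARIANTS})
```

Under this assumed rule, all three variants pass at the n and s scales, while the large variant fails the check at m, l, and x: the 512-channel cap gives layers 19 and 22 the same width, so `f` no longer has half the channels of `l`.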

5.2 Training code

Create a .py file, paste in the code below, and set your own file paths to run it.

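import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('path to the model config file, i.e. where you saved the YAML from Section 5.1')
    # To switch model scale, change the YAML file name above: yolo26s.yaml selects the 26s scale.
    # Likewise, if an improved YAML is named yolo26-XXX.yaml, use yolo26l-XXX.yaml to select the l scale
    # (change only the name passed to YOLO() above, not the config file itself).
    # model.load('yolo26n.pt')  # whether to load pretrained weights; not recommended for research, since it makes accuracy gains hard to achieve
    model.train(
        data=r"path to the dataset file",
        # For other tasks, change `task` in 'ultralytics/cfg/default.yaml' to detect, segment, classify, or pose
        cache=False,
        imgsz=640,
        epochs=20,
        single_cls=False,  # whether this is single-class detection
        batch=16,
        close_mosaic=0,
        workers=0,
        device='0',
        optimizer='MuSGD',  # using SGD/MuSGD
        # resume=,  # path to last.pt goes here
        amp=True,  # disable amp if the training loss becomes NaN
        project='runs/train',
        name='exp',
    )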

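5.3 Training screenshot


6. Summary

That concludes this article. I recommend my YOLOv26 improvement column: it is newly opened and currently has an average quality score of 98. I will keep reproducing papers from the latest top conferences and will also supplement some of the older improvements. If this article helped you, subscribe to the column and follow for further updates.

Column link: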
