
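# tile_broadcast_one_blk

> catlass is the operator template library of CANN. It provides template samples for high-performance matrix multiplication and related fused operators on NPU. Project address: https://gitcode.com/cann/catlass
>
> Disclosure: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.

Code location: `catlass/epilogue/tile/tile_broadcast_one_blk.hpp` (the header included in the usage example)

## Overview

The `tile_broadcast_one_blk` module implements the one-block broadcast operation of the epilogue stage. It broadcasts a single element in UB (Unified Buffer) across an entire block (32 B), and is commonly used to broadcast a scalar scale or zero point before it takes part in vector computation.

## API list

| API | Style | Description |
| --- | --- | --- |
| `TileBroadcastOneBlk` | Non-TLA | Uses `AscendC::Brcb` with `BrcbRepeatParams` |
| `TileBroadcastOneBlkTla` | TLA | TLA version; offsets are computed via `tensor.layout()(tensor.coord())` |

## Usage examples

### TileBroadcastOneBlk (non-TLA)

```cpp
#include "catlass/epilogue/tile/tile_broadcast_one_blk.hpp"

using namespace Catlass::Epilogue::Tile;

using ComputeType = Gemm::GemmType<half, layout::RowMajor>;
constexpr uint32_t COMPUTE_LENGTH = 256;
using BroadcastOp = TileBroadcastOneBlk<Arch::AtlasA2, ComputeType, COMPUTE_LENGTH>;

// UB tensors; allocation and initialization are omitted here.
AscendC::LocalTensor<half> ubOut, ubIn;

BroadcastOp broadcastOp;
broadcastOp(ubOut, ubIn);
```

### TileBroadcastOneBlkTla (TLA)

```cpp
constexpr uint32_t COMPUTE_LENGTH = 256;

auto layoutOut = tla::MakeLayout<half, layout::RowMajor>(COMPUTE_LENGTH, 32);
auto layoutIn = tla::MakeLayout<half, layout::VectorLayout>(COMPUTE_LENGTH, 1);

// UB buffers; allocation and initialization are omitted here.
AscendC::LocalTensor<half> ubOutData, ubInData;
auto ubOut = tla::MakeTensor(ubOutData, layoutOut, Arch::PositionUB{});
auto ubIn = tla::MakeTensor(ubInData, layoutIn, Arch::PositionUB{});

TileBroadcastOneBlkTla<Arch::AtlasA2, half, COMPUTE_LENGTH> op;
op(ubOut, ubIn);
```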
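
To make the broadcast semantics described in the overview concrete, the following host-side C++ sketch replicates each input element until it fills one 32 B block. It is only an illustration under stated assumptions: `BroadcastOneBlkReference` is a hypothetical helper that does not exist in catlass or AscendC, it runs on the CPU rather than on UB, and it says nothing about how `COMPUTE_LENGTH` or `BrcbRepeatParams` are handled inside the real templates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference, NOT part of catlass or AscendC.
// Every input element is copied until it fills one block of `bytesPerBlock`
// bytes, so the output holds in.size() consecutive blocks.
template <typename T>
std::vector<T> BroadcastOneBlkReference(const std::vector<T>& in,
                                        std::size_t bytesPerBlock = 32)
{
    // A block must hold a whole number of elements.
    assert(bytesPerBlock >= sizeof(T) && bytesPerBlock % sizeof(T) == 0);
    const std::size_t elemsPerBlock = bytesPerBlock / sizeof(T);

    std::vector<T> out(in.size() * elemsPerBlock);
    for (std::size_t i = 0; i < in.size(); ++i) {
        for (std::size_t j = 0; j < elemsPerBlock; ++j) {
            // Block i contains only copies of input element i.
            out[i * elemsPerBlock + j] = in[i];
        }
    }
    return out;
}
```

With a 2-byte element type (for example `std::uint16_t` as a stand-in for `half`), each input element expands to 16 copies; with a 4-byte `float` it expands to 8. A reference like this could be used to check device output in a unit test, assuming the device result is copied back to host memory first.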