tesseract++ 0.0.1
N-dimensional tensor library for embedded systems
detail::KernelGemm< T, Bits, Arch > Struct Template Reference

#include <kernel_gemm.h>

Public Types

using K = Microkernel< T, Bits, Arch >
 
using Helpers = KernelHelpers< T, Bits, Arch >
 

Static Public Member Functions

static void gemm (const T *A, my_size_t M, my_size_t K_len, my_size_t strideA, const T *B, my_size_t N, my_size_t strideB, T *C, my_size_t strideC) noexcept
 Register-blocked GEMM: C[M,N] = A[M,K] × B[K,N].
 

Static Public Attributes

static constexpr my_size_t simdWidth = K::simdWidth
 
static constexpr my_size_t MR = K::MR
 Tile height: rows of C computed per micro-kernel invocation.
 
static constexpr my_size_t NR_VECS = K::NR_VECS
 Number of SIMD vectors spanning one tile row. The tile width is NR = NR_VECS × simdWidth.
 
static constexpr my_size_t NR = K::NR
 Tile width: columns of C computed per wide micro-kernel invocation.
 

Member Typedef Documentation

◆ Helpers

template<typename T , my_size_t Bits, typename Arch >
using detail::KernelGemm< T, Bits, Arch >::Helpers = KernelHelpers<T, Bits, Arch>

◆ K

template<typename T , my_size_t Bits, typename Arch >
using detail::KernelGemm< T, Bits, Arch >::K = Microkernel<T, Bits, Arch>

Member Function Documentation

◆ gemm()

template<typename T , my_size_t Bits, typename Arch >
static void detail::KernelGemm< T, Bits, Arch >::gemm ( const T *  A,
my_size_t  M,
my_size_t  K_len,
my_size_t  strideA,
const T *  B,
my_size_t  N,
my_size_t  strideB,
T *  C,
my_size_t  strideC 
)
inline static noexcept

Register-blocked GEMM: C[M,N] = A[M,K] × B[K,N].

Top-level dispatcher that tiles the output matrix and routes each tile to the appropriate micro-kernel based on its position.
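To illustrate the tiling idea only, the following is a minimal sketch of how an M × N output can be walked in MR × NR steps and each tile classified by position. The constants `kSimdWidth`, `kNrVecs`, `kMR`, and `kNR`, the `TileCounts` struct, and `classify_tiles` are hypothetical names chosen for this example; the real tile shape comes from `Microkernel< T, Bits, Arch >`, and the actual routing rules inside `gemm` may differ.

```cpp
#include <cstddef>

// Hypothetical tile shape for illustration; not the library's real values.
constexpr std::size_t kSimdWidth = 4;                 // lanes per SIMD vector (assumed)
constexpr std::size_t kNrVecs = 2;                    // SIMD vectors across the tile (assumed)
constexpr std::size_t kMR = 4;                        // tile height (assumed)
constexpr std::size_t kNR = kNrVecs * kSimdWidth;     // tile width: NR = NR_VECS x simdWidth

struct TileCounts {
    std::size_t full = 0;     // tiles covering a complete kMR x kNR block
    std::size_t partial = 0;  // tiles clipped by the bottom or right edge
};

// Walk an M x N output in kMR x kNR steps and classify each tile by position.
inline TileCounts classify_tiles(std::size_t M, std::size_t N) {
    TileCounts counts;
    for (std::size_t i = 0; i < M; i += kMR) {
        for (std::size_t j = 0; j < N; j += kNR) {
            const bool full_rows = (M - i) >= kMR;
            const bool full_cols = (N - j) >= kNR;
            if (full_rows && full_cols) {
                ++counts.full;     // interior tile: a full-size kernel could handle it
            } else {
                ++counts.partial;  // edge tile: needs a narrower or masked path
            }
        }
    }
    return counts;
}
```

In a dispatcher of this shape, the two branches would call different micro-kernels instead of incrementing counters.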

All pointers address raw physical memory with padded row strides. The caller must ensure the favorable layout (see Memory Layout Requirements).

Parameters
A: Pointer to first element of A
M: Number of rows of A (and C)
K_len: Contraction length (columns of A, rows of B)
strideA: Physical row stride of A (≥ K_len, includes padding)
B: Pointer to first element of B
N: Number of columns of B (and C)
strideB: Physical row stride of B (≥ N, includes padding)
C: Pointer to first element of C (output; need not be zero-initialized)
strideC: Physical row stride of C (≥ N, includes padding)
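The stride parameters can be easier to follow with a scalar reference that uses the same argument order. The sketch below is not the library's implementation: `reference_gemm` is a hypothetical helper, `std::size_t` stands in for `my_size_t`, and it assumes row-major storage and that C is overwritten rather than accumulated into (which the "need not be zero-initialized" note suggests but does not confirm).

```cpp
#include <cstddef>

// Hypothetical scalar reference with the same parameter order as gemm().
// Assumptions: row-major storage; C is overwritten, not accumulated into.
template <typename T>
void reference_gemm(const T* A, std::size_t M, std::size_t K_len, std::size_t strideA,
                    const T* B, std::size_t N, std::size_t strideB,
                    T* C, std::size_t strideC) noexcept {
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            T acc = T{};
            for (std::size_t k = 0; k < K_len; ++k) {
                // Element (i, k) of A lives at i * strideA + k, skipping row padding;
                // element (k, j) of B lives at k * strideB + j.
                acc += A[i * strideA + k] * B[k * strideB + j];
            }
            // Only the first N entries of each C row are written; padding is untouched.
            C[i * strideC + j] = acc;
        }
    }
}
```

A reference like this can serve as a correctness oracle when testing a tiled kernel on padded buffers, since it touches only the logical M × N region of C.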

Member Data Documentation

◆ MR

template<typename T , my_size_t Bits, typename Arch >
constexpr my_size_t detail::KernelGemm< T, Bits, Arch >::MR = K::MR
static constexpr

Tile height: rows of C computed per micro-kernel invocation.

◆ NR

template<typename T , my_size_t Bits, typename Arch >
constexpr my_size_t detail::KernelGemm< T, Bits, Arch >::NR = K::NR
static constexpr

Tile width: columns of C computed per wide micro-kernel invocation.

◆ NR_VECS

template<typename T , my_size_t Bits, typename Arch >
constexpr my_size_t detail::KernelGemm< T, Bits, Arch >::NR_VECS = K::NR_VECS
static constexpr

Number of SIMD vectors spanning one tile row. The tile width is NR = NR_VECS × simdWidth.

◆ simdWidth

template<typename T , my_size_t Bits, typename Arch >
constexpr my_size_t detail::KernelGemm< T, Bits, Arch >::simdWidth = K::simdWidth
static constexpr

The documentation for this struct was generated from the following file: