mindformers.modules
MindFormers Transformers API.
mindformers.modules.layers
A Dropout implementation using P.DropoutGenMask and P.DropoutDoMask for parallel training. |
|
Fixed Sparse Attention Layer. |
|
A self-defined layer norm operation using reduce sum and reduce mean. |
|
The densely connected layer. |
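
As a rough illustration of the layer norm entry above, the following pure-Python sketch normalizes a vector using only sum and mean reductions. The function name `layer_norm_sketch`, the `eps` default, and the list-based inputs are assumptions made for this example; they are not the MindFormers API.

```python
import math


def layer_norm_sketch(x, gamma=None, beta=None, eps=1e-5):
    """Normalize a 1-D list of floats to zero mean and (nearly) unit variance.

    Hypothetical helper for illustration only; not part of mindformers.
    """
    n = len(x)
    # Reduce sum followed by division gives the mean.
    mean = sum(x) / n
    # Mean of squared deviations gives the (biased) variance.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Optional per-element scale and shift; defaults leave the output unchanged.
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


normalized = layer_norm_sketch([1.0, 2.0, 3.0, 4.0])
```

Expressing the statistics as plain reductions, rather than a single fused operator, is one way to keep each step easy to shard across devices, which is presumably why the table entry mentions them explicitly.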
mindformers.modules.transformer
Get the lower triangular matrix from the input mask. |
|
The parallel config of |
|
A multilayer perceptron with two linear layers, with dropout applied to the final output. |
|
The configuration of MoE (Mixture of Experts). |
|
An implementation of multi-head attention from the paper "Attention Is All You Need". Given a query vector with the source length and key and value vectors with the target length, attention is performed as follows. |
|
OpParallelConfig for setting data parallel and model parallel. |
|
Transformer module including encoder and decoder. |
|
Transformer Decoder module made of multiple stacked TransformerDecoderLayer layers, each including multi-head self-attention, cross-attention, and a feedforward layer. |
|
Transformer Decoder Layer. |
|
Transformer Encoder module made of multiple stacked TransformerEncoderLayer layers, each including multi-head self-attention and a feedforward layer. |
|
Transformer Encoder Layer. |
|
TransformerOpParallelConfig for setting parallel configuration, such as the data parallel and model parallel. |
|
TransformerRecomputeConfig for setting recompute attributes for encoder/decoder layers. |
|
The embedding lookup table from the 0-th dim of the parameter table. |
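
To make the "lower triangular matrix from the input mask" entry in the table above more concrete, here is a minimal pure-Python sketch for a single sequence. It assumes the input mask holds 1 for real tokens and 0 for padding, and that the resulting mask combines the padding pattern with a causal (lower-triangular) pattern. The function name `causal_attention_mask` is hypothetical and is not taken from MindFormers.

```python
def causal_attention_mask(input_mask):
    """Build a square attention mask from a 1-D padding mask.

    Entry [i][j] is 1 only when both positions are real tokens and
    j <= i, so each position attends to itself and earlier positions.
    Hypothetical helper for illustration only; not part of mindformers.
    """
    n = len(input_mask)
    return [
        [input_mask[i] * input_mask[j] if j <= i else 0 for j in range(n)]
        for i in range(n)
    ]


# Two real tokens followed by one padding token.
mask = causal_attention_mask([1, 1, 0])
```

In this sketch the row and column belonging to a padding position are entirely zero, so padded tokens neither attend nor are attended to.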