Mila
Deep Neural Network Library
Implementation of the Gaussian Error Linear Unit (GELU) activation function.
#include <memory>
#include <vector>
#include <string>
#include <iostream>
#include <sstream>
#include <type_traits>
import Serialization.ModelArchive;
import Compute.CudaMemoryResource;
import Compute.CpuMemoryResource;
import Dnn.Modules.Gelu:Config;
import Dnn.Module;
import Compute.ComputeDevice;
import Compute.OperationAttributes;
import Compute.DeviceType;
import Compute.Precision;
import Compute.DeviceContext;
import Compute.OperationRegistry;
import Dnn.TensorTraits;
import Compute.UnaryOperation;
import Dnn.Tensor;
import Compute.OperationBase;
import Compute.MemoryResource;
Classes

class Mila::Dnn::Gelu< TDeviceType, TDataType >
    Gaussian Error Linear Unit (GELU) activation function module.
Namespaces

namespace Mila
namespace Mila::Dnn
Typedefs

template<typename TDataType = float>
using Mila::Dnn::CpuGelu = Gelu< DeviceType::Cpu, TDataType >
    Type alias for CPU-specific GELU module.

template<typename TDataType = float>
using Mila::Dnn::CudaGelu = Gelu< DeviceType::Cuda, TDataType >
    Type alias for CUDA-specific GELU module.
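A minimal usage sketch of the aliases, assuming they are exported from a primary module named Dnn.Modules.Gelu (only its :Config partition is imported above). Constructor arguments and the forward-call API are not shown on this page, so only the compile-time device selection is illustrated:

    import Dnn.Modules.Gelu;   // assumed primary module name; not confirmed by this page

    // The aliases fix the device template parameter and leave the data type selectable.
    using HostGelu   = Mila::Dnn::CpuGelu<float>;    // expands to Gelu< DeviceType::Cpu, float >
    using DeviceGelu = Mila::Dnn::CudaGelu<float>;   // expands to Gelu< DeviceType::Cuda, float >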
Detailed Description

Implementation of the Gaussian Error Linear Unit (GELU) activation function.
This module implements the GELU activation function as described in: "Gaussian Error Linear Units (GELUs)" by Hendrycks and Gimpel (2016). https://arxiv.org/abs/1606.08415
GELU activation has become a standard component in transformer architectures like BERT, GPT, and their derivatives, often replacing traditional ReLU activations.
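For reference, the paper defines GELU(x) = x * Phi(x), where Phi is the standard normal CDF, and gives a tanh-based approximation that is widely used in practice. The standalone sketch below implements that approximation; it is illustrative only and is not the CPU or CUDA kernel registered for this module:

    #include <cmath>

    // GELU tanh approximation from Hendrycks & Gimpel (2016):
    // GELU(x) ~= 0.5 * x * (1 + tanh( sqrt(2/pi) * (x + 0.044715 * x^3) ))
    inline float gelu_tanh_approx( float x )
    {
        constexpr float k = 0.7978845608028654f;   // sqrt(2 / pi)
        return 0.5f * x * ( 1.0f + std::tanh( k * ( x + 0.044715f * x * x * x ) ) );
    }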