aiopti_adam.h File Reference

Base optimizer implementation of the Adam optimizer More...


Data Structures

struct  aiopti_adam
 General Adam optimizer struct. More...
 
struct  aiopti_adam_momentums
 Struct for the momentum tensors of an Adam optimizer. More...
 

Typedefs

typedef struct aiopti_adam aiopti_adam_t
 New data type name for code reduction.
 
typedef struct aiopti_adam_momentums aiopti_adam_momentums_t
 New data type name for code reduction.
 

Functions

aiopti_t * aiopti_adam (aiopti_adam_t *opti)
 Initialize the given Adam optimizer. More...
 
uint32_t aiopti_adam_sizeof_optimem (aiopti_t *self, const aitensor_t *params)
 Calculates the required memory for the optimization step. More...
 
void aiopti_adam_init_optimem (aiopti_t *self, const aitensor_t *params, const aitensor_t *gradients, void *optimem)
 Initialization of the optimization memory buffer. More...
 
void aiopti_adam_zero_gradients (aiopti_t *self, aitensor_t *gradients)
 Set the gradients to zero. More...
 
void aiopti_adam_update_params (aiopti_t *self, aitensor_t *params, const aitensor_t *gradients, void *optimem)
 Update the given parameter tensor with respect to the gradients. More...
 
void aiopti_adam_print_specs (const aiopti_t *self)
 Print the optimizer specification. More...
 

Variables

const aicore_optitype_t * aiopti_adam_type
 Adam optimizer type. More...
 

Detailed Description

Base optimizer implementation of the Adam optimizer

Version
2.2.0

This is an "abstract", data-type-independent implementation. To use the optimizer, use one of the provided implementations for a specific hardware and data type (for example from aiopti_adam_default.h) or set the required math functions yourself.
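As a usage sketch (not part of this header): the f32 default implementation from aiopti_adam_default.h can be configured and constructed as shown below. The struct name aiopti_adam_f32_t, its field names (learning_rate, beta1, beta2, eps) and the umbrella header aifes.h are assumptions based on the default implementation; check the headers of your AIfES version for the exact names.

    #include "aifes.h"   /* assumed umbrella header providing the AIfES types */

    aiopti_t *create_adam_optimizer(void)
    {
        /* Must outlive the training run (the returned aiopti_t points into it). */
        static aiopti_adam_f32_t adam_opti;

        adam_opti.learning_rate = 0.01f;   /* step size lr */
        adam_opti.beta1 = 0.9f;            /* decay rate of the first moment */
        adam_opti.beta2 = 0.999f;          /* decay rate of the second moment */
        adam_opti.eps = 1e-7f;             /* small constant for numerical stability */

        /* The f32 "constructor" sets the data-type specific math functions and
         * returns a pointer to the general optimizer structure (aiopti_adam.base). */
        return aiopti_adam_f32_default(&adam_opti);
    }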

The Adam optimizer is based on SGD and uses adaptive estimates of the first-order and second-order moments of the gradients. It uses the pre-calculated gradients to optimize the given parameters. For every trainable parameter \( p \) and its related gradient \( g \) it calculates

\[ m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t \]

\[ v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g^2_t \]

\[ p_t = p_{t-1} - lr_t \cdot \frac{m_t} {\sqrt{v_t} + \hat{\epsilon}} \]

in every optimization step with the bias-corrected learning rate \( lr_t = lr \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \).
\( lr \) is the learning rate that defines how big the optimization steps should be, and therefore how fast the training will be. \( m \) and \( v \) are the first- and second-order moments related to the parameter and must be stored in the optimization memory for every parameter.
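For illustration only (independent of the AIfES API), one Adam step for a single scalar parameter can be written directly from the formulas above; m, v and the step counter t are the values that the optimizer keeps in its optimization memory and internal state.

    #include <math.h>

    /* One Adam update for a single scalar parameter p with gradient g.
     * m and v hold the first and second moment, t is the 1-based step count. */
    float adam_scalar_step(float p, float g, float *m, float *v, int t,
                           float lr, float beta1, float beta2, float eps)
    {
        *m = beta1 * (*m) + (1.0f - beta1) * g;       /* first moment  m_t */
        *v = beta2 * (*v) + (1.0f - beta2) * g * g;   /* second moment v_t */

        /* bias-corrected learning rate lr_t */
        float lr_t = lr * sqrtf(1.0f - powf(beta2, (float)t))
                        / (1.0f - powf(beta1, (float)t));

        return p - lr_t * (*m) / (sqrtf(*v) + eps);   /* updated parameter p_t */
    }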

Function Documentation

◆ aiopti_adam()

aiopti_t * aiopti_adam ( aiopti_adam_t *  opti )

Initialize the given Adam optimizer.

This function represents the "constructor" of the abstract Adam optimizer. It is not intended to be called directly; instead, use one of the data-type-specific implementations (for example aiopti_adam_f32_default()).

Parameters
*opti    The optimizer to initialize.
Returns
Pointer to the (successfully) initialized general optimizer structure (aiopti_adam.base)

◆ aiopti_adam_init_optimem()

void aiopti_adam_init_optimem ( aiopti_t *  self,
const aitensor_t *  params,
const aitensor_t *  gradients,
void *  optimem 
)

Initialization of the optimization memory buffer.

Implementation of aiopti.init_optimem.

Initialize the first and second moment tensors with zeros:

\[ m_{0,i} \leftarrow 0 \]

\[ v_{0,i} \leftarrow 0 \]

Used math functions:

Parameters
*self    The optimizer
*params    The tensor of trainable parameters
*gradients    The gradients associated to the parameters
*optimem    The optimization memory (containing the first and second moment) associated to the parameters
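A sketch of how the optimization memory could be reserved and initialized for one parameter tensor is given below; the optimizer, parameter and gradient tensors are assumed to come from an already configured model, and on embedded targets a static buffer would typically replace malloc().

    #include <stdint.h>
    #include <stdlib.h>
    #include "aifes.h"   /* assumed umbrella header providing aiopti_t, aitensor_t and this file */

    void setup_adam_optimem(aiopti_t *opti, aitensor_t *params,
                            aitensor_t *gradients, void **optimem_out)
    {
        /* Reserve the memory for the first and second moment tensors ... */
        uint32_t memsize = aiopti_adam_sizeof_optimem(opti, params);
        void *optimem = malloc(memsize);

        /* ... and set m_0 and v_0 to zero. */
        aiopti_adam_init_optimem(opti, params, gradients, optimem);

        *optimem_out = optimem;
    }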

◆ aiopti_adam_print_specs()

void aiopti_adam_print_specs ( const aiopti_t *  self )

Print the optimizer specification.

Parameters
*self    The optimizer to print the specification for

◆ aiopti_adam_sizeof_optimem()

uint32_t aiopti_adam_sizeof_optimem ( aiopti_t *  self,
const aitensor_t *  params 
)

Calculates the required memory for the optimization step.

Implementation of aiopti.sizeof_optimem.

Calculates the size of the memory space that must be reserved. The memory is used for the first and second moment tensors and is calculated by:

2 * (sizeof(aitensor) + sizeof(params.data))
Parameters
*self    The optimizer
*params    The tensor of trainable parameters to calculate the memory for
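For example, for an f32 parameter tensor with 6 elements (such as a 2 x 3 weight matrix), this amounts to 2 * (sizeof(aitensor) + 6 * sizeof(float)) bytes: one tensor header plus one data block each for the first and the second moment.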

◆ aiopti_adam_update_params()

void aiopti_adam_update_params ( aiopti_t *  self,
aitensor_t *  params,
const aitensor_t *  gradients,
void *  optimem 
)

Update the given parameter tensor with respect to the gradients.

Implementation of aiopti.update_params.

Calculate and update the values of the trainable parameters (perform one update step):

\[ m_t \leftarrow \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t \]

\[ v_t \leftarrow \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g^2_t \]

\[ p_t \leftarrow p_{t-1} - lr_t \cdot \frac{m_t} {\sqrt{v_t} + \hat{\epsilon}} \]

\( m \): First moment estimates
\( v \): Second moment estimates
\( p \): Tensor of trainable parameters to update (params)
\( g \): Gradients
\( lr \): Learning rate / Optimization step size
\( \beta_1 \): Exponential decay rate for the first moment estimates
\( \beta_2 \): Exponential decay rate for the second moment estimates
\( \hat{\epsilon} \): Small positive number for numerical stability

Used math functions:

Parameters
*self    The optimizer
*params    The tensor of trainable parameters \( p \) to update
*gradients    The gradients \( g \) associated to the parameters
*optimem    The buffer storing the first and second moments \( m \) and \( v \)
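A sketch of one optimization step using the functions documented in this file is shown below; the forward pass and backpropagation that fill the gradient tensor are performed elsewhere by the model code and are only indicated by a comment.

    /* One Adam training step for a single parameter/gradient pair.
     * Assumes the AIfES headers are included and optimem was prepared with
     * aiopti_adam_sizeof_optimem() / aiopti_adam_init_optimem(). */
    void adam_training_step(aiopti_t *opti, aitensor_t *params,
                            aitensor_t *gradients, void *optimem)
    {
        /* 1. Clear the gradients accumulated in the previous step. */
        aiopti_adam_zero_gradients(opti, gradients);

        /* 2. ... run the forward pass and backpropagation here to fill "gradients" ... */

        /* 3. Apply the Adam update p_t = p_{t-1} - lr_t * m_t / (sqrt(v_t) + eps). */
        aiopti_adam_update_params(opti, params, gradients, optimem);
    }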

◆ aiopti_adam_zero_gradients()

void aiopti_adam_zero_gradients ( aiopti_t *  self,
aitensor_t *  gradients 
)

Set the gradients to zero.

Implementation of aiopti.zero_gradients.

\[ g_{i} \leftarrow 0 \]

Used math functions:

Parameters
*self    The optimizer
*gradients    The gradients to set to zero

Variable Documentation

◆ aiopti_adam_type

extern const aicore_optitype_t * aiopti_adam_type

Adam optimizer type.

Defines the type of the optimizer (for example for type checks and debug prints). See aicore_optitype for more information about the optimizer type.