When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use the averaged parameters sometimes produce significantly better results than evaluations that use the final trained values.
The apply() method adds shadow copies of the trained variables and adds ops that maintain a moving average of each trained variable in its shadow copy. It is used when building the training model. The ops that maintain the moving averages are typically run after each training step. The average() and average_name() methods give access to the shadow variables and their names.
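The division of labor between apply(), the per-step update, and average() can be illustrated with a minimal pure-Python sketch. It assumes parameters are plain floats stored in a dict; `ShadowAverager`, `update_averages`, and the naming scheme in `average_name` are hypothetical stand-ins chosen for illustration, not TensorFlow's implementation or API.

```python
class ShadowAverager:
    """Hypothetical sketch of the shadow-variable pattern (not TensorFlow's code)."""

    def __init__(self, decay):
        self.decay = decay
        self._shadows = {}  # parameter name -> shadow (averaged) value

    def apply(self, params):
        """Create shadow copies of `params` and return a callable that updates them."""
        names = list(params)
        for name in names:
            # In this sketch each shadow starts as a copy of the parameter's current value.
            self._shadows.setdefault(name, params[name])

        def update_averages():
            # Intended to run after each training step.
            for name in names:
                shadow = self._shadows[name]
                self._shadows[name] = shadow - (1.0 - self.decay) * (shadow - params[name])

        return update_averages

    def average(self, name):
        """Return the shadow (averaged) value for a parameter."""
        return self._shadows[name]

    def average_name(self, name):
        """Return the shadow's name; this naming scheme is an illustrative assumption."""
        return name + "/ExponentialMovingAverage"


# Usage: a dict of floats stands in for trainable variables.
params = {"w": 0.0}
ema = ShadowAverager(decay=0.9)
update_averages = ema.apply(params)
for step in range(3):
    params["w"] += 1.0  # stand-in for one training step
    update_averages()
print(ema.average_name("w"), ema.average("w"))
```

After three steps the raw parameter is 3.0 while its shadow is only about 0.56: the shadow lags and smooths the trajectory, which is what makes it useful for evaluation.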
When you run the ops to maintain the moving averages, each shadow variable is updated with the formula:
shadow_variable -= (1 - decay) * (shadow_variable - variable)
This is mathematically equivalent to the classic formula below, but the use of an assign_sub op (the “-=” in the formula) allows concurrent lockless updates to the variables:
shadow_variable = decay * shadow_variable + (1 - decay) * variable
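Expanding the assign_sub form makes the equivalence explicit. Here s stands for shadow_variable, v for variable, and d for decay; these single-letter symbols are shorthand introduced only for this derivation.

```latex
\begin{aligned}
s - (1 - d)(s - v) &= s - (1 - d)\,s + (1 - d)\,v \\
                   &= d\,s + (1 - d)\,v
\end{aligned}
```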
Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.9999, etc.
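A common rule of thumb is that the average is dominated by roughly the last 1/(1 - decay) steps, so 0.999 covers on the order of a thousand steps and 0.9999 on the order of ten thousand. The sketch below, assuming a shadow that starts at 0 while the variable is held at 1, measures how far the shadow has moved after 1000 updates; `tracked_fraction` is a hypothetical helper written for this illustration.

```python
def tracked_fraction(decay, steps):
    """Shadow starts at 0 while the variable is held at 1; return the shadow after `steps` updates."""
    shadow, variable = 0.0, 1.0
    for _ in range(steps):
        # Same update rule as above: shadow -= (1 - decay) * (shadow - variable)
        shadow -= (1.0 - decay) * (shadow - variable)
    return shadow


for decay in (0.999, 0.9999):
    # For this setup the closed form is 1 - decay**steps.
    print(decay, round(tracked_fraction(decay, 1000), 3))
```

With decay 0.999 the shadow has covered about 63% of the gap after 1000 steps, while with 0.9999 it has covered only about 10%: a larger decay gives a smoother average that reacts more slowly to recent changes.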