A Brief Look at the Moving Average Model

This post is based on tf.train.ExponentialMovingAverage(). It briefly introduces a method that can make a model more robust on the test dataset: the moving average model.

When training a model, it is often beneficial to maintain moving averages of the trained parameters. Evaluations that use averaged parameters sometimes produce significantly better results than the final trained values.

The apply() method adds shadow copies of trained variables and adds ops that maintain a moving average of the trained variables in their shadow copies. It is used when building the training model. The ops that maintain moving averages are typically run after each training step. The average() and average_name() methods give access to the shadow variables and their names.
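
To make the shadow-copy idea concrete, here is a minimal pure-Python sketch. It is a toy stand-in, not TensorFlow code: the class name ToyMovingAverage, its update() method, and the dictionary of parameters are all hypothetical names chosen for illustration. It assumes, as described above, that the shadow copy starts out equal to the variable and is nudged toward the variable's current value after each training step.

```python
class ToyMovingAverage:
    """Toy illustration of shadow-variable bookkeeping (hypothetical, not TensorFlow)."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}  # name -> shadow copy of the variable

    def apply(self, variables):
        # Create one shadow copy per variable, initialised to its current value.
        for name, value in variables.items():
            self.shadow[name] = value

    def update(self, variables):
        # Move every shadow copy a small step toward the current variable value.
        for name, value in variables.items():
            self.shadow[name] -= (1 - self.decay) * (self.shadow[name] - value)

    def average(self, name):
        # Read back the averaged value for evaluation.
        return self.shadow[name]


params = {"w": 0.0}
ema = ToyMovingAverage(decay=0.9)
ema.apply(params)

for step in range(1, 4):
    params["w"] = float(step)  # stand-in for one training step changing w
    ema.update(params)         # run the averaging update after the step
```

After the loop, params["w"] holds the latest trained value while ema.average("w") lags behind it; it is this smoothed value that would be used at evaluation time.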

When you run the ops to maintain the moving averages, each shadow variable is updated with the formula:

shadow_variable -= (1 - decay) * (shadow_variable - variable)

This is mathematically equivalent to the classic formula below, but the use of an assign_sub op (the “-=” in the formula) allows concurrent lockless updates to the variables:

shadow_variable = decay * shadow_variable + (1 - decay) * variable
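
The equivalence follows from expanding the subtraction: s - (1 - d) * (s - v) = s - (1 - d) * s + (1 - d) * v = d * s + (1 - d) * v. As a quick numerical check, the sketch below runs both forms side by side on plain Python floats. The function names and sample values are made up for illustration, and the check says nothing about the concurrency behaviour of assign_sub.

```python
def update_assign_sub(shadow, variable, decay):
    # In-place style update: subtract a fraction of the gap to the variable.
    shadow -= (1 - decay) * (shadow - variable)
    return shadow


def update_classic(shadow, variable, decay):
    # Classic exponential moving average: weighted sum of old average and new value.
    return decay * shadow + (1 - decay) * variable


decay = 0.99
values = [0.5, 1.5, -2.0, 3.25, 0.0]  # arbitrary sample "variable" values

a = b = 10.0   # both forms start from the same shadow value
max_gap = 0.0  # largest difference observed between the two forms
for v in values:
    a = update_assign_sub(a, v, decay)
    b = update_classic(b, v, decay)
    max_gap = max(max_gap, abs(a - b))
```

The two sequences should agree up to floating-point rounding, so max_gap stays at or very near zero.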

Reasonable values for decay are close to 1.0, typically in the multiple-nines range: 0.999, 0.9999, etc.
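
One way to build intuition for these values: unrolling the update formula shows that the value seen k steps ago contributes with weight (1 - decay) * decay^k, so the influence of old values shrinks by a factor of decay per step. The sketch below computes how many steps it takes for that influence to halve. The helper names are hypothetical, and the "half-life" framing is just an illustrative way to read the formula, not something taken from the TensorFlow documentation.

```python
import math


def half_life_steps(decay):
    # Number of updates after which an old value's weight has dropped by half:
    # solve decay ** n == 0.5 for n.
    return math.log(0.5) / math.log(decay)


def remaining_weight(decay, steps):
    # Fraction of the starting shadow value still present after `steps` updates.
    return decay ** steps


half_lives = {d: half_life_steps(d) for d in (0.9, 0.999, 0.9999)}
leftover = remaining_weight(0.999, 1000)
```

Under this reading, a decay of 0.9 forgets old values within a handful of steps, whereas 0.999 and 0.9999 average over hundreds and thousands of steps respectively, which is why values close to 1.0 give a much smoother estimate.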
