# Alexander Shekhovtsov presents Explainable Training of Binary Neural Networks

On 2020-12-22 11:00
at Online https://cw.felk.cvut.cz/brute/bbb.php?join=cw_Xptf5FKT1b

There is a high demand for neural networks using low-precision computations or even mostly binary operations, which are much faster and need less energy.

The performance of such binary neural networks on benchmarks like the ImageNet classification challenge steadily improves, while the number of unclear tricks and special ingredients involved in the training procedures grows. Many of these tricks concern how to train with binary activations and/or binary weights by somehow (ab)using backpropagation. Can we instead derive learning methods that are correct in a well-defined sense, so that we know what we are doing? Towards this end, we apply the stochastic relaxation method: each binary entity has a probability of taking a particular state, and the optimization and gradients can then be performed with respect to these continuous probabilities. Somewhat surprisingly, we derived the popular straight-through estimator in a particular form, as well as some of the popular weight update rules. This theoretically grounded approach allowed us to analyze these estimators, understand their limitations, analytically derive useful recommendations for their application in practice, obtain improved estimators, and numerically verify their accuracy.
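For intuition, the stochastic relaxation described above can be sketched in a few lines of numpy. This is an illustrative toy, not the authors' implementation: the probabilities, the identity straight-through gradient, and the derivative-scaled variant shown here are simplified stand-ins for the estimators analyzed in the referenced papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pre-activations of a small layer of stochastic binary units.
a = np.array([-2.0, 0.0, 3.0])

# Stochastic relaxation: each unit takes value +1 with
# probability p = sigmoid(a) and -1 otherwise.
p = sigmoid(a)
b = np.where(rng.random(a.shape) < p, 1.0, -1.0)

# Suppose the loss gradient w.r.t. the binary outputs is g
# (values chosen arbitrarily for the example).
g = np.array([0.5, -1.0, 0.2])

# Identity straight-through estimator: copy the gradient past
# the non-differentiable sampling step unchanged.
grad_identity = g.copy()

# A derivative-scaled variant: scale by the derivative of the
# expected output E[b] = 2*p - 1, which is 2 * p * (1 - p).
grad_derived = g * 2.0 * p * (1.0 - p)

print(b)             # entries are +1 or -1
print(grad_derived)  # vanishes where p saturates near 0 or 1
```

Note how the derivative-scaled gradient is damped for units whose probability saturates; this is the kind of property that becomes visible once the estimator is derived rather than guessed.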

The conceptual message of this research is that it is possible to train binary networks with explainable methods that partially coincide with previous empirical approaches but are free of guesswork and have known properties and limitations, so that one can swap in more accurate methods as needed and improve them further.

Meeting link (CTU login):

https://cw.felk.cvut.cz/brute/bbb.php?join=cw_Xptf5FKT1b

Meeting link (non-CTU): https://cw.felk.cvut.cz/bbb/cw_Xptf5FKT1b

References:

[1] "Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks"

https://arxiv.org/abs/2006.03143

[2] "Reintroducing Straight-Through Estimators as Principled Methods for

Stochastic Binary Networks"

https://openreview.net/forum?id=F8lXvXpZdrL

Web page of this event:

https://docs.google.com/document/d/1dwnWx31cscD4wRKJArqScfGa8zU-4rUKMrvpoxnmiwU/edit#
