There seem to be exploding gradients when training on large data sets or for many epochs