caused by num_computational_layers being wrong when tag layers were placed as
the first layer. These visit functions being wrong also prevented multi-GPU
support from working on such networks.
std::async() since std::async creates new threads with each invocation, which
in turn causes objects with thread_local storage duration to be reconstructed
each time. This is problematic because CUDA context objects for cublas and
cudnn get reconstructed over and over, slowing things down and using more
resources than necessary.
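For context, here is a minimal standalone sketch (plain standard C++, not dlib
code) of why per-call threads are expensive once thread_local state such as
cublas/cudnn handles is involved:

    #include <future>
    #include <iostream>
    #include <thread>

    // Stand-in for an expensive per-thread resource (e.g. a cublas/cudnn handle).
    struct context
    {
        context()  { std::cout << "context constructed\n"; }
        ~context() { std::cout << "context destroyed\n"; }
    };

    void do_work()
    {
        // One instance per thread.  If every call lands on a brand new thread,
        // this is constructed and destroyed on every call.
        thread_local context ctx;
        (void)ctx;
    }

    int main()
    {
        // std::async(std::launch::async, ...) typically spawns a fresh thread per
        // call, so the thread_local context is usually rebuilt three times here.
        for (int i = 0; i < 3; ++i)
            std::async(std::launch::async, do_work).get();

        // A single long-lived worker thread constructs the context only once.
        std::thread worker([]{ for (int i = 0; i < 3; ++i) do_work(); });
        worker.join();
    }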
progress" estimate. I also renamed the get/set functions for the shrink amount
to have a consistent name and use the word "factor" instead of "amount".
object as an input. This allows the solvers to exhibit more complex behavior
that depends on the specific layer. It also removes the learning rate from the
solver's parameter set and pushes it entirely into the core training code.
This also removes the need for the separate "step size", which was previously
multiplied by the output of the solvers.
Most of the code is still the same, and in the core and trainer the step_size
variables have just been renamed to learning_rate. The dnn_trainer's relevant
member functions have also been renamed.
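As a rough illustration of the new solver shape (the names and signature below
are modeled on dlib's built-in sgd solver as described above, but treat the
exact details as assumptions rather than the definitive API), a bare-bones
gradient descent solver now receives both the learning rate and the layer from
the trainer:

    #include <dlib/dnn.h>

    // Sketch of a minimal solver under the new interface: the trainer supplies
    // the current learning rate and the layer, so the solver itself stores no
    // rate and there is no separate "step size" any more.
    class toy_sgd
    {
    public:
        template <typename layer_type>
        const dlib::tensor& operator() (
            const float learning_rate,      // passed in by the training code
            const layer_type& l,            // the layer being updated
            const dlib::tensor& params_grad
        )
        {
            if (v.size() == 0)
            {
                v.copy_size(l.get_layer_params());
                v = 0;
            }
            // Plain gradient descent step.  The returned tensor is what the
            // core training code adds to the layer's parameters.
            v = -learning_rate * dlib::mat(params_grad);
            return v;
        }

    private:
        dlib::resizable_tensor v;
    };

With the renamed trainer members, the rate itself is then managed from the
training code, e.g. something along the lines of trainer.set_learning_rate(0.1)
rather than a solver-level step size.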
The examples have been updated to reflect these API changes. I also cleaned up
the resnet definition and added better downsampling.
skip layers and add_prev style layers. In particular, now in-place layers only
overwrite the gradient information in their child layer if they are operating
in in-place mode. Otherwise, they add their gradients to their child layers.
It should also be noted that it's safe for in-place layers to overwrite
gradients when operating in in-place mode, since in that case their child
layers are inaccessible. This prevents any other layers from trying to add to
the child layer, thereby avoiding the possibility of layer interference. So
the bug this change fixes is that, when not in in-place mode, the child layers
are still accessible, yet in-place layers were *still* overwriting their child
layers' gradients.
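To make the overwrite-versus-accumulate rule concrete, here is a tiny
standalone sketch (plain C++ with vectors standing in for tensors; this is not
dlib's actual layer interface, where the aliasing test is is_same_object()):

    #include <cstddef>
    #include <vector>

    using tensor = std::vector<float>;  // stand-in for a real tensor type

    // In in-place mode, gradient_input and data_grad are the same object, so
    // overwriting is safe: nothing else can reach the child's gradient.  Out
    // of in-place mode, other layers (e.g. skip/add_prev style layers) may
    // also contribute to data_grad, so this layer must add rather than
    // overwrite.
    void backward_sketch(const tensor& gradient_input, tensor& data_grad)
    {
        const bool in_place = (&gradient_input == &data_grad);

        for (std::size_t i = 0; i < data_grad.size(); ++i)
        {
            const float g = gradient_input[i];  // this layer's gradient term
            if (in_place)
                data_grad[i] = g;   // overwrite: only safe in in-place mode
            else
                data_grad[i] += g;  // accumulate: preserves other contributions
        }
    }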
* Define LIB_INSTALL_DIR cache variable, allowing for multilib installations
* Discover BLAS and LAPACK via pkg-config if possible
* Fix incorrect toolchain variables in "dlib/test/makefile"