// The contents of this file are in the public domain. See LICENSE_FOR_EXAMPLE_PROGRAMS.txt
/*
    This is an example illustrating the use of the deep learning tools from the dlib C++
    Library.  I'm assuming you have already read the dnn_introduction_ex.cpp, the
    dnn_introduction2_ex.cpp and the dnn_introduction3_ex.cpp examples.

    In this example program we are going to show how one can train a neural network using
    an unsupervised loss function.  In particular, we will train the ResNet50 model from
    the paper "Deep Residual Learning for Image Recognition" by Kaiming He, Xiangyu Zhang,
    Shaoqing Ren, Jian Sun.

    To train with the unsupervised loss, we will use the self-supervised learning (SSL)
    method called Barlow Twins, introduced in this paper:
    "Barlow Twins: Self-Supervised Learning via Redundancy Reduction" by Jure Zbontar,
    Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny.

    The paper contains a good explanation of how and why this works, but the main idea
    behind the Barlow Twins method is:
        - generate two distorted views of a batch of images: YA, YB
        - feed them to a deep neural network, obtain their representations, and batch
          normalize them: ZA, ZB
        - compute the empirical cross-correlation matrix between both feature
          representations as: C = trans(ZA) * ZB
        - make C as close as possible to the identity matrix

    This removes the redundancy of the feature representations by maximizing the encoded
    information about the images themselves, while minimizing the information about the
    transforms and data augmentations used to obtain the representations.  (A small
    sketch of this computation with dlib matrices is given in a comment further below.)

    The original Barlow Twins paper uses the ImageNet dataset, but in this example we are
    using CIFAR-10, so we will follow the recommendations of this paper, instead:
    "A Note on Connecting Barlow Twins with Negative-Sample-Free Contrastive Learning" by
    Yao-Hung Hubert Tsai, Shaojie Bai, Louis-Philippe Morency, Ruslan Salakhutdinov,
    in which they experiment with Barlow Twins on CIFAR-10 and Tiny ImageNet.  Since
    CIFAR-10 contains relatively small images, we will define a ResNet50 architecture
    that doesn't downsample the input in the first convolutional layer and doesn't have
    a max pooling layer afterwards, as that paper does.

    This example shows how to use the Barlow Twins loss for this common scenario: imagine
    we have collected an image data set, but we don't have enough resources to label all
    of it, just a small fraction.  We can use the Barlow Twins loss on all the available
    training data (both labeled and unlabeled images) to train a feature extractor and
    learn meaningful representations for the data set.  Once the feature extractor is
    trained, we proceed to train a linear multiclass SVM classifier on top of it using
    only the fraction of labeled data.
*/

#include <dlib/cmd_line_parser.h>
#include <dlib/data_io.h>
#include <dlib/dnn.h>
#include <dlib/gui_widgets.h>
#include <dlib/svm.h>

using namespace std;
using namespace dlib;

// A custom definition of ResNet50 with a downsampling factor of 8 instead of 32.
// It is essentially the original ResNet50, but without the max pooling layer and with
// a stride of 1 instead of 2 in the first convolutional layer.
namespace resnet50
{
    using namespace dlib;
    template
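    /*
        A minimal sketch of how the Barlow Twins objective described at the top of this
        file could be computed with dlib matrix operations.  The names za, zb and lambda
        are hypothetical placeholders: za and zb would hold the batch-normalized
        representations of the two distorted views (one row per sample), and lambda is
        the off-diagonal weighting term from the paper:

            // empirical cross-correlation matrix between the two views
            const matrix<float> C = trans(za) * zb / static_cast<float>(za.nr());
            float loss = 0;
            for (long r = 0; r < C.nr(); ++r)
                for (long c = 0; c < C.nc(); ++c)
                    loss += (r == c) ? std::pow(1 - C(r, c), 2)       // pull diagonal towards 1
                                     : lambda * std::pow(C(r, c), 2); // push off-diagonal towards 0

        Recent dlib versions provide a loss_barlow_twins layer that performs this
        computation (and its gradient) for us, so the sketch above is only meant to make
        the description of the loss concrete.
    */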