Based on the Compact Convolutional Transformers example on keras.io created by Sayak Paul.
Model description
As discussed in the Vision Transformers (ViT) paper, a Transformer-based architecture for vision typically requires a larger dataset than usual, as well as a longer pre-training schedule. ImageNet-1k (which has about a million images) is considered to fall under the medium-sized data regime with respect to ViTs. This is primarily because, unlike CNNs, ViTs (or a typical Transformer-based architecture) do not have well-informed inductive biases (such as convolutions for processing images). This raises the question: can we combine the benefits of convolution and the benefits of Transformers in a single network architecture? These benefits include parameter efficiency, and self-attention to process long-range and global dependencies (interactions between different regions in an image).
In the original paper, the authors use AutoAugment to induce stronger regularization. In this example, the standard geometric augmentations (like random cropping and flipping) are used.
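To make the geometric augmentations concrete, here is a minimal NumPy sketch of a random crop followed by a random horizontal flip. It is an illustration only, not the pipeline used to train this model: the helper name `random_crop_and_flip`, the 32x32x3 image size, and the 4-pixel zero padding are all assumptions chosen for the example. In a Keras pipeline, the built-in `RandomCrop` and `RandomFlip` preprocessing layers would likely play this role.

```python
import numpy as np


def random_crop_and_flip(image, crop_size, rng, pad=4):
    """Zero-pad a (H, W, C) image, take a random square crop,
    then mirror it left-to-right with probability 0.5.

    Hypothetical helper for illustration; `pad` is an assumed value.
    """
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    # Largest valid top-left corner for a crop_size x crop_size window.
    max_top = padded.shape[0] - crop_size
    max_left = padded.shape[1] - crop_size
    top = rng.integers(0, max_top + 1)
    left = rng.integers(0, max_left + 1)
    crop = padded[top:top + crop_size, left:left + crop_size, :]
    # Horizontal flip: reverse the width axis.
    if rng.random() < 0.5:
        crop = crop[:, ::-1, :]
    return crop


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))  # assumed image size, stand-in for a real sample
augmented = random_crop_and_flip(image, crop_size=32, rng=rng)
print(augmented.shape)
```

Padding before cropping keeps the output the same size as the input while still shifting the content by a few pixels, which is one common way to apply random cropping to small images.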
The CCT model was trained for 30 epochs. Its plot in the 'Training Metrics' tab shows no signs of overfitting, which suggests that the network can be trained for longer (perhaps with a bit more regularization) to obtain better performance. Performance may be improved further with additional recipes such as a cosine decay learning rate schedule and other data augmentation techniques like AutoAugment, MixUp, or CutMix.
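As a rough illustration of the cosine decay schedule mentioned above, the sketch below computes the learning rate at a given step in plain Python. The function name `cosine_decay_lr`, the base learning rate of `1e-3`, and the per-epoch granularity are assumptions made for the example and are not taken from this model's training configuration. Keras also ships a ready-made `keras.optimizers.schedules.CosineDecay` class that could be passed to an optimizer instead.

```python
import math


def cosine_decay_lr(step, total_steps, base_lr=1e-3, min_lr=0.0):
    """Learning rate at `step`, decaying from base_lr to min_lr along
    half a cosine wave. base_lr and min_lr are assumed example values.
    """
    # Clamp so that steps beyond the schedule stay at min_lr.
    progress = min(max(step, 0), total_steps) / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


# One learning rate per epoch for a 30-epoch run.
schedule = [cosine_decay_lr(epoch, total_steps=30) for epoch in range(30)]
print(schedule[0], schedule[15], schedule[-1])
```

The rate starts at `base_lr`, passes through roughly half of it at the midpoint, and approaches `min_lr` at the end, which tends to allow large updates early and fine adjustments late in training.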