In a nutshell, the paper indicates that a **Deep-Narrow** model architecture is favorable for **downstream** performance compared to other model architectures of similar parameter count.

To quote the paper:
> We generally recommend a DeepNarrow strategy where the model's depth is preferentially increased before considering any other forms of uniform scaling across other dimensions. This is largely due to how much depth influences the Pareto-frontier as shown in earlier sections of the paper. Specifically, a tall small (deep and narrow) model is generally more efficient compared to the base model. Likewise, a tall base model might also generally be more efficient compared to a large model. We generally find that, regardless of size, even if absolute performance might increase as we continue to stack layers, the relative gain of Pareto-efficiency diminishes as we increase the layers, converging at 32 to 36 layers. Finally, we note that our notion of efficiency here relates to any one compute dimension, i.e., params, FLOPs or throughput (speed). We report all three key efficiency metrics (number of params, FLOPs and speed) and leave this decision to the practitioner to decide which compute dimension to consider.
To be more precise, *model depth* is defined as the number of transformer blocks that are stacked sequentially. A sequence of word embeddings is therefore processed sequentially by each transformer block.
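To make this notion of depth concrete, here is a toy sketch that stacks generic PyTorch encoder layers as stand-ins for T5 blocks (the real T5 block differs in details such as relative position biases and RMSNorm); the values `dm=512` and `nl=6` for *Small* are taken from the table further down.

```python
import torch
import torch.nn as nn

# Toy illustration only: a generic encoder layer as a stand-in for a T5 block.
d_model, depth = 512, 6  # "Small": dm=512, nl=6
blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                                dim_feedforward=2048, batch_first=True)
     for _ in range(depth)]
)

x = torch.randn(1, 10, d_model)  # a sequence of 10 word embeddings
for block in blocks:             # the sequence passes through each block in turn
    x = block(x)
print(x.shape)                   # torch.Size([1, 10, 512])
```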
Detailed model architecture
This model checkpoint - **t5-efficient-small-dl4** - is of model type **Small** with the following variations:

- **dl** is **4**

It has **52.13** million parameters and thus requires *ca.* **208.51 MB** of memory in full precision (*fp32*) or **104.25 MB** of memory in half precision (*fp16* or *bf16*).
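As a quick sanity check, these memory figures follow directly from the parameter count: 4 bytes per parameter in fp32 and 2 bytes in fp16/bf16, in decimal megabytes. The sketch below uses the card's rounded 52.13M figure, so the results differ from the stated values only in the last digit.

```python
# 4 bytes/param (fp32), 2 bytes/param (fp16/bf16); 1 MB = 10**6 bytes here.
params = 52.13e6                                # rounded parameter count from above
print(f"fp32: {params * 4 / 1e6:.2f} MB")       # ~208.52 MB (card: 208.51 MB)
print(f"fp16/bf16: {params * 2 / 1e6:.2f} MB")  # ~104.26 MB (card: 104.25 MB)
```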
A summary of the *original* T5 model architectures can be seen here:

| Model | nl (el/dl) | ff    | dm   | kv  | nh  | #Params |
|-------|-----------|-------|------|-----|-----|---------|
| Tiny  | 4/4       | 1024  | 256  | 32  | 4   | 16M     |
| Mini  | 4/4       | 1536  | 384  | 32  | 8   | 31M     |
| Small | 6/6       | 2048  | 512  | 32  | 8   | 60M     |
| Base  | 12/12     | 3072  | 768  | 64  | 12  | 220M    |
| Large | 24/24     | 4096  | 1024 | 64  | 16  | 738M    |
| Xl    | 24/24     | 16384 | 1024 | 128 | 32  | 3B      |
| XXl   | 24/24     | 65536 | 1024 | 128 | 128 | 11B     |
whereas the following abbreviations are used:

| Abbreviation | Definition |
|--------------|------------|
| nl  | Number of transformer blocks (depth) |
| dm  | Dimension of embedding vector (output vector of transformer block) |
| kv  | Dimension of key/value projection matrix |
| nh  | Number of attention heads |
| ff  | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
| el  | Number of transformer blocks in the encoder (encoder depth) |
| dl  | Number of transformer blocks in the decoder (decoder depth) |
| sh  | Signifies that attention heads are shared |
| skv | Signifies that key-value projection matrices are tied |
If a model checkpoint has no specific *el* or *dl*, then both the number of encoder and decoder layers correspond to *nl*.
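For this particular checkpoint, the el/dl split can be read straight off the model config. A minimal sketch, assuming the `transformers` library is installed and the checkpoint can be fetched from the Hub:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/t5-efficient-small-dl4")
print(config.num_layers)          # encoder depth (el): 6, the Small default
print(config.num_decoder_layers)  # decoder depth (dl): 4, the "dl4" variation
```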
**Note**: This model is a **pretrained** checkpoint and has to be fine-tuned for practical usage. The checkpoint was pretrained in English and is therefore only useful for English NLP tasks. You can follow one of the following examples on how to fine-tune the model:

- *Text Classification* - *Note*: You will have to slightly adapt the training example to make it work with an encoder-decoder model.
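Below is a minimal fine-tuning sketch, assuming `transformers` and `torch` are installed; the task prefix, example sentence, label string, and learning rate are placeholders, since T5 casts classification as text-to-text generation:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "google/t5-efficient-small-dl4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

# Text-to-text: the class label is itself a target string (placeholder data).
inputs = tokenizer("sst2 sentence: a charming, well-acted film",
                   return_tensors="pt")
labels = tokenizer("positive", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # placeholder lr
optimizer.zero_grad()
loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
loss.backward()
optimizer.step()
print(float(loss))
```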
Downstream Performance
TODO: Add table if available
Computational Complexity
TODO: Add table if available
More information
We strongly recommend that the reader go carefully through the original paper *Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers* to get a more nuanced understanding of this model checkpoint. As explained in the following issue, checkpoints including the *sh* or *skv* model architecture variations have *not* been ported to Transformers, as they are probably of limited practical use and lack a more detailed description. Those checkpoints are kept here, as they might potentially be ported in the future.