Experimental research model built to test our Reactive Transformer architecture and Attention-Based Memory System.
The Reactive Transformer adds Short-Term Memory (STM) layers, connected to the model via Memory Cross-Attention and updated by a Memory Encoder and Memory Attention.
The Short-Term Memory state is kept between interactions/events (single messages), not between tokens in a sequence - that is the key difference between RxNNs and RNNs.
The goal of the architecture is to process only single messages and keep the conversation history in Short-Term Memory - we believe this is a key requirement
for awareness and AGI. Reprocessing the entire chat history on every interaction is not natural and is not how human awareness works. The Reactive Transformer
architecture is therefore a first step in the transition from language models to awareness models.
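To make the per-interaction cycle concrete, here is a minimal PyTorch sketch of the flow: the decoder reads conversation history from STM via Memory Cross-Attention, the Memory Encoder encodes the finished interaction, and Memory Attention writes it back into STM for the next event. The class names, shapes, residual update rule and near-zero-variance initialisation are illustrative assumptions, not the actual RxT-Alpha implementation.

```python
import torch
import torch.nn as nn

class MemoryCrossAttention(nn.Module):
    """Decoder layers read from Short-Term Memory via cross-attention (illustrative)."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, hidden, stm):
        # queries: current message tokens, keys/values: STM slots
        out, _ = self.attn(hidden, stm, stm)
        return hidden + out

class MemoryAttention(nn.Module):
    """STM slots attend over the encoded interaction to produce the updated state (illustrative)."""
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)  # stand-in for the Memory Norm layer

    def forward(self, stm, encoded_interaction):
        update, _ = self.attn(stm, encoded_interaction, encoded_interaction)
        return self.norm(stm + update)

dim, heads, stm_slots, msg_len = 256, 8, 1024, 64
stm = torch.randn(1, stm_slots, dim) * 0.02         # near-zero-variance initial STM state
message_hidden = torch.randn(1, msg_len, dim)        # decoder hidden states for one message

read = MemoryCrossAttention(dim, heads)
write = MemoryAttention(dim, heads)

# 1. Decoder reads conversation history from STM while generating the answer.
conditioned = read(message_hidden, stm)
# 2. Memory Encoder encodes the finished interaction (query + answer).
encoded_interaction = torch.randn(1, msg_len, dim)   # stand-in for encoder output
# 3. Memory Attention writes the interaction back into STM for the next event.
stm = write(stm, encoded_interaction)
```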
This model (encoder) is the memory encoder of the Reactive Transformer system and is intended for the first training stage - base model pre-training.
During this first stage, the Memory Cross-Attention layers are frozen and the STM is kept in its default initial random state (normal distribution with mean 0 and near-zero variance),
so that it does not disturb basic language modelling training; a minimal sketch of this setup is shown below. We train the decoder and encoder separately with shared embeddings. In the second stage, the models are fine-tuned
to the interaction format (processing single messages). Before the third stage - Memory Reinforcement Learning - they are connected into a bigger ensemble with
additional Memory Norm and Memory Attention layers, and learn how to keep and update memory.
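The stage-one setup can be sketched as follows. The attribute names (`model.stm`, `model.memory_cross_attention`) and the standard-deviation value are hypothetical; the real RxNN modules may be organised differently.

```python
import torch

def prepare_for_pretraining(model, stm_slots: int = 1024, dim: int = 256):
    """Illustrative stage-one setup: fixed near-zero STM, frozen memory pathway."""
    # STM starts as a fixed random state: mean 0, near-zero variance,
    # so reads through the frozen cross-attention contribute almost nothing.
    model.stm = torch.normal(mean=0.0, std=0.02, size=(stm_slots, dim))

    # Freeze every Memory Cross-Attention layer so plain language-modelling
    # gradients never touch the memory pathway during pre-training.
    for layer in model.memory_cross_attention:
        for param in layer.parameters():
            param.requires_grad = False
    return model
```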
RxT-Alpha models intentionally use a very short sequence length and STM size (1024 tokens for Mini), but that is not their "full" context size - it applies only to a single
message. The "full" context is theoretically infinite, limited only by STM size and memory abilities. These sizes are good for research; final models will handle SOTA contexts.
Compared to the decoder, the encoder is a dense model, while the decoder is a Mixture-of-Experts (~6x bigger).
RxT-Alpha Mini Training
Mini models from the RxT-Alpha series are the second PoC for the Reactive Transformer, Attention-Based Memory System and Memory Reinforcement Learning.
The encoder was trained on the Masked Language Modelling task with an additional MLM head model (RxT-Alpha-Mini-MLM), on the Fineweb/Fineweb-edu/Wikipedia datasets, using 20B total tokens, and reached ~62% accuracy.
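For reference, here is a minimal sketch of how an MLM loss and the masked-token accuracy can be computed over an encoder with an MLM head. The masking ratio, special-token handling and function signature are illustrative assumptions, not the actual RxT-Alpha-Mini training code.

```python
import torch
import torch.nn.functional as F

def mlm_step(logits_fn, input_ids, mask_token_id, vocab_size, mask_prob=0.15):
    """Standard masked language modelling step (illustrative sketch)."""
    labels = input_ids.clone()
    # Pick positions to mask and replace them with the [MASK] token.
    masked = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    labels[~masked] = -100                      # only masked positions contribute to the loss
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id

    logits = logits_fn(corrupted)               # (batch, seq, vocab) from encoder + MLM head
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), labels.reshape(-1), ignore_index=-100)

    # MLM accuracy = fraction of masked tokens predicted exactly.
    preds = logits.argmax(dim=-1)
    acc = (preds[masked] == input_ids[masked]).float().mean()
    return loss, acc
```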