The world's first experimental real-time Reactive Language Model (RxLM) trained on limited real-world data (after the synthetic
RxT-Alpha generation). It is based on the revolutionary Reactive Transformer architecture, which processes only single
interactions/messages and moves all the context to Short-Term Memory, managed by the Attention-Based Memory System.
Docs in progress
Model Details
Model Description
The first Reactive Language Model (RxLM) trained on limited real-world datasets, based on the Reactive Transformer (RxT) architecture.
RxLMs have linear computational/inference cost scaling (O(NT)), compared to the quadratic growth of LLMs (O(N²T)),
where N is the number of messages in the conversation and T is the number of tokens in a single interaction. Thanks to that
scaling, they are roughly N times faster and cheaper than LLMs over a whole conversation.
That is not the only advantage: event-driven real-time processing with memory is a lot more natural and human-like
than the data-driven approach of LLMs (processing the full conversation history every time). It's a crucial milestone in the development
of AGI and awareness models.
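To make the scaling comparison above concrete, here is a minimal sketch that counts processed tokens under a deliberately simple cost model. It assumes every interaction has exactly T tokens, counts only tokens read per turn, and ignores the cost of the memory update; the function names are illustrative and not part of any library. Under these assumptions the ratio between the two approaches is (N + 1) / 2, so it grows linearly with N.

```python
def stateless_llm_tokens(num_messages: int, tokens_per_message: int) -> int:
    """Tokens processed when the full history is re-read on every turn."""
    # Turn k re-processes all k messages seen so far: T * (1 + 2 + ... + N).
    return tokens_per_message * num_messages * (num_messages + 1) // 2


def reactive_model_tokens(num_messages: int, tokens_per_message: int) -> int:
    """Tokens processed when each turn only reads the current message."""
    # History lives in fixed-size memory, so each turn costs T tokens.
    return tokens_per_message * num_messages


if __name__ == "__main__":
    for n in (1, 10, 100):
        stateless = stateless_llm_tokens(n, 1024)
        reactive = reactive_model_tokens(n, 1024)
        print(f"N={n}: stateless={stateless} reactive={reactive} "
              f"ratio={stateless / reactive:.1f}")
```

The sketch only illustrates the asymptotic difference; real costs also depend on attention complexity within a sequence, caching, and the memory update itself.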
This is the Supervised version of the model with a "weak" memory system, the result of Supervised Memory System Training (SMST). It is
able to remember information between interactions (without passing it explicitly in the prompt/chat template), but it
has to be refined in the next Memory Reinforcement Learning (MRL) stage for full functionality.
After successful experiments with simple synthetic datasets, we moved to real-world data, but this model still had a limited
amount of English-only data for pre-training: only 10B tokens from Wikipedia and FineWeb-Edu (+2B tokens in later stages).
As a result, it may have limited general knowledge and should be fine-tuned for some specialization. For example, we trained
RxT-Beta-Micro-Supervised-AI on AI/Data Science knowledge-based chats.
Reactive Transformer Architecture
This is an experimental research model made to test our Reactive Transformer architecture and Attention-Based Memory System.
The Reactive Transformer has additional Short-Term Memory layers, connected to the model with Memory Cross-Attention and updated by the Memory Encoder and Memory Attention.
Short-Term Memory state is kept between interactions/events (single messages), not between tokens in a sequence; that is the key difference between RxNNs and RNNs.
The goal of the architecture is to process only single messages and keep the conversation history in Short-Term Memory. We believe that this is the key requirement
for awareness and AGI. Processing the whole chat history on every interaction is not natural, and it is not how human awareness works. The Reactive Transformer
architecture is therefore a first step in the transition from language models to awareness models.
To balance the number of parameters, the decoder is based on a Mixture-of-Experts architecture, while the encoder uses regular
dense feed-forward layers. This model uses the gated self/interlayer version of the memory attention network, with sigmoid residual gates.
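As an illustration of what a sigmoid residual gate does, the sketch below uses one common formulation: the gate value interpolates, element by element, between the previous memory state and the proposed update. This form, the function name, and the use of plain lists are assumptions for illustration only; how this model actually computes its gate logits is not described here.

```python
import math
from typing import List, Sequence


def gated_residual_update(
    stm_prev: Sequence[float],
    update: Sequence[float],
    gate_logits: Sequence[float],
) -> List[float]:
    """Blend old memory and a proposed update with a sigmoid gate.

    Assumed form: new = (1 - g) * old + g * update, with g = sigmoid(logit).
    """
    if not (len(stm_prev) == len(update) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    new_state = []
    for old, new, logit in zip(stm_prev, update, gate_logits):
        gate = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, always in (0, 1)
        new_state.append((1.0 - gate) * old + gate * new)
    return new_state


if __name__ == "__main__":
    # A zero logit gives gate = 0.5, so each slot moves halfway to the update.
    print(gated_residual_update([1.0, 1.0], [3.0, 3.0], [0.0, 0.0]))
```

Because the gate stays strictly between 0 and 1, each memory slot remains a convex combination of its old value and the update, which keeps the state bounded across many interactions.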
This model is still experimental and was pre-trained on a limited corpus of only 10B tokens, so its general knowledge is also limited. It's recommended
to further fine-tune the model for some specialization, like our RxT-Beta-Micro-Supervised-AI, which is trained on AI/Data Science based conversations.
Supervised RxT models are partially functional, intermediate-stage models. It's recommended to refine them with Memory Reinforcement Learning (MRL) and Reactive
Reinforcement Learning from Human Feedback (RxRLHF) to reach the final stage.
Direct Use
It's not recommended to use this model directly without additional specialization training or reinforcement learning stages.
Reactive Transformer models are made for conversational tasks, especially chatbots, or as a stateful base for agentic systems.
Downstream Use
It's recommended to further fine-tune the model for some specialization, because of the limited pre-training data. For example,
we trained RxT-Beta-Micro-Supervised-AI on AI/Data Science based conversations.
Out-of-Scope Use
Reactive Transformer models are natively conversational and made for multi-step tasks. They aren't typical Gen AI and aren't made
for single-step generative tasks (like summarization, dataset generation, etc.). They will work in those scenarios, but it will be a waste
of computational resources (initializing/processing memory when it's not needed). In that case it's better to use a stateless LLM.
Bias, Risks, and Limitations
The model is still experimental, made to test the Reactive Transformer architecture on real-world data after successful experiments with simple synthetic data.
It was pre-trained on only 10B tokens (plus an additional 2B in later stages), so its general knowledge is limited and responses could be inaccurate.
Conversation context is theoretically infinite (the 1024-token limit applies only to a single interaction), but after some number of messages the model will slowly forget
outdated information, which is why it's called Short-Term Memory. It will be extended in upcoming generations with Long-Term Memory for true infinite context.
Recommendations
As mentioned before, supervised models are at an intermediate stage, and it's recommended to continue training them in the reinforcement learning stages. It's also recommended
to fine-tune this base model for some specialization.
import torch
from rxlm.rxt.models import RxTBeta
from rxlm.training.tokenizer import load_tokenizer_from_hf_hub

tokenizer = load_tokenizer_from_hf_hub('ReactiveAI/RxT-Beta-Micro')

model = RxTBeta.from_pretrained('RxT-Beta-Micro-Supervised', tokenizer=tokenizer)
model.share_components()  # currently required to connect embeddings/STM

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

seq_len = 1024

# Memory init - could be used as "system prompt" in LLMs (not recommended in this model, as it wasn't trained with system prompts)
stm_init_state = model.tokenize_full_interaction('System prompt like', 'Initial memory for the model', max_seq_len=seq_len, device=device)
model.init_stm_state(**stm_init_state)

# Helper function
def interaction(query: str):
    tokenized_query = model.tokenize_query(query, max_seq_len=seq_len, device=device)
    for token_id in model.interact(**tokenized_query, max_seq_len=seq_len, temperature=1.0):
        if token_id == -1:
            print('\n', '[Start memory update...]')
        elif token_id == -2:
            print('[Memory updated]')
        else:
            txt_token = model.stringify_token(token_id)
            print(txt_token, end='')

# Process first interaction
interaction('Hello! Who are you?')
# Process follow-up interaction
interaction('Follow-up question?')
Training Details
The stateful, real-time nature of the Reactive Transformer architecture, especially the asynchronous memory update, requires an advanced training pipeline with multiple
supervised and reinforcement learning stages:
Supervised:
Joint Language Models Pre-Training | raw large text corpora
Interaction Supervised Fine-Tuning | single, not connected interactions (query + answer)
Supervised Memory System Training includes 4 steps, which are completed before proceeding to the Reinforcement Learning stages.
Joint Language Models Pre-Training
The decoder was trained together with the encoder and an additional MLM head model, using Joint LM Training (with MLM and autoregressive losses),
on the HuggingFaceFW/fineweb-edu and wikimedia/wikipedia datasets.
Both the encoder and the decoder use a shared embedding layer.
Supervised Fine-Tuning
The RxT-Beta Micro model was fine-tuned to the real-time interaction (sequence) format on our datasets, derived from HuggingFace ones.
Models were fine-tuned using Joint LM Training mode (for memory cross-attention pre-training):
encode data with the encoder and calculate the MLM loss for it
save the encoder layers' results as Short-Term Memory (available to the decoder through memory cross-attention)
process data with the decoder and calculate the autoregressive loss
This training results in a decoder with ~95% accuracy, because it has access to information about all the next tokens through memory cross-attention. In the next training stages it
will use those layers to access data from previous interactions.
Self-Supervised Memory Attention Pre-Training
Memory Attention was pre-trained to combine accumulated Short-Term Memory states with the next interaction's data processed by the
encoder, using a weighted mean (with randomized arbitrary weights) as labels and negative cosine similarity as the loss. Label weights
depend on the inner step:
the first step, when STM is in its initial random normal state, uses 90% of new encoded data
follow-up steps use 50% - step * 5% of new encoded data
each step can have 0-15% random differences in weights
Additionally, random noise is added to both inputs and labels.
This model was trained on six arbitrarily selected steps, using a single epoch on 30% of the ReactiveAI/Real-Chat-SMAT dataset.
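The label construction described in this section can be sketched in plain Python. This is a hedged illustration, not the training code: it assumes steps are 0-indexed (step 0 is the first one), and it assumes the "0-15% random difference" is a signed additive offset on the weight, which the text does not specify. The noise added to inputs and labels is left out, and all function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def new_data_weight(step: int, rng: Optional[random.Random] = None) -> float:
    """Share of newly encoded data in the label for a given inner step."""
    # From the text: 90% on the first step, then 50% - step * 5%.
    # Assumption: steps are 0-indexed, so step 0 is the first step.
    weight = 0.9 if step == 0 else 0.5 - 0.05 * step
    if rng is not None:
        # Assumption: the random difference is a signed additive offset.
        weight += rng.uniform(0.0, 0.15) * rng.choice((-1.0, 1.0))
    return min(1.0, max(0.0, weight))  # keep the weight a valid proportion


def weighted_mean_label(
    stm: Sequence[float], encoded: Sequence[float], weight: float
) -> List[float]:
    """Label = weight * new encoded data + (1 - weight) * accumulated STM."""
    return [weight * e + (1.0 - weight) * s for s, e in zip(stm, encoded)]


def negative_cosine_similarity(
    prediction: Sequence[float], label: Sequence[float]
) -> float:
    """Loss that reaches -1 when prediction and label point the same way."""
    dot = sum(p * l for p, l in zip(prediction, label))
    norm = math.sqrt(sum(p * p for p in prediction)) * math.sqrt(
        sum(l * l for l in label)
    )
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return -dot / norm


if __name__ == "__main__":
    rng = random.Random(0)
    stm, encoded = [0.0, 2.0], [1.0, 0.0]
    for step in range(6):  # six steps, as in the training run described above
        weight = new_data_weight(step, rng)
        print(step, round(weight, 3), weighted_mean_label(stm, encoded, weight))
```

Without the random offset, the six weights under these assumptions are 0.9, 0.45, 0.4, 0.35, 0.3 and 0.25, so later steps lean progressively more on the accumulated memory state.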
Supervised Memory-Aware Training
Finally, with pre-trained/fine-tuned components, in the last supervised stage the model is trained to use previous/accumulated STM
states as the memory cross-attention input, instead of the same sequences as the decoder's input:
the previous (or first) interaction is processed by the encoder and used to update memory
the next interaction is processed by the decoder, using related information from STM
the loss is calculated from the decoder's logits, and gradients propagate through memory attention to the encoder
We used staged memory-aware training with different datasets.
Pre-training is done on raw text corpora and requires only tokenization. In the next stages, the model processes sequences in a simple
Interaction format, which is used instead of complex chat templates: [Q] User's query... [A] Model's answer.
For upcoming reasoning models, it will be extended to [Q] User's query... [T] Reasoning... [A] Model's answer
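The Interaction format above is simple enough to build and split with plain string handling. The helpers below are hypothetical and only show the text layout; the model's own tokenizer may treat `[Q]` and `[A]` as special tokens, so this sketch is not a replacement for the library's tokenization methods.

```python
from typing import Tuple

QUERY_TAG = "[Q]"
ANSWER_TAG = "[A]"


def format_interaction(query: str, answer: str) -> str:
    """Join a query and an answer into one '[Q] ... [A] ...' string."""
    return f"{QUERY_TAG} {query.strip()} {ANSWER_TAG} {answer.strip()}"


def parse_interaction(text: str) -> Tuple[str, str]:
    """Split a '[Q] ... [A] ...' string back into (query, answer)."""
    text = text.strip()
    if not text.startswith(QUERY_TAG) or ANSWER_TAG not in text:
        raise ValueError("expected text in '[Q] ... [A] ...' format")
    # Split on the first answer tag so later '[A]' substrings stay in the answer.
    query, answer = text[len(QUERY_TAG):].split(ANSWER_TAG, 1)
    return query.strip(), answer.strip()


if __name__ == "__main__":
    sample = format_interaction("Hello! Who are you?", "I am a reactive model.")
    print(sample)
    print(parse_interaction(sample))
```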
Training Hyperparameters
Training regime:
bf16 mixed precision (AMP autocast)