An experimental research model built to test our Reactive Transformer architecture and Attention-Based Memory System.
The Reactive Transformer adds Short-Term Memory (STM) layers, connected to the model through Memory Cross-Attention and updated by a Memory Encoder and Memory Attention.
The Short-Term Memory state is kept between interactions/events (single messages), not between tokens in a sequence - that is the key difference between RxNNs and RNNs.
The goal of the architecture is to process only single messages and keep the conversation history in Short-Term Memory - we believe this is a key requirement
for awareness and AGI. Reprocessing the entire chat history on every interaction is not natural, and it is not how human awareness works. The Reactive Transformer
architecture is therefore a first step in the transition from language models to awareness models.
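To make the flow concrete, here is a minimal PyTorch sketch of the interaction cycle (all module and variable names are illustrative assumptions, not the RxNN API): the decoder reads the STM through Memory Cross-Attention while generating the current message, and the STM is then updated from the encoded interaction and carried over to the next message.

import torch
import torch.nn as nn

class ToyMemoryCrossAttention(nn.Module):  # decoder reads STM (illustrative stand-in)
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, hidden, stm):
        out, _ = self.attn(query=hidden, key=stm, value=stm)
        return hidden + out

class ToyMemoryAttention(nn.Module):  # STM is updated from the encoded interaction
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, stm, encoded):
        out, _ = self.attn(query=stm, key=encoded, value=encoded)
        return stm + out

dim, stm_slots = 64, 256
stm = torch.zeros(1, stm_slots, dim)  # persistent state, kept between interactions
read = ToyMemoryCrossAttention(dim)
write = ToyMemoryAttention(dim)

for _ in range(3):  # three consecutive messages in a conversation
    query_hidden = torch.randn(1, 32, dim)  # hidden states of the current message only
    conditioned = read(query_hidden, stm)   # decoder attends over STM while generating
    encoded = torch.randn(1, 48, dim)       # stand-in for Memory Encoder output
    stm = write(stm, encoded)               # STM update persists to the next message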
This model (decoder) is the fine-tuned generator decoder of the Reactive Transformer system, trained to process/generate single interactions (sequences) in real time.
The decoder is based on a Mixture-of-Experts architecture with 12 experts, of which 2 are active.
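As a rough illustration of top-2-of-12 expert routing (a toy sketch under assumed names and sizes, not the RxNN implementation):

import torch
import torch.nn as nn

# Toy MoE feed-forward layer: a router picks 2 of 12 experts per token.
class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=12, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        gates = self.router(x).softmax(dim=-1)
        weights, idx = gates.topk(self.top_k, dim=-1)  # only 2 experts are active per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

out = ToyMoE()(torch.randn(10, 64))  # 10 tokens, 64-dim hidden states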
As in the first stage, during the second stage (Supervised Fine-Tuning) the Memory Cross-Attention layers are frozen and the STM stays in its default initial random
state (a normal distribution with mean 0 and near-zero variance), so that it does not disturb query-answer interaction modeling. We train the decoder and encoder
separately, using the shared embeddings from encoder training. Then, in the third stage - Memory Reinforcement Learning - they will be connected into a bigger
ensemble with additional Memory Norm and Memory Attention layers, and will learn how to keep and update memory.
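A short sketch of this stage setup in PyTorch (the sizes and attribute names are assumptions for illustration, not the actual RxNN training code):

import torch

# STM starts as an almost constant random state: normal distribution with mean 0 and
# near-zero variance, so the frozen Memory Cross-Attention barely affects SFT outputs.
stm_slots, dim = 256, 64  # assumed Micro-scale sizes
stm = torch.normal(mean=0.0, std=1e-4, size=(stm_slots, dim))

# Freezing Memory Cross-Attention during SFT (the attribute name below is hypothetical):
# for p in decoder.memory_cross_attention.parameters():
#     p.requires_grad = False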
RxT-Alpha models intentionally use a very short sequence length and STM size (256 tokens for Micro), but that is not their "full" context size - it applies only to a single
message. The "full" context is theoretically infinite, restricted only by STM size and memory abilities. These sizes are sufficient for research; final models will handle state-of-the-art context lengths.
RxT-Alpha Micro Training
The Micro models from the RxT-Alpha series are the first proof of concept (PoC) for the Reactive Transformer, Attention-Based Memory System and Memory Reinforcement Learning,
used mainly to test the library and architecture basics before training bigger models (which are still relatively small, as this is a PoC).
The decoder was trained on an autoregressive language modelling task with embeddings from encoder pre-training, on the roneneldan/TinyStories dataset, using 2.5B total tokens, and reached ~70.7% accuracy.
Supervised Fine-Tuning
RxT-Alpha-Micro models were fine-tuned to generate real-time interactions (sequences) on our synthetic dataset, inspired by TinyStories - ReactiveAI/TinyStories-QA-SFT-v2.
The decoder reached its best validation loss and best train/validation loss ratio after 12 of 30 epochs (~85M processed tokens).
Details
GPU: 1x L4
epochs: 12/30 (early stopping)
lr: 3e-4 peak, cosine annealing schedule (see the scheduler sketch after this list)
batch size: 128
processed tokens: ~85M
loss: 0.6846 (validation) / 0.6555 (train)
accuracy: 84.3%
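For reference, a minimal sketch of the learning-rate setup described above (the total step count is an assumption; any warm-up details are not specified in this card):

import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(64, 64)  # placeholder; the real run trained the RxT-Alpha Micro decoder
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # 3e-4 peak learning rate
total_steps = 10_000  # assumed; in the real run this follows from dataset size and batch size 128
scheduler = CosineAnnealingLR(optimizer, T_max=total_steps)

for step in range(total_steps):
    # loss.backward() would go here in the real training loop
    optimizer.step()
    scheduler.step()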
Next Stage: Memory Reinforcement Learning
The model is able to generate meaningful interactions using grammatically correct sentences, and it is ready for memory training in the next stage. More info soon.
Install Flash Attention (optional, but recommended) - details in the RxNN framework docs.
import torch
from rxnn.rxt.models import RxTAlphaDecoder
from rxnn.transformers.sampler import Sampler, SampleDecoder
from rxnn.training.tokenizer import load_tokenizer_from_hf_hub
model = RxTAlphaDecoder.from_pretrained('ReactiveAI/RxT-Alpha-Micro-Decoder-SFT')
tokenizer = load_tokenizer_from_hf_hub('ReactiveAI/RxT-Alpha-Micro-Decoder-SFT')
sampler = Sampler(model, torch.device('cuda' if torch.cuda.is_available() else 'cpu'), end_token_id=3)
sample = SampleDecoder(sampler, tokenizer)
# 0.1 and 0.9 are default values for temperature and top_p
generated = sample('[Q] Tell me a story about a little black dog [A]', temperature=0.1, top_p=0.9, max_seq_len=256)
sample('[Q] Tell me a story about a little black dog [A]', temperature=0.1, top_p=0.9, max_seq_len=256, print_stream=True)