Total runs: 6
24-hour runs: 0
7-day runs: 0
30-day runs: -8
Model's Last Updated: September 04 2024

Introduction of Squid

Model Details of Squid

Squid: Long Context as a New Modality for on-device RAG

- Nexa Model Hub - ArXiv

Overview

Squid is a novel approach to accelerating language model inference that treats long context as a new modality, similar to how image, audio, and video inputs are handled in multimodal models. It uses a language encoder model to encode context information into embeddings, applying multimodal-model techniques to make language model inference more efficient. Model highlights:

  • 🧠 Context as a distinct modality
  • 🗜️ Language encoder for context compression
  • 🔗 Multimodal techniques applied to language processing
  • ⚡ Optimized for energy efficiency and on-device use
  • 📜 Specialized for long context understanding
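
The efficiency argument can be made concrete with a back-of-envelope calculation. This sketch compares the size of one self-attention score matrix in the main decoder when the context is fed as raw tokens versus when it is replaced by 32 memory embeddings (the `MEMORY_SIZE` used in the inference script below). It is an illustration under stated assumptions: it ignores the cost of running the small context decoder, the MLP layers, and KV caching. The function name `attention_cost_ratio` is hypothetical.

```python
def attention_cost_ratio(context_tokens: int, query_tokens: int, memory_tokens: int = 32) -> float:
    """Ratio of main-decoder attention score-matrix sizes: raw context vs compressed context.

    Back-of-envelope only: counts (sequence length)^2 entries of one attention map
    and ignores the cost of the small context decoder.
    """
    full_len = context_tokens + query_tokens        # context passed to the main decoder as tokens
    compressed_len = memory_tokens + query_tokens   # context replaced by memory embeddings
    return (full_len ** 2) / (compressed_len ** 2)


if __name__ == "__main__":
    # A hypothetical 2,000-token document and a 16-token question.
    ratio = attention_cost_ratio(context_tokens=2000, query_tokens=16)
    print(f"~{ratio:.0f}x fewer attention-score entries in the main decoder")
```

Because attention cost grows quadratically with sequence length, shrinking the context to a fixed number of embeddings helps more as documents get longer.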
Model Architecture

Squid employs a decoder-decoder framework with two main components:

  1. A smaller decoder (0.5B parameters) that transforms information from extensive contexts into embeddings
  2. A larger decoder (7B parameters) that comprehends current queries and generates responses to them

The architecture also includes a projector that aligns embeddings between the smaller decoder, which acts as the context encoder, and the main decoder.
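
The data flow between the two decoders can be sketched with NumPy. This is a minimal sketch under several assumptions. Random arrays stand in for the small decoder's forward pass, and the hidden sizes are toy values, not the real model dimensions. The projector is assumed to be a single linear map. Finally, it assumes that the hidden states at the `[memory_i]` positions, which the inference script below appends to the context, are what gets projected. The helpers `small_decoder` and `compress_context` are hypothetical.

```python
import numpy as np

# Toy sizes (assumptions for illustration; not the real model dimensions).
D_SMALL = 8        # hidden size of the small context decoder
D_LARGE = 16       # hidden size of the large main decoder
MEMORY_SIZE = 32   # number of memory tokens appended to the context

rng = np.random.default_rng(0)


def small_decoder(context_len: int) -> np.ndarray:
    """Stand-in for the small decoder: one hidden state per input position.

    The input is the context followed by MEMORY_SIZE memory tokens; random
    values replace a real forward pass.
    """
    return rng.standard_normal((context_len + MEMORY_SIZE, D_SMALL))


# Hypothetical linear projector from the small decoder's space to the large one's.
W_proj = rng.standard_normal((D_SMALL, D_LARGE)) / np.sqrt(D_SMALL)


def compress_context(context_len: int) -> np.ndarray:
    hidden = small_decoder(context_len)
    memory_states = hidden[-MEMORY_SIZE:]   # keep only the memory-token positions
    return memory_states @ W_proj           # map into the main decoder's embedding space


memory_embeddings = compress_context(context_len=500)
print(memory_embeddings.shape)
```

However long the context is, the main decoder receives a fixed `MEMORY_SIZE x D_LARGE` block, which is what makes the compressed context behave like an extra input modality.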

[Figure: Squid model architecture]

Running the Model
Method 1

Clone this repository with Git LFS and run the example script:

git lfs install
git clone https://huggingface.co/NexaAIDev/Squid
cd Squid
python inference_example.py
Method 2

Install the nexaai-squid package:

pip install nexaai-squid

Then run the following script:

import time

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

from squid.configuration_squid import SquidConfig
from squid.modeling_squid import SquidForCausalLM

MEMORY_SIZE = 32        # number of [memory_i] tokens that carry the compressed context
EOS_TOKEN_ID = 151643   # end-of-text token id; generation stops when it is produced


@torch.no_grad()  # inference only: do not build autograd graphs
def inference_instruct(model, tokenizer, mycontext, question, device="cuda:0"):
    start_time = time.time()
    generated_token_ids = []

    # Query ids: prefix tokens, then MEMORY_SIZE placeholder ids (-1) marking
    # where the compressed context goes, then the question tokens.
    prompt = f" <context>{question}"
    text_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split("<context>")]
    input_ids = (
        torch.tensor(
            text_chunks[0] + [-1] * MEMORY_SIZE + text_chunks[1], dtype=torch.long
        )
        .unsqueeze(0)
        .to(device)
    )

    # The long context is tokenized separately, followed by the memory tokens.
    context_tokenized = tokenizer(
        mycontext + "".join([f"[memory_{i}]" for i in range(MEMORY_SIZE)]),
        return_tensors="pt",
    )
    context_tokenized = {k: v.to(device) for k, v in context_tokenized.items()}

    # Greedy decoding, capped at the number of context tokens.
    context_token_count = context_tokenized["input_ids"].shape[1] - MEMORY_SIZE
    for _ in range(context_token_count):
        logits = model(
            input_ids,
            context_input_ids=context_tokenized["input_ids"],
            context_attention_mask=context_tokenized["attention_mask"],
        ).logits
        next_token = logits[:, -1].argmax(-1)
        if next_token.item() == EOS_TOKEN_ID:
            break
        generated_token_ids.append(next_token.item())
        input_ids = torch.cat([input_ids, next_token.unsqueeze(1)], dim=-1)

    result = tokenizer.decode(generated_token_ids)
    print(f"Time taken: {time.time() - start_time:.2f} s")
    return result


if __name__ == "__main__":
    device_name = "cuda:0" if torch.cuda.is_available() else "cpu"

    # Register the custom Squid classes with the transformers Auto* factories.
    AutoConfig.register("squid", SquidConfig)
    AutoModelForCausalLM.register(SquidConfig, SquidForCausalLM)

    tokenizer = AutoTokenizer.from_pretrained("NexaAIDev/Squid")
    model = AutoModelForCausalLM.from_pretrained(
        "NexaAIDev/Squid",
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        device_map=device_name,
    )

    # Run inference example
    mycontext = "Nexa AI is a Cupertino-based company founded in May 2023 that researches and develops models and tools for on-device AI applications. The company is founded by Alex and Zack. The company is known for its Octopus-series models, which rival large-scale language models in capabilities such as function-calling, multimodality, and action-planning, while remaining efficient and compact for edge device deployment. Nexa AI's mission is to advance on-device AI in collaboration with the global developer community. To this end, the company has created an on-device model hub for users to find, share, and collaborate on open-source AI models optimized for edge devices, as well as an SDK for developers to run and deploy AI models locally"
    question = "Who founded Nexa AI?"
    result = inference_instruct(model, tokenizer, mycontext, question, device=device_name)
    print("Result:", result)
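
The `-1` entries in `input_ids` reserve `MEMORY_SIZE` slots for the compressed context. The sketch below shows one way such placeholders could be resolved into input embeddings for the main decoder. It is an assumption about what `SquidForCausalLM` does internally, not its actual code. `splice_memory` is a hypothetical helper, and the toy vocabulary and embedding sizes are made up for illustration.

```python
import numpy as np

PLACEHOLDER_ID = -1   # id used for memory slots in input_ids, as in the script above


def splice_memory(input_ids, token_embedding, memory_embeddings):
    """Build main-decoder input embeddings from ids that contain -1 placeholders.

    Hypothetical helper: real token ids are looked up in `token_embedding`; the
    placeholder positions are filled, in order, with the projected memory embeddings.
    """
    ids = np.asarray(input_ids)
    slots = np.flatnonzero(ids == PLACEHOLDER_ID)
    if len(slots) != len(memory_embeddings):
        raise ValueError("number of placeholders must equal number of memory embeddings")
    out = np.empty((len(ids), token_embedding.shape[1]), dtype=token_embedding.dtype)
    real = ids != PLACEHOLDER_ID
    out[real] = token_embedding[ids[real]]   # ordinary token-embedding lookup
    out[slots] = memory_embeddings           # compressed context fills the reserved slots
    return out


# Toy example: vocabulary of 10 tokens, embedding width 4, 3 memory slots.
vocab = np.arange(40, dtype=np.float32).reshape(10, 4)
memory = np.full((3, 4), -9.0, dtype=np.float32)
embeds = splice_memory([5, -1, -1, -1, 2, 7], vocab, memory)
print(embeds.shape)
```

Using an out-of-vocabulary id such as `-1` for the slots keeps the query a plain integer tensor, so the memory positions can be located with a simple equality mask.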
Training Process

Squid's training involves three stages:

  1. Restoration Training: Reconstructing original context from compressed embeddings
  2. Continual Training: Generating context continuations from partial compressed contexts
  3. Instruction Fine-tuning: Generating responses to queries given compressed contexts

This multi-stage approach progressively enhances the model's ability to handle long contexts and generate appropriate responses.
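
The three stages differ mainly in what gets compressed and what the main decoder must produce. Below is a minimal sketch of how one training example per stage might be assembled. It assumes plain strings for the compressed context, the main-decoder prompt, and the target. The `Example` type and the helper functions are hypothetical. The prompt templates and split points actually used for Squid are not specified here; this sketch splits on whitespace for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Example:
    context: str   # text fed to the small decoder and compressed into memory embeddings
    prompt: str    # text the main decoder sees alongside the memory embeddings
    target: str    # text the main decoder is trained to generate


def restoration_example(document: str) -> Example:
    # Stage 1: reconstruct the original context from its compressed embeddings.
    return Example(context=document, prompt="", target=document)


def continual_example(document: str, num_context_words: int) -> Example:
    # Stage 2: compress the first part of the text and generate its continuation.
    words = document.split()
    return Example(
        context=" ".join(words[:num_context_words]),
        prompt="",
        target=" ".join(words[num_context_words:]),
    )


def instruction_example(document: str, question: str, answer: str) -> Example:
    # Stage 3: answer a query given the compressed context.
    return Example(context=document, prompt=question, target=answer)


if __name__ == "__main__":
    doc = "Nexa AI is a Cupertino-based company founded in May 2023."
    for ex in (
        restoration_example(doc),
        continual_example(doc, 6),
        instruction_example(doc, "When was Nexa AI founded?", "May 2023"),
    ):
        print(ex)
```

Across the stages, the target moves from copying the context, to extending it, to using it to answer a question. This matches the progression described above.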

Citation

If you use Squid in your research, please cite our paper:

@article{chen2024squidlongcontextnew,
      title={Squid: Long Context as a New Modality for Energy-Efficient On-Device Language Models}, 
      author={Wei Chen and Zhiyuan Li and Shuo Xin and Yihao Wang},
      year={2024},
      eprint={2408.15518},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.15518}, 
}
Contact

For questions or feedback, please contact us.

Provider: NexaAI