nvidia / Phi-4-multimodal-instruct-NVFP4

huggingface.co
Total runs: 1.6K
24-hour runs: 0
7-day runs: -82
30-day runs: 287
Model's Last Updated: September 05 2025

Model Details of Phi-4-multimodal-instruct-NVFP4

Model Overview

Description:

The NVIDIA Phi-4-multimodal-instruct FP4 model is the quantized version of Microsoft's Phi-4-multimodal-instruct model, a multimodal foundation model that uses an optimized transformer architecture. For more information, see the Phi-4-multimodal-instruct model card. The NVIDIA Phi-4-multimodal-instruct FP4 model is quantized with the TensorRT Model Optimizer.

This model is ready for commercial/non-commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the non-NVIDIA (Phi-4-multimodal-instruct) Model Card.

License/Terms of Use:

Use of this model is governed by the NVIDIA Open Model License. Additional information: MIT License.

Deployment Geography:

Global, except in the European Union.

Use Case:

Developers looking to use off-the-shelf pre-quantized models for deployment in AI agent systems, chatbots, RAG systems, and other AI-powered applications.

Release Date:

Huggingface 09/15/2025 via https://huggingface.co/nvidia/Phi-4-multimodal-instruct-FP4

Model Architecture:

Architecture Type: Transformers
Network Architecture: Phi4MMForCausalLM

  • This model was developed based on Phi-4-multimodal-instruct
  • Number of model parameters: 5.6 × 10^9
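At 4 bits per parameter, the raw weight footprint can be estimated directly from the parameter count. The sketch below is a back-of-the-envelope estimate only: it ignores the per-block scale factors NVFP4 stores alongside the weights and the layers that remain in higher precision (see the quantization notes below).

```python
# Back-of-the-envelope weight-memory estimate for a 5.6B-parameter model.
# Real checkpoints are somewhat larger: NVFP4 stores per-block scales, and
# some components (embeddings, vision/audio encoders) stay in higher precision.
PARAMS = 5.6e9

def weight_gib(bits_per_param: float) -> float:
    """Raw weight storage in GiB at the given precision."""
    return PARAMS * bits_per_param / 8 / 2**30

print(f"BF16: {weight_gib(16):.1f} GiB")  # ~10.4 GiB
print(f"FP8:  {weight_gib(8):.1f} GiB")   # ~5.2 GiB
print(f"FP4:  {weight_gib(4):.1f} GiB")   # ~2.6 GiB
```

This is the usual motivation for FP4 post-training quantization: roughly a 4x reduction in weight memory versus BF16, which also shrinks the memory-bandwidth cost per decoded token.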

Input:

Input Type(s): Text, image, and speech
Input Format(s): String, Images (see properties), Soundfile
Input Parameters: Text: One-Dimensional (1D); Image: Two-Dimensional (2D); Speech: One-Dimensional (1D)
Other Properties Related to Input: Any common RGB/grayscale image format (e.g., .jpg, .jpeg, .png, .ppm, .bmp, .pgm, .tif, .tiff, .webp) is supported. Any audio format that can be loaded by the soundfile package should be supported. Context length up to 128K tokens.
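Multimodal inputs are interleaved with the text prompt via numbered placeholder tags. The helper below is a hypothetical illustration assuming the `<|image_N|>` / `<|audio_N|>` placeholder convention and the `<|user|>...<|end|><|assistant|>` chat format documented for the base Phi-4-multimodal-instruct model; verify the exact template against the upstream model card before use.

```python
# Hypothetical helper sketching a Phi-4-multimodal-style chat prompt.
# Placeholder tags (<|image_1|>, <|audio_1|>, ...) are numbered in the order
# the corresponding images/audio clips are passed to the processor.
def build_prompt(text: str, n_images: int = 0, n_audios: int = 0) -> str:
    image_tags = "".join(f"<|image_{i}|>" for i in range(1, n_images + 1))
    audio_tags = "".join(f"<|audio_{i}|>" for i in range(1, n_audios + 1))
    return f"<|user|>{image_tags}{audio_tags}{text}<|end|><|assistant|>"

print(build_prompt("Describe what you see and hear.", n_images=1, n_audios=1))
# <|user|><|image_1|><|audio_1|>Describe what you see and hear.<|end|><|assistant|>
```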

Output:

Output Type(s): Text
Output Format: String
Output Parameters: 1D (One-Dimensional): Sequences
Other Properties Related to Output: N/A

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Supported Runtime Engine(s):

  • TensorRT-LLM

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Blackwell

Preferred Operating System(s):

  • Linux
Model Version(s):

The model is quantized with nvidia-modelopt v0.35.0

Post Training Quantization

This model was obtained by quantizing the weights and activations of Phi-4-multimodal-instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks of the language model are quantized.
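As an illustration of the weight side of this scheme, the sketch below simulates NVFP4-style block quantization in pure Python: each block of 16 values shares one scale chosen so the block's largest magnitude maps onto the FP4 (E2M1) maximum of 6, and each value is rounded to the nearest representable magnitude. This is a conceptual model only, not the TensorRT Model Optimizer implementation (which also quantizes activations and stores scales in FP8).

```python
# Conceptual simulation of NVFP4-style block quantization (NOT the actual
# TensorRT Model Optimizer implementation). FP4 E2M1 represents the
# magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}; each block of 16 values shares
# one scale so that the block's max magnitude maps to 6.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16

def quantize_block(values):
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 6.0  # map the block's max magnitude onto the FP4 max
    out = []
    for v in values:
        mag = min(E2M1, key=lambda m: abs(abs(v) / scale - m))
        out.append(scale * mag * (1.0 if v >= 0 else -1.0))
    return out

def quantize(weights):
    """Fake-quantize a flat weight list block by block."""
    return [q for i in range(0, len(weights), BLOCK)
              for q in quantize_block(weights[i:i + BLOCK])]

print(quantize([0.31, -1.7, 2.9, 0.05, 6.2, -0.9, 1.1, 0.0]))
```

The per-block scaling is why FP4 quantization can keep outlier-heavy weight distributions usable: a single large value only stretches the dynamic range of its own 16-element block rather than the whole tensor.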

Training and Testing Datasets:

  • Data Modality: Audio, Image, Text
  • Text Training Data Size: 1 Billion to 10 Trillion Tokens
  • Audio Training Data Size: More than 1 Million Hours
  • Image Training Data Size: 1 Billion to 10 Trillion image-text Tokens

Calibration Dataset:

  • Link: cnn_dailymail
  • Data collection method: Automated
  • Labeling method: Automated

Training Datasets:

  • Data Collection Method by Dataset: Automated
  • Labeling Method by Dataset: Human, Automated
  • Properties: publicly available documents filtered for quality, selected high-quality educational data, and code

  • newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (e.g., science, daily activities, theory of mind, etc.)
  • high quality human labeled data in chat format
  • selected high-quality image-text interleave data
  • synthetic and publicly available image, multi-image, and video data
  • anonymized in-house speech-text pair data with strong/weak transcriptions
  • selected high-quality publicly available and anonymized in-house speech data with task-specific supervisions
  • selected synthetic speech data
  • synthetic vision-speech data
Testing Dataset:

  • Data Collection Method by Dataset: Undisclosed
  • Labeling Method by Dataset: Undisclosed
  • Properties: Undisclosed

Inference:

Engine: TensorRT-LLM
Test Hardware: B200 (coming soon); currently supported on DGX Spark

Usage
Deploy with TensorRT-LLM

To deploy the quantized checkpoint with the TensorRT-LLM LLM API, follow the sample code below:

  • LLM API sample usage:
from tensorrt_llm import LLM, SamplingParams


def main():

    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    llm = LLM(model="nvidia/Phi-4-multimodal-instruct-FP4", trust_remote_code=True)

    outputs = llm.generate(prompts, sampling_params)

    # Print the outputs.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


# The entry point of the program needs to be protected for spawning processes.
if __name__ == '__main__':
    main()
Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

