The NVIDIA Phi-4-multimodal-instruct FP4 model is the quantized version of Microsoft’s Phi-4-multimodal-instruct model, which is a multimodal foundation model that uses an optimized transformer architecture. For more information, please check
here
. The NVIDIA Phi-4-multimodal-instruct FP4 model is quantized with
TensorRT Model Optimizer
.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA
(Phi-4-multimodal-instruct) Model Card
.
Developers looking to take off the shelf pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications.
*
This model was developed based on Phi-4-multimodal-instruct
** Number of model parameters 5.6
10^9
Input:
Input Type(s):
Text, image and speech
Input Format(s):
String, Images (see properties), Soundfile
Input Parameters:
One-Dimensional (1D), Two-Dimensional (2D), One-Dimensional (1D)
Other Properties Related to Input:
Any common RGB/gray image format (e.g., (".jpg", ".jpeg", ".png", ".ppm", ".bmp", ".pgm", ".tif", ".tiff", ".webp")) can be supported. Any audio format that can be loaded by soundfile package should be supported. Context length up to 128K
Output:
Output Type(s):
Text
Output Format:
String
Output Parameters:
1D (One-Dimensional): Sequences
Other Properties Related to Output:
N/A
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
The model is quantized with nvidia-modelopt
v0.35.0
Post Training Quantization
This model was obtained by quantizing the weights and activations of Phi-4-multimodal-instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks of the language model are quantized.
Training and Testing Datasets:
** Data Modality
[Audio]
[Image]
[Text]
** Text Training Data Size
[1 Billion to 10 Trillion Tokens]
** Audio Training Data Size
[More than 1 Million Hours]
** Image Training Data Size
[1 Billion to 10 Trillion image-text Tokens]
** Data Collection Method by Dataset: Automated
** Labeling Method by Dataset: Human, Automated
** Properties: publicly available documents filtered for quality, selected high-quality educational data, and code
newly created synthetic, “textbook-like” data for the purpose of teaching math, coding, common sense reasoning, general knowledge of the world (e.g., science, daily activities, theory of mind, etc.)
high quality human labeled data in chat format
selected high-quality image-text interleave data
synthetic and publicly available image, multi-image, and video data
anonymized in-house speech-text pair data with strong/weak transcriptions
selected high-quality publicly available and anonymized in-house speech data with task-specific supervisions
selected synthetic speech data
synthetic vision-speech data
Testing Dataset:
** Data Collection Method by Dataset: Undisclosed
** Labeling Method by Dataset: Undisclosed
** Properties: Undisclosed
Inference:
Engine:
TensorRT-LLM
Test Hardware:
B200 coming soon
** Currently supported on DGX Spark
Usage
Deploy with TensorRT-LLM
To deploy the quantized checkpoint with
TensorRT-LLM
LLM API, follow the sample codes below:
LLM API sample usage:
from tensorrt_llm import LLM, SamplingParams
def main():
prompts = [
"Hello, my name is",
"The president of the United States is",
"The capital of France is",
"The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="nvidia/Phi-4-multimodal-instruct-FP4", trust_remote_code=True)
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
prompt = output.prompt
generated_text = output.outputs[0].text
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
# The entry point of the program needs to be protected for spawning processes.
if __name__ == '__main__':
main()
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
Runs of nvidia Phi-4-multimodal-instruct-NVFP4 on huggingface.co
1.6K
Total runs
0
24-hour runs
0
3-day runs
-82
7-day runs
287
30-day runs
More Information About Phi-4-multimodal-instruct-NVFP4 huggingface.co Model
More Phi-4-multimodal-instruct-NVFP4 license Visit here:
Phi-4-multimodal-instruct-NVFP4 huggingface.co is an AI model on huggingface.co that provides Phi-4-multimodal-instruct-NVFP4's model effect (), which can be used instantly with this nvidia Phi-4-multimodal-instruct-NVFP4 model. huggingface.co supports a free trial of the Phi-4-multimodal-instruct-NVFP4 model, and also provides paid use of the Phi-4-multimodal-instruct-NVFP4. Support call Phi-4-multimodal-instruct-NVFP4 model through api, including Node.js, Python, http.
Phi-4-multimodal-instruct-NVFP4 huggingface.co is an online trial and call api platform, which integrates Phi-4-multimodal-instruct-NVFP4's modeling effects, including api services, and provides a free online trial of Phi-4-multimodal-instruct-NVFP4, you can try Phi-4-multimodal-instruct-NVFP4 online for free by clicking the link below.
nvidia Phi-4-multimodal-instruct-NVFP4 online free url in huggingface.co:
Phi-4-multimodal-instruct-NVFP4 is an open source model from GitHub that offers a free installation service, and any user can find Phi-4-multimodal-instruct-NVFP4 on GitHub to install. At the same time, huggingface.co provides the effect of Phi-4-multimodal-instruct-NVFP4 install, users can directly use Phi-4-multimodal-instruct-NVFP4 installed effect in huggingface.co for debugging and trial. It also supports api for free installation.
Phi-4-multimodal-instruct-NVFP4 install url in huggingface.co: