What is the Nvidia Hopper Architecture?
The Nvidia Hopper architecture represents a significant leap in GPU technology, succeeding the Ampere architecture. Designed to accelerate AI workloads, particularly in data centers, Hopper GPUs like the H100 are engineered for high performance and efficiency. The architecture incorporates advancements such as enhanced Tensor Cores and improved interconnect technology, making it ideal for training and deploying large AI models.
The Hopper family also includes the newer H20 chip, Nvidia's latest answer to US export restrictions on China. In short, the H20 cuts back only on FLOPS; its interconnect bandwidth is unchanged.
At the time of their release, the A100 (Ampere) and the H100 (Hopper) led the pack. Now succeeded by newer generations, these GPUs remain powerful and capable, but they are subject to export limitations that significantly affect their availability in certain regions, notably China.
H100 vs. H800: Key Differences
The H100 and H800 are both Hopper architecture-based GPUs, but they differ significantly due to export regulations. The H800 was created as a version of the H100 that complies with US export controls, primarily targeting the Chinese market.
- Computational Performance (FLOPS): The H800 delivers computational performance (FLOPS, floating-point operations per second) similar to the H100's.
- Interconnect Bandwidth: The primary modification in the H800 is a reduction in interconnect bandwidth, made to keep the GPU within the permissible export limits set by the US government.
- High FLOPS, Low Communication: The H800's design point is high floating-point throughput paired with low communication (interconnect) bandwidth.

The H100 can exploit its full floating-point throughput without special effort. DeepSeek, however, figured out how to get the most out of the H800 by working around its reduced interconnect bandwidth; that deep engineering effort allowed the H800's compute to be fully utilized as well.
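DeepSeek has not published a line-by-line recipe, but the standard way to hide a weak interconnect is to overlap communication with computation, so the GPU's arithmetic units never sit idle waiting for data. Below is a minimal, hypothetical sketch of that general technique using PyTorch's asynchronous collectives; the helper `overlapped_step`, the model, and all tensor shapes are invented for the example.

```python
import os
import torch
import torch.distributed as dist

def overlapped_step(model_chunk, activations, grad_bucket):
    """Hypothetical helper: overlap one gradient all-reduce with compute."""
    # Launch the all-reduce without blocking. On a bandwidth-limited part
    # like the H800, the transfer now runs behind the computation.
    work = dist.all_reduce(grad_bucket, async_op=True)
    out = model_chunk(activations)  # keep the arithmetic units busy
    work.wait()                     # synchronize only when the result is needed
    return out

if __name__ == "__main__":
    # Single-process demo on the CPU 'gloo' backend, just to make it runnable.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)
    model = torch.nn.Linear(256, 256)
    x = torch.randn(32, 256)
    grads = torch.randn(256, 256)
    print(overlapped_step(model, x, grads).shape)
    dist.destroy_process_group()
```

The more of the communication that can be hidden this way, the less the reduced interconnect bandwidth matters, which is the essence of getting full utilization out of an H800.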
Here's a summary of the key differences between the H100 and the H800:
| Feature | H100 | H800 |
| --- | --- | --- |
| Architecture | Nvidia Hopper | Nvidia Hopper |
| Target Market | Global (subject to export controls) | Primarily China |
| Computational Power | High | High |
| Interconnect Bandwidth | Higher | Lower |
| Compliance | Complies with US export regulations globally | Complies with US export regulations for China |
Initial Export Restrictions and Two-Factor Scale
The initial export controls imposed by the US government were based on a two-factor scale that considered both chip interconnect bandwidth and FLOPS (floating-point operations per second).
Any chip exceeding certain levels on both factors was restricted. This approach aimed to limit the overall capability of exported GPUs to perform advanced AI tasks (a minimal sketch of the rule follows the list below).
- Chip Interconnect vs. FLOPS: Restrictions were applied based on each chip's interconnect bandwidth and its floating-point performance.
- Government Realization and Revision: The government realized that the initial two-factor scale contained a flaw.
- Just Floating-Point Operations: The restrictions were changed to consider floating-point operations alone.

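To make the original two-factor rule concrete, here is a minimal sketch. The threshold values are invented for illustration; the real limits are defined by US export regulation, not by these numbers.

```python
# Hypothetical thresholds, for illustration only.
FLOPS_LIMIT = 1000          # arbitrary units
INTERCONNECT_LIMIT = 600    # arbitrary units

def restricted_two_factor(flops: float, interconnect: float) -> bool:
    """Original rule: restricted only if BOTH factors exceed their limits."""
    return flops > FLOPS_LIMIT and interconnect > INTERCONNECT_LIMIT

# An H800-like design: keep FLOPS high, cut interconnect bandwidth.
print(restricted_two_factor(flops=2000, interconnect=400))  # False -> exportable
```

Because both factors had to exceed their limits, cutting interconnect bandwidth alone was enough to keep a chip exportable; that is precisely the gap the H800 slipped through.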
The Move to Solely Floating-Point Operations (FLOPS) Restrictions
Recognizing a loophole in the initial restrictions, the US government revised its export controls to focus solely on floating-point operations (FLOPS). This meant that any chip exceeding a specified FLOPS threshold was subject to export limitations, irrespective of its interconnect bandwidth.
- Limitation to floating-point operations: Exports were now limited based only on floating-point operations, not on other factors such as chip interconnect.
- High FLOPS, low communication: The H800 paired high FLOPS with low communication bandwidth, so it fell outside the original two-factor rule.
- The DeepSeek workaround: This meant the H800 was not initially banned, which is what made DeepSeek's workaround possible. However, the new export controls of January 13, 2025 banned the H800.
- H20, the latest GPU: With the January 13, 2025 changes and the ban on the H800, the H20 is now the latest GPU on the market.
This change was intended to prevent companies from circumventing export controls by reducing interconnect bandwidth while maintaining high computational performance.
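Continuing the toy example above, the revised rule keys on FLOPS alone, so the same H800-like design no longer passes:

```python
FLOPS_LIMIT = 1000  # same illustrative threshold as before

def restricted_flops_only(flops: float) -> bool:
    """Revised rule: restricted whenever FLOPS exceed the limit."""
    return flops > FLOPS_LIMIT

# The same H800-like design is now caught, regardless of interconnect.
print(restricted_flops_only(flops=2000))  # True -> restricted
```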