QAI-h1290FX
A GPU-ready edge AI storage server supporting NVIDIA® GPUs, U.2 NVMe SSDs, and 25GbE connectivity—designed for on-premises AI, virtualization, and compute-intensive workloads.
QAI-h1290FX is a desktop-class converged edge compute and storage server that combines a high-performance computing architecture with ultra-fast storage. It supports configurable NVIDIA® RTX™ PRO Blackwell GPUs, making it ideal for on-premises AI, LLM inference, private RAG search, virtualization, and other demanding compute workloads.
Powered by QuTS hero with the ZFS file system, the platform delivers enterprise-grade data integrity and consistent performance. Whether for AI deployment, research and development, high-performance computing, or enterprise virtualization environments, QAI-h1290FX enables flexible configuration and rapid deployment, ensuring critical workloads run securely and efficiently at the edge.



GPU-Ready Architecture with RTX PRO Blackwell Support
Built with a GPU-ready design that supports NVIDIA® RTX™ PRO Blackwell GPUs, including options such as the RTX PRO 6000 Blackwell Max-Q Workstation, to meet the demands of AI workloads, image generation, inference, and GPU-accelerated computing.
High-Speed All-Flash NVMe Storage Architecture
Equipped with 12 U.2 NVMe SSD bays and support for SATA SSDs, allowing flexible storage configurations optimized for performance, capacity, or cost. Ideal for AI workloads, virtualization, and real-time data processing.
On-Premise LLM & RAG Search
Enables local deployment of private LLMs and RAG-based search, providing secure semantic document retrieval without sending sensitive data to the cloud.
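As a purely illustrative sketch, the retrieve-then-generate flow behind such a setup can be as simple as the Python below, assuming an Ollama container is already serving an embedding model and an LLM on the NAS; the address, model tags, and sample documents are placeholders, not shipped defaults.

```python
# Minimal private-RAG sketch: embed documents locally, retrieve the best
# match by cosine similarity, then ground the LLM's answer in it.
# Assumes an Ollama instance on the NAS (address and model tags below
# are hypothetical placeholders; pull the models first with `ollama pull`).
import requests
import numpy as np

OLLAMA = "http://192.168.1.50:11434"  # hypothetical NAS address

def embed(text: str) -> np.ndarray:
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return np.array(r.json()["embedding"])

docs = ["Q3 revenue grew 12% year over year.",
        "The VPN policy requires MFA for all remote staff."]
doc_vecs = [embed(d) for d in docs]

def answer(question: str) -> str:
    q = embed(question)
    # Cosine similarity against every stored document vector.
    sims = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vecs]
    context = docs[int(np.argmax(sims))]
    r = requests.post(f"{OLLAMA}/api/generate",
                      json={"model": "llama3", "stream": False,
                            "prompt": f"Context: {context}\n\nQuestion: {question}"})
    return r.json()["response"]

print(answer("How did revenue change in Q3?"))
```

Because both embedding and generation run on the NAS, documents and queries never leave the local network.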
ZFS-based QuTS hero OS
Powered by QuTS hero with ZFS, offering inline compression, self-healing, snapshots, and SnapSync for enterprise-grade data integrity.
GPU Acceleration & AI App Templates
Leverage GPU acceleration via Container Station, with one-click deployment of Ollama, AnythingLLM, Stable Diffusion, and more to simplify AI application rollout.
25GbE Connectivity & Expansion Ready
Built-in dual 25GbE and 2.5GbE ports, with an upgrade path to 100GbE. Scale up with QNAP JBOD enclosures to meet growing AI data storage demands.
QNAP QAI-h1290FX
TechRadar Pro Picks Awards Winner for CES 2026
Ideal applications
Enterprise-Grade Edge AI and High-Performance Computing
QAI-h1290FX is more than a storage system: it is a compute-ready, enterprise-grade edge computing platform. Built on a high-performance computing architecture, it supports configurable NVIDIA® RTX™ PRO Blackwell GPUs, making it well-suited for large language model (LLM) inference, image generation, RAG search, and a wide range of compute-intensive and virtualized workloads.
Whether for AI inference, research and development, data analytics, or enterprise applications requiring high core counts and sustained performance, a single desktop-class enterprise platform can deliver outstanding compute efficiency and data security entirely on-premises.
Maximum AI Compute Performance (Optional GPU Configuration)
GPU-Ready Architecture — Supporting NVIDIA® RTX™ PRO Blackwell
QAI-h1290FX features a GPU-ready architecture designed to support NVIDIA® RTX™ PRO Blackwell GPUs. Built on the Blackwell architecture, these GPUs support acceleration technologies such as CUDA, TensorRT, and the Transformer Engine, making the platform well-suited for modern AI and GPU-accelerated computing workloads.
From large language model (LLM) inference and computer vision to generative AI and other GPU-accelerated professional applications, workloads can be deployed and executed entirely on-premises—delivering strong performance while maintaining data privacy and full system control. The platform can also operate as a CPU-centric high-performance computing system, supporting virtualization and a wide range of enterprise computing scenarios.
NVIDIA® RTX™ PRO Blackwell Series — Redefining AI and High-Performance Computing Workflows
The NVIDIA® RTX™ PRO Blackwell series GPUs are purpose-built for high-intensity AI, compute, and creative workloads, combining the next-generation Blackwell architecture with ultra-fast GDDR7 ECC memory. This delivers a level of compute performance and VRAM capacity on a single professional GPU that previously required multiple consumer-grade graphics cards.
With support for up to 96GB of VRAM and enhanced AI acceleration capabilities, the RTX™ PRO Blackwell series is ideal for advanced LLMs, generative models, data analytics, and complex 3D visualization and professional compute workflows.
Powered by Server-Class AMD EPYC™ Processors for High-Performance Compute
QAI-h1290FX is built on a server-class AMD EPYC processor platform, delivering high core counts and massive multithreaded performance.
Designed for long-term, stable operation under highly parallel workloads, it is well suited for virtualization, multithreaded computing, data processing, and edge computing scenarios—while also supporting AI inference and a wide range of compute-intensive applications.
Powered by the QuTS hero Operating System
Built for enterprises, the QuTS hero operating system uses the highly reliable ZFS file system, delivering robust data security and system stability for critical data storage. It also includes advanced technologies dedicated to improving SSD performance and lifespan, meeting strict enterprise requirements for high performance and reliability.
Explore the QuTS hero operating system
Learn about the latest QuTS hero features
Remote access to on-prem AI – anytime, anywhere
Create a seamless hybrid work environment with multiple remote access options offered by QNAP. Whether you're managing AI applications or accessing files, the QAI-h1290FX ensures you're always connected—without compromising security.
Direct or relay access options
- myQNAPcloud DDNS: Access your QuTS hero interface from anywhere via a custom domain, without remembering IP addresses.
- myQNAPcloud Link: Establishes a secure relay connection through QNAP servers, with no need to open router ports or modify firewall settings.
- VPN Server Support: Set up a private VPN using QVPN Service, enabling secure encrypted tunnels for full network access.
Whether you're fine-tuning your LLM container setup, reviewing inference logs, or collaborating across locations, the QAI-h1290FX offers reliable access to your on-prem AI environment from any device, anytime.
A More Powerful Container Station: A New Experience in AI Application Deployment
To promote practical AI adoption, the QAI series integrates Container Station with a wide selection of AI application templates. These templates support one-click deployment of popular AI tools and frameworks, with regular updates from QNAP to ensure access to the latest technologies.
Whether you're new to AI or looking to move workloads on-premises, QAI makes it easy to explore AI, reduce costs, enhance data security, and even develop custom AI tools to boost business innovation.
Containerized AI deployment made simple
Enhance your AI infrastructure with seamless container integration. Explore Container Station
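For a sense of what a one-click template automates, the hypothetical Python sketch below starts a GPU-enabled Ollama container through the Docker SDK; the image tag, share path, and port mapping are assumptions for illustration, not the exact settings Container Station applies.

```python
# Start a GPU-enabled Ollama container, roughly what a one-click
# template automates. Requires the `docker` Python SDK and the NVIDIA
# container runtime on the host; all paths and tags are illustrative.
import docker

client = docker.from_env()
container = client.containers.run(
    "ollama/ollama",                         # public Ollama image
    name="ollama",
    detach=True,
    ports={"11434/tcp": 11434},              # expose the Ollama API
    volumes={"/share/ollama": {"bind": "/root/.ollama", "mode": "rw"}},
    device_requests=[docker.types.DeviceRequest(
        count=-1, capabilities=[["gpu"]])],  # pass all GPUs through
)
print(container.name, container.status)
```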
Redefining creativity with AI-powered visual design
ComfyUI empowers artists, designers, and content creators with a powerful, modular interface for AI-driven image and video creation. Through its intuitive node-based design and support for advanced models like Stable Diffusion, users can effortlessly generate, transform, and animate visual content. Combined with GPU acceleration and flexible workflows, ComfyUI lowers the barrier to complex visual design—unlocking unprecedented creative freedom.
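For readers who want to script that workflow, here is a rough sketch of driving a ComfyUI container over its HTTP API; the address and node id are assumptions, and the workflow file is one you would export yourself from the UI in API format.

```python
# Queue a saved workflow against a ComfyUI instance's HTTP API.
# Assumes ComfyUI listens at the address below and that
# "workflow_api.json" was exported from the UI ("Save (API Format)").
import json
import requests

COMFYUI = "http://192.168.1.50:8188"  # hypothetical container address

with open("workflow_api.json") as f:
    workflow = json.load(f)

# Tweak a node parameter before queuing, e.g. the positive prompt text.
# Node id "6" is whatever your exported graph assigns; adjust to match.
workflow["6"]["inputs"]["text"] = "a watercolor city skyline at dusk"

r = requests.post(f"{COMFYUI}/prompt", json={"prompt": workflow})
print(r.json())  # returns a prompt_id you can poll via /history/<id>
```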
Real-world AI performance – measured on QAI-h1290FX
AI deployment performance is validated through real-world benchmark data. Under a high-end GPU test configuration, QAI-h1290FX was fully evaluated with the NVIDIA® RTX™ PRO 6000 Blackwell Max-Q Workstation GPU, verifying its performance in on-premises AI inference and enterprise deployment scenarios.
Ollama LLM Inference Benchmark (Rapid Deployment)
Leveraging the GPU acceleration capabilities of the Blackwell architecture, QAI-h1290FX can run a wide range of large language models locally via Ollama.
Ollama enables rapid deployment and simplified management, making it well suited for proof-of-concept (PoC) projects, single-user environments, and small to mid-scale use cases such as RAG-based search, AI assistants, and offline inference.
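As a quick sanity check of local throughput, Ollama's non-streaming responses include token counts and evaluation time, so a few lines of Python suffice; the endpoint and model tag below are assumptions, not the benchmark harness behind the published figures.

```python
# Measure tokens-per-second for one generation on a local Ollama
# instance. Ollama reports eval_count (tokens generated) and
# eval_duration (nanoseconds) in its non-streaming JSON response.
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "deepseek-r1:7b",  # illustrative tag
                        "stream": False,
                        "prompt": "Summarize the benefits of edge AI."})
body = r.json()
tps = body["eval_count"] / body["eval_duration"] * 1e9
print(f"{body['eval_count']} tokens at {tps:.1f} tok/s")
```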
vLLM Concurrent Inference Benchmark (Enterprise-Grade Throughput)
To address multi-user and high-concurrency AI service requirements, QAI-h1290FX also supports deployment with the vLLM inference engine.
Compared to single-request–oriented inference approaches, vLLM significantly improves GPU utilization and overall throughput through PagedAttention and efficient scheduling mechanisms. This makes it particularly suitable for enterprise AI services, multi-user RAG systems, and API-based AI applications.
Under the same GPU configuration, vLLM demonstrates more consistent latency characteristics and higher tokens-per-second throughput in concurrent request scenarios, making it ideal for production environments and long-running enterprise AI deployments.
Tested large language model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (Hugging Face)
Tested large language model: openai/gpt-oss-20b (Hugging Face)
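To reproduce the flavor of a concurrent test, the sketch below fires parallel requests at a vLLM OpenAI-compatible endpoint (started with, for example, `vllm serve openai/gpt-oss-20b`); the URL, worker count, and request payloads are assumptions rather than the configuration behind the published numbers.

```python
# Fire concurrent requests at a vLLM OpenAI-compatible server to see
# the effect of batched scheduling on aggregate throughput.
from concurrent.futures import ThreadPoolExecutor
import time

import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint
MODEL = "openai/gpt-oss-20b"  # must match the model vLLM was launched with

def ask(i: int) -> int:
    r = requests.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": f"Give RAG tip #{i}."}],
        "max_tokens": 128,
    })
    return r.json()["usage"]["completion_tokens"]

start = time.time()
with ThreadPoolExecutor(max_workers=16) as pool:
    tokens = sum(pool.map(ask, range(16)))  # 16 requests in flight
print(f"{tokens} tokens in {time.time() - start:.1f}s across 16 requests")
```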
Unlock AI potential with practical use cases
From document automation to creative workflows and system-wide automation, the QAI-h1290FX empowers every department to apply AI in meaningful, measurable ways—securely hosted on your own infrastructure.
No cloud lock-in, no complex setup—just real results driven by local LLMs, secure containers, and integrated QNAP features.
AI Docker applications on QAI-h1290FX
Run powerful AI solutions via Container Station & GPU integration.