anastasiavalti
on 13 April 2021
During GTC last fall, NVIDIA announced an increased focus on the enterprise datacenter, including their vision of the datacenter-on-a-chip. The three pillars of this new software-defined datacenter are the data processing unit (DPU), the CPU and the GPU. The NVIDIA BlueField DPU advances the SmartNIC technology that NVIDIA gained through its acquisition of Mellanox in 2020.
Here at Canonical, we are proud of our long partnership with NVIDIA to provide the best experience to developers and customers on Ubuntu. This work has advanced the state of the art with secure NVIDIA GPU drivers and provisions for GPU pass-through. Our engineering teams collaborate deeply to provide the fastest path to the newest features and the latest patches for critical vulnerabilities. For networking, this has meant partnering with Mellanox (now NVIDIA) engineering to provide not just Ubuntu support but also support for hardware offload going back to the oldest ConnectX devices. In fact, Ubuntu was the first Linux distro enabled on the BlueField cards back in 2019. Increasingly, Ubuntu, which has long been the operating system of choice for cutting-edge machine learning developers and data scientists, and for containers and Kubernetes, is seeing more enterprise adoption across verticals.
In the past decade, NVIDIA revolutionized AI and machine learning with their industry-leading SDKs including CUDA, RAPIDS and more, with Ubuntu being the operating system of choice. The question we now pose is this: if you use the GPU to offload your machine learning algorithms, can the software-defined datacenter be disaggregated further by using the dedicated hardware accelerators on the DPU to offload your security and storage workloads? Canonical’s focus, as always, is on ensuring these hardware features have supporting software stacks that are thoroughly tested and available for the world to consume in a clean and supportable fashion.
If you train your latest deep learning algorithms on cutting-edge NVIDIA GPUs on Ubuntu, why would you settle for anything other than the same thought leadership for your networking, security, and storage needs?
What’s new with the BlueField-2 DPU? And what’s next?
In addition to ConnectX-6, the BlueField-2 DPU packs 8 ARM A72 CPU cores, 200Gb/s of networking bandwidth and 130Gb/s of memory bandwidth (DDR4). As the figure below shows, this provides a trusted environment for offloading datacenter ‘infrastructure’ applications (i.e. storage, security and networking), freeing up host servers to run more ‘applications’, the workloads your business really needs to run. In addition to offload, the DPU isolates the security control plane from the host OS and applications, while using dedicated hardware accelerators to improve system performance. NVIDIA estimates that up to 30% of host CPU cycles are currently spent on infrastructure applications, so the potential ROI improvements are significant.
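To put the 30% estimate in concrete terms, here is a minimal back-of-the-envelope sketch. It treats 30% as the upper bound quoted above and assumes, optimistically, that all of that infrastructure work can move to the DPU. The 64-core host and the function names are hypothetical and chosen purely for illustration; they do not come from NVIDIA or Canonical tooling.

```python
def reclaimed_cores(host_cores: int, infra_fraction: float) -> float:
    """Cores' worth of host cycles freed if infrastructure work leaves the host."""
    if not 0.0 <= infra_fraction <= 1.0:
        raise ValueError("infra_fraction must be between 0 and 1")
    return host_cores * infra_fraction


def capacity_gain(infra_fraction: float) -> float:
    """Relative increase in cycles available to applications after offload.

    Before offload, applications get (1 - infra_fraction) of the host.
    After a full offload they get all of it, so the gain is
    infra_fraction / (1 - infra_fraction).
    """
    if not 0.0 <= infra_fraction < 1.0:
        raise ValueError("infra_fraction must be in [0, 1)")
    return infra_fraction / (1.0 - infra_fraction)


if __name__ == "__main__":
    HOST_CORES = 64        # hypothetical host size, for illustration only
    INFRA_FRACTION = 0.30  # upper-bound estimate quoted in the text

    freed = reclaimed_cores(HOST_CORES, INFRA_FRACTION)
    gain = capacity_gain(INFRA_FRACTION)
    print(f"Cores' worth of cycles reclaimed: {freed:.1f}")
    print(f"Extra application capacity: {gain:.0%}")
```

Under these assumptions, a host that spends 30% of its cycles on infrastructure would gain roughly 43% more application capacity, since the application share grows from 70% to 100% of the host. Real gains depend on how much of the infrastructure work can actually be offloaded.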
Closely following the BlueField-2 is the BlueField-3 datacenter-on-a-chip DPU, announced today and expected to be available in early 2022. This next-generation card offers 16 even more powerful ARM A78 cores (up from 8 A72s), 5X more compute and 2X more network bandwidth. If that’s not enough, it also supports the latest generation PCIe Gen5 interface and two DDR5 memory channels, providing 4X improvements in I/O and memory bandwidth and 2X faster acceleration for crypto and security than the BlueField-2. The NVIDIA DOCA SDK offers backward compatibility, meaning applications developed for and running on BlueField-2 silicon can work seamlessly on the BlueField-3.
Canonical Ubuntu and Kubernetes on the BlueField-2
Our partnership with NVIDIA means Ubuntu 20.04 is ready for download today with support for all the latest features. PXE boot support, which provides the ability to provision the DPU remotely, is imminent. Canonical provides the same support guarantees for Ubuntu on the DPU as on the host, whether you run a containerized Ubuntu on the DPU ARM cores or run natively to customize your system with snaps for bulletproof security.
Additionally, Canonical invests in Kubernetes, strongly driven by its footprint in the cloud native space, with 63% of all Kubernetes solutions running on Ubuntu. Canonical Kubernetes is pure-upstream Kubernetes. Charmed Kubernetes offers cloud-to-edge customization while MicroK8s is compatible with the powerful A72 cores on the DPU. (Charmed Kubernetes is a composable, multi-cloud Kubernetes with automated operations. MicroK8s is a lightweight, low-touch, opinionated Kubernetes distribution for edge and IoT, loved by developers and enterprises for its simplicity.) Kubernetes on systems with the DPU provides streamlined operations for the cloud-native world.
As NVIDIA continues to upstream their patches, feature support continues to improve over time. If you are interested in learning more, here is a link to a talk we presented at GTC 2020. The talk touches upon DPU use-cases, and demos launching Ubuntu and MicroK8s on the DPU before showcasing a real-life deployment with Kubeflow and Charms.
OVS/OVN Offload
Traditional cybersecurity has focused on external security threats. However, the rise of multi-tenant datacenters in this era of fragmentation and microservices has led to a new class of potential vulnerabilities and threats from other applications or tenants running on the same infrastructure as you. The DPU is uniquely positioned to monitor and control all traffic within the datacenter and even traffic between VMs and containers on the same server.
DPUs introduce an architecture shift into infrastructure deployments: networking services that used to provide virtual switching and routing locally on compute nodes move to the DPUs, and are thus isolated from the hypervisor hosts. As a result, control plane changes are required for provisioning network interfaces to instances or pods. Canonical is developing the changes necessary to enable seamless provisioning of DPU-accelerated network interfaces in OpenStack, Kubernetes and other projects that utilize Open Virtual Network (OVN) and Open vSwitch (OVS). The Canonical team is working to enable OVS and OVN support across NVIDIA’s BlueField portfolio. These technologies form the backbone for use-cases on the DPU.
Security at the Edge and NVIDIA Morpheus – AI for Cybersecurity
Recently we were privy to a very interesting deployment using Kubernetes to orchestrate a system with thousands of BlueField-2 cards in a hybrid high-performance multi-tenant datacenter. The use case required strict performance consistency and isolation. This deployment offloads OVN to the DPUs to first create a Virtual Private Cloud (VPC), providing complete isolation between multiple tenants, and then uses the DPUs to enforce security policies at each node to create intelligent cloud security infrastructure.
As compute requirements evolve and compute moves into edge datacenters, closer to where the data is generated, we expect more and more such use cases that leverage the capabilities of the DPU.
Case in point – today at GTC, NVIDIA announced their Morpheus AI application framework. Morpheus combines NVIDIA’s traditional strengths in AI and ML on GPUs with the DPU’s ability to monitor and respond to threats and malicious actors in real-time. The framework offers pre-trained AI models that enhance security in the zero-trust environment of modern containerized datacenters.
To summarize, the DPU decouples applications (DevOps) from infrastructure (IT) and holds promise as the next step in datacenter evolution. We at Canonical are really excited by the possibilities it brings to the next-generation software-defined datacenter. We would love to explore how we can help you get better ROI on your investments by offloading your infrastructure workloads to optimized hardware such as the BlueField-2.
Canonical at GTC
- Talk: Simplifying Kubernetes across the Clouds: MicroK8s on NVIDIA Technologies
- Talk: The Power of Micro Clouds – a new class of Compute for the Edge
- Blog: Ubuntu for Machine Learning with NVIDIA RAPIDS in 10 minutes
- Blog: Canonical-NVIDIA Edge Collaboration
- Blog: NVIDIA’s Ariel Kit Explains How NVIDIA BlueField DPUs Are Redefining Data Center Services
- Past Talks: DPUs, K8s, ML and Ops: the Future of Compute
External Links
- Ubuntu Documentation for the BF 2 at NVIDIA
- Ubuntu Documentation for the BF 1 at NVIDIA