
Arm is working with NVIDIA and Meta to build AI foundations of the future

In the fast-evolving landscape of artificial intelligence, the shift towards edge computing is a game-changer. Rather than relying solely on centralized data centers, data is increasingly processed close to where it is generated. This approach is changing how AI is deployed and used across industries, with key players like NVIDIA, Meta, and Arm at the forefront of the movement, driving innovation and making AI more accessible and efficient, especially in resource-constrained environments.

The rapid growth of AI at the edge brings multiple benefits, including reduced latency, improved privacy, and better cost efficiency. Arm is leading this evolution, delivering advanced AI capabilities at the edge through its Cortex-A and Cortex-M CPUs and Ethos-U NPUs. As the sector expands, developers face the challenge of deploying models smoothly across a vast range of edge devices.

A key hurdle is crafting deep learning models suited to edge devices. Developers must balance limited storage, memory, and compute against model accuracy and runtime targets such as latency or frame rate. Models built for high-powered platforms often struggle, or fail outright, on devices with more constrained resources.

To address this, NVIDIA has introduced the TAO Toolkit, a user-friendly, low-code, open-source tool built on TensorFlow and PyTorch. The toolkit hides much of the complexity of training deep learning models. It offers a large library of pre-trained computer vision models for transfer learning, along with streamlined model optimizations, including channel pruning and quantization-aware training, that enable significantly lighter models.
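To make the channel-pruning idea concrete, here is a minimal sketch of magnitude-based filter pruning, not TAO's actual implementation: rank a convolution layer's filters by L1 norm and keep only the strongest ones. The function name and `keep_ratio` parameter are illustrative.

```python
import numpy as np

def prune_channels(weights, keep_ratio=0.5):
    """Keep the conv filters with the largest L1 norms.

    weights: array of shape (out_channels, in_channels, kH, kW).
    Returns the pruned weights and the indices of the kept filters.
    """
    out_channels = weights.shape[0]
    # L1 norm of each filter as a simple importance score.
    l1 = np.abs(weights).reshape(out_channels, -1).sum(axis=1)
    n_keep = max(1, int(out_channels * keep_ratio))
    keep = np.sort(np.argsort(l1)[-n_keep:])  # indices of the strongest filters
    return weights[keep], keep

# Toy layer: 8 filters, half of which are near zero and safe to drop.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
w[::2] *= 0.01  # make the even-indexed filters weak
pruned, kept = prune_channels(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 3, 3, 3)
print(kept)          # [1 3 5 7]
```

In a real network, pruning one layer's output channels also removes the matching input channels of the next layer, and the model is then fine-tuned to recover accuracy, which is the part a toolkit like TAO automates.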

Arm has collaborated with Meta to integrate support for Arm platforms into ExecuTorch, a cutting-edge solution designed to enhance on-device AI capabilities for PyTorch.

Arm champions the development of AI workloads that are both efficient and user-friendly, and has dedicated significant efforts to optimizing the latest models from PyTorch for its platforms.

Historically, adapting new neural networks from research teams for Arm platforms, with PyTorch as the preferred framework, has been a labor-intensive and manual task. The challenge stemmed from limitations in export processes and the sheer variety of machine learning operators, which complicated adaptation for embedded systems with limited resources.

The release of Meta’s ExecuTorch codebase, which builds on the advancements in PyTorch 2.0, marks a significant shift. It simplifies capturing and executing state-of-the-art networks across a diverse range of Arm hardware, from CPUs in servers to CPUs and GPUs in mobile devices, as well as Cortex-M processors and Ethos-U NPUs for embedded applications.

In a collaborative effort with Meta, Arm has integrated preliminary support for its devices into ExecuTorch. This effort is an extension of Arm’s substantial investment in Tensor Operator Set Architecture (TOSA) for capturing neural networks and its Ethos NPUs, which are pivotal in accelerating key ML workloads on mobile and embedded platforms.

Recently, Arm introduced a TOSA compilation flow and a runtime delegate for ExecuTorch. This new feature includes prototype support for the Ethos-U55, facilitating the direct export of graphs from the PyTorch Python environment to Ethos-U enabled platforms, such as Corstone-300.
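As a rough sketch of what this export path looks like from the Python side, the pseudocode below follows the general shape of ExecuTorch's capture-lower-delegate flow; the exact module paths and names are assumptions and may differ from the released code:

```
# Illustrative pseudocode; exact ExecuTorch APIs and names may differ.
model = MyModule().eval()
exported = torch.export.export(model, example_inputs)  # capture the graph with PyTorch 2.x
edge = to_edge(exported)                               # lower to the ExecuTorch edge dialect
edge = edge.to_backend(arm_partitioner)                # delegate TOSA-compatible subgraphs
program = edge.to_executorch()                         # serialize a deployable program
# The serialized program can then run on an Ethos-U55 enabled target such as Corstone-300.
```

The key design point is the delegate step: subgraphs that map onto TOSA are handed to the Arm backend for the NPU, while any remaining operators fall back to the CPU runtime.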

Looking ahead, Arm is committed to furthering this initiative to establish a robust and versatile export pathway for a broad spectrum of machine learning applications.

For developers and everyday PyTorch users, this advancement signifies a notable change: a growing list of networks can now be efficiently deployed as standalone models on platforms enabled by Cortex-M and Ethos-U technologies.