Dr.-Ing. Nicolas Weber

Researching Automatic Performance Optimizations for Artificial Intelligence, Scientific and High Performance Computing

VEDA: Best practices to use hybrid programming on the NEC SX-Aurora TSUBASA

12.11.2022 Article Nicolas Weber

The Vector Engine Driver API (VEDA) was developed to enable easy porting of existing CUDA applications to NEC’s SX-Aurora TSUBASA. While the API enables a smooth transition between the different architectures, some unique features require special attention to achieve optimal performance.

In this article we present multiple methods to improve your code. First, we explain how to use C++ function overloading and templates. Second, we show how to make the best use of the unique features of VEDAdeviceptrs. Third, we explain our improvements to VEDA’s memset and memcopy operations, which can also improve your own code.

Continue reading at nec.com

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

01.05.2022 Article Nicolas Weber

The increased interest in Artificial Intelligence (AI) has raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MxNet, PyTorch, DL4J, and TensorFlow.

All of these provide a high-level scripting API that allows users to easily design neural networks and run them on various kinds of hardware. What the user usually does not see is the considerable effort put into these frameworks to provide peak execution performance.

Continue reading at nec.com

AVEO-VEDA: Hybrid Programming for the NEC Vector Engine

14.07.2021 Article Nicolas Weber and Erich Focht

Hybrid programming is a state-of-the-art method for incorporating compute accelerators such as GPUs or vector processors into applications that run on a host system. The main reason for hybrid programming is that compute accelerators are well suited for compute- and memory-heavy tasks but perform poorly in code sections dominated by control flow. Therefore, the latter are usually executed on CPUs while the compute-heavy parts are offloaded to accelerators. This article introduces the low-level AVEO and high-level VEDA programming APIs for programming the NEC SX-Aurora TSUBASA, also called Vector Engine (VE).

Continue reading at nec.com