Dr.-Ing. Nicolas Weber

Researching Automatic Performance Optimizations for Artificial Intelligence, Scientific and High Performance Computing

About me

I am Nicolas Weber, a research engineer in the Intelligent Software Systems Group at NEC Laboratories Europe. Previously, I was a PhD student in the Graphics, Capture and Massively Parallel Computing Group at TU Darmstadt, supervised by Prof. Michael Goesele, and an Associate of the Graduate School of Computational Engineering at TU Darmstadt.

My main research interest is the automated optimization of code running on accelerator hardware, especially for Scientific and High Performance Computing, Biomedical and Artificial Intelligence applications. During my PhD I developed MATOG, a tool that automatically chooses optimal memory access patterns in CUDA applications. Since joining NEC Labs, I have mainly worked on the SOL project. SOL is a compiler for neural networks running in PyTorch, TensorFlow or ONNX. It has many similarities with TVM but, in contrast, focuses mainly on training; it therefore provides its own AutoGrad implementation and a much tighter integration into the executing frameworks.

Aside from SOL, I’m heavily involved in the development of the hybrid programming API VEDA for the NEC SX-Aurora TSUBASA and of related libraries for integrating the SX-Aurora into PyTorch and TensorFlow.

My Current Projects

Here is a list of all projects that I’m currently involved in and actively maintain.

Project         Description
Illyrian        CMake-centric Python Package Creation Tool
Tungl           Cross-platform C/C++/Python logging API
VEDA            CUDA-like API for NEC SX-Aurora TSUBASA
VEDA-Tensors    Tensor Compute Kernels for NEC SX-Aurora TSUBASA
VEDA-PyTorch    NEC SX-Aurora TSUBASA device support for PyTorch
SOL             Transparent Acceleration of Neural Networks

Open Source Projects I contributed to:

My Previous Projects

Project                                       Description
Keras Merge                                   Keras extension for merging two Keras models
VEDA-TensorFlow                               NEC SX-Aurora TSUBASA device support for TensorFlow
Detail-Preserving Pooling in Deep Networks    Alternative pooling layer for deep neural networks, based on DPID
MATOG                                         Array access performance auto-tuner for CUDA
Detail Preserving Image Downscaling           Alternative, perceptually inspired image downscaling algorithm
FDGMalloc                                     Fast dynamic memory allocator for CUDA
Fujitsu FX16 CAN Library                      CAN network protocol library for the Fujitsu FX16 microcontroller

Reviewer and TPC memberships

Over the years I have been a reviewer or TPC member for the following conferences, journals and workshops: