About me
I am Nicolas Weber, a research engineer in the Intelligent Software Systems Group at NEC Laboratories Europe. Before that, I was a PhD student in the Graphics, Capture and Massively Parallel Computing Group at TU Darmstadt, supervised by Prof. Michael Goesele, and an Associate of the Graduate School of Computational Engineering at TU Darmstadt.
My main research interest is the automated optimization of code running on accelerator hardware, especially for scientific and high-performance computing, biomedical, and artificial intelligence applications. During my PhD I developed MATOG, a tool that automatically chooses optimal memory access patterns in CUDA applications. Since joining NEC Labs I have mainly worked on the SOL project. SOL is a compiler for neural networks running in PyTorch, TensorFlow or ONNX. It has many similarities with TVM but, in contrast, mainly focuses on training; it therefore provides its own AutoGrad implementation and a much tighter integration into the executing frameworks.
Aside from SOL, I’m heavily involved in the development of VEDA, the hybrid programming API for the NEC SX-Aurora TSUBASA, and of related libraries for integrating the SX-Aurora into PyTorch and TensorFlow.
My Current Projects
Here is a list of all projects that I’m currently involved in and actively maintain.
Project | Description |
---|---|
Illyrian | CMake-centric Python Package Creation Tool |
Tungl | Cross-platform C/C++/Python logging API |
VEDA | CUDA-like API for NEC SX-Aurora TSUBASA |
VEDA-Tensors | Tensor Compute Kernels for NEC SX-Aurora TSUBASA |
VEDA-PyTorch | NEC SX-Aurora TSUBASA device support for PyTorch |
SOL | Transparent Acceleration of Neural Networks |
Open Source Projects I contributed to:
My Previous Projects
Project | Description |
---|---|
Keras Merge | Keras extension for merging two Keras models |
VEDA-TensorFlow | NEC SX-Aurora TSUBASA device support for TensorFlow |
Detail-Preserving Pooling in Deep Networks | Alternative pooling layer for deep neural networks based on DPID. |
MATOG | Array access performance auto-tuner for CUDA. |
Detail Preserving Image Downscaling | Alternative, perceptually inspired image downscaling algorithm. |
FDGMalloc | Fast dynamic memory allocator for CUDA. |
Fujitsu FX16 CAN Library | CAN network protocol library for the Fujitsu FX16 microcontroller. |
Reviewer and TPC memberships
Over the years I have been a reviewer or TPC member for the following conferences, journals and workshops:
- CCOS: Connection Science Journal, 2022
- ITEM: IoT, Edge, and Mobile for Embedded Machine Learning, 2020-2024 (TPC)
- JRTIP: Journal of Real-Time Image Processing, 2014, 2017-2020
- TCSVT: IEEE Transactions on Circuits and Systems for Video Technology, 2020-2021
- JSA: Journal of Systems Architecture, 2018
- VMV: Symposium on Vision, Modeling, and Visualization, 2016