Dr.-Ing. Nicolas Weber

Researching Automatic Performance Optimizations for Artificial Intelligence, Scientific and High Performance Computing

About me

I am Nicolas Weber, a research engineer in the Intelligent Software Systems Group at NEC Laboratories Europe. Before that, I was a PhD student in the Graphics, Capture and Massively Parallel Computing Group at TU Darmstadt, supervised by Prof. Michael Goesele, and an associate of the Graduate School of Computational Engineering at TU Darmstadt.

My main research interest is the automated optimization of code running on accelerator hardware, especially for Scientific and High Performance Computing, Biomedical, and Artificial Intelligence applications.

Continue Reading

Facilitate high-performance hardware integration into AI Frameworks with the NEC SOL AI compiler

11.04.2025 Talk Nicolas Weber

AI development has become increasingly driven by powerful frameworks like PyTorch and TensorFlow, supported by major tech companies. However, the rapid release cycles of these frameworks (every 3-6 months) pose a challenge for new hardware vendors: they struggle to develop the necessary AI functionality and to keep pace with frequent updates. In this talk, we introduce NEC’s SOL AI compiler, which seamlessly integrates with PyTorch, TensorFlow, ONNX, NumPy, and soon JAX. SOL provides a unified compiler engine for these frameworks, supporting both inference and training, while also enabling model export to standalone libraries with minimal dependencies. Designed for device-agnostic support and ease of maintenance, SOL requires no specific compiler support (e.g., OpenCL, SYCL, OpenMP, Triton, MLIR, …) but can generate device-tailored code with minimal coding effort. We will present SOL’s key concepts and its device-agnostic design in this talk.
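
To make the workflow concrete, here is a minimal sketch of what such a framework integration looks like from the user’s side, using PyTorch. The sol module name and the sol.optimize entry point are assumptions for illustration based on the description above, not confirmed API; refer to the talk and NEC’s documentation for the actual interface.

import torch
import sol  # NEC SOL package; module name and calls below are assumed for illustration

# Any ordinary PyTorch model; SOL is meant to accept it unchanged.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)
example = torch.rand(1, 128)  # example input used for shape inference

# Hand the model to the compiler; the returned object is assumed to behave
# like a drop-in torch.nn.Module backed by generated, device-tailored code.
opt_model = sol.optimize(model, example)

with torch.no_grad():
    out = opt_model(example)  # run inference through the compiled model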

VEDA: Best practices to use hybrid programming on the NEC SX-Aurora TSUBASA

12.11.2022 Article Nicolas Weber

The Vector Engine Driver API (VEDA) was developed to enable easy porting of existing CUDA applications to NEC’s SX-Aurora TSUBASA. While the API enables a smooth transition between the different architectures, there are unique features that require special attention to achieve optimal performance.

In this article we present several methods to improve your code. First, we explain how to use C++ function overloading and templates. Second, we show how to make the best use of the unique features of VEDAdeviceptrs. Third, we explain our improvements to VEDA’s memset and memcpy operations, which can also speed up your own code.

Continue reading at nec.com

Keras Merge

09.11.2022 Open Source Nicolas Weber

Today we released my newest Open Source project: Keras Merge! Keras Merge allows you to merge two Keras models, even when you don’t have access to their building functions! Just run pip3 install keras-merge to install it.

import keras_merge as km

A = init_model_a()  # -> keras.Model
B = init_model_b()  # -> keras.Model

input_a = init_input_a()
input_b = init_input_b()

# Manual composition: feed A's output into B's second input.
c = B([input_b, A(input_a)])

C = km.merge(A, B,				# models
	[*A.inputs, B.inputs[0]],	# inputs
	B.outputs,					# outputs
	[							# mapping [(src->dst), ...]
		(A.outputs[0], B.inputs[1])
	]
)

# The merged model computes the same result in a single call.
d = C([input_a, input_b])

Check out GitHub or PyPI for more information!

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

01.05.2022 Article Nicolas Weber

The increased interest in Artificial Intelligence (AI) has raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MXNet, PyTorch, DL4J, or TensorFlow.

All of these provide a high-level scripting API that allows users to easily design neural networks and run them on various kinds of hardware. What the user usually does not see is the high effort put into these frameworks to provide peak execution performance.
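
As a purely illustrative example (not taken from the article), a few lines of PyTorch are enough to define a network and move it onto whatever hardware is available, while the optimized kernels and hardware dispatch stay hidden inside the framework:

import torch

# A complete network defined in a few lines of high-level scripting ...
net = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(16 * 32 * 32, 10),
)

# ... and dispatched to the available hardware with a single call.
device = "cuda" if torch.cuda.is_available() else "cpu"
net = net.to(device)
out = net(torch.rand(8, 3, 32, 32, device=device))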

Continue reading at nec.com

Older posts