About me

I am Nicolas Weber, a research engineer in the Intelligent Software Systems Group at NEC Laboratories Europe. Before that, I was a PhD student in the Graphics, Capture and Massively Parallel Computing Group at TU Darmstadt, supervised by Prof. Michael Goesele, and an Associate of the Graduate School of Computational Engineering at TU Darmstadt.

My main research interest is the automated optimization of code running on accelerator hardware, especially for Scientific and High Performance Computing, Biomedical, and Artificial Intelligence applications.

COMBINATION OF MULTIPLE DATA PROCESSING AND MACHINE LEARNING FRAMEWORKS FOR A TARGET HARDWARE

27.02.2024 Patent Nicolas Weber

FULL ASYNCHRONOUS EXECUTION QUEUE FOR ACCELERATOR HARDWARE

28.02.2023 Patent Nicolas Weber

VEDA: Best practices to use hybrid programming on the NEC SX-Aurora TSUBASA

12.11.2022 Article Nicolas Weber

The Vector Engine Driver API (VEDA) was developed to enable easy porting of existing CUDA applications to NEC’s SX-Aurora TSUBASA. While the API enables a smooth transition between the two architectures, some of its unique features require special attention to achieve optimal performance.

In this article we present multiple methods to improve your code. First, we explain how to use C++ function overloading and templates. Second, we show how to make the best use of the unique features of VEDAdeviceptrs. Third, we explain our improvements to VEDA’s memset and memcopy operations, which can also improve your own code.

Continue reading at nec.com

Keras Merge

09.11.2022 Open Source Nicolas Weber

Today we released my newest Open Source project: Keras Merge! Keras Merge allows you to merge two Keras models, even when you don’t have access to their building functions! Just run pip3 install keras-merge to install it.

A = init_model_a() # -> keras.Model
B = init_model_b() # -> keras.Model

input_a = init_input_a()
input_b = init_input_b()

# Without Keras Merge: chain the two models by hand.
# A multi-input Keras model takes its inputs as a list.
c = B([input_b, A(input_a)])

# With Keras Merge: build one model C from A and B.
import keras_merge as km
C = km.merge(A, B,                  # models
    [*A.inputs, B.inputs[0]],       # inputs
    B.outputs,                      # outputs
    [                               # mapping [(src->dst), ...]
        (A.outputs[0], B.inputs[1])
    ]
)

d = C([input_a, input_b])
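
To make the mapping argument concrete, here is a toy sketch in plain Python. It is not how Keras Merge is implemented and does not use Keras at all: model_a, model_b and merge_by_hand are hypothetical stand-ins that only illustrate routing A's output into B's second input, so that the merged callable gives the same result as chaining the two by hand.

```python
# Toy stand-ins for two models: plain functions, NOT Keras and NOT keras_merge.
def model_a(x):
    """Hypothetical model A: one input, one output."""
    return x * 2


def model_b(u, v):
    """Hypothetical model B: two inputs (u plays B.inputs[0], v plays B.inputs[1])."""
    return u + v


def merge_by_hand(a, b):
    """Route a's output into b's second input, mirroring the
    (A.outputs[0], B.inputs[1]) mapping in the snippet above."""
    def merged(input_a, input_b):
        return b(input_b, a(input_a))
    return merged


model_c = merge_by_hand(model_a, model_b)

c = model_b(5, model_a(3))  # manual chaining
d = model_c(3, 5)           # merged callable, inputs ordered (input_a, input_b)
print(c == d)
```

The merged callable takes A's inputs first and B's remaining input second, matching the input order [*A.inputs, B.inputs[0]] used in the call to km.merge.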

Check out GitHub or PyPI for more information!

ACCELERATION OF NEURAL NETWORKS USING DEPTH-FIRST PROCESSING

30.08.2022 Patent Nicolas Weber

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

01.05.2022 Article Nicolas Weber

The increased interest in Artificial Intelligence (AI) has raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK, MXNet, PyTorch, DL4J, and TensorFlow.

All of these provide a high-level scripting API that allows users to easily design neural networks and run them on various kinds of hardware. What the user usually does not see is the considerable effort put into these frameworks to provide peak execution performance.

Continue reading at nec.com

SOL: Single middleware for optimized multi-architecture AI training and deployment

01.01.2022 Talk Nicolas Weber

NEC User Group Meeting

AVEO-VEDA: Hybrid Programming for the NEC Vector Engine

14.07.2021 Article Nicolas Weber and Erich Focht

Hybrid programming is a state-of-the-art method for incorporating compute accelerators such as GPUs or vector processors into applications that run on a host system. The main reason for hybrid programming is that compute accelerators are well suited for compute- and memory-heavy tasks but perform poorly in code sections dominated by control flow. Therefore, the latter are usually executed on CPUs, while the compute-heavy parts are offloaded to accelerators. This article introduces the low-level AVEO and high-level VEDA programming APIs for programming the NEC SX-Aurora TSUBASA, also called Vector Engine (VE).

Continue reading at nec.com
