Adrien Cassagne's Homepage

Adrien Cassagne

Associate Professor in Computer Science

Sorbonne Université, LIP6
4 Place Jussieu, 75005 Paris, France
Office 24-25/403
+33 1 44 27 65 61
adrien.cassagne@lip6.fr

Short Bio

Since 2021, I have held an Associate Professor position (Maître de Conférences in French) at Sorbonne University (Paris, France). For my research work, I'm affiliated with the LIP6 laboratory (Laboratoire d'Informatique de Paris 6), in the ALSoC (Hardware and Software for Embedded System) team.

In 2020, I completed my Ph.D. thesis under the co-supervision of Prof. Denis Barthou (Inria Lab.) and Prof. Christophe Jégo (IMS Lab.) in Bordeaux, France. In 2013, I obtained a Master's degree in Computer Science from the University of Bordeaux.

My main research focus is providing efficient implementations of parallel algorithms for multi-core and heterogeneous programmable architectures. Recently, I've been focusing on low-power Systems-on-a-Chip (SoCs), as energy efficiency is becoming increasingly crucial. In general, I am interested in everything related to efficient software implementations. Here are some of the domains I'm working on (or have worked on in the past):

Computer Vision on embedded heterogeneous SoCs
Software-Defined Radio on multi-core & SIMD CPUs
Inference of Deep Neural Networks on CPUs, GPUs & NPUs
Computational Fluid Dynamics on Supercomputers (CPUs & discrete GPUs)

Open Source Research Software

AFF3CT

A Fast Forward Error Correction Toolbox, or AFF3CT, is a simulator and a library dedicated to Forward Error Correction (FEC, also known as channel coding). It is written in C++ and supports a wide range of codes, from the widespread Turbo codes to the more recent Polar codes, including Low-Density Parity-Check (LDPC) codes. AFF3CT can be used as a command-line program, and it simulates communication chains using a Monte Carlo method.
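To give an idea of what a Monte Carlo simulation of a communication chain looks like, here is a minimal sketch in plain C++. It does not use AFF3CT and does not reflect its API: the function `simulate_ber` and its parameters are hypothetical names chosen for illustration. The chain is deliberately the simplest possible one (no channel code): random bits are BPSK-modulated, sent through an additive white Gaussian noise channel, hard-decided, and compared with the source to estimate the bit error rate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>

// Hypothetical helper (not part of AFF3CT): estimates the bit error rate (BER)
// of uncoded BPSK over an AWGN channel by Monte Carlo simulation.
// The loop stops once `min_errors` errors are observed or `max_bits` bits are sent.
double simulate_ber(double ebn0_db, std::size_t min_errors, std::size_t max_bits,
                    unsigned seed)
{
    assert(min_errors > 0 && max_bits > 0);

    // Unit-energy symbols, one bit per symbol: noise variance is N0/2 = 1 / (2 * Eb/N0).
    const double ebn0 = std::pow(10.0, ebn0_db / 10.0);
    const double sigma = std::sqrt(1.0 / (2.0 * ebn0));

    std::mt19937 gen(seed);
    std::bernoulli_distribution source(0.5);
    std::normal_distribution<double> noise(0.0, sigma);

    std::size_t n_bits = 0, n_errors = 0;
    while (n_errors < min_errors && n_bits < max_bits)
    {
        const bool bit = source(gen);                 // source: one random bit
        const double symbol = bit ? -1.0 : 1.0;       // modulator: 0 -> +1, 1 -> -1
        const double received = symbol + noise(gen);  // channel: add Gaussian noise
        const bool decided = received < 0.0;          // demodulator: hard decision
        if (decided != bit) ++n_errors;               // monitor: count bit errors
        ++n_bits;
    }
    return static_cast<double>(n_errors) / static_cast<double>(n_bits);
}
```

For instance, `simulate_ber(4.0, 1000, 1000000, 42)` should return a value close to the theoretical uncoded BPSK error rate at 4 dB, which is roughly 1.25e-2. A real FEC simulation inserts an encoder before the modulator and a decoder after the demodulator, and works on frames rather than single bits.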

AFF3CT was first intended to be a simulator but as it developed, the need to reuse sub-parts of the code intensified: the library was born. Below is a list of possible applications for the library:

Build custom communication chains that are not possible with the simulator
Facilitate hardware prototyping
Enable various modules to be used in Software-Defined Radio contexts

Source code » Documentation » Website »

Related featured publication: AFF3CT: A Fast Forward Error Correction Toolbox!. Elsevier SoftwareX, October 2019.

MIPP

My Intrinsics++, or MIPP, is a portable wrapper for vector intrinsic functions (SIMD) written in C++11. It works for SSE, AVX, AVX-512, ARM NEON and SVE (work in progress) instructions. The MIPP wrapper supports single/double precision floating-point numbers and also signed integer arithmetic (64-bit, 32-bit, 16-bit and 8-bit).

With the MIPP wrapper, you no longer need to write architecture-specific intrinsic code. Just use the provided functions and the wrapper will automatically generate the right intrinsic calls for your specific architecture.
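The general idea behind such a wrapper can be sketched as follows. This is not MIPP's code or API: the type `Reg4f` and the function `add_arrays` are hypothetical, and the sketch only covers a 4-lane float register with load, store and addition. The architecture-specific intrinsics are selected at compile time inside the wrapper type, with a scalar fallback, so that the user-level loop is written once.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__)
#include <xmmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical 4-lane float register, for illustration only (not MIPP's API).
struct Reg4f
{
#if defined(__SSE__)
    __m128 v;
    static Reg4f load(const float* p) { Reg4f r; r.v = _mm_loadu_ps(p); return r; }
    void store(float* p) const { _mm_storeu_ps(p, v); }
    friend Reg4f operator+(Reg4f a, Reg4f b) { Reg4f r; r.v = _mm_add_ps(a.v, b.v); return r; }
#elif defined(__ARM_NEON)
    float32x4_t v;
    static Reg4f load(const float* p) { Reg4f r; r.v = vld1q_f32(p); return r; }
    void store(float* p) const { vst1q_f32(p, v); }
    friend Reg4f operator+(Reg4f a, Reg4f b) { Reg4f r; r.v = vaddq_f32(a.v, b.v); return r; }
#else
    // Scalar fallback: same interface, plain loops.
    float v[4];
    static Reg4f load(const float* p) { Reg4f r; for (int i = 0; i < 4; i++) r.v[i] = p[i]; return r; }
    void store(float* p) const { for (int i = 0; i < 4; i++) p[i] = v[i]; }
    friend Reg4f operator+(Reg4f a, Reg4f b) { Reg4f r; for (int i = 0; i < 4; i++) r.v[i] = a.v[i] + b.v[i]; return r; }
#endif
};

// Architecture-independent user code: element-wise sum of two arrays.
// To keep the sketch short, n must be a multiple of 4 (no tail loop).
void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4)
        (Reg4f::load(a + i) + Reg4f::load(b + i)).store(out + i);
}
```

The function `add_arrays` contains no intrinsic call, yet it compiles to SSE or NEON instructions depending on the target. A complete wrapper additionally has to handle several register widths, many data types and a much larger set of operations.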

Source code »

Related featured publication: MIPP: A Portable C++ SIMD Wrapper and its use for Error Correction Coding in 5G Standard. In ACM Workshop on Programming Models for SIMD/Vector Processing, February 2018.

StreamPU

StreamPU is a Domain Specific Embedded Language (DSEL) for streaming applications. It comes in the form of a C++11 library to link with. Its main features are:

Definition of dataflow components: modules, tasks and sockets
Implementations of elementary modules and tasks
Multi-threaded runtime with replication and pipeline parallel constructs

The DSEL is suitable for SDR systems and audio/video processing and, more generally, it fits single-rate Synchronous DataFlow (SDF) streaming applications. It is used as the multi-threaded runtime of AFF3CT and FMDT.

Source code » Documentation »

Related featured publication: StreamPU: A DSEL for High Throughput and Low Latency Software-Defined Radio on Multicore CPUs. Wiley Concurrency and Computation: Practice and Experience, July 2023.

FMDT

Fast Meteor Detection Toolbox, or FMDT, is software designed to detect meteors. FMDT is intended for airborne camera systems, e.g. in atmospheric balloons or aircraft. It is robust to camera movements thanks to a motion compensation algorithm.

FMDT is ready for real-time processing on small embedded boards like the Raspberry Pi 4 or the Nvidia Jetson Nano. For instance, on the Raspberry Pi 4 (@ 1.5 GHz), FMDT is able to process 30 frames per second on an HD video sequence while the instantaneous power is only around 4 watts.

Source code » Documentation »

Related featured publication: A 2022 τ-Herculid Meteor Cluster from an Airborne Experiment: Automated Detection, Characterization, and Consequences for Meteoroids. Astronomy and Astrophysics, February 2023.

Hardware Platforms for Research

Dalek

Dalek is an innovative cluster built from CPUs traditionally used in mini-PCs or laptops and from GPUs that can be found in gaming PCs (or iGPUs). The cluster comes with a wide selection of recent components that can be tested on a variety of algorithms. One of the main purposes of Dalek is to provide such a diversity of components at a moderate price. Indeed, consumer-grade components are much less expensive than server-class hardware. As a consequence, Dalek is very well suited to software design and prototyping, as it enables researchers to discover and experiment with new hardware shortly after its release. The partitions are typically composed of four nodes of the same type, and the network is based on 2.5 GbE interfaces.

Here are the four available partitions:

AMD Ryzen 9 7945HX CPU + Nvidia GeForce RTX 4090 GPU
AMD Ryzen 9 7945HX CPU + AMD Radeon RX 7900 XTX GPU
Intel Core Ultra 9 185H CPU + Intel Arc A770 GPU
AMD Ryzen AI 9 HX 370 CPU + AMD Radeon 890M iGPU

Documentation »

Monolithe

The Monolithe is an experimental platform composed of multiple single-board computers (SBCs). The purpose of having all these SBCs together is to be able to evaluate their suitability for embedded applications. In the context of embedded systems (satellites, weather balloons, network equipment, cars, medical devices, smartphones, ...), energy consumption is a crucial characteristic. For this reason, a specific power measurement platform has been designed and produced in-house. The latter is plugged between the power supply and the SBC. Each platform measures voltage and current at a high sampling rate: 5000 samples/s.
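As an illustration of how such measurements can be turned into an energy figure, here is a minimal sketch in C++. It is not the software of the measurement platform: the function `energy_joules` is a hypothetical helper, and it assumes that voltage and current samples are synchronized and evenly spaced. The instantaneous power is the product of voltage and current, and the energy is approximated by summing power times the sampling period (rectangle rule).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: energy in joules from synchronized voltage (V) and
// current (A) samples taken at a fixed rate (in samples per second).
double energy_joules(const std::vector<double>& volts,
                     const std::vector<double>& amps,
                     double sample_rate_hz)
{
    assert(volts.size() == amps.size());
    assert(sample_rate_hz > 0.0);

    const double dt = 1.0 / sample_rate_hz; // time between two samples (s)
    double energy = 0.0;
    for (std::size_t k = 0; k < volts.size(); ++k)
        energy += volts[k] * amps[k] * dt;  // P = V * I, then E += P * dt
    return energy;
}
```

For example, 5000 samples at a constant 5 V and 0.8 A, taken at 5000 samples/s, cover one second at 4 W and therefore give about 4 J. Dividing the energy of a run by the number of processed items (frames, for instance) gives an energy-per-item metric that can be compared across boards.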

The cluster is composed of various SBCs:

Raspberry Pi: 3, 4 and 5
Nvidia Jetson: TX2, Nano, NX AGX, AGX Xavier, Orin Nano, Orin NX, Orin AGX
Low power x86: Mercury EM780, AtomMan X7 Ti
Others: Odroid-XU4, VIM1S, Orange Pi 5 Plus, BPi-F3, M1 Ultra

Documentation »

Teaching

I'm affiliated with Sorbonne University (UFR d'Ingénierie 919) and with the Polytech Sorbonne engineering school (UFR 933). I mainly teach parallel programming, computer architecture and operating systems.

MU5IN160 - Parallel Programming for Embedded Architectures (M2) (Moodle 2024 link)
MU5IN166 - Hot Topics: Evolution of the SIMD ISAs on General Purpose CPUs (M2)
LU3IN029 - Computer Architecture (L3) (Moodle 2024 link, resources for the second part of the class)
LU3IN010 - Principles of Operating Systems (L3)
EPU-I9-ICH - High Performance Computing (M2) (Moodle 2024 link)
EPU-N6-IPS - Operating System Fundamentals (L3) (Moodle 2024 link)

Supervision

Current Ph.D. Student

Yacine Idouar (from 2024) - Task scheduling for applications embedded in a nano-satellite and on heterogeneous SoCs. (co-supervised by Prof. Dimitri Galayko and Prof. Lionel Lacassagne)

Ph.D. Alumni

Maxime Millet (2024) - Optimization and time/quality trade-offs of an optical flow algorithm on low-power SoC for real-time meteor detection onboard a nanosatellite. (supervised by Prof. Lionel Lacassagne)

Engineers

Maxime Millet (2025, 4 months) - Integration of the Meteorix computer vision chain into the open-source FMDT toolbox.
Kun He (2019, 1 year) - Development of a MATLAB interface for the AFF3CT C++ library.

Master Internships

Johannes Laute (2024, 6 months) - Vectorizing PyTorch for RISC-V Vector extension (RVV).
Yacine Idouar (2024, 6 months) - Heterogeneous tasks for streaming applications running on SoCs with unified memory.
Enrique Galvez (2024, 5 months) - Study of convolutions for efficient inference of deep neural networks on embedded processors.
Mathuran Kandeepan (2023, 6 months) - Execution of streaming tasks graph on heterogeneous CPU/GPU architectures with shared memory.
Clara Ciocan (2023, 6 months) - Improving the robustness of a meteor detection application with a moving camera.
Nourdinne Hammachi (2023, 2 months) - StreamPU DSEL extension: Support for control flow in a multi-threaded context (pipeline and replication).
Yacine Idouar (2023, 2 months) - StreamPU DSEL extension: Implement a new type of read & write data ("forward socket").
Michaël Baudeur (2023, 2 months) - Implementation of an acquisition system for an Ethernet camera using the Aravis API.
Clara Ciocan & Mathuran Kandeepan (2022, 2 months) - Development of a computer vision application for meteor detection.
Maël Keryell (2021, 6 months) - Semi-automatic exploration of computational expressions, finding compromise between efficiency and precision.

Bachelor Internships

Enrique Galvez (2022, 2 months) - Optimizing the execution of an optical flow application on a heterogeneous & parallel architecture.
Edgar Baucher (2021, 2 months) - Development of a header only C++ library for the RISC-V Vector extension instruction set (RVV).
Mehdi Naciri (2019, 4 months) - Development of a web application dedicated to displaying, comparing and searching in a database of scientific results.