World Fastest algorithm for Connected Component Labeling and Feature Computation on CPUs and GPUs
Introduction
Designing a new algorithm is challenging both from considering the overwhelming literature and from the very performance of best existing algorithms. Goals could be a faster algorithm on some class of computer architecture or minimizing the number of over-created labels or the smallest theoretical complexity. Yet another issue is to be most predictable.
LSL and Computer Architecture: optimized for RISC
Now, from the current state of the computing technology, reaching decent performances actuality requires for CCL algorithms to take into account two specificities/capacities of RISC architectures: the processor pipeline and its cache memories. That amounts to minimize conditional statements (like tests and comparisons) to reduce the number of pipeline stalls and limit random sparse (typically vertical) memory accesses, to lower cache misses.
LSL characteristics
LSL is specifically designed in view of RISC architectures. It uses a segment approach combined with the Selkow’s automaton, to minimize the number of created labels. Its major improvement is the introduction of a new line-relative labeling to simplify equivalence building between segments.
There are two version of the LSL:
LSL-STD is the most systematic and data-independent possible and is designed for noisy images (random or pseudo random images with very few structuration) and for systems where execution time predictability is important (like embedded systems).
LSL-RLE is optimized for real images. It uses a RLE compression to optimize cache footprints and to maximize cache.
Features Computation
As Connected Component Labeling are usually used for features computation, a peciluar attention has been paid to boost this kind of processing. Two kind of features computing are currently integrated in LSL:
bounding rectangle (xmin, xmax, ymin, ymax).
statistical moments (S, Sx, Sy, ...) and centroid (xG,yG).
The LSL software architecture makes the addition of new feature computation very easy.
Benchmarks
A four-step benchmark is proposed (see ICIP and JRTIP) to evalute the modern algorithms. For that benchmark, LSL appears to be the world fastest algorithm, as far as we know. The data sets are provided below.
As a matter of fact, on a 2.8 GHz Penryn, the LSL execution time for a 512x512 random image is 1.6 ms. This execution time drops down to 0.47 ms for 512x512 OCR image.
Examples Data Base
Images are available (here),
in 512x512, 1024x1024 and 2048x2048. Images are splitted in 4 sets:
percolation / random: one of the worst case for segment-based algorithm,
slightly structured images (percolation + morphological dilation / erosion): still hard images (harder than real ones),
highly structured images (homotheties): very easy images (easier than real ones) used to evaluate the impact of the size, the number of labels on the execution time,
OCR-like images: the Universal Human Rights Declaration (case 8 or 12, 72 or 300 dpi) in one image. A more complex real image (cadastre) is also provided<./li>
Bibliography
JRTIP Journal of Real Time Image Processing: "Parallel Light Speed Labeling: an efficient connected component algorithm for labeling and analysis on multi-core processors" L. Cabaret, L. Lacassagne, D. Etiemble, 2018 (Springer Link).
GTC 2021 "Taming Voting Algorithms on GPUs for an Efficient Connected Component Analysis Algorithm", F. Lemaitre, A. Hennequin, L. Lacassagne, GPU Technology Conference (GTC) 2021
PPoPP/WPMVP 2020 "How to speed Connected Component Labeling up with SIMD RLE algorithms", F. Lemaitr, A. Hennequin, L. Lacassagne, Workshop on programming models for SIMD/Vector processing (WPMVP), ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2020
PPoPP/WPMVP 2019 "Designing efficient SIMD algorithms for direct CCL", A. Hennequin, I. Masliah, L. Lacassagne, Workshop on programming models for SIMD/Vector processing (WPMVP), ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) 2019
GTC 2019 San Jose "A New Direct Connected Component Labeling and Analysis Algorithm for GPUs", A. Hennequin, L. Lacassagne, GPU Technology Conference (GTC) @ San Jose 2019 video @ Nvidia + slides @ Nvidia
DASIP 2019 SparseCCL "SparseCCL: Connected Components Labeling and Analysis for sparse images", A. Hennequin, B. Couturier, V. Gligorov, L. Lacassagne, IEEE International Conference on Design and Architectures for Signal and Image Processing, 2019
DASIP 2018 "A new Direct Connected Component Labeling and Analysis Algorithms for GPUs", A. Hennequin, L. Lacassagne, L. Cabaret, Q. L. Meunier, IEEE International Conference on Design and Architectures for Signal and Image Processing, 2018
IPTA 2017 "Distanceless Label Propagation: an Efficient Direct Connected Component Labeling Algorithm for GPUs", L. Cabaret, L. Lacassagne, D. Etiemble, International Conference on Image Processing Theory, Tools and Applications.
GTC 2017 "Distanceless Label Propagation: an Efficient Direct Connected Component Labeling Algorithm for GPUs", L. Cabaret, L. Lacassagne, D. Etiemble, International GPU Technical Conference, Munchen, (poster).
PPoPP/WPMVP 2016 "A new SIMD iterative connected component labeling algorithm", L. Lacassagne, L. Cabaret, D. Etiemble, F. Hebache, A. Petreto, ACM International Workshop on Programming Models for SIMD/Vector (WPMVP) @ Principles and Practice of Parallel Programming Conference (PPoPP), 2016
ICIP 2015 "Parallel Light Speed Labeling: an efficient connected component labeling algorithm for multi-core processors", L. Cabaret, L. Lacassagne, D. Etiemble, IEEE International Conference on Image Processing, 2015
SiPS 2014 "What Is the World’s Fastest Connected Component Labeling Algorithm?", L. Cabaret, L. Lacassagne, IEEE International Workshop on Signal Processing Systems, 2014
DASIP 2014 "A Review of World’s Fastest Connected Component Labeling Algorithms: Speed and Energy Estimation", L. Cabaret, L. Lacassagne, L. Oudni, IEEE International Conference on Design and Architectures for Signal and Image Processing, 2014.
ICIP 2009 "Light Speed Labeling for RISC architectures", L. Lacassagne, B. Zavidovique. IEEE International Conference on Image Processing, 2009, pp. 3245-3248.
ICIAP 1999 "Motion detection, labeling, data association and tracking in real time on RISC computer", L. Lacassagne, M. Milgram, P. Garda, Septembre à Venise, Italy (this pdf is correct, no that on IEEE site)
AAA 2000, (soon recompilation) "Light Speed Labelling: un nouvel algorithme d'étiquetage en composantes connexes", L. Lacassagne, M. Milgram, J. Devars, Congrès Adéquation Algorithme Architecture, 2000, France