Locality-preserving representations of k-mer sets
PhD Thesis
14 Sketching super-k-mers
Note
This chapter is adapted from (Rouzé et al., 2023) and (Rouzé et al., 2025).
Rouzé, T., Martayan, I., Marchet, C., & Limasset, A. (2023). Fractional Hitting Sets for Efficient and Lightweight Genomic Data Sketching. 23rd international workshop on algorithms in bioinformatics (WABI 2023), vol. 273. Schloss Dagstuhl – Leibniz-Zentrum für Informatik.
Rouzé, T., Martayan, I., Marchet, C., & Limasset, A. (2025). Fractional hitting sets for efficient multiset sketching. Algorithms for Molecular Biology, 20, 1.