Appendix A — Vectorized sequence parsing

A.1 Features of the benchmarked CPUs

Table A.1: Instruction-set extensions supported by the benchmarked CPUs.

CPU	Year	SSE	AVX2	BMI2	NEON
Intel
Xeon X5670	2010	+	–	–	–
Xeon E5-2620	2012	+	–	–	–
Xeon E7-4850 V3	2015	+	+	+	–
Xeon E5-2620 V4	2016	+	+	+	–
Xeon Gold 6130	2017	+	+	+	–
Xeon Gold 5218	2019	+	+	+	–
Xeon Gold 5318Y	2021	+	+	+	–
Xeon Silver 4314	2021	+	+	+	–
Xeon Gold 6442Y	2023	+	+	+	–
Core Ultra 7 165H *	2023	+	+	+	–
AMD
Epyc 7301	2017	+	+	~	–
Epyc 7452	2019	+	+	~	–
Epyc 7642	2019	+	+	~	–
Epyc 7513	2021	+	+	+	–
Epyc 9254	2022	+	+	+	–
Ryzen 5 8500G *	2024	+	+	+	–
ARM
Apple M1 *	2020	–	–	–	+
Apple M3 Pro *	2023	–	–	–	+
Neoverse-V2	2023	–	–	–	+
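Table A.1 determines which vectorized code paths can run on each machine. The following Rust sketch shows one way a program could pick a SIMD backend at run time using the standard library's feature-detection macros. It is only an illustration, not the dispatch logic of any parser benchmarked here. The `Backend` enum and the `detect_backend` function are hypothetical names introduced for this example.

```rust
/// Hypothetical SIMD backends, mirroring the columns of Table A.1.
#[allow(dead_code)] // not every variant is constructed on every architecture
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx2Bmi2,
    Avx2,
    Sse,
    Neon,
    Scalar,
}

/// Picks the widest backend the running CPU reports (illustrative only).
#[allow(unreachable_code)] // on x86_64 the final fallback is never reached
pub fn detect_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            if is_x86_feature_detected!("bmi2") {
                return Backend::Avx2Bmi2;
            }
            return Backend::Avx2;
        }
        // SSE2 is part of the x86_64 baseline, so some SSE path always exists.
        return Backend::Sse;
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // Any other architecture falls back to portable scalar code.
    Backend::Scalar
}
```

Note that `is_x86_feature_detected!("bmi2")` also returns true on the Zen 1 and Zen 2 Epyc parts marked `~` in the table, because BMI2 is present there, just slow. A dispatcher that wants to avoid microcoded PDEP/PEXT would need an extra check (for example on CPU vendor and family), which this sketch does not attempt.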

A.2 Additional experiments on data read from disk

Figure A.1: Throughput of each parser for long reads on multiple CPUs, sorted by manufacturer and year.

A.3 Additional experiments on data loaded in RAM

A.3.1 Throughput

(a) On short reads (FASTQ format).

Figure A.3: Throughput of Helicase string collection compared to counting DNA bases.

A.3.2 Instructions and cycles

(a) Instructions per byte on short reads.

A.3.3 Branches and branch misses

(a) On a human genome (FASTA format).

(*) Personal computers, not benchmarked in a reproducible environment.
(~) BMI2 is supported, but the PDEP/PEXT instructions are microcoded and very slow.
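The `~` note refers to PDEP (parallel bit deposit) and PEXT (parallel bit extract), the two BMI2 instructions that are microcoded on the Zen 1 and Zen 2 Epyc CPUs in Table A.1. For readers unfamiliar with them, the Rust functions below give a plain bit-by-bit model of their 64-bit semantics. `pext_ref` and `pdep_ref` are hypothetical helper names. This is a functional model only; it makes no claim about how the benchmarked parsers use these instructions or how the microcode implements them.

```rust
/// Reference model of PEXT: gathers the bits of `x` selected by `mask`
/// and packs them, in order, into the low bits of the result.
pub fn pext_ref(x: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut k: u32 = 0; // next output bit position
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // isolate lowest set bit of mask
        if (x & lowest) != 0 {
            out |= 1u64 << k;
        }
        k += 1;
        mask &= mask - 1; // clear that bit and move on
    }
    out
}

/// Reference model of PDEP: scatters the low bits of `x`, in order,
/// into the bit positions selected by `mask`.
pub fn pdep_ref(x: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut k: u32 = 0; // next input bit position
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // target position for bit k of x
        if ((x >> k) & 1) != 0 {
            out |= lowest;
        }
        k += 1;
        mask &= mask - 1;
    }
    out
}
```

For example, `pext_ref(0xB6, 0xF0)` extracts the high nibble of `0xB6` and returns `0xB`, and `pdep_ref(0xB, 0xF0)` puts it back, returning `0xB0`. In general, `pdep_ref(pext_ref(x, m), m) == x & m`.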