Figure A.1: Throughput of each parser for long reads on multiple CPUs, sorted by manufacturer and year.
A.3 Additional experiments on data loaded in RAM
A.3.1 Throughput
(a) On short reads (FASTQ format).
(b) On long reads (FASTQ format).
Figure A.2: Throughput of each parser for data loaded in RAM on multiple CPUs, sorted by manufacturer and year. Needletail and Paraseq must both go through a reader over a byte slice, which degrades their performance.
Figure A.3: Throughput of Helicase string collection compared to counting DNA bases.
A.3.2 Instructions and cycles
(a) Instructions per byte on short reads.
(b) Cycles per byte on short reads.
Figure A.4: Instructions and cycles per byte on multiple CPUs, sorted by manufacturer and year.
A.3.3 Branches and branch misses
(a) On a human genome (FASTA format).
(b) On short reads (FASTQ format).
Figure A.5: Branches per byte on multiple CPUs, sorted by manufacturer and year.
(a) On a human genome (FASTA format).
(b) On short reads (FASTQ format).
Figure A.6: Branch misses per MB on multiple CPUs, sorted by manufacturer and year.
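The normalized quantities plotted in Figures A.4 to A.6 can be derived from raw hardware counters by dividing by the input size. The following is a minimal sketch of that arithmetic, not the benchmarking code used for these experiments. The names `Counters` and `normalize` are hypothetical, the sample values are made up for illustration, and 1 MB is assumed to mean 10^6 bytes.

```python
from dataclasses import dataclass

# Assumption: 1 MB = 10^6 bytes (the source does not say whether MB or MiB is meant).
MB = 1_000_000


@dataclass
class Counters:
    """Raw hardware counter totals for one parser run (hypothetical container)."""
    instructions: int
    cycles: int
    branches: int
    branch_misses: int


def normalize(c: Counters, input_bytes: int) -> dict[str, float]:
    """Scale raw counters by input size so runs on different files are comparable."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return {
        "instructions_per_byte": c.instructions / input_bytes,
        "cycles_per_byte": c.cycles / input_bytes,
        "branches_per_byte": c.branches / input_bytes,
        # Branch misses are rare, so they are reported per MB rather than per byte.
        "branch_misses_per_mb": c.branch_misses * MB / input_bytes,
    }


if __name__ == "__main__":
    # Made-up counter values, for illustration only.
    sample = Counters(instructions=2_000_000, cycles=1_000_000,
                      branches=500_000, branch_misses=1_000)
    for name, value in normalize(sample, input_bytes=4_000_000).items():
        print(f"{name}: {value}")
```

Normalizing by input size makes values comparable across datasets of different lengths; comparing across CPUs additionally depends on each machine's clock frequency and microarchitecture.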
(*) personal computers, not benchmarked in a reproducible environment