Wuffs’ PNG image decoder

Last modified on April 09, 2021

Summary: Wuffs’ PNG image decoder is memory-safe but can also clock between
1.22x and 2.75x faster than libpng, the widely used open source C
implementation. It’s also faster than the libspng, lodepng and stb_image
C libraries, as well as the most popular Go and Rust PNG libraries. High
performance is achieved by SIMD-acceleration, 8-byte wide input and copies
when bit-twiddling and by zlib-decompressing the entire image all-at-once
(into one large intermediate buffer) instead of one row at a time (into
smaller, re-usable buffers). All-at-once requires more intermediate memory but
allows substantially more of the image to be decoded in the
zlib-decompressor’s fastest code paths.

Introduction

Portable Network Graphics is a ubiquitous, lossless image file format, based
on the zlib compression format. It was invented in the early 1990s, when
16-bit computers and 64 KiB memory limits were still an active concern. Newer
image formats (like WebP) and newer compression formats (like Zstandard) can
produce smaller files at comparable decode speeds, but there’s still a lot of
inertia in the zillions of existing PNG images. By one metric, PNG is still
the most frequently used image format on the web. Mozilla telemetry
IMAGE_DECODE_SPEED_XXX sample counts from 2021-04-03 (Firefox Desktop nightly
89) place PNG second, after JPEG:

  • JPEG: 63.15M
  • PNG: 49.03M
  • WEBP: 19.23M
  • GIF: 3.79M

libpng is a widely used open source implementation of the PNG image format,
building on zlib (the library), a widely used open source implementation of
zlib (the format).

Wuffs is a 21st century programming language
with a standard library written in that language. On a mid-range x86_64
laptop, albeit on an admittedly small sample set, Wuffs can decode PNG images
between 1.50x and 2.75x faster than libpng (which we define as the 1.00x
baseline speed):

libpng_decode_19k_8bpp                            58.0MB/s ± 0%  1.00x
libpng_decode_40k_24bpp                           73.1MB/s ± 0%  1.00x
libpng_decode_77k_8bpp                             177MB/s ± 0%  1.00x
libpng_decode_552k_32bpp_ignore_checksum           146MB/s ± 0%  (†)
libpng_decode_552k_32bpp_verify_checksum           146MB/s ± 0%  1.00x
libpng_decode_4002k_24bpp                          104MB/s ± 0%  1.00x

libpng                                                  1.00x to 1.00x

----

wuffs_decode_19k_8bpp/clang9                       131MB/s ± 0%  2.26x
wuffs_decode_40k_24bpp/clang9                      153MB/s ± 0%  2.09x
wuffs_decode_77k_8bpp/clang9                       472MB/s ± 0%  2.67x
wuffs_decode_552k_32bpp_ignore_checksum/clang9     370MB/s ± 0%  2.53x
wuffs_decode_552k_32bpp_verify_checksum/clang9     357MB/s ± 0%  2.45x
wuffs_decode_4002k_24bpp/clang9                    156MB/s ± 0%  1.50x

wuffs_decode_19k_8bpp/gcc10                        136MB/s ± 1%  2.34x
wuffs_decode_40k_24bpp/gcc10                       162MB/s ± 0%  2.22x
wuffs_decode_77k_8bpp/gcc10                        486MB/s ± 0%  2.75x
wuffs_decode_552k_32bpp_ignore_checksum/gcc10      388MB/s ± 0%  2.66x
wuffs_decode_552k_32bpp_verify_checksum/gcc10      373MB/s ± 0%  2.55x
wuffs_decode_4002k_24bpp/gcc10                     164MB/s ± 0%  1.58x

wuffs                                                   1.50x to 2.75x

(†): libpng’s “simplified API” doesn’t provide a way to ignore the checksum.
We copy the verify_checksum numbers for a 1.00x baseline.

As an example, the 77k_8bpp source image is 160 pixels wide, 120 pixels high
and its color model is 8 bits (1 byte; a palette index) per pixel. Decoding
that to 32bpp BGRA produces 160 × 120 × 4 = 76800 bytes, abbreviated as 77k.
The test images:

  • 19k_8bpp
    dst: Indexed, src: Indexed
  • 40k_24bpp
    dst: BGRA, src: RGB
  • 77k_8bpp
    dst: BGRA, src: Indexed
  • 552k_32bpp
    dst: BGRA, src: RGBA
  • 4002k_24bpp
    dst: BGRA, src: RGB

Producing 4002k bytes at 104MB/s or 164MB/s means that it takes about
38ms or 24ms for libpng or Wuffs respectively to decode that 1165 × 859 image.

Other PNG implementations (libspng, lodepng, stb_image, Go’s image/png and
Rust’s png) are measured in the Appendix (Benchmark Numbers).

Wuffs Code

The command line examples further below refer to a wuffs directory. Get it by
cloning this repository:

$ git clone https://github.com/google/wuffs.git

Some of the command line output, here and below, has been omitted or
otherwise edited for brevity.

PNG File Format

The PNG image format builds on:

  1. Two checksum algorithms, CRC-32 and Adler-32. Both produce 32-bit hashes
    but they are different algorithms.
  2. The DEFLATE compression format.
  3. 2-dimensional filtering. For a row of pixels, it’s often better (smaller
    output) to compress the residuals (the difference between pixel values and
    a weighted sum of their neighbors above and left) than the raw values.

Each of these steps can be optimized.

Checksums

CRC-32

“Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction”
by Gopal, Ozturk, Guilford, Wolrich, Feghali, Dixon and Karakoyunlu is a 2009
white paper on implementing CRC-32 using x86_64 SIMD instructions. The actual
code looks like this. The ARM SIMD code is even simpler, as there are
dedicated CRC-32 related intrinsics.

As for performance, Wuffs’ example/crc32 program is roughly equivalent to
Debian’s /bin/crc32, other than being 7.3x faster (0.056s vs 0.410s) on this
178 MiB file.

$ ls -lh linux-5.11.3.tar.gz | awk '{print $5 " " $9}'
178M linux-5.11.3.tar.gz
$ g++ -O3 wuffs/example/crc32/crc32.cc -o wcrc32
$ time ./wcrc32   /dev/stdin < linux-5.11.3.tar.gz
real    0m0.056s
$ time /bin/crc32 /dev/stdin < linux-5.11.3.tar.gz
real    0m0.410s

SMHasher

SMHasher is a test and benchmark suite
for a variety of hash function implementations. It can provide data for claims
like “our new Foo hash function is faster than the widely used Bar, Baz and
Qux hash functions”. However, when comparing Foo to CRC-32, be aware that a
SIMD-accelerated CRC-32 implementation can be 47x faster than SMHasher’s
simple CRC-32 implementation.

Adler-32

There isn’t a white paper about it, but the Adler-32 checksum can also be
SIMD-accelerated. Here’s the ARM code and the x86_64 code.

Wuffs’ Adler-32 implementation is around 6.4x faster (11.3GB/s vs 1.76GB/s)
than the one from zlib-the-library (called the ‘mimic library’ here), as
summarized by the
benchstat program:

$ cd wuffs
$ # ¿ is just an unusual character that's easy to search for. By
$ # convention, in Wuffs' source, it marks build-related information.
$ grep ¿ test/c/std/adler32.c
// ¿ wuffs mimic cflags: -DWUFFS_MIMIC -lz
$ gcc -O3 test/c/std/adler32.c -DWUFFS_MIMIC -lz
$ # Run the benchmarks.
$ ./a.out -bench | benchstat /dev/stdin
name                      speed
wuffs_adler32_10k/gcc10   11.3GB/s ± 0%
wuffs_adler32_100k/gcc10  11.6GB/s ± 0%
mimic_adler32_10k/gcc10   1.76GB/s ± 0%
mimic_adler32_100k/gcc10  1.72GB/s ± 0%

Ignoring Checksums

Taken to an extreme, the fastest checksum implementation is simply not doing
the checksum calculations at all (and skipping over the 4-byte expected
checksum values in the PNG file).

The ignore_checksum versus verify_checksum benchmark numbers at the top of
this post suggest a 1.04x performance difference. For Wuffs, this is a
one-line change. Even if you don’t use Wuffs’ decoder, turning off PNG
checksum verification could still speed up your decodes, possibly by more
than 1.04x if your PNG decoder doesn’t use a SIMD-accelerated checksum
implementation.

If doing so, be aware that turning off checksum verification is a trade-off:
you are less able to detect data corruption and you deviate from a strict
reading of the relevant file format specifications.

DEFLATE Compression

The vast majority of DEFLATE compressed data consists of a sequence of codes,
either literal codes or copy codes. There are 256 possible literal codes, one
for each possible decompressed byte. Each copy code consists of a length (how
many bytes to copy, between 3 and 258 inclusive) and a distance (how far back
in the ‘history’ or previously-decompressed output to copy from, between 1 and
32768 inclusive).

For example, “banana” could be compressed as this sequence:

  • Literal ‘b’.
  • Literal ‘a’.
  • Literal ‘n’.
  • Copy 3 bytes starting from 2 bytes ago: “ana”. Yes, the final ‘a’ of the
    copy input is the first ‘a’ of the copy output and wasn’t known until the
    copy started.

Codes are Huffman encoded, which means that they occupy a variable (but
integral) number of bits (between 1 and 48 inclusive) and don’t necessarily
start or end on byte boundaries.

Literal codes emit a single byte. Copy codes emit up to 258 bytes. The maximum
number of output bytes from any one code is therefore 258. We’ll re-visit this
number later.

Wuffs version 0.2 had a similar implementation to zlib-the-library’s, and
performed similarly, at least on x86_64. Wuffs version 0.3 adds two
significant optimizations for modern CPUs (with 64-bit unaligned loads and
stores): 8-byte-chunk input and 8-byte-chunk output.

8-Byte-Chunk Input

As noted above, DEFLATE codes occupy between 1 and 48 bits.
zlib-the-library’s “decode 1 DEFLATE code” implementation reads input bits at
multiple places in the loop. There are 7 instances of hold += (unsigned
long)(*in++) in
inffast.c,
loading the input bits 1 byte (8 bits) at a time.

We can instead issue a single 64-bit load once per loop. Some of those loaded
bits will be dropped on the floor if there already are unprocessed bits in the
bit buffer, but that’s OK. Consuming those bits will shift in zeroes, bit-wise
OR with zeroes is a no-op and bit-wise OR with input bits is idempotent.
Fabian “ryg” Giesen’s 2018 blog post discusses reading bits in much more
detail.

For Wuffs, reading 64 bits once per inner loop sped up its DEFLATE
micro-benchmarks by up to 1.30x.

8-Byte-Chunk Output

Consider a DEFLATE code sequence for compressing TO BE OR NOT TO BE. THAT IS
ETC. The second TO BE could be represented by a copy code of length 5 and
distance 13. A simple implementation of a 5 byte copy is a loop. If your CPU
allows unaligned loads and stores, a 5 instruction sequence (4-byte load;
4-byte store; 1-byte load; 1-byte store; out_ptr += 5) might or might not be
faster, but is still simple (given a sufficiently large distance). Even better
(in that it’s fewer instructions) is to copy too much (8-byte load; 8-byte
store; out_ptr += 5).

TO_BE_OR_NOT_??????????????????????
^            ^
out_ptr-13   out_ptr


             [01234567) copy 8 bytes
             v       v
TO_BE_OR_NOT_TO_BE_OR??????????????
             ^    ^
                  out_ptr += 5


                  [) write 1 byte
                  vv
TO_BE_OR_NOT_TO_BE.OR??????????????
                  ^^
                   out_ptr += 1

The output of subsequent codes (e.g. a literal '.' byte) will overwrite and
fix the excess. Or, if there are no subsequent codes, have the decompression
API post-condition be that any bytes in the output buffer may be modified,
even past the “number of decompressed bytes” returned.

Note that zlib-the-library’s API doesn’t allow this optimization
unconditionally. Its inflateBack function uses a single buffer for both
history and output
,
so that 8-byte overwrites could incorrectly modify the history (what the
library calls the sliding window) and hence corrupt future output.

For Wuffs, rounding up the copy length to a multiple of 8 sped up its DEFLATE
micro-benchmarks by up to
1.48x
.

gzip

The gzip file format is, roughly speaking, DEFLATE compression combined with
a CRC-32 checksum. Like example/crc32, Wuffs’
example/zcat program
is roughly equivalent to Debian’s /bin/zcat, other than being 3.1x faster
(2.680s vs 8.389s) on the same 178 MiB file and also running in a self-imposed
SECCOMP_MODE_STRICT
sandbox
.

$ gcc -O3 wuffs/example/zcat/zcat.c -o wzcat
$ time ./wzcat    < linux-5.11.3.tar.gz > /dev/null
real    0m2.680s
$ time /bin/zcat  < linux-5.11.3.tar.gz > /dev/null
real    0m8.389s

As a consistency check, the checksum of both programs’ output should be the
same (and that 0x750d1011 checksum value should be in the final bytes of the
.gz file). Note that we are now checksumming the decompressed contents. The
earlier example/crc32 output checksummed the compressed file.

$ ./wzcat < linux-5.11.3.tar.gz | ./wcrc32 /dev/stdin
750d1011
$ /bin/zcat < linux-5.11.3.tar.gz | ./wcrc32 /dev/stdin
750d1011

Running Off a Cliff

Racing from point A to point B on a flat track is simple: run as fast as you
can. Now suppose that point B is on the edge of a cliff so that overstepping is
fatal (if not from the fall, then from the sharks). Racing now involves an
initial section (let’s color it blue) where you run as fast as you can and a
final section (let’s color it red) where you go slower but with more control.

cliff illustration #0

Decompressing DEFLATE involves writing to a destination buffer and writing past
the buffer bounds (the classic ‘buffer overflow’ security flaw) is analogous to
running off a cliff. To avoid this, zlib-the-library has two decompression
implementations: a fast ‘blue’ one (when 258 or more bytes away from the
buffer end
,
amongst some other conditions) and a slow ‘red’ one
(otherwise).

Separately, libpng allocates two buffers (for the current and previous row of
pixels) and calls into zlib-the-library H times, where H is the image’s
height in pixels. Each time, the destination buffer is exactly the size of one
row (the width in pixels times the bytes per pixel, plus a filter configuration
byte, roughly speaking) without any slack, which means that zlib-the-library
spends the last 258 or more bytes of each row in the slow ‘red’ zone. For
example, this can be about one quarter of the pixels of a 300 × 200 RGB (3
bytes per pixel) image, and a higher proportion in terms of CPU time.

cliff illustration #1

Wuffs’ zlib-the-format decompressor also uses this blue/red dual
implementation technique, but Wuffs’ PNG decoder decompresses into a single
buffer all-at-once instead of one-row-at-a-time. Almost all (e.g. more than 99%
of the pixels of that 300 × 200 RGB image) of the zlib-the-format
decompression is now in the ‘blue’ zone. This is faster than the ‘red’ zone by
itself but it also avoids any instruction cache or branch prediction slow-downs
when alternating between blue code and red code.

Memory Cost

All-at-once obviously requires O(width × height) intermediate memory (what
Wuffs calls a “work buffer”) instead of O(width) memory, but if you’re
decoding the whole image into RAM anyway, that already requires O(width ×
height)
memory.

Also, Wuffs’ image decoding API does give the caller some choice on memory use.
Wuffs doesn’t say, “I need M bytes of memory to decode this image”, it’s “I
need between M0 and M1 (inclusive). The more you give me, the faster I’ll
be”.

Wuffs’ PNG decoder currently sets M0 equal to M1 (there’s no choice;
all-at-once is mandatory) but a future version could give a one-row-at-a-time
option by offering a lower M0. The extra O(width × height) memory cost
could be avoided (at a performance cost) for those callers that care.

PNG Filtering

Both Wuffs-the-library and libpng (but not all of the other PNG decoders
measured here) have SIMD implementations of PNG’s 2-dimensional filters. For
example, here’s Wuffs’ x86
filters.

libpng can actually be a little faster at this step, since it can ensure that
any self-allocated pixel-row buffers are aligned to the SIMD-friendliest
boundaries. Alignment can impact SIMD instruction selection and performance.
ARM and x86_64 are generally more and less fussy about this respectively.

Wuffs-the-library makes fewer promises about buffer alignment, partly because
Wuffs-the-language doesn’t have the capability to allocate
memory
,
but mainly because zlib-decompressing all-at-once requires giving up being
able to e.g. 4-byte-align the start of each row. This is especially true, even
if RGBA pixels at 8 bits per channel are 4 bytes per pixel, because the PNG
file format prepends one byte (for filter configuration) to each row. The
zlib-decompression layer sees an odd number of bytes per row.

Nonetheless, profiling suggests that more time is spent in zlib-decompression
than in PNG filtering, so that the benefits of all-at-once zlib-decompression
outweigh the costs of unaligned PNG filtering. Wuffs’ Raspberry Pi 4 (32-bit
armv7l) compared-to-libpng benchmark ratios aren’t quite as impressive as its
x86_64 ratios (see Hardware below), but Wuffs still comes out
ahead.

Tangentially, that one filter-configuration byte per row, interleaved between
the rows of filtered pixel data
, also makes it impossible to zlib-decompress
all-at-once directly into the destination pixel buffer. Instead, we have to
decompress to an intermediate work buffer (which has a memory cost) and then
memcpy (and filter) 99% of that to the destination buffer. In hindsight, a
different file format design wouldn’t need a separate work buffer, but it’s far
too late to change PNG now.

Upstream Patches

The optimization techniques described above were applied to new code:
Wuffs-the-library written in Wuffs-the-language. They could also apply to
existing code, but there are reasons to prefer new code.

Patching libpng

libpng is written in C, whose lack of memory safety is well documented.
Furthermore, its error-handling API is built around setjmp and longjmp.
Non-local gotos make static or formal analysis more complicated.

Despite the file format being largely unchanged since 1999 (version 1.2 was
formalized in 2003; APNG is an
unofficial extension), the libpng C implementation has collected 74 CVE
records from 2002 through to
2021
, 9 of those since
2018.

Its source code has a one-line comment that literally says “TODO: WARNING:
TRUNCATION ERROR: DANGER WILL
ROBINSON”

but doesn’t say anything else. The comment was added in
2013

and is still there in 2021, but the code itself is older.

libpng is also just complicated. As a very rough metric, running wc -l *.c
arm/*.c intel/*.c
in libpng’s repository counts 35182 lines of code (excluding
*.h header files). Running wc -l std/png/*.wuffs in Wuffs’ repository
counts 2110 lines. The former library admittedly implements an encoder, not
just a decoder, but even after halving the first number, it’s still an 8x
ratio.

Patching zlib

I tried patching zlib-the-library a few years
ago
but it’s trickier than I first
thought, because of the inflateBack API issue mentioned above.

In any case, other people have already done this. Both
zlib-ng/zlib-ng and
cloudflare/zlib are zlib-the-library
forks with performance patches. Those patches (as well as those in Chromium’s
copy of
zlib-the-library
)
include optimization ideas similar to those presented here, as well as other
techniques for the encoder side.

Building zlib-ng from source is straightforward:

$ git clone https://github.com/zlib-ng/zlib-ng.git
$ mkdir zlib-ng/build
$ cd zlib-ng/build
$ cmake -DCMAKE_BUILD_TYPE=Release -DZLIB_COMPAT=On ..
$ make

With the test/c/std/png.c program (see Reproduction below),
running LD_LIBRARY_PATH=/the/path/to/zlib-ng/build ./a.out -bench shows that
libpng with zlib-ng (the second set of numbers below) is a little faster
but not a lot faster than with vanilla zlib (the first set of numbers below).

libpng_decode_19k_8bpp                            58.0MB/s ± 0%  1.00x
libpng_decode_40k_24bpp                           73.1MB/s ± 0%  1.00x
libpng_decode_77k_8bpp                             177MB/s ± 0%  1.00x
libpng_decode_552k_32bpp_ignore_checksum           146MB/s ± 0%  (†)
libpng_decode_552k_32bpp_verify_checksum           146MB/s ± 0%  1.00x
libpng_decode_4002k_24bpp                          104MB/s ± 0%  1.00x

libpng                                                  1.00x to 1.00x

----

zlibng_decode_19k_8bpp/gcc10                      63.8MB/s ± 0%  1.10x
zlibng_decode_40k_24bpp/gcc10                     74.1MB/s ± 0%  1.01x
zlibng_decode_77k_8bpp/gcc10                       189MB/s ± 0%  1.07x
zlibng_decode_552k_32bpp_ignore_checksum/gcc10                 skipped
zlibng_decode_552k_32bpp_verify_checksum/gcc10     177MB/s ± 0%  1.21x
zlibng_decode_4002k_24bpp/gcc10                    113MB/s ± 0%  1.09x

zlibng                                                  1.01x to 1.21x

cloudflare/zlib was forked from zlib-the-library version 1.2.8. Pointing
LD_LIBRARY_PATH to its libz.so.1 makes ./a.out fail with version
'ZLIB_1.2.9' not found (required by /lib/x86_64-linux-gnu/libpng16.so.16)
.

Patching Go or Rust

Both Go and Rust are successful, modern and memory-safe programming languages
with significant adoption. However, for existing C/C++ projects, it is easier
to incorporate Wuffs-the-library, which is transpiled to C (and its C form is
checked into the repository). It’d be like using any other third-party C/C++
library, it’s just not hand-written C/C++. In comparison, integrating Go or
Rust code into a C/C++ project involves, at a minimum, setting up additional
compilers and other build tools.

Still, there may very well be some worthwhile follow-up performance work for Go
or Rust’s PNG implementations, based on techniques discussed in this post. For
example, neither Go’s nor Rust’s Adler-32 implementation is SIMD-accelerated. It
may also be worth trying the 8-Byte-Chunk Input and 8-Byte-Chunk Output
techniques. Go’s DEFLATE implementation reads only one byte at a
time
.
Rust’s miniz_oxide reads four bytes at a
time

and four is bigger than one, but eight is even bigger still. As far as I can
tell, neither Go’s nor Rust’s PNG decoder zlib-decompresses all-at-once.

Memory Safety

Also, unlike Go or Rust, Wuffs’ memory safety is enforced at compile time,
not by inserting runtime checks that e.g. the i in a[i] is within bounds or
that (x + y) doesn’t overflow a u32. Go and Rust compilers can elide some of
these checks, especially when iterating with a uniform access pattern, but
e.g. decoding DEFLATE codes consumes a variable number of bytes per
iteration.

Runtime safety checks can impact performance. I like Zig’s “Performance and
Safety: Choose Two” motto but, unlike Zig, Wuffs doesn’t have separate
“Release Fast” and “Release Safe” build modes. There’s just one Wuffs
“Release” configuration (pass -O3 to your C compiler) and it’s both fast and
safe at the same time.

Even so, when handling untrusted (third party) PNG images, sandboxing and a
multi-process architecture can provide additional defence in depth. Wuffs’
example/convert-to-nia program converts from image formats like PNG to an
easily-parsed Naïve Image Format and, on Linux, runs in a self-imposed
SECCOMP_MODE_STRICT sandbox.

Conclusion

Wuffs version
0.3.0-beta.1
has just been cut and it contains the fastest, safest PNG decoder in the
world. See the Wuffs example
programs
for how to use it. Its PNG decoder doesn’t support color spaces or gamma
correction yet (star Wuffs issue 39 if you care), but some of you might still
find it useful at this early stage.


Appendix (Benchmark Numbers)

libpng means the /usr/lib/x86_64-linux-gnu/libpng16.so build that comes with
my Debian Bullseye system.

libpng_decode_19k_8bpp                            58.0MB/s ± 0%  1.00x
libpng_decode_40k_24bpp                           73.1MB/s ± 0%  1.00x
libpng_decode_77k_8bpp                             177MB/s ± 0%  1.00x
libpng_decode_552k_32bpp_ignore_checksum           146MB/s ± 0%  (†)
libpng_decode_552k_32bpp_verify_checksum           146MB/s ± 0%  1.00x
libpng_decode_4002k_24bpp                          104MB/s ± 0%  1.00x

libpng                                                  1.00x to 1.00x

----

wuffs_decode_19k_8bpp/clang9                       131MB/s ± 0%  2.26x
wuffs_decode_40k_24bpp/clang9                      153MB/s ± 0%  2.09x
wuffs_decode_77k_8bpp/clang9                       472MB/s ± 0%  2.67x
wuffs_decode_552k_32bpp_ignore_checksum/clang9     370MB/s ± 0%  2.53x
wuffs_decode_552k_32bpp_verify_checksum/clang9     357MB/s ± 0%  2.45x
wuffs_decode_4002k_24bpp/clang9                    156MB/s ± 0%  1.50x

wuffs_decode_19k_8bpp/gcc10                        136MB/s ± 1%  2.34x
wuffs_decode_40k_24bpp/gcc10                       162MB/s ± 0%  2.22x
wuffs_decode_77k_8bpp/gcc10                        486MB/s ± 0%  2.75x
wuffs_decode_552k_32bpp_ignore_checksum/gcc10      388MB/s ± 0%  2.66x
wuffs_decode_552k_32bpp_verify_checksum/gcc10      373MB/s ± 0%  2.55x
wuffs_decode_4002k_24bpp/gcc10                     164MB/s ± 0%  1.58x

wuffs                                                   1.50x to 2.75x

----

libspng_decode_19k_8bpp/clang9                    59.3MB/s ± 0%  1.02x
libspng_decode_40k_24bpp/clang9                   78.4MB/s ± 0%  1.07x
libspng_decode_77k_8bpp/clang9                     189MB/s ± 0%  1.07x
libspng_decode_552k_32bpp_ignore_checksum/clang9   236MB/s ± 0%  1.62x
libspng_decode_552k_32bpp_verify_checksum/clang9   203MB/s ± 0%  1.39x
libspng_decode_4002k_24bpp/clang9                  110MB/s ± 0%  1.06x

libspng_decode_19k_8bpp/gcc10                     59.6MB/s ± 0%  1.03x
libspng_decode_40k_24bpp/gcc10                    77.5MB/s ± 0%  1.06x
libspng_decode_77k_8bpp/gcc10                      189MB/s ± 0%  1.07x
libspng_decode_552k_32bpp_ignore_checksum/gcc10    223MB/s ± 0%  1.53x
libspng_decode_552k_32bpp_verify_checksum/gcc10    194MB/s ± 0%  1.33x
libspng_decode_4002k_24bpp/gcc10                   109MB/s ± 0%  1.05x

libspng                                                 1.02x to 1.62x

----

lodepng_decode_19k_8bpp/clang9                    65.1MB/s ± 0%  1.12x
lodepng_decode_40k_24bpp/clang9                   72.1MB/s ± 0%  0.99x
lodepng_decode_77k_8bpp/clang9                     222MB/s ± 0%  1.25x
lodepng_decode_552k_32bpp_ignore_checksum/clang9               skipped
lodepng_decode_552k_32bpp_verify_checksum/clang9   162MB/s ± 0%  1.11x
lodepng_decode_4002k_24bpp/clang9                 70.5MB/s ± 0%  0.68x

lodepng_decode_19k_8bpp/gcc10                     61.1MB/s ± 0%  1.05x
lodepng_decode_40k_24bpp/gcc10                    62.5MB/s ± 1%  0.85x
lodepng_decode_77k_8bpp/gcc10                      176MB/s ± 0%  0.99x
lodepng_decode_552k_32bpp_ignore_checksum/gcc10                skipped
lodepng_decode_552k_32bpp_verify_checksum/gcc10    139MB/s ± 0%  0.95x
lodepng_decode_4002k_24bpp/gcc10      
