Horrifying PDF Experiments

Last modified on January 03, 2021

New: For more details about this hack and how it actually works, check out
my talk at
!!con 2020, "Playing Breakout... inside a
PDF!!"

If you're not viewing it right now, try the
breakout.pdf file
in Chrome.

Like many of you, I always thought of PDF as basically a benign
format, where the author lays out some text and graphics, and then the
PDF sits in front of the reader and doesn't do anything. I heard
offhand about vulnerabilities in Adobe Reader years ago, but didn't
think too much about why or how they might exist.

That was why Adobe made PDF in the first place[^ps], but I think we've
established that it's not quite true anymore. The
1,310-page PDF specification (actually a really clear and
interesting read) specifies a bizarre amount of functionality,
including:

  • Embedded Flash
  • Audio and video annotations
  • 3D object annotations (!)
  • Web capture metadata
  • Custom math functions (including a Turing-incomplete subset of
    PostScript)
  • Rich text forms using a subset of XHTML and CSS
  • File and file-sequence attachments

but most interestingly...

  • JavaScript scripting, using a
    completely different standard library from the browser one

Granted, most PDF readers (besides Adobe Reader) don't implement most
of this stuff. But Chrome does implement JavaScript! If you open a
PDF file like this one in Chrome, it will run the scripts. I found
this fact out after following
this blog post about how to make PDFs with JS.
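To make "a PDF that runs a script" concrete, here is a minimal sketch in Python that writes such a file by hand. It is not the generator used for breakout.pdf. The helper names (`pdf_string`, `build_pdf`) and the object layout are assumptions made for illustration, and it relies on the reader honoring a document-level `/OpenAction` of type `/JavaScript` and supporting `app.alert`.

```python
def pdf_string(text):
    """Escape text for use as a PDF literal string: (...)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("latin-1") + b")"


def build_pdf(objects):
    """Serialize objects numbered 1..n, then the xref table and trailer.

    Object 1 is assumed to be the document catalog.
    """
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


# Illustrative script; runs when the document is opened.
script = "app.alert('Hello from PDF JavaScript');"
objects = [
    b"<< /Type /Catalog /Pages 2 0 R /OpenAction 4 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    b"<< /Type /Action /S /JavaScript /JS " + pdf_string(script) + b" >>",
]
pdf_bytes = build_pdf(objects)
```

Writing `pdf_bytes` to a file opened in binary mode and opening that file in Chrome should, under those assumptions, show an alert box.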

There's a catch, though. Chrome only implements a tiny subset of the
enormous Acrobat JavaScript API surface. The API implementation in
Chrome's PDFium reader mostly consists of
stubs like these:

FX_BOOL Document::addAnnot(IJS_Context* cc,
                           const CJS_Parameters& params,
                           CJS_Value& vRet,
                           CFX_WideString& sError) {
  // Not supported.
  return TRUE;
}
FX_BOOL Document::addField(IJS_Context* cc,
                           const CJS_Parameters& params,
                           CJS_Value& vRet,
                           CFX_WideString& sError) {
  // Not supported.
  return TRUE;
}
FX_BOOL Document::exportAsText(IJS_Context* cc,
                               const CJS_Parameters& params,
                               CJS_Value& vRet,
                               CFX_WideString& sError) {
  // Unsafe, not supported.
  return TRUE;
}

And I understand their problem: that custom Adobe JavaScript API has
an absolutely gigantic surface area. Scripts can supposedly do
things like make arbitrary database connections,
detect attached monitors, import external resources, and
manipulate 3D objects.

So we have this strange situation in Chrome: we can do arbitrary
computation, but we have this weird, constrained API surface, where
it's annoying to do I/O and get data between the program and the
user.[^situation][^es6]

It might be possible to embed a C compiler into a PDF by compiling it
to JS with Emscripten, for example, but then your C compiler has to
take input through a plain-text form field and spit its output back
through a form field.

[^ps]: In fact, I got interested in PDF a couple weeks ago because of
PostScript; I'd been reading these random Don Hopkins posts about
NeWS, the system supposedly like
AJAX but done in the 80s on PostScript.

Ironically, PDF was a
[reaction](https://en.wikipedia.org/wiki/Portable_Document_Format#PostScript)
to PostScript, which was too expressive (being a full
programming language) and too hard to analyze and reason
about. PDF remains a big improvement there, I think, but
it's still funny how it's grown all these features.

It's also really interesting: like any long-lived digital format
(I have a thing for the FAT filesystem, personally), PDF is itself
a kind of historical document. You can see generations of
engineers, adding things that they needed in their time, while
trying not to break anything already out there.

[^situation]: I'm not sure why Chrome even bothered to expose the JS
runtime. They
took the PDF reader code from Foxit,
so maybe Foxit had some particular client who relied on JavaScript
form validation?

[^es6]: Chrome also uses the same runtime it does in the browser, even
though it doesn't expose any browser APIs. That means you can use
ES6 features like double-arrow functions and Proxies, as far as I
can tell.

Breakout

So what can we do with the API surface that Chrome gives us?

I'm sorry, by the way, that the collision detection is not great and
the game speed is inconsistent. (Not really the point, though!) I
ripped off most of the game from
a tutorial.

The first user-visible I/O points I could find in Chrome's
implementation of the PDF API were in
Field.cpp.

You can't set the fill color of a text field at
runtime, but you can change its bounds rectangle and
set its border style. You can't
read the precise mouse position, but you can set mouse-enter
and mouse-leave scripts on fields at PDF creation. And you can't even add
fields at runtime: you're stuck with what you put in the PDF at
creation time.[^fortran] I'm actually curious why they chose those
particular methods.

So the PDF file is generated by a
script
which emits a bunch of text fields upfront, including game elements:

  • Paddle
  • Bricks
  • Ball
  • Score
  • Lives

But we also do some hacks here to get the game to work properly.

First, we emit a thin, long 'band' text field for each column of the
lower half of the screen. Some band gets a mouse-enter event whenever
you move your mouse along the x-axis, so the breakout paddle can move
as you move your mouse.
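As a rough sketch of the band idea, a generator could emit one widget dictionary per column, each carrying a mouse-enter (`/E`) entry in its additional-actions (`/AA`) dictionary. The page size, band width, field names, and the `movePaddle` script function below are all hypothetical, not taken from the actual script, and each dictionary would still need to be written as an object and listed in the page's `/Annots` and the form's `/Fields`.

```python
PAGE_WIDTH = 612   # assumed page size in PDF points (US Letter)
PAGE_HEIGHT = 792
BAND_WIDTH = 12    # assumed width of each band, in points


def band_field(index, band_width=BAND_WIDTH, page_height=PAGE_HEIGHT):
    """Return dictionary text for one band covering a column of the lower half.

    PDF coordinates start at the bottom-left corner, so the lower half of
    the page spans y = 0 .. page_height / 2.
    """
    x0 = index * band_width
    x1 = x0 + band_width
    center = x0 + band_width / 2
    # Hypothetical script: movePaddle() would be defined in the document's JS.
    # Balanced parentheses need no escaping inside a PDF literal string.
    js = "movePaddle(%g);" % center
    return (
        "<< /Type /Annot /Subtype /Widget /FT /Tx "
        "/T (band%d) /Rect [%d 0 %d %d] "
        "/AA << /E << /S /JavaScript /JS (%s) >> >> >>"
        % (index, x0, x1, page_height // 2, js)
    )


# One band per column, left to right across the page.
bands = [band_field(i) for i in range(PAGE_WIDTH // BAND_WIDTH)]
```

Narrower bands give finer paddle positioning at the cost of more fields in the file.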

And second, we emit a field called 'whole' which covers the whole top
half of the screen. Chrome doesn't expect the PDF display to change,
so if you move fields around in JS, you get pretty bad artifacts. This
'whole' field solves that problem when we toggle it on and off during
frame rendering. That trick seems to force Chrome to clean out the
artifacts.

Also, moving a field appears to discard its
appearance stream. The
fancy arbitrary PDF-graphics appearance you chose goes away, and it
gets replaced with a basic filled and bordered rectangle. So my game
objects in general rely on the
simpler appearance characteristics dictionary. At
the very least, a fill color specified there stays intact as a widget
moves.
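For illustration, a widget styled only through the appearance characteristics dictionary (`/MK`) might be emitted like this. It is a sketch, not the actual generator code: the function name, field name, and colors are made up, and it assumes the field name contains no parentheses or backslashes. `/BG` and `/BC` are the background and border color entries, given here as RGB components between 0 and 1.

```python
def game_object_field(name, rect, fill_rgb, border_rgb=(0, 0, 0)):
    """Widget dictionary text styled through /MK, with no appearance stream."""

    def nums(values):
        # Render numbers compactly, e.g. 1 instead of 1.0.
        return " ".join("%g" % v for v in values)

    return (
        "<< /Type /Annot /Subtype /Widget /FT /Tx "
        "/T (%s) /Rect [%s] "
        "/MK << /BG [%s] /BC [%s] >> >>"
        % (name, nums(rect), nums(fill_rgb), nums(border_rgb))
    )


# A hypothetical 10x10 red ball with a black border.
ball = game_object_field("ball", (300, 400, 310, 410), (1, 0, 0))
```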

[^fortran]: It's like some stereotype of programming in old-school
FORTRAN. You have to declare all your variables upfront so the
compiler can statically allocate them.

Useful resources

  • PDF Reference, sixth edition
  • JavaScript for Acrobat API Reference
  • Brendan Zagaeski's
    Minimal PDF and
    Hand-coded PDF tutorial
  • PDF Inside and Out
    has great examples.
  • The pdfrw Python library is at
    exactly the right level of abstraction for this kind of work. A lot
    of libraries are too high-level and expose only graphics
    operators. Generating the PDF data yourself is possible but a bit
    annoying, because you need to get the data structure formats and
    byte offsets right.
