.. _jbig2:

============================
Installing the JBIG2 encoder
============================

Most Linux distributions do not include a JBIG2 encoder since JBIG2
encoding was patented for a long time. All known JBIG2 US patents have
expired as of 2017, but it is possible that unknown patents exist.

JBIG2 encoding is recommended for OCRmyPDF and is used to losslessly
create smaller PDFs. If JBIG2 encoding is not available, lower quality
encodings will be used.

JBIG2 decoding is not patented and is performed automatically by most
PDF viewers. It is widely supported and has been part of the PDF
specification since 2001.

On macOS, Homebrew packages jbig2enc, and OCRmyPDF installed through
Homebrew includes it by default. The Docker image for OCRmyPDF also
builds its own JBIG2 encoder from source.

On Linux distributions that do not package a JBIG2 encoder, you must
build one from source:

.. code-block:: bash

   git clone https://github.com/agl/jbig2enc
   cd jbig2enc
   ./autogen.sh
   ./configure && make
   sudo make install   # omit sudo if the install prefix is user-writable
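
After the build, it can help to confirm that the encoder is reachable.
The following is a minimal sketch, assuming the build installs an
executable named ``jbig2`` somewhere on your ``PATH``; the helper name
``jbig2_status`` is hypothetical and used only for illustration:

.. code-block:: bash

   # Hypothetical helper: report whether a jbig2 executable is on PATH.
   jbig2_status() {
       if command -v jbig2 >/dev/null 2>&1; then
           echo "jbig2 found at: $(command -v jbig2)"
       else
           echo "jbig2 not found on PATH"
       fi
   }

   jbig2_status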

.. _jbig2-lossy:

Lossy mode JBIG2
================

OCRmyPDF provides lossy mode JBIG2 as an advanced feature. Users should
`review the technical concerns with JBIG2 in lossy
mode <https://en.wikipedia.org/wiki/JBIG2#Disadvantages>`__
and decide if this feature is acceptable for their use case.

JBIG2 lossy mode achieves higher compression ratios than any other
monochrome (bitonal) compression technology; for large text documents
the savings are considerable. JBIG2 lossless still gives great
compression ratios and is a major improvement over the older CCITT G4
standard. As the article linked above explains, lossy mode carries some
risk of substitution errors.

To turn on JBIG2 lossy mode, add the argument ``--jbig2-lossy``. One of
``--optimize {1,2,3}`` is also required for the argument to take effect.
In addition, a JBIG2 encoder must be installed as described in the
previous section.
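
As an illustration, an invocation combining both arguments might look
like the following sketch. The file names ``input.pdf`` and
``output.pdf`` are placeholders, and the optimization level ``1`` is an
arbitrary choice among the permitted values:

.. code-block:: bash

   # Placeholder file names; substitute your own input and output paths.
   ocrmypdf --optimize 1 --jbig2-lossy input.pdf output.pdf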

*Due to an oversight, ocrmypdf v7.0 and v7.1 used lossy mode by
default.*