-rw-r--r-- debian/changelog   6
-rw-r--r-- debian/ocrmypdf.1 50
2 files changed, 45 insertions, 11 deletions
diff --git a/debian/changelog b/debian/changelog
index 9d1add11..5406d6c8 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+ocrmypdf (7.4.0-2) experimental; urgency=medium
+
+  * Regenerate manpage.
+
+ -- Sean Whitton <spwhitton@spwhitton.name>  Fri, 04 Jan 2019 17:38:59 +0000
+
 ocrmypdf (7.4.0-1) experimental; urgency=medium
 
   * New upstream release.
diff --git a/debian/ocrmypdf.1 b/debian/ocrmypdf.1
index d9b3fd80..0468279f 100644
--- a/debian/ocrmypdf.1
+++ b/debian/ocrmypdf.1
@@ -1,5 +1,5 @@
-.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.6.
-.TH OCRMYPDF "1" "August 2018" "ocrmypdf 7.0.2" "User Commands"
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH OCRMYPDF "1" "January 2019" "ocrmypdf 7.4.0" "User Commands"
.SH NAME
ocrmypdf \- add an OCR text layer to PDF files
.SH DESCRIPTION
@@ -9,11 +9,12 @@ usage: ocrmypdf [\-h] [\-l LANGUAGE] [\-\-image\-dpi DPI]
[\-\-sidecar [FILE]] [\-\-version] [\-j N] [\-q] [\-v [VERBOSE]]
[\-\-title TITLE] [\-\-author AUTHOR] [\-\-subject SUBJECT]
[\-\-keywords KEYWORDS] [\-r] [\-\-remove\-background] [\-d] [\-c]
-[\-i] [\-\-oversample DPI] [\-f] [\-s] [\-\-skip\-big MPixels]
+[\-i] [\-\-oversample DPI] [\-\-remove\-vectors] [\-\-mask\-barcodes]
+[\-\-threshold] [\-f] [\-s] [\-\-redo\-ocr] [\-\-skip\-big MPixels]
[\-O {0,1,2,3}] [\-\-jpeg\-quality Q] [\-\-png\-quality Q]
-[\-\-max\-image\-mpixels MPixels] [\-\-tesseract\-config CFG]
-[\-\-tesseract\-pagesegmode PSM] [\-\-tesseract\-oem MODE]
-[\-\-pdf\-renderer {auto,hocr,sandwich}]
+[\-\-jbig2\-lossy] [\-\-max\-image\-mpixels MPixels]
+[\-\-tesseract\-config CFG] [\-\-tesseract\-pagesegmode PSM]
+[\-\-tesseract\-oem MODE] [\-\-pdf\-renderer {auto,hocr,sandwich}]
[\-\-tesseract\-timeout SECONDS]
[\-\-rotate\-pages\-threshold CONFIDENCE]
[\-\-pdfa\-image\-compression {auto,jpeg,lossless}]
@@ -126,12 +127,27 @@ in the final PDF. Might remove desired content.
\fB\-\-oversample\fR DPI
Oversample images to at least the specified DPI, to
improve OCR results slightly
+.TP
+\fB\-\-remove\-vectors\fR
+EXPERIMENTAL. Mask out any vector objects in the PDF
+so that they will not be included in OCR. This can
+eliminate false characters.
+.TP
+\fB\-\-mask\-barcodes\fR
+EXPERIMENTAL. Mask out any barcodes that appear in the
+PDF so they are not considered during OCR. Barcodes
+can introduce false characters into OCR.
+.TP
+\fB\-\-threshold\fR
+EXPERIMENTAL. Threshold image to 1bpp before sending
+it to Tesseract for OCR. Can improve OCR quality
+compared to Tesseract's thresholder.
.SS "OCR options:"
.IP
Control how OCR is applied
.TP
\fB\-f\fR, \fB\-\-force\-ocr\fR
-Rasterize any fonts or vector objects on each page,
+Rasterize any text or vector objects on each page,
apply OCR, and save the rastered output (this rewrites
the PDF)
.TP
@@ -141,6 +157,14 @@ include the page in final output; useful for PDFs that
contain a mix of images, text pages, and/or previously
OCRed pages
.TP
+\fB\-\-redo\-ocr\fR
+Attempt to detect and remove the hidden OCR layer from
+files that were previously OCRed with OCRmyPDF or
+another program. Apply OCR to text found in raster
+images. Existing visible text objects will not be
+changed. If there is no existing OCR, OCR will be
+added.
+.TP
\fB\-\-skip\-big\fR MPixels
Skip OCR on pages larger than the specified amount of
megapixels, but include skipped pages in final output
@@ -150,18 +174,22 @@ Control how the PDF is optimized after OCR
.TP
\fB\-O\fR {0,1,2,3}, \fB\-\-optimize\fR {0,1,2,3}
Control how PDF is optimized after processing:0 \- do
-not optimize;1 \- do safe, lossless optimizations
-(default);2 \- do lossy optimizations; 3 \- do
-aggressive lossy optimizations
+not optimize; 1 \- do safe, lossless optimizations
+(default); 2 \- do some lossy optimizations; 3 \- do
+aggressive lossy optimizations (including lossy JBIG2)
.TP
\fB\-\-jpeg\-quality\fR Q
Adjust JPEG quality level for JPEG optimization. 100
is best quality and largest output size; 1 is lowest
-quality and smallest output0 uses the default.
+quality and smallest output; 0 uses the default.
.TP
\fB\-\-png\-quality\fR Q
Adjust PNG quality level to use when quantizing PNGs.
Values have same meaning as with \fB\-\-jpeg\-quality\fR
+.TP
+\fB\-\-jbig2\-lossy\fR
+Enable JBIG2 lossy mode (better compression, not
+suitable for some use cases \- see documentation).
.SS "Advanced:"
.IP
Advanced options to control Tesseract's OCR behavior
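The regenerated manpage documents `--redo-ocr`, `--jbig2-lossy` and the revised `-O` levels. The sketch below shows how those options might be assembled into a command line. It is not part of the Debian package or of OCRmyPDF. `build_ocrmypdf_argv` and its parameters are hypothetical names. The sketch also assumes that `-f`, `-s` and `--redo-ocr` are alternative OCR strategies, so it accepts at most one of them. It only builds the argument list and does not run anything.

```python
"""Hypothetical helper: assemble an ocrmypdf argument list from options
documented in the 7.4.0 manpage. It does not execute ocrmypdf."""

from typing import List, Optional

# Strategy flags as they appear in the manpage usage line.
_MODE_FLAGS = {"force": "-f", "skip-text": "-s", "redo": "--redo-ocr"}


def build_ocrmypdf_argv(
    input_pdf: str,
    output_pdf: str,
    *,
    mode: Optional[str] = None,  # None, "force", "skip-text" or "redo" (hypothetical names)
    optimize: int = 1,           # -O level 0..3; 1 is the documented default
    jbig2_lossy: bool = False,   # adds --jbig2-lossy
    jpeg_quality: int = 0,       # --jpeg-quality Q; 0 means "use the default"
) -> List[str]:
    # Assumption: the OCR strategies are alternatives, so a single
    # `mode` value selects at most one of them.
    if mode is not None and mode not in _MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    if optimize not in (0, 1, 2, 3):
        raise ValueError("optimize must be 0, 1, 2 or 3")
    if not 0 <= jpeg_quality <= 100:
        raise ValueError("jpeg_quality must be between 0 and 100")

    argv = ["ocrmypdf"]
    if mode is not None:
        argv.append(_MODE_FLAGS[mode])
    argv += ["-O", str(optimize)]
    if jbig2_lossy:
        argv.append("--jbig2-lossy")
    if jpeg_quality:  # 0 leaves the option out so ocrmypdf uses its default
        argv += ["--jpeg-quality", str(jpeg_quality)]
    # Positional arguments come last: input file, then output PDF.
    argv += [input_pdf, output_pdf]
    return argv
```

For example, `build_ocrmypdf_argv("scan.pdf", "out.pdf", mode="redo", optimize=3)` yields `["ocrmypdf", "--redo-ocr", "-O", "3", "scan.pdf", "out.pdf"]`. You could pass that list to `subprocess.run(argv, check=True)` on a system where OCRmyPDF is installed. Because the helper returns a list rather than a shell string, it avoids quoting problems with file names that contain spaces.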