ndholmes/magazine-split: Quick Python script to split and recombine magazine pages from sheet-fed scanning

LICENSE
README.txt
cic-split.sh
ctcb-nocrop-split.sh
ctcb-split.sh
de-split.sh
fcj-split.sh
febt-split.sh
imagesplit-bigprn-150dpi.py
imagesplit-bigprn.py
imagesplit-facingpages.py
imagesplit-norotate.py
imagesplit-rmc.py
imagesplit.py
mil-split.sh
nmra-split.sh
obx-split.sh
plej-split.sh
pne-bigsplit-300dpi.sh
pne-bigsplit.sh
pne-split-1side.sh
pne-split.sh
rgmhs-split.sh
rmc-split.sh
rp-split.sh
rri-split.sh
rrtp-split.sh
x22s-split-2side.sh
x22s-split-8x11-2side.sh
x22s-split.sh


1) Set up a venv with Python 3.11
(not strictly necessary, but it makes some things much happier)

You'll at least need to install the following packages:
ocrmypdf (which needs Ghostscript 9.50 or later, which is an issue on Ubuntu 18.04)
Pillow
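
A quick way to confirm the environment is ready is to check that both packages
import and that Ghostscript is new enough. The following is only a sketch: the
function names are made up for illustration, and it assumes Ghostscript is
installed as "gs" and prints a plain version number for "gs --version".

```python
import importlib.util
import re
import shutil
import subprocess

# Import names for the packages listed above (Pillow is imported as "PIL").
REQUIRED_MODULES = ["ocrmypdf", "PIL"]
# Minimum Ghostscript version stated in this README.
MIN_GHOSTSCRIPT = (9, 50)


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def parse_version(text):
    """Turn a version string such as '9.55.0' into a tuple of ints: (9, 55, 0)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def ghostscript_ok(version_text, minimum=MIN_GHOSTSCRIPT):
    """True if the given Ghostscript version string meets the minimum."""
    return parse_version(version_text) >= minimum


def main():
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing Python module: {name}")
    gs = shutil.which("gs")  # assumption: the binary is called "gs"
    if gs is None:
        print("ghostscript (gs) not found on PATH")
        return
    out = subprocess.run([gs, "--version"], capture_output=True, text=True).stdout.strip()
    print(f"ghostscript {out}: {'ok' if ghostscript_ok(out) else 'too old'}")


if __name__ == "__main__":
    main()
```

Run it from inside the venv so that it checks the same interpreter the split
scripts will use.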


2) Raise ImageMagick's very restrictive resource limits
Edit /etc/ImageMagick-6/policy.xml

There are a bunch of resource limits at the top. Just make them huge, like this:

  <policy domain="resource" name="memory" value="2GiB"/>
  <policy domain="resource" name="map" value="2GiB"/>
  <policy domain="resource" name="width" value="16KP"/>
  <policy domain="resource" name="height" value="16KP"/>
  <policy domain="resource" name="area" value="2GiB"/>
  <policy domain="resource" name="disk" value="2GiB"/>
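
To double-check the edit, the active resource limits can be read back out of
the file. This is a minimal sketch using only the standard library; the
function name resource_limits is hypothetical, and limits that are commented
out in policy.xml are (intentionally) not reported.

```python
import os
import xml.etree.ElementTree as ET

# Path from the step above; adjust if your ImageMagick lives elsewhere.
POLICY_PATH = "/etc/ImageMagick-6/policy.xml"


def resource_limits(policy_xml):
    """Map each active resource policy name to its value, e.g. {'memory': '2GiB'}."""
    root = ET.fromstring(policy_xml)
    return {
        policy.get("name"): policy.get("value")
        for policy in root.iter("policy")
        if policy.get("domain") == "resource"
    }


if __name__ == "__main__" and os.path.exists(POLICY_PATH):
    with open(POLICY_PATH, "rb") as f:
        for name, value in resource_limits(f.read()).items():
            print(f"{name}: {value}")
```

If a limit you just edited does not show up, it is probably still inside an
XML comment.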

You'll also need to tell it that it can work on PDF and PostScript files.  Otherwise it thinks (somewhat correctly) that
Ghostscript is a security risk.  Down towards the bottom, you'll find policy lines about PDF and PS.  Remove them
and add this in their place:
  <policy domain="coder" rights="read|write" pattern="PDF|PS" />
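
It is easy to miss one of the old lines, so here is a rough check for coder
policies that still deny PDF or PS. It is only a sketch: blocked_formats is a
made-up helper, and it does a plain text match on the pattern attribute rather
than reproducing ImageMagick's own pattern matching.

```python
import re
import xml.etree.ElementTree as ET


def blocked_formats(policy_xml, wanted=("PDF", "PS")):
    """Return the wanted formats that some coder policy still sets to rights="none"."""
    wanted_set = set(wanted)
    blocked = []
    root = ET.fromstring(policy_xml)
    for policy in root.iter("policy"):
        if policy.get("domain") != "coder" or policy.get("rights") != "none":
            continue
        # Split patterns such as "PDF" or "{PS,PDF}" into individual format names.
        for fmt in re.findall(r"[A-Za-z0-9]+", policy.get("pattern", "")):
            if fmt in wanted_set and fmt not in blocked:
                blocked.append(fmt)
    return blocked


if __name__ == "__main__":
    # Example input in the shape of the lines this step asks you to remove.
    old_style = (
        '<policymap>'
        '<policy domain="coder" rights="none" pattern="PS" />'
        '<policy domain="coder" rights="none" pattern="PDF" />'
        '</policymap>'
    )
    print(blocked_formats(old_style))
```

An empty list for your edited policy.xml means no remaining line denies PDF or
PS outright.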



3) Install Tesseract OCR (v5)
sudo add-apt-repository ppa:alex-p/tesseract-ocr5
sudo apt update
sudo apt install tesseract-ocr
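
After installing, it is worth confirming that the tesseract on your PATH
really is v5 and not an older distro package. A small sketch, with hypothetical
function names, that assumes "tesseract --version" prints a line beginning
with "tesseract" followed by the version number:

```python
import re
import shutil
import subprocess


def tesseract_major(version_output):
    """Extract the major version from `tesseract --version` output, or None."""
    match = re.search(r"tesseract\s+v?(\d+)", version_output)
    return int(match.group(1)) if match else None


def installed_tesseract_major():
    """Major version of the tesseract found on PATH, or None if it is missing."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    # Some older builds print the version banner on stderr, so look at both streams.
    return tesseract_major(result.stdout + result.stderr)


if __name__ == "__main__":
    print("tesseract major version:", installed_tesseract_major())
```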