To detect headlines (see issue #13), the font size and style (bold/italic) should also be extracted. For pointers, see https://stackoverflow.com/questions/39324626/get-font-size-in-python-with-tesseract-and-pyocr and https://github.com/tesseract-ocr/tesseract/issues/1074, in particular the comment https://github.com/tesseract-ocr/tesseract/issues/1074#issuecomment-965063440.
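
A possible starting point, sketched under assumptions rather than tested against this project: Tesseract's hOCR output can carry a per-word `x_fsize` property when the `hocr_font_info` config variable is enabled, and it may wrap bold or italic words in `<strong>` / `<em>`. Whether the LSTM engine fills these in reliably is what the linked Tesseract issue discusses, so the values should be treated as hints. The helper names below (`parse_title`, `HocrWordParser`, `extract_word_fonts`) are hypothetical; the sketch only parses an hOCR string with the Python standard library and does not call Tesseract itself.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title attribute ("bbox 1 2 3 4; x_fsize 12") into a dict."""
    props = {}
    for part in title.split(";"):
        key, _, value = part.strip().partition(" ")
        if key:
            props[key] = value.strip()
    return props


class HocrWordParser(HTMLParser):
    """Collect text, font size and style hints for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            props = parse_title(attrs.get("title") or "")
            # Assumption: x_fsize is only present if font info was requested.
            size = props.get("x_fsize")
            self._current = {
                "text": "",
                "font_size": float(size) if size is not None else None,
                "bold": False,
                "italic": False,
            }
        elif self._current is not None:
            # Assumption: style is signalled by <strong>/<em> inside the word.
            if tag == "strong":
                self._current["bold"] = True
            elif tag == "em":
                self._current["italic"] = True

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def extract_word_fonts(hocr):
    """Return a list of dicts with text, font_size, bold and italic per word."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words
```

Words whose `font_size` is clearly larger than the page's typical size, or that are marked bold, could then be treated as headline candidates for issue #13. The threshold would have to be chosen empirically.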