Which PDF-to-text converter would you recommend on Arch (OpenRC)? Please give the exact package name and explain how to use it.
The "poppler" package ships an executable called "pdftotext" that works pretty well.
If you're looking for an OCR program, "tesseract" does a good job.
Both are command-line tools, and GUI front ends exist for them.
To learn how to use them, read the man page, tldr, or the Arch Wiki:
$ tldr pdftotext
pdftotext
Convert PDF files to plain text format.
More information: https://www.xpdfreader.com/pdftotext-man.html.
- Convert `filename.pdf` to plain text and print it to `stdout`:
pdftotext filename.pdf -
- Convert `filename.pdf` to plain text and save it as `filename.txt`:
pdftotext filename.pdf
- Convert `filename.pdf` to plain text and preserve the layout:
pdftotext -layout filename.pdf
- Convert `input.pdf` to plain text and save it as `output.txt`:
pdftotext input.pdf output.txt
- Convert pages 2, 3 and 4 of `input.pdf` to plain text and save them as `output.txt`:
pdftotext -f 2 -l 4 input.pdf output.txt
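If you have a whole folder of PDFs, the commands above can be wrapped in a small loop. This is only a sketch: "pdf_dir_to_text" is a made-up helper name (not part of poppler), and it assumes you want each .txt written next to its PDF with the layout preserved.

```shell
# Hypothetical helper: convert every PDF in a directory to plain text.
# Usage: pdf_dir_to_text [directory]   (defaults to the current directory)
pdf_dir_to_text() {
    dir=${1:-.}
    for pdf in "$dir"/*.pdf; do
        # Skip the unexpanded pattern when the directory contains no PDFs.
        [ -e "$pdf" ] || continue
        # -layout keeps columns and tables roughly as they appear on the page.
        pdftotext -layout "$pdf" "${pdf%.pdf}.txt"
    done
}
```

For example, "pdf_dir_to_text ~/Documents" would leave report.txt beside report.pdf. Drop -layout if you would rather have the text in reading order.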
$ tldr tesseract
tesseract
OCR (Optical Character Recognition) engine.
More information: https://github.com/tesseract-ocr/tesseract.
- Recognize text in an image and save it to `output.txt` (the `.txt` extension is added automatically):
tesseract image.png output
- Specify a custom language (default is English) with an ISO 639-2 code (e.g. deu = Deutsch = German):
tesseract -l deu image.png output
- List the ISO 639-2 codes of available languages:
tesseract --list-langs
- Specify a custom page segmentation mode (default is 3):
tesseract --psm 0_to_13 image.png output
- List page segmentation modes and their descriptions:
tesseract --help-psm
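Note that tesseract reads images, not PDFs, so for a scanned PDF (where pdftotext returns nothing useful) the two tools have to be combined. A rough sketch, assuming pdftoppm (it ships in the same poppler package) is used to render the pages; "ocr_pdf" is a made-up helper name and 300 dpi is just a common starting point, not a required value.

```shell
# Hypothetical helper: OCR a scanned PDF into a single text file.
# Usage: ocr_pdf input.pdf [output.txt] [language]
ocr_pdf() {
    pdf=$1
    out=${2:-${pdf%.pdf}.txt}
    lang=${3:-eng}
    tmp=$(mktemp -d) || return 1
    # Render every page to a PNG named page-<number>.png in the temp dir.
    pdftoppm -r 300 -png "$pdf" "$tmp/page" || { rm -rf "$tmp"; return 1; }
    : > "$out"
    for img in "$tmp"/page-*.png; do
        [ -e "$img" ] || continue
        # Using "stdout" as the output base prints the recognized text
        # instead of writing a separate .txt file per page.
        tesseract -l "$lang" "$img" stdout >> "$out"
    done
    rm -rf "$tmp"
}
```

For example, "ocr_pdf scan.pdf scan.txt deu" would OCR a German document. The language data has to be installed for whatever code you pass; check with tesseract --list-langs.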