PDF scientific paper translation and bilingual comparison based on font rules and deep learning, preserving formula and figure layout.
pip install pdf2zhExecute the translation command in the command line to generate the translated document example-zh.pdf and the bilingual document example-dual.pdf in the current directory.
pdf2zh example.pdfpdf2zh example.pdf -p 1-3,5pdf2zh example.pdf -li en -lo japdf2zh example.pdf -s gemma2Hint: Starting from \ufb00 is English style ligature.
pdf2zh BDA3.pdf -f "(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"Document merging: PyMuPDF
Document parsing: Pdfminer.six
Document extraction: MinerU
Multi-threaded translation: MathTranslate
Layout parsing: DocLayout-YOLO

