Error when running OCR on a PDF in R
I'm trying to run OCR on a PDF in R and it gives me the error below. After running the code, the file "i.txt" is also generated, but the error is still shown.
pdftoppm version 4.00
Copyright 1996-2017 Glyph & Cog, LLC
Usage: pdftoppm [options] <PDF-file> <PPM-root>
-f <int> : first page to print
-l <int> : last page to print
-r <number> : resolution, in DPI (default is 150)
-mono : generate a monochrome PBM file
-gray : generate a grayscale PGM file
-freetype <string>: enable FreeType font rasterizer: yes, no
-aa <string> : enable font anti-aliasing: yes, no
-aaVector <string>: enable vector anti-aliasing: yes, no
-opw <string> : owner password (for encrypted files)
-upw <string> : user password (for encrypted files)
-q : don't print any messages or errors
-cfg <string> : configuration file to use in place of .xpdfrc
-v : print copyright and version info
-h : print usage information
-help : print usage information
--help : print usage information
-? : print usage information
convert.exe: unable to open image '*.ppm': Invalid argument @ error/blob.c/OpenBlob/3146.
convert.exe: no images defined `D:/PDF_OCR_File/test.pdf.tif' @ error/convert.c/ConvertImageCommand/3275.
Tesseract Open Source OCR Engine v3.05.01 with Leptonica
Error in fopenReadStream: file not found
Error in findFileFormat: image file not found
Error during processing.
[[1]]
[1] FALSE
Warning messages:
1: running command 'C:\Windows\system32\cmd.exe /c "D:/Software_for_PDF_OCR/xpdf-tools-win-4.00/bin64/pdftoppm.exe D:/PDF_OCR_File/test.pdf -f 1 -l 2 -r 600 ocrbook"' had status 99
2: In shell(shQuote(paste0("D:/Software_for_PDF_OCR/xpdf-tools-win-4.00/bin64/pdftoppm.exe ", :
'"D:/Software_for_PDF_OCR/xpdf-tools-win-4.00/bin64/pdftoppm.exe D:/PDF_OCR_File/test.pdf -f 1 -l 2 -r 600 ocrbook"' execution failed with error code 99
3: running command 'C:\Windows\system32\cmd.exe /c "D:/Software_for_PDF_OCR/ImageMagick-7.0.7-Q16/convert.exe *.ppm D:/PDF_OCR_File/test.pdf.tif"' had status 1
4: In shell(shQuote(paste0("D:/Software_for_PDF_OCR/ImageMagick-7.0.7-Q16/convert.exe *.ppm ", :
'"D:/Software_for_PDF_OCR/ImageMagick-7.0.7-Q16/convert.exe *.ppm D:/PDF_OCR_File/test.pdf.tif"' execution failed with error code 1
5: running command 'C:\Windows\system32\cmd.exe /c "D:/Software_for_PDF_OCR/Tesseract-OCR/tesseract.exe D:/PDF_OCR_File/test.pdf.tif D:/PDF_OCR_File/test.pdf -l eng"' had status 1
6: In shell(shQuote(paste0("D:/Software_for_PDF_OCR/Tesseract-OCR/tesseract.exe ", :
'"D:/Software_for_PDF_OCR/Tesseract-OCR/tesseract.exe D:/PDF_OCR_File/test.pdf.tif D:/PDF_OCR_File/test.pdf -l eng"' execution failed with error code 1
7: In file.remove(paste0(i, ".tiff")) :
cannot remove file 'D:/PDF_OCR_File/test.pdf.tiff', reason 'No such file or directory'
My working directory (set with setwd()) is "D:/PDF_OCR_File".
This is the code that produces the error:
dest <- "D:/PDF_OCR_File"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
sapply(myfiles, FUN = function(i){
  file.rename(from = i, to = paste0(dirname(i), "/", gsub(" ", "", basename(i))))
})
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i){
  shell(shQuote(paste0("D:/Software_for_PDF_OCR/xpdf-tools-win-4.00/bin64/pdftoppm.exe ", i, " -f 1 -l 2 -r 600 ocrbook")))
  # convert ppm to tif ready for tesseract
  shell(shQuote(paste0("D:/Software_for_PDF_OCR/ImageMagick-7.0.7-Q16/convert.exe *.ppm ", i, ".tif")))
  # convert tif to text file
  shell(shQuote(paste0("D:/Software_for_PDF_OCR/Tesseract-OCR/tesseract.exe ", i, ".tif ", i, " -l eng")))
  # delete tif file
  file.remove(paste0(i, ".tiff"))
})
I don't know where it is going wrong or what mistake I'm making. Any suggestion would be helpful, thanks.
What does 'file.exists(myfiles[[1]])' give you?
file.exists(myfiles[[1]]) gives TRUE – deepesh
It looks like the PDF-to-PPM command failed somehow, which is why the next command fails too. Try getting the first command to work in a terminal on its own. You could also use the 'magick' package for OCR.
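To follow up on that last suggestion, here is a minimal sketch of running only the pdftoppm step from R and stopping if it fails, so that the later steps are not run against missing files. The helper name run_step is hypothetical. The options are placed before the PDF file to match the order in the usage text printed above, but whether that ordering is what caused status 99 here is an assumption, not something confirmed in this thread.

```r
# Hypothetical helper: run one external program and stop early if it fails.
run_step <- function(exe, args) {
  if (!file.exists(exe)) {
    stop("executable not found: ", exe)
  }
  # system2() takes the arguments as a character vector and returns the exit status
  status <- system2(exe, args = args)
  if (status != 0) {
    stop(basename(exe), " exited with status ", status)
  }
  invisible(status)
}

# Paths taken from the question
pdftoppm <- "D:/Software_for_PDF_OCR/xpdf-tools-win-4.00/bin64/pdftoppm.exe"
pdf      <- "D:/PDF_OCR_File/test.pdf"
ppm_root <- file.path(dirname(pdf), "ocrbook")

# Options first, then <PDF-file> <PPM-root>, as in the usage text (assumed to matter)
args <- c("-f", "1", "-l", "2", "-r", "600", shQuote(pdf), shQuote(ppm_root))

# Only run when the executable and the PDF are actually present
if (file.exists(pdftoppm) && file.exists(pdf)) {
  run_step(pdftoppm, args)
  # Confirm that page images were written before calling convert.exe
  print(list.files(dirname(pdf), pattern = "^ocrbook.*\\.ppm$"))
}
```

If this step succeeds and .ppm files are listed, the same helper could be used for convert.exe and tesseract.exe in turn. Separately, the script writes a file ending in ".tif" but tries to remove one ending in ".tiff", so the last warning would remain even once the conversion works.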