Docx to PDF conversion with replaced characters

I have a docx file containing Chinese characters and text in other Asian languages. On my laptop the docx file converts to PDF perfectly, with the Chinese characters rendered correctly. When the same code runs as a runnable JAR on a Linux server, however, the Chinese characters are replaced by the # symbol. Can anyone help me with this problem? Thanks in advance. The Java code is given below:

public static void main(String[] args) throws Exception {
    try {
        Docx4jProperties.getProperties().setProperty("docx4j.Log4j.Configurator.disabled", "true");
        Log4jConfigurator.configure();
        org.docx4j.convert.out.pdf.viaXSLFO.Conversion.log.setLevel(Level.OFF);

        System.out.println("Getting input Docx File");
        InputStream is = new FileInputStream(new File(
                "C:/Users/nithins/Documents/plugin docx to pdf/other documents/Contains Complex Fonts Verified.docx"));
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(is);
        wordMLPackage.setFontMapper(new IdentityPlusMapper());

        System.out.println("Setting File Encoding");
        System.setProperty("file.encoding", "Identity-H");
        System.out.println("Generating PDF file");

        org.docx4j.convert.out.pdf.PdfConversion c = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(
                wordMLPackage);
        File outFile = new File(
                "C:/Users/nithins/Documents/plugin docx to pdf/other documents/Contains Complex Fonts Verified.pdf");
        OutputStream os = new FileOutputStream(outFile);
        c.output(os, new PdfSettings());
        os.close();

        System.out.println("Output pdf file generated");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public static String changeExtensionToPdf(String path) {
    int markerIndex = path.lastIndexOf(".docx");
    String pdfFile = path.substring(0, markerIndex) + ".pdf";
    return pdfFile;
}

Source

2017-05-04 nithin subramanian

You are using some Java solution for the docx to PDF conversion. That is all you are telling us. All we can say is that you are doing something wrong in that solution. – mkl

Sorry, I have just edited the question to include my Java code. –

OK, so you are using [tag:docx4j]. I have added that tag. Unfortunately, I don't know that product at all. Just one remark: 'System.setProperty("file.encoding", "Identity-H")' should not make any sense at all. **Identity-H** is a PDF-internal thing; the system property "file.encoding" generally relates to text files and therefore not to PDFs, which after all are binary files, not text files. Also, it is odd that you set the log level to "off" while you still have problems, because there might be log output that could help you. – mkl

Copied from the docx4j "Getting Started" documentation:

docx4j can only use fonts which are available to it. 

These fonts come from 2 sources: 
• those installed on the computer 
• those embedded in the document 

Note that Word silently performs font substitution. When you open an existing document in 
Word, and select text in a particular font, the actual font you see on the screen won't be 
the font reported in the ribbon if it is not installed on your computer or embedded in the 
document. To see whether Word 2007 is substituting a font, go into Word Options 
> Advanced > Show Document Content and press the "Font Substitution" button. 

Word's font substitution information is not available to docx4j. As a developer, you have 3 
options: 
• ensure the font is installed or embedded 
• tell docx4j which font to use instead, or 
• allow docx4j to fallback to a default font 

To embed a font in a document, open it in Word on a computer which has the font installed 
(check no substitution is occurring), and go to Word Options > Save > Embed Fonts in File. 

If you want to tell docx4j to use a different font, you need to add a font mapping. The 
FontMapper interface is used to do this. 

On a Windows computer, font names for installed fonts are mapped 1:1 to the corresponding 
physical fonts via the IdentityPlusMapper. 

A font mapper contains Map<String, PhysicalFont>; to add a font mapping, as per the example in the ConvertOutPDF sample: 
    // Set up font mapper 
    Mapper fontMapper = new IdentityPlusMapper(); 
    wordMLPackage.setFontMapper(fontMapper); 

    // .. example of mapping font Times New Roman which doesn't have certain Arabic glyphs 
    // eg Glyph "ي" (0x64a, afii57450) not available in font "TimesNewRomanPS-ItalicMT". 
    // eg Glyph "ج" (0x62c, afii57420) not available in font "TimesNewRomanPS-ItalicMT". 
    // to a font which does 
    PhysicalFont font 
      = PhysicalFonts.get("Arial Unicode MS"); 
    // make sure this is in your regex (if any)!!! 
    if (font != null) { 
        fontMapper.put("Times New Roman", font); 
        fontMapper.put("Arial", font); 
    } 

You'll see the font names if you configure log4j debug level logging for 
org.docx4j.fonts.PhysicalFonts

If you turn on logging for org.docx4j.fonts, it should tell you about the missing glyphs. See https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/fonts/GlyphCheck.java
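
As a rough aid for the "tell docx4j which font to use instead" option: before picking a replacement font on the Linux server, it can help to know which scripts the text actually contains. The sketch below is plain Java and independent of docx4j. The class name ScriptScan and the method scriptsIn are made up for illustration, and it assumes you already have the document text as a String.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of docx4j: reports which Unicode scripts a text uses.
public class ScriptScan {

    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        // Iterate by code point so characters outside the BMP are handled correctly.
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // Skip COMMON, INHERITED and UNKNOWN: spaces, digits, punctuation,
            // combining marks and unassigned code points say nothing about the script.
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED
                    && script != Character.UnicodeScript.UNKNOWN) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // Sample with Latin letters, two Han characters (U+4E2D U+6587) and Arabic yeh (U+064A).
        System.out.println(scriptsIn("Docx \u4e2d\u6587 \u064a 123"));
    }
}
```

If HAN shows up in the result and the server has no font covering it, that would explain the # placeholders: install such a font there (or embed it in the document) and map to it as in the documentation example above. On a Linux machine with fontconfig, `fc-list :lang=zh` lists the installed fonts that claim Chinese coverage.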

Source

2017-05-07 11:06:16 JasonPlutext
