The reason why the PDFVisibleTextStripper from this answer, which the OP referenced, does not work in this case is that the calculation of the end of a character's baseline in the overridden processTextPosition does not take page rotation into account. If you change that method to test only the start of each character's baseline and ignore the end, though, it works fairly well for the document at hand:
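    @Override
    protected void processTextPosition(TextPosition text) {
        // Start of the character's baseline: the text-space origin mapped through the text matrix
        Matrix textMatrix = text.getTextMatrix();
        Vector start = textMatrix.transform(new Vector(0, 0));

        PDGraphicsState gs = getGraphicsState();
        Area area = gs.getCurrentClippingPath();
        // Keep the character only if its baseline start lies inside the current clip path
        if (area == null || area.contains(start.getX(), start.getY()))
            super.processTextPosition(text);
    }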
With processTextPosition overridden like this, the result of text extraction (with SortByPosition set to true) is:
Profit & Loss 12 Month Recap
Property: 8151 W. 183rd Street
Monthly recap 05/01/16 - 04/30/17 (cash basis)
MAY 16 JUN 16 JUL 16 AUG 16 SEP 16 OCT 16 NOV 16 DEC 16 JAN 17 FEB 17 MAR 17 APR 17 TOTAL
INCOME
4000 RENTAL INCOME
4001 Base Rent 343,002.59 38,045.11 38,045.11 38,045.11 66,081.36 122,153.86 66,081.36 38,045.11 0.00 76,090.22 38,598.49 66,634.74 930,823.06
4004 Prepaid Rent Inco -165,742.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 38,045.11 -38,045.11 0.00 0.00 -165,742.50
4000 Total RENTAL INC 177,260.09 38,045.11 38,045.11 38,045.11 66,081.36 122,153.86 66,081.36 38,045.11 38,045.11 38,045.11 38,598.49 66,634.74 765,080.56
4200 INCOME CHARGEB
4205 Property Tax Reco 4,250.00 4,250.00 4,250.00 4,250.00 4,250.00 4,250.00 4,250.00 4,250.00 4,250.00 4,250.00 3,696.62 4,250.00 50,446.62
4210 CAM Recoveries 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 4,750.00 57,000.00
4200 Total INCOME CH 9,000.00 9,000.00 9,000.00 9,000.00 9,000.00 9,000.00 9,000.00 9,000.00 9,000.00 9,000.00 8,446.62 9,000.00 107,446.62
4600 OTHER INCOME
4610 Late/NSF Fees 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1,394.72 3,828.61 0.00 0.00 0.00 5,223.33
4600 Total OTHER INC 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1,394.72 3,828.61 0.00 0.00 0.00 5,223.33
TOTAL INCOME 186,260.09 47,045.11 47,045.11 47,045.11 75,081.36 131,153.86 75,081.36 48,439.83 50,873.72 47,045.11 47,045.11 75,634.74 877,750.51
EXPENSE
6000 PROFESSIONAL FE
6010 Professional Fees 0.00 0.00 0.00 2,500.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2,500.00
6020 Legal Fees 0.00 0.00 0.00 4,592.71 0.00 1,466.33 1,703.35 2,006.00 0.00 685.96 4,368.50 0.00 14,822.85
6000 Total PROFESSIO 0.00 0.00 0.00 7,092.71 0.00 1,466.33 1,703.35 2,006.00 0.00 685.96 4,368.50 0.00 17,322.85
6100 UTILITIES
6105 Water & Sewer 0.00 0.00 0.00 21.21 0.00 0.00 25.81 0.00 0.00 31.91 0.00 0.00 78.93
6110 Electricity 1,000.91 358.23 390.43 350.71 353.69 0.00 666.39 381.97 486.85 449.62 480.21 486.81 5,405.82
6125 Trash Removal 229.54 231.34 232.56 232.78 231.66 240.94 240.94 241.40 241.40 518.97 259.18 0.00 2,900.71
6100 Total UTILITIES 1,230.45 589.57 622.99 604.70 585.35 240.94 933.14 623.37 728.25 1,000.50 739.39 486.81 8,385.46
6200 REPAIR & MAINTEN
6210 Field & Grounds - 3,094.00 0.00 0.00 2,313.84 1,009.50 0.00 1,439.58 1,302.75 600.00 0.00 0.00 1,909.73 11,669.40
6211 Irrigation/Sprinkle 0.00 0.00 0.00 0.00 0.00 1,121.08 350.00 0.00 0.00 0.00 0.00 0.00 1,471.08
6215 Landscape/Lawn 565.71 565.71 565.71 565.71 565.71 565.71 1,165.71 0.00 0.00 0.00 0.00 495.00 5,054.97
6220 Sanitary Sewers 0.00 0.00 0.00 950.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 950.00
6221 Storm Drains 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 2,500.00 0.00 2,500.00
6223 Snow Removal 1,365.00 3,440.00 0.00 0.00 0.00 0.00 0.00 1,350.00 4,440.00 4,106.00 790.00 2,340.00 17,831.00
6228 Ceiling Tiles 0.00 0.00 0.00 0.00 53.30 0.00 0.00 0.00 0.00 0.00 0.00 0.00 53.30
6231 Building - General 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 634.65 634.65
6233 Roof/Flashing 1,840.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 764.00 0.00 2,604.00
6234 Electrical Repairs 0.00 0.00 0.00 395.00 0.00 0.00 960.00 90.00 0.00 0.00 0.00 0.00 1,445.00
6236 Plumbing Repairs 0.00 0.00 3,316.59 0.00 2,315.95 0.00 930.00 812.17 0.00 0.00 0.00 0.00 7,374.71
6237 Fire & Life Safety 0.00 0.00 0.00 0.00 0.00 150.00 0.00 0.00 660.00 0.00 0.00 1,550.00 2,360.00
6238 Lighting Supplies 0.00 0.00 0.00 0.00 0.00 0.00 875.00 193.05 0.00 0.00 0.00 0.00 1,068.05
Profit & Loss 12 Month Recap 05/02/17 11:13 AM Page 1 of rentmanager.com - property management systems rev.12.180
MAY 16 JUN 16 JUL 16 AUG 16 SEP 16 OCT 16 NOV 16 DEC 16 JAN 17 FEB 17 MAR 17 APR 17 TOTAL
6240 Lock & Key 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 14.59 0.00 0.00 0.00 14.59
6242 HVAC Expense 4,375.00 0.00 1,370.00 2,043.25 0.00 0.00 0.00 415.00 1,326.00 1,835.00 0.00 0.00 11,364.25
6251 Pest Control 0.00 71.07 0.00 71.07 0.00 0.00 71.07 71.07 0.00 71.07 0.00 71.07 426.42
6200 Total REPAIR & M 11,239.71 4,076.78 5,252.30 6,338.87 3,944.46 1,836.79 5,791.36 4,234.04 7,040.59 6,012.07 4,054.00 7,000.45 66,821.42
6300 JANITORIAL
6310 Janitorial Services 1,935.00 1,935.00 1,935.00 1,935.00 1,935.00 0.00 3,870.00 1,935.00 1,935.00 1,935.00 1,995.00 1,995.00 23,340.00
6320 Janitorial Supplies 79.74 260.01 79.74 90.84 113.14 0.00 170.58 0.00 365.61 90.84 0.00 153.01 1,403.51
6300 Total JANITORIAL 2,014.74 2,195.01 2,014.74 2,025.84 2,048.14 0.00 4,040.58 1,935.00 2,300.61 2,025.84 1,995.00 2,148.01 24,743.51
6400 PAYROLL
6410 P/R Salaries - Offi 2,167.72 2,190.43 2,213.14 2,213.14 1,512.40 2,342.28 2,224.93 2,107.58 2,107.58 2,107.58 2,190.78 2,344.16 25,721.72
6412 P/R Taxes - Office 179.87 167.56 169.30 169.30 115.70 179.18 170.21 161.23 238.16 231.10 199.89 196.42 2,177.92
6420 Employee Insuran 76.06 76.14 76.22 199.23 104.30 161.06 152.29 137.91 139.14 139.14 143.91 175.02 1,580.42
6421 Employee Benefit 3.54 2.40 87.37 141.59 35.59 114.13 111.50 110.15 89.47 107.81 114.80 49.60 967.95
6423 Workers Compens 42.50 42.94 37.74 32.10 21.93 33.96 32.26 30.56 30.56 30.56 31.76 33.98 400.85
6400 Total PAYROLL 2,469.69 2,479.47 2,583.77 2,755.36 1,789.92 2,830.61 2,691.19 2,547.43 2,604.91 2,616.19 2,681.14 2,799.18 30,848.86
6500 TAXES INSURANCE
6510 Real Estate Tax E 69,570.07 0.00 0.00 0.00 0.00 69,570.07 0.00 0.00 0.00 0.00 0.00 0.00 139,140.14
6520 Insurance Expens 2,078.00 2,704.50 0.00 2,704.50 0.00 0.00 2,704.50 0.00 0.00 2,704.50 0.00 0.00 12,896.00
6500 Total TAXES INSU 71,648.07 2,704.50 0.00 2,704.50 0.00 69,570.07 2,704.50 0.00 0.00 2,704.50 0.00 0.00 152,036.14
6600 Property Manageme 9,575.44 8,381.70 2,117.03 2,117.03 2,117.03 3,378.66 5,901.92 3,378.66 2,179.79 2,000.00 3,829.06 2,117.03 47,093.35
6650 Receiver Fees 6,625.00 6,125.00 0.00 0.00 6,875.00 0.00 7,062.50 8,375.00 0.00 0.00 8,875.00 0.00 43,937.50
6700 GENERAL & ADMIN
6710 PM/Work Order S 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 95.00 1,140.00
6720 Postage/Messen 63.58 0.00 7.59 9.64 20.63 5.98 6.99 0.00 17.38 7.21 14.36 10.98 164.34
6725 Office Supplies 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 148.88 148.88
6735 Office Equipment 0.00 0.00 0.00 0.00 0.00 0.00 0.00 218.40 0.00 0.00 0.00 0.00 218.40
6740 Telephone 21.33 0.00 11.54 15.00 21.12 8.76 9.77 0.00 13.19 11.96 3.14 7.88 123.69
6760 Auto Mileage & Ex 100.44 0.00 68.75 140.24 104.14 61.29 142.59 29.00 56.04 0.00 23.14 0.00 725.63
6770 Leasing & Maint. O 0.00 0.00 0.00 0.00 0.00 0.00 75.00 0.00 0.00 0.00 0.00 0.00 75.00
6780 Bank Fees 129.45 0.00 0.00 105.91 87.62 0.00 53.61 0.00 120.92 56.46 77.49 79.74 711.20
6700 Total GENERAL & 409.80 95.00 182.88 365.79 328.51 171.03 382.96 342.40 302.53 170.63 213.13 342.48 3,307.14
TOTAL EXPENSE 105,212.90 26,647.03 12,773.71 24,004.80 17,688.41 79,494.43 31,211.50 23,441.90 15,156.68 17,215.69 26,755.22 14,893.96 394,496.23
NOI 81,047.19 20,398.08 34,271.40 23,040.31 57,392.95 51,659.43 43,869.86 24,997.93 35,717.04 29,829.42 20,289.89 60,740.78 483,254.28
N/O EXPENSE
7100 NON-OPERATING E
7110 Lease Commissio 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 33,203.00 0.00 0.00 0.00 33,203.00
7130 Professional Fees 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1,276.00 0.00 0.00 1,276.00
7100 Total NON-OPER 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 33,203.00 1,276.00 0.00 0.00 34,479.00
TOTAL N/O EXPENSE 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 33,203.00 1,276.00 0.00 0.00 34,479.00
NET INCOME 81,047.19 20,398.08 34,271.40 23,040.31 57,392.95 51,659.43 43,869.86 24,997.93 2,514.04 28,553.42 20,289.89 60,740.78 448,775.28
Profit & Loss 12 Month Recap 05/02/17 11:13 AM Page 2 of rentmanager.com - property management systems rev.12.180
At first glance, the only visible text that is missing is the total page count in the footers of both pages.
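To illustrate why an end-of-baseline test can go wrong on a rotated page, here is a minimal sketch using only java.awt.geom. It assumes the original code derived the end point by adding the character width to the x coordinate of the start; that code is not shown here, so treat this as an assumption. The class name, the helper names, the clip rectangle and all numbers are made up for the illustration, and the rotation direction is chosen arbitrarily.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Area;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;

class BaselineEndSketch {
    // Naive end point (assumed): the baseline is taken to run along the x axis.
    static Point2D naiveEnd(Point2D start, double width) {
        return new Point2D.Double(start.getX() + width, start.getY());
    }

    // Rotation-aware end point: advance by the width along the rotated baseline direction.
    // rotationDegrees is expected to be a multiple of 90.
    static Point2D rotatedEnd(Point2D start, double width, int rotationDegrees) {
        AffineTransform rot = AffineTransform.getQuadrantRotateInstance(rotationDegrees / 90);
        Point2D advance = rot.deltaTransform(new Point2D.Double(width, 0), null);
        return new Point2D.Double(start.getX() + advance.getX(),
                                  start.getY() + advance.getY());
    }

    public static void main(String[] args) {
        // Hypothetical tall, narrow clip area and a character that runs upwards inside it
        Area clip = new Area(new Rectangle2D.Double(0, 0, 20, 100));
        Point2D start = new Point2D.Double(10, 10);
        double width = 30;

        System.out.println("start inside:       " + clip.contains(start));
        System.out.println("naive end inside:   " + clip.contains(naiveEnd(start, width)));
        System.out.println("rotated end inside: " + clip.contains(rotatedEnd(start, width, 90)));
    }
}
```

In this made-up setup the naive end point falls outside the clip area although the character is fully inside it, so a test that requires both start and end to be inside would wrongly drop the character. Testing only the start sidesteps the problem at the cost of accepting characters that are partially clipped.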
As the OP said in a comment:

    It seems the same thing should be applied in deleteCharsInPath()

Indeed, deleteCharsInPath should also be changed:
    void deleteCharsInPath() {
        for (List<TextPosition> list : charactersByArticle) {
            List<TextPosition> toRemove = new ArrayList<>();
            for (TextPosition text : list) {
                // Only the start of the character's baseline is tested against the path
                Matrix textMatrix = text.getTextMatrix();
                Vector start = textMatrix.transform(new Vector(0, 0));
                if (linePath.contains(start.getX(), start.getY())) {
                    toRemove.add(text);
                }
            }
            if (!toRemove.isEmpty()) {
                System.out.println("Removed " + toRemove.size() + " TextPosition objects as they are being covered.");
                list.removeAll(toRemove);
            }
        }
    }
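The core of this method, removing every character whose baseline start is covered by a filled path, can be tried out in isolation. The following is only a sketch with plain java.awt.geom types standing in for the PDFBox ones: the class name, the removeCovered helper and the sample coordinates are hypothetical, and a Point2D per character replaces the TextPosition objects.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

class CoveredStartSketch {
    // Removes every start point lying inside the filled path and returns how many were removed.
    static int removeCovered(List<Point2D> starts, Path2D filledPath) {
        int before = starts.size();
        starts.removeIf(p -> filledPath.contains(p));
        return before - starts.size();
    }

    public static void main(String[] args) {
        // Hypothetical filled rectangle painted after the text was drawn
        Path2D fill = new Path2D.Double(new Rectangle2D.Double(0, 0, 50, 20));

        List<Point2D> starts = new ArrayList<>();
        starts.add(new Point2D.Double(10, 10)); // covered by the fill
        starts.add(new Point2D.Double(60, 10)); // to the right of the fill

        int removed = removeCovered(starts, fill);
        System.out.println("Removed " + removed + ", kept " + starts.size());
    }
}
```

removeIf does in one step what the toRemove list followed by removeAll does above; either form works, the explicit list merely makes it easy to log what was dropped.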
The OP presented another document for which even the PDFVisibleTextStripper as corrected above did not recognize the visible characters correctly.
The cause is another normalization done by PDFBox text stripping, in which the origin is moved to the lower left corner of the crop box.
Patching the PDFVisibleTextStripper methods to add the coordinates of the lower left corner of the crop box again results in a decent extraction of the visible text.
Overriding processPage allows us to read the lower left crop box coordinates:
    // Lower left corner of the crop box of the page currently being processed
    float lowerLeftX = 0;
    float lowerLeftY = 0;

    @Override
    public void processPage(PDPage page) throws IOException {
        PDRectangle pageSize = page.getCropBox();
        lowerLeftX = pageSize.getLowerLeftX();
        lowerLeftY = pageSize.getLowerLeftY();
        super.processPage(page);
    }
processTextPosition and deleteCharsInPath need to take these values into account:
    @Override
    protected void processTextPosition(TextPosition text) {
        Matrix textMatrix = text.getTextMatrix();
        Vector start = textMatrix.transform(new Vector(0, 0));

        PDGraphicsState gs = getGraphicsState();
        Area area = gs.getCurrentClippingPath();
        // Shift the start by the crop box origin before testing it against the clip path
        if (area == null || area.contains(lowerLeftX + start.getX(), lowerLeftY + start.getY()))
            super.processTextPosition(text);
    }
[...]
    void deleteCharsInPath() {
        for (List<TextPosition> list : charactersByArticle) {
            List<TextPosition> toRemove = new ArrayList<>();
            for (TextPosition text : list) {
                Matrix textMatrix = text.getTextMatrix();
                Vector start = textMatrix.transform(new Vector(0, 0));
                // Shift the start by the crop box origin before testing it against the path
                if (linePath.contains(lowerLeftX + start.getX(), lowerLeftY + start.getY())) {
                    toRemove.add(text);
                }
            }
            if (!toRemove.isEmpty()) {
                System.out.println("Removed " + toRemove.size() + " TextPosition objects as they are being covered.");
                list.removeAll(toRemove);
            }
        }
    }
Now the extraction result is OK for the new file, too. ;)
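The effect of the crop box offset can be reproduced without PDFBox. The sketch below assumes, as described above, that the character coordinates are relative to the lower left corner of the crop box while the clip area is not. The class name, the isInside helper and all numbers are invented for the illustration.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

class CropBoxOffsetSketch {
    // Tests a crop-box-relative point against a clip area after shifting it by the crop box origin.
    static boolean isInside(Area clip, double lowerLeftX, double lowerLeftY, double x, double y) {
        return clip.contains(lowerLeftX + x, lowerLeftY + y);
    }

    public static void main(String[] args) {
        // Hypothetical crop box whose lower left corner is not at the origin
        double lowerLeftX = 100;
        double lowerLeftY = 200;
        // Hypothetical clip area covering part of that crop box
        Area clip = new Area(new Rectangle2D.Double(100, 200, 50, 50));
        // Character start as reported relative to the crop box origin
        double x = 10;
        double y = 10;

        System.out.println("without offset: " + clip.contains(x, y));
        System.out.println("with offset:    " + isInside(clip, lowerLeftX, lowerLeftY, x, y));
    }
}
```

For a page whose crop box starts at the origin both tests agree, which would explain why the first document extracted fine without the offset.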
Thanks for the quick answer, it works well. It seems the same thing should be applied in deleteCharsInPath(), where it fills –
Indeed, you are right. – mkl
Btw, the footer for some reason does not satisfy the condition "area.contains(start.getX(), start.getY())". That is OK in this case, as it just gets skipped, but why? For example, in this [link](https://drive.google.com/open?id=1l0Yt9BJXs09bXcBD7pDbxFiZQQqnuaan) example the condition fails for a lot of text at the top. Is it possible that you need to add more overrides with some additional instruction processing to the PDFTextStripper subclass? –