2017-01-03 3 views
0

Mein Ziel ist es, Daten zu extrahieren, insbesondere die Daten zu dem jüngsten Datum entsprechen (in diesem Fall 5/20), von einer HTML-Tabelle hereÄrger Pandas mit read_html

Hier der entsprechende HTML-Code:

<html> 
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title>FW: NEFS 2 Available Quota 5/21</title> 
<link rel="important stylesheet" href=""> 
<style>div.headerdisplayname {font-weight:bold;}</style></head> 
<body> 
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 2 Available Quota 5/21</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <[email protected]></td></tr><tr><td><b>Date: </b>5/21/2014 10:08 AM</td></tr></table><br> 
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);} 
o\:* {behavior:url(#default#VML);} 
w\:* {behavior:url(#default#VML);} 
.shape {behavior:url(#default#VML);} 
</style><![endif]--><style><!-- 
/* Font Definitions */ 
@font-face 
    {font-family:"Cambria Math"; 
    panose-1:2 4 5 3 5 4 6 3 2 4;} 
@font-face 
    {font-family:Calibri; 
    panose-1:2 15 5 2 2 2 4 3 2 4;} 
@font-face 
    {font-family:Tahoma; 
    panose-1:2 11 6 4 3 5 4 4 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Book"; 
    panose-1:2 11 5 3 2 1 2 2 2 4;} 
@font-face 
    {font-family:"Franklin Gothic Demi"; 
    panose-1:2 11 7 3 2 1 2 2 2 4;} 
/* Style Definitions */ 
p.MsoNormal, li.MsoNormal, div.MsoNormal 
    {margin:0in; 
    margin-bottom:.0001pt; 
    font-size:11.0pt; 
    font-family:"Calibri","sans-serif";} 
a:link, span.MsoHyperlink 
    {mso-style-priority:99; 
    color:blue; 
    text-decoration:underline;} 
a:visited, span.MsoHyperlinkFollowed 
    {mso-style-priority:99; 
    color:purple; 
    text-decoration:underline;} 
span.EmailStyle17 
    {mso-style-type:personal; 
    font-family:"Calibri","sans-serif"; 
    color:windowtext;} 
span.title1 
    {mso-style-name:title1; 
    font-family:"Arial","sans-serif"; 
    color:#1F487E; 
    font-weight:normal;} 
span.EmailStyle19 
    {mso-style-type:personal-reply; 
    font-family:"Calibri","sans-serif"; 
    color:#1F497D;} 
.MsoChpDefault 
    {mso-style-type:export-only; 
    font-size:10.0pt;} 
@page WordSection1 
    {size:8.5in 11.0in; 
    margin:1.0in 1.0in 1.0in 1.0in;} 
div.WordSection1 
    {page:WordSection1;} 
--></style><!--[if gte mso 9]><xml> 
<o:shapedefaults v:ext="edit" spidmax="1026" /> 
</xml><![endif]--><!--[if gte mso 9]><xml> 
<o:shapelayout v:ext="edit"> 
<o:idmap v:ext="edit" data="1" /> 
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal><span style='color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p>&nbsp;</o:p></span></i></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats.&nbsp; Big Ideas. ~</span></b><b><span style='color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='color:#1F497D'><o:p>&nbsp;</o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> David Leveille [mailto:[email protected]] <br><b>Sent:</b> Wednesday, May 21, 2014 8:50 AM<br><b>To:</b> David Leveille<br><b>Subject:</b> NEFS 2 Available Quota 5/21<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><span style='font-size:12.0pt;font-family:"Arial","sans-serif";color:#1F487E'>AVAILABLE QUOTA FY 2014</span><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'><o:p></o:p></span></p><table class=MsoNormalTable border=0 cellspacing=0 cellpadding=0 width="71%" style='width:71.28%'><tr><td width=220 style='width:164.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><b><span style='font-size:9.0pt;font-family:"Arial","sans-serif";color:black'>ID <o:p></o:p></span></b></p></td><td width=161 style='width:120.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Available Quota <o:p></o:p></span></b></p></td><td width=189 style='width:141.75pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Live Weight Pounds <o:p></o:p></span></b></p></td><td width=126 style='width:94.55pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Price <o:p></o:p></span></b></p></td><td width=168 style='width:125.95pt;border:none;border-bottom:solid windowtext 1.0pt;background:#8BCDFF;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='mso-line-height-alt:15.0pt'><b><span style='font-size:18.0pt;font-family:"Arial","sans-serif";color:black'>Date Posted <o:p></o:p></span></b></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1724<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>GOM BB<br>GREYSOLE<br>DABS<br>GOM YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>2328<br>445<br>3007<br>850<br>3101<br>1995<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$9,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1578<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB BB<br>GB YT<br>SNE BB<br>SNE YT<br>GOM BB<br>Whake<br>POLL<br>RED<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>538<br>5894<br>1755<br>243<br>490<br>153<br>3965<br>2727<br>9227<br>15060<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.00<br>$0.40<br>$0.20<br>$1.00<br>$0.45<br>$0.50<br>$0.15<br>$0.20<br>$0.01<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>310<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>DABS<br>WHAKE<br>POLL<br>RED<br>SNE BB<br>GOM BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>825<br>9033<br>1241<br>3120<br>65234<br>76610<br>1688<br>1195<br>2121<br>7285<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<o:p></o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'><o:p>&nbsp;</o:p></span></p><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$15,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/20<o:p></o:p></span></p></td></tr><tr style='height:23.25pt'><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>347<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>SNE BB<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>8,000<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$0.50<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt;height:23.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/7<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878A<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GOM COD<br>GOM HADD<br>SNE BB<br>GOM BB<br>GB BB<br>GREYSOLE<br>GOM YT<br>SNE YT<br>POLL<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>6188<br>635<br>3916<br>7873<br>6762<br>3358<br>9776<br>271<br>186550<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>$1.95<br>$1.35<br>$0.50<br>$0.50<br>$0.20<br>$1.40<br>$1.20<br>$0.50<br>$0.01<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr><tr><td width=220 style='width:164.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1878B<o:p></o:p></span></p></td><td width=161 style='width:120.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>GBE COD<br>GBW COD<br>GB YT<o:p></o:p></span></p></td><td width=189 style='width:141.75pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>1113<br>12186<br>850<o:p></o:p></span></p></td><td width=126 style='width:94.55pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>Package<br>$10,000<o:p></o:p></span></p></td><td width=168 style='width:125.95pt;border:solid windowtext 1.0pt;background:white;padding:2.25pt 2.25pt 2.25pt 2.25pt'><p class=MsoNormal style='line-height:15.0pt'><span style='font-size:13.5pt;font-family:"Arial","sans-serif";color:black'>5/12<o:p></o:p></span></p></td></tr></table><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><p class=MsoNormal>David Leveille<o:p></o:p></p><p class=MsoNormal>II Northeast Fishery Sector Inc.<o:p></o:p></p><p class=MsoNormal>10 Witham Street<o:p></o:p></p><p class=MsoNormal>Gloucester, MA. 01930<o:p></o:p></p><p class=MsoNormal>Cell 978 375 3509<o:p></o:p></p><p class=MsoNormal>Fax 978 281 1555<o:p></o:p></p><p class=MsoNormal>Web <a href="http://nefs2.com/">http://nefs2.com/</a><o:p></o:p></p><p class=MsoNormal><o:p>&nbsp;</o:p></p><div class=MsoNormal align=center style='text-align:center'><span style='font-size:12.0pt;font-family:"Times New Roman","serif"'></body></html> 
</body> 
</html> 

Ich versuchte BeautifulSoup, um auf die Daten innerhalb jeder Zelle zuzugreifen. Ich würde sehen, wenn das Datum 5/20 unten auftritt, erfassen Sie alle entsprechenden Daten. Das funktionierte ziemlich gut, aber wenn ich die Daten in einen Pandas DataFrame legen würde, würde es scheitern. Nach vielen fehlgeschlagenen Versuchen wurde mir gesagt, dass die Verwendung von Pandas read_html eine klügere Wahl für diese Aufgabe wäre.

Bisher ist mein Code:

from bs4 import BeautifulSoup 
import pandas as pd 
import lxml 
import html5lib 

path = 'Z:\\blub' 

df = pd.pandas.read_html(path) 
print (df) 

Aber wenn ich es die Linie df = pd.pandas.read_html(path) laufen erzeugt die Fehler ValueError: No tables found.

Bedeutet das, dass der Befehl read_html die Datentabelle nicht erkennt? Verwenden Sie read_html den weisesten Ansatz für die Analyse dieser Datentabelle ordentlich? Ich sage ordentlich, weil der letzte Schritt nach dem Parsing ist, alles in eine Oracle-Datenbank zu exportieren.

Ich würde alle und alle Ratschläge zur Lösung dieses Problems zu schätzen wissen. Vielen Dank.

+0

Pfad eine Variable ist. Sie verwenden es als Zeichenfolge. 'pd.pandas.read_html (Pfad)' –

+0

Entfernen der Anführungszeichen gab den gleichen Fehler zurück :( – theprowler

+1

Können Sie meine Antwort überprüfen. Ich hoffe, dass hilft :) –

Antwort

1

Zuerst können Sie die Daten aus der Datei gelesen und es dann

from bs4 import BeautifulSoup 
import pandas as pd 
import lxml 
import html5lib 

path = 'file.html' 
with open(path, 'rt') as myfile: 
    data = myfile.read().replace("<br>", '\n') 


df = pd.read_html(data) 

mit parsen Dies wird Ihnen eine Liste von Datenrahmen. Bei df [1] Sie erhalten den Datenrahmen erhalten Sie

df[1] 

enter image description here

Da dies keine zuverlässige Möglichkeit ist, Daten von einer HTML zu analysieren ich Sie durch das Parsen und Erstellen von Datenrahmen gehen Rat würde mit schöner Suppe.

from bs4 import BeautifulSoup 
path = 'file.html' 
ecj_data = open(path,'r').read() 

soup = BeautifulSoup(ecj_data) 


tabulka = soup.find("table", {"class" : "MsoNormalTable"}) 

column_headers = ['ID','Available Quota', 'Live Weight Pounds', 'Price', 'Date Posted'] 
records = [] 
for idy, row in enumerate(tabulka.findAll('tr')): 
    if idy == 0: 
     continue 
    cols = row.findAll('td') 
    record = {} 
    for idx, col in enumerate(cols): 
     record[column_headers[idx]] = col.text.strip() 
    records.append(record) 

df = pd.DataFrame.from_dict(records) 

df[column_headers] 

Ausgabe wird wie folgt aussehen:

enter image description here

+0

Ihre zweite Methode ist tatsächlich ähnlich wie ich es versucht habe. Die Ausgabe, die Sie bekommen haben, sieht perfekt aus ... aber ich laufe immer wieder in denselben Fehler, wenn ich beide oben genannten Methoden ausprobiere: Ich erhalte 'Errno 13 Berechtigung verweigert: 'Z: \\ blub''. Ich habe diesen Fehler in der Vergangenheit gesehen, aber es verwirrt mich immer noch.Ich bin nicht sicher, welche spezifische Sache mein Code anders macht, der diesen Fehler vermeidet, aber es sind die 'open()' Linien, die es verursachen. Ich werde versuchen, Ihren Code mit mir zu kombinieren und zu sehen, was ich tun kann – theprowler

+0

versuchen, die HTML-Datei und das Python-Programm in der gleichen Datei. –

+0

Ich lege beide in den Ordner "Blub" und es schlägt immer noch mit der Berechtigung verweigert. Ich konnte auf die Datei zugreifen, indem ich 'for filename in os.listidr (Pfad) verwendete: \ n file_path = os.path.join (Pfad, Dateiname) \ n if os.path.isfile (Dateipfad): \ n mit open (file_path, 'r') als f: ' – theprowler

0

Ich denke, Ihr Fehler liegt in den Anführungszeichen von 'Pfad' Argument in read_html() -Methode. Stellen Sie sicher, dass Sie sie entfernen, andernfalls rufen Sie sie mit der Zeichenfolge "Pfad" anstelle des Variablenpfads auf, der Ihren tatsächlichen Pfad enthält.

path = r'Z:\\blub' 

if os.path.isfile(path): 
    df = pd.pandas.read_html(path) 
    print (df) 
+0

Das Entfernen der Anführungszeichen gibt leider den gleichen Fehler zurück – theprowler

+0

Vielleicht sollten Sie das 'r' vor die Zeichenfolge setzen, um \ b Sonderzeichen zu vermeiden. Stellen Sie dann sicher, dass diese Datei existiert. Ich habe versucht, die genaue HTML und mit Erfolg zu lesen. Aber, wie @ Vikash bereits erwähnt, ist dies nicht sehr zuverlässige Methode, da ich nur für sehr einfache HTML-Tabellen arbeiten ... –