The problem is that faulty markup comments out large parts of the page, namely the sequence
<!-->.
The fix is to remove these sequences from the raw HTML and then parse the result.
from urllib2 import urlopen, Request
from bs4 import BeautifulSoup

site = 'http://etfdb.com/compare/market-cap/'
hdr = {'User-Agent': 'Mozilla/5.0'}  # send a browser-like User-Agent
req = Request(site, headers=hdr)
res = urlopen(req)
rawpage = res.read()
# Remove the faulty "<!-->" markers that comment out large parts of the page
page = rawpage.replace("<!-->", "")
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class": "table mm-mobile-table table-striped table-bordered"})
print(table)
Tested on Python 2.7.12
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup

site = 'http://etfdb.com/compare/market-cap/'
hdr = {'User-Agent': 'Mozilla/5.0'}  # send a browser-like User-Agent
req = Request(site, headers=hdr)
res = urlopen(req)
# read() returns bytes on Python 3; decode to str before calling replace()
rawpage = res.read().decode("utf-8")
# Remove the faulty "<!-->" markers that comment out large parts of the page
page = rawpage.replace('<!-->', '')
soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"class": "table mm-mobile-table table-striped table-bordered"})
print(table)
Tested on Python 3.5.2
Output:
<table class="table mm-mobile-table table-striped table-bordered" data-icons='{"columns":"fa-th"}' data-icons-prefix="fa" data-striped="true" data-toggle="table"><thead><tr><th class="show-td" data-field="symbol">Symbol</th> <th class="show-td" data-field="name">Name</th> <th class="show-td" data-field="aum">AUM</th> <th class="show-td" data-field="avg-volume">Avg Volume</th></tr></thead><tbody><tr><td class="show-td" data-th="Symbol"><a href="/etf/SPY/">SPY</a></td> <td class="show-td" data-th="Name"><a href="/etf/SPY/">SPDR S&P 500 ETF</a></td> <td class="show-td" data-th="AUM">$236,737,519.17</td> <td class="show-td" data-th="Avg Volume">73,039,883</td></tr> <tr><td class="show-td" data-th="Symbol"><a href="/etf/IVV/">IVV</a></td> <td class="show-td" data-th="Name"><a href="/etf/IVV/">iShares Core S&P 500 ETF</a></td> <td class="show-td" data-th="AUM">$115,791,603.10</td> <td class="show-td" data-th="Avg Volume">3,502,931</td></tr> ...
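To try the cleanup step without hitting the network, here is a minimal offline sketch. The HTML snippet and the names strip_stray_comment_openers and TableCounter are made up for illustration, and only the standard library is used, so BeautifulSoup is not needed to run it.

```python
from html.parser import HTMLParser

# Made-up sample that mimics the faulty markup: a stray "<!-->" before the table.
SAMPLE = b'<div><!--><table class="t"><tr><td>SPY</td></tr></table></div>'


def strip_stray_comment_openers(raw, encoding="utf-8"):
    """Return the page as text with every literal '<!-->' removed.

    Accepts bytes (what urlopen(...).read() returns on Python 3) or str.
    """
    if isinstance(raw, bytes):
        raw = raw.decode(encoding)
    return raw.replace("<!-->", "")


class TableCounter(HTMLParser):
    """Counts the <table> start tags seen by the parser."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1


cleaned = strip_stray_comment_openers(SAMPLE)
counter = TableCounter()
counter.feed(cleaned)
counter.close()
print(counter.tables)
```

How the uncleaned snippet is parsed may depend on the parser and the Python version, so the sketch only inspects the cleaned text, where the table is visible in any case.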
There are no tables in 'soup' (nor in 'page'). The server probably does not recognize 'Mozilla/5.0' as a valid agent. – DyZ
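One way to narrow down a report like this is to inspect the raw response text before BeautifulSoup gets involved. A rough sketch, where describe_page is a hypothetical helper and both sample strings are made up:

```python
def describe_page(page):
    """Summarise what the raw HTML text contains, before any parsing."""
    lowered = page.lower()
    return {
        "length": len(page),
        "table_tags": lowered.count("<table"),
        "stray_comment_openers": page.count("<!-->"),
    }


# A page that contains the table behind a stray "<!-->" marker.
print(describe_page('<html><!--><table class="x"></table></html>'))
# A page where the server sent something else entirely.
print(describe_page("<html><body>Access denied</body></html>"))
```

If table_tags is 0 for the live page, the server did not send the table at all (for example because it rejected the request), and removing "<!-->" will not make soup.find succeed.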