Web scraping with BeautifulSoup 4
Below is the div tag, taken directly from espncricinfo.com:
<div id="rectPlyr_Playerlistt20" style="display: none; visibility: hidden;
background:url(http://i.imgci.com/espncricinfo/ciPlayerTablebottom-bg.gif) bottom left no-repeat;">
<table class="playersTable" cellpadding="0" cellspacing="0" style="margin-top:15px; margin-bottom:14px;">
<td class="divider"><a href="/ci/content/player/26421.html">R Ashwin</a></td>
<td class="divider"><a href="/ci/content/player/27223.html">STR Binny</a></td>
<td class=""><a href="/ci/content/player/625383.html">JJ Bumrah</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/430246.html">YS Chahal</a></td>
<td class="divider"><a href="/ci/content/player/290727.html">R Dhawan</a></td>
<td class=""><a href="/ci/content/player/28235.html">S Dhawan</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/28081.html">MS Dhoni</a></td>
<td class="divider"><a href="/ci/content/player/28671.html">FY Fazal</a></td>
<td class=""><a href="/ci/content/player/28763.html">G Gambhir</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/234675.html">RA Jadeja</a></td>
<td class="divider"><a href="/ci/content/player/290716.html">KM Jadhav</a></td>
<td class=""><a href="/ci/content/player/253802.html">V Kohli</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/277955.html">DS Kulkarni</a></td>
<td class="divider"><a href="/ci/content/player/326016.html">B Kumar</a></td>
<td class=""><a href="/ci/content/player/398506.html">Mandeep Singh</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/31107.html">A Mishra</a></td>
<td class="divider"><a href="/ci/content/player/481896.html">Mohammed Shami</a></td>
<td class=""><a href="/ci/content/player/290630.html">MK Pandey</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/554691.html">AR Patel</a></td>
<td class="divider"><a href="/ci/content/player/32540.html">CA Pujara</a></td>
<td class=""><a href="/ci/content/player/277916.html">AM Rahane</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/422108.html">KL Rahul</a></td>
<td class="divider"><a href="/ci/content/player/33141.html">AT Rayudu</a></td>
<td class=""><a href="/ci/content/player/279810.html">WP Saha</a></td>
</tr>
<tr class="">
<td class="divider"><a href="/ci/content/player/236779.html">I Sharma</a></td>
<td class="divider"><a href="/ci/content/player/34102.html">RG Sharma</a></td>
<td class=""><a href="/ci/content/player/537126.html">BB Sran</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/390484.html">JD Unadkat</a></td>
<td class="divider"><a href="/ci/content/player/237095.html">M Vijay</a></td>
<td class=""><a href="/ci/content/player/376116.html">UT Yadav</a></td>
</tr>
<tr class="">
</tr>
</table>
</div>
I want to scrape this HTML with the following code:
from bs4 import BeautifulSoup
import os
import urllib2
BASE_URL = "http://www.espncricinfo.com"
espn_ = urllib2.urlopen("http://www.espncricinfo.com/ci/content/player/index.html?country=6")
soup = BeautifulSoup(espn_ , 'html.parser')
#print soup.prettify().encode('utf-8')
t20 = soup.find_all('div', {"id": "rectPlyr_Playerlistt20"})
for row in t20:
    print(row.find('tr', {"class": "odd"}))
Assume the HTML above was taken from the URL given in the code. When I scrape it, the output is None.
Also, when I print t20, I do not get the full output: it only shows up to JJ Bumrah, i.e. only the first <tr> tag.
If the data above is unclear, go to the URL assigned to espn_, choose the team India and open the "t20" tab. I want to scrape the href links of all the players listed under t20.
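
For reference, here is a minimal sketch of the result I am after, written against a shortened copy of the markup above. It is not a diagnosis of the None output. It uses Python 3's standard-library HTMLParser instead of BeautifulSoup, because an event-based parser builds no tree and so does not depend on how the missing opening <tr> gets repaired. The names PlayerLinkCollector, extract_player_links and SAMPLE_HTML, and the extra "other" div, are hypothetical and only there for illustration. Separately, Beautiful Soup's documentation notes that its supported parsers (html.parser, lxml, html5lib) can build different trees from invalid markup, so trying another parser may also be worth a test.

```python
from html.parser import HTMLParser  # Python 3 standard library

# Shortened copy of the markup from the question, including the missing
# opening <tr> before the first row. The second <div> is a made-up
# control case to show that links outside the target div are skipped.
SAMPLE_HTML = """
<div id="rectPlyr_Playerlistt20" style="display: none; visibility: hidden;">
<table class="playersTable" cellpadding="0" cellspacing="0">
<td class="divider"><a href="/ci/content/player/26421.html">R Ashwin</a></td>
<td class=""><a href="/ci/content/player/625383.html">JJ Bumrah</a></td>
</tr>
<tr class="odd">
<td class="divider"><a href="/ci/content/player/430246.html">YS Chahal</a></td>
<td class=""><a href="/ci/content/player/28235.html">S Dhawan</a></td>
</tr>
</table>
</div>
<div id="other"><a href="/ci/content/player/0.html">Not in the T20 list</a></div>
"""

BASE_URL = "http://www.espncricinfo.com"


class PlayerLinkCollector(HTMLParser):
    """Collect href values of <a> tags inside the <div> with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.div_depth = 0  # 0 means we are outside the target <div>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested <div> inside the target
            elif attrs.get("id") == self.target_id:
                self.div_depth = 1   # entering the target <div>
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        # Stray end tags such as the unmatched </tr> are simply ignored.
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def extract_player_links(html, div_id="rectPlyr_Playerlistt20"):
    """Return absolute URLs for every player link inside the given div."""
    collector = PlayerLinkCollector(div_id)
    collector.feed(html)
    collector.close()
    return [BASE_URL + href for href in collector.links]


if __name__ == "__main__":
    for url in extract_player_links(SAMPLE_HTML):
        print(url)
```

To run this against the live page, the downloaded page source (decoded to a string) would be passed to extract_player_links in place of SAMPLE_HTML. Whether the live markup nests its div tags the way this sample does is an assumption.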