Scraping tables with Python
I'm trying to scrape tables of US election data and convert them into data tables in Python, but I'm not having much luck. This is the HTML of the data I want to scrape:
<tr class="type-republican">
<th class="results-name" scope="row"><a href="xxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Republican">R</abbr></span> <span class="token token-winner"><b aria-hidden="true" class="icon icon-check"></b> <span class="icon-text">Winner</span></span> D. Trump</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">62.9%</span><span class="graph"><span class="bar"><span class="index" style="width:62.9%;"></span></span></span></span></td>
<td class="results-popular">1,306,925</td>
<td class="delegates-cell">9</td>
</tr>
<tr class="type-democrat">
<th class="results-name" scope="row"><a href="xxxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Democratic">D</abbr></span> H. Clinton</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">34.6%</span><span class="graph"><span class="bar"><span class="index" style="width:34.6%;"></span></span></span></span></td>
<td class="results-popular">718,084</td>
<td class="delegates-cell"></td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> G. Johnson</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">2.1%</span><span class="graph"><span class="bar"><span class="index" style="width:2.1%;"></span></span></span></span></td>
<td class="results-popular">43,869</td>
<td class="delegates-cell"></td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> J. Stein</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">0.4%</span><span class="graph"><span class="bar"><span class="index" style="width:0.4%;"></span></span></span></span></td>
<td class="results-popular">9,287</td>
<td class="delegates-cell"></td>
</tr>
</tbody>
</table>, <table class="results-table">
<tbody>
<tr class="type-republican">
<th class="results-name" scope="row"><a href="xxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Republican">R</abbr></span> D. Trump</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">73.4%</span><span class="graph"><span class="bar"><span class="index" style="width:73.4%;"></span></span></span></span></td>
<td class="results-popular">18,110</td>
</tr>
<tr class="type-democrat">
<th class="results-name" scope="row"><a href="xxxxxx"><span class="name-combo"><span class="token token-party"><abbr title="Democratic">D</abbr></span> H. Clinton</span></a></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">24.0%</span><span class="graph"><span class="bar"><span class="index" style="width:24.0%;"></span></span></span></span></td>
<td class="results-popular">5,908</td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> G. Johnson</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">2.2%</span><span class="graph"><span class="bar"><span class="index" style="width:2.2%;"></span></span></span></span></td>
<td class="results-popular">538</td>
</tr>
<tr class="type-independent">
<th class="results-name" scope="row"><span class="name-combo"><span class="token token-party"><abbr title="Independent">I</abbr></span> J. Stein</span></th>
<td class="results-percentage"><span class="percentage-combo"><span class="number">0.4%</span><span class="graph"><span class="bar"><span class="index" style="width:0.4%;"></span></span></span></span></td>
<td class="results-popular">105</td>
</tr>
</tbody>
And so on... My code looks like this:
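import requests
from bs4 import BeautifulSoup

Percentage = []
Count = []
page = requests.get('xxxx')
soup = BeautifulSoup(page.text, "lxml")
table = soup.find('div', class_='content-alpha')
for row in table.find_all('tr'):
    col = row.find_all('td')
    if len(col) < 2:
        continue  # skip rows without percentage and vote-count cells
    Percentage.append(col[0].get_text(strip=True))  # e.g. "62.9%"
    Count.append(col[1].get_text(strip=True))       # e.g. "1,306,925"
print(Count)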
But what I get here is information from only a few of the tables, not all of them. How can I get the information from all tables? And why do I only get information from a few of them?
I hope you understand the question.
The HTML is really big, so I'm adding a link to the website: http://www.politico.com/2016-election/results/map/president/alabama/. I want to scrape the 2016 US election data for Alabama, for every county.
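One possible reason for the missing tables is that `soup.find('div', class_='content-alpha')` returns only the first matching `div`, so any results tables outside it are never visited. The sketch below instead collects every `<table class="results-table">` with `find_all`, which is the class the posted HTML shows. It assumes every candidate row has the `results-name`, `results-percentage` and `results-popular` cells seen in the snippet, and it records a `table_index` because the posted HTML does not show which county each table belongs to (one table per county is an assumption). The function name `parse_results_tables` is hypothetical.

```python
from bs4 import BeautifulSoup


def parse_results_tables(html):
    """Hypothetical helper: parse every <table class="results-table">.

    Returns a flat list of dicts, one per candidate row. "table_index"
    records which table the row came from (assumed: one table per county).
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # find_all (not find) visits every matching table, not just the first one
    for table_index, table in enumerate(soup.find_all("table", class_="results-table")):
        for tr in table.find_all("tr"):
            name_cell = tr.find("th", class_="results-name")
            pct_cell = tr.find("td", class_="results-percentage")
            votes_cell = tr.find("td", class_="results-popular")
            if name_cell is None or pct_cell is None or votes_cell is None:
                continue  # skip rows that are not candidate rows
            party = name_cell.find("abbr")
            # The candidate name is the last text fragment in the name cell,
            # after the party letter and the optional "Winner" label.
            name = list(name_cell.stripped_strings)[-1]
            pct_text = pct_cell.find("span", class_="number").get_text(strip=True)
            rows.append({
                "table_index": table_index,
                "candidate": name,
                "party": party["title"] if party is not None else None,
                "percentage": float(pct_text.rstrip("%")),  # "62.9%" -> 62.9
                "votes": int(votes_cell.get_text(strip=True).replace(",", "")),  # "1,306,925" -> 1306925
            })
    return rows
```

To get a data table, `pandas.DataFrame(rows)` turns the list of dicts into a DataFrame, and for the live page the HTML would come from `requests.get(url).text`. Whether all county tables are present in the static HTML, rather than loaded later by JavaScript, is not confirmed by the post.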
The class 'content-alpha' doesn't appear in the data you posted here. Can you update the question with the data you want to scrape and the expected results? – Stergios
It's much easier for us to help you if you give us the URL you're trying to scrape. – wpercy
I've added the link to the website. – Extria