How can I iterate through tags with different IDs using BeautifulSoup in Python?

This is probably a simple question, but I want to iterate through the tags with id = dgrdAcquired_hyplnkacquired_0, dgrdAcquired_hyplnkacquired_1, etc.

Is there a simpler way to do this than the code I have below? The problem is that the number of these tags differs from web page to web page. I am not sure how to get the text in these tags when each web page has a different number of them.

html = """ 
<tr> 
<td colspan="3"><table class="datagrid" cellspacing="0" cellpadding="3" rules="rows" id="dgrdAcquired" width="100%"> 
<tr class="datagridH"> 
<th scope="col"><font face="Arial" color="Blue" size="2"><b>Name (RSSD ID)</b></font></th><th scope="col"><font face="Arial" color="Blue" size="2"><b>Acquisition Date</b></font></th><th scope="col"><font face="Arial" color="Blue" size="2"><b>Description</b></font></th> 
</tr><tr class="datagridI"> 
<td nowrap="nowrap"><font face="Arial" size="2"> 
<a id="dgrdAcquired_hyplnkacquired_0" href="InstitutionProfile.aspx?parID_RSSD=3557617&parDT_END=20110429">FIRST CHOICE COMMUNITY BANK                        (3557617)</a> 
</font></td><td><font face="Arial" size="2"> 
<span id="dgrdAcquired_lbldtAcquired_0">2011-04-30</span> 
</font></td><td><font face="Arial" size="2"> 
<span id="dgrdAcquired_lblAcquiredDescText_0">The acquired institution failed and disposition was arranged of by a regulatory agency. Assets were distributed to the acquiring institution.</span> 
</font></td> 
</tr><tr class="datagridAI"> 
<td nowrap="nowrap"><font face="Arial" size="2"> 
<a id="dgrdAcquired_hyplnkacquired_1" href="InstitutionProfile.aspx?parID_RSSD=104038&parDT_END=20110429">PARK AVENUE BANK, THE                         (104038)</a> 
</font></td> 
""" 
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
firm1 = soup.find('a', {"id": "dgrdAcquired_hyplnkacquired_0"})
data1 = ''.join(firm1.findAll(text=True))
print(data1)

firm2 = soup.find('a', {"id": "dgrdAcquired_hyplnkacquired_1"})
data2 = ''.join(firm2.findAll(text=True))
print(data2)


2012-03-27 myname

I would do the following, assuming that if there are n such tags, they are numbered consecutively from 0 to n-1:

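from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
i = 0
data = []
while True:
    # Look up the next link by its numbered id
    firm = soup.find('a', {"id": "dgrdAcquired_hyplnkacquired_%s" % i})
    if firm is None:
        break  # no tag with this index, so stop counting
    data.append(''.join(firm.findAll(text=True)))
    print(data[-1])
    i += 1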
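
The counting loop stops at the first missing index, so it relies on the ids being consecutive. If that cannot be guaranteed, one possible alternative (not part of the original answer) is a CSS attribute-prefix selector via `select()`. The sketch below assumes BS4 and uses a shortened HTML sample in which the second id is deliberately changed to `_2` to simulate a gap:

```python
from bs4 import BeautifulSoup

# Shortened sample; the "_2" suffix is an assumption made to simulate a gap.
html = """
<table>
<tr><td><a id="dgrdAcquired_hyplnkacquired_0" href="#">FIRST CHOICE COMMUNITY BANK (3557617)</a></td>
<td><span id="dgrdAcquired_lbldtAcquired_0">2011-04-30</span></td></tr>
<tr><td><a id="dgrdAcquired_hyplnkacquired_2" href="#">PARK AVENUE BANK, THE (104038)</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# [id^="..."] selects every <a> whose id starts with the given prefix,
# regardless of how many there are or how they are numbered.
links = soup.select('a[id^="dgrdAcquired_hyplnkacquired_"]')

# Collect the visible text of each link, trimmed of surrounding whitespace.
data = [link.get_text(strip=True) for link in links]
print(data)
```

Because the selector matches on the prefix only, the number of matching tags can differ from page to page without any change to the code.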


2012-03-27 18:19:58

Thank you, Aaron. That works perfectly. – myname

+1 for a fresh approach! – bernie

Regex is probably overkill in this particular case.
Nevertheless, here is another option:

import re
# Match the literal id prefix followed by one or more digits
soup.find_all('a', id=re.compile(r'^dgrdAcquired_hyplnkacquired_\d+$'))

Please note: s/find_all/findAll/g if you are using BS3.
Result (some whitespace removed for display purposes):

[<a href="InstitutionProfile.aspx?parID_RSSD=3557617&amp;parDT_END=20110429" 
    id="dgrdAcquired_hyplnkacquired_0">FIRST CHOICE COMMUNITY BANK (3557617)</a>, 
<a href="InstitutionProfile.aspx?parID_RSSD=104038&amp;parDT_END=20110429" 
    id="dgrdAcquired_hyplnkacquired_1">PARK AVENUE BANK, THE (104038)</a>]
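
Since the question asks for the text inside the tags, the matches can be turned into plain strings in one more step. This is a minimal sketch, assuming BS4 and a shortened version of the question's HTML; collapsing the whitespace with `split()`/`join()` is my own choice for readability, not something the original code does:

```python
import re
from bs4 import BeautifulSoup

# Shortened version of the HTML from the question, padding spaces kept.
html = """
<table>
<tr><td><a id="dgrdAcquired_hyplnkacquired_0" href="#">FIRST CHOICE COMMUNITY BANK      (3557617)</a></td>
<td><span id="dgrdAcquired_lbldtAcquired_0">2011-04-30</span></td></tr>
<tr><td><a id="dgrdAcquired_hyplnkacquired_1" href="#">PARK AVENUE BANK, THE      (104038)</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Literal id prefix followed by one or more digits.
pattern = re.compile(r"^dgrdAcquired_hyplnkacquired_\d+$")
links = soup.find_all("a", id=pattern)

# get_text() returns the tag's text; split/join collapses the padding spaces.
names = [" ".join(link.get_text().split()) for link in links]
print(names)
```

The list comprehension works for any number of matching tags, including zero, in which case `names` is simply empty.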


2012-03-27 18:37:42 bernie
