2017-06-08 1 views

Python - Using attributes to parse an XML structure

There are many examples of parsing XML by the tags in its structure, but what if (as in the example below) many of the tags share the same name?

<SoccerFeed timestamp="20161221T144346+0000">
    <SoccerDocument Type="SQUADS Latest">
        <Team country="USA">
            <Founded>1998</Founded>
            <Name>Chicago Fire</Name>
            <Player uID="p113757">
                <Name>Patrick McLain</Name>
                <Position>Goalkeeper</Position>
                <Stat Type="first_name">Patrick</Stat>
                <Stat Type="last_name">McLain</Stat>
                <Stat Type="birth_date">1988-08-22</Stat>
                <Stat Type="birth_place">Eau Claire</Stat>
                <Stat Type="first_nationality">USA</Stat>
                <Stat Type="weight">94</Stat>
                <Stat Type="height">191</Stat>
                <Stat Type="jersey_num">23</Stat>
                <Stat Type="real_position">Goalkeeper</Stat>
                <Stat Type="real_position_side">Unknown</Stat>
                <Stat Type="join_date">2016-01-18</Stat>
                <Stat Type="country">USA</Stat>
            </Player>
        </Team>
    </SoccerDocument>
</SoccerFeed>

If I only wanted to parse the elements that have a 'Stat' tag and a Type attribute of 'first_name', how would I do that?

Answers

With R and the xml2 library:

library("xml2") 
myxml<-read_xml('<SoccerFeed timestamp="20161221T144346+0000"> 
<SoccerDocument Type="SQUADS Latest"> 
<Team country="USA"> 
<Founded>1998</Founded> 
<Name>Chicago Fire</Name> 
<Player uID="p113757"> 
<Name>Patrick McLain</Name> 
<Position>Goalkeeper</Position> 
<Stat Type="first_name">Patrick</Stat> 
<Stat Type="last_name">McLain</Stat> 
<Stat Type="birth_date">1988-08-22</Stat> 
<Stat Type="birth_place">Eau Claire</Stat> 
<Stat Type="first_nationality">USA</Stat> 
<Stat Type="weight">94</Stat> 
<Stat Type="height">191</Stat> 
<Stat Type="jersey_num">23</Stat> 
<Stat Type="real_position">Goalkeeper</Stat> 
<Stat Type="real_position_side">Unknown</Stat> 
<Stat Type="join_date">2016-01-18</Stat> 
<Stat Type="country">USA</Stat> 
</Player> 
</Team> 
</SoccerDocument> 
</SoccerFeed>') 

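# get all of the Stat nodes (xml_find_all is xml2's XPath query function;
# xml_nodes belongs to rvest and is not available with xml2 alone)
statnodes <- xml_find_all(myxml, ".//Stat")
# keep only the node whose Type attribute is "first_name"
firstname <- statnodes[xml_attr(statnodes, "Type") == "first_name"]
# get text value
xml_text(firstname)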
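
Since the question asks about Python, the same two-step approach (collect every Stat node, then filter on the Type attribute) can be sketched with the standard library's xml.etree.ElementTree. This is only a minimal sketch: the trimmed sample document and the variable names are assumptions made for illustration, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: only a few Stat nodes kept)
data = """<SoccerFeed timestamp="20161221T144346+0000">
  <SoccerDocument Type="SQUADS Latest">
    <Team country="USA">
      <Name>Chicago Fire</Name>
      <Player uID="p113757">
        <Name>Patrick McLain</Name>
        <Stat Type="first_name">Patrick</Stat>
        <Stat Type="last_name">McLain</Stat>
        <Stat Type="jersey_num">23</Stat>
      </Player>
    </Team>
  </SoccerDocument>
</SoccerFeed>"""

root = ET.fromstring(data)

# Step 1: get all of the Stat nodes, wherever they sit in the tree
stat_nodes = root.iter("Stat")

# Step 2: keep only the nodes whose Type attribute is "first_name"
first_names = [node.text for node in stat_nodes if node.get("Type") == "first_name"]

print(first_names)
```

Because the tag name alone cannot tell the Stat elements apart, the filter relies on node.get("Type"), which returns None when the attribute is missing rather than raising an error.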

You can use BeautifulSoup with its XML parser (the 'xml' parser requires lxml to be installed), as in this example:

from bs4 import BeautifulSoup as bs 

data = '''<SoccerFeed timestamp="20161221T144346+0000"> 
    <SoccerDocument Type="SQUADS Latest"> 
    <Team country="USA"> 
     <Founded>1998</Founded> 
     <Name>Chicago Fire</Name> 
     <Player uID="p113757"> 
     <Name>Patrick McLain</Name> 
     <Position>Goalkeeper</Position> 
     <Stat Type="first_name">Patrick</Stat> 
     <Stat Type="last_name">McLain</Stat> 
     <Stat Type="birth_date">1988-08-22</Stat> 
     <Stat Type="birth_place">Eau Claire</Stat> 
     <Stat Type="first_nationality">USA</Stat> 
     <Stat Type="weight">94</Stat> 
     <Stat Type="height">191</Stat> 
     <Stat Type="jersey_num">23</Stat> 
     <Stat Type="real_position">Goalkeeper</Stat> 
     <Stat Type="real_position_side">Unknown</Stat> 
     <Stat Type="join_date">2016-01-18</Stat> 
     <Stat Type="country">USA</Stat> 
     </Player> 
    </Team> 
    </SoccerDocument> 
</SoccerFeed>''' 

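# Parse with the 'xml' parser so that tag names keep their original case
sub = bs(data, 'xml') 
# Find all the 'Stat' tags (find_all is the current name of findAll)
stat_tags = sub.find_all('Stat') 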
for k in stat_tags: 
    # Extract the text between 'Stat' tags 
    print(k.text) 

Output:

Patrick 
McLain 
1988-08-22 
Eau Claire 
USA 
94 
191 
23 
Goalkeeper 
Unknown 
2016-01-18 
USA 
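
The loop above prints every Stat value. To pull out only the first_name value asked for in the question, one option is an attribute predicate in the path expression. The sketch below uses the standard library's xml.etree.ElementTree rather than BeautifulSoup, so it is a swapped-in technique and not the answer's own method; the trimmed sample data and the players dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the feed from the question (assumption: only a few Stat nodes kept)
data = """<SoccerFeed timestamp="20161221T144346+0000">
  <SoccerDocument Type="SQUADS Latest">
    <Team country="USA">
      <Name>Chicago Fire</Name>
      <Player uID="p113757">
        <Name>Patrick McLain</Name>
        <Stat Type="first_name">Patrick</Stat>
        <Stat Type="last_name">McLain</Stat>
        <Stat Type="jersey_num">23</Stat>
      </Player>
    </Team>
  </SoccerDocument>
</SoccerFeed>"""

root = ET.fromstring(data)

# ElementTree's XPath subset supports [@attribute='value'] predicates,
# so the tag name and the attribute value are matched in one expression
first = root.find(".//Stat[@Type='first_name']")
print(first.text)

# Hypothetical helper structure: map each player's uID to a
# {Type: text} dictionary built from that player's Stat children
players = {
    player.get("uID"): {stat.get("Type"): stat.text for stat in player.findall("Stat")}
    for player in root.iter("Player")
}
print(players["p113757"]["last_name"])
```

Keying the Stat values by their Type attribute sidesteps the problem of identically named tags altogether. If you prefer to stay with BeautifulSoup, find_all('Stat', attrs={'Type': 'first_name'}) should apply the same kind of filter.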