2017-06-02

Beautiful Soup parsing problems

I can't parse this XML, which doesn't seem to have any references to a class.

A snippet of my code:

sock = urllib2.urlopen(l) 
link = sock.read() 

soup = BeautifulSoup(link,"xml") 

FirstNameHome=soup.find('home_probable_pitcher','first_name') 

I'd like to find the first names for both the home and the away team:

(There are only two instances, so I'm not sure whether I should use findAll.)

Here is the source as printed by soup.prettify:

LookupError: unknown encoding: <?xml version="1.0" encoding="UTF-8"?><!--Copyright 2017 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt--> 
<game id="2017/06/02/nyamlb-tormlb-1" venue="Rogers Centre" game_pk="490921" 
     time="7:07" 
     time_date="2017/06/02 7:07" 
     time_date_aw_lg="2017/06/02 7:07" 
     time_date_hm_lg="2017/06/02 7:07" 
     time_zone="ET" 
     ampm="PM" 
     first_pitch_et="" 
     away_time="7:07" 
     away_time_zone="ET" 
     away_ampm="PM" 
     home_time="7:07" 
     home_time_zone="ET" 
     home_ampm="PM" 
     game_type="R" 
     tiebreaker_sw="N" 
     original_date="2017/06/02" 
     time_zone_aw_lg="-4" 
     time_zone_hm_lg="-4" 
     time_aw_lg="7:07" 
     aw_lg_ampm="PM" 
     tz_aw_lg_gen="ET" 
     time_hm_lg="7:07" 
     hm_lg_ampm="PM" 
     tz_hm_lg_gen="ET" 
     venue_id="14" 
     scheduled_innings="9" 
     away_name_abbrev="NYY" 
     home_name_abbrev="TOR" 
     away_code="nya" 
     away_file_code="nyy" 
     away_team_id="147" 
     away_team_city="NY Yankees" 
     away_team_name="Yankees" 
     away_division="E" 
     away_league_id="103" 
     away_sport_code="mlb" 
     home_code="tor" 
     home_file_code="tor" 
     home_team_id="141" 
     home_team_city="Toronto" 
     home_team_name="Blue Jays" 
     home_division="E" 
     home_league_id="103" 
     home_sport_code="mlb" 
     day="FRI" 
     gameday_sw="P" 
     double_header_sw="N" 
     game_nbr="1" 
     tbd_flag="N" 
     venue_w_chan_loc="CAXX0504" 
     location="Toronto, Canada" 
     gameday_link="2017_06_02_nyamlb_tormlb_1" 
     away_win="30" 
     away_loss="20" 
     home_win="26" 
     home_loss="27" 
     game_data_directory="/components/game/mlb/year_2017/month_06/day_02/gid_2017_06_02_nyamlb_tormlb_1" 
     league="AA" 
     inning_state="" 
     note="" 
     status="Preview" 
     ind="S" 
     tv_station="SNET-1, MLBN (out-of-market only)"> 
    <home_probable_pitcher id="434538" first_name="Francisco" first="Francisco" last_name="Liriano" 
          last="Liriano" 
          name_display_roster="Liriano" 
          number="45" 
          throwinghand="LHP" 
          wins="2" 
          losses="2" 
          era="6.35" 
          s_wins="2" 
          s_losses="2" 
          s_era="6.35" 
          stats_season="2017" 
          stats_type="R"/> 
    <away_probable_pitcher id="501381" first_name="Michael" first="Michael" last_name="Pineda" 
          last="Pineda" 
          name_display_roster="Pineda" 
          number="35" 
          throwinghand="RHP" 
          wins="6" 
          losses="2" 
          era="3.32" 
          s_wins="6" 
          s_losses="2" 
          s_era="3.32" 
          stats_season="2017" 
          stats_type="R"/> 
    <game_media> 
     <media type="game" calendar_event_id="14-490921-2017-06-02" 
      start="2017-06-02T19:07:00-0400" 
      title="NYY @ TOR" 
      has_mlbtv="true" 
      free="NO" 
      enhanced="N" 
      media_state="media_off" 
      thumbnail="http://mediadownloads.mlb.com/mlbam/preview/nyator_490921_th_7_preview.jpg"/> 
    </game_media> 
</game> 

Please provide an example URL (the 'l' object in your example).


http://gd2.mlb.com/components/game/mlb/year_2017/month_06/day_03/gid_2017_06_03_arimlb_liamlb_1/linescore.xml


Note that the output at this URL will change after 6/03/2017.
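The game_data_directory attribute in the XML (for example /components/game/mlb/year_2017/month_06/day_02/gid_2017_06_02_nyamlb_tormlb_1) hints at how these URLs are put together. A minimal sketch of a hypothetical helper, linescore_url, that rebuilds the linescore.xml URL for another date or game; the pattern is inferred from the URLs in this thread, not from any documented MLB API, and the host may no longer serve it.

```python
from datetime import date

# Assumption: URL layout inferred from the example URLs and the game_data_directory
# attribute in this thread; it is not a documented API.
BASE = "http://gd2.mlb.com/components/game/mlb"


def linescore_url(day, away_code, home_code, game_nbr=1):
    """Build the (hypothetical) linescore.xml URL for one game on a given date."""
    gid = f"gid_{day:%Y_%m_%d}_{away_code}mlb_{home_code}mlb_{game_nbr}"
    return f"{BASE}/year_{day:%Y}/month_{day:%m}/day_{day:%d}/{gid}/linescore.xml"


print(linescore_url(date(2017, 6, 3), "ari", "mia"))
```

The team codes are the away_code/home_code values from the XML (e.g. "nya", "tor"), and game_nbr distinguishes the games of a double header.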

Answer


If we write

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

from bs4 import BeautifulSoup

l = 'http://gd2.mlb.com/components/game/mlb/year_2017/month_06/day_03/gid_2017_06_03_arimlb_miamlb_1/linescore.xml'

sock = urlopen(l)
link = sock.read()  # raw bytes of the XML document

soup = BeautifulSoup(link, "xml")  # the "xml" parser requires lxml

# find() takes only the tag name; the attribute value is then read from .attrs
FirstNameHome = soup.find('home_probable_pitcher').attrs['first_name']
print(FirstNameHome)

it prints

Edinson 
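For the home and away names together, a minimal sketch that works offline on a trimmed copy of the question's XML (the XML string and the function name probable_pitcher_first_names are illustrative assumptions). With exactly two known tag names, one find() per tag is enough; find_all() also accepts a list of tag names if you prefer a single call.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's linescore.xml (most attributes dropped)
XML = """<game id="2017/06/02/nyamlb-tormlb-1" home_team_name="Blue Jays" away_team_name="Yankees">
    <home_probable_pitcher id="434538" first_name="Francisco" last_name="Liriano"/>
    <away_probable_pitcher id="501381" first_name="Michael" last_name="Pineda"/>
</game>"""


def probable_pitcher_first_names(xml_text):
    """Map 'home'/'away' to the probable pitcher's first name (None if missing)."""
    soup = BeautifulSoup(xml_text, "xml")  # the "xml" parser requires lxml
    names = {}
    for side in ("home", "away"):
        tag = soup.find(side + "_probable_pitcher")
        # Tag.get() returns None instead of raising KeyError for a missing attribute
        names[side] = tag.get("first_name") if tag is not None else None
    return names


print(probable_pitcher_first_names(XML))
# {'home': 'Francisco', 'away': 'Michael'}

# find_all() with a list of names returns the matching tags in document order
soup = BeautifulSoup(XML, "xml")
print([t["first_name"] for t in soup.find_all(["home_probable_pitcher", "away_probable_pitcher"])])
# ['Francisco', 'Michael']
```

The per-tag find() keeps the home/away labels explicit, which the order-based find_all() result does not.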

Also,

print(soup.prettify(encoding='utf-8')) 

gives

<?xml version="1.0" encoding="utf-8"?> 
<!--Copyright 2017 MLB Advanced Media, L.P. Use of any content on this page acknowledges agreement to the terms posted here http://gdx.mlb.com/components/copyright.txt--> 
<game ampm="PM" aw_lg_ampm="PM" away_ampm="PM" away_code="ari" away_division="W" away_file_code="ari" away_league_id="104" away_loss="22" away_name_abbrev="ARI" away_sport_code="mlb" away_team_city="Arizona" away_team_id="109" away_team_name="D-backs" away_time="1:10" away_time_zone="MST" away_win="34" day="SAT" double_header_sw="N" first_pitch_et="" game_data_directory="/components/game/mlb/year_2017/month_06/day_03/gid_2017_06_03_arimlb_miamlb_1" game_nbr="1" game_pk="490927" game_type="R" gameday_link="2017_06_03_arimlb_miamlb_1" gameday_sw="P" hm_lg_ampm="PM" home_ampm="PM" home_code="mia" home_division="E" home_file_code="mia" home_league_id="104" home_loss="31" home_name_abbrev="MIA" home_sport_code="mlb" home_team_city="Miami" home_team_id="146" home_team_name="Marlins" home_time="4:10" home_time_zone="ET" home_win="21" id="2017/06/03/arimlb-miamlb-1" ind="S" inning_state="" league="NN" location="Miami, FL" note="" original_date="2017/06/03" scheduled_innings="9" status="Preview" tbd_flag="N" tiebreaker_sw="N" time="4:10" time_aw_lg="4:10" time_date="2017/06/03 4:10" time_date_aw_lg="2017/06/03 4:10" time_date_hm_lg="2017/06/03 4:10" time_hm_lg="4:10" time_zone="ET" time_zone_aw_lg="-4" time_zone_hm_lg="-4" tv_station="FS-F, MLBN (out-of-market only)" tz_aw_lg_gen="ET" tz_hm_lg_gen="ET" venue="Marlins Park" venue_id="4169" venue_w_chan_loc="USFL0316"> 
<home_probable_pitcher era="4.44" first="Edinson" first_name="Edinson" id="450172" last="Volquez" last_name="Volquez" losses="7" name_display_roster="Volquez" number="36" s_era="4.44" s_losses="7" s_wins="1" stats_season="2017" stats_type="R" throwinghand="RHP" wins="1"/> 
<away_probable_pitcher era="3.47" first="Randall" first_name="Randall" id="517414" last="Delgado" last_name="Delgado" losses="0" name_display_roster="Delgado" number="48" s_era="3.47" s_losses="0" s_wins="1" stats_season="2017" stats_type="R" throwinghand="RHP" wins="1"/> 
<game_media> 
    <media calendar_event_id="14-490927-2017-06-03" enhanced="N" free="NO" has_mlbtv="true" media_state="media_off" start="2017-06-03T16:10:00-0400" thumbnail="http://mediadownloads.mlb.com/mlbam/preview/arimia_490927_th_7_preview.jpg" title="ARI @ MIA" type="game"/> 
</game_media> 
</game> 
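Because the names live in plain attributes of two direct children of game, the standard library's xml.etree.ElementTree can read them too. This is a swapped-in alternative to Beautiful Soup, sketched on a trimmed copy of the output above (the byte string is an illustrative assumption), useful if installing bs4 and lxml is not an option.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the prettified output above (most attributes dropped)
LINESCORE = b"""<?xml version="1.0" encoding="utf-8"?>
<game id="2017/06/03/arimlb-miamlb-1">
  <home_probable_pitcher first_name="Edinson" last_name="Volquez"/>
  <away_probable_pitcher first_name="Randall" last_name="Delgado"/>
</game>"""

# Parsing bytes lets ElementTree honour the encoding declaration
root = ET.fromstring(LINESCORE)
for side in ("home", "away"):
    pitcher = root.find(side + "_probable_pitcher")  # direct child of <game>
    print(side, pitcher.get("first_name"), pitcher.get("last_name"))
```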

EDIT

I can only reproduce your error if I pass the link object (or str(soup)) to the prettify method:

soup.prettify(link) 

Well, that is not what you need: prettify accepts the arguments encoding ('utf-8', for example) and formatter ('minimal' by default), not raw content, so just write

pretty = soup.prettify() 

and it gives

>>> type(pretty) 
<type 'unicode'> 

or specify an encoding

>>> pretty = soup.prettify(encoding='utf-8') 

and it gives

>>> type(pretty) 
<type 'str'> 
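The types above are Python 2 output. A minimal sketch of the same checks on Python 3, where prettify() returns str and prettify(encoding='utf-8') returns bytes; it also reproduces the LookupError, because markup passed positionally lands in the encoding parameter. The link string is a tiny hypothetical stand-in for the downloaded document, and the "xml" parser assumes lxml is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for the downloaded linescore.xml
link = '<game><home_probable_pitcher first_name="Edinson"/></game>'
soup = BeautifulSoup(link, "xml")

pretty = soup.prettify()                        # no encoding: returns text
pretty_bytes = soup.prettify(encoding="utf-8")  # encoding given: returns bytes
print(type(pretty), type(pretty_bytes))         # <class 'str'> <class 'bytes'>

try:
    # The markup is taken as the encoding name, so encoding the output fails
    soup.prettify(link)
except LookupError as exc:
    print("LookupError:", exc)
```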

Thanks. I wasn't so concerned about the lookup error; I just needed to know how to parse the name. It looks like FirstNameHome = soup.find('home_probable_pitcher').attrs['first_name'] does the trick. I'll double-check it shortly.


@DannyW: let me know if it can be improved.