2016-10-02

Web scraping not working?

So I was looking for what I enjoy most in software, and then I found out about web scraping. I found it really amazing, so with my Python experience I got some hands-on practice with Beautiful Soup and Requests. Here is the code:

import html5lib
import requests
from bs4 import BeautifulSoup as BS

# Get all the a strings , next siblings and next siblings
def makeSoup(urls):
    url = requests.get(urls).text
    return BS(url, "html5lib")

def something(soup):
    for anchor in soup.findAll("a", {"data-type": "externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        water = str(next_sibling.string)
        water = water[0:5]
        while water != "(202)":
            next_sibling = next_sibling.nextSibling
            if next_sibling == None:
                continue
            if next_sibling.string != None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]

soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide.htm")
something(soup)
soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide_2.htm")
something(soup)
soup = makeSoup("http://dc.about.com/od/communities/a/EmbassyGuide_3.htm")
something(soup)
 

But sadly, I got every programmer's nightmare: an ERROR.

Traceback (most recent call last):
  File "C:\Users\Raj\Desktop\kunal projects\Python\listing_out_all_embassies.py", line 26, in <module>
    something(soup)
  File "C:\Users\Raj\Desktop\kunal projects\Python\listing_out_all_embassies.py", line 17, in something
    next_sibling = next_sibling.nextSibling
AttributeError: 'NoneType' object has no attribute 'nextSibling'

What am I doing wrong? I'm new to programming as well as to web scraping. Also, what are some good practices that I'm not following? Anyway, thanks for reading to the end.


That 'continue' doesn't look right. – user2357112

Answer


You have to check whether next_sibling is None before you can use next_sibling.nextSibling (and break out of the loop if it is None):

def something(soup):
    for anchor in soup.findAll("a", {"data-type": "externalLink"}):
        print(anchor.string)
        next_sibling = anchor.nextSibling
        # guard against an anchor that has no following sibling
        water = str(next_sibling.string)[0:5] if next_sibling is not None else ""
        while water != "(202)":
            # nothing left to walk: stop instead of dereferencing None
            if next_sibling is None:
                break
            next_sibling = next_sibling.nextSibling
            if next_sibling is None:
                break
            if next_sibling.string is not None:
                print(next_sibling.string)
                water = str(next_sibling.string)
                water = water[0:5]

But it can be written more concisely:

def something(soup):
    for anchor in soup.findAll("a", {"data-type": "externalLink"}):
        water = None  # create the variable so the first "while" check works
        # stop when the siblings run out or a string starting with "(202)" was printed
        while anchor is not None and water != "(202)":
            if anchor.string:
                print(anchor.string)
                water = anchor.string[:5]
            anchor = anchor.nextSibling
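
To see why the 'continue' in the question leads to the AttributeError while 'break' does not, here is a minimal sketch that runs without any network access. It does not use BeautifulSoup: Node is a hypothetical stand-in that only mimics the string and nextSibling attributes, collect_until and its on_missing parameter are names made up for this illustration, and the sample strings are invented.

```python
class Node:
    """Hypothetical stand-in for a BeautifulSoup element (not part of bs4)."""

    def __init__(self, string, nextSibling=None):
        self.string = string
        self.nextSibling = nextSibling


def collect_until(start, stop_prefix, on_missing):
    """Collect sibling strings after `start` until one begins with `stop_prefix`.

    on_missing is "continue" (the question's behaviour) or "break" (the fix).
    """
    seen = []
    node = start
    water = str(node.string)[:5]
    while water != stop_prefix:
        node = node.nextSibling  # raises AttributeError if node is already None
        if node is None:
            if on_missing == "continue":
                # jumps back to the loop condition; water has not changed,
                # so the body runs again with node still None
                continue
            break  # leaves the loop once the siblings run out
        if node.string is not None:
            seen.append(node.string)
            water = str(node.string)[:5]
    return seen


# Invented sample chains: one ends with a "(202)" string, one does not.
with_phone = Node("Embassy", Node("123 Main St", Node("(202) 555-0100")))
no_phone = Node("Embassy", Node("no phone listed"))

print(collect_until(with_phone, "(202)", "break"))
print(collect_until(no_phone, "(202)", "break"))
```

Calling collect_until(no_phone, "(202)", "continue") raises the same kind of AttributeError as in the traceback, because 'continue' only restarts the loop and the next pass dereferences None. With "break" the walk simply ends when there are no more siblings.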