Scraping multiple pages with Python

from bs4 import BeautifulSoup
import time
import urllib.request  # "import urllib" alone does not reliably expose urllib.request


class scrap(object):
    def __init__(self):
        self.urls = [
            'https://www.onthemarket.com/for-sale/property/wigan/',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=1',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=2',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=3',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=4',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=6',
        ]
        self.telephones = []

    def extract_info(self):
        for link in self.urls:
            # Plain urlopen call: no custom headers and no timeout
            data = urllib.request.urlopen(link).read()
            soup = BeautifulSoup(data, "lxml")
            # Collect the text of every <span class="call"> element
            for tel in soup.findAll("span", {"class": "call"}):
                self.telephones.append(tel.text.strip())
            time.sleep(1)
        return self.telephones


to = scrap()
print(to.extract_info())

What is going wrong? This code hangs after the second page. It should extract the phone numbers from every page listed in self.urls.


2017-12-04 FootAdministration

If you get an error, please post it as well. – csharpcoder

I tried your code and everything works fine. [Finished in 9.3s] – ventik

There is no error. The Python shell keeps working but returns nothing. I am using Spyder with Python 3.6. I waited more than 5 minutes and nothing happened. – FootAdministration

All you need to do is set headers in your request and give it a go. Try this:

from bs4 import BeautifulSoup
import requests
import time


class scrape(object):

    def __init__(self):
        self.urls = [
            'https://www.onthemarket.com/for-sale/property/wigan/',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=1',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=2',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=3',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=4',
            'https://www.onthemarket.com/for-sale/property/wigan/?page=6',
        ]
        self.telephones = []

    def extract_info(self):
        for link in self.urls:
            # Sending a User-Agent header should do the trick
            data = requests.get(link, headers={"User-Agent": "Mozilla/5.0"})
            soup = BeautifulSoup(data.text, "lxml")
            for tel in soup.find_all("span", {"class": "call"}):
                self.telephones.append(tel.text.strip())
            time.sleep(1)
        return self.telephones


crawl = scrape()
print(crawl.extract_info())
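
If you would rather stay with urllib, as in the question, the same idea can probably be expressed with the standard library alone. This is only a minimal sketch and not part of the original answer: `build_request` and `fetch` are hypothetical helper names, and the 10-second timeout is an assumed value, added so that a stalled server raises an error instead of blocking indefinitely.

```python
import urllib.request

# Assumption: the same generic browser-like User-Agent used in the answer above.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def build_request(url):
    """Return a Request object that carries the User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch(url, timeout=10):
    """Download one page; the timeout (assumed value) prevents waiting forever."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

The bytes returned by `fetch(link)` can be passed to `BeautifulSoup(data, "lxml")` exactly as in the question's loop.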


2017-12-04 10:53:08 SIM

By the way, in your case two pages worked and the rest did not, whereas in my case all I got was an empty list. After I set the headers in the request, however, it worked flawlessly, @FootAdministration. – SIM
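
Since the symptoms differed between machines (a hang in one case, an empty list in the other), it may help to record what happens for each page separately. The following is a rough sketch under stated assumptions: `fetch_all` is a hypothetical helper, and `fetch` stands for any callable that downloads a single URL and raises on failure. It is not something taken from the thread.

```python
def fetch_all(urls, fetch):
    """Call fetch(url) for each URL and record either the body or the error."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:
            # urllib's URLError and HTTPError, as well as socket timeouts,
            # all derive from OSError, so one failing page does not stop the loop.
            results[url] = exc
    return results
```

Printing the returned dictionary shows which pages returned content and which failed, provided the download function uses a timeout so that a stalled request actually raises.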

Thanks Shahin, it worked for me! Good answer! Have a nice day! – FootAdministration
