Scrapy returned no data
I tried to scrape this page:
http://www.homeimprovementpages.com.au/connect/hypowerelectrical/service/261890
And I have this code:
import scrapy

# Assumed import path: the log shows the project is named "homeimprovement"
from homeimprovement.items import HomeimprovementItem


class HipSpider(scrapy.Spider):
    name = "hip"
    allowed_domains = ["homeimprovementpages.com.au"]
    start_urls = [
        "http://www.homeimprovementpages.com.au/connect/protecelectricalservices/service/163729",
    ]

    def parse(self, response):
        # Extract the business name, contact name and phone number from the profile page
        item = HomeimprovementItem()
        item['name'] = response.xpath('//h2[@class="media-heading text-strong"]/text()').extract()
        item['contact'] = response.xpath('//div/span[.="Contact Name:"]/following-sibling::div[1]/text()').extract()
        item['phone'] = response.xpath('//div/span[.="Phone:"]/following-sibling::div[1]/text()').extract()
        yield item
And the result is:
C:\Python27\homeimprovement>scrapy crawl hip -o h.csv
2016-04-08 17:49:33 [scrapy] INFO: Scrapy 1.0.5 started (bot: homeimprovement)
2016-04-08 17:49:33 [scrapy] INFO: Optional features available: ssl, http11
2016-04-08 17:49:33 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'homeimprovement.spiders', 'FEED_FORMAT': 'csv', 'SPIDER_MODULES': ['homeimprovement.spiders'], 'FEED_URI': 'h.csv', 'BOT_NAME': 'homeimprovement'}
2016-04-08 17:49:34 [scrapy] INFO: Enabled extensions: CloseSpider, FeedExporter, TelnetConsole, LogStats, CoreStats, SpiderState
2016-04-08 17:49:34 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-04-08 17:49:34 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-04-08 17:49:34 [scrapy] INFO: Enabled item pipelines:
2016-04-08 17:49:34 [scrapy] INFO: Spider opened
2016-04-08 17:49:34 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-04-08 17:49:34 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-04-08 17:49:34 [scrapy] DEBUG: Crawled (403) <GET http://www.homeimprovementpages.com.au/connect/protecelectricalservices/service/163729> (referer: None)
2016-04-08 17:49:34 [scrapy] DEBUG: Ignoring response <403 http://www.homeimprovementpages.com.au/connect/protecelectricalservices/service/163729>: HTTP status code is not handled or not allowed
2016-04-08 17:49:34 [scrapy] INFO: Closing spider (finished)
2016-04-08 17:49:34 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 276,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 2488,
'downloader/response_count': 1,
'downloader/response_status_count/403': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2016, 4, 8, 12, 19, 34, 946000),
'log_count/DEBUG': 3,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2016, 4, 8, 12, 19, 34, 537000)}
2016-04-08 17:49:34 [scrapy] INFO: Spider closed (finished)
A CSV file was created in the spider folder, but it was empty. I can't figure out what went wrong. I hope someone can point me in the right direction.
OK, I'll try that and let you know whether it worked, thanks. – neenkart
I tried it and nothing changed. Maybe I did it wrong; I'm new to Python and Scrapy. – neenkart
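For reference, the log shows that the only request came back with HTTP 403 and that HttpErrorMiddleware discarded the response ("HTTP status code is not handled or not allowed"), so parse() never ran and the CSV stayed empty. One thing that may be worth trying, assuming the site rejects Scrapy's default User-Agent (the log does not prove this), is to set a browser-like USER_AGENT in the project's settings.py. This is only a sketch: the user-agent string below is an illustrative value, and the HTTPERROR_ALLOWED_CODES line is optional and meant for debugging only.

```python
# settings.py of the "homeimprovement" project (sketch, not a confirmed fix)

# Assumption: the server answers 403 because of Scrapy's default User-Agent.
# The string below is an example value; any realistic browser string could be used.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"

# Optional, for debugging: pass 403 responses through to parse() instead of
# dropping them, so the body of the error page can be inspected.
# Remove this once the request succeeds, otherwise empty items get exported.
HTTPERROR_ALLOWED_CODES = [403]
```

After saving the file, rerun scrapy crawl hip -o h.csv and check whether the log line changes from Crawled (403) to Crawled (200). If it stays at 403, the block is probably based on something other than the User-Agent.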