Scrapy spider not saving data

I'm trying to save basketball team schedules to a CSV file with Scrapy. I've written the following code in these files:
settings.py
BOT_NAME = 'test_project'
SPIDER_MODULES = ['test_project.spiders']
NEWSPIDER_MODULE = 'test_project.spiders'
FEED_FORMAT = "csv"
FEED_URI = "cportboys.csv"
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'test_project (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = True
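(Side note for later readers, unrelated to the error below: the log further down shows Scrapy 1.4.0, where FEED_FORMAT and FEED_URI as above are the standard way to configure an export. In Scrapy 2.1+ those two settings were deprecated in favor of a single FEEDS dict; the equivalent under the newer API would look roughly like this:)

```python
# settings.py for Scrapy >= 2.1: one FEEDS dict replaces FEED_FORMAT/FEED_URI
FEEDS = {
    "cportboys.csv": {"format": "csv"},
}
```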
khsaabot.py
import scrapy


class KhsaabotSpider(scrapy.Spider):
    name = 'khsaabot'
    allowed_domains = ['https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978']
    start_urls = ['http://https://scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/']

    def parse(self, response):
        date = response.css('.mdate::text').extract()
        opponent = response.css('.opponent::text').extract()
        place = response.css('.schedule-loc::text').extract()

        for item in zip(date, opponent, place):
            scraped_info = {
                'date': item[0],
                'opponent': item[1],
                'place': item[2],
            }

            yield scraped_info
Now, I'm not sure what is going wrong here. When I run it with "scrapy crawl khsaabot" in the terminal, I get no errors and it seems to work fine. However, just in case there is a problem with what's happening in the terminal, I'm including the output I get there:
2017-12-27 17:21:49 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: test_project)
2017-12-27 17:21:49 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'test_project', 'FEED_FORMAT': 'csv', 'FEED_URI': 'cportboys.csv', 'NEWSPIDER_MODULE': 'test_project.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['test_project.spiders']}
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-12-27 17:21:49 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-12-27 17:21:49 [scrapy.core.engine] INFO: Spider opened
2017-12-27 17:21:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-12-27 17:21:49 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https/robots.txt> (failed 1 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https/robots.txt> (failed 2 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://https/robots.txt> (failed 3 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.robotstxt] ERROR: Error downloading <GET http://https/robots.txt>: DNS lookup failed: no results for hostname lookup: https.
Traceback (most recent call last):
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/endpoints.py", line 954, in startConnectionAttempts
"no results for hostname lookup: {}".format(self._hostStr)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/> (failed 1 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/> (failed 2 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/> (failed 3 times): DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.core.scraper] ERROR: Error downloading <GET http://https//scoreboard.12dt.com/scoreboard/khsaa/kybbk17/?id=51978/>
Traceback (most recent call last):
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
defer.returnValue((yield download_func(request=request,spider=spider)))
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/defer.py", line 653, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/Users/devsandbox/anaconda3/lib/python3.6/site-packages/twisted/internet/endpoints.py", line 954, in startConnectionAttempts
"no results for hostname lookup: {}".format(self._hostStr)
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: https.
2017-12-27 17:21:49 [scrapy.core.engine] INFO: Closing spider (finished)
2017-12-27 17:21:49 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/exception_count': 6,
'downloader/exception_type_count/twisted.internet.error.DNSLookupError': 6,
'downloader/request_bytes': 1416,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 12, 27, 23, 21, 49, 579649),
'log_count/DEBUG': 7,
'log_count/ERROR': 2,
'log_count/INFO': 7,
'memusage/max': 50790400,
'memusage/startup': 50790400,
'retry/count': 4,
'retry/max_reached': 2,
'retry/reason_count/twisted.internet.error.DNSLookupError': 4,
'scheduler/dequeued': 3,
'scheduler/dequeued/memory': 3,
'scheduler/enqueued': 3,
'scheduler/enqueued/memory': 3,
'start_time': datetime.datetime(2017, 12, 27, 23, 21, 49, 323652)}
2017-12-27 17:21:49 [scrapy.core.engine] INFO: Spider closed (finished)
The output looks right to me, but I'm still new to Scrapy, so I could be missing something.

Thanks y'all
Thank you! I did what you said and it got rid of that error, but now I'm getting one in the same place called "NotImplementedError". What can I do about that? – Hunter
Never mind, after searching a bit I figured out how to fix that error, along with some others I ran into. Thanks for helping with this one! – Hunter