scrapy.selector.Selector.__init__()
expects a Response
object as first argument.
Wenn Sie einen Selektor für eine HTTP-Antwort Körper zu bauen, verwenden Sie das text=
Argument:
$ scrapy shell https://docs.python.org/2/howto/curses.html
2016-11-21 11:05:34 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-21 11:05:35 [scrapy] INFO: Spider opened
2016-11-21 11:05:35 [scrapy] DEBUG: Crawled (200) <GET https://docs.python.org/2/howto/curses.html> (referer: None)
(...)
>>>
>>> #
>>> # passing response.body (bytes) instead of a Response object fails
>>> #
>>> scrapy.Selector(response.body)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/paul/.virtualenvs/scrapy12/local/lib/python2.7/site-packages/scrapy/selector/unified.py", line 67, in __init__
text = response.text
AttributeError: 'str' object has no attribute 'text'
>>>
>>> #
>>> # use text= argument to pass response body
>>> #
>>> scrapy.Selector(text=response.body)
<Selector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
>>>
>>> scrapy.Selector(text=response.body).xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
>>>
Ein einfacher Weg ist, direkt die Antwort Objekt zu übergeben:
>>> scrapy.Selector(response).xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
und ein noch einfacherer Weg ist die Verwendung .xpath()
method on the response
instance (es ist eine bequeme Methode, die einen Selector für Sie erstellt), vorausgesetzt, Ihre Antwort ist eine HtmlResponse
oder XmlResponse
(die in der Regel für Web Scraping ist)
>>> response.xpath('//div[@class="section"]')
[<Selector xpath='//div[@class="section"]' data=u'<div class="section" id="curses-programm'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="what-is-curses"'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="the-python-curs'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="starting-and-en'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="windows-and-pad'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="displaying-text'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="attributes-and-'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="user-input">\n<h'>, <Selector xpath='//div[@class="section"]' data=u'<div class="section" id="for-more-inform'>]
Können Sie den Einzug Ihrer Code einfügen? –