How do I group XPath results?

I have HTML in which an h1, a div.article-meta, a div.feature-image and a div.article-content repeat as siblings inside a div.list-article.

I would like to group h1, div.article-meta and div.article-content so that I can loop over them and write their data row by row in my Scrapy project.

I am thinking of collecting each of them in a variable and then looping over that variable, but I am not sure how to do it.

Please advise. Thanks.

So far I have tried this:

def parse(self, response):
    now = time.strftime('%Y-%m-%d %H:%M:%S')
    hxs = scrapy.Selector(response)

    titles = hxs.xpath('//div[@class="list-article"]/h1')
    images = hxs.xpath('//div[@class="list-article"]/feature-image')
    contents = hxs.xpath('//div[@class="list-article"]/article-content')

    for i, title in titles:
        item = DapnewsItem()
        item['categoryId'] = '1'

        name = titles[i].xpath('a/text()')
        if not name:
            print('DAP => [' + now + '] No title')
        else:
            item['name'] = name.extract()[0]

        description = contents[i].xpath('p/text()')
        if not description:
            print('DAP => [' + now + '] No description')
        else:
            item['description'] = description[1].extract()

        url = titles[i].xpath("a/@href")
        if not url:
            print('DAP => [' + now + '] No url')
        else:
            item['url'] = url.extract()[0]

        imageUrl = images[i].xpath('img/@src')
        if not imageUrl:
            print('DAP => [' + now + '] No imageUrl')
        else:
            item['imageUrl'] = imageUrl.extract()[0]

        yield item

This is the error I get.

2016-10-10 Vicheanak

Hi there, I have added what I have tried so far. – Vicheanak

Let's use this HTML snippet to illustrate:

<div class="list-article">

    <h1><a href="http://www.example.com/article1.html">Title 1</a></h1>
    <div class="article-meta">Something for 1</div>
    <div class="feature-image"><img src="http://www.example.com/image1.jpg"></div>
    <div class="article-content"><p>Content 1</p></div>

    <h1><a href="http://www.example.com/article2.html">Title 2</a></h1>
    <div class="article-meta">Something for 2</div>
    <div class="feature-image"><img src="http://www.example.com/image2.jpg"></div>
    <div class="article-content"><p>Content 2</p></div>

    <h1><a href="http://www.example.com/article3.html">Title 3</a></h1>
    <div class="article-meta">Something for 3</div>
    <div class="feature-image"><img src="http://www.example.com/image3.jpg"></div>
    <div class="article-content"><p>Content 3</p></div>

</div>

You can loop over each <h1> and use XPath's following-sibling axis to look at the elements that come after it at the same level of the tree, then keep only the first match. For example, following-sibling::div[@class="feature-image"][1] selects the first <div class="feature-image"> that follows the current <h1>.

>>> import scrapy
>>> selector = scrapy.Selector(text='''<div class="list-article">
...
...     <h1><a href="http://www.example.com/article1.html">Title 1</a></h1>
...     <div class="article-meta">Something for 1</div>
...     <div class="feature-image"><img src="http://www.example.com/image1.jpg"></div>
...     <div class="article-content"><p>Content 1</p></div>
...
...     <h1><a href="http://www.example.com/article2.html">Title 2</a></h1>
...     <div class="article-meta">Something for 2</div>
...     <div class="feature-image"><img src="http://www.example.com/image2.jpg"></div>
...     <div class="article-content"><p>Content 2</p></div>
...
...     <h1><a href="http://www.example.com/article3.html">Title 3</a></h1>
...     <div class="article-meta">Something for 3</div>
...     <div class="feature-image"><img src="http://www.example.com/image3.jpg"></div>
...     <div class="article-content"><p>Content 3</p></div>
...
... </div>''')

>>> for h in selector.css('div.list-article > h1'):
...     item = {
...         'title': h.xpath('a/text()').extract_first(),
...         'image': h.xpath('''
...             following-sibling::div[@class="feature-image"][1]
...                 /img/@src''').extract_first(),
...         'content': h.xpath('''
...             following-sibling::div[@class="article-content"][1]
...                 /p/text()''').extract_first(),
...     }
...     print(item)
... 
{'content': u'Content 1', 'image': u'http://www.example.com/image1.jpg', 'title': u'Title 1'} 
{'content': u'Content 2', 'image': u'http://www.example.com/image2.jpg', 'title': u'Title 2'} 
{'content': u'Content 3', 'image': u'http://www.example.com/image3.jpg', 'title': u'Title 3'} 
>>>

2016-10-10 11:03:17

Works great, updated! Thank you very much. – Vicheanak
