Scrapy - XPath for next page
I'm having real trouble getting the XPath for the "Next Page" URL on a website.
The HTML is as follows:
<div class="pagingcont">
<div class="right margintop" id="save_search_header_popup" style="width:550px;">
<div class="left marginleft" style="padding-top:1px;">
<div class="left save_search_env"><img src="/themes/LW1/refresh/images/envelope_icon.gif" alt="Save" /> </div>
<div class="left">
Save this search and receive email alerts of new listings
<input type="text" maxlength="100" value="Name this search" onfocus="doSavedSearchFocus(this,'Name this search');" style="width:120px;height:14px;color:Gray;"/>
</div>
</div>
<div class="left save_search_btn" style="margin-right:10px;"><img class="pointer" src="/themes/LW1/refresh/images/btn_save.gif" alt="Save" onclick="showPopup(document.getElementById('save_search_header_popup'), null, 'In order to be notified of new or updated properties, you need to be registered and signed in.');return false;"/></div>
</div>
<div class="left margintop marginleft" style="cursor:pointer;height:27px;" onclick="javascript:docompare(true);">
<div class="left"><img src="//www.landwatch.com/themes/LW1/images/comparebtn_btm.gif" style="margin-bottom:0px;"> </div>
<div class="left active" style="margin-top:4px;">COMPARE</div>
</div>
<div class="clear topline"></div>
<div class="clear margin">
<b>Page </b>
<span class="active" style="padding:3px 3px 3px 4px;border:solid 1px black;">1 </span> <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=2">2</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=3">3</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=4">4</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=5">5</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=6">6</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=7">7</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=8">8</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=9">9</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=10">10</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=11">11</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=12">12</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=13">13</a> | <a href="https://www.landwatch.com/default.aspx?ct=r&type=5,37;268,6843&=&px=2000000&r.PSIZ=500%2c&pg=2">Next</a>
</div>
(The href I'm looking for is at the very bottom right, in the "Next" link; it's awkward to see here ...)
My Scrapy spider tries the following:
next_page_url = response.xpath("//div[@class='pagingcont']//span//a[text()='Next']/href")
next_page_url = response.urljoin(next_page_url)
for href in response.css('div.propName a::attr(href)'):
    url = response.urljoin(href.extract())
    yield scrapy.Request(url, callback=self.parse_product_page)
yield scrapy.Request(next_page_url, callback=self.parse)
But every time, Scrapy gives me the first page of results and then nothing else, so I don't think it is actually finding the next page. What is wrong with next_page_url?
There we go. Thank you very, very much, jschnurr. – JMP