2013-09-21 3 views
6

Ich kratze eine Website, um den Lagerstatus verschiedener Produkte zu überprüfen. Leider erfordert dies, dass tatsächlich auf der Produktseite auf "In den Einkaufswagen" geklickt wird und die Nachricht auf der nächsten Seite geprüft wird, um festzustellen, ob der Bestand verfügbar ist (d. H., Dass zwei Antworten analysiert werden müssen).Warum folgt meine Scrapy-Spinne nicht dem Request-Rückruf in meiner Artikelanalysefunktion?

Ich folgte der excellent documentation für dieses Szenario und schrieb meine Parse-Funktion, um ein Request Objekt mit einem Rückruf an meine sekundäre Parse-Funktion zurückzugeben. Diese Funktion wird jedoch selten aufgerufen. Die meisten Produkte führen dazu, dass nur "Vor der Rückgabeanforderung" im Protokoll angezeigt wird, es wird jedoch für eine kleine Portion von Produkten ordnungsgemäß aufgerufen.

Irgendwelche Hinweise, was hier falsch läuft? Mir sind die Ideen ausgegangen.

foo/spiders/atlantic_firearms_spider.py:

from scrapy.contrib.spiders import CrawlSpider, Rule 
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor 
from scrapy.selector import HtmlXPathSelector 
from scrapy.http import FormRequest 
from foo.items import AtlanticFirearmsItem 

import datetime 
import re 

class AtlanticFirearmsSpider(CrawlSpider): 
    name = "atlantic_firearms" 
    allowed_domains = ["atlanticfirearms.com"] 
    start_urls = [ 
     "http://www.atlanticfirearms.com" 
    ] 

    rules = (
     Rule(SgmlLinkExtractor(allow=['detail.html']), callback='parse_product'), 
     Rule(SgmlLinkExtractor(allow=[], deny=['/bro', '/news', '/howtobuy', '/component/search', 'askquestion'])), 
    ) 

    def parse_product(self, response): 
     hxs = HtmlXPathSelector(response) 
     product = AtlanticFirearmsItem() 
     add_to_cart = any([hxs.select("descendant-or-self::input[@name = 'addtocart']"), 
         hxs.select("descendant-or-self::input[@value = 'Add to Cart']"), 
         hxs.select("//a[text() = 'Add to Cart']")]) 
     product['url'] = response.url 
     product['as_of_time'] = datetime.datetime.now() 

     if add_to_cart: 
      # attempt to add to cart to verify availability 
      request = FormRequest.from_response(response, formname="addtocartForm", callback=self.parse_add_to_cart) 
      request.meta['product'] = product 
      print "Before return request" 
      return request 
     else: 
      product['in_stock'] = False 
      return product 

    def parse_add_to_cart(self, response): 
     print "Inside parse_add_to_cart" 
     product = response.meta['product'] 
     hxs = HtmlXPathSelector(response) 
     product['in_stock'] = not(hxs.select("//text()[contains(.,'We regret to inform you that this product')]")) 
     return product 

foo/items.py:

from scrapy.item import Item, Field 

class AtlanticFirearmsItem(Item): 
    in_stock = Field() 
    url = Field() 
    as_of_time = Field() 

bearbeiten: Hinzufügen von Log-Datei wie gewünscht:

2013-09-21 07:25:14-0500 [scrapy] INFO: Scrapy 0.18.2 started (bot: foo) 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Optional features available: ssl, http11 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Overridden settings: {'SPIDER_MODULES': ['foo.spiders'], 'BOT_NAME': 'foo'} 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRef 
reshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Spider opened 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023 
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com> (referer: None) 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cloudflare.com': <GET http://www.cloudflare.com/email-protection> 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.constantcontact.com': <GET http://www.constantcontact.com/jmml/email-marketing.jsp> 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.fdicreative.com': <GET http://www.fdicreative.com/> 
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redjacketfirearms.com': <GET https://www.redjacketfirearms.com/> 
2013-09-21 07:25:17-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-ammunition-45acp-500-round-case-detail. 
html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.h 
tml?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Filtered duplicate request: <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pisto 
l-9mm-detail.html?Itemid=0> - no more duplicates will be shown (see DUPEFILTER_CLASS) 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-223-ar15-rifle-ammo-500-round-case-deta 
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/us-palm-air-save-plate-carrier-detail.html?I 
temid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/545-x-39-russian-ak74-ammo-1080-round-case-d 
etail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/red-army-standard-7-62x39mm-360-round-range- 
pack-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-mp5-style-rifle-detail.html?Itemid=0> (
referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/wolf-ammunition-for-sale-ak47-detail.html?Item 
id=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/dsa-zm4-flat-top-ar15-carbine-dszm4cv1r-detail.html 
?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/m92-ak47-yugoslavian-7-62x39mm-bolt-hold-open- 
metal-mags-pack-of-two-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-v94-9mm-mp5-style-pistol-full-size-deta 
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/zastava-ak-47-m70b1-pap-7-62x39mm-rifles-w-2-hi-cap 
-mags-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ptr-91-gi-rifle-939-atlanticfirearms-com-detail.htm 
l?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/pap-m92-7-62x39-pistol-detail.html?Itemid=0> (refer 
er: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/content/article/86-static-pages/159-resources.html> (referer: http://www.atlan 
ticfirearms.com) 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atsconsultingcorp.com': <GET http://www.atsconsultingcorp.com/>         [52/1905] 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bullseyemarket.com': <GET http://www.bullseyemarket.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.corilam.com': <GET http://www.corilam.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'chancebrownrealestate.com': <GET http://chancebrownrealestate.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.delsolservices.com': <GET http://www.delsolservices.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.elkhornoutfitters.com': <GET http://www.elkhornoutfitters.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.frontierlogistics.com': <GET http://www.frontierlogistics.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gpstrackingkey.com': <GET http://www.gpstrackingkey.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.hanshawkennedy.com': <GET http://www.hanshawkennedy.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'worldenv.com': <GET http://worldenv.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'purgexonline.com': <GET http://purgexonline.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bumpfirestocks.com': <GET http://bumpfirestocks.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texrestaurantequipment.com': <GET http://www.texrestaurantequipment.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.houston-refinance.com': <GET http://www.houston-refinance.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'johnson-bryan.com': <GET http://johnson-bryan.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'kanesforms.com': <GET http://kanesforms.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.markfoxrealestate.com': <GET http://www.markfoxrealestate.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.mphoa.org': <GET http://www.mphoa.org/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.outfitterwebsites.com': <GET http://www.outfitterwebsites.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'outdoortrailsnetwork.com': <GET http://outdoortrailsnetwork.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.psychologicalriskservices.com': <GET http://www.psychologicalriskservices.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rcshouston.com': <GET http://www.rcshouston.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rollingcreekcarwash.com': <GET http://www.rollingcreekcarwash.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'slammc.com': <GET http://slammc.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texassaltwaterfishingguide.com': <GET http://www.texassaltwaterfishingguide.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.waynepigment.com': <GET http://www.waynepigment.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bancroftfeldman.com': <GET http://bancroftfeldman.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'elilanddesign.com': <GET http://elilanddesign.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpharms.com': <GET http://dpharms.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'contractlandstaff.com': <GET http://contractlandstaff.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'knightsplumbing.com': <GET http://knightsplumbing.com/> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ati-omni-5-56-poly-competition-m4-carbine-detail.ht 
ml?Itemid=0> (referer: http://www.atlanticfirearms.com) 
Before return request 
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/featured-not-published/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-accessories/index.php> 
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/dallas-gun-shop.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-accessories/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/component/content/?Itemid=803&id=148> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/houston-texas-gun-shop.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/california-gun-shop.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi 
rtuemart/shipping-rifles/index.php> 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/browse-our-products.html> (referer: http://www.atlanticfirearms.com/component/virtuemart 
/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0) 
Inside parse_add_to_cart 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Scraped from <200 http://www.atlanticfirearms.com/browse-our-products.html> 
     {'as_of_time': datetime.datetime(2013, 9, 21, 7, 25, 18, 365559), 
     'in_stock': True, 
     'url': 'http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0'} 
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/www.atlanticfirearms.com> (referer: http://www.atlanticfirearms.com/dallas-gun-shop.html 
) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/login-or-register/editaddress.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/privacy-policy.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/subscribe.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/links.html> (referer: http://www.atlanticfirearms.com) 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunbroker.com': <GET http://www.gunbroker.com/user/dealernetwork.asp> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.auctionarms.com': <GET http://www.auctionarms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunsamerica.com': <GET http://www.gunsamerica.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ar15.com': <GET http://www.ar15.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.olyarms.com': <GET http://www.olyarms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cheaperthandirt.com': <GET http://www.cheaperthandirt.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ammoman.com': <GET http://www.ammoman.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.net': <GET http://www.ak47.net/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atf.treas.gov': <GET http://www.atf.treas.gov/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'caag.state.ca.us': <GET http://caag.state.ca.us/firearms/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.nra.org': <GET http://www.nra.org/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.masterpiecearms.com': <GET http://www.masterpiecearms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'atlantic1.readyhosting.com': <GET http://atlantic1.readyhosting.com/programming/listview.asp?CatId=2> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vulcanarmament.com': <GET http://www.vulcanarmament.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bushmaster.com': <GET http://www.bushmaster.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rockriverarms.com': <GET http://www.rockriverarms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpmsinc.com': <GET http://dpmsinc.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.colt.com': <GET http://www.colt.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.armalite.com': <GET http://www.armalite.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redstick-firearms.com': <GET http://www.redstick-firearms.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vectorarms.com': <GET http://www.vectorarms.com/indexframe.html> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.arsenalinc.com': <GET http://www.arsenalinc.com/about.htm> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.com': <GET http://www.ak47.com/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.jldenter.com': <GET http://www.jldenter.com/store/> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.springfield-armory.com': <GET http://www.springfield-armory.com/index.shtml> 
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.dsarms.com': <GET http://www.dsarms.com/> 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT twice, forcing unclean shutdown 
+2

Vielleicht ist die URL, die Sie angefordert sind nicht in den 'allowed_domains' – ben

+0

könnten Sie Ihr Konsolenprotokoll veröffentlichen? (mit LOG_LEVEL = 'DEBUG', der Standardwert in settings.py) –

+0

@pault. Logbuch hinzugefügt wie gewünscht. Wie Sie sehen können, werden die meisten Elemente so weit wie "Vorher Rückgabeanforderung" gedruckt, verschwinden dann aber in einem schwarzen Loch, da der Aufruf "parse_add_to_cart" für sie nie aufgerufen wird. ['vector-arms-sp89-k-style-pistole-9mm'] (http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol- 9mm-detail.html) macht es aus irgendeinem Grund konsequent bis 'parse_add_to_cart'. –

Antwort

23

der Veröffentlichung meiner früheren Kommentar als Antwort.

Wie alle Ihre POST-Anfragen (von FormRequest.from_response() kommend) zu http://www.atlanticfirearms.com/browse-our-products.html weitergeleitet, sollten Sie setzen dont_filter=True:

if add_to_cart: 
     # attempt to add to cart to verify availability 
     request = FormRequest.from_response(response, formname="addtocartForm", 
         callback=self.parse_add_to_cart, dont_filter=True) 

Scrapy docs on requests Siehe:

dont_filter (boolean) -, dass dies zeigt, Anfrage sollte nicht vom Scheduler gefiltert werden. Dies wird verwendet, wenn Sie eine identische Anforderung mehrmals ausführen möchten, um den Duplikatfilter zu ignorieren.

Zusätzlich können Sie CONCURRENT_REQUESTS = 1 einstellen Artikel hinzufügen 1-by-1 in den Warenkorb gelegt (ich frage mich, wie der Server parallel Warenkorb Ergänzungen behandelt.)

+0

Danke für 'dont_filter = True', das hat mir viel Zeit gespart. – marw

Verwandte Themen