
I have a Scrapy project that uses middleware installed via pip, specifically scrapy-random-useragent.
How do I use pip to install middleware on Scrapinghub?

settings.py:

# -*- coding: utf-8 -*-

# Scrapy settings for batdongsan project 
# 
# For simplicity, this file contains only settings considered important or 
# commonly used. You can find more settings consulting the documentation: 
# 
#  http://doc.scrapy.org/en/latest/topics/settings.html 
#  http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html 
#  http://scrapy.readthedocs.org/en/latest/topics/spider-middleware.html 

BOT_NAME = 'batdongsan' 

SPIDER_MODULES = ['batdongsan.spiders'] 
NEWSPIDER_MODULE = 'batdongsan.spiders' 
FEED_EXPORT_ENCODING = 'utf-8' # make JSON output human-readable UTF-8
CLOSESPIDER_PAGECOUNT = 10 # limit the number of pages crawled
LOG_LEVEL = 'INFO' # log less verbosely

# Obey robots.txt rules 
ROBOTSTXT_OBEY = True 

# Enable or disable downloader middlewares 
# See http://scrapy.readthedocs.org/en/latest/topics/downloader-middleware.html 
DOWNLOADER_MIDDLEWARES = { 
    #'batdongsan.middlewares.MyCustomDownloaderMiddleware': 543, 
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None, 
    'random_useragent.RandomUserAgentMiddleware': 400 
} 
USER_AGENT_LIST = "agents.txt" 
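
For context, the configuration above swaps out Scrapy's built-in UserAgentMiddleware for one that stamps a random user agent onto each request. The following is only a minimal sketch of how such a downloader middleware works, not the actual scrapy-random-useragent source; the class name is made up for illustration:

# Minimal sketch of a random user-agent downloader middleware.
# NOT the scrapy-random-useragent source; for illustration only.
import random

class RandomUserAgentSketchMiddleware(object):

    def __init__(self, user_agents):
        self.user_agents = user_agents

    @classmethod
    def from_crawler(cls, crawler):
        # Load one user-agent string per line from the file named
        # by the USER_AGENT_LIST setting.
        path = crawler.settings.get('USER_AGENT_LIST')
        with open(path) as f:
            user_agents = [line.strip() for line in f if line.strip()]
        return cls(user_agents)

    def process_request(self, request, spider):
        # Overwrite the User-Agent header on every outgoing request;
        # returning None lets Scrapy continue processing as usual.
        request.headers['User-Agent'] = random.choice(self.user_agents)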

The Scrapy project runs fine on my machine.
I deploy it to Scrapinghub from a linked GitHub repository.
I get this error in the Scrapinghub logs:

File "/usr/local/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 57, in run 
    self.crawler_process.crawl(spname, **opts.spargs) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 168, in crawl 
    return self._crawl(crawler, *args, **kwargs) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 172, in _crawl 
    d = crawler.crawl(*args, **kwargs) 
    File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1445, in unwindGenerator 
    return _inlineCallbacks(None, gen, Deferred()) 
--- <exception caught here> --- 
    File "/usr/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks 
    result = g.send(result) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 95, in crawl 
    six.reraise(*exc_info) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 77, in crawl 
    self.engine = self._create_engine() 
    File "/usr/local/lib/python2.7/site-packages/scrapy/crawler.py", line 102, in _create_engine 
    return ExecutionEngine(self, lambda _: self.stop()) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 69, in __init__ 
    self.downloader = downloader_cls(crawler) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/core/downloader/__init__.py", line 88, in __init__ 
    self.middleware = DownloaderMiddlewareManager.from_crawler(crawler) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 58, in from_crawler 
    return cls.from_settings(crawler.settings, crawler) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/middleware.py", line 34, in from_settings 
    mwcls = load_object(clspath) 
    File "/usr/local/lib/python2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object 
    mod = import_module(module) 
    File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module 
    __import__(name) 
exceptions.ImportError: No module named random_useragent 

So the problem is clearly No module named random_useragent.

But I don't know how to install this module via pip on Scrapinghub.


Have you read this: https://shub.readthedocs.io/en/stable/deploying.html? –


See my answer: https://stackoverflow.com/a/43427263/4094231 – Umair

Answer


When you link a GitHub repository with Python dependencies to Scrapinghub, you need two files at the root of your repository (i.e., at the same level as your scrapy.cfg file), as sketched in the layout after this list:

  • scrapinghub.yml
  • requirements.txt
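
A minimal repository layout, assuming the batdongsan project structure implied by the settings file above (the exact location of agents.txt depends on the working directory the spider runs from), could look like this:

batdongsan/            # repository root
├── scrapy.cfg
├── scrapinghub.yml    # points Scrapinghub at the requirements file
├── requirements.txt   # pip dependencies installed during the build
├── agents.txt         # the USER_AGENT_LIST file
└── batdongsan/
    ├── settings.py
    └── spiders/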

They should contain the same things as described in the shub deploy section from their docs:

scrapinghub.yml:

requirements: 
    file: requirements.txt 

requirements.txt:

scrapy-random-useragent 
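
With both files committed, Scrapinghub's build step pip-installs everything listed in requirements.txt, so import random_useragent resolves at crawl time. If you also want to pin the project ID or the runtime stack, scrapinghub.yml can carry those keys as well; a sketch, where 12345 is a placeholder project ID and the stack name should be checked against Scrapinghub's documentation:

projects:
  default: 12345
stacks:
  default: scrapy:1.3
requirements:
  file: requirements.txt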