2017-03-16

Cannot access an API with username and password in Scrapy. This curl request works:

https://user:[email protected]/v1/convert_from.json/?from=1000000&to=SGD&amount=AED,AUD,BDT&inverse=True 

But this Scrapy request does not work:

yield scrapy.Request("https://justanalyticspteltd65986537:[email protected]/v1/convert_from.json/?from=1000000&to=SGD&amount=AED,AUD,BDT&inverse=True") 

It returns this error: 

Traceback (most recent call last): 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\internet\defer.py", line 1297, in _inlineCallbacks 
    result = result.throwExceptionIntoGenerator(g) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\python\failure.py", line 389, in throwExceptionIntoGenerator 
    return g.throw(self.type, self.value, self.tb) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\scrapy\core\downloader\middleware.py", line 43, in process_request 
    defer.returnValue((yield download_func(request=request,spider=spider))) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\scrapy\utils\defer.py", line 45, in mustbe_deferred 
    result = f(*args, **kw) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\scrapy\core\downloader\handlers\__init__.py", line 65, in download_request 
    return handler.download_request(request, spider) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 61, in download_request 
    return agent.download_request(request) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\scrapy\core\downloader\handlers\http11.py", line 286, in download_request 
    method, to_bytes(url, encoding='ascii'), headers, bodyproducer) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\web\client.py", line 1596, in request 
    endpoint = self._getEndpoint(parsedURI) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\web\client.py", line 1580, in _getEndpoint 
    return self._endpointFactory.endpointForURI(uri) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\web\client.py", line 1456, in endpointForURI 
    uri.port) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\scrapy\core\downloader\contextfactory.py", line 59, in creatorForNetloc 
    return ScrapyClientTLSOptions(hostname.decode("ascii"), self.getContext()) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\internet\_sslverify.py", line 1201, in __init__ 
    self._hostnameBytes = _idnaBytes(hostname) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\twisted\internet\_sslverify.py", line 87, in _idnaBytes 
    return idna.encode(text) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\idna\core.py", line 355, in encode 
    result.append(alabel(label)) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\idna\core.py", line 276, in alabel 
    check_label(label) 
    File "d:\kerja\hit\python~1\justan~1\curren~1\lib\site-packages\idna\core.py", line 253, in check_label 
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label))) 
InvalidCodepoint: Codepoint U+003A at position 28 of u'xxxxxxxxxxxxxxxxxxxxxxxxxxxx:[email protected]' not allowed 

You have your credentials in the second URL. And a code 500 means an error occurred on the server while processing your request, so something is wrong there. – Granitosaurus


I have updated my question. So far I have not disabled Crawlera. –

Answer

Scrapy does not support HTTP authentication embedded in the URL. Use the built-in HttpAuthMiddleware instead.

in settings.py:

DOWNLOADER_MIDDLEWARES = { 
    'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware': 811, 
} 

in the spider:

from scrapy.spiders import CrawlSpider 

class SomeIntranetSiteSpider(CrawlSpider): 

    http_user = 'someuser' 
    http_pass = 'somepass' 
    name = 'intranet.example.com' 

    # .. rest of the spider code omitted ... 
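If you prefer not to enable the middleware, an alternative (not from the original answer) is to send the `Authorization` header yourself on each request: for HTTP Basic auth the header value is simply the base64-encoded `user:password` pair. A sketch with hypothetical credentials:

```python
import base64

# Hypothetical credentials for illustration
user, password = "someuser", "somepass"

# HTTP Basic auth: the header value is base64("user:password")
token = base64.b64encode(f"{user}:{password}".encode()).decode()
auth_header = {"Authorization": f"Basic {token}"}

# In a spider you would then yield, e.g.:
# yield scrapy.Request(
#     "https://apilayer.net/v1/convert_from.json/?from=1000000&to=SGD",
#     headers=auth_header,
# )
```

This keeps the credentials out of the URL entirely, so the hostname passed to TLS verification stays clean.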

Note that there is an open pull request with an implementation for reading credentials from URLs: https://github.com/scrapy/scrapy/pull/1466 –
