2016-12-05

Urllib error when using BioPython

I am currently working on a project for which I need to download a few thousand citations from PubMed. I am using Biopython and have written this code:

from Bio import Entrez 
from Bio import Medline 
from pandas import * 
from sys import argv 
import os 

Entrez.email = "email" 
df = read_csv("/Users/.../Desktop/sr_dataset/adhd/excluded/adhdExcluded.csv") 
i=0 
withoutMesh = 0 
withoutMeshID = "" 
withoutAbstract = 0 
withoutAbstractID = "" 
path = '/Users/.../Desktop/sr_dataset/adhd/excluded' 

for index, row in df.iterrows():
    print (row.id)
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=str(row.id))
    records = Medline.parse(handle)
    for record in records:
        try:
            abstract = str(record["AB"])
        except:
            abstract = "none"
            withoutAbstract = withoutAbstract +1
            withoutAbstractID = withoutAbstractID + str(row.id) + "\n"
        try:
            title = str(record["TI"])
        except:
            title = "none"
        try:
            mesh = str(record["MH"])
        except:
            mesh = "none"
            withoutMesh = withoutMesh +1
            withoutMeshID = withoutMeshID + str(row.id) + "\n"
    filename= str(row.id) + '.txt'
    filename = os.path.join(path, filename)
    file = open(filename, "w")
    output = "title: "+str(title) + "\n\n" + "abstract: "+str(abstract) + "\n\n" + "mesh: "+str(mesh) + "\n\n"
    file.write(output)
    file.close()
    print (i)
    i=i+1

filename = os.path.join(path, "overview.txt") 
file = open(filename, "w") 
output = "Without MeSH terms:" + str(withoutMesh) + "\n" + "ID's: "+str(withoutMeshID) + "\n\n" + "Without abstract: "+str(withoutAbstract) + "\n" + "ID's: "+str(withoutAbstractID) 
file.write(output) 
file.close() 

The code works for the first few hundred rows of the table, but then execution stops with the following error:

Traceback (most recent call last): 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 1254, in do_open 
    h.request(req.get_method(), req.selector, req.data, headers) 
    File "/Users/.../anaconda/lib/python3.5/http/client.py", line 1106, in request 
    self._send_request(method, url, body, headers) 
    File "/Users/.../anaconda/lib/python3.5/http/client.py", line 1151, in _send_request 
    self.endheaders(body) 
    File "/Users/.../anaconda/lib/python3.5/http/client.py", line 1102, in endheaders 
    self._send_output(message_body) 
    File "/Users/.../anaconda/lib/python3.5/http/client.py", line 934, in _send_output 
    self.send(msg) 
    File "/Users/.../anaconda/lib/python3.5/http/client.py", line 877, in send 
    self.connect() 
    File "/Users/.../anaconda/lib/python3.5/http/client.py", line 1260, in connect 
    server_hostname=server_hostname) 
    File "/Users/.../anaconda/lib/python3.5/ssl.py", line 377, in wrap_socket 
    _context=self) 
    File "/Users/.../anaconda/lib/python3.5/ssl.py", line 752, in __init__ 
    self.do_handshake() 
    File "/Users/.../anaconda/lib/python3.5/ssl.py", line 988, in do_handshake 
    self._sslobj.do_handshake() 
    File "/Users/.../anaconda/lib/python3.5/ssl.py", line 633, in do_handshake 
    self._sslobj.do_handshake() 
ConnectionResetError: [Errno 54] Connection reset by peer 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "/Users/.../Desktop/sr_dataset/ace_inhibitor/excluded/pumbedMedline.py", line 18, in <module> 
    handle = Entrez.efetch(db="pubmed",rettype="medline",retmode="text", id=str(row.id)) 
    File "/Users/.../anaconda/lib/python3.5/site-packages/biopython-1.68-py3.5-macosx-10.6-x86_64.egg/Bio/Entrez/__init__.py", line 180, in efetch 
    return _open(cgi, variables, post=post) 
    File "/Users/.../anaconda/lib/python3.5/site-packages/biopython-1.68-py3.5-macosx-10.6-x86_64.egg/Bio/Entrez/__init__.py", line 524, in _open 
    handle = _urlopen(cgi) 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 163, in urlopen 
    return opener.open(url, data, timeout) 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 466, in open 
    response = self._open(req, data) 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 484, in _open 
    '_open', req) 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 444, in _call_chain 
    result = func(*args) 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 1297, in https_open 
    context=self._context, check_hostname=self._check_hostname) 
    File "/Users/.../anaconda/lib/python3.5/urllib/request.py", line 1256, in do_open 
    raise URLError(err) 
urllib.error.URLError: <urlopen error [Errno 54] Connection reset by peer> 

Here are the first few rows of the CSV file:

id 
10029645 
10073846 
10078088 
10080457 
10088066 
... 
Is that the full traceback? What is the error message?

@cricket_007 I have added the full message. – testing

See the comments on this post: http://stackoverflow.com/q/21334966/2308683

Answer

Biopython does follow the "up to three requests per second" rule to avoid abusing the NCBI servers, but you have missed the first bullet point of the guidelines in our tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.html):

"For any series of more than 100 requests, do this at weekends or outside USA peak times. This is up to you to obey."

That said, you will sometimes get intermittent errors from Entrez, and handling them with a try/except block and a retry is recommended. There is an example in the tutorial.
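
A retry wrapper along these lines might look like the sketch below. It is not the tutorial's example. The helper name `call_with_retry` and the `retries` and `delay` defaults are assumptions chosen for illustration. It retries only on `URLError` and `ConnectionError`, the kinds of exception shown in the traceback above.

```python
import time
from urllib.error import URLError


def call_with_retry(func, *args, retries=3, delay=5.0, **kwargs):
    """Call func(*args, **kwargs), retrying on transient network errors.

    Hypothetical helper: the retries and delay defaults are illustrative,
    not values recommended by NCBI or the Biopython tutorial.
    """
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except (URLError, ConnectionError) as err:
            if attempt == retries:
                # Out of attempts: let the caller see the original error.
                raise
            print("Attempt %d failed (%s); retrying in %s s" % (attempt, err, delay))
            time.sleep(delay)
```

In the loop from the question, the fetch would then become `handle = call_with_retry(Entrez.efetch, db="pubmed", rettype="medline", retmode="text", id=str(row.id))`. A pause between attempts gives the server time to recover, but it does not replace following the usage guideline quoted above.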
