2016-05-03

Scrapy XPath: select a specific element from file-extension.net

I have some trouble understanding XPath.

I'm trying to scrape all the magic numbers from http://file-extension.net

Let's take this link as an example: http://file-extension.net/seeker/file_extension_c10

Part of the page source:

<table border=4 RULES=ROWS FRAME=HSIDES width=728> 
      <tr class="tabhead"> 
       <td></td> 
       <td><b>Website</b></td> 
       <td><b>&nbsp;EXT&nbsp;</b></td> 
       <td><b>&nbsp;Filetype description</b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</td> 
      </tr> 

<tr class="rre"><td>&nbsp;<img src="images/icon-filext.png" width="16" height="16"> &nbsp;</td><td><a href="http://filext.com/file-extension/C10">FILExt</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_irig'>IRIG</a> 106 <a class='fesl' href='program_extension_original'>Original</a> <a class='fesl' href='program_extension_recording'>Recording</a> <a class='fesl' href='program_extension_file'>File</a> (<a class='fesl' href='program_extension_range'>Range</a> <a class='fesl' href='program_extension_commanders'>Commanders</a> <a class='fesl' href='program_extension_council'>Council</a>)</td></tr> 

<tr class="rro"><td>&nbsp;<img src="images/icon-fsorg.png" width="16" height="16"> &nbsp;</td><td><a href="http://www.file-extensions.org/c10-file-extension">File Extensions</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_irig'>IRIG</a> 106 <a class='fesl' href='program_extension_original'>original</a> <a class='fesl' href='program_extension_recording'>recording</a> <a class='fesl' href='program_extension_file'>file</a></td></tr> 

<tr class="rre"><td>&nbsp;<img src="images/icon-dotwhat.png" width="16" height="16"> &nbsp;</td><td><a href="http://dotwhat.net/c10/9166/">DotWhat</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_archive'>Archive</a> <a class='fesl' href='program_extension_file'>File</a> <a class='fesl' href='program_extension_part'>Part</a> 10</td></tr> 

<tr class="rro"><td>&nbsp;<img src="images/icon-fsorg.png" width="16" height="16"> &nbsp;</td><td><a href="http://www.file-extensions.org/c10-file-extension">File Extensions</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_multi'>Multi</a>-<a class='fesl' href='program_extension_volume'>volume</a> ACE <a class='fesl' href='program_extension_compressed'>compressed</a> <a class='fesl' href='program_extension_file'>file</a> <a class='fesl' href='program_extension_archive'>archive</a></td></tr> 

<tr class="rre"><td>&nbsp;<img src="images/icon-trid.png" width="16" height="16"> &nbsp;</td><td><a href="http://mark0.net/soft-trid-e.html">TrID</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_virtual'>Virtual</a> MC-10 <a class='fesl' href='program_extension_tape'>tape</a> <a class='fesl' href='program_extension_image'>image</a><br>&nbsp;<b><small>Header Hexdump</b>: <span class='hexdump'>&nbsp;55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 &nbsp;</span></small></td></tr> 

<tr class="rro"><td>&nbsp;<img src="images/icon-filext.png" width="16" height="16"> &nbsp;</td><td><a href="http://filext.com/file-extension/C10">FILExt</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_winace'>WinAce</a> <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_file'>File</a> <a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_portion'>Portion</a> of <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_file'>File</a> (e-<a class='fesl' href='program_extension_merge'>merge</a> <a class='fesl' href='program_extension_gmbh'>GmbH</a>)</td></tr> 

<tr class="rre"><td>&nbsp;<img src="images/icon-fileinfo.png" width="16" height="16"> &nbsp;</td><td><a href="http://www.fileinfo.com/extension/c10">FileInfo</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_winace'>WinAce</a> <a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_archive'>Archive</a> <a class='fesl' href='program_extension_part'>Part</a> 10</td></tr> 

      </table> 

I only want to get the filetype description from TrID (the row that has the hex value).

The problem is that I don't know how to do it, because every word of the filetype description is a link.

Here is my code:

for sel in response.xpath('//table[@border=4]'): 
    hex = sel.xpath('//span[@class="hexdump"]/text()').extract_first(default='Rien t nul') 
    if len(hex) > 7: 
        ext = sel.xpath('//a[text()="TrID"]/@href.a[@class="fesl"]/text()').extract() 
        print "Nom : %s Hex %s " % (ext,hex) 

Of course //a[text()="TrID"]/@href.a[@class="fesl"] doesn't work, but this is what I want:

If you find a link whose name contains "TrID", give me its file description.

Any ideas?

Answer

'//td[./a[contains(text(), "TrID")]]/following-sibling::td[2]//text()' 

Just change TrID in that expression to whatever other link text you want to match.
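To see what the expression returns, here is a minimal sketch that runs it outside Scrapy. It assumes lxml is installed (Scrapy's selectors are built on it) and uses a shortened, tidied copy of two rows from the table in the question. The names `SAMPLE`, `ANSWER_XPATH` and `extract_trid` are made up for this illustration, and splitting on the "Header Hexdump:" label is just one possible way to separate the description from the hex bytes.

```python
from lxml import html

# Shortened, tidied copy of two rows from the question's table (illustrative only).
SAMPLE = """
<table border=4>
<tr class="rro"><td>&nbsp;</td><td><a href="http://filext.com/file-extension/C10">FILExt</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_winace'>WinAce</a> <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_file'>File</a></td></tr>
<tr class="rre"><td>&nbsp;</td><td><a href="http://mark0.net/soft-trid-e.html">TrID</a></td><td>&nbsp;<a class='fesl' href='file_extension_c10'>C10</a>&nbsp;</td><td>&nbsp;<a class='fesl' href='program_extension_virtual'>Virtual</a> MC-10 <a class='fesl' href='program_extension_tape'>tape</a> <a class='fesl' href='program_extension_image'>image</a><br>&nbsp;<b><small>Header Hexdump</small></b>: <span class='hexdump'>&nbsp;55 55 55 55 &nbsp;</span></td></tr>
</table>
"""

# The expression from the answer: find the cell whose link text contains "TrID",
# step two cells to the right, and collect every text node below that cell.
ANSWER_XPATH = '//td[./a[contains(text(), "TrID")]]/following-sibling::td[2]//text()'


def extract_trid(page_source):
    """Return (description, hexdump) for the TrID row, or (None, None) if absent."""
    tree = html.fromstring(page_source)
    parts = tree.xpath(ANSWER_XPATH)
    if not parts:
        return None, None
    # Join the fragments and collapse whitespace; str.split() also splits on
    # the non-breaking spaces that &nbsp; is decoded to.
    text = " ".join("".join(parts).split())
    # Everything before the label is the description, everything after is the hex dump.
    description, _, hexdump = text.partition("Header Hexdump:")
    return description.strip(), hexdump.strip()


description, hexdump = extract_trid(SAMPLE)
print(description)
print(hexdump)
```

In a Scrapy callback the same expression can be passed to `response.xpath(...)` and read with `.extract()`, as in the question's code. Note that this sketch merges the text of all matching rows, so a page with several TrID rows would need a loop over the matching cells instead.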

Thanks a lot, sir! – Soroboruo

You're welcome. Remember to accept the answer if it helped you. – eLRuLL