Scrapy XPath: selecting a specific element from file-extension.net
I'm having some trouble understanding XPath.
I'm trying to scrape all the magic numbers from http://file-extension.net
Let's take this link as an example: http://file-extension.net/seeker/file_extension_c10
Part of the source code:
<table border=4 RULES=ROWS FRAME=HSIDES width=728>
<tr class="tabhead">
<td></td>
<td><b>Website</b></td>
<td><b> EXT </b></td>
<td><b> Filetype description</b> </td>
</tr>
<tr class="rre"><td> <img src="images/icon-filext.png" width="16" height="16"> </td><td><a href="http://filext.com/file-extension/C10">FILExt</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_irig'>IRIG</a> 106 <a class='fesl' href='program_extension_original'>Original</a> <a class='fesl' href='program_extension_recording'>Recording</a> <a class='fesl' href='program_extension_file'>File</a> (<a class='fesl' href='program_extension_range'>Range</a> <a class='fesl' href='program_extension_commanders'>Commanders</a> <a class='fesl' href='program_extension_council'>Council</a>)</td></tr>
<tr class="rro"><td> <img src="images/icon-fsorg.png" width="16" height="16"> </td><td><a href="http://www.file-extensions.org/c10-file-extension">File Extensions</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_irig'>IRIG</a> 106 <a class='fesl' href='program_extension_original'>original</a> <a class='fesl' href='program_extension_recording'>recording</a> <a class='fesl' href='program_extension_file'>file</a></td></tr>
<tr class="rre"><td> <img src="images/icon-dotwhat.png" width="16" height="16"> </td><td><a href="http://dotwhat.net/c10/9166/">DotWhat</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_archive'>Archive</a> <a class='fesl' href='program_extension_file'>File</a> <a class='fesl' href='program_extension_part'>Part</a> 10</td></tr>
<tr class="rro"><td> <img src="images/icon-fsorg.png" width="16" height="16"> </td><td><a href="http://www.file-extensions.org/c10-file-extension">File Extensions</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_multi'>Multi</a>-<a class='fesl' href='program_extension_volume'>volume</a> ACE <a class='fesl' href='program_extension_compressed'>compressed</a> <a class='fesl' href='program_extension_file'>file</a> <a class='fesl' href='program_extension_archive'>archive</a></td></tr>
<tr class="rre"><td> <img src="images/icon-trid.png" width="16" height="16"> </td><td><a href="http://mark0.net/soft-trid-e.html">TrID</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_virtual'>Virtual</a> MC-10 <a class='fesl' href='program_extension_tape'>tape</a> <a class='fesl' href='program_extension_image'>image</a><br> <b><small>Header Hexdump</b>: <span class='hexdump'> 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 55 </span></small></td></tr>
<tr class="rro"><td> <img src="images/icon-filext.png" width="16" height="16"> </td><td><a href="http://filext.com/file-extension/C10">FILExt</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_winace'>WinAce</a> <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_file'>File</a> <a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_portion'>Portion</a> of <a class='fesl' href='program_extension_compressed'>Compressed</a> <a class='fesl' href='program_extension_file'>File</a> (e-<a class='fesl' href='program_extension_merge'>merge</a> <a class='fesl' href='program_extension_gmbh'>GmbH</a>)</td></tr>
<tr class="rre"><td> <img src="images/icon-fileinfo.png" width="16" height="16"> </td><td><a href="http://www.fileinfo.com/extension/c10">FileInfo</a></td><td> <a class='fesl' href='file_extension_c10'>C10</a> </td><td> <a class='fesl' href='program_extension_winace'>WinAce</a> <a class='fesl' href='program_extension_split'>Split</a> <a class='fesl' href='program_extension_archive'>Archive</a> <a class='fesl' href='program_extension_part'>Part</a> 10</td></tr>
</table>
I only want to get the file type description from TrID (the row that has the hexadecimal value).
The problem is that I don't know how to do it, because every word of the "Filetype description" is a link.
Here is my code:
for sel in response.xpath('//table[@border=4]'):
    hex = sel.xpath('//span[@class="hexdump"]/text()').extract_first(default='Rien t nul')
    if len(hex) > 7:
        ext = sel.xpath('//a[text()="TrID"]/@href.a[@class="fesl"]/text()').extract()
        print "Nom : %s Hex %s " % (ext,hex)
Of course, //a[text()="TrID"]/@href.a[@class="fesl"]
doesn't work, but this is what I want:
if you find a link whose name contains "TrID", give me its file description.
Any ideas?
Thanks a lot, sir! – Soroboruo
You're welcome. Remember to accept the answer if it helped you. – eLRuLL