Scrapy: extracting values from tag attributes

I am handling a JSON request that has a div as its value. Now I want to get only the values from the data-content-value attribute of

<li id="term_100800962" data-content-value='{"nl_term_id":100800962,"c_price_from":33415,"nd_price_discount":0,"nl_tour_id":1017864,"nl_hotel_id":[49316],"d_start":"2017-04-12","d_end":"2017-04-17"}' >

and store them as 'id', 'price' and 'dates', but I cannot find a way to do this.

Is there an easy way?


2017-01-31 Kostas

In [2]: from scrapy.selector import Selector 

In [3]: text = """<li id="term_100800962" data-content-value='{"nl_term_id":100800962,"c_price_from":33415,"nd_price_discount":0,"nl_tour_id":1017864,"nl_hotel_id":[49316],"d_start":"2017-04-12","d_end":"2017-04-17"}' >"""

In [4]: sel = Selector(text=text) 

In [5]: data_string = sel.xpath('//li/@data-content-value').extract_first() 

In [6]: import json 

In [7]: json.loads(data_string) 
Out[7]: 
{'c_price_from': 33415, 
'd_end': '2017-04-17', 
'd_start': '2017-04-12', 
'nd_price_discount': 0, 
'nl_hotel_id': [49316], 
'nl_term_id': 100800962, 
'nl_tour_id': 1017864}

First get the attribute's value as a string, then use json.loads() to convert it into a Python dict.
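To get from that dict to the 'id', 'price' and 'dates' fields the question asks for, a minimal sketch could look like the following. The helper name `to_item` is hypothetical, and the mapping is an assumption: 'id' is taken from `nl_term_id`, 'price' from `c_price_from`, and 'dates' from `d_start` and `d_end`. The string is the attribute value from the question, so the sketch runs without Scrapy.

```python
import json

# Raw attribute value, copied from the <li> element in the question.
data_string = (
    '{"nl_term_id":100800962,"c_price_from":33415,"nd_price_discount":0,'
    '"nl_tour_id":1017864,"nl_hotel_id":[49316],'
    '"d_start":"2017-04-12","d_end":"2017-04-17"}'
)


def to_item(raw):
    """Convert one data-content-value string into a small dict.

    Assumed mapping (not confirmed by the question):
    id -> nl_term_id, price -> c_price_from, dates -> (d_start, d_end).
    """
    record = json.loads(raw)
    return {
        "id": record["nl_term_id"],
        "price": record["c_price_from"],
        "dates": (record["d_start"], record["d_end"]),
    }


item = to_item(data_string)
print(item)
```

In the shell session above, `to_item(data_string)` could be called directly on the value returned by `extract_first()`.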

This URL returns a JSON response. We should load the whole response as JSON and then select the information we need:

In [11]: fetch('https://dovolena.invia.cz/direct/tour_search/ajax-next-boxes/?nl_country_id%5B0%5D=28&nl_locality_id%5B0%5D=19&d_start_from=23.01.2017&d_end_to=19.04.2017&nl_transportation_id%5B0%5D=3&sort=nl_sell&page=1&getOptionsCount=true&base_url=https%3A%2F%2Fdovolena.invia.cz%2F')

In [12]: j = json.loads(response.text)
In [15]: j['boxes_html']  # returns the HTML stored in the JSON response
In [15]: from scrapy.selector import Selector

In [16]: sel = Selector(text=j['boxes_html'])  # load the HTML into a selector

In [17]: datas = sel.xpath('//li/@data-content-value').extract()  # all attribute values as a list of strings
In [21]: [json.loads(d) for d in datas]  # parse each string into a dict
# This returns a list of dicts, one per json.loads(d) call; use json.loads(d)['d_end'] to access an element.

Out:

[{'c_price_from': 15690, 
    'd_end': '2017-04-16', 
    'd_start': '2017-04-09', 
    'nd_price_discount': 27, 
    'nl_hotel_id': [24810], 
    'nl_term_id': 93902083, 
    'nl_tour_id': 839597}, 
{'c_price_from': 27371, 
    'd_end': '2017-04-17', 
    'd_start': '2017-04-12', 
    'nd_price_discount': 4, 
    'nl_hotel_id': [49316], 
    'nl_term_id': 100804770, 
    'nl_tour_id': 1017864}, 
{'c_price_from': 32175, 
    'd_end': '2017-04-17', 
    'd_start': '2017-04-12', 
    'nd_price_discount': 4, 
    'nl_hotel_id': [49316], 
    'nl_term_id': 100800962, 
    'nl_tour_id': 1017864},
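Since the question asks to keep the id, price and dates separately, one possible way to split the parsed list is sketched below. The two input strings are shortened stand-ins for what `datas` holds (only the fields used here, with values taken from the output above), and the names `ids`, `prices` and `dates` are illustrative assumptions.

```python
import json

# Shortened stand-ins for the strings in `datas`; the real values carry more keys.
datas = [
    '{"nl_term_id":93902083,"c_price_from":15690,'
    '"d_start":"2017-04-09","d_end":"2017-04-16"}',
    '{"nl_term_id":100804770,"c_price_from":27371,'
    '"d_start":"2017-04-12","d_end":"2017-04-17"}',
]

# Parse every attribute string once, then pull out one field per list.
records = [json.loads(d) for d in datas]

ids = [r["nl_term_id"] for r in records]
prices = [r["c_price_from"] for r in records]
dates = [(r["d_start"], r["d_end"]) for r in records]

# The three lists stay aligned by position, so zip() pairs them up again.
for term_id, price, (start, end) in zip(ids, prices, dates):
    print(term_id, price, start, end)
```

Parsing once into `records` avoids calling json.loads() again for every field.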


2017-01-31 15:55:57

I get an error when I try that. – Kostas

I would really appreciate it if you could help me. This is the code I am running in the scrapy shell: [link](http://pastebin.com/MYuER5xf) – Kostas

OK, please check the last error, which I think is the one I am facing. I wrote above that I want to store these values separately, but it does not let me. I think this is the last step; sorry for my mistakes. [link](http://imgur.com/a/brMVq) – Kostas
