Scrapy ItemLoader: combining items
I am trying to use ItemLoader to combine three fields into an array of objects like the following:
[
    {
        "site_title": "Some Site Title",
        "anchor_text": "Click Here",
        "link": "http://example.com/page"
    }
]
As the JSON output below shows, the loader instead groups all values of each field together into one list per field.
How should I approach this so that the JSON output is an array of objects like the one I am looking for?
Spider file:
import scrapy
from linkfinder.items import LinkfinderItem
from scrapy.loader import ItemLoader


class LinksSpider(scrapy.Spider):
    name = "links"
    allowed_domains = ["wpseotest.com"]
    start_urls = ["https://wpseotest.com"]

    def parse(self, response):
        # One loader for the whole page: each add_xpath() call
        # collects every match for that field into a single list.
        l = ItemLoader(item=LinkfinderItem(), response=response)
        l.add_xpath('site_title', '//title/text()')
        l.add_xpath('anchor_text', '//a//text()')
        l.add_xpath('link', '//a/@href')
        return l.load_item()
items.py:
import scrapy
from scrapy import Field


class LinkfinderItem(scrapy.Item):
    # Fields filled in by the ItemLoader in the spider
    site_title = Field()
    anchor_text = Field()
    link = Field()
JSON output:
[
{"anchor_text": ["Globex Corporation", "Skip to content", "Home", "About", "Globex News", "Events", "Contact Us", "3999 Mission Boulevard,\r", "San Diego, CA 92109", "This is a test scheduled\u00a0post.", "Test Title", "Globex Subsidiary Ice Cream Inc. Creates Chicken Wing\u00a0Flavor", "Globex Inc.", "\r\n", "Blog at WordPress.com."], "link": ["https://wpseotest.com/", "#content", "https://wpseotest.com/", "https://wpseotest.com/about/", "https://wpseotest.com/globex-news/", "https://wpseotest.com/events/", "https://wpseotest.com/contact-us/", "http://maps.google.com/maps?z=16&q=3999+mission+boulevard,+san+diego,+ca+92109", "https://wpseotest.com/2016/08/19/this-is-a-test-scheduled-post/", "https://wpseotest.com/2016/06/28/test-title/", "https://wpseotest.com/2015/10/18/globex-subsidiary-ice-cream-inc-creates-chicken-wing-flavor/", "https://wpseotest.wordpress.com", "https://wordpress.com/?ref=footer_blog"], "site_title": ["Globex Corporation \u2013 We make things better, or, sometimes, worse."]}
]
You can use item pipelines to build the output you want.
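To illustrate the pipeline suggestion, here is a minimal sketch, not a definitive implementation. The names `expand_links`, `ExpandLinksPipeline`, and the output file `links_expanded.json` are hypothetical. It assumes the spider yields one item whose `anchor_text` and `link` lists line up position by position. That is not the case in the sample output above, where the two lists differ in length because `//a//text()` can return several text nodes for one anchor, so the spider would first need to extract exactly one text value per `<a>` element.

```python
import json


def expand_links(item):
    """Turn one item holding parallel lists into one dict per link."""
    # ItemLoader stores every field as a list, so site_title is a one-element list.
    titles = item.get("site_title") or [""]
    site_title = titles[0]
    texts = item.get("anchor_text", [])
    links = item.get("link", [])
    # zip() stops at the shorter list, so surplus entries are silently dropped.
    return [
        {"site_title": site_title, "anchor_text": text.strip(), "link": link}
        for text, link in zip(texts, links)
    ]


class ExpandLinksPipeline:
    """Hypothetical pipeline: collects expanded rows and writes them on close."""

    def __init__(self, path="links_expanded.json"):
        self.path = path  # output file name is an assumption
        self.rows = []

    def process_item(self, item, spider):
        # dict() accepts both plain dicts and scrapy.Item instances.
        self.rows.extend(expand_links(dict(item)))
        return item  # pass the original item on unchanged

    def close_spider(self, spider):
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.rows, fh, indent=2)


sample = {
    "site_title": ["Some Site Title"],
    "anchor_text": ["Click Here"],
    "link": ["http://example.com/page"],
}
print(expand_links(sample))
```

In a real project such a class would be enabled through the `ITEM_PIPELINES` setting. Writing the file in `close_spider` is one possible design; because the original item is returned unchanged, the regular feed export would still contain the grouped lists.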