2017-11-09

Web scraping multiple pages - Python

I have the following code, and it works quite well thanks to everyone's help here. I searched for a relevant thread that answers my question but could not find one, so here goes.

How can I add multiple pages to this code so that the results from each page are written to the CSV file?

Here are a few of the pages I want to add (there would be more than just these extra 3). Thank you for your help.

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL'

'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'

Below is the code:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 


#setting my_url to the website
my_url = 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1'

#Opening up connection, grabbing the page 
uClient = uReq(my_url) 

#naming uClient to page_html 
page_html = uClient.read() 

#closing uClient 
uClient.close() 

#this does my html parsing 
page_soup = soup(page_html, "html.parser") 

#setting container to capture where the actual info is using inspect element 
#grabs each product 
containers = page_soup.findAll("li",{"class":"srp_res_row plp"}) 
store_locator = page_soup.findAll("div", {"itemprop":"address"}) 

filename = "product.csv" 
f = open(filename, "w") 

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n"

f.write(headers) 

for container in containers:
    for store_location in store_locator:
        street_address = store_location.findAll("span", {"itemprop":"streetAddress"})
        store_city = store_location.findAll("span", {"itemprop":"addressLocality"})
    title_container = container.div.div
    unit_size = title_container.text
    size_dim = container.findAll("div", {"class":"srp_label srp_font_14"})
    unit_container = container.li
    unit_type = unit_container.text
    online_price = container.findAll("div", {"class":"srp_label alt-price"})
    reg_price = container.findAll("div", {"class":"reg-price"})

    for item in zip(unit_size,size_dim,unit_container,online_price,reg_price,street_address,store_city):
        csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
        f.write(csv)

#closing the file so every row is flushed to disk
f.close()

Here is the HTML:

<li class="srp_res_row plp">
    <div class="srp_res_clm srp_clm160">
     <div class="srp_label plp">Small</div>
     <div class="srp_v-space_3"></div>
     <div class="srp_label srp_font_14" style="padding-left: 5px;">5' x 10'</div>
     <div class="srp_v-space_3"></div>
    </div>
    <div class="srp_res_clm srp_clm120">
     <ul class="srp_list">
      <li>Outside unit/Drive-up access</li>
     </ul>
    </div>
    <div class="srp_res_clm srp_clm90">
     <div class="srp_label">$1<span class="srp_label_symbol">†</span></div>
     <div class="srp_v-space_10">1st Month</div>
    </div>
    <div class="srp_res_clm srp_clm90">
     <div class="srp_label alt-price">$56/mo.</div>
     <div class="online-special">Online Special<span class="srp_label_symbol">†</span></div>
     <div class="srp_v-space_15"></div>
     <div class="reg-price">$70 In-store</div>
    </div>
    <div class="srp_res_clm srp_clm100 srp_vcenter"><a class="srp_continue unit-no-deposit" data-deposit-amount="0" data-deposit-days="0" data-features="Outside unit/Drive-up access" data-marketing-size="5x10" data-ppk="altproduct_price" data-promotionid="132" data-siteid="2334" data-size-description="5' x 10'" data-sizeid="613573" data-wc2-unit="false" href="/ReservationDetails.aspx?st=2334&amp;sz=613573&amp;key=[rnd]&amp;location=&amp;plp=1&amp;rk=&amp;ismi=1&amp;sp=Charlotte%7c35.2270869%7c-80.8431267&amp;clp=1"><img alt="Continue" src="/images/srp-cont-new-80.png" style="width: 80px; height: 32px"/></a></div>
</li>

Store the URLs in a list, loop over each URL, scrape the page, and write the results to the CSV. – Ali

@Ali - thanks for the quick reply. Would you mind showing me how to do that? –

Please see the answer below. – Ali

Answer

Code:

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

# list of the pages to scrape
urls = ['https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28206-self-storage/2334?lat=35.23552&lng=-80.83296&clp=1&sp=Charlotte|35.2270869|-80.8431267&ismi=1' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28205-self-storage/1796?PID=PSLocalSearch&CID=1341&CHID=LL' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28215-self-storage/2079?PID=PSLocalSearch&CID=1341&CHID=LL' 
    , 'https://www.publicstorage.com/north-carolina/self-storage-charlotte-nc/28213-self-storage/2441?PID=PSLocalSearch&CID=1341&CHID=LL'] 

filename = "product.csv" 
f = open(filename, "w")  # "w" mode already truncates an existing file
num = 0 

headers = "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n" 

f.write(headers) 

for my_url in urls: 
    # Opening up connection, grabbing the page 
    uClient = uReq(my_url) 

    # naming uClient to page_html 
    page_html = uClient.read() 

    # closing uClient 
    uClient.close() 

    # this does my html parsing 
    page_soup = soup(page_html, "html.parser") 

    # setting container to capture where the actual info is using inspect element 
    # grabs each product 
    containers = page_soup.findAll("li", {"class": "srp_res_row plp"}) 
    store_locator = page_soup.findAll("div", {"itemprop": "address"}) 

    f.write("website " + str(num) + ": \n")
    for container in containers:
        for store_location in store_locator:
            street_address = store_location.findAll("span", {"itemprop": "streetAddress"})
            store_city = store_location.findAll("span", {"itemprop": "addressLocality"})
            title_container = container.div.div
            unit_size = title_container.text
            size_dim = container.findAll("div", {"class": "srp_label srp_font_14"})
            unit_container = container.li
            unit_type = unit_container.text
            online_price = container.findAll("div", {"class": "srp_label alt-price"})
            reg_price = container.findAll("div", {"class": "reg-price"})

        # zip() walks unit_size one character at a time, so item[0] is only
        # the first letter of the label (e.g. "S" for "Small")
        for item in zip(unit_size, size_dim, unit_container, online_price, reg_price, street_address, store_city):
            csv = item[0] + "," + item[1].text + "," + item[2] + "," + item[3].text + "," + item[4].text + "," + item[5].text + "," + item[6].text + "\n"
            f.write(csv)
    num += 1

# close the file so every row is flushed to disk
f.close()
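
A possible variation, not part of the code above: Python's csv module takes care of quoting, and tagging each row with the URL it came from keeps every line of the file in the same shape, which the "website N:" marker lines do not. In this sketch, write_all, scrape_rows, fake_scrape, and the example.com URLs are all made-up names for illustration; scrape_rows stands in for the BeautifulSoup extraction, and a stub is used here so the example runs without network access.

```python
import csv
import io

FIELDS = ["unit_size", "size_dim", "unit_type", "online_price",
          "reg_price", "street_address", "store_city"]


def write_all(urls, scrape_rows, out):
    """Write one CSV row per unit, prefixed with the URL it came from.

    scrape_rows is a hypothetical callable: it takes a URL and returns a
    list of 7-item rows in FIELDS order (the BeautifulSoup part goes there).
    """
    writer = csv.writer(out)
    writer.writerow(["source_url"] + FIELDS)
    count = 0
    for url in urls:
        for row in scrape_rows(url):
            writer.writerow([url] + list(row))
            count += 1
    return count


def fake_scrape(url):
    # Stand-in data so the example runs offline; not real scraped output.
    return [("Small", "5' x 10'", "Outside unit/Drive-up access",
             "$56/mo.", "$70 In-store", "1001 N Tryon St", "Charlotte")]


buffer = io.StringIO()
written = write_all(["https://example.com/a", "https://example.com/b"],
                    fake_scrape, buffer)
lines = buffer.getvalue().splitlines()
```

To write to disk instead of an in-memory buffer, the same function could be called inside `with open("product.csv", "w", newline="") as f:`, which also closes the file automatically.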

Output (the contents of product.csv):

unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city 
website 0: 
S,5' x 10',Outside unit/Drive-up access,$55/mo.,$68 In-store,1001 N Tryon St,Charlotte 
M,5' x 15',Outside unit/Drive-up access,$68/mo.,$84 In-store,1001 N Tryon St,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$101/mo.,$126 In-store,1001 N Tryon St,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$154/mo.,$187 In-store,1001 N Tryon St,Charlotte 
L,10' x 25',Outside unit/Drive-up access,$167/mo.,$208 In-store,1001 N Tryon St,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$172/mo.,$209 In-store,1001 N Tryon St,Charlotte 
L,15' x 20',Outside unit/Drive-up access,$193/mo.,$241 In-store,1001 N Tryon St,Charlotte 
website 1: 
S,5' x 5',Outside unit/Drive-up access,$50/mo.,$60 In-store,3710 Monroe Road,Charlotte 
S,5' x 10',Outside unit/Drive-up access,$53/mo.,$66 In-store,3710 Monroe Road,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$55/mo.,$68 In-store,3710 Monroe Road,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$97/mo.,$118 In-store,3710 Monroe Road,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$100/mo.,$124 In-store,3710 Monroe Road,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$128/mo.,$159 In-store,3710 Monroe Road,Charlotte 
M,10' x 10',Climate Controlled,$129/mo.,$157 In-store,3710 Monroe Road,Charlotte 
L,20' x 30',Outside unit/Drive-up access,$292/mo.,$356 In-store,3710 Monroe Road,Charlotte 
website 2: 
S,5' x 10',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$36/mo.,$45 In-store,5301 N Sharon Amity Rd,Charlotte 
S,5' x 5',Outside unit/Drive-up access,$42/mo.,$53 In-store,5301 N Sharon Amity Rd,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$80/mo.,$99 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$87/mo.,$108 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$100/mo.,$124 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 10',Outside unit/Drive-up access,$100/mo.,$125 In-store,5301 N Sharon Amity Rd,Charlotte 
M,10' x 10',Climate Controlled,$112/mo.,$139 In-store,5301 N Sharon Amity Rd,Charlotte 
L,10' x 25',Outside unit/Drive-up access,$121/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 10',Climate Controlled,$123/mo.,$153 In-store,5301 N Sharon Amity Rd,Charlotte 
L,20' x 20',Outside unit/Drive-up access,$135/mo.,$168 In-store,5301 N Sharon Amity Rd,Charlotte 
website 3: 
S,3' x 3',Inside unit/1st Floor,$17/mo.,$22 In-store,4730 N Tryon St,Charlotte 
S,5' x 5',Outside unit/Drive-up access,$35/mo.,$43 In-store,4730 N Tryon St,Charlotte 
S,5' x 10',Outside unit/Drive-up access,$39/mo.,$49 In-store,4730 N Tryon St,Charlotte 
S,10' x 5',Outside unit/Drive-up access,$40/mo.,$50 In-store,4730 N Tryon St,Charlotte 
M,5' x 15',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte 
M,20' x 5',Outside unit/Drive-up access,$65/mo.,$81 In-store,4730 N Tryon St,Charlotte 
M,10' x 10',Outside unit/Drive-up access,$66/mo.,$82 In-store,4730 N Tryon St,Charlotte 
L,10' x 15',Outside unit/Drive-up access,$84/mo.,$105 In-store,4730 N Tryon St,Charlotte 
L,10' x 20',Outside unit/Drive-up access,$136/mo.,$169 In-store,4730 N Tryon St,Charlotte 
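
If the file is kept in the format shown above, the "website N:" lines have to be treated as group markers when reading it back. Below is a minimal sketch of one way to do that with the standard library only; the function name group_by_website is made up for this example, and the sample lines are copied from the output above.

```python
import csv
import io


def group_by_website(lines):
    """Split product.csv-style lines into {marker: [rows]}, skipping the header."""
    groups = {}
    current = None
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        first = row[0].strip()
        if len(row) == 1 and first.startswith("website"):
            # A marker line such as "website 0: " starts a new group.
            current = first.rstrip(":").strip()
            groups[current] = []
        elif current is not None:
            # Data rows belong to the most recent marker; the header row
            # comes before any marker and is therefore skipped.
            groups[current].append([cell.strip() for cell in row])
    return groups


sample = io.StringIO(
    "unit_size, size_dim1, unit_type, online_price, reg_price, street_address, store_city\n"
    "website 0: \n"
    "S,5' x 10',Outside unit/Drive-up access,$55/mo.,$68 In-store,1001 N Tryon St,Charlotte\n"
    "M,5' x 15',Outside unit/Drive-up access,$68/mo.,$84 In-store,1001 N Tryon St,Charlotte\n"
    "website 1: \n"
    "S,5' x 5',Outside unit/Drive-up access,$50/mo.,$60 In-store,3710 Monroe Road,Charlotte\n"
)
groups = group_by_website(sample)
```

With a real file, an open file object can be passed in place of the in-memory sample, since csv.reader accepts any iterable of lines.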

@Ali - Thanks! –