2016-10-05 3 views

Python .replace runs twice

I'm still fairly new to Python and using .replace for the first time, and I'm running into a strange problem.

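import numpy as np
import requests
from bs4 import BeautifulSoup as bs4  # the code below calls BeautifulSoup under the name `bs4`

url_base = 'http://sfbay.craigslist.org/search/eby/apa'
params = dict(bedrooms=1, is_furnished=1)
rsp = requests.get(url_base, params=params)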

# BS4 can quickly parse our text, make sure to tell it that you're giving html 
html = bs4(rsp.text, 'html.parser') 

# BS makes it easy to look through a document 
# find_all will pull entries that fit your search criteria. 
# Note that we have to use brackets to define the `attrs` dictionary 
# Because "class" is a special word in python, so we need to give a string. 
apts = html.find_all('p', attrs={'class': 'row'}) 

# We can see that there's a consistent structure to a listing. 
# There is a 'time', a 'name', a 'housing' field with size/n_brs, etc. 
this_appt = apts[15] 

# So now we'll pull out a couple of things we might be interested in: 
# It looks like "housing" contains size information. We'll pull that. 
# Note that `findAll` returns a list, since there's only one entry in 
# this HTML, we'll just pull the first item. 
size = this_appt.findAll(attrs={'class': 'housing'})[0].text 
print('{} this is the size'.format(size))

def find_size_and_brs(size):
    split = size.strip('/- ').split(' - ')
    print(len(split))
    if 'br' in split[0] and 'ft2' in split[0]:
        print('We made it into 1')
        n_brs = split[0].replace('br -', '')
        this_size = split[0].replace('ft2 -', '')
    elif 'br' in split[0]:
        print('we are in 2')
        # It's the n_bedrooms
        n_brs = split[0].replace('br', '')
        this_size = np.nan
    elif 'ft2' in split[0]:
        print('we are in 3')
        # It's the size
        this_size = split[0].replace('ft2', '')
        n_brs = np.nan
        print(n_brs)
        print(this_size)
    return float(this_size), float(n_brs)

this_size, n_brs = find_size_and_brs(size)

This outputs:

We made it into 1 

      800ft2 - 

      1br - 

I can't figure out why it prints the data twice, replacing the data a single time for each data point.

Thoughts? Thanks


What do you mean by "replacing the data a single time"? What exactly do you expect the output to be instead? – BrenBarn


It doesn't work for me. I get 'ValueError: invalid literal for float(): 1br - 800'. Are you sure you get this result with this code? Maybe you have different code? – furas


@BrenBarn I'm looking for an output of 1 800, basically the data without the br or ft2. Does that make sense? –



Now it works for me. I made some changes using strip and split, and marked them with the comment # <- here

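import numpy as np
import requests
from bs4 import BeautifulSoup as bs4  # the code below calls BeautifulSoup under the name `bs4`

url_base = 'http://sfbay.craigslist.org/search/eby/apa'
params = dict(bedrooms=1, is_furnished=1)
rsp = requests.get(url_base, params=params)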

# BS4 can quickly parse our text, make sure to tell it that you're giving html 
html = bs4(rsp.text, 'html.parser') 

# BS makes it easy to look through a document 
# find_all will pull entries that fit your search criteria. 
# Note that we have to use brackets to define the `attrs` dictionary 
# Because "class" is a special word in python, so we need to give a string. 
apts = html.find_all('p', attrs={'class': 'row'}) 

# We can see that there's a consistent structure to a listing. 
# There is a 'time', a 'name', a 'housing' field with size/n_brs, etc. 
this_appt = apts[15] 

# So now we'll pull out a couple of things we might be interested in: 
# It looks like "housing" contains size information. We'll pull that. 
# Note that `findAll` returns a list, since there's only one entry in 
# this HTML, we'll just pull the first item. 
size = this_appt.findAll(attrs={'class': 'housing'})[0].text 
#print(size) , 'this is the size' 

def find_size_and_brs(size):
    split = size.strip().split(' - ')  # <- here strip()
    #print(len(split))
    if 'br' in split[0] and 'ft2' in split[0]:
        print('We made it into 1')
        two = split[0].split('\n')  # <- here split()
        n_brs = two[0].replace('br -', '').strip()  # <- here two[0] and strip()
        this_size = two[1].replace('ft2 -', '').strip()  # <- here two[1] and strip()
        #print('> {} <'.format(n_brs))
        #print('> {} <'.format(this_size))
    elif 'br' in split[0]:
        print('we are in 2')
        # It's the n_bedrooms
        n_brs = split[0].replace('br', '')
        this_size = np.nan
    elif 'ft2' in split[0]:
        print('we are in 3')
        # It's the size
        this_size = split[0].replace('ft2', '')
        n_brs = np.nan
        print(n_brs)
        print(this_size)
    return float(this_size), float(n_brs)

this_size, n_brs = find_size_and_brs(size)
print('> {} <'.format(this_size))
print('> {} <'.format(n_brs))
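
To see why the original version seemed to run .replace twice, here is a minimal sketch. The `sample` string is made up and only assumes the shape of the scraped text (two values on separate lines, each followed by ` -`); the real page may differ. Neither `strip('/- ')` nor `split(' - ')` touches the newlines, so `split[0]` is still the whole string, and each `.replace` call returns a copy of that whole string with only its own token removed.

```python
# Hypothetical sample; the real Craigslist text may differ in its whitespace.
sample = '\n    1br -\n    800ft2 -\n'

# strip('/- ') removes only '/', '-' and ' ' from the ends, so the newlines stay.
# ' - ' never occurs, because each '-' is followed by a newline, not a space.
split = sample.strip('/- ').split(' - ')

n_brs = split[0].replace('br -', '')       # removes only 'br -'
this_size = split[0].replace('ft2 -', '')  # removes only 'ft2 -'

print(len(split))
print(repr(n_brs))
print(repr(this_size))
```

Both results still contain the other value, which is why both lines show up in each print and why float() cannot convert them.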

P.S. I use > and < in print to make spaces visible.
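
A related trick, as a small sketch: the built-in repr() shows newlines and tabs as escape sequences, which the markers alone do not spell out. The `value` string here is a made-up example, not data from the page.

```python
value = '  800\n'  # hypothetical string with hidden whitespace

marked = '>{}<'.format(value)  # markers show where the text starts and ends
escaped = repr(value)          # repr shows the newline as \n

print(marked)
print(escaped)
```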


Works great! Plus I love the tip, it makes the printed output so easy to read! –