Python function partial string match

I have a pandas DataFrame that looks like this:

a       b       c
foo     bar     baz
bar     foo     baz
foobar  barfoo  baz

I have defined the following function in Python:

def somefunction(row):
    if row['a'] == 'foo' and row['b'] == 'bar':
        return 'yes'
    return 'no'

It works perfectly fine. But I need to make a small change to the if condition so that it also accepts partial string matches.

I have tried several combinations, but I can't get it to work. I get the following error:

("'str' object has no attribute 'str'", 'occurred at index 0')

The function I've tried is:

def somenewfunction(row):
    if row['a'].str.contains('foo') == True and row['b'] == 'bar':
        return 'yes'
    return 'no'


2017-11-02 Kvothe

Use str.contains to build a boolean mask and then numpy.where:

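import numpy as np  # needed for np.where below

# True where column a contains 'foo' and column b equals 'bar'
m = df['a'].str.contains('foo') & (df['b'] == 'bar')
print(m)
0     True
1    False
2    False
dtype: bool

# Map the boolean mask to 'yes' / 'no'
df['new'] = np.where(m, 'yes', 'no')
print(df)
        a       b    c  new
0     foo     bar  baz  yes
1     bar     foo  baz   no
2  foobar  barfoo  baz   no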
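
One caveat that may be worth sketching here: if column a can hold missing values, str.contains can return a missing value instead of False for those rows, which may not combine with & or np.where the way you expect. The example below is only an illustration, not part of the original answer. It assumes a hypothetical fourth row with a missing value in a. The na=False argument treats missing values as non-matches, and regex=False makes 'foo' a literal substring rather than a regular expression.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical row with a missing value
df = pd.DataFrame({
    'a': ['foo', 'bar', 'foobar', None],
    'b': ['bar', 'foo', 'barfoo', 'bar'],
    'c': ['baz', 'baz', 'baz', 'baz'],
})

# na=False: rows with a missing value count as "no match"
# regex=False: 'foo' is matched as a literal substring, not as a regex
m = df['a'].str.contains('foo', na=False, regex=False) & (df['b'] == 'bar')

df['new'] = np.where(m, 'yes', 'no')
print(df)
```

With these arguments the mask is a plain boolean Series, so the row with the missing value simply ends up as 'no'.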

Or, if you also need to check column b for substrings:

m = df['a'].str.contains('foo') & df['b'].str.contains('bar')
df['new'] = np.where(m, 'yes', 'no')
print(df)
        a       b    c  new
0     foo     bar  baz  yes
1     bar     foo  baz   no
2  foobar  barfoo  baz  yes

If you need a custom function, that works too, but it will be slower on larger DataFrames:

# Partial match on column a, exact match on column b
def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

print(df.apply(somefunction, axis=1))
0    yes
1     no
2     no
dtype: object

# Partial match on both columns
def somefunction(row):
    if 'foo' in row['a'] and 'bar' in row['b']:
        return 'yes'
    return 'no'

print(df.apply(somefunction, axis=1))
0    yes
1     no
2    yes
dtype: object
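
As a possible middle ground between apply and the vectorized mask, a plain list comprehension over the two columns keeps the Python `in` logic while avoiding the per-row Series that apply(axis=1) builds. This is a sketch that is not part of the original answer, and no timings are claimed for it. The DataFrame is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': ['foo', 'bar', 'foobar'],
    'b': ['bar', 'foo', 'barfoo'],
    'c': ['baz', 'baz', 'baz'],
})

# Walk both columns in parallel; a and b are plain Python strings here,
# so the `in` operator does the partial match
df['new'] = [
    'yes' if 'foo' in a and b == 'bar' else 'no'
    for a, b in zip(df['a'], df['b'])
]
print(df)
```

Like the custom function, this assumes every value in a is a string; a missing value would raise a TypeError on the `in` test.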

Timings:

df = pd.concat([df]*1000).reset_index(drop=True)

def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

In [269]: %timeit df['new'] = df.apply(somefunction, axis=1)
10 loops, best of 3: 60.7 ms per loop

In [270]: %timeit df['new1'] = np.where(df['a'].str.contains('foo') & (df['b'] == 'bar'), 'yes', 'no')
100 loops, best of 3: 3.25 ms per loop

df = pd.concat([df]*10000).reset_index(drop=True)

def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

In [272]: %timeit df['new'] = df.apply(somefunction, axis=1)
1 loop, best of 3: 614 ms per loop

In [273]: %timeit df['new1'] = np.where(df['a'].str.contains('foo') & (df['b'] == 'bar'), 'yes', 'no')
10 loops, best of 3: 23.5 ms per loop
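
The %timeit lines need IPython. A rough equivalent using the standard timeit module is sketched below. The helper names with_apply and with_mask and the repeat count are arbitrary choices for this illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Sample data from the question, repeated to 3,000 rows
small = pd.DataFrame({
    'a': ['foo', 'bar', 'foobar'],
    'b': ['bar', 'foo', 'barfoo'],
    'c': ['baz', 'baz', 'baz'],
})
df = pd.concat([small] * 1000).reset_index(drop=True)

def somefunction(row):
    if 'foo' in row['a'] and row['b'] == 'bar':
        return 'yes'
    return 'no'

def with_apply():
    # Row-wise Python function
    return df.apply(somefunction, axis=1)

def with_mask():
    # Vectorized boolean mask plus np.where
    m = df['a'].str.contains('foo') & (df['b'] == 'bar')
    return np.where(m, 'yes', 'no')

# Both approaches should agree before their speed is compared
assert (with_apply().to_numpy() == with_mask()).all()

n = 5  # number of calls per measurement
print('apply:', timeit.timeit(with_apply, number=n) / n, 's per call')
print('mask: ', timeit.timeit(with_mask, number=n) / n, 's per call')
```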


2017-11-02 12:12:32 jezrael

Ah yes. It's the "in row" part that I was missing. Thanks a lot! – Kvothe

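Your exception is probably caused by the fact that you write

if row['a'].str.contains('foo')==True

Inside apply with axis=1, row['a'] is a plain Python str, which has no .str accessor and no contains method either. Drop '.str' and use the in operator instead:

if 'foo' in row['a']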
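
To see where the error message comes from, the small sketch below compares a whole column with a single cell. The variable names column and cell are illustrative only, and the DataFrame is rebuilt from the question's sample data.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'a': ['foo', 'bar', 'foobar'],
    'b': ['bar', 'foo', 'barfoo'],
    'c': ['baz', 'baz', 'baz'],
})

column = df['a']        # a pandas Series, which provides the .str accessor
cell = df.iloc[0]['a']  # a single value, like row['a'] inside apply(axis=1)

print(type(column).__name__)  # Series
print(type(cell).__name__)    # str
print(hasattr(cell, 'str'))   # False: a plain str has no .str attribute
print('foo' in cell)          # True: substring test on a plain str
```

So .str.contains belongs on the column, while the in operator is the matching test for a single value.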


2017-11-02 12:29:58 alkanen
