How do you create a new column by comparing two columns in the same DataFrame with Python?

I have the DataFrame shown below. df:

col_1 col_2 
EDU facebook 
EDU google 
EDU google_usa 
EDU tabula 
EDU xyz 
EDU abc 
IAR facebook 
IAR google
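For anyone who wants to reproduce the question, the sample frame above can be rebuilt roughly as follows. This is only a sketch: it assumes pandas is installed and that both columns hold plain strings, which the post does not state explicitly.

```python
import pandas as pd

# Sample data copied from the table above; string dtype is an assumption.
df = pd.DataFrame({
    "col_1": ["EDU", "EDU", "EDU", "EDU", "EDU", "EDU", "IAR", "IAR"],
    "col_2": ["facebook", "google", "google_usa", "tabula",
              "xyz", "abc", "facebook", "google"],
})

print(df)
```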

If col_1 is 'EDU' and col_2 is 'facebook' or 'google', new_col should contain the same string, i.e. facebook or google. If col_2 is 'google_usa' or 'tabula', new_col should contain 'gusa', and if col_2 holds any other string, new_col should contain 'others' in the same DataFrame. If col_1 is 'IAR' and col_2 is 'facebook', new_col should contain facebook, and for any other string in col_2 it should contain 'other' in the same DataFrame.

Expected output:

col_1 col_2  new_col 
EDU facebook facebook 
EDU google  google 
EDU google_usa gusa 
EDU tabula  gusa 
EDU xyz   others 
EDU abc   others 
IAR facebook facebook 
IAR google  others

I tried the code below, but it did not work. Please help me with this. Thanks in advance.

if df['col_1'].str.contains('EDU').any(): 

     df['new_col'] = ['facebook' if 'facebook' in x else 
          'google' if 'google' == x else 
          'gcusa_tb' if 'taboola' in x else 
          'gcusa_tb' if 'google_cusa' in x else 
          'Others' for x in df['col_2']]


2017-03-20 S.Akhil

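import numpy as np

# Boolean masks, one per condition
is_edu = df.col_1 == 'EDU'
g_or_f = df.col_2.isin(['google', 'facebook'])
g_or_t = df.col_2.isin(['google_usa', 'tabula'])
is_iar = df.col_1 == 'IAR'
is_fac = df.col_2 == 'facebook'

# assign returns a new DataFrame; df itself is not modified
df.assign(
    new_col=np.where(
        is_edu,
        # EDU rows: keep google/facebook, map google_usa/tabula to 'gusa'
        np.where(
            g_or_f, df.col_2,
            np.where(g_or_t, 'gusa', 'other')
        ),
        # all other rows: only IAR + facebook is kept
        np.where(is_iar & is_fac, 'facebook', 'other')
    )
)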

    col_1  col_2 new_col 
0 EDU facebook facebook 
1 EDU  google google 
2 EDU google_usa  gusa 
3 EDU  tabula  gusa 
4 EDU   xyz  other 
5 EDU   abc  other 
6 IAR facebook facebook 
7 IAR  google  other


2017-03-20 16:52:16 piRSquared

Just adding this for future reference, for others stumbling across this post: this works perfectly for the example. But nested 'np.where' calls are always hard for other people to follow. Performance and efficiency are great, but readability can suffer. – MattR

@MattR also for posterity: this problem is all about nested if/then/else. If readability is a priority, you can wrap the 'np.where' in a nicer function. – piRSquared

I would use a few NumPy commands:
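One way to flatten the nested if/then/else discussed in these comments is `np.select`, which takes a list of conditions and a matching list of choices and uses the first condition that is true for each row. The following is a sketch, not piRSquared's code: the sample frame is rebuilt here so the snippet runs on its own, and the 'other' default simply follows the output of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "col_1": ["EDU"] * 6 + ["IAR"] * 2,
    "col_2": ["facebook", "google", "google_usa", "tabula",
              "xyz", "abc", "facebook", "google"],
})

is_edu = df["col_1"] == "EDU"
is_iar = df["col_1"] == "IAR"

# Conditions are checked in order; the first match wins for each row.
conditions = [
    is_edu & df["col_2"].isin(["google", "facebook"]),
    is_edu & df["col_2"].isin(["google_usa", "tabula"]),
    is_iar & (df["col_2"] == "facebook"),
]
choices = [df["col_2"], "gusa", "facebook"]

# Rows matching none of the conditions get the default value.
df["new_col"] = np.select(conditions, choices, default="other")
print(df)
```

Adding a rule then means appending one condition and one choice instead of nesting another `np.where`.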

import numpy as np
df['new_col'] = 'others'
is_edu = df.col_1 == 'EDU'  # np.isin replaces the deprecated np.in1d
df.loc[is_edu & np.isin(df.col_2, ['facebook', 'google']), 'new_col'] = df['col_2']
df.loc[is_edu & np.isin(df.col_2, ['google_usa', 'tabula']), 'new_col'] = 'gusa'

P.S. Your description does not exactly match the output you proposed; I hope I interpreted the request correctly. My code would output:

col_1 col_2 new_col 
0 EDU facebook facebook 
1 EDU google  google 
2 EDU google_usa gusa 
3 EDU tabula  gusa 
4 EDU xyz   others 
5 EDU abc   others 
6 IAR facebook others 
7 IAR google  others


2017-03-20 16:53:47

I believe this is the easiest way to understand how the code works, so that you can apply it to more situations than just this example. It is quite intuitive, and you can add logic as you go.

1) First we create a function

2) Then we apply the function

def new_col(row):
    # row is a single DataFrame row; the rules are checked top to bottom
    if row['col1'] == 'EDU' and row['col2'] == 'facebook':
        return 'facebook'
    if row['col1'] == 'EDU' and row['col2'] == 'google':
        return 'google'
    if row['col2'] == 'google_usa' or row['col2'] == 'tabula':
        return 'gusa'
    if row['col1'] == 'IAR' and row['col2'] == 'facebook':
        return 'facebook'
    return 'others'

# axis=1 passes each row (not each column) to the function
df['new_col'] = df.apply(new_col, axis=1)

Output (my col1 and col2 are in reverse order. Don't mind that; it was just easier for me to read this way):

  col2 col1 new_col 
0 facebook EDU facebook 
1  google EDU google 
2 google_usa EDU  gusa 
3  tabula EDU  gusa 
4   xyz EDU others 
5   abc EDU others 
6 facebook IAR facebook 
7  google IAR others
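As a variation on the function above (not part of the original answer), the same rules can be kept in a lookup table, so that adding logic means adding one dictionary entry. This is a sketch under assumptions: the name `RULES` is hypothetical, the column names `col_1` and `col_2` follow the question rather than this answer, and the 'gusa' rule is restricted to EDU rows as the question describes.

```python
import pandas as pd

# Sample frame from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "col_1": ["EDU"] * 6 + ["IAR"] * 2,
    "col_2": ["facebook", "google", "google_usa", "tabula",
              "xyz", "abc", "facebook", "google"],
})

# Hypothetical rule table: (col_1, col_2) -> value for new_col.
RULES = {
    ("EDU", "facebook"): "facebook",
    ("EDU", "google"): "google",
    ("EDU", "google_usa"): "gusa",
    ("EDU", "tabula"): "gusa",
    ("IAR", "facebook"): "facebook",
}

# Look up each (col_1, col_2) pair; anything not listed becomes 'others'.
df["new_col"] = [
    RULES.get(pair, "others") for pair in zip(df["col_1"], df["col_2"])
]
print(df)
```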


2017-03-20 18:25:13 MattR
