2017-03-21

Pandas: Create a new column based on the values of several other columns

I have the following pandas DataFrame, my_df:

col_A  col_B 
------------------- 
blue  medium 
red   small 
yellow  big 

I want to add a new column, col_C, based on the following conditions:

if col_A == 'blue', col_C = 'A_blue' 
if col_B == 'big', col_C = 'B_big' 

For all other cases, col_C = '' 

To achieve this, I wrote the following:

def my_bad_data(row):
    if row['col_A'] == 'blue':
        return 'A_blue'
    elif row['col_B'] == 'big':
        return 'B_big'
    else:
        return ''

my_df['col_C'] = my_df.apply(lambda row: my_bad_data(row)) 

But I got the following error:

--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)() 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8125)() 

TypeError: an integer is required 

During handling of the above exception, another exception occurred: 

KeyError         Traceback (most recent call last) 
<ipython-input-20-3898742c4378> in <module>() 
----> 1 my_df['col_C'] = my_df.apply(lambda row: my_bad_data(row)) 
     2 asset_df 

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds) 
    4161      if reduce is None: 
    4162       reduce = True 
-> 4163      return self._apply_standard(f, axis, reduce=reduce) 
    4164    else: 
    4165     return self._apply_broadcast(f, axis) 

/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce) 
    4257    try: 
    4258     for i, v in enumerate(series_gen): 
-> 4259      results[i] = func(v) 
    4260      keys.append(v.name) 
    4261    except Exception as e: 

<ipython-input-20-3898742c4378> in <lambda>(row) 
----> 1 asset_df['quality_flag'] = my_df.apply(lambda row: my_bad_data(row)) 
     2 my_df 

<ipython-input-19-2a09810e2dd4> in my_bad_data(row) 
     1 def bug_function(row): 
----> 2  if row['col_A'] == 'blue': 
     3   return 'A_blue' 
     4  elif row['col_B'] == 'big': 
     5   return 'B_big' 

/usr/local/lib/python3.4/dist-packages/pandas/core/series.py in __getitem__(self, key) 
    599   key = com._apply_if_callable(key, self) 
    600   try: 
--> 601    result = self.index.get_value(self, key) 
    602 
    603    if not is_scalar(result): 

/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_value(self, series, key) 
    2167   try: 
    2168    return self._engine.get_value(s, k, 
-> 2169           tz=getattr(series.dtype, 'tz', None)) 
    2170   except KeyError as e1: 
    2171    if len(self) > 0 and self.inferred_type in ['integer', 'boolean']: 

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3342)() 

pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3045)() 

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4094)() 

KeyError: ('col_A', 'occurred at index id') 

Any idea what I did wrong here? Thanks a lot!

Answer

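Yes, I run into this fairly often. You want dataframe.apply(func, axis=1). With the default axis=0, apply passes each column to the function as a Series indexed by row label, so row['col_A'] raises a KeyError. See the axis parameter in the documentation: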

axis : {0 or ‘index’, 1 or ‘columns’}, default 0 
    0 or ‘index’: apply function to each column 
    1 or ‘columns’: apply function to each row 
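
For illustration, here is a minimal sketch of the fix applied to the sample data from the question. The function name label_row is a hypothetical stand-in for the asker's my_bad_data, and the DataFrame is rebuilt by hand from the table in the question, so treat it as an assumption about the real data rather than a copy of it.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption about the real frame).
my_df = pd.DataFrame({
    "col_A": ["blue", "red", "yellow"],
    "col_B": ["medium", "small", "big"],
})


def label_row(row):
    """Hypothetical name; same logic as my_bad_data in the question."""
    if row["col_A"] == "blue":
        return "A_blue"
    elif row["col_B"] == "big":
        return "B_big"
    return ""


# axis=1 passes each row to the function as a Series indexed by column name,
# so row["col_A"] looks up that row's value instead of raising a KeyError.
my_df["col_C"] = my_df.apply(label_row, axis=1)
print(my_df)
```

The lambda wrapper from the question is not needed here, since apply accepts the function directly.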