2017-02-24

Answer

Below I have sketched out one approach you could take.

Note that to round a value to the nearest integer, you should use Python's built-in round() function. For details, see round() in the Python documentation.
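As a small illustration of the rounding note above, here is a minimal sketch of round(). The variable names are only placeholders, and the sample value is one of the quantile results from the output further down. Keep in mind that Python 3 rounds exact ties to the nearest even integer, which may or may not be what you want.

```python
# sample value taken from the quantile output shown later in this answer
score = 0.585087

# with one argument, round() returns an int
nearest_int = round(score)

# with a second argument, round() keeps that many decimal places
three_places = round(score, 3)

# exact ties go to the nearest even integer in Python 3
tie_down = round(2.5)
tie_up = round(3.5)

print(nearest_int, three_places, tie_down, tie_up)
```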

import pandas as pd 
import numpy as np 
# set random seed for reproducibility 
np.random.seed(748) 

# initialize base example dataframe 
df = pd.DataFrame({"date": np.arange(10),
                   "score": np.random.uniform(size=10)})

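# note: `duplicate_dates` is not used below; the call is kept because
# removing it would change the random draws that follow
duplicate_dates = np.random.choice(df.index, 5)

# extra rows that reuse existing dates, so some dates have several scores
df_dup = pd.DataFrame({"date": np.random.choice(df.index, 5),
                       "score": np.random.uniform(size=5)})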

# finish compiling example data 
# (DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
df = pd.concat([df, df_dup], ignore_index=True)

# calculate the 0.7 quantile of "score" for each date
# (recent pandas versions do not accept an `axis` argument here)
result = df.groupby("date").quantile(q=0.7, interpolation="midpoint")

# print resulting dataframe 
# contains one unique 0.7 quantile value per date 
print(result) 

"""
0.7      score
date
0     0.585087
1     0.476404
2     0.426252
3     0.363376
4     0.165013
5     0.927199
6     0.575510
7     0.576636
8     0.831572
9     0.932183
"""

# to apply the resulting quantile information to 
# a new column in our original dataframe `df` 
# we can apply a dictionary to our "date" column 

# create dictionary 
mapping = result.to_dict()["score"] 

# map each date to its quantile value to produce the desired new column
df["quantile_0.7"] = df["date"].map(mapping)

print(df) 

"""
    date     score  quantile_0.7
0      0  0.920895      0.585087
1      1  0.476404      0.476404
2      2  0.380771      0.426252
3      3  0.363376      0.363376
4      4  0.165013      0.165013
5      5  0.927199      0.927199
6      6  0.340008      0.575510
7      7  0.695818      0.576636
8      8  0.831572      0.831572
9      9  0.932183      0.932183
10     7  0.457455      0.576636
11     6  0.650666      0.575510
12     6  0.500353      0.575510
13     0  0.249280      0.585087
14     2  0.471733      0.426252
"""
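As a possible alternative to the dictionary step, groupby(...).transform() can write the per-date quantile straight back onto the original rows. This is only a sketch: `df_small` and the helper `q70_midpoint` are made-up names, and the values are hand-picked so the result is easy to check by hand rather than taken from the random data above.

```python
import pandas as pd

# small hand-made frame (illustrative values, not the random data above)
df_small = pd.DataFrame({
    "date": [0, 0, 1, 1, 1],
    "score": [0.2, 0.8, 0.1, 0.3, 0.5],
})


def q70_midpoint(scores: pd.Series) -> float:
    """Return the 0.7 quantile of one group, using midpoint interpolation."""
    return scores.quantile(0.7, interpolation="midpoint")


# transform() broadcasts each group's single result to every row of that group
df_small["quantile_0.7"] = df_small.groupby("date")["score"].transform(q70_midpoint)

print(df_small)
```

With these values, every row for date 0 should get roughly 0.5 (the midpoint of 0.2 and 0.8) and every row for date 1 roughly 0.4 (the midpoint of 0.3 and 0.5). Whether this reads better than the explicit mapping is a matter of taste; it mainly saves the intermediate `result` and `mapping` objects.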