Python: Binning based on 2 columns in Pandas

Looking for a quick and elegant way to bin based on 2 columns in Pandas.

Here is my dataframe:

                              filename  height   width
0        shopfronts_23092017_3_285.jpg   750.0   560.0
1                   shopfronts_200.jpg  4395.0  6020.0
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0
3                   shopfronts_101.jpg   480.0   640.0
4                   shopfronts_138.jpg  3733.0  8498.0
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0
6      shopfronts_25092017_neon_33.jpg   100.0   200.0
7                   shopfronts_322.jpg   682.0  1024.0
8                   shopfronts_171.jpg   800.0   600.0
9         shopfronts_23092017_3_35.jpg   120.0   210.0
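To experiment with the approaches in this thread, the sample frame can be rebuilt from the values printed above. This is only a convenience sketch; the variable name `df` is chosen to match the one the answers use.

```python
import pandas as pd

# Sample data copied from the dataframe printed in the question
df = pd.DataFrame({
    "filename": [
        "shopfronts_23092017_3_285.jpg",
        "shopfronts_200.jpg",
        "shopfronts_25092017_eateries_98.jpg",
        "shopfronts_101.jpg",
        "shopfronts_138.jpg",
        "shopfronts_25092017_eateries_95.jpg",
        "shopfronts_25092017_neon_33.jpg",
        "shopfronts_322.jpg",
        "shopfronts_171.jpg",
        "shopfronts_23092017_3_35.jpg",
    ],
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})
print(df)
```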

I need to bin the records based on 2 columns, height & width (image resolutions).

I'm looking for something like this:

                              filename  height   width   group
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1

where (sizes given as height x width):

g1: <= 400x300
g2: (400x300, 640x480]
g3: (640x480, 800x600]
g4: > 800x600
others: records that don't meet the requirement (e.g. records 2, 3 and 7, where height and width each fall into one of the defined categories, but not the same one)
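One way to read these rules, consistent with the expected output above: bin height at 400/640/800 and width at 300/480/600, each with right-closed intervals, and accept a group only when both dimensions land in the same bin. Below is a minimal single-record sketch of that reading using the standard library's `bisect`; `classify`, `HEIGHT_BOUNDS` and `WIDTH_BOUNDS` are hypothetical names, not from the question.

```python
from bisect import bisect_left

# Inclusive upper bounds of g1, g2, g3; anything above the last bound is g4
HEIGHT_BOUNDS = [400, 640, 800]
WIDTH_BOUNDS = [300, 480, 600]
LABELS = ["g1", "g2", "g3", "g4"]

def classify(height, width):
    """Hypothetical helper: return the group for one image, or 'others'
    when height and width land in different bins."""
    # bisect_left returns the index of the first bound >= value, which
    # matches right-closed intervals such as (400, 640]
    h_bin = bisect_left(HEIGHT_BOUNDS, height)
    w_bin = bisect_left(WIDTH_BOUNDS, width)
    return LABELS[h_bin] if h_bin == w_bin else "others"

print(classify(750, 560))  # g3
print(classify(800, 600))  # g3 (upper bounds are inclusive)
print(classify(414, 621))  # others
```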

In the end I'm looking for a frequency count using the group column. If this isn't the best approach and there is a better way, please let me know.

2017-09-28 bsrcube

@Zero - My bad. You're right. I've now made the changes in the question. Thanks for the answer. – bsrcube

Using np.where:

import numpy as np

# Conditions are checked in order; the first one that matches wins
df['group'] = np.where((df.height <= 400) & (df.width <= 300), 'g1',
              np.where((df.height <= 640) & (df.width <= 480), 'g2',
              np.where((df.height <= 800) & (df.width <= 600), 'g3',
              np.where((df.height > 800) & (df.width > 600), 'g4',
                       'others'))))

print(df)
                              filename  height   width   group
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1

2017-09-28 15:46:26 Zero

@Bharathshetty - You're right. Record 8 should belong to g3. I've updated the question to reflect that. – bsrcube

My mistake. The g4 definition has now been revised in the question. Sorry for the earlier ambiguity. – bsrcube
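The nested `np.where` in the answer above can also be flattened with `np.select`, which takes the same conditions as a list, tries them in order and uses the first one that matches. A sketch of that variant (`group_cumulative` is a hypothetical name). Because the thresholds are cumulative, this reading differs from a strict "both dimensions in the same category" rule for some inputs: a made-up record with height 100 and width 450, not part of the question's data, comes out as g2 here, whereas requiring both dimensions to share a bin would give others. On the sample data both readings agree.

```python
import numpy as np
import pandas as pd

def group_cumulative(df):
    """Same conditions as the nested np.where, flattened with np.select.
    Conditions are tried in order and the first match wins."""
    h, w = df["height"], df["width"]
    conditions = [
        (h <= 400) & (w <= 300),
        (h <= 640) & (w <= 480),
        (h <= 800) & (w <= 600),
        (h > 800) & (w > 600),
    ]
    return np.select(conditions, ["g1", "g2", "g3", "g4"], default="others")

# Heights and widths from the question's sample data
sample = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})
sample["group"] = group_cumulative(sample)
print(sample)

# Hypothetical record, not in the question: height is in the g1 range and
# width in the g2 range, yet the cumulative thresholds return g2
edge = pd.DataFrame({"height": [100.0], "width": [450.0]})
print(group_cumulative(edge))  # ['g2']
```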

You can use pd.cut twice (once for height, once for width) and keep the label only where both agree, i.e.

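import numpy as np
import pandas as pd

labels = ["g1", "g2", "g3", "g4"]

# Bin height at 400/640/800 (right-closed intervals)
bins = [0, 400, 640, 800, np.inf]
df['group'] = pd.cut(df['height'], bins, labels=labels)

# Bin width at 300/480/600 with the same labels
nbin = [0, 300, 480, 600, np.inf]
t = pd.cut(df['width'], nbin, labels=labels)

# Keep the label only where height and width fall in the same bin
df['group'] = np.where(df['group'] == t, df['group'], 'others')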

 
                              filename  height   width   group
0        shopfronts_23092017_3_285.jpg   750.0   560.0      g3
1                   shopfronts_200.jpg  4395.0  6020.0      g4
2  shopfronts_25092017_eateries_98.jpg   414.0   621.0  others
3                   shopfronts_101.jpg   480.0   640.0  others
4                   shopfronts_138.jpg  3733.0  8498.0      g4
5  shopfronts_25092017_eateries_95.jpg   187.0   250.0      g1
6      shopfronts_25092017_neon_33.jpg   100.0   200.0      g1
7                   shopfronts_322.jpg   682.0  1024.0  others
8                   shopfronts_171.jpg   800.0   600.0      g3
9         shopfronts_23092017_3_35.jpg   120.0   210.0      g1

2017-09-28 15:47:25 Dark
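Since the end goal in the question is a frequency count, the group column can be tallied with `value_counts`. Below is a self-contained sketch built on the dual `pd.cut` approach; the `reindex` step is an optional addition so that empty groups (g2 has no records in the sample) still appear with a count of 0. On the sample data this gives 3, 0, 2, 2 and 3 records for g1, g2, g3, g4 and others.

```python
import numpy as np
import pandas as pd

# Heights and widths from the question's sample data
df = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

labels = ["g1", "g2", "g3", "g4"]

# Right-closed bins, e.g. (640, 800], so an 800 x 600 image falls in g3
h_bin = pd.cut(df["height"], [0, 400, 640, 800, np.inf], labels=labels)
w_bin = pd.cut(df["width"], [0, 300, 480, 600, np.inf], labels=labels)

# Keep the label only where both dimensions land in the same bin;
# comparing two categoricals requires identical categories, which holds here
df["group"] = np.where(h_bin == w_bin, h_bin.astype(str), "others")

# Frequency count per group; reindex lists empty groups (here g2) with 0
counts = df["group"].value_counts().reindex(labels + ["others"], fill_value=0)
print(counts)
```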
