Python: Pandas: Percentage per column

Problem
"How to add a % (percentage) column to pd.pivot_table where the sum/total is produced by df.pivot_table(margins=True)"
Context
We have a sample pivot:

import pandas as pd 
import numpy as np 

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC')) 

print(df) 
print("\n\n") 
pivot = df.pivot_table(
    index=["A"], 
    columns=[], 
    values=["B", "C"], 
    aggfunc={ 
     "B": np.sum, 
     "C": np.sum, 
    }, 
    margins=True, 
    margins_name = "Total", 
    fill_value=0 

) 

print(pivot)

This gives:

         B    C
A
row1    50  200
row2   100  300
Total  150  500

Desired output

         B    C    D        E
A
row1    50  200  250   38.46%
row2   100  300  400   61.54%
Total  150  500  650     100%

In words: we essentially want to add a column E (pct of row & column total) to the output of pivot_table, i.e. each row's share of the overall total, expressed as a percentage.

Note: to make the example a bit more readable, we added column 'D', which should not be part of the actual output.

In addition, the output format has to stay as it is, because we will eventually export it to an Excel sheet for business use.

Tried so far
Similar questions have been asked:

Add percent of total column to Pandas pivot_table
- except that it uses a groupby, and we need the columns to stay intact when we print(), so that did not really work for me
Pandas percentage of total with groupby
- Did not work for me either, because we need the output format to stay intact, which I have not managed to figure out so far

Also, I was hoping that pandas might have found a good way to do this in the latest version, so that we can simply use it. They usually add some useful improvements with each iteration. :)

Specifications
Python: 3.5.2
Pandas: 0.18.1
Numpy: 1.11.1

Source

2016-10-23 John

You could do something like this.

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC')) 
df = df.set_index('A') 
df['E'] = df.apply(lambda x: x/df.sum().sum()).sum(axis=1) 
df.loc['Total'] = df.sum() 
In [52]: df
Out[52]:
           B      C         E
A
row1    50.0  200.0  0.384615
row2   100.0  300.0  0.615385
Total  150.0  500.0  1.000000

df.apply(lambda x: x/df.sum().sum())

divides each element by df.sum().sum(), which is the sum of all elements.

.sum(axis=1)

sums the values in each row

and

df.loc['Total']

lets you fill a new row with whatever you want.
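The same three steps can also be written without apply. This is only a minimal sketch on the sample frame from the question; the names grand_total and row_share are illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame([["row1", 50, 200], ["row2", 100, 300]], columns=list("ABC"))
df = df.set_index("A")

# Grand total of all numeric cells (50 + 200 + 100 + 300 = 650)
grand_total = df[["B", "C"]].to_numpy().sum()

# Each row's sum divided by the grand total gives that row's share
row_share = df[["B", "C"]].sum(axis=1) / grand_total
df["E"] = row_share

# Append the totals row last, so that E adds up to 1.0 in it
df.loc["Total"] = df.sum()

print(df)
```

Adding the Total row after E is computed matters here: if the row were appended first, it would be counted again in the grand total.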

Source

2016-10-23 18:32:28

Inspired by Steven G's approach, this solution worked for me:

import pandas as pd 
import numpy as np 

df = pd.DataFrame([["row1",50, 200],["row2",100, 300]], columns=list('ABC')) 


#print(df) 
print("\n\n") 
pivot = df.pivot_table(
    index=["A"], 
    columns=[], 
    values=["B", "C"], 
    aggfunc={ 
     "B": np.sum, 
     "C": np.sum, 

    }, 
    margins=True, 
    margins_name = "Total", 
    fill_value=0 

) 
print(pivot) 

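# Column total of B, read from the "Total" margins row (.loc replaces .ix, which newer pandas versions no longer provide)
total_b = pivot.loc["Total", "B"]

# Share of each row's B value in the B total, rounded to two decimals
pivot["E"] = pivot["B"].apply(lambda x: round(x / float(total_b), 2))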

print(pivot)

OUTPUT

           B      C     E
A
row1    50.0  200.0  0.33
row2   100.0  300.0  0.67
Total  150.0  500.0  1.00
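For comparison, column E from the desired output in the question (each row's B + C as a share of the grand total) could be derived from the margins row in a similar way. This is a minimal sketch, assuming a recent pandas release in which aggfunc accepts the string "sum"; the names row_sum and grand_total are illustrative and not taken from the posts above.

```python
import pandas as pd

df = pd.DataFrame([["row1", 50, 200], ["row2", 100, 300]], columns=list("ABC"))

pivot = df.pivot_table(
    index="A",
    values=["B", "C"],
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)

# B + C per row; for the "Total" row this is the grand total (650)
row_sum = pivot[["B", "C"]].sum(axis=1)
grand_total = row_sum.loc["Total"]

# Format each share as a percentage string with two decimals
pivot["E"] = (row_sum / grand_total).map("{:.2%}".format)

print(pivot)
```

The formatted values are strings, so the Total row shows 100.00% rather than 100%, and a spreadsheet would treat the column as text. If the numbers have to stay numeric in Excel, one option is to keep the plain fraction in E and apply a percentage number format in the sheet instead.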

Source

2016-10-23 18:55:55 John
