data.table: computing several columns at once

Thank you in advance for reading this. I have a function that worked fine with data.table 1.9.3. But today I updated my data.table package and my function no longer works.

Here is my function and a working example on data.table 1.9.3:

trait.by <- function(data,traits="",cross.by){ 
    traits = intersect(traits,names(data)) 
    if(length(traits)<1){ 
    #if there is no intersect between names and traits 
    return(  data[,  list(N. = .N), by=cross.by]) 
    }else{ 
    return(data[,c( N. = .N, 
        MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}) , 
        SD = lapply(.SD,function(x){return(round(sd (x,na.rm=T),digits=2))}) , 
        'NA' = lapply(.SD,function(x){return(sum (is.na(x)))})), 
       by=cross.by, .SDcols = traits]) 
    } 
} 

> trait.by(data.table(iris),traits = c("Sepal.Length", "Sepal.Width"),cross.by="Species") 
#  Species N. MEAN.Sepal.Length MEAN.Sepal.Width SD.Sepal.Length 
#1:  setosa 50    5.0    3.4   0.35 
#2: versicolor 50    5.9    2.8   0.52 
#3: virginica 50    6.6    3.0   0.64 
# SD.Sepal.Width NA.Sepal.Length NA.Sepal.Width 
#1:   0.38    0    0 
#2:   0.31    0    0 
#3:   0.32    0    0

The point is that MEAN.(traits), SD.(traits) and NA.(traits) are computed for all the columns I pass in the traits variable.

When I run this with data.table 1.9.4, I get the following error:

> trait.by(data.table(iris),traits = c("Sepal.Length", "Sepal.Width"),cross.by="Species") 
#Error in assign("..FUN", eval(fun, SDenv, SDenv), SDenv) : 
# cannot change value of locked binding for '..FUN'

Any idea how I should fix this problem?

Source

2014-12-15 Mahdi Jadaliha

Report it on the [data.table issues webpage](https://github.com/Rdatatable/data.table/issues) – pak

Update: This has now been fixed in 1.9.5, in commit 1680. From NEWS:

Fixed a bug in the internal optimisation of j-expression with more than one lapply(.SD, function(..) ..) as illustrated here on SO. Closes #985. Thanks to @jadaliha for the report and to @BrodieG for the debugging on SO.

This now works as expected:

data[, 
    c(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}), 
    SD = lapply(.SD,function(x){return(round(sd (x,na.rm=T),digits=2))}) 
), by=cross.by, .SDcols = traits]
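Whether an installation already contains the fix can be checked before relying on the c( form. The snippet below is only a sketch: it assumes the fix is present in every version from 1.9.5 onward, as the NEWS entry states, and the variable name has_fix is made up for illustration.

```r
library(data.table)

# Assumption: versions >= 1.9.5 contain the fix described in the NEWS entry.
installed <- packageVersion("data.table")
has_fix   <- installed >= "1.9.5"

if (!has_fix) {
  # Older versions may raise the "locked binding for '..FUN'" error when
  # several lapply(.SD, function(x) ...) calls are combined inside c().
  message("data.table ", as.character(installed),
          " predates the fix; update, or use list()/.() instead of c().")
}
```

packageVersion() comes from the utils package that ships with R, and its result can be compared directly with a version string.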

This looks like a bug that manifests when multiple uses of lapply(.SD, FUN) in a single data.table call are combined with c(. You can work around it by replacing c( with .(.

traits <- c("Sepal.Length", "Sepal.Width") 
cross.by <- "Species" 
data <- data.table(iris) 

data[, 
    c(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}) 
), 
    by=cross.by, .SDcols = traits 
]

Works.

data[, 
    c(
    SD = lapply(.SD,function(x){return(round(sd (x,na.rm=T),digits=2))}) 
), 
    by=cross.by, .SDcols = traits 
]

Works. But

data[, 
    c(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}), 
    SD = lapply(.SD,function(x){return(round(sd (x,na.rm=T),digits=2))}) 
), 
    by=cross.by, .SDcols = traits 
]

does not work.

data[, 
    .(
    MEAN = lapply(.SD,function(x){return(round(mean(x,na.rm=T),digits=1))}), 
    SD = lapply(.SD,function(x){return(round(sd (x,na.rm=T),digits=2))}) 
), 
    by=cross.by, .SDcols = traits 
]

Works.
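Note that swapping c( for .( also changes the shape of the result, which is plain base R behaviour rather than anything specific to data.table. A minimal sketch, using a made-up list x in place of the output of one lapply(.SD, ...) call:

```r
# Hypothetical stand-in for what a single lapply(.SD, ...) returns per group
x <- list(Sepal.Length = 5.0, Sepal.Width = 3.4)

# c() flattens the named lists and pastes the outer and inner names together
flat_names <- names(c(MEAN = x, SD = x))

# list() (or .() inside a data.table call) keeps each list as one element
nested_names <- names(list(MEAN = x, SD = x))

print(flat_names)
print(nested_names)
```

With c( every statistic/column pair becomes its own column, while with .( each statistic becomes a single list column and the original column names are no longer part of the column names.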

Source

2014-12-16 00:53:53 BrodieG

'.(' is the same as 'list'. While '.(' is more compact as code, I personally prefer 'list', which makes the code easier to maintain. Personal taste, though. – KFB

The problem is that I have used this notation widely in my functions. In the current solution it is not obvious which value is the "mean" of, for example, "Sepal.Length". We could add another column and then reshape by that additional column, but is there another way? –

Like this? The output format is slightly changed, but the results are all there.

trait.by <- function(data,traits="",cross.by){ 
    traits = intersect(traits,names(data)) 
    if(length(traits)<1){ 
    #if there is no intersect between names and traits 
    return(data[, list(N. = .N), by=cross.by]) 
    }else{ 
    # ** Changes: use list instead of c and don't think we need return here. 
    # and add new col_Nam with reference to the comments below 
    return(data[, list(N. = .N, 
         MEAN = lapply(.SD,function(x){round(mean(x,na.rm=T),digits=1)}) , 
         SD = lapply(.SD,function(x){round(sd (x,na.rm=T),digits=2)}) , 
         'NA' = lapply(.SD,function(x){sum (is.na(x))}), 
         col_Nam = names(.SD)), 
       by=cross.by, .SDcols = traits]) 
    } 
} 
trait.by(data.table(iris),traits = c("Sepal.Length", "Sepal.Width"),cross.by="Species") 

# result 
     Species N. MEAN SD NA  col_Nam 
1:  setosa 50 5 0.35 0 Sepal.Length 
2:  setosa 50 3.4 0.38 0 Sepal.Width 
3: versicolor 50 5.9 0.52 0 Sepal.Length 
4: versicolor 50 2.8 0.31 0 Sepal.Width 
5: virginica 50 6.6 0.64 0 Sepal.Length 
6: virginica 50 3 0.32 0 Sepal.Width
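If the wide layout from the question is preferred, a long result like the one above could be reshaped afterwards. This is only a sketch under some assumptions: it uses sapply() instead of lapply() so that MEAN and SD are plain numeric columns rather than list columns, and it relies on dcast() accepting several value.var columns, which needs a more recent data.table than the versions discussed in this thread. The names long and wide are made up for illustration.

```r
library(data.table)

traits   <- c("Sepal.Length", "Sepal.Width")
cross.by <- "Species"
data     <- data.table(iris)

# Long format: one row per group and trait.
# sapply() returns a numeric vector, so MEAN and SD are ordinary columns.
long <- data[, list(N.      = .N,
                    MEAN    = sapply(.SD, function(x) round(mean(x, na.rm = TRUE), digits = 1)),
                    SD      = sapply(.SD, function(x) round(sd(x, na.rm = TRUE), digits = 2)),
                    col_Nam = names(.SD)),
             by = cross.by, .SDcols = traits]

# Wide format: one row per group, one column per statistic/trait pair.
wide <- dcast(long, Species + N. ~ col_Nam, value.var = c("MEAN", "SD"))
print(wide)
```

The reshaped table has one row per species again, with columns named after both the statistic and the trait, so it is clear which mean belongs to which column.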

Source

2014-12-16 00:48:47 KFB

You might consider mentioning exactly what you changed from the OP's code. – pak

See # ** Changes – KFB

Guess I missed it, sorry – pak
