2016-08-10

Fill in the same value for each group of a data frame in R

I have:

ID <- c(1,1,2,2,2,3,3) 
x <- c("1st","","1st","1st","","","") 
y <- c("2nd","2nd","","","","2nd","2nd") 
z <- c("","","3rd","3rd","","","3rd") 
df <- data.frame(ID,x,y,z) 
df 
  ID   x   y   z
1  1 1st 2nd
2  1     2nd
3  2 1st     3rd
4  2 1st     3rd
5  2
6  3     2nd
7  3     2nd 3rd

I want to fill in the same value within each ID. The desired output:

  ID   x   y   z  x1  y1  z1
1  1 1st 2nd     1st 2nd
2  1     2nd     1st 2nd
3  2 1st     3rd 1st     3rd
4  2 1st     3rd 1st     3rd
5  2             1st     3rd
6  3     2nd         2nd 3rd
7  3     2nd 3rd     2nd 3rd

If ID 1 has "1st", the new variable x1 should contain "1st" for every row of ID 1, and so on.

Update: my data has more variables, but I only need to use x, y, z:

ID <- c(1,1,2,2,2,3,3) 
x <- c("1st","","1st","1st","","","") 
y <- c("2nd","2nd","","","","2nd","2nd") 
z <- c("","","3rd","3rd","","","3rd") 
m <- c(10:16) 
n <- c(20:26) 
df <- data.frame(ID,x,y,z,m,n) 

With data.table you can do 'library(data.table); setDT(df)[, setdiff(names(df), "ID") := lapply(.SD, function(x) x[x != ""][1L]), by = ID]'. This also fills the blanks with 'NA', which looks like the "right" thing to do here. – Frank


@Frank, how do we keep the original x, y, z? With that code, x, y, z in the output end up identical to what x1, y1, z1 should be. – BIN


Instead of 'setdiff(names(df), "ID")', write 'paste0(setdiff(names(df), "ID"), ".new")' or similar. The left-hand side of ':=' holds the names of the new variables. – Frank
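Putting the two comments together, here is a minimal sketch that keeps the original columns and writes the filled values to new ".new" columns. It assumes data.table is installed, rebuilds the updated data from the question with stringsAsFactors = FALSE, and restricts the work to x, y, z via .SDcols so that m and n are left alone; cols and new_cols are just illustrative names.

```r
library(data.table)

# Updated data from the question; stringsAsFactors = FALSE keeps x, y, z as
# character vectors (before R 4.0.0, data.frame() turned them into factors)
ID <- c(1, 1, 2, 2, 2, 3, 3)
x <- c("1st", "", "1st", "1st", "", "", "")
y <- c("2nd", "2nd", "", "", "", "2nd", "2nd")
z <- c("", "", "3rd", "3rd", "", "", "3rd")
m <- 10:16
n <- 20:26
df <- data.frame(ID, x, y, z, m, n, stringsAsFactors = FALSE)

cols <- c("x", "y", "z")          # columns to fill (m and n are not touched)
new_cols <- paste0(cols, ".new")  # new names on the left-hand side of :=

# Within each ID, take the first non-empty value of each column.
# If a group has no non-empty value, v[v != ""] is empty and [1L] gives NA.
setDT(df)[, (new_cols) := lapply(.SD, function(v) v[v != ""][1L]),
          by = ID, .SDcols = cols]

print(df)
```

The parentheses around new_cols tell data.table to use the names stored in that variable rather than create a column literally called "new_cols".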

Answers


We can use dplyr:

library(dplyr) 
df %>% 
    group_by(ID) %>% 
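    # within each ID, replace every non-grouping column by its first non-empty value
    mutate_each(funs((.[.!=""][1]))) %>% 
    # rename x, y, z to x1, y1, z1
    setNames(., c("ID", paste0(names(df)[-1], 1))) %>% 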
    select(-ID) %>% 
    bind_cols(df, .) 
#  ID   x   y   z ID   x1   y1   z1
#1  1 1st 2nd      1  1st  2nd <NA>
#2  1     2nd      1  1st  2nd <NA>
#3  2 1st     3rd  2  1st <NA>  3rd
#4  2 1st     3rd  2  1st <NA>  3rd
#5  2              2  1st <NA>  3rd
#6  3     2nd      3 <NA>  2nd  3rd
#7  3     2nd 3rd  3 <NA>  2nd  3rd
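mutate_each() and funs() have been deprecated in later dplyr releases. A sketch of the same idea for dplyr 1.0 or later, assuming across() with its .names argument; first_non_empty is a hypothetical helper name, and the data is built with stringsAsFactors = FALSE so the comparison with "" works on plain character vectors. This variant keeps ID only once, giving the column layout asked for in the question.

```r
library(dplyr)

ID <- c(1, 1, 2, 2, 2, 3, 3)
x <- c("1st", "", "1st", "1st", "", "", "")
y <- c("2nd", "2nd", "", "", "", "2nd", "2nd")
z <- c("", "", "3rd", "3rd", "", "", "3rd")
df <- data.frame(ID, x, y, z, stringsAsFactors = FALSE)

# Hypothetical helper: first non-empty value of a vector, NA if there is none
first_non_empty <- function(v) v[v != ""][1]

out <- df %>%
  group_by(ID) %>%
  # the length-1 result is recycled to every row of the group;
  # .names = "{.col}1" turns x, y, z into x1, y1, z1
  mutate(across(c(x, y, z), first_non_empty, .names = "{.col}1")) %>%
  ungroup()

print(out)
```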

@Stat Works for me (R 3.2.5, dplyr 0.4.3). Maybe try running the code in a fresh R session. There could be conflicts between packages with similarly named functions(?) – Frank


I'm using dplyr_0.5.0 – akrun


Alternatively, 'f = . %>% .[. != ""] %>% unique; df_clean = df %>% group_by(ID) %>% summarise_each(funs(f))' and then join to that somehow. – Frank
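One way to finish that "join" step, sketched with dplyr 1.0+ syntax: summarise to one row per ID, then left_join() the result back onto the original rows. This swaps unique() for the first non-empty value (via a hypothetical first_non_empty helper), because summarise() is expected to return a single value per group here.

```r
library(dplyr)

ID <- c(1, 1, 2, 2, 2, 3, 3)
x <- c("1st", "", "1st", "1st", "", "", "")
y <- c("2nd", "2nd", "", "", "", "2nd", "2nd")
z <- c("", "", "3rd", "3rd", "", "", "3rd")
df <- data.frame(ID, x, y, z, stringsAsFactors = FALSE)

# Hypothetical helper: first non-empty value, NA if the group has none
first_non_empty <- function(v) v[v != ""][1]

# One row per ID holding the value to copy to all of that ID's rows
lookup <- df %>%
  group_by(ID) %>%
  summarise(across(c(x, y, z), first_non_empty, .names = "{.col}1"))

# Join back by ID; the original x, y, z columns are left unchanged
out <- left_join(df, lookup, by = "ID")
print(out)
```

Keeping the per-ID summary as its own table also makes it easy to inspect which value was picked for each ID before joining.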


Here is an approach using tidyr::fill. If you were using NA instead of empty strings (a good idea), this approach would be quite simple:

library(dplyr) 
library(tidyr) 

# add versions of x to z with NA instead of empty strings 
df %>% mutate_at(vars(x:z), funs('1' = na_if(., ''))) %>% 
    # set grouping for following operations 
    group_by(ID) %>% 
    # for added columns, fill values downwards and upwards within each group 
    fill(x_1:z_1) %>% fill(x_1:z_1, .direction = 'up') %>% 
    # reinsert empty strings for NAs 
    mutate_at(vars(x_1:z_1), funs(coalesce(., factor('')))) 

## Source: local data frame [7 x 9]
## Groups: ID [3]
##
##      ID      x      y      z     m     n    x_1    y_1    z_1
##   <dbl> <fctr> <fctr> <fctr> <int> <int> <fctr> <fctr> <fctr>
## 1     1    1st    2nd           10    20    1st    2nd
## 2     1           2nd           11    21    1st    2nd
## 3     2    1st           3rd    12    22    1st           3rd
## 4     2    1st           3rd    13    23    1st           3rd
## 5     2                         14    24    1st           3rd
## 6     3           2nd           15    25           2nd    3rd
## 7     3           2nd    3rd    16    26           2nd    3rd
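The same fill-based idea in current syntax, as a sketch assuming dplyr 1.0+ and tidyr 1.0+ (for .direction = "downup", which fills down and then up in one call). The data is built with stringsAsFactors = FALSE, so na_if() works on character columns and coalesce() can put back plain empty strings; the _1 suffix mirrors the answer above.

```r
library(dplyr)
library(tidyr)

ID <- c(1, 1, 2, 2, 2, 3, 3)
x <- c("1st", "", "1st", "1st", "", "", "")
y <- c("2nd", "2nd", "", "", "", "2nd", "2nd")
z <- c("", "", "3rd", "3rd", "", "", "3rd")
df <- data.frame(ID, x, y, z, stringsAsFactors = FALSE)

out <- df %>%
  # copies of x, y, z with NA instead of empty strings
  mutate(across(c(x, y, z), ~ na_if(.x, ""), .names = "{.col}_1")) %>%
  group_by(ID) %>%
  # fill down, then up, within each ID
  fill(x_1:z_1, .direction = "downup") %>%
  ungroup() %>%
  # put empty strings back where a whole group had no value
  mutate(across(x_1:z_1, ~ coalesce(.x, "")))

print(out)
```

Dropping the final mutate() leaves NA in groups without any value, which is usually easier to work with downstream.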

A somewhat more direct data.table approach:

df = data.frame(ID, x, y, z, stringsAsFactors=FALSE) 

library(data.table) 
# which.max(x != "") is the index of the first non-empty value (1 if there is none)
setDT(df)[, c("x1", "y1", "z1") := lapply(.SD, function(x) x[which.max(x != "")]), by = ID] 
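For comparison, a package-free sketch of the same rule in base R using ave(), which applies a function within each ID and recycles its single result over that group's rows. first_filled is a hypothetical helper name; like the data.table version, it keeps "" (rather than NA) for groups with no value, and the data is built with stringsAsFactors = FALSE so ave() works on character vectors.

```r
ID <- c(1, 1, 2, 2, 2, 3, 3)
x <- c("1st", "", "1st", "1st", "", "", "")
y <- c("2nd", "2nd", "", "", "", "2nd", "2nd")
z <- c("", "", "3rd", "3rd", "", "", "3rd")
df <- data.frame(ID, x, y, z, stringsAsFactors = FALSE)

# Hypothetical helper: which.max() on a logical vector returns the first TRUE,
# or 1 when every element is FALSE, so an all-empty group yields ""
first_filled <- function(v) v[which.max(v != "")]

for (col in c("x", "y", "z")) {
  # ave() splits df[[col]] by ID, applies first_filled to each piece and
  # writes the one-element result back to every row of that ID
  df[[paste0(col, "1")]] <- ave(df[[col]], df$ID, FUN = first_filled)
}

print(df)
```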