2017-02-07

R: Melting and merging data

This is an example of my data set:

ID = c(1, 2, 3, 4) 
Allegation = c("A::B::C::V", "A::C", "A::D", "D::E::D") 
Disposition = c("Open::Closed::Open", "Closed::Closed", "Open::Open", "Closed::Open") 
df <- data.frame(ID,Allegation, Disposition) 

    ID Allegation  Disposition 
    1 A::B::C::V Open::Closed::Open 
    2  A::C  Closed::Closed 
    3  A::D   Open::Open 
    4 D::E::D  Closed::Open 

I want the following result:

ID Allegation Disposition Allegation_detail Disposition_detail 
1 A::B::C::V Open::Closed::Open A  Open 
1 A::B::C::V Open::Closed::Open B  Closed 
1 A::B::C::V Open::Closed::Open C  Open 
1 A::B::C::V Open::Closed::Open V  NA 
2  A::C  Closed::Closed  A  Closed 

I tried melting the data and then merging it, but I am not getting the desired output.

This is my approach so far:

library(stringr)  # for str_count() 
library(reshape2) # melt() is assumed to come from reshape2 

#Create column to see num of allegations 
df$num_allegations <- (str_count(as.character(df$Allegation), "::") +1) 

#Looking max allegations 
max(df$num_allegations) 

#Expanding allegations 
df$Allegation1 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 1) 
df$Allegation2 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 2) 
df$Allegation3 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 3) 
df$Allegation4 <- sapply(strsplit(as.character(df$Allegation), "::", fixed= TRUE), `[`, 4) 

#Expanding Disposition 
df$Disposition1 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 1) 
df$Disposition2 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 2) 
df$Disposition3 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 3) 
df$Disposition4 <- sapply(strsplit(as.character(df$Disposition), "::", fixed= TRUE), `[`, 4) 

#melting data 
dfmelt1 <- melt(df[,c(1:8)], id=c("ID", "Allegation", "Disposition", "num_allegations")) 
dfmelt2 <- melt(df[,c(1,2,3,4,9,10,11,12)], id=c("ID", "Allegation", "Disposition", "num_allegations")) 
colnames(dfmelt2) <- c("ID" ,"Allegation" ,"Disposition","num_allegations", "variable2", 
        "value2") 

But when I merge the data I get this result, which is not what I want:

merge(dfmelt1, dfmelt2, by = c("ID", "Allegation", "Disposition", "num_allegations")) 

ID Allegation  Disposition num_allegations variable value  variable2 value2 
1 A::B::C::V Open::Closed::Open    4 Allegation1  A Disposition1 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation1  A Disposition2 Closed 
1 A::B::C::V Open::Closed::Open    4 Allegation1  A Disposition3 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation1  A Disposition4 <NA> 
1 A::B::C::V Open::Closed::Open    4 Allegation2  B Disposition1 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation2  B Disposition2 Closed 
1 A::B::C::V Open::Closed::Open    4 Allegation2  B Disposition3 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation2  B Disposition4 <NA> 
1 A::B::C::V Open::Closed::Open    4 Allegation3  C Disposition1 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation3  C Disposition2 Closed 
1 A::B::C::V Open::Closed::Open    4 Allegation3  C Disposition3 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation3  C Disposition4 <NA> 
1 A::B::C::V Open::Closed::Open    4 Allegation4  V Disposition1 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation4  V Disposition2 Closed 
1 A::B::C::V Open::Closed::Open    4 Allegation4  V Disposition3 Open 
1 A::B::C::V Open::Closed::Open    4 Allegation4  V Disposition4 <NA> 
2  A::C  Closed::Closed    2 Allegation1  A Disposition1 Closed 

How can I merge so that I get Disposition 1 only where it says Allegation 1?

Thanks
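
One way to read the cross product in the merge output: the two melted frames share only ID-level columns, so nothing ties Allegation1 to Disposition1 and every allegation gets paired with every disposition of the same ID. Below is a minimal base R sketch of a fix, assuming the position of each piece within its `::`-separated string is the intended link. The helper `to_long` and the column `pos` are hypothetical names introduced for illustration only.

```r
# Example data from the question (character columns, no factors)
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Allegation = c("A::B::C::V", "A::C", "A::D", "D::E::D"),
  Disposition = c("Open::Closed::Open", "Closed::Closed", "Open::Open", "Closed::Open"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per split piece, with its position recorded
to_long <- function(id, x, value_name) {
  parts <- strsplit(as.character(x), "::", fixed = TRUE)
  out <- data.frame(
    ID = rep(id, lengths(parts)),
    pos = sequence(lengths(parts)),
    value = unlist(parts),
    stringsAsFactors = FALSE
  )
  names(out)[3] <- value_name
  out
}

alleg <- to_long(df$ID, df$Allegation, "Allegation_detail")
dispo <- to_long(df$ID, df$Disposition, "Disposition_detail")

# Merging on ID *and* position avoids the cross product;
# all.x = TRUE keeps allegations without a matching disposition (NA)
long <- merge(alleg, dispo, by = c("ID", "pos"), all.x = TRUE)

# Attach the original columns again and restore a stable row order
result <- merge(df, long, by = "ID")
result <- result[order(result$ID, result$pos), ]
result$pos <- NULL
rownames(result) <- NULL
result
```

With the example data this gives 11 rows, and the fourth allegation of ID 1 gets NA as its disposition detail.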

Answers


Here is an idea,

#get a vector with repeats for expanding the data.frame 
ind <- stringr::str_count(df$Allegation, '\\w+') 
new_df <- df[rep(row.names(df), ind),] 
#create vector with allegation details 
v1 <- do.call(rbind, sapply(strsplit(as.character(df$Allegation), '::'), function(i) 
                    t(as.data.frame(t(i))))) 
#create vector with Disposition details 
v2 <- do.call(rbind, sapply(strsplit(as.character(df$Disposition), '::'), function(i) 
                    t(as.data.frame(t(i))))) 
v2 <- v2[match(make.unique(rownames(v1)), make.unique(rownames(v2)))] 

#construct final data frame 
final_df <- data.frame(new_df, Allegation_detail=v1, Disposition_detail=v2, 
               stringsAsFactors = FALSE, row.names = NULL) 

final_df 
# ID Allegation  Disposition Allegation_detail Disposition_detail 
#1 1 A::B::C::V Open::Closed::Open     A    Open 
#2 1 A::B::C::V Open::Closed::Open     B    Closed 
#3 1 A::B::C::V Open::Closed::Open     C    Open 
#4 1 A::B::C::V Open::Closed::Open     V    <NA> 
#5 2  A::C  Closed::Closed     A    Closed 
#6 2  A::C  Closed::Closed     C    Closed 
#7 3  A::D   Open::Open     A    Open 
#8 3  A::D   Open::Open     D    Open 
#9 4 D::E::D  Closed::Open     D    Closed 
#10 4 D::E::D  Closed::Open     E    Open 
#11 4 D::E::D  Closed::Open     D    <NA> 
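
The least obvious step in the code above is probably the `match(make.unique(...), make.unique(...))` line. The small sketch below isolates it. The row names are written out by hand for the first two IDs, on the assumption that `rbind` yields names of the form V1, V2, ... for each split; `alleg_names`, `dispo_names` and `idx` are illustrative names only.

```r
# Row names for IDs 1 and 2, written out by hand:
# ID 1 has 4 allegations and 3 dispositions, ID 2 has 2 of each
alleg_names <- c("V1", "V2", "V3", "V4", "V1", "V2")
dispo_names <- c("V1", "V2", "V3", "V1", "V2")

# make.unique() appends .1, .2, ... to repeated names
make.unique(alleg_names)
make.unique(dispo_names)

# For every allegation, the index of the disposition with the same unique name
idx <- match(make.unique(alleg_names), make.unique(dispo_names))
idx
# 1 2 3 NA 4 5
```

The NA marks the fourth allegation of ID 1, which has no disposition. Note that this alignment depends on the repeat counts of each name lining up across rows, which holds for the example data but may not hold for every input.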

Although the guidelines say to avoid comments like "thank you", I will say a big thank you, Sotos. –


You are welcome :) – Sotos


Here is a solution using data.table; its logic is similar to your algorithm.

library(data.table) 
library(stringi) 
setDT(df) 
splitter <- function(x) as.vector(stri_list2matrix(stri_split_fixed(x, "::"))) 

#find the max parts for padding NA at the end 
#http://stackoverflow.com/questions/17804389/pad-each-element-in-a-list-to-specific-length-in-r 
df[, Len:=max(lengths(lapply(.SD, splitter))), by="ID"] 

#split using :: 
parsedDF <- df[, lapply(.SD, function(x) { 
     ans <- splitter(x) 
     length(ans) <- Len 
     ans 
    }), by="ID"][, 
     Len:=NULL] 
setnames(parsedDF, names(parsedDF), paste0(names(parsedDF),"_detail")) 

#join back with original data.table 
df[parsedDF, on=c("ID"="ID_detail")][, 
    Len:=NULL] 

## ID Allegation  Disposition Allegation_detail Disposition_detail 
## 1: 1 A::B::C::V Open::Closed::Open     A    Open 
## 2: 1 A::B::C::V Open::Closed::Open     B    Closed 
## 3: 1 A::B::C::V Open::Closed::Open     C    Open 
## 4: 1 A::B::C::V Open::Closed::Open     V     NA 
## 5: 2  A::C  Closed::Closed     A    Closed 
## 6: 2  A::C  Closed::Closed     C    Closed 
## 7: 3  A::D   Open::Open     A    Open 
## 8: 3  A::D   Open::Open     D    Open 
## 9: 4 D::E::D  Closed::Open     D    Closed 
## 10: 4 D::E::D  Closed::Open     E    Open 
## 11: 4 D::E::D  Closed::Open     D     NA 
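
The padding idea used above (`length(ans) <- Len`) can also be tried without extra packages. The following is a minimal base R sketch under the assumption that pieces are paired by position and the shorter split is filled with NA at the end; `pad_to` is a hypothetical helper, not a function from any library.

```r
# Example data from the question (character columns, no factors)
df <- data.frame(
  ID = c(1, 2, 3, 4),
  Allegation = c("A::B::C::V", "A::C", "A::D", "D::E::D"),
  Disposition = c("Open::Closed::Open", "Closed::Closed", "Open::Open", "Closed::Open"),
  stringsAsFactors = FALSE
)

alleg <- strsplit(df$Allegation, "::", fixed = TRUE)
dispo <- strsplit(df$Disposition, "::", fixed = TRUE)

# Rows needed per ID: the longer of the two splits
n <- pmax(lengths(alleg), lengths(dispo))

# Hypothetical helper: extend each vector of pieces to the requested
# length; assigning a larger length() fills the tail with NA
pad_to <- function(parts, k) {
  unlist(Map(function(p, len) { length(p) <- len; p }, parts, k))
}

# Repeat each original row n times and attach the padded pieces
final_df <- data.frame(
  df[rep(seq_len(nrow(df)), n), ],
  Allegation_detail = pad_to(alleg, n),
  Disposition_detail = pad_to(dispo, n),
  stringsAsFactors = FALSE,
  row.names = NULL
)
final_df
```

Using `pmax` rather than the allegation count alone means a row with more dispositions than allegations would also be kept, with NA on the allegation side.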