The hard part, as usual, is gathering the data in the first place, but I happened to have it archived from the US Census. Run the following lines of code after running the "State/Region data" block below:
df <- data.frame(emails=c("[email protected]","[email protected]","[email protected]",
"[email protected]","[email protected]"),
states=c("NV","CA","UT","AZ","IA"))
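# Look up each state's region by exact match; grep() on the list would do
# substring matching, so "Kansas" would also hit "Arkansas".
df$regions <- sapply(as.character(df$states),
function(x) names(region.list)[sapply(region.list, function(r) x %in% r)],
USE.NAMES=FALSE)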
#Then write to desktop, for example, with:
write.csv(df,"~/Desktop/nameHere.csv",row.names=FALSE)
Output:
             emails states regions
1 [email protected]     NV    West
2 [email protected]     CA    West
3 [email protected]     UT    West
4 [email protected]     AZ    West
5 [email protected]     IA Midwest
State/Region data:
NE.name <- c("Connecticut","Maine","Massachusetts","New Hampshire",
"Rhode Island","Vermont","New Jersey","New York",
"Pennsylvania")
NE.abrv <- c("CT","ME","MA","NH","RI","VT","NJ","NY","PA")
NE.ref <- c(NE.name,NE.abrv)
MW.name <- c("Indiana","Illinois","Michigan","Ohio","Wisconsin",
"Iowa","Kansas","Minnesota","Missouri","Nebraska",
"North Dakota","South Dakota")
MW.abrv <- c("IN","IL","MI","OH","WI","IA","KS","MN","MO","NE",
"ND","SD")
MW.ref <- c(MW.name,MW.abrv)
S.name <- c("Delaware","District of Columbia","Florida","Georgia",
"Maryland","North Carolina","South Carolina","Virginia",
"West Virginia","Alabama","Kentucky","Mississippi",
"Tennessee","Arkansas","Louisiana","Oklahoma","Texas")
S.abrv <- c("DE","DC","FL","GA","MD","NC","SC","VA","WV","AL",
"KY","MS","TN","AR","LA","OK","TX")
S.ref <- c(S.name,S.abrv)
W.name <- c("Arizona","Colorado","Idaho","New Mexico","Montana",
"Utah","Nevada","Wyoming","Alaska","California",
"Hawaii","Oregon","Washington")
W.abrv <- c("AZ","CO","ID","NM","MT","UT","NV","WY","AK","CA",
"HI","OR","WA")
W.ref <- c(W.name,W.abrv)
region.list <- list(
Northeast=NE.ref,
Midwest=MW.ref,
South=S.ref,
West=W.ref)
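If you need to do this lookup often, one alternative is to flatten the list into a named vector once and index into it by name. This is only a sketch: 'regions.small' is a trimmed, hypothetical stand-in for 'region.list' so the snippet runs on its own, and 'region.lookup' is a name I made up. Substitute the full 'region.list' from above in practice.

```r
# Hypothetical, trimmed stand-in for region.list (same shape, fewer states).
regions.small <- list(
  Midwest = c("Iowa", "Kansas", "IA", "KS"),
  South   = c("Arkansas", "AR"),
  West    = c("Nevada", "California", "NV", "CA"))

# Flatten into a named character vector:
# names are state names/abbreviations, values are region names.
region.lookup <- setNames(
  rep(names(regions.small), lengths(regions.small)),
  unlist(regions.small, use.names = FALSE))

# Indexing by name is an exact match; unknown states come back as NA.
states <- c("NV", "CA", "IA", "Kansas", "XX")
result <- unname(region.lookup[states])
print(result)
# "West" "West" "Midwest" "Midwest" NA
```

Because the lookup is by exact name, "Kansas" maps to Midwest only and never picks up "Arkansas", and a state that is missing from the list shows up as NA instead of silently producing an empty result.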
You may need 'split(df1$states, df1$regions)', or, if you need a separate column per region, 'dcast', e.g. 'library(data.table); dcast(setDT(df1), rowid(regions) ~ regions, value.var = "states")' – akrun
@akrun Thank you for the start, but I have a quick question: how would I group these states into regions? That regions column is the output I want. – sim
I think the best option would be to have a 'list' built with 'split', as mentioned in my comment above. – akrun
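For reference, here is a minimal sketch of the 'split' suggestion from the comments. The data frame 'df1' below is a hypothetical stand-in with made-up rows (the real 'df1' is not shown in the thread), and 'by.region' is just an illustrative name.

```r
# Hypothetical stand-in for the df1 mentioned in the comments.
df1 <- data.frame(
  states  = c("NV", "CA", "IA", "UT", "KS"),
  regions = c("West", "West", "Midwest", "West", "Midwest"),
  stringsAsFactors = FALSE)

# split() returns a named list: one character vector of states per region.
by.region <- split(df1$states, df1$regions)

print(names(by.region))   # "Midwest" "West"
print(by.region$West)     # "NV" "CA" "UT"
print(by.region$Midwest)  # "IA" "KS"
```

The list form is convenient when the regions have different numbers of states, since nothing has to be padded out to equal length the way it would be in a wide table.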