2016-05-19 11 views

Subset of a data.frame based on a date-time column

I have the following data.frame with about 18 million records:

         Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
1             M  28   69           85 2010-02-16 12:42:32          85 2010-02-16 12:45:37        3.1
2             M  30   11           85 2010-02-16 12:53:29          26 2010-02-16 13:22:23       28.9
3             M  37   43           85 2010-02-16 13:21:46          13 2010-02-16 13:49:47       28.0
4             M  37  826           22 2010-02-16 14:06:40          85 2010-02-16 14:23:13       16.6
5             M  19  662           27 2010-02-16 15:31:15          74 2010-02-16 16:29:17       58.0
6             F  25    8           85 2010-02-16 16:31:53          20 2010-02-16 16:49:26       17.6
17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7

I want to write a function that subsets the trips for a given month, for example January 2015. The input is "201501" and the result is:

         Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7
Do you want to filter by 'DateTimeDepa' or 'DateTimeArri'? I ask because a trip could depart at '2015-01-31 23:00:00' and arrive at '2015-02-01 1:00:00'. –

Great question. Only by DateTimeDepa. – LuisMoncayo
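
To make the point in the comments concrete, here is a minimal sketch with made-up data (the `trips` data frame and its two rows are illustrative assumptions, not part of the real data set). One trip departs in January and arrives in February, so the two columns give different subsets:

```r
# Made-up example data: the first trip crosses the month boundary.
trips <- data.frame(
  DateTimeDepa = as.POSIXct(c("2015-01-31 23:00:00", "2015-01-15 08:00:00"), tz = "UTC"),
  DateTimeArri = as.POSIXct(c("2015-02-01 01:00:00", "2015-01-15 08:20:00"), tz = "UTC")
)

# Filter on the year-month of each column in turn.
by_depa <- trips[format(trips$DateTimeDepa, "%Y%m") == "201501", ]
by_arri <- trips[format(trips$DateTimeArri, "%Y%m") == "201501", ]

nrow(by_depa)  # 2: both trips depart in January
nrow(by_arri)  # 1: only the second trip also arrives in January
```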

Answers


As suggested in this answer, you could convert your data set into an xts object and then use its smart subsetting options:

xtsdf <- xts::xts(df, order.by = df$DateTimeDepa) 
xtsdf["201501"] 

Which gives:

#     Gender Age Bici DepartingSta DateTimeDepa   ArrivingSta 
#2015-01-30 23:58:33 "F" "26" "2760" "121"  "2015-01-30 23:58:33" "106"  
#2015-01-30 23:58:50 "M" "22" "4077" " 71"  "2015-01-30 23:58:50" "190"  
#2015-01-30 23:58:55 "M" "32" " 699" "154"  "2015-01-30 23:58:55" "165"  
#2015-01-30 23:59:20 "F" "26" "4044" " 64"  "2015-01-30 23:59:20" " 50"  
#2015-01-30 23:59:23 "M" "26" "3114" " 26"  "2015-01-30 23:59:23" "127"  
#2015-01-30 23:59:55 "M" "25" "4115" "165"  "2015-01-30 23:59:55" " 73"  
#     DateTimeArri   TravelTime 
#2015-01-30 23:58:33 "2015-01-31 00:22:08" "23.6"  
#2015-01-30 23:58:50 "2015-01-31 00:13:24" "14.6"  
#2015-01-30 23:58:55 "2015-01-31 00:02:25" " 3.5"  
#2015-01-30 23:59:20 "2015-01-31 00:05:38" " 6.3"  
#2015-01-30 23:59:23 "2015-01-31 00:12:29" "13.1"  
#2015-01-30 23:59:55 "2015-01-31 00:12:39" "12.7" 

Works perfectly for me, cheers. – LuisMoncayo


Here is how you can solve this in base R with format(), vectorized string comparison, and subset():

df <- data.frame(
  Gender=c('M','M','M','M','M','F','F','M','M','F','M','M'),
  Age=c(28L,30L,37L,37L,19L,25L,26L,22L,32L,26L,26L,25L),
  Bici=c(69L,11L,43L,826L,662L,8L,2760L,4077L,699L,4044L,3114L,4115L),
  DepartingSta=c(85L,85L,85L,22L,27L,85L,121L,71L,154L,64L,26L,165L),
  DateTimeDepa=as.POSIXct(c(
    '2010-02-16 12:42:32','2010-02-16 12:53:29','2010-02-16 13:21:46',
    '2010-02-16 14:06:40','2010-02-16 15:31:15','2010-02-16 16:31:53',
    '2015-01-30 23:58:33','2015-01-30 23:58:50','2015-01-30 23:58:55',
    '2015-01-30 23:59:20','2015-01-30 23:59:23','2015-01-30 23:59:55'
  )),
  ArrivingSta=c(85L,26L,13L,85L,74L,20L,106L,190L,165L,50L,127L,73L),
  DateTimeArri=as.POSIXct(c(
    '2010-02-16 12:45:37','2010-02-16 13:22:23','2010-02-16 13:49:47',
    '2010-02-16 14:23:13','2010-02-16 16:29:17','2010-02-16 16:49:26',
    '2015-01-31 00:22:08','2015-01-31 00:13:24','2015-01-31 00:02:25',
    '2015-01-31 00:05:38','2015-01-31 00:12:29','2015-01-31 00:12:39'
  )),
  TravelTime=c(3.1,28.9,28,16.6,58,17.6,23.6,14.6,3.5,6.3,13.1,12.7),
  row.names=c('1','2','3','4','5','6','17919307','17919308','17919309','17919310','17919311','17919312'),
  stringsAsFactors=FALSE
);
ym <- '201501';

df; 
##          Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
## 1             M  28   69           85 2010-02-16 12:42:32          85 2010-02-16 12:45:37        3.1
## 2             M  30   11           85 2010-02-16 12:53:29          26 2010-02-16 13:22:23       28.9
## 3             M  37   43           85 2010-02-16 13:21:46          13 2010-02-16 13:49:47       28.0
## 4             M  37  826           22 2010-02-16 14:06:40          85 2010-02-16 14:23:13       16.6
## 5             M  19  662           27 2010-02-16 15:31:15          74 2010-02-16 16:29:17       58.0
## 6             F  25    8           85 2010-02-16 16:31:53          20 2010-02-16 16:49:26       17.6
## 17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
## 17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
## 17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
## 17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
## 17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
## 17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7
ym; 
## [1] "201501" 

subset(df,format(DateTimeDepa,'%Y%m')==ym); 
##          Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
## 17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
## 17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
## 17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
## 17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
## 17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
## 17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7
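
Since the question asks for a function, the same idea could be wrapped up as in the sketch below. This is not the answerer's code: `subset_month` and its `col` argument are hypothetical names, and instead of formatting every timestamp it compares the date-time column against a half-open interval [start of month, start of next month). With around 18 million rows that will likely be cheaper than building a string per row, though this has not been benchmarked here.

```r
# Hypothetical helper (an assumption, not from the answers): keep the rows
# whose `col` timestamp falls in the month given as a "YYYYMM" string.
subset_month <- function(df, ym, col = "DateTimeDepa") {
  stopifnot(is.character(ym), length(ym) == 1L, grepl("^[0-9]{6}$", ym))
  # Reuse the column's time zone; "" means the session's local time zone.
  tz <- attr(df[[col]], "tzone")
  tz <- if (is.null(tz)) "" else tz[1]
  start <- as.POSIXct(paste0(ym, "01"), format = "%Y%m%d", tz = tz)
  end <- seq(start, by = "month", length.out = 2)[2]  # first instant of next month
  # which() drops rows where the timestamp is NA.
  df[which(df[[col]] >= start & df[[col]] < end), , drop = FALSE]
}

# Small made-up data set to try it on.
trips <- data.frame(
  Bici = c(69L, 2760L, 4077L, 11L),
  DateTimeDepa = as.POSIXct(c("2010-02-16 12:42:32", "2015-01-30 23:58:33",
                              "2015-01-31 23:59:59", "2015-02-01 00:00:00"),
                            tz = "UTC")
)

jan <- subset_month(trips, "201501")
jan$Bici  # 2760 4077
```

The upper bound is exclusive, so a trip departing at exactly midnight on the first of the following month is not counted in the requested month.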