What is happening here in lubridate::unique.Interval?

Sample data adapted from here: https://www.reddit.com/r/rstats/comments/4j2efe/help_counting_unique_days_in_r_with_overlap_and/

library(dplyr)
library(lubridate)

df = read.table(text = "Start   End
      1/8/2015   1/9/2015
      1/8/2015   1/9/2015
      1/13/2015  1/15/2015
      1/7/2015   1/17/2015
      1/12/2015  1/22/2015
      1/8/2015   1/16/2015", header = TRUE)

Create the intervals:

df %>% transmute(Start = mdy(Start), End = mdy(End), Interval = interval(Start, End)) 

       Start        End                       Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC 
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC 
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC 
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC 
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC 
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC

Now find the unique intervals. What happened to the interval 2015-01-12 UTC--2015-01-22 UTC? It is gone. Is this intended behavior?

.Last.value %>% select(Interval) %>% unique 

         Interval 
1 2015-01-08 UTC--2015-01-09 UTC 
3 2015-01-13 UTC--2015-01-15 UTC 
4 2015-01-07 UTC--2015-01-17 UTC 
6 2015-01-08 UTC--2015-01-16 UTC

Source

2016-05-12 Vlo

The interval 2015-01-12 UTC--2015-01-22 UTC is removed because it is treated as a duplicate of 2015-01-07 UTC--2015-01-17 UTC. The two are not identical objects, but they compare as equal under the == operator.

> intervalDf 
       Start        End                       Interval
1 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC 
2 2015-01-08 2015-01-09 2015-01-08 UTC--2015-01-09 UTC 
3 2015-01-13 2015-01-15 2015-01-13 UTC--2015-01-15 UTC 
4 2015-01-07 2015-01-17 2015-01-07 UTC--2015-01-17 UTC 
5 2015-01-12 2015-01-22 2015-01-12 UTC--2015-01-22 UTC 
6 2015-01-08 2015-01-16 2015-01-08 UTC--2015-01-16 UTC 
> intervalDf[4,3] 
[1] 2015-01-07 UTC--2015-01-17 UTC 

> intervalDf[5,3] 
[1] 2015-01-12 UTC--2015-01-22 UTC 
> intervalDf[4,3] == intervalDf[5,3] 
[1] TRUE

However:

> identical(intervalDf[4,3], intervalDf[5,3]) 
[1] FALSE

This also suggests that unique may be using == as its comparison function. If you want to keep both intervals, you can convert the Interval column to character and then apply unique.
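A minimal sketch of that workaround, assuming lubridate is attached. Rather than relying on how the Interval column itself converts to character, it builds a text key from each interval's endpoints with int_start() and int_end(). The names iv, key and iv_unique are illustrative and do not come from the original post.

```r
library(lubridate)

# The same six intervals as in the question.
iv <- interval(
  mdy(c("1/8/2015", "1/8/2015", "1/13/2015", "1/7/2015", "1/12/2015", "1/8/2015")),
  mdy(c("1/9/2015", "1/9/2015", "1/15/2015", "1/17/2015", "1/22/2015", "1/16/2015"))
)

# Text key built from both endpoints, so two intervals only count as
# duplicates when start AND end agree, not merely their length.
key <- paste(int_start(iv), int_end(iv))

# Keep the first occurrence of each key.
iv_unique <- iv[!duplicated(key)]
length(iv_unique)
```

Only the second interval, which repeats the first exactly, should be dropped this way; the two ten-day intervals have different endpoints and are both kept.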

Update: unique behaves inconsistently on single-column and multi-column data frames.

> dfTest 
  x                       Interval
1 1 2015-01-08 UTC--2015-01-09 UTC 
2 1 2015-01-08 UTC--2015-01-09 UTC 
3 1 2015-01-13 UTC--2015-01-15 UTC 
4 1 2015-01-07 UTC--2015-01-17 UTC 
5 1 2015-01-12 UTC--2015-01-22 UTC 
6 1 2015-01-08 UTC--2015-01-16 UTC 
> unique(dfTest) 
  x                       Interval
1 1 2015-01-08 UTC--2015-01-09 UTC 
3 1 2015-01-13 UTC--2015-01-15 UTC 
4 1 2015-01-07 UTC--2015-01-17 UTC 
5 1 2015-01-12 UTC--2015-01-22 UTC 
6 1 2015-01-08 UTC--2015-01-16 UTC 
> dfTest1 
         Interval 
1 2015-01-08 UTC--2015-01-09 UTC 
2 2015-01-08 UTC--2015-01-09 UTC 
3 2015-01-13 UTC--2015-01-15 UTC 
4 2015-01-07 UTC--2015-01-17 UTC 
5 2015-01-12 UTC--2015-01-22 UTC 
6 2015-01-08 UTC--2015-01-16 UTC 
> unique(dfTest1) 
         Interval 
1 2015-01-08 UTC--2015-01-09 UTC 
3 2015-01-13 UTC--2015-01-15 UTC 
4 2015-01-07 UTC--2015-01-17 UTC 
6 2015-01-08 UTC--2015-01-16 UTC

The definitions of these two methods explain the difference.

> getAnywhere("unique.data.frame")
A single object matching ‘unique.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for unique from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    x[!duplicated(x, fromLast = fromLast, ...), , drop = FALSE]
}
<bytecode: 0x10c2ab0a0>
<environment: namespace:base>
> getAnywhere("duplicated.data.frame")
A single object matching ‘duplicated.data.frame’ was found
It was found in the following places
  package:base
  registered S3 method for duplicated from namespace base
  namespace:base
with value

function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    if (!identical(incomparables, FALSE))
        .NotYetUsed("incomparables != FALSE")
    if (length(x) != 1L)
        duplicated(do.call("paste", c(x, sep = "\r")), fromLast = fromLast)
    else duplicated(x[[1L]], fromLast = fromLast, ...)
}
<bytecode: 0x10c33a4b0>
<environment: namespace:base>

Source

2016-05-13 02:39:56 Psidom

So 'getAnywhere("unique.Interval")' shows that the '@.Data' and '@start' slots of the intervals are stored in a data.frame, and the function then relies on 'unique.data.frame'. '@.Data' appears to be the 'time_length' of the interval. Do you know how I can get the source via 'getAnywhere("==.Interval")'? I may need to open an issue on GitHub to find out whether this is intended. – Vlo

I think the reason is that 'unique' is defined differently for single-column and multi-column data frames. I have added more tests to the answer; hopefully they help in finding the cause. – Psidom

OK, I figured it out. 'class(dfTest1 %>% select(Interval))' is 'data.frame', so 'unique.data.frame' is the problem; 'unique.Interval' works as intended. As a test case, 'unique(unique(new("Interval", .Data = c(864000, 864000), start = structure(c(1420588800, 1421020800), class = c("POSIXct", "POSIXt"), tzone = "UTC"), tzone = "UTC")))' works as intended. – Vlo
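The conclusion in that last comment can be sketched as follows: call unique on the Interval vector itself rather than on a one-column data frame, so that dispatch reaches unique.Interval instead of unique.data.frame. This assumes a lubridate version that provides unique.Interval as described in the comments above; the name iv is illustrative.

```r
library(lubridate)

# The two ten-day intervals from rows 4 and 5 of the question:
# same length, different start and end.
iv <- interval(
  mdy(c("1/7/2015", "1/12/2015")),
  mdy(c("1/17/2015", "1/22/2015"))
)

# Called on the vector, unique dispatches on the Interval class
# rather than going through the single-column data.frame path.
iv_unique <- unique(iv)
length(iv_unique)
```

With a data frame, extracting the column first (for example 'unique(df$Interval)') applies the same idea.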
