Pandas groupby a timedelta greater than N minutes

I have a dataframe with columns for a time, a satellite ID, and a site ID. My goal is to split the dataset into individual "tracks", where each "track" is a unique combination of satellite and site ID. I can do this easily with the standard pandas groupby functionality, specifying by=['site', 'sat']. The additional wrinkle, however, is that if there is a time gap of more than N minutes within a group, the data after the gap should become a new "track".

My question is: what is the best way to compute the timedelta between consecutive rows within my (site, sat) group, identify when that timedelta is greater than N minutes, and create a new group/track at that point?

I think I can compute the time delta between rows with the diff() method. Ideally there would be a way to add a third key to my groupby call that encapsulates the time constraint I want to apply.
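For concreteness, here is a tiny sketch of the diff() idea on a standalone datetime Series (illustrative values only, not part of the real dataset):

import pandas as pd

# Illustration only: diff() on a datetime Series yields Timedeltas, which can
# be compared against an N-minute threshold to flag where a gap occurs.
times = pd.Series(pd.to_datetime(['2016-01-01 00:00:00',
                                  '2016-01-01 00:00:05',
                                  '2016-01-01 00:30:00']))
gaps = times.diff()                          # NaT, 5 seconds, 29 minutes 55 seconds
new_track = gaps > pd.Timedelta(minutes=20)  # False, False, True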

Here is some example code to generate a test dataset and perform the initial site/sat grouping.

import pandas as pd 
import numpy as np 

# Create first sample set. 

N=10 
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s') 
A_data = np.arange(0, N) 
A_site = ['X'] * N 
A_sat = 12345 

# Create second sample set over the same time span but with a different sat 

N=5 
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s') 
B_data = np.arange(0, N) 
B_site = ['X'] * N 
B_sat = 3456 


# Create a third sample set with a new site over the same time span but the 
# same sat as the second set 

N = 10 
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s') 
C_data = np.arange(0, N) 
C_site = ['Y'] * N 
C_sat = 3456 

# Create a fourth sample set with the same sat and site as the third set but 
# more than 20 minutes after the third set. 

N = 5 
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s') 
D_data = np.arange(0, N) 
D_site = ['Y'] * N 
D_sat = 3456 

# Build a data frame for each sample set 

A = pd.DataFrame(index=A_times, data={'data': A_data, 'site' : A_site, 'sat' : A_sat}) 
B = pd.DataFrame(index=B_times, data={'data': B_data, 'site' : B_site, 'sat' : B_sat}) 
C = pd.DataFrame(index=C_times, data={'data': C_data, 'site' : C_site, 'sat' : C_sat}) 
D = pd.DataFrame(index=D_times, data={'data': D_data, 'site' : D_site, 'sat' : D_sat}) 

# mash them into one larger test data frame 

test = pd.concat([A, B, C, D]) 

print(test) 

my_groups = test.groupby(by = ['site', 'sat']) 

for key, g in my_groups: 
    print(key) 
    print(g) 

The output of this is

test =

     data sat site 
2016-01-01 00:00:00  0 12345 X 
2016-01-01 00:00:01  1 12345 X 
2016-01-01 00:00:02  2 12345 X 
2016-01-01 00:00:03  3 12345 X 
2016-01-01 00:00:04  4 12345 X 
2016-01-01 00:00:05  5 12345 X 
2016-01-01 00:00:06  6 12345 X 
2016-01-01 00:00:07  7 12345 X 
2016-01-01 00:00:08  8 12345 X 
2016-01-01 00:00:09  9 12345 X 
2016-01-01 00:00:00  0 3456 X 
2016-01-01 00:00:01  1 3456 X 
2016-01-01 00:00:02  2 3456 X 
2016-01-01 00:00:03  3 3456 X 
2016-01-01 00:00:04  4 3456 X 
2016-01-01 00:01:00  0 3456 Y 
2016-01-01 00:01:01  1 3456 Y 
2016-01-01 00:01:02  2 3456 Y 
2016-01-01 00:01:03  3 3456 Y 
2016-01-01 00:01:04  4 3456 Y 
2016-01-01 00:01:05  5 3456 Y 
2016-01-01 00:01:06  6 3456 Y 
2016-01-01 00:01:07  7 3456 Y 
2016-01-01 00:01:08  8 3456 Y 
2016-01-01 00:01:09  9 3456 Y 
2016-01-01 01:00:00  0 3456 Y 
2016-01-01 01:00:01  1 3456 Y 
2016-01-01 01:00:02  2 3456 Y 
2016-01-01 01:00:03  3 3456 Y 
2016-01-01 01:00:04  4 3456 Y 

and the resulting groups are

('X', 3456) 
        data sat site 
2016-01-01 00:00:00  0 3456 X 
2016-01-01 00:00:01  1 3456 X 
2016-01-01 00:00:02  2 3456 X 
2016-01-01 00:00:03  3 3456 X 
2016-01-01 00:00:04  4 3456 X 
('X', 12345) 
        data sat site 
2016-01-01 00:00:00  0 12345 X 
2016-01-01 00:00:01  1 12345 X 
2016-01-01 00:00:02  2 12345 X 
2016-01-01 00:00:03  3 12345 X 
2016-01-01 00:00:04  4 12345 X 
2016-01-01 00:00:05  5 12345 X 
2016-01-01 00:00:06  6 12345 X 
2016-01-01 00:00:07  7 12345 X 
2016-01-01 00:00:08  8 12345 X 
2016-01-01 00:00:09  9 12345 X 
('Y', 3456) 
        data sat site 
2016-01-01 00:01:00  0 3456 Y 
2016-01-01 00:01:01  1 3456 Y 
2016-01-01 00:01:02  2 3456 Y 
2016-01-01 00:01:03  3 3456 Y 
2016-01-01 00:01:04  4 3456 Y 
2016-01-01 00:01:05  5 3456 Y 
2016-01-01 00:01:06  6 3456 Y 
2016-01-01 00:01:07  7 3456 Y 
2016-01-01 00:01:08  8 3456 Y 
2016-01-01 00:01:09  9 3456 Y 
2016-01-01 01:00:00  0 3456 Y 
2016-01-01 01:00:01  1 3456 Y 
2016-01-01 01:00:02  2 3456 Y 
2016-01-01 01:00:03  3 3456 Y 
2016-01-01 01:00:04  4 3456 Y 

The desired behavior is that the third group above should really be split into two groups because of the more-than-20-minute gap in the data, e.g.

('Y', 3456) 
        data sat site 
2016-01-01 00:01:00  0 3456 Y 
2016-01-01 00:01:01  1 3456 Y 
2016-01-01 00:01:02  2 3456 Y 
2016-01-01 00:01:03  3 3456 Y 
2016-01-01 00:01:04  4 3456 Y 
2016-01-01 00:01:05  5 3456 Y 
2016-01-01 00:01:06  6 3456 Y 
2016-01-01 00:01:07  7 3456 Y 
2016-01-01 00:01:08  8 3456 Y 
2016-01-01 00:01:09  9 3456 Y 

New group here 

2016-01-01 01:00:00  0 3456 Y 
2016-01-01 01:00:01  1 3456 Y 
2016-01-01 01:00:02  2 3456 Y 
2016-01-01 01:00:03  3 3456 Y 
2016-01-01 01:00:04  4 3456 Y 

Any suggestions would be appreciated. Thanks!

Answer


After playing around with this and combining it with the suggestions from this post, I was able to solve the problem. A more complete example test set and the solutions are shown below.

# In[222]: 

import pandas as pd 
import numpy as np 


# In[223]: 

# Create 1st sample set. 

N=10 
A_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s') 
A_data = np.arange(0, N) 
A_site = ['X'] * N 
A_sat = 12345 

# Create 2nd sample set over the same time span but with a different sat 

N=5 
B_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='1s') 
B_data = np.arange(0, N) 
B_site = ['X'] * N 
B_sat = 3456 


# Create a 3rd sample set with a new site over the same time span but the 
# same sat as the 2nd set 

N = 10 
C_times = pd.date_range('2016-01-01T00:01:00', periods=N, freq='1s') 
C_data = np.arange(0, N) 
C_site = ['Y'] * N 
C_sat = 3456 

# Create a 4th sample set with the same sat and site as the 3rd set but 
# more than 20 minutes after the third set. 

N = 5 
D_times = pd.date_range('2016-01-01T01:00:00', periods=N, freq='1s') 
D_data = np.arange(0, N) 
D_site = ['Y'] * N 
D_sat = 3456 

# Create a 5th sample set with a new site ('Z') but the same sat, spanning the 
# same hour as the earlier sets at a one-minute cadence (it overlaps them in time). 

N = 60 
E_times = pd.date_range('2016-01-01T00:00:00', periods=N, freq='60s') 
E_data = np.arange(0, N) 
E_site = ['Z'] * N 
E_sat = 3456 

# Create a 6th sample set with the same sat and site as the 4th set but 
# more than 20 minutes after the fourth set. 

N = 5 
F_times = pd.date_range('2016-01-02T00:00:00', periods=N, freq='60s') 
F_data = np.arange(0, N) 
F_site = ['Y'] * N 
F_sat = 3456 


# In[224]: 

# Build a data frame for each sample set 

A = pd.DataFrame(data={'time': A_times, 'data': A_data, 'site' : A_site, 'sat' : A_sat}) 
B = pd.DataFrame(data={'time': B_times, 'data': B_data, 'site' : B_site, 'sat' : B_sat}) 
C = pd.DataFrame(data={'time': C_times, 'data': C_data, 'site' : C_site, 'sat' : C_sat}) 
D = pd.DataFrame(data={'time': D_times, 'data': D_data, 'site' : D_site, 'sat' : D_sat}) 
E = pd.DataFrame(data={'time': E_times, 'data': E_data, 'site' : E_site, 'sat' : E_sat}) 
F = pd.DataFrame(data={'time': F_times, 'data': F_data, 'site' : F_site, 'sat' : F_sat}) 

# mash them into one larger test data frame 

test = pd.concat([A, B, C, D, E, F]) 


# In[225]: 

print(test) 


# In[226]: 

test.sort_values(['time'], inplace=True) 


# In[227]: 

# This approach doesn't quite work. Note that the group (Y, 3456, 0) really 
# has 2 tracks in it because the overlapping track from (Z, 3456, 0) is screwing up 
# the delta-t calculation and hiding the fact that within the Y, 3456 group there 
# was a large time gap. 

test1 = test.copy() 
test1['delta_t'] = test1['time'].diff() 
test1['track'] = (test1['delta_t'] > pd.Timedelta(minutes=20)).cumsum() 

my_groups = test1.groupby(by = ['site', 'sat', 'track']) 
for key, g in my_groups: 
    print(key) 
    print(g) 

This produces output that looks like:

('X', 3456, 0) 
    data sat site    time delta_t track 
0  0 3456 X 2016-01-01 00:00:00 0 days  0 
1  1 3456 X 2016-01-01 00:00:01 0 days  0 
2  2 3456 X 2016-01-01 00:00:02 0 days  0 
3  3 3456 X 2016-01-01 00:00:03 0 days  0 
4  4 3456 X 2016-01-01 00:00:04 0 days  0 
('X', 12345, 0) 
    data sat site    time delta_t track 
0  0 12345 X 2016-01-01 00:00:00  NaT  0 
1  1 12345 X 2016-01-01 00:00:01 00:00:01  0 
2  2 12345 X 2016-01-01 00:00:02 00:00:01  0 
3  3 12345 X 2016-01-01 00:00:03 00:00:01  0 
4  4 12345 X 2016-01-01 00:00:04 00:00:01  0 
5  5 12345 X 2016-01-01 00:00:05 00:00:01  0 
6  6 12345 X 2016-01-01 00:00:06 00:00:01  0 
7  7 12345 X 2016-01-01 00:00:07 00:00:01  0 
8  8 12345 X 2016-01-01 00:00:08 00:00:01  0 
9  9 12345 X 2016-01-01 00:00:09 00:00:01  0 
('Y', 3456, 0) 
    data sat site    time delta_t track 
0  0 3456 Y 2016-01-01 00:01:00 00:00:00  0 
1  1 3456 Y 2016-01-01 00:01:01 00:00:01  0 
2  2 3456 Y 2016-01-01 00:01:02 00:00:01  0 
3  3 3456 Y 2016-01-01 00:01:03 00:00:01  0 
4  4 3456 Y 2016-01-01 00:01:04 00:00:01  0 
5  5 3456 Y 2016-01-01 00:01:05 00:00:01  0 
6  6 3456 Y 2016-01-01 00:01:06 00:00:01  0 
7  7 3456 Y 2016-01-01 00:01:07 00:00:01  0 
8  8 3456 Y 2016-01-01 00:01:08 00:00:01  0 
9  9 3456 Y 2016-01-01 00:01:09 00:00:01  0 
0  0 3456 Y 2016-01-01 01:00:00 00:01:00  0 
1  1 3456 Y 2016-01-01 01:00:01 00:00:01  0 
2  2 3456 Y 2016-01-01 01:00:02 00:00:01  0 
3  3 3456 Y 2016-01-01 01:00:03 00:00:01  0 
4  4 3456 Y 2016-01-01 01:00:04 00:00:01  0 
('Y', 3456, 1) 
    data sat site    time delta_t track 
0  0 3456 Y 2016-01-02 00:00:00 22:59:56  1 
1  1 3456 Y 2016-01-02 00:01:00 00:01:00  1 
2  2 3456 Y 2016-01-02 00:02:00 00:01:00  1 
3  3 3456 Y 2016-01-02 00:03:00 00:01:00  1 
4  4 3456 Y 2016-01-02 00:04:00 00:01:00  1 
('Z', 3456, 0) 
    data sat site    time delta_t track 
0  0 3456 Z 2016-01-01 00:00:00 00:00:00  0 
1  1 3456 Z 2016-01-01 00:01:00 00:00:51  0 
2  2 3456 Z 2016-01-01 00:02:00 00:00:51  0 
3  3 3456 Z 2016-01-01 00:03:00 00:01:00  0 
4  4 3456 Z 2016-01-01 00:04:00 00:01:00  0 
5  5 3456 Z 2016-01-01 00:05:00 00:01:00  0 
6  6 3456 Z 2016-01-01 00:06:00 00:01:00  0 
7  7 3456 Z 2016-01-01 00:07:00 00:01:00  0 
8  8 3456 Z 2016-01-01 00:08:00 00:01:00  0 
9  9 3456 Z 2016-01-01 00:09:00 00:01:00  0 
10 10 3456 Z 2016-01-01 00:10:00 00:01:00  0 
11 11 3456 Z 2016-01-01 00:11:00 00:01:00  0 
12 12 3456 Z 2016-01-01 00:12:00 00:01:00  0 
13 13 3456 Z 2016-01-01 00:13:00 00:01:00  0 
14 14 3456 Z 2016-01-01 00:14:00 00:01:00  0 
15 15 3456 Z 2016-01-01 00:15:00 00:01:00  0 
16 16 3456 Z 2016-01-01 00:16:00 00:01:00  0 
17 17 3456 Z 2016-01-01 00:17:00 00:01:00  0 
18 18 3456 Z 2016-01-01 00:18:00 00:01:00  0 
19 19 3456 Z 2016-01-01 00:19:00 00:01:00  0 
20 20 3456 Z 2016-01-01 00:20:00 00:01:00  0 
21 21 3456 Z 2016-01-01 00:21:00 00:01:00  0 
22 22 3456 Z 2016-01-01 00:22:00 00:01:00  0 
23 23 3456 Z 2016-01-01 00:23:00 00:01:00  0 
24 24 3456 Z 2016-01-01 00:24:00 00:01:00  0 
25 25 3456 Z 2016-01-01 00:25:00 00:01:00  0 
26 26 3456 Z 2016-01-01 00:26:00 00:01:00  0 
27 27 3456 Z 2016-01-01 00:27:00 00:01:00  0 
28 28 3456 Z 2016-01-01 00:28:00 00:01:00  0 
29 29 3456 Z 2016-01-01 00:29:00 00:01:00  0 
30 30 3456 Z 2016-01-01 00:30:00 00:01:00  0 
31 31 3456 Z 2016-01-01 00:31:00 00:01:00  0 
32 32 3456 Z 2016-01-01 00:32:00 00:01:00  0 
33 33 3456 Z 2016-01-01 00:33:00 00:01:00  0 
34 34 3456 Z 2016-01-01 00:34:00 00:01:00  0 
35 35 3456 Z 2016-01-01 00:35:00 00:01:00  0 
36 36 3456 Z 2016-01-01 00:36:00 00:01:00  0 
37 37 3456 Z 2016-01-01 00:37:00 00:01:00  0 
38 38 3456 Z 2016-01-01 00:38:00 00:01:00  0 
39 39 3456 Z 2016-01-01 00:39:00 00:01:00  0 
40 40 3456 Z 2016-01-01 00:40:00 00:01:00  0 
41 41 3456 Z 2016-01-01 00:41:00 00:01:00  0 
42 42 3456 Z 2016-01-01 00:42:00 00:01:00  0 
43 43 3456 Z 2016-01-01 00:43:00 00:01:00  0 
44 44 3456 Z 2016-01-01 00:44:00 00:01:00  0 
45 45 3456 Z 2016-01-01 00:45:00 00:01:00  0 
46 46 3456 Z 2016-01-01 00:46:00 00:01:00  0 
47 47 3456 Z 2016-01-01 00:47:00 00:01:00  0 
48 48 3456 Z 2016-01-01 00:48:00 00:01:00  0 
49 49 3456 Z 2016-01-01 00:49:00 00:01:00  0 
50 50 3456 Z 2016-01-01 00:50:00 00:01:00  0 
51 51 3456 Z 2016-01-01 00:51:00 00:01:00  0 
52 52 3456 Z 2016-01-01 00:52:00 00:01:00  0 
53 53 3456 Z 2016-01-01 00:53:00 00:01:00  0 
54 54 3456 Z 2016-01-01 00:54:00 00:01:00  0 
55 55 3456 Z 2016-01-01 00:55:00 00:01:00  0 
56 56 3456 Z 2016-01-01 00:56:00 00:01:00  0 
57 57 3456 Z 2016-01-01 00:57:00 00:01:00  0 
58 58 3456 Z 2016-01-01 00:58:00 00:01:00  0 
59 59 3456 Z 2016-01-01 00:59:00 00:01:00  0 

Note that the group ('Y', 3456, 0) still really has two tracks in it, so this is not a complete solution. Moving on, I then tried this:

# In[228]: 

# This method works. The difference is that when I calculated the 
# delta_t I did it on the results of the groupby. 

# There's an undesirable effect that the track counter doesn't reset with each new 
# (site, sat) pair. It appears to keep counting up. 

test2 = test.copy() 
test2.sort_values(['site', 'sat', 'time'], inplace=True) 
test2['delta_t'] = test2.groupby(['site', 'sat'])['time'].diff()  
test2['track'] = (test2['delta_t'] > pd.Timedelta(minutes=20)).cumsum() 

my_groups = test2.groupby(by = ['site', 'sat', 'track']) 
for key, g in my_groups: 
    print(key) 
    print(g) 

The output is:

('X', 3456, 0) 
    data sat site    time delta_t track 
0  0 3456 X 2016-01-01 00:00:00  NaT  0 
1  1 3456 X 2016-01-01 00:00:01 00:00:01  0 
2  2 3456 X 2016-01-01 00:00:02 00:00:01  0 
3  3 3456 X 2016-01-01 00:00:03 00:00:01  0 
4  4 3456 X 2016-01-01 00:00:04 00:00:01  0 
('X', 12345, 0) 
    data sat site    time delta_t track 
0  0 12345 X 2016-01-01 00:00:00  NaT  0 
1  1 12345 X 2016-01-01 00:00:01 00:00:01  0 
2  2 12345 X 2016-01-01 00:00:02 00:00:01  0 
3  3 12345 X 2016-01-01 00:00:03 00:00:01  0 
4  4 12345 X 2016-01-01 00:00:04 00:00:01  0 
5  5 12345 X 2016-01-01 00:00:05 00:00:01  0 
6  6 12345 X 2016-01-01 00:00:06 00:00:01  0 
7  7 12345 X 2016-01-01 00:00:07 00:00:01  0 
8  8 12345 X 2016-01-01 00:00:08 00:00:01  0 
9  9 12345 X 2016-01-01 00:00:09 00:00:01  0 
('Y', 3456, 0) 
    data sat site    time delta_t track 
0  0 3456 Y 2016-01-01 00:01:00  NaT  0 
1  1 3456 Y 2016-01-01 00:01:01 00:00:01  0 
2  2 3456 Y 2016-01-01 00:01:02 00:00:01  0 
3  3 3456 Y 2016-01-01 00:01:03 00:00:01  0 
4  4 3456 Y 2016-01-01 00:01:04 00:00:01  0 
5  5 3456 Y 2016-01-01 00:01:05 00:00:01  0 
6  6 3456 Y 2016-01-01 00:01:06 00:00:01  0 
7  7 3456 Y 2016-01-01 00:01:07 00:00:01  0 
8  8 3456 Y 2016-01-01 00:01:08 00:00:01  0 
9  9 3456 Y 2016-01-01 00:01:09 00:00:01  0 
('Y', 3456, 1) 
    data sat site    time delta_t track 
0  0 3456 Y 2016-01-01 01:00:00 00:58:51  1 
1  1 3456 Y 2016-01-01 01:00:01 00:00:01  1 
2  2 3456 Y 2016-01-01 01:00:02 00:00:01  1 
3  3 3456 Y 2016-01-01 01:00:03 00:00:01  1 
4  4 3456 Y 2016-01-01 01:00:04 00:00:01  1 
('Y', 3456, 2) 
    data sat site    time delta_t track 
0  0 3456 Y 2016-01-02 00:00:00 22:59:56  2 
1  1 3456 Y 2016-01-02 00:01:00 00:01:00  2 
2  2 3456 Y 2016-01-02 00:02:00 00:01:00  2 
3  3 3456 Y 2016-01-02 00:03:00 00:01:00  2 
4  4 3456 Y 2016-01-02 00:04:00 00:01:00  2 
('Z', 3456, 2) 
    data sat site    time delta_t track 
0  0 3456 Z 2016-01-01 00:00:00  NaT  2 
1  1 3456 Z 2016-01-01 00:01:00 00:01:00  2 
2  2 3456 Z 2016-01-01 00:02:00 00:01:00  2 
3  3 3456 Z 2016-01-01 00:03:00 00:01:00  2 
4  4 3456 Z 2016-01-01 00:04:00 00:01:00  2 
5  5 3456 Z 2016-01-01 00:05:00 00:01:00  2 
6  6 3456 Z 2016-01-01 00:06:00 00:01:00  2 
7  7 3456 Z 2016-01-01 00:07:00 00:01:00  2 
8  8 3456 Z 2016-01-01 00:08:00 00:01:00  2 
9  9 3456 Z 2016-01-01 00:09:00 00:01:00  2 
10 10 3456 Z 2016-01-01 00:10:00 00:01:00  2 
11 11 3456 Z 2016-01-01 00:11:00 00:01:00  2 
12 12 3456 Z 2016-01-01 00:12:00 00:01:00  2 
13 13 3456 Z 2016-01-01 00:13:00 00:01:00  2 
14 14 3456 Z 2016-01-01 00:14:00 00:01:00  2 
15 15 3456 Z 2016-01-01 00:15:00 00:01:00  2 
16 16 3456 Z 2016-01-01 00:16:00 00:01:00  2 
17 17 3456 Z 2016-01-01 00:17:00 00:01:00  2 
18 18 3456 Z 2016-01-01 00:18:00 00:01:00  2 
19 19 3456 Z 2016-01-01 00:19:00 00:01:00  2 
20 20 3456 Z 2016-01-01 00:20:00 00:01:00  2 
21 21 3456 Z 2016-01-01 00:21:00 00:01:00  2 
22 22 3456 Z 2016-01-01 00:22:00 00:01:00  2 
23 23 3456 Z 2016-01-01 00:23:00 00:01:00  2 
24 24 3456 Z 2016-01-01 00:24:00 00:01:00  2 
25 25 3456 Z 2016-01-01 00:25:00 00:01:00  2 
26 26 3456 Z 2016-01-01 00:26:00 00:01:00  2 
27 27 3456 Z 2016-01-01 00:27:00 00:01:00  2 
28 28 3456 Z 2016-01-01 00:28:00 00:01:00  2 
29 29 3456 Z 2016-01-01 00:29:00 00:01:00  2 
30 30 3456 Z 2016-01-01 00:30:00 00:01:00  2 
31 31 3456 Z 2016-01-01 00:31:00 00:01:00  2 
32 32 3456 Z 2016-01-01 00:32:00 00:01:00  2 
33 33 3456 Z 2016-01-01 00:33:00 00:01:00  2 
34 34 3456 Z 2016-01-01 00:34:00 00:01:00  2 
35 35 3456 Z 2016-01-01 00:35:00 00:01:00  2 
36 36 3456 Z 2016-01-01 00:36:00 00:01:00  2 
37 37 3456 Z 2016-01-01 00:37:00 00:01:00  2 
38 38 3456 Z 2016-01-01 00:38:00 00:01:00  2 
39 39 3456 Z 2016-01-01 00:39:00 00:01:00  2 
40 40 3456 Z 2016-01-01 00:40:00 00:01:00  2 
41 41 3456 Z 2016-01-01 00:41:00 00:01:00  2 
42 42 3456 Z 2016-01-01 00:42:00 00:01:00  2 
43 43 3456 Z 2016-01-01 00:43:00 00:01:00  2 
44 44 3456 Z 2016-01-01 00:44:00 00:01:00  2 
45 45 3456 Z 2016-01-01 00:45:00 00:01:00  2 
46 46 3456 Z 2016-01-01 00:46:00 00:01:00  2 
47 47 3456 Z 2016-01-01 00:47:00 00:01:00  2 
48 48 3456 Z 2016-01-01 00:48:00 00:01:00  2 
49 49 3456 Z 2016-01-01 00:49:00 00:01:00  2 
50 50 3456 Z 2016-01-01 00:50:00 00:01:00  2 
51 51 3456 Z 2016-01-01 00:51:00 00:01:00  2 
52 52 3456 Z 2016-01-01 00:52:00 00:01:00  2 
53 53 3456 Z 2016-01-01 00:53:00 00:01:00  2 
54 54 3456 Z 2016-01-01 00:54:00 00:01:00  2 
55 55 3456 Z 2016-01-01 00:55:00 00:01:00  2 
56 56 3456 Z 2016-01-01 00:56:00 00:01:00  2 
57 57 3456 Z 2016-01-01 00:57:00 00:01:00  2 
58 58 3456 Z 2016-01-01 00:58:00 00:01:00  2 
59 59 3456 Z 2016-01-01 00:59:00 00:01:00  2 

The tracks now split correctly, but it would be nice if the track counter reset to 0 with each new (site, sat) pair.

# In[229]: 

# This method works. The difference is that when I calculate the 
# "track" counter I'm doing the cumulative sum on the results of 
# the groupby. It also resets the track counter with each new 
# site, sat group. 

test3 = test.copy() 

test3.sort_values(['site', 'sat', 'time'], inplace=True) 
test3['delta_t'] = test3.groupby(['site', 'sat'])['time'].diff()  

# calculate an intermediate flag column. If you try to eliminate this 
# and put the boolean test directly into the 'track' calculation pandas 
# will throw a recursion error. 

test3['new_track'] = test3['delta_t'] > pd.Timedelta(minutes=20) 

# The to_numeric call is used to convert from a float to an integer. 

test3['track'] = pd.to_numeric(test3.groupby(['site', 'sat'])['new_track'].cumsum(), downcast='integer') 

my_groups = test3.groupby(by = ['site', 'sat', 'track']) 
for key, g in my_groups: 
    print(key) 
    print(g) 

which gives the following output:

('X', 3456, 0) 
    data sat site    time delta_t new_track track 
0  0 3456 X 2016-01-01 00:00:00  NaT  False  0 
1  1 3456 X 2016-01-01 00:00:01 00:00:01  False  0 
2  2 3456 X 2016-01-01 00:00:02 00:00:01  False  0 
3  3 3456 X 2016-01-01 00:00:03 00:00:01  False  0 
4  4 3456 X 2016-01-01 00:00:04 00:00:01  False  0 
('X', 12345, 0) 
    data sat site    time delta_t new_track track 
0  0 12345 X 2016-01-01 00:00:00  NaT  False  0 
1  1 12345 X 2016-01-01 00:00:01 00:00:01  False  0 
2  2 12345 X 2016-01-01 00:00:02 00:00:01  False  0 
3  3 12345 X 2016-01-01 00:00:03 00:00:01  False  0 
4  4 12345 X 2016-01-01 00:00:04 00:00:01  False  0 
5  5 12345 X 2016-01-01 00:00:05 00:00:01  False  0 
6  6 12345 X 2016-01-01 00:00:06 00:00:01  False  0 
7  7 12345 X 2016-01-01 00:00:07 00:00:01  False  0 
8  8 12345 X 2016-01-01 00:00:08 00:00:01  False  0 
9  9 12345 X 2016-01-01 00:00:09 00:00:01  False  0 
('Y', 3456, 0) 
    data sat site    time delta_t new_track track 
0  0 3456 Y 2016-01-01 00:01:00  NaT  False  0 
1  1 3456 Y 2016-01-01 00:01:01 00:00:01  False  0 
2  2 3456 Y 2016-01-01 00:01:02 00:00:01  False  0 
3  3 3456 Y 2016-01-01 00:01:03 00:00:01  False  0 
4  4 3456 Y 2016-01-01 00:01:04 00:00:01  False  0 
5  5 3456 Y 2016-01-01 00:01:05 00:00:01  False  0 
6  6 3456 Y 2016-01-01 00:01:06 00:00:01  False  0 
7  7 3456 Y 2016-01-01 00:01:07 00:00:01  False  0 
8  8 3456 Y 2016-01-01 00:01:08 00:00:01  False  0 
9  9 3456 Y 2016-01-01 00:01:09 00:00:01  False  0 
('Y', 3456, 1) 
    data sat site    time delta_t new_track track 
0  0 3456 Y 2016-01-01 01:00:00 00:58:51  True  1 
1  1 3456 Y 2016-01-01 01:00:01 00:00:01  False  1 
2  2 3456 Y 2016-01-01 01:00:02 00:00:01  False  1 
3  3 3456 Y 2016-01-01 01:00:03 00:00:01  False  1 
4  4 3456 Y 2016-01-01 01:00:04 00:00:01  False  1 
('Y', 3456, 2) 
    data sat site    time delta_t new_track track 
0  0 3456 Y 2016-01-02 00:00:00 22:59:56  True  2 
1  1 3456 Y 2016-01-02 00:01:00 00:01:00  False  2 
2  2 3456 Y 2016-01-02 00:02:00 00:01:00  False  2 
3  3 3456 Y 2016-01-02 00:03:00 00:01:00  False  2 
4  4 3456 Y 2016-01-02 00:04:00 00:01:00  False  2 
('Z', 3456, 0) 
    data sat site    time delta_t new_track track 
0  0 3456 Z 2016-01-01 00:00:00  NaT  False  0 
1  1 3456 Z 2016-01-01 00:01:00 00:01:00  False  0 
2  2 3456 Z 2016-01-01 00:02:00 00:01:00  False  0 
3  3 3456 Z 2016-01-01 00:03:00 00:01:00  False  0 
4  4 3456 Z 2016-01-01 00:04:00 00:01:00  False  0 
5  5 3456 Z 2016-01-01 00:05:00 00:01:00  False  0 
6  6 3456 Z 2016-01-01 00:06:00 00:01:00  False  0 
7  7 3456 Z 2016-01-01 00:07:00 00:01:00  False  0 
8  8 3456 Z 2016-01-01 00:08:00 00:01:00  False  0 
9  9 3456 Z 2016-01-01 00:09:00 00:01:00  False  0 
10 10 3456 Z 2016-01-01 00:10:00 00:01:00  False  0 
11 11 3456 Z 2016-01-01 00:11:00 00:01:00  False  0 
12 12 3456 Z 2016-01-01 00:12:00 00:01:00  False  0 
13 13 3456 Z 2016-01-01 00:13:00 00:01:00  False  0 
14 14 3456 Z 2016-01-01 00:14:00 00:01:00  False  0 
15 15 3456 Z 2016-01-01 00:15:00 00:01:00  False  0 
16 16 3456 Z 2016-01-01 00:16:00 00:01:00  False  0 
17 17 3456 Z 2016-01-01 00:17:00 00:01:00  False  0 
18 18 3456 Z 2016-01-01 00:18:00 00:01:00  False  0 
19 19 3456 Z 2016-01-01 00:19:00 00:01:00  False  0 
20 20 3456 Z 2016-01-01 00:20:00 00:01:00  False  0 
21 21 3456 Z 2016-01-01 00:21:00 00:01:00  False  0 
22 22 3456 Z 2016-01-01 00:22:00 00:01:00  False  0 
23 23 3456 Z 2016-01-01 00:23:00 00:01:00  False  0 
24 24 3456 Z 2016-01-01 00:24:00 00:01:00  False  0 
25 25 3456 Z 2016-01-01 00:25:00 00:01:00  False  0 
26 26 3456 Z 2016-01-01 00:26:00 00:01:00  False  0 
27 27 3456 Z 2016-01-01 00:27:00 00:01:00  False  0 
28 28 3456 Z 2016-01-01 00:28:00 00:01:00  False  0 
29 29 3456 Z 2016-01-01 00:29:00 00:01:00  False  0 
30 30 3456 Z 2016-01-01 00:30:00 00:01:00  False  0 
31 31 3456 Z 2016-01-01 00:31:00 00:01:00  False  0 
32 32 3456 Z 2016-01-01 00:32:00 00:01:00  False  0 
33 33 3456 Z 2016-01-01 00:33:00 00:01:00  False  0 
34 34 3456 Z 2016-01-01 00:34:00 00:01:00  False  0 
35 35 3456 Z 2016-01-01 00:35:00 00:01:00  False  0 
36 36 3456 Z 2016-01-01 00:36:00 00:01:00  False  0 
37 37 3456 Z 2016-01-01 00:37:00 00:01:00  False  0 
38 38 3456 Z 2016-01-01 00:38:00 00:01:00  False  0 
39 39 3456 Z 2016-01-01 00:39:00 00:01:00  False  0 
40 40 3456 Z 2016-01-01 00:40:00 00:01:00  False  0 
41 41 3456 Z 2016-01-01 00:41:00 00:01:00  False  0 
42 42 3456 Z 2016-01-01 00:42:00 00:01:00  False  0 
43 43 3456 Z 2016-01-01 00:43:00 00:01:00  False  0 
44 44 3456 Z 2016-01-01 00:44:00 00:01:00  False  0 
45 45 3456 Z 2016-01-01 00:45:00 00:01:00  False  0 
46 46 3456 Z 2016-01-01 00:46:00 00:01:00  False  0 
47 47 3456 Z 2016-01-01 00:47:00 00:01:00  False  0 
48 48 3456 Z 2016-01-01 00:48:00 00:01:00  False  0 
49 49 3456 Z 2016-01-01 00:49:00 00:01:00  False  0 
50 50 3456 Z 2016-01-01 00:50:00 00:01:00  False  0 
51 51 3456 Z 2016-01-01 00:51:00 00:01:00  False  0 
52 52 3456 Z 2016-01-01 00:52:00 00:01:00  False  0 
53 53 3456 Z 2016-01-01 00:53:00 00:01:00  False  0 
54 54 3456 Z 2016-01-01 00:54:00 00:01:00  False  0 
55 55 3456 Z 2016-01-01 00:55:00 00:01:00  False  0 
56 56 3456 Z 2016-01-01 00:56:00 00:01:00  False  0 
57 57 3456 Z 2016-01-01 00:57:00 00:01:00  False  0 
58 58 3456 Z 2016-01-01 00:58:00 00:01:00  False  0 
59 59 3456 Z 2016-01-01 00:59:00 00:01:00  False  0 

So I think I have a good solution at this point. It is odd that pandas throws a recursion error when you try to fold the new_track test directly into the track calculation.
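For reference, a minimal sketch of an alternative that avoids the intermediate new_track column by doing the comparison and cumulative sum inside a groupby transform (untested here; assumes the same test frame and 20-minute threshold as above):

gap = pd.Timedelta(minutes=20)

test4 = test.copy()
test4.sort_values(['site', 'sat', 'time'], inplace=True)

# Sketch only: transform hands the lambda each (site, sat) group's 'time'
# column as a Series, so the diff/threshold/cumsum all happen within the
# group and the counter restarts at 0 for every new (site, sat) pair.
test4['track'] = (test4.groupby(['site', 'sat'])['time']
                       .transform(lambda s: (s.diff() > gap).cumsum()))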