sql - Finding gaps in huge event streams? -


i have 1 million events in postgresql database of format:

id        |   stream_id     |  timestamp ----------+-----------------+----------------- 1         |   7             |  .... 2         |   8             |  .... 

there 50,000 unique streams.

i need find of events time between 2 of events on time period. in other words, need find event pairs there no event in period of time.

for example:

a b c d   e     f              g         h    j k | | | |   |     |              |         |   |  | |                   \____2 mins____/ 

in scenario, want find pair (f, g) since events surrounding gap.

i don't care if query (that) slow, i.e. on 1 million records it's fine if takes hour or so. however, data set keep growing, if it's slow scales sanely.

i have data in mongodb.

what's best way perform query?

you can lag() window function on partition stream_id ordered timestamp. lag() function gives access previous rows in partition; without lag value, previous row. if partition on stream_id ordered time, previous row previous event stream_id.

select stream_id, lag(id) on pair start_id, id end_id,        ("timestamp" - lag("timestamp") on pair) diff my_table diff > interval '2 minutes' window pair (partition stream_id order "timestamp"); 

Comments