i have 1 million events in postgresql database of format:
id | stream_id | timestamp ----------+-----------------+----------------- 1 | 7 | .... 2 | 8 | ....
there 50,000 unique streams.
i need find of events time between 2 of events on time period. in other words, need find event pairs there no event in period of time.
for example:
a b c d e f g h j k | | | | | | | | | | | \____2 mins____/
in scenario, want find pair (f, g) since events surrounding gap.
i don't care if query (that) slow, i.e. on 1 million records it's fine if takes hour or so. however, data set keep growing, if it's slow scales sanely.
i have data in mongodb.
what's best way perform query?
you can lag()
window function on partition stream_id ordered timestamp. lag()
function gives access previous rows in partition; without lag value, previous row. if partition on stream_id ordered time, previous row previous event stream_id.
select stream_id, lag(id) on pair start_id, id end_id, ("timestamp" - lag("timestamp") on pair) diff my_table diff > interval '2 minutes' window pair (partition stream_id order "timestamp");
Comments
Post a Comment