Time Bucket Gapfill
In addition to the time_bucket function, Timescale also contains a set of hyperfunctions used to provide time buckets with data gaps automatically filled. While those functions are great for filling in datasets with irregular data, they also have a bit of a learning curve. This page documents some of the lessons learned/examples using those functions.
Example Dataset
For these examples, a dummy dataset is being used that contains data at irregular intervals. Additionally,
one of the datapoints is actually null
. This is just to illustrate important concepts when
gapfilling. Here is the example datset used:
Example dataset
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
order by original_time
Results:
original_time | val |
---|---|
2024-06-24T17:08:46.069Z | 1.00 |
2024-06-24T18:08:46.069Z | 1.05 |
2024-06-24T18:13:46.069Z | 1.30 |
2024-06-24T18:28:46.069Z | null |
2024-06-24T18:43:46.069Z | 1.25 |
For simplicity, all examples in this document will use a 15 minute time bucket. Here's that same dataset, except each record also includes the 15 minute time bucket it belongs in:
Example dataset with buckets
-- With 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
original_time,
time_bucket('15 minutes'::interval, original_time) as bucket_time,
val
from dummy_data
order by original_time;
Results:
original_time | bucket_time | val |
---|---|---|
2024-06-24T17:08:46.069Z | 2024-06-24T17:00:00.000Z | 1 |
2024-06-24T18:08:46.069Z | 2024-06-24T18:00:00.000Z | 1.05 |
2024-06-24T18:13:46.069Z | 2024-06-24T18:00:00.000Z | 1.3 |
2024-06-24T18:28:46.069Z | 2024-06-24T18:15:00.000Z | null |
2024-06-24T18:43:46.069Z | 2024-06-24T18:30:00.000Z | 1.25 |
Gapfilling
The time_bucket_gapfill
function allows you to fill in missing datapoints as long as your SQL query meets two requirements:
- You include the time bucket field in the
group by
clause for your query - You are constraining the results by a given start and end time for the time field.
At it's simplest, the time_bucket_gapfill
function will just add an additional null
datapoint
for any time buckets that don't actually contain any real data. Example:
Simple gapfill
-- Simple gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
last(val, original_time) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
group by bucket_time
order by bucket_time;
Results:
bucket_time | last_val |
---|---|
2024-06-24T17:00:00.000+Z | 1 |
2024-06-24T17:15:00.000+Z | null |
2024-06-24T17:30:00.000+Z | null |
2024-06-24T17:45:00.000+Z | null |
2024-06-24T18:00:00.000+Z | 1.3 |
2024-06-24T18:15:00.000+Z | null |
2024-06-24T18:30:00.000+Z | 1.25 |
2024-06-24T18:45:00.000+Z | null |
2024-06-24T19:00:00.000+Z | null |
While that can be useful in isolation, time_bucket_gapfill
is more useful when coupled
with another missing data fill function. See the next two
subsections for more details
Using Last Observation Carried Forward Fill
The last observation carried forward (locf) fill function will fill missing
datapoints by using the last observed value as the datapoint for times that are missing data. For example,
if the value of a field was 4.5
for the previous observed time and the current time is missing data, the value 4.5
will be used for the current time. The example below might make this more concrete:
Gapfill using LOCF
-- LOCF gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
locf(last(val, original_time)) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
group by bucket_time
order by bucket_time;
Results:
bucket_time | last_val |
---|---|
2024-06-24T17:00:00.000Z | 1 |
2024-06-24T17:15:00.000Z | 1 |
2024-06-24T17:30:00.000Z | 1 |
2024-06-24T17:45:00.000Z | 1 |
2024-06-24T18:00:00.000Z | 1.3 |
2024-06-24T18:15:00.000Z | null |
2024-06-24T18:30:00.000Z | 1.25 |
2024-06-24T18:45:00.000Z | 1.25 |
2024-06-24T19:00:00.000Z | 1.25 |
In the previous example, notice at time 2024-06-24T18:15:00.000Z
there is still a null
value
in the dataset. See the Null Values and Gapfilling section for more details on why this is. To properly fill in all the gaps, you can remove that null value:
Gapfill using LOCF without null values
-- LOCF gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
locf(last(val, original_time)) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
and val is not null
group by bucket_time
order by bucket_time;
Results:
bucket_time | last_val |
---|---|
2024-06-24T17:00:00.000Z | 1 |
2024-06-24T17:15:00.000Z | 1 |
2024-06-24T17:30:00.000Z | 1 |
2024-06-24T17:45:00.000Z | 1 |
2024-06-24T18:00:00.000Z | 1.3 |
2024-06-24T18:15:00.000Z | 1.3 |
2024-06-24T18:30:00.000Z | 1.25 |
2024-06-24T18:45:00.000Z | 1.25 |
2024-06-24T19:00:00.000Z | 1.25 |
Using Interpolation to Fill
The interpolate fill function will fill missing
datapoints by using linear interpolation to fill in missing data. This means a line will be drawn from the known datapoint to the next known datapoint. The point at the intersection of that line missing datapoint time will be used as the value for that datapoint For example,
if the value of a field was 4.5
at time t-1
and 5.0
at time t
, the value will be gapfilled as 4.75
at time t
. The example below might make this more concrete:
Gapfill using interpolate
-- LOCF gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
interpolate(last(val, original_time)) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
and val is not null
group by bucket_time
order by bucket_time;
Results:
bucket_time | last_val |
---|---|
2024-06-24T17:00:00.000Z | 1 |
2024-06-24T17:15:00.000Z | 1.075 |
2024-06-24T17:30:00.000Z | 1.15 |
2024-06-24T17:45:00.000Z | 1.225 |
2024-06-24T18:00:00.000Z | 1.3 |
2024-06-24T18:15:00.000Z | 1.275 |
2024-06-24T18:30:00.000Z | 1.25 |
2024-06-24T18:45:00.000Z | null |
2024-06-24T19:00:00.000Z | null |
Notice that the interpolate function can only fill in values up to the last observed datapoint because there
is no end point to draw a line to after that datapoint; both the value at 2024-06-24T18:45:00.000Z
and 2024-06-24T19:00:00.000Z
come back is null
. This is a limitation of interpolate
to be aware of that locf
doesn't have.
Null Values and Gapfilling
Timescale's gapfill will not replace actual null
values. This means you might want to just trim out null
values
from the dataset you are gapfilling. If you don't trim out
null
values both locf
and interpolate
will leave the null value in place.
See the examples in the Using Last Observation Carried Forward Fill
to get an idea of what happens with/without null
values.