Skip to main content

Time Bucket Gapfill

In addition to the time_bucket function, Timescale also contains a set of hyperfunctions used to provide time buckets with data gaps automatically filled. While those functions are great for filling in datasets with irregular data, they also have a bit of a learning curve. This page documents some of the lessons learned/examples using those functions.

Example Dataset

For these examples, a dummy dataset is being used that contains data at irregular intervals. Additionally, one of the datapoints is actually null. This is just to illustrate important concepts when gapfilling. Here is the example datset used:

Example dataset
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
order by original_time

Results:

original_timeval
2024-06-24T17:08:46.069Z1.00
2024-06-24T18:08:46.069Z1.05
2024-06-24T18:13:46.069Z1.30
2024-06-24T18:28:46.069Znull
2024-06-24T18:43:46.069Z1.25

For simplicity, all examples in this document will use a 15 minute time bucket. Here's that same dataset, except each record also includes the 15 minute time bucket it belongs in:

Example dataset with buckets
  -- With 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
original_time,
time_bucket('15 minutes'::interval, original_time) as bucket_time,
val
from dummy_data
order by original_time;

Results:

original_timebucket_timeval
2024-06-24T17:08:46.069Z2024-06-24T17:00:00.000Z1
2024-06-24T18:08:46.069Z2024-06-24T18:00:00.000Z1.05
2024-06-24T18:13:46.069Z2024-06-24T18:00:00.000Z1.3
2024-06-24T18:28:46.069Z2024-06-24T18:15:00.000Znull
2024-06-24T18:43:46.069Z2024-06-24T18:30:00.000Z1.25

Gapfilling

The time_bucket_gapfill function allows you to fill in missing datapoints as long as your SQL query meets two requirements:

  1. You include the time bucket field in the group by clause for your query
  2. You are constraining the results by a given start and end time for the time field.

At it's simplest, the time_bucket_gapfill function will just add an additional null datapoint for any time buckets that don't actually contain any real data. Example:

Simple gapfill
-- Simple gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
last(val, original_time) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
group by bucket_time
order by bucket_time;

Results:

bucket_timelast_val
2024-06-24T17:00:00.000+Z1
2024-06-24T17:15:00.000+Znull
2024-06-24T17:30:00.000+Znull
2024-06-24T17:45:00.000+Znull
2024-06-24T18:00:00.000+Z1.3
2024-06-24T18:15:00.000+Znull
2024-06-24T18:30:00.000+Z1.25
2024-06-24T18:45:00.000+Znull
2024-06-24T19:00:00.000+Znull

While that can be useful in isolation, time_bucket_gapfill is more useful when coupled with another missing data fill function. See the next two subsections for more details

Using Last Observation Carried Forward Fill

The last observation carried forward (locf) fill function will fill missing datapoints by using the last observed value as the datapoint for times that are missing data. For example, if the value of a field was 4.5 for the previous observed time and the current time is missing data, the value 4.5 will be used for the current time. The example below might make this more concrete:

Gapfill using LOCF
-- LOCF gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
locf(last(val, original_time)) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
group by bucket_time
order by bucket_time;

Results:

bucket_timelast_val
2024-06-24T17:00:00.000Z1
2024-06-24T17:15:00.000Z1
2024-06-24T17:30:00.000Z1
2024-06-24T17:45:00.000Z1
2024-06-24T18:00:00.000Z1.3
2024-06-24T18:15:00.000Znull
2024-06-24T18:30:00.000Z1.25
2024-06-24T18:45:00.000Z1.25
2024-06-24T19:00:00.000Z1.25

In the previous example, notice at time 2024-06-24T18:15:00.000Z there is still a null value in the dataset. See the Null Values and Gapfilling section for more details on why this is. To properly fill in all the gaps, you can remove that null value:

Gapfill using LOCF without null values
-- LOCF gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
locf(last(val, original_time)) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
and val is not null
group by bucket_time
order by bucket_time;

Results:

bucket_timelast_val
2024-06-24T17:00:00.000Z1
2024-06-24T17:15:00.000Z1
2024-06-24T17:30:00.000Z1
2024-06-24T17:45:00.000Z1
2024-06-24T18:00:00.000Z1.3
2024-06-24T18:15:00.000Z1.3
2024-06-24T18:30:00.000Z1.25
2024-06-24T18:45:00.000Z1.25
2024-06-24T19:00:00.000Z1.25

Using Interpolation to Fill

The interpolate fill function will fill missing datapoints by using linear interpolation to fill in missing data. This means a line will be drawn from the known datapoint to the next known datapoint. The point at the intersection of that line missing datapoint time will be used as the value for that datapoint For example, if the value of a field was 4.5 at time t-1 and 5.0 at time t, the value will be gapfilled as 4.75 at time t. The example below might make this more concrete:

Gapfill using interpolate
-- LOCF gapfill with 15 minute buckets
with dummy_data as (
select '2024-06-24T17:08:46.069Z'::timestamptz as original_time, 1.00 as val
union
select '2024-06-24T18:08:46.069Z', 1.05 as val
union
select '2024-06-24T18:13:46.069Z', 1.30
union
select '2024-06-24T18:28:46.069Z', null
union
select '2024-06-24T18:43:46.069Z', 1.25
)
select
time_bucket_gapfill('15 minutes'::interval, original_time) as bucket_time,
interpolate(last(val, original_time)) as last_val
from dummy_data
where original_time between '2024-06-24T17:00:00.000Z' and '2024-06-24T19:00:00.000Z'
and val is not null
group by bucket_time
order by bucket_time;

Results:

bucket_timelast_val
2024-06-24T17:00:00.000Z1
2024-06-24T17:15:00.000Z1.075
2024-06-24T17:30:00.000Z1.15
2024-06-24T17:45:00.000Z1.225
2024-06-24T18:00:00.000Z1.3
2024-06-24T18:15:00.000Z1.275
2024-06-24T18:30:00.000Z1.25
2024-06-24T18:45:00.000Znull
2024-06-24T19:00:00.000Znull

Notice that the interpolate function can only fill in values up to the last observed datapoint because there is no end point to draw a line to after that datapoint; both the value at 2024-06-24T18:45:00.000Z and 2024-06-24T19:00:00.000Z come back is null. This is a limitation of interpolate to be aware of that locf doesn't have.

Null Values and Gapfilling

Timescale's gapfill will not replace actual null values. This means you might want to just trim out null values from the dataset you are gapfilling. If you don't trim out null values both locf and interpolate will leave the null value in place.

See the examples in the Using Last Observation Carried Forward Fill to get an idea of what happens with/without null values.