BigQuery Reservation Scaling Modes: What They Are and How to Choose the Right One

Csaba Kassai

2026-03-11

BigQuery reservation scaling modes give you explicit control over how a reservation uses capacity beyond its baseline (whether it borrows idle slots, autoscales, or both) and let you cap total consumption with a hard limit. This post explains what the three available BigQuery scaling modes do, when each is the right choice, and how to query your current configuration in INFORMATION_SCHEMA.

Why does it matter whether a reservation borrows idle slots or autoscales?

Here’s an example: your production pipeline reservation has a 500-slot baseline and autoscaling enabled up to 2,000 slots. Last month’s BigQuery invoice was higher than expected. You dig in and find the reservation autoscaled heavily, but you know the batch processing reservation runs overnight and regularly leaves hundreds of idle slots sitting unused. Why didn’t production borrow those? Or did it borrow and still need autoscale on top? You can’t easily tell, and you definitely didn’t configure which behavior to prefer.

This is the core problem scaling modes solve. Until recently, you had one setting: ignore_idle_slots. Off means the reservation can borrow idle capacity from others; on means it never borrows. Whether it borrowed first, autoscaled first, or did both simultaneously wasn’t configurable. That matters for cost: borrowed idle slots are effectively free, because you’ve already paid for that baseline capacity elsewhere, while autoscale slots are billed at the pay-as-you-go rate on top of your reservation. The order and proportion in which a reservation borrows and autoscales therefore have a direct impact on your bill.

Google has recently added scaling modes as part of the reservation predictability feature — a set of controls that let you explicitly define how a reservation uses capacity beyond its baseline, combined with a hard cap on total consumption.

What are BigQuery reservation scaling modes and how do they work?

Traditional reservations have two capacity parameters:

slot_capacity: baseline slots, always allocated, always billed
autoscale.max_slots: additional slots autoscaling can provision on top of the baseline

The total possible consumption is slot_capacity + autoscale.max_slots + borrowed idle slots. The ignore_idle_slots flag controls whether the reservation borrows idle capacity from others. Both autoscaling and idle borrowing could happen simultaneously with no explicit priority between them.

Scaling modes change this in two ways.

First, max_slots replaces autoscale.max_slots as the scaling ceiling. The difference is important: autoscale.max_slots was a cap on additional autoscale slots above the baseline (borrowed idle slots could still be used on top of that). max_slots is a hard cap on total consumption — baseline plus any scaling. With a 500-slot baseline and max_slots of 2,000, the reservation can reach 2,000 total, not 2,500.
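To make the difference concrete, here is a minimal Python sketch of the two ceilings, using the numbers from the example above. The function names are hypothetical, and the check that max_slots is not below the baseline is an assumption of this sketch. It models only the arithmetic described in this post, not BigQuery's scheduler.

```python
def traditional_ceiling(slot_capacity: int, autoscale_max_slots: int) -> int:
    """Baseline plus autoscale ceiling on a traditional reservation.

    Borrowed idle slots are NOT included here: they could still come on top.
    """
    return slot_capacity + autoscale_max_slots


def predictable_headroom(slot_capacity: int, max_slots: int) -> int:
    """Scaling headroom above baseline when max_slots caps TOTAL consumption."""
    # Assumption of this sketch: a cap below the baseline is treated as invalid.
    if max_slots < slot_capacity:
        raise ValueError("max_slots must be at least the baseline")
    return max_slots - slot_capacity


baseline = 500
print(traditional_ceiling(baseline, 2000))   # 2500, before any borrowed idle slots
print(predictable_headroom(baseline, 2000))  # 1500, so the total tops out at 2000
```

The same number, 2,000, means a 2,500-slot ceiling (plus borrowing) under the old parameter and a 2,000-slot total under the new one, which is worth double-checking when converting an existing reservation.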

Second, scaling_mode explicitly controls where the headroom above baseline comes from:

ALL_SLOTS: can scale using idle slot borrowing, autoscaling, or both
IDLE_SLOTS_ONLY: can only scale by borrowing idle slots — no autoscale billing
AUTOSCALE_ONLY: can only scale through autoscaling — no idle slot borrowing

Scaling modes are part of BigQuery’s predictable reservations feature, available on Enterprise and Enterprise Plus editions. To use them, reservation-based fairness must first be enabled on the admin project (it becomes the default for new projects on April 1, 2026). The exception is AUTOSCALE_ONLY, which also works with Standard edition: since Standard doesn’t support baseline slots or idle slot sharing, autoscale-only is the only scaling behavior that applies.

New admin projects you create will follow reservation-based fairness for idle slot distribution by default. Existing projects remain on project-based fairness unless you explicitly enable reservation-based fairness. Starting April 1, 2026, Google Cloud is transitioning the default idle slot distribution logic from project-based to reservation-based fairness across all projects — so this is where the ecosystem is heading regardless. For related concepts, see also reservation groups.

Learn more:
BigQuery Idle Slot Sharing: What the April 2026 Default Change Means for Your Reservations

What is reservation-based fairness, and why is it required?

Reservation-based fairness (enable_reservation_based_fairness) changes how idle slots are distributed when multiple reservations compete for them. It replaces project-based fairness, the default distribution logic for projects created before April 1, 2026.

Without it, BigQuery distributes idle slots evenly across projects, regardless of which reservation they belong to. If your production reservation has 2 assigned projects and your batch reservation has 10, each project gets an equal share of the idle pool. That means the batch reservation’s projects collectively receive 10/12 of available idle capacity, even if production needs it more. The distribution is fair at the project level, but can be heavily skewed at the reservation level.

With reservation-based fairness enabled, BigQuery distributes idle slots evenly across reservations first, then fairly within each reservation’s projects. In the same example, production and batch would each receive half the idle pool, regardless of how many projects are inside each. This reservation-level control is what makes scaling modes meaningful: when you set a reservation to IDLE_SLOTS_ONLY, you need predictable access to idle capacity, not access that shifts based on how many projects happen to be assigned elsewhere.
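The two distribution rules can be sketched in a few lines of Python. This is a simplified model that assumes every project and reservation is competing for idle slots at the same time with equal demand; the function names and the `assignments` mapping are hypothetical and only mirror the example above.

```python
from fractions import Fraction


def project_based_shares(projects_per_reservation: dict[str, int]) -> dict[str, Fraction]:
    """Every project gets an equal share; a reservation's share sums its projects."""
    total_projects = sum(projects_per_reservation.values())
    return {
        name: Fraction(count, total_projects)
        for name, count in projects_per_reservation.items()
    }


def reservation_based_shares(projects_per_reservation: dict[str, int]) -> dict[str, Fraction]:
    """Every reservation gets an equal share, regardless of its project count."""
    reservation_count = len(projects_per_reservation)
    return {name: Fraction(1, reservation_count) for name in projects_per_reservation}


# Hypothetical setup from the example: 2 production projects, 10 batch projects.
assignments = {"production": 2, "batch": 10}
print(project_based_shares(assignments))      # production 1/6, batch 5/6 (= 10/12)
print(reservation_based_shares(assignments))  # production 1/2, batch 1/2
```

Under the project-based rule, adding projects to the batch reservation shrinks production's share; under the reservation-based rule it does not.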

For projects created before April 1, 2026, you enable reservation-based fairness at the admin project level using DDL:

ALTER PROJECT `your-admin-project`
SET OPTIONS (
  `region-europe-west3.enable_reservation_based_fairness` = true
);

Once enabled, you can create predictable reservations with max_slots and scaling_mode. Without reservation-based fairness, these parameters are rejected.

What is the difference between `IDLE_SLOTS_ONLY`, `AUTOSCALE_ONLY`, and `ALL_SLOTS` in BigQuery?

`IDLE_SLOTS_ONLY`

The reservation scales beyond baseline only by borrowing idle slots from other reservations. Autoscaling is disabled entirely. If no idle capacity is available, the reservation is hard-capped at its baseline.

This is the most cost-predictable mode: you pay exactly your baseline rate and never incur autoscale charges. But it’s also the least reliable for latency-sensitive workloads. If the batch reservation that was lending idle capacity starts its overnight run and stops being idle, a production job that was borrowing those slots gets throttled back to baseline.

Best suited for: flexible batch pipelines, periodic exports, workloads that tolerate variable throughput and can queue or slow down gracefully.

Must be paired with ignore_idle_slots = false.

`AUTOSCALE_ONLY`

The reservation scales beyond baseline only through autoscaling. Idle slot borrowing is disabled. The reservation pays for autoscale slots billed at the PAYG rate up to max_slots, regardless of whether other reservations have idle capacity sitting unused next to it.

This mode trades cost efficiency for consistency. You know that burst capacity comes from autoscale, at a known rate, not from idle capacity that may or may not be available. For production workloads with latency SLAs, scaling behavior is now independent of the state of other reservations. The tradeoff is that even when idle capacity would have been free for the taking, the reservation doesn’t use it.

Must be paired with ignore_idle_slots = true.

`ALL_SLOTS`

The reservation can scale using both idle slot borrowing and autoscaling. In practice, BigQuery uses idle slots first, then autoscales if additional capacity is needed beyond what’s available to borrow.

This is the most flexible mode and the closest to the behavior of traditional reservations with ignore_idle_slots = false. The key addition is max_slots: total consumption is now bounded by a hard cap, whereas a traditional reservation could still borrow idle slots on top of its autoscale ceiling. When plenty of idle capacity is around, the reservation scales cost-efficiently by borrowing. When idle capacity isn’t available, it autoscales, up to the defined cap.

Must be paired with ignore_idle_slots = false.
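The three modes described above can be compared with a small Python model. It is a deliberately simplified sketch: it assumes idle slots are consumed before autoscale, treats max_slots as a strict cap, and ignores how the autoscaler steps capacity up and down. The `allocate` function and its parameters are hypothetical names, not a BigQuery API.

```python
def allocate(demand: int, baseline: int, max_slots: int,
             idle_available: int, scaling_mode: str) -> dict[str, int]:
    """Split slot demand into baseline, borrowed idle, and autoscale slots."""
    if scaling_mode not in {"ALL_SLOTS", "IDLE_SLOTS_ONLY", "AUTOSCALE_ONLY"}:
        raise ValueError(f"unknown scaling mode: {scaling_mode}")

    capped = min(demand, max_slots)          # hard cap on total consumption
    from_baseline = min(capped, baseline)
    remaining = capped - from_baseline

    borrowed = 0
    if scaling_mode in {"ALL_SLOTS", "IDLE_SLOTS_ONLY"}:
        borrowed = min(remaining, idle_available)  # free capacity first
        remaining -= borrowed

    autoscaled = 0
    if scaling_mode in {"ALL_SLOTS", "AUTOSCALE_ONLY"}:
        autoscaled = remaining                     # billed at the PAYG rate

    used = from_baseline + borrowed + autoscaled
    return {"baseline": from_baseline, "idle": borrowed,
            "autoscale": autoscaled, "unserved": demand - used}


# Hypothetical moment: 1,800 slots of demand, 300 idle slots available elsewhere.
for mode in ("ALL_SLOTS", "IDLE_SLOTS_ONLY", "AUTOSCALE_ONLY"):
    print(mode, allocate(1800, baseline=500, max_slots=2000,
                         idle_available=300, scaling_mode=mode))
```

In this scenario ALL_SLOTS pays for 1,000 autoscale slots, AUTOSCALE_ONLY pays for 1,300, and IDLE_SLOTS_ONLY pays for none but leaves 1,000 slots of demand waiting.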

Combining BigQuery’s scaling modes with zero-baseline configurations

All three scaling modes also work with a baseline of zero. Two combinations are particularly interesting:

Zero baseline + ALL_SLOTS: the reservation has no guaranteed capacity. It first consumes available idle slots, then autoscales for whatever additional capacity is needed — up to max_slots. This is useful for low-priority workloads that should opportunistically use free capacity and only pay for autoscale when idle slots run out.
Zero baseline + IDLE_SLOTS_ONLY: the reservation has no guaranteed or billed capacity at all. It runs entirely on idle slots borrowed from other reservations. If no idle capacity is available, the reservation gets nothing. This is the cheapest possible setup (nothing is billed to the reservation itself) but also the least reliable.

One important caveat: reservation predictability is best-effort. Google’s documentation states that overall usage may still briefly exceed the configured max_slots value, though autoscale slots (autoscale.current_slots) will respect the cap.

Idle slot sharing in BigQuery has always been governed by ignore_idle_slots — a flag that controls whether a reservation borrows idle capacity. It doesn’t affect whether a reservation lends its idle capacity to others — a reservation with ignore_idle_slots = true still makes its unused baseline available to borrowers.

Scaling modes build directly on this:

IDLE_SLOTS_ONLY and ALL_SLOTS both require ignore_idle_slots = false — the reservation is configured to borrow
AUTOSCALE_ONLY requires ignore_idle_slots = true — the reservation never borrows, regardless of available capacity

A reservation’s lending behavior is unaffected by its own scaling mode. If your batch reservation has idle capacity, other reservations configured to borrow (IDLE_SLOTS_ONLY or ALL_SLOTS) can still consume it — regardless of what scaling mode the batch reservation itself uses.
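If you manage reservation configs in code, the pairing rules above are easy to check before applying a change. The following is a minimal sketch; the lookup table and function name are hypothetical, and the rules encoded are only the ones stated in this post.

```python
# Required ignore_idle_slots value for each scaling mode, as described above.
REQUIRED_IGNORE_IDLE_SLOTS = {
    "IDLE_SLOTS_ONLY": False,  # must be allowed to borrow
    "ALL_SLOTS": False,        # must be allowed to borrow
    "AUTOSCALE_ONLY": True,    # never borrows
}


def pairing_is_valid(scaling_mode: str, ignore_idle_slots: bool) -> bool:
    """Return True when the flag matches what the scaling mode requires."""
    if scaling_mode not in REQUIRED_IGNORE_IDLE_SLOTS:
        raise ValueError(f"unknown scaling mode: {scaling_mode}")
    return REQUIRED_IGNORE_IDLE_SLOTS[scaling_mode] == ignore_idle_slots


print(pairing_is_valid("ALL_SLOTS", ignore_idle_slots=False))       # True
print(pairing_is_valid("AUTOSCALE_ONLY", ignore_idle_slots=False))  # False
```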

The practical implication for multi-reservation setups: if you have a batch reservation that regularly runs off-hours and leaves idle capacity, configuring production workloads with ALL_SLOTS lets them opportunistically consume that free capacity before triggering autoscale charges. If your production reservation needs guaranteed, consistent burst capacity that doesn’t depend on what other reservations are doing, AUTOSCALE_ONLY is the right choice.

How do you check your current scaling modes in `INFORMATION_SCHEMA`?

The INFORMATION_SCHEMA.RESERVATIONS view exposes reservation configuration, including the scaling_mode and max_slots columns. Query it from your admin project to see how each reservation is configured:

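SELECT
  reservation_name,
  slot_capacity          AS baseline_slots,       -- always allocated, always billed
  ignore_idle_slots,                              -- true = the reservation never borrows
  scaling_mode,                                   -- NULL for traditional reservations
  max_slots,                                      -- total cap for predictable reservations
  autoscale.max_slots    AS autoscale_max_slots   -- autoscale ceiling for traditional reservations
FROM `your-admin-project.region-your-region`.INFORMATION_SCHEMA.RESERVATIONS
ORDER BY reservation_name;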

For predictable reservations (those with scaling_mode set), the top-level max_slots column is the total consumption cap (baseline plus scaling). The autoscale.max_slots field inside the autoscale struct will be 0 for these reservations, because max_slots replaces it. Traditional reservations that haven’t been converted will show NULL for scaling_mode and max_slots, and instead use autoscale.max_slots for their autoscale ceiling.

For context on which reservations are generating idle capacity — and therefore which ones would benefit from neighbors using IDLE_SLOTS_ONLY or ALL_SLOTS — join with RESERVATIONS_TIMELINE to get recent utilization. Reservations consistently running below 70-80% baseline utilization are regularly generating idle capacity that currently goes unused unless another reservation borrows it. (See BigQuery Reservations: How Does Autoscaling Really Work? for how to measure this.)
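As a rough illustration of that threshold check, here is a Python sketch that flags a reservation as a regular source of idle capacity. It assumes you have already extracted per-interval slot usage for one reservation; the sample values are made up, the function names are hypothetical, and the 0.7 default is simply the lower end of the 70-80% range mentioned above.

```python
from statistics import mean


def baseline_utilization(slots_used: list[float], baseline: int) -> float:
    """Average share of the baseline in use, counting each sample at most 100%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute utilization")
    if not slots_used:
        raise ValueError("need at least one usage sample")
    return mean(min(used, baseline) / baseline for used in slots_used)


def regularly_idle(slots_used: list[float], baseline: int,
                   threshold: float = 0.7) -> bool:
    """True when average baseline utilization falls below the threshold."""
    return baseline_utilization(slots_used, baseline) < threshold


# Made-up samples for a 500-slot baseline.
samples = [120, 200, 450, 500]
print(round(baseline_utilization(samples, 500), 3))  # 0.635
print(regularly_idle(samples, 500))                  # True
```

An average hides the time of day when the idle capacity appears, so in practice you would likely run the same check per hour-of-day bucket before deciding which neighbors should borrow.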

Which BigQuery reservation scaling mode should you use?

For most workloads, ALL_SLOTS is the right default. It gives you the best of both worlds: the reservation uses free idle capacity first, then autoscales only when needed. You get cost efficiency when idle slots are available and reliable burst capacity when they’re not — all bounded by the max_slots cap. Unless you have a specific reason to restrict scaling behavior, ALL_SLOTS is the safest starting point.

IDLE_SLOTS_ONLY is the right choice for workloads that aren’t time-critical and where cost predictability matters more than throughput. Batch exports, nightly aggregation jobs, or low-priority backfill pipelines that can tolerate variable performance are good candidates. Because the reservation never autoscales, you never incur PAYG charges above the baseline, but the tradeoff is that throughput depends entirely on what idle capacity happens to be available. Before relying on this mode, check that there’s a realistic source of idle capacity in your admin project. If all reservations are running at or near their baselines, there’s nothing to borrow, and an IDLE_SLOTS_ONLY reservation will be effectively capped at its baseline.

AUTOSCALE_ONLY disables idle slot borrowing entirely. The main reason to choose it is cost transparency: when a reservation doesn’t borrow, all its capacity comes from either baseline or autoscale, both of which are directly billed to the reservation. There’s no cost attribution complexity from lending and borrowing between reservations. That said, you’re leaving money on the table — idle capacity from other reservations goes unused even when it would have been free. Tracking lending and borrowing across reservations is genuinely hard with INFORMATION_SCHEMA alone, but if you have tooling that handles that attribution, ALL_SLOTS will almost always be more cost-efficient.

| Workload type | Recommended mode | Reasoning |
| --- | --- | --- |
| Most workloads (default) | `ALL_SLOTS` | Uses free idle capacity first, autoscales when needed, bounded by `max_slots` |
| Non-time-critical batch / ETL | `IDLE_SLOTS_ONLY` | Zero autoscale cost; queues gracefully if idle capacity is unavailable |
| Cost transparency without attribution tooling | `AUTOSCALE_ONLY` | No borrowing complexity; all capacity is directly billed to the reservation |

How does Rabbit work with scaling modes?

Rabbit’s Max Slot Optimizer dynamically adjusts reservation capacity limits based on observed demand patterns, reducing autoscale waste, which is typically the largest cost variable on capacity-based pricing. For reservations using ALL_SLOTS or AUTOSCALE_ONLY, Rabbit tracks slot usage against the max_slots cap and tightens it during low-demand periods to reduce the waste left behind by BigQuery’s native autoscaler. Daangn, for example, achieved 41% recurring monthly savings once autoscale waste was made visible and automatically managed.

Read case study:
41% Recurring Monthly BigQuery Savings — Daangn's Journey with Rabbit

For IDLE_SLOTS_ONLY reservations, the optimization focus shifts: burst capacity comes from idle slots, not billed autoscale. Rabbit tracks idle slot availability across reservations, giving visibility into whether an IDLE_SLOTS_ONLY reservation is actually getting the capacity it needs or regularly hitting its baseline cap — which would be a signal to reconsider the mode or adjust the baseline.

The cost transparency argument for AUTOSCALE_ONLY also weakens with Rabbit. Lending and borrowing between reservations is one of the hardest things to track manually: GCP doesn’t report lent, borrowed, or wasted slot-hours as separate line items, and reconstructing them requires cross-correlating RESERVATIONS_TIMELINE and JOBS_TIMELINE across all reservations minute by minute. Rabbit handles this attribution automatically, decomposing each reservation’s cost into used, lent, wasted, and borrowed components at hourly granularity. With that visibility in place, there’s no reason to avoid borrowing for the sake of simpler accounting — you can use ALL_SLOTS, get the cost savings from idle slot borrowing, and still have clear cost attribution per reservation.

More broadly, choosing the right scaling mode is part of the same question as choosing the right baseline, commitment structure, and max_slots cap. These settings interact: an AUTOSCALE_ONLY reservation with an oversized max_slots cap wastes money just as an oversized baseline does. Rabbit’s Reservation Planner simulates different configurations against historical usage to help find the right combination.

What do BigQuery reservation scaling modes actually change?

Scaling modes don’t fundamentally change how BigQuery reservations work — baseline slots, autoscaling, and idle slot sharing are the same underlying mechanics. What they add is explicit, per-reservation control over which mechanisms are in play, and a hard consumption cap that makes costs predictable and bounded. For most teams, ALL_SLOTS with a well-sized max_slots cap is the right starting point — it maximizes cost efficiency by borrowing free capacity first, while keeping total consumption bounded. Reserve IDLE_SLOTS_ONLY for workloads where you want to eliminate autoscale charges entirely, and AUTOSCALE_ONLY for cases where simplified cost attribution outweighs the savings from idle slot borrowing.

For a deeper look at autoscaling behavior and costs, see our post:

BigQuery Reservations: How Does Autoscaling Really Work?