Why do analysts run queries on full data during development?
Because that's where the data is. You write a new query, you want to see if it works, you hit run. Full table, full scan, full cost. This happens multiple times per hour during active development.
What does that cost in practice?
GA4 exports accumulate fast. A year of data for a medium-traffic property can be several terabytes. Running exploratory queries on that repeatedly is a cost problem rather than a technical one: with on-demand pricing, BigQuery bills by bytes scanned, so every edited query that rescans the table is billed again. I've watched single development sessions generate more BigQuery costs than an entire week of production queries.
What does TABLESAMPLE actually do?
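TABLESAMPLE SYSTEM (1 PERCENT) returns a random sample of roughly 1% of your table. BigQuery picks the sample at the level of storage blocks rather than individual rows, so it reads only a fraction of the data, returns results much faster, and with on-demand billing charges you only for the data it samples. Two things follow from that: the sample differs on every run, and a table small enough to sit in a single block is read in full.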
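A minimal sketch of the syntax, assuming a hypothetical table `analytics.events` with an `event_name` column (both names are placeholders, so substitute your own):

```sql
-- Hypothetical dataset and table: analytics.events
-- TABLESAMPLE goes directly after the table name in the FROM clause.
SELECT
  event_name,
  COUNT(*) AS sampled_events  -- count within the sample, not a table total
FROM analytics.events TABLESAMPLE SYSTEM (1 PERCENT)
GROUP BY event_name
ORDER BY sampled_events DESC;
```

Run it twice and the numbers will differ, because each run draws a new sample. That's fine for checking logic.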
You can verify:
- Query logic and structure
- JOIN behavior
- Column names and data types
- General output shape
What you can't verify: exact counts or percentages. Aggregates computed on a sample cover only the sampled rows and change from run to run. But during development you don't need exact counts. You need to know if the logic is right.
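Here is one way a JOIN check on a sample might look. The tables (`analytics.events`, `analytics.campaigns`) and columns (`campaign_id`, `campaign_name`) are hypothetical, made up for illustration:

```sql
-- Hypothetical tables: analytics.events (large) and analytics.campaigns (small lookup).
-- Only the large table is sampled; the lookup table is read in full so that
-- sampled event rows can still find their matching campaign.
SELECT
  c.campaign_name,
  COUNT(*) AS sampled_events  -- shows the output shape, not real totals
FROM analytics.events AS e TABLESAMPLE SYSTEM (1 PERCENT)
JOIN analytics.campaigns AS c
  ON e.campaign_id = c.campaign_id
GROUP BY c.campaign_name
ORDER BY sampled_events DESC
LIMIT 20;
```

Sampling both sides independently would leave very few matching rows, so the sketch samples only the big table. It confirms the join keys line up and the columns come out as expected, nothing more.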
When do you switch to full data?
Only for the final production run. Develop on sample, validate on sample, run final query on full data, ship result. I've introduced this habit to three analytics teams. In each case, monthly BigQuery costs dropped noticeably within the first billing cycle.