If your BigQuery queries are drowning in duplicates, it’s time to bring out the big guns: QUALIFY + ROW_NUMBER(). This trick is a lifesaver when cleaning up messy datasets, and shoutout to Constantin Lungu for highlighting it! I with great pleasure borrowed his screenshot to make it even clearer!

Here’s the deal:

You need to keep only the first (or last) occurrence of something — say, the first event per user. Easy, right? Just use ROW_NUMBER() over your partition by user and order by event time. Then, use QUALIFY [window function result] = 1, and boom — duplicates gone!

BUT… there’s a catch.

Sometimes, two events happen at the exact same time. Uh-oh. What now?

- ROW_NUMBER() will randomly pick one and move on — clean, but non-deterministic.
- DENSE_RANK() will keep ties, letting you decide what to do next.

Moral of the story? Choose your ranking function wisely!

Want to get all my top Linkedin content? I regularly upload it to one Notion doc.

Go here to download it for FREE.