When we started using Google BigQuery – almost five years ago now – it didn’t have any partitioning functionality built into it. Heck, queries cost $20 p/TB back then too for goodness’ sake! To compensate for this lack of functionality and to save costs, we had to manually shard our tables using the well known
_YYYYMMDD suffix pattern just like everyone else. This works fine, but it’s quite cumbersome, has some hard limits, and your SQL can quickly becomes unruly.
Then about a year ago, the BigQuery team released ingestion time partitioning. This allowed users to partition tables based on the load/arrival time of the data, or by explicitly stating the partition to load the data into (using the
$ syntax). By using the
_PARTITIONTIME pseudo-column, users were more easily able to craft their SQL, and save costs by only addressing the necessary partition(s). It was a major milestone for the BigQuery engineering team, and we were quick to adopt it into our data pipelines. We rejoiced and gave each other a lot of high-fives.