When we started using Google BigQuery - almost five years ago now - it didn't have any partitioning functionality built into it. Heck, queries cost $20 p/TB back then too for goodness' sake! To compensate for this lack of functionality and to save costs, we had to manually shard our tables using the well known
suffix pattern just like everyone else. This works fine, but it's quite cumbersome, has some hard limits, and your SQL can quickly becomes unruly.
Then about a year ago, the BigQuery team released ingestion time partitioning. This allowed users to partition tables based on the load/arrival time of the data, or by explicitly stating the partition to load the data into (using the
syntax). By using the
pseudo-column, users were more easily able to craft their SQL, and save costs by only addressing the necessary partition(s). It was a major milestone for the BigQuery engineering team, and we were quick to adopt it into our data pipelines. We rejoiced and gave each other a lot of high-fives.
09 April, 2018