Programming Tag

The year was 1997. The Red Hot Chili Peppers were musing on love and the motions of amusement park rides, Pathfinder landed on Mars and Leonardo DiCaprio drew Kate Winslet as per one of his French associates.  It was around this time I had heard about a thing called “Java”, a fancy new language everyone was talking about. The word on IRC was that it was based on work Sun Microsystems had originally done for embedded software on set-top boxes and other smart appliances.
Quite a while back, Google released two new features in BigQuery. One was federated sources. A federated source allows you to query external sources, like files in Google Cloud Storage (GCS), directly using SQL. They also gave us user defined functions (UDF) in that release too. Essentially, a UDF allows you to ram JavaScript right into your SQL to help you perform the map phase of your query. Sweet! In this blog post, I'll go step-by-step through how I combined BigQuery's federated sources and UDFs to create a scalable, totally serverless, and cost-effective ETL pipeline in BigQuery.