Quite a while back, Google released
two new features in BigQuery. One was federated sources. A federated source allows you to query external sources, like files in Google Cloud Storage (GCS), directly using SQL. They also gave us user defined functions (UDF) in that release too. Essentially, a UDF allows you to ram JavaScript right into your SQL to help you perform the map phase of your query. Sweet!
In this blog post, I'll go step-by-step through how I combined BigQuery's federated sources and UDFs to create a scalable, totally serverless, and cost-effective ETL pipeline in BigQuery.
30 March, 2016
/