Database Tag

Introduction

It’s a simple question, asked by project managers, data scientists, and quality engineers on every data engineering project as soon as the first data source is ingested: how do we know that the data ingested into a data lake is accurate and error-free?

Quite a while back, Google released two new features in BigQuery. One was federated sources, which allow you to query external sources, like files in Google Cloud Storage (GCS), directly using SQL. The other was user-defined functions (UDFs). Essentially, a UDF allows you to ram JavaScript right into your SQL to help you perform the map phase of your query. Sweet! In this blog post, I'll go step-by-step through how I combined BigQuery's federated sources and UDFs to create a scalable, totally serverless, and cost-effective ETL pipeline in BigQuery.

“In our (admittedly limited) experience, Redis is so fast that the slowest part of a cache lookup is the time spent reading and writing bytes to the network” - stackoverflow.com

Can Databases Be Exciting To Work With?

It’s very rare for a project to get an engineer excited about working with a database they've never used before, especially a relational one. That mainly boils down to the fact that the majority of them are clunky monstrosities that are painfully slow and make us grimace at the thought of integrating them into our applications, not to mention piecing together gnarly, over-engineered SQL statements.