One of the projects that I'm currently working on is developing a solution whereby millions of rows per hour are streamed real-time into Google BigQuery. This data is then available for immediate analysis by the business. The business likes this. It's an extremely interesting, yet challenging project. And we are always looking for ways of improving our streaming infrastructure.
As I explained in a previous blog post, the data/rows that we stream to BigQuery are ad-impressions, which are generated by an ad-server (Google DFP). This was a great accomplishment in its own right, especially after optimising our architecture and adding Redis into the mix. Using Redis added robustness, and stability to our infrastructure. But – there is always a but – we still need to denormalise the data before analysing it.
In this blog post I'll talk about how you can use Google Cloud Pub/Sub to denormalize your data in real-time before performing analysis on it.