I use Google BigQuery a lot. On a daily basis I run dozens of queries, use it to build massively scalable data pipelines for our clients, and regularly help new users navigate it for the first time. Suffice it to say I'm somewhat accustomed to its little quirks. Unfortunately, the same can't be said for the new users, who are commonly left scratching their heads and shouting "What the fudge!?" at their monitors.
Here are the top three WTFs that I regularly hear from new BigQuery users:
A few months back I read this post from 2015 (yes, I know I'm a little late to the party) about how Stack Overflow (SO) was in serious decline, and heading for total and utter oblivion. In the post, the first item to be called out was that SO "hated new users":
Stack Overflow has always been a better-than-average resource for finding answers to programming questions. In particular, I have found a number of helpful answers to really obscure questions on the site, many of which helped me get past a road block either at work or in my hobby programming. As such, I decided I’d join the site to see if I could help out. Never before has a website given me a worse first impression.
At the time, I remember thinking that this seemed like a somewhat unfair statement. That was mostly because I had fond memories of a smooth on-boarding when I joined the community (many years ago), and I never experienced any snarky remarks on my initial questions. Yes, gaining traction for noobs is very, very hard, but there is a good reason why it exists.
For me, SO is invaluable. How else would I be able to pretend to know what I'm doing? How else could I copy and paste code from some other person who's obviously a lot smarter than me, and take all the credit for it? Anyway, once I had read the post and gotten on with my life (e.g. copying and pasting more code from SO), I didn't think much more about it. Maybe I had just been lucky with my foray into the SO community?
However, just last week, I was reminded of that post once again, when I noticed that BigQuery (BQ) now has a public dataset which includes all the data from SO - including user comments and answers. Do you see where I am going with this yet? If not, then don't worry. Neither did I when I started writing this.
Last week, Shine's very own Pablo Caif gave a presentation at GCP Next 2016 in San Francisco, which is Google’s largest annual cloud platform event. Pablo delivered an outstanding talk on the work Shine have done for Telstra, which involves building solutions on the GCP stack to manage and analyse their massive datasets. More specifically, the talk focused on two of Google’s core big data products: BigQuery and Cloud Dataflow.
One of the projects that I'm currently working on is developing a solution whereby millions of rows per hour are streamed in real time into Google BigQuery. This data is then available for immediate analysis by the business. The business likes this. It's an extremely interesting, yet challenging project, and we are always looking for ways of improving our streaming infrastructure.
As I explained in a previous blog post, the data/rows that we stream to BigQuery are ad-impressions, which are generated by an ad-server (Google DFP). This was a great accomplishment in its own right, especially after optimising our architecture and adding Redis into the mix. Using Redis added robustness and stability to our infrastructure. But – there is always a but – we still need to denormalise the data before analysing it.
In this blog post I'll talk about how you can use Google Cloud Pub/Sub to denormalise your data in real time before performing analysis on it.
For me, Jaws is hands down one of the best movies ever made. It's almost 40 years old but it still looks fantastic and the acting is phenomenal. And it's able to boast one of the most memorable ad-libs ever quipped by any actor on the big screen:
“In our (admittedly limited) experience, Redis is so fast that the slowest part of a cache lookup is the time spent reading and writing bytes to the network” - stackoverflow.com
Can Databases Be Exciting To Work With?
It’s very rare that a project can cause an engineer to get excited about the prospect of working with a database they've never worked with previously, especially when it’s a relational one. That mainly boils down to the fact that the majority of them are clunky monstrosities that are painfully slow and cause us to grimace at the thought of having to integrate them into our applications, not to mention having to piece together gnarly and over-engineered SQL statements.