The TEL group was established in 2011 with the aim of publicising the great technical work that Shine does, and raising the company’s profile as a technical thought-leader through blogs, local meetup talks, and conference presentations. Each month, the TEL group gather up all the awesome things that Shine folk have been getting up to in and around the community. Here’s the latest roundup:
Beers + ‘WSPR’ = fun
To this day, I’m a firm believer in the benefits of simple, informative, and spontaneous conversations with my colleagues – at least with the ones who can stand me long enough to chat with me. Chewing the fat with other like-minded folks over a beer or two is a bloody good thing. It’s how ideas are born, knowledge is shared, and relationships are formed. It’s an important aspect of any business that is sadly all too often overlooked.
The decline of Stack Overflow?
A few months back I read this post from 2015 (yes, I know I’m a little late to the party) about how Stack Overflow (SO) was in serious decline, and heading for total and utter oblivion. In the post, the first item to be called out was that SO “hated new users”:
Stack Overflow has always been a better-than-average resource for finding answers to programming questions. In particular, I have found a number of helpful answers to really obscure questions on the site, many of which helped me get past a road block either at work or in my hobby programming. As such, I decided I’d join the site to see if I could help out. Never before has a website given me a worse first impression.
At the time, I remember thinking that this seemed like somewhat of an unfair statement. That was mostly down to the fact that when I joined the community (many years ago), I had fond memories of a smooth on-boarding, and never experienced any snarky remarks on my initial questions. Yes, gaining traction for noobs is very, very hard, but there is a good reason why it exists.
For me, SO is invaluable. How else would I be able to pretend to know what I’m doing? How else could I copy and paste code from some other person who’s obviously a lot smarter than me, and take all the credit for it? Anyway, once I had read the post, and gotten on with my life (e.g. copying and pasting more code from SO), I didn’t think too much more about the post. Maybe I had just been lucky with my foray into the SO community?
However, just last week, I was reminded of that post once again, when I noticed that BigQuery (BQ) now has a public dataset which includes all the data from SO – including user comments and answers. Do you see where I am going with this yet? If not, then don’t worry. Neither did I when I started writing this.
Shine’s very own Pablo Caif will be rocking the stage at the very first YOW! Data conference in Sydney. The conference will be running over two days (22-23 Sep) and is focused on big data, analytics, and machine learning. Pablo will give his presentation on Google BigQuery, along with a killer demo of it in action. You can find more details of his talk here.
At Shine we’re big fans of Google BigQuery, which is their flagship big data processing SaaS. Load in your data of any size, write some SQL, and smash through datasets in mere seconds. We love it. It’s the one true zero-ops model that we’re aware of for grinding through big data without the headache of worrying about any infrastructure. It also scales to petabytes. We’ve only got terabytes, but you’ve got to start somewhere, right?
If you haven’t yet been introduced to the wonderful world of BigQuery, then I suggest you take some time right after reading this post to go and check it out. Your first 1TB is free anyway. Bargain!
Anyway, back to the point of this post. There have been a lot of updates to BigQuery in recent months, both internally and via features, and I wanted to capture them all in a concise blog post. I won’t go into great detail on each of them, but rather give a quick summary of each, which will hopefully give readers a good overview of what’s been happening with the big Q lately. I’ve pulled together a lot of this stuff from various Google blog posts, videos, and announcements at GCP Next 2016 etc.
In this blog post, I’ll go step-by-step through how I combined BigQuery’s federated sources and UDFs to create a scalable, totally serverless, and cost-effective ETL pipeline in BigQuery.
Last week, Shine’s very own Pablo Caif gave a presentation at GCP Next 2016 in San Francisco, which is Google’s largest annual cloud platform event. Pablo delivered an outstanding talk on the work Shine have done for Telstra, which involves building solutions on the GCP stack to manage and analyse their massive datasets. More specifically, the talk focused on two of Google’s core big data products – BigQuery & Cloud Dataflow.
A few months back, Shine’s Pablo Caif and Graham Polley were welcomed into the Google Developer Expert (GDE) program as a result of their recent work at Telstra. The projects they are working on consist of building bleeding edge big data solutions using tools like BigQuery and Cloud Dataflow on the Google Cloud Platform (GCP). You can read all about that here.
GDE acceptance comes with many benefits and privileges, one of which is a yearly trip to a private summit, held at a different location each time. With Google footing the bill, all the GDEs (around 250 currently) are brought together from around the globe for, let’s admit it, a complete Google geek-out fest for 2 days!
This year the summit was at the Googleplex in Mountain View. Needless to say, Pablo and Graham were chomping at the bit to go. However, in addition to the summit, Google invited them to fly out prior to the actual summit itself. They had lined up a few other things especially for the guys. So this was no ordinary trip. Lucky buggers!
We asked both guys to give their individual feedback on the trip, and here’s what they had to say about it. Read on if you want to hear about how the guys spent six days hanging out with Google in America.
One of the projects that I’m currently working on is developing a solution whereby millions of rows per hour are streamed real-time into Google BigQuery. This data is then available for immediate analysis by the business. The business likes this. It’s an extremely interesting, yet challenging project. And we are always looking for ways of improving our streaming infrastructure.
As I explained in a previous blog post, the data/rows that we stream to BigQuery are ad-impressions, which are generated by an ad-server (Google DFP). This was a great accomplishment in its own right, especially after optimising our architecture and adding Redis into the mix. Using Redis added robustness and stability to our infrastructure. But – there is always a but – we still need to denormalise the data before analysing it.
In this blog post I’ll talk about how you can use Google Cloud Pub/Sub to denormalise your data in real-time before performing analysis on it.
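To make "denormalise" a little more concrete, here is a minimal sketch of the idea in plain Python. It is not the pipeline from the post: the field names (`impression_id`, `ad_unit_id`, `ad_unit_name`, `site`), the `AD_UNITS` lookup table, and the `denormalise` helper are all hypothetical, and the Pub/Sub delivery itself is left out. It only shows what might happen to one message payload before the row is streamed onwards.

```python
import json

# Hypothetical lookup table: ad-unit id -> descriptive attributes.
# In a real system this reference data would live somewhere shared.
AD_UNITS = {
    "au-1": {"ad_unit_name": "homepage_banner", "site": "news"},
    "au-2": {"ad_unit_name": "sidebar_mrec", "site": "sport"},
}


def denormalise(message_data: bytes, lookup: dict) -> dict:
    """Decode one JSON message payload and merge in the matching lookup attributes."""
    row = json.loads(message_data.decode("utf-8"))
    extra = lookup.get(row.get("ad_unit_id"), {})
    # Rows with an unknown id pass through unchanged rather than being dropped.
    return {**row, **extra}


# One fake ad-impression payload, shaped like the bytes a subscriber would receive.
payload = json.dumps({"impression_id": 42, "ad_unit_id": "au-1"}).encode("utf-8")
flat_row = denormalise(payload, AD_UNITS)
print(flat_row)
```

The flattened row carries everything needed for analysis, so no join is required at query time. That is the trade-off denormalising makes: wider rows in exchange for simpler, faster queries.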
Well, today Shine are proud to announce that both guys have been officially awarded the braggable title of GDE, and we’d like to congratulate them on this mighty big achievement!
Becoming an expert meant undergoing a stringent evaluation and interview process, as well as being nominated by a Google employee and authorised by the Google Developers team, all based on the special contribution they make to their field.