Whispers from the other side of the globe with BigQuery

Setting the scene

A couple of months ago my colleague Graham Polley wrote about how we got started analysing 8+ years worth of of WSPR (pronounced ‘whisper’) data. What is WSPR? WSPR, or Weak Signal Propagation Reporter, is signal reporting network setup by radio amateurs for monitoring the ability for radio signals to get from one place to another. Why would I care? I’m a geek and I like data. More specifically the things it can tell us about seemingly complex processes. I’m also a radio amateur, and enjoy the technical aspects of  communicating around the globe with equipment I’ve built myself.

Homer simpson at Radio transceiver
Homer Simpson as a radio Amateur

TEL monthly newsletter – March 2017

Shine’s TEL group was established in 2011 with the aim of publicising the great technical work that Shine does, and to raise the company’s profile as a technical thought-leader through blogs, local meet up talks, and conference presentations. Each month, the TEL group gather up all the awesome things that Shine folk have been getting up to in and around the community. Here’s the latest roundup from what’s been happening.

TEL monthly newsletter – February 2017

The TEL group was established in 2011 with the aim of publicising the great technical work that Shine does, and to raise the company’s profile as a technical thought-leader through blogs, local meet up talks, and conference presentations. Each month, the TEL group gather up all the awesome things that Shine folk have been getting up to in and around the community.  Here’s the latest roundup:

Gobbling up big-ish data for lunch using BigQuery

Beers + ‘WSPR’ = fun

To this day, I’m a firm believer in the benefits of simple, informative, and spontaneous conversations with my colleagues – at least with the ones who can stand me long enough to chat with me . Chewing the fat with other like minded folks over a beer or two is a bloody good thing. It’s how ideas are born, knowledge is shared, and relationships are formed. It’s an important aspect of any business that is sadly all too often overlooked.

Analysing Stack Overflow comment sentiment using Google Cloud Platform

The decline of Stack Overflow?

A few months back I read this post from 2015 (yes, I know I’m a little late to the party) about how Stack Overflow (SO) was in serious decline, and heading for total and utter oblivion.  In the post, the first item to be called  out was that SO “hated new users“:

Stack Overflow has always been a better-than-average resource for finding answers to programming questions. In particular, I have found a number of helpful answers to really obscure questions on the site, many of which helped me get past a road block either at work or in my hobby programming. As such, I decided I’d join the site to see if I could help out. Never before has a website given me a worse first impression.

At the time, I remember thinking that this seemed like somewhat of an unfair statement. That was mostly down to the fact that when I joined the community (many years ago), I had fond memories of a smooth on-boarding, and never experienced any snarky remarks on my initial questions. Yes, gaining traction for noobs is very, very hard, but there is a good reason why it exists.

For me, SO is invaluable. How else would I be able to pretend to know what I’m doing? How else could I copy and paste code from some other person who’s obviously a lot smarter than me, and take all the credit for it? Anyway, once I had read the post, and gotten on with my life (e.g. copying and pasting more code from SO), I did’t think too much more about the post. Maybe I had just been lucky with my foray into the SO community?

However, just last week, I was reminded of that post once again, when I noticed that BigQuery (BQ) now has a public dataset which includes all the data from SO – including user comments and answers. Do you see where I am going with this yet? If not, then don’t worry. Neither did I when I started writing this.

Shiner to present at very first YOW!Data conference

 

Shine’s very own Pablo Caif will be rocking the stage at the very first YOW! Data conference in Sydney. The conference will be running over two days (22-23 Sep) and is focused big data, analytics, and machine learning. Pablo will give his presentation on Google BigQuery, along with a killer demo of it in action. You can find more details of his talk here.

Will Swift be the next king of server side development?

Swift throne

In June 2015, Apple announced at WWDC that they were open-sourcing the Swift language and its runtime libraries. On December 3rd that year they made good on their promise. In this post I’d like to talk about why this is significant, particularly for server-side developers.

Messages in the sky

contrailscience.com_skitch_skitched_20130315_131709

One of the projects that I’m currently working on is developing a solution whereby millions of rows per hour are streamed real-time into Google BigQuery. This data is then available for immediate analysis by the business. The business likes this. It’s an extremely interesting, yet challenging project. And we are always looking for ways of improving our streaming infrastructure.

As I explained in a previous blog post, the data/rows that we stream to BigQuery are ad-impressions, which are generated by an ad-server (Google DFP). This was a great accomplishment in its own right, especially after optimising our architecture and adding Redis into the mix. Using Redis added robustness, and stability to our infrastructure.  But – there is always a but – we still need to denormalise the data before analysing it.

In this blog post I’ll talk about how you can use Google Cloud Pub/Sub to denormalize your data in real-time before performing analysis on it.

Google Cloud Dataproc and the 17 minute train challenge

multiple-seats

My work commute

My commute to and from work on the train is on average 17 minutes. It’s the usual uneventful affair, where the majority of people pass the time by surfing their mobile devices, catching a few Zs, or by reading a book. I’m one of those people who like to check in with family & friends on my phone, and see what they have been up to back home in Europe, while I’ve been snug as a bug in my bed.

Stay with me here folks.

But aside from getting up to speed with the latest events from back home, I also like to catch up on the latest tech news, and in particular what’s been happening in the rapidly evolving cloud area. And this week, one news item in my AppyGeek feed immediately jumped off the screen at me. Google have launched yet another game-changing product into their cloud platform big data suite.

It’s called Cloud Dataproc.

Spring Data REST and Projections

Project_Data

Introduction

In recent years, Spring has become much more than just a dependancy injection container and an MVC web application framework. Nowadays, it’s the go-to for building enterprise solutions due to the fact it has a fantastic community built up around it, and it has a multitude of projects that makes every developer’s life that little bit easier! In this blog post, I’m going to briefly introduce Spring Data REST, and how we used it and an unknown feature called ‘projectionson a recent project.