Google BigQuery Tag

  Shine's very own Pablo Caif will be rocking the stage at the very first YOW! Data conference in Sydney. The conference will run over two days (22-23 Sep) and is focused on big data, analytics, and machine learning. Pablo will give his presentation on Google BigQuery,...

When you have a Big Data solution that relies on a high-quality, uninterrupted stream of data to meet the client’s expectations, you need something in place that is extremely reliable and has many points of fault tolerance. That all sounds well and good, but how exactly does that work in practice? Let me start by explaining the problem. About 2 years ago our team was asked to spike a streaming service that could stream billions of events per month to Google’s BigQuery. The events were to come from an endpoint on our existing Apache web stack, and we would push them to BigQuery using an application written in PHP. We did exactly this. However, we found that requests to BigQuery were taking too long, which resulted in slow response times for users. So we needed to find a way to queue the events before sending them to BigQuery.
At Shine we're big fans of Google BigQuery, which is their flagship big data processing SaaS. Load in your data of any size, write some SQL, and smash through datasets in mere seconds. We love it. It's the one true zero-ops model that we're aware of for grinding through big data without the headache of worrying about any infrastructure. It also scales to petabytes. We've only got terabytes, but you've got to start somewhere, right? If you haven't yet been introduced to the wonderful world of BigQuery, then I suggest you take some time right after reading this post to go and check it out. Your first 1TB is free anyway. Bargain! Anyway, back to the point of this post. There have been a lot of updates to BigQuery in recent months, both internally and via features, and I wanted to capture them all in a concise blog post. I won't go into great detail on each of them, but rather give a quick summary of each, which will hopefully give readers a good overview of what's been happening with the big Q lately. I've pulled together a lot of this stuff from various Google blog posts, videos, and announcements at GCP Next 2016 etc.
Quite a while back, Google released two new features in BigQuery. One was federated sources. A federated source allows you to query external sources, like files in Google Cloud Storage (GCS), directly using SQL. The other was user defined functions (UDFs). Essentially, a UDF allows you to ram JavaScript right into your SQL to help you perform the map phase of your query. Sweet! In this blog post, I'll go step-by-step through how I combined BigQuery's federated sources and UDFs to create a scalable, totally serverless, and cost-effective ETL pipeline in BigQuery.
http://www.youtube.com/watch?v=6xV6aelL6fQ Last week, Shine's very own Pablo Caif gave a presentation at GCP Next 2016 in San Francisco, which is Google’s largest annual cloud platform event. Pablo delivered an outstanding talk on the work Shine have done for Telstra, which involves building solutions on the GCP stack to manage and analyse their massive datasets. More specifically, the talk focused on two of Google’s core big data products: BigQuery and Cloud Dataflow.

Shine is extremely proud to announce that Pablo Caif has been invited to present at GCP Next 2016, which is Google's largest annual cloud platform event held in San Francisco. Pablo will be presenting on the work Shine have done for Telstra, which involves building solutions on GCP to...

A few months back, Shine's Pablo Caif and Graham Polley were welcomed into the Google Developer Expert (GDE) program as a result of their recent work at Telstra. The projects they are working on consist of building bleeding edge big data solutions using tools like BigQuery and Cloud Dataflow on the Google Cloud Platform (GCP). You can read all about that here. GDE acceptance comes with many benefits and privileges, one of which is a trip to a private summit, held at a different location each year. With Google footing the bill, they bring all the GDEs (around 250 currently) from around the globe for, let's admit it, a complete Google geek-out fest for 2 days! This year the summit was at the Googleplex in Mountain View. Needless to say, Pablo and Graham were chomping at the bit to go. However, in addition to the summit, Google invited them to fly out prior to the summit itself. They had lined up a few other things especially for the guys. So this was no ordinary trip. Lucky buggers! We asked both guys to give their individual feedback on the trip, and here's what they had to say about it. Read on if you want to hear about how the guys spent six days hanging out with Google in America.

My work commute

My commute to and from work on the train takes 17 minutes on average. It's the usual uneventful affair, where the majority of people pass the time by surfing on their mobile devices, catching a few Zs, or reading a book. I'm one of those people who like to check in with family & friends on my phone, and see what they have been up to back home in Europe while I've been snug as a bug in my bed. Stay with me here folks. Aside from getting up to speed with the latest events from back home, I also like to catch up on the latest tech news, and in particular what's been happening in the rapidly evolving cloud area. And this week, one news item in my AppyGeek feed immediately jumped off the screen at me. Google have launched yet another game-changing product into their cloud platform big data suite. It's called Cloud Dataproc.