Do you recoil in horror at the thought of running yet another mundane SQL script just so a table is automatically rebuilt for you each day in BigQuery? Can you barely remember your name first thing in the morning, let alone remember to click “Run Query” so that your boss gets the latest data refreshed in his fancy Data Studio charts, and then takes all the credit for your hard work?

Well, fear not my fellow BigQuery’ians. There’s a solution to this madness.

It’s simple.

It’s quick.

Yes, it’s Google Apps Script to the rescue.

Disclaimer: all credit for this goes to the one and only Felipe Hoffa. He ‘da man!

1. A few mouse clicks

Apps Scripts lets you write little snippets of JS that are fully managed by Google. It’s primarily used by developers for building add-ons for products like Calendar, Docs, Drive, Gmail, and the likes. But, few people know that it also allows you to talk to BigQuery by hooking into its REST API – and not only that, but Apps Script come with a scheduling feature too, which is something Cloud Functions lacks right now (“Cloud Function PMs, I’m looking in your direction!”).

Do you see where I’m going with this yet? Of course you do silly.

OK, now jump on over to script.google.com like a good developer would ‘ya please. Start by giving your project a unique, funky name (‘my first GAS project‘ usually goes down well with the crowd), and then enable the BigQuery API (v2) by navigating to Resources > Advanced Google Services.

Without enabling this, it obviously won’t work:

2. copy and paste

You’ll like this part. It’s something we all do every day to make ourselves look much smarter than what we actually are. Yes, copy and paste the following code snippet into the code editor – in turn, I copied and pasted it from this Stack Overflow question, and dutifully hacked it:

function runQuery() {
  var configuration = {
    "query": {
    "useQueryCache": false,
    "destinationTable": {
          "projectId": "grey-sort-challenge",
          "datasetId": "apps_script_scheduler",
          "tableId": "i_was_created_by_apps_script_on_a_timer"
        },
    "writeDisposition": "WRITE_TRUNCATE",
    "createDisposition": "CREATE_IF_NEEDED",
    "allowLargeResults": true,
    "useLegacySql": false,
    "query": "SELECT * FROM `grey-sort-challenge.apps_script_scheduler.v_apps_script_scheduler_demo`"
    }
  };
  
  var job = {
    "configuration": configuration
  };

  var jobResult = BigQuery.Jobs.insert(job, "grey-sort-challenge");
  Logger.log(jobResult);
}

Replace my details with your details (projectId, datasetId etc.) and the SQL you want to run. I think a good pattern for this is to move your SQL to a view in BigQuery, and then just point to that view instead of putting all the SQL in the Apps Script code. Not only does this keep the script clean and succinct, but it also decouples the scheduler from the query itself.

The script will run the SQL specified, and then write the results to the destinationTable. Easy-peasy-lemon-squeezy, right? Save your script.

This is how the set up looks in BigQuery:

3. Run/debug

Once you’ve saved your script, you’re now ready to test it. Pay attention, this part is mega complex.

Click Run/Debug:

NoteWhen you do this you will mostly likely be prompted to give the script access to your BigQuery project/dataset. Just follow the prompts.

That’s it. Seriously!

The script will execute a BigQuery async job that will run your SQL, and write the results to the table you’ve specified. Now, for the astute readers out there, you’ll realise that by inserting the job async, you won’t know if it succeeded or not. You can modify the code to poll for the job until it’s complete, and then check the status/result if you really want to. But, I didn’t bother because it’s not a big problem if it fails for me – I just run it again manually, or wait for the next timer to kick in while I drink my double-shot latte.

Once you run/debug it, you can view the logs:

 

4. You’re almost there..

After running the script, you should be able to see the table that it created in BigQuery – as long as you didn’t get any errors of course.

Et voilà:

Now, it’s time to put it on a timer (see what I did there?) and schedule it. Again, this couldn’t be easier. Click on that weird looking icon that’s a mixture between a speech bubble and a clock. Then, configure when you want it to run, and save it.

Annnnnnndddddd, you’re done.

Give yourself a big pat on the back.

Some food for thought

How easy was that huh? There’s a few things to think about though.

The script is tied to whatever Google account that you’re signed in with. So, I wouldn’t be creating/deploying such things for enterprise production systems. Maybe there’s a way to set it up with a service account, but I didn’t have time to look into that (I’m also a little lazy today). Anyway, I envision this trick getting used primarily by data analysts who want to create their own tables from production data/tables on a regular basis.

The script itself is actually saved as a file in your Drive account, so don’t go accidentally deleting it, otherwise you’ll be forced to come back here and read (ahem…copy & paste) this blog post again.

Enjoy, and thanks again to Felipe who mentioned this tip to me.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s