Scheduling BigQuery jobs using Google Apps Script

Do you recoil in horror at the thought of running yet another mundane SQL script just so a table is automatically rebuilt for you each day in BigQuery? Can you barely remember your name first thing in the morning, let alone remember to click “Run Query” so that your boss gets the latest data refreshed in his fancy Data Studio charts, and then takes all the credit for your hard work?

Well, fear not my fellow BigQuery’ians. There’s a solution to this madness.

It’s simple.

It’s quick.

Yes, it’s Google Apps Script to the rescue.

Disclaimer: all credit for this goes to the one and only Felipe Hoffa. He ‘da man!

1. A few mouse clicks

Apps Script lets you write little snippets of JavaScript that are fully managed by Google. It’s primarily used by developers for building add-ons for products like Calendar, Docs, Drive, Gmail, and the like. But few people know that it also allows you to talk to BigQuery by hooking into its REST API – and not only that, but Apps Script comes with a scheduling feature too, which is something Cloud Functions lacks right now (“Cloud Functions PMs, I’m looking in your direction!”).

Do you see where I’m going with this yet? Of course you do silly.

OK, now jump on over to script.google.com like a good developer, would ya please. Start by giving your project a unique, funky name (‘my first GAS project’ usually goes down well with the crowd), and then enable the BigQuery API (v2) by navigating to Resources > Advanced Google Services.

Without enabling this, it obviously won’t work:

2. Copy and paste

You’ll like this part. It’s something we all do every day to make ourselves look much smarter than we actually are. Yes, copy and paste the following code snippet into the code editor – in turn, I copied and pasted it from this Stack Overflow question, and dutifully hacked it:

function runQuery() {
  // Job configuration: run the query (standard SQL) and write the results
  // to the destination table, overwriting it on each run.
  var configuration = {
    "query": {
      "useQueryCache": false,
      "destinationTable": {
        "projectId": "grey-sort-challenge",
        "datasetId": "apps_script_scheduler",
        "tableId": "i_was_created_by_apps_script_on_a_timer"
      },
      "writeDisposition": "WRITE_TRUNCATE",
      "createDisposition": "CREATE_IF_NEEDED",
      "allowLargeResults": true,
      "useLegacySql": false,
      "query": "SELECT * FROM `grey-sort-challenge.apps_script_scheduler.v_apps_script_scheduler_demo`"
    }
  };

  var job = {
    "configuration": configuration
  };

  // Insert the job (asynchronously) via the BigQuery advanced service.
  var jobResult = BigQuery.Jobs.insert(job, "grey-sort-challenge");
  Logger.log(jobResult);
}

Replace my details with your details (projectId, datasetId etc.) and the SQL you want to run. I think a good pattern for this is to move your SQL to a view in BigQuery, and then just point to that view instead of putting all the SQL in the Apps Script code. Not only does this keep the script clean and succinct, but it also decouples the scheduler from the query itself.
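
If you’d rather create that view from Apps Script too, here’s a rough sketch of one way to do it – the source table name below is just a placeholder I made up, so swap in your own SQL:

function createSchedulerView() {
  // Defines the view once; the scheduled query above then just selects from it.
  var view = {
    "tableReference": {
      "projectId": "grey-sort-challenge",
      "datasetId": "apps_script_scheduler",
      "tableId": "v_apps_script_scheduler_demo"
    },
    "view": {
      // Placeholder source table – replace with your own query.
      "query": "SELECT * FROM `grey-sort-challenge.apps_script_scheduler.my_source_table`",
      "useLegacySql": false
    }
  };
  Logger.log(BigQuery.Tables.insert(view, "grey-sort-challenge", "apps_script_scheduler"));
}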

The script will run the SQL specified, and then write the results to the destinationTable. Easy-peasy-lemon-squeezy, right? Save your script.

This is how the setup looks in BigQuery:

3. Run/debug

Once you’ve saved your script, you’re now ready to test it. Pay attention, this part is mega complex.

Click Run/Debug:

Note: when you do this you will most likely be prompted to give the script access to your BigQuery project/dataset. Just follow the prompts.

That’s it. Seriously!

The script will execute an asynchronous BigQuery job that will run your SQL and write the results to the table you’ve specified. Now, for the astute readers out there, you’ll realise that by inserting the job asynchronously, you won’t know whether it succeeded or not. You can modify the code to poll for the job until it’s complete, and then check the status/result if you really want to. But I didn’t bother, because it’s not a big problem for me if it fails – I just run it again manually, or wait for the next timer to kick in while I drink my double-shot latte.
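
If you do want that peace of mind, a rough sketch of the polling approach looks something like this (the helper function and the two-second poll interval are my own choices, not part of the original snippet):

function waitForJob(projectId, jobId) {
  // Poll the job until BigQuery reports it as DONE.
  var job = BigQuery.Jobs.get(projectId, jobId);
  while (job.status.state !== "DONE") {
    Utilities.sleep(2000);
    job = BigQuery.Jobs.get(projectId, jobId);
  }
  // A DONE job can still have failed – check errorResult to be sure.
  if (job.status.errorResult) {
    Logger.log("Job failed: " + job.status.errorResult.message);
  } else {
    Logger.log("Job completed successfully");
  }
}

Then, right after the BigQuery.Jobs.insert() call in runQuery(), call waitForJob("grey-sort-challenge", jobResult.jobReference.jobId).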

Once you run/debug it, you can view the logs:

 

4. You’re almost there…

After running the script, you should be able to see the table that it created in BigQuery – as long as you didn’t get any errors of course.

Et voilà:

Now, it’s time to put it on a timer (see what I did there?) and schedule it. Again, this couldn’t be easier. Click on that weird-looking icon that’s a mixture between a speech bubble and a clock. Then, configure when you want it to run, and save it.
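
By the way, if you prefer code over clicking icons, you can also create the trigger programmatically with a one-off function like this sketch (daily at around 6am is just an example schedule):

function createDailyTrigger() {
  // Run this once to schedule runQuery() every day, starting around 6am.
  ScriptApp.newTrigger("runQuery")
      .timeBased()
      .everyDays(1)
      .atHour(6)
      .create();
}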

Annnnnnndddddd, you’re done.

Give yourself a big pat on the back.

Some food for thought

How easy was that, huh? There are a few things to think about though.

The script is tied to whatever Google account you’re signed in with. So, I wouldn’t be creating/deploying such things for enterprise production systems. Maybe there’s a way to set it up with a service account, but I didn’t have time to look into that (I’m also a little lazy today). Anyway, I envision this trick getting used primarily by data analysts who want to create their own tables from production data/tables on a regular basis.

The script itself is actually saved as a file in your Drive account, so don’t go accidentally deleting it, otherwise you’ll be forced to come back here and read (ahem…copy & paste) this blog post again.

Enjoy, and thanks again to Felipe who mentioned this tip to me.

9 Comments
  • Tal
    Posted at 17:57h, 22 November Reply

    Thanks Graham, great post, works like a charm!!! I’m already using it…

    Could you please extend the explanation for scenarios such as using a federated query – querying a view based on a table which points to a Google Sheet in Google Drive? It doesn’t seem to work for this kind of query, probably because of permissions/access issues, even though my username has admin access to both the BQ project and the Google Sheet file.

    Thanks!!

  • dbinott
    Posted at 16:02h, 19 December Reply

    How can I get this to output to a Google Sheet?

  • Paleti
    Posted at 21:00h, 16 January Reply

    Thanks Graham, well explained – I just got it working without any other searching 🙂

    I am also wondering if there is a way to execute some scripts on the machines based on the query results. I want to automate some fixes on the machines if the query returns bad machines.

  • Bala
    Posted at 03:11h, 20 February Reply

    Thanks for the post.
    Is it possible to do a streaming insert? I have one BQ table that is filled with streaming data from web pages, and I want to read the rows as and when it gets filled and insert them into another table.

    Regards,
    Bala

  • Senya
    Posted at 21:57h, 29 May Reply

    Hey man, thanks for this, really helpful. The only problem is that when I use really big queries it gives me this error: “Missing } after property list. (line 14, file “Code”)” when I try to save the file. Is there any way to fix this? Thanks!

  • Suresh M N
    Posted at 20:08h, 04 September Reply

    Thanks for the post, it’s very useful. However, how can we track job failures here?

  • Emrah Özkan
    Posted at 19:01h, 15 November Reply

    Could you make an example about partitioned tables like these:
    mytable_20181101
    mytable_20181102
    mytable_20181103
    mytable_20181104

    Thanks,

  • Benjamin
    Posted at 02:27h, 10 April Reply

    Thank you so much! This was really useful! However, I did notice that for queries that pull from a Google Sheet as the data source, it will not update. There are no errors when I run the function, but when I check the last modified date, it does not update.

    Any insight from anyone would be awesome!!

    Thanks again!!!
