Fun with Serializable Functions and Dynamic Destinations in Cloud Dataflow

Image: a waterslide analogy. One input, multiple outputs; each slide represents a date partition in a single table.

Do you have data that needs to be loaded into BigQuery, but split across multiple destination tables? With a Cloud Dataflow pipeline you could define a side output for each destination table you need, but what happens when you want to write to date partitions in a table and you don't know in advance which partitions you'll need? It gets a little messy. That was the problem I encountered, and here's the solution we came up with.
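The core of the routing logic is deciding, per element, which partition a record belongs to. BigQuery lets you address a specific date partition with the table decorator syntax `table$YYYYMMDD`, and that is the string a per-element destination function needs to compute. Here is a minimal sketch in plain Python of that computation; the table and field names are hypothetical, and in a real pipeline this logic would live inside the Beam/Dataflow destination function rather than a standalone script:

```python
from datetime import datetime, timezone

def partition_table_spec(record, base_table="my_project:my_dataset.events"):
    """Compute the BigQuery partition decorator for a record.

    Uses the decorator syntax <table>$YYYYMMDD to address a specific
    date partition. `record["timestamp"]` (epoch seconds, UTC) is an
    assumed schema for illustration.
    """
    day = datetime.fromtimestamp(record["timestamp"], tz=timezone.utc)
    return f"{base_table}${day.strftime('%Y%m%d')}"

# A record from 2017-01-01 00:00:00 UTC lands in that day's partition:
record = {"timestamp": 1483228800, "payload": "..."}
print(partition_table_spec(record))  # my_project:my_dataset.events$20170101
```

Because the destination is derived from the data itself, the pipeline never has to enumerate partitions up front; whatever dates show up in the input determine which partitions get written.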

What nobody at Uni will tell you about being a Software Developer

I wasn’t sure what my first day at Shine would look like. I searched for blog posts like this one for insight, but figured everyone’s experience is different. I hadn’t worked in this industry before, and my work experience at a laptop repair shop didn’t really count. The only relevant experience I had was the industry project from my final year of study, and that turned out to be very valuable. I knew I would be thrown into the deep end and have to learn quickly. Since day one, I’ve been surrounded by great mentors who have helped with code reviews, shared best practices, suggested great books, and offered general insights into how this business works. Anyway, I think enough time has passed now to reflect on the year.