24 Dec 2015 YOW! 2015 Melbourne: A Conference Report
The YOW! 2015 Developer Conference in Melbourne took place a few weeks ago, and once again the organisers did a splendid job curating a selection of both international and local speakers (including Shine’s very own Ben Teese). There were also delicious meals and glorious developer fuel (a.k.a. coffee) to keep the energy going strong between the amazing talks.
This year’s conference felt like it featured a wider variety of topics than previous years; headlining were mobile development, Lean and Agile, performance testing, software architecture and design, big data, cloud platforms, and DevOps. There was one topic, however, that took the crown and was presented with an overwhelming sense of urgency and importance: microservices.
We had talks from big players such as Facebook, Uber, ThoughtWorks and Netflix, each giving an insight into how they are using microservices and how nearly everything they have built is a microservice (over 1000 services!). It is safe to say that it was this year’s favourite buzzword.
Did you say Microservices?
Why yes, repeatedly.
YOW! Day One’s opening keynote “It’s Complicated” by Adrian Cockcroft initially tackled the subject from a very philosophical approach, quoting The Hitchhiker’s Guide to the Galaxy and asking the audience “What is the meaning of life, the universe… and everything?”
This led us to the question of what it means for something to be complicated, for which Adrian proposed an answer that many engineers might relate to: “Too many moving parts and I can’t intuit its behaviour”.
The most memorable thought I was left with was how much of a difference it makes when process doesn’t get in the way of creativity and innovation. Adrian made the point that process can drive away talent.
Another high-ranking theme from this talk was ‘ownership’ in software development: having the team who built a service maintain it all the way through to production, as opposed to just hand-balling it to the generic ‘support team’.
Failure testing at Uber Scale
The most enjoyable presentation had to be “Designing for Failure: Scaling Uber’s Backend by Breaking Everything” by Matt Ranney, where he went through some of the challenges Uber has had to deal with over time and how their crazy growth affected their system design, for better or worse.
When you go to large conferences like YOW! you will often find yourself looking at seemingly perfectly crafted software systems from big players like Facebook and Netflix, where everything is automated and not a speck of dust falls in the wrong place. Comparing them to your own workplace systems feels somewhat inappropriate, like reaching towards a system utopia that is unachievable. But the good news is that it turns out we’re not alone; Matt says “Don’t worry, we feel that too”. This is somewhat comforting.
At Uber they now have over 600 microservices, and that number only continues to grow. They use tools like Simian Army/Chaos Monkey to assist with their failure testing and follow the principles of chaos engineering in order to build confidence in the system’s capability to withstand turbulent conditions in production.
If you haven’t heard of Chaos Monkey, it’s a service that randomly seeks out Auto Scaling Groups and terminates your EC2 instances in order to ensure your application can tolerate random failures.
Oh yeah, and it does it in production. Just let that sink in, I’ll wait.
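While you let that sink in, here is a rough idea of what "pick a random group, kill a random instance" looks like. This is only a toy, in-memory sketch and not how Chaos Monkey itself is implemented: the `AutoScalingGroup` class, the `terminate_random_instance` function, and the service names and instance IDs are all made up for illustration, and nothing here talks to AWS.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AutoScalingGroup:
    """Hypothetical in-memory stand-in for an AWS Auto Scaling Group."""
    name: str
    instances: List[str] = field(default_factory=list)


def terminate_random_instance(
    groups: List[AutoScalingGroup], rng: random.Random
) -> Optional[Tuple[str, str]]:
    """Pick a random non-empty group and remove one random instance from it.

    Returns (group name, instance id), or None if nothing is left to terminate.
    """
    candidates = [g for g in groups if g.instances]
    if not candidates:
        return None
    group = rng.choice(candidates)
    victim = rng.choice(group.instances)
    group.instances.remove(victim)  # simulated termination, no AWS call is made
    return group.name, victim


if __name__ == "__main__":
    # Made-up fleet: the service names and instance ids are placeholders.
    fleet = [
        AutoScalingGroup("dispatch", ["i-001", "i-002", "i-003"]),
        AutoScalingGroup("payments", ["i-101", "i-102"]),
    ]
    print(terminate_random_instance(fleet, random.Random()))
```

The interesting part is not the code, which is trivial, but the bet behind it: if losing any single instance at any time is survivable, you find that out on a Tuesday afternoon rather than during a real outage.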
Alrighty, moving on. Uber didn’t get to this level without going through many challenges. Matt went over their worst ever outage, in which they had no available Postgres master database and numerous corrupted slaves for approximately 16 hours, followed by inadequate read capacity for an additional 24 hours.
In short, they had to decipher mysterious Postgres error messages (“invalid magic number 0000 in log file 50610, segment 179, offset 0”) and, after several hours of attempts to correctly promote a slave to master and regain followers, ended up having to write a C program at midnight to allow them to skip corrupted files and bypass the normal safety mechanisms built into PostgreSQL.
So at the end of this, we were left with a lesson on spending a bit more effort in writing meaningful error messages and testing for failure left, right and centre – so you too don’t end up writing random C programs in the middle of the night.
Dave Hahn gave us an insight into “A Day in the Life of a Netflix Engineer”, showing us what drives them to align themselves to the company goal of “Winning Moments of Truth” (those moments being when a family uses their spare time to watch Netflix).
On the topic of microservices, Dave mentioned their insanely cool microservices infrastructure, which includes self-diagnosing and self-healing agents.
With hundreds of production changes every hour, thousands of separate microservices, and tens of thousands of instances, it is hard to imagine it working any other way.
Finally, success also hinged upon each engineering team’s responsibility to write and maintain their own services, which links back to the ‘100% ownership culture’ in software development that was mentioned by Adrian Cockcroft earlier in the day.
See YOW! later!
As always, I had a lot of fun and sure learned a lot at YOW! this year. It most definitely lived up to expectations and I can’t wait to see how companies will take on the invaluable information and groundwork on microservices that we learnt this year.
I am really looking forward to seeing what the organisers will bring out next year. Personally I’d love to see even more talks on the nitty-gritty of software development and systems security. But whatever happens, I’m confident YOW! will always be worth the price of admission.
Francis Kim, posted at 18:47h, 25 December
600+ microservices!? I think I can make Uber work with 60 😉 Merry Christmas!
Michael Leroy, posted at 10:02h, 01 January
Thanks for the rundown on the conference Arthur, it was a good read!
If you’re interested in chaos engineering, you should check out Netflix’s recent post on Chaos Kong. It takes down an entire AWS region for maximum chaos :p