Every five years, the Australian Bureau of Statistics (ABS) counts every person and household in Australia. The count covers around 10 million households and over 25 million people. Census data tells us about the economic, social and cultural make-up of the country. The Census is a massive logistical operation that coordinates thousands of temporary staff across Australia.
To ensure efficient and comprehensive collection of Census forms, both online and in the field, the ABS needs timely insight into the performance of its digital service and the deployment of its field staff.
Historically, operational reporting to identify bottlenecks and inefficiencies in the Census form collection process had been performed manually by a small team. Reports were produced overnight from the previous day's data, which delayed management's opportunity to interpret and respond to developing trends. Because of this delay, chances to optimise the in-field approach to form collection could pass before anyone was able to act on them.
Empowering with knowledge: the 2021 Census
As an organisation uniquely familiar with the power of data analytics, the ABS saw an opportunity in 2020 to use the 2021 Census to transform the way it analysed and reported the operational performance of its online channels and field workers. The ABS decided to create an Operational Insights (OI) reporting platform that could deliver the right data to decision-makers, early and often. To design and build this data analytics platform, the ABS partnered with cloud experts from AWS Professional Services.
The purpose of this platform was to consolidate many disparate data sources into a single near real-time reporting dashboard that hundreds of decision-makers could customise.
Data lake expertise
Because of Shine's deep experience in engineering and operating mission-critical enterprise data lakes, AWS approached Shine to work on the data engineering, analytics and reporting development.
Automated and scalable platform design
The goal was to produce an automated and scalable platform design that could deliver performance insights in near real-time to support Census digital and field operations.
The project began with a four-week consultation to gather user requirements, scope the data sources and understand objectives. The key challenges for the project were rooted in the nature of the data. The data lake would need to aggregate over two terabytes of queryable data in almost 200 tables, drawn from a mix of real-time and batch sources in both structured and unstructured form.
Additionally, because many separate operational areas were collecting their own data, the definitions of shared metrics were inconsistent, which made it difficult to aggregate and match data accurately.
Once complete, the platform also needed to be ready for assessment under IRAP (Information Security Registered Assessors Program) and to meet the security standards required for government data classified at the 'Protected' level.
Light and heavy data transformations
Working with AWS Professional Services and ABS business and IT staff, Shine provided architectural and engineering services through both the design and build phases of the project. To address inconsistent definitions and datasets, the ABS consulted across business units to harmonise existing metric definitions and recommend new ones.
Shine and AWS consultants built a custom data transformation framework that lets the ABS define common conceptual data objects, or entities (such as metric definitions, data points and reports), together with their derivations and dependencies.
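The framework itself is not described in detail, so the following is only a minimal sketch of the general idea, assuming metrics are declared once with their inputs and a derivation rule. `MetricDefinition`, `define`, `evaluate` and the metric names (`forms_online`, `response_rate` and so on) are all hypothetical, chosen for illustration rather than taken from the ABS platform.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Mapping, Optional, Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """One shared, agreed definition of a metric (hypothetical structure)."""
    name: str
    depends_on: Tuple[str, ...]   # data points or other metrics this metric reads
    derive: Callable[..., float]  # called with dependency values in depends_on order


# Hypothetical registry: each metric is defined exactly once and reused by every report.
REGISTRY: Dict[str, MetricDefinition] = {}


def define(name: str, depends_on: Tuple[str, ...], derive: Callable[..., float]) -> None:
    """Register a metric, refusing duplicate (and therefore possibly conflicting) definitions."""
    if name in REGISTRY:
        raise ValueError(f"metric {name!r} is already defined")
    REGISTRY[name] = MetricDefinition(name, depends_on, derive)


def evaluate(name: str, data_points: Mapping[str, float],
             cache: Optional[Dict[str, float]] = None) -> float:
    """Compute a metric from raw data points, resolving derived inputs recursively."""
    cache = {} if cache is None else cache
    if name in data_points:  # raw input supplied directly
        return data_points[name]
    if name not in cache:
        if name not in REGISTRY:
            raise KeyError(f"no data point or metric named {name!r}")
        defn = REGISTRY[name]
        args = [evaluate(dep, data_points, cache) for dep in defn.depends_on]
        cache[name] = defn.derive(*args)
    return cache[name]


# Hypothetical example metrics: a derived metric can build on another derived metric.
define("forms_received", ("forms_online", "forms_paper"), lambda online, paper: online + paper)
define("response_rate", ("forms_received", "dwellings_total"), lambda received, total: received / total)
```

With inputs of 600 online forms, 150 paper forms and 1,000 dwellings, `evaluate("response_rate", ...)` returns 0.75. Every report that asks for `response_rate` therefore uses the same formula. This sketch does not detect cyclic definitions. A production framework would need to reject those before evaluation.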
The engineering team also built a serverless data transformation engine to ensure that the resulting 1,200-plus interdependent data transformations run in the correct order. The engine resolves and enforces these data dependencies every time a transformation and reporting cycle runs.
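The engine's internals are not described. One standard way to run interdependent transformations in a valid order is a topological sort. The sketch below uses Kahn's algorithm and groups transformations into "waves", where each wave depends only on earlier ones. This is an assumption about the technique, not a description of the ABS engine. The function name and the transformation names in the example are hypothetical.

```python
from __future__ import annotations

from typing import Dict, List, Set


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Order transformations with Kahn's algorithm, grouped into waves.

    `deps` maps each transformation to the transformations whose output it reads.
    Every transformation in a wave depends only on transformations in earlier
    waves, so a wave's members could be dispatched in parallel.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Include upstream names that never appear as keys (e.g. raw source tables).
    nodes = set(deps) | {up for ups in deps.values() for up in ups}
    pending = {n: set(deps.get(n, ())) for n in nodes}  # unmet dependencies per node
    consumers: Dict[str, Set[str]] = {n: set() for n in nodes}
    for node, ups in pending.items():
        for up in ups:
            consumers[up].add(node)

    waves: List[List[str]] = []
    ready = sorted(n for n, ups in pending.items() if not ups)
    scheduled = 0
    while ready:
        waves.append(ready)
        scheduled += len(ready)
        unlocked = []
        for node in ready:
            for child in consumers[node]:
                pending[child].discard(node)
                if not pending[child]:  # last dependency just satisfied
                    unlocked.append(child)
        ready = sorted(unlocked)

    if scheduled != len(nodes):  # leftover nodes sit on a cycle
        raise ValueError("dependency cycle detected")
    return waves
```

For example, `{"clean_forms": {"raw_forms"}, "clean_field": {"raw_field"}, "response_rate": {"clean_forms", "clean_field"}}` yields three waves: the two raw tables, then the two cleaning steps, then `response_rate`. Grouping by wave suits a serverless design, because each wave could fan out to independent function invocations. Whether the ABS engine works this way is not stated.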
The pipeline begins with a light-duty ingestion and transformation stage, which makes data from the different upstream systems consistent before it is loaded into Amazon Redshift.
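The source systems and their schemas are not named, so the following is only a minimal sketch of what a light-duty consistency step might do. It renames columns onto one shared schema, tidies identifiers and converts timestamps to UTC. The source names (`online_portal`, `field_app`), the column names and the rule that naive timestamps are already UTC are all hypothetical assumptions.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical column mappings from each upstream system onto one shared schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "online_portal": {"dwellingId": "dwelling_id", "submittedAt": "event_time"},
    "field_app": {"DWELLING_ID": "dwelling_id", "VISIT_TS": "event_time"},
}


def normalise(source: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Light-duty step: rename columns, tidy identifiers and convert times to UTC."""
    mapping = FIELD_MAPS[source]
    out = {target: record[column] for column, target in mapping.items()}

    # Identifiers arrive with inconsistent case and whitespace.
    out["dwelling_id"] = str(out["dwelling_id"]).strip().upper()

    ts = out["event_time"]
    if isinstance(ts, (int, float)):  # epoch seconds
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:  # ISO 8601 string
        when = datetime.fromisoformat(ts)
        if when.tzinfo is None:  # assumption: naive times are already UTC
            when = when.replace(tzinfo=timezone.utc)
    out["event_time"] = when.astimezone(timezone.utc).isoformat()

    out["source"] = source  # keep lineage for later checks
    return out
```

Once both feeds share identifier formats and time zones, later stages can join and aggregate them without source-specific handling. For example, a portal submission stamped `2021-08-10T09:30:00+10:00` becomes `2021-08-09T23:30:00+00:00`.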
A heavy-duty transformation step then applies the business logic to produce reports, before publishing the results to Amazon Aurora, from which they are visualised in Microsoft Power BI, a tool the ABS already licensed.
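The actual business rules are not published. The following is only a minimal sketch of the kind of logic a heavy-duty step might apply to already-consistent data. The report here is a hypothetical per-area form return rate, and the field names `area`, `dwelling_id` and `event_type` are hypothetical too.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set


def area_return_rates(events: Iterable[Mapping[str, str]],
                      dwellings_per_area: Mapping[str, int]) -> Dict[str, float]:
    """Hypothetical report rule: share of dwellings in each area that returned a form."""
    returned: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        if event["event_type"] == "form_returned":
            # A set counts each dwelling once, even if it returned a form via two channels.
            returned[event["area"]].add(event["dwelling_id"])
    return {
        area: (len(returned[area]) / total if total else 0.0)
        for area, total in dwellings_per_area.items()
    }
```

This is where the agreed definitions matter. Whether a dwelling that responds both online and on paper counts once or twice is a business decision. The sketch assumes it counts once.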
For efficiency and minimal maintenance, the solution was designed and built using DevOps practices with CI/CD, a microservices approach and a low-maintenance serverless architecture, which reduced infrastructure overhead and supported automatic healing and recovery. Serverless technologies also supported the ABS's goals of scaling to meet operational demand and optimising cost.
All changes were delivered as code and deployed to each SDLC environment after thorough testing for functionality, security and performance. The microservices architecture kept changes small, so a change to one component did not affect, or require changes to, the others. A CI/CD pipeline built with Octopus Deploy and AWS CloudFormation automated deployment, so changes could be made quickly and easily.
As a result, changes were quick, tested and did not require manual intervention. This gave the ABS the confidence to make improvements without risking business disruption or a security incident.
The OI platform was designed and tested to ingest millions of updates or inserts per second from upstream data sources while simultaneously processing data to produce the reports. Transformation and reporting cycles run every 20 minutes. They could run more often, however, because some of them complete in near real-time or within a few minutes.
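The scheduler used for the cycles is not described. The following is only a minimal sketch of the arithmetic behind a fixed 20-minute cadence. The function `next_cycle_start` is hypothetical and assumes cycle boundaries are counted from midnight UTC, which avoids daylight-saving shifts.

```python
from datetime import datetime, timedelta, timezone

CYCLE = timedelta(minutes=20)  # cycle length stated in the text


def next_cycle_start(now: datetime, cycle: timedelta = CYCLE) -> datetime:
    """Next cycle boundary strictly after `now`, counting from midnight in now's time zone."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    completed = (now - midnight) // cycle  # timedelta // timedelta gives an int
    return midnight + (completed + 1) * cycle


cycles_per_day = timedelta(days=1) // CYCLE  # 72 for a 20-minute cycle

if __name__ == "__main__":
    now = datetime(2021, 8, 10, 9, 5, tzinfo=timezone.utc)
    print(next_cycle_start(now))  # 2021-08-10 09:20:00+00:00
    print(cycles_per_day)         # 72
```

A 20-minute cycle gives 72 reporting cycles a day, and a 5-minute cycle would give 288. Running more often would mainly require every dependent transformation to finish within the shorter window.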