How It Works
From raw government data to interactive dashboards: here's the pipeline that powers the Australian Health Data Hub.
Data Collection
More than 60 custom Python ingestors automatically collect health data from official Australian sources on a schedule. The sources include:
- Australian Bureau of Statistics (ABS)
- Australian Institute of Health and Welfare (AIHW)
- data.gov.au open data portal
- State and territory health departments (NSW Health, SA Health, Vic DHHS, etc.)
- Medicare and PBS statistics
- Various research institutions and registries
Data is pulled from REST APIs, downloaded as XLSX/CSV files, and scraped from structured web pages, then normalised into a consistent format.
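As a rough illustration of the normalisation step, the sketch below maps one source's CSV columns onto a shared record shape. The `Observation` fields, the `normalise_csv` helper, the column names, and the sample row are all made up for this example; the Hub's actual ingestors and schema may look quite different.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical common record shape; the field names are assumptions."""
    source: str
    indicator: str
    region: str
    period: str
    value: float


def normalise_csv(text: str, source: str, column_map: dict[str, str]) -> list[Observation]:
    """Parse CSV text and rename source-specific columns to the common fields."""
    records = []
    for raw in csv.DictReader(io.StringIO(text)):
        fields = {target: raw[src].strip() for src, target in column_map.items()}
        records.append(
            Observation(
                source=source,
                indicator=fields["indicator"],
                region=fields["region"],
                period=fields["period"],
                # Strip thousands separators before converting to a number.
                value=float(fields["value"].replace(",", "")),
            )
        )
    return records


# Made-up sample input, not real published data.
sample = 'Measure,State,Year,Count\nHospital admissions,NSW,2021,"1,234"\n'
records = normalise_csv(
    sample,
    source="example-source",
    column_map={"Measure": "indicator", "State": "region", "Year": "period", "Count": "value"},
)
```

Keeping the column mapping as data rather than code means each ingestor would only need to declare how its source's headers line up with the shared fields.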
Processing & Storage
Raw data is cleaned, validated, and transformed into standardised schemas. Each dataset is tagged with metadata including source, date range, geographic scope, and category. Processed data is stored in a SQLite database optimised for fast analytical queries.
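One way to picture the storage layout is a metadata table alongside a table of values, using Python's built-in `sqlite3` module. The table names, column names, index, and the `store_dataset` helper below are assumptions chosen for illustration, not the Hub's real schema.

```python
import sqlite3

# In-memory database for the example; a real deployment would use a file path.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE datasets (
        id INTEGER PRIMARY KEY,
        source TEXT NOT NULL,
        category TEXT NOT NULL,
        geographic_scope TEXT NOT NULL,
        date_start TEXT NOT NULL,
        date_end TEXT NOT NULL
    );
    CREATE TABLE observations (
        dataset_id INTEGER NOT NULL REFERENCES datasets(id),
        region TEXT NOT NULL,
        period TEXT NOT NULL,
        value REAL NOT NULL
    );
    -- An index on the columns that filters use most keeps lookups fast.
    CREATE INDEX idx_obs_lookup ON observations (dataset_id, region, period);
    """
)


def store_dataset(conn, meta, rows):
    """Insert one dataset's metadata and its (region, period, value) rows."""
    cur = conn.execute(
        "INSERT INTO datasets (source, category, geographic_scope, date_start, date_end) "
        "VALUES (?, ?, ?, ?, ?)",
        (meta["source"], meta["category"], meta["geographic_scope"],
         meta["date_start"], meta["date_end"]),
    )
    dataset_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO observations (dataset_id, region, period, value) VALUES (?, ?, ?, ?)",
        [(dataset_id, region, period, value) for region, period, value in rows],
    )
    conn.commit()
    return dataset_id


# Made-up metadata and values, for illustration only.
dataset_id = store_dataset(
    conn,
    {"source": "example-source", "category": "hospitals", "geographic_scope": "state",
     "date_start": "2020-01-01", "date_end": "2021-12-31"},
    [("NSW", "2020", 10.0), ("NSW", "2021", 12.0)],
)
```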
The processing pipeline handles unit conversions, date normalisation, geographic mapping (postcodes to SA2/SA3/LGA regions), and cross-referencing between datasets.
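Two of these steps, date normalisation and geographic mapping, can be sketched in a few lines. The accepted date formats, the placeholder postcodes and SA2 identifiers, and the idea of splitting a value by a share per region are assumptions made for this example; a real mapping would come from a published correspondence table.

```python
from datetime import datetime

# Date formats assumed for illustration; real sources may use others.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d %B %Y")


def normalise_date(text: str) -> str:
    """Return an ISO 8601 date string, trying each known format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")


# Placeholder lookup: postcode -> list of (SA2 identifier, share of the postcode).
# These codes and shares are invented, not real ABS values.
POSTCODE_TO_SA2 = {
    "9991": [("SA2_A", 1.0)],
    "9992": [("SA2_A", 0.25), ("SA2_B", 0.75)],
}


def apportion(postcode: str, value: float) -> dict[str, float]:
    """Split a postcode-level value across the SA2 regions it overlaps."""
    return {sa2: value * share for sa2, share in POSTCODE_TO_SA2[postcode]}


iso_date = normalise_date("01/07/2021")
split = apportion("9992", 100.0)
```

A postcode can straddle more than one statistical area, which is why the lookup holds a list of shares rather than a single target region.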
API Layer
A Python FastAPI application serves as the backend and exposes the processed data through a RESTful API. The API supports filtering, aggregation, time-series queries, and geographic breakdowns, so the frontend can request exactly the data it needs for each dashboard view.
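The filtering and aggregation behind such an endpoint might look like the plain-Python sketch below. The `query` function, its parameter names, and the sample rows are hypothetical; in a FastAPI route handler, arguments like these would typically arrive as query parameters, and the real service presumably queries the database rather than a list.

```python
from collections import defaultdict


def query(rows, region=None, start=None, end=None, group_by=None):
    """Filter rows by region and period range, optionally summing by one field."""
    selected = [
        r for r in rows
        if (region is None or r["region"] == region)
        and (start is None or r["period"] >= start)
        and (end is None or r["period"] <= end)
    ]
    if group_by is None:
        return selected
    totals = defaultdict(float)
    for r in selected:
        totals[r[group_by]] += r["value"]
    return dict(totals)


# Made-up rows, for illustration only.
rows = [
    {"region": "NSW", "period": "2020", "value": 10.0},
    {"region": "NSW", "period": "2021", "value": 12.0},
    {"region": "VIC", "period": "2021", "value": 7.0},
]

time_series = query(rows, region="NSW", group_by="period")
by_region = query(rows, start="2021", group_by="region")
```

Grouping by `period` gives a time series for one region, while grouping by `region` gives a geographic breakdown for a chosen period range, so one function covers both kinds of dashboard view.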
Frontend & Visualisation
The interactive dashboards are built with Next.js and React, providing a fast, responsive experience across desktop and mobile. Data is presented through charts, maps, tables, and narrative summaries, each designed for a specific audience (families, clinicians, researchers, or policymakers).
Deployment & Infrastructure
The platform runs on a Linux VPS with Apache as the web server, fronted by Cloudflare for CDN, DDoS protection, and SSL termination. The backend API runs as a systemd service using Uvicorn. Apache serves the static frontend assets directly, so those requests never reach the API process.
The entire stack is managed through automated deployment scripts, with the codebase maintained in Git.