How It Works

From raw government data to interactive dashboards — here's the pipeline that powers the Australian Health Data Hub.

1. Data Collection

More than 60 custom Python ingestors collect health data from official Australian sources on a schedule. Sources include:

  • Australian Bureau of Statistics (ABS)
  • Australian Institute of Health and Welfare (AIHW)
  • data.gov.au open data portal
  • State and territory health departments (NSW Health, SA Health, Vic DHHS, etc.)
  • Medicare and PBS statistics
  • Various research institutions and registries

Data is pulled from REST APIs, downloaded as XLSX/CSV files, and scraped from structured web pages, then normalised into a consistent format.
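A single ingestor can be sketched as a small parse-and-normalise step. The column names and the shared record shape below are illustrative assumptions, not the project's real schema; each real ingestor maps its source's own headers onto the common format.

```python
import csv
import io

def normalise_rows(raw_csv: str, source: str) -> list[dict]:
    """Parse one source's CSV export into the shared record shape.

    Column names ("Region", "Year", ...) are hypothetical examples of
    what a source export might contain.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    records = []
    for row in reader:
        records.append({
            "source": source,
            "region": row["Region"].strip(),
            "period": row["Year"].strip(),
            # Normalise metric names to a consistent snake_case key.
            "metric": row["Measure"].strip().lower().replace(" ", "_"),
            "value": float(row["Value"]),
        })
    return records

# A stand-in for a downloaded CSV file.
sample = """Region,Year,Measure,Value
NSW,2023,Hospital admissions,1200
VIC,2023,Hospital admissions,980
"""
rows = normalise_rows(sample, source="example-source")
```

Whatever the transport (REST API, XLSX/CSV download, or scrape), every ingestor ends at the same point: a list of records in one consistent shape.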

2. Processing & Storage

Raw data is cleaned, validated, and transformed into standardised schemas. Each dataset is tagged with metadata including source, date range, geographic scope, and category. Processed data is stored in a SQLite database optimised for fast analytical queries.

The processing pipeline handles unit conversions, date normalisation, geographic mapping (postcodes to SA2/SA3/LGA regions), and cross-referencing between datasets.
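The storage layout can be illustrated with a minimal sketch: one table of dataset-level metadata and one table of observations, indexed for the aggregate queries the API issues. Table and column names here are assumptions for illustration, not the hub's actual schema.

```python
import sqlite3

# In-memory database for illustration; the real store is a file on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datasets (
        id INTEGER PRIMARY KEY,
        source TEXT, category TEXT, geo_scope TEXT,
        date_from TEXT, date_to TEXT
    );
    CREATE TABLE observations (
        dataset_id INTEGER REFERENCES datasets(id),
        region TEXT, period TEXT, metric TEXT, value REAL
    );
    -- Index matched to the common filter pattern (metric, region, period).
    CREATE INDEX idx_obs ON observations (metric, region, period);
""")

conn.execute(
    "INSERT INTO datasets VALUES (1, 'AIHW', 'hospitals', 'state', '2022', '2023')"
)
conn.executemany(
    "INSERT INTO observations VALUES (1, ?, ?, 'admissions', ?)",
    [("NSW", "2023", 1200.0), ("VIC", "2023", 980.0)],
)

# The kind of aggregate query the API layer runs against the store.
total = conn.execute(
    "SELECT SUM(value) FROM observations WHERE metric = 'admissions'"
).fetchone()[0]
```

Splitting metadata from observations keeps dataset tagging (source, date range, geographic scope, category) queryable without touching the much larger observations table.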

3. API Layer

A Python FastAPI application serves as the backend, exposing the processed data through a clean RESTful API. The API supports filtering, aggregation, time-series queries, and geographic breakdowns — allowing the frontend to request exactly the data it needs for each dashboard view.

4. Frontend & Visualisation

The interactive dashboards are built with Next.js and React, providing a fast, responsive experience across desktop and mobile. Data is presented through charts, maps, tables, and narrative summaries — each designed for a specific audience (families, clinicians, researchers, policymakers).

5. Deployment & Infrastructure

The platform runs on a Linux VPS with Apache as the web server, fronted by Cloudflare for CDN caching, DDoS protection, and TLS termination. The backend API runs as a systemd service using Uvicorn. Static frontend assets are served directly by Apache for maximum performance.
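Running the API under systemd means it starts at boot and restarts on failure. A unit file for this arrangement might look like the fragment below; the service name, user, and paths are assumptions for illustration, not the real configuration.

```ini
# /etc/systemd/system/healthhub-api.service -- illustrative only;
# names and paths are hypothetical.
[Unit]
Description=Health Data Hub API (Uvicorn)
After=network.target

[Service]
User=www-data
WorkingDirectory=/srv/healthhub/api
ExecStart=/srv/healthhub/venv/bin/uvicorn app.main:app --host 127.0.0.1 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Binding Uvicorn to 127.0.0.1 keeps the API reachable only through Apache, which proxies API requests and serves the static frontend itself.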

The entire stack is managed through automated deployment scripts, with the codebase maintained in Git.

Tech Stack

Data Collection: Python, custom ingestors, scheduled tasks
Backend API: Python, FastAPI, Uvicorn
Database: SQLite (analytical workload)
Frontend: Next.js, React, TypeScript
Web Server: Apache 2.4
CDN & Security: Cloudflare
Infrastructure: Linux VPS, systemd
Development: AI-assisted (RapidDevAI)