The Guide to Apache Druid Architectures

As we build out the Rill Data team, we often encounter folks who are new to Apache Druid and looking for ways to get up to speed quickly. For this reason we maintain this “Guide to Apache Druid.” It’s meant to be a balanced list of articles, customer stories, and architectural diagrams that best helped us get up to speed answering questions like:

  • Who uses Apache Druid?
  • What are they using it for?
  • How does it fit in with the other pieces of the Modern Data Stack?

This is a living document, so we’ll update this page when particularly relevant pieces come up.

The Apache Druid site

The open source Apache Druid project site is always a good place to start learning about Druid, in particular its exhaustive list of companies using Druid.

Customer Stories

It’s always best to hear directly from users exactly how they’re using Druid inside their company. While the community-maintained list of companies above is fairly exhaustive, these stories below are some of our favorites (in no particular order):

Druid Architecture

What is the internal architecture of Apache Druid and how is it different from other OLAP databases?

Reference Architecture Diagrams

If a picture is worth 1,000 words, an architecture slide is worth 1,000 lines of code. Some of the most valuable things I take away from online talks, presentations, and blogs are the architecture slides from the pioneers of real-time data infrastructure. What technologies have Netflix, AirBnb, Lyft, Pinterest, and Snap used to assemble data stacks to process, store, and act upon the massive amounts of signal streaming from their platforms? How do batch and real-time systems interoperate in practice? What databases are being used? Which visualization tools?

Since founding Rill, I have been collecting architecture diagrams from the leading companies that have adopted real-time data stacks. I focused on Apache Druid because that's the technology we're building on at Rill, but these insights hold true for other real-time databases like ClickHouse and Pinot. I intentionally chose slides from content that was not sponsored by or associated with any commercial vendor (including Rill), so you can trust that these are engineers telling their own stories.

While "monoDBism" is a compelling philosophy in theory, and in a vendor's interest, evidence from these leading companies shows "polyDBism" is more widely practiced. Real-time databases have a complementary role to play alongside data lakes, warehouses, key-value stores, and graph DBs.

I hope you enjoy this tour through reference architectures for real-time data stacks "feat. Apache Druid" as much as I enjoyed collecting and redrawing them.

Pinterest, 2020

Archmage, Pinterest’s Real-time Analytics Platform on Druid

Netflix, 2020

How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience

GumGum, 2020

Optimized Real-time Analytics using Spark Streaming and Apache Druid

Salesforce, 2020

Delivering High-Quality Insights Interactively Using Apache Druid at Salesforce

Reddit, 2021

Scaling Reporting at Reddit

eBay, 2019

Monitoring at eBay with Druid

Web analytics at scale with Druid

Lyft, 2018

Streaming SQL and Druid

AirBnb, 2017

How Superset and Druid Power Real-Time Analytics at AirBnB

Batch Processing

Stream Processing

Apache Superset with Maxime Beauchemin (formerly Lyft, AirBnB, Facebook), March 2019. Search for ‘Druid’ in the transcript to get a sense of why Maxime built Superset to run on Druid.

Did we miss something great?

We're always on the lookout for smart write-ups. Let us know if you found something that we overlooked.
