Storage Tiers

KalamDB uses a dual-tier storage architecture that balances write speed with query efficiency and long-term retention.

Hot Tier (RocksDB)

The hot tier handles all incoming writes with sub-millisecond latency using RocksDB column families.

Characteristics:

Sub-millisecond write latency
Organized as column families per table
Optimized for point lookups and recent data
Data is buffered here before being flushed to the cold tier

Cold Tier (Parquet)

Flushed data is written to Apache Parquet files for efficient analytical queries and long-term storage.

Characteristics:

Columnar format for efficient analytics
High compression ratios
Each segment tracked in manifest.json
Supports multiple storage backends (local, S3, Azure, GCS)
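
The relationship between the two tiers can be sketched with a toy in-memory model. This is an illustration only, not KalamDB's implementation: the `ToyTieredTable` class, its method names, and the manifest fields `file` and `rows` are all hypothetical, and plain Python containers stand in for RocksDB and Parquet.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTieredTable:
    """Toy model of one table: a mutable hot buffer plus immutable cold segments."""

    hot: dict = field(default_factory=dict)       # stands in for a RocksDB column family
    segments: list = field(default_factory=list)  # stands in for Parquet files
    manifest: list = field(default_factory=list)  # stands in for manifest.json entries (hypothetical fields)

    def write(self, key, row):
        # All incoming writes land in the hot tier first.
        self.hot[key] = row

    def flush(self):
        # Move buffered rows into a new immutable segment and record it in the manifest.
        if not self.hot:
            return None
        name = f"batch-{len(self.segments)}.parquet"
        self.segments.append(tuple(sorted(self.hot.items())))
        self.manifest.append({"file": name, "rows": len(self.hot)})
        self.hot.clear()
        return name


table = ToyTieredTable()
table.write(1, {"content": "hello"})
table.write(2, {"content": "world"})
flushed = table.flush()
print(flushed, table.manifest)
```

After the flush, the hot buffer is empty and the manifest holds one entry describing the new segment, which mirrors the "buffer, then flush" flow described above.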

Flush Policy

Each table is configured with a flush policy that determines when data moves from the hot tier to the cold tier:


```sql
CREATE TABLE app.messages (
  id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
  content TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
) WITH (
  TYPE = 'USER',
  FLUSH_POLICY = 'rows:1000,interval:60'
);
```

| Policy | Description |
| --- | --- |
| `rows:N` | Flush after N rows accumulated |
| `interval:N` | Flush every N seconds |
| `rows:N,interval:N` | Flush on whichever threshold is hit first |
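
The "whichever threshold is hit first" rule can be sketched in a few lines of Python. This is a minimal sketch of the semantics in the table above, not KalamDB's actual parser: `FlushPolicy`, `parse_flush_policy`, and `should_flush` are hypothetical names, and the validation rules (positive integers, no repeated keys) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlushPolicy:
    max_rows: Optional[int] = None       # from rows:N
    interval_secs: Optional[int] = None  # from interval:N

    def should_flush(self, buffered_rows: int, secs_since_flush: float) -> bool:
        # Flush as soon as either configured threshold is reached.
        if self.max_rows is not None and buffered_rows >= self.max_rows:
            return True
        if self.interval_secs is not None and secs_since_flush >= self.interval_secs:
            return True
        return False


def parse_flush_policy(spec: str) -> FlushPolicy:
    """Parse 'rows:N', 'interval:N', or 'rows:N,interval:N' (assumed grammar)."""
    values = {}
    for part in spec.split(","):
        key, sep, raw = part.strip().partition(":")
        if not sep or key not in ("rows", "interval") or key in values:
            raise ValueError(f"invalid flush policy component: {part!r}")
        number = int(raw)
        if number <= 0:
            raise ValueError(f"{key} must be positive, got {number}")
        values[key] = number
    return FlushPolicy(values.get("rows"), values.get("interval"))


policy = parse_flush_policy("rows:1000,interval:60")
print(policy.should_flush(buffered_rows=1000, secs_since_flush=5))  # row threshold reached
print(policy.should_flush(buffered_rows=10, secs_since_flush=61))   # interval elapsed
print(policy.should_flush(buffered_rows=10, secs_since_flush=5))    # neither threshold reached
```

With the policy from the `CREATE TABLE` example, a burst of 1000 rows triggers a flush immediately, while a quiet table is still flushed once 60 seconds have passed.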

Manual Flush & Compaction


```sql
-- Flush a specific table
STORAGE FLUSH TABLE myapp.messages;

-- Flush all tables in a namespace
STORAGE FLUSH ALL IN myapp;

-- Compact cold storage
STORAGE COMPACT TABLE myapp.messages;

-- Check storage health
STORAGE CHECK local EXTENDED;
```

Per-User Storage Isolation


```text
data/storage/
├── user/
│   ├── alice/
│   │   └── messages/
│   │       ├── manifest.json
│   │       └── batch-0.parquet
│   └── bob/
│       └── messages/
│           ├── manifest.json
│           └── batch-0.parquet
└── shared/
    └── config/
        ├── manifest.json
        └── batch-0.parquet
```

Each user’s data lives in a completely separate directory. This enables:

Trivial data export — just copy the user’s directory
Instant deletion — remove the directory for GDPR compliance
Independent scaling — no cross-user interference
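
Because each user maps to one directory, export and deletion reduce to ordinary filesystem operations. The sketch below demonstrates this against a throwaway directory tree that imitates the layout shown above. The helper names `user_dir`, `export_user`, and `delete_user` are hypothetical and not part of KalamDB, and the sketch covers only files on local storage; rows still buffered in the hot tier would presumably need to be flushed first.

```python
import shutil
import tempfile
from pathlib import Path


def user_dir(root: Path, user_id: str) -> Path:
    # Mirrors the layout shown above: <root>/user/<user_id>/
    return root / "user" / user_id


def export_user(root: Path, user_id: str, dest: Path) -> Path:
    # Export is a plain recursive copy of the user's directory.
    return Path(shutil.copytree(user_dir(root, user_id), dest / user_id))


def delete_user(root: Path, user_id: str) -> None:
    # Deletion removes the user's directory and everything under it.
    shutil.rmtree(user_dir(root, user_id))


# Demo against a temporary tree, not a real KalamDB data directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "storage"
    for user in ("alice", "bob"):
        table = user_dir(root, user) / "messages"
        table.mkdir(parents=True)
        (table / "manifest.json").write_text("{}")
        (table / "batch-0.parquet").write_bytes(b"")

    exported = export_user(root, "alice", Path(tmp) / "export")
    delete_user(root, "alice")

    alice_exported = (exported / "messages" / "manifest.json").exists()
    alice_gone = not user_dir(root, "alice").exists()
    bob_intact = (user_dir(root, "bob") / "messages" / "batch-0.parquet").exists()
    print(alice_exported, alice_gone, bob_intact)
```

Removing `alice/` leaves `bob/` untouched, which is the property the isolation layout is meant to provide.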
