
Storage Tiers

KalamDB uses a dual-tier storage architecture that balances write speed with query efficiency and long-term retention.

Hot Tier (RocksDB)

The hot tier handles all incoming writes with sub-millisecond latency using RocksDB column families.

Characteristics:

  • ⚡ Sub-millisecond write latency
  • Organized as column families per table
  • Optimized for point lookups and recent data
  • Data is buffered here before being flushed to the cold tier

Cold Tier (Parquet)

Flushed data is written to Apache Parquet files for efficient analytical queries and long-term storage.

Characteristics:

  • 📊 Columnar format for efficient analytics
  • High compression ratios
  • Each segment tracked in manifest.json
  • Supports multiple storage backends (local, S3, Azure, GCS)
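
To make the hand-off between the two tiers concrete, here is a minimal toy model, not KalamDB's implementation. It assumes a plain dict in place of a RocksDB column family, JSON files in place of Parquet segments, and a hypothetical manifest layout (`{"segments": [...]}`). The class name `TwoTierTable` and the read path in `get` are likewise illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class TwoTierTable:
    """Toy model of one table: an in-memory hot buffer plus flushed cold segments."""

    def __init__(self, table_dir: Path) -> None:
        self.table_dir = table_dir
        self.table_dir.mkdir(parents=True, exist_ok=True)
        self.manifest_path = self.table_dir / "manifest.json"
        self.hot = {}  # stands in for a RocksDB column family

    def _read_manifest(self) -> dict:
        if self.manifest_path.exists():
            return json.loads(self.manifest_path.read_text())
        return {"segments": []}  # hypothetical manifest layout

    def write(self, key: int, row: dict) -> None:
        """Every write lands in the hot tier first."""
        self.hot[key] = row

    def flush(self) -> Optional[str]:
        """Move buffered rows into a new cold segment and record it in the manifest."""
        if not self.hot:
            return None
        manifest = self._read_manifest()
        # Real segments are Parquet files; JSON keeps this sketch stdlib-only.
        name = f"batch-{len(manifest['segments'])}.json"
        rows = [{"key": k, "row": v} for k, v in sorted(self.hot.items())]
        (self.table_dir / name).write_text(json.dumps(rows))
        manifest["segments"].append({"file": name, "rows": len(rows)})
        self.manifest_path.write_text(json.dumps(manifest, indent=2))
        self.hot.clear()
        return name

    def get(self, key: int) -> Optional[dict]:
        """Point lookup: hot tier first, then cold segments from newest to oldest."""
        if key in self.hot:
            return self.hot[key]
        for segment in reversed(self._read_manifest()["segments"]):
            rows = json.loads((self.table_dir / segment["file"]).read_text())
            for entry in rows:
                if entry["key"] == key:
                    return entry["row"]
        return None
```

For example, `TwoTierTable(Path(tempfile.mkdtemp()) / "messages")` followed by `write(1, {"content": "hi"})` and `flush()` leaves an empty hot buffer, one segment file, and a manifest entry pointing at it.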

Flush Policy

Each table is configured with a flush policy that determines when its data moves from the hot tier to the cold tier:

CREATE TABLE app.messages (
  id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
  content TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
) WITH (
  TYPE = 'USER',
  FLUSH_POLICY = 'rows:1000,interval:60'
);

Policy               Description
rows:N               Flush after N rows have accumulated
interval:N           Flush every N seconds
rows:N,interval:M    Flush when either threshold is reached, whichever comes first
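
The "whichever comes first" rule can be sketched in a few lines of Python. This is an illustration of the policy semantics described above, not KalamDB's parser: `FlushPolicy`, `parse_flush_policy`, and `should_flush` are hypothetical names, and the choice to skip flushing an empty buffer is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlushPolicy:
    max_rows: Optional[int] = None       # from "rows:N"
    interval_secs: Optional[int] = None  # from "interval:N"


def parse_flush_policy(spec: str) -> FlushPolicy:
    """Parse a string such as 'rows:1000,interval:60' into a FlushPolicy."""
    fields = {}
    for part in spec.split(","):
        key, sep, value = part.strip().partition(":")
        if not sep or key not in ("rows", "interval"):
            raise ValueError(f"unrecognized flush policy component: {part!r}")
        number = int(value)  # raises ValueError for non-numeric input
        if number <= 0:
            raise ValueError(f"threshold must be positive: {part!r}")
        fields[key] = number
    return FlushPolicy(max_rows=fields.get("rows"),
                       interval_secs=fields.get("interval"))


def should_flush(policy: FlushPolicy, buffered_rows: int,
                 secs_since_last_flush: float) -> bool:
    """Return True as soon as either configured threshold is reached."""
    if buffered_rows == 0:
        return False  # assumption: an empty buffer is never flushed
    by_rows = policy.max_rows is not None and buffered_rows >= policy.max_rows
    by_time = (policy.interval_secs is not None
               and secs_since_last_flush >= policy.interval_secs)
    return by_rows or by_time
```

With the policy from the `CREATE TABLE` statement above, `should_flush(parse_flush_policy("rows:1000,interval:60"), 1000, 5)` is true because the row threshold is met, and `should_flush(..., 10, 60)` is true because the interval has elapsed.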

Manual Flush & Compaction

-- Flush a specific table
STORAGE FLUSH TABLE myapp.messages;

-- Flush all tables in a namespace
STORAGE FLUSH ALL IN myapp;

-- Compact cold storage
STORAGE COMPACT TABLE myapp.messages;

-- Check storage health
STORAGE CHECK local EXTENDED;

Per-User Storage Isolation

data/storage/
├── user/
│   ├── alice/
│   │   └── messages/
│   │       ├── manifest.json
│   │       └── batch-0.parquet
│   └── bob/
│       └── messages/
│           ├── manifest.json
│           └── batch-0.parquet
└── shared/
    └── config/
        ├── manifest.json
        └── batch-0.parquet

Each user's data lives in a completely separate directory. This enables:

  • Trivial data export: just copy the user's directory
  • Instant deletion: remove the directory for GDPR compliance
  • Independent scaling: no cross-user interference
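
Because a user's cold-tier data is just a directory, export and deletion reduce to ordinary filesystem operations. The sketch below assumes the `data/storage/user/<user_id>/` layout shown in the tree above; the helper names (`user_dir`, `export_user`, `delete_user`) are hypothetical and not part of KalamDB. It touches only the files on disk, so rows still buffered in the hot tier are outside its scope.

```python
import shutil
import tempfile
from pathlib import Path


def user_dir(storage_root: Path, user_id: str) -> Path:
    """Resolve a user's directory, rejecting ids that could escape user/."""
    if not user_id or user_id in (".", "..") or "/" in user_id or "\\" in user_id:
        raise ValueError(f"invalid user id: {user_id!r}")
    return storage_root / "user" / user_id


def export_user(storage_root: Path, user_id: str, dest: Path) -> Path:
    """Copy everything stored for one user into dest/<user_id>."""
    target = dest / user_id
    shutil.copytree(user_dir(storage_root, user_id), target)
    return target


def delete_user(storage_root: Path, user_id: str) -> bool:
    """Remove a user's directory; return False if there was nothing to delete."""
    path = user_dir(storage_root, user_id)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

Calling `export_user(Path("data/storage"), "alice", Path("exports"))` would copy `alice/messages/` with its manifest and Parquet files, and `delete_user(Path("data/storage"), "alice")` would leave `bob/` and `shared/` untouched.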