Storage Tiers
KalamDB uses a dual-tier storage architecture that balances write speed with query efficiency and long-term retention.
Hot Tier (RocksDB)
The hot tier handles all incoming writes with sub-millisecond latency using RocksDB column families.
Characteristics:
- Sub-millisecond write latency
- Organized as column families per table
- Optimized for point lookups and recent data
- Data is buffered here before being flushed to the cold tier
Cold Tier (Parquet)
Flushed data is written to Apache Parquet files for efficient analytical queries and long-term storage.
Characteristics:
- Columnar format for efficient analytics
- High compression ratios
- Each segment tracked in manifest.json
- Supports multiple storage backends (local, S3, Azure, GCS)
Flush Policy
Each table is configured with a flush policy that determines when its data moves from the hot tier to the cold tier:
CREATE TABLE app.messages (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) WITH (
TYPE = 'USER',
FLUSH_POLICY = 'rows:1000,interval:60'
);

| Policy | Description |
|---|---|
| rows:N | Flush after N rows have accumulated |
| interval:N | Flush every N seconds |
| rows:N,interval:N | Flush on whichever threshold is hit first |
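
The "whichever threshold is hit first" rule can be sketched in a few lines of Python. This is an illustration of the semantics in the table above, not KalamDB's implementation: the `FlushPolicy` class and its `parse` and `should_flush` methods are hypothetical names introduced here, and the sketch assumes the policy string is a comma-separated list of `key:value` pairs, as in the `CREATE TABLE` example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlushPolicy:
    """Hypothetical model of a flush policy string (not a KalamDB API)."""
    rows: Optional[int] = None        # flush once this many rows are buffered
    interval: Optional[float] = None  # flush once this many seconds have passed

    @classmethod
    def parse(cls, spec: str) -> "FlushPolicy":
        """Parse a spec such as 'rows:1000,interval:60'."""
        policy = cls()
        for part in spec.split(","):
            key, _, value = part.strip().partition(":")
            if key == "rows":
                policy.rows = int(value)
            elif key == "interval":
                policy.interval = float(value)
            else:
                raise ValueError(f"unknown flush policy key: {key!r}")
        return policy

    def should_flush(self, buffered_rows: int, seconds_since_flush: float) -> bool:
        """Return True as soon as either configured threshold is reached."""
        if self.rows is not None and buffered_rows >= self.rows:
            return True
        if self.interval is not None and seconds_since_flush >= self.interval:
            return True
        return False


policy = FlushPolicy.parse("rows:1000,interval:60")
print(policy.should_flush(buffered_rows=1000, seconds_since_flush=5.0))  # row threshold reached
print(policy.should_flush(buffered_rows=10, seconds_since_flush=60.0))   # interval reached
print(policy.should_flush(buffered_rows=999, seconds_since_flush=59.9))  # neither reached
```

With a combined policy, a busy table flushes on the row count and a quiet table still flushes on the timer, so data does not sit in the hot tier indefinitely.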
Manual Flush & Compaction
-- Flush a specific table
STORAGE FLUSH TABLE myapp.messages;
-- Flush all tables in a namespace
STORAGE FLUSH ALL IN myapp;
-- Compact cold storage
STORAGE COMPACT TABLE myapp.messages;
-- Check storage health
STORAGE CHECK local EXTENDED;

Per-User Storage Isolation
data/storage/
├── user/
│   ├── alice/
│   │   └── messages/
│   │       ├── manifest.json
│   │       └── batch-0.parquet
│   └── bob/
│       └── messages/
│           ├── manifest.json
│           └── batch-0.parquet
└── shared/
    └── config/
        ├── manifest.json
        └── batch-0.parquet

Each user's data lives in a completely separate directory. This enables:
- Trivial data export: just copy the user's directory
- Instant deletion: remove the directory for GDPR compliance
- Independent scaling: no cross-user interference
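
Because each user's cold-tier files sit under their own directory, export and deletion reduce to ordinary filesystem operations. The Python sketch below illustrates this on a throwaway copy of the layout shown above. The helper names (`user_table_dir`, `export_user`, `delete_user`, `demo`) are hypothetical, the Parquet file is an empty placeholder, and the sketch touches only files on a local backend. It makes no claim about how KalamDB itself performs these operations or about data still buffered in the hot tier.

```python
import shutil
import tempfile
from pathlib import Path


def user_table_dir(root: Path, user_id: str, table: str) -> Path:
    """Directory holding one user's cold-tier files for one table."""
    return root / "user" / user_id / table


def export_user(root: Path, user_id: str, dest: Path) -> Path:
    """Export a user's data by copying their whole directory."""
    return Path(shutil.copytree(root / "user" / user_id, dest / user_id))


def delete_user(root: Path, user_id: str) -> None:
    """Delete a user's cold-tier data by removing their directory."""
    shutil.rmtree(root / "user" / user_id)


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data" / "storage"
        # Recreate the layout from the tree above for two users.
        for user in ("alice", "bob"):
            table_dir = user_table_dir(root, user, "messages")
            table_dir.mkdir(parents=True)
            (table_dir / "manifest.json").write_text("{}")
            (table_dir / "batch-0.parquet").write_bytes(b"")  # placeholder, not real Parquet

        exported = export_user(root, "alice", Path(tmp) / "export")
        exported_files = sorted(p.name for p in exported.rglob("*") if p.is_file())

        delete_user(root, "alice")
        return {
            "exported_files": exported_files,
            "alice_exists": (root / "user" / "alice").exists(),
            "bob_exists": user_table_dir(root, "bob", "messages").exists(),
        }


print(demo())
```

Removing alice's directory leaves bob's files untouched, which is the isolation property the layout is designed to provide.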