Technical Case Study: Building Horizon - A High-Performance Logging & Analytics Engine

A detailed breakdown of the architecture, technical decisions, and engineering challenges faced during development.

Introduction

As the founder and lead architect of Horizon, I set out to build a self-hostable, high-performance logging and analytics engine that could rival commercial solutions without their prohibitive cost and complexity. Getting there meant overcoming significant infrastructure challenges, selecting the right technologies, and architecting for scale from day one.


The Problem: The Cost & Complexity of Traditional Logging Stacks

Modern applications generate enormous and ever-growing volumes of logs. Traditional solutions like Elasticsearch (the core of the ELK stack) are powerful but come with steep operational overhead:

  1. Resource Intensive: Elasticsearch clusters are notoriously RAM-hungry and complex to manage, especially at scale.
  2. Schema Overhead: Mappings are flexible, but managing them for diverse, evolving log data quickly becomes a burden.
  3. High Latency for Aggregations: For real-time analytics on high-cardinality data, Elasticsearch can struggle with query performance.

My objective was to create a solution that offered sub-second query performance on millions of log entries, simplified deployment, and provided cost-effective storage for vast amounts of data.


Solution Architecture: ClickHouse at the Core

After evaluating several OLAP (Online Analytical Processing) databases, ClickHouse emerged as the optimal choice for Horizon due to its:

  • Columnar Storage: Ingests and queries large datasets at very high speed, reading only the columns a query touches. Ideal for logging, where you typically query a few fields (e.g., level, timestamp, message) across vast numbers of rows.
  • Vectorized Query Execution: Processes data in batches, leveraging modern CPU architectures for extremely fast analytical queries.
  • Append-Only Nature: Perfect for log data, which is written once and rarely updated.
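
To make those properties concrete, the query below is the kind of aggregation a log dashboard runs constantly. It is a sketch rather than Horizon's production code: the logs table and its level and timestamp columns follow the schema described later, and the setup assumes the official @clickhouse/client package.

```typescript
import { createClient } from '@clickhouse/client';

const clickhouse = createClient({ url: process.env.CLICKHOUSE_URL });

// Error counts per minute over the last hour. Because storage is columnar,
// ClickHouse only reads the timestamp and level columns, so the scan stays
// fast even when the logs table holds millions of rows.
async function errorsPerMinute() {
  const result = await clickhouse.query({
    query: `
      SELECT toStartOfMinute(timestamp) AS minute, count() AS errors
      FROM logs
      WHERE level = 'error' AND timestamp >= now() - INTERVAL 1 HOUR
      GROUP BY minute
      ORDER BY minute
    `,
    format: 'JSONEachRow',
  });
  return result.json();
}
```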

The Stack:

  • Ingestion API: Fastify (Node.js/TypeScript) for high-throughput, non-blocking ingestion.
  • Primary Database: PostgreSQL for managing user accounts, organizations, projects, and metadata.
  • Analytics Engine: ClickHouse for storing and querying raw log data.
  • Deployment: Docker containers orchestrated on Coolify.
  • Frontend: Next.js (App Router) for a real-time, interactive dashboard.
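
The sketch below shows one way these pieces can be wired together at boot. The module paths and the decorator name are illustrative rather than Horizon's actual source layout; initClickHouse() and the ingest route are covered in the challenges that follow.

```typescript
import Fastify from 'fastify';
import { initClickHouse } from './clickhouse';   // startup hook, sketched under Challenge 2
import { ingestRoutes } from './routes/ingest';  // batched ingest endpoint, sketched under Challenge 3

async function main() {
  const app = Fastify({ logger: true });

  // ClickHouse stores the raw log events; PostgreSQL (accounts, projects,
  // quotas) is accessed through its own pool inside the route handlers.
  const clickhouse = await initClickHouse();
  app.decorate('clickhouse', clickhouse);

  await app.register(ingestRoutes);
  await app.listen({ port: Number(process.env.PORT ?? 3000), host: '0.0.0.0' });
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```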

Overcoming Key Challenges

Challenge 1: The Elusive "EADDRINUSE" - Container Port Conflicts

During initial deployment on Coolify, I frequently encountered EADDRINUSE errors, preventing the Fastify backend from binding to its designated port.

  • Problem: Docker's aggressive port mapping and lingering processes on the host could lead to conflicts, even after container restarts.
  • Solution: Implemented explicit host-to-container port mapping (8123:8123) in Coolify so ClickHouse's HTTP port was consistently available, and used careful netstat analysis on the host to identify and kill ghost processes still holding ports after restarts. This produced a robust initial setup.

Challenge 2: ClickHouse Connectivity & Schema Management

Connecting a Node.js Fastify backend to ClickHouse required a reliable client and careful schema definition.

  • Problem: Ensuring the ClickHouse client (clickhouse-js) could establish and maintain connections, and that the logs table was correctly defined for various data types (strings, JSON, timestamps).
  • Solution: Developed a dedicated initClickHouse() function in the Fastify backend. This function programmatically connected to ClickHouse, executed DDL (Data Definition Language) to create the logs table with a suitable schema (e.g., timestamp DateTime64(3), metadata JSON), and verified its existence. This automated schema management streamlined deployments.
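
A minimal sketch of that startup hook, assuming the official @clickhouse/client package. Column names beyond timestamp and metadata, the MergeTree engine, and the sort key are illustrative choices rather than Horizon's exact DDL:

```typescript
import { createClient, type ClickHouseClient } from '@clickhouse/client';

export async function initClickHouse(): Promise<ClickHouseClient> {
  const client = createClient({
    url: process.env.CLICKHOUSE_URL,          // e.g. http://clickhouse:8123
    username: process.env.CLICKHOUSE_USER,
    password: process.env.CLICKHOUSE_PASSWORD,
  });

  // Idempotent DDL: safe to run on every deploy. The JSON column type
  // requires a reasonably recent ClickHouse release.
  await client.command({
    query: `
      CREATE TABLE IF NOT EXISTS logs (
        timestamp DateTime64(3),
        level     LowCardinality(String),
        message   String,
        metadata  JSON
      )
      ENGINE = MergeTree
      ORDER BY timestamp
    `,
  });

  // Verify the table is actually reachable before the API starts serving.
  await client.query({ query: 'SELECT 1 FROM logs LIMIT 1' });
  return client;
}
```

Running CREATE TABLE IF NOT EXISTS on every boot is what makes the schema management self-healing across redeployments.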

Challenge 3: High-Volume Log Ingestion & Quota Enforcement

A logging engine must handle bursts of incoming data without dropping events or exceeding resource limits.

  • Problem: Directly writing each log entry to ClickHouse would create excessive overhead. Additionally, a SaaS model requires enforcing usage quotas.
  • Solution (a condensed sketch of the ingest path follows this list):
  1. Batching: Designed the @horizon/node SDK to buffer logs client-side and send them to the POST /v1/ingest endpoint in batches (e.g., 50 logs or every 2 seconds).
  2. Asynchronous Processing: The Fastify API immediately accepts the batched logs and processes them asynchronously, preventing upstream service blocking.
  3. Atomic Quota Guard: Before inserting into ClickHouse, the ingestion endpoint atomically checks and increments currentUsage in PostgreSQL. If the organization exceeds its usageLimit, the request is rejected with a 429 Too Many Requests status, protecting the infrastructure.
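
Pulling those three points together, the following is a condensed sketch of the ingest path rather than Horizon's actual handler. The pg pool, the organizations table, the quoted column names, and the 202 response are assumptions layered on top of what the write-up specifies (a batched POST /v1/ingest, an atomic quota check in PostgreSQL, a 429 on overflow, and an asynchronous ClickHouse insert):

```typescript
import { FastifyInstance } from 'fastify';
import { Pool } from 'pg';                        // assumed Postgres client
import { createClient } from '@clickhouse/client';

const pg = new Pool({ connectionString: process.env.DATABASE_URL });
const clickhouse = createClient({ url: process.env.CLICKHOUSE_URL });

export async function ingestRoutes(app: FastifyInstance) {
  app.post('/v1/ingest', async (req, reply) => {
    const { orgId, logs } = req.body as { orgId: string; logs: Record<string, unknown>[] };

    // Atomic quota guard: increment usage only if the batch still fits under
    // the organization's limit. A single UPDATE avoids a check-then-write
    // race between concurrent requests.
    const quota = await pg.query(
      `UPDATE organizations
         SET "currentUsage" = "currentUsage" + $2
       WHERE id = $1 AND "currentUsage" + $2 <= "usageLimit"
       RETURNING "currentUsage"`,
      [orgId, logs.length],
    );
    if (quota.rowCount === 0) {
      return reply.code(429).send({ error: 'Usage limit exceeded' });
    }

    // Acknowledge immediately; the ClickHouse insert runs asynchronously so
    // upstream services are never blocked on the analytical store.
    reply.code(202).send({ accepted: logs.length });
    clickhouse
      .insert({ table: 'logs', values: logs, format: 'JSONEachRow' })
      .catch((err) => app.log.error(err, 'failed to flush batch to ClickHouse'));
  });
}
```

Doing the check and the increment in one UPDATE keeps the guard atomic without an explicit transaction, and answering before the ClickHouse write keeps ingest latency independent of the analytical store.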

Results & Impact

Horizon successfully demonstrated:

  • Scalable Ingestion: The Fastify backend, coupled with client-side batching, effectively manages high volumes of incoming log data.
  • Real-time Analytics: ClickHouse provides near-instantaneous query results for live-tailing and aggregate analytics, even across millions of records.
  • Operational Efficiency: The self-hostable design (via Docker/Coolify) and automated schema management drastically reduce operational complexity and cost compared to traditional alternatives.
  • Developer Experience: The robust SDK, transports (Pino, Winston), and auto-error reporting significantly ease integration for developers (see the sketch below).
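
For a sense of what that integration looks like, here is a hypothetical Pino setup; the transport module path and option names are assumptions, not the published @horizon/node API:

```typescript
import pino from 'pino';

// Hypothetical wiring: existing logger.info()/logger.error() calls are
// buffered by the Horizon transport and shipped to POST /v1/ingest in batches.
const logger = pino({
  transport: {
    target: '@horizon/node/pino',            // assumed transport entry point
    options: {
      apiKey: process.env.HORIZON_API_KEY,   // project API key
      endpoint: 'https://your-horizon-host/v1/ingest',
    },
  },
});

logger.info({ userId: 42 }, 'checkout completed');
```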

Future Enhancements

  • Implementing dynamic log level switching for enhanced debugging in production.
  • Adding alerting capabilities based on error thresholds and log patterns.
  • Expanding the SDK to support more languages and frameworks.

Conclusion

Building Horizon has been a testament to careful architectural planning, strategic technology choices, and hands-on problem-solving. It stands as a powerful, efficient, and cost-effective logging solution, ready to empower developers with deep insights into their applications.


View Horizon Live | View Dashboard | Explore SDK on NPM