GoServe Architecture
This document describes the system architecture, data flow, and component interactions of GoServe.
Table of Contents
- System Overview
- Component Architecture
- Request Flow
- Data Flow Diagrams
- Threading Model
- Memory Management
System Overview
┌───────────────────────────────────────────────────────────────────┐
│                          GoServe System                           │
│                                                                   │
│  ┌─────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Client    │───▶│ HTTP Server  │───▶│   Handlers   │          │
│  │ (curl/app)  │    │ (Port 8080)  │    │  (Routing)   │          │
│  └─────────────┘    └──────────────┘    └──────┬───────┘          │
│                                                │                  │
│                                       ┌────────▼────────┐         │
│                                       │ Model Registry  │         │
│                                       │  (Thread-Safe)  │         │
│                                       └────────┬────────┘         │
│                                                │                  │
│                                       ┌────────▼────────┐         │
│                                       │  ONNX Session   │         │
│                                       │  (Go Wrapper)   │         │
│                                       └────────┬────────┘         │
│                                                │                  │
│                                       ┌────────▼────────┐         │
│                                       │  ONNX Runtime   │         │
│                                       │   (C Library)   │         │
│                                       └─────────────────┘         │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘
Component Architecture
1. HTTP Server Layer
┌─────────────────────────────────────────────────────────┐
│                       HTTP Server                       │
│                   (internal/server/)                    │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌──────────────┐        ┌──────────────┐              │
│   │  Middleware  │        │    Router    │              │
│   │  - Logging   │───────▶│  (Go 1.22+)  │              │
│   │  - Request ID│        │              │              │
│   │  - Recovery  │        └──────┬───────┘              │
│   └──────────────┘               │                      │
│                                  │                      │
│         ┌────────────────────────┼──────────────┐       │
│         │                        │              │       │
│    ┌────▼─────┐          ┌───────▼──────┐  ┌────▼───┐   │
│    │  Health  │          │    Model     │  │ Infer  │   │
│    │ Handlers │          │   Handlers   │  │Handler │   │
│    └──────────┘          └──────────────┘  └────────┘   │
│                                                         │
└─────────────────────────────────────────────────────────┘
Components:
- Router: Uses the standard library net/http ServeMux, which supports HTTP method matching in route patterns as of Go 1.22.
- Middleware: Structured logging, request tracing, and panic recovery.
- Handlers: Business logic for health checks, model management, and inference.
2. Model Registry
The Model Registry manages the lifecycle of all models loaded in memory.
Thread Safety:
- Uses sync.RWMutex to allow concurrent inference while preventing conflicts during model loading/unloading.
- Optimized for read-heavy workloads (inference).
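The locking pattern described above can be sketched as follows. The `entry` type, the method names, and the error values are placeholders chosen for this example; the actual registry API may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// entry is a hypothetical stand-in for a loaded model.
type entry struct {
	Name string
	Path string
}

// Registry guards a map of loaded models with a read/write mutex.
type Registry struct {
	mu     sync.RWMutex
	models map[string]*entry
}

var (
	ErrNotFound = errors.New("model not found")
	ErrExists   = errors.New("model already loaded")
)

func NewRegistry() *Registry {
	return &Registry{models: make(map[string]*entry)}
}

// Get takes the shared read lock, so concurrent lookups do not block each other.
func (r *Registry) Get(name string) (*entry, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	m, ok := r.models[name]
	if !ok {
		return nil, ErrNotFound
	}
	return m, nil
}

// Load takes the exclusive write lock while the map is modified.
func (r *Registry) Load(name, path string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.models[name]; ok {
		return ErrExists
	}
	r.models[name] = &entry{Name: name, Path: path}
	return nil
}

// Unload removes a model under the write lock.
func (r *Registry) Unload(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.models[name]; !ok {
		return ErrNotFound
	}
	delete(r.models, name)
	return nil
}

func main() {
	reg := NewRegistry()
	_ = reg.Load("iris", "models/iris.onnx")

	// Many readers may hold the read lock at the same time.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if _, err := reg.Get("iris"); err != nil {
				fmt.Println("lookup failed:", err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("lookups done")
}
```

Note that in this sketch the pointer returned by Get outlives the read lock. A real implementation that frees native resources on unload would likely need reference counting or a similar mechanism so an in-flight inference cannot race with Unload.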
Model Struct:
type Model struct {
    Name       string            // Model identifier
    Path       string            // File path to .onnx file
    Format     string            // "onnx"
    Session    *onnx.Session     // ONNX Runtime session
    InputInfo  []onnx.TensorInfo // Input metadata
    OutputInfo []onnx.TensorInfo // Output metadata
    LoadedAt   time.Time         // Load timestamp
}
3. ONNX Session Wrapper
GoServe interacts with the ONNX Runtime C library via CGO bindings.
Generic Inference Pipeline:
1. Introspection: On model load, GoServe queries the ONNX model for input/output names, shapes, and data types.
2. Dynamic Tensor Mapping: Supports multiple named input and output tensors.
3. Type Handling: Supports FLOAT32 and INT64 (common for classification and embeddings).
Tensor Flow:
Go Map[string]any ──▶ Reflect & Flatten ──▶ C Tensors
│
Pass to ONNX Runtime (CGO)
│
ONNX Runtime C API (Inference Execution)
│
Extract Output Tensors ──▶ Dynamic Reshape ──▶ Go Map[string]any
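The "Reflect & Flatten" and "Dynamic Reshape" stages can be illustrated with a pure-Go sketch. It covers only float inputs, omits the CGO call entirely, and uses function names (`flattenFloat32`, `reshape`) invented for this example; an INT64 path would need a parallel implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// flattener accumulates row-major data and the shape inferred while walking.
type flattener struct {
	data  []float32
	shape []int64
	leaf  int // depth at which scalars live; -1 until the first scalar is seen
}

// flattenFloat32 converts a nested slice (e.g. [][]float32, or JSON-decoded
// []any holding float64) into contiguous data plus its shape.
func flattenFloat32(input any) ([]float32, []int64, error) {
	f := &flattener{leaf: -1}
	if err := f.walk(reflect.ValueOf(input), 0); err != nil {
		return nil, nil, err
	}
	return f.data, f.shape, nil
}

func (f *flattener) walk(v reflect.Value, depth int) error {
	for v.Kind() == reflect.Interface { // unwrap []any elements
		v = v.Elem()
	}
	switch v.Kind() {
	case reflect.Slice, reflect.Array:
		if f.leaf >= 0 && depth >= f.leaf {
			return fmt.Errorf("slice found at scalar depth %d", depth)
		}
		n := int64(v.Len())
		if depth == len(f.shape) {
			f.shape = append(f.shape, n) // first visit fixes this dimension
		} else if f.shape[depth] != n {
			return fmt.Errorf("ragged input at depth %d: got %d, want %d",
				depth, n, f.shape[depth])
		}
		for i := 0; i < v.Len(); i++ {
			if err := f.walk(v.Index(i), depth+1); err != nil {
				return err
			}
		}
		return nil
	case reflect.Float32, reflect.Float64:
		if depth != len(f.shape) {
			return fmt.Errorf("scalar found at unexpected depth %d", depth)
		}
		f.leaf = depth
		f.data = append(f.data, float32(v.Float()))
		return nil
	default:
		return fmt.Errorf("unsupported element kind %s", v.Kind())
	}
}

// reshape rebuilds nested slices from flat data so they can be JSON-encoded.
func reshape(data []float32, shape []int64) any {
	switch {
	case len(shape) == 0:
		if len(data) == 0 {
			return nil
		}
		return data[0]
	case len(shape) == 1:
		return data
	case shape[0] == 0:
		return []any{}
	}
	stride := len(data) / int(shape[0])
	out := make([]any, shape[0])
	for i := range out {
		out[i] = reshape(data[i*stride:(i+1)*stride], shape[1:])
	}
	return out
}

func main() {
	data, shape, err := flattenFloat32([][]float32{{1, 2, 3}, {4, 5, 6}})
	fmt.Println(data, shape, err)
	fmt.Println(reshape(data, shape))
}
```

Rejecting ragged input during the walk means the contiguous buffer always matches the inferred shape before anything crosses into C.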
Performance Characteristics
Latency Overhead
While GoServe is designed for high performance, the generic engine introduces minor overheads to support flexibility:
- Reflection (~5-10µs): Used to handle generic any inputs.
- Flattening/Copying: Multi-dimensional Go slices are copied into contiguous C-friendly memory.
- Allocation: Output tensors are reshaped into JSON-friendly Go slices for every request.
Optimization Roadmap
To further reduce latency, the following improvements are planned:
1. Buffer Pooling (sync.Pool): Reuse flattened memory buffers to reduce GC pressure.
2. Zero-Copy Binary API: Support for Protobuf/FlatBuffers to pass memory pointers directly to C.
3. Session Reuse: Internal optimizations for ONNX session lifecycle management.
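Since buffer pooling is listed as planned rather than implemented, the following is only a sketch of what the first roadmap item could look like with `sync.Pool`. The helper names and the initial capacity of 1024 are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool holds reusable []float32 scratch buffers. Storing a pointer to the
// slice avoids allocating a new slice header each time a buffer is returned.
var bufPool = sync.Pool{
	New: func() any {
		buf := make([]float32, 0, 1024) // initial capacity is an arbitrary choice
		return &buf
	},
}

// getBuffer returns an empty buffer with capacity for at least n elements.
func getBuffer(n int) *[]float32 {
	bp := bufPool.Get().(*[]float32)
	if cap(*bp) < n {
		*bp = make([]float32, 0, n)
	}
	*bp = (*bp)[:0]
	return bp
}

// putBuffer hands a buffer back to the pool; the caller must not use it afterwards.
func putBuffer(bp *[]float32) {
	bufPool.Put(bp)
}

// flattenRows copies rows into a pooled contiguous buffer.
func flattenRows(rows [][]float32) *[]float32 {
	n := 0
	for _, r := range rows {
		n += len(r)
	}
	bp := getBuffer(n)
	for _, r := range rows {
		*bp = append(*bp, r...)
	}
	return bp
}

func main() {
	bp := flattenRows([][]float32{{1, 2}, {3, 4}})
	fmt.Println(*bp)
	putBuffer(bp) // only after inference has finished reading the buffer
}
```

A pooled buffer should be returned only once the C runtime no longer reads from it; returning it earlier would let another request overwrite memory that is still in use.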
Security Considerations
- Input Validation: Feature counts and data types are strictly checked before data is passed to C code.
- Resource Limits: Batch size limits and memory management to prevent denial-of-service.
- Path Traversal Protection: Validates model paths to ensure only authorized files are loaded.
- CGO Safety: Carefully managed boundaries between Go and C to prevent memory corruption.
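Two of these checks, path validation and batch limits, can be sketched in a few lines. The function names, the requirement that paths be relative to a single model directory, and the `.onnx` extension check are assumptions for this example rather than a description of GoServe's actual rules.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// resolveModelPath joins a requested path onto modelDir and rejects anything
// that escapes that directory or is not an .onnx file.
func resolveModelPath(modelDir, requested string) (string, error) {
	if filepath.IsAbs(requested) {
		return "", errors.New("absolute paths are not allowed")
	}
	root := filepath.Clean(modelDir)
	full := filepath.Join(root, requested) // Join also resolves ".." segments
	rel, err := filepath.Rel(root, full)
	if err != nil || rel == ".." ||
		strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("path escapes the model directory")
	}
	if filepath.Ext(full) != ".onnx" {
		return "", errors.New("only .onnx files may be loaded")
	}
	return full, nil
}

// validateBatch checks batch size and per-row feature count before any data
// is handed to native code.
func validateBatch(batch [][]float32, maxBatch, features int) error {
	if len(batch) == 0 {
		return errors.New("empty batch")
	}
	if len(batch) > maxBatch {
		return fmt.Errorf("batch size %d exceeds limit %d", len(batch), maxBatch)
	}
	for i, row := range batch {
		if len(row) != features {
			return fmt.Errorf("row %d has %d features, want %d", i, len(row), features)
		}
	}
	return nil
}

func main() {
	fmt.Println(resolveModelPath("models", "iris.onnx"))
	fmt.Println(resolveModelPath("models", "../secrets/key.onnx"))
	fmt.Println(validateBatch([][]float32{{1, 2, 3, 4}}, 32, 4))
}
```

This lexical check does not follow symbolic links; a stricter variant could additionally resolve the path with `filepath.EvalSymlinks` and re-check it against the model directory.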
For more details, see:
- Full Technical Guide
- Quick Start README
- API Reference