GoServe Architecture

This document describes the system architecture, data flow, and component interactions of GoServe.

System Overview

┌───────────────────────────────────────────────────────────────────┐
│                          GoServe System                            │
│                                                                   │
│  ┌─────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Client    │───▶│  HTTP Server │───▶│  Handlers    │          │
│  │  (curl/app) │    │  (Port 8080) │    │  (Routing)   │          │
│  └─────────────┘    └──────────────┘    └──────┬───────┘          │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │  Model Registry │       │
│                                         │  (Thread-Safe)  │       │
│                                         └────────┬────────┘       │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │  ONNX Session   │       │
│                                         │  (Go Wrapper)   │       │
│                                         └────────┬────────┘       │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │ ONNX Runtime    │       │
│                                         │  (C Library)    │       │
│                                         │   1.23.2        │       │
│                                         └─────────────────┘       │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

Component Architecture

1. HTTP Server Layer

┌─────────────────────────────────────────────────────────┐
│                    HTTP Server                           │
│                   (internal/server/)                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐        ┌──────────────┐              │
│  │  Middleware  │        │   Router     │              │
│  │  - Logging   │───────▶│  (Go 1.22+)  │              │
│  │  - Request ID│        │              │              │
│  │  - Recovery  │        └──────┬───────┘              │
│  └──────────────┘               │                       │
│                                  │                       │
│         ┌────────────────────────┴──────────────┐       │
│         │                        │              │       │
│    ┌────▼─────┐          ┌───────▼──────┐  ┌────▼───┐  │
│    │  Health  │          │   Model      │  │ Infer  │  │
│    │ Handlers │          │  Handlers    │  │Handler │  │
│    └──────────┘          └──────────────┘  └────────┘  │
│                                                         │
└─────────────────────────────────────────────────────────┘

Components:
- Router: Uses Go 1.22+ standard library ServeMux with method matching.
- Middleware: Structured logging, request tracing, and panic recovery.
- Handlers: Business logic for health checks, model management, and inference.

2. Model Registry

The Model Registry manages the lifecycle of all models loaded in memory.

Thread Safety:
- Uses sync.RWMutex to allow concurrent inference while preventing conflicts during model loading/unloading.
- Optimized for read-heavy workloads (inference).

Model Struct:

type Model struct {
    Name       string              // Model identifier
    Path       string              // File path to .onnx file
    Format     string              // "onnx"
    Session    *onnx.Session       // ONNX Runtime session
    InputInfo  []onnx.TensorInfo   // Input metadata
    OutputInfo []onnx.TensorInfo   // Output metadata
    LoadedAt   time.Time           // Load timestamp
}
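The read-heavy locking pattern can be sketched as follows. This is a simplified illustration, not the actual registry: the `model` and `registry` types and their method names are stand-ins invented for the example, and the ONNX session field is omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// model is a trimmed, hypothetical stand-in for the Model struct (no ONNX session).
type model struct {
	Name string
	Path string
}

// registry guards a name-to-model map with a read/write lock.
type registry struct {
	mu     sync.RWMutex
	models map[string]*model
}

func newRegistry() *registry {
	return &registry{models: make(map[string]*model)}
}

// Load takes the write lock, so it excludes readers and other writers.
func (r *registry) Load(m *model) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.models[m.Name]; exists {
		return fmt.Errorf("model %q already loaded", m.Name)
	}
	r.models[m.Name] = m
	return nil
}

// Get takes the read lock, so many inference requests can look up models at once.
func (r *registry) Get(name string) (*model, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	m, ok := r.models[name]
	return m, ok
}

// Unload removes a model under the write lock and reports whether it existed.
func (r *registry) Unload(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	_, ok := r.models[name]
	delete(r.models, name)
	return ok
}

func main() {
	reg := newRegistry()
	_ = reg.Load(&model{Name: "iris", Path: "models/iris.onnx"})

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			reg.Get("iris") // concurrent readers do not block each other
		}()
	}
	wg.Wait()
	fmt.Println(reg.Unload("iris"))
}
```

Note that this sketch only protects the map. A real registry would also need to keep a session alive while an inference that already looked it up is still running, for example by holding the read lock for the duration of the call or by reference counting.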

3. ONNX Session Wrapper

GoServe interacts with the ONNX Runtime C library via CGO bindings.

Generic Inference Pipeline:
1. Introspection: On model load, GoServe queries the ONNX model for input/output names, shapes, and data types.
2. Dynamic Tensor Mapping: Supports multiple named input and output tensors.
3. Type Handling: Supports FLOAT32 and INT64 (common for classification and embeddings).

Tensor Flow:

Go Map[string]any ──▶ Reflect & Flatten ──▶ C Tensors
     │
Pass to ONNX Runtime (CGO)
     │
ONNX Runtime C API (Inference Execution)
     │
Extract Output Tensors ──▶ Dynamic Reshape ──▶ Go Map[string]any
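The "Reflect & Flatten" and "Dynamic Reshape" steps can be approximated in pure Go as below. This is a minimal sketch that handles float inputs only and leaves out the CGO call entirely; `flatten` and `reshape` are hypothetical names, and the real engine may do this differently.

```go
package main

import (
	"fmt"
	"reflect"
)

// flatten walks a nested slice with reflection and returns its elements in
// row-major order plus the inferred shape. Ragged or non-float input is rejected.
func flatten(v any) ([]float32, []int64, error) {
	var data []float32
	var shape []int64
	sawLeaf := false

	var walk func(rv reflect.Value, depth int) error
	walk = func(rv reflect.Value, depth int) error {
		// Unwrap interface values so []any elements are handled too.
		for rv.Kind() == reflect.Interface {
			rv = rv.Elem()
		}
		switch rv.Kind() {
		case reflect.Slice, reflect.Array:
			n := int64(rv.Len())
			if depth == len(shape) {
				if sawLeaf {
					return fmt.Errorf("slice found where a scalar was expected (depth %d)", depth)
				}
				shape = append(shape, n) // first visit fixes this dimension
			} else if shape[depth] != n {
				return fmt.Errorf("ragged input at depth %d: length %d, want %d", depth, n, shape[depth])
			}
			for i := 0; i < rv.Len(); i++ {
				if err := walk(rv.Index(i), depth+1); err != nil {
					return err
				}
			}
			return nil
		case reflect.Float32, reflect.Float64:
			if depth != len(shape) {
				return fmt.Errorf("scalar found where a slice was expected (depth %d)", depth)
			}
			sawLeaf = true
			data = append(data, float32(rv.Float()))
			return nil
		default:
			return fmt.Errorf("unsupported element kind %s", rv.Kind())
		}
	}
	if err := walk(reflect.ValueOf(v), 0); err != nil {
		return nil, nil, err
	}
	return data, shape, nil
}

// reshape rebuilds nested slices from flat row-major data so the result can be
// marshalled to JSON as nested arrays. The innermost dimension stays []float32.
func reshape(data []float32, shape []int64) (any, error) {
	want := 1
	for _, d := range shape {
		if d < 0 {
			return nil, fmt.Errorf("negative dimension in shape %v", shape)
		}
		want *= int(d)
	}
	if want != len(data) {
		return nil, fmt.Errorf("shape %v needs %d elements, got %d", shape, want, len(data))
	}
	if len(shape) <= 1 {
		return data, nil
	}
	out := make([]any, shape[0])
	if len(out) == 0 {
		return out, nil
	}
	stride := len(data) / len(out)
	for i := range out {
		// Sub-slices always match shape[1:], so the error can be ignored here.
		out[i], _ = reshape(data[i*stride:(i+1)*stride], shape[1:])
	}
	return out, nil
}

func main() {
	in := map[string]any{"features": [][]float32{{5.1, 3.5}, {6.2, 2.9}}}
	flat, shape, err := flatten(in["features"])
	if err != nil {
		panic(err)
	}
	out, _ := reshape(flat, shape)
	fmt.Println(flat, shape, out)
}
```

In the real pipeline the flat buffer and shape would be handed to the ONNX Runtime C API between these two steps; here the flattened input is simply reshaped back to show the round trip.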

Performance Characteristics

Latency Overhead

While GoServe is designed for high performance, the generic engine introduces minor overheads to support flexibility:
- Reflection (~5-10µs): Used to handle generic any inputs.
- Flattening/Copying: Multi-dimensional Go slices are copied into contiguous C-friendly memory.
- Allocation: Output tensors are reshaped into JSON-friendly Go slices for every request.

Optimization Roadmap

To further reduce latency, the following improvements are planned:
1. Buffer Pooling (sync.Pool): Reuse flattened memory buffers to reduce GC pressure.
2. Zero-Copy binary API: Support for Protobuf/FlatBuffers to pass memory pointers directly to C.
3. Session Reuse: Internal optimizations for ONNX session lifecycle management.
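Since buffer pooling is only a planned item, the following is one possible shape for it rather than a description of existing code. The names `bufPool` and `withFlattened` and the initial capacity of 1024 are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable []float32 scratch buffers. Storing *[]float32
// rather than []float32 avoids an extra allocation on every Put.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]float32, 0, 1024) // assumed starting capacity
		return &b
	},
}

// withFlattened borrows a buffer, copies rows into it contiguously, calls fn,
// and then returns the buffer to the pool. fn must not retain flat after it
// returns, because the same memory will be reused by a later request.
func withFlattened(rows [][]float32, fn func(flat []float32)) {
	bp := bufPool.Get().(*[]float32)
	buf := (*bp)[:0] // keep capacity, drop old contents
	for _, r := range rows {
		buf = append(buf, r...)
	}
	fn(buf)
	*bp = buf // keep any growth so the next borrower benefits
	bufPool.Put(bp)
}

func main() {
	withFlattened([][]float32{{1, 2}, {3, 4}}, func(flat []float32) {
		fmt.Println(len(flat), flat)
	})
}
```

Scoping the buffer to a callback keeps the borrow and return in one place, which makes it harder to leak a pooled buffer or use it after it has been handed back.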

Security Considerations

Input Validation: Strict checking of feature counts and data types before passing to C code.
Resource Limits: Batch size limits and memory management to prevent denial-of-service.
Path Traversal Protection: Validates model paths to ensure only authorized files are loaded.
CGO Safety: Carefully managed boundaries between Go and C to prevent memory corruption.
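Two of these checks, path validation and input validation, might look roughly like the sketch below. The function names, the `models` base directory, and the limits passed in `main` are illustrative assumptions; the lexical path check shown here does not resolve symlinks, so it is only part of a complete defence.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// resolveModelPath maps a client-supplied name onto baseDir and rejects
// anything that is absolute, escapes via "..", or is not an .onnx file.
func resolveModelPath(baseDir, requested string) (string, error) {
	// filepath.IsLocal (Go 1.20+) is a lexical check only; it does not follow symlinks.
	if !filepath.IsLocal(requested) {
		return "", fmt.Errorf("model path %q escapes the model directory", requested)
	}
	if filepath.Ext(requested) != ".onnx" {
		return "", fmt.Errorf("model path %q is not an .onnx file", requested)
	}
	return filepath.Join(baseDir, requested), nil
}

// validateBatch checks batch size and per-row feature count before any data
// would be handed to C code.
func validateBatch(batch [][]float32, wantFeatures, maxBatch int) error {
	if len(batch) == 0 {
		return errors.New("empty batch")
	}
	if len(batch) > maxBatch {
		return fmt.Errorf("batch size %d exceeds limit %d", len(batch), maxBatch)
	}
	for i, row := range batch {
		if len(row) != wantFeatures {
			return fmt.Errorf("row %d has %d features, want %d", i, len(row), wantFeatures)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"iris.onnx", "../secret.onnx", "/etc/passwd"} {
		_, err := resolveModelPath("models", p)
		fmt.Printf("%-16s rejected=%v\n", p, err != nil)
	}
	fmt.Println(validateBatch([][]float32{{1, 2, 3, 4}, {5, 6}}, 4, 32))
}
```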

For more details, see:
- Full Technical Guide
- Quick Start README
- API Reference