
GoServe Architecture

This document describes the system architecture, data flow, and component interactions of GoServe.


Table of Contents

  1. System Overview
  2. Component Architecture
  3. Request Flow
  4. Data Flow Diagrams
  5. Threading Model
  6. Memory Management

System Overview

┌───────────────────────────────────────────────────────────────────┐
│                          GoServe System                            │
│                                                                   │
│  ┌─────────────┐    ┌──────────────┐    ┌──────────────┐          │
│  │   Client    │───▶│  HTTP Server │───▶│  Handlers    │          │
│  │  (curl/app) │    │  (Port 8080) │    │  (Routing)   │          │
│  └─────────────┘    └──────────────┘    └──────┬───────┘          │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │  Model Registry │       │
│                                         │  (Thread-Safe)  │       │
│                                         └────────┬────────┘       │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │  ONNX Session   │       │
│                                         │  (Go Wrapper)   │       │
│                                         └────────┬────────┘       │
│                                                  │                │
│                                         ┌────────▼────────┐       │
│                                         │ ONNX Runtime    │       │
│                                         │  (C Library)    │       │
│                                         └─────────────────┘       │
│                                                                   │
└───────────────────────────────────────────────────────────────────┘

Component Architecture

1. HTTP Server Layer

┌─────────────────────────────────────────────────────────┐
│                    HTTP Server                           │
│                   (internal/server/)                     │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌──────────────┐        ┌──────────────┐              │
│  │  Middleware  │        │   Router     │              │
│  │  - Logging   │───────▶│  (Go 1.22+)  │              │
│  │  - Request ID│        │              │              │
│  │  - Recovery  │        └──────┬───────┘              │
│  └──────────────┘               │                       │
│                                  │                       │
│         ┌────────────────────────┴──────────────┐       │
│         │                        │              │       │
│    ┌────▼─────┐          ┌───────▼──────┐  ┌────▼───┐  │
│    │  Health  │          │   Model      │  │ Infer  │  │
│    │ Handlers │          │  Handlers    │  │Handler │  │
│    └──────────┘          └──────────────┘  └────────┘  │
│                                                         │
└─────────────────────────────────────────────────────────┘

Components:
- Router: Uses the Go 1.22+ standard library ServeMux with method matching.
- Middleware: Structured logging, request tracing, and panic recovery.
- Handlers: Business logic for health checks, model management, and inference.


2. Model Registry

The Model Registry maintains the lifecycle of all loaded models in memory.

Thread Safety:
- Uses sync.RWMutex so that concurrent inference requests can share loaded models, while loading and unloading take exclusive access and cannot conflict with in-flight requests.
- Optimized for read-heavy workloads (inference).

Model Struct:

type Model struct {
    Name       string              // Model identifier
    Path       string              // File path to .onnx file
    Format     string              // "onnx"
    Session    *onnx.Session       // ONNX Runtime session
    InputInfo  []onnx.TensorInfo   // Input metadata
    OutputInfo []onnx.TensorInfo   // Output metadata
    LoadedAt   time.Time           // Load timestamp
}


3. ONNX Session Wrapper

GoServe interacts with the ONNX Runtime C library via CGO bindings.

Generic Inference Pipeline:
  1. Introspection: On model load, GoServe queries the ONNX model for input/output names, shapes, and data types.
  2. Dynamic Tensor Mapping: Supports multiple named input and output tensors.
  3. Type Handling: Supports FLOAT32 and INT64 tensors (common for classification and embedding models).
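Introspected metadata makes it possible to validate a request before any data crosses into C. The sketch below assumes a hypothetical TensorInfo layout (name, element type, and dimensions, with -1 marking a dynamic axis such as batch size); GoServe's actual onnx.TensorInfo may carry different fields. Every declared input must be present, unknown names are rejected, and static dimensions must match.

```go
package main

import (
	"fmt"
	"sort"
)

type ElemType string

const (
	Float32 ElemType = "FLOAT32"
	Int64   ElemType = "INT64"
)

// TensorInfo is a hypothetical stand-in for metadata read on model load.
type TensorInfo struct {
	Name  string
	Type  ElemType
	Shape []int64 // -1 marks a dynamic dimension (e.g. batch)
}

// Tensor summarizes a request input after flattening (data omitted here).
type Tensor struct {
	Type  ElemType
	Shape []int64
}

// ValidateInputs checks request tensors against the model's declared inputs.
func ValidateInputs(declared []TensorInfo, got map[string]Tensor) error {
	known := make(map[string]bool, len(declared))
	for _, info := range declared {
		known[info.Name] = true
		t, ok := got[info.Name]
		if !ok {
			return fmt.Errorf("missing input %q", info.Name)
		}
		if t.Type != info.Type {
			return fmt.Errorf("input %q: got %s, want %s", info.Name, t.Type, info.Type)
		}
		if len(t.Shape) != len(info.Shape) {
			return fmt.Errorf("input %q: got rank %d, want %d", info.Name, len(t.Shape), len(info.Shape))
		}
		for i, want := range info.Shape {
			if want != -1 && t.Shape[i] != want {
				return fmt.Errorf("input %q: dim %d is %d, want %d", info.Name, i, t.Shape[i], want)
			}
		}
	}
	// Collect unknown names in sorted order so the error is deterministic.
	var extra []string
	for name := range got {
		if !known[name] {
			extra = append(extra, name)
		}
	}
	sort.Strings(extra)
	if len(extra) > 0 {
		return fmt.Errorf("unknown inputs: %v", extra)
	}
	return nil
}

func main() {
	// Hypothetical embedding model with two INT64 inputs of shape [batch, seq_len].
	declared := []TensorInfo{
		{Name: "input_ids", Type: Int64, Shape: []int64{-1, -1}},
		{Name: "attention_mask", Type: Int64, Shape: []int64{-1, -1}},
	}
	req := map[string]Tensor{
		"input_ids":      {Type: Int64, Shape: []int64{2, 8}},
		"attention_mask": {Type: Int64, Shape: []int64{2, 8}},
	}
	fmt.Println(ValidateInputs(declared, req)) // <nil>
}
```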

Tensor data flow:

Go map[string]any ──▶ Reflect & Flatten ──▶ C Tensors
        │
        ▼
Pass to ONNX Runtime (CGO)
        │
        ▼
ONNX Runtime C API (inference execution)
        │
        ▼
Extract Output Tensors ──▶ Dynamic Reshape ──▶ Go map[string]any


Performance Characteristics

Latency Overhead

While GoServe is designed for high performance, the generic engine introduces minor overheads in exchange for flexibility:
- Reflection (~5-10µs): Used to handle generic inputs typed as any.
- Flattening/Copying: Multi-dimensional Go slices are copied into contiguous, C-compatible memory.
- Allocation: Output tensors are reshaped into JSON-friendly Go slices on every request.
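As a rough illustration of the copying cost (the shape is an assumed example, not a measured GoServe workload): flattening copies every input element once, so a FLOAT32 input of shape (B, F) costs 4·B·F bytes of copying per request (8·B·F for INT64).

```latex
\text{bytes copied} = 4 \cdot B \cdot F,
\qquad B = 32,\; F = 768 \;\Rightarrow\; 4 \cdot 32 \cdot 768 = 98{,}304\ \text{bytes} = 96\ \text{KiB}
```

Reshaping outputs adds a further per-request allocation proportional to the number of output elements, which is the GC pressure the buffer pooling roadmap item aims to reduce.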

Optimization Roadmap

To further reduce latency, the following improvements are planned:
  1. Buffer Pooling (sync.Pool): Reuse flattened memory buffers to reduce GC pressure.
  2. Zero-Copy Binary API: Support Protobuf/FlatBuffers payloads so memory pointers can be passed directly to C.
  3. Session Reuse: Internal optimizations for ONNX session lifecycle management.
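Roadmap item 1 might look roughly like the following sketch; it is not current GoServe code, and the names getBuf, putBuf, flattenRows, and maxPooledCap are hypothetical. Buffers are stored as *[]float32 so that Put does not allocate when boxing into any, undersized buffers are grown on demand, and very large buffers are not pooled so a single huge request does not pin memory indefinitely.

```go
package main

import (
	"fmt"
	"sync"
)

// maxPooledCap is a hypothetical cap: buffers above ~4 MiB of float32 are dropped.
const maxPooledCap = 1 << 20

// bufPool holds reusable float32 buffers for flattened input tensors.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]float32, 0, 1024)
		return &b
	},
}

// getBuf returns a zero-length buffer with capacity for at least n elements.
func getBuf(n int) *[]float32 {
	bp := bufPool.Get().(*[]float32)
	if cap(*bp) < n {
		*bp = make([]float32, 0, n)
	}
	*bp = (*bp)[:0]
	return bp
}

// putBuf returns a buffer to the pool. Callers must not touch it afterwards,
// and no C tensor may still reference its memory.
func putBuf(bp *[]float32) {
	if cap(*bp) > maxPooledCap {
		return // let the GC reclaim oversized buffers
	}
	bufPool.Put(bp)
}

// flattenRows copies a [batch][features] input into a pooled contiguous buffer.
func flattenRows(rows [][]float32) *[]float32 {
	n := 0
	for _, r := range rows {
		n += len(r)
	}
	bp := getBuf(n)
	for _, r := range rows {
		*bp = append(*bp, r...)
	}
	return bp
}

func main() {
	bp := flattenRows([][]float32{{1, 2}, {3, 4}})
	fmt.Println(*bp) // [1 2 3 4]
	// ... run inference on *bp here, then release the buffer ...
	putBuf(bp)
}
```

sync.Pool may drop pooled items at any garbage collection, so pooling reduces allocation rather than eliminating it; whether it pays off depends on request sizes and traffic.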


Security Considerations

  1. Input Validation: Strict checking of feature counts and data types before passing to C code.
  2. Resource Limits: Batch size limits and memory management to prevent denial-of-service.
  3. Path Traversal Protection: Validates model paths to ensure only authorized files are loaded.
  4. CGO Safety: Carefully managed boundaries between Go and C to prevent memory corruption.

For more details, see:
- Full Technical Guide
- Quick Start README
- API Reference