netcode Java Conversion Plan

1. Source Analysis

1.1 What is netcode?

netcode is a UDP-based, encrypted, connection-oriented client/server protocol (version 1.02). Its reference implementation is written in C (~9000 lines in a single file, netcode.c) by Glenn Fiedler / Mas Bandwidth LLC.

Key security goals:

  • Protect against zombie clients, MITM, DDoS amplification, replay attacks
  • Authentication via signed connect tokens issued by a web backend
  • Packets are encrypted with ChaCha20-Poly1305, except the connection request, which is sent in the clear but carries a private connect token encrypted with XChaCha20-Poly1305

2. C Codebase - Module Map

| C layer | Approx. lines | Responsibility |
|---|---|---|
| Byte I/O primitives | ~100 | Little-endian read/write uint8/16/32/64 |
| Address | ~180 | Parse/format/compare IPv4 and IPv6 + port |
| Socket | ~320 | Non-blocking UDP socket: create, send, recv |
| Crypto | ~80 | Wrap libsodium XChaCha20-Poly1305 (big nonce) and ChaCha20-Poly1305 (IETF, 96-bit nonce) |
| Connect Token (private) | ~200 | Serialize, encrypt, decrypt, read the 1024-byte private token |
| Challenge Token | ~100 | Serialize, encrypt (ChaCha20-Poly1305), read the 300-byte challenge token |
| Packets | ~700 | Write/read all 7 packet types; replay protection |
| Connect Token (public) | ~150 | Serialize/deserialize the full 2048-byte public token |
| Packet Queue | ~80 | Fixed-size circular queue of received payload packets (size 256) |
| Network Simulator | ~200 | Optional in-process latency/loss/jitter/duplicate simulation |
| Encryption Manager | ~200 | Map IP:port -> send/receive key pairs; timeout-based eviction |
| Connect Token Entry Table | ~80 | Replay prevention for connect tokens (MAC + address table) |
| Client state machine | ~1200 | Full client lifecycle, UDP send/recv, state transitions |
| Server | ~2000 | Full server lifecycle, slot management, per-client state |
| Generate Connect Token | ~100 | Public API to generate tokens (used by web backend) |
| Tests | ~3000 | In-process tests covering all modules |

3. Key Data Structures

3.1 Addresses

netcode_address_t:
  type    : uint8  (0=NONE, 1=IPv4, 2=IPv6)
  data    : union { uint8[4] ipv4 | uint16[8] ipv6 }
  port    : uint16

Java mapping: NetcodeAddress, a plain mutable class. Java cannot stack-allocate it explicitly, so hot paths reuse pre-allocated instances instead of allocating one per packet.
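The sketch below shows one possible shape for NetcodeAddress. It also shows how netcode encodes an address inside a connect token: a type byte (1 = IPv4, 2 = IPv6), then four uint8 octets or eight uint16 groups, then a uint16 port, all little-endian. Field and method names are assumptions. java.nio.ByteBuffer stands in for the Agrona buffers planned in section 6 so the sketch runs on its own.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical shape for NetcodeAddress: mutable and reusable, so hot paths never allocate. */
final class NetcodeAddress {
    static final byte TYPE_NONE = 0, TYPE_IPV4 = 1, TYPE_IPV6 = 2;

    byte type = TYPE_NONE;
    final int[] ipv4 = new int[4]; // octets a.b.c.d, each 0..255
    final int[] ipv6 = new int[8]; // groups a:b:...:h, each 0..65535
    int port;                      // 0..65535

    void setIpv4(int a, int b, int c, int d, int port) {
        type = TYPE_IPV4;
        ipv4[0] = a; ipv4[1] = b; ipv4[2] = c; ipv4[3] = d;
        this.port = port;
    }

    /** Token encoding [type u8][4 x u8 | 8 x u16][port u16]; buf must use ByteOrder.LITTLE_ENDIAN. */
    void write(ByteBuffer buf) {
        buf.put(type);
        if (type == TYPE_IPV4) {
            for (int i = 0; i < 4; i++) buf.put((byte) ipv4[i]);
        } else if (type == TYPE_IPV6) {
            for (int i = 0; i < 8; i++) buf.putShort((short) ipv6[i]);
        } else {
            throw new IllegalStateException("cannot encode address type " + type);
        }
        buf.putShort((short) port);
    }

    /** Decodes into this instance (no allocation); returns false on an unknown address type. */
    boolean read(ByteBuffer buf) {
        type = buf.get();
        if (type == TYPE_IPV4) {
            for (int i = 0; i < 4; i++) ipv4[i] = buf.get() & 0xFF;
        } else if (type == TYPE_IPV6) {
            for (int i = 0; i < 8; i++) ipv6[i] = buf.getShort() & 0xFFFF;
        } else {
            return false;
        }
        port = buf.getShort() & 0xFFFF;
        return true;
    }

    /** Field-wise, allocation-free comparison. */
    boolean sameAs(NetcodeAddress o) {
        if (type != o.type || port != o.port) return false;
        if (type == TYPE_IPV4) return Arrays.equals(ipv4, o.ipv4);
        if (type == TYPE_IPV6) return Arrays.equals(ipv6, o.ipv6);
        return true;
    }
}
```

An IPv4 entry therefore occupies 7 bytes and an IPv6 entry 19 bytes, which is why the server address list in the tokens below is variable-length.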

3.2 Connect Token (Public - 2048 bytes)

[version_info]       13 bytes   "NETCODE 1.02\0"
[protocol_id]        uint64
[create_timestamp]   uint64
[expire_timestamp]   uint64
[nonce]              24 bytes
[private_data]       1024 bytes (encrypted)
[timeout_seconds]    uint32
[num_server_addrs]   uint32     [1,32]
[server_addresses]   variable
[client_to_server_key] 32 bytes
[server_to_client_key] 32 bytes
<zero pad to 2048>
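Everything up to private_data sits at a fixed offset: a 61-byte prefix (13 + 8 + 8 + 8 + 24) precedes the encrypted private token. The fields after it follow the variable-length address list. Below is a minimal sketch of writing that fixed prefix with little-endian java.nio.ByteBuffer calls. The class and method names are hypothetical, not the planned ConnectTokenCodec API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Fixed-offset prefix of the 2048-byte public connect token (sketch; names hypothetical). */
final class PublicTokenLayout {
    static final byte[] VERSION_INFO = "NETCODE 1.02\0".getBytes(StandardCharsets.US_ASCII); // 13 bytes
    static final int NONCE_BYTES = 24;
    static final int PRIVATE_BYTES = 1024;
    static final int TOKEN_BYTES = 2048;

    static final int PROTOCOL_ID_OFFSET = 13;                         // after version_info
    static final int CREATE_TS_OFFSET = PROTOCOL_ID_OFFSET + 8;       // 21
    static final int EXPIRE_TS_OFFSET = CREATE_TS_OFFSET + 8;         // 29
    static final int NONCE_OFFSET = EXPIRE_TS_OFFSET + 8;             // 37
    static final int PRIVATE_OFFSET = NONCE_OFFSET + NONCE_BYTES;     // 61
    static final int TIMEOUT_OFFSET = PRIVATE_OFFSET + PRIVATE_BYTES; // 1085

    /**
     * Writes version_info through private_data. The caller continues with timeout_seconds,
     * num_server_addrs, the address list, both keys, and zero padding up to TOKEN_BYTES.
     */
    static void writeFixedPrefix(ByteBuffer token, long protocolId, long createTimestamp,
                                 long expireTimestamp, byte[] nonce, byte[] encryptedPrivate) {
        token.order(ByteOrder.LITTLE_ENDIAN); // every netcode integer is little-endian
        token.put(VERSION_INFO);
        token.putLong(protocolId);
        token.putLong(createTimestamp);
        token.putLong(expireTimestamp);
        token.put(nonce, 0, NONCE_BYTES);
        token.put(encryptedPrivate, 0, PRIVATE_BYTES);
    }
}
```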

3.3 Connect Token Private (1024 bytes: the first 1008 bytes are encrypted in place, the last 16 hold the MAC)

[client_id]          uint64
[timeout_seconds]    uint32
[num_server_addrs]   uint32
[server_addresses]   variable
[client_to_server_key] 32 bytes
[server_to_client_key] 32 bytes
[user_data]          256 bytes
<zero pad to 1024>

Encryption: XChaCha20-Poly1305 IETF (24-byte nonce, random per token)
Associated data: version_info || protocol_id || expire_timestamp
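Here is a sketch of building that associated data into a reusable 29-byte array (13 + 8 + 8). create_timestamp is not part of the AAD. Integers are written little-endian, as everywhere else in the protocol. The class and method names are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;

/** 29-byte AAD for the private connect token (sketch; names hypothetical). */
final class PrivateTokenAad {
    static final byte[] VERSION_INFO = "NETCODE 1.02\0".getBytes(StandardCharsets.US_ASCII);
    static final int AAD_BYTES = 13 + 8 + 8;

    /** Fills a caller-owned array, so one buffer can be reused for every token. */
    static void fill(byte[] aad, long protocolId, long expireTimestamp) {
        System.arraycopy(VERSION_INFO, 0, aad, 0, VERSION_INFO.length); // bytes 0..12
        putLongLE(aad, 13, protocolId);                                 // bytes 13..20
        putLongLE(aad, 21, expireTimestamp);                            // bytes 21..28
    }

    static void putLongLE(byte[] dst, int offset, long value) {
        for (int i = 0; i < 8; i++) {
            dst[offset + i] = (byte) (value >>> (8 * i)); // least significant byte first
        }
    }
}
```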

3.4 Challenge Token (300 bytes: the first 284 bytes are encrypted in place, the last 16 hold the MAC)

[client_id]    uint64
[user_data]    256 bytes
<zero pad to 300>

Encryption: ChaCha20-Poly1305 IETF with no associated data (96-bit nonce = 32-bit zero || 64-bit little-endian sequence)
Key: server-side random challenge key, regenerated at server start
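For illustration, the sketch below encrypts and decrypts a challenge token with the JDK's built-in "ChaCha20-Poly1305" cipher (available since JDK 11). It does not use the Bouncy Castle or direct engines discussed in this plan. It shows the nonce layout, the absence of associated data, and the in-place 284 + 16 = 300 byte layout. It creates a Cipher per call, which is tolerable only because challenge tokens are handshake-only. The class name is hypothetical.

```java
import java.security.GeneralSecurityException;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Challenge-token AEAD on the JDK's ChaCha20-Poly1305 (sketch; class name hypothetical). */
final class ChallengeTokenCrypto {
    static final int CHALLENGE_TOKEN_BYTES = 300;
    static final int MAC_BYTES = 16;
    static final int PLAINTEXT_BYTES = CHALLENGE_TOKEN_BYTES - MAC_BYTES; // 284

    /** 96-bit nonce: 4 zero bytes, then the 64-bit sequence little-endian. */
    static byte[] nonce(long sequence) {
        byte[] n = new byte[12];
        for (int i = 0; i < 8; i++) {
            n[4 + i] = (byte) (sequence >>> (8 * i));
        }
        return n;
    }

    private static Cipher newCipher(int mode, byte[] key, long sequence) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("ChaCha20-Poly1305");
        c.init(mode, new SecretKeySpec(key, "ChaCha20"), new IvParameterSpec(nonce(sequence)));
        return c; // no updateAAD(): challenge tokens carry no associated data
    }

    /** Encrypts bytes [0, 284) in place; the 16-byte tag lands in bytes [284, 300). */
    static void encrypt(byte[] token, long sequence, byte[] key) {
        try {
            newCipher(Cipher.ENCRYPT_MODE, key, sequence).doFinal(token, 0, PLAINTEXT_BYTES, token, 0);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts in place; returns false if the tag fails (tampered, wrong key or wrong sequence). */
    static boolean decrypt(byte[] token, long sequence, byte[] key) {
        try {
            newCipher(Cipher.DECRYPT_MODE, key, sequence).doFinal(token, 0, CHALLENGE_TOKEN_BYTES, token, 0);
            return true;
        } catch (AEADBadTagException e) {
            return false;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```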

3.5 Packet Types (7 total)

| ID | Name | Direction | Encrypted | Replay protected | Plaintext payload (bytes) |
|---|---|---|---|---|---|
| 0 | CONNECTION_REQUEST | C->S | No (private token is encrypted) | No | n/a (whole packet is 1078 bytes, fixed) |
| 1 | CONNECTION_DENIED | S->C | Yes | No | 0 |
| 2 | CONNECTION_CHALLENGE | S->C | Yes | No | 8 + 300 = 308 |
| 3 | CONNECTION_RESPONSE | C->S | Yes | No | 8 + 300 = 308 |
| 4 | CONNECTION_KEEP_ALIVE | Both | Yes | Yes | 8 (client_index + max_clients, uint32 each) |
| 5 | CONNECTION_PAYLOAD | Both | Yes | Yes | 1-1200 |
| 6 | CONNECTION_DISCONNECT | Both | Yes | Yes | 0 |

Encrypted packet wire format:

[prefix_byte]       uint8   = (num_sequence_bytes << 4) | packet_type
[sequence_number]   1-8 bytes little-endian, high zero bytes omitted
[encrypted_payload] variable
[MAC]               16 bytes (Poly1305 tag)

Encryption: ChaCha20-Poly1305 IETF
Associated data: version_info || protocol_id || prefix_byte
Nonce: 96-bit = 32-bit zero || 64-bit little-endian sequence
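The header rules above can be sketched as a handful of static helpers. The sequence is truncated to the fewest bytes that hold it (at least one). That count goes in the high nibble of the prefix, and the 22-byte AAD ends with the same prefix byte. Names are hypothetical. Rejecting prefixes with a packet type above 6, or a sequence length outside 1..8, is left to the caller.

```java
/** Encrypted-packet header helpers (sketch; class and method names hypothetical). */
final class PacketHeader {
    /** Bytes needed for the sequence with high zero bytes omitted: 1..8 (0 still takes 1 byte). */
    static int sequenceBytes(long sequence) {
        return Math.max(1, 8 - Long.numberOfLeadingZeros(sequence) / 8);
    }

    /** prefix = (num_sequence_bytes << 4) | packet_type */
    static byte prefix(int packetType, int sequenceBytes) {
        return (byte) ((sequenceBytes << 4) | packetType);
    }

    static int packetType(byte prefix) { return prefix & 0x0F; }

    static int sequenceBytesOf(byte prefix) { return (prefix >>> 4) & 0x0F; }

    /** Writes prefix + truncated little-endian sequence; returns where the encrypted payload starts. */
    static int writeHeader(byte[] dst, int packetType, long sequence) {
        int n = sequenceBytes(sequence);
        dst[0] = prefix(packetType, n);
        for (int i = 0; i < n; i++) {
            dst[1 + i] = (byte) (sequence >>> (8 * i));
        }
        return 1 + n;
    }

    /** Reads the truncated sequence back; n comes from sequenceBytesOf(src[0]), already validated. */
    static long readSequence(byte[] src, int n) {
        long seq = 0;
        for (int i = 0; i < n; i++) {
            seq |= (src[1 + i] & 0xFFL) << (8 * i);
        }
        return seq;
    }

    /** Packet AAD (22 bytes): version_info (13) || protocol_id (8, LE) || prefix_byte (1). */
    static void fillAad(byte[] aad, byte[] versionInfo13, long protocolId, byte prefix) {
        System.arraycopy(versionInfo13, 0, aad, 0, 13);
        for (int i = 0; i < 8; i++) {
            aad[13 + i] = (byte) (protocolId >>> (8 * i));
        }
        aad[21] = prefix;
    }
}
```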

3.6 Replay Protection

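  • Buffer size: 256 entries (power-of-two)
  • Index: sequence & 255 (the C reference writes sequence % 256, which is equivalent for a power-of-two size; use the mask in Java)
  • Reset: received_packet[256] filled with 0xFFFF_FFFF_FFFF_FFFF (empty), most_recent_sequence = 0
  • Check before decrypting: discard if sequence + 256 <= most_recent_sequence
  • An empty slot (0xFFFF_FFFF_FFFF_FFFF) means "not yet received": accept
  • Otherwise discard if received_packet[index] >= sequence
  • After a successful decrypt: store sequence in received_packet[index] and raise most_recent_sequence if sequence is larger
  • Sequences are uint64 in C; compare them with Long.compareUnsigned in Java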
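A direct Java port of these rules might look like the following sketch. It includes the C reference's explicit empty-slot check. The class and method names are assumptions. Java has no uint64, so the comparisons go through Long.compareUnsigned, and the all-ones sentinel is simply -1L.

```java
import java.util.Arrays;

/** 256-entry replay window ported from the C rules (sketch; class/method names are assumptions). */
final class ReplayProtection {
    static final int SIZE = 256;                      // power of two
    static final long EMPTY = 0xFFFF_FFFF_FFFF_FFFFL; // C's all-ones sentinel, i.e. -1L

    private final long[] received = new long[SIZE];
    private long mostRecent;

    ReplayProtection() {
        reset();
    }

    void reset() {
        mostRecent = 0;
        Arrays.fill(received, EMPTY);
    }

    /** True if the packet must be discarded. Call before decrypting. */
    boolean alreadyReceived(long sequence) {
        // sequence + SIZE <= mostRecent, evaluated as unsigned 64-bit like the C uint64_t
        if (Long.compareUnsigned(sequence + SIZE, mostRecent) <= 0) {
            return true;
        }
        long slot = received[(int) (sequence & (SIZE - 1))];
        if (slot == EMPTY) {
            return false; // nothing stored in this slot yet
        }
        return Long.compareUnsigned(slot, sequence) >= 0;
    }

    /** Records a sequence. Call only after the packet decrypted and authenticated. */
    void advance(long sequence) {
        if (Long.compareUnsigned(sequence, mostRecent) > 0) {
            mostRecent = sequence;
        }
        received[(int) (sequence & (SIZE - 1))] = sequence;
    }
}
```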

3.7 Client State Machine

States (int):
  CONNECT_TOKEN_EXPIRED          = -6
  INVALID_CONNECT_TOKEN          = -5
  CONNECTION_TIMED_OUT           = -4
  CONNECTION_RESPONSE_TIMED_OUT  = -3
  CONNECTION_REQUEST_TIMED_OUT   = -2
  CONNECTION_DENIED              = -1
  DISCONNECTED                   =  0  (initial)
  SENDING_CONNECTION_REQUEST     =  1
  SENDING_CONNECTION_RESPONSE    =  2
  CONNECTED                      =  3  (goal)

Transitions driven by client_update(time):

  • Send request/response at 10Hz
  • Transition on received packets: CHALLENGE -> send response; KEEP_ALIVE -> connected
  • Timeout on no response within connect_token.timeout_seconds
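One way to mirror these defines in Java while keeping the C integer codes is an enum that carries its code. The codes are handy for logging and for cross-checking against the reference. The lookup table and the isError/isConnecting helpers are illustrative additions, not part of the C API.

```java
/** Client states carrying the C reference's integer codes (helper methods are illustrative). */
enum ClientState {
    CONNECT_TOKEN_EXPIRED(-6),
    INVALID_CONNECT_TOKEN(-5),
    CONNECTION_TIMED_OUT(-4),
    CONNECTION_RESPONSE_TIMED_OUT(-3),
    CONNECTION_REQUEST_TIMED_OUT(-2),
    CONNECTION_DENIED(-1),
    DISCONNECTED(0),
    SENDING_CONNECTION_REQUEST(1),
    SENDING_CONNECTION_RESPONSE(2),
    CONNECTED(3);

    private static final int MIN_CODE = -6;
    private static final ClientState[] BY_CODE = new ClientState[10];

    static {
        for (ClientState s : values()) {
            BY_CODE[s.code - MIN_CODE] = s; // dense lookup, built once at class init
        }
    }

    final int code;

    ClientState(int code) {
        this.code = code;
    }

    /** Negative codes are the failure outcomes. */
    boolean isError() {
        return code < 0;
    }

    /** True while the handshake is in progress (strictly between DISCONNECTED and CONNECTED). */
    boolean isConnecting() {
        return code > DISCONNECTED.code && code < CONNECTED.code;
    }

    static ClientState fromCode(int code) {
        if (code < MIN_CODE || code > 3) {
            throw new IllegalArgumentException("unknown client state " + code);
        }
        return BY_CODE[code - MIN_CODE];
    }
}
```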

3.8 Server Structure

Per-server (max 256 clients):
  client_connected[256]
  client_id[256]
  client_sequence[256]
  client_last_packet_send_time[256]
  client_last_packet_receive_time[256]
  client_user_data[256][256]
  client_replay_protection[256]
  client_packet_queue[256]         (each queue: 256 entries)
  client_address[256]
  encryption_manager               (maps address -> send/receive keys)
  connect_token_entries[2048]      (replay prevention for tokens)
  challenge_key[32]                (per-server, random at start)
  challenge_sequence               (uint64, monotonically increasing)
  global_sequence                  (uint64, sequence for packets sent before a client slot exists, e.g. challenge and denied)

3.9 Encryption Manager

Maps IP:port -> (send_key, receive_key, timeout, expire_time, last_access_time). Max capacity: NETCODE_MAX_ENCRYPTION_MAPPINGS (default = NETCODE_MAX_CLIENTS * 2 = 512). Linear scan on address match (acceptable for small N, < 512 entries).
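Below is a possible parallel-array layout for the Java EncryptionManager, using the linear scan described above. The sketch assumes each address is pre-packed into two longs plus a (type << 16) | port int, so the scan compares primitives only. That packing, the method names, and the exact expiry comparisons are assumptions to verify against netcode.c. Expiry here means an idle timeout measured from last access, plus an optional absolute expire time.

```java
/** Address -> keys via parallel primitive arrays and a linear scan (sketch; names hypothetical). */
final class EncryptionManager {
    static final int KEY_BYTES = 32;

    // Assumed packing: 128-bit address in (hi, lo), IPv4 in the low 32 bits of lo,
    // and (type << 16) | port in one int. Type 0 (NONE) marks a never-used slot.
    private final long[] addrHi;
    private final long[] addrLo;
    private final int[] typeAndPort;
    private final int[] timeoutSeconds;   // <= 0 disables the idle timeout
    private final double[] expireTime;    // < 0 disables the absolute expiry
    private final double[] lastAccessTime;
    private final byte[] sendKeys;        // flat: slot * KEY_BYTES
    private final byte[] receiveKeys;     // flat: slot * KEY_BYTES
    private int used;                     // scan bound: every live slot index is < used

    EncryptionManager(int capacity) {
        addrHi = new long[capacity];
        addrLo = new long[capacity];
        typeAndPort = new int[capacity];
        timeoutSeconds = new int[capacity];
        expireTime = new double[capacity];
        lastAccessTime = new double[capacity];
        sendKeys = new byte[capacity * KEY_BYTES];
        receiveKeys = new byte[capacity * KEY_BYTES];
    }

    boolean expired(int slot, double now) {
        return (timeoutSeconds[slot] > 0 && lastAccessTime[slot] + timeoutSeconds[slot] < now)
                || (expireTime[slot] >= 0.0 && expireTime[slot] < now);
    }

    /** Returns the live slot for this address, or -1; a hit refreshes the idle timer. */
    int find(long hi, long lo, int typeAndPortKey, double now) {
        for (int i = 0; i < used; i++) {
            if (typeAndPort[i] == typeAndPortKey && addrLo[i] == lo && addrHi[i] == hi && !expired(i, now)) {
                lastAccessTime[i] = now;
                return i;
            }
        }
        return -1;
    }

    /** Adds or refreshes a mapping; returns false if every slot is live. */
    boolean add(long hi, long lo, int typeAndPortKey, byte[] sendKey, byte[] receiveKey,
                double now, double expire, int timeout) {
        int slot = find(hi, lo, typeAndPortKey, now);
        for (int i = 0; slot < 0 && i < typeAndPort.length; i++) {
            if (typeAndPort[i] == 0 || expired(i, now)) {
                slot = i; // reuse a never-used or expired slot
            }
        }
        if (slot < 0) {
            return false;
        }
        addrHi[slot] = hi;
        addrLo[slot] = lo;
        typeAndPort[slot] = typeAndPortKey;
        timeoutSeconds[slot] = timeout;
        expireTime[slot] = expire;
        lastAccessTime[slot] = now;
        System.arraycopy(sendKey, 0, sendKeys, slot * KEY_BYTES, KEY_BYTES);
        System.arraycopy(receiveKey, 0, receiveKeys, slot * KEY_BYTES, KEY_BYTES);
        used = Math.max(used, slot + 1);
        return true;
    }

    /** Offset of a slot's key inside sendKeys()/receiveKeys(). */
    static int keyOffset(int slot) {
        return slot * KEY_BYTES;
    }

    byte[] sendKeys() { return sendKeys; }

    byte[] receiveKeys() { return receiveKeys; }
}
```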


4. Crypto Requirements

| Operation | Algorithm | Library (C) | Java target |
|---|---|---|---|
| Connect token encryption | XChaCha20-Poly1305 IETF | libsodium crypto_aead_xchacha20poly1305_ietf | Bouncy Castle: HChaCha20 subkey + ChaCha20Poly1305 (BC 1.78.1 has no standalone XChaCha20-Poly1305 AEAD class; see section 14) |
| Packet encryption | ChaCha20-Poly1305 IETF | libsodium crypto_aead_chacha20poly1305_ietf | Bouncy Castle ChaCha20Poly1305, or the JDK's built-in ChaCha20-Poly1305 cipher (available since JDK 11) |
| Random bytes | CSPRNG | libsodium randombytes_buf | SecureRandom (init-time only, NOT on hot path) |
| Key generation | CSPRNG | libsodium randombytes_buf | Pre-generated at startup with SecureRandom |

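Note: Token crypto is NOT on the hot path per packet. Connect-token decryption (XChaCha20-Poly1305) and challenge-token encryption/decryption run only during the handshake. Packet AEAD (ChaCha20-Poly1305) does run once per encrypted packet sent or received, so it must not allocate. Use pre-allocated byte arrays for keys and nonces, and reuse cipher instances per thread where the API allows it.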


5. Java Package Layout (under net.ztrust.netcode)

net.ztrust.netcode/
  Netcode.java                   - init/term, global constants
  NetcodeAddress.java            - address type (IPv4/IPv6 + port), parse/format/equal
  codec/
    BufferWriter.java            - little-endian write primitives wrapping Agrona MutableDirectBuffer
    BufferReader.java            - little-endian read primitives wrapping Agrona DirectBuffer
    ConnectTokenPrivateCodec.java
    ChallengeTokenCodec.java
    PacketCodec.java             - write/read all 7 packet types
    ConnectTokenCodec.java       - public token read/write
  crypto/
    NetcodeCrypto.java           - encrypt/decrypt wrappers (XChaCha20 and ChaCha20)
    ReplayProtection.java        - 256-entry sliding window
  client/
    ClientState.java             - enum for client states
    NetcodeClient.java           - client state machine
    ClientConfig.java
  server/
    NetcodeServer.java           - server state machine
    ServerConfig.java
    EncryptionManager.java       - address -> key mapping
    ConnectTokenEntryTable.java  - token replay prevention
  transport/
    UdpTransport.java            - NIO DatagramChannel, non-blocking
    TransportOverride.java       - interface for test overrides
  simulator/
    NetworkSimulator.java        - optional in-process latency/loss/jitter
  util/
    ConnectTokenGenerator.java   - generate connect tokens (web backend side)
    NanoClock.java               - clock interface
    CachedEpochClock.java        - updated once per agent loop

6. Performance-Critical Mapping (Copilot Instructions Compliance)

6.1 Buffer Strategy

  • All packet buffers: pre-allocated UnsafeBuffer (Agrona off-heap) at startup.
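  • BufferWriter / BufferReader: flyweights calling buf.putByte/putShort/putInt/putLong with an explicit ByteOrder.LITTLE_ENDIAN argument (putByte needs none). Agrona's overloads without a ByteOrder parameter use the platform's native byte order. That order is little-endian on x86-64 and little-endian AArch64 but is not guaranteed in general.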
  • Zero allocation on encode, decode, publish, receive paths.
  • Payload packets: use a preallocated pool (object pool of PayloadPacket wrappers over a fixed UnsafeBuffer).

6.2 Collections

| C | Java |
|---|---|
| received_packet[256] (replay buffer) | long[] array, size 256 |
| packet_queue (circular, 256 slots) | long[] (sequences) + DirectBuffer[] (pre-allocated payloads) |
| encryption_manager arrays | Parallel primitive arrays: int[], long[], byte[] (keys flat array) |
| connect_token_entries | Flat byte[] for MACs, NetcodeAddress[] pre-allocated |
| Client slot arrays on server | Parallel int[], long[], double[] by client index |

  • No HashMap, LinkedList, ArrayList<Long> anywhere.
  • encryption_manager is small (<=512 entries), linear scan is O(n) with n bounded and cache-friendly.
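Here is a sketch of the packet-queue row from the table above: sequences in a long[], payload slots allocated once up front, and masked head/tail arithmetic. Plain byte[] slots stand in for the planned DirectBuffer[] so the sketch runs without Agrona. The class and method names are hypothetical, and dropping the packet when the queue is full is this sketch's own policy.

```java
/** Fixed-capacity FIFO of received payloads (sketch; byte[] slots stand in for DirectBuffer[]). */
final class PacketQueue {
    static final int SIZE = 256;               // power of two, so slot = i & (SIZE - 1)
    static final int MAX_PAYLOAD_BYTES = 1200;

    private final long[] sequences = new long[SIZE];
    private final byte[][] payloads = new byte[SIZE][MAX_PAYLOAD_BYTES]; // allocated once, up front
    private final int[] lengths = new int[SIZE];
    private int start; // slot of the oldest entry
    private int count;

    /** Copies the payload in; returns false (the caller drops the packet) when full. */
    boolean push(long sequence, byte[] src, int offset, int length) {
        if (length < 0 || length > MAX_PAYLOAD_BYTES) {
            throw new IllegalArgumentException("payload length " + length);
        }
        if (count == SIZE) {
            return false;
        }
        int slot = (start + count) & (SIZE - 1);
        sequences[slot] = sequence;
        System.arraycopy(src, offset, payloads[slot], 0, length);
        lengths[slot] = length;
        count++;
        return true;
    }

    /** Copies the oldest payload into dst and its sequence into seqOut[0]; returns length, or -1 if empty. */
    int pop(byte[] dst, long[] seqOut) {
        if (count == 0) {
            return -1;
        }
        int length = lengths[start];
        System.arraycopy(payloads[start], 0, dst, 0, length);
        seqOut[0] = sequences[start];
        start = (start + 1) & (SIZE - 1);
        count--;
        return length;
    }

    int size() {
        return count;
    }
}
```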

6.3 Ring Index

All circular indices use seq & (capacity - 1) not seq % capacity.
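Besides avoiding a division, the mask matters specifically in Java. C sequence numbers are uint64, but in a Java long any value with the top bit set is negative, and % then yields a negative (invalid) array index. A minimal illustration with a hypothetical helper:

```java
/** Ring index helper (hypothetical name). capacity must be a power of two. */
final class RingIndex {
    /** The mask keeps the result in [0, capacity) for every long, including "negative" uint64 values. */
    static int of(long sequence, int capacity) {
        return (int) (sequence & (capacity - 1));
    }
}
```

For example, the uint64 sequence 0x8000_0000_0000_00FF maps to slot 255 with the mask. With % 256 it maps to -1.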

6.4 Time

  • CachedEpochClock updated once per Agent.doWork() tick - never call System.currentTimeMillis() per packet.
  • netcode_time() in the C reference maps to an injected clock.time() call.

6.5 Threading Model

  • Single-writer principle: NetcodeClient owned by one client-side agent thread, NetcodeServer owned by one server agent thread.
  • Use Agrona Agent + AgentRunner pattern.
  • No synchronized, ReentrantLock, or BlockingQueue.
  • If payload packets cross to application thread: Agrona OneToOneRingBuffer or Disruptor RingBuffer.

6.6 Crypto

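  • SecureRandom is called only at startup on the game server and client (e.g. the challenge key). ConnectTokenGenerator needs a fresh 24-byte nonce and key pair per token, but it runs on the web backend, off the game-server hot path.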
  • Cipher instances pre-allocated per-thread (avoid re-init inside hot path).
  • Nonce byte arrays pre-allocated (12 bytes for ChaCha20, 24 bytes for XChaCha20).
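The sketch below is a per-thread packet AEAD holder built on the JDK "ChaCha20-Poly1305" cipher. It reuses one Cipher and one 12-byte nonce array across packets. The JDK API still needs a new IvParameterSpec for each init(), which matches the one allocation per call noted for JdkChacha20Poly1305Engine in section 14. The class and method names are hypothetical, and this is not one of the plan's engines. The holder is deliberately not thread-safe, following the single-writer model.

```java
import java.security.GeneralSecurityException;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** One instance per agent thread (single writer); sketch, names hypothetical. */
final class PacketAead {
    private final Cipher cipher;
    private final byte[] nonce = new byte[12]; // reused: 4 zero bytes || 64-bit LE sequence

    PacketAead() {
        try {
            cipher = Cipher.getInstance("ChaCha20-Poly1305");
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private void setNonce(long sequence) {
        for (int i = 0; i < 8; i++) {
            nonce[4 + i] = (byte) (sequence >>> (8 * i)); // bytes 0..3 stay zero
        }
    }

    /** Encrypts src[srcOff, srcOff + len) into dst at dstOff; returns bytes written (len + 16). */
    int encrypt(SecretKeySpec key, long sequence, byte[] aad,
                byte[] src, int srcOff, int len, byte[] dst, int dstOff) {
        setNonce(sequence);
        try {
            // IvParameterSpec copies the nonce, so reusing the array is safe
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(nonce));
            cipher.updateAAD(aad);
            return cipher.doFinal(src, srcOff, len, dst, dstOff);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the plaintext length, or -1 if authentication fails. */
    int decrypt(SecretKeySpec key, long sequence, byte[] aad,
                byte[] src, int srcOff, int len, byte[] dst, int dstOff) {
        setNonce(sequence);
        try {
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(nonce));
            cipher.updateAAD(aad);
            return cipher.doFinal(src, srcOff, len, dst, dstOff);
        } catch (AEADBadTagException e) {
            return -1;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```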

6.7 Logging

  • No SLF4J on hot path.
  • Error recording via Agrona DistinctErrorLog.
  • Debug logging via conditional AtomicCounter increments only.

7. Dependency Plan

Add to build.gradle.kts:

// Agrona - off-heap buffers, collections, agent runner
implementation("org.agrona:agrona:1.21.2")

// LMAX Disruptor - ring buffer inter-thread messaging
implementation("com.lmax:disruptor:4.0.0")

// Bouncy Castle - XChaCha20-Poly1305 and ChaCha20-Poly1305
implementation("org.bouncycastle:bcprov-jdk18on:1.78.1")

// Java Thread Affinity - CPU pinning for hot agents
implementation("net.openhft:Java-Thread-Affinity:2.21ea1")

// HdrHistogram - latency telemetry
implementation("org.hdrhistogram:HdrHistogram:2.2.2")

8. Conversion Work Breakdown

Phase 1 - Foundation (No network, no crypto)

| Task | Files | Notes |
|---|---|---|
| Project setup | build.gradle.kts, gradle.properties | Add deps, JDK 21, Spotless, Checkstyle, JMH module |
| Constants | Netcode.java | All #define constants as static final int/long/byte[] |
| Address type | NetcodeAddress.java | IPv4/IPv6 + port, parse, format, equals |
| Buffer I/O | BufferWriter.java, BufferReader.java | Little-endian primitives over Agrona MutableDirectBuffer |
| Tests: buffer round-trip | JUnit 5 | Property test: write then read == original |

Phase 2 - Tokens and Codec

| Task | Files | Notes |
|---|---|---|
| Crypto wrappers | NetcodeCrypto.java | XChaCha20 (24-byte nonce) and ChaCha20 (12-byte nonce) AEAD encrypt/decrypt |
| ConnectToken private | ConnectTokenPrivateCodec.java | generate, write, encrypt, decrypt, read |
| ChallengeToken | ChallengeTokenCodec.java | write, encrypt, decrypt, read |
| All 7 packets | PacketCodec.java | write and read |
| ConnectToken public | ConnectTokenCodec.java | full 2048-byte token write/read |
| ReplayProtection | ReplayProtection.java | 256-entry sliding window with long[] |
| ConnectTokenGenerator | ConnectTokenGenerator.java | Public API, mirrors netcode_generate_connect_token |
| Tests | JUnit 5 | Round-trip encode/decode tests for each token and packet type |
| Benchmark | JMH (TokenPacketCodecBenchmark) | Token decode < 100 ns, packet decode < 100 ns |

Phase 3 - Transport

| Task | Files | Notes |
|---|---|---|
| UDP transport | UdpTransport.java | Non-blocking DatagramChannel + Selector, IPv4 and IPv6 |
| Transport interface | TransportOverride.java | For in-process test overrides (mirrors override_send_and_receive) |
| NetworkSimulator | NetworkSimulator.java | Optional latency/jitter/loss (test use only) |
| Tests | JUnit 5 integration | Loopback in-process (no real network in unit tests) |

Phase 4 - Server

| Task | Files | Notes |
|---|---|---|
| EncryptionManager | EncryptionManager.java | Parallel arrays: address[], send_key[], receive_key[], timeout[], expire_time[], last_access_time[] |
| ConnectTokenEntryTable | ConnectTokenEntryTable.java | 2048-entry MAC + address table |
| NetcodeServer | NetcodeServer.java | Full server state machine (start, update, stop, send/receive per client) |
| ServerConfig | ServerConfig.java | protocol_id, private_key, callbacks |
| Tests | JUnit 5 | Server connect/disconnect/timeout scenarios |
| Benchmark | JMH | Server update loop throughput |

Phase 5 - Client

| Task | Files | Notes |
|---|---|---|
| ClientState | ClientState.java | Enum mirroring C defines |
| NetcodeClient | NetcodeClient.java | Full client state machine |
| ClientConfig | ClientConfig.java | Callbacks, override hooks |
| Tests | JUnit 5 | Client connect flow with in-process server |
| End-to-end test | Integration | client_server.c logic ported to Java test |
| Benchmark | JMH | End-to-end IPC p50 < 5 µs, p99 < 15 µs |

Phase 6 - Agent Integration

| Task | Files | Notes |
|---|---|---|
| ClientAgent | ClientAgent.java | Agrona Agent wrapping NetcodeClient.update() |
| ServerAgent | ServerAgent.java | Agrona Agent wrapping NetcodeServer.update() |
| CachedEpochClock | util/CachedEpochClock.java | Updated once per doWork() tick |
| CPU affinity | AgentLauncher.java | Pin hot agents via OpenHFT Thread Affinity |
| Soak test | Soak test class | Assert zero allocation in steady state |

9. Wire Compatibility Constraints

The Java implementation MUST be byte-compatible with the C reference implementation:

  • Same little-endian encoding of all integers
  • Same packet prefix byte encoding (type in low nibble, seq bytes in high nibble)
  • Same nonce construction (32-bit zero pad || 64-bit sequence for ChaCha20; 24-byte random for XChaCha20)
  • Same associated data construction for AEAD
  • Same connect token structure, byte-for-byte

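Replay tests: record a session from the C reference, replay it in Java, and assert byte-identical output. This only works if every randomized or clock-derived input (connect token nonce and keys, challenge key, timestamps) is pinned to the recorded values.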


10. Key Risks and Mitigations

| Risk | Mitigation |
|---|---|
| XChaCha20-Poly1305 not in JDK | Use Bouncy Castle 1.78+; benchmark overhead vs JDK native ChaCha20-Poly1305 |
| GC pressure from crypto allocations | Pre-allocate key and nonce byte arrays; reuse cipher objects per thread |
| DatagramChannel receive latency | Poll the non-blocking DatagramChannel directly (receive() returns null when no datagram is queued) or use Selector.selectNow(); consider kernel-bypass (future) |
| Replay protection correctness | Exact port of C logic with 256-entry long[]; property test with random sequences |
| Token expiry timestamp drift | Inject EpochClock; do NOT call System.currentTimeMillis() per packet |
| Loopback / in-process client-server for tests | Implement TransportOverride interface (mirrors C's override_send_and_receive) |
| Connect token replay across restarts | In-memory table; document that it does not survive a server restart (same as C reference) |

11. Constants Reference

// Buffer sizes
CONNECT_TOKEN_BYTES         = 2048
KEY_BYTES                   = 32
MAC_BYTES                   = 16
USER_DATA_BYTES             = 256
MAX_SERVERS_PER_CONNECT     = 32
CONNECT_TOKEN_NONCE_BYTES   = 24
CONNECT_TOKEN_PRIVATE_BYTES = 1024
CHALLENGE_TOKEN_BYTES       = 300
VERSION_INFO_BYTES          = 13
VERSION_INFO                = "NETCODE 1.02\0"  (13 bytes)
MAX_PACKET_BYTES            = 1300
MAX_PAYLOAD_BYTES           = 1200
PACKET_QUEUE_SIZE           = 256   // power-of-two
REPLAY_PROTECTION_BUFFER    = 256   // power-of-two
MAX_CLIENTS                 = 256
CLIENT_MAX_RECEIVE_PACKETS  = 64
SERVER_MAX_RECEIVE_PACKETS  = 64 * 256

// Send rates
PACKET_SEND_RATE            = 10.0  // Hz
NUM_DISCONNECT_PACKETS      = 10    // redundant disconnect sends

// Socket buffers
CLIENT_SOCKET_SNDBUF        = 256 * 1024
CLIENT_SOCKET_RCVBUF        = 256 * 1024
SERVER_SOCKET_SNDBUF        = 4 * 1024 * 1024
SERVER_SOCKET_RCVBUF        = 4 * 1024 * 1024

12. Mermaid - Client State Machine

stateDiagram-v2
    [*] --> DISCONNECTED

    DISCONNECTED --> SENDING_CONNECTION_REQUEST : connect(token)
    DISCONNECTED --> INVALID_CONNECT_TOKEN : bad token

    SENDING_CONNECTION_REQUEST --> SENDING_CONNECTION_RESPONSE : recv CHALLENGE
    SENDING_CONNECTION_REQUEST --> CONNECTION_REQUEST_TIMED_OUT : timeout
    SENDING_CONNECTION_REQUEST --> CONNECTION_DENIED : recv DENIED
    SENDING_CONNECTION_REQUEST --> CONNECT_TOKEN_EXPIRED : token expired
    SENDING_CONNECTION_REQUEST --> SENDING_CONNECTION_REQUEST : try next server

    SENDING_CONNECTION_RESPONSE --> CONNECTED : recv KEEP_ALIVE
    SENDING_CONNECTION_RESPONSE --> CONNECTION_RESPONSE_TIMED_OUT : timeout
    SENDING_CONNECTION_RESPONSE --> CONNECTION_DENIED : recv DENIED
    SENDING_CONNECTION_RESPONSE --> CONNECT_TOKEN_EXPIRED : token expired

    CONNECTED --> DISCONNECTED : recv DISCONNECT or app disconnect
    CONNECTED --> CONNECTION_TIMED_OUT : no packet for timeout_seconds

13. Mermaid - Server Connection Request Flow

sequenceDiagram
    participant C as Client
    participant S as Server
    participant B as Web Backend

    B->>C: connect_token (2048 bytes, HTTPS)
    C->>S: CONNECTION_REQUEST (1078 bytes, unencrypted)
    S->>S: decrypt private token, validate
    S->>S: add encryption mapping (addr -> keys)
    S->>C: CONNECTION_CHALLENGE (encrypted, 308-byte payload)
    C->>S: CONNECTION_RESPONSE (encrypted, 308-byte payload)
    S->>S: decrypt challenge token, assign client slot
    S->>C: CONNECTION_KEEP_ALIVE (encrypted, 8-byte payload)
    C->>S: CONNECTION_PAYLOAD / KEEP_ALIVE (steady state)
    S->>C: CONNECTION_PAYLOAD / KEEP_ALIVE (steady state)

14. Implementation Notes

14.1 Crypto Latency (Phase 2 observation)

Problem: with the standard BC lightweight API, every encrypt/decrypt call re-initialises ChaCha20Poly1305 through init(). Because the nonce changes with every packet, each call constructs new parameter objects (KeyParameter, AEADParameters). This is unavoidable with that API.

Measured overhead (quickBench, laptop, JDK 21.0.11-ea, TokenPacketCodecBenchmark):

| Operation | Measured | Path |
|---|---|---|
| Challenge token decrypt | ~2,286 ns | handshake only |
| Packet encode (256-byte payload) | ~1,706 ns | per payload packet |
| Packet decode + encrypt (round-trip) | ~4,327 ns | per payload packet |
| Keep-alive packet encode | ~485 ns | steady-state per client |
| Connect token private decrypt | ~6,395 ns | once per connect |
| XChaCha20 token generation | ~5,921 ns | web backend only |

Note: All benchmark methods are allocation-free in steady state. Previous measurements were inflated by Arrays.copyOf() and new boolean[] / new ReadResult() inside the benchmark body. Fixed in TokenPacketCodecBenchmark by pre-allocating all work buffers at @Setup(Level.Trial).

Why this is acceptable for Phase 2:

  • The design doc states crypto is "NOT on the hot path per packet" for handshake operations (connection request/challenge/response). These paths run at most once per client connection.
  • For steady-state payload packets (CONNECTION_PAYLOAD, CONNECTION_KEEP_ALIVE), the ~1,706 ns encode cost is dominated by the BC KeyParameter allocation. This sits above the < 80 ns ring buffer publish target but is acceptable for a software-crypto baseline.

Resolution (ADR-001, 2026-06-08):

DirectChacha20Poly1305Engine is now the default engine. new NetcodeCrypto() (used by both NetcodeServer and NetcodeClient) constructs a DirectChacha20Poly1305Engine - zero allocation in steady state. See docs/decisions/adr-001-crypto-engine-default-and-codec-budget.md.

  1. DirectChacha20Poly1305Engine (default): zero allocation, pure Java RFC 8439.
    • 256-byte encrypt ~954 ns, decrypt ~371 ns.
  2. JdkChacha20Poly1305Engine (via NetcodeCrypto.withJdkEngine()): one IvParameterSpec alloc per call, faster for large (1200-byte) payloads: 1200-byte ~2,233 ns vs direct ~3,482 ns.
  3. Kernel-bypass crypto (DPDK + hardware AES-GCM offload): deferred to separate ADR. AES-GCM is not wire-compatible with netcode's ChaCha20-Poly1305 (see section 9).

Crypto engine comparison (quickBench, CryptoEngineBenchmark, JDK 21):

| Engine | 8-byte encrypt | 256-byte encrypt | 1200-byte encrypt | 256-byte decrypt |
|---|---|---|---|---|
| BC (BcChacha20Poly1305Engine) | 445 ns | 1,722 ns | 5,985 ns | 2,385 ns |
| Direct (DirectChacha20Poly1305Engine) | 281 ns | 856 ns | 3,425 ns | 336 ns |
| JDK (JdkChacha20Poly1305Engine) | 488 ns | 794 ns | 2,181 ns | 1,434 ns |

XChaCha20-Poly1305 implementation note: BC 1.78.1 does not include a standalone XChaCha20Poly1305 AEAD mode class. The implementation uses HChaCha20 (the ChaCha20 core, 20 rounds, no final state addition) to derive a 32-byte subkey from key and nonce[0:16]. It then delegates to ChaCha20Poly1305 with the subkey and [0x00000000 || nonce[16:24]] as the 12-byte nonce. This matches libsodium crypto_aead_xchacha20poly1305_ietf exactly. Verified against the HChaCha20 test vector in the IETF draft draft-irtf-cfrg-xchacha (key 0x0001...1f, expected output 0x82413b42...).
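The HChaCha20 step described above can be sketched in plain Java and checked against the draft's test vector. This is an illustrative, allocating version, fine at token rate, and not the project's engine. The class and method names are hypothetical. The resulting subkey and 12-byte nonce would then be passed to any ChaCha20-Poly1305 implementation.

```java
/** HChaCha20 subkey derivation and the XChaCha20 -> IETF nonce mapping (sketch; names hypothetical). */
final class XChaChaSubkey {
    private static int le32(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8)
                | ((b[off + 2] & 0xFF) << 16) | ((b[off + 3] & 0xFF) << 24);
    }

    private static void quarterRound(int[] s, int a, int b, int c, int d) {
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = Integer.rotateLeft(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = Integer.rotateLeft(s[b] ^ s[c], 7);
    }

    /** 32-byte subkey from a 32-byte key and 16 nonce bytes starting at nonceOffset. */
    static byte[] hchacha20(byte[] key, byte[] nonce, int nonceOffset) {
        int[] s = new int[16];
        s[0] = 0x61707865; s[1] = 0x3320646e; s[2] = 0x79622d32; s[3] = 0x6b206574; // "expand 32-byte k"
        for (int i = 0; i < 8; i++) s[4 + i] = le32(key, 4 * i);
        for (int i = 0; i < 4; i++) s[12 + i] = le32(nonce, nonceOffset + 4 * i);
        for (int i = 0; i < 10; i++) { // 10 double rounds = 20 rounds
            quarterRound(s, 0, 4, 8, 12); quarterRound(s, 1, 5, 9, 13);   // columns
            quarterRound(s, 2, 6, 10, 14); quarterRound(s, 3, 7, 11, 15);
            quarterRound(s, 0, 5, 10, 15); quarterRound(s, 1, 6, 11, 12); // diagonals
            quarterRound(s, 2, 7, 8, 13); quarterRound(s, 3, 4, 9, 14);
        }
        // No final addition of the input state: emit words 0..3 and 12..15, little-endian.
        byte[] out = new byte[32];
        for (int i = 0; i < 8; i++) {
            int w = s[i < 4 ? i : i + 8];
            for (int j = 0; j < 4; j++) out[4 * i + j] = (byte) (w >>> (8 * j));
        }
        return out;
    }

    /** 12-byte ChaCha20-Poly1305 nonce: 0x00000000 || nonce24[16..24). */
    static byte[] ietfNonce(byte[] nonce24) {
        byte[] n = new byte[12];
        System.arraycopy(nonce24, 16, n, 4, 8);
        return n;
    }
}
```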