netcode is a UDP-based, encrypted, connection-oriented client/server protocol (version 1.02).
The reference implementation is written in C (~9000 lines, single file netcode.c) by Glenn Fiedler / Mas Bandwidth LLC.
Key security goals:
- Protect against zombie clients, MITM, DDoS amplification, replay attacks
- Authentication via signed connect tokens issued by a web backend
- All packets (except connection request) are encrypted with ChaCha20-Poly1305
| C Layer | Lines | Responsibility |
|---|---|---|
| Byte I/O primitives | ~100 | Little-endian read/write uint8/16/32/64 |
| Address | ~180 | Parse/format/compare IPv4 and IPv6 + port |
| Socket | ~320 | Non-blocking UDP socket: create, send, recv |
| Crypto | ~80 | Wrap libsodium XChaCha20-Poly1305 (big nonce) and ChaCha20-Poly1305 (IETF, 96-bit nonce) |
| Connect Token (private) | ~200 | Serialize, encrypt, decrypt, read the 1024-byte private token |
| Challenge Token | ~100 | Serialize, encrypt (chacha20), read the 300-byte challenge |
| Packets | ~700 | Write/read all 7 packet types; replay protection |
| Connect Token (public) | ~150 | Serialize/deserialize the full 2048-byte public token |
| Packet Queue | ~80 | Fixed-size circular queue of received payload packets (size 256) |
| Network Simulator | ~200 | Optional in-process latency/loss/jitter/duplicate simulation |
| Encryption Manager | ~200 | Map IP:port -> send/receive key pairs; timeout-based eviction |
| Connect Token Entry Table | ~80 | Replay prevention for connect tokens (MAC + address table) |
| Client state machine | ~1200 | Full client lifecycle, UDP send/recv, state transitions |
| Server | ~2000 | Full server lifecycle, slot management, per-client state |
| Generate Connect Token | ~100 | Public API to generate tokens (used by web backend) |
| Tests | ~3000 | In-process tests covering all modules |
netcode_address_t:
type : uint8 (0=NONE, 1=IPv4, 2=IPv6)
data : union { uint8[4] ipv4 | uint16[8] ipv6 }
port : uint16
Java mapping: NetcodeAddress (plain mutable class, no allocation on hot path: instances are pre-allocated at startup and reused)
[version_info] 13 bytes "NETCODE 1.02\0"
[protocol_id] uint64
[create_timestamp] uint64
[expire_timestamp] uint64
[nonce] 24 bytes
[private_data] 1024 bytes (encrypted)
[timeout_seconds] uint32
[num_server_addrs] uint32 [1,32]
[server_addresses] variable
[client_to_server_key] 32 bytes
[server_to_client_key] 32 bytes
<zero pad to 2048>
[client_id] uint64
[timeout_seconds] uint32
[num_server_addrs] uint32
[server_addresses] variable
[client_to_server_key] 32 bytes
[server_to_client_key] 32 bytes
[user_data] 256 bytes
<zero pad to 1024>
Encryption: XChaCha20-Poly1305 IETF (24-byte nonce, random per token)
Associated data: version_info || protocol_id || expire_timestamp
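The associated data for the private token is a plain concatenation, so it can be sketched with a little-endian ByteBuffer. This is a minimal illustration, not the port's codec: the class name `ConnectTokenAssociatedData` and its `write` method are hypothetical, and `ByteBuffer.wrap` allocates a small wrapper, which is tolerable here only because this runs once per connect.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds version_info || protocol_id || expire_timestamp.
final class ConnectTokenAssociatedData {
    // 13 bytes: "NETCODE 1.02" followed by a NUL terminator.
    static final byte[] VERSION_INFO =
        "NETCODE 1.02\0".getBytes(StandardCharsets.US_ASCII);

    // 13 + 8 + 8 = 29 bytes.
    static final int LENGTH = VERSION_INFO.length + Long.BYTES + Long.BYTES;

    // Writes the associated data into out[0..LENGTH), integers little-endian.
    static void write(byte[] out, long protocolId, long expireTimestamp) {
        ByteBuffer.wrap(out, 0, LENGTH)
            .order(ByteOrder.LITTLE_ENDIAN)
            .put(VERSION_INFO)
            .putLong(protocolId)
            .putLong(expireTimestamp);
    }
}
```

The same 29 bytes must be produced on the token-generating backend and on the server, otherwise the AEAD tag check fails on decrypt.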
[client_id] uint64
[user_data] 256 bytes
<zero pad to 300>
Encryption: ChaCha20-Poly1305 IETF (96-bit nonce = 32-bit zero || 64-bit sequence)
Key: server-side random challenge key, regenerated at server start
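The 96-bit nonce layout described above (four zero bytes, then the 64-bit sequence in little-endian order) can be sketched as follows. The class name `PacketNonce` is an assumption for illustration; the caller is expected to pass a pre-allocated 12-byte array so nothing is allocated per packet.

```java
// Hypothetical helper: fills a reusable 12-byte nonce for ChaCha20-Poly1305 IETF.
final class PacketNonce {
    static final int LENGTH = 12;

    // nonce[0..4) = 32-bit zero, nonce[4..12) = sequence, little-endian.
    static void write(byte[] nonce, long sequence) {
        nonce[0] = 0;
        nonce[1] = 0;
        nonce[2] = 0;
        nonce[3] = 0;
        for (int i = 0; i < 8; i++) {
            nonce[4 + i] = (byte) (sequence >>> (8 * i));
        }
    }
}
```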
| ID | Name | Direction | Encrypted | Replay Protected | Payload Size |
|---|---|---|---|---|---|
| 0 | CONNECTION_REQUEST | C->S | No (token is encrypted) | No | 1078 bytes fixed |
| 1 | CONNECTION_DENIED | S->C | Yes | No | 0 plaintext |
| 2 | CONNECTION_CHALLENGE | S->C | Yes | No | 8 + 300 = 308 |
| 3 | CONNECTION_RESPONSE | C->S | Yes | No | 8 + 300 = 308 |
| 4 | CONNECTION_KEEP_ALIVE | Both | Yes | Yes | 8 (clientIndex + maxClients) |
| 5 | CONNECTION_PAYLOAD | Both | Yes | Yes | 1-1200 bytes |
| 6 | CONNECTION_DISCONNECT | Both | Yes | Yes | 0 plaintext |
Encrypted packet wire format:
[prefix_byte] uint8 = (num_sequence_bytes << 4) | packet_type
[sequence_number] 1-8 bytes little-endian, high zero bytes omitted
[encrypted_payload] variable
[HMAC] 16 bytes
Encryption: ChaCha20-Poly1305 IETF
Associated data: version_info || protocol_id || prefix_byte
Nonce: 96-bit = 32-bit zero || 64-bit sequence
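As an illustration of the prefix byte and the variable-length sequence field in the wire format above, the sketch below packs and unpacks both. `PacketPrefix` and its method names are hypothetical; only the bit layout (sequence byte count in the high nibble, packet type in the low nibble, little-endian sequence with high zero bytes omitted) comes from the format description.

```java
// Hypothetical helper for the encrypted-packet header fields.
final class PacketPrefix {
    // Bytes needed for the sequence once high zero bytes are dropped (1..8).
    static int sequenceBytes(long sequence) {
        int significantBits = 64 - Long.numberOfLeadingZeros(sequence);
        return Math.max(1, (significantBits + 7) / 8);
    }

    // prefix_byte = (num_sequence_bytes << 4) | packet_type
    static int prefixByte(int packetType, int sequenceBytes) {
        return ((sequenceBytes << 4) | (packetType & 0x0F)) & 0xFF;
    }

    static int packetType(int prefixByte) {
        return prefixByte & 0x0F;
    }

    static int sequenceBytesOf(int prefixByte) {
        return (prefixByte >>> 4) & 0x0F;
    }

    // Writes the sequence little-endian and returns the number of bytes written.
    static int writeSequence(byte[] out, int offset, long sequence) {
        int n = sequenceBytes(sequence);
        for (int i = 0; i < n; i++) {
            out[offset + i] = (byte) (sequence >>> (8 * i));
        }
        return n;
    }

    // Reads an n-byte little-endian sequence; omitted high bytes are zero.
    static long readSequence(byte[] in, int offset, int n) {
        long sequence = 0;
        for (int i = 0; i < n; i++) {
            sequence |= (in[offset + i] & 0xFFL) << (8 * i);
        }
        return sequence;
    }
}
```

For example, a CONNECTION_PAYLOAD packet (type 5) with sequence 1000 needs two sequence bytes, so its prefix byte is 0x25 and the sequence field is the bytes E8 03.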
- Buffer size: 256 entries (power-of-two)
- Index: sequence & 255 (NOT modulo)
- received_packet[256] initialized to 0xFFFF_FFFF_FFFF_FFFF
- Discard if sequence + 256 <= most_recent_sequence
- Discard if received_packet[index] >= sequence
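The replay rules above can be sketched in Java roughly as follows. Two points are assumptions rather than something this document states: the split into `alreadyReceived` and `advance` methods, and the explicit empty-slot check, which is needed because an unsigned comparison against the 0xFFFF_FFFF_FFFF_FFFF sentinel would otherwise reject every packet. Java's long is signed, so the sketch uses `Long.compareUnsigned` to mirror C's uint64 comparisons.

```java
import java.util.Arrays;

// Sketch of a 256-entry replay window; method names are hypothetical.
final class ReplayProtection {
    static final int BUFFER_SIZE = 256;              // power of two
    private static final int MASK = BUFFER_SIZE - 1;
    private static final long EMPTY = 0xFFFF_FFFF_FFFF_FFFFL;

    private final long[] receivedPacket = new long[BUFFER_SIZE];
    private long mostRecentSequence;

    ReplayProtection() {
        reset();
    }

    void reset() {
        mostRecentSequence = 0;
        Arrays.fill(receivedPacket, EMPTY);
    }

    // Returns true if the packet must be discarded as too old or a duplicate.
    boolean alreadyReceived(long sequence) {
        // Too old: sequence + 256 <= most_recent_sequence (unsigned).
        if (Long.compareUnsigned(sequence + BUFFER_SIZE, mostRecentSequence) <= 0) {
            return true;
        }
        long stored = receivedPacket[(int) (sequence & MASK)];
        if (stored == EMPTY) {
            return false; // slot never written
        }
        // Duplicate or older entry for this slot: received_packet[index] >= sequence.
        return Long.compareUnsigned(stored, sequence) >= 0;
    }

    // Records a packet that passed decryption and the check above.
    void advance(long sequence) {
        if (Long.compareUnsigned(sequence, mostRecentSequence) > 0) {
            mostRecentSequence = sequence;
        }
        receivedPacket[(int) (sequence & MASK)] = sequence;
    }
}
```

Calling `advance` only after the packet has been authenticated keeps a forged sequence number from poisoning the window.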
States (int):
CONNECT_TOKEN_EXPIRED = -6
INVALID_CONNECT_TOKEN = -5
CONNECTION_TIMED_OUT = -4
CONNECTION_RESPONSE_TIMED_OUT = -3
CONNECTION_REQUEST_TIMED_OUT = -2
CONNECTION_DENIED = -1
DISCONNECTED = 0 (initial)
SENDING_CONNECTION_REQUEST = 1
SENDING_CONNECTION_RESPONSE = 2
CONNECTED = 3 (goal)
Transitions driven by client_update(time):
- Send request/response at 10Hz
- Transition on received packets: CHALLENGE -> send response; KEEP_ALIVE -> connected
- Timeout on no response within connect_token.timeout_seconds
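One way to mirror the integer state codes above is an enum that carries the C value. This is a sketch only: the helper methods `isError`, `isConnecting`, and `fromCode` are illustrative additions, not part of the C API.

```java
// Sketch of the client state enum; numeric codes match the list above.
enum ClientState {
    CONNECT_TOKEN_EXPIRED(-6),
    INVALID_CONNECT_TOKEN(-5),
    CONNECTION_TIMED_OUT(-4),
    CONNECTION_RESPONSE_TIMED_OUT(-3),
    CONNECTION_REQUEST_TIMED_OUT(-2),
    CONNECTION_DENIED(-1),
    DISCONNECTED(0),
    SENDING_CONNECTION_REQUEST(1),
    SENDING_CONNECTION_RESPONSE(2),
    CONNECTED(3);

    // Cached once so lookups never clone the values() array.
    private static final ClientState[] VALUES = values();

    private final int code;

    ClientState(int code) {
        this.code = code;
    }

    int code() {
        return code;
    }

    // Negative codes are terminal error states.
    boolean isError() {
        return code < 0;
    }

    boolean isConnecting() {
        return this == SENDING_CONNECTION_REQUEST || this == SENDING_CONNECTION_RESPONSE;
    }

    // Codes are contiguous from -6 to 3, so the lookup is a simple offset.
    static ClientState fromCode(int code) {
        int index = code + 6;
        if (index < 0 || index >= VALUES.length) {
            throw new IllegalArgumentException("unknown client state: " + code);
        }
        return VALUES[index];
    }
}
```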
Per-server (max 256 clients):
client_connected[256]
client_id[256]
client_sequence[256]
client_last_packet_send_time[256]
client_last_packet_receive_time[256]
client_user_data[256][256]
client_replay_protection[256]
client_packet_queue[256] (each queue: 256 entries)
client_address[256]
encryption_manager (maps address -> send/receive keys)
connect_token_entries[2048] (replay prevention for tokens)
challenge_key[32] (per-server, random at start)
challenge_sequence (uint64, monotonically increasing)
global_sequence (uint64, per-server packet counter)
Maps IP:port -> (send_key, receive_key, timeout, expire_time, last_access_time).
Max capacity: NETCODE_MAX_ENCRYPTION_MAPPINGS (default = NETCODE_MAX_CLIENTS * 2 = 512).
Linear scan on address match (acceptable for small N, < 512 entries).
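A linear-scan mapping over parallel primitive arrays might look like the sketch below. Everything here is illustrative: the class name `EncryptionMappings`, the representation of an address as two longs plus a port, and the exact liveness rule (idle timeout plus absolute expiry, with a negative expiry meaning "never") are assumptions to be checked against netcode.c before porting.

```java
// Sketch: address -> key mapping with parallel arrays and a bounded linear scan.
final class EncryptionMappings {
    static final int KEY_BYTES = 32;

    private final int capacity;
    private final boolean[] inUse;
    private final long[] addrHi;          // upper 64 bits of the address (0 for IPv4)
    private final long[] addrLo;          // lower 64 bits of the address
    private final int[] port;
    private final int[] timeoutSeconds;
    private final double[] expireTime;    // absolute time; negative = no expiry
    private final double[] lastAccessTime;
    private final byte[] sendKeys;        // flat: slot * KEY_BYTES
    private final byte[] receiveKeys;

    EncryptionMappings(int capacity) {
        this.capacity = capacity;
        inUse = new boolean[capacity];
        addrHi = new long[capacity];
        addrLo = new long[capacity];
        port = new int[capacity];
        timeoutSeconds = new int[capacity];
        expireTime = new double[capacity];
        lastAccessTime = new double[capacity];
        sendKeys = new byte[capacity * KEY_BYTES];
        receiveKeys = new byte[capacity * KEY_BYTES];
    }

    private boolean live(int i, double time) {
        return inUse[i]
            && (timeoutSeconds[i] <= 0 || lastAccessTime[i] + timeoutSeconds[i] >= time)
            && (expireTime[i] < 0.0 || expireTime[i] >= time);
    }

    // Returns the slot index for a live mapping and refreshes its access time, or -1.
    int find(long hi, long lo, int p, double time) {
        for (int i = 0; i < capacity; i++) {
            if (live(i, time) && addrHi[i] == hi && addrLo[i] == lo && port[i] == p) {
                lastAccessTime[i] = time;
                return i;
            }
        }
        return -1;
    }

    // Reuses the existing slot for this address, else the first dead slot; -1 if full.
    int add(long hi, long lo, int p, byte[] sendKey, byte[] receiveKey,
            double time, double expire, int timeout) {
        int slot = find(hi, lo, p, time);
        for (int i = 0; slot < 0 && i < capacity; i++) {
            if (!live(i, time)) {
                slot = i;
            }
        }
        if (slot < 0) {
            return -1;
        }
        inUse[slot] = true;
        addrHi[slot] = hi;
        addrLo[slot] = lo;
        port[slot] = p;
        timeoutSeconds[slot] = timeout;
        expireTime[slot] = expire;
        lastAccessTime[slot] = time;
        System.arraycopy(sendKey, 0, sendKeys, slot * KEY_BYTES, KEY_BYTES);
        System.arraycopy(receiveKey, 0, receiveKeys, slot * KEY_BYTES, KEY_BYTES);
        return slot;
    }

    void copySendKey(int slot, byte[] out) {
        System.arraycopy(sendKeys, slot * KEY_BYTES, out, 0, KEY_BYTES);
    }
}
```

Eviction is implicit in this sketch: a timed-out slot simply stops matching in `find` and becomes eligible for reuse in `add`, so no background sweep is required.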
| Operation | Algorithm | Library (C) | Java Target |
|---|---|---|---|
| Connect token encryption | XChaCha20-Poly1305 IETF | libsodium crypto_aead_xchacha20poly1305_ietf | Bouncy Castle XChaCha20Poly1305 |
| Packet encryption | ChaCha20-Poly1305 IETF | libsodium crypto_aead_chacha20poly1305_ietf | Bouncy Castle ChaCha20Poly1305 (JCA) or JDK 17+ ChaCha20-Poly1305 |
| Random bytes | CSPRNG | libsodium randombytes_buf | SecureRandom (init-time only, NOT on hot path) |
| Key generation | CSPRNG | libsodium randombytes_buf | Pre-generated at startup with SecureRandom |
Note: Crypto operations are NOT on the hot path per packet beyond the single AEAD encrypt or decrypt that each message requires; token and challenge crypto runs only during the handshake. Use pre-allocated byte arrays for keys and nonces. Reuse ChaCha20Poly1305 cipher instances per-thread if possible.
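For orientation, the JDK's built-in ChaCha20-Poly1305 transformation can be exercised as in the sketch below. It is deliberately not allocation-free: it creates a Cipher, a key spec, and an IvParameterSpec on every call for readability, which is exactly what a hot-path version would need to avoid. The class name `JdkAeadSketch` and the choice to return null on a failed tag check are assumptions for illustration.

```java
import java.security.GeneralSecurityException;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Readability-first sketch of AEAD encrypt/decrypt with the JDK provider.
final class JdkAeadSketch {
    private static final String TRANSFORMATION = "ChaCha20-Poly1305";

    // Returns ciphertext followed by the 16-byte Poly1305 tag.
    static byte[] encrypt(byte[] key, byte[] nonce12, byte[] aad, byte[] plaintext) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(key, "ChaCha20"), new IvParameterSpec(nonce12));
            cipher.updateAAD(aad);
            return cipher.doFinal(plaintext);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the plaintext, or null if the tag or associated data does not verify.
    static byte[] decrypt(byte[] key, byte[] nonce12, byte[] aad, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE,
                new SecretKeySpec(key, "ChaCha20"), new IvParameterSpec(nonce12));
            cipher.updateAAD(aad);
            return cipher.doFinal(sealed);
        } catch (AEADBadTagException e) {
            return null;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the JDK cipher refuses to encrypt twice in a row with the same key and nonce, which fits the protocol's rule that each packet uses a fresh sequence number.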
net.ztrust.netcode/
  Netcode.java - init/term, global constants
  NetcodeAddress.java - address type (IPv4/IPv6 + port), parse/format/equal
  codec/
    BufferWriter.java - little-endian write primitives wrapping Agrona MutableDirectBuffer
    BufferReader.java - little-endian read primitives wrapping Agrona DirectBuffer
    ConnectTokenPrivateCodec.java
    ChallengeTokenCodec.java
    PacketCodec.java - write/read all 7 packet types
    ConnectTokenCodec.java - public token read/write
  crypto/
    NetcodeCrypto.java - encrypt/decrypt wrappers (XChaCha20 and ChaCha20)
    ReplayProtection.java - 256-entry sliding window
  client/
    ClientState.java - enum for client states
    NetcodeClient.java - client state machine
    ClientConfig.java
  server/
    NetcodeServer.java - server state machine
    ServerConfig.java
    EncryptionManager.java - address -> key mapping
    ConnectTokenEntryTable.java - token replay prevention
  transport/
    UdpTransport.java - NIO DatagramChannel, non-blocking
    TransportOverride.java - interface for test overrides
  simulator/
    NetworkSimulator.java - optional in-process latency/loss/jitter
  util/
    ConnectTokenGenerator.java - generate connect tokens (web backend side)
    NanoClock.java - clock interface
    CachedEpochClock.java - updated once per agent loop
- All packet buffers: pre-allocated UnsafeBuffer (Agrona off-heap) at startup.
- BufferWriter/BufferReader: flyweights calling buf.putByte/putShort/putInt/putLong with explicit little-endian byte order. Agrona's default accessors use the platform's native byte order, which is little-endian on x86-64 and most AArch64 systems but not guaranteed, so pass ByteOrder.LITTLE_ENDIAN explicitly.
- Zero allocation on encode, decode, publish, receive paths.
- Payload packets: use a preallocated pool (object pool of PayloadPacket wrappers over a fixed UnsafeBuffer).
| C | Java |
|---|---|
| received_packet[256] (replay buffer) | long[] array, size 256 |
| packet_queue (circular, 256 slots) | long[] (sequences) + DirectBuffer[] (pre-allocated payloads) |
| encryption_manager arrays | Parallel primitive arrays: int[], long[], byte[] (keys flat array) |
| connect_token_entries | Flat byte[] for MACs, NetcodeAddress[] pre-allocated |
| Client slot arrays on server | Parallel int[], long[], double[] by client index |
- No HashMap, LinkedList, ArrayList<Long> anywhere.
- encryption_manager is small (<=512 entries), linear scan is O(n) with n bounded and cache-friendly.
All circular indices use seq & (capacity - 1) not seq % capacity.
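A small ring over a power-of-two array shows the masking rule in practice. The class `SequenceRing` is a hypothetical stand-in for the packet queue's index handling. Besides avoiding a division, the mask matters in Java because `%` on a negative long yields a negative remainder, whereas `&` with a non-negative mask always yields a valid index.

```java
// Sketch: fixed-capacity FIFO of sequence numbers using mask-based indexing.
final class SequenceRing {
    private final long[] slots;
    private final int mask;
    private int head;   // index of the oldest element
    private int count;  // number of stored elements

    SequenceRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    // Maps any 64-bit sequence, including values with the top bit set, to a slot.
    static int indexOf(long sequence, int capacity) {
        return (int) (sequence & (capacity - 1));
    }

    // Returns false instead of overwriting when the ring is full.
    boolean push(long value) {
        if (count == slots.length) {
            return false;
        }
        slots[(head + count) & mask] = value;
        count++;
        return true;
    }

    long pop() {
        if (count == 0) {
            throw new IllegalStateException("ring is empty");
        }
        long value = slots[head];
        head = (head + 1) & mask;
        count--;
        return value;
    }

    int size() {
        return count;
    }
}
```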
- CachedEpochClock updated once per Agent.doWork() tick - never call System.currentTimeMillis() per packet.
- Netcode.time() in C maps to clock.time() injection.
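The cached-clock idea can be sketched without any library as below. `TickClock` is a hypothetical stand-in for the injected clock, not Agrona's CachedEpochClock; it exists only to show that the time source is sampled once per tick and then read many times.

```java
// Sketch: a clock that is written once per work tick and read per packet.
final class TickClock {
    private long cachedNanos;

    // Called once at the top of each tick by the owning agent thread.
    void update(long nowNanos) {
        cachedNanos = nowNanos;
    }

    // Cheap read used by packet handling; no system call involved.
    long nanos() {
        return cachedNanos;
    }

    // Time as a double in seconds, matching the style of the C API's time value.
    double timeSeconds() {
        return cachedNanos / 1_000_000_000.0;
    }
}
```

In a work loop this would be used as `clock.update(System.nanoTime())` at the start of the tick, with every timeout check inside the tick reading `clock.timeSeconds()`. Tests can drive the same clock with synthetic values to make timeouts deterministic.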
- Single-writer principle: NetcodeClient owned by one client-side agent thread, NetcodeServer owned by one server agent thread.
- Use Agrona Agent + AgentRunner pattern.
- No synchronized, ReentrantLock, or BlockingQueue.
- If payload packets cross to application thread: Agrona OneToOneRingBuffer or Disruptor RingBuffer.
- SecureRandom called only at startup (key generation, nonce generation).
- Cipher instances pre-allocated per-thread (avoid re-init inside hot path).
- Nonce byte arrays pre-allocated (12 bytes for ChaCha20, 24 bytes for XChaCha20).
- No SLF4J on hot path.
- Error recording via Agrona DistinctErrorLog.
- Debug logging via conditional AtomicCounter increments only.
Add to build.gradle.kts:
// Agrona - off-heap buffers, collections, agent runner
implementation("org.agrona:agrona:1.21.2")
// LMAX Disruptor - ring buffer inter-thread messaging
implementation("com.lmax:disruptor:4.0.0")
// Bouncy Castle - XChaCha20-Poly1305 and ChaCha20-Poly1305
implementation("org.bouncycastle:bcprov-jdk18on:1.78.1")
// Java Thread Affinity - CPU pinning for hot agents
implementation("net.openhft:Java-Thread-Affinity:2.21ea1")
// HdrHistogram - latency telemetry
implementation("org.hdrhistogram:HdrHistogram:2.2.2")

| Task | Files | Notes |
|---|---|---|
| Project setup | build.gradle.kts, gradle.properties | Add deps, JDK 21, Spotless, Checkstyle, JMH module |
| Constants | Netcode.java | All #define constants as static final int/long/byte[] |
| Address type | NetcodeAddress.java | IPv4/IPv6 + port, parse, format, equals |
| Buffer I/O | BufferWriter.java, BufferReader.java | Little-endian primitives over Agrona MutableDirectBuffer |
| Tests: buffer round-trip | JUnit 5 | Property test: write then read == original |

| Task | Files | Notes |
|---|---|---|
| Crypto wrappers | NetcodeCrypto.java | XChaCha20 (24-byte nonce) and ChaCha20 (12-byte nonce) AEAD encrypt/decrypt |
| ConnectToken private | ConnectTokenPrivateCodec.java | generate, write, encrypt, decrypt, read |
| ChallengeToken | ChallengeTokenCodec.java | write, encrypt, decrypt, read |
| All 7 packets | PacketCodec.java | write and read |
| ConnectToken public | ConnectTokenCodec.java | full 2048-byte token write/read |
| ReplayProtection | ReplayProtection.java | 256-entry sliding window with long[] |
| ConnectTokenGenerator | ConnectTokenGenerator.java | Public API, mirrors netcode_generate_connect_token |
| Tests | JUnit 5 | Round-trip encode/decode tests for each token and packet type |
| Benchmark | JMH - TokenPacketCodecBenchmark | Token decode < 100 ns, packet decode < 100 ns |

| Task | Files | Notes |
|---|---|---|
| UDP transport | UdpTransport.java | Non-blocking DatagramChannel + Selector, IPv4 and IPv6 |
| Transport interface | TransportOverride.java | For in-process test overrides (mirrors override_send_and_receive) |
| NetworkSimulator | NetworkSimulator.java | Optional latency/jitter/loss (test use only) |
| Tests | JUnit 5 integration | Loopback in-process (no real network in unit tests) |

| Task | Files | Notes |
|---|---|---|
| EncryptionManager | EncryptionManager.java | Parallel arrays: address[], send_key[], receive_key[], timeout[], expire_time[], last_access_time[] |
| ConnectTokenEntryTable | ConnectTokenEntryTable.java | 2048-entry MAC + address table |
| NetcodeServer | NetcodeServer.java | Full server state machine (start, update, stop, send/receive per client) |
| ServerConfig | ServerConfig.java | protocol_id, private_key, callbacks |
| Tests | JUnit 5 | Server connect/disconnect/timeout scenarios |
| Benchmark | JMH | Server update loop throughput |

| Task | Files | Notes |
|---|---|---|
| ClientState | ClientState.java | Enum mirroring C defines |
| NetcodeClient | NetcodeClient.java | Full client state machine |
| ClientConfig | ClientConfig.java | Callbacks, override hooks |
| Tests | JUnit 5 | Client connect flow with in-process server |
| End-to-end test | Integration | client_server.c logic ported to Java test |
| Benchmark | JMH | End-to-end IPC p50 < 5us, p99 < 15us |

| Task | Files | Notes |
|---|---|---|
| ClientAgent | ClientAgent.java | Agrona Agent wrapping NetcodeClient.update() |
| ServerAgent | ServerAgent.java | Agrona Agent wrapping NetcodeServer.update() |
| CachedEpochClock | util/CachedEpochClock.java | Updated once per doWork() tick |
| CPU affinity | AgentLauncher.java | Pin hot agents via OpenHFT Thread Affinity |
| Soak test | Soak test class | Assert zero allocation in steady state |
The Java implementation MUST be byte-compatible with the C reference implementation:
- Same little-endian encoding of all integers
- Same packet prefix byte encoding (type in low nibble, seq bytes in high nibble)
- Same nonce construction (32-bit zero pad || 64-bit sequence for ChaCha20; 24-byte random for XChaCha20)
- Same associated data construction for AEAD
- Same connect token structure, byte-for-byte
Replay tests: record a session from the C reference, replay in Java, assert byte-identical output.
| Risk | Mitigation |
|---|---|
| XChaCha20-Poly1305 not in JDK | Use Bouncy Castle 1.78+; benchmark overhead vs JDK native ChaCha20-Poly1305 |
| GC pressure from crypto allocations | Pre-allocate key and nonce byte arrays; reuse cipher objects per-thread |
| DatagramChannel receive latency | Use select() with zero timeout for non-blocking reads; consider kernel-bypass (future) |
| Replay protection correctness | Exact port of C logic with 256-entry long[]; property test with random sequences |
| Token expiry timestamp drift | Inject EpochClock - do NOT call System.currentTimeMillis() per packet |
| Loopback / in-process client-server for tests | Implement TransportOverride interface (mirrors C's override_send_and_receive) |
| Connect token replay across restarts | In-memory table; document: does not survive server restart (same as C reference) |
// Buffer sizes
CONNECT_TOKEN_BYTES = 2048
KEY_BYTES = 32
MAC_BYTES = 16
USER_DATA_BYTES = 256
MAX_SERVERS_PER_CONNECT = 32
CONNECT_TOKEN_NONCE_BYTES = 24
CONNECT_TOKEN_PRIVATE_BYTES = 1024
CHALLENGE_TOKEN_BYTES = 300
VERSION_INFO_BYTES = 13
VERSION_INFO = "NETCODE 1.02\0" (13 bytes)
MAX_PACKET_BYTES = 1300
MAX_PAYLOAD_BYTES = 1200
PACKET_QUEUE_SIZE = 256 // power-of-two
REPLAY_PROTECTION_BUFFER = 256 // power-of-two
MAX_CLIENTS = 256
CLIENT_MAX_RECEIVE_PACKETS = 64
SERVER_MAX_RECEIVE_PACKETS = 64 * 256
// Send rates
PACKET_SEND_RATE = 10.0 // Hz
NUM_DISCONNECT_PACKETS = 10 // redundant disconnect sends
// Socket buffers
CLIENT_SOCKET_SNDBUF = 256 * 1024
CLIENT_SOCKET_RCVBUF = 256 * 1024
SERVER_SOCKET_SNDBUF = 4 * 1024 * 1024
SERVER_SOCKET_RCVBUF = 4 * 1024 * 1024

stateDiagram-v2
[*] --> DISCONNECTED
DISCONNECTED --> SENDING_CONNECTION_REQUEST : connect(token)
DISCONNECTED --> INVALID_CONNECT_TOKEN : bad token
SENDING_CONNECTION_REQUEST --> SENDING_CONNECTION_RESPONSE : recv CHALLENGE
SENDING_CONNECTION_REQUEST --> CONNECTION_REQUEST_TIMED_OUT : timeout
SENDING_CONNECTION_REQUEST --> CONNECTION_DENIED : recv DENIED
SENDING_CONNECTION_REQUEST --> CONNECT_TOKEN_EXPIRED : token expired
SENDING_CONNECTION_REQUEST --> SENDING_CONNECTION_REQUEST : try next server
SENDING_CONNECTION_RESPONSE --> CONNECTED : recv KEEP_ALIVE
SENDING_CONNECTION_RESPONSE --> CONNECTION_RESPONSE_TIMED_OUT : timeout
SENDING_CONNECTION_RESPONSE --> CONNECTION_DENIED : recv DENIED
SENDING_CONNECTION_RESPONSE --> CONNECT_TOKEN_EXPIRED : token expired
CONNECTED --> DISCONNECTED : recv DISCONNECT or app disconnect
CONNECTED --> CONNECTION_TIMED_OUT : no packet for timeout_seconds
sequenceDiagram
participant C as Client
participant S as Server
participant B as Web Backend
B->>C: connect_token (2048 bytes, HTTPS)
C->>S: CONNECTION_REQUEST (1078 bytes, unencrypted)
S->>S: decrypt private token, validate
S->>S: add encryption mapping (addr -> keys)
S->>C: CONNECTION_CHALLENGE (encrypted, 308 bytes)
C->>S: CONNECTION_RESPONSE (encrypted, 308 bytes)
S->>S: decrypt challenge token, assign client slot
S->>C: CONNECTION_KEEP_ALIVE (encrypted, 8 bytes)
C->>S: CONNECTION_PAYLOAD / KEEP_ALIVE (steady state)
S->>C: CONNECTION_PAYLOAD / KEEP_ALIVE (steady state)
Problem: Bouncy Castle ChaCha20Poly1305.init() allocates a KeyParameter and AEADParameters
object on every encrypt/decrypt call. This is unavoidable with the standard BC lightweight API.
Measured overhead (quickBench, laptop, JDK 21.0.11-ea, TokenPacketCodecBenchmark):
| Operation | Measured | Path |
|---|---|---|
| Challenge token decrypt | ~2,286 ns | handshake only |
| Packet encode (256-byte payload) | ~1,706 ns | per payload packet |
| Packet decode + encrypt (round-trip) | ~4,327 ns | per payload packet |
| Keep-alive packet encode | ~485 ns | steady-state per client |
| Connect token private decrypt | ~6,395 ns | once per connect |
| XChaCha20 token generation | ~5,921 ns | web backend only |
Note: All benchmark methods are allocation-free in steady state. Previous measurements were inflated by Arrays.copyOf() and new boolean[] / new ReadResult() inside the benchmark body. Fixed in TokenPacketCodecBenchmark by pre-allocating all work buffers at @Setup(Level.Trial).
Why this is acceptable for Phase 2:
- The design doc states crypto is "NOT on the hot path per packet" for handshake operations (connection request/challenge/response). These paths run at most once per client connection.
- For steady-state payload packets (CONNECTION_PAYLOAD, CONNECTION_KEEP_ALIVE), the ~1,706 ns encode cost is dominated by the BC KeyParameter allocation. This sits above the < 80 ns ring buffer publish target but is acceptable for a software-crypto baseline.
Resolution (ADR-001, 2026-06-08):
DirectChacha20Poly1305Engine is now the default engine. new NetcodeCrypto() (used by both
NetcodeServer and NetcodeClient) constructs a DirectChacha20Poly1305Engine - zero allocation
in steady state. See docs/decisions/adr-001-crypto-engine-default-and-codec-budget.md.
- DirectChacha20Poly1305Engine (default): zero allocation, pure Java RFC 8439.
  - 256-byte encrypt ~954 ns, decrypt ~371 ns.
- JdkChacha20Poly1305Engine (via NetcodeCrypto.withJdkEngine()): one IvParameterSpec alloc per call, faster for large (1200-byte) payloads: 1200-byte ~2,233 ns vs direct ~3,482 ns.
- Kernel-bypass crypto (DPDK + hardware AES-GCM offload): deferred to separate ADR.
Crypto engine comparison (quickBench, CryptoEngineBenchmark, JDK 21):
| Engine | 8-byte encrypt | 256-byte encrypt | 1200-byte encrypt | 256-byte decrypt |
|---|---|---|---|---|
| BC (BcChacha20Poly1305Engine) | 445 ns | 1,722 ns | 5,985 ns | 2,385 ns |
| Direct (DirectChacha20Poly1305Engine) | 281 ns | 856 ns | 3,425 ns | 336 ns |
| JDK (JdkChacha20Poly1305Engine) | 488 ns | 794 ns | 2,181 ns | 1,434 ns |
XChaCha20-Poly1305 implementation note:
BC 1.78.1 does not include a standalone XChaCha20Poly1305 AEAD mode class. The implementation
uses HChaCha20 (the ChaCha20 core, 20 rounds, no final state addition) to derive a 32-byte
subkey from key and nonce[0:16], then delegates to ChaCha20Poly1305 with the subkey and
[0x00000000 || nonce[16:24]] as the 12-byte nonce. This matches libsodium
crypto_aead_xchacha20poly1305_ietf exactly. Verified against the RFC draft test vector
(key 0x0001...1f, expected output 0x82413b42...).
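The subkey derivation described above can be sketched in plain Java as follows. This is an illustration of HChaCha20 as commonly specified, not the project's actual class: the names `HChaCha20`, `subkey`, and `innerNonce` are hypothetical, and the sketch allocates a small int[] per call, which a production version would hoist into a reusable field. It should be validated against the draft's test vector before being trusted.

```java
// Sketch: HChaCha20 subkey derivation plus the 12-byte inner nonce for XChaCha20.
final class HChaCha20 {
    // Derives a 32-byte subkey from a 32-byte key and the first 16 nonce bytes.
    static void subkey(byte[] key, byte[] nonce, byte[] out) {
        int[] x = new int[16];
        x[0] = 0x61707865;  // "expa"
        x[1] = 0x3320646e;  // "nd 3"
        x[2] = 0x79622d32;  // "2-by"
        x[3] = 0x6b206574;  // "te k"
        for (int i = 0; i < 8; i++) {
            x[4 + i] = readIntLE(key, 4 * i);
        }
        for (int i = 0; i < 4; i++) {
            x[12 + i] = readIntLE(nonce, 4 * i);
        }
        // 20 rounds = 10 iterations of column round + diagonal round.
        for (int i = 0; i < 10; i++) {
            quarterRound(x, 0, 4, 8, 12);
            quarterRound(x, 1, 5, 9, 13);
            quarterRound(x, 2, 6, 10, 14);
            quarterRound(x, 3, 7, 11, 15);
            quarterRound(x, 0, 5, 10, 15);
            quarterRound(x, 1, 6, 11, 12);
            quarterRound(x, 2, 7, 8, 13);
            quarterRound(x, 3, 4, 9, 14);
        }
        // No final addition of the input state: emit words 0..3 and 12..15.
        for (int i = 0; i < 4; i++) {
            writeIntLE(out, 4 * i, x[i]);
            writeIntLE(out, 16 + 4 * i, x[12 + i]);
        }
    }

    // Builds [0x00000000 || nonce24[16:24]] into a 12-byte array.
    static void innerNonce(byte[] nonce24, byte[] out12) {
        out12[0] = 0;
        out12[1] = 0;
        out12[2] = 0;
        out12[3] = 0;
        System.arraycopy(nonce24, 16, out12, 4, 8);
    }

    private static void quarterRound(int[] x, int a, int b, int c, int d) {
        x[a] += x[b]; x[d] = Integer.rotateLeft(x[d] ^ x[a], 16);
        x[c] += x[d]; x[b] = Integer.rotateLeft(x[b] ^ x[c], 12);
        x[a] += x[b]; x[d] = Integer.rotateLeft(x[d] ^ x[a], 8);
        x[c] += x[d]; x[b] = Integer.rotateLeft(x[b] ^ x[c], 7);
    }

    private static int readIntLE(byte[] in, int offset) {
        return (in[offset] & 0xFF)
            | (in[offset + 1] & 0xFF) << 8
            | (in[offset + 2] & 0xFF) << 16
            | (in[offset + 3] & 0xFF) << 24;
    }

    private static void writeIntLE(byte[] out, int offset, int value) {
        out[offset] = (byte) value;
        out[offset + 1] = (byte) (value >>> 8);
        out[offset + 2] = (byte) (value >>> 16);
        out[offset + 3] = (byte) (value >>> 24);
    }
}
```

The derived subkey and inner nonce are then passed to an ordinary ChaCha20-Poly1305 IETF encrypt or decrypt, which is how the 24-byte-nonce construction reduces to the 12-byte one.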