PR18: GGML Import & Pipeline Modules by agibsonccc · Pull Request #10451 · deeplearning4j/deeplearning4j

agibsonccc · 2026-06-15T02:26:21Z

Summary

PR 18 of 22 PRs in the ag_new_release_updates_2 branch split. [Merge Layer 6]

GGUF format support: GGUFReader handles magic 0x46554747, v1/v2/v3 binary layout; GGMLFormatDetector distinguishes GGUF from legacy GGML via 4-byte magic
18 architecture handlers: map GGUF tensor name patterns to SameDiff variables for LLaMA 1/2/3/4, Gemma 2/3, Mistral, Phi-3/3.5, ChatGLM, IBM Granite, LFM2 (SSM hybrid), Nemotron, OLMo, OpenELM, SmolVLM2, Qwen3-VL, MiniCPM-V, Whisper, and a generic fallback
Full quantization codec suite: byte-exact dequantizers for all GGML formats — Q4_0 through Q8_K, k-quant super-blocks (Q2_K through Q6_K), importance-matrix quants (IQ1–IQ4), and ternary formats (TQ1_0, TQ2_0)
Round-trip export: quantizers for Q4_0, Q4_1, Q4_K, Q5_0, Q5_1, Q5_K, Q6_K, Q8_0; GGMLModelExport + SameDiffToGGMLConverter for export
Adaptive quantization: AdaptiveLayerQuantizer walks Q2_K → F32 until size budget is met; protects embeddings, LM head, and first/last transformer blocks
Multimodal split files: MultimodalGGUFLoader handles separate encoder/decoder GGUF shards (Qwen3-VL, SmolVLM2)
SPI pipeline framework: samediff-pipeline-core defines AutoModel.fromPretrained() dispatching by file extension/magic; format loaders registered via ServiceLoader
SafeTensors pipeline: samediff-pipeline-safetensors includes SmolVLM2SafeTensorsBuilder and Qwen3VLSafeTensorsBuilder for direct SafeTensors loading
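
To make the codec and round-trip bullets above concrete, here is a minimal sketch of Q8_0-style block quantization followed by dequantization. It is not the PR's Q8_0Quantizer or Q8_0Dequantizer: the class and method names are hypothetical, the per-block scale is kept as a float32 for brevity (GGML stores it as fp16 on disk), and rounding uses Math.round.

```java
/** Illustrative sketch only; names are hypothetical and not taken from the PR. */
final class Q8RoundTripSketch {
    static final int BLOCK = 32; // Q8_0 groups weights in blocks of 32

    /** Fills q with signed 8-bit codes for x[offset..offset+31] and returns the block scale d. */
    static float quantizeBlock(float[] x, int offset, byte[] q) {
        float amax = 0f;
        for (int i = 0; i < BLOCK; i++) {
            amax = Math.max(amax, Math.abs(x[offset + i]));
        }
        float d = amax / 127f;               // largest magnitude maps to +/-127
        float inv = d == 0f ? 0f : 1f / d;   // an all-zero block quantizes to all zeros
        for (int i = 0; i < BLOCK; i++) {
            q[i] = (byte) Math.round(x[offset + i] * inv);
        }
        return d;
    }

    /** Reverses quantizeBlock: each weight is reconstructed as code * scale. */
    static void dequantizeBlock(float d, byte[] q, float[] out, int offset) {
        for (int i = 0; i < BLOCK; i++) {
            out[offset + i] = q[i] * d;
        }
    }

    public static void main(String[] args) {
        float[] x = new float[BLOCK];
        for (int i = 0; i < BLOCK; i++) x[i] = (i - 16) / 4f;
        byte[] q = new byte[BLOCK];
        float d = quantizeBlock(x, 0, q);
        float[] y = new float[BLOCK];
        dequantizeBlock(d, q, y, 0);
        System.out.println("scale=" + d + " x[5]=" + x[5] + " y[5]=" + y[5]);
    }
}
```

Since every code is the nearest multiple of d, the per-weight reconstruction error in this sketch is bounded by roughly d/2, which is the kind of property a round-trip test can check.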

What Changed

`nd4j/nd4j-ggml` — Core GGML/GGUF Module (87 new files)

Entry points:

GGMLModelImport.java — importModel(File), convertToSDZ(src, dst), inspectModel(File)
GGMLModelExport.java — exportModel(SameDiff, File, ExportOptions)
GGMLImportException.java / GGMLExportException.java — checked exception hierarchy

Format layer (format/):

GGUFReader.java — GGUF v1/v2/v3 binary parser (magic, version, KV metadata, tensor descriptors, data sections)
GGUFWriter.java — GGUF binary output with alignment padding
GGMLReader.java / GGMLWriter.java — legacy GGML format support
GGMLFormat.java / GGMLFormatDetector.java — auto-detection from 4-byte magic
GGMLDataType.java — enum of raw GGML dtype codes
GGMLHeader.java / GGMLTensorInfo.java / GGMLMetadata.java — format descriptor types
MultimodalGGUFLoader.java — loads split multimodal GGUF shards
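
The 4-byte magic check and the writer's alignment padding listed above can be sketched as follows. This is an illustration under stated assumptions, not the PR's GGMLFormatDetector or GGUFWriter: the class name is hypothetical, the legacy magic values are the ones commonly used by pre-GGUF llama.cpp files, and the padding helper takes the alignment as a parameter (GGUF files typically use 32 unless the `general.alignment` key overrides it).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; not the PR's GGMLFormatDetector or GGUFWriter. */
final class MagicSketch {
    enum Format { GGUF, LEGACY_GGML, UNKNOWN }

    // Bytes 'G','G','U','F' on disk, read as a little-endian uint32 (value quoted in the PR summary).
    static final int GGUF_MAGIC = 0x46554747;
    // Assumed legacy magics from pre-GGUF files: "ggml", "ggmf", "ggjt".
    static final int[] LEGACY_MAGICS = {0x67676d6c, 0x67676d66, 0x67676a74};

    /** Classifies a file from its first four bytes. */
    static Format detect(byte[] head) {
        if (head == null || head.length < 4) return Format.UNKNOWN;
        int magic = ByteBuffer.wrap(head, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (magic == GGUF_MAGIC) return Format.GGUF;
        for (int legacy : LEGACY_MAGICS) {
            if (magic == legacy) return Format.LEGACY_GGML;
        }
        return Format.UNKNOWN;
    }

    /** Number of zero bytes to append so that offset becomes a multiple of alignment. */
    static long padding(long offset, long alignment) {
        return (alignment - (offset % alignment)) % alignment;
    }

    public static void main(String[] args) {
        System.out.println(detect(new byte[] {'G', 'G', 'U', 'F'}));
        System.out.println(padding(100, 32));
    }
}
```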

Architecture layer (architecture/):

ModelArchitecture.java — interface: isCompatible(GGMLMetadata), buildSameDiff(GGUFReader, options)
ArchitectureRegistry.java — priority-ordered; auto-discovers via ServiceLoader
LayerTensorDiscovery.java — maps GGUF tensor name patterns to SameDiff variable names
Architecture handlers: LLaMAArchitecture.java, LLaMAExportArchitecture.java, Llama4Architecture.java, GemmaArchitecture.java, MistralArchitecture.java, PhiArchitecture.java, GLMArchitecture.java, GraniteArchitecture.java, LFM2Architecture.java, NemotronArchitecture.java, OLMoArchitecture.java, OpenELMArchitecture.java, GptOssArchitecture.java, SmolVLM2Architecture.java, Qwen3VLArchitecture.java, MiniCPMVArchitecture.java, WhisperArchitecture.java, GenericArchitecture.java
ArchitectureConfig.java / ExportArchitecture.java / ExportArchitectureRegistry.java — export-side counterparts
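
A priority-ordered registry with a generic fallback, as described for ArchitectureRegistry above, might look roughly like the sketch below. The `Handler` interface, the `handler` factory, and the use of a plain `Map` for metadata are all assumptions made for illustration; the PR's ModelArchitecture takes a GGMLMetadata and discovers handlers via ServiceLoader, neither of which is reproduced here. The only GGUF detail relied on is the standard `general.architecture` metadata key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; not the PR's ArchitectureRegistry. */
final class RegistrySketch {
    /** Hypothetical handler shape. */
    interface Handler {
        String name();
        int priority(); // assumption: larger values are consulted first
        boolean isCompatible(Map<String, String> metadata);
    }

    /** Matches on "general.architecture"; a null arch marks the catch-all fallback. */
    static Handler handler(String name, int priority, String arch) {
        return new Handler() {
            public String name() { return name; }
            public int priority() { return priority; }
            public boolean isCompatible(Map<String, String> metadata) {
                return arch == null || arch.equals(metadata.get("general.architecture"));
            }
        };
    }

    private final List<Handler> handlers = new ArrayList<>();

    /** Keeps the list sorted so that resolve() sees high-priority handlers first. */
    void register(Handler h) {
        handlers.add(h);
        handlers.sort(Comparator.comparingInt(Handler::priority).reversed());
    }

    /** Returns the first compatible handler in priority order, if any. */
    Optional<Handler> resolve(Map<String, String> metadata) {
        return handlers.stream().filter(h -> h.isCompatible(metadata)).findFirst();
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register(handler("generic", 0, null));
        registry.register(handler("llama", 10, "llama"));
        System.out.println(registry.resolve(Map.of("general.architecture", "llama")).get().name());
    }
}
```

Giving the generic handler the lowest priority means a specific handler always wins when one matches, and unknown architectures still resolve to something.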

Quantization layer (quantization/):

GGMLQuantType.java — enum from Q2_K (2.5625 bpw) through F32 (32 bpw)
Quantizer.java / Dequantizer.java / QuantizerFactory.java / DequantizerFactory.java / QuantizationInfo.java — quantization interfaces and dispatch
Standard dequantizers: Q4_0Dequantizer.java, Q4_1Dequantizer.java, Q5_0Dequantizer.java, Q5_1Dequantizer.java, Q8_0Dequantizer.java, Q8_KDequantizer.java
K-quant dequantizers: Q2_KDequantizer.java, Q3_KDequantizer.java, Q4_KDequantizer.java, Q5_KDequantizer.java, Q6_KDequantizer.java
IQ and ternary dequantizers: IQ1_MDequantizer.java, IQ1_SDequantizer.java, IQ2_SDequantizer.java, IQ2_XSDequantizer.java, IQ2_XXSDequantizer.java, IQ3_SDequantizer.java, IQ3_XXSDequantizer.java, IQ4_NLDequantizer.java, IQ4_XSDequantizer.java, TQ1_0Dequantizer.java, TQ2_0Dequantizer.java
Export quantizers: Q4_0Quantizer.java, Q4_1Quantizer.java, Q4_KQuantizer.java, Q5_0Quantizer.java, Q5_1Quantizer.java, Q5_KQuantizer.java, Q6_KQuantizer.java, Q8_0Quantizer.java
Adaptive: AdaptiveLayerQuantizer.java, AdaptiveQuantConfig.java, DynamicQuantizationAnalyzer.java, DynamicQuantConfig.java
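
As an example of what one of the standard dequantizers has to do, the sketch below decodes Q4_0 blocks. It follows the GGML Q4_0 layout (an fp16 scale followed by 16 bytes holding 32 four-bit codes, with low nibbles filling the first half of the block and high nibbles the second), but it is not the PR's Q4_0Dequantizer; the class name, method signatures, and the hand-written fp16 conversion are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; not the PR's Q4_0Dequantizer. */
final class Q4_0Sketch {
    static final int WEIGHTS_PER_BLOCK = 32;
    static final int BYTES_PER_BLOCK = 18; // 2-byte fp16 scale + 16 bytes of packed nibbles

    /** Converts IEEE 754 half-precision bits to a float. */
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int exp = (bits >>> 10) & 0x1F;
        int mant = bits & 0x3FF;
        float value;
        if (exp == 0) {
            value = Math.scalb((float) mant, -24);             // zero or subnormal
        } else if (exp == 31) {
            value = mant == 0 ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            value = Math.scalb(1f + mant / 1024f, exp - 15);   // normal number
        }
        return (bits & 0x8000) == 0 ? value : -value;
    }

    /** Decodes numBlocks consecutive Q4_0 blocks from data into float weights. */
    static float[] dequantize(byte[] data, int numBlocks) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[numBlocks * WEIGHTS_PER_BLOCK];
        for (int b = 0; b < numBlocks; b++) {
            int base = b * BYTES_PER_BLOCK;
            float d = halfToFloat(buf.getShort(base));
            for (int j = 0; j < 16; j++) {
                int packed = buf.get(base + 2 + j) & 0xFF;
                // Codes are stored with an offset of 8, so subtract it before scaling.
                out[b * WEIGHTS_PER_BLOCK + j] = ((packed & 0x0F) - 8) * d;
                out[b * WEIGHTS_PER_BLOCK + j + 16] = ((packed >>> 4) - 8) * d;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] block = new byte[BYTES_PER_BLOCK];
        block[1] = 0x3C;                 // fp16 bits 0x3C00 = 1.0, little-endian
        block[2] = (byte) 0xF0;          // low nibble 0, high nibble 15
        float[] w = dequantize(block, 1);
        System.out.println(w[0] + " " + w[16]);
    }
}
```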

Conversion layer:

convert/GGMLToSameDiffConverter.java — reads GGUF tensors, dequantizes, creates SameDiff variables
convert/ConversionOptions.java — quantization mode, forTraining flag, architecture override
export/SameDiffToGGMLConverter.java — reverse path: SameDiff variables → GGUF tensor stream
export/ExportOptions.java / TensorExportInfo.java — export configuration
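
The name-mapping step of the import path (GGUF tensor name in, graph variable name out) can be illustrated with a small pattern matcher. The GGUF-side names such as `blk.12.attn_q.weight` and `token_embd.weight` follow the usual llama.cpp convention; the output naming scheme (`layer_N/...`) is purely hypothetical and is not the scheme used by LayerTensorDiscovery or GGMLToSameDiffConverter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; the target naming scheme is invented for this example. */
final class TensorNameSketch {
    // Per-block tensors look like "blk.<index>.<role>.<weight|bias>".
    private static final Pattern BLOCK =
            Pattern.compile("^blk\\.(\\d+)\\.([a-z0-9_]+)\\.(weight|bias)$");

    /** Maps a GGUF tensor name to a hypothetical graph variable name. */
    static String toVariableName(String ggufName) {
        Matcher m = BLOCK.matcher(ggufName);
        if (m.matches()) {
            return "layer_" + m.group(1) + "/" + m.group(2) + "/" + m.group(3);
        }
        // Non-block tensors (embeddings, final norm, output head) keep their name, re-delimited.
        return ggufName.replace('.', '/');
    }

    /** Returns the transformer block index, or -1 for tensors outside any block. */
    static int blockIndex(String ggufName) {
        Matcher m = BLOCK.matcher(ggufName);
        return m.matches() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(toVariableName("blk.12.attn_q.weight"));
        System.out.println(toVariableName("token_embd.weight"));
    }
}
```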

`nd4j/samediff-pipeline-core` — Pipeline Framework (12 new files)

Pipeline.java — interface: generate(input), embed(text), classify(input)
PipelineLoader.java — SPI interface for format-specific loaders
PipelineLoaderRegistry.java — discovers loaders via ServiceLoader
AutoModel.java — AutoModel.fromPretrained(path) dispatches by file extension or header magic
ModelFormat.java — enum: GGUF, SAFETENSORS, ONNX, SDZ, TORCHSCRIPT
ChatTemplate.java / TokenizerConfig.java / GenerationConfig.java / ModelManifest.java / ModelIndex.java / WeightMapIndex.java / SchedulerConfig.java / PreprocessorConfig.java / SpecialTokensMap.java — inference and manifest types
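
The dispatch idea behind AutoModel.fromPretrained() and PipelineLoaderRegistry can be sketched with the JDK's `java.util.ServiceLoader`. Everything below is a hypothetical shape for illustration: the `Loader` interface, its single `format()` method, and the extension table (including `.pt` for TorchScript) are assumptions, and header-magic detection is left out. Only the ModelFormat constants are taken from the list above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.ServiceLoader;

/** Illustrative sketch only; not the PR's AutoModel or PipelineLoaderRegistry. */
final class PipelineSpiSketch {
    enum ModelFormat { GGUF, SAFETENSORS, ONNX, SDZ, TORCHSCRIPT }

    /** Hypothetical SPI; the PR's PipelineLoader interface may look different. */
    interface Loader {
        ModelFormat format();
    }

    /** Guesses the format from the file name; the extension table is an assumption. */
    static Optional<ModelFormat> byExtension(String fileName) {
        String n = fileName.toLowerCase(Locale.ROOT);
        if (n.endsWith(".gguf")) return Optional.of(ModelFormat.GGUF);
        if (n.endsWith(".safetensors")) return Optional.of(ModelFormat.SAFETENSORS);
        if (n.endsWith(".onnx")) return Optional.of(ModelFormat.ONNX);
        if (n.endsWith(".sdz")) return Optional.of(ModelFormat.SDZ);
        if (n.endsWith(".pt")) return Optional.of(ModelFormat.TORCHSCRIPT);
        return Optional.empty();
    }

    /** Collects every Loader provider registered under META-INF/services on the classpath. */
    static List<Loader> discover() {
        List<Loader> found = new ArrayList<>();
        for (Loader loader : ServiceLoader.load(Loader.class)) {
            found.add(loader);
        }
        return found;
    }

    /** Picks the first loader that claims the detected format. */
    static Optional<Loader> loaderFor(String fileName, List<Loader> loaders) {
        return byExtension(fileName)
                .flatMap(fmt -> loaders.stream().filter(l -> l.format() == fmt).findFirst());
    }

    public static void main(String[] args) {
        System.out.println(byExtension("model.Q4_K_M.gguf"));
        System.out.println("providers on classpath: " + discover().size());
    }
}
```

With this split, adding a new format means shipping a jar with a provider entry; the core module does not need to know about it at compile time.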

`nd4j/samediff-pipeline-ggml` — GGML Pipeline Loader (5 new files)

GGMLPipelineLoader.java — implements PipelineLoader; detects architecture then delegates to ArchitectureRegistry
GGUFReader.java / GGUFHeader.java / GGUFMetadataType.java / GGUFType.java — lightweight reader for metadata-only extraction
Registered via META-INF/services/org.eclipse.deeplearning4j.pipeline.PipelineLoader

`nd4j/samediff-pipeline-safetensors` — SafeTensors Pipeline Loader (8 new files)

SafeTensorsReader.java — JSON header + memory-mapped tensor data
SafeTensorsHeader.java / SafeTensorsDtype.java — format types
SafeTensorsPipelineLoader.java — PipelineLoader for .safetensors files
architecture/SafeTensorsArchitecture.java / SafeTensorsArchitectureRegistry.java — architecture registry
architecture/SmolVLM2SafeTensorsBuilder.java — builds SameDiff graph from SmolVLM2 shards
architecture/Qwen3VLSafeTensorsBuilder.java — builds SameDiff graph from Qwen3-VL shards
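
For readers unfamiliar with the "JSON header + tensor data" layout that SafeTensorsReader parses: a SafeTensors file starts with an 8-byte little-endian unsigned length N, followed by N bytes of UTF-8 JSON, followed by the raw tensor bytes. The sketch below reads that header from a `ByteBuffer` (a memory-mapped buffer from `FileChannel.map` would work the same way). It is not the PR's SafeTensorsReader; the class and method names are hypothetical and JSON parsing is deliberately omitted.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; not the PR's SafeTensorsReader. */
final class SafeTensorsHeaderSketch {
    /** Reads the 8-byte little-endian header length and sanity-checks it against the buffer size. */
    static long headerLength(ByteBuffer file) {
        ByteBuffer view = file.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        view.clear(); // position 0, limit = capacity; the caller's buffer is left untouched
        long n = view.getLong(0);
        if (n < 0 || n > view.capacity() - 8) {
            throw new IllegalArgumentException("implausible header length: " + n);
        }
        return n;
    }

    /** Returns the raw JSON header text. */
    static String readHeaderJson(ByteBuffer file) {
        byte[] json = new byte[(int) headerLength(file)];
        ByteBuffer view = file.duplicate();
        view.clear();
        view.position(8);
        view.get(json);
        return new String(json, StandardCharsets.UTF_8);
    }

    /** Offset of the first tensor byte; data_offsets in the JSON are relative to this. */
    static long dataOffset(ByteBuffer file) {
        return 8 + headerLength(file);
    }

    public static void main(String[] args) {
        String json = "{\"w\":{\"dtype\":\"F32\",\"shape\":[1],\"data_offsets\":[0,4]}}";
        byte[] jb = json.getBytes(StandardCharsets.UTF_8);
        ByteBuffer file = ByteBuffer.allocate(8 + jb.length + 4).order(ByteOrder.LITTLE_ENDIAN);
        file.putLong(jb.length).put(jb).putFloat(1.5f);
        System.out.println(readHeaderJson(file));
        System.out.println(file.getFloat((int) dataOffset(file)));
    }
}
```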

Dependencies

Depends on: PR12 (nd4j-api, SameDiff), PR13 (nd4j-native backend)
Required by: PR19 (samediff-llm uses GGUF import for model loading), PR22 (platform tests: GGMLModelImportTest, GGUFReaderTest, TestAdaptiveLayerQuantizer, RoundTripTest, etc.)

Merge Order

This PR is in Layer 6.

Layer	PRs
0 (no deps)	PR01, PR02, PR20
1 (build/infra)	PR03, PR04
2 (native core)	PR05, PR06, PR07
3 (native feat)	PR08, PR09, PR10, PR11
4 (java core)	PR12, PR13, PR14, PR15
5 (java feat)	PR16
6 (import/gen)	PR17, PR18, PR19, PR21
7 (validation)	PR22

Part of the 22-PR split of the ag_new_release_updates_2 branch. Merge layer: 6 (import/gen). Files: 121. See pr-plans/00-master-plan.md for the full split plan and merge order.

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

…plementation

The samediff-pipeline-ggml module contained its own parallel GGUFReader (337 lines, RandomAccessFile) plus companion GGUFHeader, GGUFType, and GGUFMetadataType classes, duplicating functionality already present in nd4j-ggml/format/ (443 lines, memory-mapped I/O).

Consolidation:
- Add open() static factories and high-level INDArray-returning methods (readTensor, readAllTensors, readTensors, getTensorNames, getTensorCount, getTensorInfo) to org.nd4j.ggml.format.GGUFReader. Dequantization delegates to DequantizerFactory, which supports the full Q4_0..TQ2_0 type set.
- Add nd4j-ggml dependency to samediff-pipeline-ggml/pom.xml.
- Rewrite GGMLPipelineLoader to import org.nd4j.ggml.format.{GGUFReader,GGUFHeader} directly; all header accessor methods (getArchitecture, getModelName, getContextLength, getEmbeddingLength, getBlockCount) already exist on the canonical GGUFHeader.
- Delete the four now-redundant pipeline classes: GGUFReader, GGUFHeader, GGUFType, GGUFMetadataType.

Also fixes a latent bug: the pipeline GGUFType had wrong integer-type IDs (I8=24, I16=25, I32=26, I64=27) which conflict with IQ1_M at ID 24 per the GGUF spec. The canonical GGMLDataType uses the correct IDs (I8=25..I64=28).

…rties

samediff-pipeline-ggml (8 files, 1,054 LOC):
- Duplicate GGML/GGUF reader module with zero external imports
- Canonical GGML support lives in nd4j-ggml module
- GGMLPipelineLoader, GGUFReader, GGUFHeader, GGUFType, GGUFMetadataType all duplicates of nd4j-ggml equivalents
- Removed module declaration from nd4j/pom.xml

aeron.properties:
- Orphaned test resource from removed Aeron networking support
- Zero references in any test file

agibsonccc · 2026-06-15T12:20:49Z

Architecture Overview

This PR implements GGML/GGUF model import with full quantization codec support and the SPI-based pipeline framework that enables AutoModel.fromPretrained() dispatch by file extension or magic bytes. It handles 18 model architectures and round-trip export for 8 quantization formats.

Highlights

Full GGML quantization codec suite — byte-exact dequantizers for all GGML formats: Q4_0 through Q8_K, k-quant super-blocks (Q2_K–Q6_K), importance-matrix quants (IQ1–IQ4), and ternary formats (TQ1_0, TQ2_0); plus AdaptiveLayerQuantizer that walks Q2_K→F32 until size budget is met while protecting embeddings, LM head, and first/last transformer blocks
18 architecture handlers with multimodal support — maps GGUF tensor name patterns to SameDiff variables for LLaMA 1-4, Gemma 2-3, Mistral, Phi-3/3.5, ChatGLM, Granite, LFM2 (SSM hybrid), Nemotron, OLMo, OpenELM, SmolVLM2, Qwen3-VL, MiniCPM-V, Whisper; MultimodalGGUFLoader handles separate encoder/decoder GGUF shards
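
One plausible reading of the adaptive walk in the first highlight is: step up a ladder of quantization types from Q2_K toward F32, keep the highest-precision type whose estimated total size still fits the budget, and never let protected tensors fall below some floor type. The sketch below implements that reading only. It is not the PR's AdaptiveLayerQuantizer: the class, the `Tensor` type, the `floor` parameter, and the best-effort fallback to Q2_K are assumptions, and the bits-per-weight figures are approximate (the Q2_K value is the one quoted in the GGMLQuantType entry of this PR).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the PR's AdaptiveLayerQuantizer. */
final class AdaptiveSketch {
    /** Ladder from smallest to largest; bits-per-weight values are approximate. */
    enum QType {
        Q2_K(2.5625), Q3_K(3.4375), Q4_K(4.5), Q5_K(5.5), Q6_K(6.5625), Q8_0(8.5), F16(16), F32(32);
        final double bpw;
        QType(double bpw) { this.bpw = bpw; }
    }

    /** Hypothetical tensor descriptor; isProtected marks embeddings, LM head, first/last blocks. */
    static final class Tensor {
        final String name;
        final long elements;
        final boolean isProtected;
        Tensor(String name, long elements, boolean isProtected) {
            this.name = name;
            this.elements = elements;
            this.isProtected = isProtected;
        }
    }

    /** Protected tensors never drop below the floor type. */
    static QType typeFor(Tensor t, QType candidate, QType floor) {
        return (t.isProtected && candidate.bpw < floor.bpw) ? floor : candidate;
    }

    /** Estimated total size in bytes if unprotected tensors use the candidate type. */
    static double sizeBytes(List<Tensor> tensors, QType candidate, QType floor) {
        double total = 0;
        for (Tensor t : tensors) {
            total += t.elements * typeFor(t, candidate, floor).bpw / 8.0;
        }
        return total;
    }

    /** Walks the ladder upward and keeps the last candidate that fits the budget. */
    static Map<String, QType> plan(List<Tensor> tensors, double budgetBytes, QType floor) {
        QType chosen = QType.Q2_K; // best effort when even the smallest type is over budget
        for (QType candidate : QType.values()) {
            if (sizeBytes(tensors, candidate, floor) > budgetBytes) break;
            chosen = candidate;
        }
        Map<String, QType> out = new LinkedHashMap<>();
        for (Tensor t : tensors) {
            out.put(t.name, typeFor(t, chosen, floor));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tensor> tensors = List.of(
                new Tensor("token_embd.weight", 1000, true),
                new Tensor("blk.1.ffn_up.weight", 8000, false));
        System.out.println(plan(tensors, 6000, QType.Q8_0));
    }
}
```

A per-tensor walk (upgrading individual layers by sensitivity rather than all unprotected tensors at once) would be a natural refinement, but whether the PR does that is not stated here.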

agibsonccc added 2 commits June 15, 2026 11:26

PR18: GGML Import & Pipeline Modules

7380dc5

Merge remote-tracking branch 'origin/master' into pr/18-ggml-import-p…

4e28077

…ipeline

agibsonccc requested a review from Copilot June 15, 2026 05:43

Copilot AI reviewed Jun 15, 2026

agibsonccc added 2 commits June 15, 2026 16:57

agibsonccc wants to merge 4 commits into master from pr/18-ggml-import-pipeline
