PR18: GGML Import & Pipeline Modules #10451
Open
agibsonccc wants to merge 4 commits into
Part of the 22-PR split of the ag_new_release_updates_2 branch. Merge layer: 6 (import/gen). Files: 121. See pr-plans/00-master-plan.md for the full split plan and merge order.
This was referenced Jun 15, 2026
…plementation
The samediff-pipeline-ggml module contained its own parallel GGUFReader (337
lines, RandomAccessFile) plus companion GGUFHeader, GGUFType, and
GGUFMetadataType classes, duplicating functionality already present in
nd4j-ggml/format/ (443 lines, memory-mapped I/O).
Consolidation:
- Add open() static factories and high-level INDArray-returning methods
(readTensor, readAllTensors, readTensors, getTensorNames, getTensorCount,
getTensorInfo) to org.nd4j.ggml.format.GGUFReader. Dequantization delegates
to DequantizerFactory which supports the full Q4_0..TQ2_0 type set.
- Add nd4j-ggml dependency to samediff-pipeline-ggml/pom.xml.
- Rewrite GGMLPipelineLoader to import org.nd4j.ggml.format.{GGUFReader,GGUFHeader}
directly; all header accessor methods (getArchitecture, getModelName,
getContextLength, getEmbeddingLength, getBlockCount) already exist on the
canonical GGUFHeader.
- Delete the four now-redundant pipeline classes: GGUFReader, GGUFHeader,
GGUFType, GGUFMetadataType.
Also fixes a latent bug: the pipeline GGUFType had wrong integer-type IDs
(I8=24, I16=25, I32=26, I64=27) which conflict with IQ1_M at ID 24 per the
GGUF spec. The canonical GGMLDataType uses the correct IDs (I8=25..I64=28).
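
To illustrate the memory-mapped approach mentioned above, here is a minimal sketch of reading the fixed GGUF header prefix through a mapped buffer. The class GgufHeaderSketch and its methods are hypothetical names for illustration only and are not the org.nd4j.ggml.format.GGUFReader API. It assumes the little-endian layout from the GGUF spec: a uint32 magic, a uint32 version, then tensor and metadata counts, which are uint32 in v1 and uint64 from v2 onward.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not the nd4j-ggml GGUFReader.
final class GgufHeaderSketch {
    // The bytes 'G','G','U','F' read as a little-endian uint32.
    static final int GGUF_MAGIC = 0x46554747;

    final int version;
    final long tensorCount;
    final long metadataKvCount;

    private GgufHeaderSketch(int version, long tensorCount, long metadataKvCount) {
        this.version = version;
        this.tensorCount = tensorCount;
        this.metadataKvCount = metadataKvCount;
    }

    // Parses magic, version and the two counts from the buffer's current position.
    static GgufHeaderSketch parse(ByteBuffer buf) {
        // duplicate() leaves the caller's position and byte order untouched.
        ByteBuffer le = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int magic = le.getInt();
        if (magic != GGUF_MAGIC) {
            throw new IllegalArgumentException(
                    "Not a GGUF file, magic=0x" + Integer.toHexString(magic));
        }
        int version = le.getInt();
        // Assumption from the GGUF spec: v1 stores counts as uint32, v2+ as uint64.
        long tensors = version == 1 ? Integer.toUnsignedLong(le.getInt()) : le.getLong();
        long kvs = version == 1 ? Integer.toUnsignedLong(le.getInt()) : le.getLong();
        return new GgufHeaderSketch(version, tensors, kvs);
    }

    // Maps only the first 24 bytes (or fewer for short files) and parses them.
    static GgufHeaderSketch read(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long len = Math.min(ch.size(), 24L);
            return parse(ch.map(FileChannel.MapMode.READ_ONLY, 0, len));
        }
    }
}
```

Calling GgufHeaderSketch.read(path) on a file too short to hold the header fails with a BufferUnderflowException; a real reader would likely wrap that in its own exception type.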
…rties
samediff-pipeline-ggml (8 files, 1,054 LOC):
- Duplicate GGML/GGUF reader module with zero external imports
- Canonical GGML support lives in nd4j-ggml module
- GGMLPipelineLoader, GGUFReader, GGUFHeader, GGUFType, GGUFMetadataType all duplicates of nd4j-ggml equivalents
- Removed module declaration from nd4j/pom.xml
aeron.properties:
- Orphaned test resource from removed Aeron networking support
- Zero references in any test file
Architecture Overview
This PR implements GGML/GGUF model import with full quantization codec support and the SPI-based pipeline framework that enables
Highlights
Summary
PR 18 of 22 PRs in the ag_new_release_updates_2 branch split. [Merge Layer 6]
- GGUFReader handles magic 0x46554747, v1/v2/v3 binary layout; GGMLFormatDetector distinguishes GGUF from legacy GGML via 4-byte magic
- GGMLModelExport + SameDiffToGGMLConverter for export
- AdaptiveLayerQuantizer walks Q2_K → F32 until size budget is met; protects embeddings, LM head, and first/last transformer blocks
- MultimodalGGUFLoader handles separate encoder/decoder GGUF shards (Qwen3-VL, SmolVLM2)
- samediff-pipeline-core defines AutoModel.fromPretrained() dispatching by file extension/magic; format loaders registered via ServiceLoader
- samediff-pipeline-safetensors includes SmolVLM2SafeTensorsBuilder and Qwen3VLSafeTensorsBuilder for direct SafeTensors loading
What Changed
nd4j/nd4j-ggml — Core GGML/GGUF Module (87 new files)
Entry points:
- GGMLModelImport.java — importModel(File), convertToSDZ(src, dst), inspectModel(File)
- GGMLModelExport.java — exportModel(SameDiff, File, ExportOptions)
- GGMLImportException.java / GGMLExportException.java — checked exception hierarchy
Format layer (format/):
- GGUFReader.java — GGUF v1/v2/v3 binary parser (magic, version, KV metadata, tensor descriptors, data sections)
- GGUFWriter.java — GGUF binary output with alignment padding
- GGMLReader.java / GGMLWriter.java — legacy GGML format support
- GGMLFormat.java / GGMLFormatDetector.java — auto-detection from 4-byte magic
- GGMLDataType.java — enum of raw GGML dtype codes
- GGMLHeader.java / GGMLTensorInfo.java / GGMLMetadata.java — format descriptor types
- MultimodalGGUFLoader.java — loads split multimodal GGUF shards
Architecture layer (architecture/):
- ModelArchitecture.java — interface: isCompatible(GGMLMetadata), buildSameDiff(GGUFReader, options)
- ArchitectureRegistry.java — priority-ordered; auto-discovers via ServiceLoader
- LayerTensorDiscovery.java — maps GGUF tensor name patterns to SameDiff variable names
- LLaMAArchitecture.java, LLaMAExportArchitecture.java, Llama4Architecture.java, GemmaArchitecture.java, MistralArchitecture.java, PhiArchitecture.java, GLMArchitecture.java, GraniteArchitecture.java, LFM2Architecture.java, NemotronArchitecture.java, OLMoArchitecture.java, OpenELMArchitecture.java, GptOssArchitecture.java, SmolVLM2Architecture.java, Qwen3VLArchitecture.java, MiniCPMVArchitecture.java, WhisperArchitecture.java, GenericArchitecture.java
- ArchitectureConfig.java / ExportArchitecture.java / ExportArchitectureRegistry.java — export-side counterparts
Quantization layer (quantization/):
- GGMLQuantType.java — enum from Q2_K (2.5625 bpw) through F32 (32 bpw)
- Quantizer.java / Dequantizer.java / QuantizerFactory.java / DequantizerFactory.java / QuantizationInfo.java — quantization interfaces and dispatch
- Q4_0Dequantizer.java, Q4_1Dequantizer.java, Q5_0Dequantizer.java, Q5_1Dequantizer.java, Q8_0Dequantizer.java, Q8_KDequantizer.java
- Q2_KDequantizer.java, Q3_KDequantizer.java, Q4_KDequantizer.java, Q5_KDequantizer.java, Q6_KDequantizer.java
- IQ1_MDequantizer.java, IQ1_SDequantizer.java, IQ2_SDequantizer.java, IQ2_XSDequantizer.java, IQ2_XXSDequantizer.java, IQ3_SDequantizer.java, IQ3_XXSDequantizer.java, IQ4_NLDequantizer.java, IQ4_XSDequantizer.java, TQ1_0Dequantizer.java, TQ2_0Dequantizer.java
- Q4_0Quantizer.java, Q4_1Quantizer.java, Q4_KQuantizer.java, Q5_0Quantizer.java, Q5_1Quantizer.java, Q5_KQuantizer.java, Q6_KQuantizer.java, Q8_0Quantizer.java
- AdaptiveLayerQuantizer.java, AdaptiveQuantConfig.java, DynamicQuantizationAnalyzer.java, DynamicQuantConfig.java
Conversion layer:
- convert/GGMLToSameDiffConverter.java — reads GGUF tensors, dequantizes, creates SameDiff variables
- convert/ConversionOptions.java — quantization mode, forTraining flag, architecture override
- export/SameDiffToGGMLConverter.java — reverse path: SameDiff variables → GGUF tensor stream
- export/ExportOptions.java / TensorExportInfo.java — export configuration
nd4j/samediff-pipeline-core — Pipeline Framework (12 new files)
- Pipeline.java — interface: generate(input), embed(text), classify(input)
- PipelineLoader.java — SPI interface for format-specific loaders
- PipelineLoaderRegistry.java — discovers loaders via ServiceLoader
- AutoModel.java — AutoModel.fromPretrained(path) dispatches by file extension or header magic
- ModelFormat.java — enum: GGUF, SAFETENSORS, ONNX, SDZ, TORCHSCRIPT
- ChatTemplate.java / TokenizerConfig.java / GenerationConfig.java / ModelManifest.java / ModelIndex.java / WeightMapIndex.java / SchedulerConfig.java / PreprocessorConfig.java / SpecialTokensMap.java — inference and manifest types
nd4j/samediff-pipeline-ggml — GGML Pipeline Loader (5 new files)
- GGMLPipelineLoader.java — implements PipelineLoader; detects architecture then delegates to ArchitectureRegistry
- GGUFReader.java / GGUFHeader.java / GGUFMetadataType.java / GGUFType.java — lightweight reader for metadata-only extraction
- META-INF/services/org.eclipse.deeplearning4j.pipeline.PipelineLoader
nd4j/samediff-pipeline-safetensors — SafeTensors Pipeline Loader (8 new files)
- SafeTensorsReader.java — JSON header + memory-mapped tensor data
- SafeTensorsHeader.java / SafeTensorsDtype.java — format types
- SafeTensorsPipelineLoader.java — PipelineLoader for .safetensors files
- architecture/SafeTensorsArchitecture.java / SafeTensorsArchitectureRegistry.java — architecture registry
- architecture/SmolVLM2SafeTensorsBuilder.java — builds SameDiff graph from SmolVLM2 shards
- architecture/Qwen3VLSafeTensorsBuilder.java — builds SameDiff graph from Qwen3-VL shards
Dependencies
GGMLModelImportTest, GGUFReaderTest, TestAdaptiveLayerQuantizer, RoundTripTest, etc.)
Merge Order
This PR is in Layer 6.
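
For reviewers less familiar with the SPI pattern behind the pipeline loader registration described in this PR, the idea can be sketched roughly as follows. FormatLoader, LoaderRegistrySketch and ExtensionLoader are hypothetical names made up for this example; the actual PipelineLoader and PipelineLoaderRegistry signatures may differ. The only real API assumed is java.util.ServiceLoader, which is an Iterable over providers listed in META-INF/services files.

```java
import java.nio.file.Path;
import java.util.Locale;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI for illustration; deliberately not named PipelineLoader.
interface FormatLoader {
    String formatName();
    boolean supports(Path modelPath);
}

// Hypothetical registry: picks the first loader that claims the given path.
final class LoaderRegistrySketch {
    private final Iterable<FormatLoader> loaders;

    // Accepting any Iterable keeps the registry testable without classpath setup.
    LoaderRegistrySketch(Iterable<FormatLoader> loaders) {
        this.loaders = loaders;
    }

    // Providers are discovered from META-INF/services/<interface name> entries.
    static LoaderRegistrySketch fromServiceLoader() {
        return new LoaderRegistrySketch(ServiceLoader.load(FormatLoader.class));
    }

    Optional<FormatLoader> find(Path modelPath) {
        for (FormatLoader loader : loaders) {
            if (loader.supports(modelPath)) {
                return Optional.of(loader);
            }
        }
        return Optional.empty();
    }
}

// Hypothetical loader that matches on file extension only (no magic check).
final class ExtensionLoader implements FormatLoader {
    private final String name;
    private final String suffix;

    ExtensionLoader(String name, String suffix) {
        this.name = name;
        this.suffix = suffix.toLowerCase(Locale.ROOT);
    }

    @Override
    public String formatName() {
        return name;
    }

    @Override
    public boolean supports(Path modelPath) {
        Path fileName = modelPath.getFileName();
        return fileName != null
                && fileName.toString().toLowerCase(Locale.ROOT).endsWith(suffix);
    }
}
```

With this shape, adding a new format would only require a new provider class plus a META-INF/services entry, with no change to the registry itself.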