Releases: lance-format/lance
v9.0.0-beta.15
v9.0.0-beta.14
What's Changed
New Features 🎉
- feat(blob-v2): configure packed blob file max size with field metadata by @jo-migo in #7322
- feat: support per-base storage options with base_. prefix by @jackye1995 in #7608
- feat: support multi-base tables in merge insert with target base routing by @jackye1995 in #7610
Bug Fixes 🐛
- fix: chunk ngram posting list writes by byte size to avoid i32 offset overflow by @lancedb-robot in #7607
- fix: preserve DataFile base_id in DataReplacement commits by @jackye1995 in #7609
Full Changelog: v9.0.0-beta.13...v9.0.0-beta.14
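The ngram posting list fix above (#7607) is described as chunking writes by byte size so that offsets stay within an i32. The following is a minimal, generic sketch of that idea in Python, not Lance's actual implementation: the function name `chunk_by_byte_size` and the `max_bytes` parameter are hypothetical, and the real writer operates on Arrow buffers rather than `bytes` objects.

```python
from typing import Iterable, Iterator, List

# Largest value a signed 32-bit offset can hold.
I32_MAX = 2**31 - 1


def chunk_by_byte_size(
    items: Iterable[bytes], max_bytes: int = I32_MAX
) -> Iterator[List[bytes]]:
    """Group items into chunks whose total byte size never exceeds max_bytes.

    Hypothetical helper for illustration: splitting on cumulative byte size
    (rather than on item count) keeps every offset inside a chunk
    representable as an i32.
    """
    chunk: List[bytes] = []
    size = 0
    for item in items:
        if len(item) > max_bytes:
            raise ValueError("single item exceeds the chunk byte limit")
        # Start a new chunk before the running size would pass the limit.
        if chunk and size + len(item) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(item)
        size += len(item)
    if chunk:
        yield chunk


# Small limit used only to make the behavior visible.
chunks = list(chunk_by_byte_size([b"aaaa", b"bbb", b"cc", b"ddddd"], max_bytes=6))
```

With a 6-byte limit, the four inputs are split into three chunks, each of which stays at or below the limit.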
v9.0.0-beta.13
v9.0.0-beta.12
What's Changed
New Features 🎉
- feat(compaction): introduce RowAddrRemap structure to avoid remap OOM caused by HashMap by @zhangyue19921010 in #7237
- feat(index): read a column's min/max from ZoneMap without a scan by @Ali2Arslan in #7463
- feat: support update with blob encoding column by @nyl3532016 in #7579
Bug Fixes 🐛
- fix(index): record IVF_HNSW index file sizes after writing footer by @Ali2Arslan in #7461
- fix(index): accept large_string (LargeUtf8) for BTREE and ZONEMAP scalar indices by @FANNG1 in #7525
- fix(cache): count key footprint in MokaCacheBackend eviction weight by @Ali2Arslan in #7573
- fix(index): exclude shared metadata cache from LanceIndexStore deep size by @Ali2Arslan in #7574
Documentation 📚
- docs(arrow): add SAFETY comments to lance-arrow unsafe blocks by @LuciferYang in #7511
Performance Improvements 🚀
- perf(index): coalesce concurrent scalar-index opens (single-flight) by @Ali2Arslan in #7464
- perf(dataset): reuse session-cached manifest when opening a dataset by @zhangyue19921010 in #7576
- perf(filtered-read): lower hot-path read/decode spans to debug by @LuQQiu in #7590
Full Changelog: v9.0.0-beta.11...v9.0.0-beta.12
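The scalar-index change above (#7464) refers to the single-flight pattern: when several callers request the same resource at once, only one performs the expensive work and the others wait for its result. The sketch below illustrates the general pattern in Python under stated assumptions; the `SingleFlight` class and its `do` method are hypothetical names, and this is not the Rust code in Lance.

```python
import threading
from concurrent.futures import Future
from typing import Any, Callable, Dict, Hashable


class SingleFlight:
    """Coalesce concurrent calls for the same key into one execution.

    Hypothetical illustration: results are shared only among callers that
    overlap in time; nothing is cached once the call finishes.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[Hashable, Future] = {}

    def do(self, key: Hashable, fn: Callable[[], Any]) -> Any:
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                # First caller for this key registers a future and runs fn.
                fut = Future()
                self._inflight[key] = fut
        if leader:
            try:
                fut.set_result(fn())
            except BaseException as exc:
                # Followers see the same failure as the leader.
                fut.set_exception(exc)
            finally:
                with self._lock:
                    self._inflight.pop(key, None)
        # Leader and followers all read the shared result.
        return fut.result()
```

A caller would wrap the expensive open in `do`, for example `flight.do(index_name, open_index)`, where `open_index` stands in for whatever function loads the index.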
v9.0.0-beta.11
What's Changed
Breaking Changes 🛠
- feat(fts)!: make v2 the default index format by @BubbleCal in #7512
New Features 🎉
- feat(namespace-dir): implement update_table and delete_from_table by @XuQianJin-Stars in #6923
- feat(namespace-dir): implement alter_transaction for DirectoryNamespace by @XuQianJin-Stars in #6974
- feat(mem_wal): support prefiltered LSM vector and FTS search by @touch-of-grey in #7138
Bug Fixes 🐛
- fix(core): allow all-null Map columns in schema evolution by @Ali2Arslan in #7462
- fix(index): preserve schema metadata when re-serializing loaded HNSW by @yanghua in #7476
- fix(rowids): tolerate sparse overlapping chunks in the stable row id index by @wkalt in #7480
- fix(arrow): preserve inner nulls in convert_to_floating_point by @LuciferYang in #7498
- fix: avoid stack overflow on deep logical filters by @Xuanwo in #7510
- fix(compaction): exclude system indices from compaction binning by @xuanyu-z in #7516
- fix(index): work around rustc nightly ICE in NGramIndexBuilder::stream_spill_reader by @westonpace in #7534
- fix: recover from stale cached manifest size on read by @wkalt in #7542
- fix(lance-encoding): rebase nested-list offsets correctly across pages by @yesunbmh in #7546
- fix(lance-io): include goosefs feature in DEFAULT_CLOUD_BLOCK_SIZE cfg gate by @XuQianJin-Stars in #7570
Documentation 📚
- docs: improve phrasing and update format spec image by @prrao87 in #7505
- docs: add FTS v2 migration note by @BubbleCal in #7522
- docs: document read_blobs for blob payload reads by @Xuanwo in #7530
- docs: add blob v2 overview images by @Xuanwo in #7581
- docs(python): document beta release install index by @Xuanwo in #7582
Performance Improvements 🚀
- perf: merge half-open range queries on the same BTree index by @xloya in #7477
- perf(arrow): avoid per-element allocation in BFloat16Array::from by @LuciferYang in #7500
- perf(index): skip FM-Index rebuild when merging a single fully-live segment by @jackye1995 in #7569
Other Changes
- refactor(index): introduce RowIdRemapper trait to decouple ScalarIndexPlugin from FragReuseIndex by @westonpace in #7394
- refactor(index): extract BasicTrainer trait from ScalarIndexPlugin by @westonpace in #7395
Full Changelog: v9.0.0-beta.10...v9.0.0-beta.11
v8.0.0
What's Changed
Breaking Changes 🛠
- feat!: migrate bitmap to index segment based by @Xuanwo in #6869
- fix(python)!: derive index type from details instead of opening the index by @wjones127 in #6903
- refactor!: remove index segment builder by @Xuanwo in #6997
- refactor(index)!: move distributed BTree build to segmented index framework by @zhangyue19921010 in #7013
- feat!: return write summaries from file writers by @Xuanwo in #7096
- perf!: avoid listing index files after writes by @Xuanwo in #7129
- fix(dataset)!: fail-fast casting for columns with attached indices by @WenDing-Y in #7158
- feat(vector)!: add approx mode for RaBitQ search by @BubbleCal in #7179
- perf(vector)!: add dedicated SIMD kernels for RaBitQ ex-code reranking by @BubbleCal in #7205
- refactor!: rename FMIndexIndexDetails to FMIndexDetails by @westonpace in #7397
Critical Fixes ‼️
- fix: merge_insert silently drops matches when a leading payload column is all-null by @Ar-maan05 in #7251
New Features 🎉
- feat: expose tracked_files and all_files on LanceDataset by @wjones127 in #6011
- feat: add commit timeout to CommitBuilder by @wjones127 in #6773
- feat: add segmented BTree index merge_segments support by @zhangyue19921010 in #6889
- feat(index): add streaming ivf kmeans training by @BubbleCal in #6913
- feat: use indexes to accelerate filtered count_rows by @westonpace in #6916
- feat(java): allow schema override for fragment writes by @beinan in #6919
- feat: make ICU the default FTS tokenizer by @Xuanwo in #6968
- feat: support Utf8View and BinaryView in encoding and filter coercion by @xuanyu-z in #6985
- feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds by @gstamatakis95 in #7014
- feat(blobv2): support all BlobKind types in blob v2 compact_files by @yyzhao2025 in #7017
- feat: add TOS object store support via OpenDAL by @ddupg in #7019
- feat(index): implement FM-Index scalar index for exact substring search by @beinan in #7026
- feat(python): expose external blob mode and outside-base option for fragments by @plotor in #7028
- feat(lance-io): add GooseFS object store provider by @XuQianJin-Stars in #7034
- feat(index): support multi-bit IVF_RQ storage by @BubbleCal in #7038
- feat: expose getters for ScalarIndexExec by @LuQQiu in #7039
- feat: expose methods and getters of scalar index for possible distributed execution by @LuQQiu in #7045
- feat(mem_wal): configurable HNSW build params for MemTable writers by @touch-of-grey in #7054
- feat: dedup FTS results across LSM tiers in LsmFtsSearchPlanner by @hamersaw in #7066
- feat(index): support raw-query ivf rq search by @BubbleCal in #7078
- feat: add EnforceDistribution to physical optimizer by @wjones127 in #7086
- feat(java): add missing scanner and merge insert params to align with Python/Rust by @WenDing-Y in #7100
- feat: populate enriched IndexContent fields in dir namespace ListTableIndices by @wjones127 in #7109
- feat(index): support configurable multi-segment FM-Index builds by @beinan in #7123
- feat: support merging zonemap index segments by @Xuanwo in #7128
- feat(index): accelerate regex and infix LIKE with the ngram index by @wombatu-kun in #7139
- feat(rust): add cleanup explain API by @yanghua in #7147
- feat: stabilize cache codec with a versioned envelope by @wjones127 in #7163
- feat(lance-select): expose selected rows accessor on NullableRowAddrSet by @LuQQiu in #7164
- feat: branch-aware table version ops in directory and rest namespaces by @brendanclement in #7166
- feat(python): expose segment FTS build through create_index_uncommitted by @ddupg in #7170
- feat(python): expose zonemap segment builds by @everySympathy in #7177
- feat(dir-catalog): add reader/writer feature flags to __manifest by @jackye1995 in #7191
- feat(index): expose per-query I/O metrics on ANN operators by @wombatu-kun in #7204
- feat(mem-wal): snapshot-consistent as-of cut for fresh-tier membership by @hamersaw in #7215
- feat: expose io_buffer_size in CompactionOptions by @aimanmalib in #7226
- fix(python): expose stable row id property in stub by @BubbleCal in #7249
- feat: bump lance-namespace-reqwest-client to 0.8.6 (source_task_size) by @justinrmiller in #7254
- feat: configure blob inline threshold per column by @Xuanwo in #7269
- feat(mem_wal): warm flushed generations into shared caches before query by @hamersaw in #7284
- feat(java): expose RTree scalar index type to Java by @zhangyue19921010 in #7291
- feat: expose session cache key inventory by @jackye1995 in #7298
- feat: support mixed-language FTS stop words by @Xuanwo in #7324
- feat(scalar): expose LogicalScalarIndex::try_new and load_named_scalar_segments by @LuQQiu in #7339
- feat: allow tuning miniblock value chunks to 32k by @Xuanwo in #7356
Bug Fixes 🐛
- fix(encoding): plan sparse structural miniblock pages by @Xuanwo in #6787
- fix(java): resolve JNI classloader bug on dispatcher thread in Spark by @sezruby in #6946
- fix: preserve zero-length buffers in binary copy compaction by @zhangyue19921010 in #6992
- fix(storage): retry throttled fts metadata listing by @BubbleCal in #6994
- fix: support multivector IVF centroids in segment builds by @ddupg in #6995
- fix(python): validate BFloat16.from_bytes length by @ddupg in #6998
- fix(test): tolerate boundary-tie in IVF distance_range assertions by @xuanyu-z in #6999
- fix(datafusion): coerce filter literals for dictionary-encoded columns by @valkum in #7003
- fix: stream object copies larger than cloud's CopyObject limit by @vivek-bharathan in #7004
- fix: restore simple FTS tokenizer default by @Xuanwo in #7006
- fix: expose Hugging Face download mode by @Xuanwo in #7022
- fix(python): clamp target partition sizing by @ddupg in #7036
- fix(fts): handle empty query tokens in flat full-text search by @vivek-bharathan in #7046
- fix(lance-index): fix some flaky tests by @XuQianJin-Stars in #7052
- fix: specify roaring's patch version by @HuaHuaY in #7056
- fix(filtered-read): record IO metrics even when filter matches no rows by @westonpace in #7057
- fix: handle nested JSON conversion recursively by @Xuanwo in #7060
- fix: retry S3 multipart request timeouts by @Xuanwo in #7061
- fix(fts): size cached posting lists by referenced slice by @vivek-bharathan in #7068
- fix: cap exec-node parallelism to DataFusion target_partitions by @wjones127 in #7087
- fix(mem-wal): fence predecessor with a WAL sentinel on claim by @hamersaw in #7110
- fix(fts): reset TokenSet next_id and total_length after remap by @vivek-bharathan in #7115
- fix(linalg): reduce cosine bench TOTAL to avoid FixedSizeBinaryArray i32 overflow by @westonpace ...
v8.0.0-rc.3
What's Changed
Breaking Changes 🛠
- feat!: migrate bitmap to index segment based by @Xuanwo in #6869
- fix(python)!: derive index type from details instead of opening the index by @wjones127 in #6903
- refactor!: remove index segment builder by @Xuanwo in #6997
- refactor(index)!: move distributed BTree build to segmented index framework by @zhangyue19921010 in #7013
- feat!: return write summaries from file writers by @Xuanwo in #7096
- perf!: avoid listing index files after writes by @Xuanwo in #7129
- fix(dataset)!: fail-fast casting for columns with attached indices by @WenDing-Y in #7158
- feat(vector)!: add approx mode for RaBitQ search by @BubbleCal in #7179
- perf(vector)!: add dedicated SIMD kernels for RaBitQ ex-code reranking by @BubbleCal in #7205
- refactor!: rename FMIndexIndexDetails to FMIndexDetails by @westonpace in #7397
Critical Fixes ‼️
- fix: merge_insert silently drops matches when a leading payload column is all-null by @Ar-maan05 in #7251
New Features 🎉
- feat: expose tracked_files and all_files on LanceDataset by @wjones127 in #6011
- feat: add commit timeout to CommitBuilder by @wjones127 in #6773
- feat: add segmented BTree index merge_segments support by @zhangyue19921010 in #6889
- feat(index): add streaming ivf kmeans training by @BubbleCal in #6913
- feat: use indexes to accelerate filtered count_rows by @westonpace in #6916
- feat(java): allow schema override for fragment writes by @beinan in #6919
- feat: make ICU the default FTS tokenizer by @Xuanwo in #6968
- feat: support Utf8View and BinaryView in encoding and filter coercion by @xuanyu-z in #6985
- feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds by @gstamatakis95 in #7014
- feat(blobv2): support all BlobKind types in blob v2 compact_files by @yyzhao2025 in #7017
- feat: add TOS object store support via OpenDAL by @ddupg in #7019
- feat(index): implement FM-Index scalar index for exact substring search by @beinan in #7026
- feat(python): expose external blob mode and outside-base option for fragments by @plotor in #7028
- feat(lance-io): add GooseFS object store provider by @XuQianJin-Stars in #7034
- feat(index): support multi-bit IVF_RQ storage by @BubbleCal in #7038
- feat: expose getters for ScalarIndexExec by @LuQQiu in #7039
- feat: expose methods and getters of scalar index for possible distributed execution by @LuQQiu in #7045
- feat(mem_wal): configurable HNSW build params for MemTable writers by @touch-of-grey in #7054
- feat: dedup FTS results across LSM tiers in LsmFtsSearchPlanner by @hamersaw in #7066
- feat(index): support raw-query ivf rq search by @BubbleCal in #7078
- feat: add EnforceDistribution to physical optimizer by @wjones127 in #7086
- feat(java): add missing scanner and merge insert params to align with Python/Rust by @WenDing-Y in #7100
- feat: populate enriched IndexContent fields in dir namespace ListTableIndices by @wjones127 in #7109
- feat(index): support configurable multi-segment FM-Index builds by @beinan in #7123
- feat: support merging zonemap index segments by @Xuanwo in #7128
- feat(index): accelerate regex and infix LIKE with the ngram index by @wombatu-kun in #7139
- feat(rust): add cleanup explain API by @yanghua in #7147
- feat: stabilize cache codec with a versioned envelope by @wjones127 in #7163
- feat(lance-select): expose selected rows accessor on NullableRowAddrSet by @LuQQiu in #7164
- feat: branch-aware table version ops in directory and rest namespaces by @brendanclement in #7166
- feat(python): expose segment FTS build through create_index_uncommitted by @ddupg in #7170
- feat(python): expose zonemap segment builds by @everySympathy in #7177
- feat(dir-catalog): add reader/writer feature flags to __manifest by @jackye1995 in #7191
- feat(index): expose per-query I/O metrics on ANN operators by @wombatu-kun in #7204
- feat(mem-wal): snapshot-consistent as-of cut for fresh-tier membership by @hamersaw in #7215
- feat: expose io_buffer_size in CompactionOptions by @aimanmalib in #7226
- fix(python): expose stable row id property in stub by @BubbleCal in #7249
- feat: bump lance-namespace-reqwest-client to 0.8.6 (source_task_size) by @justinrmiller in #7254
- feat: configure blob inline threshold per column by @Xuanwo in #7269
- feat(mem_wal): warm flushed generations into shared caches before query by @hamersaw in #7284
- feat(java): expose RTree scalar index type to Java by @zhangyue19921010 in #7291
- feat: expose session cache key inventory by @jackye1995 in #7298
- feat: support mixed-language FTS stop words by @Xuanwo in #7324
- feat(scalar): expose LogicalScalarIndex::try_new and load_named_scalar_segments by @LuQQiu in #7339
- feat: allow tuning miniblock value chunks to 32k by @Xuanwo in #7356
Bug Fixes 🐛
- fix(encoding): plan sparse structural miniblock pages by @Xuanwo in #6787
- fix(java): resolve JNI classloader bug on dispatcher thread in Spark by @sezruby in #6946
- fix: preserve zero-length buffers in binary copy compaction by @zhangyue19921010 in #6992
- fix(storage): retry throttled fts metadata listing by @BubbleCal in #6994
- fix: support multivector IVF centroids in segment builds by @ddupg in #6995
- fix(python): validate BFloat16.from_bytes length by @ddupg in #6998
- fix(test): tolerate boundary-tie in IVF distance_range assertions by @xuanyu-z in #6999
- fix(datafusion): coerce filter literals for dictionary-encoded columns by @valkum in #7003
- fix: stream object copies larger than cloud's CopyObject limit by @vivek-bharathan in #7004
- fix: restore simple FTS tokenizer default by @Xuanwo in #7006
- fix: expose Hugging Face download mode by @Xuanwo in #7022
- fix(python): clamp target partition sizing by @ddupg in #7036
- fix(fts): handle empty query tokens in flat full-text search by @vivek-bharathan in #7046
- fix(lance-index): fix some flaky tests by @XuQianJin-Stars in #7052
- fix: specify roaring's patch version by @HuaHuaY in #7056
- fix(filtered-read): record IO metrics even when filter matches no rows by @westonpace in #7057
- fix: handle nested JSON conversion recursively by @Xuanwo in #7060
- fix: retry S3 multipart request timeouts by @Xuanwo in #7061
- fix(fts): size cached posting lists by referenced slice by @vivek-bharathan in #7068
- fix: cap exec-node parallelism to DataFusion target_partitions by @wjones127 in #7087
- fix(mem-wal): fence predecessor with a WAL sentinel on claim by @hamersaw in #7110
- fix(fts): reset TokenSet next_id and total_length after remap by @vivek-bharathan in #7115
- fix(linalg): reduce cosine bench TOTAL to avoid FixedSizeBinaryArray i32 overflow by @weston...
v9.0.0-beta.9
What's Changed
New Features 🎉
- feat(index): support distributed LabelList scalar index builds by @jackye1995 in #7223
- feat(schema_evolution): allow Dict <-> value-type casts in alter_columns by @valkum in #7289
- feat(mem-wal): add ShardWriter::abort + Sealed manifest fence for drop-table by @hamersaw in #7361
- feat(file): v2 writer/reader support columns of unequal length by @wjones127 in #7406
- feat: add ICU split tokenizer variant by @Xuanwo in #7474
- feat(mem_wal): tombstone-preserving point lookup by @hamersaw in #7482
- feat(mem_wal): non-blocking ShardWriter::delete_no_wait by @hamersaw in #7483
- feat(bitpacking): add owned bitpacking codecs by @BubbleCal in #7496
Bug Fixes 🐛
- fix(compaction): reject defer_index_remap with stable row IDs by @zhangyue19921010 in #7468
- fix: account for SQ offset in dot distance by @Xuanwo in #7481
- fix(merge-insert): apply Delete/Fail on the indexed-scan path by @hamersaw in #7484
- fix: invalidate indices for fields rewritten by Merge by @wkalt in #7491
- fix(index): implement with_io_priority for FailNewFileStore test store by @LuciferYang in #7495
- fix: improve FM index query performance by @jackye1995 in #7507
Documentation 📚
- docs(python): correct scalar index type count by @hfutatzhanghb in #7353
- docs: correct file-format version matrix for 2.2 stable / 2.3 unstable by @LuciferYang in #7479
Performance Improvements 🚀
- perf(index): parallelize FMIndex partition builds by @jackye1995 in #7422
- perf(fts): prewarm larger chunks concurrently by @BubbleCal in #7436
Full Changelog: v9.0.0-beta.8...v9.0.0-beta.9