Releases: lance-format/lance

v9.0.0-beta.15

Pre-release

lance-community released this 03 Jul 23:17

What's Changed

Bug Fixes 🐛

  • fix: treat blob descriptors as opaque projections by @Xuanwo in #7618

Full Changelog: v9.0.0-beta.14...v9.0.0-beta.15

v9.0.0-beta.14

Pre-release

lance-community released this 03 Jul 16:55

What's Changed

New Features 🎉

  • feat(blob-v2): configure packed blob file max size with field metadata by @jo-migo in #7322
  • feat: support per-base storage options with base_. prefix by @jackye1995 in #7608
  • feat: support multi-base tables in merge insert with target base routing by @jackye1995 in #7610

Bug Fixes 🐛

  • fix: chunk ngram posting list writes by byte size to avoid i32 offset overflow by @lancedb-robot in #7607
  • fix: preserve DataFile base_id in DataReplacement commits by @jackye1995 in #7609

Full Changelog: v9.0.0-beta.13...v9.0.0-beta.14

v9.0.0-beta.13

Pre-release

lance-community released this 03 Jul 13:13

What's Changed

Bug Fixes 🐛

  • fix: preserve structural page load errors by @ddupg in #7571

Full Changelog: v9.0.0-beta.12...v9.0.0-beta.13

v9.0.0-beta.12

Pre-release

lance-community released this 03 Jul 03:44

What's Changed

New Features 🎉

  • feat(compaction): introduce RowAddrRemap structure to avoid remap OOM caused by HashMap by @zhangyue19921010 in #7237
  • feat(index): read a column's min/max from ZoneMap without a scan by @Ali2Arslan in #7463
  • feat: support update with blob encoding column by @nyl3532016 in #7579

Bug Fixes 🐛

  • fix(index): record IVF_HNSW index file sizes after writing footer by @Ali2Arslan in #7461
  • fix(index): accept large_string (LargeUtf8) for BTREE and ZONEMAP scalar indices by @FANNG1 in #7525
  • fix(cache): count key footprint in MokaCacheBackend eviction weight by @Ali2Arslan in #7573
  • fix(index): exclude shared metadata cache from LanceIndexStore deep size by @Ali2Arslan in #7574

Documentation 📚

  • docs(arrow): add SAFETY comments to lance-arrow unsafe blocks by @LuciferYang in #7511

Performance Improvements 🚀

  • perf(index): coalesce concurrent scalar-index opens (single-flight) by @Ali2Arslan in #7464
  • perf(dataset): reuse session-cached manifest when opening a dataset by @zhangyue19921010 in #7576
  • perf(filtered-read): lower hot-path read/decode spans to debug by @LuQQiu in #7590

Full Changelog: v9.0.0-beta.11...v9.0.0-beta.12

v9.0.0-beta.11

Pre-release

lance-community released this 02 Jul 16:55

What's Changed

Bug Fixes 🐛

  • fix(core): allow all-null Map columns in schema evolution by @Ali2Arslan in #7462
  • fix(index): preserve schema metadata when re-serializing loaded HNSW by @yanghua in #7476
  • fix(rowids): tolerate sparse overlapping chunks in the stable row id index by @wkalt in #7480
  • fix(arrow): preserve inner nulls in convert_to_floating_point by @LuciferYang in #7498
  • fix: avoid stack overflow on deep logical filters by @Xuanwo in #7510
  • fix(compaction): exclude system indices from compaction binning by @xuanyu-z in #7516
  • fix(index): work around rustc nightly ICE in NGramIndexBuilder::stream_spill_reader by @westonpace in #7534
  • fix: recover from stale cached manifest size on read by @wkalt in #7542
  • fix(lance-encoding): rebase nested-list offsets correctly across pages by @yesunbmh in #7546
  • fix(lance-io): include goosefs feature in DEFAULT_CLOUD_BLOCK_SIZE cfg gate by @XuQianJin-Stars in #7570

Documentation 📚

  • docs: improve phrasing and update format spec image by @prrao87 in #7505
  • docs: add FTS v2 migration note by @BubbleCal in #7522
  • docs: document read_blobs for blob payload reads by @Xuanwo in #7530
  • docs: add blob v2 overview images by @Xuanwo in #7581
  • docs(python): document beta release install index by @Xuanwo in #7582

Performance Improvements 🚀

  • perf: merge half-open range queries on the same BTree index by @xloya in #7477
  • perf(arrow): avoid per-element allocation in BFloat16Array::from by @LuciferYang in #7500
  • perf(index): skip FM-Index rebuild when merging a single fully-live segment by @jackye1995 in #7569

Other Changes

  • refactor(index): introduce RowIdRemapper trait to decouple ScalarIndexPlugin from FragReuseIndex by @westonpace in #7394
  • refactor(index): extract BasicTrainer trait from ScalarIndexPlugin by @westonpace in #7395

Full Changelog: v9.0.0-beta.10...v9.0.0-beta.11

v8.0.0

lance-community released this 01 Jul 14:56

What's Changed

Breaking Changes 🛠

  • feat!: migrate bitmap to index segment based by @Xuanwo in #6869
  • fix(python)!: derive index type from details instead of opening the index by @wjones127 in #6903
  • refactor!: remove index segment builder by @Xuanwo in #6997
  • refactor(index)!: move distributed BTree build to segmented index framework by @zhangyue19921010 in #7013
  • feat!: return write summaries from file writers by @Xuanwo in #7096
  • perf!: avoid listing index files after writes by @Xuanwo in #7129
  • fix(dataset)!: fail-fast casting for columns with attached indices by @WenDing-Y in #7158
  • feat(vector)!: add approx mode for RaBitQ search by @BubbleCal in #7179
  • perf(vector)!: add dedicated SIMD kernels for RaBitQ ex-code reranking by @BubbleCal in #7205
  • refactor!: rename FMIndexIndexDetails to FMIndexDetails by @westonpace in #7397

Critical Fixes ‼️

  • fix: merge_insert silently drops matches when a leading payload column is all-null by @Ar-maan05 in #7251

New Features 🎉

  • feat: expose tracked_files and all_files on LanceDataset by @wjones127 in #6011
  • feat: add commit timeout to CommitBuilder by @wjones127 in #6773
  • feat: add segmented BTree index merge_segments support by @zhangyue19921010 in #6889
  • feat(index): add streaming ivf kmeans training by @BubbleCal in #6913
  • feat: use indexes to accelerate filtered count_rows by @westonpace in #6916
  • feat(java): allow schema override for fragment writes by @beinan in #6919
  • feat: make ICU the default FTS tokenizer by @Xuanwo in #6968
  • feat: support Utf8View and BinaryView in encoding and filter coercion by @xuanyu-z in #6985
  • feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds by @gstamatakis95 in #7014
  • feat(blobv2): support all BlobKind types in blob v2 compact_files by @yyzhao2025 in #7017
  • feat: add TOS object store support via OpenDAL by @ddupg in #7019
  • feat(index): implement FM-Index scalar index for exact substring search by @beinan in #7026
  • feat(python): expose external blob mode and outside-base option for fragments by @plotor in #7028
  • feat(lance-io): add GooseFS object store provider by @XuQianJin-Stars in #7034
  • feat(index): support multi-bit IVF_RQ storage by @BubbleCal in #7038
  • feat: expose getters for ScalarIndexExec by @LuQQiu in #7039
  • feat: expose methods and getters of scalar index for possible distributed execution by @LuQQiu in #7045
  • feat(mem_wal): configurable HNSW build params for MemTable writers by @touch-of-grey in #7054
  • feat: dedup FTS results across LSM tiers in LsmFtsSearchPlanner by @hamersaw in #7066
  • feat(index): support raw-query ivf rq search by @BubbleCal in #7078
  • feat: add EnforceDistribution to physical optimizer by @wjones127 in #7086
  • feat(java): add missing scanner and merge insert params to align with Python/Rust by @WenDing-Y in #7100
  • feat: populate enriched IndexContent fields in dir namespace ListTableIndices by @wjones127 in #7109
  • feat(index): support configurable multi-segment FM-Index builds by @beinan in #7123
  • feat: support merging zonemap index segments by @Xuanwo in #7128
  • feat(index): accelerate regex and infix LIKE with the ngram index by @wombatu-kun in #7139
  • feat(rust): add cleanup explain API by @yanghua in #7147
  • feat: stabilize cache codec with a versioned envelope by @wjones127 in #7163
  • feat(lance-select): expose selected rows accessor on NullableRowAddrSet by @LuQQiu in #7164
  • feat: branch-aware table version ops in directory and rest namespaces by @brendanclement in #7166
  • feat(python): expose segment FTS build through create_index_uncommitted by @ddupg in #7170
  • feat(python): expose zonemap segment builds by @everySympathy in #7177
  • feat(dir-catalog): add reader/writer feature flags to __manifest by @jackye1995 in #7191
  • feat(index): expose per-query I/O metrics on ANN operators by @wombatu-kun in #7204
  • feat(mem-wal): snapshot-consistent as-of cut for fresh-tier membership by @hamersaw in #7215
  • feat: expose io_buffer_size in CompactionOptions by @aimanmalib in #7226
  • fix(python): expose stable row id property in stub by @BubbleCal in #7249
  • feat: bump lance-namespace-reqwest-client to 0.8.6 (source_task_size) by @justinrmiller in #7254
  • feat: configure blob inline threshold per column by @Xuanwo in #7269
  • feat(mem_wal): warm flushed generations into shared caches before query by @hamersaw in #7284
  • feat(java): expose RTree scalar index type to Java by @zhangyue19921010 in #7291
  • feat: expose session cache key inventory by @jackye1995 in #7298
  • feat: support mixed-language FTS stop words by @Xuanwo in #7324
  • feat(scalar): expose LogicalScalarIndex::try_new and load_named_scalar_segments by @LuQQiu in #7339
  • feat: allow tuning miniblock value chunks to 32k by @Xuanwo in #7356

Bug Fixes 🐛

  • fix(encoding): plan sparse structural miniblock pages by @Xuanwo in #6787
  • fix(java): resolve JNI classloader bug on dispatcher thread in Spark by @sezruby in #6946
  • fix: preserve zero-length buffers in binary copy compaction by @zhangyue19921010 in #6992
  • fix(storage): retry throttled fts metadata listing by @BubbleCal in #6994
  • fix: support multivector IVF centroids in segment builds by @ddupg in #6995
  • fix(python): validate BFloat16.from_bytes length by @ddupg in #6998
  • fix(test): tolerate boundary-tie in IVF distance_range assertions by @xuanyu-z in #6999
  • fix(datafusion): coerce filter literals for dictionary-encoded columns by @valkum in #7003
  • fix: stream object copies larger than cloud's CopyObject limit by @vivek-bharathan in #7004
  • fix: restore simple FTS tokenizer default by @Xuanwo in #7006
  • fix: expose Hugging Face download mode by @Xuanwo in #7022
  • fix(python): clamp target partition sizing by @ddupg in #7036
  • fix(fts): handle empty query tokens in flat full-text search by @vivek-bharathan in #7046
  • fix(lance-index): fix some flaky tests by @XuQianJin-Stars in #7052
  • fix: specify roaring's patch version by @HuaHuaY in #7056
  • fix(filtered-read): record IO metrics even when filter matches no rows by @westonpace in #7057
  • fix: handle nested JSON conversion recursively by @Xuanwo in #7060
  • fix: retry S3 multipart request timeouts by @Xuanwo in #7061
  • fix(fts): size cached posting lists by referenced slice by @vivek-bharathan in #7068
  • fix: cap exec-node parallelism to DataFusion target_partitions by @wjones127 in #7087
  • fix(mem-wal): fence predecessor with a WAL sentinel on claim by @hamersaw in #7110
  • fix(fts): reset TokenSet next_id and total_length after remap by @vivek-bharathan in #7115
  • fix(linalg): reduce cosine bench TOTAL to avoid FixedSizeBinaryArray i32 overflow by @westonpace ...

v8.0.0-rc.3

Pre-release

lance-community released this 30 Jun 10:59

What's Changed

Breaking Changes 🛠

  • feat!: migrate bitmap to index segment based by @Xuanwo in #6869
  • fix(python)!: derive index type from details instead of opening the index by @wjones127 in #6903
  • refactor!: remove index segment builder by @Xuanwo in #6997
  • refactor(index)!: move distributed BTree build to segmented index framework by @zhangyue19921010 in #7013
  • feat!: return write summaries from file writers by @Xuanwo in #7096
  • perf!: avoid listing index files after writes by @Xuanwo in #7129
  • fix(dataset)!: fail-fast casting for columns with attached indices by @WenDing-Y in #7158
  • feat(vector)!: add approx mode for RaBitQ search by @BubbleCal in #7179
  • perf(vector)!: add dedicated SIMD kernels for RaBitQ ex-code reranking by @BubbleCal in #7205
  • refactor!: rename FMIndexIndexDetails to FMIndexDetails by @westonpace in #7397

Critical Fixes ‼️

  • fix: merge_insert silently drops matches when a leading payload column is all-null by @Ar-maan05 in #7251

New Features 🎉

  • feat: expose tracked_files and all_files on LanceDataset by @wjones127 in #6011
  • feat: add commit timeout to CommitBuilder by @wjones127 in #6773
  • feat: add segmented BTree index merge_segments support by @zhangyue19921010 in #6889
  • feat(index): add streaming ivf kmeans training by @BubbleCal in #6913
  • feat: use indexes to accelerate filtered count_rows by @westonpace in #6916
  • feat(java): allow schema override for fragment writes by @beinan in #6919
  • feat: make ICU the default FTS tokenizer by @Xuanwo in #6968
  • feat: support Utf8View and BinaryView in encoding and filter coercion by @xuanyu-z in #6985
  • feat(python): add shared RaBitQ rotation for distributed IVF_RQ builds by @gstamatakis95 in #7014
  • feat(blobv2): support all BlobKind types in blob v2 compact_files by @yyzhao2025 in #7017
  • feat: add TOS object store support via OpenDAL by @ddupg in #7019
  • feat(index): implement FM-Index scalar index for exact substring search by @beinan in #7026
  • feat(python): expose external blob mode and outside-base option for fragments by @plotor in #7028
  • feat(lance-io): add GooseFS object store provider by @XuQianJin-Stars in #7034
  • feat(index): support multi-bit IVF_RQ storage by @BubbleCal in #7038
  • feat: expose getters for ScalarIndexExec by @LuQQiu in #7039
  • feat: expose methods and getters of scalar index for possible distributed execution by @LuQQiu in #7045
  • feat(mem_wal): configurable HNSW build params for MemTable writers by @touch-of-grey in #7054
  • feat: dedup FTS results across LSM tiers in LsmFtsSearchPlanner by @hamersaw in #7066
  • feat(index): support raw-query ivf rq search by @BubbleCal in #7078
  • feat: add EnforceDistribution to physical optimizer by @wjones127 in #7086
  • feat(java): add missing scanner and merge insert params to align with Python/Rust by @WenDing-Y in #7100
  • feat: populate enriched IndexContent fields in dir namespace ListTableIndices by @wjones127 in #7109
  • feat(index): support configurable multi-segment FM-Index builds by @beinan in #7123
  • feat: support merging zonemap index segments by @Xuanwo in #7128
  • feat(index): accelerate regex and infix LIKE with the ngram index by @wombatu-kun in #7139
  • feat(rust): add cleanup explain API by @yanghua in #7147
  • feat: stabilize cache codec with a versioned envelope by @wjones127 in #7163
  • feat(lance-select): expose selected rows accessor on NullableRowAddrSet by @LuQQiu in #7164
  • feat: branch-aware table version ops in directory and rest namespaces by @brendanclement in #7166
  • feat(python): expose segment FTS build through create_index_uncommitted by @ddupg in #7170
  • feat(python): expose zonemap segment builds by @everySympathy in #7177
  • feat(dir-catalog): add reader/writer feature flags to __manifest by @jackye1995 in #7191
  • feat(index): expose per-query I/O metrics on ANN operators by @wombatu-kun in #7204
  • feat(mem-wal): snapshot-consistent as-of cut for fresh-tier membership by @hamersaw in #7215
  • feat: expose io_buffer_size in CompactionOptions by @aimanmalib in #7226
  • fix(python): expose stable row id property in stub by @BubbleCal in #7249
  • feat: bump lance-namespace-reqwest-client to 0.8.6 (source_task_size) by @justinrmiller in #7254
  • feat: configure blob inline threshold per column by @Xuanwo in #7269
  • feat(mem_wal): warm flushed generations into shared caches before query by @hamersaw in #7284
  • feat(java): expose RTree scalar index type to Java by @zhangyue19921010 in #7291
  • feat: expose session cache key inventory by @jackye1995 in #7298
  • feat: support mixed-language FTS stop words by @Xuanwo in #7324
  • feat(scalar): expose LogicalScalarIndex::try_new and load_named_scalar_segments by @LuQQiu in #7339
  • feat: allow tuning miniblock value chunks to 32k by @Xuanwo in #7356

Bug Fixes 🐛

  • fix(encoding): plan sparse structural miniblock pages by @Xuanwo in #6787
  • fix(java): resolve JNI classloader bug on dispatcher thread in Spark by @sezruby in #6946
  • fix: preserve zero-length buffers in binary copy compaction by @zhangyue19921010 in #6992
  • fix(storage): retry throttled fts metadata listing by @BubbleCal in #6994
  • fix: support multivector IVF centroids in segment builds by @ddupg in #6995
  • fix(python): validate BFloat16.from_bytes length by @ddupg in #6998
  • fix(test): tolerate boundary-tie in IVF distance_range assertions by @xuanyu-z in #6999
  • fix(datafusion): coerce filter literals for dictionary-encoded columns by @valkum in #7003
  • fix: stream object copies larger than cloud's CopyObject limit by @vivek-bharathan in #7004
  • fix: restore simple FTS tokenizer default by @Xuanwo in #7006
  • fix: expose Hugging Face download mode by @Xuanwo in #7022
  • fix(python): clamp target partition sizing by @ddupg in #7036
  • fix(fts): handle empty query tokens in flat full-text search by @vivek-bharathan in #7046
  • fix(lance-index): fix some flaky tests by @XuQianJin-Stars in #7052
  • fix: specify roaring's patch version by @HuaHuaY in #7056
  • fix(filtered-read): record IO metrics even when filter matches no rows by @westonpace in #7057
  • fix: handle nested JSON conversion recursively by @Xuanwo in #7060
  • fix: retry S3 multipart request timeouts by @Xuanwo in #7061
  • fix(fts): size cached posting lists by referenced slice by @vivek-bharathan in #7068
  • fix: cap exec-node parallelism to DataFusion target_partitions by @wjones127 in #7087
  • fix(mem-wal): fence predecessor with a WAL sentinel on claim by @hamersaw in #7110
  • fix(fts): reset TokenSet next_id and total_length after remap by @vivek-bharathan in #7115
  • fix(linalg): reduce cosine bench TOTAL to avoid FixedSizeBinaryArray i32 overflow by @weston...

v9.0.0-beta.9

Pre-release

lance-community released this 29 Jun 17:24

What's Changed

New Features 🎉

  • feat(index): support distributed LabelList scalar index builds by @jackye1995 in #7223
  • feat(schema_evolution): allow Dict <-> value-type casts in alter_columns by @valkum in #7289
  • feat(mem-wal): add ShardWriter::abort + Sealed manifest fence for drop-table by @hamersaw in #7361
  • feat(file): v2 writer/reader support columns of unequal length by @wjones127 in #7406
  • feat: add ICU split tokenizer variant by @Xuanwo in #7474
  • feat(mem_wal): tombstone-preserving point lookup by @hamersaw in #7482
  • feat(mem_wal): non-blocking ShardWriter::delete_no_wait by @hamersaw in #7483
  • feat(bitpacking): add owned bitpacking codecs by @BubbleCal in #7496

Bug Fixes 🐛

  • fix(compaction): reject defer_index_remap with stable row IDs by @zhangyue19921010 in #7468
  • fix: account for SQ offset in dot distance by @Xuanwo in #7481
  • fix(merge-insert): apply Delete/Fail on the indexed-scan path by @hamersaw in #7484
  • fix: invalidate indices for fields rewritten by Merge by @wkalt in #7491
  • fix(index): implement with_io_priority for FailNewFileStore test store by @LuciferYang in #7495
  • fix: improve FM index query performance by @jackye1995 in #7507

Full Changelog: v9.0.0-beta.8...v9.0.0-beta.9

v9.0.0-beta.10

Pre-release

lance-community released this 29 Jun 17:54

What's Changed

Bug Fixes 🐛

  • fix(mem-wal): apply cross-generation block-list to in-memory scan arms by @hamersaw in #7489

Full Changelog: v9.0.0-beta.9...v9.0.0-beta.10

v9.0.0-beta.8

Pre-release

lance-community released this 24 Jun 23:18

What's Changed

Bug Fixes 🐛

  • fix(index): give each FTS partition a distinct scheduler base priority by @LuQQiu in #7449

Full Changelog: v9.0.0-beta.7...v9.0.0-beta.8