Skip to content

feat: avoid stack embedding for every search#1586

Merged
JoanFM merged 17 commits into
docarray:mainfrom
maxwelljin:feat-avoid-stack-embedding
Jun 1, 2023
Merged

feat: avoid stack embedding for every search#1586
JoanFM merged 17 commits into
docarray:mainfrom
maxwelljin:feat-avoid-stack-embedding

Conversation

@maxwelljin

Copy link
Copy Markdown
Contributor

In the current design of the ExactNNSearchIndexer instance, each time a search is performed, all the embeddings within the index are re-stacked during every 'find' operation.

To optimize this, we've introduced an 'embedding_map'. The initialization of this map is done in a lazy manner - meaning that embeddings are added to the map only when they are explicitly called for index. Embeddings are added to this map only when required, improving batch vector search speed.

Rebuilding of embeddings is only triggered upon user insertion or deletion.

Signed-off-by: Ge Jin <gejin@berkeley.edu>
@maxwelljin

Copy link
Copy Markdown
Contributor Author

I have also created a test to verify whether the _embedding_map correctly updates when the user inserts or deletes a document from the list.

@maxwelljin maxwelljin marked this pull request as ready for review May 31, 2023 07:47

@JoanFM JoanFM left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to have extensive testing, and to measure performance impact of the PR

Comment thread docarray/index/backends/in_memory.py
Comment thread docarray/utils/find.py Outdated
@maxwelljin

Copy link
Copy Markdown
Contributor Author

Make sure to have extensive testing, and to measure performance impact of the PR

I have performed extensive testing on the query function as requested. Here are the results:

Data: A list of documents (using DocList) with 3 embedding tensors, each of length 128. The length of the list is denoted as 'm'.
Query: A list of documents. The length of the list is denoted as 'n'.

The implementation 'find_batched' function can be divided into three parts:

  1. Stacking the embeddings or retrieving them from the cache (embedding_map).
  2. Using the comp_backend to perform matrix multiplication to calculate similarity and obtain the top 'k' results (only the indices).
  3. For each query (since it's a batch query), adding the relevant documents to the result array. (the code of part 3 starts at this line)
    for _, (indices_per_query, scores_per_query) in enumerate( zip(top_indices, top_scores) ):

When m = 10,000 and n = 5,000, the timings for the first batched find (all times are in seconds) are as follows:
Time for the 1st part: 0.04855059299734421
Time for the 2nd part: 0.239690676004102
Time for the 3rd part: 2.9729396229959093

In the second batched find (using the same query, thus utilizing the cache version), the timings are as follows:
Time for the 1st part: 0.011098969000158831
Time for the 2nd part: 0.08534913699986646
Time for the 3rd part: 3.03791297099815

When m = 20,000 and n = 10,000, the timings for the first batched find are:
Time for the 1st part: 0.15310250900074607
Time for the 2nd part: 0.6652847430013935
Time for the 3rd part: 5.121941960998811

In the second batched find:
Time for the 1st part: 0.01820297400263371
Time for the 2nd part: 0.3696962500034715
Time for the 3rd part: 4.840899665003235

On average, the optimized part (part one) shows a 5-10 times improvement in speed. However, since the most time-consuming part is part three, the overall optimization is not as significant. It may be beneficial to focus on optimizing the third part to achieve better performance.

Signed-off-by: Ge Jin <gejin@berkeley.edu>
@JoanFM

JoanFM commented May 31, 2023

Copy link
Copy Markdown
Member

Make sure to have extensive testing, and to measure performance impact of the PR

I have performed extensive testing on the query function as requested. Here are the results:

Data: A list of documents (using DocList) with 3 embedding tensors, each of length 128. The length of the list is denoted as 'm'. Query: A list of documents. The length of the list is denoted as 'n'.

The implementation 'find_batched' function can be divided into three parts:

  1. Stacking the embeddings or retrieving them from the cache (embedding_map).
  2. Using the comp_backend to perform matrix multiplication to calculate similarity and obtain the top 'k' results (only the indices).
  3. For each query (since it's a batch query), adding the relevant documents to the result array. (the code of part 3 starts at this line)
    for _, (indices_per_query, scores_per_query) in enumerate( zip(top_indices, top_scores) ):

When m = 10,000 and n = 5,000, the timings for the first batched find (all times are in seconds) are as follows: Time for the 1st part: 0.04855059299734421 Time for the 2nd part: 0.239690676004102 Time for the 3rd part: 2.9729396229959093

In the second batched find (using the same query, thus utilizing the cache version), the timings are as follows: Time for the 1st part: 0.011098969000158831 Time for the 2nd part: 0.08534913699986646 Time for the 3rd part: 3.03791297099815

When m = 20,000 and n = 10,000, the timings for the first batched find are: Time for the 1st part: 0.15310250900074607 Time for the 2nd part: 0.6652847430013935 Time for the 3rd part: 5.121941960998811

In the second batched find: Time for the 1st part: 0.01820297400263371 Time for the 2nd part: 0.3696962500034715 Time for the 3rd part: 4.840899665003235

On average, the optimized part (part one) shows a 5-10 times improvement in speed. However, since the most time-consuming part is part three, the overall optimization is not as significant. It may be beneficial to focus on optimizing the third part to achieve better performance.

Can you add a script to reproduce the results? And also compare with the potential gain in #1598

@JoanFM

JoanFM commented May 31, 2023

Copy link
Copy Markdown
Member

Can this one help @maxwelljin ? #1598

Signed-off-by: Ge Jin <gejin@berkeley.edu>
@maxwelljin

Copy link
Copy Markdown
Contributor Author

Can this one help @maxwelljin ? #1598

Yeah, it's very helpful to include this optimization. I've included my test code in this pull request. To reproduce the results, we need to add a timer inside the find_batch function in in_memory.py. Specifically, I use the perf_counter function from the time module. I have created four breakpoints. The first one is before the line if cache is not None and search_field in cache:. The second one is before the line metric_fn = getattr(comp_backend.Metrics, metric). The third one is at the line batched_docs: List[DocList] = []. The fourth one is at the end of the function. By measuring the time difference between these breakpoints, we can evaluate the performance improvements.

The potential gain from #1598 is as follows:

Before the simplification:

The part 1 takes 0.14537029600614915 seconds
The part 2 takes 0.49421013099345146 seconds
The part 3 takes 4.325345135002863 seconds

The part 1 takes 0.018737553000391927 seconds
The part 2 takes 0.341395959003421 seconds
The part 3 takes 4.513382460994762 seconds

After the simplification:

The part 1 takes 0.1592650849997881 seconds
The part 2 takes 0.6122807069987175 seconds
The part 3 takes 2.7912922600007732 seconds

The part 1 takes 0.016888011996343266 seconds
The part 2 takes 0.3044082790001994 seconds
The part 3 takes 2.9079534870033967 seconds

I guess the reason for this is that the first two parts of the code make use of built-in optimizations from the numpy/torch module, which can significantly improve performance.

Comment thread tests/index/in_memory/test_in_memory.py Outdated
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Comment thread tests/units/util/test_find.py Outdated
search_field='tensor',
limit=7,
metric=metric,
cache={},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove this cache, it should not be considered when using find.

Comment thread tests/units/util/test_find.py Outdated
search_field='tensor',
limit=7,
metric=metric,
cache={},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here

@maxwelljin

Copy link
Copy Markdown
Contributor Author

I have finished a more thorough testing of our optimization on my local machine, and I would like to share the results:

Number of queries: 10000
Number of indexed documents: 10000
Before optimization: 4.8117079977993855 seconds
After optimization: 3.1941708801983624 seconds
Speedup: 1.50x

Number of queries: 10000
Number of indexed documents: 20000
Before optimization: 4.9425261351978405 seconds
After optimization: 3.0140384888014524 seconds
Speedup: 1.64x

Number of queries: 10000
Number of indexed documents: 30000
Before optimization: 6.004549367600703 seconds
After optimization: 3.56447235160158 seconds
Speedup: 1.68x

Number of queries: 10000
Number of indexed documents: 50000
Before optimization: 9.126258195599075 seconds
After optimization: 3.4907073292008137 seconds
Speedup: 2.61x

Signed-off-by: Ge Jin <gejin@berkeley.edu>
@maxwelljin maxwelljin force-pushed the feat-avoid-stack-embedding branch from caabceb to ca85ab3 Compare June 1, 2023 07:40
@JoanFM JoanFM merged commit 110f714 into docarray:main Jun 1, 2023
@samsja samsja mentioned this pull request Jun 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ExactNNSearchIndexer can be optimized to avoid stacking embeddings every time

2 participants