feat: avoid stack embedding for every search#1586
Conversation
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Signed-off-by: Ge Jin <gejin@berkeley.edu>
|
I have also created a test to verify whether the _embedding_map correctly updates when the user inserts or deletes a document from the list. |
JoanFM
left a comment
There was a problem hiding this comment.
Make sure to have extensive testing, and to measure performance impact of the PR
I have performed extensive testing on the query function as requested. Here are the results: Data: A list of documents (using DocList) with 3 embedding tensors, each of length 128. The length of the list is denoted as 'm'. The implementation 'find_batched' function can be divided into three parts:
When m = 10,000 and n = 5,000, the timings for the first batched find (all times are in seconds) are as follows: In the second batched find (using the same query, thus utilizing the cache version), the timings are as follows: When m = 20,000 and n = 10,000, the timings for the first batched find are: In the second batched find: On average, the optimized part (part one) shows a 5-10 times improvement in speed. However, since the most time-consuming part is part three, the overall optimization is not as significant. It may be beneficial to focus on optimizing the third part to achieve better performance. |
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Can you add a |
|
Can this one help @maxwelljin ? #1598 |
Signed-off-by: Ge Jin <gejin@berkeley.edu>
Yeah, it's very helpful to include this optimization. I've included my test code in this pull request. To reproduce the results, we need to add a timer inside the find_batch function in in_memory.py. Specifically, I use the perf_counter function from the time module. I have created four breakpoints. The first one is before the line The potential gain from #1598 is as follows: Before the simplification: The part 1 takes 0.14537029600614915 seconds The part 1 takes 0.018737553000391927 seconds After the simplification: The part 1 takes 0.1592650849997881 seconds The part 1 takes 0.016888011996343266 seconds I guess the reason for this is that the first two parts of the code make use of built-in optimizations from the numpy/torch module, which can significantly improve performance. |
Signed-off-by: Ge Jin <gejin@berkeley.edu>
| search_field='tensor', | ||
| limit=7, | ||
| metric=metric, | ||
| cache={}, |
There was a problem hiding this comment.
can we remove this cache, it should not be considered when using find.
| search_field='tensor', | ||
| limit=7, | ||
| metric=metric, | ||
| cache={}, |
|
I have finished a more thorough testing of our optimization on my local machine, and I would like to share the results: Number of queries: 10000 Number of queries: 10000 Number of queries: 10000 Number of queries: 10000 |
Signed-off-by: Ge Jin <gejin@berkeley.edu>
caabceb to
ca85ab3
Compare
In the current design of the ExactNNSearchIndexer instance, each time a search is performed, all the embeddings within the index are re-stacked during every 'find' operation.
To optimize this, we've introduced an 'embedding_map'. The initialization of this map is done in a lazy manner - meaning that embeddings are added to the map only when they are explicitly called for index. Embeddings are added to this map only when required, improving batch vector search speed.
Rebuilding of embeddings is only triggered upon user insertion or deletion.