Merged
15 changes: 15 additions & 0 deletions docarray/array/storage/weaviate/backend.py
@@ -33,10 +33,18 @@ class WeaviateConfig:
    name: Optional[str] = None
    serialize_config: Dict = field(default_factory=dict)
    n_dim: Optional[int] = None  # deprecated, not used anymore since weaviate 1.10
    # vectorIndexConfig parameters
    ef: Optional[int] = None
    ef_construction: Optional[int] = None
    timeout_config: Optional[Tuple[int, int]] = None
    max_connections: Optional[int] = None
    dynamic_ef_min: Optional[int] = None
    dynamic_ef_max: Optional[int] = None
    dynamic_ef_factor: Optional[int] = None
    vector_cache_max_objects: Optional[int] = None
    flat_search_cutoff: Optional[int] = None
    cleanup_interval_seconds: Optional[int] = None
    skip: Optional[bool] = None


class BackendMixin(BaseBackendMixin):
@@ -120,6 +128,13 @@ def _get_schema_by_name(self, cls_name: str) -> Dict:
            'ef': self._config.ef,
            'efConstruction': self._config.ef_construction,
            'maxConnections': self._config.max_connections,
            'dynamicEfMin': self._config.dynamic_ef_min,
            'dynamicEfMax': self._config.dynamic_ef_max,
            'dynamicEfFactor': self._config.dynamic_ef_factor,
            'vectorCacheMaxObjects': self._config.vector_cache_max_objects,
            'flatSearchCutoff': self._config.flat_search_cutoff,
            'cleanupIntervalSeconds': self._config.cleanup_interval_seconds,
            'skip': self._config.skip,
        }

        return {
30 changes: 19 additions & 11 deletions docs/advanced/document-store/weaviate.md
@@ -83,17 +83,25 @@ Other functions behave the same as in-memory DocumentArray.

The following configs can be set:

| Name | Description | Default |
|------|-------------|---------|
| `host` | Hostname of the Weaviate server. | 'localhost' |
| `port` | Port of the Weaviate server. | 8080 |
| `protocol` | Protocol to be used. Can be 'http' or 'https'. | 'http' |
| `name` | Weaviate class name; the name of the Weaviate class used to represent this DocumentArray. | None |
| `serialize_config` | [Serialization config of each Document](../../../fundamentals/document/serialization.md) | None |
| `ef` | The size of the dynamic list for the nearest neighbors (used during the search). A higher `ef` makes a search more accurate, but also slower. | `None`, defaults to the default value in Weaviate* |
| `ef_construction` | The size of the dynamic list for the nearest neighbors (used during the construction). Controls the tradeoff between index search speed and build speed. | `None`, defaults to the default value in Weaviate* |
| `timeout_config` | Set the timeout configuration for all requests to the Weaviate server. | `None`, defaults to the default value in Weaviate* |
| `max_connections` | The maximum number of connections per element in all layers. | `None`, defaults to the default value in Weaviate* |
| `dynamic_ef_min` | If using dynamic ef (`ef` set to -1), this value acts as a lower boundary. Even if the limit is small enough to suggest a lower value, `ef` will never drop below this value. This helps keep search accuracy high even with very low limits, such as 1, 2, or 3. | `None`, defaults to the default value in Weaviate* |
| `dynamic_ef_max` | If using dynamic ef (`ef` set to -1), this value acts as an upper boundary. Even if the limit is large enough to suggest a higher value, `ef` will be capped at this value. This helps keep search speed reasonable when retrieving massive result sets, e.g. 500+. | `None`, defaults to the default value in Weaviate* |
| `dynamic_ef_factor` | If using dynamic ef (`ef` set to -1), this value controls how `ef` is determined from the given limit. E.g. with a factor of 8, `ef` is set to 8*limit as long as that value lies between the lower and upper boundary; otherwise it is capped at the nearer boundary. | `None`, defaults to the default value in Weaviate* |
| `vector_cache_max_objects` | For optimal search and import performance, all previously imported vectors need to be held in memory. However, Weaviate also allows limiting the number of vectors held in memory. By default, when creating a new class, this limit is set to 2M objects. A disk lookup for a vector is orders of magnitude slower than a memory lookup, so the cache limit should be lowered only sparingly. | `None`, defaults to the default value in Weaviate* |
| `flat_search_cutoff` | Absolute number of objects used as the threshold for the flat-search cutoff. If the filter of a filtered vector search matches fewer objects than this threshold, the HNSW index is bypassed entirely and a flat (brute-force) search is performed instead. This can speed up queries with very restrictive filters considerably. Optional, defaults to 40000. Set to 0 to turn off the flat-search cutoff entirely. | `None`, defaults to the default value in Weaviate* |
| `cleanup_interval_seconds` | How often the async process runs that "repairs" the HNSW graph after deletes and updates. (Before the repair/cleanup process runs, deleted objects are only marked as deleted but remain fully connected members of the HNSW graph. After the repair has run, the edges are reassigned and the data points are deleted for good.) Typically this value does not need to be adjusted, but if deletes or updates are very frequent it might make sense to adjust it up or down. (A higher value means the process runs less frequently but cleans up more in a single batch. A lower value means it runs more frequently but might not be as efficient with each run.) | `None`, defaults to the default value in Weaviate* |
| `skip` | There are situations where it doesn't make sense to vectorize a class, for example if the class is just meant as glue between two other classes (consisting only of references) or if the class contains mostly duplicate elements. (Importing duplicate vectors into HNSW is very expensive: the algorithm checks whether a candidate's distance is higher than the worst candidate's distance as an early exit condition, and with mostly identical vectors this condition is never met, leading to an exhaustive search on each import or query.) In this case, you can skip indexing vectors altogether by setting `skip` to `True`. In Weaviate, `skip` defaults to false, so classes are indexed normally unless it is set. This setting is immutable after class initialization. | `None`, defaults to the default value in Weaviate* |


*You can read more about the HNSW parameters and their default values in the [Weaviate HNSW documentation](https://weaviate.io/developers/weaviate/current/vector-index-plugins/hnsw.html#how-to-use-hnsw-and-parameters).
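
The interaction of `ef`, `dynamic_ef_min`, `dynamic_ef_max`, and `dynamic_ef_factor` described in the table can be sketched as a simple clamp. This is only an illustration of the documented rule, not Weaviate's actual code: the function name `effective_ef` is hypothetical, and the values in the usage lines are made-up examples rather than Weaviate defaults.

```python
def effective_ef(
    limit: int,
    ef: int,
    dynamic_ef_min: int,
    dynamic_ef_max: int,
    dynamic_ef_factor: int,
) -> int:
    """Return the ef used for a query with the given result limit.

    Hypothetical helper that mirrors the rule in the table above:
    a fixed ef is used as-is, while ef == -1 enables dynamic ef.
    """
    if ef != -1:
        # A fixed ef was configured, so the dynamic bounds are ignored.
        return ef
    # Dynamic ef: scale the limit by the factor, then clamp to [min, max].
    candidate = dynamic_ef_factor * limit
    return max(dynamic_ef_min, min(dynamic_ef_max, candidate))


# Example values chosen only for illustration.
small = effective_ef(limit=3, ef=-1, dynamic_ef_min=100, dynamic_ef_max=500, dynamic_ef_factor=8)
medium = effective_ef(limit=20, ef=-1, dynamic_ef_min=100, dynamic_ef_max=500, dynamic_ef_factor=8)
large = effective_ef(limit=1000, ef=-1, dynamic_ef_min=100, dynamic_ef_max=500, dynamic_ef_factor=8)
```

With these example values, a small limit is raised to the lower boundary, a medium limit uses `factor * limit` directly, and a large limit is capped at the upper boundary.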

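
The mapping from the snake_case config fields to Weaviate's camelCase `vectorIndexConfig` keys can be illustrated with a small standalone sketch. The names `HnswOptions`, `to_camel`, and `build_vector_index_config` are hypothetical and not part of DocArray, and dropping unset (`None`) values is an assumption of this sketch rather than something the diff confirms.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class HnswOptions:
    """Hypothetical stand-in for the HNSW-related fields of WeaviateConfig."""

    ef: Optional[int] = None
    ef_construction: Optional[int] = None
    max_connections: Optional[int] = None
    dynamic_ef_min: Optional[int] = None
    dynamic_ef_max: Optional[int] = None
    dynamic_ef_factor: Optional[int] = None
    vector_cache_max_objects: Optional[int] = None
    flat_search_cutoff: Optional[int] = None
    cleanup_interval_seconds: Optional[int] = None
    skip: Optional[bool] = None


def to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase, e.g. dynamic_ef_min -> dynamicEfMin."""
    head, *rest = name.split('_')
    return head + ''.join(part.capitalize() for part in rest)


def build_vector_index_config(opts: HnswOptions) -> Dict[str, Any]:
    """Build a camelCase dict, keeping only fields that were explicitly set.

    Assumption: unset (None) fields are omitted so the server-side
    defaults apply; the real backend may handle this differently.
    """
    return {
        to_camel(f.name): getattr(opts, f.name)
        for f in fields(opts)
        if getattr(opts, f.name) is not None
    }


config = build_vector_index_config(
    HnswOptions(ef=-1, dynamic_ef_min=50, flat_search_cutoff=0, skip=False)
)
```

Note that the sketch filters on `is not None` rather than truthiness, so explicit values such as `0` or `False` are kept.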