feat: Added IBM Db2 vector store integration by priyanshu-krishnan1 · Pull Request #3518 · deepset-ai/haystack-core-integrations

priyanshu-krishnan1 · 2026-07-01T14:31:13Z

Related Issues

Adds IBM Db2 vector store integration for Haystack

Proposed Changes:

Added a new integration for IBM Db2 database with vector search capabilities:

Db2DocumentStore: Document store with vector similarity search using Db2's native VECTOR type
Db2EmbeddingRetriever: Retriever component for semantic search with metadata filtering
FilterTranslator: Converts Haystack filters to Db2 SQL with support for complex logical operators

How did you test it?

Unit tests for document store operations, filter translation, and connection handling
Integration tests using Docker Compose with IBM Db2 Community Edition
Haystack document store mixin tests
Manual verification with local Db2 instance

Notes for the reviewer

Follows standard Haystack document store patterns (similar to pgvector, oracle)

Checklist

I have read the contributors guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: feat: Add IBM Db2 vector store integration

socket-security · 2026-07-01T14:31:35Z

Review the following changes in direct dependencies.

Package: pypi/ibm-db@3.2.9

sjrl · 2026-07-01T15:38:31Z

Hey @bogdankostic, I was already reviewing this in #3458 (comment), so I'll take this over.

sjrl · 2026-07-02T07:08:16Z

+  - modules:
+      - haystack_integrations.document_stores.ibm_db.document_store


Missing the embedding retriever

Suggested change

-  - modules:
-      - haystack_integrations.document_stores.ibm_db.document_store
+  - modules:
+      - haystack_integrations.components.retrievers.ibm_db.embedding_retriever
+      - haystack_integrations.document_stores.ibm_db.document_store

sjrl · 2026-07-02T07:09:44Z

+all = 'pytest {args:tests}'
+unit-cov-retry = 'pytest --cov=haystack_integrations --reruns 3 --reruns-delay 30 -x -m "not integration" {args:tests}'
+integration-cov-append-retry = 'pytest --cov=haystack_integrations --cov-append --reruns 3 --reruns-delay 30 -x -m "integration" {args:tests}'
+types = "mypy -p haystack_integrations.document_stores.ibm_db {args}"


missing type checking on the retriever

Suggested change

-types = "mypy -p haystack_integrations.document_stores.ibm_db {args}"
+types = "mypy -p haystack_integrations.document_stores.ibm_db -p haystack_integrations.components.retrievers.ibm_db {args}"

sjrl · 2026-07-02T07:13:10Z

+                    # If it still fails, raise the error
+                    raise
+
+    def _validate_embedding(self, embedding: list[float] | None, allow_none: bool = True) -> None:


It looks like this method could be made static, so let's do that.

Suggested change

-    def _validate_embedding(self, embedding: list[float] | None, allow_none: bool = True) -> None:
+    @staticmethod
+    def _validate_embedding(embedding: list[float] | None, allow_none: bool = True) -> None:
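A minimal sketch of what the static version might look like. Only the signature and the `TypeError` message come from the diff; the class name `StoreSketch`, the handling of `None`, and the `ValueError` wording are assumptions for illustration, not the PR's code.

```python
from __future__ import annotations


class StoreSketch:
    """Hypothetical stand-in for the document store; not the PR's class."""

    @staticmethod
    def _validate_embedding(embedding: list[float] | None, allow_none: bool = True) -> None:
        # The method never reads instance state, so it does not need `self`.
        if embedding is None:
            if allow_none:
                return
            msg = "Embedding must not be None"  # assumed wording
            raise ValueError(msg)
        if not all(isinstance(value, (int, float)) for value in embedding):
            msg = "All embedding values must be numeric (int or float)"
            raise TypeError(msg)


# A static method can be called on the class or on an instance.
StoreSketch._validate_embedding([0.1, 2, 3.5])
StoreSketch()._validate_embedding(None)
```

Existing call sites of the form `self._validate_embedding(...)` keep working after the change, since static methods are still reachable through an instance.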

sjrl · 2026-07-02T07:13:38Z

+            msg = "All embedding values must be numeric (int or float)"
+            raise TypeError(msg)
+
+    def _to_row(self, doc: Document) -> tuple:


Let's make this method static.

Suggested change

-    def _to_row(self, doc: Document) -> tuple:
+    @staticmethod
+    def _to_row(doc: Document) -> tuple:

sjrl · 2026-07-02T07:15:45Z

This file is largely redundant and is covered by test_document_store.py. I'd cut this down to just testing the util methods (_parse_embedding, _infer_field_type, _validate_embedding) and drop the rest.

And to follow our test convention please move the unit tests for these util methods to test_document_store.py in their own test class.

sjrl · 2026-07-02T07:17:22Z

Please rename this file to test_filters.py to follow our test name convention of one test file per source file that is called the same except with a test_ prefix.

sjrl · 2026-07-02T07:26:58Z

+    def test_to_dict(self, document_store):
+        """Test serialization to dictionary."""


This uses the document_store fixture which is a live connection to the db. Could we create a mock version instead so this becomes a proper unit test?
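One possible shape for such a test, sketched with the standard library only. `FakeStore` and `RetrieverSketch` are hypothetical stand-ins rather than the PR's classes; the point is that `MagicMock(spec=...)` passes an `isinstance` check, so `to_dict` can be exercised without a database connection.

```python
from unittest.mock import MagicMock


class FakeStore:
    """Hypothetical stand-in for the real document store class."""

    def to_dict(self) -> dict:
        raise RuntimeError("the real store would need a live connection")


class RetrieverSketch:
    """Hypothetical retriever that only mirrors the serialization shape."""

    def __init__(self, document_store, top_k: int = 10, filter_policy: str = "replace"):
        if not isinstance(document_store, FakeStore):
            raise ValueError("document_store must be a FakeStore")
        self.document_store = document_store
        self.top_k = top_k
        self.filter_policy = filter_policy

    def to_dict(self) -> dict:
        return {
            "init_parameters": {
                "document_store": self.document_store.to_dict(),
                "top_k": self.top_k,
                "filter_policy": self.filter_policy,
            }
        }


def test_to_dict():
    # spec=FakeStore makes isinstance(mock_store, FakeStore) return True.
    mock_store = MagicMock(spec=FakeStore)
    mock_store.to_dict.return_value = {"type": "FakeStore", "init_parameters": {}}

    d = RetrieverSketch(document_store=mock_store, top_k=5).to_dict()

    assert d["init_parameters"]["filter_policy"] == "replace"
    assert "document_store" in d["init_parameters"]
    mock_store.to_dict.assert_called_once()


test_to_dict()
```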

sjrl · 2026-07-02T07:27:08Z

+        assert d["init_parameters"]["filter_policy"] == "replace"
+        assert "document_store" in d["init_parameters"]
+
+    def test_from_dict(self, document_store):


Same comment as here https://github.com/deepset-ai/haystack-core-integrations/pull/3518/changes#r3511217698

sjrl · 2026-07-02T07:28:21Z

+        assert result == {"documents": expected}
+        mock_store._embedding_retrieval_async.assert_awaited_once()
+
+    def test_from_dict_without_filter_policy(self):


Let's drop this test. We don't need to "# Simulate an old serialization that lacks the filter_policy field." since this is a new integration.

sjrl · 2026-07-02T07:30:06Z

+@dataclass
+class Db2ConnectionConfig:


To be more consistent with our other document store integrations I'd prefer if we could just in-line all of these options in the init method of Db2DocumentStore instead of creating a separate dataclass.

sjrl · 2026-07-02T07:33:29Z

+    hostname: str
+    port: int = 50000
+    username: str = ""
+    password: str = ""


For all sensitive information like the password, we should be using the Secret class from Haystack; otherwise we risk exposing this information, especially during serialization. See here for an example:

haystack-core-integrations/integrations/pgvector/src/haystack_integrations/document_stores/pgvector/document_store.py

Line 84 in fce1c5d

connection_string: Secret = Secret.from_env_var("PG_CONN_STR"),

As a heads-up, this means to_dict and from_dict may need to be updated to handle the Secret serde. See how it's handled in to_dict here and from_dict here.

Please also use Secret on other items you don't think should be exposed in the serialized format.
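To illustrate why an environment-variable-backed secret is safer to serialize, here is a standard-library sketch of the idea. `EnvVarSecretSketch` and the variable name `DB2_PASSWORD` are hypothetical; this is not Haystack's `Secret` class, which is what the integration should actually use.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvVarSecretSketch:
    """Hypothetical stand-in that stores a variable name, never the value."""

    env_var: str

    def resolve_value(self) -> str:
        # The value is looked up only when it is needed.
        value = os.environ.get(self.env_var)
        if value is None:
            raise ValueError(f"Environment variable {self.env_var} is not set")
        return value

    def to_dict(self) -> dict:
        # Only the variable name is written to the serialized form.
        return {"type": "env_var", "env_var": self.env_var}

    @classmethod
    def from_dict(cls, data: dict) -> "EnvVarSecretSketch":
        return cls(env_var=data["env_var"])


os.environ["DB2_PASSWORD"] = "s3cret"  # demo value only
password = EnvVarSecretSketch("DB2_PASSWORD")

serialized = password.to_dict()
restored = EnvVarSecretSketch.from_dict(serialized)
```

A plain `password: str = ""` field, by contrast, would end up verbatim in whatever `to_dict` returns.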

sjrl · 2026-07-02T07:36:48Z

+        """
+        return await asyncio.to_thread(self.count_unique_metadata_by_filter, filters, metadata_fields)
+
+        return await asyncio.to_thread(self.get_metadata_fields_info)


This is dead code (the second return is unreachable); let's remove it.

sjrl · 2026-07-02T07:38:40Z

+                # In this case, we'll return empty results or filter them out
+                error_msg = str(e)
+                # Check both the error message and the __cause__ attribute
+                cause_msg = str(e.__cause__) if hasattr(e, "____cause__") and e.__cause__ else ""


Too many underscores

Suggested change

-                cause_msg = str(e.__cause__) if hasattr(e, "____cause__") and e.__cause__ else ""
+                cause_msg = str(e.__cause__) if hasattr(e, "__cause__") and e.__cause__ else ""
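The effect of the typo can be reproduced in isolation. In this sketch the two helper functions and the exception messages are made up for the demonstration; only the two expressions are taken from the diff.

```python
def cause_message_buggy(e: BaseException) -> str:
    # "____cause__" (four leading underscores) is not an attribute of
    # exceptions, so hasattr() is False and the cause is silently dropped.
    return str(e.__cause__) if hasattr(e, "____cause__") and e.__cause__ else ""


def cause_message_fixed(e: BaseException) -> str:
    return str(e.__cause__) if hasattr(e, "__cause__") and e.__cause__ else ""


try:
    try:
        raise ValueError("missing column")
    except ValueError as inner:
        raise RuntimeError("query failed") from inner
except RuntimeError as err:
    buggy = cause_message_buggy(err)
    fixed = cause_message_fixed(err)

print(repr(buggy))  # → ''
print(repr(fixed))  # → 'missing column'
```

Because the buggy version never raises, the error handling that inspects `cause_msg` would quietly stop matching on the underlying cause.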

sjrl · 2026-07-02T07:40:43Z

+            top_k: Override the constructor top_k for this call.
+
+        Returns:
+            ``{"documents": [Document, ...]}``


Please use :param: / :returns: type docstrings like you did in filters.py
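For reference, a docstring in that style could look roughly like the following. The function `run_sketch` is a hypothetical stand-in for the retriever's `run` method, and the parameter descriptions other than the one for `top_k` are assumed wording.

```python
from __future__ import annotations

from typing import Any


def run_sketch(
    query_embedding: list[float],
    filters: dict[str, Any] | None = None,
    top_k: int | None = None,
) -> dict[str, list]:
    """
    Retrieve documents for a query embedding (hypothetical stand-in).

    :param query_embedding: Embedding of the query.
    :param filters: Filters applied when fetching documents.
    :param top_k: Override the constructor top_k for this call.
    :returns: A dictionary with the key `documents` holding the retrieved documents.
    """
    # Placeholder body; the docstring layout is the point of this sketch.
    return {"documents": []}
```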

sjrl · 2026-07-02T07:41:12Z

+    ) -> None:
+        if not isinstance(document_store, Db2DocumentStore):


Missing docstrings for all init parameters. Please add them.

sjrl · 2026-07-02T07:41:53Z

+    Use inside a Haystack pipeline after a text embedder::
+
+        pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
+        pipeline.add_component("retriever", Db2EmbeddingRetriever(
+            document_store=store, top_k=5
+        ))
+        pipeline.connect("embedder.embedding", "retriever.query_embedding")


Please wrap python code in code blocks.

sjrl · 2026-07-02T07:42:46Z

+        filters: dict[str, Any] | None = None,
+        top_k: int | None = None,
+    ) -> dict[str, list[Document]]:
+        """Async variant of :meth:`run`."""


Add docstrings for the variables

sjrl · 2026-07-02T07:44:32Z

We keep our READMEs in integrations very light. See pgvector's as an example: https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/pgvector/README.md

Please follow that format. The more in-depth code example will be included elsewhere, in a separate docs contribution to Haystack core and the integration tile in haystack-integrations.

sjrl · 2026-07-02T07:45:52Z

+        if not documents:
+            return 0


Let's move this after the if not isinstance(documents, list): check, so that if a user passes in a wrong value like documents="" it will be caught and raise a ValueError instead of just returning 0.
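The difference the ordering makes can be sketched as follows. `write_documents_sketch` and its error message are hypothetical; only the two checks and their proposed order come from the comment above.

```python
def write_documents_sketch(documents) -> int:
    """Hypothetical stand-in for write_documents; only the check order matters."""
    # Type check first, so documents="" is rejected rather than treated as empty.
    if not isinstance(documents, list):
        msg = "documents must be a list"  # assumed wording
        raise ValueError(msg)
    if not documents:
        return 0
    return len(documents)  # placeholder for the real insert logic


print(write_documents_sketch([]))  # → 0

try:
    write_documents_sketch("")
except ValueError as exc:
    print(f"rejected: {exc}")  # → rejected: documents must be a list
```

With the checks in the original order, `documents=""` is falsy and would return 0 before the type check is ever reached.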

sjrl · 2026-07-02T07:49:35Z

Hey @priyanshu-krishnan1 thanks for opening the new PR! I've left an initial set of comments.

Also I noticed that the LICENSE.txt file is missing. Please add one. You can copy the one at the top-level of the repo which is here

sjrl · 2026-07-02T07:52:06Z

+    )
+
+
+class Db2DocumentStore:


To be consistent with our naming conventions for other doc stores, I think it would be great if we could rename this to IBMDb2DocumentStore. WDYT? If so, let's also update the embedding retriever to follow the same convention.

priyanshu-krishnan1 · 2026-07-02T07:54:57Z

Hi @sjrl, thanks for the initial set of comments. We are currently looking into them and will update the PR with resolutions.

CLAassistant · 2026-07-02T12:06:38Z

All committers have signed the CLA.

sjrl · 2026-07-02T12:14:42Z

Hey @priyanshu-krishnan1 and @GeetikaChughIBM thanks for the updates! @GeetikaChughIBM would it be possible for you to sign the CLA agreement as well? #3518 (comment)

Added IBM Db2 Vector Store

840d23d

priyanshu-krishnan1 requested a review from a team as a code owner July 1, 2026 14:31

priyanshu-krishnan1 requested review from bogdankostic and removed request for a team July 1, 2026 14:31

github-actions Bot added topic:CI type:documentation Improvements or additions to documentation labels Jul 1, 2026

priyanshu-krishnan1 mentioned this pull request Jul 1, 2026

feat: add IBM Db2 Vector Store (ibm-haystack) #3458

Closed

sjrl requested review from sjrl and removed request for bogdankostic July 1, 2026 15:37

sjrl self-assigned this Jul 1, 2026

sjrl reviewed Jul 2, 2026

GeetikaChughIBM and others added 6 commits July 2, 2026 15:01

feat: add embedding_retriever to ibm_db pydoc config_docusaurus.yml (80b3dca)
fix: add embedding_retriever package to mypy types command in pyproject.toml (b232712)
refactor: make _validate_embedding() and _to_row() static methods (36c73c0)
refactor: rename test_filter_translator.py to test_filters.py and remove obsolete test_from_dict_without_filter_policy test (83bdb7f)
fix: remove unused patch import from test_embedding_retriever.py (7ec832d)
Merge pull request #4 from GeetikaChughIBM/ibm-db2-review-fixes (5f6075d)
