HANA Cloud vector search improved the matching, but exposed a harder problem: retrieval is not the same as business reasoning
SAP Community Blog · May 2026 · 12 min read · Part 2 of a series
In this post
- Why keyword search was not enough
- What the pattern store actually contains
- Moving retrieval into HANA Cloud vector search
- What improved: three behaviors in practice
- Generating patterns from OData specs
- Where it failed: the “SAP SE” case
- What this taught me about retrieval vs reasoning
- What comes next
Why keyword search was not enough
Keyword search works only when users phrase their questions with the same words as your stored patterns. SAP users rarely do that. A procurement user may say “supplier invoice,” a finance user may say “vendor invoice not yet paid,” and both may expect the same underlying S/4HANA query.
In Part 1, we saw an A2A prototype on SAP BTP Cloud Foundry that mapped natural-language questions to validated SAP OData patterns and executed live queries against S/4HANA. The first retrieval layer used PostgreSQL full-text search. It worked for exact phrasing, but broke on synonyms.
So I moved the retrieval layer to HANA Cloud vector search. Semantic matching improved meaningfully in my test cases, but the more important lesson was that better retrieval does not solve business interpretation. SAP users ask in business language; S/4HANA APIs expect IDs, codes, and precise filters. Bridging that gap requires more than embeddings.
At a high level, the prototype has two flows: a runtime flow for answering user questions, and an ingestion flow for generating, enriching, and validating OData patterns.
High-level retrieval architecture for the prototype. The runtime path starts with the React / Next.js app, embeds the user query through SAP AI Core, searches the HANA Cloud pattern store using vector similarity, and executes the selected OData pattern against S/4HANA. The dashed paths show ingestion, enrichment, and possible future integration with an API catalog service.
The important design choice is that retrieval metadata, validation state, and embeddings live together in HANA Cloud. SAP AI Core handles embedding, reranking, slot extraction, and pattern generation. S/4HANA remains the source of truth for execution. The pattern store does not answer from memory; it retrieves a query shape and executes it against live APIs.
What the pattern store actually contains
Before going further, it is worth being precise about what a “pattern” is in this system. It is not a prompt, not an API template, and not a search document.
A pattern is a reusable mapping between a business question and an executable OData query shape. Each row in the ODATA_PATTERNS table stores:
- The business meaning: description and LLM-expanded retrieval text (8-15 synonym phrasings)
- The execution target: service name, entity set, and filter template with typed placeholders
- The lifecycle metadata: source (curated or generated), validation status, execution count
- The embedding: a 1536-dimensional vector used for cosine similarity at query time
Example:
Question: "purchase orders by supplier"
Service: API_PURCHASEORDER_PROCESS_SRV
Entity set: A_PurchaseOrder
Filter: Supplier eq '{{supplier_id}}'
At query time, the user's question is embedded and compared against every stored embedding using cosine similarity. The closest pattern wins, or goes to an LLM reranker if the score is ambiguous. If a match is found, the filter template is filled with values extracted from the question and the OData call is executed against S/4HANA. The result is live data, not generated text.
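The query-time matching step can be sketched in a few lines of plain Python. This is illustrative only: the toy 3-dimensional vectors stand in for the real 1536-dimensional embeddings, and the `patterns` list stands in for the ODATA_PATTERNS table in HANA Cloud.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy pattern store: in the prototype this lives in HANA Cloud,
# with the embedding in the same row as the business metadata.
patterns = [
    {"description": "purchase orders by supplier",
     "filter_template": "Supplier eq '{{supplier_id}}'",
     "embedding": [0.9, 0.1, 0.0]},
    {"description": "open sales orders",
     "filter_template": "OverallSDProcessStatus eq 'A'",
     "embedding": [0.0, 0.2, 0.9]},
]

def best_match(query_embedding):
    """Score every stored pattern and return (score, pattern) for the closest."""
    scored = [(cosine_similarity(query_embedding, p["embedding"]), p)
              for p in patterns]
    return max(scored, key=lambda sp: sp[0])

score, pattern = best_match([0.8, 0.2, 0.1])
print(round(score, 3), pattern["description"])
```

In the prototype the similarity computation happens inside HANA, not in application code; this sketch only shows the shape of the comparison.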
The ODATA_PATTERNS table in HANA Cloud. The 1536-dimensional embedding vector lives in the same row as the business metadata: service, entity set, filter template, validation status. No separate vector store.
Moving retrieval into HANA Cloud vector search
SAP HANA Cloud supports vector storage natively via the REAL_VECTOR data type, and cosine similarity via a built-in SQL function. The similarity computation stays inside the database engine. No rows are loaded into application memory, no separate vector service is needed:
SELECT TOP 5 ID, DESCRIPTION, FILTER_TEMPLATE, SERVICE_NAME,
COSINE_SIMILARITY(EMBEDDING, TO_REAL_VECTOR(?)) AS SCORE
FROM ODATA_PATTERNS
ORDER BY SCORE DESC
The embedding model is text-embedding-3-small via SAP AI Core, at 1536 dimensions. At write time, each pattern's description is first expanded by an LLM into 8-15 synonym phrasings before embedding. “Purchase orders by supplier” becomes “POs for vendor, orders by BP, vendor PO list, supplier purchase orders…” All of that gets embedded together into a single vector, so the embedding represents the range of business phrasings a user might realistically use, not just the original description.
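The write-time expansion step can be sketched as follows. The `expand_synonyms` callable stands in for the LLM call on SAP AI Core, and the joining format is an assumption; the point is that the original description and its synonym phrasings are embedded as one text, producing a single vector.

```python
def build_retrieval_text(description, expand_synonyms):
    """Expand a pattern description into synonym phrasings and join
    them into one string, so a single embedding covers the range of
    business language a user might realistically use."""
    phrasings = expand_synonyms(description)  # LLM call in the prototype
    return " | ".join([description] + phrasings)

# Stubbed "LLM" returning a fixed synonym list for illustration.
def fake_expander(description):
    return ["POs for vendor", "orders by BP", "vendor PO list",
            "supplier purchase orders"]

text = build_retrieval_text("purchase orders by supplier", fake_expander)
print(text)
```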
What improved: three behaviors in practice
In this prototype, the reranker is a second LLM check that reviews the top retrieved candidates and either confirms the best match or rejects the query as unsupported. Queries flow through three paths depending on the cosine score: fast path (score at or above 0.65, no LLM), LLM reranker (0.45 to 0.65), or auto-reject (below 0.45).
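The three score bands can be written as a small routing function. The thresholds come from the text; the function and label names are illustrative, not taken from the prototype's code.

```python
FAST_PATH_THRESHOLD = 0.65   # at or above: accept without any LLM call
REJECT_THRESHOLD = 0.45      # below: auto-reject as unsupported

def route(cosine_score):
    """Map a cosine score to one of the three retrieval paths."""
    if cosine_score >= FAST_PATH_THRESHOLD:
        return "fast_path"      # execute the top pattern directly
    if cosine_score >= REJECT_THRESHOLD:
        return "llm_rerank"     # second LLM confirms or rejects the match
    return "auto_reject"        # no plausible pattern in the store

print(route(0.679))  # the "Open sales orders" case below
print(route(0.516))  # the "vendor invoices not yet paid" case
print(route(0.30))
```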
Vocabulary mismatch: where the improvement was clearest
The clearest improvement was on vocabulary mismatch. “Vendor invoices not yet paid” matched “Show supplier invoices in the system” with a cosine score of 0.516. Only one word overlaps: invoices. Full-text search would likely have treated this as a weak match. Vector search found the semantic proximity and, after LLM reranking confirmed the match, returned live invoice data from S/4HANA.
“Vendor invoices not yet paid” matched “Show supplier invoices in the system.” Only “invoices” overlaps as a keyword. Vector search found the semantic proximity that keyword search misses.
The fast path: zero LLM calls for confident matches
When the cosine score is at or above 0.65, the system accepts the match and executes without any LLM call. In this test case, “Open sales orders” scored 0.679. The curated pattern encoded the relevant S/4HANA status filter: OverallSDProcessStatus eq 'A'. The query ran, and live sales order data came back in one round trip to HANA and one to S/4HANA.
Slot extraction: filling values from the question
When a pattern has {{placeholders}}, the LLM extracts values from the question. “Purchase orders by 17401710” – the vendor ID was detected, Supplier eq '17401710' composed, and PO 4500000060 returned live from S/4HANA.
“17401710” extracted as supplier_id. Filter composed. PO 4500000060 returned from S/4HANA.
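The template-filling step can be sketched as a regex substitution over the `{{placeholder}}` syntax, assuming the slot values have already been extracted by the LLM. The real prototype may compose filters differently; this only shows the shape of the operation.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def fill_template(filter_template, slots):
    """Replace each {{placeholder}} with its extracted slot value.

    Raises KeyError if a required slot was not extracted, so an
    incomplete question fails loudly instead of producing a
    malformed OData filter.
    """
    def substitute(match):
        return slots[match.group(1)]
    return PLACEHOLDER.sub(substitute, filter_template)

filled = fill_template("Supplier eq '{{supplier_id}}'",
                       {"supplier_id": "17401710"})
print(filled)  # Supplier eq '17401710'
```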
Generating patterns from OData specs
A pattern store that starts empty is not useful. Writing patterns by hand is slow: 8 curated patterns cover 3 domains. A real SAP landscape spans dozens of APIs and hundreds of relevant queries.
The OData specification files (OpenAPI/Edmx JSON committed alongside the code) contain much of what an LLM needs to propose candidate patterns: entity sets, properties, filter value enumerations, relationships. An LLM running on SAP AI Core reads a spec and proposes patterns with filter templates. One run, seven specs, 139 candidate patterns, without hand-authoring.
Seven OData specs. 139 generated candidate patterns. Business Partner, Maintenance Order, Outbound Delivery, Purchase Order, Purchase Requisition, Sales Order, Supplier Invoice.
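The generation step can be sketched as a walk over the entity sets of a parsed spec, collecting LLM-proposed candidates. Here the spec is a toy dict and the LLM call is stubbed, since the real prompt and spec parsing are specific to the prototype; what matters is that every candidate enters with `source='generated'` and `validated=False`.

```python
def propose_patterns(spec, llm_propose):
    """Walk a simplified OData spec and collect candidate patterns.

    `spec` maps entity sets to filterable properties; `llm_propose`
    stands in for the SAP AI Core call that drafts descriptions
    and filter templates.
    """
    candidates = []
    for entity_set, properties in spec["entity_sets"].items():
        for proposal in llm_propose(spec["service"], entity_set, properties):
            proposal.update({
                "service": spec["service"],
                "entity_set": entity_set,
                "source": "generated",   # a candidate, not trusted yet
                "validated": False,
            })
            candidates.append(proposal)
    return candidates

# Stubbed "LLM": one pattern per boolean release-status property.
def fake_llm(service, entity_set, properties):
    return [{"description": f"{entity_set} where {p} is true",
             "filter_template": f"{p} eq true"}
            for p in properties if p.endswith("IsNotCompleted")]

spec = {"service": "API_PURCHASEREQ_PROCESS_SRV",
        "entity_sets": {"A_PurchaseRequisitionItem":
                        ["ReleaseIsNotCompleted", "Material"]}}
candidates = propose_patterns(spec, fake_llm)
print(len(candidates), candidates[0]["filter_template"])
```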
In my test cases, generated patterns worked for common queries. “Purchase requisitions waiting for approval” matched a generated pattern using ReleaseIsNotCompleted eq true on A_PurchaseRequisitionItem, the expected field in my test system.
A generated pattern using the expected SAP status field. Real requisition data returned.
Generated patterns enter the store as candidates: SOURCE='generated', VALIDATED=FALSE. They can be retrieved and executed, but they are not promoted until confirmed.
On trust: HTTP 200 alone is not sufficient to promote a pattern. A successful response with zero records, as the next section shows, looks the same as an HTTP 200 with valid results. Promotion requires result-shape checks, expected outcome checks, repeated successful usage, or explicit human review. Execution feedback accumulates trust over time; it does not grant it immediately.
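The promotion rule can be sketched as a small state update per execution. The threshold and field names are illustrative; the key design choice is that HTTP 200 with zero records earns no trust, because it is indistinguishable from a semantically wrong query.

```python
PROMOTION_MIN_RUNS = 3  # illustrative: successful runs needed before trust

def record_execution(pattern, http_status, records):
    """Update a candidate pattern's trust state after one execution.

    HTTP 200 with zero records is treated as inconclusive, not as
    success, so a structurally correct but semantically empty result
    never promotes a pattern.
    """
    if http_status != 200:
        pattern["failures"] = pattern.get("failures", 0) + 1
        return pattern
    if not records:          # structurally fine, semantically unproven
        return pattern
    pattern["successes"] = pattern.get("successes", 0) + 1
    if pattern["successes"] >= PROMOTION_MIN_RUNS:
        pattern["validated"] = True
    return pattern

p = {"validated": False}
record_execution(p, 200, [])            # zero records: no credit
for _ in range(3):
    record_execution(p, 200, [{"PurchaseOrder": "4500000060"}])
print(p["validated"])  # True
```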
Where it failed: the “SAP SE” case
Then came the more interesting failure: “Purchase orders by SAP SE.”
The slot signal detector correctly identified “SAP SE” as a supplier name after a preposition. The LLM extracted supplier_id = "SAP SE". The OData call executed, HTTP 200, and returned zero records.
Structurally correct. Semantically useless. S/4HANA filters on vendor number 17401710, not the string “SAP SE.” The execution completes with HTTP 200 and returns nothing. No error. No signal that anything went wrong.
This is the failure mode that most AI demonstrations avoid showing. There is no error, no stack trace. The system did exactly what it was designed to do and returned a response indistinguishable from “no matching purchase orders exist,” when the real situation was that the query was structurally correct but semantically wrong.
This is not an edge case in SAP environments. Users think in supplier names, customer names, plant names, and material descriptions. The system identifiers (vendor number, plant code, material group) are database keys that users encounter only incidentally. For genuinely unsupported questions, the reranker returns a clean rejection; that part worked. The harder problem was when the system returned a technically successful but semantically empty result.
What this taught me about retrieval vs reasoning
Semantic retrieval solves matching. It does not solve interpretation. In SAP, users ask in business language; the APIs work in system identifiers. Closing that gap, reliably, at scale, requires something beyond embeddings.
Vector search genuinely improved vocabulary mismatch: “vendor invoices not yet paid” finding “Show supplier invoices,” “Open” resolving through a curated status-code pattern. These hold up in practice.
What vector search does not address is the gap between business language and system identifiers. When a user says “SAP SE,” the retrieval system correctly identifies the intent (supplier filter) and correctly identifies the slot (supplier_id). The embedding did its job. The failure is downstream: filling a system identifier slot with a human-readable name. No similarity threshold, no synonym expansion, no reranker fixes this. It requires a lookup, a small MDM problem embedded inside a retrieval problem.
What this implies about trust is equally important: a pattern can be structurally correct and semantically wrong at the same time. HTTP 200 is not a signal of correctness; it is a signal of execution. A zero-record result from a name-based filter is indistinguishable from a zero-record result from a legitimate empty query. The trust pipeline has to account for this, which means execution feedback alone is insufficient as a validation signal.
How patterns earn trust: the full lifecycle
- OData spec to LLM generator: reads entity sets and fields, proposes candidate patterns with filter templates
- Candidate pattern: enters store as SOURCE=generated, VALIDATED=FALSE
- Validation: field checks, allowlist checks, result-shape checks
- S/4HANA execution: result-shape and expected outcome checks; pass promotes the pattern
- VALIDATED=TRUE: execution count accumulates; repeated failure flags the pattern for review
Trust accumulates through use, not assumption. Promotion requires result-shape checks, repeated successful usage, or human review. HTTP 200 alone is not enough.
What comes next
Three open problems remain within the pattern store model, becoming more acute as the corpus grows.
Score calibration at scale. The 0.65 threshold was calibrated on 139 patterns where the right answer is almost always the top result. With 5,000 patterns across 50 domains, the embedding space becomes denser. Adjacent domain vocabulary competes, the reranker fires more often, and latency compounds. Domain pre-filtering before the cosine comparison (WHERE SERVICE_NAME = ?) is the structural fix, but requires knowing the domain before retrieval.
Slot values as system identifiers. “SAP SE” is the visible case, but the same gap exists for plant codes, material group codes, customer numbers, and cost center codes. Every slot that takes a master data value has this problem. Closing it requires a lookup layer between slot detection and slot filling.
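A minimal sketch of such a lookup layer, using an in-memory dict as a stand-in for a master data search. In a real system this would query the Business Partner API or a replicated master data table; the names here are hypothetical.

```python
# Hypothetical master data snapshot: supplier name -> vendor number.
SUPPLIER_INDEX = {
    "sap se": "17401710",
}

def resolve_supplier(value):
    """Map a human-readable supplier name to its system identifier.

    If the value already looks like a vendor number, pass it through;
    otherwise attempt a name lookup, and fail explicitly rather than
    letting a name leak into an ID filter and return zero records.
    """
    if value.isdigit():
        return value                      # already a vendor number
    key = value.strip().lower()
    if key in SUPPLIER_INDEX:
        return SUPPLIER_INDEX[key]
    raise LookupError(f"unknown supplier: {value!r}")

print(resolve_supplier("17401710"))  # passes through unchanged
print(resolve_supplier("SAP SE"))    # resolves to 17401710
```

An explicit LookupError turns the silent zero-record failure from the “SAP SE” case into a visible one that can be surfaced to the user or routed to a disambiguation step.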
Single-shot retrieval has a reasoning ceiling. The pattern store maps one question to one pattern. That covers a large share of transactional queries. But some questions require chaining: look up a supplier, then find their open POs; find the goods receipt for this invoice; show everything in this Procure-to-Pay process. These cannot be answered by retrieving a single pattern.
Part 3: retrieval based on API semantics
Part 2 improved retrieval inside a pattern-store model. Part 3 moves closer to the A2A goal: API-semantic retrieval, runtime tool loops, and multi-step SAP business capability execution.
Running alongside the pattern store, I explored a different retrieval model, one that embeds entire API specifications rather than individual query patterns, and uses an LLM tool loop to compose queries at runtime rather than filling pre-validated templates. That approach handled “purchase orders by SAP SE” more naturally, because the LLM could chain a supplier lookup before the purchase order query. But it surfaces different questions about trust, scale, and how you validate dynamically generated OData calls. Part 3 covers how that system works, where it scales, and where it does not.
This post reflects a personal learning exercise built on SAP BTP Cloud Foundry with HANA Cloud and SAP AI Core. All views are my own and do not represent SAP's positions, roadmap, or product direction. Code snippets are from personal prototypes and are shared for illustrative purposes only.



