Feat: Implementations of RAG pipeline for merchant policy #4
This is the first of several iterations exploring how we can query local documents to get better responses. For now it runs on a local Ollama model, which is slow, but optimizing or moving to the cloud are considerations for later. The current focus is on implementing the feature and extending it to be accurate for various needs.
This is a quick save point right after introducing the return policy (a 22-page document for merchants under Indian terms, generated via Gemini), but the results are very slow. The 22-page document was provided in full, without any chunking or embedding, and queries took 5 to 8x longer to answer, considering everything runs locally on Ollama with qwen2.5-coder:7b. This will need to be compared against future versions after chunking is added.
Todo: clean up later
📝 Walkthrough
Adds a new Maven Spring Boot module
Changes: AI RAG Service Module Implementation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.idea/compiler.xml:
- Around line 13-21: Update the IntelliJ annotation-processor configuration so
it does not hard-code a specific Lombok JAR: edit the <processorPath> entry that
currently points to
"$MAVEN_REPOSITORY$/org/projectlombok/lombok/1.18.46/lombok-1.18.46.jar" and
either remove that <entry> entirely or replace it with a Maven-managed reference
(e.g., rely on classpath-based processorPath or the IDE's Maven-managed artifact
resolution) so the <processorPath> and <entry> do not pin lombok to 1.18.46 and
will honor the version declared/managed by the project.
In `@ai-rag-service/.mvn/wrapper/maven-wrapper.properties`:
- Around line 1-3: The maven-wrapper properties file is missing checksum pins,
so add distributionSha256Sum (for the apache-maven-3.9.16-bin.zip referenced by
distributionUrl) and wrapperSha256Sum (for the maven-wrapper binary) to enable
checksum validation: verify the official apache-maven-3.9.16-bin.zip (using
Apache’s PGP/SHA-512 artifacts), compute its SHA-256 and set
distributionSha256Sum=<computed-sha256>, then compute the SHA-256 for the
maven-wrapper jar used by this project and set
wrapperSha256Sum=<computed-sha256>; keep existing keys (wrapperVersion,
distributionType, distributionUrl) unchanged.
In `@ai-rag-service/mvnw`:
- Around line 194-223: The script can fall through without downloading if
neither wget/curl are found and set_java_home fails; add a clear fail-fast error
path by inserting an else branch after the existing 'elif set_java_home; then'
block that calls die with a descriptive message (e.g., "No download mechanism
available: install wget or curl, or ensure Java is available") so failures
surface immediately; reference the existing helpers set_java_home and die and
include the distributionUrl in the message for context.
In
`@ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java`:
- Around line 55-57: Remove the raw System.out.println calls that print model
output and latency in AssistantController (delete the prints of
response.chatResponse() and the "LATENCY ms" line); if latency must be recorded,
replace with a structured logger call that logs only the numeric latency metric
(compute end-start into a long millis variable) using the class logger (e.g.,
logger.info or metrics recording) and do not include the model response text or
any request/context data in logs.
- Around line 53-54: AssistantController currently does a blocking synchronous
call via chatClient.prompt(prompt).call() with no timeouts/retries; update the
client invocation to enforce explicit connection/read timeouts and a retry
policy and/or offload the call to a separate thread so servlet threads aren't
exhausted: change the ChatClient usage in AssistantController to use a
configured non-blocking client or a timed call (e.g., use a WebClient-backed
Spring AI ChatClient or wrap the .call() in a CompletableFuture with a timeout
and proper exception handling), configure retry (exponential backoff, max
attempts) for transient failures, and ensure timeouts are set via configuration
(application.properties) or a dedicated bean that builds the Ollama
client/WebClient with connect/read timeouts and retry filter; alternatively
annotate the controller method to delegate to an `@Async` service method that
performs the timed/retried chatClient.prompt(prompt).call() so the servlet
thread is not blocked.
In `@ai-rag-service/src/main/java/com/mert/airagservice/dto/AskRequest.java`:
- Line 3: Add Bean Validation to the AskRequest DTO and enforce it on the
controller: annotate the record component in AskRequest (the String question)
with validation annotations such as `@NotBlank` and `@Size`(max = 2000) (e.g.,
public record AskRequest(`@NotBlank` `@Size`(max = 2000) String question) { }) and
update the AssistantController.ask method signature to accept a validated body
by adding `@Valid` and `@RequestBody` to the parameter (e.g., public String
ask(`@Valid` `@RequestBody` AskRequest request) throws IOException { ... }), so
empty or oversized prompts are rejected at the DTO boundary.
In
`@ai-rag-service/src/main/java/com/mert/airagservice/service/PdfKnowledgeService.java`:
- Around line 13-18: getPolicyText() in PdfKnowledgeService leaks resources and
re-parses the PDF on every call; fix by loading and extracting the PDF once
(e.g., in the PdfKnowledgeService constructor or a `@PostConstruct` init method)
into a final cached field like policyText, and ensure all IO resources are
closed with try-with-resources (close the InputStream and PDDocument after
extraction); then have getPolicyText() simply return the cached policyText.
Ensure you reference the existing method getPolicyText(), the PDDocument usage
(Loader.loadPDF) and PDFTextStripper extraction when moving logic into init and
add proper exception handling for the startup load.
In `@ai-rag-service/src/main/resources/application.properties`:
- Around line 2-5: The properties file currently hardcodes
spring.ai.ollama.base-url, spring.ai.ollama.chat.model and server.port; change
these to use externalized configuration placeholders (e.g., ${OLLAMA_BASE_URL},
${OLLAMA_CHAT_MODEL}, ${SERVER_PORT}) so values come from environment variables
or a config server at runtime, and provide sensible local defaults only for
development; update any code reading these properties (references to
spring.ai.ollama.base-url and spring.ai.ollama.chat.model) to expect
externalized values rather than hardcoded constants.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d094aec1-48cc-4395-8bf1-bcface122baf
⛔ Files ignored due to path filters (1)
ai-rag-service/src/main/resources/knowledge/merchant-refund-policy.pdf is excluded by !**/*.pdf
📒 Files selected for processing (16)
.idea/compiler.xml
.idea/encodings.xml
.idea/misc.xml
.idea/modules.xml
ai-rag-service/.gitattributes
ai-rag-service/.gitignore
ai-rag-service/.mvn/wrapper/maven-wrapper.properties
ai-rag-service/mvnw
ai-rag-service/mvnw.cmd
ai-rag-service/pom.xml
ai-rag-service/src/main/java/com/mert/airagservice/AiRagServiceApplication.java
ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java
ai-rag-service/src/main/java/com/mert/airagservice/dto/AskRequest.java
ai-rag-service/src/main/java/com/mert/airagservice/service/PdfKnowledgeService.java
ai-rag-service/src/main/resources/application.properties
ai-rag-service/src/test/java/com/mert/airagservice/AiRagServiceApplicationTests.java
<profile name="Annotation profile for ai-rag-service" enabled="true">
  <sourceOutputDir name="target/generated-sources/annotations" />
  <sourceTestOutputDir name="target/generated-test-sources/test-annotations" />
  <outputRelativeToContentRoot value="true" />
  <processorPath useClasspath="false">
    <entry name="$MAVEN_REPOSITORY$/org/projectlombok/lombok/1.18.46/lombok-1.18.46.jar" />
  </processorPath>
  <module name="ai-rag-service" />
</profile>
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate ai-rag-service pom.xml and any other pom that mentions lombok
echo "== POM files containing 'lombok' =="
rg -n --hidden --glob '!*node_modules/*' --glob '*.xml' 'lombok' . || true
# Print the exact lombok dependency/version from ai-rag-service/pom.xml (if present)
if [ -f "ai-rag-service/pom.xml" ]; then
echo "== ai-rag-service/pom.xml (lombok sections) =="
rg -n --hidden --glob '*.xml' 'lombok' ai-rag-service/pom.xml || true
echo
# Attempt to extract the lombok version value
rg -n --hidden 'lombok-([0-9]+\.[0-9]+\.[0-9]+)|<artifactId>lombok</artifactId>|<version>' ai-rag-service/pom.xml || true
else
echo "== ai-rag-service/pom.xml not found at path ai-rag-service/pom.xml =="
fi
# Read the referenced .idea/compiler.xml snippet
echo "== .idea/compiler.xml lines 1-80 =="
if [ -f ".idea/compiler.xml" ]; then
sed -n '1,80p' .idea/compiler.xml
else
echo ".idea/compiler.xml not found"
fi
Repository: Rd4dev/Mert
Length of output: 3475
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== ai-rag-service/pom.xml: show Lombok-related blocks with context =="
# Show around each lombok groupId/artifactId occurrence
rg -n --hidden '<groupId>org\.projectlombok</groupId>|<artifactId>lombok</artifactId>' ai-rag-service/pom.xml
echo
# Dump a wider slice around the first Lombok occurrence to capture the version tag
# (adjusting line ranges based on matches)
# Use grep to get line numbers first
first=$(rg -n --hidden '<groupId>org\.projectlombok</groupId>' ai-rag-service/pom.xml | head -n1 | cut -d: -f1 || true)
second=$(rg -n --hidden '<groupId>org\.projectlombok</groupId>' ai-rag-service/pom.xml | sed -n '2p' | cut -d: -f1 || true)
if [ -n "${first:-}" ]; then
start=$((first-20))
end=$((first+60))
echo "== Around first Lombok occurrence (lines $start-$end) =="
sed -n "${start},${end}p" ai-rag-service/pom.xml
fi
if [ -n "${second:-}" ]; then
start=$((second-20))
end=$((second+80))
echo "== Around second Lombok occurrence (lines $start-$end) =="
sed -n "${start},${end}p" ai-rag-service/pom.xml
fi
echo
echo "== Full pom.xml lines 1-160 (small enough?) =="
wc -l ai-rag-service/pom.xml
sed -n '1,160p' ai-rag-service/pom.xml
Repository: Rd4dev/Mert
Length of output: 10910
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --hidden --glob '*.xml' 'lombok.*version|<version>.*lombok|1\.18\.[0-9]+' . || true
rg -n --hidden --glob '*.properties' 'lombok' . || true
rg -n --hidden --glob '*.pom' 'lombok' . || true
Repository: Rd4dev/Mert
Length of output: 177
🌐 Web query:
Spring Boot 4.1.0 dependency management Lombok version 1.18.x
💡 Result:
Spring Boot 4.1.0, released on June 10, 2026, manages the Project Lombok dependency automatically through its Bill of Materials (BOM) [1][2]. In this version of Spring Boot, the managed version for Lombok is 1.18.34 [3]. Because Spring Boot provides a curated dependency management system, you generally do not need to specify a version for Lombok in your Maven or Gradle build files when using the Spring Boot starters [4][5]. If you wish to override this managed version, you can do so by setting the lombok.version property in your project's configuration [6][5]. Top results: [1][2][6][3]
Citations:
- 1: https://github.com/spring-projects/spring-boot/releases/tag/v4.1.0
- 2: https://spring.io/blog/2026/06/10/spring-boot-4
- 3: https://github.com/spring-projects/spring-boot/blob/24d14743c48267ba264a823674c16c2b016cb239/spring-boot-project/spring-boot-dependencies/build.gradle
- 4: https://docs.spring.io/spring-boot/reference/using/build-systems.html
- 5: https://docs.spring.io/spring-boot/3.4/gradle-plugin/managing-dependencies.html
- 6: https://docs.spring.io/spring-boot/appendix/dependency-versions/properties.html
Avoid pinning Lombok annotation processor to lombok-1.18.46.jar in IntelliJ
.idea/compiler.xml hard-codes "$MAVEN_REPOSITORY$/org/projectlombok/lombok/1.18.46/lombok-1.18.46.jar", but ai-rag-service/pom.xml declares Lombok without a version (so it’s managed by the Spring Boot parent 4.1.0, which manages Lombok as 1.18.34). Point IntelliJ to the Maven-managed processor artifact instead (or remove the fixed processorPath entry) to avoid annotation processing breaking when the resolved Lombok version changes or isn’t present.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.idea/compiler.xml around lines 13 - 21, Update the IntelliJ
annotation-processor configuration so it does not hard-code a specific Lombok
JAR: edit the <processorPath> entry that currently points to
"$MAVEN_REPOSITORY$/org/projectlombok/lombok/1.18.46/lombok-1.18.46.jar" and
either remove that <entry> entirely or replace it with a Maven-managed reference
(e.g., rely on classpath-based processorPath or the IDE's Maven-managed artifact
resolution) so the <processorPath> and <entry> do not pin lombok to 1.18.46 and
will honor the version declared/managed by the project.
wrapperVersion=3.3.4
distributionType=only-script
distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.16/apache-maven-3.9.16-bin.zip
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ai-rag-service/.mvn/wrapper/maven-wrapper.properties"
echo "== $FILE =="
if [ -f "$FILE" ]; then
nl -ba "$FILE" | sed -n '1,120p'
else
echo "File not found: $FILE"
exit 1
fi
Repository: Rd4dev/Mert
Length of output: 175
🌐 Web query:
Apache Maven 3.9.16 apache-maven-3.9.16-bin.zip SHA-256
💡 Result:
The official Apache Maven distribution directory for version 3.9.16 provides SHA-512 checksum files, but does not provide SHA-256 checksum files [1]. To verify the integrity of the apache-maven-3.9.16-bin.zip file, the Apache Maven project officially recommends verifying the PGP signature [2]. You can download the detached signature file (apache-maven-3.9.16-bin.zip.asc) from the official distribution index [1] and verify it against the public keys used by the Apache Maven developers [2]. If you specifically require the SHA-256 checksum for your own local setup (such as for use in a Maven wrapper configuration), you must calculate it yourself after downloading the file [3]. You can generate the SHA-256 checksum of the file on a system with standard tools using the following command: shasum -a 256 apache-maven-3.9.16-bin.zip Alternatively, if you are using Windows PowerShell, you can use: Get-FileHash apache-maven-3.9.16-bin.zip -Algorithm SHA256
Citations:
- 1: https://downloads.apache.org/maven/maven-3/3.9.16/binaries/
- 2: https://maven.apache.org/download.cgi?Preferred=https%3A%2F%2Fdlcdn.apache.org%2F
- 3: https://relentlesscoding.com/posts/maven-wrapper-integrity-validation/
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ai-rag-service/.mvn/wrapper/maven-wrapper.properties"
echo "== $FILE =="
if [ -f "$FILE" ]; then
sed -n '1,200p' "$FILE"
else
echo "File not found: $FILE"
exit 1
fi
Repository: Rd4dev/Mert
Length of output: 282
🌐 Web query:
maven wrapper 3.3.4 distributionSha256Sum distributionSha512Sum properties
💡 Result:
In Maven Wrapper (including version 3.3.4), checksum verification is supported through specific properties in the .mvn/wrapper/maven-wrapper.properties file to ensure the integrity of downloaded files and prevent supply-chain attacks [1][2]. The supported properties for checksum verification are: 1. distributionSha256Sum: Specifies the expected SHA-256 checksum (in hex notation, small caps) of the Maven distribution being downloaded [1][2]. 2. wrapperSha256Sum: Specifies the expected SHA-256 checksum (in hex notation, small caps) of the maven-wrapper.jar file [3][1][2]. Regarding SHA-512 support: While there have been development discussions and pull requests (such as MWRAPPER-117) to introduce distributionSha512Sum and wrapperSha512Sum properties [4], these were not part of the standard, stable feature set for the 3.3.4 release. As of Maven Wrapper 3.3.4, the official documentation and API only explicitly recognize and implement SHA-256 checksum validation [3][1][5][6]. Key points for configuration: - Properties must be added to .mvn/wrapper/maven-wrapper.properties [1][2]. - When using the maven-wrapper-plugin to update or reinstall the wrapper, existing manual properties like distributionSha256Sum may be overwritten or removed unless handled carefully, as the plugin often regenerates the properties file [7]. - Checksums should be provided in hexadecimal format using lowercase letters [1][2].
Citations:
- 1: https://github.com/apache/maven-wrapper/blob/maven-wrapper-3.3.3/src/site/markdown/index.md
- 2: https://maven.apache.org/tools/wrapper/
- 3: https://maven.apache.org/tools/wrapper/maven-wrapper-plugin/wrapper-mojo.html
- 4: MWRAPPER-117 - added distributionSha512Sum and wrapperSha512Sum apache/maven-wrapper#384
- 5: https://maven.apache.org/tools/wrapper/maven-wrapper/apidocs/org/apache/maven/wrapper/WrapperConfiguration.html
- 6: https://maven.apache.org/wrapper/maven-wrapper/apidocs/org/apache/maven/wrapper/WrapperExecutor.html
- 7: distributionSha256Sum property gets removed when upgrading/reinstalling apache/maven-wrapper#367
Pin Maven Wrapper checksums to prevent unchecked downloads
ai-rag-service/.mvn/wrapper/maven-wrapper.properties doesn’t set distributionSha256Sum (and also lacks wrapperSha256Sum), so the wrapper can’t validate downloaded artifacts via checksums beyond TLS.
Apache Maven doesn’t publish SHA-256 for Maven 3.9.16 (it provides SHA-512 and PGP signatures); verify the official apache-maven-3.9.16-bin.zip using those, compute its SHA-256, then pin distributionSha256Sum (and wrapperSha256Sum) to enable checksum validation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ai-rag-service/.mvn/wrapper/maven-wrapper.properties` around lines 1 - 3, The
maven-wrapper properties file is missing checksum pins, so add
distributionSha256Sum (for the apache-maven-3.9.16-bin.zip referenced by
distributionUrl) and wrapperSha256Sum (for the maven-wrapper binary) to enable
checksum validation: verify the official apache-maven-3.9.16-bin.zip (using
Apache’s PGP/SHA-512 artifacts), compute its SHA-256 and set
distributionSha256Sum=<computed-sha256>, then compute the SHA-256 for the
maven-wrapper jar used by this project and set
wrapperSha256Sum=<computed-sha256>; keep existing keys (wrapperVersion,
distributionType, distributionUrl) unchanged.
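Since the SHA-256 has to be computed locally after the archive has been verified, a small helper can produce the lowercase hex value that distributionSha256Sum expects. The following is a minimal sketch using only the JDK (Java 17 or later for HexFormat); the class name Sha256Sum is hypothetical, and the file path is passed as an argument rather than assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical helper: prints a SHA-256 checksum in lowercase hex.
final class Sha256Sum {

    // Streams the input through the digest so large archives are not held in memory.
    static String hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    static String hex(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return hex(in);
        }
    }

    // Usage: java Sha256Sum.java <path-to-downloaded-archive>
    public static void main(String[] args) throws IOException {
        System.out.println("distributionSha256Sum=" + hex(Path.of(args[0])));
    }
}
```

The printed line can be pasted into maven-wrapper.properties once the download itself has been checked against Apache's published SHA-512 or PGP signature.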
if [ -z "${MVNW_USERNAME-}" ] && command -v wget >/dev/null; then
  verbose "Found wget ... using wget"
  wget ${__MVNW_QUIET_WGET:+"$__MVNW_QUIET_WGET"} "$distributionUrl" -O "$TMP_DOWNLOAD_DIR/$distributionUrlName" || die "wget: Failed to fetch $distributionUrl"
elif [ -z "${MVNW_USERNAME-}" ] && command -v curl >/dev/null; then
  verbose "Found curl ... using curl"
  curl ${__MVNW_QUIET_CURL:+"$__MVNW_QUIET_CURL"} -f -L -o "$TMP_DOWNLOAD_DIR/$distributionUrlName" "$distributionUrl" || die "curl: Failed to fetch $distributionUrl"
elif set_java_home; then
  verbose "Falling back to use Java to download"
  javaSource="$TMP_DOWNLOAD_DIR/Downloader.java"
  targetZip="$TMP_DOWNLOAD_DIR/$distributionUrlName"
  cat >"$javaSource" <<-END
public class Downloader extends java.net.Authenticator
{
  protected java.net.PasswordAuthentication getPasswordAuthentication()
  {
    return new java.net.PasswordAuthentication( System.getenv( "MVNW_USERNAME" ), System.getenv( "MVNW_PASSWORD" ).toCharArray() );
  }
  public static void main( String[] args ) throws Exception
  {
    setDefault( new Downloader() );
    java.nio.file.Files.copy( java.net.URI.create( args[0] ).toURL().openStream(), java.nio.file.Paths.get( args[1] ).toAbsolutePath().normalize() );
  }
}
END
  # For Cygwin/MinGW, switch paths to Windows format before running javac and java
  verbose " - Compiling Downloader.java ..."
  "$(native_path "$JAVACCMD")" "$(native_path "$javaSource")" || die "Failed to compile Downloader.java"
  verbose " - Running Downloader.java ..."
  "$(native_path "$JAVACMD")" -cp "$(native_path "$TMP_DOWNLOAD_DIR")" Downloader "$distributionUrl" "$(native_path "$targetZip")"
fi
Fail fast when no download mechanism is available.
If wget/curl are missing and set_java_home returns non-zero, the script currently falls through to checksum/extraction with no downloaded archive. That produces a misleading unzip/tar failure instead of a clear bootstrap error.
🔧 Proposed fix
if [ -z "${MVNW_USERNAME-}" ] && command -v wget >/dev/null; then
verbose "Found wget ... using wget"
wget ${__MVNW_QUIET_WGET:+"$__MVNW_QUIET_WGET"} "$distributionUrl" -O "$TMP_DOWNLOAD_DIR/$distributionUrlName" || die "wget: Failed to fetch $distributionUrl"
elif [ -z "${MVNW_USERNAME-}" ] && command -v curl >/dev/null; then
verbose "Found curl ... using curl"
curl ${__MVNW_QUIET_CURL:+"$__MVNW_QUIET_CURL"} -f -L -o "$TMP_DOWNLOAD_DIR/$distributionUrlName" "$distributionUrl" || die "curl: Failed to fetch $distributionUrl"
elif set_java_home; then
verbose "Falling back to use Java to download"
...
+else
+ die "Need wget, curl, or a working JDK to download Maven"
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ai-rag-service/mvnw` around lines 194 - 223, The script can fall through
without downloading if neither wget/curl are found and set_java_home fails; add
a clear fail-fast error path by inserting an else branch after the existing
'elif set_java_home; then' block that calls die with a descriptive message
(e.g., "No download mechanism available: install wget or curl, or ensure Java is
available") so failures surface immediately; reference the existing helpers
set_java_home and die and include the distributionUrl in the message for
context.
ChatClient.CallResponseSpec response = chatClient.prompt(prompt)
        .call();
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Locate app config files:"
fd -i 'application*.properties|application*.yml|application*.yaml'
echo
echo "Search for Spring AI/Ollama timeout & retry settings:"
rg -n --iglob 'application*.{properties,yml,yaml}' 'spring\.ai\.ollama|timeout|connect-timeout|read-timeout|retry'
echo
echo "Search Java config for client timeout/retry customization:"
rg -n --type=java -C2 'ChatClient\.Builder|WebClient|RestClient|HttpClient|timeout|Retry'
Repository: Rd4dev/Mert
Length of output: 1310
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "All application*.properties/yml/yaml files:"
fd -i 'application*.properties|application*.yml|application*.yaml' .
echo
echo "Search for Ollama / Spring AI timeout & retry properties across config: "
rg -n --iglob '**/application*.{properties,yml,yaml}' '(spring\.ai\.ollama\..*|ollama\..*|timeout|connect-timeout|read-timeout|retry)' .
echo
echo
echo "Search Java for timeout/retry configuration in ChatClient / HTTP clients:"
rg -n --type=java -S '(ChatClient\.Builder|WebClient|RestClient|HttpClient|timeout|Timeout|retry|Retry)' ai-rag-service/src/main/java
Repository: Rd4dev/Mert
Length of output: 899
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Search for ChatClient.Builder creation/config in ai-rag-service:"
rg -n --type=java '@Bean|ChatClient\.Builder|ChatClient\.builder|spring\.ai\.ollama|Ollama' ai-rag-service/src/main/java
echo
echo "Search for any custom HttpClient/WebClient/timeout usage in ai-rag-service:"
rg -n --type=java '(WebClient|RestClient|HttpClient|timeout|Timeout|connect-timeout|read-timeout|retry|Retry)' ai-rag-service/src/main/java
Repository: Rd4dev/Mert
Length of output: 389
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Search whole repo for ChatClient usage:"
rg -n --type=java 'ChatClient(\.Builder|\.builder)?|spring\.ai\.ollama|Ollama' .
echo
echo
echo "Search for Spring @Configuration / @Bean creating ChatClient or HTTP client infrastructure:"
rg -n --type=java '(@Configuration|@Bean).*(ChatClient|ollama|Ollama|WebClient|RestClient|HttpClient|timeout|Retry|retry)' ai-rag-service/src/main/java
echo
echo "Search for any application-* property files:"
fd -i 'application*.{properties,yml,yaml}' ai-rag-service/src/main/resources
Repository: Rd4dev/Mert
Length of output: 1043
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java"
nl -ba "$FILE" | sed -n '1,120p'
Repository: Rd4dev/Mert
Length of output: 98
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java"
echo "First 140 lines of $FILE:"
cat -n "$FILE" | sed -n '1,140p'
Repository: Rd4dev/Mert
Length of output: 2683
Add explicit timeout/retry for the synchronous Ollama/Spring AI call
AssistantController makes a blocking call on the request thread: chatClient.prompt(prompt).call() (lines 53-54). ai-rag-service/src/main/resources/application.properties only sets spring.ai.ollama.base-url and spring.ai.ollama.chat.model; no explicit timeout/retry (connection/read) configuration exists in config or Java.
Configure explicit client timeouts/retry policy for the Ollama/Spring AI call (and consider async/offloading) to avoid servlet-thread exhaustion on slow/hung model responses.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java`
around lines 53 - 54, AssistantController currently does a blocking synchronous
call via chatClient.prompt(prompt).call() with no timeouts/retries; update the
client invocation to enforce explicit connection/read timeouts and a retry
policy and/or offload the call to a separate thread so servlet threads aren't
exhausted: change the ChatClient usage in AssistantController to use a
configured non-blocking client or a timed call (e.g., use a WebClient-backed
Spring AI ChatClient or wrap the .call() in a CompletableFuture with a timeout
and proper exception handling), configure retry (exponential backoff, max
attempts) for transient failures, and ensure timeouts are set via configuration
(application.properties) or a dedicated bean that builds the Ollama
client/WebClient with connect/read timeouts and retry filter; alternatively
annotate the controller method to delegate to an `@Async` service method that
performs the timed/retried chatClient.prompt(prompt).call() so the servlet
thread is not blocked.
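One of the options named in the prompt above is wrapping the blocking call in a CompletableFuture with a timeout. A minimal, framework-free sketch of that idea follows; TimedCall and callWithTimeout are hypothetical names, and the Supplier stands in for chatClient.prompt(prompt).call() so that no Spring AI API is assumed.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: bounds how long a caller waits on a blocking call.
final class TimedCall {

    // Runs blockingCall on the given executor and gives up after the timeout.
    // Note: the underlying task is not interrupted on timeout; it keeps running
    // on the executor, so the pool should be sized (or the HTTP client itself
    // configured with timeouts) with that in mind.
    static <T> T callWithTimeout(Supplier<T> blockingCall, Duration timeout, ExecutorService executor)
            throws TimeoutException {
        CompletableFuture<T> future = CompletableFuture
                .supplyAsync(blockingCall, executor)
                .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS);
        try {
            return future.join();
        } catch (CompletionException e) {
            // orTimeout completes the future with a TimeoutException as the cause.
            if (e.getCause() instanceof TimeoutException te) {
                throw te;
            }
            throw e;
        }
    }
}
```

This only bounds the wait: the calling thread still blocks in join() until the result or the timeout arrives, so it complements rather than replaces connect/read timeouts on the client.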
@@ -0,0 +1,3 @@
package com.mert.airagservice.dto;

public record AskRequest(String question) { }
Validate question at the DTO boundary.
Line 3 currently accepts null/blank/unbounded input, so /ask can forward empty or oversized prompts to the model path. Add bean validation here and enforce it with @Valid on the controller method argument.
Proposed fix
package com.mert.airagservice.dto;

+import jakarta.validation.constraints.NotBlank;
+import jakarta.validation.constraints.Size;
+
-public record AskRequest(String question) { }
+public record AskRequest(
+    @NotBlank(message = "question is required")
+    @Size(max = 1000, message = "question is too long")
+    String question
+) { }

// In ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java
public String ask(@Valid @RequestBody AskRequest request) throws IOException { ... }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ai-rag-service/src/main/java/com/mert/airagservice/dto/AskRequest.java` at
line 3, Add Bean Validation to the AskRequest DTO and enforce it on the
controller: annotate the record component in AskRequest (the String question)
with validation annotations such as `@NotBlank` and `@Size`(max = 2000) (e.g.,
public record AskRequest(`@NotBlank` `@Size`(max = 2000) String question) { }) and
update the AssistantController.ask method signature to accept a validated body
by adding `@Valid` and `@RequestBody` to the parameter (e.g., public String
ask(`@Valid` `@RequestBody` AskRequest request) throws IOException { ... }), so
empty or oversized prompts are rejected at the DTO boundary.
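For comparison, the same boundary check can be expressed without any framework by validating in the record's compact constructor. This is a sketch of an alternative technique, not the Bean Validation approach proposed above; the name ValidatedAskRequest is hypothetical, and the 2000-character limit is simply the example value mentioned in the prompt.

```java
// Hypothetical variant of AskRequest that rejects bad input at construction time.
record ValidatedAskRequest(String question) {

    // Example limit only; the review mentions both 1000 and 2000 as possible values.
    static final int MAX_QUESTION_LENGTH = 2000;

    // Compact constructor: runs before the field is assigned.
    ValidatedAskRequest {
        if (question == null || question.isBlank()) {
            throw new IllegalArgumentException("question is required");
        }
        question = question.strip();
        if (question.length() > MAX_QUESTION_LENGTH) {
            throw new IllegalArgumentException("question is too long");
        }
    }
}
```

With this variant an invalid payload fails while the object is being built, so the controller would still need to translate the exception into a 400 response.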
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=qwen2.5-coder:7b

server.port=4003
No newline at end of file
Externalize runtime configuration instead of hardcoding local endpoints.
Hardcoding Ollama host/model and server port makes this service fragile outside local dev. In non-local environments, requests will fail unless the exact localhost setup is replicated.
Suggested fix
spring.application.name=ai-rag-service
-spring.ai.ollama.base-url=http://localhost:11434
-spring.ai.ollama.chat.model=qwen2.5-coder:7b
+spring.ai.ollama.base-url=${OLLAMA_BASE_URL:http://localhost:11434}
+spring.ai.ollama.chat.model=${OLLAMA_CHAT_MODEL:qwen2.5-coder:7b}
-server.port=4003
+server.port=${SERVER_PORT:4003}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ai-rag-service/src/main/resources/application.properties` around lines 2 - 5,
The properties file currently hardcodes spring.ai.ollama.base-url,
spring.ai.ollama.chat.model and server.port; change these to use externalized
configuration placeholders (e.g., ${OLLAMA_BASE_URL}, ${OLLAMA_CHAT_MODEL},
${SERVER_PORT}) so values come from environment variables or a config server at
runtime, and provide sensible local defaults only for development; update any
code reading these properties (references to spring.ai.ollama.base-url and
spring.ai.ollama.chat.model) to expect externalized values rather than hardcoded
constants.
Additional context on the runs: when the prompt context is small (3 lines), the response time was around 4 seconds. Once the context was extended to include the PDF content, latency rose to more than 8x that.
… retrieval with LLM The system was initially very slow: every call embedded both the answer chunks and the question, taking over 5 to 6 minutes. Caching the chunk embeddings fixes most of this. The question still has to be embedded on each call, which has a cost, but overall latency is down to 24 to 90 seconds. I strongly suspect the remaining time is the local model generating the full answer and returning it all at once, rather than the time it takes to start producing output: with shorter outputs, an uncached question was answered in 8 seconds and a cached one in 3 seconds. That is still on the slow side, but everything runs on the CPU.
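The caching described here can be sketched as a map from chunk text to its embedding that only calls the embedding model on a miss. This is a minimal illustration, not the code in this PR: EmbeddingCache is a hypothetical name, and the Function parameter stands in for whatever embedding call the service actually makes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache: embeds each distinct text at most once.
final class EmbeddingCache {
    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    private final Function<String, float[]> embedder;

    // embedder is assumed to be the slow call to the local embedding model.
    EmbeddingCache(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    // computeIfAbsent invokes the embedder only when the text has not been seen before.
    float[] embeddingFor(String text) {
        return cache.computeIfAbsent(text, embedder);
    }

    int size() {
        return cache.size();
    }
}
```

Chunk texts repeat across requests, so they benefit from this; a new question is by definition a miss, which matches the remaining per-request cost noted above.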
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java`:
- Around line 34-37: In AssistantController.ask validate request.question() for
null or blank before calling ragService.buildContext: check that
AskRequest.request.question() is non-null and not empty after trim, and if
invalid throw a client error (e.g. throw new
ResponseStatusException(HttpStatus.BAD_REQUEST, "question must be provided") or
return a 400 response) so embedding is never called with empty input; update the
validation at the start of the ask method to perform this guard and include a
clear error message.
In
`@ai-rag-service/src/main/java/com/mert/airagservice/rag/CosineSimilarity.java`:
- Around line 4-17: The CosineSimilarity.score method must validate inputs:
check that a and b are non-null and have the same length (throw
IllegalArgumentException with a clear message if not), compute dot, normA and
normB as before, and if either norm is zero return 0.0 (instead of performing
the division) to avoid NaN/Infinity; otherwise return dot / (Math.sqrt(normA) *
Math.sqrt(normB)). Ensure these guards are applied at the start of score(float[]
a, float[] b) and keep the core loop as-is once validation passes.
In `@ai-rag-service/src/main/java/com/mert/airagservice/service/RagService.java`:
- Around line 16-17: The chunkEmbeddingCache is declared as a non-thread-safe
HashMap in RagService and is mutated during request handling (symbol:
chunkEmbeddingCache in class RagService), which can race under concurrent
requests; replace the HashMap with a thread-safe map (e.g., new
ConcurrentHashMap<>()) and, where the cache is updated (the mutation sites that
currently call put/contains/putIfAbsent around chunkEmbeddingCache), use atomic
map operations such as computeIfAbsent or putIfAbsent to avoid races; update
imports and ensure no other code assumes HashMap-specific behavior.
- Around line 29-39: The comparator in RagService (when building topChunks)
repeatedly recomputes embeddings and CosineSimilarity.score during sorting;
precompute each chunk's embedding and its cosine score against questionVector
once (e.g., build a Map<String, Float[]> or List of pairs and a Map<String,
Double> scores using chunkEmbeddingCache.computeIfAbsent and
CosineSimilarity.score), then sort chunks using those cached scores (sorting by
the precomputed scoreB/scoreA) and collect the top results; update the code that
references chunkEmbeddingCache, embeddingService.embed, CosineSimilarity.score,
and the topChunks construction to use the precomputed score lookup rather than
computing inside the comparator.
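The two RagService findings above can be sketched together: a thread-safe cache filled with `computeIfAbsent`, and scores computed once per chunk before sorting. This is a rough illustration under assumptions, not the PR's code: `TopChunksSketch`, the `embedder` function (standing in for `embeddingService.embed`), and the local `cosine` helper are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: cache embeddings atomically and score each chunk once.
final class TopChunksSketch {
    // Thread-safe cache: chunk text -> embedding vector.
    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    // Stand-in for the real embedding call (assumption for this example).
    private final Function<String, float[]> embedder;

    TopChunksSketch(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    // Guarded cosine similarity; returns 0 when either vector has zero norm.
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        return denominator == 0d ? 0d : dot / denominator;
    }

    List<String> topChunks(List<String> chunks, float[] questionVector, int k) {
        return chunks.stream()
                // One embedding lookup and one score per chunk, before sorting.
                .map(chunk -> Map.entry(
                        chunk,
                        cosine(questionVector, cache.computeIfAbsent(chunk, embedder))))
                // The comparator only reads the precomputed scores.
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

One trade-off to keep in mind: `ConcurrentHashMap.computeIfAbsent` may block other updates to the map while the mapping function runs, so with a slow embedding call it may be preferable to fill the cache once at startup and only read it during requests.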
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1cac9d3b-c095-4b3b-bd43-d6214d036067
📒 Files selected for processing (6)
- ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java
- ai-rag-service/src/main/java/com/mert/airagservice/rag/CosineSimilarity.java
- ai-rag-service/src/main/java/com/mert/airagservice/service/EmbeddingService.java
- ai-rag-service/src/main/java/com/mert/airagservice/service/PdfKnowledgeService.java
- ai-rag-service/src/main/java/com/mert/airagservice/service/RagService.java
- ai-rag-service/src/main/resources/application.properties
🚧 Files skipped from review as they are similar to previous changes (1)
- ai-rag-service/src/main/resources/application.properties
public String ask(@RequestBody AskRequest request) throws IOException {
    long start = System.currentTimeMillis();
    String context = ragService.buildContext(request.question());
There was a problem hiding this comment.
Validate question before calling RAG.
Line [36] dereferences request.question() without guards; null/blank input can cascade into embedding failures and return 500 instead of a client error.
Suggested minimal fix
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
+import org.springframework.web.server.ResponseStatusException;
+import org.springframework.http.HttpStatus;
@@
@PostMapping("/ask")
public String ask(@RequestBody AskRequest request) throws IOException {
+ if (request == null || request.question() == null || request.question().isBlank()) {
+ throw new ResponseStatusException(HttpStatus.BAD_REQUEST, "question must not be blank");
+ }
long start = System.currentTimeMillis();
String context = ragService.buildContext(request.question());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java`
around lines 34 - 37, In AssistantController.ask validate request.question() for
null or blank before calling ragService.buildContext: check that
AskRequest.request.question() is non-null and not empty after trim, and if
invalid throw a client error (e.g. throw new
ResponseStatusException(HttpStatus.BAD_REQUEST, "question must be provided") or
return a 400 response) so embedding is never called with empty input; update the
validation at the start of the ask method to perform this guard and include a
clear error message.
public static double score(float[] a, float[] b) {

    double dot = 0;
    double normA = 0;
    double normB = 0;

    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
There was a problem hiding this comment.
Guard vector dimensions and zero-norm denominator.
Line [10]-[16] assumes equal lengths and non-zero norms. Mismatched arrays can crash; zero vectors produce NaN/Infinity and corrupt ranking behavior.
Suggested fix
public class CosineSimilarity {
public static double score(float[] a, float[] b) {
+ if (a == null || b == null) {
+ throw new IllegalArgumentException("Vectors must not be null");
+ }
+ if (a.length != b.length) {
+ throw new IllegalArgumentException("Vectors must have the same dimension");
+ }
double dot = 0;
double normA = 0;
double normB = 0;
@@
- return dot / (Math.sqrt(normA) * Math.sqrt(normB));
+ double denominator = Math.sqrt(normA) * Math.sqrt(normB);
+ if (denominator == 0d) {
+ return 0d;
+ }
+ return dot / denominator;
}
}
📝 Committable suggestion
public static double score(float[] a, float[] b) {
    if (a == null || b == null) {
        throw new IllegalArgumentException("Vectors must not be null");
    }
    if (a.length != b.length) {
        throw new IllegalArgumentException("Vectors must have the same dimension");
    }
    double dot = 0;
    double normA = 0;
    double normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    double denominator = Math.sqrt(normA) * Math.sqrt(normB);
    if (denominator == 0d) {
        return 0d;
    }
    return dot / denominator;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ai-rag-service/src/main/java/com/mert/airagservice/rag/CosineSimilarity.java`
around lines 4 - 17, The CosineSimilarity.score method must validate inputs:
check that a and b are non-null and have the same length (throw
IllegalArgumentException with a clear message if not), compute dot, normA and
normB as before, and if either norm is zero return 0.0 (instead of performing
the division) to avoid NaN/Infinity; otherwise return dot / (Math.sqrt(normA) *
Math.sqrt(normB)). Ensure these guards are applied at the start of score(float[]
a, float[] b) and keep the core loop as-is once validation passes.
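To see why the zero-norm guard matters for ranking, the following sketch compares the unguarded formula with the guarded one on an all-zero vector. `CosineGuardSketch`, `naive`, and `guarded` are names invented for this example; they mirror the original and suggested versions of `score` but are not the PR's classes.

```java
// Hypothetical demo class: unguarded vs guarded cosine similarity.
final class CosineGuardSketch {

    // Mirrors the original score(): no validation, divides unconditionally.
    static double naive(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Mirrors the suggested fix: validate inputs, return 0 for a zero norm.
    static double guarded(float[] a, float[] b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("Vectors must not be null");
        }
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        return denominator == 0d ? 0d : dot / denominator;
    }

    public static void main(String[] args) {
        float[] zero = {0f, 0f, 0f};
        float[] question = {0.2f, 0.5f, 0.1f};
        System.out.println("naive:   " + naive(zero, question));   // 0.0 / 0.0
        System.out.println("guarded: " + guarded(zero, question));
    }
}
```

Since `Double.compare` orders NaN above every other value, a chunk with a NaN score would sort to the front of a descending ranking, which is how a single zero vector could displace genuinely relevant chunks.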
After chunking with top 3 scores: Initial run, Subsequent run (figures not preserved)
Loading everything on the fly is expensive, especially on the first query, so the loading moves to @PostConstruct. Everything still runs on the CPU, so embedding the question takes time, but most of the time goes to printing the output.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ai-rag-service/src/main/java/com/mert/airagservice/service/RagService.java`:
- Around line 42-43: buildContext currently calls
embeddingService.embed(question) without validating the input; add a guard at
the start of the buildContext(String question) method to check for null or blank
(e.g., question == null || question.trim().isEmpty()) and throw a clear
IllegalArgumentException (or a service-level BadRequestException) with a message
like "question must not be null or blank" so embeddingService.embed(...) is
never invoked for invalid input; this prevents runtime/model errors when
AskRequest.question() is missing or empty.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6d9fe252-9c73-49cb-bd1f-8b98aaa11033
📒 Files selected for processing (3)
- ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java
- ai-rag-service/src/main/java/com/mert/airagservice/service/PdfKnowledgeService.java
- ai-rag-service/src/main/java/com/mert/airagservice/service/RagService.java
💤 Files with no reviewable changes (2)
- ai-rag-service/src/main/java/com/mert/airagservice/controller/AssistantController.java
- ai-rag-service/src/main/java/com/mert/airagservice/service/PdfKnowledgeService.java
public String buildContext(String question) {
    float[] questionVector = embeddingService.embed(question);
There was a problem hiding this comment.
Validate question before embedding to prevent avoidable request-time failures.
Line [43] sends question directly to the embedding model. Since AskRequest.question() is not validated in the provided path, null/blank payloads can trigger model/runtime errors and fail /ask with a server error.
Suggested patch
public String buildContext(String question) {
+ if (question == null || question.isBlank()) {
+ throw new IllegalArgumentException("question must not be null or blank");
+ }
+
float[] questionVector = embeddingService.embed(question);
List<String> topChunks = chunks.stream()
.map(chunk -> Map.entry(
chunk,
CosineSimilarity.score(
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@ai-rag-service/src/main/java/com/mert/airagservice/service/RagService.java`
around lines 42 - 43, buildContext currently calls
embeddingService.embed(question) without validating the input; add a guard at
the start of the buildContext(String question) method to check for null or blank
(e.g., question == null || question.trim().isEmpty()) and throw a clear
IllegalArgumentException (or a service-level BadRequestException) with a message
like "question must not be null or blank" so embeddingService.embed(...) is
never invoked for invalid input; this prevents runtime/model errors when
AskRequest.question() is missing or empty.
Todo:
Resolves #3
Included in this PR
[WIP] - To clear debug logs
Summary by CodeRabbit