Skip to content

[SP-3848] [scanoss.java] Implement CLI scan configuration parameters#37

Open
descorchator wants to merge 1 commit intomainfrom
gepeto/SP-3848
Open

[SP-3848] [scanoss.java] Implement CLI scan configuration parameters#37
descorchator wants to merge 1 commit intomainfrom
gepeto/SP-3848

Conversation

@descorchator
Copy link

@descorchator descorchator commented Feb 26, 2026

[scanoss.java] Implement CLI scan configuration parameters

Changes

Here's a summary of all changes made to implement the CLI scan configuration parameters (SP-3848):

Summary of Changes

New Files

src/main/java/com/scanoss/settings/ScanConfig.java

  • New data class holding scan configuration parameters: minSnippetHits, minSnippetLines, honourFileExts, rankingEnabled, rankingThreshold, skipHeaders, skipHeadersLimit
  • Uses @SerializedName for JSON deserialization from scanoss.json
  • Includes isXxxSet() methods to check if values are "set" (non-default/non-unset)
  • Static resolve() method implementing the 3-layer priority merge: file_snippet (highest) > settings (middle) > CLI (lowest)

src/test/resources/scanoss-scan-config.json

  • Test resource with scan configuration at both settings and settings.file_snippet levels

Modified Files

src/main/java/com/scanoss/settings/ScanossSettings.java

  • Added scan config fields (min_snippet_hits, etc.) and file_snippet sub-section to Settings inner class
  • Added toScanConfig() method on Settings to convert settings-level values to ScanConfig
  • Added getResolvedScanConfig(ScanConfig cliConfig) to resolve effective config from all 3 priority layers

src/main/java/com/scanoss/cli/ScanCommandLine.java

  • Added 7 new CLI options: --min-snippet-hits, --min-snippet-lines, --honour-file-exts, --ranking, --ranking-threshold, --skip-headers, --skip-headers-limit
  • Added buildCliScanConfig() to create ScanConfig from CLI args
  • Added parseTriStateBoolean() for tri-state flags (true/false/unset/omitted)
  • Passes cliScanConfig to Scanner.builder()

src/main/java/com/scanoss/Scanner.java

  • Added cliScanConfig and resolved scanConfig fields
  • In constructor: resolves effective config via settings.getResolvedScanConfig(cliConfig)
  • Passes skipHeaders/skipHeadersLimit to Winnowing builder
  • Passes resolved scanConfig to ScanApi builder

src/main/java/com/scanoss/rest/ScanApi.java

  • Added scanConfig field
  • In scan(): adds non-unset server params (min_snippet_hits, min_snippet_lines, honour_file_exts, ranking_enabled, ranking_threshold) to the multipart request data
  • Note: skip_headers and skip_headers_limit are local-only and are NOT sent to the server

src/main/java/com/scanoss/Winnowing.java

  • Added skipHeaders and skipHeadersLimit builder fields
  • Added detectHeaderLines() method that detects license comment blocks, single-line comments, blank lines, and import/package statements
  • Modified wfpForContents() to skip header lines during snippet fingerprint generation when skipHeaders is enabled

src/test/java/com/scanoss/TestSettings.java

  • Added 5 new test methods: testScanConfigFromJson, testScanConfigResolution, testScanConfigResolutionFromSettingsFile, testScanConfigUnsetValues, testScanConfigResolutionWithNulls, testExistingSettingsWithoutScanConfig
  • Tests cover JSON parsing, priority resolution, unset value handling, and backward compatibility

src/test/java/com/scanoss/TestWinnowing.java

  • Added TestDetectHeaderLines - tests header detection with various code patterns
  • Added TestWinnowingSkipHeaders - tests that skip_headers affects WFP generation while preserving file hash

Key Design Decisions

  1. Priority system: settings.file_snippet > settings > CLI (as specified in ticket)
  2. Unset values: When a setting is unset, it's omitted from the API request entirely, letting server config take precedence
  3. Local vs server params: skip_headers and skip_headers_limit apply locally (winnowing), all others are sent to the SCANOSS scanning engine
  4. Tri-state booleans: --ranking and --honour-file-exts accept true/false/unset string values, mapping to Boolean (null = unset)

This PR was automatically generated by Gepeto Escalante.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added CLI options for scan configuration: minimum snippet hits/lines, file extension handling, ranking settings, and header skipping.
    • Introduced multi-level configuration support with priority-based resolution across CLI flags, configuration files, and file-level overrides.
    • Implemented header-skipping feature to improve file fingerprinting accuracy.
    • Enhanced scanning API to accept additional configuration parameters.
  • Tests

    • Added comprehensive test coverage for configuration resolution and header detection functionality.

@coderabbitai
Copy link

coderabbitai bot commented Feb 26, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a multi-tiered scan configuration system enabling CLI, settings file, and per-file snippet tuning of scanning parameters. A new ScanConfig model is added with priority-based resolution (file_snippet > settings > CLI), propagated through Scanner, Winnowing, and ScanApi layers. Winnowing gains header-line detection and skipping capabilities for fingerprint generation.

Changes

Cohort / File(s) Summary
Configuration Model
src/main/java/com/scanoss/settings/ScanConfig.java
New public configuration class with Lombok/Gson support defining min snippet hits/lines, file extension handling, ranking, and header-skipping parameters. Includes static resolve() method implementing three-tier priority merge and setter detection helpers.
Scanner Integration
src/main/java/com/scanoss/Scanner.java
Updated to accept CLI and resolved ScanConfig objects. Initializes settings with defaults, stores CLI config, and resolves final config via settings.getResolvedScanConfig(). Propagates resolved config to Winnowing and ScanApi builders.
Winnowing Header-Skipping
src/main/java/com/scanoss/Winnowing.java
Added skipHeaders and skipHeadersLimit builder fields controlling header-line exclusion during WFP generation. New detectHeaderLines() method identifies comment, blank, and import/package lines; skipped lines excluded from gram/hash computation.
Settings Enhancement
src/main/java/com/scanoss/settings/ScanossSettings.java
Added eight scan configuration fields to Settings inner class with Gson annotations. New Settings.toScanConfig() method converts settings to config model; new getResolvedScanConfig(cliConfig) on parent class merges CLI and settings-level configs.
CLI Integration
src/main/java/com/scanoss/cli/ScanCommandLine.java
Added CLI flags for min-snippet-hits, min-snippet-lines, honour-file-exts, ranking, ranking-threshold, skip-headers, skip-headers-limit. New private buildCliScanConfig() assembles config from CLI values; parseTriStateBoolean() helper converts string tri-state to Boolean/null.
REST API Enhancement
src/main/java/com/scanoss/rest/ScanApi.java
Updated constructor to accept optional ScanConfig parameter. Enhanced POST request body to conditionally include min_snippet_hits, min_snippet_lines, honour_file_exts, ranking_enabled, ranking_threshold when config is provided and values are set.
Test Coverage
src/test/java/com/scanoss/TestSettings.java, src/test/java/com/scanoss/TestWinnowing.java
Added six settings tests validating JSON loading, layered config resolution, unset-value detection, null handling, and backward compatibility. Added two winnowing tests verifying header line detection with Java comments/imports and WFP generation with/without header skipping.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI as ScanCommandLine
    participant Scanner
    participant Settings as ScanossSettings
    participant Config as ScanConfig
    participant Winnowing
    participant ScanApi as ScanApi

    User->>CLI: Provide CLI options (min-snippet-hits, skip-headers, etc.)
    CLI->>CLI: buildCliScanConfig()
    CLI->>Scanner: Scanner.builder()<br/>.cliScanConfig(cliConfig)
    
    Scanner->>Settings: Ensure settings initialized with defaults
    Scanner->>Settings: getResolvedScanConfig(cliScanConfig)
    
    Settings->>Settings: Settings.toScanConfig()
    Settings->>Config: ScanConfig.resolve(cli, settingsLevel, fileSnippet)
    
    Config->>Config: Apply priority merge:<br/>fileSnippet > settings > cli
    Config-->>Settings: Return resolved scanConfig
    
    Settings-->>Scanner: Return resolved scanConfig
    
    Scanner->>Winnowing: builder()<br/>.skipHeaders()<br/>.skipHeadersLimit()
    Winnowing-->>Scanner: Configured builder
    
    Scanner->>ScanApi: builder()<br/>.scanConfig(scanConfig)
    ScanApi-->>Scanner: Configured builder
    
    Scanner->>Winnowing: Generate WFP (with detectHeaderLines if enabled)
    Winnowing-->>Scanner: File fingerprints
    
    Scanner->>ScanApi: POST with config parameters<br/>(min_snippet_hits, ranking, etc.)
    ScanApi-->>Scanner: Scan results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 A config cascade, three tiers deep,
CLI whispers, settings sleep,
File snippets reign with headers skipped,
Through Winnowing's grace, priorities flipped!
Resolution merges, the perfect blend—
Let scanning tuning never end! 🎯

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 39.39% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'SP-3848 [scanoss.java] Implement CLI scan configuration parameters' accurately describes the main feature added: CLI parameters for scan configuration with priority-based resolution and integration across Scanner, Winnowing, and ScanApi layers.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch gepeto/SP-3848

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/main/java/com/scanoss/settings/ScanConfig.java (1)

144-146: Asymmetric handling of skipHeaders in merge logic.

The skipHeaders field is handled differently from other boolean fields: it can only be set to true by higher-priority layers, but cannot be overridden to false. This means if CLI sets skipHeaders=true, a higher-priority settings file cannot disable it.

Compare with honourFileExts and rankingEnabled which use isXxxSet() and can be overridden to either true or false.

If this is intentional ("once enabled, always enabled"), consider documenting it. Otherwise, consider aligning with other booleans:

♻️ Suggested fix for consistent boolean handling
-        if (source.getSkipHeaders() != null && source.getSkipHeaders()) {
-            target.setSkipHeaders(true);
+        if (source.getSkipHeaders() != null) {
+            target.setSkipHeaders(source.getSkipHeaders());
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main/java/com/scanoss/settings/ScanConfig.java` around lines 144 - 146,
The merge logic treats skipHeaders asymmetrically by only allowing it to be set
true (using source.getSkipHeaders() and target.setSkipHeaders(true)), preventing
higher-priority layers from setting it false; change the code to follow the
pattern used by honourFileExts and rankingEnabled: check the source's
"isSkipHeadersSet()" (or equivalent isXxxSet method) and, if set, assign the
actual value into target via target.setSkipHeaders(source.getSkipHeaders()) so
higher-priority layers can set it to true or false; if one-way behavior was
intended, add a comment/documentation near getSkipHeaders/isSkipHeadersSet
explaining the "once enabled, always enabled" semantics instead of changing
code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/test/java/com/scanoss/TestSettings.java`:
- Around line 63-66: The setup silently leaves scanConfigSettingsPath null when
getResource("scanoss-scan-config.json") returns null, causing
testScanConfigFromJson to fail; update the setup in TestSettings so that if
getResource("scanoss-scan-config.json") is null it fails fast (throw an
exception with a clear message) following the same pattern used for the existing
scanoss.json check, or alternatively add the missing test resource file
"scanoss-scan-config.json" under src/test/resources; locate the code that
assigns scanConfigSettingsPath (the block using
getClass().getClassLoader().getResource and Paths.get) and either add the
resource or replace the null branch with an exception to ensure
scanConfigSettingsPath is never null.

---

Nitpick comments:
In `@src/main/java/com/scanoss/settings/ScanConfig.java`:
- Around line 144-146: The merge logic treats skipHeaders asymmetrically by only
allowing it to be set true (using source.getSkipHeaders() and
target.setSkipHeaders(true)), preventing higher-priority layers from setting it
false; change the code to follow the pattern used by honourFileExts and
rankingEnabled: check the source's "isSkipHeadersSet()" (or equivalent isXxxSet
method) and, if set, assign the actual value into target via
target.setSkipHeaders(source.getSkipHeaders()) so higher-priority layers can set
it to true or false; if one-way behavior was intended, add a
comment/documentation near getSkipHeaders/isSkipHeadersSet explaining the "once
enabled, always enabled" semantics instead of changing code.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5be762a and 58270eb.

📒 Files selected for processing (8)
  • src/main/java/com/scanoss/Scanner.java
  • src/main/java/com/scanoss/Winnowing.java
  • src/main/java/com/scanoss/cli/ScanCommandLine.java
  • src/main/java/com/scanoss/rest/ScanApi.java
  • src/main/java/com/scanoss/settings/ScanConfig.java
  • src/main/java/com/scanoss/settings/ScanossSettings.java
  • src/test/java/com/scanoss/TestSettings.java
  • src/test/java/com/scanoss/TestWinnowing.java

Comment on lines +63 to +66
var scanConfigResource = getClass().getClassLoader().getResource("scanoss-scan-config.json");
if (scanConfigResource != null) {
scanConfigSettingsPath = Paths.get(scanConfigResource.toURI());
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the test resource file exists
echo "=== Searching for scanoss-scan-config.json ==="
fd -t f "scanoss-scan-config.json" src/test/resources 2>&1 || echo "Not found in src/test/resources"

echo ""
echo "=== Checking all test resources ==="
fd -t f . src/test/resources 2>&1 | head -20

echo ""
echo "=== Searching for references to scanoss-scan-config.json ==="
rg "scanoss-scan-config.json" --type java

Repository: scanoss/scanoss.java

Length of output: 398


🏁 Script executed:

#!/bin/bash
# Read the TestSettings.java file to check the actual implementation
echo "=== Reading TestSettings.java ==="
wc -l src/test/java/com/scanoss/TestSettings.java

echo ""
echo "=== Lines 60-70 (context around lines 63-66) ==="
sed -n '60,70p' src/test/java/com/scanoss/TestSettings.java

echo ""
echo "=== Lines 155-165 (context around lines 156-161) ==="
sed -n '155,165p' src/test/java/com/scanoss/TestSettings.java

Repository: scanoss/scanoss.java

Length of output: 1328


🏁 Script executed:

#!/bin/bash
# Check the scanConfigSettingsPath field declaration
echo "=== Searching for scanConfigSettingsPath declaration ==="
rg "scanConfigSettingsPath" src/test/java/com/scanoss/TestSettings.java -B 2 -A 2

echo ""
echo "=== Full `@Before` or `@BeforeEach` setup method ==="
sed -n '50,80p' src/test/java/com/scanoss/TestSettings.java

Repository: scanoss/scanoss.java

Length of output: 2332


🏁 Script executed:

#!/bin/bash
# Check if there are any other files that create this config file
echo "=== Searching for any test setup or creation of scanoss-scan-config.json ==="
rg -r "scanoss-scan-config" --type java --type json

echo ""
echo "=== Check git history for the file ==="
git log --name-status --oneline -- "*scanoss-scan-config*" 2>&1 | head -20

Repository: scanoss/scanoss.java

Length of output: 594


Pipeline failure: Missing test resource scanoss-scan-config.json.

The test testScanConfigFromJson fails at line 160 because the resource file scanoss-scan-config.json does not exist. The setup at lines 63-66 silently leaves scanConfigSettingsPath as null when the resource is missing, causing the assertion to fail.

Either add the missing test resource file to src/test/resources/scanoss-scan-config.json, or throw an error in the setup to match the pattern used for scanoss.json:

Suggested fix: Throw error for missing resource
         var scanConfigResource = getClass().getClassLoader().getResource("scanoss-scan-config.json");
-        if (scanConfigResource != null) {
-            scanConfigSettingsPath = Paths.get(scanConfigResource.toURI());
+        if (scanConfigResource == null) {
+            throw new IllegalStateException(
+                    "Required test resource 'scanoss-scan-config.json' not found. Please ensure it exists in src/test/resources/"
+            );
         }
+        scanConfigSettingsPath = Paths.get(scanConfigResource.toURI());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var scanConfigResource = getClass().getClassLoader().getResource("scanoss-scan-config.json");
if (scanConfigResource != null) {
scanConfigSettingsPath = Paths.get(scanConfigResource.toURI());
}
var scanConfigResource = getClass().getClassLoader().getResource("scanoss-scan-config.json");
if (scanConfigResource == null) {
throw new IllegalStateException(
"Required test resource 'scanoss-scan-config.json' not found. Please ensure it exists in src/test/resources/"
);
}
scanConfigSettingsPath = Paths.get(scanConfigResource.toURI());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/test/java/com/scanoss/TestSettings.java` around lines 63 - 66, The setup
silently leaves scanConfigSettingsPath null when
getResource("scanoss-scan-config.json") returns null, causing
testScanConfigFromJson to fail; update the setup in TestSettings so that if
getResource("scanoss-scan-config.json") is null it fails fast (throw an
exception with a clear message) following the same pattern used for the existing
scanoss.json check, or alternatively add the missing test resource file
"scanoss-scan-config.json" under src/test/resources; locate the code that
assigns scanConfigSettingsPath (the block using
getClass().getClassLoader().getResource and Paths.get) and either add the
resource or replace the null branch with an exception to ensure
scanConfigSettingsPath is never null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant