Quality Scores — origin and limitations

Every enriched product carries five quality-related fields:

durability_score — 1–10
quality_perception — 1–10
value_for_money — 1–10
price_positioning — budget / mid-range / premium / luxury
typical_competitors — array of strings

This page documents where those numbers come from, what they can and can't tell you, and the roadmap to improve them.

TL;DR

These are LLM-inferred opinions based on your catalog text. They are not backed by reviews, returns, certifications, sales data or any external evidence — yet.

Origin

The values are produced in a single LLM call per product, using this prompt (simplified — source: apps/products-api/src/lib/promptTemplates.ts):

You are a product analyst. Analyze the product below and return a single JSON object.

Product
-------
Title: {{title}}
Description: {{description}}
Price: {{price}} {{currency}}
Vendor: {{vendor}}
Categories: {{categories}}
Enrichment hints: {{enrichment_hints}}    # from the Enrichment Wizard, if filled

Return ONLY valid JSON:
{
  "durability_score": <number 1-10>,
  "quality_perception": <number 1-10>,
  "value_for_money": <number 1-10>,
  "typical_competitors": ["<competitor 1>", "<competitor 2>"],
  "price_positioning": "<budget|mid-range|premium|luxury>"
}
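
Because the model is asked to return only JSON, the caller still has to check that the reply actually matches the shape above. The following is a minimal sketch of such a check, not the code in promptTemplates.ts; the names QualityScores and parseQualityScores are hypothetical and used only for illustration.

```typescript
// Illustrative names only: QualityScores and parseQualityScores are
// hypothetical and are not taken from promptTemplates.ts.
type PricePositioning = "budget" | "mid-range" | "premium" | "luxury";

interface QualityScores {
  durability_score: number;
  quality_perception: number;
  value_for_money: number;
  typical_competitors: string[];
  price_positioning: PricePositioning;
}

const PRICE_POSITIONS: readonly PricePositioning[] = [
  "budget",
  "mid-range",
  "premium",
  "luxury",
];

// A score is valid when it is a finite number in the documented 1-10 range.
function isScore(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value >= 1 && value <= 10;
}

function isPricePositioning(value: unknown): value is PricePositioning {
  return typeof value === "string" && (PRICE_POSITIONS as readonly string[]).includes(value);
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Parses the raw model reply and throws if it does not match the requested shape.
function parseQualityScores(raw: string): QualityScores {
  const data: unknown = JSON.parse(raw); // throws SyntaxError on invalid JSON
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new Error("expected a single JSON object");
  }
  const {
    durability_score,
    quality_perception,
    value_for_money,
    typical_competitors,
    price_positioning,
  } = data as Record<string, unknown>;

  if (!isScore(durability_score)) throw new Error("durability_score must be 1-10");
  if (!isScore(quality_perception)) throw new Error("quality_perception must be 1-10");
  if (!isScore(value_for_money)) throw new Error("value_for_money must be 1-10");
  if (!isStringArray(typical_competitors)) throw new Error("typical_competitors must be an array of strings");
  if (!isPricePositioning(price_positioning)) throw new Error("price_positioning is not a known tier");

  return { durability_score, quality_perception, value_for_money, typical_competitors, price_positioning };
}
```

Passing this check only shows that the reply is well-formed. It says nothing about whether the scores are correct.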

What the LLM uses

Title
Description
Price + currency
Vendor
Categories
Optional enrichment_hints from the Enrichment Wizard
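
To make the input side concrete, here is a rough sketch of how the {{placeholder}} fields listed above could be substituted into the prompt text. It assumes simple string replacement; the function name renderPrompt and the choice to render a missing field as an empty string are assumptions, not documented behavior of the service.

```typescript
// Hypothetical helper: replaces each {{name}} placeholder with the matching
// field value. Missing fields (e.g. unfilled enrichment_hints) become "".
function renderPrompt(
  template: string,
  fields: Record<string, string | undefined>
): string {
  return template.replace(
    /\{\{(\w+)\}\}/g,
    (_match: string, name: string) => fields[name] ?? ""
  );
}

// Usage with a shortened two-line version of the template.
const exampleTemplate = "Title: {{title}}\nEnrichment hints: {{enrichment_hints}}";
const examplePrompt = renderPrompt(exampleTemplate, { title: "Ceramic mug" });
```

Whatever the real implementation looks like, the point is the same: only the fields in this list reach the model.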

What the LLM does NOT use

Real reviews (Google / Trustpilot / internal)
Returns rate
Sales / conversion data
External catalog comparatives
Manufacturer certifications (ISO, CE, GOTS, OEKO-TEX, MIL-STD, etc.)
Lifecycle test results
Sustainability databases

Consequences

Scores are synthetic opinions — not verified facts.
Reproducibility is not guaranteed. The same product can score differently across runs, particularly when the model temperature is non-zero.
Useful relatively, not absolutely — the LLM has common sense about market tiers (Hermès → luxury, Primark → budget) so scores differentiate products inside your own catalog. They're not a benchmark against competitors.
typical_competitors has the same limitation — inferred, not verified.
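
One way to act on "useful relatively, not absolutely" is to read a score against the rest of your own catalog rather than as a raw number. The sketch below computes a percentile rank for one product's score; the function name percentileRank and the mid-rank treatment of ties are illustrative choices, not something the product does for you.

```typescript
// Hypothetical helper: returns where `value` sits among `catalogScores`
// as a percentile (0-100). Ties count as half below, half above.
function percentileRank(catalogScores: number[], value: number): number {
  if (catalogScores.length === 0) {
    throw new Error("catalogScores must not be empty");
  }
  const below = catalogScores.filter((s) => s < value).length;
  const equal = catalogScores.filter((s) => s === value).length;
  return ((below + 0.5 * equal) / catalogScores.length) * 100;
}

// Example: durability scores of four products in one catalog.
const durabilityScores = [4, 6, 6, 8];
const rankOfEight = percentileRank(durabilityScores, 8); // 87.5
const rankOfSix = percentileRank(durabilityScores, 6); // 50
```

A durability_score of 8 tells you little in isolation, but knowing it ranks near the top of your own catalog is a comparison the scores can reasonably support.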

UI disclosure

The dashboard shows a badge above every Quality Scores block indicating the evidence level:

Badge	Meaning
🟢 Data-grounded	External evidence backs the score (future: reviews, returns, certifications)
🔵 AI + owner hints	Enrichment Wizard was filled for this product
🟡 AI-inferred	Pure LLM opinion (default — synthetic)

The level is stored in metadata.scores_evidence_level and persisted across re-enrichments.
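
The badge table can be read as a simple precedence rule: external evidence wins over owner hints, which win over the default. The sketch below expresses that rule. The string values, the EvidenceInputs shape, and the function name are all assumptions for illustration; this page does not document the exact values stored in metadata.scores_evidence_level.

```typescript
// Assumed identifiers: the real stored values are not documented on this page.
type EvidenceLevel = "data_grounded" | "ai_owner_hints" | "ai_inferred";

// Hypothetical inputs describing what is known about a product.
interface EvidenceInputs {
  hasExternalEvidence: boolean; // future: reviews, returns, certifications
  hasEnrichmentHints: boolean; // Enrichment Wizard was filled
}

// Picks the strongest evidence level that applies, in table order.
function resolveEvidenceLevel(inputs: EvidenceInputs): EvidenceLevel {
  if (inputs.hasExternalEvidence) return "data_grounded";
  if (inputs.hasEnrichmentHints) return "ai_owner_hints";
  return "ai_inferred";
}

// Badge labels as shown in the table above.
const BADGE_LABELS: Record<EvidenceLevel, string> = {
  data_grounded: "Data-grounded",
  ai_owner_hints: "AI + owner hints",
  ai_inferred: "AI-inferred",
};
```

For example, a product with a filled Enrichment Wizard and no external evidence resolves to the "AI + owner hints" badge.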

How scores improve over time

Scores start as AI-inferred (pure LLM opinion) and become more accurate as you provide additional evidence:

Fill the Enrichment Wizard: answering its questions about warranty, certifications, and materials gives the LLM more to work with and raises the evidence level to AI + owner hints.
Add verified competitors: replacing the LLM-inferred competitors with your own verified list in the Edit tab gives the system ground truth for positioning.
External data integrations (roadmap, not available yet): connecting Google Analytics, Search Console, or review data is intended to enable Data-grounded scores backed by real-world evidence.

The three-badge system (Data-grounded, AI + owner hints, AI-inferred) always shows the current evidence level, so you know how much to trust each score.
