Quality Scores — origin and limitations

Every enriched product carries five quality-related fields:

  • durability_score — 1–10
  • quality_perception — 1–10
  • value_for_money — 1–10
  • price_positioning — budget / mid-range / premium / luxury
  • typical_competitors — array of strings

This page documents where those numbers come from, what they can and can't tell you, and the roadmap to improve them.

TL;DR

These are LLM-inferred opinions based on your catalog text. They are not backed by reviews, returns, certifications, sales data or any external evidence — yet.

Origin

The values are produced in a single LLM call per product, using this prompt (simplified — source: apps/products-api/src/lib/promptTemplates.ts):

You are a product analyst. Analyze the product below and return a single JSON object.

Product
-------
Title: {{title}}
Description: {{description}}
Price: {{price}} {{currency}}
Vendor: {{vendor}}
Categories: {{categories}}
Enrichment hints: {{enrichment_hints}} # from the Enrichment Wizard, if filled

Return ONLY valid JSON:
{
"durability_score": <number 1-10>,
"quality_perception": <number 1-10>,
"value_for_money": <number 1-10>,
"typical_competitors": ["<competitor 1>", "<competitor 2>"],
"price_positioning": "<budget|mid-range|premium|luxury>"
}
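
Because the model is only asked to return this JSON, a consumer may want to check the reply before storing it. The following is a minimal sketch of such a check, not the code used in products-api: the field names and allowed values come from the prompt above, while `QualityScores`, `isScore` and `parseQualityScores` are hypothetical names introduced for illustration.

```typescript
// Field names and allowed values are taken from the prompt; everything else is assumed.
type PricePositioning = "budget" | "mid-range" | "premium" | "luxury";

interface QualityScores {
  durability_score: number;
  quality_perception: number;
  value_for_money: number;
  typical_competitors: string[];
  price_positioning: PricePositioning;
}

const POSITIONINGS: readonly string[] = ["budget", "mid-range", "premium", "luxury"];

// Hypothetical helper: true when v is a finite number in the 1-10 range.
function isScore(v: unknown): v is number {
  return typeof v === "number" && Number.isFinite(v) && v >= 1 && v <= 10;
}

// Hypothetical helper: returns null on any violation instead of throwing,
// so the caller can decide whether to retry the LLM call or skip the product.
function parseQualityScores(raw: string): QualityScores | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the reply was not valid JSON
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) return null;
  const o = data as Record<string, unknown>;

  const durability = o["durability_score"];
  const perception = o["quality_perception"];
  const value = o["value_for_money"];
  const competitors = o["typical_competitors"];
  const positioning = o["price_positioning"];

  if (!isScore(durability) || !isScore(perception) || !isScore(value)) return null;
  if (!Array.isArray(competitors) || !competitors.every((c) => typeof c === "string")) return null;
  if (typeof positioning !== "string" || !POSITIONINGS.includes(positioning)) return null;

  return {
    durability_score: durability,
    quality_perception: perception,
    value_for_money: value,
    typical_competitors: competitors as string[],
    price_positioning: positioning as PricePositioning,
  };
}
```

A reply with a score of 11, a misspelled positioning tier or surrounding prose makes `parseQualityScores` return null. Passing this check says nothing about whether the scores are accurate, only that the reply has the requested shape.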

What the LLM uses

  • Title
  • Description
  • Price + currency
  • Vendor
  • Categories
  • Optional enrichment_hints from the Enrichment Wizard
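
To make the input list concrete, here is a small sketch of how the `{{placeholder}}` fields in the prompt could be filled from a product record. It is an illustration under assumptions, not the code in promptTemplates.ts: `ProductInput` and `renderPrompt` are hypothetical names, and joining categories with a comma is a guess at the formatting.

```typescript
// Hypothetical input shape covering the fields listed above.
interface ProductInput {
  title: string;
  description: string;
  price: number;
  currency: string;
  vendor: string;
  categories: string[];
  enrichment_hints?: string; // only present when the Enrichment Wizard was filled
}

// Hypothetical helper: substitutes {{name}} placeholders in a template string.
function renderPrompt(template: string, p: ProductInput): string {
  const values = new Map<string, string>([
    ["title", p.title],
    ["description", p.description],
    ["price", String(p.price)],
    ["currency", p.currency],
    ["vendor", p.vendor],
    ["categories", p.categories.join(", ")], // separator is an assumption
    ["enrichment_hints", p.enrichment_hints ?? ""],
  ]);
  // A function replacer inserts the value literally; unknown placeholders are left untouched.
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
    values.get(name) ?? match
  );
}

const example = renderPrompt("Title: {{title}}\nPrice: {{price}} {{currency}}", {
  title: "Wool coat",
  description: "Double-breasted coat in recycled wool.",
  price: 120,
  currency: "EUR",
  vendor: "Acme",
  categories: ["coats", "outerwear"],
});
// example is "Title: Wool coat\nPrice: 120 EUR"
```

Nothing outside these fields reaches the model, which is why thin or generic catalog text tends to produce thin or generic scores.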

What the LLM does NOT use

  • Real reviews (Google / Trustpilot / internal)
  • Returns rate
  • Sales / conversion data
  • External catalog comparatives
  • Manufacturer certifications (ISO, CE, GOTS, OEKO-TEX, MIL-STD, etc.)
  • Lifecycle test results
  • Sustainability databases

Consequences

  1. Scores are synthetic opinions — not verified facts.
  2. Reproducibility is not guaranteed — the same product can score differently across runs, especially if the model temperature is non-zero.
  3. Useful relatively, not absolutely — the LLM has common sense about market tiers (Hermès → luxury, Primark → budget) so scores differentiate products inside your own catalog. They're not a benchmark against competitors.
  4. typical_competitors has the same limitation — inferred, not verified.
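
Points 2 and 3 can be illustrated with two small calculations. This is a sketch with hypothetical helper names (`scoreSpread`, `shareAtOrBelow`), not part of the product: the first summarizes how far one score moved across repeated enrichment runs, the second places a score relative to the rest of your own catalog.

```typescript
// Hypothetical helper: min, max and mean of one score over repeated runs.
function scoreSpread(runs: number[]): { min: number; max: number; mean: number } {
  if (runs.length === 0) throw new Error("need at least one run");
  const sum = runs.reduce((acc, s) => acc + s, 0);
  return { min: Math.min(...runs), max: Math.max(...runs), mean: sum / runs.length };
}

// Hypothetical helper: share of catalog products scoring at or below `score`.
function shareAtOrBelow(catalogScores: number[], score: number): number {
  if (catalogScores.length === 0) throw new Error("catalog is empty");
  const count = catalogScores.filter((s) => s <= score).length;
  return count / catalogScores.length;
}

// Four runs of durability_score for one product (made-up numbers).
const spread = scoreSpread([6, 7, 7, 8]); // { min: 6, max: 8, mean: 7 }

// Position of a score of 5 inside a four-product catalog (made-up numbers).
const share = shareAtOrBelow([3, 5, 5, 8], 5); // 0.75
```

A wide gap between min and max suggests treating small score differences between products as noise, and the catalog share is only meaningful inside the catalog it was computed from.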

UI disclosure

The dashboard shows a badge above every Quality Scores block indicating the evidence level:

Badge | Meaning
🟢 Data-grounded | External evidence backs the score (future: reviews, returns, certifications)
🔵 AI + owner hints | Enrichment Wizard was filled for this product
🟡 AI-inferred | Pure LLM opinion (default — synthetic)

The level is stored in metadata.scores_evidence_level and persisted across re-enrichments.
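
The badge rules above can be sketched as a small decision function. The stored string values, the function name `evidenceLevel` and the rule that external evidence takes priority over wizard hints are all assumptions made for illustration; only the badge labels and the `metadata.scores_evidence_level` field name come from this page.

```typescript
// Hypothetical stored values for metadata.scores_evidence_level.
type EvidenceLevel = "data_grounded" | "ai_owner_hints" | "ai_inferred";

// Badge labels as shown in the dashboard table.
const BADGES: Record<EvidenceLevel, string> = {
  data_grounded: "🟢 Data-grounded",
  ai_owner_hints: "🔵 AI + owner hints",
  ai_inferred: "🟡 AI-inferred",
};

// Hypothetical helper: picks the strongest evidence level that applies.
// Assumed priority: external evidence, then wizard hints, then the default.
function evidenceLevel(hasExternalEvidence: boolean, wizardFilled: boolean): EvidenceLevel {
  if (hasExternalEvidence) return "data_grounded";
  if (wizardFilled) return "ai_owner_hints";
  return "ai_inferred";
}

// A product with a filled Enrichment Wizard and no external evidence.
const badge = BADGES[evidenceLevel(false, true)];
```

With no external evidence and an empty wizard the function falls through to the AI-inferred default, matching the table.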

How scores improve over time

Scores start as AI-inferred (pure LLM opinion) and become more accurate as you provide additional evidence:

  • Fill the Enrichment Wizard — answering its questions about warranty, certifications, and materials moves your scores to the AI + owner hints level.
  • Add verified competitors — replacing LLM-inferred competitors with your own verified list in the Edit tab gives the system ground truth for positioning.
  • External data integrations — once available, connecting Google Analytics, Search Console, or review data will enable data-grounded scores backed by real-world evidence.

The three-badge system (data-grounded, AI + hints, AI-inferred) always shows the current evidence level so you know how much to trust each score.