Specialists do the pressure-testing first
Research packet, skeptic pass, six specialist reviews, and simulation all happen before the final score is allowed to land.
SCORING
Each claim goes through six specialized agentic research passes that actively try to break it before RealityScore™ applies published weights, documented penalties, and a separate confidence model. The goal is to make every score explainable, auditable, and challengeable.
RealityScore treats direct receipts, benchmark-based inferences, and simulation-backed notes as different classes of evidence on purpose.
A claim can be strong enough to publish as an audit without being strong enough to recommend as a play worth copying.
PROCESS FLOW
The stack is designed to keep direct evidence, benchmark context, and simulation support separate instead of blending them into one confident-looking blob.
X posts or YouTube videos are normalized into the same claim packet with URL, source text, metadata, and traction signals.
Benchmark URLs, comparable claims, source-backed priors, and failure hypotheses are assembled before any scoring happens.
A challenge layer looks for unsupported assumptions, alternative explanations, and the next hard evidence that would actually matter.
Numbers, timeline, costs, proof, execution detail, and repeatability are pressure-tested independently before aggregation.
Monte Carlo and stress-test notes support the review when the economics are concrete enough to model responsibly.
A non-hallucination validator classifies each major note as observed, inferred, or modeled and flags unsupported wording.
The system decides whether a claim is fit for recommendation, fit only for audit, or too weak to publish publicly at all.
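The final gate in the flow above can be sketched as a small decision function. This is a minimal illustration under assumed thresholds; the function name, cutoffs, and the `grounded` flag are hypothetical, not RealityScore's published values.

```python
# Minimal sketch of the release gate at the end of the process flow.
# All names and thresholds here are illustrative assumptions.

def release_gate(score: float, confidence: str, grounded: bool) -> str:
    """Decide whether a scored claim is fit to publish, and in what form."""
    if not grounded:
        # The non-hallucination validator flagged unsupported wording.
        return "withheld"
    if score >= 75 and confidence == "high":
        return "recommendation"   # strong enough to copy
    if score >= 40:
        return "audit-only"       # publishable, but not endorsed
    return "withheld"             # too weak to publish publicly

print(release_gate(82, "high", True))    # → recommendation
print(release_gate(82, "medium", True))  # → audit-only
print(release_gate(30, "high", True))    # → withheld
```

Note the asymmetry this encodes: a claim can clear the audit bar on score alone, but a recommendation requires both a high score and high confidence.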
TRUST MODEL
Strong claims should lean hardest on observed receipts, use inferred context carefully, and label modeled outputs clearly. That is the main anti-hallucination rule.
Quoted claim text, transcript excerpts, timestamps, screenshots, metrics shown on screen, and links back to the source.
Benchmark-backed notes about market norms, likely weak points, cost realism, and what the evidence still does not prove.
Monte Carlo outputs, break-even stress tests, and scenario checks that help pressure-test the economics without pretending they are direct proof.
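A break-even stress test of the kind described above can be sketched with a simple Monte Carlo loop. Every distribution and parameter below (price, CAC noise, volume range, fixed cost) is an illustrative assumption, not the model's actual inputs.

```python
# Illustrative Monte Carlo break-even stress test. Parameters are
# placeholders chosen for the sketch, not real audit inputs.
import random

def breakeven_probability(price, cac_mean, cac_sd, monthly_cost, trials=10_000):
    """Estimate how often a claimed play clears break-even under noisy CAC."""
    random.seed(0)  # deterministic for the example
    wins = 0
    for _ in range(trials):
        cac = max(0.0, random.gauss(cac_mean, cac_sd))  # noisy acquisition cost
        units = random.randint(50, 150)                 # uncertain sales volume
        profit = units * (price - cac) - monthly_cost
        wins += profit > 0
    return wins / trials

p = breakeven_probability(price=49, cac_mean=30, cac_sd=15, monthly_cost=1_000)
print(f"P(break-even) ≈ {p:.2f}")
```

The output is a probability, not proof: it pressure-tests whether the claimed economics survive realistic noise, which is exactly why modeled support stays in its own evidence class.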
HOW THE MODEL WORKS
Specialists collect evidence first. Then deterministic math computes the score, a grounding check penalizes unsupported assertions, and the final gate decides whether the claim is recommendation-ready or audit-only.
RealityScore first captures source text, benchmark links, skeptic notes, and six specialist reviews before the score is computed.
The aggregator applies published weights and penalties, then the non-hallucination layer checks whether the wording is actually grounded in receipts.
Confidence stays separate from score, and the release gate decides whether the claim is strong enough to recommend, useful only as an audit, or too weak to publish.
The trust layer comes from specialist research plus deterministic math: same six passes, same weights, same deductions, same confidence model, and now a direct grounding check on the language itself.
RealityScore™ is based on public evidence. If a creator adds better proof, discloses costs, corrects missing information, or supplies stronger receipts, the score can be updated against the same public model. A weak claim can become a stronger audit before it ever becomes a recommendation.
The public rubric stays the same. What changes internally is who is trying to break the claim. Each score is backed by six specialist passes that pressure-test one job each before the final math runs.
Checks quantified claims, arithmetic coherence, screenshots, and whether the numbers are concrete enough to audit.
Tests whether the timeline is bounded, sustained, and realistic enough to treat as more than a one-off spike.
Looks for spend, labor, tooling, CAC, and hidden economics that would change the real viability of the claim.
Pushes on the difference between self-reported wins and actual third-party proof from buyers, users, or clients.
Examines whether the mechanics, steps, and constraints are documented clearly enough to study or challenge.
Tests whether a normal operator could actually reproduce the play without hidden leverage or privileged access.
These six axes define the base score before penalties and confidence. Together they answer a simple question: how credible, transparent, and repeatable does this claim look from public evidence?
Raw revenue exports, dashboard screenshots, or verifiable numbers. Vague claims without evidence score lower.
Claims score better when the timeline shows sustained performance instead of a one-off spike or launch-day burst.
Ad spend, tools, labor, and hidden operational costs should be visible enough for someone else to estimate real economics.
Third-party confirmation from buyers, users, or clients carries more weight than self-reported wins with no outside signal.
We look for the actual mechanics: steps taken, channels used, constraints faced, and what happened between start and result.
A strong claim gives another operator enough context to test the play without private access, celebrity distribution, or hidden leverage.
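The six rubric axes above combine into a weighted base score. Here is a minimal sketch of that aggregation; the axis keys mirror the rubric, but the weight values are illustrative placeholders, not RealityScore's published weights.

```python
# Sketch of a weighted six-axis base score. Weights are assumed
# placeholders for illustration only.

WEIGHTS = {
    "evidence": 0.25,       # receipts, exports, verifiable numbers
    "timeline": 0.15,       # sustained performance vs one-off spike
    "costs": 0.15,          # visible spend, labor, tooling
    "third_party": 0.20,    # buyer/user/client confirmation
    "execution": 0.10,      # documented mechanics and steps
    "repeatability": 0.15,  # reproducible without hidden leverage
}

def base_score(axis_scores: dict) -> float:
    """Combine per-axis scores (0-100) into one weighted base score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(WEIGHTS[a] * axis_scores[a] for a in WEIGHTS)

claim = {"evidence": 80, "timeline": 60, "costs": 40, "third_party": 70,
         "execution": 90, "repeatability": 50}
print(base_score(claim))  # → 65.5
```

Because the weights are fixed and published, the same evidence always produces the same base score, which is what makes a score challengeable rather than vibes-based.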
Penalties keep the model from rewarding polished hype. They are applied after the weighted rubric and make the reasons for low scores easier to interpret.
Claim is primarily anchored to selling a course, tool, membership, or service rather than documenting the business itself.
Large numbers appear without enough supporting proof, context, timeline, or a believable path to the stated result.
The claim offers no meaningful data, screenshots, third-party confirmation, or other public evidence to score against.
RealityScore™ can also flag optional incentive-distortion penalties during secondary review when conflicts, upsell dependency, or hidden distribution leverage are explicit enough to document publicly.
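The penalty stage described above can be sketched as flat deductions applied after the weighted rubric. The trigger names mirror the list above; the deduction sizes are assumptions made for the sketch.

```python
# Sketch of post-rubric penalties. Deduction sizes are illustrative
# assumptions, not documented RealityScore values.

PENALTIES = {
    "upsell_anchor": 15,      # claim mainly sells a course/tool/service
    "unsupported_scale": 20,  # big numbers without proof, context, or timeline
    "no_public_evidence": 30, # nothing public to score against
}

def apply_penalties(base: float, flags: set) -> float:
    """Deduct documented penalties from the weighted base score, floored at 0."""
    deduction = sum(PENALTIES[f] for f in flags)
    return max(0.0, base - deduction)

print(apply_penalties(65.5, {"upsell_anchor"}))                            # → 50.5
print(apply_penalties(65.5, {"unsupported_scale", "no_public_evidence"}))  # → 15.5
```

Keeping penalties as named, separate deductions (rather than folding them into the axis weights) is what makes a low score easy to interpret: the flags themselves explain the drop.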
Confidence reflects evidence density and signal consistency. It is intentionally separate from the numeric score.
Multiple axes are supported by dense evidence and the claim stays internally consistent under review.
Some evidence is present, but one or more axes still rely on partial proof or unverified assumptions.
Evidence is sparse, contradictory, or too incomplete to treat the score as a strong decision signal.
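The three confidence tiers above can be sketched as a simple classifier over evidence density and consistency. The inputs and thresholds here are hypothetical simplifications of "dense evidence across axes" and "internally consistent."

```python
# Illustrative confidence tiering, kept separate from the numeric score.
# Inputs and thresholds are assumptions made for this sketch.

def confidence_tier(dense_axes: int, contradictions: bool) -> str:
    """Map evidence density (axes with dense support, 0-6) to a tier."""
    if contradictions or dense_axes <= 1:
        return "low"     # sparse or internally inconsistent evidence
    if dense_axes >= 5:
        return "high"    # dense, consistent support across the rubric
    return "medium"      # partial proof or unverified assumptions remain

print(confidence_tier(6, False))  # → high
print(confidence_tier(3, False))  # → medium
print(confidence_tier(4, True))   # → low
```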
We filter noise, score the week's loudest claims, and send the few that look realistically worth your time.