SVRIS
SourceVerify Reference Identity Standard
Foreword
The SourceVerify Reference Identity Standard (SVRIS) establishes a systematic method for determining whether a citation refers to a real document. The standard prioritizes transparency, auditability, and explainability in scientific, academic, policy, and technical environments, including automated systems that generate or validate citations.
SVRIS opens the verification process to inspection and audit. Every decision is traceable, every comparison is recorded, and every outcome is explainable.
1. Introduction
1.1 Purpose
This standard defines the requirements and procedures for determining whether a citation corresponds to:
- a uniquely identified real document (VERIFIED),
- a uniquely identified real document, but the citation has errors or weak evidence (VERIFIED WITH ERRORS),
- multiple possible documents or ambiguous evidence (NEEDS HUMAN REVIEW), or
- no matching document (UNVERIFIED).
1.2 Scope
SVRIS applies to all citation formats and all document types, including academic articles, books, reports, datasets, and websites. It does not require structured or complete citations.
1.3 Design Principles
SVRIS is grounded in four principles:
- Transparency – every comparison decision is recorded with reasoning.
- Auditability – outcomes are explainable and manually verifiable.
- Safety – ambiguous or underdetermined citations are flagged, not misclassified.
- Generality – applicable across fields and source types.
1.4 Audience
This document targets:
- software engineers,
- AI and LLM developers,
- research integrity officers,
- librarians,
- policy analysts,
- scientific publishers.
2. Terms and Definitions
Citation – A reference containing zero or more metadata fields describing a document.
Candidate Document – A real document record available for comparison.
Metadata Field – A structured element (e.g., title, authors, year).
Normalized String – A string transformed for comparison (lowercase, punctuation removed).
Consistent Document – A candidate document for which no field comparison yields a CONTRADICTION.
3. Metadata Fields
SVRIS evaluates identity fields—the metadata sufficient to uniquely locate a document:
- Title
- Authors (list)
- Year
- Venue (journal, conference, publisher, organization)
- Identifier (DOI, URL, ISBN, report number)
Why these fields? The primary question SVRIS answers is: "Does this document exist?" Once a document is located via title, authors, year, venue, or identifier, its identity is established. Supplementary bibliographic details (volume, issue, pages) may be useful for formatting but do not affect whether the citation refers to a real document.
Only fields present in the citation are evaluated. Missing citation fields are treated as ABSENT.
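For implementers, the identity fields map naturally onto a small record type. The following Python sketch is informative, not normative; the class and field names are illustrative, and later code examples in this document build on it.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    """Identity fields of a citation (Section 3).
    Any field may be missing; missing fields are treated as ABSENT."""
    title: str | None = None
    authors: list[str] = field(default_factory=list)
    year: int | None = None
    venue: str | None = None
    doi: str | None = None
    url: str | None = None
```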
4. Normalization
Before comparison, strings are normalized to handle common variations:
Title normalization:
- Lowercase, all non-alphanumeric characters removed
- Handles hyphen-space inconsistencies, punctuation differences
Standard normalization (authors, venue, identifier):
- Lowercase, punctuation removed, whitespace collapsed
Normalization ensures that superficial differences (capitalization, punctuation) don't cause false mismatches.
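A non-normative sketch of the two normalization routines; the regular expressions are one reasonable reading of the rules above, not a mandated implementation.

```python
import re

def normalize_title(s: str) -> str:
    """Title normalization: lowercase, remove all non-alphanumeric
    characters (absorbs hyphen-space and punctuation differences)."""
    return re.sub(r"[^a-z0-9]", "", s.lower())

def normalize_standard(s: str) -> str:
    """Standard normalization (authors, venue, identifier):
    lowercase, punctuation removed, whitespace collapsed."""
    s = re.sub(r"[^\w\s]", " ", s.lower())
    return " ".join(s.split())
```

For example, normalize_title("Meta-Analysis of X") and normalize_title("Meta Analysis of X") both yield "metaanalysisofx", so the hyphen-space variation cannot cause a false mismatch.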
5. Field Labels
Each field comparison yields exactly one of six labels:
5.1 MATCH
Citation and document fields are equivalent after normalization. This includes exact matches and abbreviation matches.
5.2 MATCH_WITH_TYPO
Fields match but contain minor spelling errors or typos. This is displayed separately for transparency, but counts as MATCH in verification logic.
5.3 CONTAINS
Partial overlap found:
- The citation field is a contiguous substring of the document field
- One title contains the other (e.g., with subtitle)
- Some but not all authors overlap
- Year within ±1 (online-first vs print dates)
5.4 ABSENT
The citation does NOT provide this field. There is nothing to verify.
ABSENT is neutral — the citation made no claim about this field, so there is nothing to confirm or contradict. An omission is not an error in the strong sense.
5.5 UNCONFIRMED
The citation provides this field, but we could not verify it:
- DOI not found in databases
- Evidence source lacks data for this field
- Search returned no results that include this field
UNCONFIRMED is more negative than ABSENT. The citation made a claim we could not verify.
5.6 CONTRADICTION
Evidence explicitly shows a DIFFERENT value than the citation claims:
- Mismatching years (2+ years apart)
- No overlapping author surnames
- No shared title substring
- Venue wrong or unrelated
- Differing identifiers
KEY RULE: "Not found" = UNCONFIRMED, not CONTRADICTION. CONTRADICTION requires evidence of a DIFFERENT value.
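In code, the six labels form a natural enumeration. A non-normative sketch, continuing the Python examples:

```python
from enum import Enum

class Label(Enum):
    MATCH = "MATCH"
    MATCH_WITH_TYPO = "MATCH_WITH_TYPO"  # counts as MATCH in verification logic (5.2)
    CONTAINS = "CONTAINS"                # partial overlap (5.3)
    ABSENT = "ABSENT"                    # no claim made; neutral (5.4)
    UNCONFIRMED = "UNCONFIRMED"          # claim made but not verifiable (5.5)
    CONTRADICTION = "CONTRADICTION"      # evidence of a DIFFERENT value (5.6)
```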
6. Field Evaluation
6.1 Title
- Citation title ABSENT → ABSENT
- Citation has title, evidence lacks title → UNCONFIRMED
- Titles match after normalization → MATCH
- Titles match with minor typos → MATCH_WITH_TYPO
- One title contains the other → CONTAINS
- Titles clearly different → CONTRADICTION
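The ladder above can be implemented as follows. This non-normative sketch reuses the Section 4 helpers and the Label enumeration, and uses Python's standard difflib as a stand-in typo metric; the 0.9 similarity threshold is an assumption, not part of the standard.

```python
import difflib

def is_minor_typo(a: str, b: str, threshold: float = 0.9) -> bool:
    # Illustrative typo heuristic; any edit-distance metric would serve.
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

def evaluate_title(cit_title: str | None, doc_title: str | None) -> Label:
    """Evaluate the title field per 6.1, checked top to bottom."""
    if not cit_title:
        return Label.ABSENT
    if not doc_title:
        return Label.UNCONFIRMED
    a, b = normalize_title(cit_title), normalize_title(doc_title)
    if a == b:
        return Label.MATCH
    if is_minor_typo(a, b):
        return Label.MATCH_WITH_TYPO
    if a in b or b in a:
        return Label.CONTAINS
    return Label.CONTRADICTION
```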
6.2 Authors
- Citation authors empty/missing → ABSENT
- Citation has authors, evidence lacks author data → UNCONFIRMED
- All citation authors found in evidence → MATCH
- Authors match with spelling variations → MATCH_WITH_TYPO
- Some authors overlap → CONTAINS
- No author overlap → CONTRADICTION
"Et al." handling: Filter out "et al." and similar abbreviations. Compare only the named authors.
6.3 Year
- Missing in citation → ABSENT
- Citation has year, evidence lacks year → UNCONFIRMED
- Years equal → MATCH
- Off by 1 year → CONTAINS (online-first vs print dates)
- Off by 2+ years → CONTRADICTION
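The year rules reduce to the absolute gap between the two values (non-normative sketch):

```python
def evaluate_year(cit_year: int | None, doc_year: int | None) -> Label:
    """Evaluate the year field per 6.3."""
    if cit_year is None:
        return Label.ABSENT
    if doc_year is None:
        return Label.UNCONFIRMED
    gap = abs(cit_year - doc_year)
    if gap == 0:
        return Label.MATCH
    if gap == 1:
        return Label.CONTAINS       # online-first vs print dates
    return Label.CONTRADICTION      # 2 or more years apart
```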
6.4 Venue
- Citation venue ABSENT → ABSENT
- Citation has venue, evidence lacks venue → UNCONFIRMED
- Venues match → MATCH
- Match with typos → MATCH_WITH_TYPO
- Abbreviation matches full name → MATCH
- Partial overlap (substring, word overlap) → CONTAINS
- Clearly different venues → CONTRADICTION
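Abbreviation matching, the one venue rule with no counterpart in 6.1, can be approximated with a token-prefix heuristic. This sketch is non-normative; the stopword list is an assumption, and the rest of the venue ladder mirrors the title logic above.

```python
VENUE_STOPWORDS = {"of", "the", "and", "for", "in", "on"}

def abbrev_matches(abbrev: str, full: str) -> bool:
    """Heuristic: every token of the abbreviation is a prefix of the
    corresponding content word of the full name, e.g.
    "J Econ Perspect" vs "Journal of Economic Perspectives"."""
    a = normalize_standard(abbrev).split()
    f = [w for w in normalize_standard(full).split()
         if w not in VENUE_STOPWORDS]
    return len(a) == len(f) and all(fw.startswith(aw) for aw, fw in zip(a, f))
```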
6.5 Identifier (DOI and URL)
DOI and URL are evaluated separately but combined into a single identifier result for status calculation.
6.5.1 DOI Evaluation
- Missing in citation → ABSENT
- Not found in database → UNCONFIRMED
- Identical or contains → MATCH
- Different DOI → CONTRADICTION
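A non-normative sketch of the DOI rules, assuming the caller has already attempted a database lookup (resolved_doi is None when the DOI was not found):

```python
def evaluate_doi(cit_doi: str | None, resolved_doi: str | None) -> Label:
    """Evaluate the DOI per 6.5.1."""
    if not cit_doi:
        return Label.ABSENT
    if resolved_doi is None:
        return Label.UNCONFIRMED    # "not found" is NOT a contradiction (5.6)
    a, b = normalize_standard(cit_doi), normalize_standard(resolved_doi)
    if a == b or a in b or b in a:
        return Label.MATCH          # identical or contains
    return Label.CONTRADICTION      # a different DOI
```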
6.5.2 URL Evaluation
URL can be verified through multiple methods (in priority order):
1. URL Content Verification (highest priority)
   - If check_url returns page content, an LLM verifies whether the content matches the citation
   - MATCH: page content clearly matches the citation (title, authors visible)
   - CONTRADICTION: page shows a different article
   - UNCONFIRMED: page has insufficient details (paywall, login, generic page)
2. URL Found in Evidence
   - If the URL appears in search results for this article → MATCH
   - Fallback when URL content verification returns UNCONFIRMED
3. Direct URL Comparison
   - If the candidate has the same URL → MATCH
   - Different URLs → LLM comparison or UNCONFIRMED
4. URL Check Failed
   - If the URL returns a 404 or error → fall back to evidence check or candidate comparison
   - If no evidence and no candidate URL → UNCONFIRMED
6.5.3 Combined Identifier Result
The identifier field uses the best result from DOI and URL comparisons:
Priority (highest to lowest):
- MATCH (either DOI or URL matches → identifier = MATCH)
- MATCH_WITH_TYPO
- CONTAINS
- CONTRADICTION (wins over UNCONFIRMED - evidence of mismatch is important)
- UNCONFIRMED
- ABSENT
Example:
- DOI = CONTRADICTION, URL = UNCONFIRMED → identifier = CONTRADICTION
- DOI = UNCONFIRMED, URL = MATCH → identifier = MATCH
- DOI = MATCH, URL = CONTRADICTION → identifier = MATCH (positive wins)
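The combination rule is a pure priority lookup. A non-normative sketch:

```python
# Priority order per 6.5.3, highest first.
IDENTIFIER_PRIORITY = [
    Label.MATCH,
    Label.MATCH_WITH_TYPO,
    Label.CONTAINS,
    Label.CONTRADICTION,   # outranks UNCONFIRMED: evidence of mismatch matters
    Label.UNCONFIRMED,
    Label.ABSENT,
]
_RANK = {label: i for i, label in enumerate(IDENTIFIER_PRIORITY)}

def combine_identifier(doi_result: Label, url_result: Label) -> Label:
    """Return the higher-priority of the DOI and URL results."""
    return min(doi_result, url_result, key=_RANK.__getitem__)
```

Applied to the examples above, combine_identifier(Label.CONTRADICTION, Label.UNCONFIRMED) yields CONTRADICTION, and combine_identifier(Label.MATCH, Label.CONTRADICTION) yields MATCH.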
7. Key Concepts
7.1 Identifying Fields
Title and identifier (DOI/URL) are identifying fields: they point to a specific document. Venue, authors, and year are supporting fields: they corroborate but cannot identify alone.
VERIFIED requires title MATCH or identifier MATCH, plus corroboration.
VERIFIED WITH ERRORS may be achieved in two ways:
- Title or identifier evidence with minor issues (CONTAINS, or single CONTRADICTION with strong support), OR
- Title ABSENT in citation (not provided) with 2+ MATCHes on other fields—the document was found despite missing title.
UNVERIFIED when title is UNCONFIRMED (citation provided title but no match found) and no identifier MATCH.
7.2 Corroboration
A single MATCH is not enough for verification. Corroboration strength determines outcome:
Strong corroboration (VERIFIED):
- ≥3 total MATCHes, OR
- ≥2 MATCHes + ≥2 CONTAINS
Weak corroboration (VERIFIED WITH ERRORS):
- 2 MATCHes + 0-1 CONTAINS, OR
- 1 MATCH + 1 CONTAINS
7.3 Consistency
A candidate is consistent iff no field yields CONTRADICTION.
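These concepts translate directly into counting helpers that the Section 8 decision tree can reuse. Non-normative sketch; MATCH_WITH_TYPO is folded into MATCH when counting, per 5.2.

```python
from collections import Counter

def tally(labels: dict[str, Label]) -> Counter:
    """Count labels across all fields, folding MATCH_WITH_TYPO into MATCH."""
    counts: Counter = Counter()
    for lab in labels.values():
        counts[Label.MATCH if lab is Label.MATCH_WITH_TYPO else lab] += 1
    return counts

def is_consistent(labels: dict[str, Label]) -> bool:
    """Consistent iff no field yields CONTRADICTION (7.3)."""
    return Label.CONTRADICTION not in labels.values()

def has_strong_corroboration(counts: Counter) -> bool:
    """Strong corroboration per 7.2: 3+ MATCHes, or 2+ MATCHes plus 2+ CONTAINS."""
    m, c = counts[Label.MATCH], counts[Label.CONTAINS]
    return m >= 3 or (m >= 2 and c >= 2)
```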
8. Classification Outcomes
Outcomes are determined by the following decision tree, evaluated in order:
8.1 VERIFIED
Requirements (ALL must be true):
- No CONTRADICTION
- title = MATCH or identifier = MATCH
- Strong supporting evidence (one of the following):
- ≥3 total MATCHes (title/identifier + 2 others), OR
- ≥2 MATCHes + ≥2 CONTAINS (2 CONTAINS can substitute for 1 MATCH)
Rationale: Title + one other match (e.g., title + year) is too weak—many articles could share those. Verification requires a definitive identifier (title or DOI) plus strong corroboration.
8.2 VERIFIED WITH ERRORS
Checked in order:
8.2.0 Definitive Double Match (Highest Priority)
- title = MATCH AND identifier = MATCH
If both title AND identifier match, this is definitive proof of the same document. The outcome is at minimum VERIFIED WITH ERRORS, regardless of contradictions in other fields (authors, venue, year). The same title at the same DOI/URL cannot be a different document.
8.2.1 Strong Match Override
- title = MATCH AND 1+ other MATCHes AND ≤1 CONTRADICTION
Handles metadata discrepancies (e.g., corporate vs individual authors) when strong positive evidence exists.
8.2.2 Absent Title But Found
- title = ABSENT AND no CONTRADICTION AND 2+ MATCHes
Citation lacked a title, but the article was found by other fields (e.g., DOI + authors + year).
8.2.3 Strong Corroboration With Absent Identifying Fields
- 3+ MATCHes AND (title = ABSENT or MATCH or CONTAINS) AND (identifier = ABSENT or MATCH or CONTAINS) AND no CONTRADICTION
When authors, year, and venue all match, but title and/or identifier are absent, the document is still verified with high confidence. The same authors publishing in the same venue in the same year is strong evidence.
8.2.4 CONTAINS With Corroboration
- No CONTRADICTION AND has CONTAINS AND has ≥1 MATCH
Partial match (CONTAINS) backed by at least one exact match (MATCH).
8.3 NEEDS HUMAN REVIEW
Found something but evidence is ambiguous. Requires meaningful evidence, not just a single weak signal. Checked in order:
8.3.1 Title Strong Match Alone
- title = MATCH (or MATCH_WITH_TYPO) AND no CONTRADICTION
A strong title match alone (without corroboration) is worth human review. The exact title was found, but no other fields confirm it's the right document.
8.3.2 Title/DOI Match With Some Support
- (title OR identifier) = MATCH AND 2+ positive signals AND no CONTRADICTION
Found exact title or DOI with some corroborating evidence.
8.3.3 Borderline With Contradiction
- title = MATCH AND ≥1 CONTAINS AND exactly 1 CONTRADICTION (no other MATCH), OR
- title = CONTAINS AND ≥1 MATCH AND exactly 1 CONTRADICTION
Found title with some support, but one field contradicts. Human judgment needed.
8.3.4 Significant Matches With Contradiction
- 2+ positive signals AND has CONTRADICTION AND positives > CONTRADICTIONs
- Exception: Does NOT apply if title is CONTRADICTION (see 8.3.5)
Found a document with significant matching fields, but a supporting field (authors, year, venue) contradicts.
8.3.5 Title CONTRADICTION With Identifier Match
- title = CONTRADICTION AND identifier = MATCH AND 2+ positive signals
Even when the identifier (DOI/URL) matches, a title contradiction is a significant red flag. The user might have cited the wrong article with an incorrect DOI. Requires human review to confirm.
8.3.6 Multiple CONTAINS
- 2+ CONTAINS AND ≤2 inconclusive fields (ABSENT or UNCONFIRMED) AND no CONTRADICTION
Multiple partial matches without strong matches may still be worth reviewing.
8.4 UNVERIFIED
Default fallback when none of the above conditions are met:
- Article not found (title UNCONFIRMED, no identifier MATCH)
- Title CONTRADICTION without identifier match – strong evidence of different article
- No identifying match (only supporting fields match)
- Single CONTAINS with mostly inconclusives (not enough evidence)
- Title contains alone (no support)
- Evenly split signals (MATCHes ≤ CONTRADICTIONs)
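The full tree can be condensed into a single ordered rule list. The sketch below is non-normative and abbreviated: rules 8.2.3, 8.3.2, 8.3.3, 8.3.5, and 8.3.6 are omitted for brevity, and a conforming implementation would follow Section 8 clause by clause. It reuses the Section 7 helpers.

```python
def classify(labels: dict[str, Label]) -> str:
    """Map field labels to a verification outcome (abbreviated Section 8 tree)."""
    counts = tally(labels)
    title = labels.get("title", Label.ABSENT)
    ident = labels.get("identifier", Label.ABSENT)
    matchy = (Label.MATCH, Label.MATCH_WITH_TYPO)
    title_m, ident_m = title in matchy, ident in matchy
    contras = counts[Label.CONTRADICTION]
    positives = counts[Label.MATCH] + counts[Label.CONTAINS]

    # 8.1 VERIFIED: consistent + identifying match + strong corroboration
    if contras == 0 and (title_m or ident_m) and has_strong_corroboration(counts):
        return "VERIFIED"
    # 8.2.0 Definitive double match: same title at the same identifier
    if title_m and ident_m:
        return "VERIFIED WITH ERRORS"
    # 8.2.1 Strong match override: title + 1+ other MATCH, at most 1 contradiction
    if title_m and counts[Label.MATCH] >= 2 and contras <= 1:
        return "VERIFIED WITH ERRORS"
    # 8.2.2 Absent title, but found via other fields
    if title is Label.ABSENT and contras == 0 and counts[Label.MATCH] >= 2:
        return "VERIFIED WITH ERRORS"
    # 8.2.4 CONTAINS backed by at least one exact MATCH
    if contras == 0 and counts[Label.CONTAINS] >= 1 and counts[Label.MATCH] >= 1:
        return "VERIFIED WITH ERRORS"
    # 8.3.1 Title strong match alone
    if title_m and contras == 0:
        return "NEEDS HUMAN REVIEW"
    # 8.3.4 Significant matches, but a supporting field contradicts
    if positives >= 2 and contras > 0 and positives > contras \
            and title is not Label.CONTRADICTION:
        return "NEEDS HUMAN REVIEW"
    # 8.4 Default fallback
    return "UNVERIFIED"
```

Running this sketch on the Annex A.5 examples reproduces their outcomes; for instance, Example A5.6 falls through every rule to UNVERIFIED because its title is UNCONFIRMED rather than ABSENT.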
9. Transparency and Audit Trail
9.1 Recorded Information
For every verification, SVRIS records:
- Citation fields – the original citation values
- Candidate fields – values from each evidence source
- Normalized values – what was actually compared
- Field labels – the result for each field (MATCH, CONTAINS, etc.)
- Reasoning – why each label was assigned
- Final status – the verification outcome
9.2 Explainability
Every outcome can be explained:
- "Title matched exactly after normalization"
- "Authors matched with minor typo: 'Smth' vs 'Smith'"
- "Years differ by 3 (2019 vs 2022) → CONTRADICTION"
- "Venue 'J Econ Perspect' matches abbreviation of 'Journal of Economic Perspectives'"
9.3 Audit Support
The recorded information enables:
- Manual review of any verification decision
- Identification of systematic issues (e.g., sources that frequently fail)
- Research queries across verification results
- Debugging and quality improvement
Annex A (Examples)
A.1 Title Matching
Example A1.1 — MATCH
Citation: "Machine Learning in Health"
Document: "Machine Learning in Health"
→ MATCH

Example A1.2 — MATCH_WITH_TYPO
Citation: "Machine Lerning in Health" (typo)
Document: "Machine Learning in Health"
→ MATCH_WITH_TYPO

Example A1.3 — CONTAINS (substring)
Citation: "Machine Learning"
Document: "Machine Learning in Health"
→ CONTAINS

Example A1.4 — CONTRADICTION
Citation: "Deep Reinforcement Learning"
Document: "Statistical Methods in Genomics"
→ CONTRADICTION
A.2 Authors
Example A2.1 — MATCH
Citation authors: Smith; Johnson
Document authors: John Smith; Emily Johnson
→ MATCH

Example A2.2 — MATCH_WITH_TYPO
Citation authors: Smth; Jonson
Document authors: Smith; Johnson
→ MATCH_WITH_TYPO

Example A2.3 — CONTAINS
Citation authors: Smith
Document authors: Smith; Patel; Gomez
→ CONTAINS

Example A2.4 — CONTRADICTION
Citation authors: Smith
Document authors: Patel; Gomez
→ CONTRADICTION
A.3 Year
Example A3.1 — MATCH
Citation: 2021
Document: 2021
→ MATCH

Example A3.2 — CONTAINS (off by 1 year)
Citation: 2019
Document: 2020
→ CONTAINS

Example A3.3 — CONTRADICTION (off by 2+ years)
Citation: 2019
Document: 2022
→ CONTRADICTION
A.4 Venue Matching
Example A4.1 — MATCH
Citation: "nature climate change"
Document: "nature climate change"
→ MATCH

Example A4.2 — MATCH (abbreviation)
Citation: "J Econ Perspect"
Document: "Journal of Economic Perspectives"
→ MATCH

Example A4.3 — CONTAINS (substring)
Citation: "world bank"
Document: "the world bank group"
→ CONTAINS

Example A4.4 — CONTRADICTION
Citation: "Nature"
Document: "Science"
→ CONTRADICTION
A.5 Combined Examples
Example A5.1 — VERIFIED (3 MATCHes)
title = MATCH, authors = MATCH, year = MATCH, venue = ABSENT, identifier = UNCONFIRMED
→ 3 MATCHes (title + 2 others) = VERIFIED

Example A5.2 — VERIFIED (2 MATCHes + 2 CONTAINS)
title = MATCH, venue = MATCH, authors = CONTAINS, year = CONTAINS, identifier = ABSENT
→ 2 MATCHes + 2 CONTAINS = VERIFIED

Example A5.3 — VERIFIED WITH ERRORS (2 MATCHes + 1 CONTAINS)
title = MATCH, authors = MATCH, year = CONTAINS, venue = ABSENT, identifier = ABSENT
→ 2 MATCHes + 1 CONTAINS = VERIFIED WITH ERRORS

Example A5.4 — VERIFIED (typo counts as MATCH)
title = MATCH_WITH_TYPO, authors = MATCH, year = MATCH, venue = ABSENT, identifier = ABSENT
→ 3 MATCHes (MATCH_WITH_TYPO counts as MATCH) = VERIFIED

Example A5.5 — NEEDS HUMAN REVIEW (title only)
title = MATCH, authors = ABSENT, year = ABSENT, venue = ABSENT, identifier = ABSENT
→ title MATCH alone (no corroboration) = NEEDS HUMAN REVIEW

Example A5.6 — UNVERIFIED (article not found)
title = UNCONFIRMED, authors = MATCH, year = MATCH, venue = MATCH, identifier = ABSENT
→ title UNCONFIRMED + no identifier MATCH = UNVERIFIED
Annex B (Quick Reference)
B.1 Outcome Decision Table
VERIFIED
| Scenario | Requirement |
|---|---|
| Strong evidence | title/DOI MATCH + 2 other MATCHes |
| CONTAINS substitute | title/DOI MATCH + 1 other MATCH + 2 CONTAINS |
VERIFIED WITH ERRORS
| Scenario | Requirement |
|---|---|
| Definitive double | title MATCH + identifier MATCH |
| Weak corroboration | 2 MATCHes only (title + 1 other) |
| Partial corroboration | 2 MATCHes + 1 CONTAINS |
| Single contradiction | title MATCH + other MATCH + 1 CONTRADICTION |
| Title absent but found | title ABSENT + 2+ other MATCHes |
| Strong corroboration, absent identifiers | 3+ MATCHes + title/identifier each ABSENT, MATCH, or CONTAINS (no CONTRADICTION) |
| Partial title | title CONTAINS + 1 MATCH |
NEEDS HUMAN REVIEW
| Scenario | Requirement |
|---|---|
| Title strong match alone | title MATCH only (no corroboration, no contradiction) |
| Title/DOI match with support | title or identifier MATCH + 2+ positive signals + no CONTRADICTION |
| Borderline conflict | title MATCH + CONTAINS + 1 CONTRADICTION |
| Significant matches with conflict | 2+ positives + supporting field CONTRADICTION (not title/id) |
| Title contradiction with identifier | title CONTRADICTION + identifier MATCH + 2+ positive signals |
| Multiple contains | 2+ CONTAINS + ≤2 inconclusives |
UNVERIFIED
| Scenario | Requirement |
|---|---|
| Article not found | title UNCONFIRMED + no identifier MATCH |
| Title contradiction | title CONTRADICTION + no identifier MATCH (likely different article) |
| No identifying match | only supporting fields match (authors/year/venue) |
| Single CONTAINS alone | 1 CONTAINS with mostly inconclusives |
| Partial title alone | title CONTAINS only (no support) |
| Evenly split signals | MATCHes ≤ CONTRADICTIONs |
End of SVRIS Standard.