Why this matters
AI finance fails when systems treat all information as equally reliable. A price from a live feed, a model-generated sentiment score, a backtest result, and a missing data point are not the same kind of input. Yet most financial systems consume them as if they were.
Truth-state labeling is how a system separates observation from assumption, simulation from fact, and uncertainty from action. Without it, both models and operators work in a fog of false confidence.
This taxonomy is an operating standard. It defines what the system must know about every data point before that data point is allowed to influence reasoning, simulation, or governance.
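As a minimal sketch of what "labeling every data point" could look like in code, the Python below defines the seven truth states named in this taxonomy as an enumeration and attaches one to each value. The names TruthState, LabeledDatum, and require_label, and the fields on the record, are illustrative assumptions and not part of the standard itself.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class TruthState(Enum):
    """The seven truth states defined in this taxonomy."""
    LIVE = "live"
    STALE = "stale"
    INFERRED = "inferred"
    SIMULATED = "simulated"
    INCOMPLETE = "incomplete"
    DISPUTED = "disputed"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class LabeledDatum:
    """A value plus the metadata a consumer needs before using it (hypothetical shape)."""
    name: str
    value: Optional[float]  # None when no reliable value exists
    state: TruthState
    source: str
    observed_at: Optional[datetime] = None


def require_label(datum: LabeledDatum) -> LabeledDatum:
    """Reject any data point whose state is not a recognized TruthState."""
    if not isinstance(datum.state, TruthState):
        raise ValueError(f"{datum.name}: missing or unrecognized truth-state label")
    return datum
```

Calling require_label at the boundary of reasoning, simulation, or governance code is one way to make the label a precondition and not an afterthought.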
Live
Definition
Data that is directly observed, validated, and within freshness thresholds.
When it appears
When a primary source confirms a reading in real time or near-real time, and the ingestion pipeline has validated checksums, timestamps, and source authority.
Example
A real-time equity price from an exchange feed with a verified timestamp and no pipeline delay exceeding the defined freshness threshold.
Operator risk
Operators may over-weight recency. Live data can still be wrong if the source itself is corrupted or manipulated.
Allowed system behavior
Use as primary input for reasoning and simulation. Flag for operator review if the signal contradicts other live sources.
Prohibited system behavior
Treat as immutable fact without cross-source validation. Skip freshness checks because the label says live.
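The two checks described for live data, freshness and cross-source agreement, can be sketched as follows. This is an illustration under stated assumptions: the function names, the relative tolerance, and the idea of comparing the spread of readings against their midpoint are choices made for the example, not requirements of the taxonomy.

```python
from datetime import datetime, timedelta
from typing import Sequence


def within_freshness(observed_at: datetime, now: datetime, threshold: timedelta) -> bool:
    """True if the observation is no older than the threshold and not from the future."""
    age = now - observed_at
    return timedelta(0) <= age <= threshold


def contradicts(values: Sequence[float], rel_tolerance: float = 0.001) -> bool:
    """True if live readings of the same variable disagree by more than the tolerance.

    The 0.1% default is an arbitrary placeholder; a real threshold depends on the signal.
    """
    lo, hi = min(values), max(values)
    midpoint = (lo + hi) / 2
    if midpoint == 0:
        return hi != lo
    return (hi - lo) / abs(midpoint) > rel_tolerance
```

A reading that fails within_freshness would no longer qualify as live, and a set of readings for which contradicts returns True would be flagged for operator review.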
Stale
Definition
Data that was previously live but has exceeded its freshness threshold without an updated confirmation.
When it appears
When the last validated observation is older than the acceptable window for the given signal type, and no new validated input has arrived.
Example
An earnings estimate published 72 hours ago, when the firm's policy requires a refresh within 24 hours of any material guidance change.
Operator risk
Operators may not notice the decay. Stale data can silently poison models that assume recency.
Allowed system behavior
Use with explicit staleness flags. Down-weight in scoring. Trigger re-ingestion workflows. Notify the operator.
Prohibited system behavior
Feed stale data into real-time allocation decisions without downgrade or explicit override. Hide the staleness timestamp.
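One way to down-weight stale data while keeping its age visible is sketched below. The exponential decay and the half_life parameter are assumptions made for illustration; the taxonomy only requires that stale data be flagged and down-weighted, not that any particular curve be used.

```python
from datetime import timedelta


def staleness_weight(age: timedelta, threshold: timedelta, half_life: timedelta) -> dict:
    """Return a scoring weight together with an explicit staleness flag and age.

    Within the threshold the weight is 1.0. Beyond it, the weight halves for
    every half_life of overage (an illustrative decay rule).
    """
    age_hours = age.total_seconds() / 3600
    overage = age - threshold
    if overage <= timedelta(0):
        return {"stale": False, "weight": 1.0, "age_hours": age_hours}
    weight = 0.5 ** (overage / half_life)  # timedelta / timedelta yields a float
    return {"stale": True, "weight": weight, "age_hours": age_hours}
```

For a 72-hour-old estimate under a 24-hour threshold and a 24-hour half-life, the overage is 48 hours, so the weight is 0.5 squared, or 0.25, and the age stays attached to the result instead of being hidden.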
Inferred
Definition
Data produced by models, patterns, or estimations rather than direct observation.
When it appears
When a signal is generated by statistical inference, machine learning output, pattern matching, or any process that generalizes from observed data to unobserved cases.
Example
A sentiment score derived from natural language processing of news articles. The score is a model estimation, not a market observation.
Operator risk
Inferred data carries model risk, distribution shift, and unknown failure modes. It is often presented with false confidence.
Allowed system behavior
Use as a secondary signal with full model provenance exposed. Require higher simulation thresholds before any downstream action.
Prohibited system behavior
Treat inferred data as equivalent to live observation. Use it as a sole justification for action without operator review.
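The rules specific to inferred data can be expressed as a small check, sketched here. The Signal record, the model_provenance string, and the function name are hypothetical; the check covers only the inferred-data rules above and is not a complete action gate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Signal:
    name: str
    state: str                  # truth-state label, e.g. "live" or "inferred"
    model_provenance: str = ""  # model name and version; expected for inferred signals


def inferred_rules_pass(signals: Sequence[Signal], operator_reviewed: bool) -> bool:
    """Check the inferred-data rules for a set of signals supporting one action."""
    if not signals:
        return False
    for s in signals:
        # Inferred signals must expose where they came from.
        if s.state == "inferred" and not s.model_provenance:
            return False
    # Inferred data alone cannot justify action without operator review.
    only_inferred = all(s.state == "inferred" for s in signals)
    return operator_reviewed or not only_inferred
```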
Simulated
Definition
Data produced by backtests, synthetic scenarios, counterfactual modeling, or any artificial environment.
When it appears
When the signal is generated inside a modeled environment rather than observed in live markets.
Example
A stress-test result showing portfolio drawdown under a 2008-style liquidity crisis. The number is produced by a model, not observed.
Operator risk
Simulated data is only as good as its assumptions. Operators may confuse plausible scenarios with predicted futures.
Allowed system behavior
Use for pre-decision rehearsal, risk budgeting, and governance review. Clearly separate from live or inferred signals.
Prohibited system behavior
Present simulated outcomes as forecasts. Blend simulated and live signals without clear labeling. Act on simulation as if it were observation.
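To keep simulated and live values from being blended silently, aggregation can be done per truth state and never across states. The sketch below assumes each record is a (state, value) pair; the function name and record shape are illustrative.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarize_by_state(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average values within each truth state, keeping the states separate."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for state, value in records:
        grouped[state].append(value)
    # One figure per label; no single number mixes simulated and live inputs.
    return {state: mean(values) for state, values in grouped.items()}
```

A report built this way shows a simulated drawdown next to a live one, each under its own label, instead of a single blended number.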
Incomplete
Definition
Data that is partial, with known missing dimensions, sources, or contexts.
When it appears
When a required field, source, or contextual variable is absent, but the remaining data is still transmitted into the system.
Example
A corporate filing where the risk factors section is available but the cash flow statement is delayed due to a filing extension.
Operator risk
Operators may fill gaps with assumptions. Incomplete data creates invisible blind spots that look like confidence.
Allowed system behavior
Use with explicit missing-data flags. Trigger source recovery workflows. Require operator acknowledgment before action.
Prohibited system behavior
Impute missing values without logging the imputation. Proceed to action as if the dataset were complete.
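The requirement that imputation never happen without a log entry can be sketched as below. The field names, the defaults mapping, and the shape of the audit entries are assumptions made for the example.

```python
from typing import Any, Dict, List, Sequence


def missing_fields(record: Dict[str, Any], required: Sequence[str]) -> List[str]:
    """List required fields that are absent or None, for explicit missing-data flags."""
    return [f for f in required if record.get(f) is None]


def impute_with_log(
    record: Dict[str, Any],
    required: Sequence[str],
    defaults: Dict[str, Any],
    audit_log: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Fill missing fields from defaults, appending one audit entry per imputation."""
    filled = dict(record)  # leave the original record untouched
    for field in missing_fields(record, required):
        if field in defaults:
            filled[field] = defaults[field]
            audit_log.append({"field": field, "imputed_value": defaults[field]})
    return filled
```

Fields with no default stay missing, so missing_fields on the result still reports them and the dataset is not mistaken for complete.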
Disputed
Definition
Data for which multiple sources conflict, so confidence stays low until the conflict is resolved.
When it appears
When two or more authoritative sources provide materially different values for the same variable at the same time.
Example
Two reputable macro data providers report conflicting unemployment figures for the same release period.
Operator risk
Operators may pick the source that confirms their bias. Disputed data creates decision paralysis or false consensus.
Allowed system behavior
Surface the conflict to the operator. Freeze downstream action until resolution or explicit override. Log both sources.
Prohibited system behavior
Arbitrarily select one source without logging the conflict. Proceed to action while the dispute is unresolved.
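A dispute check along these lines might look like the following sketch. The absolute tolerance, the source names, and the returned dictionary are illustrative assumptions; what counts as a material difference has to be defined per variable.

```python
from typing import Dict


def check_dispute(
    readings: Dict[str, float],
    abs_tolerance: float,
    operator_override: bool = False,
) -> dict:
    """Compare readings of one variable from several sources.

    Marks the value as disputed when the spread exceeds the tolerance, keeps
    every source in the result for logging, and freezes downstream action
    unless the operator has explicitly overridden.
    """
    spread = max(readings.values()) - min(readings.values())
    disputed = spread > abs_tolerance
    return {
        "disputed": disputed,
        "action_frozen": disputed and not operator_override,
        "sources": dict(readings),  # all sources retained, none silently dropped
        "spread": spread,
    }
```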
Unknown
Definition
No reliable signal exists. The system must flag absence and operate around it.
When it appears
When a required variable has no valid source, no model can estimate it with acceptable confidence, and no simulation covers the gap.
Example
The true counterparty exposure in an opaque derivatives market where no disclosure is available and no model has been validated.
Operator risk
The most dangerous state is an unacknowledged unknown. Systems often hallucinate data rather than admit ignorance.
Allowed system behavior
Explicitly flag the absence. Reduce position size or halt action. Require the operator to acknowledge the information gap.
Prohibited system behavior
Fabricate a placeholder value. Proceed as if the absence of signal were neutral information. Hide the unknown state from the operator.
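The sketch below represents an unknown value as an explicit absence (None) instead of a placeholder number, and maps it to the allowed behaviors. The mapping of "not acknowledged" to a halt and "acknowledged" to a reduced size, and the 0.5 multiplier, are assumptions chosen for illustration.

```python
from typing import Optional


def plan_for_exposure(
    exposure: Optional[float],
    operator_acknowledged: bool,
    reduced_multiplier: float = 0.5,  # hypothetical sizing rule
) -> dict:
    """Decide how to operate around a possibly unknown exposure figure."""
    if exposure is not None:
        # A value exists; other truth-state checks still apply downstream.
        return {"state": "known", "action": "continue_checks", "size_multiplier": 1.0}
    if not operator_acknowledged:
        # Absence is flagged and action stops until the gap is acknowledged.
        return {"state": "unknown", "action": "halt", "size_multiplier": 0.0}
    return {"state": "unknown", "action": "reduce", "size_multiplier": reduced_multiplier}
```

The absence is never replaced by zero or any other stand-in value, so downstream code cannot mistake it for neutral information.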
Governance rule
No financial AI system should move from reasoning to action without truth-state labels, uncertainty exposure, auditability, and operator review.
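The governance rule can be read as a gate with four preconditions. A minimal sketch, assuming each precondition is tracked as a boolean on a hypothetical ActionRequest record:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ActionRequest:
    truth_states_labeled: bool
    uncertainty_exposed: bool
    audit_record_written: bool
    operator_reviewed: bool


def unmet_requirements(req: ActionRequest) -> List[str]:
    """Name every governance precondition the request has not satisfied."""
    checks = {
        "truth-state labels": req.truth_states_labeled,
        "uncertainty exposure": req.uncertainty_exposed,
        "auditability": req.audit_record_written,
        "operator review": req.operator_reviewed,
    }
    return [name for name, ok in checks.items() if not ok]


def may_act(req: ActionRequest) -> bool:
    """Allow the move from reasoning to action only when nothing is unmet."""
    return not unmet_requirements(req)
```

Returning the list of unmet requirements, and not only a yes or no, gives the operator something concrete to resolve.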
Disclaimer
This framework is for research, educational, and product-development purposes only. It is not investment advice, not a recommendation to buy or sell any financial instrument, and not an offer to manage capital.
Veldarium Capital is a research and software initiative of Veldarium Technology Systems LLC. It does not manage outside capital, provide personalized investment advice, or offer trade recommendations.