A backtest that shows positive returns is not evidence that a strategy works. It's evidence that someone found a set of rules that would have worked on a specific dataset in a specific time period. Those two things are not the same, and conflating them is a common way for retail systematic traders to end up deploying strategies that fail almost immediately on live capital.

A verifiable trading strategy is one that demonstrates its edge under conditions it was not optimized to fit. This is a higher bar than most strategy presentations meet, and a necessary one before committing real capital.

The Backtest Is Not the Proof

A backtest answers the question: "Would these rules have made money on this historical data?" That's a useful question, but it's not the question you actually need answered, which is: "Will these rules make money on future data?"

Historical performance can be manufactured. Given enough free parameters, an optimizer will almost always find a set of rules that produces a positive equity curve on a fixed historical dataset, even when the data contains no real edge. This is not strategy discovery; it is curve fitting. The parameters describe the past; they do not predict the future. When the strategy meets live market data that doesn't look like the training period, the curve-fitted parameters have no reason to keep working and can become actively harmful.
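
To make the curve-fitting point concrete, here is a minimal sketch that grid-searches a toy rule over returns generated as pure noise, so there is no edge to find by construction. The rule, the parameter grid, the seed, and the 1400/600 split are all illustrative assumptions, not a method taken from any real strategy.

```python
import random

def make_noise_returns(n, seed):
    """Daily returns with zero true edge (independent Gaussian noise)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n)]

def strategy_returns(returns, lookback, threshold):
    """Hypothetical rule: hold the asset on day t if the sum of the
    previous `lookback` returns exceeds `threshold`, otherwise stay flat."""
    out = []
    for t in range(lookback, len(returns)):
        signal = sum(returns[t - lookback:t]) > threshold
        out.append(returns[t] if signal else 0.0)
    return out

def best_in_sample(returns, lookbacks, thresholds):
    """Grid search: keep the parameters with the highest total return."""
    best = None
    for lb in lookbacks:
        for th in thresholds:
            total = sum(strategy_returns(returns, lb, th))
            if best is None or total > best[0]:
                best = (total, lb, th)
    return best

returns = make_noise_returns(2000, seed=7)
train, test = returns[:1400], returns[1400:]
lookbacks = range(2, 21)
thresholds = [x / 1000 for x in range(-10, 11, 2)]

# Optimize on the training slice only, then apply the winner unchanged.
in_total, lb, th = best_in_sample(train, lookbacks, thresholds)
out_total = sum(strategy_returns(test, lb, th))
print(f"best params: lookback={lb}, threshold={th}")
print(f"in-sample total return:     {in_total:+.3f}")
print(f"out-of-sample total return: {out_total:+.3f}")
```

Because the search keeps the best of a couple of hundred tries, it will typically report an attractive in-sample total even on noise. The out-of-sample figure has no reason to match it, which is exactly the gap a backtest alone cannot reveal.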

Three things distinguish a fitted result from a verifiable one.

Out-of-sample testing. The strategy was developed and optimized on one dataset and then tested — without modification — on a completely separate dataset that was never used during development. If the strategy shows positive expectation on the out-of-sample data, that's meaningful evidence. If it only works on the training data, it's an artifact.

Logical mechanism. The edge has a defensible explanation for why it should work. Not "the RSI crossing 30 produced positive returns from 2015 to 2022" — but "this strategy exploits the behavioral tendency of retail investors to sell in panic at opening gaps, creating a short-term mean-reversion opportunity that reverses within two to three sessions." See what makes a trading edge real for the full framework. A strategy with no mechanism that just "looked good on the backtest" has no reason to continue working when market conditions shift.

Cross-regime consistency. Markets go through meaningfully different regimes: trending, ranging, high-volatility, low-volatility, risk-on, risk-off. A strategy that only works in one regime is not a systematic edge — it's a regime bet. A verifiable strategy shows positive expectation across multiple distinct market regimes, not just the one that dominated the backtest period.
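
One way to check this in practice is to split the trade log by regime and look at the average return in each bucket. The sketch below assumes every trade already carries a regime label; how those labels are assigned (volatility bands, trend filters, and so on) is left open, and the trade data is made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def expectation_by_regime(trade_returns, regime_labels):
    """Group per-trade returns by regime label and average each group."""
    groups = defaultdict(list)
    for r, label in zip(trade_returns, regime_labels):
        groups[label].append(r)
    return {label: mean(rs) for label, rs in groups.items()}

def is_regime_bet(by_regime):
    """True if positive expectation appears in at most one regime."""
    positive = [label for label, m in by_regime.items() if m > 0]
    return len(positive) <= 1

# Made-up trade log, purely illustrative.
trade_returns = [0.02, 0.01, -0.01, 0.03, -0.02, -0.01, -0.015, 0.005]
regime_labels = ["trending"] * 4 + ["ranging"] * 4

by_regime = expectation_by_regime(trade_returns, regime_labels)
for label, avg in by_regime.items():
    print(f"{label}: mean return per trade {avg:+.4f}")
print("regime bet:", is_regime_bet(by_regime))
```

In this toy log the strategy earns money only in the trending bucket, so it is flagged as a regime bet. A real check would also need enough trades per regime for the averages to mean anything.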

What Most Strategy Presentations Show You

When a trading platform, newsletter, or signal service presents a strategy with a performance chart, it typically reports:

  • Total return over the backtest period
  • Maximum drawdown
  • Sharpe ratio
  • Win rate

These numbers, without context, prove almost nothing.

Total return on a backtest is computed on data the strategy was built to fit. Maximum drawdown is the worst historical drawdown — it does not bound future drawdown, which can exceed it. Sharpe ratio computed in-sample is optimistically biased. Win rate is not a measure of edge quality — a strategy can have a 30% win rate and positive expected value if the winners are large enough relative to the losers.
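
The win-rate point is simple arithmetic: expected value per trade is the win rate times the average win, minus the loss rate times the average loss. The short sketch below works this through; the function names and the example payoffs (expressed in units of one average loss) are illustrative assumptions.

```python
def expected_value(win_rate, avg_win, avg_loss):
    """Expected profit per trade; avg_loss is given as a positive size."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss

def breakeven_payoff_ratio(win_rate):
    """Smallest avg_win / avg_loss ratio at which expectation is zero."""
    return (1.0 - win_rate) / win_rate

# 30% winners that are 3x the size of the losers: positive expectation.
low_win_rate = expected_value(0.30, avg_win=3.0, avg_loss=1.0)
# 70% winners that are one third the size of the losers: negative.
high_win_rate = expected_value(0.70, avg_win=1.0, avg_loss=3.0)

print(f"30% win rate, 3:1 payoff: {low_win_rate:+.2f} per trade")
print(f"70% win rate, 1:3 payoff: {high_win_rate:+.2f} per trade")
print(f"breakeven payoff ratio at 30%: {breakeven_payoff_ratio(0.30):.2f}")
```

At a 30% win rate the winners need to average a little over 2.3 times the losers just to break even before costs, which is why win rate on its own says nothing about edge.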

What's almost never shown: the out-of-sample period results. The train/test split. The walk-forward analysis across different time windows. The performance in specific market regimes. The number of parameters optimized (more parameters = higher overfitting risk at constant dataset length). The transaction cost assumptions.

The absence of these details is not always intentional deception. Many strategy builders genuinely don't perform out-of-sample testing. They optimize, see a good equity curve, and mistake the fitting process for discovery. The result is a strategy that has never actually been tested — only optimized.

The Three Verifiability Tests

Before deploying any strategy with real capital, it should pass three tests.

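Test 1: Walk-forward out-of-sample validation. Start with a simple holdout: optimize the strategy parameters on the first 70% of the historical data only, then run the strategy unchanged on the remaining 30%, which the optimizer never saw. Performance on that test set is the out-of-sample result. A walk-forward analysis repeats the exercise across successive time windows, re-optimizing on each training window and testing on the period that immediately follows it, so the conclusion does not hinge on one particular split. If performance holds up, you have meaningful (not conclusive) evidence of edge. If it degrades significantly, the training performance was most likely curve fitting.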
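
A single 70/30 split yields one out-of-sample reading. A rolling walk-forward scheme produces several, each tested on data that comes strictly after its training window. The sketch below only generates the index windows; the window sizes are arbitrary assumptions, and the optimization and evaluation steps are left to whatever backtesting code is in use.

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_range, test_range) index pairs.

    Each test block immediately follows its training block, and the
    whole window rolls forward by test_size on every step."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

windows = list(walk_forward_windows(n_obs=1000, train_size=500, test_size=100))
for train, test in windows:
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
```

With 1000 observations this gives five windows whose test blocks cover indices 500 through 999 without overlapping. Parameters would be re-optimized on each training range and scored only on the matching test range.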

Test 2: Logical mechanism review. Can you explain in plain language why this strategy should work, independent of the backtest results? The explanation should describe a market inefficiency — behavioral, structural, or informational — that the strategy exploits. If the answer is "the parameters worked historically," that's not a mechanism. That's a description of the fitting process.

Test 3: Paper trading period. Before live deployment, run the strategy in paper trading (simulated execution using real-time market data) for at least one month, and ideally up to three. This tests the execution infrastructure, confirms the signal logic behaves as expected in live market conditions, and begins building a live track record before any real capital is at risk. See paper trading vs. live trading for the full protocol.

None of these tests guarantee future performance. They reduce the probability of deploying a strategy that only appeared to work because it was optimized to fit a specific historical dataset.

What Verified Performance Actually Looks Like

A verified strategy shows:

  • Positive out-of-sample expectation — the test set shows a Sharpe ratio and return that are meaningfully positive, even if lower than the training set (some degradation is expected and acceptable)
  • Consistent direction across regimes — the strategy made money in both the bull and ranging periods in the test set; it wasn't only profitable in one market condition
  • Live paper trading results that track the backtest — execution behavior in live conditions matches what the backtest predicted within a reasonable margin
  • A defensible edge mechanism — someone reading the strategy description can understand why it works, not just that it worked historically
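
The first criterion, positive but somewhat degraded out-of-sample performance, can be quantified by comparing Sharpe ratios across the two sets. The sketch below assumes daily returns, a zero risk-free rate, and 252 trading periods per year; the return series are made up, and no particular degradation threshold is implied.

```python
from math import sqrt
from statistics import mean, stdev

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over sample standard deviation, scaled to a yearly figure.
    Assumes a zero risk-free rate."""
    return mean(daily_returns) / stdev(daily_returns) * sqrt(periods_per_year)

def sharpe_degradation(in_sample, out_of_sample):
    """Fraction of the in-sample Sharpe lost out of sample.
    Only meaningful when the in-sample Sharpe is positive."""
    s_in = annualized_sharpe(in_sample)
    s_out = annualized_sharpe(out_of_sample)
    return 1.0 - s_out / s_in

# Made-up daily returns, purely illustrative.
in_sample = [0.004, -0.002, 0.006, 0.001, -0.001, 0.005]
out_of_sample = [0.002, -0.003, 0.004, 0.000, -0.002, 0.003]

s_in = annualized_sharpe(in_sample)
s_out = annualized_sharpe(out_of_sample)
lost = sharpe_degradation(in_sample, out_of_sample)
print(f"in-sample Sharpe:     {s_in:.2f}")
print(f"out-of-sample Sharpe: {s_out:.2f}")
print(f"share of Sharpe lost: {lost:.0%}")
```

Here the out-of-sample Sharpe stays positive but is well below the in-sample figure. How much degradation is acceptable is a judgment call, and a six-day sample like this one is far too short to support any real conclusion.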

The verification process is time-consuming, and many retail traders skip it. That is a large part of why so many retail systematic strategies fail shortly after live deployment: not because the underlying idea was necessarily wrong, but because the strategy was never actually tested, only optimized.

The Oyamori Approach

Every strategy in Oyamori's catalog must document its edge mechanism before it is listed. Backtest performance is shown with out-of-sample results, not in-sample results alone. Walk-forward validation is part of the publishing standard.

This is the foundational reason the catalog exists: to remove the verification burden from investors who want to run systematic strategies but don't have the research infrastructure to validate them independently. The investor choosing a strategy from the catalog is not skipping verification — they're accessing verification that was already performed.

The trading strategy marketplace model only works if the strategies in it are verifiably sound. An investor running an unverified strategy on live capital is not running a systematic strategy — they're running a sophisticated-looking guess. Oyamori's catalog is designed so that every entry has cleared the bar that most strategies presented online never reach.

