Most developers who try algorithmic trading don't fail at the code. They fail because they skip phases: they jump from a market observation to a live position without the steps in between, and they lose money on mistakes they could have caught for free. The workflow below exists to prevent that. Each phase has a specific output and a specific failure mode. Skipping a phase does not save time; it moves the cost downstream, where it is more expensive.

Phase 1 — Form a Testable Hypothesis

A trading idea is not an edge. "Tech stocks go up after earnings beats" is an observation. An edge is a specific, testable claim: "S&P 500 components that beat EPS estimates by more than 5% close higher than open on earnings day, measured over 2019–2024, with a win rate above 58%." The hypothesis must be falsifiable — if the backtest disproves it, the idea is discarded, not adjusted until it fits.

Forming a specific hypothesis before touching the data is what separates systematic traders from data miners. The common failure mode here is overfitting. Retrofitting parameters to make a hypothesis pass is not hypothesis testing; it is curve-fitting the strategy to noise in historical data. Once you start adjusting the hypothesis to match the backtest results, the backtest no longer tells you anything about the future.

Write down the hypothesis before opening a data file. If you cannot state it in one sentence with a specific, measurable claim, it is not ready. This constraint feels artificial until you try it — then it becomes obvious how many "ideas" dissolve on contact with the requirement to be specific.
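One way to enforce the "written down first" rule is to record the hypothesis as an immutable object before any data is loaded. The sketch below is illustrative only: the class name, the field names, and the exact window dates are assumptions, filled in from the example claim in this section.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: the claim cannot be edited after the fact
class Hypothesis:
    """A falsifiable claim, fixed before any data file is opened."""
    universe: str        # which instruments the claim covers
    condition: str       # the event that triggers a trade
    outcome: str         # what is predicted to happen
    start: date          # first day of the test window
    end: date            # last day of the test window
    min_win_rate: float  # threshold the backtest must exceed

    def is_supported(self, wins: int, trades: int) -> bool:
        """True only if the observed win rate exceeds the stated threshold."""
        if trades == 0:
            return False
        return wins / trades > self.min_win_rate


# The example claim from this section; the calendar dates are assumed.
earnings_beat = Hypothesis(
    universe="S&P 500 components",
    condition="EPS beat of more than 5% versus estimates",
    outcome="close higher than open on earnings day",
    start=date(2019, 1, 1),
    end=date(2024, 12, 31),
    min_win_rate=0.58,
)
```

Because the dataclass is frozen, lowering min_win_rate after seeing results requires creating a new object, which makes the adjustment visible instead of silent.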

Phase 2 — Source and Validate Historical Data

Data quality determines backtest quality. Bad data — survivorship bias, incorrectly applied price adjustments, gaps treated as zeros — produces misleading backtests that pass in testing and fail in live trading.

Where to get data: Alpaca and Polygon.io for reliable OHLCV data with good historical coverage; Yahoo Finance for fast prototyping and exploration, but not for production backtesting where data quality matters. The distinction between adjusted prices and unadjusted prices is critical: use adjusted prices for returns calculations because they account for splits and dividends correctly; use unadjusted prices when absolute price level matters to the logic — screening for stocks above a price threshold, for example.
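A small worked example shows why the distinction matters. The sketch below back-adjusts a price series for a single 2-for-1 split and compares the returns computed from each series. The prices are made up, the function names are hypothetical, and dividends are ignored to keep the example short; real data vendors also apply dividend adjustments.

```python
def back_adjust_for_splits(closes, splits):
    """Return split-adjusted closes.

    closes: raw closing prices, oldest first.
    splits: {index: ratio}, where index is the first bar that trades
            at the post-split price (ratio 2.0 means a 2-for-1 split).
    """
    adjusted = list(closes)
    factor = 1.0
    # Walk backwards so every bar before a split is divided by its ratio.
    for i in range(len(closes) - 1, -1, -1):
        adjusted[i] = closes[i] / factor
        if i in splits:
            factor *= splits[i]
    return adjusted


def simple_returns(prices):
    """Bar-over-bar simple returns."""
    return [b / a - 1 for a, b in zip(prices, prices[1:])]


raw = [100.0, 102.0, 51.0, 52.0]  # 2-for-1 split takes effect at index 2
adjusted = back_adjust_for_splits(raw, {2: 2.0})

raw_returns = simple_returns(raw)            # contains a false -50% "crash"
adjusted_returns = simple_returns(adjusted)  # split day shows a flat return
```

The raw series is still the right input for a rule such as "only trade stocks priced above 50", because that rule depends on the price that was actually quoted on the day.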

Survivorship bias is the most common data error among developers new to backtesting. If your dataset only contains tickers that survived to today, your backtest excludes every company that went bankrupt or was delisted during your test window. Those missing companies are disproportionately the ones that lost money, so the results are biased upward, often by a significant margin. Use a point-in-time dataset that includes delisted stocks when testing strategies on broad universes.
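To make "point-in-time" concrete, the sketch below builds the tradable universe as of a given date from listing and delisting dates. The tickers and dates are invented, and the Listing type is a hypothetical stand-in for whatever your data vendor provides.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Listing:
    ticker: str
    listed: date
    delisted: Optional[date]  # None means the ticker still trades today


def universe_on(listings, as_of):
    """Tickers that were actually tradable on the given date."""
    return sorted(
        item.ticker
        for item in listings
        if item.listed <= as_of
        and (item.delisted is None or as_of < item.delisted)
    )


# Invented example data.
listings = [
    Listing("AAA", date(2010, 1, 4), None),
    Listing("BBB", date(2012, 3, 1), date(2021, 5, 14)),  # later delisted
    Listing("CCC", date(2015, 6, 1), None),
]

point_in_time = universe_on(listings, date(2020, 6, 1))
survivors_only = sorted(item.ticker for item in listings if item.delisted is None)
```

A backtest run over survivors_only never sees BBB, even though a strategy trading in 2020 could have held it.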

Validate the data before running any strategy on it. Check for missing trading days. Verify that price moves across the dataset are within plausible ranges. Confirm that volume figures are non-zero on days the market was open. These checks take an hour and catch errors that would otherwise corrupt weeks of backtest work.
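The three checks can be sketched in a few lines. This is a minimal version under stated assumptions: bars are a dict of date to (close, volume), the list of expected trading days comes from an exchange calendar you supply, and the 50% move threshold is an arbitrary placeholder to tune per asset class.

```python
from datetime import date


def validate_bars(bars, trading_days, max_abs_move=0.5):
    """Return a list of (day, problem) tuples; an empty list means no issues found.

    bars: {date: (close, volume)}
    trading_days: every date the market was open in the test window
    max_abs_move: largest bar-over-bar move treated as plausible (assumed value)
    """
    issues = []
    # Check 1: every expected trading day has a bar.
    for day in sorted(set(trading_days) - set(bars)):
        issues.append((day, "missing bar"))

    prev_close = None
    for day in sorted(bars):
        close, volume = bars[day]
        # Check 2: volume is non-zero on days the market was open.
        if volume <= 0:
            issues.append((day, "zero volume"))
        if close <= 0:
            issues.append((day, "non-positive close"))
            continue
        # Check 3: price move versus the previous valid close is plausible.
        if prev_close is not None and abs(close / prev_close - 1) > max_abs_move:
            issues.append((day, "implausible move"))
        prev_close = close
    return issues


d1, d2, d3, d4 = (date(2024, 1, n) for n in (2, 3, 4, 5))
sample = {d1: (100.0, 1000), d2: (101.0, 0), d4: (10.0, 1200)}
problems = validate_bars(sample, [d1, d2, d3, d4])
```

An implausible move is a flag for review, not proof of bad data; an unadjusted split and a genuine crash look the same to this check.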

Phase 3 — Backtest the Hypothesis

The goal of backtesting is not to find a strategy that works. It is to test whether the specific hypothesis holds on historical data. The backtest either validates or falsifies the claim stated in Phase 1.
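For a win-rate claim like the one in Phase 1, one simple way to ask whether the result could be luck is an exact binomial tail probability. The sketch below is an illustration, not a complete statistical treatment: it assumes trades are independent, and the baseline rate p0 is a value you must choose and justify yourself.

```python
from math import comb


def binomial_tail(wins, trades, p0):
    """P(X >= wins) when X ~ Binomial(trades, p0).

    A small value means the observed win count would be unlikely
    if the true win rate were only p0.
    """
    return sum(
        comb(trades, k) * p0**k * (1 - p0) ** (trades - k)
        for k in range(wins, trades + 1)
    )


# Hypothetical backtest outcome: 130 winning trades out of 200,
# tested against the 58% threshold stated in the hypothesis.
p_value = binomial_tail(wins=130, trades=200, p0=0.58)
```

The threshold for "small enough" should be fixed alongside the hypothesis in Phase 1, for the same reason the hypothesis itself is fixed in advance.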

If it validates, check for overfitting. Does the result hold on data outside the test window, meaning a hold-out period that played no part in choosing the strategy's parameters, or a walk-forward test that repeats that check across successive windows? Does the mechanism make logical sense? A strategy that produces strong returns only in one narrow subperiod of the test window is not validated; it is coincidental. And if you cannot articulate why the pattern should persist (what market structure or human behavioral bias creates the inefficiency), there is no reason to believe it will.
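The out-of-sample idea can be sketched as a window generator. Each pair below gives a span of bars used to set up the strategy and a later, non-overlapping span used only to evaluate it. The function name and the window sizes are illustrative assumptions.

```python
def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through the data.

    Each test range starts where its train range ends, so the strategy
    is always evaluated on bars it has not seen.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window


windows = list(walk_forward_windows(n_obs=10, train_size=4, test_size=2))
for train, test in windows:
    print(f"fit on bars {train.start}-{train.stop - 1}, "
          f"evaluate on bars {test.start}-{test.stop - 1}")
```

Comparing results across the test windows also exposes the narrow-subperiod problem: if one window carries all the profit, the edge is not consistent.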

If it fails, the hypothesis is discarded. Most hypotheses fail. That is the point — it is far cheaper to discard a hypothesis in backtesting than to discover it has no edge with real capital. This phase should be uncomfortable. The temptation to adjust parameters until the result looks good is the most common way developers corrupt their own process. When you find yourself tweaking thresholds to make the curve look better, you are no longer backtesting — you are fitting.

For deeper coverage of backtest construction and interpretation, see "How to Backtest a Strategy".

Phase 4 — Paper Trade the Validated Strategy

A passing backtest is a validated hypothesis, not a proven strategy. Paper trading tests whether the code executes correctly in real market conditions — correct order logic, correct position tracking, correct handling of edge cases at market open and close.

Run for at least 30 trades before evaluating. Do not adjust parameters during paper trading. The goal is to observe the strategy as specified, not to improve it mid-run. If you see a problem in paper trading and immediately adjust, you have introduced another variable; you no longer know whether the original strategy would have worked.
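Both rules in this paragraph can be enforced mechanically. The sketch below fingerprints the strategy parameters at the start of the paper run and refuses to evaluate if they have changed or if fewer than 30 trades have been recorded. The class, method, and parameter names are hypothetical.

```python
import hashlib
import json


def fingerprint(params):
    """Stable hash of a parameter dict, independent of key order."""
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class PaperRun:
    """Tracks one paper trading period for a fixed parameter set."""

    def __init__(self, params, min_trades=30):
        self._fingerprint = fingerprint(params)  # locked in at the start
        self.min_trades = min_trades
        self.trade_pnls = []

    def record(self, pnl):
        """Store the P&L of one completed paper trade."""
        self.trade_pnls.append(pnl)

    def ready_to_evaluate(self, current_params):
        """True once enough trades exist; raises if parameters were changed."""
        if fingerprint(current_params) != self._fingerprint:
            raise ValueError(
                "parameters changed mid-run; restart the paper trading period"
            )
        return len(self.trade_pnls) >= self.min_trades


params = {"eps_beat_threshold": 0.05, "exit": "close"}  # illustrative values
run = PaperRun(params)
```

A parameter change is still possible, but it forces a new run with a fresh trade count, so the original specification is never evaluated on mixed data.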

Paper trading also tests the operational dimension. Backtesting runs offline on static data. Paper trading runs live, which means it encounters the full range of real-market messiness: orders placed while the market is moving, partial fills, momentary data feed issues, and the experience of watching a strategy take losses without any ability to intervene. That last point matters more than it sounds.

For a detailed comparison of the two environments, see "Paper Trading vs. Live Trading".

Phase 5 — Define Risk Parameters Before Going Live

Before the first live trade, document four things: maximum position size, maximum daily loss (the halt point where the strategy stops executing for the session), maximum drawdown (the kill switch level where the strategy is shut down pending review), and maximum holding period.

These are set once and not changed during a drawdown. Changing risk parameters during a losing streak is rationalization. It is the trader convincing themselves that this particular loss is an anomaly, that the next trade will recover it — and it is almost never an anomaly. The risk parameters exist precisely because judgment degrades under loss. A parameter set in advance, when you are not under the pressure of a losing position, is more trustworthy than a parameter set while watching the account decline.

If the drawdown threshold is reached, the strategy stops. The correct response is review and investigation, not a parameter adjustment to let it run longer.
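The four limits and the two stop conditions can be written down as code before going live. The sketch below is one possible shape, with assumed units: position size and daily loss in account currency, drawdown as a fraction of peak equity. All names and numbers are illustrative.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)  # frozen: limits cannot be edited during a drawdown
class RiskLimits:
    max_position_size: float       # account currency per position
    max_daily_loss: float          # positive number, account currency
    max_drawdown: float            # fraction of peak equity, e.g. 0.10
    max_holding_period: timedelta  # longest time any position may be held


def risk_state(limits, daily_pnl, equity, peak_equity):
    """Return 'KILL', 'HALT', or 'RUN' for the current account state."""
    drawdown = (peak_equity - equity) / peak_equity
    if drawdown >= limits.max_drawdown:
        return "KILL"  # shut down pending review
    if daily_pnl <= -limits.max_daily_loss:
        return "HALT"  # stop executing for the rest of the session
    return "RUN"


limits = RiskLimits(
    max_position_size=5_000.0,
    max_daily_loss=500.0,
    max_drawdown=0.10,
    max_holding_period=timedelta(days=5),
)
state = risk_state(limits, daily_pnl=-120.0, equity=98_000.0, peak_equity=100_000.0)
```

The kill switch is checked first so that a session which breaches both limits ends in review, not in a restart the next morning.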

Phase 6 — Deploy and Monitor

Deployment is not the end of the process. It is the beginning of the monitoring phase. Before the strategy executes a single live trade, configure four alerts: per-trade P&L outside the expected range; execution quality below threshold (slippage significantly worse than in paper trading); a position state anomaly, where the strategy holds something it should not; and a process health check confirming the strategy is actually running.

What to watch after deployment: per-trade P&L versus paper performance, execution quality against expectations from the backtest and paper trading periods, position state at all times, and process uptime. These are not periodic checks — they are automated alerts. A strategy that runs unmonitored will eventually have a problem that no one notices until it is expensive. The monitoring layer is not optional.
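The four alert conditions reduce to four comparisons. The sketch below assumes the expected ranges come from the backtest and paper trading periods, that slippage is measured in basis points, and that the strategy process writes a heartbeat timestamp somewhere the monitor can read. Every name and threshold here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expectations:
    pnl_low: float              # lowest per-trade P&L considered normal
    pnl_high: float             # highest per-trade P&L considered normal
    max_slippage_bps: float     # slippage tolerance in basis points
    max_heartbeat_age_s: float  # seconds of silence before the process is presumed dead


def check_alerts(exp, trade_pnl, slippage_bps, held, intended, heartbeat_age_s):
    """Return the name of every alert condition that is currently firing.

    held / intended: {ticker: quantity} as reported by the broker
    and as computed by the strategy.
    """
    alerts = []
    if not exp.pnl_low <= trade_pnl <= exp.pnl_high:
        alerts.append("pnl_out_of_range")
    if slippage_bps > exp.max_slippage_bps:
        alerts.append("execution_quality")
    if held != intended:
        alerts.append("position_mismatch")
    if heartbeat_age_s > exp.max_heartbeat_age_s:
        alerts.append("process_unhealthy")
    return alerts


exp = Expectations(pnl_low=-200.0, pnl_high=300.0,
                   max_slippage_bps=5.0, max_heartbeat_age_s=60.0)
firing = check_alerts(exp, trade_pnl=-450.0, slippage_bps=12.0,
                      held={"AAA": 10, "BBB": 5}, intended={"AAA": 10},
                      heartbeat_age_s=300.0)
```

The heartbeat check only works if it runs in a separate process from the strategy; a crashed strategy cannot report its own death.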

The question to ask before deploying: if this strategy behaved unexpectedly at 3 AM on a day you were not watching, how long would it take you to notice? If the answer is "I don't know," the monitoring is not ready.

The Common Thread

Each phase has a clear output and a clear failure mode. Phase 1 produces a falsifiable hypothesis or produces nothing useful. Phase 2 produces a dataset checked for survivorship bias, adjustment errors, and gaps. Phase 3 validates or falsifies the hypothesis; it does not improve it by adjustment. Phase 4 confirms execution or reveals bugs. Phase 5 documents risk bounds that will not be changed under pressure. Phase 6 establishes monitoring before capital is at risk.

The failure mode at every phase is the same: skipping the disconfirmation step. Skipping the out-of-sample test. Skipping paper trading. Skipping risk parameter documentation before going live. The traders who skip phases are the ones who lose capital and conclude "algo trading doesn't work." The workflow is the structure that prevents that conclusion from being necessary.

The Oyamori Approach

Oyamori's platform is built around this workflow — not as a philosophical endorsement but as a structural reality. The execution layer handles deployment. The monitoring dashboard handles observation. The edge catalog provides validated hypotheses that have already passed the backtest and out-of-sample stages.

The trader's job is to configure the parameters at each phase — position sizing, halt thresholds, kill switch levels — not to build the infrastructure that executes and monitors the strategy from scratch. That infrastructure problem has already been solved.

The workflow above still applies entirely. Oyamori removes the cost of building each phase's tooling. It does not remove the discipline each phase requires, and it does not make the decisions that belong to the trader.

