The Creative Testing Framework We Run Every Week
How we structure tests, set kill thresholds, and systematically turn winners into new variations. No gut feelings -- just data-driven creative decisions.
Most creative testing is chaos disguised as process. Teams throw ads at the wall, wait for "enough data," argue about what the results mean, and eventually pick winners based on whoever has the loudest opinion.
We've run thousands of creative tests. Along the way, we built a framework that removes the guesswork -- clear rules for when to kill, when to scale, and when to iterate. Here's exactly how it works.
The Testing Mindset Shift
Before diving into the framework, you need to internalize one concept: testing is about learning velocity, not finding winners.
Most teams optimize for "finding the winning ad." This leads to running fewer, bigger tests and waiting too long to make decisions. The result? Slow learning, stale creative, and missed opportunities.
We optimize for learning velocity. The faster we can run tests, gather data, and extract insights, the faster we improve. Winners emerge as a byproduct of rapid iteration -- not from trying to find them directly.
The Math: If your win rate is 15% (typical for good creative teams), running 20 tests finds ~3 winners. Running 50 tests finds ~7-8. Volume beats precision every time.
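To make that arithmetic concrete, here's the same math as a quick sketch (the 15% win rate is the illustrative figure above, not a guarantee):

```python
# Expected winners = number of tests x win rate,
# treating each test as an independent shot at the same rate.
win_rate = 0.15  # illustrative figure from above

for tests in (20, 50):
    print(f"{tests} tests -> ~{tests * win_rate:.1f} expected winners")

# 20 tests -> ~3.0 expected winners
# 50 tests -> ~7.5 expected winners
```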
The Weekly Testing Cadence
Everything runs on a weekly cycle. This creates urgency, forces decisions, and prevents tests from lingering indefinitely.
Monday: Review and Plan
- Review all active tests from last week
- Categorize results: Winner, Loser, Inconclusive
- Extract learnings from each test
- Plan this week's test slate
Tuesday-Wednesday: Brief and Produce
- Write briefs for new tests
- Production team executes
- QA and finalize assets
Thursday: Launch
- New creative goes live
- Proper UTMs and naming conventions
- Budget allocation per test structure
Friday-Sunday: Gather Data
- Tests run and accumulate data
- No premature decisions
- Weekend traffic patterns included
The Test Structure
Every test follows the same structure, which keeps the data clean and the learnings clear (a minimal code sketch follows the list below):
Anatomy of a Creative Test
- Hypothesis: What we're testing and why we think it might work
- Variable: The ONE thing that's different (hook, format, angle, CTA, etc.)
- Control: What we're testing against (existing winner or baseline)
- Success Metric: Primary KPI we're optimizing for
- Kill Threshold: At what point we declare it a loser
- Scale Threshold: At what point we declare it a winner
- Minimum Spend: Budget required before making any decision
The golden rule: Test ONE variable at a time. If you change the hook AND the CTA AND the format, you won't know what caused the result. Disciplined isolation of variables is what makes learnings compound.
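One way to enforce that discipline is to write every test as a structured spec before anything ships. This is a minimal sketch -- the field names and example values are hypothetical, not our internal tooling:

```python
from dataclasses import dataclass

@dataclass
class CreativeTest:
    """One record per test; each field maps to the anatomy above."""
    hypothesis: str        # what we're testing and why it might work
    variable: str          # the ONE thing that differs from control
    control: str           # existing winner or baseline we compare against
    success_metric: str    # primary KPI, e.g. "CTR" or "conversion rate"
    kill_threshold: float  # relative drop vs control that triggers a kill
    scale_threshold: float # relative lift vs control that triggers a scale
    min_spend: float       # budget required before any decision

# Hypothetical example spec:
test = CreativeTest(
    hypothesis="A pain-point hook will out-pull the feature-led control",
    variable="hook",
    control="current best-performing static ad",
    success_metric="CTR",
    kill_threshold=-0.30,
    scale_threshold=0.15,
    min_spend=150.0,
)
```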
Kill and Scale Thresholds
This is where most teams fail. They either kill too early (missing potential winners) or wait too long (wasting budget on losers). We use specific thresholds based on statistical confidence.
The Kill Threshold
A creative is killed when:
- It has spent the minimum test budget (typically $100-200)
- AND it's performing 30%+ worse than control on the primary metric
- AND we have at least 1,000 impressions
If a creative is clearly underperforming at minimum spend, there's no reason to keep feeding it budget. Cut it and redirect spend to tests with potential.
The Scale Threshold
A creative is scaled when:
- It has spent 2x the minimum test budget
- AND it's performing 15%+ better than control on the primary metric
- AND we have at least 2,500 impressions
- AND the result is statistically significant (p < 0.1)
Statistical significance matters. A 20% improvement with 500 impressions means nothing -- it's likely noise. We use simple significance calculators to validate before scaling.
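If you'd rather run that check in code than in a web calculator, a two-proportion z-test covers the common case where the primary metric is a rate like CTR. A minimal sketch under that assumption:

```python
import math

def ctr_p_value(clicks_test, imps_test, clicks_ctrl, imps_ctrl):
    """Two-sided p-value for the gap between two click-through rates
    (two-proportion z-test against a pooled rate)."""
    p_test = clicks_test / imps_test
    p_ctrl = clicks_ctrl / imps_ctrl
    p_pool = (clicks_test + clicks_ctrl) / (imps_test + imps_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / imps_test + 1 / imps_ctrl))
    z = (p_test - p_ctrl) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# A 20% "lift" on a small sample usually fails this check:
print(ctr_p_value(12, 500, 50, 2500))      # ~0.57 -- noise
print(ctr_p_value(120, 5000, 500, 25000))  # ~0.07 -- clears p < 0.1
```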
The Inconclusive Zone
What about tests that don't clearly win or lose? If a creative is within +/-15% of control after 2x minimum spend:
- If it's a new angle: Keep running with 50% more budget. New angles deserve more patience.
- If it's an iteration: Kill it. If an iteration can't beat control clearly, it's not worth the complexity.
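Taken together, the kill, scale, and inconclusive rules reduce to a few lines of decision logic. A schematic sketch -- the function name and inputs are illustrative, but the thresholds mirror the rules above:

```python
def weekly_decision(spend, impressions, metric, control_metric,
                    min_spend, p_value, is_new_angle):
    """Apply the kill / scale / inconclusive rules from the sections above."""
    lift = (metric - control_metric) / control_metric

    # Kill: minimum spend reached, 1,000+ impressions, 30%+ worse than control
    if spend >= min_spend and impressions >= 1000 and lift <= -0.30:
        return "kill"

    # Scale: 2x minimum spend, 2,500+ impressions, 15%+ better, significant
    if (spend >= 2 * min_spend and impressions >= 2500
            and lift >= 0.15 and p_value < 0.1):
        return "scale"

    # Inconclusive: within +/-15% of control after 2x minimum spend
    if spend >= 2 * min_spend and abs(lift) < 0.15:
        return "extend (+50% budget)" if is_new_angle else "kill (iteration)"

    return "keep running"
```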
Test Types We Run
Not all tests are created equal. We categorize tests by what we're trying to learn:
Angle Tests
Testing entirely new messaging angles or positioning. Higher variance, bigger potential upside.
Hook Tests
Same angle, different hooks. Testing the first 3 seconds / first line that stops the scroll.
Format Tests
Same concept, different format. Static vs video, square vs vertical, long vs short.
Element Tests
Same creative, one element changed. CTA, social proof placement, color, text overlay.
We maintain a ratio: 30% angle tests, 40% hook tests, 30% format/element tests. This balances finding net-new winners with optimizing existing ones.
The Learning Log
Tests are worthless if you don't capture learnings. Every test, win or lose, gets documented:
Learning Log Template
- Test Name: [Descriptive name]
- Hypothesis: [What we thought would happen]
- Result: Winner / Loser / Inconclusive
- Data: [Key metrics vs control]
- Learning: [What this teaches us]
- Next Action: [What we'll do with this insight]
Losers are often more valuable than winners. A losing test that tells you "this audience doesn't respond to discount messaging" saves you from repeating that mistake across 10 future tests.
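If the log lives in a flat file rather than a doc, a few lines of tooling keep documentation low-friction. A minimal sketch (the file name and field names are just one possible layout):

```python
import csv
from pathlib import Path

FIELDS = ["test_name", "hypothesis", "result", "data", "learning", "next_action"]

def log_test(entry: dict, path: str = "learning_log.csv") -> None:
    """Append one test's outcome to a CSV learning log, creating it if needed."""
    is_new = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)
```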
Iteration Protocol
When a test wins, we don't just scale it -- we iterate on it. The goal is to find the ceiling of the concept.
Iteration Sequence
- Hook Variations: Test 3-5 new hooks on the winning concept
- Format Expansion: Adapt to formats you haven't tested (video to static, static to carousel)
- CTA Tests: Try different calls-to-action
- Length Tests: Shorter and longer versions
- Talent/Creator Swap: Same script, different person
A single winning angle typically spawns 10-15 iterations before it's fully exploited. Most teams stop at 2-3.
Common Testing Mistakes
Mistake #1: Testing Too Many Variables
If you change 3 things and the ad wins, you don't know which change caused it. Isolate variables ruthlessly.
Mistake #2: Killing Too Early
$50 of spend tells you almost nothing. Respect minimum spend thresholds before making decisions.
Mistake #3: Ignoring Statistical Significance
"It's up 25%!" means nothing with 200 impressions. Use a significance calculator -- it takes 10 seconds.
Mistake #4: Not Documenting Learnings
Running 50 tests and forgetting what you learned is worse than running 10 tests and capturing every insight.
Mistake #5: Testing What Doesn't Matter
Button color doesn't move the needle. Hook and angle do. Focus your testing energy on high-impact variables.
Example: A Real Test Week
Here's what an actual test week looks like for one of our fitness apparel accounts:
Test A: Pain Point Hook
"Tired of leggings that slide down?" vs control. +34% CTR, +18% conversion. Scale.
Test B: Athlete Testimonial
Pro athlete endorsement vs UGC. -22% CTR despite higher production value. Kill.
Test C: Carousel Format
Same creative as carousel vs single image. +8% CTR but within noise. Extend test.
Test D: Urgency CTA
"Shop now" vs "Get yours before they sell out." +12% conversion. Scale.
Two winners, one loser, one needs more data. The loser taught us that "polished" doesn't outperform "authentic" for this audience. That learning shaped 6 future tests.
Your Testing Framework Checklist
- Weekly testing cadence established
- Minimum spend thresholds defined
- Kill thresholds documented (30% worse than control)
- Scale thresholds documented (15% better with significance)
- Test structure template in use
- Learning log maintained
- Iteration protocol for winners
- Test mix ratio defined (angles/hooks/elements)
The framework isn't complicated. Discipline in execution is what separates teams that compound learnings from teams that spin their wheels.
Want a Testing Framework Built for Your Brand?
We build creative testing systems that generate compounding learnings week over week. Let's talk about your creative strategy.
Book a Creative Audit →