Wavy Studios

The Creative Testing Framework We Run Every Week

How we structure tests, set kill thresholds, and iterate winners into new variations systematically. No gut feelings -- just data-driven creative decisions.

11 minute read · Framework · Creative Testing

Most creative testing is chaos disguised as process. Teams throw ads at the wall, wait for "enough data," argue about what the results mean, and eventually pick winners based on whoever has the loudest opinion.

We've run thousands of creative tests. Along the way, we built a framework that removes the guesswork -- clear rules for when to kill, when to scale, and when to iterate. Here's exactly how it works.

The Testing Mindset Shift

Before diving into the framework, you need to internalize one concept: testing is about learning velocity, not finding winners.

Most teams optimize for "finding the winning ad." This leads to running fewer, bigger tests and waiting too long to make decisions. The result? Slow learning, stale creative, and missed opportunities.

We optimize for learning velocity. The faster we can run tests, gather data, and extract insights, the faster we improve. Winners emerge as a byproduct of rapid iteration -- not from trying to find them directly.

The Math: If your win rate is 15% (typical for good creative teams), running 20 tests finds ~3 winners. Running 50 tests finds ~7-8. Volume beats precision every time.
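That back-of-envelope math is just an expected-value calculation. A quick sketch (the 15% win rate is the figure from the text, not a universal constant):

```python
def expected_winners(tests_run: int, win_rate: float) -> float:
    """Expected number of winners is a simple binomial expectation:
    E[winners] = tests_run * win_rate."""
    return tests_run * win_rate

# The text's example: a 15% win rate across two test volumes.
print(expected_winners(20, 0.15))  # 3.0
print(expected_winners(50, 0.15))  # 7.5
```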

The Weekly Testing Cadence

Everything runs on a weekly cycle. This creates urgency, forces decisions, and prevents tests from lingering indefinitely.

Monday: Review and Plan

  • Review all active tests from last week
  • Categorize results: Winner, Loser, Inconclusive
  • Extract learnings from each test
  • Plan this week's test slate

Tuesday-Wednesday: Brief and Produce

  • Write briefs for new tests
  • Production team executes
  • QA and finalize assets

Thursday: Launch

  • New creative goes live
  • Proper UTMs and naming conventions
  • Budget allocation per test structure
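A consistent naming scheme is what makes Monday's review painless. The text doesn't prescribe a specific format, so this is one hypothetical convention (every field and the separator are assumptions):

```python
def creative_name(week: str, test_type: str, variable: str, variant: str) -> str:
    """Build a hypothetical ad name like '2024w23_hook_painpoint_v1'.
    Field order and separator are illustrative, not prescribed."""
    return "_".join([week, test_type, variable, variant]).lower()

print(creative_name("2024w23", "hook", "painpoint", "v1"))
# 2024w23_hook_painpoint_v1
```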

Friday-Sunday: Gather Data

  • Tests run and accumulate data
  • No premature decisions
  • Weekend traffic patterns included

The Test Structure

Every test follows a specific structure that ensures clean data and clear learnings:

Anatomy of a Creative Test

  • Hypothesis: What we're testing and why we think it might work
  • Variable: The ONE thing that's different (hook, format, angle, CTA, etc.)
  • Control: What we're testing against (existing winner or baseline)
  • Success Metric: Primary KPI we're optimizing for
  • Kill Threshold: At what point we declare it a loser
  • Scale Threshold: At what point we declare it a winner
  • Minimum Spend: Budget required before making any decision

The golden rule: Test ONE variable at a time. If you change the hook AND the CTA AND the format, you won't know what caused the result. Disciplined isolation of variables leads to compounding learnings.

Kill and Scale Thresholds

This is where most teams fail. They either kill too early (missing potential winners) or wait too long (wasting budget on losers). We use specific thresholds based on statistical confidence.

The Kill Threshold

A creative is killed when:

  1. It has spent the minimum test budget (typically $100-200)
  2. AND it's performing 30%+ worse than control on the primary metric
  3. AND we have at least 1,000 impressions

If a creative is clearly underperforming at minimum spend, there's no reason to keep feeding it budget. Cut it and redirect spend to tests with potential.
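The three kill conditions compose into a single boolean check. A minimal sketch, assuming a CTR-style metric where higher is better (function and argument names are illustrative):

```python
def should_kill(spend: float, impressions: int,
                metric: float, control_metric: float,
                min_spend: float = 100.0) -> bool:
    """Kill a creative only when ALL three conditions hold:
    minimum budget spent, 1,000+ impressions, and 30%+ worse than control."""
    worse_by_30pct = metric <= control_metric * 0.70
    return spend >= min_spend and impressions >= 1000 and worse_by_30pct

# A creative at 0.9% CTR vs a 1.5% control, after $150 and 1,200 impressions:
print(should_kill(150, 1200, 0.009, 0.015))  # True
```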

The Scale Threshold

A creative is scaled when:

  1. It has spent 2x the minimum test budget
  2. AND it's performing 15%+ better than control on the primary metric
  3. AND we have at least 2,500 impressions
  4. AND the result is statistically significant (p < 0.1)

Statistical significance matters. A 20% improvement with 500 impressions means nothing -- it's likely noise. We use simple significance calculators to validate before scaling.
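The significance check needs nothing beyond the standard library. This is a sketch of a pooled two-proportion z-test, one common way to validate a conversion-rate lift (the source doesn't name a specific calculator):

```python
import math

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = abs(p_a - p_b) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# A "+20% lift" on 500 impressions per side: 12/500 vs 10/500 conversions.
p = two_proportion_p_value(12, 500, 10, 500)
print(p)  # well above 0.1 -- noise, not a winner
```

The same +20% lift at 10,000 impressions per side (240 vs 200 conversions) clears the p < 0.1 bar; sample size, not lift size, is what makes the result trustworthy.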

The Inconclusive Zone

What about tests that don't clearly win or lose? If a creative is within +/-15% of control after 2x minimum spend:

  • If it's a new angle: Keep running with 50% more budget. New angles deserve more patience.
  • If it's an iteration: Kill it. If an iteration can't beat control clearly, it's not worth the complexity.
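That branch is simple enough to encode alongside the kill and scale rules. A minimal sketch of the inconclusive-zone rule (names and return shape are illustrative):

```python
def triage_inconclusive(test_type: str, current_budget: float):
    """Inconclusive after 2x minimum spend: new angles get 50% more
    budget and more patience; iterations that can't clearly beat
    control are killed."""
    if test_type == "angle":
        return "extend", current_budget * 1.5
    return "kill", 0.0

print(triage_inconclusive("angle", 200.0))      # ('extend', 300.0)
print(triage_inconclusive("iteration", 200.0))  # ('kill', 0.0)
```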

Test Types We Run

Not all tests are created equal. We categorize tests by what we're trying to learn:

High Risk
Angle Tests

Testing entirely new messaging angles or positioning. Higher variance, bigger potential upside.

Medium Risk
Hook Tests

Same angle, different hooks. Testing the first 3 seconds / first line that stops the scroll.

Low Risk
Format Tests

Same concept, different format. Static vs video, square vs vertical, long vs short.

Low Risk
Element Tests

Same creative, one element changed. CTA, social proof placement, color, text overlay.

We maintain a ratio: 30% angle tests, 40% hook tests, 30% format/element tests. This balances finding net-new winners with optimizing existing ones.
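Allocating a weekly slate against that mix is one line of arithmetic. A sketch (the 30/40/30 split is from the text; the rounding policy is an assumption):

```python
def plan_slate(total_tests: int) -> dict:
    """Split a weekly test slate by the 30/40/30 mix:
    angles / hooks / format-and-element tests."""
    mix = {"angle": 0.30, "hook": 0.40, "format_element": 0.30}
    return {test_type: round(total_tests * share)
            for test_type, share in mix.items()}

print(plan_slate(10))  # {'angle': 3, 'hook': 4, 'format_element': 3}
```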

The Learning Log

Tests are worthless if you don't capture learnings. Every test, win or lose, gets documented:

Learning Log Template

  • Test Name: [Descriptive name]
  • Hypothesis: [What we thought would happen]
  • Result: Winner / Loser / Inconclusive
  • Data: [Key metrics vs control]
  • Learning: [What this teaches us]
  • Next Action: [What we'll do with this insight]
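The template maps directly onto a small record type if you want the log to be queryable rather than a loose doc. A sketch (field names mirror the template; the schema itself is an assumption):

```python
from dataclasses import dataclass

@dataclass
class LearningLogEntry:
    """One row of the learning log; fields mirror the template above."""
    test_name: str
    hypothesis: str
    result: str       # "winner" | "loser" | "inconclusive"
    data: str         # key metrics vs control
    learning: str
    next_action: str

entry = LearningLogEntry(
    test_name="Pain Point Hook",
    hypothesis="Calling out a specific pain point will lift CTR",
    result="winner",
    data="+34% CTR, +18% conversion vs control",
    learning="Pain-point hooks beat generic openers for this audience",
    next_action="Iterate 3-5 new pain-point hooks on the winning concept",
)
print(entry.result)  # winner
```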

Losers are often more valuable than winners. A losing test that tells you "this audience doesn't respond to discount messaging" saves you from repeating that mistake across 10 future tests.

Iteration Protocol

When a test wins, we don't just scale it -- we iterate on it. The goal is to find the ceiling of the concept.

Iteration Sequence

  1. Hook Variations: Test 3-5 new hooks on the winning concept
  2. Format Expansion: Adapt to formats you haven't tested (video to static, static to carousel)
  3. CTA Tests: Try different calls-to-action
  4. Length Tests: Shorter and longer versions
  5. Talent/Creator Swap: Same script, different person

A single winning angle typically spawns 10-15 iterations before it's fully exploited. Most teams stop at 2-3.

Common Testing Mistakes

Mistake #1: Testing Too Many Variables

If you change 3 things and the ad wins, you don't know which change caused it. Isolate variables ruthlessly.

Mistake #2: Killing Too Early

$50 of spend tells you almost nothing. Respect minimum spend thresholds before making decisions.

Mistake #3: Ignoring Statistical Significance

"It's up 25%!" means nothing with 200 impressions. Use a significance calculator -- it takes 10 seconds.

Mistake #4: Not Documenting Learnings

Running 50 tests and forgetting what you learned is worse than running 10 tests and capturing every insight.

Mistake #5: Testing What Doesn't Matter

Button color doesn't move the needle. Hook and angle do. Focus your testing energy on high-impact variables.

Example: A Real Test Week

Here's what an actual test week looks like for one of our fitness apparel accounts:

Winner
Test A: Pain Point Hook

"Tired of leggings that slide down?" vs control. +34% CTR, +18% conversion. Scale.

Loser
Test B: Athlete Testimonial

Pro athlete endorsement vs UGC. -22% CTR despite higher production. Kill.

Inconclusive
Test C: Carousel Format

Same creative as carousel vs single image. +8% CTR but within noise. Extend test.

Winner
Test D: Urgency CTA

"Shop now" vs "Get yours before they sell out." +12% conversion. Scale.

Two winners, one loser, one needs more data. The loser taught us that "polished" doesn't outperform "authentic" for this audience. That learning shaped 6 future tests.

Your Testing Framework Checklist

  1. Weekly testing cadence established
  2. Minimum spend thresholds defined
  3. Kill thresholds documented (30% worse than control)
  4. Scale thresholds documented (15% better with significance)
  5. Test structure template in use
  6. Learning log maintained
  7. Iteration protocol for winners
  8. Test mix ratio defined (angles/hooks/elements)

The framework isn't complicated. Discipline in execution is what separates teams that compound learnings from teams that spin their wheels.

Want a Testing Framework Built for Your Brand?

We build creative testing systems that generate compounding learnings week over week. Let's talk about your creative strategy.

Book a Creative Audit →