Drug Developers' Lessons from Sports: Regression-to-the-Mean Episode 14


Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott Berry: All right.

Welcome everybody to, in the Interim,
this is a podcast of Berry Consultants

on all things science of clinical trials,
medical decision making, drug development,

and we, we are statisticians typically
talking about the science of this.

Today we have a really.

Cool topic and a fun one for me
and I know a fun one for my guest.

Uh, for the first time on.

In the interim, my guest is Dr.

Nicholas Berry, who also
happens to be my son.

And the topic for today is Lessons for
drug Developers from the World of Sports.

So we're gonna talk about what clinical
trials drug developers could learn

from examples in the world of sports.

So Nick, welcome to in the Interim.

Nick Berry: Thanks.

Yeah, happy to be here.

Um, yeah, I'm happy to talk about this.

I'm, I'm, I'm glad to be talking
about sports because, you know,

a lot of our relationship when
I was young was based on sports.

You were my coach for, I dunno,
15 years playing baseball.

And a lot of how I started to like
perceive statistics was through

sports, uh, through watching twins
games, through watching baseball,

through watching other sports, and
sort of learning about all of the.

The weird things that would happen
and how a, a skeptical statistician

like you perceives sort of home
run races in the late nineties and

batting titles and things like that.

And so a lot of the way I learned to
infer about statistics came from sports.

So this is a sort of near and
dear topic, and I, I wanted

to be a sports statistician.

For, for a long time, um, sort of when
you were at Texas A&M, you wrote

this column, um, A Statistician Reads
the Sports Pages, in Chance magazine.

And so I would read those, I would
look at those and sort of, uh.

Um, it, it got me interested in
that and I, I tr I sort of searched

out a sports path too, right?

I, I applied for some
internships in sports.

I worked with Hal Stern who wrote a lot
of sports stuff back in the day, and so

I, I, I was on a statistics path for a
while before I veered to clinical trials.

So, uh, this is a perfect sort of
merging of the two worlds for me.

Scott Berry: Yep.

Yep.

Okay.

So, and, and many of those experiences
were my same experiences with

my father as well, who's also a
statistician and, and loves sports.

Uh, and so I.

We're, we're gonna talk about
various concepts in sports, and

this is the first of, of, uh, the
first topic we're gonna talk about.

And we have multiple other
topics that I think are really

valuable, uh, for drug developers.

And I think the examples will, will
bring home some of the concepts.

And so that's what we're also
gonna try to make clear here.

Uh, we're not gonna get too
deep in the sports, not too deep

in the drug development, but
make sure we tie 'em together.

So

Nick Berry: So.

Scott Berry: The first topic is
regression to the mean.

And I was talking about the, the
family love of sports, and,

of course, the family love of statistics.

If, if somebody in our family
brings up an interesting thing that

happened in the world and says to my
mother, your grandmother, gee, what,

what, what do you think that is?

She will, 90% of the time, just say
regression to the mean, not necessarily

knowing what it is, but she knows that
the answer to most of these

questions is regression to the mean.

So, so what is regression to the mean?

So we'll talk about it within sports,
uh, and then we'll talk about how

it, it, it, uh, what, what it means
in the world of clinical trials

and statistics and, and science.

So, my first experience with
regression to the mean

that I remember, um, was
the 1977 baseball season.

I was 10 years old.

I'm a little bit older than
Nick, not surprisingly.

Um, and in 1977 being 10, I loved baseball
and I could read the box scores and I

understood how to extrapolate statistics
at one point in the season and say, what

is going to be the end of the season?

Within that.

And the first of those was home runs.

George Foster was on my
favorite team at the time.

The Cincinnati Reds, the, the
Minnesota Twins were terrible

at the time, by the way.

So, uh, the Big Red Machine, that was
the Cincinnati Reds, and

George Foster, uh, halfway through
the season had 31 home runs.

And he was leading the league in home
runs and he had 31 home runs, and I

was sophisticated enough to double
that and say, okay, that's a pace of

62 home runs by the end of this season.

At the time, the record was 61, the
famous story of Roger Maris's

61 home runs, when he beat
Babe Ruth's record of 60 home runs.

But that's the best that had ever
been done in a season before that.

And George Foster was on pace to
break that, and I knew that halfway

through the season, players have
half of their, their expected number.

And so he was on pace to break it.

And as a 10-year-old, I
thought he was gonna break it.

And I even mentioned this to my
father, who's a statistician, and said

he's gonna break Maris's record.

And we made a wager based on that,
a bet on whether or

not he would break the record.

And I thought at the time my dad was
nuts, that not only would he make

that bet, but he gave me, um, 54 or more.

He said, I'll bet he doesn't even hit
54 home runs by the end of the season.

At that point, you know, does he
hit another 23 home runs when he

hit 31 halfway through the season?

It seemed like this, this was a sucker
bet by my father, and

of course I'd take it.

Nick Berry: Yeah.

Scott Berry: And of
course, he didn't hit 54.

He didn't hit 62.

He hit 52,

which is actually a very good number.

Um, from his 31, he hit 21 more
home runs, which is on pace.

That 21 is a 42 home run pace, which
is an incredibly good home run hitter.

So

Nick Berry: Sure.

Scott Berry: He did really, really
well, better than average,

but not the 31 pace in that.

Okay, and this, this happens
baseball season after baseball season.
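The arithmetic behind Don's side of that bet can be sketched with a quick simulation. This is an editor's illustration with assumed numbers: treat Foster as a true 40-home-runs-per-season hitter (Poisson-distributed counts), take the 31-homer first half as given, and simulate the finish.

```python
import math
import random

random.seed(1)

# Hypothetical model (editor's assumption): Foster is a "true 40-homer"
# hitter, i.e. an expected 20 home runs per half season, Poisson-distributed.
TRUE_HALF_SEASON_MEAN = 20.0

def poisson(lam):
    """Draw one Poisson variate (Knuth's method, fine for small lambda)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

# The 31-homer first half is sunk: if we think we know his true talent,
# the expected finish is 31 + 20 = 51, not the doubled pace of 62.
finishes = [31 + poisson(TRUE_HALF_SEASON_MEAN) for _ in range(100_000)]

mean_finish = sum(finishes) / len(finishes)
p_break_61 = sum(f >= 62 for f in finishes) / len(finishes)
p_under_54 = sum(f < 54 for f in finishes) / len(finishes)

print(f"expected finish: {mean_finish:.1f}")  # ~51, near Foster's actual 52
print(f"P(breaks 61):    {p_break_61:.3f}")   # small
print(f"P(under 54):     {p_under_54:.3f}")   # the side of the bet Don took
```

Under this assumption, the hot first half doesn't change the expected second half, so the expected finish is about 51, and taking the under on 54 wins most of the time.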

A really fun thing about that season
was my Minnesota Twins' Rod Carew, who was

flirting with batting .400, a batting
average of .400, which is the number

of hits divided by the number of
attempts, and that's easy to project.

It's the batting average, and
at, at that point in the season,

he was hitting over .500.

And so we might, in this 1977 season,
have a .400 hitter and break the

home run record, all in one season,

on my two favorite teams.

Nick Berry: He was hitting over .400,

right?

And yeah, which is an astronomical number.

Like, this has happened less
than, you know, a handful of

times in the history of the game.

Scott Berry: The, the last time
before 1977 that it had happened,

and it still has not
happened even since 1977, was,

uh, when Ted Williams hit .406 in 1941.

So this has become a, a, a number
that baseball fans know,

and they know it hasn't happened since.

And so that's a very rare thing.

And he had the best batting average
in the league at the time, and I

didn't make a wager on this, but he
ended up hitting .388, which is actually

one of the highest numbers since 1941.

An incredibly high number.

Rod Carew is a Hall of Fame
baseball player, uh, because
of his batting average.

Incredibly high number.

And we've had several people
since then that have done similar

flirting with that 400 number.

George Brett did.

Uh, actually Lenny Dykstra did, Tony
Gwynn did, uh, even Joe Mauer, another

Minnesota Twin, flirted with it,
and nobody has accomplished it.

So this is something we're very
familiar with in sports, uh, uh, of

this kind of phenomenon happening.

So

Nick Berry: So

what does it mean?

Scott Berry: Uh, in, in that
scenario, was, was

Rod Carew truly a .400 hitter?

Was George Foster truly a 62
home run hitter at that point?

Nick Berry: Yeah.

Scott Berry: Uh, and they weren't,

within that scenario, and
they were the extremes.

And so if we were to say how good they
really are, we would never estimate
them at those extreme numbers.

And my father knew that George Foster
was not a 31 home run hitter, which is

why he was very comfortable betting.

He knew he probably wasn't even a

21 home run or a 23 home run
hitter, which is why he took that

bet; that's an extreme number.

His, his estimate was regressed.

Now, George Foster had previous
performances, but his estimate was also

regressed towards the middle, uh, in that setting.

Nick Berry: Yeah, I was gonna
say, I'm sure that Don, your

dad, looked up George Foster's

previous three years and saw he hit
35 home runs to 40 home runs every

year, and, and, and made a deduction.

Scott Berry: Yep.

So we'll come back to sort of how we
might estimate, um, what they really are.

But this phenomenon
happens in every sport.

It, it happens in hockey,
it happens in baseball.

It happens in basketball.

We have, uh, players and teams.

We have teams halfway through a season
that are on pace to break the, the

record number of wins in basketball.

We've had some, some chase that,
uh, and they fall short of that.

Uh, and we always talk about that.

We've actually recently had teams break it.

Uh, and we might have a team,
uh, this season, break the record

for most losses in a season.

You know, these extreme sort of things,
uh, that, that we've talked about.

And then it doesn't happen.

And in the sports world, we
hear reasons why that happens.

We hear

Nick Berry: Yeah.

Scott Berry: the, the, the scrutiny
of the media, uh, everybody asking

about it day in and day, day out.

It's really hard to continue such a pace.

Nick Berry: Yeah, they got in their
heads and yeah, the whole, yeah.

Scott Berry: Yep.

let me pro, let me provide another
example of this in a, in a sport, Nick

and I both, uh, love, uh, which is golf.

Uh, we play golf, uh, and if you're
looking on video, we have golf shirts on.

Uh, so, we love to play golf.

So let me provide another example, a
potential moneymaking gig, uh, for us:

a golf tournament. And I'm gonna
quote numbers from the 2017 US Open,

but this happens week in, week out.
Uh, very, very similar numbers within it.

The US Open might be a little bit
more extreme because there's more

of a variety of players in the US Open
than, than the run-of-the-mill

weekly PGA Tour tournament.

So in the 2017 US Open,

if I take the baseline score, and I'm
calling it baseline because we're gonna

start talking about clinical trials,
but it's what the golfers shot on

their first day in the US Open, I'm
gonna break them up into four, uh, groups.

Uh, four quartiles: the top 25% of
golfers on what they shot on day one,

the second 25%, the third, and the
fourth, the bottom, the worst scores.

So I'm gonna

Nick Berry: Gonna take those.

Scott Berry: and let's talk first
about the worst, uh, uh, quartile.

The worst quartile of this, the
25% that shot the worst score,

they shot an average of 78.

On day one. US Opens are traditionally
very, very hard, and this
round was a hard round.

The average was 78.

I'm going to do an
intervention on that group.

I'm gonna go around to each one of them
and I'm gonna think really good

thoughts about them, and I'm gonna
give them encouragement in that.

And so what happened to them
on day two when I intervened

on them is they averaged 3.04

shots better on day two
than they did day one.

They, their change from baseline from day
one to day two was three shots better.

I did that.

I caused them to shoot three shots better.

And does three shots matter
in a PGA Tour tournament?

Three shots a round is multimillions of

Nick Berry: Yeah.

Scott Berry: It's the difference
between, uh, being on, uh,

one tour or another tour,
winning major golf tournaments.

It's millions of dollars.

It's an enormous number

Nick Berry: Yeah.

Scott Berry: in that.

And think about a four round
golf tournament that's 12 shots.

That if I could bottle that and
have a three shot effect, I'm

making millions, uh, in this.

Okay?

Likewise, the golfers that shot in
the top, the best, uh, quartile,

they got three shots worse
on day two on average.

I didn't think good thoughts of them.

And in fact, I thought, I thought

bad thoughts about them, and they
did three shots worse on day two.

This is repeatable.

This happens every single
week on the PGA Tour.

It happens all the time,
uh, in this circumstance.
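That three-shot swing needs no intervention at all; it falls out of simulated scores with plausible variance components. A sketch (the field mean of 73.5 comes from the episode; the two standard deviations are assumed for illustration):

```python
import random
import statistics

random.seed(7)
N = 100_000  # simulated golfers

FIELD_MEAN = 73.5  # field scoring average, roughly the episode's number
SD_BETWEEN = 2.0   # assumed sd of true ability across golfers
SD_WITHIN = 3.0    # assumed day-to-day sd of one golfer's score

ability = [random.gauss(0, SD_BETWEEN) for _ in range(N)]
day1 = [FIELD_MEAN + a + random.gauss(0, SD_WITHIN) for a in ability]
# Day 2 is generated exactly the same way: no intervention anywhere.
day2 = [FIELD_MEAN + a + random.gauss(0, SD_WITHIN) for a in ability]

order = sorted(range(N), key=lambda i: day1[i])
best_q = order[: N // 4]      # lowest (best) day-1 scores
worst_q = order[-(N // 4):]   # highest (worst) day-1 scores

def mean_change(idx):
    """Average day-2 minus day-1 score for a group of golfers."""
    return statistics.fmean(day2[i] - day1[i] for i in idx)

print(f"worst quartile day-1 avg: {statistics.fmean(day1[i] for i in worst_q):.1f}")
print(f"worst quartile change:    {mean_change(worst_q):+.2f}")  # about -3: improves
print(f"best quartile change:     {mean_change(best_q):+.2f}")   # about +3: worsens
```

With these assumed numbers, the worst quartile averages near 78 on day one and "improves" by about three shots on day two, purely because noise pushed them outward on day one.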

Um, so what does it mean?

did, did, did my thinking about them
affect them in any way, shape or form?

No.

Nick Berry: Yeah.

Scott Berry: It did not. Um, that
doesn't mean other things didn't have

an effect; we don't understand
whether they had an effect.

Nick Berry: Yeah.

The bottom, uh, quartile of people
didn't suddenly feel no pressure and go

straight at pins because
they had a tough first day.

It wasn't some conscious, concerted
effort by the players to, to,

to do better, to play freer, right?

This is variability, it's
randomness that led to it.

Scott Berry: So, uh,
announcers will tell you that,

Nick Berry: Yeah.

Scott Berry: I, I had a bad day and I
just, I was so super aggressive and I

wasn't worried about my positioning.

And you shoot much better when
you do that, uh, or vice versa.

Now you're near the lead.

You didn't sleep well.

Uh, you kept thinking about
hoisting the trophy, and, and, and

you shot worse with it.

So people attach reasons to that,

something that is nothing
other than randomness.

And this is nothing

Nick Berry: Yeah.

Scott Berry: other than randomness.

Nick Berry: And this isn't
just like casual fans.

I mean, the players themselves say this,
announcers that played for 15 years say

this, and this is a, this is a, you know,
people who have spent their life playing

the game still attribute a lot of, you
know, the variability to physical aspects

rather than to the fact that there is just
a lot of variability in day-to-day scores.

Scott Berry: So we have this
term, regression to the mean.

What do we mean by regression to
the mean? Uh, within a setting like

this, the, there's the average score
on day one, which I think was about

73 and a half, something like that.

Uh, and we have the average of
those golfers, all these golfers are

participants, and the average is that
73 and a half, and somebody shoots 78

on day one. And let's ignore previous
tournaments that they bring into the US

Open; we're interested in, in, in that.

Um, if you estimate that
that individual's true average score

on that golf course is 78, you are
shocked when you find out the average

of the people in that quartile got
better by three shots on day two.

And there must be a reason for that.

You could think everybody is a 73.5

and all golfers are identical, and
there's no difference between them.

You'd estimate those golfers to shoot 73
and shoot five shots better on day two.

Uh, but not all golfers are the same.

They're different.

Some golfers are better
than the other ones.

We know that, and especially
the US open, there's more

heterogeneity in those players.

There's variability across players.

There's variability in a
day's score, in a setting.

So what does it mean
if somebody shoots 78?

They, they, they, they've
shot a high number.

They're probably not as good
as the average, but they're

not as extreme as the 78.

If we were estimating them statistically,
we would take an average of their 78,

their, their, their score, and the
average of all the golfers,

which is 73 and a half, if this is
all we knew, trying to estimate them,

and we would do some estimate
midway between those.

Nick Berry: Yeah, and how far
you go from one to the other

depends on how well you think
you actually know how

good the golfers are.

Maybe if a, if a golf round were a
thousand holes, you'd think you knew

more about the players. So the amount
of data you have, and the amount

of belief you have in how good the
players are, goes into this weighting

of where you are between the two.
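The weighting Nick describes can be written as a precision-weighted average between the observed average and the field mean, where the observed data's weight grows with the number of rounds. A minimal sketch; the two standard deviations are assumed inputs, not estimates from the episode:

```python
def shrink(observed_avg, grand_mean, n_rounds, sd_between=2.0, sd_within=3.0):
    """Precision-weighted estimate of a golfer's true scoring average.

    sd_between: assumed sd of true ability across golfers
    sd_within:  assumed sd of one round around a golfer's true ability
    """
    # Weight on the data rises as more rounds shrink the noise variance.
    w = sd_between**2 / (sd_between**2 + sd_within**2 / n_rounds)
    return w * observed_avg + (1 - w) * grand_mean

# One 78 against a 73.5 field: mostly noise, so pull most of the way back.
print(shrink(78, 73.5, n_rounds=1))   # ~74.9
# A 78 average over 50 rounds: now it's mostly believed as real ability.
print(shrink(78, 73.5, n_rounds=50))  # ~77.8
```

With these assumptions, a single 78 shrinks to roughly 75, which matches the day-two prediction given later in the episode.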

Scott Berry: Right.

So in, in only using the 2017 US Open,

Nick Berry: Yeah.

Scott Berry: The key things about
how much regression you would do

towards the mean are the between-subject
variability, how variable the

golfers are, and then the within-subject
variability in a, in an 18 hole score.

And by the way, I've spent some time on this.

It's, it's a little less than three shots,
the within-day standard deviation of a

professional golfer's score.

Uh, within that, the between
variability in this US Open

is quite a bit bigger than that.

Um, within the setting, the, the, the
bottom quartile was 78, the top was 69.

There's probably a five or six shot
standard deviation across golfers.

So we would use those, uh, in that setting
now to provide that estimate of that.

So we're not at all surprised when the
next day's scores for all of them

shrink in towards the mean. It's, it's
inevitable, because the first-day

score variability pushes them outward,
and their truth is somewhere in the

middle; variability always goes outward.
That's what variability does.

So we would always say, when somebody
does better on day two, they go from 78

to 75, we say, oh, that's regression to
the mean. Our estimate, if I were betting

on somebody that shot 78 on day one, I
would probably estimate 75 on day two.

And I would bet and I would
behave that way, uh, in the

particular setting like that.

Meanwhile, somebody that shot 67,
I would probably guess 70 or 71.

It's regression to the mean.
Statistically, a common way we do this

in the Bayesian approach is something
called hierarchical modeling.

And the hierarchies are: the
players are one level,

and within players, their different
scores are another level, and that's

used to provide these estimates.

And we do it in sports, all the time.

We also do it in drug development.

Now, this is a natural phenomenon.

If you're rolling dice, this
is a natural phenomenon.

If you're flipping coins, if you're
playing golf, if you're playing

baseball, if teams are having
outcomes, it's pretty well understood

in sports. It's sometimes understood
in drug development, and this is the

part that drives us nuts: people
attach explanations to it that are not

right, and then they behave according
to those wrong explanations,

which creates bad decision making: bad
betting, bad paying of baseball

players, expecting them to do the same.
And there are some you may have heard of.

So, uh, you know, there's something,
I don't even know if people talk about

it anymore: the Sports Illustrated Jinx.

People recognized that players or people
or teams got on the cover of Sports

Illustrated, a weekly magazine, and what
happened is they tended to do worse, and

people referred to it as a jinx of being

Nick Berry: Yeah.

Scott Berry: on the cover of Sports Illustrated.

Nick Berry: Sports Illustrated
doesn't exist anymore, but it's

now the Madden Jinx or whatever.

You know, the video game that comes out.

You're on the cover of Madden, and
then the next year you do worse.

But crazy coincidence that there's
also a Madden Jinx, right,

just like the Sports Illustrated jinx.

Scott Berry: And people
believe that's a real thing,

Nick Berry: Yeah.

Scott Berry: is a real thing.

There's, you've heard the sophomore jinx.

People have a great
freshman season.

They don't do quite as well in their
sophomore year, and it's a jinx.

Nick Berry: Yeah, sophomore slump.

Scott Berry: you know, we don't hear
this anymore because if you have

a great freshman year, you go pro.

Nick Berry: Oh, and yeah.

Scott Berry: There's a
rookie of the year jinx.

Nick Berry: Yep.

Scott Berry: You go look at players who
win rookie of the year in baseball, in

any sport, they do worse the next year.

They're complacent.

It's a jinx.

Uh, and, and it's largely nothing
but regression to the mean.

Nick Berry: They overperformed
in their first year.

Scott Berry: yeah.

Yeah.

And if you believe they're
gonna repeat that performance,

that's the mistake you made.

Then you think, okay, we have to
attach a, a, an effect to this, uh,

Nick Berry: Yeah.

Scott Berry: Um, media pressure,
scrutiny, uh, team chemistry.

Uh, the team chemistry in that
great season was great.

The next, the next year, the
chemistry wasn't very good.

Nick Berry: Yeah.

Scott Berry: Which means we, we
don't know how to explain it.

So we, we invent chemistry for it.

This, now this happens
in all kinds of things.

Golf equipment: you know, you
try a new putter when

you're not playing well, or a new
driver, and you're doing better.

Nick Berry: Yeah.

Scott Berry: Golf teachers
benefit from this greatly.

Who goes to see a golf professional?

Somebody

Nick Berry: Yeah.

Scott Berry: somebody who's on that
extreme side. I'm, you know, shooting

the higher scores, and then I
play better, and I

want to pay more money to that
golf professional in that setting.

Now, lots of people benefit from this.

Chiropractors benefit: your back hurts,
you go in. Doctors benefit: I'm not

feeling well, and then I feel better,

and we attribute it to that.
Rehab therapists, too.

We, we, we use trinkets, we put magnetic
bracelets on our, on our arms, uh, to

make pain go away and it goes away.

And you know that, this sort of thing.

So this, this shows up
in every walk of life.

Nick Berry: Yeah,

Scott Berry: So let's, so let's move to clinical

Nick Berry: You've been so... sorry.

Scott Berry: trials.

Yeah.

Nick Berry: You've been
really pessimistic.

Like everything is fake.

That's not necessarily what you're saying.

You're just saying that every time
you observe something extraordinary,

especially in the circumstance where the
population that underwent this procedure

or something like that was different
than the normal population, you're

just saying shrink back the results.

You're not saying it's fake,
that golf teachers don't work.

It's just that if you see immediate
benefits after you go see a golf

teacher, some of that benefit is
due to regression to the mean.

There are benefits of some of
these things, but believing at face

value, the change you see right
away is naive when we know that

there's going to be some regression.

Yeah,

Scott Berry: Yeah, exactly.

And, and, um, doctors are
phenomenally important.

I'm not saying don't go

Nick Berry: yeah, I know exactly.

Yeah, exactly.

Scott Berry: Uh, your uncle and my brother
is a golf professional, so, uh, they,

they, they absolutely play a role and
they will help you shoot better scores.

No

Nick Berry: Yeah,

but

it's hard to figure out how
much of it goes to each thing.

Like it's really hard to, to
assign how much it should shrink.

Scott Berry: Okay, so let's
talk about this in clinical

trials and drug development

Nick Berry: Yeah.

Scott Berry: and now you can
start to imagine some of these

effects showing up within it.

One that I, I think is
very misunderstood is something

called the placebo effect.

And people talk about clinical trials,
and when people

say the placebo effect, what they mean is
somebody takes a treatment that is inert.

But they, they're taking a
treatment and the, the act of taking

that makes them perform better.

And you can see where I'm going with this.

I was labeling the, the, my thinking
about golfers in that situation

as kind of a placebo, and the golfers
didn't know I was thinking about them,
Nick Berry: Yeah.

Scott Berry: and they do better.

So let's take a clinical trial.

And what happens in, uh, in most
clinical trials is people who

enter a clinical trial, like that
golfer who shot 78, they're doing

poorly, relatively, on their scale.

And it could be,

it could be high blood pressure, it
could be high weight, it could be pain,

I'm not sleeping well.

Um, in that scenario, and this happens
in Alzheimer's trials, people feel that

their memory has been particularly bad:
I'm, I'm, I'm doing worse.

They enter the trial, and in many of those
scenarios, patients who get no treatment,

or an inert treatment, they do better,
and everybody calls this a placebo effect.

Now, in many circumstances,
that's the exact same thing we

Nick Berry: Yeah.

Scott Berry: It's a
regression to the mean.

Now, there are certainly some cases
where I believe that the act of

taking an intervention may help.

You could imagine sleep, you could
imagine potentially pain trials is

where they've talked about it, and
there have actually been trials where they

give placebo or they give nothing, and
there's a difference between those.

But in most clinical trials,
this is what happens. And yeah.

Nick Berry: Placebo effects are
interesting.

It's like, you take statistics in high
school and you don't know anything

about statistics, but they're teaching
you about placebo effects, and it's

like an immediate thing. And so you,
you just assign any improvement

without, uh, getting an active drug
to the placebo effect, without assessing

the population or thinking about,
sort of, the populations going in.

It's just, oh, getting the drug makes
you think you're gonna get better,

so you get better in the placebo.

Getting the placebo makes you think you're
gonna get better, so you get better,

which is, is usually not what's happening.

Scott Berry: And so, um, I, I, the
misunderstanding of this has an effect.

And, and let me give you an example that
happened to me a couple of weeks ago.

A company collected single arm data.

And what that means is they don't have
a placebo, that they run a trial where

everybody gets the treatment and they
have a baseline marker of severity.

And by the way, that's the
outcome measure, but that's

also measured at baseline.

It's similar to their first
round in a golf tournament.

Within this setting, they collected
it across a wide range of patients, and

they noticed, wow, patients that have
a higher baseline, they improved

more when we gave them our drug.

Now, now in, in, in the golf example,
that's taking that worst quartile and

noticing they regressed three shots
better. By the way, the people in that

golf tournament in the second-worst
quartile got better by one shot.

The data from this company looks almost
identical to the golf tournament.

Remember, in a, in a clinical trial, we
record your baseline before you get the

intervention, and what's your score?

There's variability in those.

There's variability across time.

A single patient goes up and
down on almost every measure.

Weight, blood pressure,
uh, uh, cognitive scores.

Uh, this cardiovascular endpoint
they were looking at goes up and

down within a patient over time.

so they're now running a randomized
trial in that particular population.

They're gonna run a randomized
trial, so we get to find out,

does the placebo group get better?

And I told them, you're gonna get a
very large placebo effect in your trial.

And they looked at me like,
what are you talking about?

And their perception was that what the
investigator tells patients about how good

the drug is, or what we know about it,

all those things are what cause the
placebo effect, and not the inclusion

exclusion criteria at the beginning of
the trial, which cause a huge amount of

regression to the mean, which we don't
understand and we call the placebo effect.
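The mechanism Scott describes, inclusion criteria manufacturing an apparent "placebo effect", can be reproduced in a toy simulation: measure patients with visit-to-visit noise, enroll only those whose screening value crosses a cutoff, give no treatment, and the follow-up improves anyway. Every number here is an editor's assumption for illustration:

```python
import random
import statistics

random.seed(3)

POP_MEAN = 50.0    # assumed population mean of a severity score
SD_BETWEEN = 10.0  # assumed spread of true severity across patients
SD_WITHIN = 8.0    # assumed visit-to-visit fluctuation within a patient
CUTOFF = 60.0      # hypothetical inclusion criterion: enroll if baseline >= 60

enrolled_baseline, enrolled_followup = [], []
for _ in range(200_000):
    true_severity = random.gauss(POP_MEAN, SD_BETWEEN)
    baseline = true_severity + random.gauss(0, SD_WITHIN)
    if baseline >= CUTOFF:  # the entry criterion selects extreme visits
        # Follow-up visit with NO treatment: same truth, fresh noise.
        followup = true_severity + random.gauss(0, SD_WITHIN)
        enrolled_baseline.append(baseline)
        enrolled_followup.append(followup)

b = statistics.fmean(enrolled_baseline)
f = statistics.fmean(enrolled_followup)
print(f"enrolled baseline mean:   {b:.1f}")
print(f"untreated follow-up mean: {f:.1f}  (improves {b - f:.1f} with no drug)")
```

Under these assumptions the enrolled group improves by several points at follow-up with no intervention at all, which is exactly what gets labeled a placebo effect.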

Nick Berry: Yeah, the placebo
effect's not a psychological

thing that messes with patients.

It's a statistical process
that you just described, right?

Yeah.

Scott Berry: Regression to the mean
is the statistical process that we

confuse for the placebo effect, which
in some cases the placebo effect is
Nick Berry: a real thing.

Scott Berry: Like I said, we don't
understand very well which is which.

Nick Berry: Yeah.

Scott Berry: We don't understand
very well at all which is

which, uh, within that setting.

So they made huge drug development
decisions to enroll that particular

population from a single arm trial
without a control, not understanding

that by the way, the people on the
good side might actually be the ones

that have the bigger benefit relative
to a control, and now they're jumping

into a very, very large trial.

Do they understand this particular
effect or are they making a bad decision?

The beauty of it is the randomized
trial is gonna tell them,

Nick Berry: Yeah.

Scott Berry: and we're
gonna figure that out.

But are, do they understand it enough?

Enough to make good
decisions at this point?

There are other examples that show
up where regression to the mean

is critically important.

We run a lot of trials and we
look at subgroups of patients.

We have a trial design
now called a basket trial.

The baskets in a basket trial,
they're different kinds of patients.

And a very common one of these is,
we have a treatment for, uh, cancer,

and we enroll, uh, different
types of cancer.

It might be head and neck cancer,
breast cancer, lung cancer, GI cancer,

uh uh, all of these different types.

Or they may be subsets of kinds of cancer.

And we run trials in them, and
we go into eight different types,

and we find out, aha, these two
types we had the best effect in.

And we

look at the, uh, estimate from the
two best out of eight, and we run

a trial after that, or we try to
estimate the effect in the best

one of the eight, from the data, um,
uh, the number of responses we got.

Suppose we're looking at a cancer
trial and we look at how many patients

responded, and in the best type,
10 out of 20 patients responded,

and all the other types had
response rates worse than 50%.

I, as a statistician, don't believe the
response rate is 50% for that one.

It's exactly the same as
the golfer that shot 78.

I don't believe they're a 78;
they're better than that.

Same with this 10 out of 20.

I would estimate their response
rate to be closer to the mean

response across all tumor types.

Depending on the variability of that
response across types and the variability

of 10 out of 20, which, as
statisticians, we understand.

Nick Berry: Yeah. Your hierarchical
model, like we talked about with the golf

example, now says there is some variability
across the cancer types, and we know that

it might actually work better in some,
but there's also a lot of variability

in how many responses you're gonna
observe, uh, on 20 subjects.

And so 20 is not that many.

We don't know a lot about how well or how
badly it works in each of these cancers.

So we shrink back, and we say a lot of
the variability in this was due to there

only being 20 patients per cancer type.

So just pull everything
back towards the middle.

And if you're predicting what's
gonna happen in phase three,

you're not gonna predict 50%.

And if you do, you're gonna cost yourself,
you know, millions of dollars potentially.

So you just shrink it all the way
back, really close to that mean.

Probably, if you only have 20

subjects, you, you probably don't
vary very much from the, the

overall mean of all the types.
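The best-of-eight optimism is easy to quantify by simulation. A sketch, assuming eight baskets of 20 patients whose true response rates cluster around 30% (assumed numbers, not from any real trial):

```python
import random
import statistics

random.seed(11)

N_BASKETS, N_PER_BASKET = 8, 20
TRUE_MEAN_RATE, TRUE_SD = 0.30, 0.05  # assumed spread of true rates across types

gaps = []  # observed best rate minus that same basket's true rate
for _ in range(20_000):
    truths = [min(max(random.gauss(TRUE_MEAN_RATE, TRUE_SD), 0.01), 0.99)
              for _ in range(N_BASKETS)]
    observed = [sum(random.random() < p for _ in range(N_PER_BASKET)) / N_PER_BASKET
                for p in truths]
    best = max(range(N_BASKETS), key=lambda i: observed[i])
    gaps.append(observed[best] - truths[best])

# The winning basket's observed rate overstates its own true rate,
# because you picked it partly for its good luck.
print(f"average optimism of the best basket: {statistics.fmean(gaps):+.3f}")
```

With these assumptions, the winning basket's observed rate runs well above its own true rate on average, which is the gap a hierarchical model shrinks away.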

Scott Berry: Yep.

Yep.

Uh, we see lots of this
happening where we have units.

Uh, in a case like this, we have
trials with multiple arms, many doses,

um, and we should never
estimate the effect of one

dose without using the other doses.

It could be a hierarchical model, it
could be a dose-response model.

We have endpoints.

We collect a lot of endpoints in clinical
trials, and we analyze 12 endpoints

in a particular disease where we
think a treatment may have an effect.

This one did the best out of the
12, and this is like 1977:

halfway through the
season, we see this effect.

Now we're gonna run
the rest of the season.

Do I think the effect in
that best endpoint is gonna

continue at the effect I see?

Or is it gonna shrink towards the
other 11 endpoints within that?

It's very similar scenarios to sports
where it seems almost obvious in that

setting, but a lot of these things
aren't naturally done in clinical trials.

Publications provide this single
estimate, and it's up to the consumer

to do that themselves.

I think experienced people in this
industry do that, and they understand

that there's this whole, uh, almost a
controversy about the failure of phase

two trials to replicate in phase three,

and a lot of the things we just talked
about are reasons why that doesn't happen.

But the phase two process itself

involves relatively small sample
sizes, so the variability in that

measurement can be large.

And we run hundreds of phase two trials.

Which phase three trials do we run?

The ones that did really well.

And, and there's a natural part to this:
there are many phase two trials, and

you see the extreme right tail; those
that do well, they run phase three.

Lo and behold, the effect is
not reproduced in phase three.

Sometimes it's completely gone.

Nick Berry: Yeah.

Scott Berry: sometimes it's just smaller.

It's regressed into the,
the average of the effects.

And people think that this
is a controversy, the

irreproducibility: this isn't
reproducible science, what's

wrong with what we're doing?

It's regression to the mean.
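
Scott's selection story is easy to simulate. Everything here is hypothetical: the effect sizes, noise level, and the 0.12 "advance to phase three" threshold are all invented for illustration.

```python
import random

random.seed(1)

# Hypothetical phase 2 landscape: 500 programs, each with a small true effect.
n_trials = 500
n2 = 50  # patients per arm in phase 2
true_effects = [random.gauss(0.05, 0.05) for _ in range(n_trials)]

# Observed phase 2 effect = truth + sampling noise (noise shrinks with n2).
observed2 = [mu + random.gauss(0, 0.3 / n2 ** 0.5) for mu in true_effects]

# Only the apparent winners advance to phase 3 (threshold is invented).
winners = [(obs, mu) for obs, mu in zip(observed2, true_effects) if obs > 0.12]

mean_obs = sum(o for o, _ in winners) / len(winners)
mean_true = sum(m for _, m in winners) / len(winners)
print(f"{len(winners)} advanced; mean phase 2 estimate {mean_obs:.3f}, "
      f"mean true effect {mean_true:.3f}")
```

Because the programs that advance were selected for having large observed effects, their average true effect is smaller than their average phase 2 estimate: the extreme right tail regresses, with no change in the science at all.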

Nick Berry: A huge problem with that.

Yeah, yeah, yeah.

Scott Berry: a regression to the mean.

Nick Berry: A huge problem with
that though is that now we're

powering our phase three studies
based on the effect observed on

the best dose in phase two.

And if we're regressing our estimate,
the truth is probably not as good as that.

You have an underpowered study and
when you inevitably observe, you know,

80% of the phase two effect in phase
three, you have a p value of 0.07

or something like that.

And so it's not just a,
"oh, our estimate was off."

I mean, this can have huge
implications in this sort

of life cycle of a drug.
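
Nick's powering point can be sketched with a normal-approximation power calculation. The effect size, outcome standard deviation, and sample size below are hypothetical, chosen only to show how power erodes when the truth is 80% of the phase 2 estimate.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()
z_alpha = z.inv_cdf(0.975)  # one-sided 2.5% (two-sided 5%)

def power(effect, sd, n_per_arm):
    """Normal-approximation power for a two-arm comparison of means."""
    se = sd * sqrt(2 / n_per_arm)
    return 1 - z.cdf(z_alpha - effect / se)

phase2_estimate = 0.5  # hypothetical best-dose effect seen in phase 2
sd = 2.0               # hypothetical outcome standard deviation
n = 336                # per-arm size giving ~90% power at the phase 2 estimate

print(f"power at the phase 2 estimate: {power(phase2_estimate, sd, n):.2f}")
# If the truth is only 80% of the phase 2 estimate (regression to the mean):
print(f"power at 80% of the estimate:  {power(0.8 * phase2_estimate, sd, n):.2f}")
```

With these numbers, a trial sized for about 90% power at the unshrunk phase 2 estimate has only about 74% power at the regressed truth, which is how a real effect ends up with a p-value of 0.07.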

Scott Berry: Oh, I, I hope there
were lessons there, but let me

turn it back to you, Nick, because
I know this is a topic we talk

about with a lot of people and, um.

Your friends, uh, heard this topic
and they're in personal finance

Nick Berry: Yeah.

Scott Berry: What does regression
to the mean mean to them?

So what would you say to, to
people outside of drug development?

Uh, outside of sports,
what does regression to the mean mean

Nick Berry: Yeah,

Scott Berry: to somebody who's doing
personal finance and buying stocks?

Nick Berry: yeah.

Yeah.

I think the general lesson that I
would say to this sort of lay person

is that anytime you observe
something extraordinary.

And you get, uh, something that's,
you know, you've never seen before,

like this is an amazing result.

You just smooth over that and you
realize that there were maybe a lot

of opportunities from other places
to observe this amazing result.

There's a lot of stocks that could
have performed really well, and when

you start to, to base decisions off of
recent amazing things that happened.

You are inevitably setting yourself up
to be disappointed if you expect that to

be reproduced and to happen over again.

So when you observe something amazing,
just smooth over it, realize that it

might be good, but it's probably not
actually a world beater in that case.

And so just temper your expectations
every time you start to

make predictions based on past data.

Scott Berry: So if, if there are 50 stock
brokers in a company and in a particular

year, one person is the best stockbroker
and they earn a certain percent.

Nick Berry: Yeah.

Scott Berry: Say the best would be 30%

Nick Berry: Huge.

Scott Berry: over the course of a year.

And then the next year they don't do 30%.

Nick Berry: Yeah, they fell off.

Yeah.

Scott Berry: Are they lazy?

They, you know, the,
the media scrutiny got

Nick Berry: Yeah.

Scott Berry: Yeah.

It's that they actually were never
that good, and the data got

them there through randomness.

Now they may be better than average.

How much

Nick Berry: Yep.

Scott Berry: depends on the
variability in stock pickers and the

variability in a particular year.
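
The stockbroker story can be simulated the same way. The skill and luck distributions below are invented for illustration: skill varies a little across brokers, yearly luck varies a lot.

```python
import random

random.seed(7)

# Hypothetical: 50 brokers whose true average annual return (skill) varies
# a little, while year-to-year luck varies a lot.
skills = [random.gauss(0.08, 0.02) for _ in range(50)]
year1 = [s + random.gauss(0, 0.10) for s in skills]

best = max(range(50), key=lambda i: year1[i])
year2 = skills[best] + random.gauss(0, 0.10)  # next year: same skill, new luck

print(f"best broker's year 1 return: {year1[best]:.1%}")
print(f"best broker's true skill:    {skills[best]:.1%}")
print(f"best broker's year 2 return: {year2:.1%}")
```

The year-1 winner's return is almost always well above their true skill, because being the best of 50 mostly selects for good luck; next year the luck redraws and the return regresses toward their skill, no laziness or media scrutiny required.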

Nick Berry: Yep.

Scott Berry: The lessons are in
drug development, the lessons are

in sports, in picking stocks.

So this was our lessons learned for drug
developers from the world of Sports.

Episode one we

Nick Berry: Yeah.

We'll see you later.

Scott Berry: Yeah, we have more to come.

So.

Uh, Nick, thanks for joining me.

And

Nick Berry: Thanks for having me.

Scott Berry: we are in the interim.
