Episode 24

STEP Statistical Modeling

August 11, 2025 · 33:43

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott Berry: All right.

Welcome everybody.

Back to In the Interim. Today I
have a really, uh, cool topic.

Uh, I was really excited about this
and the, the topic is a new kind of

trial design and we'll get into what's
new about it, but the old version of

powering the trial at 90%, running
it, and finding out whether the trial was

successful or not doesn't work here.

So we'll introduce that in a minute,
but first I want to introduce my guests.

So, uh, two guests from
Berry Consultants today.

Uh, Liz Lorenzi, Dr.

Liz Lorenzi, a senior statistical
scientist at Berry Consultants, and Dr.

Amy Crawford, a statistical
scientist at Berry Consultants.

Welcome both.

Liz Lorenzi: Thanks.

Thanks for having us.

Scott Berry: So let's start,
let's start with introductions.

Um, I was told that we started one of
these podcasts and I didn't do a very

good job of introducing our, our guests.

So let's, let's do a little
bit of introduction, Liz.

Um, tell us how long
have you been at Berry Consultants?

Liz Lorenzi: I've been at
Berry, uh, I think six years.

Actually.

I started on July 1st, so I just
hit my six year anniversary.

Uh, before then I was at Duke
doing my PhD in statistics.

Um, and since being at
Berry, I've been really enjoying

participating in lots of different
clinical trial designs, all of which

are adaptive, innovative, exciting.

I have been a part of a lot of design
teams for platform trials, so I have

found a particular interest in working
on the design of platform trials,

which I think we'll be discussing
an example of one of those today.

Scott Berry: Yeah.

Yeah.

Very nice.

And, uh, so, uh, Liz, do you
consider yourself a Bayesian?

Liz Lorenzi: Oh, I do think
I'd consider myself a Bayesian.

Yeah.

I think I have to say that going
to Duke for my PhD, but, but I do

think, uh, of myself as a Bayesian.

Mm-hmm.

Scott Berry: and you'll get a flavor
of that, I think today as well.

And Amy, um, uh, tell us a
little bit about yourself.

Amy Crawford: Sure.

So, um, I've been with Berry
for just over five years.

I started almost exactly a year after Liz.

Um, I've also been working on
some platform trial designs.

Um.

But I've also been working in
stroke quite a bit, um, which I think

we're, sorry, Scott, teasing the topic,

um, gonna talk a little
bit about a stroke trial today.

So it's a particular clinical interest
that I've had since starting at Berry.

Um, before I was at Berry, I was
doing my PhD at Iowa State University

working on something very different.

Um, not as Bayesian of a, of a
group there, but I, I think I

also consider myself a Bayesian.

Um, yeah.

Scott Berry: Yeah, and, and,
uh, we'll, we'll see some of

that flavor today as well.

Okay, so the topic for today
is part of the STEP platform, and a

shameless plug for everybody
to go back and listen to the podcast

with Eva Mistry and Jordan Elm.

We go through the big
picture of the STEP platform.

It's a fabulous effort
and it's a really nice

example of the potential of what platforms can be.

And I think the NIH is actually
a place that platform trials

can be incredibly powerful.

So let me not introduce
this and, Liz, do you want to tell

us, just overall, about the STEP platform,
the goal of the STEP platform?

Liz Lorenzi: Sure.

So STEP is a platform trial in
acute stroke, and this is funded

by the NIH and kind of organized by
a big group of stroke researchers

under the umbrella of StrokeNet.

And the goal of this platform trial
started, in one of its areas, as

answering the question of
what population does EVT benefit.

So about 10 years ago, they learned
that this therapy called endovascular

therapy, it's a, a clot clearing
device, um, was very effective in

a particular patient population.

And since then there have been numerous
questions of whether we can

expand that patient population and try
to learn whom, uh, EVT can benefit.

And so one of the major questions
under the STEP, uh, platform is

to learn which patient population
EVT can be beneficial for.

Uh, there are other questions
that STEP answers, um, more on

the medical, uh, therapeutic side,

so thinking of

other treatments that are a little
different, in the question of maybe

more traditionally trying to answer,
is this therapy better than a control?

Um, but I think the one that we're going
to focus on today is this question of EVT.

Um, and it's a very cool structure
to have a platform trial where we're gonna

have this master protocol, uh, guiding
the enrollment of patients, guiding

the inclusion, exclusion, guiding
kind of, uh, organizing sites in

the infrastructure of the trial.

And adding a lot of efficiency so
that we can answer this question.

Scott Berry: So in the STEP
platform, it's structured so

an individual, a team, can propose
a question to the platform.

The platform has this larger master
protocol aspect and questions get asked.

And a huge goal of this as, as Liz
described, is finding out who does and

doesn't benefit from endovascular therapy.

And we're gonna say EVT
over and over again.

So this is the, the, the
clot clearing device.

Within the structure of this,
when this question comes into the

platform, there's a design team.

So the design team is part of the master
protocol, and it involves, uh, scientists,

stroke neurologists, statisticians.

So the three of us are
part of this design team.

There are, there are many
others within this design team.

Uh, several others at Berry
Consultants, several at Kansas,

the University of Kansas Medical Center, uh,
several individuals from there,

and Jordan Elm at MUSC with that.

And so this is, this is the design of
that part within the master protocol.

So the setting is that

trials historically were run
where they really optimized.

If you're running a trial and you wanna
show endovascular therapy is beneficial,

historically, they ran trials in
those patients that were really, from

their perspective, the most likely
to benefit from endovascular therapy

relative to the control. The control here
is just a background medical

management without the, the device.

And so they ran trials in that
population that they thought was optimal.

They powered it for that population and
demonstrated endovascular therapy was good,

and that's within a certain population,
and it was incredibly effective

in that.

So now I think what
neurologists in the medical community

are stuck with is, okay, here within
this optimized population, this

therapy's tremendously beneficial
in the neurological status of patients

90 days afterwards. It's
incredibly beneficial.

So let's figure out, in those where we
don't know if it's beneficial,

um, whether it's beneficial.

So now the first question came
in, the one we're really gonna, uh, focus on.

We'll focus on the first question:
in large vessel occlusions it was

generally thought that EVT was beneficial,
but how about smaller size strokes?

So can you set up a little bit of the

patient population, and I know
we're all statisticians here, Amy,

but a little bit of the patient
population where this question now

is, does endovascular therapy work?

Amy Crawford: Sure.

Yeah.

So like you were saying, Scott, there's,
there's this, um, set of patients where

we know the answer and I like to think
about it as you can sort of draw a box.

Around, around that
set of patients, right?

You present with a
stroke at the hospital.

Um.

And based on how you present, you know,
the doctors figure out whether

you should get EVT or should
not, and that indication

is relatively narrow at the moment.

Um, and so what we're looking at here
is for smaller vessel occlusions.

Um, I think, uh, we're, we're
thinking about what we call

medium vessel occlusions.

What about patients that
are just outside the box?

What if you present to the hospital
and you have a baseline characteristic

of your stroke that doesn't quite fall
inside the box, a medium vessel occlusion,

uh, you fall just outside the box.

And the question is, you know, should
you get, should you get, um, EVT or not?

And that's kind of the, the, the
question we're trying to answer.

So these patients, um, where we're, where
we're trying to answer this question for,

um, they, they have a little bit, um, uh.

Sorry.

Their, their strokes are a little
bit different than those that

are part of the indication.

And I'm, I'm gonna fumble
the, the clinical piece here.

Scott Berry: And that's okay.

You get a lot of
leeway, uh, in the clinical piece.

Yep.

Amy Crawford: So, um, there are
a couple baseline characteristics that

we look at, um, for strokes when
we're trying to answer this question.

Uh, one of them is time last known well.

So patients who present earlier,
um, after they were last

known well with their stroke.

Um, we think that maybe they, they benefit
more from EVT, um, in this population

than patients who present later.

And so the question is, you
know, what should we do for

patients that present later?

And there's another baseline indicator
called, um, the NIH Stroke Scale.

Um, and so on this NIH stroke
scale, um, you, you present

with a stroke, you get a score.

So certain patients with certain
scores are indicated for EVT in this

population with medium vessel occlusions
and certain patients aren't.

So the question that we're trying
to answer for these patients is,

um, you know, what, what kinds
of baseline scores on this scale

should we give, give the therapy to.

Scott Berry: Okay, so let's set
the problem up here. We're now

going into medium vessel occlusions.

We're enrolling people.

Let, let.

I know papers have come out
that have changed this, and

so we're updating the design.

But let's go back to the first version
of the design for, for simplicity,

where it was medium vessel occlusions
and the thought was that

the relative efficacy of
EVT compared to medical management is very

likely to depend on the clinical

symptoms, which is the NIH Stroke Scale.

So this is a stroke scale from
one to 30, say, uh, I think it

actually goes a little bit higher
than that, but those are very rare.

Was it 28 maybe?

Uh, that's the scale. So this
is their range

of symptoms, and really the size.

This is a proxy:

for a medium vessel occlusion
stroke, the clinical symptoms are maybe

telling you how much brain there is
to save through endovascular therapy.

Now, all of these patients
are within 24 hours of

last known well. So the statistics
problem here, the trial design, is we

wanna find out who in this range, as
a function of their stroke scale from

one to 28, benefits from EVT or not.

Now what would be wrong, Liz,

with us running a traditional trial
and saying, let's enroll everybody

in there and do a test, whether EVT
is beneficial in that population.

Liz Lorenzi: Yeah, so you know,
we believe, we know, part of

this population, you know, from
these previous trials, benefits.

If we just say, let's expand, run a
giant trial, randomize 1:1, and at the

end test, we would likely get
a null result because likely there are

heterogeneous treatment effects within
this population, where some are going to

benefit, some will likely not benefit
and medical management might be better.

And so what we really wanna
do is try to learn, um, based

on patient characteristics,

which
patients are benefiting from EVT.

So if we think about this NIHSS
as one of the covariates we could

use to try to stratify the patient
population, we might wanna learn

within each bin of NIHSS what the
treatment effect of EVT might look like.

Um, but the question isn't necessarily
what is the treatment effect in the bin.

But really it's, is EVT better than
medical management, and it's more of a

generic question of, you know, which
of these bins should benefit, which

of these bins may not benefit?

Um, so it's really kind of
learning where that change might

occur in the patient population.

Scott Berry: Yep.

So you, you said the word change, which
is gonna be a key word, let's not,

maybe not jump to change quite yet.

Um, within this, so if we enroll
this entire population and we

run a test and find a null result,

we might say, oh, in this population
there's no benefit where there

may be people in that population
that benefit a great deal.

So the mathematical problem here is we
have a scale from one to 28, and we're

trying to find out for those patients who
on that scale, benefits and who doesn't.

Here's what somebody could do to attack
this problem if we're stuck in the old

way of running a clinical trial, where
we run a trial and we test a hypothesis

and it's only upon rejecting the null
that we say, oh, EVT is beneficial, which

was done in the first ones.

We might go at this and say,
let's just take the big ones.

And this was largely what, when
people wanted to originally

show it was beneficial.

They did that, but they
took the big strokes.

And so let's take everybody that's 20
and above and test to see if EVT's better

in that scenario, and then if that's
successful, maybe then we go down to 15.

You can imagine how slow science
would move if we're restricted to running

these yes/no questions in the trial.

So we're gonna do something different in
this trial, and we are going to enroll

everybody in this patient population,
and we're gonna try to estimate who's

better and where does EVT work or not.

Now we're gonna use the
scientific information that on

the bottom part of the scale,

um, medical management is likely to
be better, and on the top part of the scale,

EVT is likely to be better, and
somewhere there's a change point here.

So tell us about doing this with a change
point model in trying to estimate that.

Amy Crawford: The idea here
is that, um, we want to figure

out where to draw the line.

Who should get EVT, who should not?

And so the idea is if we enroll
everybody, um, across this scale

where we don't know the answer,

we can let the data inform
where the line should be drawn.

And we do this in a structured way
because as Scott said, we know that

on the higher end of the scale,
somewhere out there, EVT is beneficial.

And the question is, you know, how far
down can we drag that line down the scale

to where EVT is still beneficial for
patients that lie above the boundary.

And, and we let the data
tell us what that looks like.

Um, and, and the way that we do this, like
Scott said, is with a change point model.

And what the change point model
does is it, it looks for that single

point of where we draw the line,
where we, where we draw the boundary

to expand the indication of EVT
along this baseline covariate scale.

Um, and the data inform that.

And so the really beautiful
thing about a change point model

is, um, it allows us to make these
informed decisions in a structured

way, so we don't have to test, you
know, you could think,

um, you know, maybe if you're coming
at this, um, for the first time and

you wanna say, oh, does it work

in

bin, you know, 28, does it work in bin 27?

Does it work in bin 26?

You know, you can think about,
well, let's test it in bin 28.

Let's test it in bin 27.

Let's test it in bin 26.

It's going to be another extremely
inefficient way of doing this, right?

And so using all of the data and, and
the neighboring bins inform each other,

all, all of that data to figure out
where the single point, the line should

be drawn, um, is just a really nice
framework for how we answer the question.
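
The single change point idea Amy describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the trial's actual model: the outcome is treated as a binary "good outcome," the size of the treatment gap (`delta`) is fixed and known, the prior on the change point is uniform, and the six bins of data are made up. All function and variable names are hypothetical.

```python
import math

def arm_loglik(good, n, p):
    # Binomial log-likelihood for one arm in one bin (constant term dropped).
    return good * math.log(p) + (n - good) * math.log(1.0 - p)

def change_point_posterior(evt_good, mm_good, n, base=0.5, delta=0.2):
    """Posterior over one change point c on a binned scale, uniform prior.

    Assumption: bins with index >= c favor EVT and bins with index < c favor
    medical management (MM), each by a fixed, known gap `delta`.
    """
    k = len(evt_good)
    logp = []
    for c in range(k + 1):  # c == k means "no bin favors EVT"
        ll = 0.0
        for i in range(k):
            sign = 1.0 if i >= c else -1.0
            ll += arm_loglik(evt_good[i], n, base + sign * delta / 2)
            ll += arm_loglik(mm_good[i], n, base - sign * delta / 2)
        logp.append(ll)
    top = max(logp)  # subtract the max before exponentiating for stability
    w = [math.exp(v - top) for v in logp]
    total = sum(w)
    return [x / total for x in w]

# Made-up data: 6 bins, 100 patients per arm per bin, good outcomes per arm.
evt = [40, 42, 41, 60, 58, 61]
mm = [60, 58, 59, 40, 42, 39]
post = change_point_posterior(evt, mm, n=100)
best_c = max(range(len(post)), key=post.__getitem__)
```

Because every bin contributes to the likelihood of every candidate change point, neighboring bins inform each other, which is the efficiency gain over testing bin by bin.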

Scott Berry: So within the model, if
we learn and we get a lot of patients

near 15, 16 and EVT is doing better,

it tells us that change point is to the
left of that, and 20 is also better,

25 is also better.

So in some way, we're trying to figure
out this one place where it shifts from

medical management to EVT in the setting.

Now, um, we, we use a
Bayesian model for that.

Um, not surprisingly, I'm
talking to two Bayesians here.

Um, they wouldn't let me get away
with a frequentist model here.

So we're using a Bayesian
model, uh, in this setting.

Now, Liz, it didn't really work
when we had a single change point.

And you know, we thought of that,
that break point of where it goes.

Why didn't a single change point work?

Liz Lorenzi: Yeah, the, the single change
point assumes that to the right there will

be a benefit of EVT, and to the left there
will be a benefit of medical management.

And what we were learning was
that it might be a little bit more

nuanced than that, where there
might be a subset of patients where.

There's essentially no
difference of the two therapies.

It's essentially like a, a coin flip.

And so by changing it from one
change point to two, we're allowing

ourselves to kind of find this period
or this, this area of somewhat equality

or a similar effectiveness, and then
we're allowing to say to the right of

that, that's where the EVT would be.

So to the right of the right change
point, that's where EVT is better.

Between the right and the left
change point, they're equal.

And then to the left of the
left change point, we think

medical management is better.

So it allows us a little bit more
flexibility to add that kind of piece

where we don't actually think there's
a big difference between the two.

Scott Berry: So we were simulating and
we discovered a bit of the issue with

one, and in part it was the, uh, the, the
old bugaboo for us, the null hypothesis.

So if you simulated a scenario
where it's flat

across the entire stroke scale,

the model doesn't really work well,
and you can imagine it's, it's

almost non-identifiable that the
change point could end up any one

of these places and maybe it ends
up right in the middle of that and

it, it, it somewhat is misinformed
by saying it has to be positive on

the left and negative on the right.

I think I said that backwards, but, uh,
that it's, it's beneficial on one side

or the other when there is this range.

So when we simulated that, we found
it really didn't work very well.

So we created two change points.

Amy.

Yeah.

Amy Crawford: Yes.

Yep.

So we created two change points and,
and what happens then in the null, um,

scenario that you were just describing,
Scott, is it allows those two change

points to, to scoot away from each other
if that's what the data are, are saying

should be the right answer and, and
allows for that region of equivalence

or equality that Liz was mentioning.

That we would estimate between
the two change points to cover

the entire scale, um, of that
covariate that we're modeling over.

And so it, it allows us to handle
this null scenario, um, much more

intuitively than if the model were to
have to just try to figure out where

a single change point would need to be

when the data look the same
across the whole thing.
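
A rough sketch of the two change point version shows why it handles the flat scenario. The assumptions are the same as before and equally hypothetical: binary outcomes, a fixed known gap `delta`, a uniform prior over ordered pairs, and invented data. Bins between the two change points are modeled as having no treatment difference.

```python
import math
from itertools import combinations_with_replacement

def arm_loglik(good, n, p):
    # Binomial log-likelihood for one arm in one bin (constant term dropped).
    return good * math.log(p) + (n - good) * math.log(1.0 - p)

def two_change_point_posterior(evt_good, mm_good, n, base=0.5, delta=0.2):
    """Posterior over ordered change points (lo, hi), uniform prior.

    Assumption: bins below `lo` favor MM, bins at or above `hi` favor EVT,
    and bins in between have no treatment difference.
    """
    k = len(evt_good)
    logp = {}
    for lo, hi in combinations_with_replacement(range(k + 1), 2):  # lo <= hi
        ll = 0.0
        for i in range(k):
            sign = 1.0 if i >= hi else (-1.0 if i < lo else 0.0)
            ll += arm_loglik(evt_good[i], n, base + sign * delta / 2)
            ll += arm_loglik(mm_good[i], n, base - sign * delta / 2)
        logp[(lo, hi)] = ll
    top = max(logp.values())
    w = {cp: math.exp(v - top) for cp, v in logp.items()}
    total = sum(w.values())
    return {cp: x / total for cp, x in w.items()}

# Made-up null scenario: identical outcomes in both arms across all 6 bins.
flat = [50] * 6
post = two_change_point_posterior(flat, flat, n=100)
best = max(post, key=post.get)
```

In this flat example the most probable pair puts one change point at the bottom of the scale and the other at the top, so the estimated region of equality covers every bin.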

Scott Berry: And then
we really care about the

place where EVT's better.

If they're the same, you
wouldn't do this invasive device.

So it's somewhat of
a statistical and clinical

better fitting of the data,
and it functions much better.

Another value of clinical trial simulation
to carry out a range of scenarios, even

if we don't think it's very likely.

And, and, and we've created
a better model in that.

Okay.

Tell, tell me a little bit about the code.

Um, and, and, uh, I think, uh,
Amy has carried the brunt of

the, the, the code building here.

Um, now can we just create our code
to run this, to run simulations?

That didn't really work.

Amy Crawford: Yeah,
that didn't really work.

Um, so.

When we write code in R or, um,
one of these other languages, what we do

when we run Bayesian models is we tend
to use some built-in packages.

Um, statisticians listening have
probably heard of Stan or JAGS.

Um, and these built-in packages
don't necessarily facilitate,

uh, learning the type of model
that we're describing here.

Um, and so we've had
to create custom code.

We, we wrote our own, um, our own sampler,
our own, our own code that fits this

model, um, to run custom for this trial.

Mm-hmm.

Scott Berry: So, and we wrote that, we, uh,

you wrote it in a lower level language.

You wrote it in C to then be called for
simulations, um, within that

circumstance, and it's a Bayesian

model using Markov chain Monte Carlo, where
the unknown is the location

of the change points within
that beautiful, beautiful code.

Uh, and, and there's, I'm ignoring a good
bit of work to get to the right place and

all of that, which was, which was fun.

Um, uh, in that.

So now we're running
the trial design.

In these circumstances,

as the data start to
come in, we don't want to enroll

2,000 patients and then analyze this.

So that's the model that will be fit.

But what do we do with the design?

So what's the design that
goes with that model, Liz?

Liz Lorenzi: Yeah, so
this is in an adaptive

trial framework.

So through the STEP platform,
we'll have interim analyses that occur.

At that time we would gather the current
data, run the analysis model, and then

evaluate against some pre-specified
decision rules to just see if there's

any conclusions that could be made.

Um, and what's nice about this is, you
know, we have a lot of information on

the edges, like we've talked about.

So.

You know, we may not need that
many patients to inform those bins

that are, you know, right outside
that box that Amy described.

And so by doing these adaptations, we
may be able to get some, uh, results

and conclusions out more frequently,
change, uh, practice, um, facilitate,

you know, better learning of this
question through these adaptations.

So, um, I think we have them
set to be done quarterly.

So as we're enrolling the trial every
quarter we would gather the data.

Run the model.

Um, and that would be done by a
blinded team, so, or an unblinded team.

So we will have a separate team
from the three of us that are the

Berry design team that will
be, um, unblinded to the data and

firewalled from us on the design part.

And they would be the ones that
get the data, run the model.

And if there are decisions, they
could then make a public announcement

through the DSMB, et cetera.

So I think that's currently the,
the way that we have this set up.
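
One way to picture an interim analysis like the one Liz describes: take the current posterior over the two change points and, bin by bin, ask how sure we are. The sketch below is an assumption-laden illustration, not the trial's pre-specified rules. The 0.95 threshold, the function name, and the example posterior are all hypothetical.

```python
def bin_decisions(post, k, threshold=0.95):
    """Turn a posterior over change points (lo, hi) into per-bin interim calls.

    post: dict mapping (lo, hi) with lo <= hi to posterior probability.
    A bin favors EVT when hi <= bin index and favors MM when lo > bin index.
    The threshold is an illustrative value, not the trial's actual rule.
    """
    decisions = []
    for i in range(k):
        p_evt = sum(p for (lo, hi), p in post.items() if hi <= i)
        p_mm = sum(p for (lo, hi), p in post.items() if lo > i)
        if p_evt >= threshold:
            decisions.append("EVT")       # conclude EVT, stop enrolling bin
        elif p_mm >= threshold:
            decisions.append("MM")        # conclude MM, stop enrolling bin
        else:
            decisions.append("continue")  # keep randomizing this bin
    return decisions

# Made-up interim posterior on a 6-bin scale.
interim_post = {(2, 4): 0.97, (3, 4): 0.03}
calls = bin_decisions(interim_post, k=6)
```

The bins at the edges of the scale reach a conclusion first, and enrollment narrows toward the uncertain middle, which matches the behavior described in the conversation.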

Scott Berry: So this is, I mean, this
is really cool in the way of, we,

we've got this model and as patients
come in, it's a, it's a wider frame

of, we don't know who EVT benefits and
who doesn't, but as soon as we learn

that, we stop enrolling that group.

We announce that, uh, for patients with
medium vessel occlusion with NIH Stroke

Scale above 18, EVT is beneficial.

We publish on that and meanwhile, those
patients aren't enrolled and we're

narrowing in across this scale on who
should get each particular treatment.

And one of the things that was wonderful
when we were doing simulations of this and

we're characterizing how well this worked,
is it's not, it's not just about power.

But what fraction of the patients, when
the trial's over are we going to treat?

Well, and how are we going
to treat patients when this

trial's over across this scale?

Now we, we might give
them medical management.

We might give them endovascular therapy.

We might give them the wrong
therapy that they would've

benefited or they're harmed by it.

And we can create scenarios of truth,
run this trial, and then evaluate

how are we caring for patients when
the trial's over, which is such

a cool operating characteristic,

and so

targeted to the trial.

Uh, so that, that was, there were many
of these, but that was one of the cool

things we evaluated in the simulations.
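
The "how are we treating patients when the trial's over" operating characteristic can be sketched as a simple score for one simulated trial. Everything here is a hypothetical illustration: the function name, the coding of the truth, and the example values are invented, and bins where the two treatments are truly equal are counted as best served by medical management, following the remark that you would not use an invasive device when there is no difference.

```python
def fraction_treated_well(truth, recommended, weights=None):
    """Share of future patients who get the better treatment after the trial.

    truth[i]: +1 if EVT is truly better in bin i, -1 if MM is, 0 if equal.
    recommended[i]: "EVT" or "MM", the practice adopted after the trial.
    weights[i]: relative number of patients in bin i (defaults to equal).
    Assumption: when the truth is 0, MM counts as the right choice.
    """
    if weights is None:
        weights = [1.0] * len(truth)
    good = 0.0
    for t, rec, w in zip(truth, recommended, weights):
        best = "EVT" if t > 0 else "MM"
        if rec == best:
            good += w
    return good / sum(weights)

# One made-up scenario of truth and one simulated trial's recommendation.
truth = [-1, -1, 0, 1, 1]
recommended = ["MM", "MM", "MM", "MM", "EVT"]
score = fraction_treated_well(truth, recommended)
```

Averaging this score over many simulated trials per scenario gives a summary that speaks to patient care directly, alongside power.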

So science pulled a, pulled a bit of a, a
switcheroo on us and, uh, this is a very

active area of research, not surprisingly.

And so new trials have come out where.

Uh, medium vessel occlusion
trials have come out.

We're really questioning whether EV
t's beneficial, but it has given us new

information in this, in this population
that a key parameter is very likely time

last known well, how acute is the injury
is probably related to endovascular.

So now this has become.

Not one dimension of NIH stroke
scale, but two dimensions in it.

And we have a new submission to STEP,
and I don't know how public that is, so

I won't describe the clinical context,
but that one is also endovascular

therapy versus medical management.

And it's within the realm of the
patient population we don't know,

which is also two dimensional.

And it's the same structure
that on one side of the scale.

Uh, we know EVT works and in the other
side we know, uh, uh, at the very extremes

that medical management is better off
of the, the patients we're enrolling.

So we know that direction
in two directions.

So this is, you know, we know up
and to the right is one treatment and

down and to the left is the other one.

We just have no idea where
the break is in this.

So now the model's gotten a
little bit more complex, Amy.

Amy Crawford: It has.

Yeah.

So, um, our model now actually
has four change points.

Uh, like you kind of described Scott.

We're, we're working on a grid.

Um, and so across NIH Stroke
Scale, we're trying to figure out,

um, within patients who present

early, you know, it's
less time since last known

well, um, you know, maybe they
should be treated differently than

patients, uh, that present later.

Um, you know, it's
been a longer duration since

their time last known well,
since they've had their stroke.

And so what we're doing now is
we're trying to draw lines

on a two dimensional grid instead of a
one dimensional, um, baseline covariate.

And the way we do that is with
four change points, and they're

restricted and, and, and they,
they move around with each other.

But I think the big kicker
with this, uh, more complex

framework is that it follows
clinical belief and knowledge.

And we've been able to adapt and update
what we're doing based on what is being

presented in, in changing practice.

And so now we're, you know, we're,
we're shifting our, our approach

and it still is going to follow
this structured data-driven

decision-making framework, and the model

matches the decision framework that
the clinicians and the medical world,

you know, would expect to see based
on, based on these recent findings.
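
The restriction on a two dimensional grid can be pictured as an ordering constraint on the recommendation in each cell. The sketch below only checks such a constraint; it is not the four change point model itself. The direction of the ordering (higher NIHSS never switches EVT off, later presentation never switches EVT on) is an assumption drawn from the earlier discussion, and the grids are invented.

```python
def is_monotone(grid):
    """Check that a 2-D recommendation grid respects the assumed ordering.

    grid[t][s] is 1 if EVT is recommended and 0 otherwise, where t indexes
    time since last known well (increasing) and s indexes NIHSS bins
    (increasing). Assumed ordering: moving to a higher NIHSS bin never
    switches EVT off, and moving to a later time bin never switches EVT on.
    """
    rows, cols = len(grid), len(grid[0])
    for t in range(rows):
        for s in range(cols):
            if s + 1 < cols and grid[t][s] > grid[t][s + 1]:
                return False  # EVT dropped at a higher NIHSS bin
            if t + 1 < rows and grid[t][s] < grid[t + 1][s]:
                return False  # EVT appeared at a later presentation time
    return True

# Made-up grids: rows are early, middle, late; columns are low to high NIHSS.
staircase = [[0, 1, 1],
             [0, 0, 1],
             [0, 0, 0]]
broken = [[0, 0, 1],
          [0, 1, 1]]
ok = is_monotone(staircase)
bad = is_monotone(broken)
```

A constraint like this is what lets cells borrow information from their neighbors: evidence that EVT helps in one cell also says something about the cells that are earlier and more severe.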

Scott Berry: Yeah, that
is, that's so cool.

Um, and now Liz, you are going to jump
into one where it's not necessarily

two change points, but it's a

change curve perhaps in two dimensions.

And so this, this idea was
presented just this week that

we have a two dimensional space.

Think of, uh, quantitative variable
on the x axis, quantitative

variable on the Y axis.

And we

know what's in the upper left and
we know what's in the bottom right, and

there's a curve that we now have to fit,
and there's monotonicity across this.

So you're jumping into a pretty cool
problem with a bit of uncertainty

how this is all going to behave.

Liz Lorenzi: Yeah, and I, I think
what we learned from Amy's, you know,

development of the original change point
model is that this really needs to be

simulated and it needs to be discussed
and, you know, constantly put in front

of the clinicians to understand what
they're thinking and what they're

expecting, and how we can make sure
the model aligns with that question.

I, yeah, I'm not sure if I know exactly
what that model's going to look like,

but I think, you know, as we continue
discussions, we'll have a, a clearer

picture and simulation will be our, our
friend in these, in these conversations

to make sure that we get everything right.

Scott Berry: Yeah, we would be
incredibly fearful if we just kind of

put a model together and the first time
we use that is in the actual trial.

So we simulate millions of data sets.

We see what it looks like under a range of
scenarios and make sure it functions well.

Because we don't want the first
time we use it to be the real trial.

We want that to be the ten million
and third time that it's used.

It's just been used on
the simulated data sets.

So we will, we will, we will
drive you crazy Liz, by test

driving this over and over again.

But it's sort of the fun of science here.

All right.

So we, we, while they are
enrolling, actually, I, I, I

believe there's a bit of a pause
in enrollment, um, in this setting.

So we are actively working on this.

The STEP platform is a
very cool platform.

Uh, by the way, if that kind
of modeling interests people,

there's another NIH trial, ICECAP,
that just stopped enrolling that has

analogous models.

It's a different model,
but it's an analogous model.

And that will be reading out
sometime in the next few months

if that's kind of interesting.

And then hopefully we'll
get STEP results out.

So, uh, Liz and Amy, thank you for joining
us in this weird place in the interim

and sharing this really cool modeling.

Liz Lorenzi: Yeah.

Thank you so much for having us.

Scott Berry: Yeah.

Amy Crawford: Thanks Scott.

Scott Berry: I appreciate it.

So everybody out there, thanks for
joining us here in the interim.

Until next time.
