External Data in Clinical Trials · Episode 8 · 28:35

Judith: Welcome to Berry's In the
Interim podcast, where we explore the

cutting edge of innovative clinical
trial design for the pharmaceutical and

medical industries, and so much more.

Let's dive in.

Scott Berry: All right, welcome.

We are back.

Uh, In the Interim. This is Berry Consultants' podcast of all things statistical, all things scientific with clinical trials, uh, medical decision making.

We have an interesting podcast today in that I don't know the topic. My co-host on In the Interim, uh, Kert Viele, is here, and he has a surprise topic to talk about today.

Kert.

Kert Viele: Hey Scott.

Um, alright, so I wanna
start with a story.

I went to a conference not that long ago, a standard statistical conference.

People are talking about the data they had, all the methods that they've been developing to understand it, what they've learned from the data. And essentially all of these talks had a really interesting structure: they get through all their methods and everything, and every single talk ended with, if only I had gotten a chance to design the experiment, to do this in advance, everything would've been better.

I would've avoided all
these problems and so on.

And so I'm listening to all these talks, and when I get to my talk, of course, after listening to all this, I started it with: I live in utopia.

And then I talked about experimental design.

And so I, I get back in the car on the,
on the way home, and I'm sitting here

thinking about, so am I really in the
good place or am I in the bad place?

Because one thing that's always impressed me about the last 20 years in experimental design is all of these methods that we're developing to understand data, causal inference, all of these aspects of how we make inferences from the data in front of us.

We don't use those in experimental design. We often have an idea that when we design a trial, we're supposed to ignore every bit of data that's ever existed on Earth.

If I go to my doctor and I ask,
why are you giving me a drug?

They're gonna report lots and lots of
studies that say, this drug is good.

If I go into an experiment and
say, I want to use this data,

I'm immediately told it's bad.

And it depends on what information I'm borrowing. I've got a drug that's targeted for a specific mutation. I have data on thyroid cancer. Can I use that for lung cancer? Generally, the answer is no.

If I wanna borrow historical data from
old clinical trials, Alzheimer's, we

have hundreds of thousands of patients
that have been treated on placebos.

Can I use it?

Well, there's problems with that.

Same with real-world evidence. So I would actually define the standard debate that we've been having for the last 15 years as essentially: is data good or is it bad in experimental design?

So does it lead us to a
good place or a bad place?

So I'd like to talk about where
we are, how we got here, and

where we think we're going.

So I'm gonna leave it up to you. I've surprised you with the topic; let's see what your reactions are.

Scott Berry: Well, well, let's, let's sort of figure out the topic and the interesting part of this.

So what, when you say is data good or bad, do you mean that when I design an experiment, I'm saying I want to collect the following data and I can analyze just the data in that experiment, which we've been doing for a long time? Uh, frequentist, uh, rarely Bayesian, but we do that.

You're saying, by data, is that outside the experiment? Is there any room in that experiment for external...

Kert Viele: Yes.

So what do we do with essentially the totality of human knowledge prior to the experiment? Should that enter into the experiment in any way, or do we want to avoid this?

I mean, there's lots of rhetoric we could do with this. If you want to say bad things about this, you talk about biases and confounding and everything else. If you wanna say good things, you talk about the totality of the evidence. But really, let's get at, you know, what is gonna lead us in good and bad directions?

How do we decide when to
do this and when to not?

Scott Berry: Yeah.

So let's, let's lay out the why not. I mean, what are the reasons people give that we shouldn't use other data in our experiment?

Kert Viele: The obvious answer would be it could lead us in the wrong direction.

So the, uh, old experiment might have been done with a different set of patients at a different time. There are key differences between that old data and what my current experiment is. If I use the old data in my new experiment, I can get, uh, answers that are biased in certain ways.
I can draw wrong conclusions.

I can say drugs work that don't simply
because of the information I'm bringing

in rather than the experiment itself.

Scott Berry: So is this, is
this a frequentist issue?

Is this that we calculate the operating
characteristics of the new experiment?

Type one error is only the new experiment.

And if I use any data outside
of my experiment, you can

inflate type one error.

You can get bias, all these
bad terms, uh, uh, in it.

Is it a frequentist problem that, with data outside the experiment, all of a sudden type one error means something very different, uh, bias means something very different, uh, that I can't use outside stuff and still be a frequentist?

Kert Viele: So I don't know if it's that. I think there is a frequentist-Bayesian divide. I think Bayesians are trained with the idea of collect some data, update your beliefs, collect more data, update your beliefs, and it's a natural thing to bring things in.

But I don't know that I would say the problem is being a frequentist in the sense of, do you care about long-term error rates? I would lay the problem at the feet of, you have to have this 2.5% type one error in the worst-case scenario.

So in effect, we're playing minimax against nature. We have to assume that nature has deceived us for the last 25 years, and in that case, we shouldn't use the data.

So, uh, on the one hand, that delivers
an immense amount of robustness.

On the other hand, we're
reinventing the wheel.

Scott Berry: Yeah.

Uh, so let's, let's talk about a case where we would use external data, and we have, uh, we've designed trials where we've used external data, what that looks like, and what could be the potential problem.

So, um, suppose I've got results from another trial, a phase two trial, about the relative efficacy of a treatment, and I bring it into my next experiment. I use that as prior knowledge, and when I'm done with my new experiment, I combine them together.

Or I bring in previous
data on the control arm.

Uh, I'm comparing to an active
comparator, something that's approved.

It's standard of care, and I wanna
run a new experiment of my new drug.

And I want to compare to standard of care.

There are lots of trials and lots of
information about standard of care.

Why do I have to run a one-to-one randomized trial of standard of care against my therapy when I know so much about it?

So I might bring in that information specifically into the trial, and both of those notions can cause issues in my new experiment, essentially, if it's wrong, if that external data is different from my new experiment.

Statisticians are great at saying, well, boy, if that data was a little bit different, now your new experiment has a type one error or bias or, or issues. If I bring in anything from outside the trial, it can cause issues.

So those might be ways in which I bring in external data.

Kert Viele: There certainly are, I mean, there are cases where you are gonna run into those problems. I mean, we've talked about examples in antibiotics: with the development of resistance over time, we expect therapies to change.

There are situations where we think
that problem exists and we need

to address it or not use the data.

There are also cases where diseases
are very stable over time, and those

issues may not be as big a concern.

Scott Berry: Yeah, so I, I mean, I'm very much on the side that we should be using external data.

I, uh, I think we go about this in a way that's just so strikingly conservative that it slows us down. Yes, something may happen. I think it falls under the realm that science is hard. Incorporating this other information, uh, being explicit about it, incorporating it, that process is hard. But the notion that we don't use any of that information just seems strikingly wrong.

Kert Viele: So it's an interesting question, 'cause it certainly can be wrong on occasion.

So I wonder, as a society, and you ask about frequentists and whether the error rates are driving this, I think you can be a perfectly good and wonderful frequentist and use historical data. But what you're gonna have to accept is that most of the time, prior data is leading you in the right direction. If it's not, we should just give up on science altogether, because we have worse problems.

But in any case, if it's generally leading us in the right direction, we're gonna be running a few experiments that have, say, three or 4% type one error, and we're gonna be running a lot more experiments that have, say, one or 2%.
And in the grand scheme of
things, we're still putting a

better mix of drugs on the shelf.

More drugs that work,
less drugs that don't.

So I think we have to change our mindset
to be frequentist in this kind of world.

Scott Berry: Yeah, and I'm struck by, I'm struck by the idea of using that data, and in Bayesian statistics we do borrowing of that information. We do things that are dynamic.

I'm struck by the fact that there are tons of trials where we use objective performance criteria: it's a single-arm trial and the drug has to jump a particular hurdle.

That's a number.

That number's always based on
prior data, and we do that all the

time, and presumably that's okay.

Or you run a trial where your data comes only from the new experiment, and that's your control.

But the idea that we do something halfway between those, that we use both, just strikes people the wrong way: do one or the other.

And I'll, I'll give an example of this happening to me. Where, by the way, and if you listen to In the Interim a lot, you'll find that we very much respect the FDA. They do tremendous good and, and they're wonderful. It doesn't mean we don't run into hurdles.

So we presented a...

Kert Viele: So just, just to interrupt you there, I mean, we go to the FDA a hundred times a year and have three major disagreements.

That's much better than
with our wives, so.

Scott Berry: Yeah.

Yeah.

So here's the example. We wanted to go in with a new therapy. It was oncology, and in the new trial we didn't want to enroll one to one against standard of care.

Uh, it's easier to get patients
in if they're more likely to

get experimental treatment.

Uh, we wanted to enroll three to one,
three to experimental, one to control.

There's a good bit of information
about the behavior of the control.

So we were borrowing from previous
trials, and let's assume that,

that the prior data was a 15%, uh,
overall, uh, objective response rate.

To make it simple. But we're gonna enroll three to one, and we're gonna borrow on the control using that prior data that's centered on 15%.

In the new experiment, because we're borrowing data that's 15% on the control, if the true rate in that new experiment is actually higher than that, we're pulling the control down, making it easier for the treatment to look better. And we presented those operating characteristics.

That was, that was deemed to
be unacceptable because you

inflate type one error rate.

The response from the agency
was use 15% as an objective

control and don't enroll any new
controls, and you have to beat 15%.

Of course, we generally don't calculate the type one error if, in fact, the real rate in that experiment was 25 or 30%, because we're only jumping a 15% hurdle.

But it's almost the opposite, a Bayesian view that we know the answer's 15% for what you're trying to beat.

And if you can beat that, that's good.

It's using a, a, a prior that's all
focused on a single value rather

than modeling it with some borrowing
and recognizing the uncertainty.

Or enroll a whole control arm,
so do one or the other, but

it's entirely this experiment.

But the idea of using borrowing or
modeling seems to be harder to accept.
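Scott's point about the fixed hurdle can be made concrete with a quick calculation. This is a minimal sketch, not the actual trial: the single-arm sample size, the one-sided alpha, and the candidate "true" control rates are all assumptions chosen for illustration. It asks how often a drug that adds nothing would clear a fixed 15% objective performance criterion if the real background rate has drifted above 15%.

```python
# Illustrative sketch (hypothetical design, not the trial discussed above):
# chance of "beating 15%" in a single-arm trial when the true background
# response rate is higher than 15% and the drug adds nothing.
from scipy.stats import binom

n = 50            # assumed single-arm sample size
p0 = 0.15         # fixed objective performance criterion (the "15%" hurdle)
alpha = 0.025     # assumed one-sided significance level

# Smallest number of responders that rejects H0: p <= p0 at level alpha
# using an exact binomial test, i.e. P(X >= k | p0) <= alpha.
k_crit = min(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)

for true_rate in (0.15, 0.25, 0.30):
    # Probability of declaring success when the drug adds nothing but the
    # real background rate in this population is true_rate.
    reject = binom.sf(k_crit - 1, n, true_rate)
    print(f"true control rate {true_rate:.2f}: P(beat 15% hurdle) = {reject:.3f}")
```

With these made-up numbers, the nominal 2.5% error rate holds only if 15% really is the right background rate; at 25% or 30% the single-arm design declares success far more often, which is the error that never gets reported against the fixed hurdle.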

Kert Viele: It's interesting, it's almost like you have to cross the Rubicon of, are you willing to completely make this assumption or not make this assumption at all? And you can't test the assumption.

Scott Berry: Right.

Right.

And the other extreme of this, and I know, I know this one, uh, gets at you a little...

Kert Viele: Don't do it.

Don't do it, Scott.

Scott Berry: Yeah.

That we talk about using real
world evidence all the time.

Digital twins or real world evidence.

So for, for control, we
bring in external data.

And people seem to be almost
comfortable that your control is

entirely external data, that's okay.

Um, but if you were to do something like use non-concurrent controls from your own experiment, that seems to be unacceptable, despite the fact that these are unbelievably phenomenal, uh, historical controls, better than any real world evidence you're gonna get: same protocol, the exact same data, the same quality. The only difference with them is time.

People seem really uncomfortable with that. But yet, at that conference you went to, there were probably 10 talks about using real world evidence.

Kert Viele: Yep.

The, um, I mean, there are different legal reasons that real world evidence is being handled in certain ways right now compared to other data sources.

Um, but, you know, FDA is not a, um, it's not a homogeneous organization. There are lots of people; the academic community is many people, industry is many people.

Um, but if we're talking about where we're going, I think one of the key aspects of this is a hierarchy of what's more or less dangerous. And I'd actually like to see us spending more time on this.

Alzheimer's trials: we've got dozens, hundreds of these. So how much do the control arms differ from trial to trial to trial? If we have hundreds of trials, we should be able to estimate this. We're statisticians. We can estimate a variance parameter.

Let's get an idea about what it is,
and that variance parameter translates

into how much we should borrow.
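The variance parameter Kert describes can be estimated with standard random-effects machinery. Here is a minimal sketch using a DerSimonian-Laird estimate of between-trial variance; the control-arm counts are invented for illustration and are not real Alzheimer's trial data.

```python
# Sketch: how much do control arms vary from trial to trial?
# DerSimonian-Laird estimate of between-trial variance from summary data.
# The (responders, sample size) pairs below are hypothetical.
import numpy as np

controls = [(18, 120), (25, 140), (15, 100), (30, 160), (22, 130)]

events = np.array([r for r, n in controls], dtype=float)
sizes = np.array([n for r, n in controls], dtype=float)

rates = events / sizes                      # observed control rates
var_within = rates * (1 - rates) / sizes    # within-trial sampling variance
w = 1.0 / var_within                        # fixed-effect weights

pooled = np.sum(w * rates) / np.sum(w)
Q = np.sum(w * (rates - pooled) ** 2)       # heterogeneity statistic
df = len(controls) - 1
c = np.sum(w) - np.sum(w ** 2) / np.sum(w)

tau2 = max(0.0, (Q - df) / c)               # between-trial variance estimate
print(f"pooled control rate ~ {pooled:.3f}")
print(f"between-trial SD of control rates ~ {np.sqrt(tau2):.3f}")
# A small between-trial SD supports heavier borrowing from historical
# controls; a large one argues for discounting them.
```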

Scott Berry: Yeah, and you, you mentioned a key point there: if you're enrolling some new controls, you actually get a comparison. You get the controls in your experiment against the previous controls, and you can judge some level of similarity between those.

Kert Viele: Why don't, why don't
you talk about that a little more?

About just the details?

'Cause I think there's often a sense that there is a hierarchy between a single-arm trial, dynamic borrowing, static borrowing. Give an idea about what those methods are and how they mitigate those risks.

Scott Berry: Yeah, yeah, yeah.

So, so the extremes on this: I can run an experiment where I only include data in my new experiment, uh, one-to-one randomized control to treatment, and I'm only gonna compare those two arms.

Or I could do the very much opposite extreme from that, where I enroll only my treatment arm. And I want to know, is the treatment arm behaving, you know, better than, or is the treatment benefiting these patients? I would have to figure out a way to create a control arm. I could use entirely historical data. In that far extreme, I have no ability to know whether the control rate I'm using is reasonable.

Now suppose I do something halfway, where I enroll two to one or three to one. I use the controls in that experiment, but I also use external controls to reinforce the control. Now I'm getting new controls, so I can judge their similarity to the external controls, and statistically model that similarity, using the external data for strength if they're similar, and not if they're not. But I do get evidence about their similarity.

And it's different from the extreme case where I only use historical controls and I have no idea whether they're good or not. It's a middling ground to this.
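One common way to implement this middle ground is a robust mixture prior on the control rate: an informative component built from historical controls plus a vague component, with the weight on the informative component adapting to how well the concurrent controls agree. The sketch below is an assumption-laden illustration, not the analysis from the trial discussed above; the Beta(15, 85) historical component, the 80% prior weight, and the concurrent-control counts are all made up.

```python
# Minimal sketch of dynamic borrowing on a control response rate via a
# two-component (robust) mixture prior. All numbers are hypothetical.
from scipy.stats import betabinom

a1, b1 = 15.0, 85.0      # informative beta component centered near 15%
w_info = 0.8             # assumed prior weight on the historical component

def posterior_mix(responders, n):
    """Update the beta mixture prior with concurrent control data."""
    # Marginal likelihood of the data under each component (beta-binomial).
    m_info = betabinom.pmf(responders, n, a1, b1)
    m_vague = betabinom.pmf(responders, n, 1.0, 1.0)
    # Posterior weight on the informative (borrowing) component.
    w_post = w_info * m_info / (w_info * m_info + (1 - w_info) * m_vague)
    post_mean = (w_post * (a1 + responders) / (a1 + b1 + n)
                 + (1 - w_post) * (1 + responders) / (2 + n))
    return w_post, post_mean

# Concurrent controls that agree with history (~15%) vs. drift upward (~30%).
for r, n in [(6, 40), (12, 40)]:
    w_post, mean = posterior_mix(r, n)
    print(f"{r}/{n} concurrent controls: borrow weight {w_post:.2f}, "
          f"posterior control rate {mean:.3f}")
```

When the concurrent controls look like the historical 15%, the model borrows heavily; when they drift toward 30%, the weight on the historical component drops and the estimate follows the new controls, which is the self-correcting behavior Scott is pointing to.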

Kert Viele: And I think, you know,
that's really what's going on in the

academic literature right now in terms
of trying to figure out the best ways

to make that assessment, minimize
the risks, amplify the benefits.

Um, we're talking now more about covariate matching between studies.

So not just looking at all the
historical data, but which patients

best match our current enrollment.

Um, none of that, of course, is going to be perfect. But I think it, again, it's the question of, do we generally add information or are we taking a risk? And we should be looking at it as a society: if we do these kinds of trials over and over and over again, do we get a better set of drugs on the shelf?

Scott Berry: Yeah, and part of this is, part of it is statisticians are really smart and they can figure out how something can go wrong. That the historical data's not the same as the new controls we've got, that we've got problems with the analysis there.

We've got potential type
one error inflation.

We've got potential bias in the
estimate of the treatment effect.

And statisticians can figure this out
and say that, you know, this is bad.

But if we have to do experiments
where we have to enroll one-to-one,

that experiment is bigger.

It's more patients.

It's more patients on control.

We take less shots on goal.

We learn slower.

It's, the whole industry is slowed if every time we have a question, it has to be answered only within that experiment, as opposed to using what we know scientifically to reinforce it and make a more efficient trial. We can look at more treatments, we can treat fewer patients, we can do this cheaper.

So there is a huge societal, I'll call it societal, a whole-medical-field issue if every time we have a question, the only thing we can do is look at the new experiment. It seems like there's a huge waste.

Kert Viele: Well, basically it's like this: we design an experiment, data comes into existence.

There is this bright shining
moment of about five minutes

where we can do something with it.

And then we're meant to put it aside, at
least with regard to future experiments.

Obviously people are gonna reference it in deciding treatments and so forth in the meantime, but it's interesting, it worries me as a statistician. I feel deep-seated guilt actually putting data aside.

Scott Berry: Yeah.

Yeah.

Uh, and again, it's hard, uh, but there are huge ramifications to not doing it. And I think there are ways of doing it very well.

You can be prospective in your new experiment. The analysis plan is completely written down in advance. It's explicit. When you present the results, you show the results just from the experiment, and you show what happens using the modeling of external data.

I think we can do this really well.

I think, by the way, this
is changing a little bit.

We're seeing a somewhat similar issue where you run basket trials. There's a lot of reticence to, you know, we have four kinds of patients and we run an experiment of control to treatment, and we use the data in one of the subsets of patients to help estimate another subset of patients. That gets people really bothered, and yet it's being done more and more.

I think it's a very similar type of issue, and it's sort of striking, and we're in trouble if we don't do that. But it has similar statistical ramifications.

Kert Viele: Well, it also, it has tremendous implications for design, because if you can't borrow, suddenly sponsors have an incredible incentive to claim that their patient population is homogeneous so that they can pool everything together and maximize their power.

Scott Berry: Yeah.

Kert Viele: If, you know, as soon as you say there's two different groups of people, you have to run twice as many patients, no one's gonna wanna explore those kinds of questions.

Scott Berry: Yeah, so it's the same bizarre thing that happens, that we run single-arm trials where we use a historical control a hundred percent, or we don't use it at all. And you're describing exactly what happens in trials.

We run a trial and you do a common estimate across the entire population, a single estimate, which is full pooling; it's mathematically the exact same as pooling the populations altogether to get a single answer. Or we run separate trials and we call 'em population A and population B, but we only estimate those individually in A and B.

Now we want to do something that's a middling thing between those, and by the way, we actually get to observe in the experiment the similarity of effect, so that we shrink estimates statistically, uh, which isn't full pooling and isn't separate. And people struggle with that. It's either you have to do all pooling or completely separate populations, and it just strikes me as bizarre.
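A toy numerical sketch of Scott's three options, full pooling, no pooling, and partial pooling with shrinkage, may help; the per-basket treatment effects and standard errors below are invented for illustration, and the between-basket variance is estimated by a simple method-of-moments step.

```python
# Toy sketch: full pooling vs. no pooling vs. partial pooling (shrinkage)
# of treatment effects across four hypothetical baskets/subgroups.
import numpy as np

effect = np.array([0.30, 0.10, 0.25, 0.05])   # observed effects (invented)
se = np.array([0.12, 0.10, 0.15, 0.11])       # standard errors (invented)

# Full pooling: one common estimate for every basket.
w = 1.0 / se**2
pooled = np.sum(w * effect) / np.sum(w)

# Partial pooling: method-of-moments estimate of between-basket variance,
# then shrink each basket toward the pooled mean in proportion to its noise.
df = len(effect) - 1
Q = np.sum(w * (effect - pooled) ** 2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / c)
shrink = tau2 / (tau2 + se**2)          # 0 = full pooling, 1 = no pooling
partial = pooled + shrink * (effect - pooled)

print(f"full pooling:    {pooled:.3f} for every basket")
print(f"no pooling:      {np.round(effect, 3)}")
print(f"partial pooling: {np.round(partial, 3)} (shrunk toward {pooled:.3f})")
```

The partial-pooling row is the "middling thing": each basket keeps its own estimate, but noisy baskets are pulled toward the common mean by an amount the data themselves determine.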

Kert Viele: So let me ask you about
one other thing, which I think is a

real oddball in this conversation.

Um, there is a situation where we routinely combine data across different studies, and it's often viewed as the pinnacle of evidence for clinical trials, which is a...

Scott Berry: I, uh, I have no idea.

I have no, oh, meta-analysis.

Okay.

Yep.

Kert Viele: where do you think
meta-analysis fits in, in terms

of the, the assumptions and
how it fits into this debate?

Scott Berry: Yeah, I think meta-analyses are incredibly valuable. Um, the modeling makes a ton of sense and is incredibly important. Um, I think people overweight single trials relative to the totality of evidence.

Now, not every meta-analysis is the same. And I think people sort of think meta-analyses are bad because there are some that are clearly biased. There's publication bias, there's availability-of-data bias.

Not every meta-analysis is bringing
quality data that's free of these

issues, but there are many circumstances
where we have data on essentially every

patient that's been experimented on.

And combining this together
I think is hugely valuable.

By the way, I think the idea that we run two separate independent phase three trials and we analyze them separately is stupid. I mean, it's stupid that we would analyze them completely separately and require both to be below 0.05.

Combining them together into
a single inference is better

in every way.

Kert Viele: Is two 0.049s as good as a 0.051 and a 0.001 on the p-values? Clearly you'd want the latter.
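Kert's comparison can be checked with a simple combination of the two one-sided p-values. Stouffer's method is used below as a quick stand-in for a proper meta-analysis of the underlying data; the p-values are the ones from his example.

```python
# Quick check of Kert's comparison: combine two independent one-sided
# p-values with Stouffer's method.
from scipy.stats import norm

def stouffer(pvals):
    """Combine independent one-sided p-values into one combined p-value."""
    z = sum(norm.isf(p) for p in pvals) / len(pvals) ** 0.5
    return norm.sf(z)

print(f"two trials at 0.049: combined p = {stouffer([0.049, 0.049]):.4f}")
print(f"0.051 and 0.001:     combined p = {stouffer([0.051, 0.001]):.5f}")
# The 0.051/0.001 pair gives far stronger combined evidence, even though
# only one of those two trials crosses 0.05 on its own.
```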

Scott Berry: Yeah.

Yeah.

And so I think of a meta-analysis the same way: I would do a meta-analysis of my two phase three trials for what it tells me about the answer. So, you know, those techniques, they tend to be Bayesian.

But largely it comes down to the availability of data. And there are some where we know there's missing data, we know there's issues with it, and we would not take those, uh, you know, at a high scientific degree.

Um, and that's where the science part comes into it. And by the way, these are abused, where people present biased data, they're ignoring huge amounts, and they call it a meta-analysis. We think that's bad science.

So we've got to, at some level, be able to judge: good science, quality data, meta-analyses, which I think are more valuable than individual trials; and then there's bad science put together, where we recognize we wouldn't use that information.

Kert Viele: We would certainly say
that on any kind of borrowing, bringing

together data that doesn't belong is bad.

You can borrow in a way that
does generate horrible biases and

will generate bad conclusions.

Uh, but we also are talking about
what can be done well with experience.

Um, we're running outta time.

Why don't you close us out?

Scott Berry: Uh, so, uh, I think this, by the way, is the future.

I think as we learn more and
more about diseases, we get

more and more subsets of them.

Um, we are getting more and more quality
data available from other clinical trials.

The sharing of data from clinical trials, the sharing of other types of resources, we're gonna have an explosion of medical data.

The idea that we don't use that
in our new experiments, if we

don't use that in valuable ways,
we're running worse experiments.

So a wonderful topic, and I
think it's certainly the future.

Uh, and so here we are, in the interim. So, uh, till next time, appreciate it.

Thank you, Kert.
