Religion, Politics, and Ordinal Outcomes Episode 5

· 30:18

Judith: Welcome to Berry's In the Interim podcast, where we explore the cutting edge of innovative clinical trial design for the pharmaceutical and medical industries, and so much more.

Let's dive in.

Scott Berry: Well, welcome, everybody. Welcome to In the Interim, the podcast of Berry Consultants. We look as deeply as we can into the science, clinical trial science. We look at clinical trial results. We talk about statistics related to medical decision making, to clinical trial results, and to clinical trial design. That's our focus.

And so today I'm going to talk about something that turns out to be incredibly controversial.

The name of this episode is Religion, Politics, and Ordinal Outcomes. A funny name, of course. My reference here is the old rule that at the dinner table you should not talk about politics and religion. It turns out you probably shouldn't talk about ordinal outcomes either. Everybody has different reactions to them. Everybody analyzes them differently. People have passionate feelings about them.

It's kind of interesting because it's ingrained in the history of clinical trials. I'm going to argue that almost every endpoint is ordinal, so you can't escape this.

So what are the controversies?

How do they show up?

What do I think about ordinal endpoints?

You may not think that this
is a controversial topic.

It turns out to be really controversial.

So, ordinal endpoints have been around since the start of clinical trials. I'm not a historian, but my scouring of Google and the internet suggests that the first clinical trial was in 1747. It's attributed to James Lind, a Scottish naval surgeon. In 1747 he was, of course, very, very interested in scurvy, or "the scurvy," as he called it, and he ran what is regarded as the first clinical trial.

Now, it's not the first randomized clinical trial, and I couldn't actually figure out exactly how he assigned patients, which is an interesting part of the trial.

If you want to read more about this, by the way, it's really wonderful reading. You'll find yourself down a rabbit hole reading about James Lind, about scurvy, and about this particular clinical trial. You can go to jameslindlibrary.org and read all about it.

In 1753, he published A Treatise of the Scurvy.

I've been reading parts of it, and I found it absolutely fascinating. He writes that armies have been supposed to lose more of their men by sickness than by the sword. He notes that more of the English army died from scurvy than were killed by the French and Spanish armies themselves. So it was a huge problem of the time, and he was interested in whether this calamity could be prevented and "the danger of this destructive evil obviated."

Interestingly, at the time there were tons of theories about what scurvy was, what was causing it, and what could prevent it. We have a pretty good idea today. But he was very interested in this.

On May 20th, 1747, he conducted
the first clinical trial.

He took, as he describes it, 12 patients in the scurvy on board the Salisbury at sea. "Their cases were as similar as I could have them." They all had putrid gums, the spots, and lassitude (I don't know exactly what that means), with weakness of their knees. So he recognized that we want these patients to look the same. He had informal inclusion and exclusion criteria, obviously not written down, but he did everything he could to get 12 patients who looked the same.

He describes a common diet: water gruel sweetened with sugar in the morning, fresh mutton broth for dinner, and, since he treats supper as different from dinner, barley with raisins, rice and currants, and wine for supper.

So he did really smart things for the first clinical trial: he tried to get patients who looked the same, and he tried to treat them the same in everything except the interventions he was interested in.

So he had 12 patients and six different treatments. He gave two of them a quart of cider a day. He gave two of them 25 gutts (drops) of elixir vitriol three times a day. He gave two of them two spoonfuls of vinegar three times a day. Two of them, and he describes it interestingly, two of the worst patients, with the tendons in the ham rigid, got seawater, half a pint a day. He gave two of them two oranges and one lemon per day. And two of them got an electuary the bigness of a nutmeg, three times a day. This was a surgeon-recommended therapy. Again, lots of people were describing what they would do for the treatment of scurvy. By the way, I don't know exactly what that was, but apparently it had garlic, mustard seed, radish, balsam of Peru, gum myrrh, and cream of tartar.

And the interesting thing in all of this is that he does not describe how they were assigned. Were they randomized? Did he assign them? That's a part of a clinical trial we now consider really important, and it's left out of the description, which is a fascinating aspect of this 1747 trial.

For today's discussion of ordinal endpoints, I'm much more interested in how he classified the outcomes of this trial.

So he writes, and I'm quoting from his writing: "The consequence was, that the most sudden and visible good effects were perceived from the use of the oranges and lemons; one of those who had taken them being at the end of six days fit for duty. The spots were not indeed quite off his body, nor his gums sound." And: "The other was the best recovered of any in his condition, and being deemed well was appointed nurse to the rest of the sick."

I don't think this happens in current clinical trials, that a patient does so well they become a caregiver for the other patients in the trial. But the description here is that the two best patients out of the 12 were the two on a single treatment, the oranges and lemons.

So his outcome was an ordinal outcome. It actually resembles what today we might call a DOOR (desirability of outcome ranking) analysis, almost a win-ratio type of thing, comparing patients to patients: these patients did the best. There's no quantitative aspect, that this patient was five times better than that patient, or double that patient. There wasn't a mean time to recovery or anything like that. It was an ordinal outcome.

By the way, if you calculate a p-value for the probability that lemons and oranges would have the two best patients, it's one out of 66, or 0.015. You, of course, would say, wait a minute, there are multiplicities here. There were six different treatments. What's the probability that any one of them would have had the two best? Well, that's one out of 11, which is 0.091. So, adjusting for multiplicities, you get 0.091.
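The arithmetic behind those two numbers can be checked in a few lines. This is a minimal sketch assuming the null hypothesis that treatment is irrelevant, so every pair of the 12 patients is equally likely to be "the two best," with no ties in the ranking; the variable names are illustrative.

```python
from math import comb

N_PATIENTS = 12  # patients in Lind's trial
N_ARMS = 6       # treatments, two patients each

# Assumption: under the null, each of the C(12, 2) pairs is equally likely
# to be "the two best" patients.
n_pairs = comb(N_PATIENTS, 2)

# A prespecified arm (oranges and lemons) holds both best patients: 1 pair of all.
p_citrus = 1 / n_pairs

# Some arm holds both: the six arms are disjoint events, so probabilities add.
p_any_arm = N_ARMS / n_pairs

print(f"pairs = {n_pairs}")
print(f"P(citrus has the two best)  = {p_citrus:.3f}")
print(f"P(any arm has the two best) = {p_any_arm:.3f}")
```

This prints 66, 0.015, and 0.091, matching the figures quoted in the episode.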

But I'm more interested in the endpoint.

Now, interestingly, and this may resonate with people, this didn't change anything. We now know this was actually the right treatment: scurvy is about vitamin C, and lemons and oranges would have prevented it and had a massive impact. It took 47 years beyond this before this became recognized as a way to treat patients.

Interestingly, the thought was that you'd give dehydrated fruit, because it was hard to keep fresh fruit on board ships. It turns out that loses much of the vitamin C content, and it's not clear it would have had any actual benefit. So attention to CMC and PK/PD issues could have really helped James Lind in 1747.

So, that was the first clinical trial, and
again, you should go read all about this.

It's absolutely fascinating.

The first randomized human clinical trial is attributed to Austin Bradford Hill. In 1948 he published, in the British Medical Journal, a paper on the first randomized clinical trial, which tested streptomycin for the treatment of pulmonary tuberculosis. This trial enrolled 107 patients, and the difference between James Lind's trial and this one was that the patients were randomized, either to the control group or to streptomycin.

The publication of this is also very interesting. There's a traditional Table 1. Maybe he invented Table 1 and Table 2. Table 1 gave the demographics of the trial, and Table 2 gave the results.

So what was the endpoint of this first randomized clinical trial in humans? People probably understand that there were randomized agricultural experiments, but there were no randomized human experiments before this.

The primary endpoint was what you would now think of as a more traditional ordinal endpoint. It was a six-level ordinal endpoint. The top level was considerable improvement. The next was moderate or slight improvement, then no material change, moderate or slight deterioration, considerable deterioration, and death. Six outcomes.

By the way, they aren't given numbers. You could consider giving them letters, A, B, C, D, E, F, where A is the best and F is the worst. All this is ordinal in the sense that one is better than the other, but no number is given for how much better. We think of these as ordinal outcomes, and it's probably what you know as ordinal outcomes.

So, the controversial part: how did he analyze this? We would fight today, I mean fight in a good way, about how to analyze this endpoint. What do you do?

He actually analyzed it in multiple ways. The first thing he did was talk about deaths: 7.3 percent on streptomycin and 27 percent on control. He described that as less than a one-in-a-hundred chance of happening. He then looked for the best split. If you split the scale at any particular point, the best split for streptomycin was considerable improvement: 51 percent versus 8 percent. He puts the likelihood of that happening by chance at one in a million.

So he's giving p-values; he's analyzing it. We would recognize, ooh, the multiplicity of looking for the best place to split, and what does that mean? That gets everybody worked up.
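The multiplicity worry can be made concrete. A six-level scale has five places to cut it into "better" versus "worse," and picking the cut with the biggest difference after seeing the data is a data-driven choice among five correlated comparisons. The sketch below uses made-up counts (not the 1948 data) and a hypothetical helper, `cut_differences`, to list every split.

```python
LEVELS = "ABCDEF"  # A = considerable improvement ... F = death

# Hypothetical counts per level, best to worst (illustration only, not the 1948 data).
treated = [25, 12, 3, 6, 5, 4]
control = [5, 12, 4, 12, 7, 12]

def cut_differences(a, b):
    """Difference (a minus b) in the proportion of patients at or above each cut.

    Cut k groups levels 0..k as "better" and the rest as "worse", so a
    six-level scale yields five possible dichotomies.
    """
    na, nb = sum(a), sum(b)
    diffs = []
    cum_a = cum_b = 0
    for k in range(len(a) - 1):
        cum_a += a[k]
        cum_b += b[k]
        diffs.append(cum_a / na - cum_b / nb)
    return diffs

diffs = cut_differences(treated, control)
best_cut = max(range(len(diffs)), key=diffs.__getitem__)

for k, d in enumerate(diffs):
    print(f"{LEVELS[: k + 1]} vs {LEVELS[k + 1:]}: {d:+.3f}")
print("largest difference when splitting after level", LEVELS[best_cut])
```

Reporting only the largest of these five differences, without accounting for having looked at all of them, is exactly the multiplicity concern.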

Now, this was an easy analysis: any way you analyze this endpoint, it looked really, really good, so there's not going to be a whole lot of struggle over it. I went back, took the data, and fit a proportional odds model to it, and you get a common odds ratio of 5.43 in favor of streptomycin. The lower bound of the 95 percent confidence interval is 2.64. This is a massive effect. The p-value is 6.3 times 10 to the minus 6. Awesome data. It makes it really easy. Everybody can look at this and say this is clinically really, really important.

That was some 75 years ago, and we still fight about how to analyze ordinal endpoints.

I'm going to focus on where we are today: what we're doing with ordinal endpoints, and what I think about them. I'm going to pick a particular example, very similar to Austin Bradford Hill's six-level ordinal endpoint. We're going to talk about the modified Rankin score.

The modified Rankin score is a seven-level ordinal outcome. It's used to describe the neurological status of patients. It's quite common in a number of settings, especially stroke trials, where the endpoint is your modified Rankin status at 90 days, 180 days, or other time points.

The endpoint itself has seven levels, and they do give numeric values to these. We'll talk about whether those numeric values are relevant or not. Are they just labels? They're labels. You could have labeled them A, B, C, D, E, F, G. But I'll give the labels that are there; you may be familiar with them.

Zero is no symptoms, perfect neurological status.

One is no significant disability: you're able to carry out usual activities. Zero is better than one. We can argue about how much better; we're going to get to that.

Two is slight disability: you can look after your own affairs but are unable to carry out all previous activities. So you have some disability. Zero is better than one. One is better than two.

Three is moderate disability: you require some help.

Four is moderate to severe disability: unable to attend to your own bodily needs without assistance. Again, a higher level of support is needed to carry out daily activities.

Five is severe disability: bedridden and requiring constant nursing care. It's commonly given the label of vegetative state. Clearly a worse state than four, and worse than zero through three.

And then six is dead, no neurological status.

So these are the seven outcomes. It's a very, very common outcome in clinical trials. The question is, how do we analyze those seven outcomes in a clinical trial?

There's huge disagreement about this. There's disagreement about the medical interpretation of these levels. There's disagreement about how to analyze them statistically and about the relative meaning of their values. And different analyses are used.

The first way to potentially analyze these, and I shudder to say it, is to dichotomize them. Very commonly in clinical trials, the modified Rankin is dichotomized. There's even a name for the dichotomization: disabled or not. So 0 through 2 is considered a responder, a good outcome, and 3 through 6 is considered a bad outcome. The clinical trial analyzes how many patients are 0 to 2 and how many are 3 to 6. That's a way to numerically analyze these outcomes.

Now, I'm going to walk through this a little bit, and I want to start by saying this is mathematically equivalent to giving 0, 1, and 2 a value of 1 and giving 3 through 6 a value of 0. I could say that's my utility, analyze it that way, and get mathematically the exact same answer.
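The equivalence is easy to check numerically. A minimal sketch with hypothetical mRS counts for levels 0 through 6; `responder_rate` and `mean_utility` are illustrative helpers, not functions from any library.

```python
# Hypothetical mRS counts for levels 0..6 (illustration only).
treated = [10, 15, 20, 15, 10, 5, 25]
control = [5, 10, 20, 20, 15, 5, 25]

# The dichotomy written as a utility: mRS 0-2 -> 1, mRS 3-6 -> 0.
DICHOTOMOUS_WEIGHTS = [1, 1, 1, 0, 0, 0, 0]

def responder_rate(counts, max_good=2):
    """Proportion of patients with mRS 0..max_good (the usual 0-2 'responder')."""
    return sum(counts[: max_good + 1]) / sum(counts)

def mean_utility(counts, weights):
    """Average per-patient utility when level k is worth weights[k]."""
    return sum(c * w for c, w in zip(counts, weights)) / sum(counts)

diff_rate = responder_rate(treated) - responder_rate(control)
diff_util = (mean_utility(treated, DICHOTOMOUS_WEIGHTS)
             - mean_utility(control, DICHOTOMOUS_WEIGHTS))
print(diff_rate, diff_util)
```

Both lines of arithmetic give the same 10-percentage-point difference, because counting responders and averaging a 1/0 utility are the same calculation.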

So dichotomizing is a weighting. You're saying having a zero is equivalent to having a two, and having a three is equivalent to being dead. Absolutely, that is the assumption being made when you analyze the trial that way.

I shudder at that. I shudder at it clinically. I shudder at it statistically for a couple of reasons. The power of a trial that lumps everybody together, 0-2 versus 3-6, is lower than with other ways of analyzing it. I think it's clinically harder to interpret. I don't think any person has a utility function equal to that. But it's very commonly done.

You can probably tell what I think about 0 to 2, or any dichotomization of that endpoint. Okay, another way to analyze this endpoint: what about a proportional odds model?

A proportional odds model makes the statistical assumption that the treatment's effect on the odds of moving patients to a better level is the same wherever you cut the scale. We assume it's the same for moving people from 6 to better than 6, from 5 and 6 to better than 5, and so on. That odds ratio is constant across the scale. There's a strong... I don't want to say strong; I'm sure I shouldn't use that word. There's a statistical assumption that comes with that: that the odds ratio is the same across the scale.
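One way to see what the assumption says: for each of the six ways to cut the seven mRS levels into "this good or better" versus worse, compute the treated-versus-control odds ratio. A proportional odds model assumes these six ratios share one common value. The counts and the helper `cumulative_odds_ratios` are hypothetical, and the sketch assumes every cut leaves patients on both sides in both arms (otherwise an odds is zero or undefined).

```python
import math

# Hypothetical mRS counts, level 0 (best) to 6 (dead).
treated = [10, 15, 20, 15, 10, 5, 25]
control = [5, 10, 20, 20, 15, 5, 25]

def cumulative_odds_ratios(a, b):
    """Odds ratio (a vs b) of being at level k or better, for each cut k = 0..5.

    Assumes no cut leaves an empty side in either arm.
    """
    na, nb = sum(a), sum(b)
    ratios = []
    cum_a = cum_b = 0
    for k in range(len(a) - 1):
        cum_a += a[k]
        cum_b += b[k]
        odds_a = cum_a / (na - cum_a)
        odds_b = cum_b / (nb - cum_b)
        ratios.append(odds_a / odds_b)
    return ratios

ratios = cumulative_odds_ratios(treated, control)
for k, r in enumerate(ratios):
    print(f"mRS 0-{k} vs {k + 1}-6: OR = {r:.2f}, log OR = {math.log(r):.2f}")
```

In these made-up numbers the ratios drift from about 2.1 down to 1.0, the kind of pattern a check of the proportional odds assumption is looking for.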

Now, that gets statisticians all ruffled. It's almost like when you go to graduate school, you have to question that assumption in any model or you're not worth your statistical salt. It is an assumption, and it can be violated.

The interesting thing is that this way of analyzing the ordinal endpoint imposes a utility on each of the outcomes. Remember, dichotomizing them imposes a utility, and you can mathematically write it down. A proportional odds model also imposes a utility, and you can mathematically write that down too.

What is the utility? The interesting thing is that it's based on the prevalence of the outcomes. If you have a lot of 0s and 1s, say 90 percent 0s and 1s and 10 percent deaths, the utility you're imposing heavily weights 0-to-1 differences, because that's where your prevalence is. Death, and the levels above 1, are down-weighted because they aren't very prevalent. You can mathematically write it down; I could give you the utility function you're imposing on your endpoint.

People don't think about it that way. It's a little bit uncomfortable. If, alternatively, you have a lot of 4s, 5s, and 6s and very few 0s and 1s, but you have some, you're down-weighting those because they're not prevalent. You are imposing a mathematical weight on those outcomes by analyzing that way. You can't get out of that.
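Here's one way to write down the prevalence-dependent weighting, under a stated simplification: rather than the full proportional odds fit, it uses the Wilcoxon rank-sum test, which is closely related (it is essentially the proportional odds model's score test for two groups). Ranking all patients together, with ties, gives each mRS level a midrank score, and the gap between adjacent levels is the average of their two pooled counts, so common levels are spread far apart and rare levels are squeezed together. The pooled counts and helper names are hypothetical.

```python
def midrank_scores(pooled_counts):
    """Midrank of each ordinal level when all patients are ranked together.

    Patients at a level occupy ranks below+1 .. below+n, so they share the
    average rank below + (n + 1) / 2.
    """
    scores = []
    below = 0
    for n in pooled_counts:
        scores.append(below + (n + 1) / 2)
        below += n
    return scores

def adjacent_gaps(scores):
    """Implied 'value' of moving one level at each step of the scale."""
    return [hi - lo for lo, hi in zip(scores, scores[1:])]

# Hypothetical pooled mRS counts (levels 0..6), 100 patients in each scenario.
mostly_good = [45, 45, 2, 2, 2, 2, 2]
mostly_poor = [2, 2, 2, 2, 30, 30, 32]

for name, counts in [("mostly 0-1", mostly_good), ("mostly 4-6", mostly_poor)]:
    print(name, adjacent_gaps(midrank_scores(counts)))
```

In the first scenario a 0-to-1 step counts about 22 times as much as a 5-to-6 step (45 versus 2); in the second scenario the weighting flips toward the bottom of the scale.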

Okay. I'm not saying that's bad. I've used it. By the way, I've used both of those in clinical trials. I struggle with the dichotomization, but I've done it. I've used the proportional odds model, and I don't necessarily shudder at it. But I understand I'm imposing this strange sort of weight on the outcomes.

Okay.

There are other ways to analyze these endpoints. We could do a Wilcoxon test. We could do a Cox model on them.

First of all, I want to point out that there are tons of ordinal outcomes. This is the one a lot of people are familiar with. Mortality is an ordinal outcome: yes or no. It's ordinal. Now, that's a pretty simple one; it's binary. Time to death, overall survival in a cancer clinical trial or any clinical trial, is an ordinal outcome. A year is better than six months, but you don't say how much. You just say it's better when you run a Cox proportional hazards model or a log-rank test.

You're analyzing an ordinal outcome. When you're analyzing infarct volume or tumor size, people sometimes say that's quantitative. Well, no, it's ordinal. You're imposing on it that a one-unit difference is the same numerical difference anywhere on the scale. You're imposing a quantitative weight on that ordinal scale, but it's ordinal.
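A small illustration of the difference. Rank-based methods, such as the log-rank test and the Cox partial likelihood for time to death, depend only on the ordering of the values. The sketch uses the Mann-Whitney U statistic as a simpler stand-in with the same property, on made-up infarct volumes: taking logs leaves U unchanged, while the difference in means, which treats the numeric spacing as meaningful, can even change sign.

```python
import math

def mann_whitney_u(x, y):
    """Number of pairs (xi, yj) with xi > yj, counting ties as 1/2.

    Only comparisons between values are used, so any strictly increasing
    transformation of the data leaves U unchanged.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

def mean_difference(x, y):
    """Difference in arithmetic means, which does depend on the numeric scale."""
    return sum(x) / len(x) - sum(y) / len(y)

# Hypothetical infarct volumes in mL (smaller is better).
arm_a = [5.0, 12.0, 20.0, 40.0, 150.0]
arm_b = [8.0, 25.0, 30.0, 60.0, 70.0]

# The same data on a log scale: same ordering, different spacing.
log_a = [math.log(v) for v in arm_a]
log_b = [math.log(v) for v in arm_b]

print("U, raw vs log:", mann_whitney_u(arm_a, arm_b), mann_whitney_u(log_a, log_b))
print("mean difference, raw:", round(mean_difference(arm_a, arm_b), 3))
print("mean difference, log:", round(mean_difference(log_a, log_b), 3))
```

Here U is 10 on both scales, while the difference in means is positive on the raw scale (arm A looks worse) and negative on the log scale (arm A looks better).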

Almost everything is ordinal. Best corrected visual acuity, where you're looking at a chart and counting how many letters you can see: it's ordinal. We calculate a mean of it. We make the assumption that one more letter is equally better no matter where you are on the scale, a constant difference. We're imposing a quantitative weight on it, and people are comfortable with that.

When I impose numerical values on the modified Rankin, people are up in arms: scientists, regulators.

There have been several studies that look at patient preferences for the modified Rankin levels and at their economic value, and they're actually very similar. We have taken them and said, okay, we're not going to do a proportional odds model. We're not going to dichotomize; nobody has that weight. We're going to use this and call it a utility function.

By the way, the weights used are a full weight of 1 if you're a 0, 0.91 if you're a 1, then 0.76, 0.65, and 0.33 for 2, 3, and 4, and then 0 for both 5 and 6, meaning those are equally bad. That's a utility function.

You can analyze it.

Very simple statistical models.

No assumption of proportional odds.

And we can analyze it that way.
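A utility-weighted analysis along these lines can be sketched briefly: score each patient by the weight for their mRS level, compare mean utilities between arms, and attach a normal-approximation confidence interval. The weights (1, 0.91, 0.76, 0.65, 0.33, 0, 0 for mRS 0 through 6) are the ones quoted in the episode; the counts, the helper names, and the choice of an unequal-variance standard error are assumptions for illustration, not a prescribed analysis.

```python
import math

# Utility weights for mRS levels 0..6, as quoted in the episode.
UTILITY = [1.0, 0.91, 0.76, 0.65, 0.33, 0.0, 0.0]

def arm_summary(counts, weights=UTILITY):
    """Sample size, mean utility, and sample variance of utility for one arm.

    counts[k] is the number of patients at mRS level k.
    """
    n = sum(counts)
    mean = sum(c * w for c, w in zip(counts, weights)) / n
    var = sum(c * (w - mean) ** 2 for c, w in zip(counts, weights)) / (n - 1)
    return n, mean, var

def utility_difference(treated, control, weights=UTILITY, z=1.96):
    """Treated minus control mean utility, with a normal-approximation CI
    based on an unequal-variance standard error (an assumed choice)."""
    n1, m1, v1 = arm_summary(treated, weights)
    n0, m0, v0 = arm_summary(control, weights)
    diff = m1 - m0
    se = math.sqrt(v1 / n1 + v0 / n0)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical mRS counts (levels 0..6), 100 patients per arm.
treated = [10, 15, 20, 15, 10, 5, 25]
control = [5, 10, 20, 20, 15, 5, 25]

diff, (lo, hi) = utility_difference(treated, control)
print(f"mean utility difference = {diff:.4f}, 95% CI ({lo:.4f}, {hi:.4f})")
```

Passing `weights=[1, 1, 1, 0, 0, 0, 0]` turns the same code into the 0-2 responder analysis, which is one way to see that the dichotomy is just another choice of weights.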

Now, people get frustrated by that, lots of people, including regulators: what does this mean? The first criticism is that not everybody has that utility weight. You're using two different studies, but how do we know the people you're analyzing have that weight?

The problem is that any way you analyze this endpoint has that same issue. If you dichotomize, you're imposing that everybody in your study has the weights 1, 1, 1, 0, 0, 0, 0. My claim is nobody has that weight. Yet it's very commonly done, and nobody's frustrated by that. It bothers me, because I don't think anybody has that weight.

Phase 3 clinical trials at FDA use proportional odds models. There's a comfort in that. You've got to check assumptions; you've got to look at proportional odds, the usual things we do for Cox models and all that. But you're imposing a weight. How do we know the people in your study have that weight?

My point is it's a false criticism
because you cannot escape it.

You have to weight the endpoints
any way you analyze them.

We impose a weight on all of these endpoints when we go into a clinical trial. But when we specify it up front and try to justify that it's a good weighting system, everybody throws arrows at it: oh, that's bad. But dichotomous and proportional odds, that's a good way to do it.

All right, so what do I think of this? I think it's the really hard thing in a clinical trial, and it's the way forward. As we move forward, we're going to get a more and more fine-grained understanding of endpoints. Our endpoints are going to be stronger, more valuable.

We have to do this.

It's hard.

Science is hard.

Medicine is hard.

Hiding behind ad hoc ways to do this,
I think, just leads us to bad places.

It disconnects the statistics
from the clinical outcomes, from

the doctors, from the clinicians.

This is where we need to go.

We need baby steps.

We need to work there.

We need to show it's valuable, that it's
good for regulators, that it's good for

patients, it's good for statisticians.

But it's really the only way to do this.

So as our endpoints carry more signal and less noise, with wearables, daily values, our iPhones, our iWatches, our devices calculating things for us, and these are all ordinal outcomes, I think this is the only way forward.

We need to go there.

Let's do baby steps.

Let's analyze the modified Rankin in better, smarter, more explicit ways. And please, please, let's not just dichotomize really nice endpoints. Again, I don't think anybody has that weight system.

All right, well, I hope I didn't offend you like I would if we talked about religion or politics. It's a controversial thing, but I think it's a really important thing in clinical trials. I hope you enjoyed the discussion. And until next time, we are In the Interim.
