Applying data science principles to FDA’s PreCert Program: Some scary stuff

Epstein Becker & Green's Bradley Merrill Thompson explains why the FDA PreCert Program's built-in subjectivity and bias toward established entities should worry the healthcare and medical technology industry.

By Bradley Merrill Thompson

November 01, 2019

02:08 pm

About the Author: Bradley Merrill Thompson is a member of the firm at Epstein Becker & Green, P.C. There, he counsels medical device, drug, and combination product companies on a wide range of FDA regulatory, reimbursement and clinical trial issues. The opinions in this piece are Thompson's and don't necessarily reflect the opinions of MobiHealthNews or HIMSS.

Let’s say I could show you, through reliable data that is statistically significant, that children from wealthy families on average make much better employees in very meaningful ways. Do you think companies ought to adopt policies — perhaps embedded in algorithms — that favor hiring from wealthy families? That sounds pretty un-American.

So why is everyone so excited about creating an easier pathway through FDA for products that come from certain incumbent companies with a pedigree showing they have been more successful in the past? In a nutshell, that’s FDA’s PreCert Program, presently under development.

As Dr. Amy Abernethy, FDA’s principal deputy commissioner and acting CIO, recently explained, the agency has a rather broad plan, spanning its recent work in digital health and beyond, to advance and indeed lead the use of data science in healthcare through these regulatory reforms. This past summer, FDA released a midyear report describing the agency’s progress in developing the PreCert Program for regulating “software as a medical device” (SaMD). Presumably toward the end of this year, FDA will release a final assessment of the year’s pilot work, along with its proposals for the future. Based on Abernethy’s comments, FDA has very ambitious plans for the PreCert Program.

But not everyone is enamored with what FDA is doing. On October 30, 2019, three US senators, including presidential candidate Elizabeth Warren, wrote FDA a follow-up letter (to their 2018 letter on the same topic) expressing concern over how effectively the PreCert Program would protect patients, whether FDA had legal authority for the program, and how exactly FDA planned to use the real-world evidence collected through it.

So this seems like an appropriate time to discuss how stakeholders should evaluate the desirability of the PreCert Program. In this article, taking the same data science perspective that FDA champions, I review some of the key criteria stakeholders should apply when they see the next iteration of the planned program.

Summary of FDA’s PreCert Program

The agency has been developing, and then piloting, the FDA Precertification Program for software since 2017. FDA’s most recent thinking on the program is reflected in a 58-page “working model” released in January 2019. But the program is still far from done. Throughout 2019, FDA has been testing some of its elements on a few volunteer companies to see whether the program’s features are administratively feasible. Presumably once this year’s testing is complete, the agency will announce its future plans for the program.

At its heart, the program is about shifting regulatory focus from the software product to the company that develops it. Today, a software developer that wants FDA marketing authorization typically proceeds in two steps:

1. Voluntarily, under the so-called Q-Submission Program, engaging with FDA in a presubmission meeting to get feedback on what kinds of evidence the agency wants to see in a premarket submission; and

2. Filing the premarket submission itself.

Instead of those two steps, FDA would move to a four-step process that begins with a focus on the company. The four steps are:

1. An “excellence appraisal” to determine whether the company has a strong enough culture of quality to be admitted to the program.

2. A “review pathway determination” to decide which regulatory process, and what kinds of evidence, a particular software submission requires. FDA proposes that this meeting be legally voluntary, but it is not clear how the program would work without it.

3. The actual premarket review of the submission.

4. Ongoing collection and reporting to FDA of postmarket performance.

So basically steps one and four are added to the existing process, but the expectation is that the actual review — new step three — would be faster than the old step two because of the prior excellence appraisal and review pathway determination.

The Good in the PreCert Program

FDA is thinking, in very creative ways, about how the agency will handle the significant future growth in the development of SaMD-type products. Creativity is often difficult for a government agency simply because of law and politics, so FDA’s courage and vision in proactively pursuing novel approaches is frankly wonderful.

Further, I love the idea of using data science, including algorithms, wherever they can be useful. In fact, this fall I enrolled as a graduate student in the Master of Applied Data Science program at the University of Michigan (at age 58, I may well be one of the oldest graduate students there). There is no question that sophisticated data collection and analysis techniques can be useful for FDA compliance issues.

So What’s the Worry?

While there is much to be praised in FDA’s work on the PreCert Program, there are fundamental issues that stakeholders must assess when deciding whether to support the ultimate program. The biggest issue is how FDA will make its decisions.

The subjectivity of the decisions to be made

In the PreCert Program, FDA will make two decisions it does not make today. The first is whether to allow a company into the program (through the excellence appraisal I describe as step one above). The second is an ongoing decision whether products marketed through the program may remain on the market, based on real-world data collected postmarket (in what I describe as step four above).

1. The excellence appraisal

FDA makes lots of decisions regarding companies that want to market a medical device, and frankly every decision has an element of subjectivity. Today, before a company can first market a device, FDA in many cases must find either that a class II product is “substantially equivalent” or that a class III product is “safe and effective.” While both decisions involve some subjectivity, they are also primarily data-driven assessments against a defined standard: either functional equivalence to existing, similar products on the market in intended use, design and performance, or a weighing of demonstrated benefit against demonstrated risk of harm. Reasonable minds do differ, but at least there is some objective basis for making the decision.

Now imagine that the decision-making issue is not whether a product is safe and effective, but whether a company has a “Culture of Quality and Organizational Excellence” (CQOE). If you think “safe and effective” is a tad squishy, you ain’t seen nothing yet. The fundamental characteristic being evaluated here is a company’s “culture.” That should be a huge red flag regarding the degree of subjectivity that inherently must exist in this process.

Under the skeletal outline of the PreCert Program, FDA says that in the excellence appraisal it will decide whether a particular company has a CQOE through a data-driven assessment of five principles, including, for example, whether the company has a “proactive culture.” To do that, while the agency says it will be flexible and not apply one approach to all companies, it will generally look at almost 150 categories of data. Here, for example, are the questions through which FDA plans to collect data related to a proactive culture:

[Image: FDA’s questions for collecting data on a proactive culture]

As FDA suggested in October 2017, the data would be collected through a variety of mechanisms, including employee surveys. Notice that the questions are not simple yes-or-no questions, or even multiple choice. Indeed, many of them begin with the word “how.” So we are talking about unstructured data. And the information collected on these 150 topics will be the basis for FDA’s decision.

I’m not debating whether culture is in fact important. It is. What I’m raising is concern over just how objectively, and for that matter how capably, FDA can assess a company’s culture. Assessing culture is not the kind of scientific endeavor that FDA’s scientists, engineers and clinicians are trained for. It is the domain of business people and business schools, and culture is amorphous, defying any exact quantitative assessment.

Further, FDA will be drowning in data: roughly 150 different data streams. Distilling all of that data into a single binary decision (yes CQOE or no CQOE), whether by a human or a formula, means grading and then weighting each of the 150 factors, and each of those grades and weights is a subjective judgment.
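To make the point about grading and weighting concrete, here is a minimal and entirely hypothetical sketch of a weighted-sum score collapsed into a yes/no decision. FDA has not published any scoring formula: the factor labels merely borrow the names of FDA’s five excellence principles, and the grades, the two weight sets and the 0.7 cutoff are all invented for illustration. The sketch shows that the very same grades can yield opposite outcomes depending only on which of two equally defensible weightings a reviewer picks.

```python
# Hypothetical illustration only: FDA has not published a scoring formula.
# Grades (0 to 1) for one imaginary company; every number here is invented.
grades = {
    "patient_safety": 0.9,
    "product_quality": 0.8,
    "clinical_responsibility": 0.5,
    "cybersecurity_responsibility": 0.5,
    "proactive_culture": 0.9,
}

# Two weight sets a reviewer could plausibly defend; each sums to 1.0.
weights_a = {
    "patient_safety": 0.3,
    "product_quality": 0.3,
    "clinical_responsibility": 0.1,
    "cybersecurity_responsibility": 0.1,
    "proactive_culture": 0.2,
}
weights_b = {
    "patient_safety": 0.2,
    "product_quality": 0.1,
    "clinical_responsibility": 0.3,
    "cybersecurity_responsibility": 0.3,
    "proactive_culture": 0.1,
}

THRESHOLD = 0.7  # hypothetical cutoff for a "yes CQOE" decision


def weighted_score(grades, weights):
    """Combine per-factor grades into one number via a weighted sum."""
    return sum(grades[factor] * weights[factor] for factor in grades)


def has_cqoe(grades, weights, threshold=THRESHOLD):
    """Collapse the weighted score into the binary yes/no decision."""
    return weighted_score(grades, weights) >= threshold


for name, weights in [("reviewer A", weights_a), ("reviewer B", weights_b)]:
    score = weighted_score(grades, weights)
    print(f"{name}: score={score:.2f}, CQOE={has_cqoe(grades, weights)}")
```

Under these made-up numbers, reviewer A’s weights give a score of about 0.79 (yes), while reviewer B’s give about 0.65 (no), for the same company and the same data. A real model with roughly 150 factors would have far more such choices to make.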

2. The decision to permit a company in the program to continue to sell a product

Once a company is in the PreCert Program, FDA’s oversight really ramps up. FDA essentially wants to become the company’s partner, sharing in all of the product performance data in order to continuously evaluate whether the product ought to remain on the market. And FDA suggests that much of that data should be public so that other stakeholders can do likewise.

Now at least this issue (whether medical devices are performing safely and effectively) is a bit more objective than assessing a company’s culture. In some ways it is more akin to the decisions FDA currently makes to allow products onto the market. But it will be made on the basis of a very different set of data than today’s premarket clinical trials and bench tests.

So far, FDA is not proposing that companies conduct postmarket clinical trials. Rather, the agency suggests that companies continuously collect “real-world performance analytics.” (I think FDA actually made up that new term to avoid some of the statutory baggage that comes with real-world data and real-world evidence.) Unless FDA has something very expensive in mind for how the data are collected and the analytics performed, these data will be much dirtier than premarket clinical trial data. I’m using the word “dirtier” in its technical sense: real-world data are typically less controlled, less complete and less consistently recorded. Real-world evidence does have great value, and its role in regulatory oversight should grow, but only with the real-world understanding that such data will include a whole lot more static than current premarket data. So the data will always require more judgment to discern signal from noise.

The process for making the decision

When talking about a new governmental program, it is typically helpful to talk separately about the criteria used for the decision and the process for making the decision. So having talked about the criteria, now I’d like to turn my attention to the process.

1. The difference between a government regulatory decision and a business decision

I’m actually not paranoid. I don’t distrust everything the government does. I am not from Montana. (Just kidding: I love Montanans.)

But I am paid to worry about situations where the government gets too much power. Because power can be abused, even by the well-meaning.

Giving a federal agency the ability to make important decisions based on substantially subjective criteria is simply a recipe for disaster. Subjectivity breeds inconsistency and unfairness.

It breeds inconsistency, in part, because a federal agency is made up of lots of different people, and different people will make subjective decisions in different ways. When those decisions concern different citizens, those citizens will be treated in substantially different ways in cases where they should be treated the same.

Subjectivity also breeds unfairness because it allows the decision-maker to consider factors that should not be considered. All of a sudden the federal official’s general feelings or views of a particular person or company can start to influence the decisions they make.

The bottom line is, we are supposed to be governed by the rule of law, not the rule of people. Subjectivity replaces law with people. And for a criminal statute like the Federal Food, Drug, and Cosmetic Act, that’s unacceptable.

2. The process of making a data-driven decision by humans and software

It seems almost certain that some of the government’s decision-making will need to be automated. Let’s assume for the moment that, from a congressional appropriations standpoint, a huge influx of additional FDA staff is simply not in the cards. It is humanly impossible for FDA to manually review all this data for every initial excellence appraisal, and then to review the constant stream of data that follows for marketed products in the program. It would be like Fitbit trying to hire people to manually read all of the Fitbit data sent to the cloud. FDA will have to automate the review, using an algorithm to combine 150 data streams and produce a recommendation.

At least so far our government does not convict people of a crime based on the recommendation of an algorithm. It is one thing to use data science and machine learning techniques to inform corporate decision-making, but it’s an entirely different proposition to use such techniques as a part of our justice system to make governmental adjudications of individual rights. The level of confidence in the algorithm would have to be extraordinarily high, and frankly a mechanized judicial decision would raise all sorts of questions about fundamental fairness, due process and the right to a jury of one’s peers. (And yes, before the comments start to fly, I am aware that the US legal system is toying around with AI, for example, in recommending sentences for judges to consider imposing. But even that is far from accepted practice presently.)

If you are a data science professional, you know just how fickle algorithms can be. Sometimes they work great. Sometimes they spew out crazy stuff. On average they can be pretty good but not always great. Sometimes small changes in data produce large unanticipated changes in the outcome. As a student studying data science, sometimes I’m amazed at how fickle the auto-grader is. It’s bad enough when I get a grade I don’t like, but I can’t imagine using an auto-grader to make government regulatory decisions that have an enormous impact as outlined in the next section.

For the record, I’m a big fan of using data-driven metrics in corporate decision-making. A little over 10 years ago, I spent almost an entire year and hundreds of thousands of dollars (from my law firm) developing an algorithm for assessing a company’s FDA compliance. My idea was that corporate decision-makers would use the algorithm to assess the state of compliance at their company and make improvements as warranted. (It never saw the light of day because I changed law firms, but that’s another story.)

In a business setting, corporate officials have the ability to use the data as well as their judgment to make decisions. That’s as it should be. And if they make bad decisions, their boss can fire them. It’s that simple.

But government decision-making in the context of a criminal statute is not the same. Even if we had more confidence in the quality of the decisions the software recommends, we are simply not at the point, either from a justice standpoint or from a public health standpoint, where we are ready to hand an algorithm the important task of evaluating either whether a company should participate in the PreCert Program or whether products should remain on the market once they are part of the program.

In those cases the result of the algorithm is almost sure to be arbitrary. I don’t want to prejudge the science: I haven’t seen the exact data streams in final form nor have I seen the algorithm that FDA plans to use. But just think of the methodology FDA will have to use to validate the algorithm.

FDA is going to have to supply a ground truth. What will that ground truth be? What will be the standard for an organization that has a CQOE? As I understand it from talking to people at FDA, the vast majority of medical device companies do not currently meet that CQOE standard. It’s aspirational. Indeed, FDA is pursuing PreCert in order to encourage companies to improve their cultures of quality more aggressively. So there is no existing database of companies that have a CQOE. From a more technical standpoint, then, how exactly is FDA going to design, build or train this algorithm? Just how much agreement will there be on ground truth when there are presently no data on companies with a CQOE, because almost no company presently meets the standard?

And once the algorithm is completed, won’t companies, just as with search engine optimization, get smarter about how to secure a good score by supplying the right data? And will a good score really mean that they substantively have a CQOE? Or will FDA need to hide the algorithm from public scrutiny, much as Google constantly changes and hides its search criteria to keep website developers from gaming its algorithm?

The consequences of a bad decision

I keep hearing people at FDA justify pursuing the PreCert Program by saying it’s only voluntary, so no one should get uptight. We don’t need statutory authority beyond what we already have, they say.

I also hear comparisons between the PreCert Program and the TSA PreCheck program at airports. TSA PreCheck is frankly a favorite of many in industry who travel a lot, so the comparison draws a warm and fuzzy feeling. But that’s a lot of crap.

To make the comparison an apt one, we would have to change the facts about TSA PreCheck radically. To make the two even remotely comparable, we would have to change the airlines’ business model so that air travel is first-come, first-served at the airplane: your ability to fly that day depends on how quickly you get to the plane. Then we would have to say that getting through the TSA PreCheck lines is not just five to 10 minutes faster than the regular line, but potentially hours or days faster. Finally, we would have to say that entrance into TSA PreCheck will be based on 150 different factors, each subjectively assessed and then combined. How happy do you think people will be who want to travel but are denied access to the TSA PreCheck lines based on subjective criteria?

It’s competition that changes everything. Those who are first to market will have a substantial advantage over those who arrive later. So if FDA does what it says it wants to do, which is to significantly shorten the time it takes to get to market, the companies FDA deems to have a CQOE, and therefore entitled to use the PreCert line, will have a substantial commercial advantage over those that don’t qualify. That will produce winners and losers, with tremendous economic and public health implications for stakeholders.

Bottom line: based on FDA’s current proposal, the Excellence Appraisal process and subsequent product lifecycle decisions will either be highly subjective human decisions, or highly arbitrary mechanical decisions, and in either case companies will not be happy. When FDA releases its final PreCert proposal, it’s going to need to explain how it can reliably and fairly make excellence appraisal decisions and continued marketing decisions without undue subjectivity or arbitrariness. Based on what I know so far, color me skeptical.

Additional data needed to review the PreCert Program final proposal

Fairness of the excellence appraisal is only one of the questions where we will need data to assess whether the final proposal is desirable or not. Here are a few other areas where we will need data to assess the program:

Does it ensure safety and effectiveness?

From a patient viewpoint, which is really the viewpoint the three senators took, will the PreCert process produce the same level of assurance? I’m not sure how FDA plans to prove this one. It may be that FDA plans to use a surrogate endpoint, i.e., whether the PreCert process and the traditional process reach the same administrative decision in some retrospective or prospective comparison. But there is certainly no suggestion that FDA will conduct enough reviews to yield statistically significant results, or that it is taking other measures to assure the integrity of the data, such as random sampling and blinding. It all seems very anecdotal and, in a statistical sense, biased. Or maybe FDA will simply argue rhetorically that it will ultimately look at all the same data it otherwise would, just packaged and timed differently. But then I’m not exactly sure what the benefit of the program is.
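To illustrate why the number of comparison reviews matters, here is a minimal sketch that computes a 95% Wilson score confidence interval for an agreement rate, meaning the share of submissions where the PreCert pathway and the traditional pathway reached the same decision. The pilot sizes below are hypothetical; FDA has not reported such figures, and a real evaluation would also need the random sampling and blinding mentioned above. The Wilson interval is a standard choice for a binomial proportion, used here because it behaves sensibly even when every comparison agrees.

```python
import math


def wilson_interval(agreements, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    agreements: comparisons in which both pathways reached the same decision
    n: total number of comparisons
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = agreements / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Hypothetical pilot sizes, each with perfect agreement.
for n in (5, 20, 200):
    low, high = wilson_interval(n, n)
    print(f"{n}/{n} agree -> true agreement rate plausibly {low:.0%} to {high:.0%}")
```

Under these assumptions, five agreements out of five are still consistent with a true agreement rate as low as about 57%; it takes something like 200 out of 200 to push the lower bound to about 98%.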

Is PreCert quicker than the status quo?

From a patient and industry standpoint, will the PreCert Program be quicker, start to finish, than the regulatory procedures it would replace? The answer is not at all intuitively obvious; in fact, the description FDA published in January suggests that the new process could end up taking longer, certainly at first. Speed is a function of two things: the actual FDA review process and the time it takes to develop the data that FDA requires.

1. FDA seems to be taking many of the existing data requirements, spreading them across a three-part assessment process, and then adding extra data it did not previously collect. How is that faster?

2. There are also broader implications for the software development process and for the amount of evidence that needs to be developed. We have no data on that, and I expect we still won’t by the end of this year.

Will the PreCert Program be less burdensome?

Beyond time, burden means the amount of work required to complete the process. That includes not only gathering the data and other information to be submitted, but also preparing for inspections as part of the excellence appraisal and carrying out postmarket data collection.

Will the PreCert Program protect confidential commercial information?

Companies should not be satisfied with a general assertion that it will, without some specific definition of which information will be treated as confidential and which will not. That evaluation is very much a subjective one. And FDA is talking out of both sides of its mouth in an effort to sell this program. For example, in FDA’s midyear report discussing the Excellence Appraisals, FDA seems to be boasting that the Excellence Appraisals are both “confidential, and transparent,” making opposing claims one word apart, without any hint of irony. That is a very neat trick.

I will not be at all surprised if at the end of the year FDA declares that the PreCert Program is administratively feasible and thus a success. Frankly I’m already confident that FDA will find a way to make it administratively feasible. And that’s really mostly an issue for FDA.

What I do not know is how society in general — and all the other stakeholders — will get answers to the questions presented in this article through data so that we can decide whether the program is worth pursuing through the necessary legislative channels.

In conclusion

One of the fundamental tensions here comes from moving from a system where the specific technology is the focus of the review to one where it matters more who is pursuing the technology.

From a policy standpoint, I have a problem with that. I spent the entire spring of this year traveling to eight major universities, talking to grad students and recent alumni developing artificial intelligence-based products for healthcare. I was part of a group trying to teach them FDA’s requirements in order to encourage their innovation. An FDA regulatory system that requires a track record to get to market quickly will unavoidably disadvantage these entrepreneurs. And such entrepreneurs have been the lifeblood of medical innovation in the past.

Large, established companies will benefit from the PreCert Program because they are more likely to achieve precertification. But I hope many of them will take the long view and recognize that their own future depends on a vibrant startup environment. And I fear that’s what’s at stake in the PreCert Program.
