Tax Notes Talk

How AI Bias Affects State Audit Selection

Tax Notes

Tax Notes reporters Paul Jones and Emily Hollingsworth discuss how bias in artificial intelligence can affect automated systems that select taxpayers for audits and what their investigation of California's and New York’s audit selection processes revealed.

For more, read Jones and Hollingsworth's investigation for free: "Unwatched: How State Audit Selection Systems Bypass Oversight."

Credits
Host: David D. Stewart
Executive Producers: Jeanne Rauch-Zender, Paige Jones
Producer: Jordan Parrish
Audio Editor: Laura Kondourajian

This episode is sponsored by Portugal Pathways. For more information, visit portugalpathways.io.

David D. Stewart: Welcome to the podcast. I'm David Stewart, editor in chief of Tax Notes Today International. This week: AI biases.

Artificial intelligence is becoming ubiquitous. States like California and New York are increasingly turning to AI to help with automated systems, like the ones for selecting tax returns for audit. But what happens when systems aren't properly monitored? What kind of risks are presented by this use of AI?

Tax Notes reporters Paul Jones and Emily Hollingsworth recently investigated the use of AI in California and New York's systems, and they join me now to talk more about what they discovered. Paul, Emily, welcome back to the podcast.

Paul Jones: Thanks, Dave. It's good to be here.

Emily Hollingsworth: Thanks, Dave. Glad to be here.

David D. Stewart: Let's start from where this reporting that you did began. What led you down the road of looking into AI and state systems?

Emily Hollingsworth: We became interested in investigating this after seeing how prevalent AI, and specifically generative AI, was becoming in so many industries. And our thought was tax agencies and departments of revenue must be thinking about how to best pilot or implement generative AI. And we learned about an initiative in California's Department of Tax and Fee Administration that would use generative AI to improve its call center division. But we also learned that there are older forms of AI like machine learning that have been used by tax departments for decades. And these are often called automated decision systems, and they're used directly for tax administration, and Paul can go into more detail about this.

Paul Jones: Yes. I was doing some initial reporting. I spoke with Ryan Minnick, the chief operating officer of the Federation of Tax Administrators, and he mentioned that a lot of agencies had been using machine learning for many, many years. And one of the ways that it was used was to evaluate returns and identify sort of red flags using these automated processes or heuristics. The machine learning model basically is trained to look for certain features that could indicate that a return may be fraudulent or underreport income or that a taxpayer didn't file a return that they should have by cross-referencing data.

And Tax Notes had previously reported on a report by a group of researchers at Stanford, I think in 2023, that had indicated that automated systems used by the IRS had potential issues with bias. In particular, the systems being used to evaluate people claiming refundable tax credits, like the earned income tax credit, were picking disproportionately large numbers of Black taxpayers to audit. And the Government Accountability Office had also looked at the system that the IRS was using, a system that was using machine learning, and determined that there were indications that it could have similar problems with bias due to the way that the machine learning model was trained.

And then, Emily, I think you can also talk about some of the other instances, even outside of the U.S., where some of these models have resulted in serious errors. Not just bias, but actually there was a major scandal, I believe.

Emily Hollingsworth: Absolutely. And that's a fantastic point. The concern that we've been hearing from experts is that without oversight or intervention, these automated systems can make mistakes or perpetuate biases. And as Paul pointed out, there was a scandal out of the Netherlands in which the use of a biased algorithm was responsible for tens of thousands of people who were denied or forced to pay back benefits to which they were entitled. So this was a serious issue.

Paul Jones: Yes. I believe that in the Netherlands scandal, the government actually had to resign over it in the end. So it has serious implications obviously for people, but it's something that government actors, agencies, etc., need to be on top of as well, both to serve the public and to prevent a collapse in confidence in a system.

David D. Stewart: So are we seeing a lot of adoption of AI at the state level?

Emily Hollingsworth: That's a great question. We've certainly seen legislation that seeks to define and provide some safeguards around both generative AI and automated systems.

Paul Jones: Yeah. And in California, for example, the Franchise Tax Board recently adopted and implemented machine learning models within the last couple of years. They basically started utilizing them for purposes of vetting tax returns and helping the agency determine which ones to pursue for audit back in 2021. But agencies have been using automated decision systems using various forms of machine learning, and even earlier sort of rules-based systems for many, many years. Decades in fact.

And the focus on generative AI that has arisen in the last couple of years is sort of underscoring the need not only to focus on how that technology is implemented and used by the government, but also, I think, has drawn more attention to the need to also ensure that some of these older systems are getting the review that they should. Because even the older systems and the less advanced forms of machine learning that have been implemented do have these issues with bias.

I believe the one that was identified as having a problem at the IRS had been in use for many, many years, and it was only on a subsequent review by the Stanford group (and then again, the system was looked at by the Government Accountability Office) that they realized that they had this bias issue with that system. I think it was Brandie Nonnecke, with Americans for Responsible Innovation, that I spoke with, and she'd also testified a few years earlier for a report by California's Little Hoover Commission, warning that these older automated decision systems, even though they're not using the latest iteration of generative AI, large language models, etc., have the ability to make decisions that have really significant and sometimes potentially ruinous impacts on people's lives financially or legally, and they need to be reviewed carefully.

So yes, there are states that are implementing these systems, implementing generative AI to some extent, for example, to help staff answer taxpayers' questions or public questions. But in addition to that, there are older systems that have been around for years that may need additional oversight, and the adoption of generative AI and the concerns about bias with that technology may be spilling back into raising awareness about the need to vet these older systems for it as well.

David D. Stewart: So let's talk a bit about the sort of specific case studies here. We'll start with New York. So how are they using AI these days?

Emily Hollingsworth: The New York State Department of Taxation and Finance is using a system called the Case Identification and Selection System. This is also known through the acronym CISS. It helps increase the effectiveness of processing tax returns and helps flag returns for audit.

Now, the system has two wings. One is for fraud detection, and one is for collections. The fraud detection wing was developed in the early 2000s, according to the department, and the collections wing was created several years later. Now, according to a contract between the tax department and IBM, the company that developed the system, we found that this collection system was described as "a predictive modeling initiative" intended to increase the effectiveness of tax collection efforts. And I should add that this contract is something that we received in our investigation through a state records request.

At the time we published the investigation, CISS was still in use and had proven to be effective, to put it mildly, for the department. By one estimate in 2012, the system helped the state collect more than $2 billion in payments. The reason why we wanted to look more closely at this system was because of a 2025 law that would require state departments to report the AI systems that they use in a publicly posted database that would be maintained by the Office of Information Technology Services.

We've heard questions from state practitioners on whether CISS could be considered an AI system, and we wanted to see if CISS could be a system that would be subject to this new law. The tax department, however, maintains that CISS is not an AI system. Instead, the department says, it uses business rules and data analytics to operate, rather than using forms of AI like machine learning. The department went on to say that there are continual, ongoing evaluations of the system's performance, though there aren't specific reporting requirements, nor are there audits.

But one researcher we spoke with, after reviewing documents associated with CISS, told us that the argument could be made that the system is AI and that it could be an AI system that would be reported under this New York law. Of course, it appears that whether this system would be considered AI depends on who you ask. To that point, other researchers and experts we spoke with argued that regardless of whether CISS is considered AI, any automated system that can aid audit selection should be subject to rigorous third-party oversight to prevent errors or biases from affecting New York taxpayers.

David D. Stewart: So how is California implementing AI?

Paul Jones: California basically began a long-term IT project upgrade some years back called the Enterprise Data to Revenue Project. And this is an effort by the Franchise Tax Board to overhaul its entire sort of information technology architecture, and that included, in its phase two, the implementation of new machine learning models to help the agency identify instances in which taxpayers may be underreporting income, not filing returns that they need to, and also, based on staff commentary from public meeting notes, to go after instances of fraud. The idea being that utilizing more advanced tools, they can trawl through millions and millions of returns and identify patterns that could indicate the need to have staff take a closer look. So basically, the system sorts through the information and flags things, and then that is either further processed by other automated systems or goes to staff for review.

And the agency itself has been pretty happy with the outcome of the implementation of the system, which was put into service in 2021, but the system itself has not really undergone any outside review. The Franchise Tax Board has processes in place to check the output and try to discern if there are indications that it's overselecting from certain populations, things that could indicate bias, but there hasn't been any audit or review of the system by an outside agency. And while reporting, I asked if there had been any reports or documents, sort of formal evaluations that had been provided to lawmakers. The answer to that was also no. And so it basically looks like the system the FTB has implemented, which has the ability to make a significant impact and, to a degree, control the recommendations to agency staff as to which taxpayers to audit, is really only being subjected to internal review by the agency that bought it and is using it.
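
The kind of output check described above, looking at whether a selection system flags some populations at a higher rate than others, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the Franchise Tax Board's actual process, which the episode does not describe in detail. The group labels, the counts, and the function names `selection_rates` and `disparity_ratios` are all hypothetical.

```python
from collections import Counter


def selection_rates(records):
    """Return the share of returns flagged for audit within each group.

    `records` is an iterable of (group, selected) pairs, where `selected`
    is True if the return was flagged. Both fields are hypothetical.
    """
    totals = Counter()
    flagged = Counter()
    for group, selected in records:
        totals[group] += 1
        if selected:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def disparity_ratios(rates, reference):
    """Divide each group's selection rate by the reference group's rate.

    A ratio well above 1.0 means the group is flagged more often than the
    reference group. It is a signal to investigate, not proof of bias.
    """
    reference_rate = rates[reference]
    if reference_rate == 0:
        raise ValueError("reference group has no flagged returns")
    return {group: rate / reference_rate for group, rate in rates.items()}


# Made-up data: 100 returns per group, 10 flagged in group A, 30 in group B.
records = (
    [("A", True)] * 10 + [("A", False)] * 90
    + [("B", True)] * 30 + [("B", False)] * 70
)

rates = selection_rates(records)
ratios = disparity_ratios(rates, reference="A")
print(rates)
print(ratios)
```

In this made-up data, group B is flagged about three times as often as group A. A gap like that would not establish bias by itself, since the groups may differ in ways relevant to compliance, but it is the sort of pattern an internal or outside reviewer would want explained.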

Now, in 2023 California lawmakers passed legislation, Assembly Bill 302, the purpose of which was to ensure that the state would get a sense as to what automated decision systems were being used, specifically high-risk automated decision systems, those that could have a potentially significant impact on someone from, for example, a legal standpoint. And the California Department of Technology was tasked with gathering this information and creating this list of automated decision systems. Again, high-risk automated decision systems. And when the final report was produced (it was assembled, I think, in 2024, and the final copy was produced in 2025), there wasn't a single agency that had reported to the CDT [California Department of Technology] that it was using a high-risk system, and that included the Franchise Tax Board, which by that point was using these models to try to determine which taxpayers it should audit and review and scrutinize. And a lot of the experts that Emily and I spoke with indicated that that was not something that they considered to be particularly credible and indicated that the law that had been created by lawmakers for the purpose of trying to get a handle on the automated decision systems the state was using was inadequate to the task.

David D. Stewart: What sort of mistakes have we seen from these automated systems?

Emily Hollingsworth: Well, while we weren't able to find conclusive examples at the state level, we had seen the ones we had mentioned before, including the scandal out of the Netherlands that had affected tens of thousands of people and had resulted in multiple leaders in government resigning over the incident. We also spoke to one of the authors of the 2023 Stanford review that had evaluated the IRS's audit selection criteria. His name is Jacob Goldin, and he highlighted how disruptive, at best, and harmful, at worst, tax audits can be. They can potentially dissuade taxpayers who qualify for benefits from claiming them in future years, and that's not to mention other downsides, including stress and monetary costs.

Paul Jones: Yeah. The burden on taxpayers is particularly high, and some of the experts we spoke with also noted one of the effects of these more efficient systems for identifying potential fraud. And indeed, when I looked through some of the transcripts and the meeting notes of the Franchise Tax Board, staff was lauding the fact that this new system was making them much more efficient and much faster at being able to target and identify people to pursue.

On one hand, the agency gets better at reaching out to people and saying, "We think you need to file a return. We think that you may have more liability than you reported." But for the taxpayers, the systems to challenge that or to say, "No, that's not correct," are not necessarily as efficient. So there is sort of an asymmetry there potentially in some instances where these systems enhance the ability of the agency to reach out to taxpayers, but they don't necessarily have the same upgraded system on the other side.

And this was a problem one person that we spoke with noted with the IRS. They don't necessarily have the systems set up for the taxpayer to then as efficiently be able to say, "Well, I don't think that's correct." So even without there necessarily being an increase in mistakes or errors, you just end up having a much faster generation of audit letters by the tax agency than they're prepared to fairly or properly handle. At least, that was one of the complaints we heard about the IRS.

I do also want to point out that in addition to instances where there have been positively identified problems, like the apparent racial bias with the IRS's system and the scandal in the Netherlands, the other issue here is that these sorts of problems with bias can be somewhat insidious and difficult to detect. So if you don't have lots of oversight, if you don't have people proactively evaluating these systems, some of these issues with bias can sort of hang around and not be noticed for some time. So it's not as simple a matter as seeing that there are problems. You actually have to go looking for them. And that's one of the reasons why some of the people we spoke with, like Hayley Tsukayama with the Electronic Frontier Foundation, said it was important for there to be oversight. Because the systems look for what they're told to, and if there are biases in the metrics, if there are heuristic biases that the systems learn from looking at historic patterns of audits and the like, those are not necessarily something that people see immediately.

And in addition to that, the agencies that are utilizing these tools, even if they make a reasonably good-faith effort to try and check this, they can benefit from outside oversight by lawmakers, by another agency, by a state auditor, for example. And in addition to that, if you work at an agency that's put a bunch of time and money into creating one of these systems and it's an essential part of your workflow and you're very dependent on it, there may be a sort of institutional bias against finding problems with your system that you're using. That was one of the things Tsukayama mentioned to us.

David D. Stewart: Now you both mentioned that California and New York have some laws on the books about the use of AI. Did you hear from folks you talked to about the shortcomings of those laws?

Emily Hollingsworth: Some of the sources who we spoke with about New York's law questioned why the bill didn't go further. It does require that state agencies report that they use AI systems, and of course there's that transparency aspect of having it be in a publicly available database. However, there doesn't appear to be any further movement to regulate or oversee these systems. And that was something that the New York State comptroller, Thomas DiNapoli, had mentioned in an audit conducted in 2025, where he called for the state to have more rigorous oversight of the systems that state agencies are using that use AI for everything from facial recognition at the DMV to other forms.

Paul Jones: And in California, the 2023 law was passed to require state agencies to work with the California Department of Technology and identify high-risk systems that they were using, meaning high-risk automated decision systems that played an important role in shaping or making decisions that could have a legal or similar impact on people's lives. When that report came out in early 2025 and not a single agency had identified that they used such a system, a lot of the people we spoke with said that that was, again, not a credible outcome.

And in fact, one lawmaker, Senator Chris Cabaldon (D), was, at least the last time I checked a few weeks back, still backing legislation to extend the mandate of A.B. 302 with sort of a hope that there might be, I don't know what the proper term would be, a review or a revisiting of the declarations by agencies that they were not using high-risk systems. Of course, potentially that would implicate the Franchise Tax Board as well.

There have been other legislative efforts in California as well. One lawmaker, Senator Steve Padilla (D), last year backed legislation, which I believe is now inactive, Senate Bill 420, that sought to require greater review and oversight of automated decision systems used by state agencies. That was actually something that would've required even more transparency than A.B. 302. And Senator Cabaldon is currently backing a bill, Senate Bill 1248, which, similar to [S.B.] 420, also seeks as one of its elements to require more transparency, more information about the use of automated decision systems, including to ensure that they cannot be used as the sole basis for making an adverse decision, such as denying a person a benefit, and requiring a human review for any such significant decisions. That bill is still in progress and is actually scheduled for a hearing.

But yes, in California, there is a continued interest, and I'm glad to see that lawmakers are not just focusing on generative AI systems and regulating that, but also looking at the use by state agencies of automated decision systems and machine learning.

David D. Stewart: So what sort of things did you hear about AI use in states other than California and New York?

Emily Hollingsworth: Well, an interesting component of this investigation is that there doesn't appear to be a uniform definition of what constitutes AI or an automated system across states. Some departments of revenue may be using AI without even knowing it.

So for this investigation, Paul and I contacted 49 state and jurisdictional tax agencies in 2025 to figure out just how many agencies were using AI. We sent a mountain of emails and used a spreadsheet to track our progress. Of the 36 agencies that responded, 22 said they didn't use AI at all in their tax operations. Seven agencies said that they did use AI programs, ranging from chatbots to machine learning models. Six agencies indicated that they were interested in looking into the use of AI or machine learning systems. And this includes some of the agencies that said that they currently didn't use AI. Three agencies, because of their state laws, didn't disclose whether they used AI or not.

Paul Jones: And just to emphasize what Emily noted, this is after we had spoken with people like the people at the Federation of Tax Administrators, who told us that state agencies have been using AI for decades. And this is something I spoke about, again, with Brandie Nonnecke: One of the challenges here is that state agency staff may not necessarily be familiar with what would count as AI, and as a result, there may not be the same degree of focus or emphasis on checking the behavior and the performance of some of those systems. And there is also an issue with transparency there. So there can be issues for groups like the Electronic Frontier Foundation, which would want to take a look at how automated systems and forms of AI are being used, when agencies aren't even correctly identifying systems that would qualify as AI.

And of course, as we saw in New York and California, disagreement over whether a system is a certain type of AI or AI at all or high risk also has implications for the degree of transparency that agencies might provide lawmakers or another state agency that's trying to get a handle on the use of those systems. So looking more broadly at other states, there is, as Emily said, sort of a lack of consistency that's a bit concerning in terms of how these things are defined and tracked even by people at the agencies that are using them.

David D. Stewart: So having gone deep into this subject, what is the takeaway that both of you are coming away from this investigation with?

Emily Hollingsworth: Well, I found a comment made by the Stanford review author who we mentioned before, Jacob Goldin. I thought he made an interesting point about these systems. He sees their adoption in state agencies, particularly the newer iterations, as a positive thing. He sees them as increasing efficiency and making the process smoother for taxpayers and auditors. But he believes that departments of revenue should establish clear and intentional goals to avoid disproportionately selecting lower income taxpayers for audit. And according to Goldin, machine learning systems can inadvertently carry out inequitable practices.

So essentially, Goldin is suggesting that departments of revenue do what they do already, which is evaluate their processes and systems and carry out new policies and oversight requirements that best protect taxpayers and their information.

Paul Jones: Yeah. And I think it's really great that Emily went back to that comment by Goldin, because Goldin also was a big fan of machine learning. He was with the Stanford group that reviewed the system, and at the same time when we spoke with him, he said that one important piece of perspective to bear in mind is that even when you simply had human beings using criteria and going through sort of a random sample maybe of tax returns to determine which ones to audit, you still had problems with mistakes and bias, but that machine learning is actually potentially a great tool for addressing that.

I think what our project indicates, simply, is that it's important for states, government agencies, and of course, this would translate over to private entities as well, to just bear in mind that when you automate decisions, that doesn't mean that you are allowed to simply let the system run on autopilot.

I did want to point out, and I think I mentioned it before, but I should stress it, the Franchise Tax Board in California did have a process in place to review its system and whether it was picking disproportionately from various groups. The real issue is simply that it was very difficult for anyone else to take a look at how well that system was working, and the lawmakers in the state didn't seem to have gotten any reports. The state auditor hadn't taken a look at it. At least not yet. And so in order to build trust in those systems, even if they're working well, there needs to be a guarantee, an assurance that there is proper oversight occurring. I think that is one of the takeaways that I would hope people would come away with.

David D. Stewart: Well, this is definitely going to be an issue that we're tracking for the foreseeable future. Emily, Paul, thank you so much for being here.

Emily Hollingsworth: Thanks, Dave.

Paul Jones: Thanks, David.

David D. Stewart: That's it for this week. You can follow me online at @TaxStew, that's S-T-E-W, and be sure to follow @TaxNotes for all things tax. If you have any comments, questions, or suggestions for a future episode, you can email us at podcast@taxanalysts.org. And as always, if you like what we're doing here, please leave a rating or review wherever you download this podcast. We'll be back next week with another episode of Tax Notes Talk.

Anthony Zoppo: Tax Notes Talk is a production of Tax Notes. You can learn more about us by visiting www.taxnotes.com/podcast. When major media wants the straight story, they turn to Tax Notes. Thank you for listening, and join us again for another edition of Tax Notes Talk.

Tax Analysts Inc. does not provide tax advice or tax preparation services. The information you have seen and heard today represents the views of the presenters, which may not be the same as those of Tax Analysts Inc. It may include information obtained from third parties, and Tax Analysts Inc. makes no warranties or representations of any kind and is not responsible for any inaccuracies. Nothing in the podcast constitutes legal, accounting, or tax advice. The tax laws change frequently, and neither Tax Analysts Inc. nor the presenters can guarantee that any information seen or heard is accurate. Also, due to changing tax laws, any information broadcast or downloaded after its original air date may no longer represent the current views of the presenters. If you have any specific questions about any legal or tax matter, you should always consult with your attorney or tax professional.

All content in this broadcast is protected under U.S. and international laws. Copyright © 2026 Tax Analysts Inc. Unauthorized recording, downloading, copying, retransmitting, or distributing of any part of the podcast is strictly prohibited. All rights reserved.
