Enhancing Tax Fraud Detection: An AI-Driven Framework for Operational Implementation

Detecting tax fraud is a big challenge, especially with many governments facing high deficits. Fraud makes up a large part of the tax gap, estimated to be between 4% and 15% of the amounts owed in OECD countries. For example, in France, fraud related to VAT alone is thought to cost between €20 and €25 billion. Because of this, the Cour des Comptes has released several reports calling for stronger action against fraud. In France, the Directorate General of Public Finance (DGFiP) is in charge of monitoring tax fraud and has already started using some artificial intelligence (AI) tools, which have shown good results.

To help address this problem, Christophe Gaie put together a project group made up of students from CentraleSupélec. The group worked on a research study to create an operational framework for tax fraud detection. This framework included different methods, algorithms, computer code, and simulated data. These tools were shared with people involved in the fight against fraud to help them in their efforts.

Objective of the Study

The objective of this project was to build upon theoretical research that has already defined and articulated various concepts, issues, and directions within the field of tax fraud detection. It extends this theoretical foundation by proposing an operational framework that facilitates the development and comparison of algorithms created by researchers worldwide. The focus of the study was on identifying irregularities, specifically fraud committed by individuals, as fraud perpetrated by legal entities is addressed separately.

Database Used in the Study

[Figure: Key metrics in the tax fraud detection study]

A tax file contains lots of information about individuals, like their family status, income, and assets. However, because of privacy rules, getting access to this data can be hard. To solve this problem, the researchers created a synthetic database using selected data like income, spending, property values, and socio-professional categories. This database is designed so that more data can be added to it in the future if needed.

Since the DGFiP can’t provide actual tax data for fraud detection due to confidentiality concerns, researchers had to build their own database. This process took a lot of time and required a good understanding of tax-related issues. Also, since different researchers work with their own unique data sets, comparing their algorithms becomes more difficult.

Fraud Detection Using AI

The AI system used in this study selects tax files for review based on configurable criteria. It draws on knowledge of common fraud patterns to flag three main types of irregularity:

High expenses and/or assets compared to income,
Low expenses and/or assets compared to income,
High wealth compared to others in the same socio-professional group.
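
As a rough illustration, the three patterns above could be expressed as simple rules. This is a minimal sketch, not the study's code: the TaxFile fields, the threshold values, and the use of the category median as the peer benchmark are all assumptions, and for brevity the income comparison uses expenses only.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class TaxFile:
    income: float    # declared annual income
    expenses: float  # estimated annual spending
    assets: float    # estimated wealth
    category: str    # socio-professional category

# Hypothetical thresholds; the study only says the criteria are configurable.
HIGH_SPENDING_RATIO = 1.5
LOW_SPENDING_RATIO = 0.2
WEALTH_VS_PEERS = 5.0

def flag_file(tax_file, peer_median_assets):
    """Return the names of the patterns that a tax file matches."""
    flags = []
    # A file with no declared income is treated as having an infinite ratio.
    if tax_file.income > 0:
        ratio = tax_file.expenses / tax_file.income
    else:
        ratio = float("inf")
    if ratio > HIGH_SPENDING_RATIO:
        flags.append("high_expenses_vs_income")
    elif ratio < LOW_SPENDING_RATIO:
        flags.append("low_expenses_vs_income")
    if tax_file.assets > WEALTH_VS_PEERS * peer_median_assets[tax_file.category]:
        flags.append("high_wealth_vs_peers")
    return flags

files = [
    TaxFile(30_000, 60_000, 100_000, "employee"),
    TaxFile(40_000, 20_000, 150_000, "employee"),
    TaxFile(35_000, 18_000, 2_000_000, "employee"),
]
# Peer benchmark: median assets within each socio-professional category.
peer_median_assets = {
    category: median(f.assets for f in files if f.category == category)
    for category in {f.category for f in files}
}
results = [flag_file(f, peer_median_assets) for f in files]
```

In this toy data the first file is flagged for spending twice its income, the second is not flagged, and the third is flagged for holding far more wealth than its peers.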

The researchers used reference data from INSEE to build the dataset for this study. This included distributions of income, spending, wealth, and socio-professional categories, organized to reflect real-world proportions. Other parameters were based on the Singh-Maddala distribution, which is commonly used to model income distributions.
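
To show how synthetic values can be drawn from a Singh-Maddala distribution, here is a minimal sketch using inverse-CDF sampling. The parameter values (a, b, q) are hypothetical and are not the ones fitted in the study.

```python
import random
from statistics import median

def sample_singh_maddala(a, b, q, rng):
    """Draw one value from a Singh-Maddala (Burr XII) distribution.

    CDF: F(x) = 1 - (1 + (x / b) ** a) ** (-q), which inverts in closed form.
    """
    u = rng.random()  # uniform in [0, 1)
    return b * ((1.0 - u) ** (-1.0 / q) - 1.0) ** (1.0 / a)

def singh_maddala_median(a, b, q):
    """Theoretical median, obtained by setting F(x) = 0.5."""
    return b * (2.0 ** (1.0 / q) - 1.0) ** (1.0 / a)

rng = random.Random(42)  # fixed seed so the synthetic data is reproducible
# Hypothetical shape (a, q) and scale (b) parameters, for illustration only.
A, B, Q = 2.8, 30_000.0, 1.5
incomes = [sample_singh_maddala(A, B, Q, rng) for _ in range(10_000)]

sample_median = median(incomes)
theoretical_median = singh_maddala_median(A, B, Q)
```

Comparing the sample median with the theoretical one is a quick sanity check that the generator behaves as intended.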

The researchers developed algorithms to detect possible fraud cases. These range from neural networks trained with different sampling methods to a random forest, which combines a group of decision trees to solve classification problems.
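
The article does not say which sampling methods were tried. As one assumed example, random undersampling of the majority class is a common way to rebalance data when fraud cases are rare. The sketch below illustrates that idea only and is not the study's pipeline.

```python
import random

def undersample(rows, labels, rng):
    """Keep every minority-class row and an equal-sized random subset of the majority."""
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    if len(fraud) <= len(legit):
        minority, majority = fraud, legit
    else:
        minority, majority = legit, fraud
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)  # avoid presenting one class first during training
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data: 95 legitimate files and 5 fraudulent ones (hypothetical).
rows = [{"file_id": i} for i in range(100)]
labels = [0] * 95 + [1] * 5
balanced_rows, balanced_labels = undersample(rows, labels, random.Random(0))
```

The balanced set contains the 5 fraud cases and 5 randomly chosen legitimate files. The trade-off is that most legitimate examples are discarded, which is one reason several sampling strategies are usually compared.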

Real-World Application of Algorithms

The algorithms developed in this study are not yet being used with real tax data. They are, however, ready for use by public agencies such as DGFiP’s SJCF-1D office, which is in charge of programming and analyzing data for control purposes. One of the students from the project obtained an internship in this office, which may open the way to future collaboration and feedback.

Accuracy of the Algorithms

When it comes to fraud detection, there’s always a balance between precision (how many of the flagged cases are actually fraud) and sensitivity, also called recall (how many fraud cases are actually detected). The performance of the algorithms was measured using the AUPRC (Area Under the Precision-Recall Curve), which summarizes this balance in a single score. The random forest model optimized for sensitivity achieved an AUPRC score of 0.851, showing that AI could be very useful for tax fraud detection.
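
For readers unfamiliar with the metric, the sketch below computes average precision, one common estimator of the AUPRC. The article does not say how the study computed its score, so this is an illustration only. The labels and scores are made up, and tied scores are not handled specially.

```python
def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve.

    labels: 1 for fraud, 0 for legitimate.
    scores: model output, where a higher value means more suspicious.
    """
    total_fraud = sum(labels)
    if total_fraud == 0:
        raise ValueError("at least one fraud case is needed")
    # Review files from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision = true_positives / rank
            # Each detected fraud case raises recall by 1 / total_fraud.
            area += precision / total_fraud
    return area

labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
ap = average_precision(labels, scores)
print(round(ap, 3))  # prints 0.722
```

A model that ranks every fraud case above every legitimate file scores 1.0, while a random ranking scores close to the share of fraud cases in the data.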

Limitations of AI Alone

Even though AI is a powerful tool, it’s not enough to handle tax fraud by itself. Fighting fraud is a team effort that depends on people. Tax auditors play an important role by investigating suspected fraud cases. Their work must follow the law and respect taxpayer rights, which makes their input essential.

AI can assist auditors by identifying the cases that need their review and by helping to assign those cases according to skills, workloads, or the training needs of new agents. In the end, though, the decision rests with the auditor, who may notice factors that AI cannot catch.

For AI systems to work well, they must be part of a larger information system that can handle other administrative tasks. This includes making sure different systems work together, maintaining them over time, and upgrading them with better algorithms when needed.

Conclusion

In conclusion, AI offers promising tools for tax fraud detection, but it can’t replace the human side of the process. Auditors, legal frameworks, and good technology all need to work together for this system to be effective. AI is a step forward, but it’s just one piece of the larger fight against tax fraud.