Artificial intelligence and machine learning-driven codification of entities and goods descriptions: transforming risk management in Indian Customs

27 October 2025

By Sruti Vijayakumar and Shivam Dhamanikar, Central Board of Indirect Taxes and Customs, India

Indian Customs has made significant strides in building systemic and real-time targeting capabilities through the development of an integrated risk management system powered by artificial intelligence and machine learning. One of the cornerstones of this system is the codification of supplier details and goods descriptions. This article explains how the codification works and how it is used to analyse supply chain networks, auto-generate risk insights and detect valuation anomalies.

Entity codification: the backbone of automated risk targeting

The foundation of automated risk assessment and targeting in Customs lies in the availability of codified, machine-readable data. Codifying all entities within the supply chain enables Customs administrations to address both revenue and non-revenue risks more effectively, while also establishing robust networks for comprehensive risk analysis. While supply chain entities such as importers and Customs brokers are already assigned unique codes, overseas supplier information is traditionally captured in free-text formats within import declarations. This unstructured data presents a major challenge to automated risk analysis, particularly when the same supplier provides goods to multiple importers across the country.

Assigning unique codes to overseas suppliers

To address this issue, Indian Customs decided to assign a unique identifier to suppliers based on their business names and addresses as submitted in the import declaration. An unsupervised machine learning model was developed to carry out this task. First, supplier names and addresses extracted from import declarations were cleaned and standardized using natural language processing and text analysis techniques. String-matching algorithms were then applied to assess the similarity between supplier entries, followed by clustering algorithms that group entities whose match strength exceeds predefined thresholds. To measure how close a data point is to a cluster, string similarity measures such as Jaro-Winkler similarity and Levenshtein distance were employed.
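The matching and clustering steps described above can be illustrated with a minimal sketch. It is not the model used by Indian Customs, whose details the article does not disclose. It assumes a simple normalization step, Levenshtein-based similarity only (Jaro-Winkler is omitted for brevity), greedy threshold clustering, and a hypothetical code format; the sample records, the 0.85 threshold and all function names are illustrative assumptions.

```python
import re

# Hypothetical list of legal suffixes dropped during standardization.
LEGAL_SUFFIXES = {"ltd", "limited", "co", "company", "inc", "corp", "corporation"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(tok for tok in text.split() if tok not in LEGAL_SUFFIXES)


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Scale the edit distance to a 0..1 similarity score."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def assign_supplier_codes(records, threshold=0.85, prefix="S"):
    """Greedy clustering: a record joins the first cluster whose
    representative is at least `threshold` similar, else starts a new one."""
    representatives = []  # (normalized text, code) of each cluster's first member
    codes = []
    for name, address, country in records:
        key = normalize(f"{name} {address} {country}")
        for rep, code in representatives:
            if similarity(key, rep) >= threshold:
                codes.append(code)
                break
        else:
            code = f"{prefix}{len(representatives) + 1:04d}"
            representatives.append((key, code))
            codes.append(code)
    return codes


# Illustrative records only; these are not real declaration data.
sample_records = [
    ("Acme Auto Parts Co., Ltd.", "12 Harbour Road, Shenzhen", "CN"),
    ("ACME AUTO PARTS CO LTD", "12, Harbour Rd Shenzhen", "CN"),
    ("Borealis Textiles GmbH", "4 Lindenstrasse, Hamburg", "DE"),
]
print(assign_supplier_codes(sample_records))
```

A production system would likely compare each record against more than one cluster member and tune the threshold on labelled pairs; the greedy pass here is chosen only because it is short and easy to follow.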

The outcome was a supplier code – a unique identifier assigned based on key attributes such as the name, address and country of the supplier. For instance, the variations in supplier details listed below – all referring to the same supplier – are consolidated under a single code: A88000001.

This codification allows the Customs risk management system to treat all these variants as a single supplier, enabling more accurate targeting and supply chain network analysis.

Codification of item descriptions for addressing revenue risks

Building on supplier codification, description codification plays a critical role in detecting revenue-related risks, such as misclassification and undervaluation of goods. Item descriptions, as declared in invoices and import declarations, are often unstructured and inconsistent – even when the same supplier ships identical goods.

To address this, Indian Customs has implemented description codification – an unsupervised machine learning model which assigns a standardized “description ID” to commodities supplied by each codified supplier. The description codification process involves several machine learning-driven steps that convert unstructured item descriptions into standardized data. Item descriptions, which are in free-text format, are extracted from import declarations. Natural language processing is used to clean and standardize the text by removing irrelevant details such as punctuation or stop words. Similar descriptions are grouped using clustering algorithms and text similarity metrics. Items with a high degree of textual and semantic similarity are assigned the same description ID.
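The steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the deployed model: it measures only textual overlap (Jaccard similarity on word sets) and does not attempt the semantic similarity the article mentions. The stop-word list, the 0.6 threshold, the supplier codes "SUP-1" and "SUP-2" and the sample descriptions are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list used for cleaning.
STOP_WORDS = {"a", "an", "and", "for", "of", "the", "with", "in"}


def tokens(description: str) -> frozenset:
    """Lowercase, strip punctuation and remove stop words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return frozenset(w for w in words if w not in STOP_WORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of words common to both sets, between 0 and 1."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_description_ids(rows, threshold=0.6):
    """rows: iterable of (supplier_code, description).
    Returns one description ID per row, numbered from 1 within each supplier."""
    clusters = defaultdict(list)  # supplier_code -> representative token sets
    ids = []
    for supplier, description in rows:
        toks = tokens(description)
        representatives = clusters[supplier]
        for desc_id, rep in enumerate(representatives, start=1):
            if jaccard(toks, rep) >= threshold:
                ids.append(desc_id)
                break
        else:
            representatives.append(toks)
            ids.append(len(representatives))
    return ids


# Illustrative rows only; not taken from real declarations.
sample_rows = [
    ("SUP-1", "Brake pad set for car"),
    ("SUP-1", "BRAKE PAD SET - CAR"),
    ("SUP-1", "Steel wheel rim 15 inch"),
    ("SUP-2", "Brake pad set for car"),
]
print(assign_description_ids(sample_rows))
```

Because IDs are scoped to a supplier, the same wording from two different suppliers yields two separate (supplier code, description ID) pairs, which matches the article's statement that IDs are assigned to commodities supplied by each codified supplier.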

This description ID links variant descriptions of the same commodity supplied by the same supplier to a single, analysable category. Drawing from the example above, the supplier A88000001 describes the same automobile parts in various ways, as shown in the table below.

All the above variants, though phrased differently, are recognized as the same product from the same supplier and assigned the description ID “1”. This allows Customs to compare declared values for the same product from the same supplier across different importers, flag undervalued shipments and ensure the appropriate classification of goods.

Supply chain network analytics

Entity codification enables Customs authorities to analyse the relationships among key actors in the supply chain – including suppliers, importers, Customs brokers and ports of import – for more precise risk assessment and targeting.

The network analytics model draws data primarily from import and export declarations, using key input parameters such as importer/exporter identification numbers, Customs broker codes and codified supplier identifiers. For each entity identification number provided, relevant data is extracted from source systems, then cleaned and standardized using text analytics and natural language processing techniques. This prepares the data for accurate network construction, where individual nodes – representing suppliers, importers/exporters and Customs brokers – are identified and distinctly tagged. Using advanced network modelling techniques, these nodes are linked to form a comprehensive network of supply chain relationships. A network visualization tool displays these interconnected networks, allowing Customs officers to explore entity connections, detect anomalies and identify high-risk entities and transactions.
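As a rough illustration of the network construction step, the sketch below links the entities named on each declaration and then walks the resulting graph to find everything connected to a given entity. The field names, entity codes and the choice of a plain adjacency map with breadth-first search are assumptions made for the example; the article does not specify the modelling techniques or tools actually used.

```python
from collections import defaultdict, deque


def build_network(declarations):
    """Create an undirected graph whose nodes are (role, code) pairs.
    Entities appearing on the same declaration are linked to each other."""
    graph = defaultdict(set)
    for decl in declarations:
        nodes = [
            ("supplier", decl["supplier"]),
            ("importer", decl["importer"]),
            ("broker", decl["broker"]),
        ]
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def connected_entities(graph, start):
    """Breadth-first search: every node reachable from `start`, including it."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Illustrative declarations with hypothetical codes.
sample_declarations = [
    {"supplier": "SUP-1", "importer": "IMP-1", "broker": "BRK-1"},
    {"supplier": "SUP-1", "importer": "IMP-2", "broker": "BRK-2"},
    {"supplier": "SUP-2", "importer": "IMP-3", "broker": "BRK-3"},
]
network = build_network(sample_declarations)
print(sorted(connected_entities(network, ("supplier", "SUP-1"))))
```

In this toy data, the two importers that share supplier SUP-1 end up in the same connected group even though they never appear on the same declaration, which is the kind of indirect relationship that free-text supplier details would hide.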

Advanced risk management models

The successful implementation of entity and description codification has laid the groundwork for a suite of risk management models designed to enhance targeting accuracy, decision-making and operational efficiency in Indian Customs.

The first model concerns “Predictive targeting based on suppliers and description”. It applies specific targets or interdictions to suppliers identified as high-risk, enabling granular, entity-specific targeting.

One of the key developments is the “Machine Learning-based Valuation Model”, which leverages supplier codification and description IDs to assess the declared value of goods in real time. By comparing item-level declared values of current consignments with historical data for the same goods from the same supplier, the model generates automated instructions for officers, along with reference points based on past declaration patterns, thereby supporting accurate valuation and fraud detection.
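The comparison at the heart of such a valuation check can be sketched as follows. This is a deliberately simple rule, not the machine learning model itself: it assumes the reference point is the median of past unit values for the same (supplier code, description ID) pair and flags a declaration that falls more than a chosen tolerance below it. The 30% tolerance, the minimum history of three records and all names and values are hypothetical.

```python
from statistics import median


def check_declared_value(history, current, tolerance=0.30, min_history=3):
    """history: dict mapping (supplier_code, description_id) to past unit values.
    current: (supplier_code, description_id, declared_unit_value).
    Returns a flag plus the historical reference value for the officer."""
    supplier, description_id, declared = current
    past_values = history.get((supplier, description_id), [])
    if len(past_values) < min_history:
        # Too little history to form a reference point.
        return {"flag": False, "reference": None, "reason": "insufficient history"}
    reference = median(past_values)
    flagged = declared < reference * (1 - tolerance)
    reason = "below historical reference" if flagged else "within tolerance"
    return {"flag": flagged, "reference": reference, "reason": reason}


# Illustrative history with hypothetical codes and unit values.
sample_history = {("SUP-1", 1): [10.0, 11.0, 9.5, 10.5]}
print(check_declared_value(sample_history, ("SUP-1", 1, 6.0)))
print(check_declared_value(sample_history, ("SUP-1", 1, 9.8)))
```

The median is used here because it is less sensitive than the mean to a few extreme past declarations; a real model could equally rely on learned price distributions or other features.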

Another notable advancement is the “Insights Module”, which uses artificial intelligence and machine learning to provide a comprehensive, 360-degree analysis of suppliers, importers, Customs brokers and commodities. This module employs network analytics to assess behavioural patterns of entities involved in trade transactions and provides entity-specific risk insights based on historical data, helping officers detect anomalies and trends in trading behaviour.

The “Post-Seizure Analysis Tool (PSAT)” is designed to generate offence-specific insights through relational analysis, using supplier codification and description IDs. The tool focuses on analysing the behavioural aspects of suppliers, importers and Customs agents involved in fraudulent transactions. It contributes to risk scoring and enhances the predictive capabilities of the risk management system. Registered offences directly influence the risk profiles of associated entities, thereby making future assessments more precise.

The “Supplier-Importer Correlation Model” provides detailed insights into the importers and Customs brokers associated with a given supplier. This helps Customs officers evaluate the credentials of the supplier and make informed decisions about the risk posed by that supplier.

Finally, the “Offence Database” acts as a centralized repository of offences linked to entity codes and description IDs. This database supports real-time risk alerts and prompts for front-line Customs officers, offering insights into the historical behaviour of suppliers, importers and Customs brokers. It also plays a critical role in proactively flagging high-risk consignments by matching supplier codes, importer codes and product identifiers against registered offence patterns.

The performance of these risk management models is regularly measured using key performance indicators (KPIs). Key metrics include the accuracy with which the models identify high-risk consignments, the amount of revenue collected by flagging incorrect classification or undervaluation of goods, the precision of predictive targeting of entities based on their historical performance, false positive rates and the rate of detections.
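The article does not say how these indicators are calculated. As one possible reading, the sketch below derives them from the standard confusion-matrix counts of a targeting exercise, where a "positive" is a consignment flagged as high-risk and a "true positive" is a flagged consignment in which an offence was confirmed. Treating the rate of detections as the share of actual offences that were flagged is an assumption, as are the function name and the sample counts.

```python
def targeting_kpis(true_pos, false_pos, true_neg, false_neg):
    """Compute common targeting indicators from confusion-matrix counts.
    A ratio is None when its denominator is zero."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    total = true_pos + false_pos + true_neg + false_neg
    return {
        # Share of all consignments assessed correctly.
        "accuracy": ratio(true_pos + true_neg, total),
        # Share of flagged consignments with a confirmed offence.
        "precision": ratio(true_pos, true_pos + false_pos),
        # Share of compliant consignments that were wrongly flagged.
        "false_positive_rate": ratio(false_pos, false_pos + true_neg),
        # Share of actual offences that were flagged (assumed definition).
        "detection_rate": ratio(true_pos, true_pos + false_neg),
    }


# Hypothetical counts for illustration only.
print(targeting_kpis(true_pos=40, false_pos=10, true_neg=940, false_neg=10))
```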

The use of these models has led to significant seizures, including approximately 294 kg of heroin smuggled from Afghanistan at Nhava Sheva Port, about 883 kg of methamphetamine concealed in a maritime import consignment, and 7.9 million sticks of foreign-brand cigarettes, to name but a few. Multiple illicit consignments of e-waste and other prohibited or restricted items were also discovered, as well as cases of commercial fraud.

Turning data into actionable intelligence

The integration of artificial intelligence and machine learning models into Indian Customs’ risk management system is delivering measurable, real-time results. These models generate daily alerts and decision-support inputs for front-line Customs officers, significantly enhancing the precision and efficiency of enforcement actions. Entity codification and standardized description IDs form a critical foundation for other advanced artificial intelligence and machine learning models used in Customs risk management. By transforming unstructured declaration data into structured, machine-readable formats, Indian Customs has improved its ability to detect both revenue and non-revenue risks with greater accuracy, thereby strengthening the security of cross-border trade. Consequently, efficient risk management and the secure movement of goods contribute to prosperity for all stakeholders involved in international trade.

More information
sruti.vijayakumar@gov.in
shivam.dhamanikar@gov.in
