Clearance of express cargo and postal items: Korea tests new analytical tools to root out fraud
By Taeil Kang, Director General, Korea Customs Service
Customs administrations collate a significant volume of data on a daily basis. For instance, the Korea Customs Service (KCS) accumulates 45 GB of structured data and 30 GB of unstructured data in its database every day. However, most Customs administrations were until now not able to leverage such data due to a lack of information technology (IT) infrastructure and knowledge about data analytics techniques.
Things changed, however, with the development of modern IT infrastructure and open source Big Data analytics solutions to manage and analyse data. Tools such as Hadoop and R, a language and environment for statistical computing and graphics, have made it possible to create value from the huge amounts of data that are received each day.
In 2017, the KCS set up a Roadmap for Big Data Analysis, and commenced a six-month training programme to develop talent and expertise in data analysis. This year, the Service established its own infrastructure to initiate in-house data analysis. Moreover, it is planning to train 300 experts (7% of the total Customs workforce) in Big Data analysis over the next five years.
This article presents an experiment that the KCS has been undertaking to see whether new analytical tools could help in testing a hypothesis related to commercial fraud via express cargo and postal items, and identify potential illicit transactions.
With the exponential growth in e-commerce, the number of small parcels to be cleared by Customs has skyrocketed, stretching the limits of Customs enforcement capacities. Korea has a tax-exemption system and simplified Customs procedures in place for “low-value goods,” and there is reason to believe, for example, that criminals sneak in smaller quantities of goods in separate consignments to avoid reaching the de minimis thresholds, above which duties and/or taxes become payable.
But so far, Customs has failed to respond effectively to this form of crime using conventional methods, owing to the difficulty of analysing the 200 million pieces of data that have been generated over the past 10 years alone. It is worth mentioning here that, in Korea, express couriers send requested clearance information electronically, in order to permit the pre-advice and possible pre-clearance of items. Korea Post also sends some information on parcels electronically.
To address this challenge, the KCS decided to boost its data analysis capacity by bringing together Customs officers trained in data mining and Customs experts dealing with the clearance of express cargo and postal items. Based on the outcomes of their discussions, IT experts from the private sector who have been working with Customs’ IT systems for years then reviewed the actual analysis tools and methods, and the trained officers conducted a two-month project.
Drawing on their experience, risk analysts hypothesized that operators (in an effort to avoid paying duties/taxes) were importing items in multiple small parcels, using a number of different addresses and contact numbers. In other words, the assumption was that compliant importers use one name, one phone number, and one address for all their operations, while non-compliant importers use a complex series of names, phone numbers, and addresses.
To confirm the hypothesis, records of importations that were transported via express and postal services were extracted for a three-year period. Then, search tools were used to mine the data in order to identify specific information such as phone numbers and addresses. Datasets containing the refined data, including the consignee’s name, address and phone number, were then created for analysis purposes.
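The hypothesis above can be illustrated with a minimal sketch in Python. It is not the KCS's actual code: the record fields (`consignee`, `phone`, `address`), the function names and the cut-off values are assumptions chosen for illustration, and grouping by consignee name is itself a simplification, since non-compliant importers may also vary their names.

```python
from collections import defaultdict


def contact_diversity(records):
    """Count distinct phone numbers and addresses reported per consignee name.

    `records` is an iterable of dicts with the hypothetical keys
    "consignee", "phone" and "address".
    """
    phones = defaultdict(set)
    addresses = defaultdict(set)
    for rec in records:
        phones[rec["consignee"]].add(rec["phone"])
        addresses[rec["consignee"]].add(rec["address"])
    return {name: (len(phones[name]), len(addresses[name])) for name in phones}


def flag_suspicious(records, max_phones=3, max_addresses=3):
    """Return consignee names exceeding either (illustrative) cut-off."""
    diversity = contact_diversity(records)
    return sorted(
        name
        for name, (n_phones, n_addresses) in diversity.items()
        if n_phones > max_phones or n_addresses > max_addresses
    )
```

In practice the cut-offs would have to be tuned against known cases; the defaults here are placeholders, not values taken from the project.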
Among the suspicious cases that came out of the analysis was an importer who had reported 123 different phone numbers and 127 different addresses. To facilitate data-reading, the analysis team converted the addresses into geographic coordinates. Several visualization techniques were used. For example, the datasets were analysed using ORA, a network analysis tool, to examine correlations and relationships. As can be seen in figure 2, 83 different people reported the same seven phone numbers on different occasions when importing goods destined for 60 different addresses.
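The kind of relationship described above, where many people are tied together by a handful of shared phone numbers, can be sketched in plain Python. This is not ORA and does not reproduce its features; it only illustrates the underlying idea of grouping consignees that are connected through common phone numbers, using a union-find structure. The record keys and the function name are assumptions.

```python
from collections import defaultdict


def linked_groups(records):
    """Group consignee names that are connected through shared phone numbers.

    Each record is a dict with the hypothetical keys "consignee" and "phone".
    Names and phones are nodes; a record links its name to its phone.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for rec in records:
        union(("name", rec["consignee"]), ("phone", rec["phone"]))

    groups = defaultdict(set)
    for node in list(parent):
        kind, value = node
        if kind == "name":
            groups[find(node)].add(value)

    # Largest groups first, then alphabetical, so the output order is stable.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

A large group of distinct names held together by only a few phone numbers would be a candidate for closer review; addresses could be added as a third node type in the same way.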
By inputting information such as phone numbers, descriptions of goods and the exporting country into the visualization programme, analysts were able to identify importers’ different addresses. Data on seven suspicious importers showed that they were using a specific region of Seoul, Korea’s capital, as their address, which indicated that they might be importing items in multiple small parcels with false destination addresses across this region.
Issues and solutions
When the KCS first mapped out the project, the team wanted to analyse data reported in all simplified and general declarations. However, in many of the declarations, information on the consignor, the consignee, the goods description, and the phone number was missing or incomplete. As a result, the scope of the analysis was scaled down to include only general import declarations containing relatively complete and accurate information.
Even when data was complete, it had to be refined. In many cases, the same address would be written in different ways or with different spellings. The data cleansing process took a long time and was rather burdensome. It involved replacing the country code with a country name, and removing special characters in international phone numbers as well as blank spaces in addresses. Data collection and refinement was the stage that produced the most unexpected difficulties and took the longest time.
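The three cleansing steps mentioned above can be sketched as follows. This is an illustration only: the field names, the small country-code table and the function names are assumptions, and the case-folding of addresses is an extra normalisation step not described in the project.

```python
import re

# Illustrative subset only; a real table would cover all ISO country codes.
COUNTRY_NAMES = {"KR": "Korea", "US": "United States", "CN": "China"}


def clean_phone(raw):
    """Remove every non-digit character (plus signs, dashes, spaces, brackets)."""
    return re.sub(r"\D", "", raw)


def clean_address(raw):
    """Remove all whitespace; lower-casing is an added assumption."""
    return re.sub(r"\s+", "", raw).lower()


def clean_country(code):
    """Replace a country code with its name; unknown codes pass through unchanged."""
    return COUNTRY_NAMES.get(code.strip().upper(), code)


def clean_record(rec):
    """Return a cleaned copy of a record with hypothetical keys."""
    return {
        "consignee": rec["consignee"].strip(),
        "phone": clean_phone(rec["phone"]),
        "address": clean_address(rec["address"]),
        "country": clean_country(rec["country"]),
    }
```

Rules like these only catch formatting differences; variant spellings of the same address would still need fuzzy matching or geocoding.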
“Garbage in, garbage out” is a well-known maxim relating to the need for “good” information for meaningful data analysis. The expression emphasizes that the quality of output is determined by the quality of input. The KCS has learned through experience that Customs officers tend to underestimate the importance of data quality, which matters as much as data quantity. In light of this, the Service plans to conduct automatic data cleansing at the time when data is recorded in its database by adopting artificial intelligence technologies.
Another lesson learned is the importance of “domain knowledge.” During the project, one IT expert from the private sector said that a task which Customs officials completed in one week would have taken a “lay” person one month. In other words, domain knowledge matters a lot when analysing data. Therefore, Big Data analysis of Customs-related topics should remain within the purview of Customs.
This project was a short-term pilot project aimed at testing how data analytics could enhance risk analysis. It was applied to express cargo and postal items to identify commercial fraud, but the KCS believes that the same methodology could be applied to other areas. For example, criminals trying to import high-risk cargo, such as narcotics and weaponry, tend to file an import declaration with a false address in order to hide their identity.
The KCS plans to invite a larger number of IT experts to enhance the tools used during the project in order to make them fit the Service’s analytical needs. These solutions will be integrated into KCS’s system for utilization in actual investigations.
De minimis thresholds in Korea are used in different ways:
- as a “value” threshold below which duties and taxes are not collected and no Customs declaration is required: for postal operators the threshold is 150 US dollars using the FOB price; and for couriers the threshold is 150 US dollars using the FOB price, or 200 US dollars for goods originating in the US, under the terms of the Free Trade Agreement signed between Korea and the US;
- as a “reporting” threshold for goods in respect of which a full Customs declaration must be submitted: for express cargo, a “list clearance” procedure allows a trader to receive goods and, providing their value is below the de minimis threshold, clear them by submitting 24 pieces of information, such as the trader’s name and address, the consignee’s name and address, and the type and price of the goods; as for goods entering via the international mail channel, they are cleared on the spot.
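The value thresholds listed above can be written as a small lookup, sketched here in Python. The function names and the channel labels are assumptions, as is the treatment of a value exactly equal to the threshold (counted here as not below it). The `looks_split` helper is a hypothetical analytic indicator for the parcel-splitting pattern discussed in the article, not a rule of Korean Customs law.

```python
def de_minimis_usd(channel, origin_country):
    """Return the FOB value threshold in US dollars for a clearance channel.

    `channel` is "postal" or "courier" (labels assumed for illustration);
    `origin_country` is a two-letter code such as "US".
    """
    if channel == "postal":
        return 150.0
    if channel == "courier":
        # 200 USD applies to goods originating in the US under the Korea-US FTA.
        return 200.0 if origin_country == "US" else 150.0
    raise ValueError(f"unknown channel: {channel!r}")


def is_below_threshold(fob_usd, channel, origin_country):
    """True if a single consignment's FOB value is below the threshold."""
    return fob_usd < de_minimis_usd(channel, origin_country)


def looks_split(fob_values_usd, channel, origin_country):
    """Hypothetical indicator: several parcels, each below the threshold,
    whose combined value reaches or exceeds it."""
    limit = de_minimis_usd(channel, origin_country)
    return (
        len(fob_values_usd) > 1
        and all(value < limit for value in fob_values_usd)
        and sum(fob_values_usd) >= limit
    )
```

For instance, two courier parcels of 90 and 80 US dollars from a non-US origin would each fall under the 150 dollar threshold while together exceeding it, which is the pattern the analysts were looking for.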