Design an Accurate Algorithm for Alias Detection

Full Text (PDF, 669KB), PP.36-44


Author(s)

Muneer Alsurori 1,*, Maher Al-Sanabani 2, Salah AL-Hagree 2

1. Ibb University/Faculty of Sciences/Department of Computer Sciences & Information Technology, Ibb, Yemen

2. Thamar University/Faculty of Computer Science and Information Systems, Thamar, Yemen

* Corresponding author.

DOI: https://doi.org/10.5815/ijieeb.2018.03.05

Received: 1 Nov. 2017 / Revised: 5 Dec. 2017 / Accepted: 8 Jan. 2018 / Published: 8 May 2018

Index Terms

Alias Detection (AD), N-gram Distance, Transliteration, Name Matching

Abstract

Improving the detection of an entity's alias names is important in many settings, such as the analysis of terrorist and criminal networks. Accurate alias detection plays a vital role in various applications; in particular, it is critical to detect aliases that are intentionally used to hide real identities, such as those of terrorists and fraudsters. Alias Detection (AD), as the name suggests, is the process of identifying and quantifying the different variants of a single name that appear in multiple domains. This process is mainly performed by inverting one-to-many and many-to-one mappings. Aliases mainly occur when entities try to hide their actual names or real identities from other entities, that is, when an object has multiple names and more than one name is used to refer to it. The N-gram distance algorithm (N-DIST) has found wide applicability in AD based on orthographic and typographic variations. Kondrak's approach, a popular N-DIST, works well for this purpose, but we find that N-DIST has serious shortcomings when applied to aliases that arise from the transliteration of Arabic names into English. This is the problem we address in this paper. We extend the N-gram distance measure of the approximate string matching (ASM) algorithm so that it can detect aliases that are based on typographic errors. The data for our research are strings (names and activities from open source web pages). We compare the original N-DIST with our adjusted version (A-N-DIST) by applying both to this data set. As expected, A-N-DIST works well in terms of both performance and functional efficiency when matching names based on the transliteration of Arabic into English from one domain to another.
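For readers unfamiliar with the baseline, the following is a minimal Python sketch of a positional N-gram distance in the spirit of Kondrak's N-DIST. It is not the adjusted A-N-DIST proposed in the paper, whose adjustments the abstract does not specify. The function name `ngram_distance`, the padding symbol, the default `n = 2`, and the example names are all assumptions made for illustration.

```python
def ngram_distance(a: str, b: str, n: int = 2, pad: str = "\0") -> float:
    """Normalised positional n-gram distance in [0, 1]; 0 means identical.

    Illustrative sketch only: an edit-distance recurrence over n-grams,
    where substituting one n-gram for another costs the fraction of
    positions at which the two n-grams differ.
    """
    if not a and not b:
        return 0.0
    if not a or not b:
        return 1.0

    # Prefix each string with n-1 padding symbols (assumed not to occur in
    # the names) so that the first letters appear in as many n-grams as
    # the inner letters.
    padded_a = pad * (n - 1) + a
    padded_b = pad * (n - 1) + b
    grams_a = [padded_a[i:i + n] for i in range(len(a))]
    grams_b = [padded_b[j:j + n] for j in range(len(b))]

    def gram_cost(x: str, y: str) -> float:
        # Fraction of mismatching positions between two n-grams.
        return sum(cx != cy for cx, cy in zip(x, y)) / n

    # Standard two-row dynamic programme, as in Levenshtein distance.
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ga in enumerate(grams_a, start=1):
        curr = [float(i)]
        for j, gb in enumerate(grams_b, start=1):
            curr.append(min(
                prev[j] + 1.0,                  # delete an n-gram of a
                curr[j - 1] + 1.0,              # insert an n-gram of b
                prev[j - 1] + gram_cost(ga, gb) # (partial) substitution
            ))
        prev = curr

    # Normalise by the longer string so the result lies in [0, 1].
    return prev[-1] / max(len(a), len(b))
```

Under these assumptions, two transliteration variants such as "Mohammed" and "Muhammad" receive a smaller distance than two unrelated names, because most of their bigrams agree fully or partially at aligned positions.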

Cite This Paper

Muneer Alsurori, Maher Al-Sanabani, Salah AL-Hagree, "Design an Accurate Algorithm for Alias Detection", International Journal of Information Engineering and Electronic Business(IJIEEB), Vol.10, No.3, pp. 36-44, 2018. DOI:10.5815/ijieeb.2018.03.05

Reference

[1]J. Brynielsson, A. Horndahl, F. Johansson, L. Kaati, C. Mårtenson, and P. Svenson, “Harvesting and analysis of weak signals for detecting lone wolf terrorists,” Submitted to Security Informatics, 2013.
[2]D. B. Neill 2002. Fully Automatic Word Sense Induction by Semantic Clustering. M.Phil Thesis. Cambridge University.
[3]D. Jurafsky and J. H. Martin 2000. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice-Hall.
[4]Muhammad Ghafoor, Mehyeddin Abdulrahman and Shvan Tariq, “Plagiarism Detection System for the Kurdish Language,” I.J. Information Technology and Computer Science journal, V.12, no. 64-71, 2017.
[5]Prianka Mandal and B M Mainul Hossain, “A Systematic Literature Review on Spell Checkers for Bangla Language,” I.J. Information Technology and Computer Science journal, V.6, no. 40-47, 2017.
[6]R. Ibrahim, S. Saeed, and K. Wakil, "Plagiarism Detection Techniques for Arabic Script Languages: A Literature Review," Kurdistan Journal of Applied Research, vol. 2, no. 3, 2017.
[7]Ning An, Lili Jiang, Jianyong Wang, Ping Luo, Min Wang, Bing Nan Li, Toward detection of aliases without string similarity, Information Sciences 261, 89–100, 2014.
[8]M. Bilenko and R. J. Mooney. On evaluation and training-set construction for duplicate detection. In Proceedings of the KDD-2003 Workshop on Data Cleaning, Record Linkage, and Object Consolidation, pages 7-12, 2003.
[9]L. K. Branting. Name matching in law enforcement and counter-terrorism. In Proceedings of ICAIL 2005 Workshop on Data Mining, Information Extraction, and Evidentiary Reasoning for Law Enforcement and Counter-Terrorism.
[10]Shaikh, M., Dar, H., Shaikh, A., and Shah, A., “Adjusted Edit Distance Algorithm for Alias Detection”, International Conference on Information and Knowledge Management, 2012.
[11]Salah Alhagree and Maher A. Al-Sanabani, “A Framework For Name Matching In Arabic Language”, 1st Scientific Conference on Information Technology and Networks, 2016.
[12]Salah Alhagree, “Design Algorithms for Matching English and Arabic Names”, Master's thesis, Thamar University, Department of Computer Science, 2017.
[13]P. Christen, Data Matching – Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection, Springer, 2012. ISBN 978-3-642-31163-5
[14]Levenshtein, Vladimir I. "Binary codes capable of correcting deletions, insertions, and reversals." In Soviet physics doklady, vol. 10, no. 8, pp. 707-710. 1966.
[15]Zhan Su, Byung-Ryul Ahn, Ki-yol Eom, Min-koo Kang, Jin-Pyung Kim, Moon-Kyun Kim, "Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm", The 3rd International Conference on Innovative Computing Information and Control (ICICIC'08), 2008.
[16]Yoke Yie Chen, Suet-Peng Yong and Adzlan Ishak, "Email Hoax Detection System Using Levenshtein Distance Method", Journal of Computers, vol. 9, no. 2, February 2014.
[17]W. E. Winkler and Y. Thibaudeau. An application of the Fellegi-Sunter model of record linkage. Technical report, U.S. Decennial Census, Bureau of the Census, 1990.
[18]P. Hsiung, D. Andrew, W. Moore, and J. Schneider. Alias detection in link data sets. In Proceedings of the International Conference on Intelligence Analysis, 2005.
[19]L. Jiang, J. Wang, P. Luo, N. An, M. Wang, Towards alias detection without string similarity: an active learning based approach, in: Proceedings of the 35th Annual International ACM SIGIR Conference, 2012.
[20]P. Selvaperumal and A. Suruliandi, "String Variant Alias Extraction Method using Ensemble Learner", 2016.
[21]Kondrak, G., “N-gram similarity and distance”, In M. Consens and G. Navarro (eds.), Proceedings of the String Processing and Information Retrieval 12th International Conference, Buenos Aires, Lecture Notes in Computer Science, 3772, Springer, Heidelberg, Germany, 115–126, 2005.
[22]Maher Sanabani, Salah Al-Hagree, “Improved An Algorithm For Arabic Name Matching”, Open Transactions On Information Processing, ISSN (Print): 2374-3786, ISSN (Online): 2374-3778, 2015.
[23]Abdulhayoglu, M. A., Bart Thijs, Wouter Jeuris, “Using character n-grams to match a list of publications to references in bibliographic databases”, DOI 10.1007/s11192-016-2066-3, 2016.
[24]www.kalmasoft.com/KLEX/dbfamnm.htm.