Speed up linear scan in high-dimensions by sorting one-dimensional projections

Full Text (PDF, 174KB), PP.41-48

Views: 0 Downloads: 0

Author(s)

Jiangtao Cui 1,* Bin Xiao 1 Gengdai Liu 1 Lian Jiang 1

1. School of Computer Science and Technology, Xidian University, Xi’an, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijisa.2011.04.06

Received: 15 Jun. 2010 / Revised: 5 Oct. 2010 / Accepted: 27 Jan. 2011 / Published: 8 Jun. 2011

Index Terms

High dimensional indexing, extended B+-tree, one-dimensional mapping, projection, distance computation

Abstract

High-dimensional indexing is a pervasive challenge faced in multimedia retrieval. Existing indexing methods applying linear scan strategy, such as VA-file and its variations, are still efficient when the dimensionality is high. In this paper, we propose a new access idea implemented on linear scan based methods to speed up the nearest-neighbor queries. The idea is to map high-dimensional points into two kinds of one-dimensional values using projection and distance computation. The projection values on the line determined by the first Principal Component are sorted and indexed using a B+-tree, and the distances of each point to a reference point are also embedded into leaf node of the B+-tree. When performing nearest neighbor search, the Partial Distortion Searching and triangular inequality are employed to prune search space. In the new search algorithm, only a small portion of data points need to be linearly accessed by computing the bounded distance on the one-dimensional line, which can reduce the I/O and processor time dramatically. Experiment results on large image databases show that the new access method provides a faster search speed than existing high-dimensional index methods.

Cite This Paper

Jiangtao Cui, Bin Xiao, Gengdai Liu, Lian Jiang, "Speed up linear scan in high-dimensions by sorting one-dimensional projections", International Journal of Intelligent Systems and Applications(IJISA), vol.3, no.4, pp.41-48, 2011. DOI:10.5815/ijisa.2011.04.06

Reference

[1]C. Bohm, S. Berchtold, D. A. Keim, “Searching in high-dimensional spaces-index structures for improving the performance of multimedia databases,” ACM Comput. Surv., vol. 33, no. 3, pp. 322–373, 2001.

[2]R. Weber, H.-J. Schek, S. Blott, “A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces,” in VLDB 1998, pp. 194–205.

[3]K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, “When is ”nearest neighbor” meaningful?” in ICDT 1999, pp. 217–235.

[4]K. V. R. Kanth, D. Agrawal, and A. Singh, “Dimensionality reduction for similarity searching in dynamic databases,” SIGMOD Rec., vol. 27, no. 2, pp. 166-176, 1998.

[5]K. Chakrabarti and S. Mehrotra, “Local dimensionality reduction: A new approach to indexing high dimensional spaces,” in VLDB’00: Proceeding of the 26th International Conference on Very Large Data Bases. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 89-100.

[6]H. T. Shen, X. Zhou, and A. Zhou, “An adaptive and dynamic dimensionality reduction method for high-dimensional indexing,” The VLDB Journal, vol. 16, no. 2, pp. 219-234, 2007.

[7]X. Lian and L. Chen, “A general cost model for dimensionality reduction in high dimensionality spaces,” in ICDE ’07: Proceedings of the 23th International Conference on Data Engineering, 2007, pp. 66-75.

[8]H. Ferhatosmanoglu, E. Tuncel, D. Agrawal, and A. E. Abbadi, “high-dimensional nearest neighbor searching,” Information Systems, vol. 31, no. 6, pp. 512-540, 2006.

[9]J. Cui, S. Zhou, and J. Sun, “Efficient high-dimensional indexing by sorting principal component,” Pattern Recogn. Lett., vol. 28, no. 16, pp. 2412–2418, 2007.

[10]S. Berchtold, C. Bohm, H. V. Jagadish, H.-P. Kreiegel, and J. Sander, “Independent quantization: An index compression technique for high-dimensional data spaces,” in ICDE ’00: Proceedings of the 16th International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society, 2000, p. 577.

[11]G.-H. Cha and C.-W. Chung, “The gc-tree: a high-dimensional index structure for similairty search in image databases,” IEEE Trans Multimedia, vol. 4, pp. 235-247, 2002.

[12]S. Berchtold, C. Bohm, and H.-P. Kriegal, “The pyramid-technique: towards breaking the curse of dimensionality,” SIGMOD Rec., vol. 27, no. 2, pp. 142-153, 1998.

[13]H. V. Jagadish, B. C. Ooi, K.-L. Tan, C. Yu, and R. Zhang, “iDistance: An adaptive b+-tree based indexing method for nearest neighbor search,” ACM Trans. Database Syst., vol. 30, no. 2, pp. 364-397, 2005.

[14]M. Ortega, Y. Rui, K. Chakrabarti, S. Mehrotra, and T. S. Huang, “Supporting similarity queries in mars,” in MULTIMEDIA 1997, pp. 403–413.

[15]B. S. Manjunath and W. Y. Ma, “Texture features for browsing and retrieval of image data,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 8, pp. 837-842, 1996

[16]H. T. Shen, B. C. Ooi, , Z. Huang, and X. Zhou, “Towards effective indexing for very large video sequence database,” in SIGMOD 2005, pp. 730–741.

[17]K. Chakrabarti and S. Mehrotra, “Local dimensionality reduction: A new approach to indexing high dimensional spaces,” in VLDB 2000, pp. 89–100