Tuning Schema Matching Systems using Parallel Genetic Algorithms on GPU

Full Text (PDF, 490KB), PP.48-56

Views: 0 Downloads: 0

Author(s)

Yuting Feng 1,* Lei Zhao 1 Jiwen Yang 1

1. School of Computer Science & Technology, Soochow University, Suzhou, China

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2010.01.07

Received: 15 Jun. 2010 / Revised: 25 Jul. 2010 / Accepted: 26 Sep. 2010 / Published: 8 Nov. 2010

Index Terms

Schema matching, tuning, genetic algorithms, GPU, CUDA

Abstract

Most recent schema matching systems combine multiple components, each of which employs a particular matching technique with several knobs. The multi-component nature has brought a tuning problem, that is to determine which components to execute and how to adjust the knobs (e.g., thresholds, weights, etc.) of these components for domain users. In this paper, we present an approach to automatically tune schema matching systems using genetic algorithms. We match a given schema S against generated matching scenarios, for which the ground truth matches are known, and find a configuration that effectively improves the performance of matching S against real schemas. To search the huge space of configuration candidates efficiently, we adopt genetic algorithms (GAs) during the tuning process. To promote the performance of our approach, we implement parallel genetic algorithms on graphic processing units (GPUs) based on NVIDIA’s Compute Unified Device Architecture (CUDA). Experiments over four real-world domains with two main matching systems demonstrate that our approach provides more qualified matches over different domains.

Cite This Paper

Yuting Feng, Lei Zhao, Jiwen Yang, "Tuning Schema Matching Systems using Parallel Genetic Algorithms on GPU", International Journal of Modern Education and Computer Science(IJMECS), vol.2, no.1, pp.48-56, 2010. DOI:10.5815/ijmecs.2010.01.07

Reference

[1]E. Rahm and P. A. Bernstein, “A Survey of Approaches to Automatic Schema Matching,” The VLDB Journal, vol. 10, no. 4, 2001, pp. 334–350.
[2]P. Shvaiko and J. Euzenat, “A Survey of Schema-based Matching Approaches,” Journal on Data Semantics IV, vol. 3730, 2005, pp. 146–171.
[3]H. H. Do, “Schema Matching and Mapping-based Data Integration,” PhD Thesis, Department of Computer Science, University at Leipzig, Germany, 2006.
[4]F. Duchateau, Z. Bellahsène, and E. Hunt, “XBenchMatch: a Benchmark for XML Schema Matching Tools,” in VLDB ’07: Proceedings of the 33rd International Conference on Very Large Data Bases, 2007, pp. 1318–1321.
[5]D. Aumueller, H. H. Do, S. Massmann, and E. Rahm, “Schema and Ontology Matching with COMA++,” in SIGMOD ’05: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, 2005, pp. 906–908.
[6]Y. Qian, J. L. Nie, G. H. Liu, and S. H. Gao, “Corpus-based Complex Schema Matching,” Journal of Computer Research and Development (Chinese), vol. 43, no. Spply., 2006, pp. 200-205.
[7]P. A. Bernstein, S. Melnik, and J. E. Churchill, “Incremental Schema Matching,” in VLDB ’06: Proceedings of the 32nd International Conference on Very Large Data Bases, 2006, pp. 1167–1170.
[8]C. Drumm, M. Schmitt, H. H. Do, and E. Rahm, “Quickmig: Automatic Schema Matching for Data Migration Projects,” in CIKM ’07: Proceedings of the 16th ACM Conference on Information and Knowledge Management, 2007, pp. 107–116.
[9]P. Shvaiko, F. Giunchiglia, and M. Yatskevich, Semantic Web Information Management, A Model-based Perspective. Springer Berlin Heidelberg, 2010, ch. 9 Semantic Matching with S-Match, pp. 183–202.
[10]Y. Lee, M. Sayyadian, A. Doan, and A. Rosenthal, “eTuner: Tuning Schema Matching Software using Synthetic Scenarios,” The VLDB Journal, vol. 16, no. 1, 2007, pp. 97–122.
[11]A. Nandi and P. A. Bernstein, “HAMSTER: Using Search Clicklogs for Schema and Taxonomy Matching,” Proceedings of the VLDB Endowment, vol. 2, no. 1, 2009, pp. 181–192.
[12]H. Elmeleegy, M. Ouzzani, and A. Elmagarmid, “Usage-based Schema Matching,” in ICDE ’08: Proceedings of the 24th International Conference on Data Engineering, 2008, pp. 20–29.
[13]G. R. Ding, G. H. Wang, and Y. H. Zhao, “Multi-schema Integration based on Usage and Clustering Approach,” Journal of Computer Research and Development (Chinese), vol. 47, no. 5, 2010, pp. 824-831.
[14]P. A. Bernstein, S. Melnik, M. Petropoulos, and C. Quix, “Industrial-strength Schema Matching,” SIGMOD Record, vol. 33, no. 4, 2004, pp. 38–43.
[15]H. H. Do and E. Rahm, “Matching Large Schemas: Approaches and Evaluation,” Information Systems, vol. 32, no. 6, 2007, pp. 857-885.
[16]F. Duchateau, R. Coletta, Z. Bellahsene, and R. J. Miller, “YAM: a Schema Matcher Factory,” in CIKM ’09: Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009, pp. 2079–2080.
[17]A. Marie and A. Gal, “Boosting Schema Matchers,” in OTM ’08: Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:, 2008, pp. 283–300.
[18]F. Duchateau, Z. Bellahsene, and R. Coletta, “A Flexible Approach for Planning Schema Matching Algorithms,” in OTM ’08: Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part I on On the Move to Meaningful Internet Systems:, 2008, pp. 249–264.
[19]E. Peukert, H. Berthold, and E. Rahm, “Rewrite Techniques for Performance Optimization of Schema Matching Processes,” in Proceedings of EDBT 2010, 13th International Conference on Extending Database Technology, 2010, pp. 453–464.
[20]D. Whitley, “A Genetic Algorithm Tutorial,” Statistics and Computing, vol. 4, no. 2, 1994, pp. 65–85.
[21]Q. Yue, and S. Feng, “The Statistical Analyses for Computational Performance of the Genetic Algorithms,” Chinese Journal of Computers (Chinese), vol. 32, no. 12, 2009, pp. 2389–2392.
[22]E. Cantú-Paz, “A Survey of Parallel Genetic Algorithms,” Calculateurs Paralleles, vol. 10, 1998.
[23]P. Pospichal, J. Jaros, and J. Schwarz, “Parallel Genetic Algorithm on the CUDA Architecture,” in Proceedings of the Applications of Evolutionary Computation, EvoApplicatons 2010: EvoCOMPLEX, EvoGAMES, EvoIASP, EvoINTELLIGENCE, EvoNUM, and EvoSTOC, Part I, 2010, pp. 442-451.
[24]C. F. Tan, A. G. Ma, and Z. C. Xing, “Research on the Parallel Implementation of Genetic Algorithm on CUDA Platform,” Computer Engineering & Science (Chinese), vol. 31, no. A1, 2009, pp. 68-72.
[25]T. T. Wong, and M. L. Wong, “Parallel Evolutionary Algorithms on Consumer-Level Graphics Processing Unit,” Parallel Evolutionary Computations, 2006, pp. 133-155.
[26]S. F. Zhang, and Z. M. He, “Implementation of Parallel Genetic Algorithm Based on CUDA,” in ISICA '09: Proceedings of the 4th International Symposium on Advances in Computation and Intelligence, 2009, pp. 24-30.
[27]B. Alexe, W. C. Tan, and Y. Velegrakis, “STBenchmark: Towards a Benchmark for Mapping Systems,”Proceedings of the VLDB Endowment, vol. 1, no. 1, 2008, pp. 230–244.
[28]NVIDIA C., “NVIDIA CUDA Compute Unified Device Architecture Programming Guide,” NVIDIA Corporation, 2007.
[29]S. Melnik, H. Garcia-Molina, and E. Rahm, “Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching,” in ICDE ’02: Proceedings of the 18th International Conference on Data Engineering, 2002, pp. 117–128.