Pufeng Du

Work place: School of Computer Science and Technology, Tianjin University, Tianjin, China

E-mail: pdu@tju.edu.cn

Research Interests: Bioinformatics, Computer systems and computational processes, Computational Learning Theory

Biography

Pufeng Du was born in Tianjin, China, in 1983. He received his Ph.D. degree in Control Theory and Engineering from Tsinghua University, Beijing, China, in 2010. He has been an Assistant Professor in the School of Computer Science and Technology, Tianjin University, Tianjin, China, since January 2010. His current research interests are bioinformatics and machine learning. Prof. Du is a member of the Association for Computing Machinery (ACM) and the China Computer Federation (CCF).

Author Articles
CHex: An Efficient RDF Storage and Indexing Scheme for Column-Oriented Databases

By Xin Wang, Shuyi Wang, Pufeng Du, Zhiyong Feng

DOI: https://doi.org/10.5815/ijmecs.2011.03.08, Pub. Date: 8 Jun. 2011

As increasingly large RDF data sets are being published on the Web, efficient RDF data management has become an essential factor in realizing the Semantic Web vision. However, most existing RDF storage schemes, which are built on top of row-store relational databases, are constrained in terms of efficiency and scalability. Still, the growing popularity of the RDF format in real-world applications calls for an effort to deal with these drawbacks. In this paper, we propose a novel RDF storage and indexing scheme, called CHex, which uses the triple nature of RDF as an asset to implement sextuple indexing for a column-oriented database system. Using binary association tables (BATs) in the column-oriented data model, RDF data is indexed in six possible ways, one for each possible ordering of the three RDF elements. The sextuple indexing scheme in a column-oriented database not only provides efficient single triple pattern lookups, but also allows fast merge-joins for any pair of triple patterns. To evaluate the performance of our approach, we generate large-scale data sets of up to 13 million triples, and devise benchmark queries that cover important RDF join patterns. The experimental results show that our approach outperforms row-oriented database systems by up to an order of magnitude and is even competitive with the best state-of-the-art native RDF store.
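
The idea of sextuple indexing can be illustrated with a minimal sketch. It assumes that plain sorted Python lists of tuples stand in for the binary association tables; it is not the CHex implementation, and the sample triples and the function names `build_indexes`, `scan`, and `merge_join` are hypothetical. It shows why keeping all six orderings lets any triple pattern be answered by a prefix scan, and why two patterns that share a variable can be combined with a merge-join on inputs that are already sorted.

```python
from bisect import bisect_left
from itertools import permutations

# Hypothetical toy data (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "tju"),
]


def build_indexes(triples):
    """Build one sorted list per ordering of s, p, o: spo, sop, pso, pos, osp, ops."""
    indexes = {}
    for order in permutations("spo"):
        positions = ["spo".index(c) for c in order]
        indexes["".join(order)] = sorted(
            tuple(t[i] for i in positions) for t in triples
        )
    return indexes


def scan(index, prefix):
    """Return all rows of a sorted index that start with `prefix`, in index order."""
    start = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    rows = []
    for row in index[start:]:
        if row[: len(prefix)] != prefix:
            break
        rows.append(row)
    return rows


def merge_join(left, right):
    """Merge-join two lists of (key, payload) pairs, each already sorted by key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            while i < len(left) and left[i][0] == key:
                out.extend((key, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


idx = build_indexes(TRIPLES)

# Pattern (?x knows ?y): the "pos" index yields rows (p, o, s) sorted by ?y.
left = [(o, s) for _, o, s in scan(idx["pos"], ("knows",))]
# Pattern (?y worksAt ?z): the "pso" index yields rows (p, s, o) sorted by ?y.
right = [(s, o) for _, s, o in scan(idx["pso"], ("worksAt",))]

# Both inputs are sorted on the join variable ?y, so no extra sort is needed.
joined = merge_join(left, right)  # rows of (?y, ?x, ?z)
```

Because every ordering is available, each pattern can be read from an index whose sort prefix matches its bound positions and whose next position is the join variable, which is what makes the merge-join applicable here without re-sorting.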

Other Articles