Work place: University of BECHAR, ALGERIA
E-mail: Med.Yacine.Dennai@gmail.com
Research Interests: World Wide Web
Biography
Mohammed Yacine DENNAI is a first-year PhD student in Computer Science at the University of BECHAR, Algeria. He received his licence degree in Computer Science from the University of BECHAR in 2013 and his master's degree in Computer Science from the same university in 2015. His research interests include the semantic web, web applications, and ontologies.
By Abdeslem DENNAI, Mohammed Yacine DENNAI, Sidi Mohammed BENSLIMANE
DOI: https://doi.org/10.5815/ijitcs.2016.11.03, Pub. Date: 8 Nov. 2016
Three classes of documents, distinguished by their data, circulate on the web: unstructured documents (.doc, .html, .pdf, ...), semi-structured documents (.xml, .owl, ...), and structured documents (database tables, for example). A semi-structured document is organized around tags that are either predefined or defined by its author.
However, many studies classify documents by their textual content alone and underestimate their structure. In this paper, we propose a representation of semi-structured web documents based on weighted vectors, which allows their content to be exploited for subsequent processing. Term weights are calculated using the normal frequency for a document, TF-IDF (Term Frequency - Inverse Document Frequency), and the logical (Boolean) frequency for a set of documents. To assess and demonstrate the relevance of the proposed approach, we conduct several experiments on different corpora.
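The three weighting schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the two sample XML documents, the tokenizer, and the IDF variant log(N / df) are all assumptions made for illustration, and the sketch weights only the text content without using the enclosing tags.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample corpus of two small semi-structured (XML) documents.
DOCS = [
    "<doc><title>semantic web</title><body>web ontology</body></doc>",
    "<doc><title>database tables</title><body>structured web data</body></doc>",
]

def tokenize(xml_text):
    """Return lowercase word tokens from the text content of an XML document."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    return re.findall(r"[a-z0-9]+", text.lower())

def term_frequency(tokens):
    """Normal (raw) frequency: number of occurrences of each term in one document."""
    return Counter(tokens)

def boolean_weights(tokens, vocabulary):
    """Boolean weight: 1 if the term occurs in the document, 0 otherwise."""
    present = set(tokens)
    return {term: 1 if term in present else 0 for term in vocabulary}

def tf_idf(corpus_tokens):
    """TF-IDF per document, using the assumed variant tf * log(N / df)."""
    n_docs = len(corpus_tokens)
    doc_freq = Counter()
    for tokens in corpus_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in corpus_tokens:
        tf = Counter(tokens)
        vectors.append(
            {term: tf[term] * math.log(n_docs / doc_freq[term]) for term in doc_freq}
        )
    return vectors

if __name__ == "__main__":
    corpus = [tokenize(d) for d in DOCS]
    print(term_frequency(corpus[0]))
    print(boolean_weights(corpus[0], ["web", "database"]))
    print(tf_idf(corpus)[0])
```

In this sketch a term that appears in every document, such as "web" in the sample corpus, receives a TF-IDF weight of zero, while its raw frequency and Boolean weight remain positive.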