IJITCS Vol. 8, No. 2, Feb. 2016
Cover page and Table of Contents: PDF (size: 232KB)
REGULAR PAPERS
This work presents a method that enables the Arabic NLP community to build scalable lexical resources. The proposed method is low-cost and time-efficient, in addition to being scalable and extensible. The latter is reflected in the method's ability to be incremental in both respects: processing resources and generating lexicons. Starting from a corpus, tokens are first drawn from the corpus and lemmatized. Second, finite-state transducers (FSTs) are generated semi-automatically. Finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One strength of the algorithm is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. A second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiments use a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The coverage of the resulting open lexical resources is high: they cover more than 70% of Arabic verbs. The built resources contain 16,855 verb lemmas and 11,080,355 fully vocalized, partially vocalized, and unvocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license.
Reasoning is a fundamental capability that requires knowledge. Various graph models have proven to be very valuable in knowledge representation and reasoning. Recently, explosive data generation and accumulation capabilities have paved the way for Big Data and data-intensive systems. Knowledge representation and reasoning (KRR) with large and growing data is extremely challenging but crucial for businesses that need to predict trends and support decision making. Any contemporary, reasonably complex knowledge-based system will have to cope with this onslaught of data and use appropriate and sufficient reasoning for semantic processing of information by machines. This paper surveys graph-based knowledge representation and reasoning; the graph models used for representation and reasoning, such as Conceptual Graphs, Concept Graphs, Semantic Networks, Inference Graphs, and Causal Bayesian Networks; common and recent research uses of these graph models, typically in Big Data environments; and the near-future needs and challenges for graph-based KRR in computing systems. Observations are presented in a table that highlights the suitability of the surveyed graph models for contemporary scenarios.
This work documents the development and implementation of a commercial bank's loan classification database system. It employs multiple discriminant analysis models to assess the relationship between relevant loan variables and the existing bad-loan problem. It also uses a mathematical model to replicate the Examiner's classification process in order to classify loans in a more objective and sober way. Loan classification is the grouping of loans according to their likelihood of ultimate recovery from borrowers. Banking is one of the most highly levered businesses, especially on loan accounts, and is likely to collapse if the quality of loans deteriorates even slightly. Six important factors (propriety of use of funds borrowed, operation of the Borrower's overdraft account, cooperation with the Bank, collateral, and number of days the loan is past due) were identified and grouped as variables in determining the quality of the loan portfolio. The developed classification model shows that there is a linear relation between loan classification and the six variables considered. Four classification functions were developed and implemented in a Microsoft Access database to assist in effective classification. The database system makes it easy to store relevant classification information and return to it whenever needed for comparative analysis on a quarterly, half-yearly, and annual basis.
Finite-state implementations naturally denote concatenations of morphemes and are limited to modeling concatenative morphotactics. Non-concatenative structures, such as reduplication, in the computational morphology of many world languages cannot be handled completely by finite-state technology. This paper describes the non-concatenative phenomenon of reduplication, which occurs in the adjective and adverb word classes of the Manipuri language, using the formalism of finite-state morphology tools and techniques. The discussion covers the non-concatenative nature of the various reduplication phenomena exhibited by the two classes and the challenges in capturing them, and then presents a morphological analyzer for the reduplicated adjectives and adverbs. The analyzer has been implemented using XFST and LEXC, applying the compile-replace algorithm to the morphotactic description of the language, which includes finite-state operations other than concatenation, to capture the reduplication phenomena.
OpenMP is an application program interface (API) that can be used to explicitly direct multi-threaded, shared-memory parallelism. It is a specification for multi-processing developed through collaborative work between interested parties from the hardware and software industries, government, and academia. OpenMP is not necessarily implemented identically by all vendors, and by itself it is not intended for distributed-memory parallel systems. There are multiple approaches to inverting a matrix. The proposed LU decomposition calculates the upper and lower triangular matrices via the Gauss elimination method, and the computation can be parallelized using OpenMP. The main goal of the proposed technique is to analyze the time taken for matrices of different sizes, so we used 1, 2, 4, and 8 threads and compared the runs against each other to measure the efficiency of the parallelization. The results compare the time spent on the whole computation using 1, 2, 4, and 8 threads. We found that increasing the number of threads improves performance (less time is required). With 8 threads, we obtained a performance gain of around 64%. In addition, as the size of the matrix increases, the efficiency of parallelization also increases, which is evident from the time difference between the serial and parallel code. This is because more computations are done in parallel, and hence the efficiency is high. The schedule types in OpenMP behave differently; we used the static, dynamic, and guided schemes.
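As a rough illustration of the computation this abstract refers to, the following is a minimal serial sketch of LU decomposition by Gaussian elimination (Doolittle form, no pivoting). It is not the authors' code: the function name `lu_decompose` is an assumption, and the sketch is written in Python rather than with OpenMP. The comment marks the row-update loop whose iterations are independent of one another, which is the kind of loop a parallel implementation would typically distribute across threads.

```python
def lu_decompose(a):
    """Return (lower, upper) such that lower * upper == a.

    Doolittle LU decomposition via Gaussian elimination without pivoting.
    `a` is a square matrix given as a list of rows; it is not modified.
    """
    n = len(a)
    # lower starts as the identity matrix; upper starts as a float copy of a.
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]

    for k in range(n):
        pivot = upper[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # The updates of rows k+1..n-1 do not depend on each other, so this
        # is the loop a threaded implementation would split across threads.
        for i in range(k + 1, n):
            factor = upper[i][k] / pivot
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


if __name__ == "__main__":
    lower, upper = lu_decompose([[2, 1, 1], [4, 3, 3], [8, 7, 9]])
    print(lower)
    print(upper)
```

Because each elimination step `k` must finish before step `k + 1` starts, only the inner row updates can run concurrently, which is consistent with the abstract's observation that larger matrices benefit more from parallelization.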
In this paper, I propose a modified K-means algorithm as a means of assessing the performance of scientific authors using their h- and g-index values. K-means suffers from poor computational scaling and efficiency, as the number of clusters has to be supplied by the user. In this work, I introduce a modification of the K-means algorithm that efficiently searches the data to cluster points by computing the sum of squares within each cluster, which lets the program select the most promising subset of classes for clustering. The proposed algorithm was tested on the IRIS and ZOO data sets as well as on our local dataset comprising h- and g-indices, which are prominent markers of the scientific excellence of authors publishing papers in various national and international journals. Results from the analyses reveal that the modified K-means algorithm is much faster and outperforms the conventional algorithm in terms of clustering performance, measured by the data discrepancy factor.
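The quantity this abstract relies on, the sum of squares within each cluster, can be sketched as follows. This is only an illustration of that measure, not the author's modified algorithm; the function name `within_cluster_ss` and the sample points are assumptions made for the example.

```python
def within_cluster_ss(points, labels):
    """Return {label: sum of squared distances of members to their centroid}.

    `points` is a sequence of equal-length numeric tuples and `labels`
    gives the cluster assignment of each point.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    result = {}
    for label, members in clusters.items():
        dim = len(members[0])
        # Centroid: coordinate-wise mean of the cluster's members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        result[label] = sum(
            (p[d] - centroid[d]) ** 2 for p in members for d in range(dim)
        )
    return result


if __name__ == "__main__":
    # Hypothetical (h-index, g-index) pairs, for illustration only.
    points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
    print(within_cluster_ss(points, [0, 0, 1]))
```

Summing the per-cluster values gives a total that can be compared across candidate numbers of clusters, which is one common way to choose among them when the count is not known in advance.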
The pharmacy handles all the medicine needed in the hospital, which involves a vast number of records. These produce large datasets that are complex to manage and therefore require tools and techniques to easily process, interpret, forecast, and predict future consumption. For this reason, predicting and forecasting stock consumption in a hospital pharmacy using data mining techniques is not a surprising issue. This research therefore investigated the potential applicability of data mining technology to predicting anti-retroviral (ARV) drug consumption for the pharmacy, based on the patient history datasets of Jugal Hospital, Harar, Ethiopia. The methodology used for this research is based on Knowledge Discovery in Databases and relied mostly on decision tree algorithms, specifically the M5P model tree. WEKA, a data mining tool, was used for interpreting, evaluating, and predicting from large datasets. Results on the dataset suggest that a tree-based modeling approach can be used effectively to predict the consumption of ARV drugs.
Vehicular Ad-hoc Networks (VANETs) are expected to implement wireless technologies such as Dedicated Short Range Communications (DSRC), which is a category of Wi-Fi. Other candidate long-distance wireless technologies are cellular, satellite, and WiMAX. VANETs can be viewed as a component of Intelligent Transportation Systems (ITS). This paper presents the implementation of Multiple-Input Multiple-Output (MIMO) and Adaptive Modulation and Coding (AMC) techniques in a WiMAX-based vehicular ad-hoc network. Using MIMO technology, the designed system provides multiple radio channels between the transmitter and receiver for transmitting and receiving data. AMC selects among different modulation techniques depending on the signal-to-noise ratio of the channel. Together, these two techniques significantly change the throughput, delay, jitter, packet delivery ratio, and packet loss ratio compared with an existing vehicular ad-hoc network. A WiMAX-based VANET provides high speed, low cost per bit, and a large coverage area.
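The AMC idea described in this abstract, choosing a modulation scheme from the channel's signal-to-noise ratio, can be sketched as a simple threshold lookup. The threshold values, the table name `AMC_TABLE`, and the function name `select_modulation` are hypothetical and chosen only for illustration; they are not taken from the paper or from the WiMAX standard.

```python
# Hypothetical SNR thresholds in dB (illustrative only), highest first.
# Each entry is (minimum SNR required, modulation scheme).
AMC_TABLE = [
    (22.0, "64-QAM"),
    (16.0, "16-QAM"),
    (9.0, "QPSK"),
]

# Most robust scheme, used when the SNR is below every threshold.
FALLBACK_SCHEME = "BPSK"


def select_modulation(snr_db):
    """Pick the highest-order scheme whose SNR threshold is satisfied."""
    for min_snr_db, scheme in AMC_TABLE:
        if snr_db >= min_snr_db:
            return scheme
    return FALLBACK_SCHEME


if __name__ == "__main__":
    for snr in (5.0, 12.0, 18.0, 25.0):
        print(snr, select_modulation(snr))
```

A higher-order scheme carries more bits per symbol but needs a cleaner channel, so the lookup falls back to more robust schemes as the measured SNR drops.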
Nowadays, data engineering is becoming an emerging trend for discovering knowledge from web audio-visual data such as YouTube videos, Yahoo Screen, and Facebook videos. Different categories of web video are shared on such social websites and are used by billions of users all over the world. Uploaded web videos carry different kinds of metadata as attribute information about the video data. The metadata attributes conceptually define the contents and features or characteristics of the web videos. Hence, accomplishing web video mining by extracting features of web videos in terms of metadata is a challenging task. In this work, attempts are made to classify and predict metadata features of web videos, such as video length, number of comments, ratings information, and view counts, using data mining algorithms such as the J48 decision tree and naive Bayesian algorithms as part of web video mining. The results of the J48 decision tree and naive Bayesian classification models are analyzed and compared as a step in the process of knowledge discovery from web videos.
In a phishing attack, the user sends confidential information to a mimic website and faces financial problems, so the user should be informed immediately about the website being visited. According to the Third Quarter Phishing Activity Trends Report, 55,282 new phishing websites were detected in July 2014. To address the phishing problem, a browser-based add-on system may be one of the best ways to make the user aware of the website type. In this paper, a novel browser-based add-on system is proposed and its performance is compared with that of existing anti-phishing tools. The proposed anti-phishing tool, 'ePhish', is compared with existing browser-based anti-phishing toolbars. All the anti-phishing tools were installed on computer systems at an autonomous college to check their performance. The results show that dividing the task among a group of systems can give better results. For different phishing features, the add-on tool shows around 97 percent successful results under different case conditions. The current study should be helpful in countering phishing attacks, and the proposed system is able to protect the user from them. Since the tool is capable of handling and managing phishing website details, it should also be helpful in identifying the category of a website.