Work place: Lviv Polytechnic National University, Lviv, 79013, Ukraine
E-mail: Victoria.A.Vysotska@lpnu.ua
Website: https://orcid.org/0000-0001-6417-3689
Research Interests: Data Science, Machine Learning
Biography
Victoria Vysotska is a Professor at the Information Systems and Networks Department of Lviv Polytechnic National University. She defended her Doctoral degree in Technical Science, speciality «structural, applied and mathematical linguistics», in 2023 on the topic "Analysis and synthesis of computational linguistic systems for processing Ukrainian textual content". She also received a PhD degree in Information Technologies from Lviv Polytechnic National University in 2014. She has published more than 500 works. Her main research interests are focused on the identification of disinformation/fakes/propaganda, and on the detection of sources of disinformation and inauthentic behaviour (bots) of coordinated groups, based on artificial intelligence, NLP, computational linguistics, data science, system analysis, information technologies, machine learning and cyber security.
By Serhii Vladov, Oleksandr Muzychuk, Victoria Vysotska, Alexey Yurko, Dmytro Uhryn
DOI: https://doi.org/10.5815/ijigsp.2024.05.04, Pub. Date: 8 Oct. 2024
The article is devoted to the development of a modified multidimensional Kalman filter with Chebyshev points for diagnosing and parrying failures in the measurement channels of automatic control systems for complex dynamic objects, which provides a more accurate and reliable assessment of the system state in the presence of outliers in the data. The proposed filter is implemented as a modified recurrent neural network containing a failure diagnostics layer, a failure parrying layer, a filtering and smoothing layer, and a results aggregation layer. This structure made it possible to solve the main problems of the method: diagnosing failures with an accuracy of 0.99802, parrying failures with an accuracy of 0.99796, and assessing the state of the system with an accuracy of 0.99798. A modified loss function of the recurrent neural network is proposed as a general loss function for diagnostics, fault restoration and system state assessment, which makes it possible to avoid overfitting when there are a large number of parameters or insufficient data. It has been experimentally shown that the loss function remains stable on both the training and validation data sets for 1000 training epochs and stays within -2.5 % to +2.5 %, which indicates a low risk of overfitting or underfitting the model.
It has been experimentally confirmed that, for diagnosing and parrying failures in the measurement channels of automatic control systems for complex dynamic objects, the modified recurrent neural network is preferable to a radial basis function neural network and to a multidimensional Kalman filter without a neural network implementation. The comparison is based on the root mean square deviation, mean absolute error, mean absolute percentage error, the coefficient of determination for the accuracy of reproducing previous data, and the coefficient of determination for the accuracy of predicting future values. For example, the standard deviation of the modified recurrent neural network is 0.00226, which is 1.65 times lower than that of the radial basis function neural network and 2.20 times lower than that of the multidimensional Kalman filter without a neural network implementation.
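The error metrics named in this abstract can be illustrated with a short sketch. It is not the authors' evaluation code: the function name `regression_metrics` and the sample values are assumptions chosen only to show how the root mean square error, mean absolute error, mean absolute percentage error and coefficient of determination are usually computed.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}

# Hypothetical values, not taken from the paper.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)
```

Ratios such as "1.65 times lower" are then obtained by dividing one model's metric by another's.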
By Victoria Vysotska, Denys Shavaiev, Michal Gregus, Yuriy Ushenko, Zhengbing Hu, Dmytro Uhryn
DOI: https://doi.org/10.5815/ijmecs.2024.05.05, Pub. Date: 8 Oct. 2024
The growing use of social networks and the steady popularity of online communication make the task of detecting gender from posts necessary for a variety of applications, including modern education, political research, public opinion analysis, personalized advertising, cyber security and biometric systems, marketing research, etc. This study aims to develop information technology for recognizing gender from voice, based on supervised learning using machine learning algorithms. A model, methods and means of recognition and gender classification of voice speech samples are proposed based on their acoustic properties and machine learning. In our voice gender recognition project, we used a neural network model built with the TensorFlow library and Keras. The speaker's voice was analysed for various acoustic features, such as frequency, spectral characteristics, amplitude, modulation, etc. The basic model we created is a typical neural network for text classification. It consists of the input layer, hidden layers, and the output layer. For text processing, we use a pre-trained word vector space such as Word2Vec or GloVe. We also used techniques such as dropout to prevent overfitting, activation functions such as ReLU (Rectified Linear Unit) for non-linearity, and a softmax function in the last layer to obtain class probabilities. To train the model, we used the Adam optimizer, which is a popular gradient descent optimization method, and the "sparse categorical cross-entropy" loss function, since we are dealing with multi-class classification. After training the model, we saved it to a file for further use and evaluation on new data. The application of neural networks in our project allowed us to build a powerful model that can recognize a speaker's gender by voice with high accuracy.
The intelligent system was trained using machine learning methods, each of which was analysed for accuracy: K-Nearest Neighbours (98.10%), Decision Tree (96.69%), Logistic Regression (98.11%), Random Forest (96.65%), Support Vector Machine (98.26%), and neural networks (98.11%). Additional techniques such as regularization and optimization can be used to improve model performance and prevent overfitting.
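The softmax output layer and the sparse categorical cross-entropy loss mentioned in this abstract can be sketched without any framework. This is a plain-Python illustration of the two formulas, not the Keras model from the paper; the logits and the class index are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(probs, label):
    """Negative log-probability of the true class, given as an integer index."""
    return -math.log(probs[label])

# Hypothetical two-class example: raw network outputs and the true class index.
logits = [2.0, 0.0]
probs = softmax(logits)
loss = sparse_categorical_cross_entropy(probs, 0)
```

The "sparse" variant takes the label as an integer index rather than a one-hot vector; the loss value itself is the same.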
By Yevgen Burov, Victoria Vysotska, Lyubomyr Chyrun, Yuriy Ushenko, Dmytro Uhryn, Zhengbing Hu
DOI: https://doi.org/10.5815/ijieeb.2024.05.01, Pub. Date: 8 Oct. 2024
The use of ontological models for intelligent systems construction allows for improved quality characteristics at all stages of the life cycle of a software product. The main source of improvement in quality characteristics is the possibility of reusing the conceptualization and code provided by the corresponding models. Due to the use of a single conceptualization when creating various software products, the degree of interoperability and code portability increases. The new-generation electronic business analytics systems implementation is based on the use of active models for business processes (BP). Such models, on the one hand, reflect the BPs taking place in the organization on a real-time scale, and on the other hand, embody corporate and other regulatory rules and restrictions and monitor their compliance. The purpose of this article is to research the methods of presenting and building active executable BP models, determining the methods of their execution and coordination, and building the resulting intelligent network of BP models. In the process of its implementation, such a network ensures the implementation, support of decision-making and compliance with regulatory rules in the relevant real BPs. A formal specification of an intelligent system for modelling a complex of BPs of the enterprise using models has been proposed. A hierarchical approach to the introduction of intelligent functions into the modelling system has been proposed. The simulation system is designed to be used for the design and management of complex intelligent systems. 
Achieving the set goal involves solving several development tasks: methods of presenting BP models for different types of such models; methods of analysis and display of time relations and attributes in BP models; ways of presenting the association of artefacts and business analytics models with individual BP operations; metric ratios for evaluating the quality of process execution; and methods of interaction of various BPs and coordination of their implementation. The purpose of an intelligent model-driven software system is achieved through the interaction of a large number of simple models. Each model encapsulates a certain aspect of the expert's knowledge about the subject area. To apply executable conceptual models in the field of modelling BPs, it is necessary to determine the types of conceptual models used, their purpose and functions, and the role they play in the operation of an intelligent system. Models used in modelling BPs can be classified according to various characteristics, and the same model can be included in different classifications.
By Victoria Vysotska, Krzysztof Przystupa, Lyubomyr Chyrun, Serhii Vladov, Yuriy Ushenko, Dmytro Uhryn, Zhengbing Hu
DOI: https://doi.org/10.5815/ijcnis.2024.05.06, Pub. Date: 8 Oct. 2024
A new method of propaganda analysis is proposed to identify signs of, and changes in, the dynamics of the behaviour of coordinated groups based on machine learning at the stages of disinformation processing. In the course of the work, two models were implemented to recognise propaganda in textual data: at the message level and at the phrase level. Within the framework of solving the problem of analysis and recognition of text data, in particular fake news on the Internet, an important component of NLP (natural language processing) technology is the classification of words in text data. In this context, classification is the assignment of textual data to one or more predefined categories or classes. For this purpose, the task of binary text classification was solved. Both models are built based on logistic regression. In the process of data preparation and feature extraction, the following methods are used: TF-IDF vectorisation (Term Frequency – Inverse Document Frequency), the BOW model (Bag-of-Words), POS tagging (Part-Of-Speech), word embedding using the Word2Vec two-layer neural network, and manual feature extraction methods aimed at identifying specific methods of political propaganda in texts. The analogues of the project under development are analysed, and the subject area (the propaganda used in the media and the basis of its production methods) is studied. The software implementation is carried out in Python, using the seaborn, matplotlib, gensim, spaCy, NLTK (Natural Language Toolkit), NumPy, pandas and scikit-learn libraries. The model's score for propaganda recognition was 0.74 at the phrase level and 0.99 at the message level. The implementation of the results will significantly reduce the time required to make the most appropriate decision on counter-disinformation measures concerning the identified coordinated groups that generate disinformation, fake news and propaganda.
Different classification algorithms are used to detect fake news from Internet resources and social mass media. Their identification accuracies for non-fakes and fakes, respectively, are: the decision tree (0.98/0.9903), the k-nearest neighbours (0.83/0.999), the random forest (0.991/0.933), the multilayer perceptron (0.9979/0.9945), the logistic regression (0.9965/0.9988), and the Bayes classifier (0.998/0.913). The Bayes classifier (0.998), the multilayer perceptron (0.9979) and the logistic regression (0.9965) perform best for identifying non-fake news. The k-nearest neighbours (0.999), the logistic regression (0.9988) and the multilayer perceptron (0.9945) perform best for identifying fake news.
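The TF-IDF vectorisation step mentioned in this abstract can be sketched in a few lines. This is one common textbook variant (relative term frequency multiplied by the natural log of inverse document frequency), not the authors' pipeline; the function name `tf_idf` and the two sample documents are hypothetical. scikit-learn's `TfidfVectorizer` uses a smoothed formula by default, so its numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical mini-corpus, used only to exercise the function.
docs = ["the enemy lies", "the news today"]
vectors = tf_idf(docs)
```

A term that occurs in every document, such as "the" here, gets weight zero, which is why TF-IDF highlights words that distinguish one message from another.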
By Serhii Vladov, Ruslan Yakovliev, Victoria Vysotska, Dmytro Uhryn, Artem Karachevtsev
DOI: https://doi.org/10.5815/ijisa.2024.04.01, Pub. Date: 8 Aug. 2024
The work is devoted to the development of a new architecture of radial basis function (RBF) neural networks: a polymorphic RBF network in which multidimensional RBFs are used in the hidden layer instead of one-dimensional ones, which makes it possible to better approximate complex functions that depend on several independent variables. Moreover, in its second layer, multiplication is used instead of summing the RBF outputs taken one from each group, which allows the polymorphic RBF network to better identify relations between independent variables. Based on the evolutionary algorithm for training classical RBF networks, a training algorithm for the polymorphic RBF network was created. By using weight initialization methods that take into account the task structure and preliminary values, tournament selection of mutations, additional fitness function criteria that account for the stability and speed of training, and an evolutionary mutation strategy, it achieved the lowest training and testing errors compared with known RBF network architectures. The practical applicability of the polymorphic RBF network is demonstrated experimentally by solving the task of optimizing the operating process parameters of helicopter turboshaft engines (using the TV3-117 turboshaft engine as an example) with a multicriteria optimization algorithm. The optimal Pareto front was obtained, which yielded three additional engine operation modes: maximum reduction of specific fuel consumption with the degree of total pressure increase in the compressor raised by 5.0 %; minimization of specific fuel consumption with the degree of total pressure increase in the compressor reduced by 1.0 %; and an optimal degree of total pressure increase in the compressor with a slight increase in specific fuel consumption of 10.5 %.
Future research prospects include adapting the developed methods and models into the general concept for monitoring and controlling helicopter turboshaft engines during flight operations. This concept is implemented in the neural network expert system and the on-board automatic control system.
By Serhii Vladov, Ruslan Yakovliev, Victoria Vysotska, Dmytro Uhryn, Yuriy Ushenko
DOI: https://doi.org/10.5815/ijcnis.2024.04.05, Pub. Date: 8 Aug. 2024
This work focuses on developing a universal onboard neural network system for restoring information when helicopter turboshaft engine sensors fail. A mathematical task was formulated to determine the occurrence and location of these sensor failures using a multi-class Bayesian classification model that incorporates prior knowledge and updates probabilities with new data. The Bayesian approach was employed for identifying and localizing sensor failures, utilizing a Bayesian neural network with a 4–6–3 structure as the core of the developed system. A training algorithm for the Bayesian neural network was created, which estimates the prior distribution of network parameters through variational approximation, maximizes the evidence lower bound instead of the direct likelihood, and updates parameters by calculating gradients of the log-likelihood and evidence lower bound, while adding regularization terms for warnings, distributions, and uncertainty estimates to interpret results. This approach ensures balanced data handling, effective training (achieving nearly 100% accuracy on both training and validation sets), and improved model understanding (with training losses not exceeding 2.5%). An example is provided that demonstrates solving the information restoration task in the event of a gas-generator rotor r.p.m. sensor failure in the TV3-117 helicopter turboshaft engine. The feasibility of implementing the developed onboard neural network system on a helicopter using the Intel Neural Compute Stick 2 neuro-processor has been analytically proven.
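The idea of a multi-class Bayesian model that "incorporates prior knowledge and updates probabilities with new data" can be illustrated with a plain Bayes-rule update. This is only a sketch of the underlying principle, not the Bayesian neural network or training algorithm from the paper; the class names, priors and likelihoods below are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(class | data) from P(class) and P(data | class)."""
    unnormalized = {c: prior[c] * likelihood[c] for c in prior}
    evidence = sum(unnormalized.values())  # P(data), the normalizing constant
    return {c: value / evidence for c, value in unnormalized.items()}

# Hypothetical classes and numbers, chosen only for illustration.
prior = {"no_failure": 0.90, "sensor_a_failed": 0.05, "sensor_b_failed": 0.05}
likelihood = {"no_failure": 0.10, "sensor_a_failed": 0.80, "sensor_b_failed": 0.10}
posterior = bayes_update(prior, likelihood)
```

Even with a strong prior for "no failure", an observation that is far more likely under a failure hypothesis shifts a noticeable share of probability to that class.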
By Victoria Vysotska, Andrii Berko, Yevhen Burov, Dmytro Uhryn, Zhengbing Hu, Valentyna Dvorzhak
DOI: https://doi.org/10.5815/ijieeb.2024.04.05, Pub. Date: 8 Aug. 2024
The purpose of the research is to develop mathematical models, solution methods and tool prototypes for integrating information resources and creating intelligent business analytics systems based on effective models. These problems can be solved by automating the execution of business processes and introducing artificial intelligence components into business process management systems. The essence of the current stage in the development of business process modelling systems is the transition from mainly manual (or software-assisted) methods of business process analysis to mainly automatic management of business process execution, and to the construction of intelligent business process networks in the form of a set of interconnected conceptual models that encapsulate knowledge about the structure and features of business processes, system events, limitations and dependencies, and are processed by machine. Decision-making powers are delegated to such an information system in clearly defined (most often simple, routine) situations. In this way, it is possible to form the information resource of intelligent business analytics systems as a single coherent set of data, suitable for solving a wide range of multifaceted problems. The integration approach to forming information resources has certain advantages over other approaches, in particular for the information resources of intelligent business analytics systems.
The use of integration as a means of forming a set of consistent data makes it possible to: combine data of different formats, content and origins in a single, consistent set; combine data without converting them to a single format, which is especially important when such conversion is difficult or impossible; create virtual custom images of data that do not depend on their real appearance; operate on both real physical and virtual data in combination; dynamically supplement, change and transform both the data and their descriptions; and provide uniform methods and technologies for perceiving and applying a large amount of varied data.
By Vitaliy Danylyk, Victoria Vysotska, Vasyl Andrunyk, Dmytro Uhryn, Yuriy Ushenko
DOI: https://doi.org/10.5815/ijcnis.2024.03.09, Pub. Date: 8 Jun. 2024
In the modern world, the military sphere occupies a very important place in the life of a country. At the same time, this area requires quick and accurate decisions. Such decisions can greatly affect the unfolding of events on the battlefield, so they must be made carefully, using all possible means. During a war, the speed and importance of decisions are very high, and the relevance of this topic is growing sharply. The purpose of the work is to create a comprehensive information system that facilitates the work of commanders of tactical units by organizing the real-time visualization and classification of aerial objects, the classification of objects for radio-technical intelligence, and the structuring of military information, and by making military information easier to perceive. The object of research/development is the presence of slowing factors in the command-and-control process carried out by tactical link teams, which can slow down decision-making and affect the correctness of decisions. The research/development aims to address emerging bottlenecks in the command-and-control process performed by tactical link teams, providing improved visualization, analysis and work with military data. The result of the work is an information system for processing military data to help commanders of tactical units. This system significantly improves on known officer assistance tools, although it includes a set of programs that have been used in parallel on an as-needed basis. By combining modern information technologies with ease of use, the system covers problems that may arise for commanders. Each program included in the complex information system also has its own degree of innovation. The information system for structuring military information is distinguished by the possibility of use on any device.
The information system for the visualization and clustering of aerial objects and the information system for the classification of objects for radio-technical intelligence are distinguished by their component-based nature. This means that the application can use various sources of input information and provides an API for use by other information processing tools. Regarding the information system for integration into information materials, largely unknown terms and abbreviations are defined, so such solutions cannot integrate the required data into real documents. Therefore, using this comprehensive information system, the command of tactical units will have the opportunity to improve the quality of the command-and-control process.
By Oleksandr Mediakov, Victoria Vysotska, Dmytro Uhryn, Yuriy Ushenko, Cennuo Hu
DOI: https://doi.org/10.5815/ijmecs.2024.01.03, Pub. Date: 8 Feb. 2024
The article develops technology for generating song lyrics extensions using large language models, in particular the T5 model, to speed up, supplement, and increase the flexibility of the process of writing song lyrics with or without taking into account the style of a particular author. To create the data, 10 different artists were selected, and then their lyrics were selected. A total of 626 unique songs were obtained. After splitting each song into several pairs of input-output sequences, 1874 training instances and 465 test instances were obtained. Two language models, NSA and SA, were fine-tuned for the task of generating song lyrics. For both models, t5-base was chosen as the base model. This version of T5 contains 223 million parameters. The analysis of the original data showed that the NSA model has less degraded results, and that for the SA model it is necessary to balance the amount of text for each author. Several text metrics such as BLEU, RougeL, and RougeN were calculated to quantitatively compare the results of the models and generation strategies. The BLEU metric is the most variable, and its value changes significantly depending on the strategy. At the same time, Rouge metrics have less variability and a smaller range of values. In total, for comparison, we used 8 different decoding methods for text generation supported by the transformers library: Greedy search, Beam search, Diverse beam search, Multinomial sampling, Beam-search multinomial sampling, Top-k sampling, Top-p sampling, and Contrastive search. All the results of the lyrics comparison show that the best method for generating lyrics is beam search and its variations, including beam-search sampling. Contrastive search usually outperformed the usual greedy approach. The top-p and top-k methods do not have a clear advantage over each other, and in different situations they produced different results.
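The difference between top-k and top-p (nucleus) sampling can be shown on a toy next-token distribution. This is a framework-free sketch of the two filtering rules, not the transformers library implementation used in the article; the function names and the probability values are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose total mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = [], 0.0
    for i in order:
        keep.append(i)
        mass += probs[i]
        if mass >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

# Hypothetical next-token distribution over a four-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
top_k = top_k_filter(probs, 2)    # always keeps exactly two tokens
top_p = top_p_filter(probs, 0.9)  # keeps as many tokens as needed to cover 90 %
```

Top-k fixes the number of candidates, while top-p adapts it to how peaked the distribution is, which is consistent with the observation that neither method is uniformly better.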
By Oleh Prokipchuk, Victoria Vysotska, Petro Pukach, Vasyl Lytvyn, Dmytro Uhryn, Yuriy Ushenko, Zhengbing Hu
DOI: https://doi.org/10.5815/ijmecs.2023.03.06, Pub. Date: 8 Jun. 2023
The article develops a technology for finding tweet trends based on clustering, which forms a data stream in the form of short representations of clusters and their popularity for further research of public opinion. The accuracy of the result is affected by the natural-language features of the information flow of tweets. An effective approach to tweet collection, filtering, cleaning and pre-processing based on a comparative analysis of the Bag of Words, TF-IDF and BERT algorithms is described. The impact of stemming and lemmatization on the quality of the obtained clusters was determined. Stemming and lemmatization reduce the input vocabulary of Ukrainian words significantly, by 40.21% and 32.52% respectively. Optimal combinations of clustering methods (K-Means, Agglomerative Hierarchical Clustering and HDBSCAN) and tweet vectorization methods were found based on the analysis of 27 clusterings of one data sample. The method of presenting clusters of tweets in a short format is selected. Algorithms using the Levenshtein distance, i.e. fuzz sort, fuzz set and Levenshtein, showed the best results. These algorithms perform checks quickly and show a greater difference in similarities, so the similarity threshold can be determined more accurately. According to the clustering results, the optimal solutions are to use the HDBSCAN clustering algorithm with the BERT vectorization algorithm to achieve the most accurate results, and to use K-Means together with TF-IDF to achieve the best speed with an acceptable result. Stemming can be used to reduce execution time. In this study, the optimal options for comparing cluster fingerprints were found experimentally among the following similarity search methods: Fuzz Sort, Fuzz Set, Levenshtein, Jaro Winkler, Jaccard, Sorensen, Cosine, Sift4. In some algorithms, the average fingerprint similarity exceeds 70%.
Three effective tools were found to compare their similarity, as they show a sufficient difference between comparisons of similar and different clusters (> 20%).
The experimental testing was conducted based on the analysis of 90,000 tweets over 7 days for 5 different weekly topics: President Volodymyr Zelenskyi, Leopard tanks, Boris Johnson, Europe, and the bright memory of the deceased. The research was carried out using a combination of K-Means and TF-IDF methods, Agglomerative Hierarchical Clustering and TF-IDF, HDBSCAN and BERT for clustering and vectorization processes. Additionally, fuzz sort was implemented for comparing cluster fingerprints with a similarity threshold of 55%. For comparing fingerprints, the most optimal methods were fuzz sort, fuzz set, and Levenshtein. In terms of execution speed, the best result was achieved with the Levenshtein method. The other two methods performed three times worse in terms of speed, but they are nearly 13 times faster than Sift4. The fastest method is Jaro Winkler, but it has a 19.51% difference in similarities. The method with the best difference in similarities is fuzz set (60.29%). Fuzz sort (32.28%) and Levenshtein (28.43%) took the second and third place respectively. These methods utilize the Levenshtein distance in their work, indicating that such an approach works well for comparing sets of keywords. Other algorithms fail to show significant differences between different fingerprints, suggesting that they are not adapted to this type of task.
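Comparing cluster fingerprints with a Levenshtein-based similarity and a 55% threshold can be sketched as follows. This is an illustration, not the authors' implementation: the normalization (one minus distance divided by the longer length), the `token_sort` helper that sorts keywords before comparison, and the sample fingerprints are all assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Similarity in percent; an assumed normalization, 100 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)

def token_sort(fingerprint):
    """Sort keywords so that word order does not affect the comparison."""
    return " ".join(sorted(fingerprint.lower().split()))

def same_trend(fp_a, fp_b, threshold=55.0):
    """Treat two cluster fingerprints as the same trend above the threshold."""
    return similarity(token_sort(fp_a), token_sort(fp_b)) >= threshold

# Hypothetical keyword fingerprints of two clusters.
match = same_trend("tanks leopard", "leopard tanks")
```

Sorting the tokens first makes the comparison insensitive to keyword order, which is the intuition behind sort-based fuzzy matching of keyword sets.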
By Vasyl Lytvyn, Victoria Vysotska, Ivan Peleshchak, Ihor Rishnyak, Roman Peleshchak
DOI: https://doi.org/10.5815/ijisa.2018.04.02, Pub. Date: 8 Apr. 2018
The time-frequency and time dependence of the output signal morphology of a nonlinear oscillator neuron based on the Van der Pol model were investigated using analytical and numerical methods. The threshold effect of the neuron when it is exposed to external non-stationary signals that vary in shape, frequency and amplitude was considered.
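A numerical study of a Van der Pol oscillator can be sketched with a classical fourth-order Runge-Kutta integrator. The abstract does not give the neuron model's exact equations, so this sketch assumes the standard form x'' - mu(1 - x^2)x' + x = f(t); the function name, parameter values and the optional `forcing` argument are hypothetical.

```python
def van_der_pol_rk4(mu, x0, v0, dt, steps, forcing=lambda t: 0.0):
    """Integrate x'' - mu*(1 - x^2)*x' + x = forcing(t) with classical RK4."""
    def deriv(t, x, v):
        # First-order system: x' = v, v' = mu*(1 - x^2)*v - x + forcing(t)
        return v, mu * (1.0 - x * x) * v - x + forcing(t)

    t, x, v = 0.0, x0, v0
    xs = [x]
    for _ in range(steps):
        k1x, k1v = deriv(t, x, v)
        k2x, k2v = deriv(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = deriv(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = deriv(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        xs.append(x)
    return xs

# Unforced oscillator with assumed parameters; 5000 steps of 0.01 cover 50 time units.
trajectory = van_der_pol_rk4(mu=1.0, x0=0.5, v0=0.0, dt=0.01, steps=5000)
late_amplitude = max(abs(x) for x in trajectory[-1000:])
```

For mu = 1 the unforced oscillator settles onto a limit cycle with an amplitude close to 2 regardless of the starting point; passing a time-varying `forcing` function is one way to explore the response to external signals.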