A novel ontology framework supporting model-based tourism recommender

ABSTRACT


INTRODUCTION
Recommender systems rely on machine learning models in their decision-making process. These model-based recommender systems are usually evaluated on vector-based recommender datasets (e.g., MovieLens [1], Book-Crossing [2]). While these datasets cover only a few domains (e.g., movies, books), graph-based open linked data (e.g., DBpedia [3]) provide data in many fields and have been used as a supplementary data source in recent recommender research [4], [5]. However, the graph nature of open linked data makes it difficult for machine learning models to consume, and the few domains covered by recommender datasets are not enough to build real-life domain-specific recommenders (e.g., tourism recommenders). To fill this gap, our study focuses on constructing vector-based data for an ontological knowledge base and generating tourism recommendation items from these vectors.
In this paper, we introduce a novel ontological framework that supports a model-based tourism recommender in generating top-K personalized recommendations. More specifically, we design a tourism ontology for machine learning (TOML), which captures knowledge of the tourism domain and integrates with external knowledge bases (e.g., DBpedia or local databases). Furthermore, we construct the SemanticVector class to encode every entity's properties in a numerical vector space. Algorithms are proposed to quantify the dimensional values of each SemanticVector instance. The recommendation engine is designed to generate top-K recommendations based on the calculation of semantic similarity or the use of supervised learning models. Two experiments are conducted, and the experimental results confirm the feasibility of our proposed framework.
Int J Artif Intell, ISSN: 2252-8938. A novel ontology framework supporting model-based tourism recommender (Ho Quoc Dung)
The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 presents TOML, the architecture of the TOML-based tourism recommender, and its decision-making process. Section 4 describes the experiments and discusses the results. Finally, section 5 gives the conclusion and states the future work.

RELATED WORK
In this section, we analyze recent approaches to tourism recommenders, including machine learning and Semantic Web based approaches. General reviews of recommender systems and of tourism recommenders are outside the scope of this study and can be found in the surveys [6] and [7], respectively. Traditionally, collaborative filtering, content-based filtering and hybrid methods are the dominant approaches to recommender systems. The strengths and weaknesses of these methods are analyzed in [6]. Besides, machine learning is also applied in recommenders to give personalized recommendations. Specifically, classification methods widely used in making recommendations include support vector machines (SVM) [8], k-nearest neighbors (kNN) [9], artificial neural networks (ANN) [10], decision trees [11] and ensemble methods [12], to name a few. In the domain of tourism recommenders, both traditional methods [13] and machine learning methods [14] have been introduced in the literature. Both kinds of methods are data dependent: the quantity and the quality of data determine the performance of recommender systems.
However, recommender studies often suffer from a lack of data, which is the root of the cold-start problem of recommenders [15]. To support recommenders in building their prediction models, researchers have used supplementary datasets to overcome this difficulty, and open linked data has been adopted as a modern approach [15]. The use of open linked data and the reasoning techniques of Semantic Web technology are also found in tourism recommenders [4] and in other kinds of recommenders [5]. As a result, combining machine learning with open linked data and Semantic Web technology has become a rising trend in recommender studies. In this paper, we aim not only to provide a new hybrid framework but also to present a new ontology for tourism recommenders. The rest of this section reviews recent studies with a focus on: i) ontology engineering methodologies; ii) ontologies for the tourism industry; and iii) a discussion of the distinct characteristics of our proposed approach.

Ontology engineering methodologies
In 2001, Berners-Lee et al. [16] proposed the Semantic Web initiative, which highlights the key role of ontology as an efficient way to capture domain knowledge in a machine-understandable format. Since then, the research area of ontology engineering, which focuses on methods of developing domain ontologies, has emerged. In this area, the tutorial of Noy and McGuinness [17] can be seen as one of the most popular methods of ontology building. The authors proposed a seven-step method: i) determine the scope of the domain ontology; ii) reuse existing ontologies; iii) enumerate domain concepts; iv) construct the classes and the class hierarchy; v) define the properties of classes (slots); vi) define the facets of the slots; and vii) create instances. Although this method is efficient, it faces difficulties in ontology evolution and collaborative ontology building. Therefore, different ontology engineering methods have been presented. For example, Fernandez-Lopez et al. [18] focused on the major subtasks of developing new ontologies and on the evolution of an ontology throughout its lifetime. In another approach, Sure et al. [19] presented the On-To-Knowledge methodology (OTKM), which takes account of knowledge processes and knowledge meta processes. The former relate to the usage of ontologies, while the latter concern the initial setup. OTKM introduces ways of integrating ontologies into knowledge management applications. The NeOn methodology [20] differs from previous methodologies: while previous studies build standalone ontologies, the NeOn methodology constructs an ontology network by connecting different existing ontologies through their relationships.

Ontologies for tourism industry
Recently, Semantic Web technology and ontologies have been applied to tourism recommenders in many ways. For example, Antonio Moreno et al. [21] used an ontology to capture knowledge of tourism objects and populated the ontological instances with scores, which were the inputs of the recommendation algorithm. Lin Shi et al. [22] provided tourism recommendations based on the user's context: an ontology was used to describe and integrate tourism resources, and on this knowledge foundation a reasoning process was implemented to make personalized recommendations. Grun et al. [4] introduced an ontology-based method to support tourists' decision-making during their pre-trip phase. The authors matched tourists' profiles with the characteristics of tourism objects in a vector space where each dimension is a tourist factor. In another approach, P. Ferraro and L. R. Giuseppe [23] proposed an architecture of a semantically adaptive recommender system assisting users in the travel planning phase and in the on-site phase. A hybrid method for tourism recommenders was also introduced in the research of Yan Chu et al. [24]. Firstly, the authors used association rules to identify related and unrelated users. Secondly, for each group of users, they applied different collaborative filtering algorithms to make recommendations. Finally, the recommendations were expanded by using a tourism ontology.
ISSN: 2252-8938 Int J Artif Intell, Vol. 10, No. 4, December 2021: 1060-1068

Discussion
Both recommenders and machine learning models require data, usually in numerical vector format. However, this kind of data is not always available, especially in tourism recommender research. On the other hand, many valuable open linked data sources (e.g., DBpedia), which are stored in graph-based formats, can efficiently support the recommendation-making process. The problem is how to transform graph-based data directly into numerical vectors so that they can serve different machine learning models in predicting users' preferences or generating top-K personalized recommendation lists. In addressing this problem, our proposed framework differs from the aforementioned research in three aspects. Firstly, we introduce a new tourism ontology based on domain expert collaboration and external knowledge integration. Secondly, a SemanticVector concept is used to describe every entity of the ontology in a vector space model. This component provides semantic numerical data for machine learning tasks, including classification and clustering. Thirdly, we present algorithms for the recommendation engine which directly use the semantic numerical data in the recommendation-making process. This approach is different from previous uses of ontologies in the tourism domain.

TOML-BASED RECOMMENDER FRAMEWORK
In this section, we describe our ontological approach to the tourism recommender, named TOML. The TOML-based recommender framework has three major parts: the TOML ontology, the methods of populating the TOML knowledge base, and the TOML-based recommendation engine. Figure 1 shows the overall architecture of this framework. The TOML ontology was designed through the proposed six-step process presented in subsection 3.1. The TOML knowledge base was enriched in different ways, such as importing from DBpedia, local databases and tourists' preference data; these enriching methods are discussed in subsection 3.2. Subsection 3.3 introduces the recommendation engine of this framework.

TOML ontology
In general, a domain ontology can be defined as in Definition 1. In order to build the TOML ontology, we invited tourism experts and knowledge engineers to work together. The working process of this group includes six steps. Firstly, we adapted the method of [17] to create the first draft of the knowledge base: the experts enumerated the concepts and relations of the tourism domain, and the knowledge engineers transferred this information into an ontology structure using the Protégé [25] software. Secondly, the first step was repeated until all of the experts and engineers reached consensus. Thirdly, further specific descriptions were added to the ontology (e.g., the SemanticVector class). Fourthly, we enriched the ontological instances by using our local database and importing data from open knowledge bases (e.g., DBpedia) through mapping operations. Fifthly, we iterated over each entity of the TOML knowledge base and computed its corresponding semantic vector by using our proposed algorithms. Finally, the ontology was carefully checked by both the experts and the engineers in order to reach its first version.
An excerpt of TOML is shown in Figure 2. In total, TOML has 157 concepts, 65 object properties and 24 data properties. Because of these large numbers of concepts and properties, we describe TOML by summarizing its characteristics and highlighting our own contributions in specifying the tourism domain knowledge.
Firstly, we develop concepts that relate to tourists, places, services, facilities and activities. For example, the concept toml:Tourist inherits from the foaf:Person concept and has three different object properties linking it to the toml:City concept: toml:hasHomeTown, toml:visited and toml:visits. The toml:Tourist concept plays the key role in our ontology in capturing knowledge about tourists' personal information (e.g., gender and name) and tourists' preferences, the latter through its relation with travel:Activity and its sub-concepts.
Secondly, we elaborate and specify more concepts about tourists' activities, such as toml:Purchase, toml:Listen or toml:Festival, to name a few. These activity concepts are efficient in capturing tourists' preferences, and they are used in the first phase of the recommendation process by linking with other concepts through the toml:suggest object property.
Thirdly, every sub-concept of toml:Place, toml:Products or toml:Service has a relation with the toml:SemanticVector concept. This concept provides a quantitative vector for every entity of the related concept. This vector is the basis for any further use of machine learning models or decision-making processes. We propose specific algorithms to build semantic vectors for every related entity of the TOML knowledge base.

Finally, we propose the toml:RecomItem concept to capture one or more recommended things. For example, if tourists prefer to buy products, and the products are found in a local market that must be reached by public transport, the recommended items should take account of not only the product itself but also the available public transport service and the route guide. This distinguishes tourism recommendation from other kinds of recommendation, such as books or movies.
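As a rough illustration of the class and property structure described above, the following Python sketch stores a handful of TOML statements as plain subject-predicate-object tuples. Only the concept and property names quoted in the text are taken from the paper; the use of rdfs:domain and rdfs:range to express "object properties with toml:City", and the helper function itself, are assumptions made for this example.

```python
# Minimal sketch: a few TOML statements as (subject, predicate, object) tuples.
# The rdfs:domain / rdfs:range modelling is an assumption, not taken from TOML itself.
TRIPLES = {
    ("toml:Tourist", "rdfs:subClassOf", "foaf:Person"),
    ("toml:hasHomeTown", "rdfs:domain", "toml:Tourist"),
    ("toml:hasHomeTown", "rdfs:range", "toml:City"),
    ("toml:visited", "rdfs:domain", "toml:Tourist"),
    ("toml:visited", "rdfs:range", "toml:City"),
    ("toml:visits", "rdfs:domain", "toml:Tourist"),
    ("toml:visits", "rdfs:range", "toml:City"),
}


def properties_between(triples, domain, range_):
    """Return the properties declared with the given domain and range (hypothetical helper)."""
    with_domain = {s for s, p, o in triples if p == "rdfs:domain" and o == domain}
    with_range = {s for s, p, o in triples if p == "rdfs:range" and o == range_}
    return sorted(with_domain & with_range)


tourist_city_links = properties_between(TRIPLES, "toml:Tourist", "toml:City")
```

In a real deployment these statements would live in the OWL file edited with Protégé; the tuple form is used here only to keep the example self-contained.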

Enriching TOML knowledge base
In order to populate the TOML knowledge base, we imported relevant data from open linked data sources (e.g., DBpedia) and local databases. The importing process depends on mappings of classes and properties: corresponding concepts between DBpedia and TOML were identified, and mapping rules between database tables and the TOML ontology were defined. Then, relevant DBpedia entities and their properties were selected by SPARQL queries and exported to RDF/JSON format. To integrate local databases into TOML, the relevant table records were selected and exported to RDF/JSON files. Finally, these batch files were imported directly into the TOML knowledge base. The pseudocode for importing data from DBpedia and from local databases is shown in Figures 3 and 4, respectively.
The primary purpose of the TOML knowledge base is to provide data for machine learning models. While machine learning models require numerical vectors as inputs, open linked data (e.g., DBpedia, the TOML knowledge base) provide data in graph-based formats (e.g., RDF, OWL). We converted each property value of an entity into a number by using (1). Our algorithm for building numerical vectors from the available linked data for every TOML entity, shown in Figure 5, applies (1) so that each property of the entity plays the role of a dimension in the semantic vector.
where the number of ⟨c, p, e⟩ triples is the total number of triples which have the same subject concept (class) c, the same property p and the same property value e. By implementing the algorithm shown in Figure 5, every entity obtains its own semantic vector. However, a given property may appear in some entities and not in others; in other words, different entities may have different vector spaces. Therefore, building a common vector space for all selected semantic vectors is necessary. Firstly, all semantic vectors related to the recommendation task are selected by a SPARQL SELECT query. Then, all distinct properties are identified and sorted in ascending order of property name; these are the dimensions of the vector space. Finally, for each semantic vector, its original values are filled into the corresponding dimensions, and the remaining dimensions receive zero values. This procedure is expressed in Figure 6.
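The two steps just described, counting triples per property value and aligning all vectors to one shared space, can be sketched in Python as follows. This is a minimal illustration, not the algorithm of Figures 5 and 6: it assumes that the value of a dimension is the raw count of triples sharing the entity's class, the property and the property value, that each property is single-valued per entity, and that triples are available as plain tuples. All entity and property names in the toy data are hypothetical.

```python
from collections import Counter


def semantic_vectors(triples, class_of):
    """Build one sparse vector per entity: property -> number of triples that
    share the entity's class, the property, and the property value."""
    counts = Counter((class_of[s], p, o) for s, p, o in triples)
    vectors = {}
    for s, p, o in triples:
        # Assumption: each property is single-valued per entity.
        vectors.setdefault(s, {})[p] = counts[(class_of[s], p, o)]
    return vectors


def to_common_space(vectors):
    """Align sparse vectors: dimensions are the distinct property names in
    ascending order; properties an entity lacks are filled with zero."""
    dims = sorted({p for vec in vectors.values() for p in vec})
    aligned = {e: [vec.get(p, 0) for p in dims] for e, vec in vectors.items()}
    return dims, aligned


# Toy data; entity and property names are hypothetical.
triples = [
    ("ex:placeA", "ex:category", "heritage"),
    ("ex:placeA", "ex:region", "central"),
    ("ex:placeB", "ex:category", "heritage"),
    ("ex:placeB", "ex:hasRiver", "yes"),
    ("ex:placeC", "ex:category", "beach"),
    ("ex:placeC", "ex:region", "central"),
]
class_of = {e: "toml:Place" for e in ("ex:placeA", "ex:placeB", "ex:placeC")}

dims, aligned = to_common_space(semantic_vectors(triples, class_of))
```

With this toy data the shared dimensions are the three property names in alphabetical order, and an entity without a given property (for example ex:placeA and ex:hasRiver) receives a zero in that position.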

TOML-based recommendation engine
TOML-based recommendation strategies were designed to cope with two common recommendation cases: (i) tourist preference data is available; and (ii) tourist preference data is not available. When tourist preference data is not available, the recommendation strategy is as follows. Assume that tourists want a top-K recommendation list for a given concept (e.g., place, food or product). First, an entity of the recommended concept that is marked as "famous" in the knowledge base is randomly selected via a SPARQL SELECT query. We use this entity as the starting point and find its (K-1) nearest entities by calculating the semantic similarity between this entity and the other entities of the same concept. The Euclidean distance is adopted to compute semantic similarity. The pseudocode of this strategy is shown in Figure 7.
When tourists provide preference data that can serve as labeled data, supervised learning models are applied to generate top-K personalized recommendation items. Different classification models can be plugged into the recommendation engine via an input parameter, and the prediction scores are used to rank the top-K recommendation list. Figure 8 shows the pseudocode of this strategy.
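The strategy without preference data can be sketched in a few lines of Python. The sketch assumes that the aligned semantic vectors are already available as lists of numbers and that the set of "famous" entities has been retrieved beforehand; the function names and the toy vectors are hypothetical, and ties in distance are broken by entity name, a detail the text does not specify.

```python
import math
import random


def pick_seed(famous_entities, rng=random):
    """Randomly pick the starting entity among those marked as famous."""
    return rng.choice(sorted(famous_entities))


def top_k_similar(seed, vectors, k):
    """Return the seed followed by its (k-1) nearest entities,
    using Euclidean distance between semantic vectors."""
    others = sorted(
        (e for e in vectors if e != seed),
        key=lambda e: (math.dist(vectors[seed], vectors[e]), e),
    )
    return [seed] + others[: k - 1]


# Toy aligned vectors; names and values are hypothetical.
vectors = {
    "ex:placeA": [2, 0, 2],
    "ex:placeB": [2, 1, 0],
    "ex:placeC": [1, 0, 2],
    "ex:placeD": [0, 3, 0],
}
seed = pick_seed({"ex:placeA"})
top3 = top_k_similar(seed, vectors, 3)
```

Here ex:placeC is closest to the seed (distance 1), followed by ex:placeB, so the top-3 list contains the seed and these two entities in that order.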
Based on the top-K recommendation list generated by the algorithm in either Figure 7 or Figure 8, the route planning algorithm is applied to find the shortest path from the tourist's current location to all locations of the K suggested items. The location data is stored in the TOML knowledge base, and the Google Maps API is used to find location-to-location routes. Figure 9 shows the pseudocode of route planning recommendation.
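One simple way to order the K suggested locations is a greedy nearest-neighbour heuristic, sketched below. This is an illustrative stand-in, not the algorithm of Figure 9: it replaces the map service with great-circle (haversine) distance, it does not guarantee the globally shortest route, and the function names and coordinates are hypothetical. A routing service could be substituted through the distance parameter.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def greedy_route(start, stops, distance=haversine_km):
    """Visit every stop, always moving to the nearest unvisited one.
    `stops` maps an item name to its (lat, lon) location."""
    route, here, remaining = [], start, dict(stops)
    while remaining:
        nxt = min(remaining, key=lambda name: (distance(here, remaining[name]), name))
        route.append(nxt)
        here = remaining.pop(nxt)
    return route


# Toy coordinates; not real places.
order = greedy_route((0.0, 0.0), {"a": (0.0, 3.0), "b": (0.0, 1.0), "c": (0.0, 2.0)})
```

For small K an exhaustive search over all visiting orders would also be feasible and would return the true shortest tour under the chosen distance.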

EXPERIMENTS
The experiments were conducted to evaluate the efficiency of the TOML knowledge base and its recommendation engine. We developed a prototype in the Python programming language which implements all of the algorithms proposed in section 3.1. The test of user satisfaction and the feasibility of implementing machine learning models with the TOML knowledge base are presented in subsections 4.1 and 4.2, respectively.

Experiment 1: building top-K recommendation list without user preference
In this experiment, tourists' preference data is not available. This situation corresponds to the cold-start problem in recommender research. In the real world, tour guides often provide suggestions without knowing tourists' preferences. Therefore, we compare the top-K recommendation lists yielded by the TOML-based prototype with those of tour guides.
The experiment was designed as follows. Questionnaires were sent to tour guides of 5 different local tourist companies. The survey closed after 2 months, and 32 tour guides completed it. The tour guides were asked to give top-10 recommendations for foods, places and products of a given city. Due to the complexity of the collected data, we summarize the experimental results in Table 1. In addition, the top-10 place recommendations generated by TOML and by the tour guides are visualized in Figure 10.
As shown in Table 1, all of the p-values of the three groups of top-10 recommendations are greater than 0.05. These results indicate no statistically significant difference between the recommendation lists of the tour guides and those of the TOML-based prototype. In other words, the suggestions of the TOML-based prototype are comparable to those of tour guides. Furthermore, the results suggest that the TOML knowledge base has captured experts' domain knowledge efficiently.
Figure 10. Top-10 places recommended by TOML and by tour guides

Experiment 2: Implementing various classifiers with TOML knowledge base
The purpose of this experiment is to demonstrate the ability of the TOML knowledge base to provide data for machine learning models. Specifically, labeled data is required to train supervised learning models that predict which entities should be presented to tourists. However, it was hard to recruit tourists for this experiment. Hence, we invited tour guides to take the role of tourists. Each participant indicated which things (places, foods, and products) she or he likes or dislikes. These preferences were written to the corresponding entities in the TOML knowledge base via the data property toml:hasPreference. This property was also added to the semantic vector as the label dimension. The preference data were associated with the concept toml:Tourist, which captures tourist profiles.
Six tour guides participated in this experiment and provided 327 preference records. We used three popular classification models, k-NN, Naive Bayes and SVM, to predict personalized tourism recommendations. Three suggestion lists were produced, for places, foods and products. Each participant indicated, for every suggested item, whether she or he was satisfied with it. Table 2 shows the results of this experiment.
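To make the supervised setting concrete, the sketch below ranks unseen items by a prediction score learned from labeled semantic vectors. It is a simplified stand-in for the classifiers used in the experiment: the scorer is a plain k-NN vote written in pure Python, the label is assumed to be 1 for "likes" and 0 for "dislikes", and all names and numbers are hypothetical. Any model that returns a score per candidate could be passed in through the scorer parameter.

```python
import math


def knn_score(candidate, labelled, k=3):
    """Fraction of the k nearest labelled vectors whose label is 1 (liked)."""
    nearest = sorted(labelled, key=lambda row: math.dist(candidate, row[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def rank_top_k(candidates, labelled, k_items, scorer=knn_score):
    """Score every candidate vector and return the names of the k_items best."""
    scored = [(scorer(vec, labelled), name) for name, vec in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k_items]]


# Toy labelled data: (semantic vector, label), where the label plays the role
# of the toml:hasPreference dimension (1 = likes, 0 = dislikes).
labelled = [
    ([2, 0, 2], 1),
    ([1, 0, 2], 1),
    ([2, 1, 0], 0),
    ([0, 3, 0], 0),
    ([0, 2, 1], 0),
]
candidates = {"x": [2, 0, 1], "y": [0, 3, 1], "z": [1, 1, 1]}
top2 = rank_top_k(candidates, labelled, 2)
```

In this toy run, candidate x has two liked items among its three nearest neighbours, z has one and y has none, so the top-2 list is x followed by z.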
It is important to emphasize that this experiment does not aim to introduce new classifiers with high predictive capability, but to demonstrate that machine learning models can work directly with the TOML knowledge base. As indicated in Table 2, the averages of satisfied ratios range from 40% to 63.3%, while those of unsatisfied ratios range from 36.7% to 56.7%. The three traditional classifiers reach or exceed the 50% threshold 6 times in total. These results confirm that TOML-based semantic vectors can be used efficiently in machine learning models.

CONCLUSION
[...] k-NN and the promising usage of TOML-based framework. While the results obtained from experiment 1 indicate that the TOML knowledge base has captured experts' domain knowledge efficiently, those gained from experiment 2 confirm the efficient use of TOML-based semantic vectors in machine learning models.
Future work will focus on building a TOML-based web service and integrating the TOML ontology with other tourism ontologies, in order to enlarge the knowledge base and establish the TOML-based framework as the backbone of a tourism recommendation service.