Intelligent cluster connectionist recommender system using implicit graph friendship algorithm for social networks

Received Jan 26, 2020 Revised Mar 20, 2020 Accepted Apr 12, 2020
Implicit clusters are formed as a result of the many interactions between users and their contacts. Online social platforms today provide special link types that allow effective communication. Yet many users seldom categorize their contacts into groups such as "family", "friends", etc., even though such contact clusters are easily represented via implicit graphs. This has given rise to the need to analyze a user's implicit social graph and to automatically add contacts to, and delete contacts from, a user's groups through a suggestion algorithm. This makes the group creation process dynamic instead of static (where users manually add and/or remove people on their contact list). The study implements the friend suggest algorithm, which analyzes a user's implicit social graph to create custom contact groups, using an interaction-based metric to estimate a user's affinity to his contacts and groups. The algorithm starts with a small seed set of contacts already categorized by the user as friends or groups, and then suggests other contacts to be added to a group. The results demonstrate the importance of both the implicit group relationships and the interaction-based affinity in suggesting friends.


INTRODUCTION
The main benefit of online over offline communication is that it enables communication among a cluster of people, rather than restricting it to be peer-to-peer. Email supports group conversations as well as the sharing of other items such as photos, links and collaborative documents [1]. Despite the prevalence of group communication, users seldom spend time creating and maintaining custom contact groups. A survey of mobile phone users in Europe showed that only 16% of users had created such custom contact groups [2]. Various social sites and networks seem to provide users with exclusive relationship and link types with their contacts. Thus, it is common practice for some users to identify others as friends even when they do not actually know or trust them. Treating all contacts in the same manner, without differentiating one from another, is an unsafe and restrictive practice, since users sometimes need to share data and interact with only particular members of their personal (social) networks. Many users have quit groups or social platforms once their family, friends, superiors or subordinates came online [3].
ISSN: 2252-8938 Int J Artif Intell, Vol. 9, No. 3, September 2020: 497-506
The most common means of modelling relationships on social networks is via graphs. A graph is a symbolic representation of a network and of its connectivity. It is an abstraction of reality, simplified as a set of linked vertices or nodes. Mathematically, a graph is a set of vertices V connected by edges E, denoted as G = (V, E). Graph theory can be traced to Leonhard Euler's problem of the "Seven Bridges of Königsberg" in 1735. The problem asked whether a walker could cross each of the city's seven bridges exactly once in a continuous route. Graph theory has since been extended with various improvements [4][5]. A social graph seeks to interconnect users via relationships on a social (usually online) network [6]. A social graph is a discrete graph whose vertices are users and whose edges describe relationships between these entities [7][8]. A social network characterizes the relationship between entities by their means of interaction and its recency.
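Euler's argument can be checked mechanically. The sketch below encodes Königsberg as a multigraph G = (V, E) (the labels A-D for the four land masses are our own) and applies his degree criterion: a connected graph admits a walk crossing every edge exactly once if and only if zero or two vertices have odd degree.

```python
from collections import Counter

# G = (V, E) for Königsberg: four land masses, seven bridges.
# E is a list rather than a set because two pairs of land masses are joined by two bridges each.
V = {"A", "B", "C", "D"}  # A is the central island
E = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
     ("A", "D"), ("B", "D"), ("C", "D")]


def degrees(vertices, edges):
    """Count how many edge endpoints touch each vertex."""
    deg = Counter({v: 0 for v in vertices})
    for u, w in edges:
        deg[u] += 1
        deg[w] += 1
    return deg


def has_euler_walk(vertices, edges):
    """Euler's criterion for a connected multigraph: zero or two odd-degree vertices."""
    odd = [v for v, d in degrees(vertices, edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)


print(has_euler_walk(V, E))  # all four land masses have odd degree, so no such walk exists
```

Every land mass has odd degree, so no continuous route crosses each bridge once; removing any single bridge leaves exactly two odd vertices and makes such a walk possible.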

Literature Review: The Mathematics of Social Graphs
Graphs have become the dominant representation for many tasks, and graph theory has been successfully applied in many disciplines [7]. A graph has vertices connected by edges, yielding a system that represents the interactions in a task. Social graphs consist of vertices representing actors or agents, while edges represent weighted relationships shared between these actors. Thus, graph analysis studies nodal relationships via the formation of node clusters and how they affect the system. A powerful role of graphs (as networks) is to bridge the local features of nodes and the patterns into which they grow, helping to explain how nodes and their edges produce complex effects that ripple through the graph [9-10]. A graph is represented pictorially as a network of nodes, and mathematically as G = (V, E, w), where V is the set of vertices (nodes, actors, agents), E is the set of links/relations and arcs, and the optional w gives the weight of each link. Each node i ∈ V has a set of ties m ⊆ E that is either self-linked, single-linked or multi-linked. Links are either weak or strong, reflecting the social relationship between such nodes as measured via dyads D [11].
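A minimal sketch of G = (V, E, w) as a weighted edge list, treating w as a weight on each link, as the weighted relationships described above suggest. Dyads are unordered node pairs; the self-, single- and multi-linked classification follows the text, while the edge data and the weak/strong threshold are hypothetical.

```python
from collections import defaultdict

# Hypothetical weighted edges of G = (V, E, w): (node, node, weight).
edges = [("ann", "bob", 3.0), ("ann", "bob", 1.5), ("bob", "cat", 0.5), ("cat", "cat", 1.0)]


def dyads(edge_list):
    """Group edges by unordered node pair (the dyad); sum weights and count parallel links."""
    strength = defaultdict(float)
    count = defaultdict(int)
    for u, v, w in edge_list:
        key = frozenset((u, v))  # a self-link collapses to a one-element set
        strength[key] += w
        count[key] += 1
    return strength, count


def tie_kind(pair, count):
    """Classify a dyad as a self-link, a single link or a multi-link."""
    if len(pair) == 1:
        return "self-linked"
    return "multi-linked" if count[pair] > 1 else "single-linked"


def tie_strength(pair, strength, threshold=2.0):
    """Hypothetical rule: a dyad is 'strong' when its total weight reaches the threshold."""
    return "strong" if strength[pair] >= threshold else "weak"


strength, count = dyads(edges)
for pair in strength:
    print(sorted(pair), tie_kind(pair, count), tie_strength(pair, strength))
```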
In social discourse, there are two kinds of social graph: explicit and implicit. The implicit graph describes the interactions between users, their contacts and groups of contacts. It is a graph whose vertices are not represented as explicit data objects in memory but are instead determined algorithmically from more concise inputs. Its edges are weighted by features such as the frequency, recency and direction of interaction between users, their contacts and their groups of contacts. Implicit graphs are used to identify clusters of contacts who form groups that are meaningful and useful to each user [1]. Conversely, in explicit graphs two individuals deliberately and mutually describe their connection with one another. Such graphs can be mined more easily, since they begin with hard data rather than with algorithms that would be hard for competitors to replicate in the future. They are best understood as truly personal and social [12]. Explicit graphs are comparatively rare; examples include Facebook and LinkedIn.
Generally, groups change dynamically as new users are added to multi-party communication threads while others are removed. A person's individual relationships also evolve over time: a friend becomes family, a colleague becomes a friend, a friend becomes a colleague, etc. Keeping every relationship between a user and his/her contacts up to date requires constant maintenance, which is tedious and time consuming [3]. Previous studies have used many models, and at the frontier of these is the friend suggest algorithm, which assists users in creating custom contact groups that are either implicit or explicit.
Traversing contacts on a social graph
A graph binds nodes together through a predefined model so that a researcher can effectively analyze its entities along some theories to explain the patterns observed within it [13][14][15][16]. Thus, it helps to propagate local features present in nodes that eventually emerge as global patterns. It examines the dynamics of relationships between nodes and helps to locate all the influential entities within such a network, as it theoretically allows the connections of nodes to converge [11]. A powerful role of a graph is to bridge local underlying features as they grow into global network patterns, thereby explaining how simple nodal relationships produce complex effects that eventually ripple throughout the network and population system. Each node shapes the graph's evolution as the need arises. Thus, social graphs aim at two goals: (a) to better understand how networks evolve, and (b) to study dependent social processes such as innovation diffusion and data retrieval via models that specify how local interactions of agents' features extend to a global pattern [5,8,11].

The implicit friendship suggest algorithm
Extending [1], based on the graph theories in [8], we explore the friend suggest algorithm (FSA), which probes the presence of implicit clustering in a user's egocentric network by observing groups of contacts who are frequently present as co-recipients in the same email threads. The FSA operates within the egocentric network, basing its suggestions only on the user's local data so as to protect user privacy and avoid exposing connections between the user's contacts that may not otherwise have been revealed to him. The input to the FSA is a seed set (one or more contacts that belong to a particular group), formed by the user picking a few contacts. Given this seed, the FSA finds the contacts in the user's egocentric network that are related to it (contacts present in the same implicit cluster). The FSA also returns a score for each suggested contact, indicating how well it fits the existing seed. The FSA is applicable to group clustering problems in which the interaction is based on a social graph.
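The implicit groups the FSA works on can be derived from the user's own mail alone. This sketch assumes each message has been reduced to a hypothetical (timestamp, sender, recipients) tuple and treats every distinct set of co-participants (everyone on the message except the user) as one implicit group, recording the direction of each interaction so that it can later be weighted.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Message = Tuple[float, str, Tuple[str, ...]]  # (timestamp, sender, recipients)


def implicit_groups(user: str,
                    messages: Iterable[Message]) -> Dict[FrozenSet[str], List[Tuple[float, str]]]:
    """Map each implicit group (the other participants of a message) to its interactions."""
    groups: Dict[FrozenSet[str], List[Tuple[float, str]]] = defaultdict(list)
    for ts, sender, recipients in messages:
        participants = {sender, *recipients}
        if user not in participants:
            continue  # only mail the user took part in belongs to the egocentric network
        group = frozenset(participants - {user})
        if not group:
            continue  # a note to self has no co-participants
        direction = "out" if sender == user else "in"
        groups[group].append((ts, direction))
    return dict(groups)


msgs = [(1.0, "me", ("alice", "bob")), (2.0, "alice", ("me", "bob")), (3.0, "me", ("carol",))]
print(implicit_groups("me", msgs))
```

An incoming message keeps its sender in the group, so a reply from alice to me and bob lands in the same group as my original message to alice and bob.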

Statement of problem
The following problems are addressed:
1. Manually creating groups from user contacts is time consuming, as the user must deliberately identify clusters in his/her contact list to create the required groups.
2. Social groups are dynamic: relationships are added, deleted and amended, and users often handle such updates of custom groups manually.
3. The study seeks to tackle the above problems by using an implicit social graph via a friend suggest algorithm.
The study seeks to implement a friend suggest algorithm via the implicit social graph model that will automatically help users create custom contact groups in their phonebook. This will intelligently eliminate the time users spend manually creating their custom contact groups. Specific goals include: (a) to describe an interaction-based metric for estimating a user's affinity to his contacts and groups [17], (b) to implement the friend suggest algorithm via a user's implicit social graph and thus generate a friend group, given a small seed set of contacts which the user has already categorized as friends, and to suggest friends that expand the seed set of that group, and (c) to demonstrate the importance of both the implicit group relationships and the interaction-based affinity ranking in suggesting friends [18].

Data gathering
The study uses the Enron Email Corpus Dataset to implement the Friend Suggest Algorithm (FSA). It is a large collection of email messages of the Enron Corporation, collected in Houston in 2002 by Joe Barting, a contractor of Aspen whom the Federal Energy Regulatory Commission hired to preserve the vast amounts of data collected during the legal investigation of the Enron accounting fraud in December 2001. It contains over 600,000 messages from 150 users [19][20]. For the study, we adopt a single user's (i.e., an employee's) mailbox, since we seek that user's egocentric network. Each employee folder in the dataset contains subfolders for incoming and outgoing messages. Our chosen employee is Shackleton Sara, chosen for a balance between the number of incoming and outgoing messages, as shown in Figure 1. The dataset was adopted because (a) it is a standard email dataset for social network analysis, (b) it allows the benefits and tie-strength of users' relationships to be explored, and (c) it incorporates the various component metrics to be measured for the friend suggest algorithm. The dataset is obtained from [web]: http://www.cs.cmu.edu.
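A minimal sketch of extracting one interaction from a raw message of the chosen mailbox, using only Python's standard email package. It assumes each message is available as an RFC 822 string with From, To/Cc/Bcc and Date headers; walking the corpus's folder layout is left out, and the helper name to_interaction is hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime


def to_interaction(raw: str):
    """Return (unix timestamp, sender, sorted recipients) for one raw message."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    # Collect every address on To, Cc and Bcc; a header may appear more than once.
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    recipients = tuple(sorted({addr.lower() for _, addr in getaddresses(fields) if addr}))
    # The Date header is required here; a missing one raises an error.
    ts = parsedate_to_datetime(msg["Date"]).timestamp()
    return ts, sender, recipients


raw = ("From: Alice <Alice@Example.com>\n"
       "To: me@example.com, bob@example.com\n"
       "Cc: Carol <carol@example.com>\n"
       "Date: Mon, 14 May 2001 16:39:00 -0700\n"
       "Subject: hi\n\nbody\n")
print(to_interaction(raw))
```

Lower-casing addresses keeps the same contact from splitting into several vertices because of capitalization differences.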

RESEARCH METHOD
The general architecture of the implemented system (Figure 2) couples graph theories modeled on [8, 21-25] with a view to extending [1]. Its components are:
1. Client Computer: any device through which a user can submit service requests to, and receive messaging or other services from, the server system. Examples are PCs, mobile devices, mobile phones, etc.
2. Communication Network: any wired/wireless local area network and/or wide area network, or a combination of these.
3. Seed Set Selector: selects one or more contacts from the dataset (subset) as the seed set.
4. Score Contribution Accumulator: generates scores for the contacts in the groups of contacts. The seed set is sent to the score contribution accumulator, which uses data from the user account data store (e.g., groups of contacts who were recipients or senders of messages) to generate a score for each contact in the groups of contacts.
5. Contact Suggestion Generator: generates suggested contacts from the generated score data. It includes a contact_add suggest function for suggesting contacts to add to messages and a contact_remove suggest function for suggesting contacts to remove. It receives all the generated scores and produces suggestions drawn from the groups of contacts.
6. User Account Database: stores user data (messages) associated with the user account(s), contact data for contacts associated with the user account(s), and the score data generated by the score contribution accumulator.
7. Interaction Rank Generator: generates values indicating the importance of a group of contacts to the user.
8. Interaction Rank Database: stores the interaction rank score data generated by the Interaction Rank Generator.
The score contribution accumulator weighs the contribution of each group of contacts to the generated score for a contact.

The interaction rank (IR)
IR is computed by summing the number of emails exchanged between a user and a particular implicit group, weighting each email interaction as a function of its recency. The weight of an interaction decays exponentially over time, governed by a tunable half-life parameter λ. IR can also be tuned with ω_out, the relative importance of outgoing versus incoming emails. Given the set of email interactions I = {I_out, I_in} between a user and a group g, where I_out is the set of outgoing interactions and I_in the set of incoming interactions, IR is computed as in (1):

$$IR(g) = \omega_{out} \sum_{i \in I_{out}} \left(\frac{1}{2}\right)^{\frac{t_{now} - t(i)}{\lambda}} + \sum_{i \in I_{in}} \left(\frac{1}{2}\right)^{\frac{t_{now} - t(i)}{\lambda}} \qquad (1)$$

where t(i) is the timestamp of interaction i and t_now is the current time. From (1), each interaction at the current time contributes 1 (ω_out if outgoing) to a group's IR, whereas an interaction from one half-life λ ago contributes 1/2 (ω_out/2 if outgoing), and so on. IR is related to the recency metric proposed by [17]. However, IR weights each interaction by its timestamp, whereas the recency metric orders interactions consecutively and weights each by an exponentially decaying factor of its ordinal rank; the recency metric also does not take into account the direction of each interaction. Furthermore, [18] proposed an edge-weight metric that considers the roles of the interacting users but does not take into account the time of the interaction. IR values are not easily compared across users. For example, an active user who sends and receives many emails per day will have an overall higher IR for his implicit groups than a relatively inactive user. Within a single user's egocentric network, however, IR allows a clean ordering of the user's implicit groups by estimated relationship strength.
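A direct transcription of the interaction rank into Python, assuming the half-life form described above: each interaction's weight is (1/2)^((t_now - t)/λ), and outgoing interactions are multiplied by ω_out. Timestamps and λ must share one unit (e.g., days); the function name, parameter names and the "out"/"in" direction labels are hypothetical.

```python
from typing import Iterable, Tuple


def interaction_rank(interactions: Iterable[Tuple[float, str]], t_now: float,
                     half_life: float, w_out: float) -> float:
    """Half-life-decayed count of a group's interactions; outgoing ones are scaled by w_out."""
    out_sum = 0.0
    in_sum = 0.0
    for t, direction in interactions:
        decay = 0.5 ** ((t_now - t) / half_life)  # 1 now, 1/2 one half-life ago, 1/4 two ago...
        if direction == "out":
            out_sum += decay
        else:
            in_sum += decay
    return w_out * out_sum + in_sum


# One outgoing email sent now and one incoming email from one half-life ago.
print(interaction_rank([(10.0, "out"), (0.0, "in")], t_now=10.0, half_life=10.0, w_out=2.0))
```

Here the outgoing email contributes ω_out × 1 = 2 and the older incoming email contributes 1/2, giving 2.5.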

The core routine
EXPANDSEED is the core routine for suggesting contacts that expand a particular seed set, given a user's contact groups:
Function EXPANDSEED(u, S)
Input: u, the user; S, the seed
Returns: F, the friend suggestions
The EXPANDSEED function takes as inputs a user u, who is the mailbox owner of a single egocentric network in the implicit social graph, and a seed S consisting of the set of contacts that make up the group to be expanded. EXPANDSEED returns a set of suggested friends F that maps each suggested contact to a score. Each contact's score is the algorithm's prediction of how well that contact expands the seed, in comparison with the other contacts in the user's network. Not every contact in a user's network is guaranteed to be returned in F. Friend suggestions are computed as follows. The user's egocentric network is extracted from the implicit social graph. This graph G is a set of contact groups, where each group g ∈ G is a set of contacts with whom u has exchanged emails. Each group g has an IR indicating the strength of the user's connection to g. The goal of EXPANDSEED is thus to find, among all the contacts in G, those whose interactions with the user are most similar to the user's interactions with the contacts in the seed S.
The EXPANDSEED function iterates over each group g in G, computing a score for each contact c that is a member of g. The algorithm does not suggest contacts that are already members of the seed S. Scores for each contact are computed iteratively via a helper function, UPDATESCORE, which takes the contact being considered, the contact's score so far F[c], the seed S, and the group g. In the following section, we discuss several scoring heuristics that were considered for UPDATESCORE.
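A minimal Python sketch of EXPANDSEED, assuming the egocentric network is given as a dict mapping each implicit group (a frozenset of contacts) to its IR. Instead of passing F[c] into UPDATESCORE, it adds each group's returned contribution to F[c], which is equivalent to an UPDATESCORE that returns the score so far plus the contribution. Dropping contacts whose total stays at zero is our reading of "not every contact is guaranteed to be returned".

```python
from typing import Callable, Dict, FrozenSet, Set

Group = FrozenSet[str]
# update_score(c, seed, g, ir_g) returns g's contribution to c's score.
UpdateScore = Callable[[str, Set[str], Group, float], float]


def expand_seed(groups: Dict[Group, float], seed: Set[str],
                update_score: UpdateScore) -> Dict[str, float]:
    """Accumulate each group's contribution for every non-seed member of that group."""
    scores: Dict[str, float] = {}
    for g, ir_g in groups.items():
        for c in g:
            if c in seed:
                continue  # never suggest contacts already in the seed
            scores[c] = scores.get(c, 0.0) + update_score(c, seed, g, ir_g)
    # Return only contacts that received a positive score.
    return {c: s for c, s in scores.items() if s > 0}


groups = {frozenset({"a", "b"}): 2.0, frozenset({"b", "c"}): 1.0, frozenset({"d"}): 5.0}
print(expand_seed(groups, {"a"}, lambda c, S, g, ir: ir if g & S else 0.0))
```

Passing the scoring heuristic as a function keeps the traversal fixed while the heuristics below are swapped in and out.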

The scoring function
Update_Score is a function pattern that takes a single contact c from a user's egocentric network and an implicit group g to which c belongs, and returns an incremental score based on the connection of g to the seed group S. Summing Update_Score for a contact c over all of the implicit groups to which c belongs gives an evaluation of c's suitability to expand the seed. Both the implicit groups making up a user's egocentric network and the seed group input to the Friend Suggest Algorithm are unordered sets of contacts, so they can be compared via standard measures of set similarity. For this study, we use set intersection to derive the various versions of the Update_Score function, leaving more complex metrics for future work.
Intersecting Group Score: implements an Update_Score function that sums the scores of all of the groups to which the contact belongs, for groups that have a non-empty intersection with the seed. The algorithm is as follows:
Function Intersect_Group_Score (c, S, g)
Input: c, a single contact; S, the seed being expanded; g, a single contact group
Returns: g's contribution to c's score
if g ∩ S ≠ ∅ return IR(g) else return 0
Intuitively, Intersect_Group_Score finds all contexts in which the contact c exchanged emails with (or was a co-recipient with) at least one seed group member. However, a larger intersection between the members of the seed group and the members of a given implicit group would seem to indicate a higher degree of similarity, which this score does not capture.
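The Intersecting Group Score and its per-contact summation can be sketched as follows, assuming the egocentric network is a dict mapping each implicit group (a frozenset of contacts) to its IR; the helper score_contact is hypothetical.

```python
from typing import Dict, FrozenSet, Set


def intersect_group_score(c: str, seed: Set[str], g: FrozenSet[str], ir_g: float) -> float:
    """g contributes its full IR to c's score if it shares at least one contact with the seed."""
    return ir_g if g & seed else 0.0


def score_contact(c: str, seed: Set[str], groups: Dict[FrozenSet[str], float], update) -> float:
    """Sum a contribution function over every implicit group that contains c."""
    return sum(update(c, seed, g, ir) for g, ir in groups.items() if c in g)


groups = {frozenset({"a", "b"}): 2.0, frozenset({"a", "b", "c"}): 1.0, frozenset({"b", "d"}): 4.0}
print(score_contact("b", {"a"}, groups, intersect_group_score))
```

Contact b collects 2.0 + 1.0 from the two groups shared with seed member a, while the higher-IR group {b, d} adds nothing because it does not touch the seed. Note that both contributing groups count equally per unit of IR regardless of how many seed members they contain.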
Intersect Weighted Score: implements Update_Score by summing the scores of all groups with a non-empty intersection with the seed, weighted by the size of the intersection and a constant k. A larger intersection between the members of the seed group and the members of a given implicit group therefore generates a larger contribution to the score of each non-seed member of that implicit group.
Function Intersect_Weighted_Score (c, S, g)
Input: c, a single contact; S, the seed being expanded; g, a single contact group
Returns: g's contribution to c's score
return IR(g) × k × |g ∩ S|
Intersecting Group Count: implements an Update_Score function that counts the number of groups to which a contact c belongs that have a non-empty intersection with the seed S. This metric ignores the Interactions Rank entirely and treats all implicit groups as having equal value to the user. The algorithm is as follows:
Function Intersect_Group_Count (c, S, g)
Input: c, a single contact; S, the seed being expanded; g, a single contact group
Returns: g's contribution to c's score
if g ∩ S ≠ ∅ return 1 else return 0
To assess the importance of using a seed of contacts to characterize a distinct friend group, we compare these metrics with an Update_Score instantiation that ignores the seed and always suggests the top-ranked contacts, where contact ranks are computed by summing the Interactions Ranks of the implicit groups containing each contact. In each metric, the final friend suggestion scores are normalized with respect to the highest-ranked contact, so that a single threshold can be used across all users to cut off the list of suggested contacts.
Top Contact Score: implements an Update_Score routine that computes the Interactions Rank of a single contact by summing the scores of all of the groups to which that contact belongs.
The algorithm is as follows:
Function Top_Contact_Score (c, S, g)
Input: c, a single contact; S, the seed being expanded; g, a single contact group
Returns: an updated rank for the contact c
return IR(g)
The system identifies, in the historical communications of the user's account, one or more groups of contacts (e.g., each group of contacts is a group of one or more contacts associated with a particular communication). It then generates scores for the contacts in the identified groups, computing a contact score for each contact by accumulating the score contributions of the identified groups that include that contact (e.g., using the Top_Contact_Score function). The system identifies one or more suggested contacts according to the generated scores (e.g., the contacts with the top ten scores as calculated by the Top_Contact_Score function). After identifying the suggested contacts, the system sends a contact suggestion based on them for display to the user, where the suggestion includes the suggested contacts with a score above a defined threshold (e.g., the system sends a list of "top contacts" to the user).
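The remaining heuristics, together with normalization against the highest-ranked contact and a single cut-off threshold, can be sketched as below. As before, groups is assumed to map each implicit group (a frozenset) to its IR; the constant k = 2.0 and the threshold 0.5 are hypothetical values not given in the text, and the function names are illustrative.

```python
from typing import Dict, FrozenSet, Set

Group = FrozenSet[str]


def intersect_weighted_score(c, seed, g, ir_g, k=2.0):
    """IR(g) x k x |g ∩ S|; groups that miss the seed contribute 0."""
    return ir_g * k * len(g & seed)


def intersect_group_count(c, seed, g, ir_g):
    """1 for every group touching the seed; IR is ignored entirely."""
    return 1.0 if g & seed else 0.0


def top_contact_score(c, seed, g, ir_g):
    """Ignores the seed: a contact's rank is the sum of the IRs of its groups."""
    return ir_g


def suggest(groups: Dict[Group, float], seed: Set[str], update, threshold: float = 0.5):
    """Score non-seed contacts, normalize by the best score, and keep those above the cut-off."""
    scores: Dict[str, float] = {}
    for g, ir_g in groups.items():
        for c in g - seed:
            scores[c] = scores.get(c, 0.0) + update(c, seed, g, ir_g)
    best = max(scores.values(), default=0.0)
    if best <= 0:
        return {}
    normalized = {c: s / best for c, s in scores.items()}  # top contact scores 1.0
    return {c: s for c, s in normalized.items() if s >= threshold}


groups = {frozenset({"a", "b"}): 1.0, frozenset({"a", "b", "c"}): 1.0, frozenset({"d"}): 10.0}
for fn in (intersect_weighted_score, intersect_group_count, top_contact_score):
    print(fn.__name__, suggest(groups, {"a"}, fn))
```

With seed {a}, both seed-based heuristics suggest b and c, while top_contact_score suggests only d, a heavy correspondent with no link to the seed. This is exactly the contrast the seedless baseline is meant to expose.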
Suggesting Contacts to Remove: the algorithm can also suggest contacts to remove from, or replace in, a draft message, email or communication, by suggesting the removal of a first contact and the addition of a second contact that is similar to the first. With the Remove_Contact algorithm, based on the user's egocentric network, the system checks whether removing an existing recipient creates an implicit group with a higher score than the one formed by the existing recipient set. Initially, the system sets the Interactions Rank (if any) of the current recipients of an email as the maximum score. Then, for each contact ci in the current recipient list L, Remove_Contact builds a seed set that includes all of the members of L except ci. For each such seed set, the system determines its Interactions Rank (if any). If the Interactions Rank of the seed set is greater than the current maximum score, it becomes the new maximum score, and the contact ci that was removed from L is set as the WrongRecipient. Thus, the system determines the highest Interactions Rank among the IR of the current recipients and the IRs of every set of contacts generated by removing a single contact from L. Finally, the WrongRecipient is returned and sent to the client as a suggestion for a contact to remove from the communication.
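Remove_Contact translates to a short loop. This sketch assumes groups maps each implicit group (a frozenset) to its IR and that a recipient set never seen as an implicit group has IR 0 (our reading of "if any"); the companion step of suggesting a similar replacement contact is omitted, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set


def remove_contact(recipients: Set[str], groups: Dict[FrozenSet[str], float]) -> Optional[str]:
    """Return the recipient whose removal yields the highest-IR implicit group, or None."""
    current = frozenset(recipients)
    max_score = groups.get(current, 0.0)  # IR of the current recipient list L
    wrong_recipient = None
    for c in sorted(current):  # sorted only to make tie-breaking deterministic
        candidate = current - {c}  # L without ci
        score = groups.get(candidate, 0.0)
        if score > max_score:
            max_score = score
            wrong_recipient = c
    return wrong_recipient


groups = {frozenset({"a", "b"}): 5.0, frozenset({"a", "b", "x"}): 0.5}
print(remove_contact({"a", "b", "x"}, groups))
```

Because the current recipient set seeds the maximum, no removal is suggested when the draft already matches the user's strongest group.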

RESULT, DISCUSSION AND FINDINGS
4.1. Summary output
Tables 1 and 2 show the interaction rank (i.e., edge weights) between the user and the corresponding contact groups. Only the edge weights of twenty-one (21) of the 3360 groups in Shackleton Sara's email were documented. Tables 1 and 2 indicate that as the recency weight decay value decreases, the importance of an interaction fades more slowly, so the interaction rank between the user and a group receives a larger edge weight. Conversely, increasing the recency weight decay reduces the resulting values, since the importance of each interaction then fades very quickly. Figure 3 shows the user's contact grouping in relation to tie-strength.
Figure 3. Contact grouping using the user's tie-strength
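The direction of this effect follows from the half-life form of the interaction rank. Assuming the recency weight decay corresponds to how quickly interaction weights fade (a shorter half-life meaning faster decay), the sweep below over a hypothetical interaction history shows the rank growing as the decay slows. The history, the evaluation day and the half-life values are illustrative, not taken from Tables 1 and 2.

```python
def interaction_rank(interactions, t_now, half_life, w_out=1.0):
    """Half-life-decayed sum of interactions; outgoing ones are scaled by w_out."""
    total = 0.0
    for t, direction in interactions:
        weight = w_out if direction == "out" else 1.0
        total += weight * 0.5 ** ((t_now - t) / half_life)
    return total


# Hypothetical history: (day of interaction, direction), ranked on day 100.
history = [(0.0, "in"), (30.0, "out"), (60.0, "in"), (90.0, "out")]

for half_life in (7.0, 30.0, 90.0):
    # A longer half-life means slower decay, so older mail keeps more of its weight.
    print(f"half-life {half_life:>4} days -> IR {interaction_rank(history, 100.0, half_life):.3f}")
```

Each term (1/2)^(age/half-life) grows with the half-life for any positive age, so the rank rises monotonically while staying below the raw interaction count.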

The edge weight output
Tables 3-6 show the IR (i.e., edge weights) between the user and the corresponding implicit contact groups. Only the edge weights of twenty (20) of the 3360 groups in Shackleton Sara's email were documented. Table 3 shows the Interaction Rank between the user and the groups at initialization. Table 4 shows the change in the result values as the recency weight decay varies with the other parameters held constant; the recency weight decay determines the speed at which an interaction's importance fades. Tables 5 and 6 differ because of a change in the value of ω_out, while the other parameter, the recency weight decay, remains constant. Note that ω_out determines the relative importance of outgoing versus incoming interactions. The results show that as ω_out increases, the edge weight also increases, and as ω_out decreases, the edge weight decreases, because the outgoing interactions exchanged between the user and the group are weighted by ω_out relative to the incoming ones.
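Because ω_out only multiplies the decayed count of outgoing interactions, the IR is linear in ω_out, and a group with no outgoing mail is unaffected by it. A sketch over a hypothetical history (the half-life of 30 days and the ω_out values are illustrative, not the settings behind Tables 5 and 6):

```python
def interaction_rank(interactions, t_now, half_life, w_out):
    """w_out * (decayed outgoing count) + (decayed incoming count)."""
    def decayed(direction):
        # interactions must be re-iterable (e.g., a list): it is scanned once per direction.
        return sum(0.5 ** ((t_now - t) / half_life) for t, d in interactions if d == direction)
    return w_out * decayed("out") + decayed("in")


history = [(0.0, "in"), (30.0, "out"), (60.0, "in"), (90.0, "out")]  # hypothetical (day, direction)

for w_out in (0.5, 1.0, 2.0):
    print(f"w_out={w_out}: IR={interaction_rank(history, 100.0, 30.0, w_out):.3f}")
```

The slope of IR with respect to ω_out equals the decayed outgoing count, so groups the user writes to often react most strongly to changes in ω_out.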

CONCLUSION
There has to be effective communication between two entities for an effective relationship, and for this to happen both parties have to be in constant communication. One way this can happen is through social platforms that support link sharing, photo sharing, email communication and so on. The FSA helps to manage such relationships using an implicit social graph. We detailed the software architecture and its operations. The FSA generates a friend group, given a small seed set of contacts already labelled by the user as friends or colleagues, and then suggests contacts to expand the seed set. The FSA operates on the implicit groups in the user's egocentric network. The Enron Email Corpus dataset was used to determine the interaction rank (edge weight) between a user and his groups of contacts. We further described the benefits and strengths of the FSA by testing its components. The system was able to determine the edge weight between the user and its groups of contacts, and to suggest contacts that expand the seed set using the various update score functions.