TRUST ASSESSMENT IN ONLINE SOCIAL NETWORKS

by
Guangchi Liu

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

MONTANA STATE UNIVERSITY
Bozeman, Montana
April, 2017

©COPYRIGHT by Guangchi Liu 2017
All Rights Reserved

ACKNOWLEDGEMENTS

First, I would like to thank my dear parents. Thanks for gifting me life and raising me so well. Thanks for selflessly supporting my decision to go to a foreign country and for supporting my research. Thanks for forgiving my absence while I studied in my academic field.

I would like to thank my adviser, Dr. Qing Yang. Thanks for coming to me and offering me support at my lowest time. Thanks for the open-minded and rigorous mentoring on my research. Thanks for bringing me to INFOCOM.

I would like to thank my girlfriend, Wenbo Gao. Thanks for being with me, sharing my emotions and caring for me during these years.

I would like to thank my other committee members, Dr. Binhai Zhu, Dr. Mike Wittie and Dr. Brendan Mumey, for their advice on my research.

I would like to thank my friends for the wonderful time we have shared in Bozeman. They are Xiaoming Li, Tianbo Liu, Chang Liu, Xiaoyi Yu, Qi Chen, Yi Xu, Ye Liu, Baiqiang Wen and others.

I would like to thank Ms. Jingyi Zhou from Apollo Box, Inc., CA, Dr. Xiaoyu Wang and other colleagues from Stratifyd Inc., NC, and Dr. Wenwen Dou from UNCC, NC, for their valuable help and support on my research and career path.

Finally, I would like to thank the hard and lonely times I experienced during my research. Thanks for making me strong. Thanks for teaching me to cherish what I own and appreciate the people who helped me. Thanks for letting me learn that there are no Supermen in the world, but men striving to be super.

TABLE OF CONTENTS

1. INTRODUCTION
    Problem Statements
    Limitations of Prior Art
    Proposed Approaches
    Key Contributions
2. RELATED WORK
    Definitions of Trust
    Trust Models in OSNs
        Topology based Trust Model
        PageRank based Trust Model
        Probability based Trust Model
        Subjective Logic based Trust Model
    Applications of Trust in Online Systems
        Trust in Cloud Computing
        Trust in P2P Network and Semantic Web
        Trust in Cyber-Physical Systems
        Trust in Spam Detection and Sybil Defense
        Trust in Recommendation and Crowdsourcing Systems
3. THREE-VALUED SUBJECTIVE LOGIC
    Preliminaries
    A Probabilistic Interpretation of Trust
    Opinion
    Discounting Operation
    Combining Operation
    Expected Belief of An Opinion
4. THE ASSESSTRUST ALGORITHM
    Properties of Different Opinions
    Arbitrary Network Topology
    Differences between 3VSL and SL
    AssessTrust Algorithm
    Illustration of the AssessTrust Algorithm
    Time Complexity Analysis
5. MASSIVE TRUST ASSESSMENT IN OSNS
    Design of OpinionWalk
        Opinion Matrix
        Individual Opinion Vector
        Opinion Walk Operation
        OpinionWalk Algorithm
    Illustration of the OpinionWalk Algorithm
    Correctness of OpinionWalk
        Series Network Topology
        Parallel Network Topology
        Arbitrary Topology
    Time Complexity Analysis
    Differences between AssessTrust and OpinionWalk Algorithms
    Distributed OpinionWalk Algorithm
6. EVALUATIONS
    Numerical Analysis
        Discounting Operation
        Combining Operation
        Bridge Topology
    Survey Experiments
        Setup of the Survey Experiments
        Errors in Discounting and Combining Operations
    Experimental Evaluations
        Dataset
        Dataset Preparation
        Accuracy of 3VSL Model
        Performance of the AssessTrust Algorithm
        Performance of the OpinionWalk Algorithm
7. CONCLUSION
REFERENCES CITED

LIST OF TABLES

6.1 Statistics of the Advogato and PGP datasets
6.2 Selected parameters (base trust level, total evidence value) for AT, SL* and TT

LIST OF FIGURES

3.1 Examples of series topologies
3.2 Examples of parallel topologies
3.3 Combining opinions with high and low uncertainties
4.1 Difference between distorting and original opinions
4.2 Illustration of an arbitrary network topology
4.3 Difference between 3VSL and SL on the discounting operation
4.4 Difference between 3VSL and SL on the combining operation
4.5 An illustration of 3VSL based on the bridge topology
5.1 A detailed illustration of OpinionWalk
5.2 A general view of the "opinion walk" operation
5.3 Illustration of how OpinionWalk processes the bridge topology
5.4 Illustration of two fundamental topologies in an OSN
5.5 Illustration of a network with an arbitrary topology
5.6 A general view of the D-OpinionWalk algorithm
6.1 Influence of belief on discounting operation
6.2 Influence of belief and uncertainty on the discounting operation
6.3 Influence of total evidence value on combining operation
6.4 Influence of positive evidence value on combining operation
6.5 Influence of bridge opinion's positive/total evidence ratio
6.6 Influence of bridge opinion's total evidence value
6.7 Absolute errors in expected belief of the discounting operation
6.8 Absolute errors in expected belief of the combining operation
6.9 CDFs of errors in expected belief of the discounting operation
6.10 CDFs of errors in expected belief of the combining operation
6.11 F1 scores of 3VSL and SL using the Advogato dataset
6.12 F1 scores of 3VSL and SL using the PGP dataset
6.13 CDFs of α+β in opinions computed by 3VSL and subjective logic using the Advogato dataset
6.14 F1 scores of the trust assessment results generated by TT, SL* and AT using the Advogato dataset
6.15 F1 scores of the trust assessment results generated by TT, SL* and AT using the PGP dataset
6.16 Histogram of the errors generated by TT, SL* and AT using the Advogato dataset
6.17 Histogram of the errors generated by TT, SL* and AT using the PGP dataset
6.18 Fitted curves of the error distributions of TT, SL* and AT using the Advogato dataset
6.19 Fitted curves of the error distributions of TT, SL* and AT using the PGP dataset
6.20 The CDFs of Kendall's tau ranking correlation coefficients of different algorithms using the Advogato dataset
6.21 The CDFs of Kendall's tau ranking correlation coefficients of different algorithms using the PGP dataset
6.22 Execution times of different algorithms (OW, MT, ET, AT, and TT) using the Advogato dataset
6.23 Execution times of different algorithms (OW, MT, ET, AT, and TT) using the PGP dataset

LIST OF ALGORITHMS

4.1 AssessTrust(G, A, C, H)
5.2 OpinionWalk(G, i, H)
5.3 D-OpinionWalk Algorithm that is executed on user j

ABSTRACT

Assessing trust in online social networks (OSNs) is critical for many applications such as online marketing and network security. It is a challenging problem, however, due to the difficulties of handling complex social network topologies and conducting accurate assessment in these topologies. To address these challenges, we model trust by proposing the three-valued subjective logic (3VSL) model. 3VSL properly models the uncertainties that exist in trust, and is thus able to compute trust in arbitrary graphs. We theoretically prove the capability of 3VSL based on the Dirichlet-Categorical (DC) distribution and its correctness in arbitrary OSN topologies. Based on the 3VSL model, we further design the AssessTrust (AT) algorithm to accurately compute the trust between any two users connected in an OSN.
AT accurately assesses one-to-one trustworthiness; however, it is inefficient in addressing the massive trust assessment (MTA) problem, i.e., computing one-to-many trustworthiness in OSNs. MTA plays a vital role in OSNs, e.g., in identifying trustworthy opinions in a crowdsourcing system. If the AssessTrust algorithm is applied directly to solve the MTA problem, its time complexity is exponential. To efficiently address MTA, we propose the OpinionWalk algorithm, which has polynomial time complexity. OpinionWalk uses a matrix to represent a social network's topology and a vector to store the trustworthiness of all users in the network. The vector is iteratively updated as the algorithm "walks" through the entire network. To validate the 3VSL model, we first conduct a numerical analysis. An online survey system is then implemented to validate the correctness and accuracy of 3VSL in the real world. Finally, we validate 3VSL against two real-world OSN datasets: Advogato and Pretty Good Privacy (PGP). Experimental results indicate that 3VSL can accurately model the trust between any pair of indirectly connected users in the Advogato and PGP datasets. To evaluate the performance of the AssessTrust and OpinionWalk algorithms, we use the same datasets. Compared to state-of-the-art solutions, e.g., EigenTrust and MoleTrust, OpinionWalk yields the same order of time complexity and a higher accuracy in trust assessment.

INTRODUCTION

Online social networks (OSNs) are among the most frequently visited places on the Internet. OSNs help people not only to strengthen their social connections with known friends but also to expand their social circles to friends of friends whom they may not have known previously. Trust is the enabling factor behind user interactions in OSNs and is crucial to almost all OSN applications. For example, in recommendation and crowdsourcing systems, trust helps to identify trustworthy opinions [9, 108].
In Twitter, spam undermines the trust among users by distributing false links [101], and thus seriously impacts the user experience. In online marketing applications [81], trust is used to identify trustworthy sellers. In a proactive friendship construction system [98], trust enables the discovery of potential friendships. In the network security domain, trust is considered an important metric for detecting malicious users [60, 85, 99, 100]. In social influence analysis, trust is a key factor in evaluating the impact of influential users [65, 105]. Given the above-mentioned applications, one confounding issue is the degree to which one user can trust another user in an OSN. This dissertation studies the fundamental issue of trust assessment in OSNs: given an OSN, how can trust among users be modeled and computed?

Trust is traditionally defined as either a rating-based reputation or the probability that a user is benign. In an online marketing system, e.g., eBay, users rate each other based on their previous interactions, so the trust of a given user is derived from aggregated ratings. In the network security domain, however, the trust of a given user is defined as the probability that this user will behave normally in the future. Based on results from previous studies [23, 26, 73, 84], we define trust as the probability that a trustee will behave as expected, from the perspective of a trustor. Here, both trustor and trustee are regular users in an OSN, where the trustor is interested in knowing the trustworthiness of the trustee. This general definition of trust makes it applicable to a wide range of applications. We also assume that trust in OSNs is determined by objective evidence; that is, cognition based trust [4, 21, 41, 43], which is formed in the absence of interaction experiences, is not considered in this dissertation.
Problem Statements

This dissertation aims at addressing the fundamental issue of accurately modeling and computing trust in OSNs, which requires us to solve the following three technical problems.

• P1: How to model direct trust in online social networks?
• P2: How to compute indirect trust in online social networks?
• P3: How to conduct massive trust assessment in online social networks?

In the first problem, because the trustor and trustee have direct interactions with each other, we call the trust relation between them direct trust. Based on the assumption that trust is determined by objective evidence, this problem can be formulated as follows.

P1: Given the interactions between a trustor and trustee, how does one model the trustworthiness of the trustee, from the perspective of the trustor?

Solving the second problem will provide a method to calculate the trust between two users who have no previous interactions. As the two users did not interact with each other previously, their trust relation is called indirect trust. Here, we model a trust social network as a directed graph G = (V, E) where a vertex u ∈ V represents a user, and an edge e(u, v) ∈ E denotes the trust relation from u to v. The weight of an edge w(u, v) denotes how much u trusts v, which is usually referred to as the direct trust from u to v. As such, the second problem is formulated as follows.

P2: Given a trust social network G = (V, E), ∀ u and v, s.t. e(u, v) ∉ E and ∃ at least one path from u to v, how does one compute u's trust in v, i.e., how much should u trust a stranger v?

Massive trust assessment (MTA) allows a user to compute the direct/indirect trustworthiness of all other users in an OSN. MTA is important in many applications. For example, LendingClub, Inc. [2] leverages the trust relations among users in Facebook.com [3] to improve its online peer-to-peer lending service.
It offers a mechanism to evaluate the trustworthiness of all potential borrowers, from the perspective of a lender. Clearly, the crux of this application is to efficiently and accurately compute the trustworthiness of all trustees, from the point of view of the trustor. Therefore, the third problem can be formulated as follows.

P3: Given a trust social network G = (V, E), ∀ i and j, s.t. i, j ∈ V, ∃ at least one path from i to j, how does one efficiently compute the trustworthiness of users {j ∈ V, j ≠ i}, from the perspective of user i?

Limitations of Prior Art

Existing trust models can be categorized as topology (or graph) based models [13, 29, 69, 95, 99, 100], PageRank based models [7, 36, 58], probability based models [20, 68, 90], and subjective logic based models [57]. None of them, however, is able to accurately model and compute trust in OSNs.

Topology based models [13, 95, 99, 100] treat trust assessment as a community detection problem and employ a random-walk method to identify users within the same community. These users are considered trustworthy to each other. The key limitation of these models is that the trustworthiness of users within a community is indistinguishable [67], which limits the application of the models. Graph based models [29, 47, 62, 69] assign a real number, ranging from 0 to 1, to each edge in the trust social network, and employ graph searching algorithms to evaluate the trustworthiness between any two users. The major limitation of these models is that trust is represented as a single value, which omits the uncertainty existing in trust and thus makes trust assessment inaccurate.

In addition to traditional graph searching algorithms, PageRank based models, e.g., TrustRank and EigenTrust [36, 58, 59], apply the idea of PageRank to rank users based on their trustworthiness. The trustworthiness of users is obtained by calculating how likely a user can be reached from the trustor within the network.
In these models, the probability of reaching a user (from the trustor) is determined by the trust value of the edge connecting to the user. The key limitation of these models is that they incorrectly treat trust propagation in a social network as a random walk process.

Probability based models [20, 68, 90] model trust as a probability distribution, i.e., a trustor uses previous interactions with a trustee to construct a probabilistic model to approximate the trustee's future behavior. The major limitation of these models is that they only focus on direct trust and cannot be applied to compute indirect trust.

Although the subjective logic [53, 57] model takes advantage of both graph and probability based models, it can only handle simple network topologies. Its performance degrades drastically in the complex network topologies that are common in real-world online social networks.

Proposed Approaches

To address problem P1, we propose the three-valued subjective logic (3VSL) model, which is able to accurately model trust based on users' interactions within an OSN. 3VSL is based on the subjective logic (SL) model [57]. However, it is significantly different from SL. Whereas SL defines trust as a binary value, 3VSL treats it as a ternary value (i.e., belief, distrust, and uncertainty). In other words, a user in an OSN could be trustworthy, not trustworthy, or uncertain. Therefore, the probability of a user being trustworthy can be modeled by the Dirichlet-Categorical (DC) distribution, which is characterized by three parameters α, β and γ. Here, α represents the number of positive interactions, i.e., the amount of evidence supporting that the user is trustworthy. For example, we observed that the user behaved as expected α times in the past. β denotes the amount of negative evidence indicating that the user is not trustworthy. γ is the amount of neutral evidence that neither supports nor opposes the claim that the user is trustworthy.
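As a rough illustration of the three evidence counts described above, the following Python sketch stores α, β and γ and computes the posterior mean of a Dirichlet distribution under a uniform prior. The class name `Opinion3`, the helper `expected_probabilities`, and the choice of a uniform prior (one pseudo-count per state) are assumptions made for illustration only; the expected belief actually used by 3VSL is defined in Chapter 3 and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion3:
    """Hypothetical container for the three evidence counts of an opinion."""
    alpha: float  # amount of positive evidence
    beta: float   # amount of negative evidence
    gamma: float  # amount of neutral (uncertain) evidence

    def expected_probabilities(self):
        """Posterior mean of a Dirichlet with an assumed uniform prior.

        Returns the expected probabilities of the three states
        (trustworthy, not trustworthy, uncertain).
        """
        total = self.alpha + self.beta + self.gamma + 3.0
        return (
            (self.alpha + 1.0) / total,
            (self.beta + 1.0) / total,
            (self.gamma + 1.0) / total,
        )


# Made-up example: 8 positive, 1 negative and 1 neutral observation.
opinion = Opinion3(alpha=8, beta=1, gamma=1)
p_trust, p_distrust, p_uncertain = opinion.expected_probabilities()
print(p_trust, p_distrust, p_uncertain)
```

Keeping the raw counts, rather than a single number, is what allows certain evidence to be moved into the neutral count later without losing track of how much evidence was observed in total.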
The reason for introducing the uncertain state in 3VSL is that it allows the trust propagation process in an OSN to be modeled accurately. During trust propagation, certain evidence, measured by α+β, is "distorted" and becomes uncertain evidence, measured by γ. Distorted evidence is common in trust assessment; however, it is omitted entirely in SL.

To address problem P2, we propose a trust computation algorithm, called AssessTrust (AT), based on the 3VSL model. AT decomposes the sub-graph between the trustor and trustee into a parsing tree, which provides the correct order of applying trust propagation and fusion to compute the indirect trust between the trustor and trustee. Here, trust propagation and fusion are modeled by two basic operations: the discounting and combining operations. Leveraging the properties of 3VSL, AT is proven to be able to accurately compute the trustworthiness between any two users connected within an OSN. Because 3VSL uses a probability distribution to describe whether a user is trustworthy, AT offers more accurate trust assessment than the topology and graph based solutions. In addition, because AT makes use of the social connections between the trustor and trustee to compute their trust, it outperforms the probability based models, which are only applicable to direct trust. Experimental results indicate that AT achieves the best accuracy of trust assessment in OSNs. Specifically, AT achieves F1 scores of 0.7 and 0.75 in trust assessment, using the Advogato and Pretty Good Privacy (PGP) datasets, respectively. AT can also be used to rank users based on their trustworthiness. We measure the accuracy of the ranking results using Kendall's tau coefficients, computed against the ground truth ranking. Experimental results show that AT achieves Kendall's tau coefficients of 0.73 and 0.77 on average in Advogato and PGP, respectively.
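To make the ranking metric mentioned above concrete, the following Python sketch computes Kendall's tau between two lists of trust scores by counting concordant and discordant pairs. It implements the simple tau-a variant, in which tied pairs count as neither concordant nor discordant; which variant is used in the evaluation is not stated here, so that choice, the function name `kendall_tau`, and the sample scores are all assumptions.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two equally long lists of scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(scores_a)), 2):
        # The product is positive when both lists order the pair the same way.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(scores_a) * (len(scores_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up scores for four trustees: ground truth vs. an algorithm's output.
ground_truth = [0.9, 0.7, 0.4, 0.2]
estimated = [0.8, 0.5, 0.6, 0.1]
tau = kendall_tau(ground_truth, estimated)
print(tau)
```

A value of 1 means the two rankings agree on every pair of users, and a value of -1 means they disagree on every pair.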
Although AT is able to conduct accurate trust assessment between any two users in an arbitrary social network, it is too slow to solve problem P3. If AT is applied to solve the MTA problem in an OSN containing n users, it needs to be executed O(n) times. That yields an O(n^(k+1)) time complexity, where k is the network's diameter, which is usually a function of n. Therefore, it is critical to design an algorithm that efficiently computes the trustworthiness of all users in the network, for any given user. Based on 3VSL and the AT algorithm, we propose a polynomial-time algorithm, called OpinionWalk, to efficiently address the MTA problem. In OpinionWalk, we use an opinion matrix to represent a social network's topology. Elements in the opinion matrix are opinions that indicate the direct trust between users in an OSN. We design a set of matrix operations, called opinion walk, to capture the trust propagation and fusion within the network. Traditional multiplication and summation operations are replaced by the discounting and combining operations defined in 3VSL [66]. We prove the correctness of OpinionWalk and analyze its time complexity. We find that OpinionWalk correctly implements the 3VSL model and offers a better time complexity, O(n^3), in addressing the MTA problem. Experimental results using the Advogato and PGP datasets validate the correctness of OpinionWalk.

Key Contributions

In this dissertation, we make the following key contributions. First, we propose 3VSL to model the direct and/or indirect trust between two users connected within an OSN. 3VSL differs from prior trust models in that it considers both trust relations and network topologies, and thus is applicable in large-scale OSNs. Second, 3VSL extends SL by introducing a neutral state, distinguishing distorting opinions from original opinions, and redesigning the discounting and combining operations.
Third, based on 3VSL, we propose a trust assessment algorithm, AT, to accurately compute the trust between any two users in an OSN. Fourth, we propose another algorithm, called OpinionWalk, to address the massive trust assessment problem. Fifth, the correctness of OpinionWalk is proven and its time complexity is analyzed. Sixth, to validate the 3VSL model and associated algorithms, we conduct extensive experiments including numerical analysis, online surveys and validation against two real-world datasets, Advogato and PGP [72].

The rest of this dissertation is organized as follows. In Chapter 2, the related work is introduced. In Chapter 3, we introduce the 3VSL model and define the trust propagation and fusion operations. In Chapter 4, we differentiate discounting opinions from original opinions and prove that 3VSL can handle arbitrary network topologies. Based on the model, we further propose the AssessTrust algorithm. In Chapter 5, we introduce the OpinionWalk algorithm, prove its correctness and analyze its time complexity. In Chapter 6, we validate 3VSL through numerical and experimental evaluations. Furthermore, we evaluate the performance of AT and OpinionWalk using two real-world datasets. In Chapter 7, we conclude the dissertation and present a plan for future work.

RELATED WORK

Definitions of Trust

Trust has been widely studied in the psychology, sociology and management domains. A widely accepted definition of trust was summarized by Rousseau in [84], based on a cross-disciplinary literature review: "Trust is a psychological state comprising the intention to accept vulnerability based upon positive expectations of the intentions or behaviors of another." Although various other definitions of trust exist [23, 26, 73], they are similar to Rousseau's; that is, it can be concluded that trust is composed of two parts: expectation and vulnerability.
While the former indicates the probability that the trustee will behave as expected, the latter shows the trustor's willingness to rely on the trustee. Specifically, the word vulnerability emphasizes the trustor's concerns about the uncertainty [17, 76] of the trustee's future behaviors. The definition of trust in this dissertation is inspired by the above studies, and we define trust as the probability that the trustee will behave as expected, from the perspective of the trustor.

Although trust is commonly confused with reputation, they are two different concepts. Previous works [17, 24, 46] have identified positive correlations between reputation and trust. However, reputation is not equivalent to trust. According to the definitions from the Merriam-Webster dictionary and Wikipedia, reputation is the common opinion that people have about someone or something, i.e., the overall quality or character as seen or judged by people in general. In essence, reputation comes from public and general opinion. Trust, however, comes from individual opinions, i.e., from a trustor to a trustee, with emphasis on personal interactions. In addition, reputation is a summary of past events while trust is an intention and expectation regarding the future.

Trust Models in OSNs

Trust is built on the social ties between users, and how to model trust in online social networks has attracted increasing attention in recent OSN studies. Several works address modeling trust in social networks. In this section, we briefly introduce these works.

Topology based Trust Model

Topology based trust models treat a trust social network as a graph, where an edge represents the trust relationship between two neighboring nodes. The advantage of these methods is that they leverage random walks to evaluate trust, and thus can be easily applied in large-scale OSNs. By analyzing network topologies, the works in [13, 95, 99, 100] are able to identify untrustworthy nodes in an OSN.
Their fundamental idea is to identify untrustworthy nodes by distinguishing untrustworthy regions from trustworthy regions in the network. Specifically, they perform random walks starting from a trustor and evaluate the probability of reaching a trustee. A low probability indicates that the trustee is not in the trustworthy region, and vice versa.

Later on, people began to model indirect trust by considering the trust values between users. In [19], a trust relation between two users is treated as a probabilistic value. All users and their associated trust relations compose a graph. Then, the indirect trust inference problem becomes a network reachability problem. In [109], a trust network is considered a resistor network where the resistance of each edge is derived from the trustworthiness of the edge. In [31, 103], given a trust network, a depth-first search algorithm is employed to compute the trust between any two users.

PageRank based Trust Model

PageRank based trust models employ the PageRank algorithm [78] to compute the relative trustworthiness of the users of interest. For example, the EigenTrust algorithm, proposed for a peer-to-peer system [58], starts from a peer and searches for trustworthy peers based on several rules. It moves from peer to peer with a probability that is proportional to the other peer's trust score, i.e., the higher the trust score, the higher the moving probability. In this way, EigenTrust is more likely to reach trustworthy peers. Later on, the relative trust of web pages was investigated in [36] to identify spam pages. The TrustRank algorithm proposed in [36] again employs the PageRank algorithm on the network to rank the trustworthiness of web pages. Both EigenTrust and TrustRank can be viewed as variants of the PageRank algorithm, which is a well-known solution for assigning importance scores to pages on the Internet. These algorithms, however, only generate trust rankings, instead of absolute trust values of peers/pages.
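The random-walk idea behind these PageRank based models can be sketched in a few lines of Python. The code below is a generic personalized PageRank iteration over a weighted trust graph, in which a walker follows outgoing edges with probability proportional to their trust scores and occasionally restarts at the trustor. It is a simplified illustration and not the published EigenTrust or TrustRank algorithm; the function name `personalized_trust_rank`, the damping factor, the handling of nodes without outgoing edges, and the sample graph are all assumptions.

```python
def personalized_trust_rank(local_trust, source, damping=0.85, iterations=50):
    """Rank nodes by how likely a random walk restarting at `source` visits them.

    local_trust[u][v] is an assumed non-negative trust score from u to v.
    """
    nodes = set(local_trust)
    for neighbours in local_trust.values():
        nodes.update(neighbours)
    nodes.add(source)

    rank = {node: (1.0 if node == source else 0.0) for node in nodes}
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        for node in nodes:
            out_edges = local_trust.get(node, {})
            total = sum(out_edges.values())
            if total == 0:
                # No outgoing trust: send the walker back to the source.
                updated[source] += damping * rank[node]
                continue
            for neighbour, weight in out_edges.items():
                # Move with probability proportional to the trust score.
                updated[neighbour] += damping * rank[node] * weight / total
        updated[source] += 1.0 - damping  # restart probability
        rank = updated
    return rank


# Made-up trust graph: alice trusts bob far more than carol.
graph = {
    "alice": {"bob": 0.9, "carol": 0.1},
    "bob": {"dave": 1.0},
    "carol": {"dave": 1.0},
}
ranks = personalized_trust_rank(graph, "alice")
print(sorted(ranks, key=ranks.get, reverse=True))
```

The returned scores sum to one and are only meaningful relative to each other, which mirrors the point made above that such algorithms produce rankings rather than absolute trust values.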
Probability based Trust Model

Probability based trust models treat direct trust as probability distributions, where a trustor uses past interactions and observations of a trustee to construct a probabilistic model approximating the trustee's future behavior. The advantage of these models is that trust can be accurately computed based on a wide variety of statistical and probabilistic techniques, including Hidden Markov Models, Maximum Likelihood Estimation, etc. Many previous efforts were devoted to modeling direct trust between OSN users in a computational way [14,20,68,88,90]. For example, direct trust is modeled as a discrete multinomial distribution in [22]. Trust assessment then becomes a problem of likelihood estimation of the distribution parameters based on the given evidence. If trust is modeled as a discrete binomial distribution (i.e., a user is either trustworthy or not), the likelihood estimation can be performed on the Beta distribution [57]. If trust is modeled as a continuous random variable, the Gaussian distribution can be used [14,88] to model non-discrete cases where a possible outcome is a continuous value. The binomial distribution can be further extended to a multinomial distribution to handle the case of multiple discrete random variables [22]. Based on the multinomial distribution (including the binomial distribution), Bayesian analysis [14, 88] and the Hidden Markov Model (HMM) [20, 68, 90] can be applied in trust assessment. While the former integrates evidence from various sources, e.g., reputation scores and preference similarity, the latter handles the dynamics of trust.

Subjective Logic based Trust Model

To understand trust in online social networks, Jøsang proposed the subjective logic model in [52,56,57]. Considering a binary trust value, subjective logic assumes the probability of a user being trustworthy follows the Beta distribution.
The Beta distribution here is determined by the amounts of positive and negative evidence. The advantage of using subjective logic is that trust can be more realistically modeled by considering the uncertainty in a person's judgment about trust. Such uncertainty exists because it is difficult for a person to determine with absolute certainty whether another person is trustworthy or not. In [37,38,63,91,92,94], the subjective logic model is further refined to improve its accuracy in trust assessment.

Subjective logic treats trust as opinions and introduces an algebra for opinion operations, e.g., the discounting and consensus operations for trust propagation and fusion, respectively. The consensus operation provides a method for combining possibly conflicting beliefs/opinions to generate a consensus opinion [49]. The consensus opinion reflects all opinions being combined in a fair and equal way. The discounting operation is the operation by which a new trust relationship can be derived from pre-existing trust relationships [54]. For example, if Alice trusts Bob, and Bob trusts Claire, then by trust propagation, Alice will also trust Claire. With the discounting and consensus operations, it is possible to compute the indirect trust between two connected users in OSNs.

Besides the two basic discounting and consensus operations, Jøsang further defined the multiplication, co-multiplication, division, and co-division of opinions [55]. Although these operations are irrelevant in modeling trust propagation and fusion, they allow an opinion to be multiplied or divided by another opinion. Later on, subjective logic was extended to support conditional inference [51]. A conditional inference is usually in the form of "IF x THEN y", where x denotes the antecedent and y the consequent proposition. Here, the antecedent x is modeled by subjective logic so that it is not a binary value, true or false.
Instead, it is a vector representing the probability that this antecedent is true. Overall, subjective logic was proven to be compatible with binary logic, probability calculus, and classical probabilistic logic [50].

Applications of Trust in Online Systems

Along with the rapid development of the Internet and online services, trust has been used in many applications, either to improve users' quality of experience (QoE) or to prevent disturbance by malicious users. In this section, we briefly introduce these applications.

Trust in Cloud Computing

Recently, trust was introduced into the concept of the social cloud. In [75], Mohaisen et al. employ trust as a metric to identify good workers for an outsourcer through her social network. In [77], Moyano et al. proposed a framework that employs trust and reputation for cloud provider selection. In [79], Pietro et al. proposed a multi-round approach, called AntiCheetah, to dynamically assign tasks to cloud nodes, accounting for their trustworthiness.

Trust in P2P Network and Semantic Web

Trust analysis was first implemented in peer-to-peer (P2P) networks [58,96,104]. In P2P networks, trust is used to evaluate the trustworthiness of a particular resource owner, and thereby identify malicious sources. Trust analysis was also applied to semantic webs [8, 30, 82]. The purpose of analyzing trust in semantic webs is to study the trustworthiness of data with efficient knowledge processing mechanisms. For example, the trustworthiness of web hyperlinks is studied in [36, 61, 71]. Trust analysis is then applied to filter untrustworthy contents in [10,12,15,16,18,28]. Finally, trust was used to evaluate the quality of contents on semantic webs in [27,32,71,83,86,106].

Trust in Cyber-Physical Systems

Trust analysis is also introduced in cyber-physical systems (CPS), e.g., wireless sensor networks and vehicular networks [40].
For example, a trust based framework is proposed to secure data aggregation in wireless sensor networks [102]. It evaluates the trustworthiness of each sensor node by the Kullback-Leibler (KL) distance, identifying compromised nodes through an unsupervised learning technique. In [64], trust analysis is employed to identify malicious and selfish nodes in a mobile ad hoc network. In addition, Xiaoyan et al. propose a new trust architecture, called situation-aware trust (SAT), to address several important trust issues in vehicular networks, which are essential to overcome the weaknesses of current vehicular network security and trust models [40].

Trust in Spam Detection and Sybil Defense

Another important domain in which trust analysis is widely applied is Sybil defense and spam detection [5,25,42,74,87,95]. The goal of these works is to identify forged multiple identities and spam information in OSNs. The basic idea of [5, 95] is to employ a random walk to rank the neighbors in a given OSN from a seed node, and extract a trust community composed of high-ranking nodes. Then, the users outside the trust community are considered not trustworthy, i.e., potential Sybil nodes. In [87], Tan et al. integrated traditional Sybil defense techniques with the analysis of user-link graphs. In [74], Mohaisen et al. proposed a derivation of the random walk algorithm, which employs a biased random mechanism, to account for trust and other social ties. In [97], besides graph based features, Yang et al. introduced some other features to identify spammers. In addition, in [25, 42], spam detection approaches based on user similarity and content analysis are studied.

Trust in Recommendation and Crowdsourcing Systems

In addition to Sybil defense in OSNs, trust analysis is also useful in recommendation systems [6, 9, 37, 45, 70, 108]. For example, in [108], Zou et al. proposed a belief propagation algorithm to identify untrustworthy recommendations generated by spam users.
In [9], Basu et al. proposed a privacy preserving trusted social feedback scheme to help users obtain opinions from friends and experts whom they trust. In [6], Andersen et al. proposed a trust-based recommendation system that generates personalized recommendations by aggregating the opinions from other users. In addition, five axioms about trust in a recommendation system are studied in [6].

THREE-VALUED SUBJECTIVE LOGIC

In this chapter, we propose the three-valued subjective logic (3VSL) model to model the trust between users in OSNs. Designing this model is a challenging task because trust propagation in OSNs is not well understood, although it is widely used in many applications. We address this challenge by modeling trust as a probabilistic distribution over three different states, i.e., belief, distrust, and uncertainty. By looking at how the states of trust change during trust propagation, we redesign the trust discounting operation in subjective logic [57]. In 3VSL, the parameters controlling the probabilistic distribution are determined by the amount of evidence that supports each state. The evidence is collected from the interactions between the trustor and trustee. To model trust fusion, we further design the combining operation. Together with the discounting operation, we are able to model and compute the trust between two users that are directly or indirectly connected to each other.

Preliminaries

To better understand 3VSL, we first briefly introduce subjective logic [57]. Considering two users A and X, A's opinion about the trustworthiness of X can be described by an opinion vector:

$$\omega_{AX} = \langle \alpha_{AX}, \beta_{AX}, 2 \rangle \,|\, a_{AX},$$

where $\alpha_{AX}$, $\beta_{AX}$ and 2 denote the amounts of evidence supporting that user X is trustworthy, not trustworthy, and uncertain, respectively. Note that the amount of uncertain evidence in an opinion in SL is always 2. $a_{AX}$ is called the base rate and is formed from an existing impression without solid evidence, e.g.
prejudice, preference, or general opinion obtained from hearsay. For example, if A always distrusts/trusts the persons from a certain group to which X belongs, then $a_{AX}$ will be smaller/greater than 0.5. Based on the Beta distribution, two opinions $\omega_1 = \langle \alpha_1, \beta_1, 2 \rangle \,|\, a_1$ and $\omega_2 = \langle \alpha_2, \beta_2, 2 \rangle \,|\, a_2$ can be combined to form a new opinion $\omega_{12} = \langle \alpha_{12}, \beta_{12}, 2 \rangle \,|\, a_{12}$, where $\alpha_{12}$, $\beta_{12}$ and $a_{12}$ are calculated as follows.

$$\begin{cases} \alpha_{12} = \alpha_1 + \alpha_2 \\ \beta_{12} = \beta_1 + \beta_2 \\ a_{12} = \dfrac{a_1 + a_2}{2} \end{cases}$$

Let A and B denote two persons, where $\omega_1 = \langle \alpha_1, \beta_1, 2 \rangle \,|\, a_1$ is A's opinion about B's trustworthiness. Assume C is another person, where $\omega_2 = \langle \alpha_2, \beta_2, 2 \rangle \,|\, a_2$ is B's opinion about C. Then, subjective logic applies the discounting operation to compute A's opinion about C's trustworthiness, $\omega_{AC} = \langle \alpha_{12}, \beta_{12}, 2 \rangle \,|\, a_{12}$, where $\alpha_{12}$, $\beta_{12}$ and $a_{12}$ are calculated as follows.

$$\begin{cases} \alpha_{12} = \dfrac{\alpha_1 \alpha_2}{\beta_2 + \alpha_2 + 2} \cdot \dfrac{2}{\kappa} \\[2ex] \beta_{12} = \dfrac{\alpha_1 \beta_2}{\beta_2 + \alpha_2 + 2} \cdot \dfrac{2}{\kappa} \\[2ex] a_{12} = a_2 \end{cases}$$

where

$$\kappa = 1 - \frac{\alpha_1 \alpha_2}{\beta_2 + \alpha_2 + 2} - \frac{\alpha_1 \beta_2}{\beta_2 + \alpha_2 + 2}.$$

A Probabilistic Interpretation of Trust

Trust in 3VSL is defined as the probability that a user will behave as expected in the future. 3VSL models a user's future behavior as a random variable $x$ that takes on one of three possible outcomes $\{1, 2, 3\}$, i.e., $x = 1$, $x = 2$ and $x = 3$ indicate that the user will behave as expected, not as expected, or in an uncertain way, respectively. The third state, the uncertain state, is introduced in 3VSL to capture the uncertainty that exists in trust assessment. Therefore, the probability density function (pdf) of $x$ follows the categorical distribution:

$$f(x \mid \mathbf{p}) = \prod_{i=1}^{3} p_i^{[x=i]},$$

where $\mathbf{p} = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$, and $p_i$ represents the probability of seeing event $i$. The Iverson bracket $[x = i]$ evaluates to 1 if $x = i$, and 0 otherwise. If the value tuple $\mathbf{p}$ is available, the pdf of $x$ will be known and the probability of $x = i$ can be computed. Unfortunately, $\mathbf{p}$ is an unknown parameter and needs to be estimated based on the observations of $x$.
We treat $\mathbf{p}$ as a group of random variables that follows the Dirichlet distribution:

$$\mathbf{p} \sim \mathrm{Dir}(\alpha, \beta, \gamma),$$

where $\alpha, \beta, \gamma$ are hyper-parameters that control the shape of the Dirichlet distribution. We assume $\mathbf{p}$ follows the Dirichlet distribution mainly because it is a conjugate prior of the categorical distribution. In addition, because the Dirichlet distribution belongs to a family of continuous multivariate probability distributions, we can obtain various pdfs $f(\mathbf{p})$ by changing the values of $\alpha, \beta, \gamma$:

$$f(\mathbf{p}) = C\, p_1^{\alpha-1} p_2^{\beta-1} p_3^{\gamma-1}, \tag{3.1}$$

where $C$ is a normalizing factor ensuring that $f(\mathbf{p})$ integrates to 1 over the region $p_1 + p_2 + p_3 = 1$. In this way, we assume $\mathbf{p} \sim \mathrm{Dir}(\alpha, \beta, \gamma)$ to model the uncertainty in estimating $\mathbf{p}$.

With the mathematical model in place, the parameter $\mathbf{p}$ can be estimated from the observations of $x$ by Bayesian inference. Given a set of independent observations of $x$, denoted by $D = \{x_1, x_2, \cdots, x_n\}$ where $x_j \in \{1, 2, 3\}$ and $j = 1, 2, \cdots, n$, we want to know how likely it is that $D$ is observed. This probability can be computed as

$$P(D \mid \mathbf{p}) = \prod_{j=1}^{n} p_1^{[x_j=1]}\, p_2^{[x_j=2]}\, p_3^{[x_j=3]}.$$

Let $c_i$ denote the number of observations where $x = i$, so that $\sum_i c_i = n$. Then, the above equation becomes $p_1^{c_1} p_2^{c_2} p_3^{c_3}$. Based on Bayesian inference, given the observed data $D$, the posterior pdf of $\mathbf{p}$ can be estimated from

$$f(\mathbf{p} \mid D) = \frac{P(D \mid \mathbf{p})\, f(\mathbf{p})}{P(D)},$$

where $P(D \mid \mathbf{p})$ is the likelihood function $p_1^{c_1} p_2^{c_2} p_3^{c_3}$, and $f(\mathbf{p})$ is the prior pdf of $\mathbf{p}$. $P(D)$ is the probability that $D$ occurs, which is independent of $\mathbf{p}$. So we have

$$f(\mathbf{p} \mid D) \propto p_1^{c_1} p_2^{c_2} p_3^{c_3} \times p_1^{\alpha-1} p_2^{\beta-1} p_3^{\gamma-1}.$$

That means the posterior pdf $f(\mathbf{p} \mid D)$ can be modeled by another Dirichlet distribution, $\mathrm{Dir}(\alpha + c_1, \beta + c_2, \gamma + c_3)$. With the posterior pdf of $\mathbf{p}$, we have the following predictive model for $x$:

$$f(x \mid D) = \int f(x \mid \mathbf{p})\, f(\mathbf{p} \mid D)\, d\mathbf{p}. \tag{3.2}$$

This function is in fact a composition of the Categorical ($f(x \mid \mathbf{p})$) and Dirichlet ($f(\mathbf{p} \mid D)$) distributions, so it is called the Dirichlet-Categorical (DC) distribution [89].
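The conjugate update above can be sketched in a few lines of Python. This is a minimal illustration, assuming outcomes are encoded as the integers 1, 2 and 3; the function names and the observation list are hypothetical. The predictive probabilities use the standard Dirichlet-categorical result, in which each posterior hyper-parameter is divided by their sum, with the prior pseudo-counts kept explicit.

```python
from collections import Counter


def dirichlet_posterior(prior, observations):
    """Posterior hyper-parameters Dir(alpha + c1, beta + c2, gamma + c3).

    prior: (alpha, beta, gamma); observations: outcomes in {1, 2, 3}.
    """
    counts = Counter(observations)
    return tuple(prior[i] + counts.get(i + 1, 0) for i in range(3))


def predictive(hyper):
    """Posterior predictive P(x = i | D) of the Dirichlet-categorical model."""
    total = sum(hyper)
    return tuple(h / total for h in hyper)


# Hypothetical interaction outcomes: four "as expected", one "not as
# expected", one "uncertain".
observations = [1, 1, 2, 1, 3, 1]
posterior = dirichlet_posterior((1, 1, 1), observations)
probs = predictive(posterior)
```

With the uniform prior (1, 1, 1), the six observations give the posterior hyper-parameters (5, 2, 2), so the predicted probability of behaving as expected is 5/9.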
Opinion

In the previous section, we introduced how to model trust as a DC distribution. Because the shape of a DC distribution is determined by three parameters, we can instead use these parameters to form a vector to represent trust. This vector is called an opinion; it expresses a trustor's opinion about a trustee's trustworthiness.

For a given DC distribution, the only undetermined parameters are $\alpha, \beta, \gamma$. We set $\alpha = \beta = \gamma = 1$ by default, if there is no prior knowledge about $D$. In this case, the Dirichlet distribution becomes a uniform distribution, so that each outcome is expected with equal probability, i.e., $p_1 = p_2 = p_3 = 1/3$. Assuming $\mathbf{p}$ initially follows the uniform distribution is reasonable because, having made no observation of $x$, the best choice is to believe that $x$ could be 1, 2 or 3 with equal probability. As more observations of $x$ are made, the pdf of $\mathbf{p}$ approaches the true one.

From Eq. 3.2, we can predict the probability of $x = i$ where $i = 1, 2, 3$, i.e., whether a user will behave as expected, not as expected, or in an uncertain way. In other words, we can use Eq. 3.2 to compute the trustworthiness of a user. From Eq. 3.2, we can obtain the expectation of the probability that a user will behave as expected:

$$\begin{aligned} P(x = 1 \mid D) &= \int P(x = 1 \mid p_1, p_2, p_3)\, P(p_1, p_2, p_3 \mid c_1, c_2, c_3)\, d(p_1, p_2, p_3) \\ &= \frac{\Gamma(c_1 + c_2 + c_3)}{\Gamma(c_1)\Gamma(c_2)\Gamma(c_3)} \int p_1 \cdot p_1^{c_1-1} p_2^{c_2-1} p_3^{c_3-1}\, d(p_1, p_2, p_3) \\ &= \frac{\Gamma(c_1 + c_2 + c_3)\,\Gamma(c_1 + 1)\Gamma(c_2)\Gamma(c_3)}{\Gamma(c_1)\Gamma(c_2)\Gamma(c_3)\,\Gamma(c_1 + c_2 + c_3 + 1)} \\ &= \frac{c_1}{c_1 + c_2 + c_3}, \end{aligned} \tag{3.3}$$

where $\Gamma(n) = (n-1)!$ is the Gamma function. Similarly, the probabilities that the user will behave not as expected or in an uncertain way are

$$P(x = 2 \mid D) = \frac{c_2}{c_1 + c_2 + c_3}$$

and

$$P(x = 3 \mid D) = \frac{c_3}{c_1 + c_2 + c_3}.$$

If the hyper-parameters $\alpha, \beta, \gamma$ are equal to 1, the trustworthiness of a user is determined only by $c_1, c_2, c_3$, i.e., the numbers of observations supporting that the user will behave as expected, not as expected, and in an uncertain way. We call these observations positive, negative, and uncertain evidence.
In other words, a trustee X's trustworthiness to trustor A can be modeled by the interaction evidence between them:

$$\omega_{AX} = \langle \alpha_{AX}, \beta_{AX}, \gamma_{AX} \rangle \,|\, a_{AX}.$$

Here, $\omega_{AX}$ denotes A's opinion on X's trustworthiness, and $\alpha_{AX}, \beta_{AX}, \gamma_{AX}$ refer to the amounts of observed positive, negative and uncertain evidence, based on A's interactions with X. We further name them the belief, distrust and uncertainty parameters in the rest of this dissertation. The subscripts of $\alpha_{AX}, \beta_{AX}, \gamma_{AX}$ differentiate them from the prior $\alpha, \beta, \gamma$, i.e., the former represent observed evidence while the latter are set to 1.

Discounting Operation

Because trust is modeled by a DC distribution, in this section we model trust propagation by defining an operation between two DC distributions (or opinions). Trust propagation in OSNs was intensively studied in the past decade. It can be illustrated in a series topology, e.g., Fig. 3.1(a), where two edges are connected in series if they are incident to a vertex of degree 2.

[Figure 3.1: Examples of series topologies. (a) A general illustration of series topology. (b) A simple example of series topology.]

In Fig. 3.1(a), the nodes are users in a trust social network. The directed edges indicate the opinions between them. Trust propagation means that if user $A_{i-1}$ trusts $A_i$ and $A_i$ trusts $A_{i+1}$, then $A_{i-1}$ will trust $A_{i+1}$, even if $A_{i-1}$ did not interact with $A_{i+1}$ before. Let us take the example shown in Fig. 3.1(b) to define the discounting operation in 3VSL. Based on existing research on trust propagation [11, 34, 35, 107], it is commonly agreed that the following assumptions hold:

• A1: If A trusts B, B trusts C, then A trusts C.
• A2: If A trusts B, B does not trust C, then A does not trust C.
• A3: If A trusts B, B is uncertain about the trustworthiness of C, then A is uncertain about C's trustworthiness.
• A4: If A does not trust B, or A is uncertain about the trustworthiness of B, A is uncertain about the trustworthiness of C.

It is worth mentioning that trust propagation refers to an opinion being transferred from a trusted user to another user. In other words, if A trusts B, then B's opinion of C will be transferred and become A's opinion of C. Otherwise, if A does not trust or is uncertain about B, then A is uncertain about C, as B's opinion on C cannot be trusted.

According to the 3VSL model, we model A's opinion of B as

$$\omega_{AB} = \langle \alpha_{AB}, \beta_{AB}, \gamma_{AB} \rangle,$$

and B's opinion on C as

$$\omega_{BC} = \langle \alpha_{BC}, \beta_{BC}, \gamma_{BC} \rangle,$$

where $\{\alpha_{AB}, \beta_{AB}, \gamma_{AB}\} = D_{AB}$ and $\{\alpha_{BC}, \beta_{BC}, \gamma_{BC}\} = D_{BC}$ represent the observations made by A and B (on B and C), respectively. In this way, the expected probability that C will behave as A expects can be computed from

$$\iint f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB}) \times f(x = 1 \mid \mathbf{p}_{BC})\, f(\mathbf{p}_{BC} \mid D_{BC})\, d(\mathbf{p}_{AB})\, d(\mathbf{p}_{BC}). \tag{3.4}$$

The intuition behind Eq. 3.4 can be explained as follows. A trusts C if and only if A trusts B and B trusts C, which is the assumption A1 we made based on the findings from [11, 34, 107]. In other words, the probability that C will behave as A expects is equal to the probability that C will behave as B expects, if A trusts B. In the above equation, $f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB})$ gives the probability that A trusts B, and $f(x = 1 \mid \mathbf{p}_{BC})\, f(\mathbf{p}_{BC} \mid D_{BC})$ denotes the probability that B trusts C.

Because the two events, i.e., A trusts B and B trusts C, are independent of each other, Eq. 3.4 can be rewritten as

$$\int f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB})\, d(\mathbf{p}_{AB}) \times \int f(x = 1 \mid \mathbf{p}_{BC})\, f(\mathbf{p}_{BC} \mid D_{BC})\, d(\mathbf{p}_{BC}). \tag{3.5}$$

The two integrals in the above equation compute the expected probabilities that A trusts B and B trusts C, respectively. According to Eq. 3.3, we know the expected probabilities are

$$\begin{aligned} \int f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB})\, d(\mathbf{p}_{AB}) &= \frac{\alpha_{AB}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \\ \int f(x = 1 \mid \mathbf{p}_{BC})\, f(\mathbf{p}_{BC} \mid D_{BC})\, d(\mathbf{p}_{BC}) &= \frac{\alpha_{BC}}{\alpha_{BC} + \beta_{BC} + \gamma_{BC}}. \end{aligned} \tag{3.6}$$

Inserting these two values into Eq.
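3.5, we have the probability that C will behave as A expects:

$$\frac{\alpha_{AB}\,\alpha_{BC}}{(\alpha_{AB} + \beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}. \tag{3.7}$$

According to assumption A2, the probability that C will not behave as A expects can be computed from

$$\iint f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB}) \times f(x = 2 \mid \mathbf{p}_{BC})\, f(\mathbf{p}_{BC} \mid D_{BC})\, d(\mathbf{p}_{AB})\, d(\mathbf{p}_{BC}). \tag{3.8}$$

This equation makes sense because A does not trust C if and only if he trusts B and B does not trust C. Because the events that A trusts B and B does not trust C are independent, we have the expected probability that A does not trust C as

$$\frac{\alpha_{AB}\,\beta_{BC}}{(\alpha_{AB} + \beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}. \tag{3.9}$$

Finally, the expected probability that A is uncertain about C's trustworthiness can be derived from assumptions A3 and A4:

$$\iint \Big[ f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB}) \times f(x = 3 \mid \mathbf{p}_{BC})\, f(\mathbf{p}_{BC} \mid D_{BC}) + f(x = 2 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB}) + f(x = 3 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{AB}) \Big]\, d(\mathbf{p}_{AB})\, d(\mathbf{p}_{BC}).$$

The expected probability can then be computed as

$$\frac{\alpha_{AB}\,\gamma_{BC} + (\beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}{(\alpha_{AB} + \beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}. \tag{3.10}$$

Note that the summation of Eqs. 3.4, 3.8 and 3.10 equals 1. Because Eqs. 3.7, 3.9 and 3.10 give the current estimates of the probabilities that C will behave as expected, not as expected, or in an uncertain way, respectively, we can use the following categorical distribution to model C's future behavior (from A's perspective):

$$f(x \mid \mathbf{p}_{AC}) = \prod_{i=1}^{3} p_i^{[x=i]}, \tag{3.11}$$

where $\mathbf{p}_{AC} = (p_1, p_2, p_3)$ and

$$\begin{aligned} p_1 &= \frac{\alpha_{AB}\,\alpha_{BC}}{(\alpha_{AB} + \beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}, \\ p_2 &= \frac{\alpha_{AB}\,\beta_{BC}}{(\alpha_{AB} + \beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}, \\ p_3 &= \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC}) + \alpha_{AB}\,\gamma_{BC}}{(\alpha_{AB} + \beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC})}. \end{aligned} \tag{3.12}$$

Based on our calculation, we know the categorical distribution is derived from B's opinion on C. Let us assume that B makes a set of observations $\mathbf{x} = \{x_1, x_2, \cdots, x_n\}$ on C's behavior. According to our definitions, $\alpha_{BC}, \beta_{BC}, \gamma_{BC}$ equal the numbers of observations where $x = 1$, $x = 2$, $x = 3$, respectively. Clearly, the observations B made about C do not reflect A's opinion on C, because $\langle \alpha_{BC}, \beta_{BC}, \gamma_{BC} \rangle$ represents only B's opinion on C.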
Here, the question is: if A were asked to make the $n$ observations, how many of them would be positive, negative, and uncertain? In other words, A needs to re-categorize B's observations on C such that the updated evidence supports A's current opinion on C. For each $x_j \in \mathbf{x}$ where $j = 1, 2, \cdots, n$, we know $x_j$ is observed given the underlying categorical distribution in Eq. 3.11. Therefore, we know $\mathbf{x}$ follows the multinomial distribution with parameters $(n, \mathbf{p}_{AC})$. From the multinomial distribution, we can compute the probability of any way of re-categorizing the observation set $\mathbf{x}$. The maximum probability corresponds to the most likely way of re-categorizing $\mathbf{x}$. Therefore, we know the following re-categorization occurs with the highest probability.

$$\begin{aligned} \alpha_{AC} &= p_1(\alpha_{BC} + \beta_{BC} + \gamma_{BC}) = \frac{\alpha_{AB}\,\alpha_{BC}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \\ \beta_{AC} &= p_2(\alpha_{BC} + \beta_{BC} + \gamma_{BC}) = \frac{\alpha_{AB}\,\beta_{BC}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \\ \gamma_{AC} &= p_3(\alpha_{BC} + \beta_{BC} + \gamma_{BC}) = \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC}) + \alpha_{AB}\,\gamma_{BC}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}. \end{aligned} \tag{3.13}$$

Therefore, we use $\omega_{AC} = \langle \alpha_{AC}, \beta_{AC}, \gamma_{AC} \rangle$ to represent A's opinion about C's trustworthiness. It is worth mentioning that opinion $\omega_{AC}$ is generated by distorting the positive/negative evidence in $\omega_{BC}$ and saving it as uncertain evidence, i.e.,

$$\alpha_{AC} + \beta_{AC} + \gamma_{AC} = \alpha_{BC} + \beta_{BC} + \gamma_{BC}. \tag{3.14}$$

In other words, the total amount of evidence observed does not change during the discounting process. Based on the previous analysis, we formally define the discounting operation in 3VSL as follows.

Definition 1 (Discounting Operation). Given three users A, B and C, if $\omega_{AB} = \langle \alpha_{AB}, \beta_{AB}, \gamma_{AB} \rangle$ is A's opinion on B's trustworthiness, and $\omega_{BC} = \langle \alpha_{BC}, \beta_{BC}, \gamma_{BC} \rangle$ is B's opinion on C's trustworthiness, the discounting operation $\Delta(\omega_{AB}, \omega_{BC})$ computes A's opinion on C as

$$\omega_{AC} = \Delta(\omega_{AB}, \omega_{BC}) = \langle \alpha_{AC}, \beta_{AC}, \gamma_{AC} \rangle,$$

where

$$\begin{aligned} \alpha_{AC} &= \frac{\alpha_{AB}\,\alpha_{BC}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \\ \beta_{AC} &= \frac{\alpha_{AB}\,\beta_{BC}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \\ \gamma_{AC} &= \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{BC} + \beta_{BC} + \gamma_{BC}) + \alpha_{AB}\,\gamma_{BC}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}. \end{aligned}$$
(3.15)

Intuitively, the discounting operation can be understood as follows: certain evidence from $\omega_{BC}$ is distorted by $\omega_{AB}$ and transferred into the uncertainty space of $\omega_{BC}$. Since the total amount of evidence of opinion $\Delta(\omega_{AB}, \omega_{BC})$ is the same as that of $\omega_{BC}$, we conclude that the resulting opinion of a discounting operation shares exactly the same evidence space as the original opinion.

It is worth mentioning that the discounting operation yields two properties. The first one is called the decay property:

Corollary 3.0.1. Decay Property: Given two opinions $\omega_{AB}$ and $\omega_{BC}$, the operation $\Delta(\omega_{AB}, \omega_{BC})$ yields $\alpha_{AC} \le \alpha_{BC}$, $\beta_{AC} \le \beta_{BC}$ and $\gamma_{AC} \ge \gamma_{BC}$.

Proof. Since $\frac{\alpha_{AB}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} \le 1$, according to Eq. 3.15, we have $\alpha_{AC} \le \alpha_{BC}$ as well as $\beta_{AC} \le \beta_{BC}$. Hence, $-\alpha_{AC} - \beta_{AC} \ge -\alpha_{BC} - \beta_{BC}$. According to Eq. 3.14, we then have $\gamma_{AC} \ge \gamma_{BC}$.

In other words, by applying a discounting operation, the uncertainty parameter of the resulting opinion increases while the belief and distrust parameters decrease. This property implies that the further trust propagates, the more uncertain the resulting opinion becomes. The second one is called the associative property:

Corollary 3.0.2. Associative Property: Given three opinions $\omega_{AB}$, $\omega_{BC}$ and $\omega_{CD}$, $\Delta(\Delta(\omega_{AB}, \omega_{BC}), \omega_{CD}) \equiv \Delta(\omega_{AB}, \Delta(\omega_{BC}, \omega_{CD}))$.

Proof. Simply based on Eq. 3.15.

[Figure 3.2: Examples of parallel topologies. Panels (a) and (b).]

The discounting operation is, however, not commutative, i.e., $\Delta(\omega_{AB}, \omega_{BC}) \ne \Delta(\omega_{BC}, \omega_{AB})$. Given a series topology where opinions are ordered as $\omega_{A_1A_2}, \omega_{A_2A_3}, \cdots, \omega_{A_{n-1}A_n}$, the final opinion can be calculated as $\Delta(\Delta(\Delta(\omega_{A_1A_2}, \omega_{A_2A_3}), \cdots), \omega_{A_{n-1}A_n})$. As the discounting operation is associative, it is simplified as $\Delta(\omega_{A_1A_2}, \omega_{A_2A_3}, \cdots, \omega_{A_{n-1}A_n})$.

Combining Operation

In this section, we introduce the combining operation in 3VSL. According to previous works [11,34,107], several trust opinions can be fused into a consensus one by aggregating the opinions from different sources.
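Returning to the discounting operation of Definition 1, the following Python sketch implements Eq. 3.15 on plain evidence triples and checks the properties stated above on one example. The tuple representation, the function name `discount`, and the sample opinions are assumptions made for illustration; exact rational arithmetic is used only so that the equalities can be compared without rounding error.

```python
from fractions import Fraction


def discount(w_ab, w_bc):
    """Discounting operation (Eq. 3.15) on (alpha, beta, gamma) triples."""
    a_ab, b_ab, g_ab = (Fraction(v) for v in w_ab)
    a_bc, b_bc, g_bc = (Fraction(v) for v in w_bc)
    total_ab = a_ab + b_ab + g_ab
    total_bc = a_bc + b_bc + g_bc
    alpha = a_ab * a_bc / total_ab
    beta = a_ab * b_bc / total_ab
    # Evidence that A cannot rely on is moved into the uncertain state.
    gamma = ((b_ab + g_ab) * total_bc + a_ab * g_bc) / total_ab
    return (alpha, beta, gamma)


# Hypothetical opinions along a chain A -> B -> C -> D.
w_ab = (8, 1, 1)
w_bc = (6, 2, 2)
w_cd = (5, 3, 2)

w_ac = discount(w_ab, w_bc)
left = discount(discount(w_ab, w_bc), w_cd)
right = discount(w_ab, discount(w_bc, w_cd))
```

For these inputs, `w_ac` equals (24/5, 8/5, 18/5): its total is still 10, as Eq. 3.14 requires, its belief and distrust parameters are no larger than those of `w_bc`, and `left` equals `right`, in line with the associative property.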
Trust fusion can be illustrated by a parallel topology, e.g., Fig. 3.2(a), where two edges are connected in parallel if they join the same pair of distinct vertices. In Fig. 3.2(a), nodes A and B are users in a trust social network. The edges from A to B denote A's opinions about B's trustworthiness that are formed from different sources.

Let us use the simplest parallel topology, shown in Fig. 3.2(b), to explain the combining operation. Let

$$\omega_{A_1B_1} = \langle \alpha_{A_1B_1}, \beta_{A_1B_1}, \gamma_{A_1B_1} \rangle$$

and

$$\omega_{A_2B_2} = \langle \alpha_{A_2B_2}, \beta_{A_2B_2}, \gamma_{A_2B_2} \rangle$$

be A's opinions of B from two different sources. Here, we use $\{\alpha_{A_1B_1}, \beta_{A_1B_1}, \gamma_{A_1B_1}\} = D_{A_1B_1}$ and $\{\alpha_{A_2B_2}, \beta_{A_2B_2}, \gamma_{A_2B_2}\} = D_{A_2B_2}$ to represent the observations made by A on B from the two sources. According to the definition of an opinion, the expected probability that B will behave as A expects is computed from the following DC distribution:

$$\int f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2})\, d(\mathbf{p}_{AB}), \tag{3.16}$$

where $D_{A_1B_1}$ and $D_{A_2B_2}$ denote the aggregated observations from the two sources.

The intuition of Eq. 3.16 can be explained as follows. A first infers the parameters $\mathbf{p}_{AB}$ by aggregating the observations $D_{A_1B_1}$ and $D_{A_2B_2}$. Therefore, the posterior pdf of $\mathbf{p}_{AB}$ becomes

$$f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2}). \tag{3.17}$$

Based on the inferred parameters $\mathbf{p}_{AB}$, the probability that B will behave as A expects can be computed from $f(x = 1 \mid \mathbf{p}_{AB})$. By considering all possible values of $\mathbf{p}_{AB}$, we obtain Eq. 3.16.

We now derive the analytic form of Eq. 3.16. The intuition of trust fusion can be explained as follows. A first forms his opinion on B's trustworthiness from the observations $D_{A_1B_1}$. As such, the pdf of the parameters $\mathbf{p}_{AB}$ can be computed. Then, A adjusts his estimate of $\mathbf{p}_{AB}$ based on a new set of evidence $D_{A_2B_2}$. Therefore, Eq. 3.17 can be regarded as the distribution of $\mathbf{p}_{AB}$ based on (1) the posterior evidence in $D_{A_2B_2}$ and (2) the prior parameters $\mathbf{p}_{A_1B_1}$ estimated from $D_{A_1B_1}$. According to Bayes' rule, it can be expressed as follows.
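$$\begin{aligned} f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2}) &= \frac{f(D_{A_2B_2} \mid \mathbf{p}_{A_1B_1})\, f(\mathbf{p}_{A_1B_1})}{f(D_{A_2B_2})} \\ &= \frac{f(D_{A_2B_2} \mid \mathbf{p}_{A_1B_1})\, f(\mathbf{p}_{A_1B_1})}{\int f(D_{A_2B_2} \mid \mathbf{p}_{A_1B_1})\, f(\mathbf{p}_{A_1B_1})\, d\mathbf{p}_{A_1B_1}}. \end{aligned} \tag{3.18}$$

In the above equation, the parameters $\mathbf{p}_{A_1B_1}$, derived from $D_{A_1B_1}$, follow the Dirichlet distribution, so the pdf $f(\mathbf{p}_{A_1B_1})$ can be computed as follows.

$$f(\mathbf{p}_{A_1B_1}) = \frac{\Gamma(\alpha_{A_1B_1} + \beta_{A_1B_1} + \gamma_{A_1B_1})}{\Gamma(\alpha_{A_1B_1})\,\Gamma(\beta_{A_1B_1})\,\Gamma(\gamma_{A_1B_1})} \times (p_1)^{\alpha_{A_1B_1}-1} (p_2)^{\beta_{A_1B_1}-1} (p_3)^{\gamma_{A_1B_1}-1}. \tag{3.19}$$

On the other hand, because $D_{A_2B_2}$ follows the multinomial distribution derived from $\mathbf{p}_{A_1B_1}$, its pdf $f(D_{A_2B_2} \mid \mathbf{p}_{A_1B_1})$ can be expressed as

$$f(D_{A_2B_2} \mid \mathbf{p}_{A_1B_1}) = \frac{\Gamma(\alpha_{A_2B_2} + \beta_{A_2B_2} + \gamma_{A_2B_2} + 1)}{\Gamma(\alpha_{A_2B_2} + 1)\,\Gamma(\beta_{A_2B_2} + 1)\,\Gamma(\gamma_{A_2B_2} + 1)} \times (p_1)^{\alpha_{A_2B_2}} (p_2)^{\beta_{A_2B_2}} (p_3)^{\gamma_{A_2B_2}}. \tag{3.20}$$

Substituting Eq. 3.19 and Eq. 3.20 for $f(\mathbf{p}_{A_1B_1})$ and $f(D_{A_2B_2} \mid \mathbf{p}_{A_1B_1})$ in Eq. 3.18, we obtain the analytic form of Eq. 3.18:

$$f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2}) = \frac{\Gamma(\alpha_{AB} + \beta_{AB} + \gamma_{AB})}{\Gamma(\alpha_{AB})\,\Gamma(\beta_{AB})\,\Gamma(\gamma_{AB})} \times (p_1)^{\alpha_{AB}-1} (p_2)^{\beta_{AB}-1} (p_3)^{\gamma_{AB}-1}, \tag{3.21}$$

where

$$\begin{aligned} \alpha_{AB} &= \alpha_{A_1B_1} + \alpha_{A_2B_2}, \\ \beta_{AB} &= \beta_{A_1B_1} + \beta_{A_2B_2}, \\ \gamma_{AB} &= \gamma_{A_1B_1} + \gamma_{A_2B_2}. \end{aligned}$$

Obviously, Eq. 3.21 can be considered the pdf of the following Dirichlet distribution:

$$\mathrm{Dir}(\alpha_{A_1B_1} + \alpha_{A_2B_2},\; \beta_{A_1B_1} + \beta_{A_2B_2},\; \gamma_{A_1B_1} + \gamma_{A_2B_2}).$$

Therefore, the following expression

$$\int f(x \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2})\, d(\mathbf{p}_{AB})$$

can be regarded as a DC distribution upon the observations $\{\alpha_{A_1B_1} + \alpha_{A_2B_2},\; \beta_{A_1B_1} + \beta_{A_2B_2},\; \gamma_{A_1B_1} + \gamma_{A_2B_2}\}$. According to the definition of an opinion, the analytic form of Eq. 3.16 can be expressed as follows.

$$\int f(x = 1 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2})\, d(\mathbf{p}_{AB}) = \frac{\alpha_{A_1B_1} + \alpha_{A_2B_2}}{\alpha_{A_1B_1} + \alpha_{A_2B_2} + \beta_{A_1B_1} + \beta_{A_2B_2} + \gamma_{A_1B_1} + \gamma_{A_2B_2}}.$$

The probability that B will not behave as A expects and the probability that B will behave in an uncertain way can be expressed as

$$\int f(x = 2 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2})\, d(\mathbf{p}_{AB}) = \frac{\beta_{A_1B_1} + \beta_{A_2B_2}}{\alpha_{A_1B_1} + \alpha_{A_2B_2} + \beta_{A_1B_1} + \beta_{A_2B_2} + \gamma_{A_1B_1} + \gamma_{A_2B_2}}$$

and

$$\int f(x = 3 \mid \mathbf{p}_{AB})\, f(\mathbf{p}_{AB} \mid D_{A_1B_1}, D_{A_2B_2})\, d(\mathbf{p}_{AB}) = \frac{\gamma_{A_1B_1} + \gamma_{A_2B_2}}{\alpha_{A_1B_1} + \alpha_{A_2B_2} + \beta_{A_1B_1} + \beta_{A_2B_2} + \gamma_{A_1B_1} + \gamma_{A_2B_2}},$$

respectively. Now, we formally define the combining operation as follows.

Definition 2 (Combining Operation).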
Let $\omega_{A_1B_1} = \langle \alpha_{A_1B_1}, \beta_{A_1B_1}, \gamma_{A_1B_1} \rangle$ and $\omega_{A_2B_2} = \langle \alpha_{A_2B_2}, \beta_{A_2B_2}, \gamma_{A_2B_2} \rangle$ be the opinions on two parallel paths from user A to user B. The combining operation $\Theta(\omega_{A_1B_1}, \omega_{A_2B_2})$ is carried out as follows.

$$\omega_{AB} = \Theta(\omega_{A_1B_1}, \omega_{A_2B_2}) = \langle \alpha_{AB}, \beta_{AB}, \gamma_{AB} \rangle, \tag{3.22}$$

where

$$\begin{cases} \alpha_{AB} = \alpha_{A_1B_1} + \alpha_{A_2B_2} \\ \beta_{AB} = \beta_{A_1B_1} + \beta_{A_2B_2} \\ \gamma_{AB} = \gamma_{A_1B_1} + \gamma_{A_2B_2}. \end{cases} \tag{3.23}$$

It is worth mentioning that the combining operation yields two properties.

Corollary 3.0.3. Commutative Property: Given two independent opinions $\omega_{A_1B_1}$ and $\omega_{A_2B_2}$, $\Theta(\omega_{A_1B_1}, \omega_{A_2B_2}) \equiv \Theta(\omega_{A_2B_2}, \omega_{A_1B_1})$.

Proof. Based on Eq. 3.23.

Corollary 3.0.4. Associative Property: Given three independent opinions $\omega_{A_1B_1}$, $\omega_{A_2B_2}$ and $\omega_{A_3B_3}$, $\Theta(\omega_{A_1B_1}, \Theta(\omega_{A_2B_2}, \omega_{A_3B_3})) \equiv \Theta(\Theta(\omega_{A_1B_1}, \omega_{A_2B_2}), \omega_{A_3B_3})$.

Proof. Based on Eq. 3.23.

If there exist multiple parallel opinions $\omega_{A_1B_1}, \omega_{A_2B_2}, \cdots, \omega_{A_nB_n}$ from A to B, the overall opinion can be calculated as $\Theta(\Theta(\Theta(\omega_{A_1B_1}, \omega_{A_2B_2}), \cdots), \omega_{A_nB_n})$. As the combining operation is commutative and associative, it is simplified to $\Theta(\omega_{A_1B_1}, \omega_{A_2B_2}, \cdots, \omega_{A_nB_n})$.

Expected Belief of An Opinion

With the proposed discounting and combining operations, the trust between two users in an OSN can be computed, as will be introduced in Chapter 4. It is often desirable to represent trust by a single number, rather than a vector composed of three numbers. Therefore, we introduce how to compute the expected belief of an opinion. Given an opinion $\omega_{AX} = \langle \alpha_{AX}, \beta_{AX}, \gamma_{AX} \rangle$, it is interesting to know how likely it is that X will perform the desired actions requested by A. We call this probability the expected belief of $\omega_{AX}$. Although $\alpha_{AX}$ denotes the belief of the opinion $\omega_{AX}$, the other components $\beta_{AX}$ and $\gamma_{AX}$ also need to be considered to compute the expected belief.

The expected belief of an opinion in subjective logic is defined as

$$E_{\omega_{AX}} = \frac{\alpha_{AX}}{\alpha_{AX} + \beta_{AX} + 2} + \frac{2 \times a_{AX}}{\alpha_{AX} + \beta_{AX} + 2} = \frac{\alpha_{AX} + 2\,a_{AX}}{\alpha_{AX} + \beta_{AX} + 2}.$$

According to this definition, the expected belief in 3VSL would become

$$E_{\omega_{AX}} = \frac{\alpha_{AX} + a_{AX}\,\gamma_{AX}}{\alpha_{AX} + \beta_{AX} + \gamma_{AX}}.$$
The above definition, however, is incorrect, and we illustrate the problem using the example shown in Fig. 3.3. In this figure, there exist two opinions $\omega_1$ and $\omega_2$.

[Figure 3.3: Combining opinions with high and low uncertainties.]

We assume the total evidence values of $\omega_1$ and $\omega_2$ are equal, i.e., $\lambda_1 = \lambda_2$ where $\lambda_1 = \alpha_1 + \beta_1 + \gamma_1$ and $\lambda_2 = \alpha_2 + \beta_2 + \gamma_2$. If these two opinions are combined, the resulting opinion $\omega_3$ can be seen as a mixture of $\omega_1$ and $\omega_2$. According to the combining operation (Eq. 3.23) in 3VSL, the evidence value for the neutral state $\gamma_3$ of $\omega_3$ becomes $\gamma_1 + \gamma_2$. Assuming $\gamma_2 \gg \gamma_1$, we have $\gamma_3 \gg \gamma_1$. Combining more opinions actually increases the evidence values, so the uncertainty of the resulting opinion should decrease. However, based on Eq. 3.24, the resulting opinion $\omega_3$ is more uncertain than $\omega_1$ (as $\gamma_3 \gg \gamma_1$), which contradicts common sense. In other words, $\omega_2$ pollutes the certainty of $\omega_3$ if uncertainty is considered in computing the expected belief.

We know that $\alpha_{AX}$ and $\beta_{AX}$ are the amounts of (positive and negative) certain evidence, so they must be used in computing the expected belief. $\gamma_{AX}$ only records the amount of neutral evidence, so it should be omitted from the expected belief computation. Ignoring the uncertain evidence recorded as $\gamma_{AX}$, the DC distribution of $\omega_{AX}$ collapses into a Beta-Categorical (BC) distribution:

$$f(p_1, p_2 \mid \alpha_{AX}, \beta_{AX}) = \frac{\Gamma(\alpha_{AX} + \beta_{AX})}{\Gamma(\alpha_{AX}) \cdot \Gamma(\beta_{AX})}\, (1 - p_1)^{\alpha_{AX}-1}\, p_2^{\beta_{AX}-1}.$$

Consequently, the original opinion collapses into

$$\omega_{AX} = \langle \alpha_{AX}, \beta_{AX} \rangle.$$

With the collapsed opinion, we apply the approach proposed in [93] to compute the expected belief as follows.

$$\begin{aligned} E_{\omega_{AX}} &= \left( \frac{\alpha_{AX}}{\alpha_{AX} + \beta_{AX}} + \frac{\beta_{AX}}{\alpha_{AX} + \beta_{AX}} \right) a_{AX} \cdot (1 - c_{AX}) + \frac{\alpha_{AX}}{\alpha_{AX} + \beta_{AX}} \cdot c_{AX} \\ &= \frac{\alpha_{AX}}{\alpha_{AX} + \beta_{AX}} \cdot c_{AX} + a_{AX} \cdot (1 - c_{AX}), \end{aligned} \tag{3.24}$$

where $c_{AX}$ is the certainty factor [93] of a Beta distribution, and $a_{AX}$ is the base rate.
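The pollution effect described above can be reproduced numerically. The sketch below is illustrative only: the opinion values, the base rate of 0.5, and the fixed certainty factor of 0.9 are assumptions, and the certainty factor is passed in as a plain number rather than computed. It combines two opinions with Eq. 3.23 and compares the naive expected belief, which keeps $\gamma$, with Eq. 3.24, which ignores it.

```python
def combine(w1, w2):
    """Combining operation (Eq. 3.23): component-wise sum of evidence."""
    return tuple(x + y for x, y in zip(w1, w2))


def naive_expected_belief(opinion, base_rate=0.5):
    """Direct extension of the subjective-logic formula, which keeps gamma."""
    alpha, beta, gamma = opinion
    return (alpha + base_rate * gamma) / (alpha + beta + gamma)


def expected_belief(opinion, certainty, base_rate=0.5):
    """Eq. 3.24: gamma is ignored; certainty (c_AX in [0, 1]) is an input."""
    alpha, beta, _ = opinion
    return alpha / (alpha + beta) * certainty + base_rate * (1.0 - certainty)


# Two hypothetical opinions with equal total evidence (100 each):
# w1 has little uncertain evidence, w2 is dominated by it.
w1 = (80, 15, 5)
w2 = (8, 2, 90)
w3 = combine(w1, w2)

naive_before = naive_expected_belief(w1)
naive_after = naive_expected_belief(w3)
collapsed_before = expected_belief(w1, certainty=0.9)
collapsed_after = expected_belief(w3, certainty=0.9)
```

With these numbers the naive value falls from 0.825 to 0.6775 after combining, although positive evidence increased, whereas the value from Eq. 3.24 changes by less than 0.01 when the same certainty factor is used for both opinions.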
The certainty factor $c_{AX}$, ranging from 0 to 1, is determined by the total amount of certain evidence and the ratio between positive and negative evidence. It is computed from

\[ c_{AX} = \frac{1}{2} \int_0^1 \left| \frac{1}{B(\alpha_{AX}, \beta_{AX})} x^{\alpha_{AX}} (1 - x)^{\beta_{AX}} - 1 \right| dx. \tag{3.25} \]

Basically, $c_{AX}$ approaches 1 when the amount of certain evidence or the disparity between positive and negative evidence becomes large.

THE ASSESSTRUST ALGORITHM

In this chapter, we introduce the AssessTrust (AT) algorithm, which implements the 3VSL model and is able to conduct trust assessment in social networks with arbitrary topologies. Here, we assume that the social network graph does not contain cycles, i.e., we are interested in trust assessment in a directed acyclic graph (DAG).

To ensure AT works in arbitrary DAG topologies, we need to prove that AT can handle non-series-parallel network topologies, e.g., the bridge topology in Fig. 4.5(a). This is a challenge because the only operations available for trust computation are the discounting and combining operations, and these operations require the network topology to be series or parallel, respectively. We address this challenge by differentiating the distorting and original opinions in trust propagation. For example, if A trusts B and B trusts C, then A's opinion on B is called the distorting opinion, and B's opinion on C is the original opinion. We discover that, in trust fusion, the original opinions can be used only once but the distorting opinions can be used any number of times. This is because the distorting opinion only depreciates certain evidence values into uncertain ones; it does not change the total amount of evidence. That also implies the distorting opinion from A to B, shown in the bridge topology in Fig. 4.5(a), can appear in both sub-graphs A → B → D and A → B → C → D.

In addition, we have to further show that AT works in arbitrary DAGs.
This is a challenge because it is impossible to test AT in all possible network topologies. We address this challenge by mathematically proving AT works in arbitrary networks. By addressing these two challenges, we present the AT algorithm and will use an example to illustrate how the AT algorithm works.

[Figure 4.1: Difference between distorting and original opinions. Panels (a) and (b) each show users A, B and C with opinions $\omega_1$, $\omega_2$ and $\omega_3$.]

Properties of Different Opinions

Before introducing the AT algorithm, we need to understand some important features of the discounting operation defined in 3VSL. A discounting operation always involves two opinions. However, the roles of the two opinions are different.

Definition 3 (Distorting and Original Opinions). Given a discounting operation $\Delta(\omega_{AB}, \omega_{BC})$, we define $\omega_{AB}$ as the distorting opinion, and $\omega_{BC}$ as the original opinion.

To understand the difference between the distorting and original opinions, we study two special cases shown in Fig. 4.1. By analyzing them, we discover that a distorting opinion can be used several times in trust computation but an original opinion can be used only once.

Theorem 4.0.1. Let $\omega_{B_1C_1} = \langle \alpha_{B_1C_1}, \beta_{B_1C_1}, \gamma_{B_1C_1} \rangle$ and $\omega_{B_2C_2} = \langle \alpha_{B_2C_2}, \beta_{B_2C_2}, \gamma_{B_2C_2} \rangle$ be the two opinions on two parallel paths from user B to user C. Let $\omega_{AB} = \langle \alpha_{AB}, \beta_{AB}, \gamma_{AB} \rangle$ be the opinion from A to B. Then the following equation always holds:

\[ \Theta(\Delta(\omega_{AB}, \omega_{B_1C_1}), \Delta(\omega_{AB}, \omega_{B_2C_2})) \equiv \Delta(\omega_{AB}, \Theta(\omega_{B_1C_1}, \omega_{B_2C_2})). \tag{4.1} \]

Proof. Let us look at the left side of Eq. 4.1: $\Theta(\Delta(\omega_{AB}, \omega_{B_1C_1}), \Delta(\omega_{AB}, \omega_{B_2C_2}))$. According to the definition of the discounting operation, the result of $\Delta(\omega_{AB}, \omega_{B_1C_1})$ can be written as

\[ \omega_{AC_1} = \Delta(\omega_{AB}, \omega_{B_1C_1}) = \langle \alpha_{AC_1}, \beta_{AC_1}, \gamma_{AC_1} \rangle, \]

where

\[ \alpha_{AC_1} = \frac{\alpha_{AB}\alpha_{B_1C_1}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \beta_{AC_1} = \frac{\alpha_{AB}\beta_{B_1C_1}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \gamma_{AC_1} = \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{B_1C_1} + \beta_{B_1C_1} + \gamma_{B_1C_1})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{\alpha_{AB}\gamma_{B_1C_1}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}. \]
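(4.2)

The result of $\Delta(\omega_{AB}, \omega_{B_2C_2})$ can be written as

\[ \omega_{AC_2} = \Delta(\omega_{AB}, \omega_{B_2C_2}) = \langle \alpha_{AC_2}, \beta_{AC_2}, \gamma_{AC_2} \rangle, \]

where

\[ \alpha_{AC_2} = \frac{\alpha_{AB}\alpha_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \beta_{AC_2} = \frac{\alpha_{AB}\beta_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \gamma_{AC_2} = \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{B_2C_2} + \beta_{B_2C_2} + \gamma_{B_2C_2})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{\alpha_{AB}\gamma_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}. \tag{4.3} \]

If these two opinions are combined, we will have

\[ \omega_{AC} = \Theta(\Delta(\omega_{AB}, \omega_{B_1C_1}), \Delta(\omega_{AB}, \omega_{B_2C_2})) = \langle \alpha_{AC}, \beta_{AC}, \gamma_{AC} \rangle, \]

where

\[ \alpha_{AC} = \frac{\alpha_{AB}\alpha_{B_1C_1} + \alpha_{AB}\alpha_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \beta_{AC} = \frac{\alpha_{AB}\beta_{B_1C_1} + \alpha_{AB}\beta_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \gamma_{AC} = \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{B_1C_1} + \beta_{B_1C_1} + \gamma_{B_1C_1})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{\alpha_{AB}\gamma_{B_1C_1}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{B_2C_2} + \beta_{B_2C_2} + \gamma_{B_2C_2})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{\alpha_{AB}\gamma_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}. \]

Now, we look at the right side of Eq. 4.1:

\[ \Delta(\omega_{AB}, \Theta(\omega_{B_1C_1}, \omega_{B_2C_2})). \tag{4.4} \]

The term $\Theta(\omega_{B_1C_1}, \omega_{B_2C_2})$ in the above formula can be written as

\[ \omega_{BC} = \Theta(\omega_{B_1C_1}, \omega_{B_2C_2}) = \langle \alpha_{BC}, \beta_{BC}, \gamma_{BC} \rangle, \tag{4.5} \]

where $\alpha_{BC} = \alpha_{B_1C_1} + \alpha_{B_2C_2}$, $\beta_{BC} = \beta_{B_1C_1} + \beta_{B_2C_2}$, and $\gamma_{BC} = \gamma_{B_1C_1} + \gamma_{B_2C_2}$. Putting Eq. 4.5 back into Eq. 4.4, we will have

\[ \omega'_{AC} = \Delta(\omega_{AB}, \Theta(\omega_{B_1C_1}, \omega_{B_2C_2})) = \langle \alpha'_{AC}, \beta'_{AC}, \gamma'_{AC} \rangle, \]

where

\[ \alpha'_{AC} = \frac{\alpha_{AB}(\alpha_{B_1C_1} + \alpha_{B_2C_2})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} = \frac{\alpha_{AB}\alpha_{B_1C_1} + \alpha_{AB}\alpha_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \beta'_{AC} = \frac{\alpha_{AB}(\beta_{B_1C_1} + \beta_{B_2C_2})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} = \frac{\alpha_{AB}\beta_{B_1C_1} + \alpha_{AB}\beta_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}, \]

\[ \gamma'_{AC} = \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{B_1C_1} + \beta_{B_1C_1} + \gamma_{B_1C_1})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{\alpha_{AB}\gamma_{B_1C_1}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{(\beta_{AB} + \gamma_{AB})(\alpha_{B_2C_2} + \beta_{B_2C_2} + \gamma_{B_2C_2})}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}} + \frac{\alpha_{AB}\gamma_{B_2C_2}}{\alpha_{AB} + \beta_{AB} + \gamma_{AB}}. \tag{4.6} \]

Clearly, $\omega'_{AC}$ is equivalent to $\omega_{AC}$.

Theorem 4.0.2. Let $\omega_{A_1B_1} = \langle \alpha_{A_1B_1}, \beta_{A_1B_1}, \gamma_{A_1B_1} \rangle$ and $\omega_{A_2B_2} = \langle \alpha_{A_2B_2}, \beta_{A_2B_2}, \gamma_{A_2B_2} \rangle$ be the opinions on two parallel paths between two users A and B. Let $\omega_{BC} = \langle \alpha_{BC}, \beta_{BC}, \gamma_{BC} \rangle$ be an opinion from user B to user C. Then the following equation does not hold:

\[ \Theta(\Delta(\omega_{A_1B_1}, \omega_{BC}), \Delta(\omega_{A_2B_2}, \omega_{BC})) \equiv \Delta(\Theta(\omega_{A_1B_1}, \omega_{A_2B_2}), \omega_{BC}). \tag{4.7} \]

Proof. In Chapter 3, we have shown that the combining operation applies to $\Theta(\omega_{A_1B_1}, \omega_{A_2B_2})$ when the evidence of $\omega_{A_1B_1}$ and $\omega_{A_2B_2}$ comes from different sources, i.e., they are independent. In the left side of Eq.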
4.7, opinions $\Delta(\omega_{A_1B_1}, \omega_{BC})$ and $\Delta(\omega_{A_2B_2}, \omega_{BC})$ share the same evidence from the opinion $\omega_{BC}$. As a result, the combining operation does not apply to them. Therefore, $\Delta(\Theta(\omega_{A_1B_1}, \omega_{A_2B_2}), \omega_{BC})$ is the only correct solution, and is not equal to $\Theta(\Delta(\omega_{A_1B_1}, \omega_{BC}), \Delta(\omega_{A_2B_2}, \omega_{BC}))$.

From Theorems 4.0.1 and 4.0.2, we note that reusing $\omega_{AB}$ in case (a) is allowed but reusing $\omega_{BC}$ in case (b) is not. The difference between $\omega_{AB}$ and $\omega_{BC}$ is that $\omega_{AB}$ is a distorting opinion while $\omega_{BC}$ is an original opinion. Therefore, we conclude that in trust computation, original opinions can be combined only once, while distorting opinions can be used any number of times because they do not change the total amount of evidence of final opinions.

Arbitrary Network Topology

As the distorting and original opinions are distinguished, we will prove that 3VSL is capable of handling non-series-parallel network topologies.

Theorem 4.0.3. Given an arbitrary two-terminal directed graph $G = (V, E)$ where A and C are the first and second terminals, a vertex $u$ represents a user, and the edge $e(u, v)$ denotes $u$'s opinion about $v$'s trustworthiness, denoted as $\omega_{uv}$. By applying the discounting and combining operations, the resulting opinion $\omega_{AC}$ is solvable and unique.

Proof. We prove the theorem in a recursive manner, i.e., reducing the original problem into sub-problem(s) and continuing to reduce the sub-problems until the base case is solvable and yields a unique solution.

[Figure 4.2: Illustration of an arbitrary network topology.]

As shown in Fig. 4.2, we assume there are $m$ nodes $(c_1, c_2, \cdots, c_m)$ connecting to C, i.e., $e(c_i, C) \in E$ where $i \in [1, m]$. There are $n$ nodes $(a_1, a_2, \cdots, a_n)$ being connected from A, i.e., $e(A, a_j) \in E$ where $j \in [1, n]$.

Reduction rules

Case 1: If there is only one node connecting to C, i.e., $m = 1$, then $\omega_{AC} = \Delta(\omega_{Ac_1}, \omega_{c_1C})$.
In this case, we reduce the problem of computing $\omega_{AC}$ to that of computing $\omega_{Ac_1}$, where A and $c_1$ are connected by a smaller sub-graph.

Case 2: If there is more than one node connected to C, i.e., $m > 1$, $\omega_{AC}$ is equal to $\Theta(\Delta(\omega_{Ac_1}, \omega_{c_1C}), \Delta(\omega_{Ac_2}, \omega_{c_2C}), \cdots, \Delta(\omega_{Ac_m}, \omega_{c_mC}))$ due to Theorem 4.0.1. Therefore, $\omega_{AC}$ is solvable and unique if and only if each $\omega_{Ac_i}$ is solvable and unique, where $\omega_{Ac_i}$ corresponds to the sub-graph $G' = G - \sum e(c_i, C) - C$. In this case, we reduce the problem of computing $\omega_{AC}$ to that of computing each $\omega_{Ac_i}$.

In each round of reduction, $G = (V, E)$ is reduced into a smaller graph $G' = (V', E')$ such that $|E'| = |E| - m$ and $|V'| = |V| - 1$. After applying the reduction rules on sub-problems recursively, the base case will finally be reached, i.e., $|E| = 1$ and $|V| = 2$.

Base Case

The sub-graph of the base case contains only one edge from A to $a_j$ where $j \in [1, n]$. As $\omega_{Aa_j}$ is known from the original graph G, the base case is solvable and its solution is unique. Applying the equations in Cases 1 and 2 repeatedly, we can obtain a unique $\omega_{AC}$.

Differences between 3VSL and SL

In this section, we present the differences between 3VSL and SL by introducing several examples. Compared to SL, 3VSL introduces the uncertainty state to keep track of the uncertainty generated when trust propagates within an OSN. In particular, the uncertainty state is used to store the "distorted" positive and negative evidence in trust propagation and fusion.

It is well-known that SL can only handle series-parallel network topologies. A series-parallel graph can be decomposed into many series (see Fig. 3.1) or parallel (see Fig. 3.2) sub-graphs so that every edge in the original graph appears only once in the sub-graphs [44]. In real-world social networks, however, the connection between two users could be too complicated to be decomposed into series-parallel graphs. To apply the SL model, a complex topology has to be simplified into a series-parallel topology by removing or selecting edges [37–39].
The simplifications result in information loss and inaccurate trust assessment. This problem is also observed in our numerical experiments, which will be presented in Chapter 6. Furthermore, it is not clear which edges need to be removed in a large-scale OSN, i.e., there is no algorithm for the solutions proposed in [37–39].

Due to the lack of the uncertainty state, SL results in inaccurate trust assessments even if it processes a social network similarly to 3VSL. We use two examples to explain why the inaccuracy occurs.

Example 1

[Figure 4.3: Difference between 3VSL and SL on the discounting operation. Panels: (a) Illustration of the discounting operation; (b) Result of 3VSL; (c) Result of Subjective Logic.]

Let us consider a series topology composed of A, B and C, as shown in Fig. 4.3(a). We assume the evidence values for $\alpha$, $\beta$ and $\gamma$ are non-zero in both $\omega_{AB}$ and $\omega_{BC}$. A's opinion of C's trustworthiness can be computed by applying the discounting operation defined in 3VSL (or SL) on $\omega_{AB}$ and $\omega_{BC}$, i.e., $\omega_{AC} = \Delta(\omega_{AB}, \omega_{BC})$.

With the 3VSL model, the total amount of evidence in the resulting opinion $\omega_{AC}$ is the same as in $\omega_{BC}$, as shown in Fig. 4.3(b). Part of $\alpha_{BC}$ and $\beta_{BC}$ is transferred into $\gamma_{AC}$, indicating a "distortion" from positive and negative evidence values to uncertain evidence values. On the other hand, with the SL model, the distorted evidence values are merged into the prior uncertainty state, which is a fixed number and always equals 2, as shown in Fig. 4.3(c). As a result, the positive and negative evidence values in $\omega_{AC}$ shrink, leading to a loss of evidence.

Example 2

[Figure 4.4: Difference between 3VSL and SL on the combining operation. Panels: (a) Two opinions are combined; (b) Result of 3VSL; (c) Result of Subjective Logic.]
Let us consider a parallel topology shown in Fig. 4.4(a). A has two parallel opinions $\omega_{A_1B_1}$ and $\omega_{A_2B_2}$ on B. We assume the evidence values $\alpha$ and $\gamma$ are non-zero and $\beta$ is zero in both $\omega_{A_1B_1}$ and $\omega_{A_2B_2}$. A's opinion of B can be computed by applying the combining operation defined in 3VSL (or SL) on $\omega_{A_1B_1}$ and $\omega_{A_2B_2}$, i.e., $\omega_{AB} = \Theta(\omega_{A_1B_1}, \omega_{A_2B_2})$.

As shown in Fig. 4.4(b), according to 3VSL, the amounts of positive and uncertain evidence in the resulting opinion $\omega_{AB}$ are the sums of the positive and uncertain evidence in $\omega_{A_1B_1}$ and $\omega_{A_2B_2}$. As shown in Fig. 4.4(c), using SL, the uncertain evidence value in the resulting opinion is always 2. According to the combining operation defined in SL, either the uncertain evidence values in $\omega_{A_1B_1}$ or those in $\omega_{A_2B_2}$ are ignored. As a result, the amount of positive evidence will be larger than the actual amount.

The problems identified in the above examples impact the accuracy of SL. On the other hand, 3VSL avoids these problems by treating uncertainty as a third state. This conclusion will be validated in Chapter 6, by comparing 3VSL and SL using two real OSN datasets.

AssessTrust Algorithm

Based on Theorem 4.0.3, we design the AssessTrust algorithm. The algorithm is based on the 3VSL model and is able to work with arbitrary network topologies. The inputs of this algorithm include the corresponding graph G, the trustor A, the trustee C, and the maximum searching depth H, measured in number of hops. Specifically, H determines the longest distance between the trustor and trustee. H controls the searching depth on graph G, which is necessary because G could be very large. H helps reduce the running time of AssessTrust without sacrificing much trust assessment accuracy.

Illustration of the AssessTrust Algorithm

In this section, we use the bridge topology shown in Fig. 4.5(a) to illustrate how the AT algorithm computes A's indirect opinion on D, denoted as $\Omega_{AD}$.

Algorithm 4.1: AssessTrust(G, A, C, H).
Require: G, A, C, and H.
Ensure: $\omega_{AC}$.
1: $n \leftarrow 0$
2: if $H > 0$ then
3:   for all incoming edges $e(c_i, C) \in G$ do
4:     if $c_i = A$ then
5:       $\omega_i \leftarrow \omega_{c_iC}$
6:     else
7:       $G' \leftarrow G - e(c_i, C)$
8:       $\omega_{Ac_i} \leftarrow$ AssessTrust($G'$, A, $c_i$, $H - 1$)
9:       $\omega_i \leftarrow \Delta(\omega_{Ac_i}, \omega_{c_iC})$
10:     end if
11:     $n \leftarrow n + 1$
12:   end for
13:   if $n > 1$ then
14:     $\omega_{AC} = \Theta(\omega_1 \cdots \omega_n)$
15:   else
16:     $\omega_{AC} = \omega_n$
17:   end if
18: else
19:   $\omega_{AC} = \langle 0, 0, 0 \rangle$
20: end if

[Figure 4.5: An illustration of 3VSL based on the bridge topology. Panels: (a) Bridge topology; (b) Decomposition parsing tree.]

To differentiate it from the direct opinion, we use $\Omega$ to denote the indirect opinion. As shown in Fig. 4.5(a), to compute $\Omega_{AD}$, discounting and combining operations are applied on opinions $\omega_{AB}$, $\omega_{AC}$, $\omega_{BD}$, $\omega_{CD}$, and $\omega_{BC}$. AT starts from the trustee D in Fig. 4.5(a), searches the network backwards and recursively computes the trustworthiness of every user during the search. As a result, we get a parsing tree, shown in Fig. 4.5(b), that describes how discounting and combining operations are applied in computing A's opinion of D. Traversing the parsing tree in a bottom-up manner, A's indirect opinion of D, $\Omega_{AD}$, can be computed as

\[ \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\Theta(\Delta(\omega_{AB}, \omega_{BC}), \omega_{AC}), \omega_{CD})). \tag{4.8} \]

To understand the time complexity of AT when it is applied on the bridge topology, we use $AT^{(k)}(i, j)$ to denote the $k$-th call of AT, made to compute user $i$'s opinion on $j$. When AT is first called, A's opinion on D is computed from

\[ \Theta(\Delta(\Omega_{AB}, \omega_{BD}), \Delta(\Omega_{AC}, \omega_{CD})), \]

where $\Omega_{AB}$ and $\Omega_{AC}$ are A's indirect opinions on B and C, respectively. These two opinions are then provided by $AT^{(2)}(A, B)$ and $AT^{(3)}(A, C)$, respectively. In $AT^{(3)}(A, C)$, AT computes A's opinion of C as

\[ \Theta(\Delta(\Omega_{AB}, \omega_{BC}), \omega_{AC}), \]

where $\Omega_{AB}$ is computed by $AT^{(4)}(A, B)$. Finally, A's opinion of D can be computed from Eq. 4.8.
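The recursion in Algorithm 4.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: opinions are plain tuples `(alpha, beta, gamma)`, the graph is a hypothetical dictionary mapping an edge `(u, v)` to the direct opinion of `u` on `v`, the discounting formula follows Eq. 4.2, discounting by an all-zero distorting opinion returns the uncertain opinion (as in Corollary 5.0.1), and the evidence values are made up.

```python
UNCERTAIN = (0.0, 0.0, 0.0)


def discount(distorting, original):
    """Discounting operation; the first argument is the distorting opinion."""
    a, b, g = distorting
    total = a + b + g
    if total == 0:
        return UNCERTAIN  # assumption: nothing propagates through <0, 0, 0>
    a2, b2, g2 = original
    return (a * a2 / total,
            a * b2 / total,
            ((b + g) * (a2 + b2 + g2) + a * g2) / total)


def combine(*opinions):
    """Combining operation (Eq. 3.23): component-wise sum of evidence."""
    return tuple(sum(parts) for parts in zip(*opinions))


def assess_trust(edges, trustor, trustee, depth):
    """Recursive sketch of AssessTrust on a DAG given as {(u, v): opinion}."""
    if depth == 0:
        return UNCERTAIN
    partial = []
    for (u, v), w_uv in edges.items():
        if v != trustee:
            continue
        if u == trustor:
            partial.append(w_uv)  # direct opinion, used as is
        else:
            # Remove the edge just used, then recurse towards the trustor.
            rest = {e: w for e, w in edges.items() if e != (u, v)}
            w_au = assess_trust(rest, trustor, u, depth - 1)
            partial.append(discount(w_au, w_uv))
    return combine(*partial) if partial else UNCERTAIN


# Bridge topology with hypothetical evidence values.
AB, AC, BC = (8, 1, 1), (6, 2, 2), (5, 3, 2)
BD, CD = (7, 2, 1), (4, 4, 2)
bridge = {("A", "B"): AB, ("A", "C"): AC, ("B", "C"): BC,
          ("B", "D"): BD, ("C", "D"): CD}
result = assess_trust(bridge, "A", "D", depth=3)

# The same opinion written out explicitly as in Eq. 4.8.
eq_4_8 = combine(discount(AB, BD),
                 discount(combine(AC, discount(AB, BC)), CD))
```

In this sketch the total evidence of `result` equals the combined total of the two original opinions on D, since discounting only moves certain evidence into the uncertain component.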
With the bridge topology, AT is called four times in total: $AT^{(1)}(A, D)$, $AT^{(2)}(A, B)$, $AT^{(3)}(A, C)$ and $AT^{(4)}(A, B)$. Note that $AT(A, B)$ is called twice in this example, i.e., in sub-graphs A → B → D and A → B → C → D, which is allowed in 3VSL.

Time Complexity Analysis

Finally, we present the time complexity of AssessTrust in this section. Since AT is a recursive algorithm, the recurrence equation of its time complexity is

\[ T(n) = (n - 1) \cdot (T(n - 1) + C_1) + C_2 + O(n - 1) = (n - 1) \cdot T(n - 1) + O(n - 1) + C, \]

where $(n - 1)$ is the maximum number of branches from the trustee node (line 3), assuming there are $n$ nodes in the network. $T(n - 1)$ is the time complexity of recursively running AT on each branch (line 8), and $C_1$ is the time for lines 4, 5, 6, 7, 9, 10 and 11. $O(n - 1)$ is the time for combining operations (line 14). $C_2$ is the time used outside the "for" loop (lines 13 to 20). Therefore, the time complexity of AT is

\[ O\left( \sum_{i=1}^{k} \frac{(n - 1)!}{(n - 1 - i)!} \right) = O(n^k), \]

where $k$ is the searching depth, and $n$ is the number of nodes in the network. Note that this time complexity is for one-to-one trust assessment in OSNs. To solve the MTA problem, AT needs to go through every trustee one by one, so its time complexity is $O(n \cdot n^k) = O(n^{k+1})$ in total. Clearly, this is unacceptable for most MTA applications. This problem motivates us to propose a more efficient solution to the MTA problem, which will be shown in the next chapter.

MASSIVE TRUST ASSESSMENT IN OSNS

One major limitation of AssessTrust is that it is inefficient in conducting massive trust assessments (MTA). To efficiently address the MTA problem, we propose the OpinionWalk algorithm, which is based on AT and offers a better time complexity. To design the OpinionWalk algorithm, we need to address the following three challenges.

The first challenge is how to address the MTA problem while keeping the time complexity low. This is a challenge because AT is designed for one-to-one trust assessment but MTA focuses on one-to-many situations.
To address this challenge, we use an opinion matrix to represent the network's topology and an individual opinion vector to store the trustworthiness of all nodes. In this way, similar to matrix operations, the individual opinion vector can be updated in a parallel manner. Based on this novel design, the time complexity is reduced from $O(n^{k+1})$ to $O(n^3)$, where $k$ is the longest distance in hops from the trustor to the trustee node.

The second challenge is to eliminate the recursive operations in the AT algorithm. This is a challenge because AT needs to first transform a trust social network into a recursion tree, and then process its sub-trees before getting into the upper level of the tree. To address this challenge, we design the OpinionWalk algorithm to implement this recursive procedure in an iterative way. This is non-trivial because OpinionWalk has to use the operations defined in 3VSL. As recursive operations are slower and take up more memory/stack, OpinionWalk offers a faster running time, especially in large-scale networks.

The third challenge is to show OpinionWalk is equivalent to AT in addressing MTA. This is a challenge because these two algorithms are different and we need to prove they output the same results in arbitrary network topologies. To address this challenge, we first prove the opinion walk operations equivalently implement the discounting and combining operations in 3VSL. Then, we extend the proof to arbitrary network topologies and recursively show that each case encountered in AT can be equivalently solved by OpinionWalk. In other words, OpinionWalk is an equivalent implementation of AT. Additionally, we analyze OpinionWalk's time complexity and show that it offers a better time complexity. At the end of this chapter, we also use an example to illustrate how OpinionWalk works.

Design of OpinionWalk

OpinionWalk is essentially a matrix-based algorithm that implements 3VSL in a more efficient way to address the MTA problem.
Given a trust social network $G = (V, E, w)$, OpinionWalk represents this graph by an opinion matrix $M$. The elements in $M$ are the edges/opinions between nodes in the graph $G = (V, E, w)$. The trustworthiness of all nodes is stored in the individual opinion vector $Y$. The procedure of OpinionWalk can be expressed as an iteration equation:

\[ Y^{(k)} = M^T \odot Y^{(k-1)}, \]

where $k$ is the current searching depth in the graph from the trustor to the trustee. The operation rules of $\odot$ will be introduced later. The trustworthiness of a given trustee can be obtained from $Y$. OpinionWalk is more efficient than AT because it uses an iterative method rather than a recursive one to address the MTA problem.

Based on 3VSL, we define two special opinions that will be used to initialize the OpinionWalk algorithm.

Definition 4 (Uncertain Opinion). An uncertain opinion $O$ is defined as $O \triangleq \langle 0, 0, 0 \rangle$, which indicates the trustor is totally uncertain about the trustee's trustworthiness.

Definition 5 (Absolute Opinion). An absolute opinion $I$ is defined as $I \triangleq \langle \infty, 0, 0 \rangle$, which indicates the trustor has infinite positive evidence, and hence absolutely trusts the trustee.

Based on the uncertain opinion and 3VSL, we have the following corollaries.

Corollary 5.0.1. Applying the discounting operation on $O$ and an opinion $\omega$, we have $\Delta(\omega, O) = O$ and $\Delta(O, \omega) = O$.

Corollary 5.0.2. Applying the combining operation on $O$ and an opinion $\omega$, we have $\Theta(\omega, O) = \Theta(O, \omega) = \omega$.

Based on the absolute opinion and 3VSL, we have the following corollaries.

Corollary 5.0.3. Applying the discounting operation on $I$ and an opinion $\omega$, we have $\Delta(\omega, I) = \omega$ and $\Delta(I, \omega) = \omega$.

Corollary 5.0.4. Applying the combining operation on $I$ and an opinion $\omega$, we have $\Theta(\omega, I) = \Theta(I, \omega) = I$.

Opinion Matrix

Definition 6. Given a trust social network containing $n$ nodes, an opinion matrix $M$ is an $n \times n$ matrix:

\[ M_{n \times n} \triangleq \begin{bmatrix} \omega_{11} & \omega_{12} & \cdots & \omega_{1n} \\ \omega_{21} & \omega_{22} & \cdots & \cdots \\ \cdots & \cdots & \cdots & \cdots \\ \omega_{n1} & \cdots & \cdots & \omega_{nn} \end{bmatrix}, \]

where each element $\omega_{ij}$ ($i, j \le n$) denotes the direct opinion from node $i$ to $j$.

Unlike the traditional representations of a graph, e.g., the adjacency or Laplacian matrix, the entries in $M$ are the direct opinions between nodes in $G$. If $i$ does not have a direct opinion on $j$, we use the uncertain opinion to represent the corresponding entry, i.e., $\omega_{ij} \triangleq O$ ($i, j \le n$, $i \ne j$) if $e_{ij} \notin E$.

Individual Opinion Vector

Definition 7. An individual opinion vector $Y_i^{(k)}$ is an $n \times 1$ column vector composed of $n$ opinions:

\[ Y_i^{(k)} \triangleq \left[ \Omega_{i1}^{(k)}, \Omega_{i2}^{(k)}, \cdots, \Omega_{ij}^{(k)}, \cdots, \Omega_{in}^{(k)} \right]^T, \]

where $\Omega_{ij}^{(k)}$ denotes user $i$'s individual opinion on $j$. The superscript $k$ indicates the current iteration step in the OpinionWalk algorithm.

Opinion Walk Operation

Definition 8. An opinion walk operation $\odot$ "multiplies" matrix $M$ and vector $Y_i^{(k-1)}$ to yield a new vector $Y_i^{(k)}$ as follows:

\[ Y_i^{(k)} = M^T \odot Y_i^{(k-1)} \triangleq \begin{bmatrix} \Theta(\Delta(\Omega_{i1}^{(k-1)}, \omega_{11}), \cdots, \Delta(\Omega_{in}^{(k-1)}, \omega_{n1})) \\ \Theta(\Delta(\Omega_{i1}^{(k-1)}, \omega_{12}), \cdots, \Delta(\Omega_{in}^{(k-1)}, \omega_{n2})) \\ \cdots \\ \Theta(\Delta(\Omega_{i1}^{(k-1)}, \omega_{1n}), \cdots, \Delta(\Omega_{in}^{(k-1)}, \omega_{nn})) \end{bmatrix} \triangleq \left[ \Omega_{i1}^{(k)}, \Omega_{i2}^{(k)}, \cdots, \Omega_{ij}^{(k)}, \cdots, \Omega_{in}^{(k)} \right]^T, \]

where $k$ denotes the iteration step. $\Theta$ and $\Delta$ are the combining and discounting operations in 3VSL.

[Figure 5.1: A detailed illustration of OpinionWalk. Panels: (a) $(k-1)$-th iteration; (b) $k$-th iteration; (c) $i$'s individual opinions on the other nodes are updated from the status of the last iteration.]
When the OpinionWalk algorithm is initialized, $Y^{(0)}$ is set as

\[ Y_i^{(0)} = \left[ \Omega_{i1}^{(0)}, \Omega_{i2}^{(0)}, \cdots, \Omega_{ii}^{(0)}, \cdots, \Omega_{in}^{(0)} \right]^T = [O, O, O, \cdots, I, \cdots, O]^T. \]

This vector indicates that node $i$ does not trust any node other than itself. In the following steps, OpinionWalk either updates $\Omega_{ij}^{(k-1)}$ or keeps it unchanged.

A detailed explanation of the opinion walk operation can be seen in Fig. 5.1. As shown in Fig. 5.1(a), at the $(k-1)$-th iteration, node $i$'s individual opinions on all other nodes ($\forall j \in V \setminus \{i\}$) are stored in the individual opinion vector, which is denoted as $Y^{(k-1)}$. Then, at the $k$-th iteration, as shown in Fig. 5.1(b), $i$'s individual opinion on $j$ ($\forall j \in V \setminus \{i\}$) is updated by applying the discounting and combining operations in turn.

The discounting operation is applied to $\Omega_{is}^{(k-1)}$ and $\omega_{sj}$. $\Omega_{is}^{(k-1)}$ is $i$'s individual opinion on $s \in S$ at the $(k-1)$-th iteration, where $S$ is the set of $j$'s in-neighbors. $\omega_{sj}$ is $s$'s direct opinion on $j$, which does not change across iterations. As shown in Fig. 5.1(c), the logic behind the discounting operation is to use $i$'s individual opinion on $s$ (in the $(k-1)$-th iteration) and $s$'s direct opinion on $j$ to form $i$'s "partial" opinion $\Omega_{is|sj}$ on $j$ (in the $k$-th iteration). In other words, the "partial" opinion is formed from $i$'s individual opinion on $s$ ($\Omega_{is}$) and $s$'s direct opinion on $j$ ($\omega_{sj}$).

The combining operation is then applied to the results of the discounting operations above. The logic behind the combining operation is to aggregate all of the "partial" opinions $\Omega_{is|sj}$ (from $j$'s in-neighbor nodes) to form $i$'s overall individual opinion on $j$, which is shown in Fig. 5.1(c).

As shown in Fig. 5.2, the opinion walk operation is similar to the multiplication between a matrix and a vector. The difference is that the summation and multiplication operations are replaced by the combining and discounting operations.
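The analogy with a matrix-vector product can be sketched in Python as below. This is an illustrative reading of Definition 8, not the dissertation's code: opinions are tuples, `M[s][j]` holds the direct opinion of node `s` on node `j`, the special opinions O and I are handled through Corollaries 5.0.1 to 5.0.4 rather than through arithmetic on infinity, and the trustor's own entry is kept fixed at I (an assumption, mirroring the `j ≠ i` condition of Algorithm 5.2). The three-node chain and its evidence values are hypothetical.

```python
import math

O = (0.0, 0.0, 0.0)        # uncertain opinion (Definition 4)
I = (math.inf, 0.0, 0.0)   # absolute opinion (Definition 5)


def discount(distorting, original):
    """Discounting operation (Eq. 4.2) with O and I special-cased."""
    if distorting == O or original == O:
        return O                       # Corollary 5.0.1
    if distorting == I:
        return original                # Corollary 5.0.3
    if original == I:
        return distorting              # Corollary 5.0.3
    a, b, g = distorting
    a2, b2, g2 = original
    total = a + b + g
    return (a * a2 / total,
            a * b2 / total,
            ((b + g) * (a2 + b2 + g2) + a * g2) / total)


def combine(x, y):
    """Combining operation (Eq. 3.23); I absorbs everything (Corollary 5.0.4)."""
    if x == I or y == I:
        return I
    return tuple(p + q for p, q in zip(x, y))


def opinion_walk_step(M, Y, trustor):
    """One opinion walk: discounting plays the role of multiplication,
    combining plays the role of summation."""
    n = len(Y)
    new_Y = []
    for j in range(n):
        if j == trustor:
            new_Y.append(I)            # assumption: self-trust stays absolute
            continue
        acc = O
        for s in range(n):
            acc = combine(acc, discount(Y[s], M[s][j]))
        new_Y.append(acc)
    return new_Y


# Hypothetical chain 0 -> 1 -> 2 with made-up evidence values.
M = [[O, (8, 1, 1), O],
     [O, O, (6, 2, 2)],
     [O, O, O]]
Y0 = [I, O, O]
Y1 = opinion_walk_step(M, Y0, trustor=0)
Y2 = opinion_walk_step(M, Y1, trustor=0)
```

After the first step only the direct neighbor has a non-uncertain entry; the second step reaches the node two hops away, which matches the interpretation of the iteration count as a searching depth.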
Notice that mathematically $k$ denotes the iteration number of the opinion walk operation, as shown in Fig. 5.2. On the other hand, as shown in Fig. 5.1(a) and Fig. 5.1(b), the physical meaning of $k$ is the searching depth of the OpinionWalk algorithm, starting from the trustor. Most importantly, the trustor's individual opinion on any trustee within $k$ hops can be found in $Y^{(k)}$.

[Figure 5.2: A general view of the "opinion walk" operation.]

OpinionWalk Algorithm

The pseudo-code of the OpinionWalk algorithm is shown in Algorithm 5.2.

Algorithm 5.2: OpinionWalk(G, i, H).
Require: A directed graph G with a trustor $i$ and the maximum searching level H.
Ensure: $i$'s opinions on all $j$, where $j \ne i$.
1: Initialize $M$ and $Y_i^{(1)}$ based on G
2: $k \leftarrow 1$
3: while $k < H$ do
4:   $k \leftarrow k + 1$
5:   for all columns $c_j \in M$ s.t. $j \ne i$ do
6:     $\Omega_{ij}^{(k)} \leftarrow O$
7:     for all direct opinions $\omega_{sj} \in c_j$ s.t. $\omega_{sj} \ne O$ do
8:       $\Omega_{is}^{(k-1)} \leftarrow Y_i^{(k-1)}[s]$
9:       if $\Omega_{is}^{(k-1)} \ne O$ then
10:         $\Omega_{ij}^{(k)} \leftarrow \Theta(\Omega_{ij}^{(k)}, \Delta(\Omega_{is}^{(k-1)}, \omega_{sj}))$
11:       end if
12:     end for
13:     $Y_i^{(k)}[j] \leftarrow \Omega_{ij}^{(k)}$
14:   end for
15: end while
16: return $Y_i^{(k)}$

In the algorithm, line 3 controls how many levels OpinionWalk will search in the network. Lines 5-14 update the indirect opinion $\Omega_{ij}$ iteratively. Line 5 considers all users, other than $i$, as the trustees. Lines 7-12 combine all opinions derived from $\omega_{sj} \ne O$. Line 8 obtains $i$'s indirect opinion on one of the predecessors of $j$, e.g., $s$. If this opinion already exists (line 9), $i$ discounts $s$'s opinion on $j$ to update $\Omega_{ij}^{(k)}$. Otherwise, it checks another predecessor. Line 10 combines all opinions that are currently computed from $\omega_{sj} \ne O$.
Note that line 10 essentially combines opinions one by one, so $\Omega_{ij}^{(k)}$ equals

\[ \Theta(\Delta(\Omega_{i1}^{(k-1)}, \omega_{1j}), \cdots, \Theta(\Delta(\Omega_{i,n-1}^{(k-1)}, \omega_{n-1,j}), \Delta(\Omega_{in}^{(k-1)}, \omega_{nj}))). \]

Because the combining operation is associative (see Corollary 3.0.4), the above equation is the same as the following form:

\[ \Theta\left( \Delta(\Omega_{i1}^{(k-1)}, \omega_{1j}), \cdots, \Delta(\Omega_{in}^{(k-1)}, \omega_{nj}) \right). \]

After processing all users connecting to $j$, at line 13, the newly computed $\Omega_{ij}$ is used to update the corresponding element in the individual opinion vector. When $i$'s opinions on all possible $j$'s are updated, at line 14, OpinionWalk searches the next level. Finally, the vector $Y_i^{(k)}$ will contain $i$'s opinions about the trustworthiness of all other users.

Illustration of the OpinionWalk Algorithm

In this section, we use the example shown in Fig. 4.5(a) to illustrate how OpinionWalk is used to compute the trustworthiness of all users (B, C and D), from the perspective of A. The opinion matrix of the corresponding graph in Fig. 4.5(a) can be expressed as

\[ M^T = \begin{bmatrix} O & O & O & O \\ \omega_{AB} & O & O & O \\ \omega_{AC} & \omega_{BC} & O & O \\ O & \omega_{BD} & \omega_{CD} & O \end{bmatrix}. \]

Because we want to evaluate A's opinions on other users, the algorithm starts from A and ends at D. Hence, we set the initial individual opinion vector as

\[ Y_A^{(0)} = \left[ \Omega_{AA}^{(0)}, \Omega_{AB}^{(0)}, \Omega_{AC}^{(0)}, \Omega_{AD}^{(0)} \right]^T = [I, O, O, O]^T. \]

After the initialization, the algorithm goes through several iterations that can be expressed as

\[ Y_A^{(k)} = M^T \odot Y_A^{(k-1)} = \begin{bmatrix} \Theta(\Delta(\Omega_{AA}^{(k-1)}, \omega_{AA})) \\ \Theta(\Delta(\Omega_{AA}^{(k-1)}, \omega_{AB})) \\ \Theta(\Delta(\Omega_{AA}^{(k-1)}, \omega_{AC}), \Delta(\Omega_{AB}^{(k-1)}, \omega_{BC})) \\ \Theta(\Delta(\Omega_{AB}^{(k-1)}, \omega_{BD}), \Delta(\Omega_{AC}^{(k-1)}, \omega_{CD})) \end{bmatrix}, \]

where $k$ is the number of iterations.
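The iteration above can be traced with a short Python sketch of the loop in Algorithm 5.2, applied to the bridge topology. Several points are assumptions of this sketch rather than statements from the text: opinions are tuples, `M[s][j]` is the direct opinion of `s` on `j`, the initial vector `Y^(1)` is read as the trustor's direct opinions with I for the trustor itself, the special opinions are handled via Corollaries 5.0.1 to 5.0.4, and all evidence values are made up.

```python
import math

O = (0.0, 0.0, 0.0)        # uncertain opinion
I = (math.inf, 0.0, 0.0)   # absolute opinion


def discount(distorting, original):
    """Discounting operation (Eq. 4.2) with O and I special-cased."""
    if distorting == O or original == O:
        return O
    if distorting == I:
        return original
    if original == I:
        return distorting
    a, b, g = distorting
    a2, b2, g2 = original
    total = a + b + g
    return (a * a2 / total,
            a * b2 / total,
            ((b + g) * (a2 + b2 + g2) + a * g2) / total)


def combine(x, y):
    """Combining operation (Eq. 3.23); I absorbs everything."""
    if x == I or y == I:
        return I
    return tuple(p + q for p, q in zip(x, y))


def opinion_walk(M, trustor, max_depth):
    """Iterative sketch of Algorithm 5.2; returns the trustor's opinion vector."""
    n = len(M)
    # Y^(1): direct opinions of the trustor, absolute trust in itself.
    Y = [I if j == trustor else M[trustor][j] for j in range(n)]
    for _ in range(max_depth - 1):            # k = 2, ..., H
        new_Y = list(Y)
        for j in range(n):
            if j == trustor:
                continue                      # line 5: only j != i
            acc = O                           # line 6
            for s in range(n):
                w_sj = M[s][j]
                if w_sj == O or Y[s] == O:    # lines 7 and 9
                    continue
                acc = combine(acc, discount(Y[s], w_sj))   # line 10
            new_Y[j] = acc                    # line 13
        Y = new_Y
    return Y


# Bridge topology, nodes ordered A, B, C, D, hypothetical evidence values.
AB, AC, BC = (8, 1, 1), (6, 2, 2), (5, 3, 2)
BD, CD = (7, 2, 1), (4, 4, 2)
M = [[O, AB, AC, O],
     [O, O, BC, BD],
     [O, O, O, CD],
     [O, O, O, O]]
Y = opinion_walk(M, trustor=0, max_depth=3)

# Eq. 4.8 written out explicitly for comparison.
eq_4_8 = combine(discount(AB, BD),
                 discount(combine(AC, discount(AB, BC)), CD))
```

With a depth of 3, the entry for D in `Y` coincides with the expression in Eq. 4.8, and running further iterations leaves the vector unchanged because the longest path from A to D has three hops.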
[Figure 5.3: Illustration of how OpinionWalk processes the bridge topology. Panels: (a) 1st iteration; (b) 2nd iteration; (c) 3rd iteration; (d) 4th iteration. Each panel marks which entries are changed or unchanged. The dashed box shows how the combining operation in OpinionWalk works.]

The first iteration is shown in Fig. 5.3(a) and can be expressed as

\[ Y_A^{(1)} = M^T \odot Y_A^{(0)} = \begin{bmatrix} \Theta(\Delta(I, I)) \\ \Theta(\Delta(I, \omega_{AB})) \\ \Theta(\Delta(I, \omega_{AC}), \Delta(O, \omega_{BC})) \\ \Theta(\Delta(O, \omega_{BD}), \Delta(O, \omega_{CD})) \end{bmatrix} = [I, \Theta(O, \Delta(I, \omega_{AB})), \Theta(O, \Delta(I, \omega_{AC})), O]^T = [I, \omega_{AB}, \omega_{AC}, O]^T. \]

In this iteration, because both $\Omega_{AB}^{(0)}$ and $\Omega_{AC}^{(0)}$ are still $O$, we have

\[ \Omega_{AD}^{(1)} = \Theta(\Delta(\Omega_{AB}^{(0)}, \omega_{BD}), \Delta(\Omega_{AC}^{(0)}, \omega_{CD})) = \Omega_{AD}^{(0)}. \]

The result of the second iteration is shown in Fig. 5.3(b) and can be expressed as

\[ Y_A^{(2)} = M^T \odot Y_A^{(1)} = \left[ \Omega_{AA}^{(2)}, \Omega_{AB}^{(2)}, \Omega_{AC}^{(2)}, \Omega_{AD}^{(2)} \right]^T = \begin{bmatrix} \Theta(\Delta(I, I)) \\ \Theta(\Delta(I, \omega_{AB})) \\ \Theta(\Delta(I, \omega_{AC}), \Delta(\omega_{AB}, \omega_{BC})) \\ \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\omega_{AC}, \omega_{CD})) \end{bmatrix} = [I, \omega_{AB}, \Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\omega_{AC}, \omega_{CD}))]^T. \]

In this iteration, as both $\Omega_{AB}^{(1)}$ and $\Omega_{AC}^{(1)}$ have changed (compared to $\Omega_{AB}^{(0)}$ and $\Omega_{AC}^{(0)}$), we have

\[ \Omega_{AD}^{(2)} = \Theta(\Delta(\Omega_{AB}^{(1)}, \omega_{BD}), \Delta(\Omega_{AC}^{(1)}, \omega_{CD})) = \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\omega_{AC}, \omega_{CD})). \]

The result of the third iteration, shown in Fig. 5.3(c), reflects the following computation:

\[ Y_A^{(3)} = M^T \odot Y_A^{(2)} = \left[ \Omega_{AA}^{(3)}, \Omega_{AB}^{(3)}, \Omega_{AC}^{(3)}, \Omega_{AD}^{(3)} \right]^T = \begin{bmatrix} \Theta(\Delta(I, O)) \\ \Theta(\Delta(I, \omega_{AB})) \\ \Theta(\Delta(I, \omega_{AC}), \Delta(\omega_{AB}, \omega_{BC})) \\ \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \omega_{CD})) \end{bmatrix} = [I, \omega_{AB}, \Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \omega_{CD}))]^T. \]
It is worth mentioning that $\Omega_{AB}^{(2)}$ did not change, but $\Omega_{AC}^{(2)}$ has changed, so we update $\Omega_{AD}$ by substituting $\Omega_{AC}^{(1)}$ with $\Omega_{AC}^{(2)}$ as follows:

\[ \Omega_{AD}^{(3)} = \Theta(\Delta(\Omega_{AB}^{(2)}, \omega_{BD}), \Delta(\Omega_{AC}^{(2)}, \omega_{CD})) = \Theta(\Delta(\Omega_{AB}^{(1)}, \omega_{BD}), \Delta(\Omega_{AC}^{(2)}, \omega_{CD})). \]

In the end, the fourth iteration (see Fig. 5.3(d)) can be expressed as

\[ Y_A^{(4)} = M^T \odot Y_A^{(3)} = \left[ \Omega_{AA}^{(4)}, \Omega_{AB}^{(4)}, \Omega_{AC}^{(4)}, \Omega_{AD}^{(4)} \right]^T = \begin{bmatrix} \Theta(\Delta(I, O)) \\ \Theta(\Delta(I, \omega_{AB})) \\ \Theta(\Delta(I, \omega_{AC}), \Delta(\omega_{AB}, \omega_{BC})) \\ \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \omega_{CD})) \end{bmatrix} = [I, \omega_{AB}, \Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \omega_{CD}))]^T. \]

In this iteration, neither $\Omega_{AB}^{(3)}$ nor $\Omega_{AC}^{(3)}$ changed, so we have

\[ \Omega_{AD}^{(4)} = \Theta(\Delta(\Omega_{AB}^{(3)}, \omega_{BD}), \Delta(\Omega_{AC}^{(3)}, \omega_{CD})) = \Omega_{AD}^{(3)}. \]

The components in the final individual opinion vector are

\[ \Omega_{AA}^{(4)} = I, \quad \Omega_{AB}^{(4)} = \omega_{AB}, \quad \Omega_{AC}^{(4)} = \Theta(\Delta(\omega_{AB}, \omega_{BC}), \omega_{AC}), \quad \Omega_{AD}^{(4)} = \Theta(\Delta(\omega_{AB}, \omega_{BD}), \Delta(\Theta(\omega_{AC}, \Delta(\omega_{AB}, \omega_{BC})), \omega_{CD})), \]

which are exactly the same as those obtained by the AT algorithm.

Correctness of OpinionWalk

To prove OpinionWalk equivalently implements the AT algorithm, we first show both AT and OpinionWalk generate the same result if the network topology is either series or parallel. Then, we show this is true for arbitrary network topologies.

If we zoom into a trust social network, two edges are connected in series if they are incident to a vertex of degree 2, or in parallel if they join the same pair of distinct vertices. Therefore, two users can be connected in a series topology, shown in Fig. 5.4(a), or a parallel topology, shown in Fig. 5.4(b). Note that the paths from $i$ to $s_1, s_2, \cdots, s_m$ in Fig. 5.4(b) are disjoint, i.e., they share no edges.

[Figure 5.4: Illustration of two fundamental topologies in an OSN. Panels: (a) Series topology; (b) Parallel topology.]

Series Network Topology

Lemma 5.0.1. Given two users i and j who are connected b