

suggested Web service. ($P_A$) is the loss that the other agent causes to this one by being obeyed in its place. Consider the Q-learning process: when agent $k$ is the winner and has its Web service executed, all other agents except $k$ update their $W$ values as follows:

$$W_i(x) \leftarrow Q_i(x, a_i) - \big(r_i + \gamma \max_{b \in A} Q_i(y, b)\big), \tag{10}$$

where the reward $r_i$ and the next state $y$ are caused by agent $k$ rather than by this agent itself. This process is described by Algorithm 1.

3.2 Multiple Policy Multi-objective Service Composition

In the second algorithm, the multiple policy service composition problem is solved by introducing the concept of the convex hull into Q-learning based Web service composition [8]. The convex hull is defined as the smallest convex set that contains all of a given set of points. In this case, we mean the points that lie on the boundary of this convex set, i.e., the extreme points, those that are maximal in some direction. This is similar to the Pareto front, since both are maxima over trade-offs in linear domains. The proposed algorithm exploits the fact that the Pareto optimal set of the Q-vectors is the same as the convex hull of these Q-vectors. In order to acquire the set of Pareto optimal service selection policies for all the QoS objectives, the set of vertices of the convex hull of the Q-vectors at state $s$ is updated by the value iteration method:

$$\hat{Q}(s, a) \leftarrow (1 - \alpha)\,\hat{Q}(s, a) + \alpha\big(r(s, a) + \gamma\,\mathrm{hull}_{a'}\,\hat{Q}(s', a')\big), \tag{11}$$

where $\hat{Q}(s, a)$ is the set of vertices of the convex hull of all possible Q-value vectors for taking action $a$ at state $s$, $\alpha$ is the learning rate, $\gamma$ is the discount factor, $r$ is the immediate reward, and the operator $\mathrm{hull}$ extracts the set of vertices of the convex hull from a set of vectors.

Algorithm 2.
Multiple Policy Algorithm
    initialize $\hat{Q}(s, a)$ arbitrarily $\forall s, a$
    while not converged do
        for all $s \in S$, $a \in A$ do
            $\hat{Q}(s, a) \leftarrow (1 - \alpha)\,\hat{Q}(s, a) + \alpha\big(r(s, a) + \gamma\,\mathrm{hull}_{a'}\,\hat{Q}(s', a')\big)$
        end for
    end while
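Since the paper gives no reference implementation, the backup of Eq. (11)/Algorithm 2 can be sketched as follows. This is a minimal, stdlib-only Python sketch under illustrative assumptions, not the authors' code: the toy workflow (two sequential tasks, two candidate services per task, two QoS objectives), the helper names (`hull`, `scale`, `add_vec`, `mix`), and the choice $\alpha = 1$ (admissible here because the toy problem is deterministic) are all ours.

```python
# Sketch of Algorithm 2 (the convex-hull backup of Eq. (11)) on a toy
# composition: 2 sequential tasks, 2 candidate services per task,
# 2 QoS objectives. Stdlib only; the toy MDP, helper names, and
# alpha = 1 (valid because the toy is deterministic) are assumptions.

def _cross(o, a, b):
    # Cross product of OA x OB; > 0 means a left turn at a.
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def hull(vectors):
    """The 'hull' operator: vertices of the convex hull of a set of
    2-D vectors (Andrew's monotone chain)."""
    pts = sorted(set(vectors))
    if len(pts) <= 2:
        return pts
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and _cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    return half(pts) + half(pts[::-1])

def scale(c, vs):
    return [tuple(c * x for x in v) for v in vs]

def add_vec(r, vs):
    return [tuple(ri + xi for ri, xi in zip(r, v)) for v in vs]

def mix(a_, old, new):
    """(1 - a)*Qhat (+) a*target: Minkowski sum of scaled sets, hulled."""
    return hull([tuple(o + n for o, n in zip(u, v))
                 for u in scale(1.0 - a_, old) for v in scale(a_, new)])

# r[s][a]: 2-objective QoS reward for picking service a at task s;
# task 0 leads to task 1, task 1 leads to a terminal state.
r = {0: {0: (4.0, 1.0), 1: (1.0, 4.0)},
     1: {0: (3.0, 0.0), 1: (0.0, 3.0)}}
ALPHA, GAMMA = 1.0, 0.9
Q = {s: {a: [(0.0, 0.0)] for a in (0, 1)} for s in (0, 1)}

for _ in range(5):  # a few sweeps suffice on this deterministic toy
    for s in (0, 1):
        for a in (0, 1):
            # hull over all successor actions (zero set at terminal)
            future = [(0.0, 0.0)] if s == 1 else hull(Q[1][0] + Q[1][1])
            target = add_vec(r[s][a], scale(GAMMA, future))
            Q[s][a] = mix(ALPHA, Q[s][a], target)

for a in (0, 1):
    print("task 0, service", a, "->",
          [tuple(round(x, 2) for x in v) for v in Q[0][a]])
```

At convergence, each $\hat{Q}(0, a)$ holds only the extreme Q-vectors, one per linear QoS-preference direction; interior (dominated) trade-offs are discarded by the `hull` step, which is exactly the convex-hull/Pareto correspondence the section appeals to.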
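For comparison, the single-policy W-value update of Eq. (10) from Sect. 3.1 can be sketched in the same way. The toy Q-values, the agent names, and the helper `w_update` below are illustrative assumptions, not the paper's code.

```python
# Sketch of the W-value update in Eq. (10): when agent k wins the
# competition and its Web service is executed, every other agent i
# records the loss of being overruled. Toy Q-values and the helper
# name `w_update` are illustrative assumptions, not the paper's code.

GAMMA = 0.9  # discount factor

def w_update(Q_i, x, a_i, r_i, y, actions):
    """Eq. (10): W_i(x) <- Q_i(x, a_i) - (r_i + GAMMA * max_b Q_i(y, b)),
    where reward r_i and next state y were produced by the winner k,
    not by agent i's own suggested action a_i."""
    best_next = max(Q_i[(y, b)] for b in actions)
    return Q_i[(x, a_i)] - (r_i + GAMMA * best_next)

actions = (0, 1)  # two candidate services
# One Q-table per QoS agent; hand-picked toy values for a "cost" agent.
Q_cost = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 3.0}

# Suppose another agent wins at state 0; its service yields the "cost"
# agent reward 1.0 and next state 1, while "cost" had suggested a_i = 1.
loss = w_update(Q_cost, x=0, a_i=1, r_i=1.0, y=1, actions=actions)
print("W value for the overruled agent:", round(loss, 3))  # -> 0.3
```

A large W value means the agent loses a lot when overruled in state $x$, so the winner-selection mechanism should let it have its suggested service executed there more often.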


Multi-Objective Service Composition Using Reinforcement Learning 305

Given these definitions, we can now rewrite the Q-learning based Web service composition algorithm [8] in terms of operations on the convex hull of Q-values. In the proposed algorithm, an action is selected based on the dominance relation between Q-vectors, following the $\epsilon$-greedy exploration strategy. This algorithm can be viewed as an extension of [8]: instead of repeatedly backing up maximal expected rewards, it backs up the set of expected rewards that are maximal for some set of linear preferences. The proposed multiple policy Web service composition algorithm is illustrated in Algorithm 2.

4 Simulation Results and Analysis

Two simulation experiments have been conducted to evaluate the proposed algorithms from different perspectives. The first experiment examines the ability of the single policy algorithm to compose Web services with multiple QoS criteria and unknown user preferences. The second experiment examines the efficiency of the second algorithm in learning the set of Pareto optimal compositions consid-
