dc.contributor.author: HARRISON, WILLIAM
dc.date.accessioned: 2011-05-25T09:28:55Z
dc.date.available: 2011-05-25T09:28:55Z
dc.date.created: May 2-6, 2011
dc.date.issued: 2011
dc.date.submitted: 2011
dc.identifier.citation: Dominik Dahlem, Jim Dowling, William Harrison, Cognitive Policy Learner: Biasing Winning or Losing Strategies, Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan, May 2-6, 2011
dc.identifier.other: Y
dc.description: PUBLISHED
dc.description: Taipei, Taiwan
dc.description.abstract: In continuous learning settings, stochastic stable policies are often necessary to ensure that agents continuously adapt to dynamic environments. The choice of decentralised learning system and the employed policy plays an important role in the optimisation task. For example, a policy that exhibits fluctuations may introduce non-linear effects that other agents in the environment cannot cope with and may even amplify. In dynamic and unpredictable multiagent environments these oscillations may introduce instabilities. In this paper, we take inspiration from the limbic system to introduce an extension to the weighted policy learner, where agents evaluate rewards as either positive or negative feedback, depending on how they deviate from average expected rewards. Agents have positive and negative biases, where a bias either magnifies or depresses a positive or negative feedback signal. To contain the non-linear effects of biased rewards, we incorporate a decaying memory of past positive and negative feedback signals to provide a smoother gradient update on the probability simplex, spreading out the effect of the feedback signal over time. Splitting the feedback signal allows more leverage over the win or learn fast (WoLF) principle. The cognitive policy learner is evaluated using a small queueing network and compared with the fair action learner and the weighted policy learner. Emphasis is placed on analysing the dynamics of the learning algorithms with respect to the stability of the queueing network and the overall queueing performance.
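The abstract outlines the mechanism: feedback is the deviation of a reward from the average expected reward, a positive or negative bias scales the split signal, and a decaying memory of past feedback smooths the gradient update on the probability simplex. The sketch below illustrates that idea in Python; the class name, learning rate, bias constants, and averaging rate are all assumptions made for illustration, not the paper's actual update rule.

```python
import numpy as np

# Illustrative sketch only: a biased policy update in the spirit of the
# abstract. All names and constants here are assumptions, not the
# published algorithm.
class CognitivePolicySketch:
    def __init__(self, n_actions, lr=0.1, decay=0.9,
                 pos_bias=1.5, neg_bias=0.5):
        self.pi = np.full(n_actions, 1.0 / n_actions)  # policy on the simplex
        self.avg_reward = 0.0                 # running average expected reward
        self.pos_trace = np.zeros(n_actions)  # decaying memory, positive feedback
        self.neg_trace = np.zeros(n_actions)  # decaying memory, negative feedback
        self.lr, self.decay = lr, decay
        self.pos_bias, self.neg_bias = pos_bias, neg_bias

    def update(self, action, reward):
        # Feedback is the deviation from the average expected reward.
        feedback = reward - self.avg_reward
        self.avg_reward += 0.05 * feedback  # slow-moving average (assumed rate)

        # Decay the memories, then split the signal and apply the biases:
        # a bias magnifies or depresses the feedback before it is stored.
        self.pos_trace *= self.decay
        self.neg_trace *= self.decay
        if feedback >= 0:
            self.pos_trace[action] += self.pos_bias * feedback
        else:
            self.neg_trace[action] += self.neg_bias * feedback

        # Smoothed gradient step from both traces, then re-project the
        # policy onto the probability simplex.
        self.pi += self.lr * (self.pos_trace + self.neg_trace)
        self.pi = np.clip(self.pi, 1e-6, None)
        self.pi /= self.pi.sum()

    def act(self, rng=np.random):
        return rng.choice(len(self.pi), p=self.pi)
```

Because the traces spread each feedback signal over several updates, a single large deviation shifts the policy gradually rather than in one jump, which is the smoothing effect the abstract attributes to the decaying memory.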
dc.language.iso: en
dc.rights: Y
dc.subject: Distributed Artificial Intelligence
dc.subject: Multiagent Reinforcement Learning
dc.subject: Stochastic Policies
dc.title: Cognitive Policy Learner: Biasing Winning or Losing Strategies
dc.title.alternative: Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems
dc.type: Conference Paper
dc.type.supercollection: scholarly_publications
dc.type.supercollection: refereed_publications
dc.identifier.peoplefinderurl: http://people.tcd.ie/harrisow
dc.identifier.rssinternalid: 70328
dc.subject.TCDTheme: Smart & Sustainable Planet
dc.contributor.sponsor: Science Foundation Ireland (SFI)
dc.identifier.uri: http://hdl.handle.net/2262/55995

