dc.contributor.author: HARRISON, WILLIAM
dc.date.accessioned: 2011-05-25T09:28:55Z
dc.date.available: 2011-05-25T09:28:55Z
dc.date.created: May 2-6, 2011
dc.date.issued: 2011
dc.date.submitted: 2011
dc.identifier.citation: Dominik Dahlem, Jim Dowling, William Harrison, Cognitive Policy Learner: Biasing Winning or Losing Strategies, Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan, May 2-6, 2011
dc.identifier.other: Y
dc.description: PUBLISHED
dc.description: Taipei, Taiwan
dc.description.abstract: In continuous learning settings, stochastic stable policies are often necessary to ensure that agents continuously adapt to dynamic environments. The choice of decentralised learning system and the employed policy plays an important role in the optimisation task. For example, a policy that exhibits fluctuations may introduce non-linear effects that other agents in the environment cannot cope with and may even amplify. In dynamic and unpredictable multiagent environments these oscillations may introduce instabilities. In this paper, we take inspiration from the limbic system to introduce an extension to the weighted policy learner, where agents evaluate rewards as either positive or negative feedback, depending on how they deviate from average expected rewards. Agents have positive and negative biases, where a bias either magnifies or depresses a positive or negative feedback signal. To contain the non-linear effects of biased rewards, we incorporate a decaying memory of past positive and negative feedback signals to provide a smoother gradient update on the probability simplex, spreading out the effect of the feedback signal over time. Splitting the feedback signal allows more leverage over the win or learn fast (WoLF) principle. The cognitive policy learner is evaluated using a small queueing network and compared with the fair action learner and the weighted policy learner. Emphasis is placed on analysing the dynamics of the learning algorithms with respect to the stability of the queueing network and the overall queueing performance.
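The abstract outlines the mechanism: feedback is the deviation of a reward from the average expected reward, a positive or negative bias scales the split signal, and a decaying memory of past feedback smooths the gradient update on the probability simplex. The sketch below illustrates that idea in Python; the class name, learning rate, bias constants, and averaging rate are all assumptions made for illustration, not the paper's actual update rule.

```python
import numpy as np

# Illustrative sketch only: a biased policy update in the spirit of the
# abstract. All names and constants here are assumptions, not the
# published algorithm.
class CognitivePolicySketch:
    def __init__(self, n_actions, lr=0.1, decay=0.9,
                 pos_bias=1.5, neg_bias=0.5):
        self.pi = np.full(n_actions, 1.0 / n_actions)  # policy on the simplex
        self.avg_reward = 0.0                 # running average expected reward
        self.pos_trace = np.zeros(n_actions)  # decaying memory, positive feedback
        self.neg_trace = np.zeros(n_actions)  # decaying memory, negative feedback
        self.lr, self.decay = lr, decay
        self.pos_bias, self.neg_bias = pos_bias, neg_bias

    def update(self, action, reward):
        # Feedback is the deviation from the average expected reward.
        feedback = reward - self.avg_reward
        self.avg_reward += 0.05 * feedback  # slow-moving average (assumed rate)

        # Decay the memories, then split the signal and apply the biases:
        # a bias magnifies or depresses the feedback before it is stored.
        self.pos_trace *= self.decay
        self.neg_trace *= self.decay
        if feedback >= 0:
            self.pos_trace[action] += self.pos_bias * feedback
        else:
            self.neg_trace[action] += self.neg_bias * feedback

        # Smoothed gradient step from both traces, then re-project the
        # policy onto the probability simplex.
        self.pi += self.lr * (self.pos_trace + self.neg_trace)
        self.pi = np.clip(self.pi, 1e-6, None)
        self.pi /= self.pi.sum()

    def act(self, rng=np.random):
        return rng.choice(len(self.pi), p=self.pi)
```

Because the traces spread each feedback signal over several updates, a single large deviation shifts the policy gradually rather than in one jump, which is the smoothing effect the abstract attributes to the decaying memory.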
dc.language.iso: en
dc.rights: Y
dc.subject: Distributed Artificial Intelligence
dc.subject: Multiagent Reinforcement Learning
dc.subject: Stochastic Policies
dc.title: Cognitive Policy Learner: Biasing Winning or Losing Strategies
dc.title.alternative: Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems
dc.type: Conference Paper
dc.type.supercollection: scholarly_publications
dc.type.supercollection: refereed_publications
dc.identifier.peoplefinderurl: http://people.tcd.ie/harrisow
dc.identifier.rssinternalid: 70328
dc.subject.TCDTheme: Smart & Sustainable Planet
dc.contributor.sponsor: Science Foundation Ireland (SFI)
dc.identifier.uri: http://hdl.handle.net/2262/55995

