Uwe Lorenz
Reinforcement Learning From Scratch
Understanding Current Approaches with Examples in Java and Greenfoot

Uwe Lorenz
Neckargemünd, Baden-Württemberg, Germany

ISBN 978-9-5    ISBN 978-0-1 (eBook)

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

This book is a translation of the original German edition "Reinforcement Learning" by Lorenz, Uwe, published by Springer-Verlag GmbH, DE in 2020. The translation was done with the help of artificial intelligence (machine translation by the service ). A subsequent human revision was done primarily in terms of content, so that the book will read stylistically differently from a conventional translation. Springer Nature works continuously to further the development of tools for the production of books and on the related technologies to support the authors.
Table of Contents

Preface
Introduction
Contents

1: Reinforcement Learning as a Subfield of Machine Learning
1.1 Machine Learning as Automated Processing of Feedback from the Environment
1.2 Machine Learning
1.3 Reinforcement Learning with Java
Bibliography

2: Basic Concepts of Reinforcement Learning
2.1 Agents
2.2 The Policy of the Agent
2.3 Evaluation of States and Actions (Q-Function, Bellman Equation)
Bibliography

3: Optimal Decision-Making in a Known Environment
3.1 Value Iteration
3.1.1 Target-Oriented Condition Assessment ("Backward Induction")
3.1.2 Policy-Based State Valuation (Reward Prediction)
3.2 Iterative Policy Search
3.2.1 Direct Policy Improvement
3.2.2 Mutual Improvement of Policy and Value Function
3.3 Optimal Policy in a Board Game Scenario
3.4 Summary
Bibliography

4: Decision-Making and Learning in an Unknown Environment
4.1 Exploration vs. Exploitation
4.2 Retroactive Processing of Experience ("Model-Free Reinforcement Learning")
4.2.1 Goal-Oriented Learning ("Value-Based")
    Subsequent Evaluation of Complete Episodes ("Monte Carlo" Method)
    Immediate Valuation Using the Temporal Difference (Q- and SARSA Algorithm)
    Consideration of the Action History (Eligibility Traces)
4.2.2 Policy Search
    Monte Carlo Tactics Search
    Evolutionary Strategies
    Monte Carlo Policy Gradient (REINFORCE)
4.2.3 Combined Methods (Actor-Critic)
    "Actor-Critic" Policy Gradients
    Technical Improvements to the Actor-Critic Architecture
    Feature Vectors and Partially Observable Environments
4.3 Exploration with Predictive Simulations ("Model-Based Reinforcement Learning")
4.3.1 Dyna-Q
4.3.2 Monte Carlo Rollout
4.3.3 Artificial Curiosity
4.3.4 Monte Carlo Tree Search (MCTS)
4.3.5 Remarks on the Concept of Intelligence
4.4 Systematics of the Learning Methods
Bibliography

5: Artificial Neural Networks as Estimators for State Values and the Action Selection
5.1 Artificial Neural Networks
5.1.1 Pattern Recognition with the Perceptron
5.1.2 The Adaptability of Artificial Neural Networks
5.1.3 Backpropagation Learning
5.1.4 Regression with Multilayer Perceptrons
5.2 State Evaluation with Generalizing Approximations
5.3 Neural Estimators for Action Selection
5.3.1 Policy Gradient with Neural Networks
5.3.2 Proximal Policy Optimization
5.3.3 Evolutionary Strategy with a Neural Policy
Bibliography

6: Guiding Ideas in Artificial Intelligence over Time
6.1 Changing Guiding Ideas
6.2 On the Relationship Between Humans and Artificial Intelligence
Bibliography