Examples of using Q-learning in English and their translations into Chinese
Reinforcement learning, see Q-learning;
The Q-learning update formula is:
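For reference, with learning rate \alpha, discount factor \gamma, reward r_{t+1}, and next state s_{t+1} (the usual textbook symbols, not taken from the examples on this page), the tabular Q-learning update is conventionally written as:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]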
One approach to the above-discussed problem is called Q-learning.
Q-learning is about learning Q-values through observations.
A simple description of Q-learning can be summarized as follows:
Q-learning is one of the easiest Reinforcement Learning algorithms.
There is a simple procedure to learn all the Q-values called Q-learning.
These include Q-Learning, SARSA and some other variants.
The tools you will use will be TD-Learning, Q-Learning and genetic algorithms.
The tools that you would use include TD-Learning, Q-Learning and genetic algorithms.
Machine learning approaches such as reinforcement learning and, in particular, Q-learning might be applicable in this context.
Q-learning is a value-based learning algorithm in reinforcement learning.
Reinforcement learning (Q-learning, temporal difference learning).
Q-Learning is considered to be one of the most important breakthroughs in Reinforcement Learning.
This is formulated as a Markov Decision Process (MDP), and Q-learning is used to perform the optimization.
The popular Q-learning algorithm is known to overestimate action values under certain conditions.
In this course, you will be introduced to the foundation of RL methods, such as value/policy iteration, Q-learning, policy gradient, and many more.
In 2015, DeepMind showed its Deep Q-learning AI figuring out how to play Atari Breakout.
Currently, there are a multitude of algorithms that can be used to perform TD control, including Sarsa, Q-learning, and Expected Sarsa.
Additionally, Q-learning can handle problems with stochastic transitions and rewards, without requiring adaptations.
Unlike policy learning, Q-Learning takes two inputs, state and action, and returns a value for each pair.
By contrast, Q-learning has no constraint over the next action, as long as it maximizes the Q-value for the next state.
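To illustrate the last two points, namely that the Q-function takes a state-action pair as input and that the update target uses whichever next action maximizes the Q-value, here is a minimal tabular Q-learning sketch in Python. The action names, state identifiers, reward, and hyperparameters are hypothetical placeholders, not drawn from any of the sources quoted above.

    import random
    from collections import defaultdict

    alpha, gamma, epsilon = 0.1, 0.99, 0.1      # learning rate, discount factor, exploration rate (assumed values)
    actions = ["up", "down", "left", "right"]   # hypothetical action set
    Q = defaultdict(float)                      # Q[(state, action)] -> estimated value, default 0.0

    def choose_action(state):
        # Epsilon-greedy behaviour policy: usually exploit current Q-values, sometimes explore.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state):
        # Off-policy target: the value of the best next action, regardless of
        # which action the behaviour policy will actually take in next_state.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    # Example of a single update with made-up state names and reward:
    q_update("cell_0_0", "right", 1.0, "cell_0_1")

In a complete agent, choose_action and q_update would be called repeatedly inside an environment-interaction loop; because the target uses the maximizing next action rather than the action actually taken, the method is off-policy.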