Gym Documentation

Contents
Introduction
Basic Usage
API
Core
Spaces
Wrappers
Vector
Utils
Environments
Atari
Adventure
Air Raid
Alien
Amidar
Assault
Asterix
Asteroids
Atlantis
Bank Heist
Battle Zone
Beam Rider
Berzerk
Bowling
Boxing
Breakout
Carnival
Centipede
Chopper Command
Crazy Climber
Defender
Demon Attack
Double Dunk
Elevator Action
Enduro
FishingDerby
Freeway
Frostbite
Gopher
Gravitar
Hero
IceHockey
Jamesbond
JourneyEscape
Kangaroo
Krull
Kung Fu Master
Montezuma Revenge
Ms Pacman
Name This Game
Phoenix
Pitfall
Pong
Pooyan
PrivateEye
Qbert
Riverraid
Road Runner
Robot Tank
Seaquest
Skiing
Solaris
SpaceInvaders
StarGunner
Tennis
TimePilot
Tutankham
Up n’ Down
Venture
Video Pinball
Wizard of Wor
Zaxxon
MuJoCo
Ant
Half Cheetah
Hopper
Humanoid Standup
Humanoid
Inverted Double Pendulum
Inverted Pendulum
Reacher
Swimmer
Walker2D
Toy Text
Blackjack
Taxi
Cliff Walking
Frozen Lake
Classic Control
Acrobot
Cart Pole
Mountain Car Continuous
Mountain Car
Pendulum
Box2D
Bipedal Walker
Car Racing
Lunar Lander
Third Party Environments
Tutorials
Make your own custom environment
Vectorising your environments
Development
Github
Contribute to the Docs
Gym is a standard API for reinforcement learning, and a diverse collection of reference environments
The Gym interface is simple, pythonic, and capable of representing general RL problems:
import gym
env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = policy(observation)  # User-defined policy function
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
All development of Gym has been moved to Gymnasium, a new package in the Farama Foundation that's maintained by the same team of developers who have maintained Gym for the past 18 months. If you're already using the latest release of Gym (v0.26.2), then you can switch to v0.27.0 of Gymnasium by simply replacing import gym with import gymnasium as gym with no additional steps. Gym will not be receiving any future updates or bug fixes, and no further changes will be made to the core API in Gymnasium.
Read more about the Farama Foundation and the backstory of the transition from Gym to Gymnasium
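A minimal sketch of that switch (assuming Gymnasium v0.27 is installed with the Box2D extra for LunarLander): only the import line changes relative to the Gym example above, and the random action here stands in for a user-defined policy.

import gymnasium as gym  # drop-in replacement for "import gym"

env = gym.make("LunarLander-v2", render_mode="human")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()  # stand-in for a user-defined policy
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()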
OpenAI Gym Beta
April 27, 2016

We're releasing the public beta of OpenAI Gym, a toolkit for developing and comparing reinforcement learning (RL) algorithms. It consists of a growing suite of environments (from simulated robots to Atari games), and a site for comparing and reproducing results.

OpenAI Gym is compatible with algorithms written in any framework, such as Tensorflow and Theano. The environments are written in Python, but we'll soon make them easy to use from any language. We originally built OpenAI Gym as a tool to accelerate our own RL research. We hope it will be just as useful for the broader community.

Getting started

If you'd like to dive in right away, you can work through our tutorial. You can also help out while learning by reproducing a result.

Why RL?

Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It's exciting for two reasons:

RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot's motors so that it's able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.

RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind's Atari results, BRETT from Pieter Abbeel's group, and AlphaGo all used deep RL algorithms which did not make too many assumptions about their environment, and thus can be applied in other settings.

However, RL research is also slowed down by two factors:

The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don't have enough variety, and they are often difficult to even set up and use.

Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task's difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

OpenAI Gym is an attempt to fix both problems.

The environments

OpenAI Gym provides a diverse suite of environments that range from easy to difficult and involve many different kinds of data. We're starting out with the following collections:

Classic control and toy text: complete small-scale tasks, mostly from the RL literature. They're here to get you started.

Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences. One might object that these tasks are easy for a computer. The challenge is to learn these algorithms purely from examples. These tasks have the nice property that it's easy to vary the difficulty by varying the sequence length.

Atari: play classic Atari games. We've integrated the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.

Board games: play Go on 9x9 and 19x19 boards. Two-player games are fundamentally different than the other settings we've included, because there is an adversary playing against you. In our initial release, there is a fixed opponent provided by Pachi, and we may add other opponents later (patches welcome!). We'll also likely expand OpenAI Gym to have first-class support for multi-player games.

2D and 3D robots: control a robot in simulation. These tasks use the MuJoCo physics engine, which was designed for fast and accurate robot simulation. Included are some environments from a recent benchmark by UC Berkeley researchers (who incidentally will be joining us this summer). MuJoCo is proprietary software, but offers free trial licenses.

Over time, we plan to greatly expand this collection of environments. Contributions from the community are more than welcome.

Each environment has a version number (such as Hopper-v0). If we need to change an environment, we'll bump the version number, defining an entirely new task. This ensures that results on a particular environment are always comparable.

Evaluations

We've made it easy to upload results to OpenAI Gym. However, we've opted not to create traditional leaderboards. What matters for research isn't your score (it's possible to overfit or hand-craft solutions to particular tasks), but instead the generality of your technique.

We're starting out by maintaining a curated list of contributions that say something interesting about algorithmic capabilities. Long-term, we want this curation to be a community effort rather than something owned by us. We'll necessarily have to figure out the details over time, and we'd love your help in doing so.

We want OpenAI Gym to be a community effort from the beginning. We've started working with partners to put together resources around OpenAI Gym:

NVIDIA: technical Q&A with John.

Nervana: implementation of a DQN OpenAI Gym agent.

Amazon Web Services (AWS): $250 credit vouchers for select OpenAI Gym users. If you have an evaluation demonstrating the promise of your algorithm and are resource-constrained from scaling it up, ping us for a voucher. (While supplies last!)

During the public beta, we're looking for feedback on how to make this into an even better tool for research. If you'd like to help, you can try your hand at improving the state-of-the-art on each environment, reproducing other people's results, or even implementing your own environments. Also please join us in the community chat!

Authors
Greg Brockman
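As an illustrative sketch (not from the original post), this is roughly what the Gym API looked like at the time of the beta, when step() returned four values and reset() returned only the observation; the versioned ID "CartPole-v0" is one of the classic control tasks mentioned above.

import gym

# Versioned environment IDs (e.g. "CartPole-v0") keep results comparable:
# if the task definition changes, the version suffix is bumped instead.
env = gym.make("CartPole-v0")
observation = env.reset()

for _ in range(200):
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        observation = env.reset()
env.close()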
GitHub - openai/gym: A toolkit for developing and comparing reinforcement learning algorithms.
Important Notice
The team that has been maintaining Gym since 2021 has moved all future development to Gymnasium, a drop-in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates. Please switch over to Gymnasium as soon as you're able to do so. If you'd like to read more about the story behind this switch, please check out this blog post.
Gym
Gym is an open source Python library for developing and comparing reinforcement learning algorithms by providing a standard API to communicate between learning algorithms and environments, as well as a standard set of environments compliant with that API. Since its release, Gym's API has become the field standard for doing this.
The Gym documentation website is at https://www.gymlibrary.dev/, and you can propose fixes and changes to it here.
Gym also has a discord server for development purposes that you can join here: https://discord.gg/nHg2JRN489
Installation
To install the base Gym library, use pip install gym.
This does not include dependencies for all families of environments (there's a massive number, and some can be problematic to install on certain systems). You can install these dependencies for one family like pip install gym[atari] or use pip install gym[all] to install all dependencies.
We support Python 3.7, 3.8, 3.9 and 3.10 on Linux and macOS. We will accept PRs related to Windows, but do not officially support it.
API
The Gym API models environments as simple Python env classes. Creating environment instances and interacting with them is very simple; here's an example using the "CartPole-v1" environment:
import gym
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()
env.close()
Notable Related Libraries
Please note that this is an incomplete list, and just includes libraries that the maintainers most commonly point newcomers to when asked for recommendations.
CleanRL is a learning library based on the Gym API. It is designed to cater to newer people in the field and provides very good reference implementations.
Tianshou is a learning library that's geared towards very experienced users and is designed to allow for ease in complex algorithm modifications.
RLlib is a learning library that allows for distributed training and inferencing and supports an extraordinarily large number of features throughout the reinforcement learning space.
PettingZoo is like Gym, but for environments with multiple agents.
Environment Versioning
Gym keeps strict versioning for reproducibility reasons. All environments end in a suffix like "-v0". When changes are made to environments that might impact learning results, the number is increased by one to prevent potential confusion.
MuJoCo Environments
The latest "-v4" and future versions of the MuJoCo environments will no longer depend on mujoco-py. Instead mujoco will be the required dependency for future gym MuJoCo environment versions. Old gym MuJoCo environment versions that depend on mujoco-py will still be kept but unmaintained.
To install the dependencies for the latest gym MuJoCo environments use pip install gym[mujoco]. Dependencies for old MuJoCo environments can still be installed by pip install gym[mujoco_py].
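A minimal sketch (assuming the respective extras above are installed): the choice of bindings is made purely by the environment version suffix.

import gym

env_v4 = gym.make("HalfCheetah-v4")  # new mujoco bindings, via pip install gym[mujoco]
env_v3 = gym.make("HalfCheetah-v3")  # old mujoco-py bindings, via pip install gym[mujoco_py]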
Citation
A whitepaper from when Gym first came out is available at https://arxiv.org/pdf/1606.01540, and can be cited with the following bibtex entry:
@misc{1606.01540,
Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
Title = {OpenAI Gym},
Year = {2016},
Eprint = {arXiv:1606.01540},
}
Release Notes
There used to be release notes for all the new Gym versions here. New release notes are being moved to the releases page on GitHub, like most other libraries do. Old notes can be viewed here.
Tutorials - Gym Documentation
Tutorials

Getting Started With OpenAI Gym: The Basic Building Blocks

https://blog.paperspace.com/getting-started-with-openai-gym/

A good starting point explaining all the basic building blocks of the Gym API.

Reinforcement Q-Learning from Scratch in Python with OpenAI Gym

https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/

A good algorithmic introduction to reinforcement learning, showcasing how to use the Gym API for training agents.

Tutorial: An Introduction to Reinforcement Learning Using OpenAI Gym

https://www.gocoder.one/blog/rl-tutorial-with-openai-gym

An Introduction to Reinforcement Learning with OpenAI Gym, RLlib, and Google Colab

https://www.anyscale.com/blog/an-introduction-to-reinforcement-learning-with-openai-gym-rllib-and-google

Intro to RLlib: Example Environments

https://medium.com/distributed-computing-with-ray/intro-to-rllib-example-environments-3a113f532c70

Ray and RLlib for Fast and Parallel Reinforcement Learning

https://towardsdatascience.com/ray-and-rllib-for-fast-and-parallel-reinforcement-learning-6d31ee21c96c
Introduction to OpenAI Gym - Zhihu (by Juror8)

0. Overview

Gym is a toolkit for developing and comparing RL algorithms. It is compatible with most numerical computing libraries, such as TensorFlow and Theano. The Gym library mainly provides a collection of test environments with a shared data interface, so that we can test our agents and deploy general-purpose algorithms.

1. Installation

Gym requires Python 3.5+, and installation is simple with pip:

pip install gym

Installing from GitHub: if you later want to modify or add environments, it is more convenient to clone the Git repository directly:

git clone https://github.com/openai/gym
cd gym
pip install -e .

2. Testing the environment

After installation, you can test it with a classic small example, CartPole. The code below runs the predefined instance CartPole-v0 for 1000 steps, rendering at every step; a window should pop up showing the rendered result.

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()

Warnings may appear when calling this; they can be ignored. If you want to try other environments, replace CartPole-v0 with MountainCar-v0, MsPacman-v0 (requires the Atari dependency), or Hopper-v1 (requires the MuJoCo dependency). All environments derive from the Env base class.

3. Observations

The run above uses random actions; to do better, we need a grasp of the environment's parameters. The environment's step function returns exactly the values we need:

observation (object): an environment-specific object representing our observation of the environment, for example pixel data from a camera, a robot's joint angles and joint velocities, or the board state in a board game. This corresponds to the "state" in the usual definition of reinforcement learning.

reward (float): the reward obtained for executing the previous action, as in reinforcement learning.

done (boolean): whether the environment should be reset, typically used to determine whether the episode has ended.

info (dict): diagnostic information useful for debugging. It is rarely used (for example, it may contain the raw probabilities behind the environment's last state change). The official evaluations do not allow the agent to use this information for learning.

This is an implementation of the classic "agent-environment loop": at each timestep, the agent chooses an action, and the environment returns an observation and a reward. The process starts by calling reset(), which returns an initial observation. A more correct version of the earlier code therefore takes the done flag into account:

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print(f"Episode finished after {t + 1} timesteps")
            break
env.close()

With this you can watch the pole repeatedly fail and the environment restart. A fragment of the output:

[-0.061586   -0.75893141  0.05793238  1.15547541]
[-0.07676463 -0.95475889  0.08104189  1.46574644]
[-0.0958598  -1.15077434  0.11035682  1.78260485]
[-0.11887529 -0.95705275  0.14600892  1.5261692 ]
[-0.13801635 -0.7639636   0.1765323   1.28239155]
[-0.15329562 -0.57147373  0.20218013  1.04977545]
Episode finished after 14 timesteps
[-0.02786724  0.00361763 -0.03938967 -0.01611184]
[-0.02779488 -0.19091794 -0.03971191  0.26388759]
[-0.03161324  0.00474768 -0.03443415 -0.04105167]

4. Spaces

In the example above, we sampled random actions from the environment's action space. That action space is well defined: every environment has an action space (action_space) and an observation space (observation_space), both attributes of the Space class.

import gym
env = gym.make('CartPole-v0')
print(env.action_space)
# Discrete(2)
print(env.observation_space)
# Box(4,)

A Discrete space is a fixed range of non-negative integers, so in this case the valid actions are 0 or 1. A Box represents an n-dimensional space; in the code above, valid observations are arrays of 4 numbers, and we can check their bounds:

print(env.observation_space.high)
# array([ 2.4,  inf,  0.20943951,  inf])
print(env.observation_space.low)
# array([-2.4, -inf, -0.20943951, -inf])

5. Other available environments

Gym ships with many environments, from easy to hard, involving many different kinds of data. The main categories include:

Classic control and toy text: small-scale tasks, mostly from the RL literature; good for getting started.

Algorithmic: perform computations such as adding multi-digit numbers and reversing sequences; the difficulty can easily be varied by changing the sequence length.

Atari: classic Atari games, integrating the Arcade Learning Environment (which has had a big impact on reinforcement learning research) in an easy-to-install form.

2D and 3D robots: control simulated robots. These tasks use the MuJoCo physics engine, designed for fast and accurate robot simulation, and include some environments from a recent benchmark by UC Berkeley researchers.

5.1 Registry

Gym's main purpose is to provide a large collection of environments that share a common interface and are versioned to make comparisons easy. To list the environments available in your installation, query gym.envs.registry:

from gym import envs
print(envs.registry.all())
# [EnvSpec(DoubleDunk-v0), EnvSpec(InvertedDoublePendulum-v0), EnvSpec(BeamRider-v0),
#  EnvSpec(Phoenix-ram-v0), EnvSpec(Asterix-v0), EnvSpec(TimePilot-v0),
#  EnvSpec(Alien-v0), EnvSpec(Robotank-ram-v0), EnvSpec(CartPole-v0),
#  EnvSpec(Berzerk-v0), EnvSpec(Berzerk-ram-v0), EnvSpec(Gopher-ram-v0), ...

This prints all EnvSpec objects. Each one is the environment for a particular task and defines all required parameters, including the number of trials to run and the maximum number of steps. For example, EnvSpec(Hopper-v1) defines an environment whose goal is to make a 2D simulated robot hop, and EnvSpec(Go9x9-v0) defines a Go game on a 9x9 board.

These environment IDs are treated as opaque strings. To guarantee valid comparisons in the future, these predefined environments will never be changed in a way that affects performance; only new versions are added. Each environment ID currently carries a version suffix such as v0, v1, v2, and so on.

Adding your own environment is easy: register it at the start of your program, after which it can be created through gym.make() (a registration sketch follows).
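A hypothetical sketch of such a registration; MyEnv, my_package.envs, and the ID "MyEnv-v0" are made-up names standing in for your own environment class.

import gym
from gym.envs.registration import register

register(
    id="MyEnv-v0",                        # follows the usual "<Name>-v<version>" convention
    entry_point="my_package.envs:MyEnv",  # import path to your Env subclass (hypothetical)
    max_episode_steps=200,
)

env = gym.make("MyEnv-v0")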
OpenAI Gym Study Notes: Getting Started - Zhihu (by 立伟)

Gym is a toolkit for developing and comparing reinforcement learning algorithms. It makes no assumptions about the target system and is compatible with existing libraries (such as TensorFlow and Theano). Gym is a collection of many test problems with different environments that we can use to develop our own reinforcement learning algorithms; since these environments share a common interface, we can write general-purpose algorithms.

Installing Gym

Before installing Gym we need Python 3.5 or later; installation is simple:

pip install gym

Downloading the source: clone the Gym code from GitHub. This is convenient if we plan to modify Gym or add environments:

git clone https://github.com/openai/gym
cd gym
pip install -e .

Later on we can run pip install -e .[all] to install all environments; this requires some extra tools, including cmake and a recent version of pip.

Walking through an environment

Here is the simplest possible example: we run a CartPole-v0 environment for 1000 timesteps, rendering at every step, and we can watch the rendered cart-pole model move.

import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample())  # take a random action
env.close()

Normally the simulation ends before the cart-pole leaves the video frame; for now we ignore the step() warning, even if the environment has already returned done = True. To try other tasks, replace CartPole-v0 in the code above with, for example, MountainCar-v0, MsPacman-v0 (requires the Atari dependency), or Hopper-v1 (requires the MuJoCo dependency). All environments inherit from the Env base class.

Observations

The environment's step function returns the observations we need, four values in total:

observation (object): an environment-specific observation, such as camera data, angles, angular velocities, and so on;

reward (float): the reward produced by the previous action; the goal is always to increase the total reward;

done (boolean): whether it is time to reset the environment. Most tasks are divided into well-defined episodes; when done becomes True, the episode has ended;

info (dict): diagnostic information that is useful for debugging.

This is the classic "agent-environment" loop: each action by the agent produces an observation and a reward from the environment. The whole process starts with the reset() function, which returns an observation. A better version of the earlier code checks done explicitly and reports when the episode ends:

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()

Spaces

Every environment has an action_space and an observation_space, both of type Space; they describe the format of valid actions and observations:

import gym
env = gym.make('CartPole-v0')
print(env.action_space)
#> Discrete(2)
print(env.observation_space)
#> Box(4,)

A Discrete space allows actions that are non-negative integers within a fixed range; in this example the valid actions are 0 or 1. A Box space is n-dimensional, so valid observations here are 4-dimensional vectors, and we can inspect the bounds of the Box:

print(env.observation_space.high)
#> array([ 2.4,  inf,  0.20943951,  inf])
print(env.observation_space.low)
#> array([-2.4, -inf, -0.20943951, -inf])

Box and Discrete are the most common Spaces. We can sample from a Space or check whether it contains an element:

from gym import spaces
space = spaces.Discrete(8)  # Set with 8 elements {0, 1, 2, ..., 7}
x = space.sample()
assert space.contains(x)
assert space.n == 8

Available environments

Gym has many environments, from easy to hard, involving many kinds of data; see the full list of environments for what is available.

Classic control and toy text: small-scale tasks, mostly from the reinforcement learning literature.

Algorithmic: computational tasks such as adding multi-digit numbers and reversing sequences.

Atari: classic Atari games; Gym integrates the Arcade Learning Environment in an easy-to-install form.

2D and 3D robots: control robots in simulation. These tasks use the MuJoCo physics engine, which is designed for fast and accurate robot simulation.

Registry

Gym's main purpose is to provide a large number of environments that expose a common interface and are versioned for comparison. To list the environments available in your installation, query gym.envs.registry:

from gym import envs
print(envs.registry.all())
#> [EnvSpec(DoubleDunk-v0), EnvSpec(InvertedDoublePendulum-v0), EnvSpec(BeamRider-v0), EnvSpec(Phoenix-ram-v0), EnvSpec(Asterix-v0), EnvSpec(TimePilot-v0), EnvSpec(Alien-v0), EnvSpec(Robotank-ram-v0), EnvSpec(CartPole-v0), EnvSpec(Berzerk-v0), EnvSpec(Berzerk-ram-v0), EnvSpec(Gopher-ram-v0), ...
Releases · openai/gym · GitHub
0.26.2
04 Oct 16:39
jkterry1
Release notes
This is another very minor bug release.
Bug Fixes
As reset now returns (obs, info), in the vector environments this caused the final step's info to be overwritten. Now, the final observation and info are contained within the info as "final_observation" and "final_info" (a sketch follows this list). @pseudo-rnd-thoughts
Adds warnings when trying to render without specifying the render_mode @younik
Updates Atari Preprocessing such that the wrapper can be pickled @vermouth1992
GitHub CI was hardened such that the CI just has read permissions @sashashura
Clarify and fix typo in GraphInstance @ekalosak
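A minimal sketch (assuming gym v0.26.2) of reading these keys from an autoreset vector environment; they are only present on steps where at least one sub-environment finished its episode.

import gym

envs = gym.vector.make("CartPole-v1", num_envs=3)
observations, infos = envs.reset(seed=42)

for _ in range(200):
    actions = envs.action_space.sample()
    observations, rewards, terminations, truncations, infos = envs.step(actions)

    if "final_observation" in infos:
        # Terminal observation/info of the sub-envs that just ended, before autoreset.
        final_obs = infos["final_observation"]
        final_info = infos["final_info"]

envs.close()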
Contributors
ekalosak, vermouth1992, and 3 other contributors
0.26.1
16 Sep 20:40
jkterry1
Release Notes
This is a very minor bug fix release for 0.26.0
Bug Fixes
#3072 - Previously mujoco was a necessary module even if only mujoco-py was used. This has been fixed to allow only mujoco-py to be installed and used. @YouJiacheng
#3076 - PixelObservationWrapper raises an exception if the env.render_mode is not specified. @vmoens
#3080 - Fixed bug in CarRacing where the colour of the wheels was not correct @foxik
#3083 - Fixed BipedalWalker where if the agent moved backwards then the rendered arrays would be a different size. @younik
Spelling
Fixed truncation typo in readme API example @rdnfn
Updated pendulum observation space from angle to theta to make more consistent @ikamensh
Contributors
foxik, ikamensh, and 4 other contributors
0.26.0
06 Sep 18:23
jkterry1
Release notes for v0.26.0
This release is aimed to be the last of the major API changes to the core API. All of the previously "turned off" changes of the base API (step termination / truncation, reset info, no seed function, render mode determined by initialization) are now expected by default. We still plan to make breaking changes to Gym itself, but to things that are very easy to upgrade (environments and wrappers), and things that aren't super commonly used (the vector API). Once those aspects are stabilized, we'll do a proper 1.0 release and follow semantic versioning. Additionally, unless something goes terribly wrong with this release and we have to release a patched version, this will be the last release of Gym for a while.
If you've been waiting for a "stable" release of Gym to upgrade your project given all the changes that have been going on, this is the one.
We also just wanted to say that we tremendously appreciate the community's patience with us as we've gone on this journey taking over the maintenance of Gym and making all of these huge changes to the core API. We appreciate your patience and support, but hopefully, all the changes from here on out will be much more minor.
Breaking backward compatibility
These changes are true of all gym's internal wrappers and environments, but for environments not yet updated, we provide the EnvCompatibility wrapper for users to convert old gym v21/v22 environments to the new core API. This wrapper can be easily applied in gym.make and gym.register through the apply_api_compatibility parameter (a sketch follows this list).
Step Termination / truncation - The Env.step function returns 5 values instead of 4 previously (observations, reward, termination, truncation, info). A blog with more details will be released soon to explain this decision. @arjun-kg
Reset info - The Env.reset function returns two values (obs and info) with no return_info parameter for gym wrappers and environments. This is important for some environments that provided action-masking information for each action, which was not possible for resets. @balisujohn
No Seed function - While Env.seed was a helpful function, this was almost solely used for the beginning of the episode and is added to gym.reset(seed=...). In addition, for several environments like Atari that utilise external random number generators, it was not possible to set the seed at any time other than reset. Therefore, seed is no longer expected to function within gym environments and is removed from all gym environments @balisujohn
Rendering - It is normal to only use a single render mode and to help open and close the rendering window, we have changed Env.render to not take any arguments and so all render arguments can be part of the environment's constructor i.e., gym.make("CartPole-v1", render_mode="human"). For more detail on the new API, see blog post @younik
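A minimal sketch (assuming gym v0.26) of requesting the compatibility wrapper at creation time; "OldStyleEnv-v0" is a hypothetical ID registered elsewhere with the old v21/v22 step()/reset() signatures.

import gym

# Hypothetical old-API environment; the wrapper converts it to the new
# (obs, info) reset and five-value step described above.
env = gym.make("OldStyleEnv-v0", apply_api_compatibility=True)

observation, info = env.reset(seed=0)
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()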
Major changes
Render modes - In v25, there was a change in the meaning of render modes, i.e. "rgb_array" returned a list of rendered frames while "single_rgb_array" returned a single frame. This has been reverted in this release, with "rgb_array" having the same meaning as previously (returning a single frame) and a new mode "rgb_array_list" returning a list of RGB arrays. The capability to return a list of rendering observations is achieved through a wrapper applied during gym.make. #3040 @pseudo-rnd-thoughts @younik
Added save_video that uses moviepy to render a list of RGB frames and updated RecordVideo to use this function. This removes support for recording ansi outputs. #3016 @younik
RandomNumberGenerator functions: rand, randn, randint, get_state, set_state, hash_seed, create_seed, _bigint_from_bytes and _int_list_from_bigint have been removed. @balisujohn
Bump ale-py to 0.8.0, which is compatible with the new core API
Added EnvAPICompatibility wrapper @RedTachyon
Minor changes
Added improved Sequence, Graph and Text sample masking @pseudo-rnd-thoughts
Improved the gym make and register type hinting with entry_point being a necessary parameter of register. #3041 @pseudo-rnd-thoughts
Changed all URL to the new gym website https://www.gymlibrary.dev/ @FieteO
Fixed mujoco offscreen rendering with width and height values > 500 #3044 @YouJiacheng
Allowed toy_text environment to render on headless machines #3037 @RedTachyon
Renamed the motors in the mujoco swimmer envs #3036 @lin826
Contributors
lin826, pseudo-rnd-thoughts, and 6 other contributors
0.25.2
18 Aug 17:41
jkterry1
Release notes for v0.25.2
This is a fairly minor bug fix release.
Bug Fixes
Removes requirements for _TimeLimit.truncated in info for step compatibility functions. This makes the step compatible with Envpool @arjun-kg
As the ordering of Dict spaces matters when flattening spaces, updated the __eq__ to account for the .keys() ordering. @XuehaiPan
Allows CarRacing environment to be pickled. Updated all gym environments to be correctly pickled. @RedTachyon
seeding Dict and Tuple spaces with integers can cause lower-specification computers to hang due to requiring 8Gb memory. Updated the seeding with integers to not require unique subseeds (subseed collisions are rare). For users that require unique subseeds for all subspaces, we recommend using a dictionary or tuple with the subseeds. @olipinski
Fixed the metaclass implementation for the new render api to allow custom environments to use metaclasses as well. @YouJiacheng
Updates
Simplifies the step compatibility functions to make them easier to debug. The time limit wrapper with the old step API favours terminated over truncated if both are true. This is because the old done step API can only encode 3 states (it cannot encode terminated=True and truncated=True), therefore we must encode to only terminated=True or truncated=True. @pseudo-rnd-thoughts
Add Swig as a dependency @kir0ul
Add type annotation for render_mode and metadata @bkrl
Contributors
olipinski, kir0ul, and 6 other contributors
0.25.1
26 Jul 22:30
jkterry1
Release notes
Added rendering for CliffWalking environment @younik
PixelObservationWrapper only supports the new render API due to difficulty in supporting both old and new APIs. A warning is raised if the user is using the old API @vmoens
Bug fix
Reverted an incorrect edit to wrapper.FrameStack @ZhiqingXiao
Fix reset bounds for mountain car @psc-g
Removed skipped tests causing bugs not to be caught @pseudo-rnd-thoughts
Added backward compatibility for environments without metadata @pseudo-rnd-thoughts
Fixed BipedalWalker rendering for RGB arrays @1b15
Fixed bug in PixelObsWrapper for using the new rendering @younik
Typos
Rephrase observations' definition in Lunar Lander Environment @EvanMath
Top-docstring in gym/spaces/dict.py @Ice1187
Several typos in humanoidstandup_v4.py, mujoco_env.py, and vector_list_info.py @timgates42
Typos in passive environment checker @pseudo-rnd-thoughts
Typos in Swimmer rotations @lin826
Contributors
lin826, ZhiqingXiao, and 8 other contributors
0.25.0
13 Jul 20:01
jkterry1
Release notes
This release finally introduces all new API changes that have been planned for the past year or more, all of which will be turned on by default in a subsequent release. After this point, Gym development should get massively smoother. This release also fixes large bugs present in 0.24.0 and 0.24.1, and we highly discourage using those releases.
API Changes
Step - A majority of deep reinforcement learning algorithm implementations are incorrect due to an important difference between theory and practice, as done is not equivalent to termination. As a result, we have modified the step function to return five values: obs, reward, termination, truncation, info. The full theoretical and practical reasons (along with example code changes) for these changes will be explained in a soon-to-be-released blog post. The aim is for the change to be backward compatible (for now); for issues, please report the issue on GitHub or the Discord. @arjun-kg
Render - The render API is changed such that the mode has to be specified during gym.make with the keyword render_mode, after which the render mode is fixed. For further details see https://younis.dev/blog/2022/render-api/ and #2671. This has the following additional changes:
with render_mode="human" you don't need to call .render(), rendering will happen automatically on env.step()
with render_mode="rgb_array", .render() pops the list of frames rendered since the last .reset()
with render_mode="single_rgb_array", .render() returns a single frame, like before.
Space.sample(mask=...) allows a mask when sampling actions to enable/disable certain actions from being randomly sampled (a sampling sketch follows this list). We recommend developers add this to the info parameter returned by reset(return_info=True) and step. See #2906 for example implementations of the masks or the individual spaces. We have added an example version of this in the taxi environment. @pseudo-rnd-thoughts
Add Graph for environments that use graph style observation or action spaces. Currently, the node and edge spaces can only be Box or Discrete spaces. @jjshoots
Add Text space for Reinforcement Learning that involves communication between agents and has dynamic-length messages (otherwise MultiDiscrete can be used). @ryanrudes @pseudo-rnd-thoughts
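A minimal sketch (assuming gym v0.25) of masked sampling on a Discrete action space; entries set to 0 in the mask can never be drawn.

import numpy as np
from gym.spaces import Discrete

space = Discrete(4)
mask = np.array([1, 0, 1, 0], dtype=np.int8)  # only actions 0 and 2 may be sampled
action = space.sample(mask=mask)
assert action in (0, 2)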
Bug fixes
Fixed car racing termination where if the agent finishes the final lap, then the environment ends through truncation not termination. This added a version bump to Car racing to v2 and removed Car racing discrete in favour of gym.make("CarRacing-v2", continuous=False) @araffin
In v0.24.0, opencv-python was an accidental requirement for the project. This has been reverted. @KexianShen @pseudo-rnd-thoughts
Updated utils.play such that if the environment specifies keys_to_action, the function will automatically use that data. @Markus28
When rendering the blackjack environment, fixed bug where rendering would change the dealer's top card. @balisujohn
Updated mujoco docstring to reflect changes that were accidentally overwritten. @Markus28
Misc
The whole project is partially type hinted using pyright (none of the project files is ignored by the type hinter). @RedTachyon @pseudo-rnd-thoughts (Future work will add strict type hinting to the core API)
Action masking added to the taxi environment (no version bump due to being backwards compatible) @pseudo-rnd-thoughts
The Box space shape inference allows high and low scalars to be automatically set to (1,) shape. Minor changes to identifying scalars. @pseudo-rnd-thoughts
Added option support in classic control environment to modify the bounds on the initial random state of the environment @psc-g
The RecordVideo wrapper is becoming deprecated with no support for TextEncoder with the new render API. The plan is to replace RecordVideo with a single function that will receive a list of frames from an environment and automatically render them as a video using MoviePy. @johnMinelli
The gym py.Dockerfile is optimised from 2Gb to 1.5Gb through a number of optimisations @TheDen
Contributors
TheDen, araffin, and 10 other contributors
0.24.1
07 Jun 13:51
jkterry1
This is a bug fix release for version 0.24.0
Bugs fixed:
Replaced the environment checker introduced in V24, such that the environment checker will not call step and reset during make. This new version is a wrapper that will observe the data that step and reset returns on their first call and check the data against the environment checker. @pseudo-rnd-thoughts
Fixed MuJoCo v4 arguments key callback, closing the environment in renderer and the mujoco_rendering close method. @rodrigodelazcano
Removed redundant warning in registration @RedTachyon
Removed maths operations from MuJoCo xml files @quagla
Added support for unpickling legacy spaces.Box @pseudo-rnd-thoughts
Fixed mujoco environment action and observation space docstring tables @pseudo-rnd-thoughts
Disable wrappers from accessing _np_random property and np_random is now forwarded to environments @pseudo-rnd-thoughts
Rewrite setup.py to add a "testing" meta dependency group @pseudo-rnd-thoughts
Fixed docstring in rescale_action wrapper @gianlucadecola
Contributors
pseudo-rnd-thoughts, RedTachyon, and 3 other contributors
0.24.0
25 May 16:44
jkterry1
Major changes
Added v4 mujoco environments that use the new DeepMind mujoco 2.2.0 module. These can be installed through pip install gym[mujoco], with the old bindings still being available using the v3 environments and pip install gym[mujoco-py]. These new v4 environments should have the same training curves as v3. For the Ant, we found that there was a contact parameter that was not applied in v3 and can be enabled in v4; however, it was found to produce significantly worse performance, see the comment for more details. @rodrigodelazcano
The vector environment step info API has been changed to allow hardware acceleration in the future. See this PR for the modified info style, which now uses dictionaries instead of a list of environment info. If you still wish to use the list info style, then use the VectorListInfo wrapper. @gianlucadecola
On gym.make, the gym env_checker is run, which includes calling the environment reset and step to check if the environment is compliant with the gym API. To disable this feature, run gym.make(..., disable_env_checker=True). @RedTachyon
Re-added gym.make("MODULE:ENV") import style that was accidentally removed in v0.22 @arjun-kg
Env.render is now order-enforced such that Env.reset is required before Env.render is called. If rendering before reset is a required feature, then set the OrderEnforcer wrapper's disable_render_order_enforcing=True. @pseudo-rnd-thoughts
Added wind and turbulence to the Lunar Lander environment; this is turned off by default, use the wind_power and turbulence parameters. @virgilt
Improved the play function to allow multiple keyboard letters to be passed instead of ASCII values @Markus28
Added google style pydoc strings for most of the repositories @pseudo-rnd-thoughts @Markus28
Added a discrete car racing environment version through gym.make("CarRacing-v1", continuous=False)
Pygame is now an optional module for Box2D and classic control environments that is only necessary for rendering. Therefore, install pygame using pip install gym[box2d] or pip install gym[classic_control] @gianlucadecola @RedTachyon
Fixed bug in batch spaces (used in VectorEnv) such that the original space's seed was ignored @pseudo-rnd-thoughts
Added AutoResetWrapper that automatically calls Env.reset when Env.step done is True @balisujohn
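A minimal sketch (assuming gym v0.24 and its four-value step API) of the new wrapper: the wrapper resets the environment itself whenever an episode finishes.

import gym
from gym.wrappers import AutoResetWrapper

env = AutoResetWrapper(gym.make("CartPole-v1"))
observation = env.reset()

for _ in range(500):
    observation, reward, done, info = env.step(env.action_space.sample())
    # No manual env.reset() needed: when done is True, the wrapper has already
    # reset the environment and returned the first observation of the new episode.

env.close()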
Minor changes
BipedalWalker and LunarLander's observation spaces have non-infinite upper and lower bounds. @jjshoots
Bumped the ALE-py version to 0.7.5
Improved the performance of car racing through not rendering polygons off screen @andrewtanJS
Fixed turn indicators that were black not red/white in Car racing @jjshoots
Bug fixes for VecEnvWrapper to forward method calls to the environment @arjun-kg
Removed unnecessary try except on Box2d such that if Box2d is not installed correctly then a more helpful error is shown @pseudo-rnd-thoughts
Simplified the gym.registry backend @RedTachyon
Re-added python 3.6 support through backports of python 3.7+ modules. This is not tested or compatible with the mujoco environments. @pseudo-rnd-thoughts
Contributors
pseudo-rnd-thoughts, Markus28, and 8 other contributors
0.23.1
11 Mar 17:25
jkterry1
This release contains a few small bug fixes and no breaking changes.
Make VideoRecorder backward-compatible to gym<0.23 by @vwxyzjn in #2678
Fix issues with pygame event handling (which should fix support on windows and in jupyter notebooks) by @andrewtanJS in #2684
Add py.typed to package_data by @micimize in https://github.com/openai/gym/p
Fixes around 1500 warnings in CI @pseudo-rnd-thoughts
Deprecation warnings correctly display now @vwxyzjn
Fix removing striker and thrower @RushivArora
Fix small dependency warning error @ZhiqingXiao
Contributors
vwxyzjn, micimize, and 4 other contributors
0.23.0
04 Mar 20:51
jkterry1
This release contains many bug fixes and a few small changes.
Breaking changes:
Standardized render metadata variables ahead of render breaking change @trigaten
Removed deprecated monitor wrapper and associated dead code @gianlucadecola
Unused striker and thrower MuJoCo envs moved to https://github.com/RushivArora/Gym-Mujoco-Archive @RushivArora
Many minor bug fixes (@vwxyzjn , @RedTachyon , @rusu24edward , @Markus28 , @dsctt , @andrewtanJS , @tristandeleu , @duburcqa)
Contributors
tristandeleu, vwxyzjn, and 9 other contributors
Gym Retro
May 25, 2018

We're releasing the full version of Gym Retro, a platform for reinforcement learning research on games. This brings our publicly-released game count from around 70 Atari games and 30 Sega games to over 1,000 games across a variety of backing emulators. We're also releasing the tool we use to add new games to the platform.

We use Gym Retro to conduct research on RL algorithms and study generalization. Prior research in RL has mostly focused on optimizing agents to solve single tasks. With Gym Retro, we can study the ability to generalize between games with similar concepts but different appearances.

This release includes games from the Sega Genesis and Sega Master System, and Nintendo's NES, SNES, and Game Boy consoles. It also includes preliminary support for the Sega Game Gear, Nintendo Game Boy Color, Nintendo Game Boy Advance, and NEC TurboGrafx. Some of the released game integrations, including those games in the data/experimental folder of Gym Retro, are in a beta state; please try them out and let us know if you encounter any bugs. Due to the large scale of the changes involved, the code will only be available on a branch for the time being. To avoid breaking contestants' code we won't be merging the branch until after the contest concludes.

The ongoing Retro Contest (ending in a couple weeks!) and our recent technical report focus on the easier problem of generalizing between different levels of the same game (Sonic The Hedgehog™). The full Gym Retro dataset takes this idea further and makes it possible to study the harder problem of generalization between different games. The scale of the dataset and difficulty of individual games makes it a formidable challenge, and we are looking forward to sharing our research progress over the next year. We also hope that some of the solutions developed by participants in the Retro Contest can be scaled up and applied to the full Gym Retro dataset.

Integration tool

We're also releasing the tool we use to integrate new games. Provided you have the ROM for a game, this tool lets you easily create save states, find memory locations, and design scenarios that reinforcement learning agents can then solve. We've written an integrator's guide for people looking to add support for new games.

The integration tool also supports recording and playing movie files that save all the button inputs to the game. These files are small because they only need the starting state and sequence of button presses, as opposed to storing each frame of the output. Movie files like these are useful for visualizing what your reinforcement learning agent is doing as well as storing human input to use as training data.

Reward farming

While developing Gym Retro we've found numerous examples of games where the agent learns to farm for rewards (defined as the increase in game score) rather than completing the implicit mission. In the clips accompanying the original post, characters in Cheese Cat-Astrophe (left) and Blades of Vengeance (right) become trapped in infinite loops because they're able to rapidly accrue rewards that way. This highlights a phenomenon we've discussed previously: the relatively simple reward functions we give to contemporary reinforcement learning algorithms, for instance by maximizing the score in a game, can lead to undesirable behaviors.

For games with dense (frequent and incremental) rewards where most of the difficulty comes from needing fast reaction times, reinforcement learning algorithms such as PPO do very well. In a game such as Gradius (pictured on the right), you get points for each enemy you shoot, so it's easy to get rewards and start learning. Surviving in a game like this is based on your ability to dodge enemies, which is no problem for reinforcement learning algorithms since they play the game one frame at a time.

For games that have a sparse reward or require planning more than a few seconds into the future, existing algorithms have a hard time. Many of the games in the Gym Retro dataset have a sparse reward or require planning, so tackling the full dataset will likely require new techniques that have not been developed yet.

If you are excited about conducting research on transfer learning and meta-learning with an unprecedentedly large dataset, then consider joining OpenAI.

Authors
Vicki Pfau, Alex Nichol, Christopher Hesse, Larissa Schiavo, John Schulman, Oleg Klimov
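A minimal sketch (assuming the gym-retro package is installed), using one of the games shipped with it; Gym Retro environments expose the standard Gym interface.

import retro

env = retro.make(game="Airstriker-Genesis")  # freely redistributable game bundled with gym-retro
observation = env.reset()

for _ in range(1000):
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        observation = env.reset()

env.close()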
[1606.01540] OpenAI Gym
arXiv:1606.01540 (cs)
[Submitted on 5 Jun 2016]
Title: OpenAI Gym
Authors: Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, Wojciech Zaremba
Abstract: OpenAI Gym is a toolkit for reinforcement learning research. It includes a growing collection of benchmark problems that expose a common interface, and a website where people can share their results and compare the performance of algorithms. This whitepaper discusses the components of OpenAI Gym and the design decisions that went into the software.
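To make the "common interface" in the abstract concrete, here is a small sketch of inspecting an environment's observation and action spaces. It is not from the paper; the CartPole-v1 id and the exact printed representations come from later Gym releases, so treat the specifics as assumptions.

import gym

env = gym.make("CartPole-v1")
print(env.observation_space)  # a 4-dimensional Box: cart position/velocity, pole angle/angular velocity
print(env.action_space)       # Discrete(2): push the cart left or right
action = env.action_space.sample()  # every environment exposes spaces that support sample()
env.close()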
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:1606.01540 [cs.LG] (or arXiv:1606.01540v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.1606.01540
Submission history: [v1] From: John Schulman, Sun, 5 Jun 2016 17:54:48 UTC (546 KB)