
Commit c54327e

sven1977 authored and michaelschaarschmidt committed
Unify Runner classes via BaseRunner parent class (tensorforce#328)
* Finished one TODO: n-step cumulated discounted reward (added a `horizon` parameter, default=0 meaning no horizon, to the method). Completed comments in the model.py and runner.py classes. Defined all Model variables in `__init__` (mostly as None) and commented their purpose and functionality. Fixed pep8 compliance in the Runner classes.
* ThreadedRunner: renamed `max_timesteps` to `max_episode_timesteps` to match Runner's `run` implementation (backwards compatible).
* Corrected some comments in model.py regarding parallel RL (threaded and replica) and the nature of distributed_spec.
* Added the returned game_name to `connect` in unreal_engine.py.
* Added a registry to the Environment class (containing only MinimalTest) and a `from_spec` method, so that environments can now be created through this factory. Changed a comment in the UE4 script.
* Resolved conflicts with the TF master branch; fixed grayscale image capture for examples/unreal_engine.py.
* Fixed a bug for distributed_spec != None, where tf.train.Saver complained about duplicate variables.
* Fixed warnings due to deprecated function parameters.
* Avoided a race condition where the order in episode_rewards would not match the order in episode_lengths.
* Added the possibility to pass in the model saver frequency by a) episode count, b) seconds, or c) timesteps.
* Introduced the BaseRunner class and restructured the existing Runner classes (Runner and ThreadedRunner) to both inherit from it. Unified the interfaces used to control these classes and to handle per-episode and per-n-episode reporting. Added multi-threaded testing to the random and constant agent tests (configurable for all agents via a run_type flag).
* Fixed a Python 2.x incompatibility (inspect.signature is not supported by 2.x).
* Added a run_mode bit-field to the test cases. Different modes may be set for any test agent: single, multi-threaded, and distributed-tf (default: single). Also added a num_parallel_worker setting to all test agents, determining how many parallel agents are used in non-single run_mode tests.
* Bug fix: "no network-spec" error in multi-threaded runs using the WorkerAgent factory.
* Bug fix: added episode run time to runner stats; send network_spec to the WorkerAgent factory regardless (robust against non-learning agents that don't have a network).
* Bug fix: the multi-threaded runner would save many times per save interval.
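The "n-step cumulated discounted reward" with a `horizon` parameter mentioned in the commit message can be sketched as follows. This is an illustrative, standalone version (function name and signature are hypothetical, not tensorforce's actual implementation); `horizon=0` reproduces the default "no horizon" behavior of discounting to the episode end:

```python
def discounted_cumulative_rewards(rewards, discount, horizon=0):
    """Per-timestep cumulated discounted reward.

    horizon=0 (the default) means no horizon: discount-sum to episode end.
    horizon=n truncates each sum after at most n rewards.
    """
    num = len(rewards)
    result = []
    for t in range(num):
        end = num if horizon == 0 else min(num, t + horizon)
        ret = 0.0
        for k in reversed(range(t, end)):  # accumulate backwards from the horizon
            ret = rewards[k] + discount * ret
        result.append(ret)
    return result
```

For rewards [1, 1, 1] and discount 0.5 this yields [1.75, 1.5, 1.0] without a horizon, and just the raw rewards with horizon=1.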
1 parent 9d71ff3 commit c54327e

17 files changed

Lines changed: 581 additions & 254 deletions

examples/threaded_ale.py

Lines changed: 3 additions & 3 deletions
@@ -175,18 +175,18 @@ def summary_report(r):
 
     # Create runners
     threaded_runner = ThreadedRunner(
-        agents, environments,
+        agents,
+        environments,
         repeat_actions=1,
         save_path=args.save,
         save_episodes=args.save_episodes
     )
 
     logger.info("Starting {agent} for Environment '{env}'".format(agent=agent, env=environments[0]))
     threaded_runner.run(summary_interval=100, episode_finished=episode_finished, summary_report=summary_report)
+    threaded_runner.close()
     logger.info("Learning finished. Total episodes: {ep}".format(ep=threaded_runner.global_episode))
 
-    [environments[t].close() for t in range(args.workers)]
-
 
 if __name__ == '__main__':
     main()
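The change above replaces the manual per-environment cleanup (`[environments[t].close() for t in range(args.workers)]`) with a single `threaded_runner.close()` call, so the runner shuts down the agents and environments it owns. A standalone sketch of that ownership pattern (class names here are illustrative, not tensorforce's):

```python
class Resource(object):
    """Stand-in for an agent or environment that needs closing."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class MiniThreadedRunner(object):
    def __init__(self, agents, environments):
        self.agents = agents
        self.environments = environments

    def close(self):
        # Close everything the runner owns; callers need no manual cleanup.
        for obj in self.agents + self.environments:
            obj.close()
```

One `runner.close()` then releases all workers' resources, which is harder to get wrong than keeping the cleanup list in sync with `args.workers` at the call site.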

examples/unreal_engine.py

Lines changed: 2 additions & 2 deletions
@@ -65,7 +65,7 @@ def main():
 
     args = parser.parse_args()
 
-    #logging.basicConfig(filename="logfile.txt", level=logging.INFO)
+    # logging.basicConfig(filename="logfile.txt", level=logging.INFO)
     logging.basicConfig(stream=sys.stderr)
     logger = logging.getLogger(__name__)
     logger.setLevel(logging.DEBUG)
@@ -83,7 +83,7 @@ def main():
     if args.random_test_run:
         # Reset the env.
         s = environment.reset()
-        img = Image.fromarray(s, "RGB")
+        img = Image.fromarray(s, "RGB" if len(environment.states["shape"]) == 3 else "L")
         # Save first received image as a sanity-check.
         img.save("reset.png")
         for i in range(1000):
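The second hunk fixes grayscale capture by picking the PIL image mode from the state's shape: a 3-D state (height, width, channels) is a color image, a 2-D state is grayscale. The selection logic can be isolated into a tiny helper (the function name is hypothetical, for illustration only):

```python
def image_mode(state_shape):
    """Pick a PIL image mode from a state shape.

    3 dims -> (height, width, channels): color image, mode "RGB".
    2 dims -> (height, width): single-channel grayscale, mode "L".
    """
    return "RGB" if len(state_shape) == 3 else "L"
```

Passing "RGB" for a 2-D array (as the old code did) would make `Image.fromarray` fail or misinterpret the buffer, hence the shape check.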

tensorforce/agents/agent.py

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ def initialize_model(self):
     def reset(self):
         """
         Reset the agent to its initial state (e.g. on experiment start). Updates the Model's internal episode and
-        timestep counter, internal states, and resets preprocessors.
+        time step counter, internal states, and resets preprocessors.
         """
         self.episode, self.timestep, self.next_internals = self.model.reset()
         self.current_internals = self.next_internals

tensorforce/contrib/unreal_engine.py

Lines changed: 3 additions & 1 deletion
@@ -28,7 +28,7 @@ class UE4Environment(RemoteEnvironment, StateSettableEnvironment):
     """
     A special RemoteEnvironment for UE4 game connections.
     Communicates with the remote to receive information on the definitions of action- and observation spaces.
-    Sends UE4 Action- and Axis-mappings as RL-actions and receives observations back defined by ducandu plugin Observer
+    Sends UE4 Action- and Axis-mappings as RL-actions and receives observations back defined by MLObserver
     objects placed in the Game
     (these could be camera pixels or other observations, e.g. a x/y/z position of some game actor).
     """
@@ -86,6 +86,8 @@ def connect(self):
             raise TensorForceError("ERROR in UE4Environment.connect: no observation- or action-space-desc sent "
                                    "by remote server!")
 
+        # Game's name
+        self.game_name = response.get("game_name")  # keep non-mandatory for now
         # Observers
         self.observation_space_desc = response["observation_space_desc"]
         # Action-mappings
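Note the deliberate asymmetry above: `response.get("game_name")` returns None when an older remote server omits the field ("keep non-mandatory for now"), while the mandatory `response["observation_space_desc"]` lookup raises KeyError if missing. A small illustration with hypothetical response dicts (field values are made up):

```python
# Hypothetical response dicts modeled on the fields in the diff above.
new_server = {"game_name": "MyGame", "observation_space_desc": {}, "action_space_desc": {}}
old_server = {"observation_space_desc": {}, "action_space_desc": {}}

# Optional field: dict.get returns None instead of raising.
assert new_server.get("game_name") == "MyGame"
assert old_server.get("game_name") is None

# Mandatory field: plain indexing raises KeyError when absent.
try:
    old_server["game_name"]
    raise AssertionError("expected KeyError")
except KeyError:
    pass
```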

tensorforce/core/preprocessing/grayscale.py

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ def __init__(self, weights=(0.299, 0.587, 0.114), scope='grayscale', summary_lab
 
     def tf_process(self, tensor):
         weights = tf.reshape(tensor=self.weights, shape=(tuple(1 for _ in range(util.rank(tensor) - 1)) + (3,)))
-        return tf.reduce_sum(input_tensor=(weights * tensor), axis=-1, keep_dims=True)
+        return tf.reduce_sum(input_tensor=(weights * tensor), axis=-1, keepdims=True)
 
     def processed_shape(self, shape):
         return tuple(shape[:-1]) + (1,)
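The fix above only renames the deprecated `keep_dims` argument of `tf.reduce_sum` to `keepdims`; the computation is unchanged: a weighted sum over the channel axis that keeps a size-1 channel dimension. A dependency-free sketch of the same arithmetic (the function name is illustrative):

```python
def rgb_to_gray(image, weights=(0.299, 0.587, 0.114)):
    """Turn an H x W x 3 nested-list image into H x W x 1.

    The single-element channel lists mirror keepdims=True, which keeps
    the reduced channel axis with size 1 instead of dropping it.
    """
    return [
        [[sum(w * c for w, c in zip(weights, pixel))] for pixel in row]
        for row in image
    ]
```

For a pure-white pixel [255, 255, 255] the weights sum to 1.0, so the gray value is 255 (up to floating-point rounding), matching `processed_shape`, which replaces the trailing 3 with a 1.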

tensorforce/environments/__init__.py

Lines changed: 7 additions & 1 deletion
@@ -15,5 +15,11 @@
 
 
 from tensorforce.environments.environment import Environment
+from tensorforce.environments.minimal_test import MinimalTest
 
-__all__ = ['Environment']
+
+environments = dict(
+    minimal_test=MinimalTest,
+)
+
+__all__ = ['Environment', 'MinimalTest']

tensorforce/environments/environment.py

Lines changed: 16 additions & 0 deletions
@@ -18,6 +18,9 @@
 from __future__ import print_function
 from __future__ import division
 
+import tensorforce.environments
+import tensorforce.util
+
 
 class Environment(object):
     """
@@ -84,3 +87,16 @@ def actions(self):
 
         """
         raise NotImplementedError
+
+    @staticmethod
+    def from_spec(spec, kwargs):
+        """
+        Creates an environment from a specification dict.
+        """
+        env = tensorforce.util.get_object(
+            obj=spec,
+            predefined_objects=tensorforce.environments.environments,
+            kwargs=kwargs
+        )
+        assert isinstance(env, Environment)
+        return env
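`from_spec` above delegates to `tensorforce.util.get_object` together with the registry dict added in `tensorforce/environments/__init__.py`. A standalone sketch of that registry-plus-factory pattern (the mini classes and this `from_spec` are illustrative, not tensorforce's actual `get_object` implementation):

```python
class Environment(object):
    """Minimal stand-in base class."""
    pass


class MinimalTest(Environment):
    def __init__(self, scale=1):
        self.scale = scale


# Registry mapping spec names to classes (cf. the environments dict above).
environments = dict(minimal_test=MinimalTest)


def from_spec(spec, kwargs=None):
    """Create an environment from a spec: either a registry key (str)
    or a dict like {"type": "minimal_test", ...extra ctor args...}."""
    kwargs = dict(kwargs or ())
    if isinstance(spec, dict):
        spec = dict(spec)  # don't mutate the caller's dict
        name = spec.pop("type")
        kwargs.update(spec)
    else:
        name = spec
    env = environments[name](**kwargs)
    assert isinstance(env, Environment)
    return env
```

The payoff is that environments become configurable by name, e.g. from a JSON config, instead of being hard-wired import-time choices.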

tensorforce/execution/__init__.py

Lines changed: 4 additions & 3 deletions
@@ -13,7 +13,8 @@
 # limitations under the License.
 # ==============================================================================
 
-from tensorforce.execution.runner import Runner
-from tensorforce.execution.threaded_runner import ThreadedRunner
+from tensorforce.execution.base_runner import BaseRunner
+from tensorforce.execution.runner import Runner, SingleRunner, DistributedTFRunner
+from tensorforce.execution.threaded_runner import ThreadedRunner, WorkerAgentGenerator
 
-__all__ = ['Runner', 'ThreadedRunner']
+__all__ = ['BaseRunner', 'SingleRunner', 'DistributedTFRunner', 'Runner', 'ThreadedRunner', 'WorkerAgentGenerator']
tensorforce/execution/base_runner.py (new file)

Lines changed: 112 additions & 0 deletions

# Copyright 2017 reinforce.io. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

from __future__ import absolute_import
from __future__ import print_function
from __future__ import division


class BaseRunner(object):
    """
    Base class for all runner classes. Defines the `run` interface implemented by subclasses.
    """

    def __init__(self, agent, environment, repeat_actions=1, history=None):
        """
        Args:
            agent (Agent): Agent object (or list of Agent objects) to use for the run.
            environment (Environment): Environment object (or list of Environment objects) to use for the run.
            repeat_actions (int): How many times the same given action will be repeated in subsequent calls to
                the Environment's `execute` method. Rewards collected in these calls are accumulated and reported
                as a sum in the following call to the Agent's `observe` method.
            history (dict): A dictionary containing an already-run experiment's results. Keys should be:
                episode_rewards (list of rewards), episode_timesteps (lengths of episodes), episode_times (run-times).
        """
        self.agent = agent
        self.environment = environment
        self.repeat_actions = repeat_actions

        self.global_episode = None  # the global episode number (across all (parallel) agents)
        self.global_timestep = None  # the global time step (across all (parallel) agents)

        self.start_time = None  # TODO: is this necessary here? global start time (episode?, overall?)

        # Lists of episode data (rewards, wall-times, timesteps).
        self.episode_rewards = None  # list of accumulated episode rewards
        self.episode_timesteps = None  # list of total timesteps taken in the episodes
        self.episode_times = None  # list of durations for the episodes

        self.reset(history)

    def reset(self, history=None):
        """
        Resets the Runner's internal stats counters.
        If history is empty, the defaults passed to history.get() are used.

        Args:
            history (dict): A dictionary containing an already-run experiment's results. Keys should be:
                episode_rewards (list of rewards), episode_timesteps (lengths of episodes), episode_times (run-times).
        """
        if not history:
            history = dict()

        self.episode_rewards = history.get("episode_rewards", list())
        self.episode_timesteps = history.get("episode_timesteps", list())
        self.episode_times = history.get("episode_times", list())

    def close(self):
        """
        Should perform clean-up operations on the Runner's Agent(s) and Environment(s).
        """
        raise NotImplementedError

    def run(self, num_episodes, num_timesteps, max_episode_timesteps, deterministic, episode_finished, summary_report,
            summary_interval):
        """
        Executes this runner by starting to act (via its Agent(s)) in the given Environment(s).
        Stops execution according to certain conditions (e.g. max. number of episodes).
        Calls callback functions after each episode and/or after some summary criteria are met.

        Args:
            num_episodes (int): Max. number of episodes to run globally in total (across all threads/workers).
            num_timesteps (int): Max. number of time steps to run globally in total (across all threads/workers).
            max_episode_timesteps (int): Max. number of time steps per episode.
            deterministic (bool): Whether to select actions deterministically (i.e. without exploration).
            episode_finished (callable): A function to be called once an episode has finished. Should take
                a BaseRunner object and some worker ID (e.g. thread-ID or task-ID). Can decide for itself
                every how many episodes it should report something and what to report.
            summary_report (callable): Deprecated; a function that could produce a summary over the training
                progress so far.
            summary_interval (int): Deprecated; the number of time steps to execute (globally)
                before summary_report is called.
        """
        raise NotImplementedError

    # Keep backwards compatibility.
    @property
    def episode(self):
        """
        Deprecated property `episode` -> use global_episode instead.
        """
        return self.global_episode

    @property
    def timestep(self):
        """
        Deprecated property `timestep` -> use global_timestep instead.
        """
        return self.global_timestep
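BaseRunner's `reset(history)` lets a runner resume its stats lists from a previous experiment instead of starting empty. A standalone mini version of just that history logic (the class name is illustrative; no tensorforce dependency):

```python
class MiniRunner(object):
    """Standalone sketch of BaseRunner's stats/history handling."""

    def __init__(self, history=None):
        self.global_episode = 0
        self.global_timestep = 0
        self.reset(history)

    def reset(self, history=None):
        # With no history given, all stat lists start from scratch;
        # otherwise each list resumes from the previous experiment.
        if not history:
            history = dict()
        self.episode_rewards = history.get("episode_rewards", list())
        self.episode_timesteps = history.get("episode_timesteps", list())
        self.episode_times = history.get("episode_times", list())

    # Backwards-compatible alias, as in BaseRunner.
    @property
    def episode(self):
        return self.global_episode
```

Because `reset` uses `history.get(key, default)`, a partial history dict is fine: missing keys simply fall back to fresh lists.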
