Models

GCP-HOLO uses Stable Baselines 3 for reinforcement learning training, with customizations that adapt it to the Mech Gym environment. Stable Baselines 3 is a popular RL library that provides a set of pre-implemented algorithms, such as Proximal Policy Optimization (PPO) and Deep Q-Network (DQN).

GCP-HOLO customizes the Stable Baselines 3 algorithms to work with the Mech Gym environment, a custom environment designed for path synthesis of linkage systems. The Mech Gym environment defines a specific action space to improve efficiency; the customizations also mask invalid actions determined from the scaffold nodes and ensure that each model selects actions non-deterministically.
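The sketch below shows how these customized classes plug into the usual Stable Baselines 3 workflow. It is illustrative only: the environment constructor MechGymEnv is a stand-in for the real Mech Gym class, and the hyperparameter values are placeholders rather than recommended settings.

    # Illustrative sketch only. MechGymEnv is a hypothetical stand-in for the
    # actual Mech Gym environment constructor.
    from stable_baselines3 import A2C

    from models.a2c import CustomActorCriticPolicy

    env = MechGymEnv()  # hypothetical Mech Gym environment instance

    # The customized policy class is handed to the standard SB3 algorithm.
    model = A2C(policy=CustomActorCriticPolicy, env=env, verbose=1)
    model.learn(total_timesteps=10_000)

    # Actions are sampled stochastically (deterministic=False), matching the
    # non-deterministic action selection described above.
    obs = env.reset()
    action, _ = model.predict(obs, deterministic=False)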

A2C

This is the custom actor-critic policy that GCP-HOLO uses with the A2C algorithm.

class models.a2c.CustomActorCriticPolicy(observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space, lr_schedule: typing.Callable[[float], float], net_arch: typing.Optional[typing.List[typing.Union[int, typing.Dict[str, typing.List[int]]]]] = None, activation_fn: typing.Type[torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, *args, **kwargs)[source]

Bases: ActorCriticPolicy

Custom Actor Critic Policy for GCP-HOLO

evaluate_actions(obs: Tensor, actions: Tensor)[source]

Evaluate actions according to the current policy, given the observations.

Parameters
  • obs – Observation

  • actions – Actions

Returns

estimated value, log likelihood of taking those actions and entropy of the action distribution.

forward(obs: Tensor, deterministic: bool = False)[source]

Forward pass in all the networks (actor and critic)

Parameters
  • obs – Observation

  • deterministic – Whether to sample or use deterministic actions

Returns

action, value and log probability of the action

get_distribution(obs: Tensor)[source]

Get the current policy distribution given the observations.

Parameters

obs – Observation

Returns

the action distribution.
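A small, self-contained sketch of the invalid-action masking mentioned in the introduction is shown below. It illustrates the general idea rather than the exact GCP-HOLO implementation: logits of actions ruled out by the scaffold nodes are set to negative infinity before a categorical distribution is built, so sampling stays stochastic but can never return an invalid action.

    import torch

    def masked_categorical(logits, valid_mask):
        """Illustrative helper: assign zero probability to invalid actions
        (valid_mask is True for valid actions) before sampling."""
        masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
        return torch.distributions.Categorical(logits=masked_logits)

    # Example: 5 candidate actions; actions 1 and 3 are invalid for the scaffold.
    logits = torch.zeros(5)
    valid = torch.tensor([True, False, True, False, True])
    dist = masked_categorical(logits, valid)
    action = dist.sample()            # only valid actions can be drawn
    log_prob = dist.log_prob(action)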

training: bool

DQN

This is the custom DQN that GCP-HOLO uses.

class models.dqn.CustomDQN(policy: Union[str, Type[DQNPolicy]], env: Union[Env, VecEnv, str], learning_rate: Union[float, Callable[[float], float]] = 0.0001, buffer_size: int = 1000000, learning_starts: int = 50000, batch_size: int = 32, tau: float = 1.0, gamma: float = 0.99, train_freq: Union[int, Tuple[int, str]] = 4, gradient_steps: int = 1, replay_buffer_class: Optional[ReplayBuffer] = None, replay_buffer_kwargs: Optional[Dict[str, Any]] = None, optimize_memory_usage: bool = False, target_update_interval: int = 10000, exploration_fraction: float = 0.1, exploration_initial_eps: float = 1.0, exploration_final_eps: float = 0.05, max_grad_norm: float = 10, tensorboard_log: Optional[str] = None, create_eval_env: bool = False, policy_kwargs: Optional[Dict[str, Any]] = None, verbose: int = 0, seed: Optional[int] = None, device: Union[device, str] = 'auto', _init_setup_model: bool = True)[source]

Bases: DQN

predict(observation: ndarray, state: Optional[Tuple[ndarray, ...]] = None, episode_start: Optional[ndarray] = None, deterministic: bool = False)[source]

Overrides the base_class predict function to include epsilon-greedy exploration.

Parameters
  • observation – the input observation

  • state – The last states (can be None, used in recurrent policies)

  • episode_start – The last masks (can be None, used in recurrent policies)

  • deterministic – Whether or not to return deterministic actions.

Returns

the model’s action and the next state (used in recurrent policies)

train(gradient_steps: int, batch_size: int = 100)[source]

Sample the replay buffer and do the updates (gradient descent and update target networks)
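The effect of the overridden predict can be pictured with the sketch below. The attribute names exploration_rate and action_space mirror standard Stable Baselines 3 / Gym attributes, but the function body is an assumption, not the exact GCP-HOLO code.

    import numpy as np

    def epsilon_greedy_predict(model, observation, deterministic=False):
        """Illustrative only: random action with probability epsilon,
        otherwise the greedy (argmax-Q) action from the policy."""
        if not deterministic and np.random.rand() < model.exploration_rate:
            action = np.array([model.action_space.sample()])
        else:
            action, _ = model.policy.predict(observation, deterministic=True)
        return action, None  # DQN keeps no recurrent state

In practice, calling model.predict(observation, deterministic=False) on a CustomDQN instance takes this exploration path, while deterministic=True should always return the greedy action.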

class models.dqn.CustomDQNPolicy(observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space, lr_schedule: typing.Callable[[float], float], net_arch: typing.Optional[typing.List[int]] = None, activation_fn: typing.Type[torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, features_extractor_class: typing.Type[stable_baselines3.common.torch_layers.BaseFeaturesExtractor] = <class 'stable_baselines3.common.torch_layers.FlattenExtractor'>, features_extractor_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None, normalize_images: bool = True, optimizer_class: typing.Type[torch.optim.optimizer.Optimizer] = <class 'torch.optim.adam.Adam'>, optimizer_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None)[source]

Bases: DQNPolicy

Policy class with Q-Value Net and target net for DQN

Parameters
  • observation_space – Observation space

  • action_space – Action space

  • lr_schedule – Learning rate schedule (could be constant)

  • net_arch – The specification of the policy and value networks.

  • activation_fn – Activation function

  • features_extractor_class – Features extractor to use.

  • features_extractor_kwargs – Keyword arguments to pass to the features extractor.

  • normalize_images – Whether to normalize images or not, dividing by 255.0 (True by default)

  • optimizer_class – The optimizer to use, th.optim.Adam by default

  • optimizer_kwargs – Additional keyword arguments, excluding the learning rate, to pass to the optimizer
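These constructor parameters are typically supplied when building the algorithm. A hedged sketch is shown below; env is assumed to be an already-constructed Mech Gym environment, and the hyperparameter values are placeholders rather than recommended settings.

    from models.dqn import CustomDQN, CustomDQNPolicy

    model = CustomDQN(
        policy=CustomDQNPolicy,
        env=env,                                 # Mech Gym environment instance
        learning_rate=1e-4,
        buffer_size=100_000,
        exploration_fraction=0.1,
        exploration_final_eps=0.05,
        policy_kwargs=dict(net_arch=[64, 64]),   # forwarded to CustomDQNPolicy
        verbose=1,
    )
    model.learn(total_timesteps=50_000)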

make_q_net()[source]

training: bool

class models.dqn.CustomQNetwork(observation_space: gym.spaces.space.Space, action_space: gym.spaces.space.Space, features_extractor: torch.nn.modules.module.Module, features_dim: int, net_arch: typing.Optional[typing.List[int]] = None, activation_fn: typing.Type[torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.ReLU'>, normalize_images: bool = True)[source]

Bases: QNetwork

Action-Value (Q-Value) network for DQN

Parameters
  • observation_space – Observation space

  • action_space – Action space

  • net_arch – The specification of the policy and value networks.

  • activation_fn – Activation function

  • normalize_images – Whether to normalize images or not, dividing by 255.0 (True by default)

forward(obs: Tensor)[source]

Predict the q-values.

Parameters

obs – Observation

Returns

The estimated Q-Value for each action.

training: bool
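As a small illustration, the Q-values returned by forward can be turned into a greedy action, optionally combined with the invalid-action masking described in the introduction. This sketch shows the general idea only; it is not necessarily where GCP-HOLO applies the mask.

    import torch

    # Sketch only: q_net is a CustomQNetwork instance, obs a batched observation
    # tensor, and valid_mask a boolean tensor with True for valid actions.
    with torch.no_grad():
        q_values = q_net(obs)                                    # (batch, n_actions)
        q_values = q_values.masked_fill(~valid_mask, float("-inf"))
        greedy_action = q_values.argmax(dim=1)                   # best valid action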

GCN

This is the graph convolutional policy network adapted from You et al.

class models.gcpn.GNN(observation_space, max_nodes, num_features, hidden_channels=64, out_channels=64, normalize=False, batch_normalization=False, lin=True, add_loop=False)[source]

Bases: BaseFeaturesExtractor

Graph convolution network, adapted from Zhao et al., “RoboGrammar”

Parameters
  • observation_space (gym.observation) – The observation space of the gym environment

  • max_nodes (int) – Maximum number of nodes for the linkage graph

  • num_features (int) – Number of points in the trajectory used to describe the node features

  • hidden_channels (int, optional) – Hidden channels for the Dense SAGE convolutions. Defaults to 64.

  • out_channels (int, optional) – Number of output features. Defaults to 64.

  • normalize (bool, optional) – Normalization used in Dense SAGE. Defaults to False.

  • batch_normalization (bool, optional) – Whether batch normalization is used. Defaults to False.

  • lin (bool, optional) – Add a linear layer at the end. Defaults to True.

  • add_loop (bool, optional) – Add self loops. Defaults to False.

bn(i, x)[source]

forward(observations)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
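Because GNN subclasses BaseFeaturesExtractor, it is plugged in through Stable Baselines 3's policy_kwargs mechanism. The sketch below is illustrative: env is assumed to be an already-constructed Mech Gym environment, the keyword values are placeholders rather than recommended settings, and observation_space is supplied automatically by Stable Baselines 3.

    from stable_baselines3 import A2C

    from models.a2c import CustomActorCriticPolicy
    from models.gcpn import GNN

    policy_kwargs = dict(
        features_extractor_class=GNN,
        features_extractor_kwargs=dict(
            max_nodes=11,        # maximum number of nodes in the linkage graph
            num_features=64,     # trajectory points per node feature
            hidden_channels=64,
            out_channels=64,
        ),
    )

    model = A2C(
        policy=CustomActorCriticPolicy,
        env=env,                 # an already-constructed Mech Gym environment
        policy_kwargs=policy_kwargs,
        verbose=1,
    )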