Basic categorization of RL

Knowing the main types of RL algorithms gives us some useful insights:


Policy gradient: Learn by directly adjusting the policy to maximize the reward. E.g. shooting a basketball: if you miss (low reward), you adjust how you shoot the ball.

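Value-based: Learn a value function that estimates how good each situation or action is, then act on those estimates. E.g. we hold core values, such as knowing it is good to be optimistic. So we can still trust ourselves in difficult situations, because this value function helps us know what to do.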

Actor-Critic: Learn by having a teacher who tells us how well we are doing. E.g. when we grow up, our parents teach us to do some things (e.g. to treat others well) and not to do others (e.g. to bully others). We are the actors. Our parents are our critics. When criticized, we might adjust our policy/value function/model.

Model-based: Learn a model of our world. E.g. we know that if we release a cup in the air, it will fall to the ground due to gravity on Earth.

It is not difficult to see that we actually use all of these methods to learn.


The pictures above come from the Deep RL course at UCB.
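To make the policy gradient idea a bit more concrete, here is a minimal sketch in plain Python. It is not taken from the course: the two "shooting styles", their hit rates (0.3 and 0.7) and the function names are all assumptions made for illustration, and the update uses the exact gradient of the expected reward instead of sampled shots so that the result is deterministic.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(prefs, mean_rewards, lr):
    """One exact gradient-ascent step on the expected reward."""
    probs = softmax(prefs)
    expected = sum(p * r for p, r in zip(probs, mean_rewards))
    # For a softmax policy: d(expected)/d(pref_a) = pi(a) * (r(a) - expected)
    return [h + lr * p * (r - expected)
            for h, p, r in zip(prefs, probs, mean_rewards)]


# Hypothetical hit rates for two ways of shooting the ball.
MEAN_REWARDS = [0.3, 0.7]


def train(steps=2000, lr=0.5):
    """Start with no preference and repeatedly adjust the policy."""
    prefs = [0.0, 0.0]
    for _ in range(steps):
        prefs = policy_gradient_step(prefs, MEAN_REWARDS, lr)
    return softmax(prefs)


if __name__ == "__main__":
    # The probability of the better style should end up close to 1.
    print(train())
```

The policy is adjusted directly: a style that earns more than the current average gets a higher preference, and the other one gets a lower preference. No value function or world model is learned along the way.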


Why can RL learn from discrete rewards?

One of the reasons is that what we optimize is an expectation: reward (r) x probability (p). Because p is continuous, it smooths r.

For example, r = +1/-1 in the driving scheme below, but p might be 0.899 (if you drive 1000 times, you will fall 101 times). As a result, you get a smoothed expectation.


The picture above comes from the Deep RL course taught by Sergey Levine.
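The arithmetic behind the driving example can be sketched in a few lines of Python. This assumes that +1 is the reward for staying on the road and -1 the reward for falling; the function name is made up for this illustration.

```python
def expected_reward(p_success, r_success=1.0, r_failure=-1.0):
    """Expected reward of an episode that has only two outcomes."""
    return p_success * r_success + (1.0 - p_success) * r_failure


if __name__ == "__main__":
    # 899 safe drives out of 1000, as in the example above.
    print(expected_reward(0.899))
    # A small change in p gives a small change in the expectation,
    # even though each single reward is still either +1 or -1.
    for p in (0.898, 0.899, 0.900):
        print(p, expected_reward(p))
```

With p = 0.899 the expectation is 0.899 x (+1) + 0.101 x (-1) = 0.798. The reward itself only takes two values, but the expectation moves smoothly as p changes.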


Dive into TensorFlow (3) – Init the variable

The tf.Variable class creates a tensor with an initial value that can be modified, much like a normal Python variable. This tensor stores its state in the session, so you must initialize the state of the tensor manually.

We use the tf.global_variables_initializer() function to initialize the state of all the Variable tensors.

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

The tf.global_variables_initializer() call returns an operation that will initialize all TensorFlow variables from the graph. You call the operation using a session to initialize all the variables as shown above.


Dive into TensorFlow (2) – Feeding dataset to session

x = tf.placeholder(tf.string)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Hello World'})

The feed_dict parameter of sess.run() lets us set the placeholder tensor. The above example shows the tensor x being set to the string "Hello World".

It’s also possible to set more than one tensor using feed_dict as shown below.

x = tf.placeholder(tf.string)
y = tf.placeholder(tf.int32)
z = tf.placeholder(tf.float32)

with tf.Session() as sess:
    output = sess.run(x, feed_dict={x: 'Test String', y: 123, z: 45.67})

How to install and setup MuJoCo 1.31?

OK, first make sure your setup matches the one in the older installation guide in the MuJoCo repo.


Next, you can run the test by going to the bin directory and running ./test ../model/humanoid.xml xs 10 :

ros@ros-K401UB:~/.mujoco/mjpro131/bin$ ./test ../model/humanoid.xml xs 10

Comparison of original and saved model
Max difference : 2.61e-06
Field name : body_mass

Simulation ..........
Simulation time : 0.64 s
Realtime factor : 15.59 x
Time per step : 0.128 ms
Contacts per step : 9
Constraints per step : 39
Degrees of freedom : 27

In my case, I had to copy my mjkey.txt to ~/.mujoco/mjpro131/bin for the test to run successfully.
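If you prefer to script that copy step, a minimal Python sketch could look like the following. The helper name and the source location ~/.mujoco/mjkey.txt are assumptions; put in the path where your own key file actually lives.

```python
import os
import shutil


def ensure_license_key(key_path, bin_dir):
    """Copy key_path into bin_dir unless a key is already there.

    Returns the path of the key inside bin_dir.
    """
    target = os.path.join(bin_dir, os.path.basename(key_path))
    if not os.path.isfile(target):
        shutil.copy(key_path, target)
    return target


if __name__ == "__main__":
    # Assumed locations; adjust both to match your own machine.
    key = os.path.expanduser("~/.mujoco/mjkey.txt")
    bin_dir = os.path.expanduser("~/.mujoco/mjpro131/bin")
    if os.path.isfile(key) and os.path.isdir(bin_dir):
        print(ensure_license_key(key, bin_dir))
```

The helper leaves an existing key in the bin directory untouched, so it is safe to run more than once.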




Dive into TensorFlow (1) – Tensor and Session


In TensorFlow, data isn’t stored as integers, floats, or strings. These values are encapsulated in an object called a tensor. Tensors come in a variety of sizes as shown below:

# A is a 0-dimensional int32 tensor
A = tf.constant(1234) 
# B is a 1-dimensional int32 tensor
B = tf.constant([123,456,789]) 
# C is a 2-dimensional int32 tensor
C = tf.constant([ [123,456,789], [222,333,444] ])

tf.constant() is one of many TensorFlow operations. The tensor returned by tf.constant() is called a constant tensor, because the value of the tensor never changes.
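The "0-dimensional", "1-dimensional" and "2-dimensional" labels in the comments above simply count how deeply the values are nested. Here is a small sketch in plain Python (no TensorFlow involved; shape is a helper written just for this illustration, and it assumes the nested lists are rectangular):

```python
def shape(value):
    """Length of each nesting level of a (rectangular) nested list."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(dims)


if __name__ == "__main__":
    a = 1234
    b = [123, 456, 789]
    c = [[123, 456, 789], [222, 333, 444]]
    for value in (a, b, c):
        dims = shape(value)
        # The number of dimensions is the length of the shape.
        print(len(dims), dims)
```

The same values passed to tf.constant() give tensors with zero, one and two dimensions.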


TensorFlow’s API is built around the idea of a computational graph, a way of visualizing a mathematical process. A “TensorFlow Session” is an environment for running a graph. The session is in charge of allocating the operations to GPU(s) and/or CPU(s), including remote machines.

Hello World

import tensorflow as tf

# Create TensorFlow object called hello_constant
hello_constant = tf.constant('Hello World!')

with tf.Session() as sess:
    # Run the tf.constant operation in the session
    output = sess.run(hello_constant)

The code first creates the tensor, hello_constant, from the previous lines. The next step is to evaluate the tensor in a session.

The code creates a session instance, sess, using tf.Session. The sess.run() function then evaluates the tensor and returns the results.
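To see why building a graph and running it are two separate steps, here is a toy model of the idea in plain Python. This is not TensorFlow code and not how TensorFlow is implemented; the classes Constant, Placeholder and Add and the run function are invented for this sketch.

```python
class Node:
    """A step in the toy graph; nothing is computed until it is run."""


class Constant(Node):
    def __init__(self, value):
        self.value = value


class Placeholder(Node):
    """A slot whose value is supplied at run time."""


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right


def run(node, feed_dict=None):
    """Evaluate node, reading placeholder values from feed_dict."""
    feed_dict = feed_dict or {}
    if isinstance(node, Constant):
        return node.value
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise KeyError("no value fed for placeholder")
        return feed_dict[node]
    if isinstance(node, Add):
        return run(node.left, feed_dict) + run(node.right, feed_dict)
    raise TypeError("unknown node type")


if __name__ == "__main__":
    hello = Constant("Hello ")
    name = Placeholder()
    greeting = Add(hello, name)  # builds the graph, computes nothing yet
    print(run(greeting, feed_dict={name: "World!"}))
```

Creating greeting only describes the computation. The value appears when run walks the graph, which is roughly the role the session plays in TensorFlow.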


How to solve ResourceNotFound: gazebo_worlds error when running simulated PR2 for gps?

When I tried to launch the simulated PR2 with

roslaunch gps_agent_pkg pr2_gazebo.launch

this error occurred:



ros@ros-K401UB:~/research/gps/src/gps_agent_pkg$ roslaunch gps_agent_pkg pr2_gazebo.launch
... logging to /home/ros/.ros/log/10b10e84-bbe8-11e7-944c-82ea96c4b45e/roslaunch-ros-K401UB-7307.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

Traceback (most recent call last):
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 307, in main
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 268, in start
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 217, in _start_infrastructure
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 132, in _load_config
 roslaunch_strs=self.roslaunch_strs, verbose=self.verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 451, in load_config_default
 loader.load(f, config, verbose=verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 746, in load
 self._load_launch(launch, ros_config, is_core=core, filename=filename, argv=argv, verbose=verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 718, in _load_launch
 self._recurse_load(ros_config, launch.childNodes, self.root_context, None, is_core, verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 682, in _recurse_load
 val = self._include_tag(tag, context, ros_config, default_machine, is_core, verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 95, in call
 return f(*args, **kwds)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 625, in _include_tag
 default_machine, is_core, verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 682, in _recurse_load
 val = self._include_tag(tag, context, ros_config, default_machine, is_core, verbose)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 95, in call
 return f(*args, **kwds)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 587, in _include_tag
 inc_filename = self.resolve_args(tag.attributes['file'].value, context)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 183, in resolve_args
 return substitution_args.resolve_args(args, context=context.resolve_dict, resolve_anon=self.resolve_anon)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 316, in resolve_args
 resolved = _resolve_args(resolved, context, resolve_anon, commands)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 329, in _resolve_args
 resolved = commands[command](resolved, a, args, context)
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 142, in _find
 File "/opt/ros/indigo/lib/python2.7/dist-packages/roslaunch/", line 188, in _find_executable
 full_path = _get_executable_path(rp.get_path(args[0]), path)
 File "/usr/lib/python2.7/dist-packages/rospkg/", line 203, in get_path
 raise ResourceNotFound(name, ros_paths=self._ros_paths)
ResourceNotFound: gazebo_worlds
ROS path [0]=/opt/ros/indigo/share/ros
ROS path [1]=/home/ros/rosbuild_ws/package_dir
ROS path [2]=/opt/ros/indigo/share
ROS path [3]=/opt/ros/indigo/stacks
ROS path [4]=/home/ros/research/gps
ROS path [5]=/home/ros/research/gps/src/gps_agent_pkg

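This happens because gazebo_worlds is not available in newer ROS releases such as the Indigo install shown in the traceback. As the comment in the launch file says, ROS Hydro and later use gazebo_ros instead. So you can modify the launch file with rosed gps_agent_pkg pr2_gazebo_no_controller.launch :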

 <!-- Use the following for ROS hydro or later: <include file="$(find gazebo_ros)/launch/empty_world.launch"> -->
 <include file="$(find gazebo_ros)/launch/empty_world.launch" />
 <include file="$(find pr2_gazebo)/launch/pr2_no_controllers.launch" />
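Before launching again, it can help to list every package that a launch file looks up through $(find ...), so that each one can be checked (for example with rospack find). The following is a small sketch using only the Python standard library; the helper name is made up, and it scans the raw text, so references inside XML comments are listed too.

```python
import re

# Matches $(find some_package) and captures the package name.
FIND_PATTERN = re.compile(r"\$\(find\s+([^)\s]+)\)")


def referenced_packages(launch_text):
    """Return the sorted package names used in $(find ...) substitutions."""
    return sorted(set(FIND_PATTERN.findall(launch_text)))


if __name__ == "__main__":
    sample = (
        '<include file="$(find gazebo_ros)/launch/empty_world.launch" />\n'
        '<include file="$(find pr2_gazebo)/launch/pr2_no_controllers.launch" />\n'
    )
    for package in referenced_packages(sample):
        print(package)
```

Any name in the output that the ROS install cannot locate will lead to the same kind of ResourceNotFound error as gazebo_worlds did.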

After the modification, the error changes. This is a different problem (an import glob error).




How to solve caffe.hpp: No such file or directory error when compiling gps_agent_pkg?

When I ran make -j on the gps_agent_pkg package, I got this error:


[100%] Building CXX object CMakeFiles/gps_agent_lib.dir/src/util.cpp.o
Building CXX object CMakeFiles/gps_agent_lib.dir/src/neuralnetworkcaffe.cpp.o
Building CXX object CMakeFiles/gps_agent_lib.dir/src/caffenncontroller.cpp.o
/home/ros/research/gps/src/gps_agent_pkg/src/caffenncontroller.cpp:1:27: fatal error: caffe/caffe.hpp: No such file or directory
#include "caffe/caffe.hpp"
compilation terminated.
make[2]: *** [CMakeFiles/gps_agent_lib.dir/src/caffenncontroller.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from /home/ros/research/gps/src/gps_agent_pkg/src/neuralnetworkcaffe.cpp:1:0:
/home/ros/research/gps/src/gps_agent_pkg/include/gps_agent_pkg/neuralnetworkcaffe.h:10:27: fatal error: caffe/caffe.hpp: No such file or directory
#include "caffe/caffe.hpp"
compilation terminated.
In file included from /home/ros/research/gps/src/gps_agent_pkg/include/gps_agent_pkg/caffenncontroller.h:10:0,
from /home/ros/research/gps/src/gps_agent_pkg/src/robotplugin.cpp:16:
/home/ros/research/gps/src/gps_agent_pkg/include/gps_agent_pkg/neuralnetworkcaffe.h:10:27: fatal error: caffe/caffe.hpp: No such file or directory
#include "caffe/caffe.hpp"
compilation terminated.

This error can be solved by running make distribute in the caffe dir. After make distribute succeeded in the caffe dir, I compiled gps_agent_pkg.
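A quick way to confirm that make distribute produced the header before recompiling is to check for the file. The sketch below assumes that the headers end up under distribute/include inside the caffe dir and that ~/caffe is where Caffe was cloned; both are assumptions, and the helper names are made up.

```python
import os


def caffe_header_path(caffe_dir):
    """Where caffe.hpp is expected after make distribute (assumed layout)."""
    return os.path.join(caffe_dir, "distribute", "include", "caffe", "caffe.hpp")


def has_caffe_headers(caffe_dir):
    """True if the header that the compiler could not find is present."""
    return os.path.isfile(caffe_header_path(caffe_dir))


if __name__ == "__main__":
    # Assumed clone location; change it to your own caffe dir.
    caffe_dir = os.path.expanduser("~/caffe")
    print(caffe_header_path(caffe_dir), has_caffe_headers(caffe_dir))
```

If this prints False, the caffe/caffe.hpp include will most likely keep failing, so it is worth rerunning make distribute first.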