Add deep perception tutorial

2 years ago · 9a7ae2d269
--- a/align_to_aruco.md
+++ b/align_to_aruco.md
@@ -1,4 +1,4 @@
 # Align to ArUco
 ArUco markers are a type of fiducial marker used extensively in robotics for identification and pose estimation. In this tutorial, we will learn how to identify ArUco markers with the ArUco detection node and enable Stretch to navigate and align itself with respect to a marker.

 ## ArUco Detection
@@ -23,7 +23,7 @@ Since we want Stretch to align with respect to the marker we define a 0.5m offse

 Luckily, we already know how to command Stretch to execute a trajectory using the joint trajectory server. If you are just starting, have a look at the tutorial on commanding Stretch with the joint trajectory server.
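The alignment step can be pictured as a small planar geometry problem. The sketch below is hypothetical: it assumes the marker's position (x, y) and facing direction (yaw) are already expressed in the robot's base frame, for example through a TF lookup like the one the FrameListener class performs. It computes a goal 0.5 m in front of the marker and splits the motion into a rotation, a straight translation and a final rotation, which suits a differential-drive base like Stretch's. The function names are illustrative and are not taken from the tutorial's script.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the range [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def goal_in_front_of_marker(marker_x, marker_y, marker_yaw, offset=0.5):
    """Return (x, y, yaw) of a base goal `offset` meters in front of the marker.

    Assumes marker_yaw points along the marker's outward normal in the base
    frame, so the goal lies on that normal and the robot ends up facing the
    marker (rotated by pi relative to the normal).
    """
    goal_x = marker_x + offset * math.cos(marker_yaw)
    goal_y = marker_y + offset * math.sin(marker_yaw)
    goal_yaw = wrap_angle(marker_yaw + math.pi)
    return goal_x, goal_y, goal_yaw


def rotate_translate_rotate(goal_x, goal_y, goal_yaw):
    """Split a planar goal into (first_turn, distance, final_turn) for a differential-drive base."""
    first_turn = math.atan2(goal_y, goal_x)          # face the goal position
    distance = math.hypot(goal_x, goal_y)            # drive straight to it
    final_turn = wrap_angle(goal_yaw - first_turn)   # then turn to face the marker
    return first_turn, distance, final_turn


# Example: a marker 2 m straight ahead, facing back toward the robot.
goal = goal_in_front_of_marker(2.0, 0.0, math.pi)
print(rotate_translate_rotate(*goal))
```

In this example the goal is 1.5 m straight ahead, so both turns come out as (numerically) zero and the robot only drives forward.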

 ## Warnings
 Since we won't be using the arm for this demo, it's safer to stow Stretch's arm. Execute the command:
 ```bash
 stretch_robot_stow.py
@@ -37,6 +37,8 @@ ros2 launch stretch_core align_to_aruco.launch.py
 ```

 ## Code Breakdown
 Let's jump into the code to see how things work under the hood. Follow along here to have a look at the entire script.

 We make use of two separate Python classes for this demo. The FrameListener class is derived from the Node class and is where we compute the TF transformations. For an explanation of this class, refer to the TF listener tutorial.
 ```python
 class FrameListener(Node):
--- a/deep_perception.md
+++ b/deep_perception.md
@@ -1,21 +1,192 @@
 # Deep Perception
 Ever wondered if there is a way to make a robot do awesome things without explicitly having to program it to do so? Deep Perception is a branch of Deep Learning that enables sensing the elements that make up an environment with the help of artificial neural networks, without writing complicated code. Well, almost. The most wonderful thing about Stretch is that it comes preloaded with software that makes it a breeze to get started with topics such as Deep Learning. In this tutorial, we will deploy deep neural networks on Stretch using two popular Deep Learning frameworks, namely PyTorch and OpenVINO.

 ## YOLOv5 with PyTorch
 PyTorch is an open-source, end-to-end machine learning framework that, through PyTorch Hub, makes many pretrained, production-quality neural networks available for general use. In this tutorial, we will use the YOLOv5s model trained on the COCO dataset.

 YOLOv5 is a popular object detection model that divides a supplied image into a grid and, in a single forward pass, predicts bounding boxes and class probabilities for each grid cell. The YOLOv5s model deployed on Stretch has been pretrained on the COCO dataset, which allows Stretch to detect a wide range of day-to-day objects. But that’s not all: in this demo, we want to go a step further and use this extremely versatile object detection model to extract useful information about the scene.

 ## Extracting Bounding Boxes and Depth Information
 Often, it’s not enough to simply identify an object. Stretch is a mobile manipulator, and its job is to manipulate objects in its environment. But before it can do that, it needs to know exactly where the object is located relative to itself so that a motion plan to reach the object can be generated. This is possible by first finding which pixels correspond to the object of interest in the image frame and then using them to extract depth information in the camera frame. Once we have this information, we can transform these points into the end-effector frame for Stretch to generate a motion plan.
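To make the pixel-to-camera-frame step concrete, here is a minimal sketch using the standard pinhole camera model. It assumes a depth image in meters that is aligned to the RGB image and ignores lens distortion. The intrinsics fx, fy, cx and cy would come from the camera's CameraInfo message (its K matrix); the helper names are hypothetical, and taking the median depth inside the box is just one simple, outlier-tolerant choice. The resulting point is in the camera's optical frame (x right, y down, z forward).

```python
import statistics


def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) at depth depth_m into a 3D point in the camera optical frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m


def object_point_from_box(box, depth_image_m, fx, fy, cx, cy):
    """Estimate a 3D point for a detection from its bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels, with x_max and y_max exclusive.
    depth_image_m is a row-major grid of depths in meters; 0.0 marks missing depth.
    """
    x_min, y_min, x_max, y_max = box
    depths = [depth_image_m[v][u]
              for v in range(y_min, y_max)
              for u in range(x_min, x_max)
              if depth_image_m[v][u] > 0.0]
    if not depths:
        return None  # no valid depth inside the box
    z = statistics.median(depths)
    u_center = (x_min + x_max) / 2.0
    v_center = (y_min + y_max) / 2.0
    return deproject_pixel(u_center, v_center, z, fx, fy, cx, cy)


print(deproject_pixel(420, 240, 1.5, 600.0, 600.0, 320.0, 240.0))  # (0.25, 0.0, 1.5)
```

Using the median rather than a single center pixel keeps one missing or spurious depth reading from throwing off the estimate.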

 For the sake of brevity, we will limit the scope of this tutorial to drawing bounding boxes around objects of interest, which identify their pixels in the image frame, and drawing a detection plane corresponding to the depth pixels in the camera frame.

 ## See It In Action
 Go ahead and execute the following command to run the inference and visualize the detections in RViz:

 ```bash
 ros2 launch stretch_deep_perception stretch_detect_objects.launch.py
 ```

 Voila! You just executed your first deep learning model on Stretch!

 ## Code Breakdown
 Luckily, the stretch_deep_perception package is extremely modular and is designed to work with a wide array of detectors. Although most of the heavy lifting in this tutorial is done by the neural network, let's break down the code into functional blocks to understand the detection pipeline.

 The control flow begins with executing the detect_objects.py node. In the main() function, we create an instance of the ObjectDetector class from the object_detect_pytorch.py script, where the YOLOv5s model is configured. Next, we pass this detector to an instance of the DetectionNode class from the detection_node.py script and call its main() method.
 ```python
 def main():
    # od is object_detect_pytorch.py and dn is detection_node.py
    confidence_threshold = 0.0  # overrides the ObjectDetector default of 0.2
    detector = od.ObjectDetector(confidence_threshold=confidence_threshold)
    default_marker_name = 'object'
    node_name = 'DetectObjectsNode'
    topic_base_name = 'objects'
    fit_plane = False
    node = dn.DetectionNode(detector, default_marker_name, node_name, topic_base_name, fit_plane)
    node.main()
 ```
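Because the detection node appears to rely only on the detector's apply_to_image() method, swapping detectors is cheap. The sketch below is hypothetical: it spells out that interface with a typing.Protocol and plugs in a toy detector, assuming, as in the excerpts in this tutorial, that apply_to_image(rgb_image, draw_output) returns a (detections, output_image) pair. ConstantDetector and process_frame are illustrative names, not part of stretch_deep_perception.

```python
from typing import Any, Dict, List, Optional, Protocol, Tuple

Detection = Dict[str, Any]


class Detector(Protocol):
    """The interface a detector is assumed to expose to the detection node."""

    def apply_to_image(self, rgb_image: Any, draw_output: bool = False) -> Tuple[List[Detection], Optional[Any]]:
        ...


class ConstantDetector:
    """Toy detector that reports a single box covering the whole image."""

    def __init__(self, label: str = 'object') -> None:
        self.label = label

    def apply_to_image(self, rgb_image, draw_output=False):
        height, width = len(rgb_image), len(rgb_image[0])
        detections = [{'label': self.label, 'confidence': 1.0, 'box': (0, 0, width, height)}]
        # Only produce an annotated copy when asked to, mirroring draw_output.
        output_image = [row[:] for row in rgb_image] if draw_output else None
        return detections, output_image


def process_frame(detector: Detector, rgb_image, debug_output=False):
    """Run any detector on one frame; return its labels and whether an output image exists."""
    detections_2d, output_image = detector.apply_to_image(rgb_image, draw_output=debug_output)
    return [d['label'] for d in detections_2d], output_image is not None


print(process_frame(ConstantDetector('face'), [[0, 0], [0, 0]]))  # (['face'], False)
```

Any class with a matching apply_to_image() satisfies the protocol structurally, without inheriting from it, which mirrors how the object and face detectors can share one node.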

 Let's skim through the object_detect_pytorch.py script to understand the configuration. The constructor is where we load the pretrained YOLOv5s model using the torch.hub.load() PyTorch wrapper. The constructor's default confidence threshold is 0.2, which means a detection is only considered valid if its confidence is higher than 0.2 (note that main() above overrides this by passing 0.0). This value can be tweaked, although lower values often result in false positives and higher values often discard blurry or smaller valid objects.
 ```python
 import torch

 class ObjectDetector:
    def __init__(self, confidence_threshold=0.2):
        # Load the pretrained YOLOv5s model from PyTorch Hub
        self.model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom
        self.confidence_threshold = confidence_threshold
 ```
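The effect of the threshold is easy to see in isolation. This hypothetical helper keeps only detections whose confidence is strictly greater than the threshold, matching the rule described above; it assumes each detection is a dictionary with a 'confidence' key and is not the package's own filtering code.

```python
def filter_by_confidence(detections, confidence_threshold=0.2):
    """Keep detections whose confidence is strictly greater than the threshold."""
    return [d for d in detections if d['confidence'] > confidence_threshold]


detections = [
    {'label': 'cup', 'confidence': 0.15},
    {'label': 'bottle', 'confidence': 0.2},
    {'label': 'person', 'confidence': 0.91},
]
print([d['label'] for d in filter_by_confidence(detections)])       # ['person']
print([d['label'] for d in filter_by_confidence(detections, 0.0)])  # ['cup', 'bottle', 'person']
```

With a threshold of 0.0, as passed by main(), every detection with a positive confidence survives.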
 The apply_to_image() method passes each RGB image from the RealSense camera to the YOLOv5s model and returns the detections, one dictionary per detection, each consisting of the class_id, label, confidence and bounding box coordinates. The bounding box coordinates are exactly what we need for further computations.
 ```python
 def apply_to_image(self, rgb_image, draw_output=False):
        results = self.model(rgb_image)
 
        ...

        if draw_output:
            output_image = rgb_image.copy()
            for detection_dict in results:
                self.draw_detection(output_image, detection_dict)

        return results, output_image
 ```
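The elided part of apply_to_image() turns the model's raw output into the dictionaries described above. As a hedged sketch of that conversion, assume each detection arrives as a row [x_min, y_min, x_max, y_max, confidence, class_id], the layout YOLOv5's results object exposes through results.xyxy (converted to plain lists with .tolist()), together with class names indexable by class id. The key name 'box' and the rounding to integer pixels are assumptions for illustration.

```python
def rows_to_detections(rows, class_names):
    """Convert [x_min, y_min, x_max, y_max, confidence, class_id] rows into detection dicts.

    class_names can be a list or a dict, as long as it is indexable by integer class id.
    """
    detections = []
    for x_min, y_min, x_max, y_max, confidence, class_id in rows:
        class_id = int(class_id)
        detections.append({
            'class_id': class_id,
            'label': class_names[class_id],
            'confidence': float(confidence),
            # Round to integer pixel coordinates so the box can index images directly.
            'box': (round(x_min), round(y_min), round(x_max), round(y_max)),
        })
    return detections


rows = [[10.4, 20.6, 110.0, 220.2, 0.87, 0.0]]
print(rows_to_detections(rows, ['person', 'bicycle']))
```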

 This method calls the draw_detection() method, which uses OpenCV to draw a bounding box over each detected object, labeled with the object's class and confidence score.
 ```python
 def draw_detection(self, image, detection_dict):
        ...

        cv2.rectangle(image, (x_min, y_min), (x_max, y_max), color, rectangle_line_thickness)

        ...

        cv2.rectangle(image, (label_x_min, label_y_min), (label_x_max, label_y_max), (255, 255, 255), cv2.FILLED)
        cv2.putText(image, output_string, (text_x, text_y), font, font_scale, line_color, line_width, cv2.LINE_AA)
 ```

 Next, the detection_node.py script contains the DetectionNode class, the main ROS node that subscribes to the RGB and depth images from the RealSense camera and feeds them to the detector to run inference. The image_callback() method is called for each set of synchronized RGB and depth images. The RGB image is rotated 90 degrees clockwise and passed to the apply_to_image() method. The returned output image is published on the visualize_object_detections_pub publisher, while the detections_2d results are passed to the detections_2d_to_3d() method for further processing and drawing the detection plane. For detectors that also return markers and axes, the node publishes this information as well.
 ```python
 def image_callback(self, ros_rgb_image, ros_depth_image, rgb_camera_info):
        ...

        detection_box_image = cv2.rotate(self.rgb_image, cv2.ROTATE_90_CLOCKWISE)

        ...
        
        detections_2d, output_image = self.detector.apply_to_image(detection_box_image, draw_output=debug_output)

        ...

        if output_image is not None:
            output_image = ros2_numpy.msgify(Image, output_image, encoding='rgb8')
            if output_image is not None:
                self.visualize_object_detections_pub.publish(output_image)

        detections_3d = d2.detections_2d_to_3d(detections_2d, self.rgb_image, self.camera_info, self.depth_image, fit_plane=self.fit_plane, min_box_side_m=self.min_box_side_m, max_box_side_m=self.max_box_side_m)

        ...
        
        if self.publish_marker_point_clouds: 
            for marker in self.marker_collection:
                marker_points = marker.get_marker_point_cloud()
                self.add_point_array_to_point_cloud(marker_points)
                publish_plane_points = False
                if publish_plane_points: 
                    plane_points = marker.get_plane_fit_point_cloud()
                    self.add_point_array_to_point_cloud(plane_points)
            self.publish_point_cloud()
        self.visualize_markers_pub.publish(marker_array)
        if axes_array is not None: 
            self.visualize_axes_pub.publish(axes_array)
 ```
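Because detection runs on a copy of the image rotated 90 degrees clockwise, box coordinates found in that copy have to be mapped back to the original orientation before they can index the unrotated depth image. The sketch below shows that arithmetic on plain nested lists. It is an illustration that assumes inclusive pixel coordinates, not the code detections_2d_to_3d() uses, and the helper names are hypothetical.

```python
def rotate_90_clockwise(image):
    """Rotate a row-major image (list of rows) 90 degrees clockwise, like cv2.ROTATE_90_CLOCKWISE."""
    height, width = len(image), len(image[0])
    # The rotated image has `width` rows and `height` columns.
    return [[image[height - 1 - x_r][y_r] for x_r in range(height)] for y_r in range(width)]


def rotated_to_original(x_r, y_r, original_height):
    """Map pixel (x_r, y_r) in the clockwise-rotated image back to (x, y) in the original."""
    return y_r, original_height - 1 - x_r


def rotated_box_to_original(box, original_height):
    """Map an inclusive (x_min, y_min, x_max, y_max) box from the rotated image to the original."""
    x_min, y_min, x_max, y_max = box
    x_a, y_a = rotated_to_original(x_min, y_min, original_height)
    x_b, y_b = rotated_to_original(x_max, y_max, original_height)
    return min(x_a, x_b), min(y_a, y_b), max(x_a, x_b), max(y_a, y_b)


image = [[1, 2, 3],
         [4, 5, 6]]
print(rotate_90_clockwise(image))  # [[4, 1], [5, 2], [6, 3]]
```

Only the two opposite corners need mapping; re-sorting them into min/max order restores a valid box in the original image.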

 ## Face Detection, Facial Landmarks Detection and Head Pose Estimation with OpenVINO and OpenCV
 OpenVINO is a toolkit developed by Intel to optimize and deploy machine learning inference, and it can use hardware accelerators such as the Intel Neural Compute Stick on Intel-based computers. Conveniently, most of the deep learning models in the Open Model Zoo are accessible and configurable through the familiar OpenCV API with the opencv-python-inference-engine library.

 With that, let’s jump right into it! Detecting objects is just one thing Stretch can do well; with pretrained models, it can do much more. For this part of the tutorial, we will be using Intel’s OpenVINO toolkit with OpenCV. The cool thing about this demo is that it uses three different models in tandem to not only detect human faces, but also locate important facial features such as the eyes, nose and lips, and estimate head pose. This is important for precise manipulation tasks such as assisted feeding, where we want to know the exact location of the facial features the end effector must reach.

 ## See It In Action
 First, let’s execute the following command to see what it looks like:

 ```bash
 ros2 launch stretch_deep_perception stretch_detect_faces.launch.py
 ```

 ## Code Breakdown
 Ain't that something! If you followed the object detection breakdown, you'll find that the only change needed to detect faces instead of objects is a different deep learning model that does just that. For this, we will explore how to use the OpenVINO toolkit. Let's head to the detect_faces.py node to begin.

 In the main() method, we see a structure similar to the object detection node. We first create an instance of the detector using the HeadPoseEstimator class from the head_estimator.py script to configure the deep learning models. Next, we pass this to an instance of the DetectionNode class from the detection_node.py script and call its main() method.
 ```python
    ...
        
    detector = he.HeadPoseEstimator(models_directory,
                                    use_neural_compute_stick=use_neural_compute_stick)
    default_marker_name = 'face'
    node_name = 'DetectFacesNode'
    topic_base_name = 'faces'
    fit_plane = False
    node = dn.DetectionNode(detector,
                            default_marker_name,
                            node_name,
                            topic_base_name,
                            fit_plane,
                            min_box_side_m=min_head_m,
                            max_box_side_m=max_head_m)
    node.main()
 ```
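The face node passes min_box_side_m and max_box_side_m (min_head_m and max_head_m) to DetectionNode. A natural use for such bounds is rejecting detections whose physical size is implausible for a head. The hypothetical sketch below estimates a box's metric width and height with the pinhole model, assuming a known depth for the box and focal lengths fx and fy in pixels. It illustrates the idea and does not claim to reproduce how detections_2d_to_3d() applies these limits.

```python
def box_side_lengths_m(box, depth_m, fx, fy):
    """Approximate metric width and height of a pixel box seen at depth depth_m (pinhole model)."""
    x_min, y_min, x_max, y_max = box
    width_m = (x_max - x_min) * depth_m / fx
    height_m = (y_max - y_min) * depth_m / fy
    return width_m, height_m


def has_plausible_size(box, depth_m, fx, fy, min_side_m, max_side_m):
    """True if both box sides fall within [min_side_m, max_side_m] meters."""
    width_m, height_m = box_side_lengths_m(box, depth_m, fx, fy)
    return all(min_side_m <= side <= max_side_m for side in (width_m, height_m))


face_box = (200, 100, 300, 220)  # 100 x 120 pixels
print(has_plausible_size(face_box, 1.0, 500.0, 500.0, 0.1, 0.4))  # True: about 0.2 m x 0.24 m
print(has_plausible_size(face_box, 3.0, 500.0, 500.0, 0.1, 0.4))  # False: about 0.6 m x 0.72 m
```

The same pixel box means a very different physical size at different depths, which is why a metric check can catch false positives that a pixel-size check cannot.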

 In addition to detecting faces, this class also detects facial landmarks and estimates head pose. The constructor initializes and configures three separate models, namely head_detection_model, head_pose_model and landmarks_model, using the renamed_cv2.dnn.readNetFromCaffe() and renamed_cv2.dnn.readNet() wrappers. Note that renamed_cv2 is simply the opencv_python_inference_engine library compiled under a different namespace for use with Stretch, so that it does not conflict with the regular OpenCV library and users can access functionality from both concurrently.
 ```python
 class HeadPoseEstimator:
    def __init__(self, models_directory, use_neural_compute_stick=False):
        ...

        self.head_detection_model = renamed_cv2.dnn.readNetFromCaffe(head_detection_model_prototxt_filename, head_detection_model_caffemodel_filename)
        dm.print_model_info(self.head_detection_model, 'head_detection_model')
        
        ...

        self.head_pose_model = renamed_cv2.dnn.readNet(head_pose_weights_filename, head_pose_config_filename)
        
        ...

        self.landmarks_model = renamed_cv2.dnn.readNet(landmarks_weights_filename, landmarks_config_filename)
 ```

 The apply_to_image() method calls the individual methods detect_faces(), estimate_head_pose() and detect_facial_landmarks(), each of which runs inference using one of the models configured above. The bounding boxes from the face detection model are used to crop the faces out of the image and supply them to the head pose and facial landmark models, so those models only need to process the face regions rather than the whole image.
 ```python
 def apply_to_image(self, rgb_image, draw_output=False):
        ...

        boxes = self.detect_faces(rgb_image)
        
        facial_landmark_names = self.landmark_names.copy()
        for bounding_box in boxes:
            if draw_output: 
                self.draw_bounding_box(output_image, bounding_box)
            yaw, pitch, roll = self.estimate_head_pose(rgb_image, bounding_box, enlarge_box=True, enlarge_scale=1.15)
            if yaw is not None: 
                ypr = (yaw, pitch, roll)
                if draw_output: 
                    self.draw_head_pose(output_image, yaw, pitch, roll, bounding_box)
            else:
                ypr = None
            landmarks, landmark_names = self.detect_facial_landmarks(rgb_image, bounding_box, enlarge_box=True, enlarge_scale=1.15)
            if (landmarks is not None) and draw_output: 
                self.draw_landmarks(output_image, landmarks)
            heads.append({'box':bounding_box, 'ypr':ypr, 'landmarks':landmarks})

        return heads, output_image
 ```
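The calls above pass enlarge_box=True and enlarge_scale=1.15 before cropping faces for the head pose and landmark models. One plausible reading, sketched here with hypothetical helpers, is that the face box is grown by 15% about its center and clamped to the image, so the cropped patch includes the whole head. Boxes are assumed to be (x_min, y_min, x_max, y_max) with exclusive maxima.

```python
import math


def enlarge_box(box, scale, image_width, image_height):
    """Grow a box by `scale` about its center and clamp it to the image bounds."""
    x_min, y_min, x_max, y_max = box
    center_x = (x_min + x_max) / 2.0
    center_y = (y_min + y_max) / 2.0
    half_width = (x_max - x_min) * scale / 2.0
    half_height = (y_max - y_min) * scale / 2.0
    # Round outward so the enlarged box never ends up smaller than intended.
    return (max(0, math.floor(center_x - half_width)),
            max(0, math.floor(center_y - half_height)),
            min(image_width, math.ceil(center_x + half_width)),
            min(image_height, math.ceil(center_y + half_height)))


def crop(image, box):
    """Crop a row-major image (list of rows) to an (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return [row[x_min:x_max] for row in image[y_min:y_max]]


print(enlarge_box((100, 100, 200, 200), 1.15, 640, 480))  # (92, 92, 208, 208)
```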

 The detection_node.py script then takes over, as we saw in the object detection section, to publish the detections on the pertinent topics.

 Now go ahead and experiment with a few more pretrained models using PyTorch or OpenVINO on Stretch. If you are feeling extra motivated, try creating your own neural networks and training them. Stretch is ready to deploy them.
--- a/obstacle_avoider.md
+++ b/obstacle_avoider.md
@@ -1,11 +1,11 @@
 # Obstacle Avoider
 In this tutorial, we will work with Stretch to detect and avoid obstacles using the onboard RPlidar A1 laser scanner and learn how to filter laser scan data. If you want to know more about the laser scanner setup on Stretch and how to get it up and running, we recommend visiting the previous tutorials on filtering laser scans and mobile base collision avoidance.

 A major drawback of any ToF (Time of Flight) sensor is its inherent inaccuracy, caused by occlusions and by the odd reflection and diffraction phenomena the light pulses are subject to in an unstructured environment. This results in unexpected and undesired noise that can get in the way of an otherwise extremely useful sensor. Fortunately, it is easy to account for and eliminate these inaccuracies to a great extent by filtering out the noise. We will do this with a ROS package called laser_filters that comes prebuilt with some pretty handy laser scan message filters.

 We will look at three filters from this package that have been tuned to work well with Stretch in an array of scenarios. By the end of this tutorial, you will be able to tweak them for your particular use case and publish and visualize the filtered scans on the /scan_filtered topic using RViz. So let’s jump in!

 ## LaserScan Filtering
 LaserScanAngularBoundsFilterInPlace - This filter removes laser scan readings that fall within a given angular range. For Stretch, we use this filter to discard points that hit the mast, because the mast is part of Stretch’s body and not an obstacle we need to account for while navigating the mobile base.
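To see what this filter does, here is a simplified sketch over a plain list of ranges. It assumes the usual LaserScan convention that reading i was taken at angle_min + i * angle_increment, and it marks readings inside the bounds as infinity; the actual laser_filters implementation may use a different sentinel value, and the function name is hypothetical.

```python
import math


def angular_bounds_filter(ranges, angle_min, angle_increment, lower_angle, upper_angle):
    """Invalidate readings whose beam angle lies strictly between lower_angle and upper_angle."""
    filtered = []
    for i, r in enumerate(ranges):
        angle = angle_min + i * angle_increment
        # Readings that hit a known part of the robot (e.g. the mast) become "no return".
        filtered.append(math.inf if lower_angle < angle < upper_angle else r)
    return filtered


print(angular_bounds_filter([1.0, 1.0, 1.0, 1.0, 1.0], -2.0, 1.0, -0.5, 1.5))
# [1.0, 1.0, inf, inf, 1.0]
```

Filtering "in place" keeps the scan the same length, so the index-to-angle mapping stays valid for downstream consumers.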

 LaserScanSpeckleFilter - We use this filter to remove phantom detections in the middle of empty space that are a result of reflections around corners. These disjoint speckles can be detected as false positives and result in jerky motion of the base through empty space. Removing them returns a relatively noise-free scan.
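The idea behind speckle removal can be sketched in a few lines. This is a deliberately simplified, hypothetical version, not the algorithm laser_filters implements: a finite reading is kept only if at least one neighbor within `window` beams reports a range within `max_range_difference` meters of it, so isolated returns floating in empty space are dropped.

```python
import math


def remove_speckles(ranges, max_range_difference=0.1, window=1):
    """Drop isolated returns: keep a finite reading only if a nearby beam agrees with it."""
    n = len(ranges)
    filtered = list(ranges)
    for i, r in enumerate(ranges):
        if not math.isfinite(r):
            continue  # already "no return"
        neighbors = [ranges[j] for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]
        if not any(math.isfinite(q) and abs(q - r) <= max_range_difference for q in neighbors):
            filtered[i] = math.inf  # no supporting neighbor: treat as a speckle
    return filtered


scan = [math.inf, math.inf, 2.0, math.inf, math.inf, 1.0, 1.05, 1.02]
print(remove_speckles(scan))  # [inf, inf, inf, inf, inf, 1.0, 1.05, 1.02]
```

The lone 2.0 m return has no agreeing neighbor and is removed, while the three consecutive returns near 1 m support each other and survive.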
@@ -14,7 +14,7 @@ LaserScanBoxFilter - Stretch is prone to returning false detections right over t

 However, beware that filtering laser scans comes at the cost of a sparser scan that might not be ideal for all applications. If you want to tweak the values for your end application, you could do so by changing the values in the laser_filter_params.yaml file and by following the laser_filters package wiki. Also, if you are feeling zany and want to use the raw unfiltered scans from the laser scanner, simply subscribe to the /scan topic instead of the /scan_filtered topic.

 ## Avoidance logic
 Now, let’s use what we have learned so far to upgrade the collision avoidance demo so that Stretch is able to scan an entire room autonomously without bumping into things or people. To account for dynamic obstacles getting too close to the robot, we will define a keepout distance of 0.4 m: detections below this value stop the robot. To keep Stretch from getting too close to static obstacles, we will define another variable, the turning distance, of 0.75 m: frontal detections below this value make Stretch turn to the left until it sees a clear path ahead.

 Building on the teleoperation using velocity commands tutorial, let's implement simple obstacle avoidance logic. The logic can be broken down into three steps:
@@ -22,7 +22,7 @@ Building up on the teleoperation using velocity commands tutorial, let's impleme
 2. If the minimum value from the frontal scans is less than 0.75 m then turn to the right until this is no longer true
 3. If the minimum value from the overall scans is less than 0.4 m then stop the robot
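The decision logic above can be sketched as a pure function that turns the frontal and overall ranges into a velocity command. This is a hypothetical sketch, not the script from this tutorial: the forward and turning speeds are placeholders, turning in place is an assumption, the keepout check runs first so that stopping always wins, and the sign of turn_speed selects the turn direction (under ROS conventions a positive angular z rotates the base counterclockwise, i.e. to the left). In a node, the returned pair would fill the linear.x and angular.z fields of a Twist message.

```python
def avoidance_command(front_ranges, all_ranges, turning_distance=0.75, keepout=0.4,
                      forward_speed=0.1, turn_speed=0.5):
    """Return (linear_x, angular_z) for one laser scan.

    front_ranges: readings from the frontal sector; all_ranges: every reading.
    Both are assumed non-empty; inf marks "no return".
    """
    if min(all_ranges) < keepout:
        return 0.0, 0.0            # something is inside the keepout distance: stop
    if min(front_ranges) < turning_distance:
        return 0.0, turn_speed     # obstacle ahead: rotate in place until the front clears
    return forward_speed, 0.0      # clear path: drive forward


print(avoidance_command([2.0, float('inf')], [2.0, 1.0]))  # (0.1, 0.0)
print(avoidance_command([0.6], [0.6, 1.5]))               # (0.0, 0.5)
print(avoidance_command([1.2], [0.3, 1.2]))               # (0.0, 0.0)
```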

 ## Warnings
 If you see Stretch try to run over your lazy cat or headbutt a wall, just press the bright runstop button on Stretch's head to calm it down. For pure navigation tasks, it's also safer to stow Stretch's arm. Execute the command:
 ```bash
 stretch_robot_stow.py
@@ -35,6 +35,8 @@ ros2 launch stretch_core rplidar_keepout.launch.py
 ```

 ## Code Breakdown
 Let's jump into the code to see how things work under the hood. Follow along here to have a look at the entire script.

 The turning distance is defined by the distance attribute and the keepout distance is defined by the keepout attribute.
 ```python
        self.distance = 0.75 # robot turns at this distance