# Tutorial Stretch OpenAI Chat

This tutorial introduces the OpenAI API and explains how to use it with Stretch to perform basic movements.

## Explore the API
The OpenAI API gives you access to a range of [models](https://platform.openai.com/docs/models) with different capabilities and price points, as well as the ability to fine-tune custom models. To understand what we can do with it, it helps to know a few key concepts; here we focus only on text generation models and tokens.

OpenAI's text generation models, such as GPT-4 and GPT-3.5, have been trained extensively to understand both natural and formal language. Given an input, often called a "prompt", these models generate text in response. Using a model like GPT-4 effectively means designing prompts, which serve as instructions or examples that guide the model to complete a given task. GPT-4 can be applied to a wide range of tasks, including content or code generation, summarization, conversation, and creative writing.
Text generation models process text in chunks called tokens, which represent commonly occurring sequences of characters. You can use OpenAI's [tokenizer tool](https://platform.openai.com/tokenizer) to see how specific strings are split into tokens. Why do tokens matter? Because using the API costs money, and the cost depends on the number of tokens you use and on the model (text generation, image, or audio). The good news is that text generation is not very expensive and is really useful. Just be careful with image generation: images are priced per image, while text is priced per 1,000 tokens and audio is billed per minute. See OpenAI's [pricing](https://openai.com/pricing) page for more information.
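Since text is billed per 1,000 tokens, estimating the cost of a request is simple arithmetic. The sketch below assumes separate input (prompt) and output (completion) prices, as OpenAI lists for its chat models; the prices in the example are placeholders, not real rates. For an actual request, the Chat Completion response reports its token counts in `response.usage.prompt_tokens` and `response.usage.completion_tokens`.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one request for a model priced per 1,000 tokens.

    Prices are passed in because they differ between models and change
    over time; look them up on OpenAI's pricing page.
    """
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Hypothetical prices for illustration only, NOT OpenAI's actual rates.
cost = estimate_cost(prompt_tokens=350, completion_tokens=150,
                     input_price_per_1k=0.001, output_price_per_1k=0.002)
print(f"Estimated cost: ${cost:.5f}")
```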
In this tutorial, we use the GPT-3.5-turbo model, one of the newer text generation models alongside GPT-4 and GPT-4-turbo, together with the [Chat Completion API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api). This lets us "chat" with Stretch and command basic movements using natural language. If you want to learn more and maybe build your own applications, take a look at [these examples](https://platform.openai.com/examples).

## Chat Completion API

Before jumping into the Stretch Chat code, it's worth understanding how the [Chat Completion API](https://platform.openai.com/docs/guides/text-generation/chat-completions-api) works. Take a look at this example from the documentation:
```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"}
    ]
)
```
As you can see, a chat completion uses different roles (the system, the user, and the assistant), each with its own content. These roles are the basis of the Chat Completion API and of any chatbot you build with it:

1. The system: direct instructions that set the model's behavior. Short, clear instructions often work best, but if the model needs a lot of context to understand the task, you have to be more specific (you'll see this in the tutorial).
2. The user: your input, which the model responds to. It can be a question or an ordinary conversation. The answer also depends on the system message: if the system tells the model to focus only on robotics and you ask about chemistry or biotechnology, it may reply that it cannot process your request.
3. The assistant: the model's responses. You can also write assistant messages yourself, as pre-crafted responses that help the model understand what you want; the tutorial below uses this role to describe Stretch's motions.
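One detail worth knowing: the Chat Completion API does not remember earlier requests, so the whole conversation (the system message plus previous user and assistant turns) is sent with every call. That is why the documentation example above includes the earlier question and answer. The hypothetical `build_messages` helper below only assembles the `messages` list; it makes no API call.

```python
def build_messages(system_prompt, history, user_input):
    """Return the full message list for one Chat Completion request.

    history is a list of earlier {"role": ..., "content": ...} dicts
    (alternating user and assistant turns). It is copied, not modified.
    """
    messages = [{"role": "system", "content": system_prompt}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_input})
    return messages


history = [
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
]
messages = build_messages("You are a helpful assistant.", history, "Where was it played?")
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']

# After calling the API, you would keep both new turns for the next request:
# history.append({"role": "user", "content": "Where was it played?"})
# history.append({"role": "assistant", "content": response.choices[0].message.content})
```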
## Stretch Mobility with OpenAI

!!! note
    For your safety, place Stretch in an open area when you try this tutorial.

In this tutorial, we guide Stretch to move around and perform actions by typing instructions in natural language in the terminal. Copy the Python code below into a file in your own folder. We only use Stretch Body and the OpenAI Python library. If you haven't installed the library yet, don't worry: follow [this link](https://platform.openai.com/docs/quickstart/developer-quickstart) and read the quickstart guide, where you can also create an OpenAI account and set up your API key. The API key is important and belongs only to you, so be careful where you store it! To install the library, run this in your terminal:
```{.bash .shell-prompt}
pip3 install --upgrade openai
```

Now, on to the code:
```python
from openai import OpenAI
from stretch_body import robot
import time

client = OpenAI(api_key="OPEN_AI_KEY")  # <---------- USE YOUR API KEY HERE

def move_forward(robot):
    robot.base.translate_by(0.2)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def move_backward(robot):
    robot.base.translate_by(-0.2)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def turn_right(robot):
    robot.base.rotate_by(-1.57)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def turn_left(robot):
    robot.base.rotate_by(1.57)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def arm_front(robot):
    robot.arm.move_by(0.1)
    robot.push_command()
    robot.arm.wait_until_at_setpoint()
    time.sleep(0.1)

def arm_back(robot):
    robot.arm.move_by(-0.1)
    robot.push_command()
    robot.arm.wait_until_at_setpoint()
    time.sleep(0.1)

def lift_up(robot):
    robot.lift.move_by(0.1)
    robot.push_command()
    robot.lift.wait_until_at_setpoint()
    time.sleep(0.1)

def lift_down(robot):
    robot.lift.move_by(-0.1)
    robot.push_command()
    robot.lift.wait_until_at_setpoint()
    time.sleep(0.1)

stretch_actions = {"move_forward": "Move the robot forward 0.2m",
                   "move_backward": "Move the robot backward 0.2m",
                   "turn_right": "Turn the robot 90 degrees clockwise",
                   "turn_left": "Turn the robot 90 degrees counterclockwise",
                   "arm_front": "Move the arm to the front 0.1m",
                   "arm_back": "Move the arm to the back 0.1m",
                   "lift_up": "Move the lift up 0.1m",
                   "lift_down": "Move the lift down 0.1m"}

stretch_actions_fn = {"move_forward": move_forward,
                      "move_backward": move_backward,
                      "turn_right": turn_right,
                      "turn_left": turn_left,
                      "arm_front": arm_front,
                      "arm_back": arm_back,
                      "lift_up": lift_up,
                      "lift_down": lift_down}

def chatter(input_text):
    # Populate the assistant prompt with a description of every motion
    assistance_msg = "Here is the description for each robot motion"
    for k in stretch_actions.keys():
        assistance_msg = assistance_msg + f"\n - {k} : {stretch_actions[k]} "
    # Define the system prompt (the personality of the system)
    system_prompt = (
        "Assume you are a mobile robot that can receive natural language instructions "
        "about the robot's movements. Based on your understanding of the instructions, "
        f"return a sequence of discrete actions from this list: {list(stretch_actions.keys())}. "
        "Your output must have exactly two parts. In the first part, explain the sequence "
        "of actions to follow and the reason behind it. The second part must be only the "
        "list of actions, separated by commas and enclosed in square brackets '[]'. "
        "The second part is not an explanation; it must contain only the list of movements."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "assistant", "content": assistance_msg},
            {"role": "user", "content": input_text},
        ],
    )
    response_text = response.choices[0].message.content.strip().lower()
    print(f"CHATGPT RESPONSE: {response_text}")
    return response_text

def extract_action_sequence(response_text):
    # In the response text, find the list that starts with "[" and ends with "]"
    start_index = response_text.find("[")
    end_index = response_text.find("]", start_index)
    if start_index != -1 and end_index != -1:
        movements_str = response_text[start_index + 1:end_index]
        # Split the comma-separated movements into a list. .strip("'\x22") removes
        # single and double quotes from the beginning and end of each movement.
        formatted_list = [movement.strip().strip("'\x22") for movement in movements_str.split(",")]
        print(formatted_list)
        return formatted_list
    else:
        print("List of movements not found in the response.")
        return []

def execute_robot_motions(final_motion_list):
    rb = robot.Robot()
    rb.startup()
    try:
        print("Starting to execute motions....")
        for motion_key in final_motion_list:
            # Skip names the model invented (or the empty string left by "[]")
            if motion_key not in stretch_actions_fn:
                print(f"Skipping unknown motion: {motion_key!r}")
                continue
            print(f"Executing motion: {motion_key}")
            stretch_actions_fn[motion_key](rb)
        print("Completed Executing motions...")
    finally:
        # Always release the robot, even if a motion raises an exception
        rb.stop()

def stretch_chatter(input_text):
    response = chatter(input_text)
    motion_list = extract_action_sequence(response)
    execute_robot_motions(motion_list)

while True:
    response = input("What motion can I do for you?\n")
    stretch_chatter(response)
```
### The code explained

Now let's break the code down.

```python
from openai import OpenAI
from stretch_body import robot
import time

client = OpenAI(api_key="OPEN_AI_KEY")  # <---------- USE YOUR API KEY HERE
```
You need to import `OpenAI` from the `openai` library to use the API, and `robot` from Stretch Body to move the robot. Don't forget to replace `OPEN_AI_KEY` with your own secret key; without a valid key, the API requests will fail.
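If you'd rather not paste the key into the script (for example, if you share or commit the file), one option is to read it from an environment variable. The sketch below assumes the variable is called `OPENAI_API_KEY`, the name the `openai` library itself checks when `OpenAI()` is created without an `api_key` argument; `get_openai_api_key` is a hypothetical helper, not part of either library. You can set the variable in your terminal with `export OPENAI_API_KEY="your-key"` before running the script.

```python
import os


def get_openai_api_key(var_name="OPENAI_API_KEY"):
    """Read the API key from an environment variable instead of the source file."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your OpenAI API key.")
    return key


# Usage in the tutorial script (requires the openai package):
# from openai import OpenAI
# client = OpenAI(api_key=get_openai_api_key())
```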
```python
def move_forward(robot):
    robot.base.translate_by(0.2)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def move_backward(robot):
    robot.base.translate_by(-0.2)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def turn_right(robot):
    robot.base.rotate_by(-1.57)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)

def turn_left(robot):
    robot.base.rotate_by(1.57)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)
```
We need a function for every movement. For the base, Stretch needs to move forward, move backward, turn right 90 degrees, or turn left 90 degrees. Keep in mind that rotations are measured in radians (1.57 rad is roughly 90 degrees), so if you want to change the angle, convert it from degrees first.
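To turn by an angle other than 90 degrees, Python's standard `math.radians` handles the conversion. The sketch below is a hypothetical `turn_by_degrees` helper written in the same style as the functions above; it assumes, as `turn_left` and `turn_right` do, that a positive value passed to `robot.base.rotate_by` turns the robot counterclockwise (left).

```python
import math
import time


def turn_by_degrees(robot, degrees):
    """Rotate the base by an angle given in degrees (positive = left/counterclockwise).

    Hypothetical helper: robot.base.rotate_by() expects radians, so the
    angle is converted with math.radians before it is sent.
    """
    robot.base.rotate_by(math.radians(degrees))
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)


# 90 degrees is pi/2 rad, which the tutorial rounds to 1.57.
print(round(math.radians(90), 4))   # 1.5708
print(round(math.radians(-45), 4))  # -0.7854
```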
```python
def arm_front(robot):
    robot.arm.move_by(0.1)
    robot.push_command()
    robot.arm.wait_until_at_setpoint()
    time.sleep(0.1)

def arm_back(robot):
    robot.arm.move_by(-0.1)
    robot.push_command()
    robot.arm.wait_until_at_setpoint()
    time.sleep(0.1)

def lift_up(robot):
    robot.lift.move_by(0.1)
    robot.push_command()
    robot.lift.wait_until_at_setpoint()
    time.sleep(0.1)

def lift_down(robot):
    robot.lift.move_by(-0.1)
    robot.push_command()
    robot.lift.wait_until_at_setpoint()
    time.sleep(0.1)
```
The arm and the lift work differently from the base. The base needs translations and rotations, but the arm and the lift are prismatic joints, so we only need the `move_by` command, which moves the joint by the given distance in meters.
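The four arm and lift functions differ only in the joint and the distance, so one option is to factor the shared pattern into a single helper. `move_joint_by` below is a hypothetical name, not part of Stretch Body; it only calls the same `move_by`, `push_command`, and `wait_until_at_setpoint` methods used above, looking up `robot.arm` or `robot.lift` by name with `getattr`.

```python
import time


def move_joint_by(robot, joint_name, delta_m):
    """Move a prismatic joint ("arm" or "lift") by delta_m meters and wait for it.

    Hypothetical helper that factors out the pattern shared by arm_front,
    arm_back, lift_up and lift_down.
    """
    joint = getattr(robot, joint_name)  # robot.arm or robot.lift
    joint.move_by(delta_m)
    robot.push_command()
    joint.wait_until_at_setpoint()
    time.sleep(0.1)


# The four functions then become one-liners, for example:
def lift_up(robot):
    move_joint_by(robot, "lift", 0.1)


def arm_back(robot):
    move_joint_by(robot, "arm", -0.1)
```

The trade-off is readability: the explicit functions in the tutorial are easier to follow for a first read, while the helper avoids repeating the same four lines.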
```python
stretch_actions = {"move_forward": "Move the robot forward 0.2m",
                   "move_backward": "Move the robot backward 0.2m",
                   "turn_right": "Turn the robot 90 degrees clockwise",
                   "turn_left": "Turn the robot 90 degrees counterclockwise",
                   "arm_front": "Move the arm to the front 0.1m",
                   "arm_back": "Move the arm to the back 0.1m",
                   "lift_up": "Move the lift up 0.1m",
                   "lift_down": "Move the lift down 0.1m"}

stretch_actions_fn = {"move_forward": move_forward,
                      "move_backward": move_backward,
                      "turn_right": turn_right,
                      "turn_left": turn_left,
                      "arm_front": arm_front,
                      "arm_back": arm_back,
                      "lift_up": lift_up,
                      "lift_down": lift_down}
```
The Large Language Model (LLM) needs to know which actions are available and what each one does, because we also want it to explain the movements it chooses. That's why `stretch_actions` pairs each action name with a description. `stretch_actions_fn` maps the same names to the functions that actually move the robot.
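Because the same action names appear in both dictionaries, a typo in one of them would only surface as a `KeyError` when the robot tries to run that action. A hypothetical `check_action_tables` helper like the one below, called once after both dictionaries are defined, catches the mismatch at startup instead.

```python
def check_action_tables(descriptions, functions):
    """Raise ValueError if the two dictionaries don't contain the same action names."""
    missing_fn = set(descriptions) - set(functions)
    missing_desc = set(functions) - set(descriptions)
    if missing_fn or missing_desc:
        raise ValueError(
            f"No function for: {sorted(missing_fn)}; "
            f"no description for: {sorted(missing_desc)}"
        )


# In the tutorial script, after both dictionaries are defined:
# check_action_tables(stretch_actions, stretch_actions_fn)
```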
```python
def chatter(input_text):
    # Populate the assistant prompt with a description of every motion
    assistance_msg = "Here is the description for each robot motion"
    for k in stretch_actions.keys():
        assistance_msg = assistance_msg + f"\n - {k} : {stretch_actions[k]} "
    # Define the system prompt (the personality of the system)
    system_prompt = (
        "Assume you are a mobile robot that can receive natural language instructions "
        "about the robot's movements. Based on your understanding of the instructions, "
        f"return a sequence of discrete actions from this list: {list(stretch_actions.keys())}. "
        "Your output must have exactly two parts. In the first part, explain the sequence "
        "of actions to follow and the reason behind it. The second part must be only the "
        "list of actions, separated by commas and enclosed in square brackets '[]'. "
        "The second part is not an explanation; it must contain only the list of movements."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "assistant", "content": assistance_msg},
            {"role": "user", "content": input_text},
        ],
    )
    response_text = response.choices[0].message.content.strip().lower()
    print(f"CHATGPT RESPONSE: {response_text}")
    return response_text
```
Let's start with the `chatter` function. Here we set up the conversation by specifying the system message and the assistant message; the more precise these messages are, the more reliably the model returns output our code can use. The user message is the text you type in the terminal. To read the model's reply, we use `response.choices[0].message.content`, as described in the documentation. To make the reply uniform and easy to handle, we apply `strip()` and `lower()`: `strip()` removes leading and trailing whitespace such as spaces, tabs, or newlines, and `lower()` converts the text to lowercase. For instance, if the LLM outputs "MOVE_FORWARD", we get "move_forward", which matches the keys in our dictionaries.
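To see exactly what the model receives as the assistant message, the loop that builds `assistance_msg` can be reproduced on its own. The hypothetical `build_assistance_msg` below produces the same text as that loop (a header followed by one ` - name : description ` line per action), and the last line shows the effect of `strip()` and `lower()` on a sample reply.

```python
def build_assistance_msg(actions):
    """Build the same text as the loop in chatter(): a header plus one line per action."""
    header = "Here is the description for each robot motion"
    return header + "".join(f"\n - {name} : {desc} " for name, desc in actions.items())


print(build_assistance_msg({"move_forward": "Move the robot forward 0.2m",
                            "lift_up": "Move the lift up 0.1m"}))

# strip() trims whitespace at both ends; lower() makes names match the dictionary keys.
print("  [MOVE_FORWARD, LIFT_UP]\n".strip().lower())  # [move_forward, lift_up]
```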
```python
def extract_action_sequence(response_text):
    # In the response text, find the list that starts with "[" and ends with "]"
    start_index = response_text.find("[")
    end_index = response_text.find("]", start_index)
    if start_index != -1 and end_index != -1:
        movements_str = response_text[start_index + 1:end_index]
        # Split the comma-separated movements into a list. .strip("'\x22") removes
        # single and double quotes from the beginning and end of each movement.
        formatted_list = [movement.strip().strip("'\x22") for movement in movements_str.split(",")]
        print(formatted_list)
        return formatted_list
    else:
        print("List of movements not found in the response.")
        return []
```
Next is the extraction function. Our goal is to find the list of actions in the model's response. To do this, we search for the first opening square bracket `[` and the first closing bracket `]` after it. If both are found, we treat the text between them as the action list, split it on commas, and strip whitespace and quotes from each item to build a clean list. If no list is found in the response, the function returns an empty list.
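The bracket search above accepts whatever names the model writes. Two cases are worth guarding against: an empty list `[]`, which `split(",")` turns into `['']`, and an action name the model made up; either one would raise a `KeyError` when it reaches `stretch_actions_fn`. The hypothetical `extract_valid_actions` below keeps the same bracket logic but drops anything that is not a known action.

```python
def extract_valid_actions(response_text, valid_actions):
    """Return the bracketed action list from response_text, keeping only known actions.

    Hypothetical variant of extract_action_sequence: same bracket search,
    plus a filter so empty strings and invented names are dropped.
    """
    start_index = response_text.find("[")
    end_index = response_text.find("]", start_index)
    if start_index == -1 or end_index == -1:
        return []
    movements_str = response_text[start_index + 1:end_index]
    # Remove spaces, then single/double quotes, around each name ("'\x22" is ' and ")
    names = [m.strip().strip("'\x22") for m in movements_str.split(",")]
    return [name for name in names if name in valid_actions]


valid = {"move_forward", "turn_left", "lift_up"}
print(extract_valid_actions("the plan: [move_forward, 'jump', turn_left]", valid))
# ['move_forward', 'turn_left']
print(extract_valid_actions("[]", valid))  # []
```

In the tutorial script you could pass `stretch_actions_fn` as `valid_actions`, since `in` on a dictionary checks its keys.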
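```python
def execute_robot_motions(final_motion_list):
    rb = robot.Robot()
    rb.startup()
    try:
        print("Starting to execute motions....")
        for motion_key in final_motion_list:
            # Skip names the model invented (or the empty string left by "[]")
            if motion_key not in stretch_actions_fn:
                print(f"Skipping unknown motion: {motion_key!r}")
                continue
            print(f"Executing motion: {motion_key}")
            stretch_actions_fn[motion_key](rb)
        print("Completed Executing motions...")
    finally:
        # Always release the robot, even if a motion raises an exception
        rb.stop()
```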
To execute the movements, we first create a `Robot` object and call `startup()`. The function then runs each movement in the provided list, in order, and finally calls `stop()`. Stretch Body needs `startup()` before the robot can receive commands, and `stop()` to release the robot when we are done.
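Before running new prompts on the real robot, it can help to check which commands a motion list would send without moving anything. The sketch below is a hypothetical `DryRunRobot` that imitates only the parts of `robot.Robot()` this tutorial uses (`startup`, `stop`, `push_command`, and the `base`, `arm`, and `lift` methods) and records every command in a list. It is not part of Stretch Body.

```python
import time


class RecordingJoint:
    """Stands in for robot.base, robot.arm or robot.lift and records each command."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def translate_by(self, x_m):
        self.log.append((self.name, "translate_by", x_m))

    def rotate_by(self, x_r):
        self.log.append((self.name, "rotate_by", x_r))

    def move_by(self, x_m):
        self.log.append((self.name, "move_by", x_m))

    def wait_until_at_setpoint(self):
        pass  # nothing physically moves, so there is nothing to wait for


class DryRunRobot:
    """Hypothetical stand-in for robot.Robot(): only the calls this tutorial uses."""

    def __init__(self):
        self.log = []
        self.base = RecordingJoint("base", self.log)
        self.arm = RecordingJoint("arm", self.log)
        self.lift = RecordingJoint("lift", self.log)

    def startup(self):
        return True

    def push_command(self):
        self.log.append(("robot", "push_command", None))

    def stop(self):
        pass


# turn_left, copied from the tutorial, runs unchanged against the fake robot.
def turn_left(robot):
    robot.base.rotate_by(1.57)
    robot.push_command()
    robot.base.wait_until_at_setpoint()
    time.sleep(0.1)


rb = DryRunRobot()
turn_left(rb)
print(rb.log)  # [('base', 'rotate_by', 1.57), ('robot', 'push_command', None)]
```

To dry-run the whole pipeline, you could temporarily replace `rb = robot.Robot()` with `rb = DryRunRobot()` inside `execute_robot_motions`.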
```python
def stretch_chatter(input_text):
    response = chatter(input_text)
    motion_list = extract_action_sequence(response)
    execute_robot_motions(motion_list)

while True:
    response = input("What motion can I do for you?\n")
    stretch_chatter(response)
```
`stretch_chatter` takes the user's input, a natural language instruction about the robot's movements, and runs the whole pipeline: it queries the model, extracts the action list, and executes it. Finally, the `while` loop keeps the program running, so you can enter multiple requests without restarting the program.
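The `while True` loop only ends when you interrupt the program. If you want a cleaner way out, one option is a loop that stops on a typed command or on Ctrl+C. The sketch below is a hypothetical `chat_loop`: `handle_request` would be `stretch_chatter`, and `read_input` is a parameter only so the loop can be tested without typing. Note that it ignores empty lines and only catches Ctrl+C while waiting for input, not while the robot is moving.

```python
def chat_loop(handle_request, read_input=input):
    """Ask for instructions until the user types 'quit'/'exit', presses Ctrl+C, or input ends."""
    while True:
        try:
            text = read_input("What motion can I do for you? (type 'quit' to exit)\n")
        except (KeyboardInterrupt, EOFError):
            print("\nExiting.")
            break
        if text.strip().lower() in {"quit", "exit"}:
            break
        if text.strip():  # skip empty lines instead of sending them to the model
            handle_request(text)


# In the tutorial script, replace the while loop with:
# chat_loop(stretch_chatter)
```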
Now that you know how it works, let's try it! Run your code in the terminal and try entering the following instruction:

```
move forward 0.2m, turn right and move the lift up 2 times, turn left and move the arm front 1 time.
```

<p align="center">
<img src="https://github.com/hello-robot/stretch_tutorials/blob/master/stretch_body/images/stretch_openai.gif"/>
</p>
## Potential Issues

- Occasionally, the model skips a movement (usually one of the last ones), such as those involving the arm or lift. For instance, if you instruct it to move the lift up/down or the arm front/back twice and then return to the starting position, it might execute only one backward movement instead of two. Unfortunately, there isn't a real fix for this, but you can reduce this type of error by giving more specific instructions.
- Sometimes the list of actions is formatted differently. The code looks in the response text for a list that starts and ends with square brackets `[]` and then executes the movements, but a case like this can occur:

![image](https://github.com/hello-robot/stretch_tutorials/assets/141784078/3ff79191-fbbd-496c-9b21-8e25dcb78a76)

In this particular scenario, the list of movements appears as `[‘arm_front”`, without a closing bracket, so when the code searches for a list of movements inside brackets, it does not find anything and no movement is executed. This is rare (it happened only once), and when it does, send the instruction again, or stop the code and run it again.

- As the note at the beginning of this tutorial says, keep Stretch in an open area. Stretch may not move as expected, for example when you ask it to trace a geometric figure: it can take a wrong turn and keep moving. Keep this in mind when you try moving Stretch around.
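For the second issue, one possible mitigation is to ask the model again automatically when no list of movements is found, instead of doing it by hand. The sketch below is a hypothetical `request_actions_with_retry` helper; `ask_model` stands for `chatter` and `extract` for `extract_action_sequence`. Keep in mind that every retry is a new API request and therefore costs tokens, which is why the number of attempts is capped.

```python
def request_actions_with_retry(ask_model, extract, instruction, max_attempts=3):
    """Ask the model again when no action list can be extracted from its reply.

    Hypothetical helper: returns the first non-empty list that extract() finds,
    or [] if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        actions = extract(ask_model(instruction))
        if actions:
            return actions
        print(f"Attempt {attempt}: no list of movements found, asking again...")
    return []


# In the tutorial script, stretch_chatter could then become:
# motion_list = request_actions_with_retry(chatter, extract_action_sequence, input_text)
# execute_robot_motions(motion_list)
```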