Training TensorFlow for CENTERSTAGE
Hopefully, you have already tried using TensorFlow to detect the pixel on the field using one of the TensorFlow sample programs. The next stage in using TensorFlow is to train a model to detect a Team Prop. This can earn you up to 20 bonus points, so it’s worth doing. However, it is a rather involved process that can take hours to complete.
FIRST has already described using TensorFlow for CENTERSTAGE. You should review that page.
Prerequisites
Visit the following pages to create your team props and then take videos of them.
- Team Prop construction rules and advice on creating objects that will be easier for TensorFlow to detect
- Take videos of your team props for TensorFlow. Try to follow as many recommendations here as possible; it’s easy to create a model that doesn’t work if you don’t have good videos.
FIRST Machine Learning Toolchain
The FIRST Tech Challenge Machine Learning toolchain (FTC-ML) allows FIRST Tech Challenge teams to create custom TensorFlow models for use in the game challenge.
You should READ the entire set of documentation on using the FIRST Machine Learning Toolchain before starting. It’s easy to make mistakes or do something in an early step that creates a bad model. That starts at the very beginning with the choice of team props and how you take video of those props.
Logging On To FTC-ML
FIRST has provided a website where you can train new TensorFlow models. The first step is to login to the ftc-ml website. An adult associated with the team (coach or mentor) will have to add the student(s) so they can also log onto the website. Do this well before you actually need the toolchain, in case the adult or a student has login issues.
FTC-ML Overview
The basic steps in the process are to take videos of your team props, upload them to the ftc-ml website, label your team props in the videos, create a dataset from the video images, then actually run the TensorFlow training on the dataset to create a model.
- Teams create short videos of the objects that they would like the model to be trained to recognize. As mentioned, ensure your videos are taken at 640×480 resolution.
- Videos are uploaded to the ftc-ml tool, and individual objects to be recognized in each video frame are labeled by the users.
- Datasets are created composed of one or more labeled videos. Unlabeled videos, if used in a dataset, must be combined with labeled videos.
- One or more datasets can be combined to create a model. The model is trained using Google TensorFlow cloud training services using the selected datasets as training resources.
- The model is downloaded from the ftc-ml tool and installed either onto the Robot Controller (for OnBot Java or Blocks) or into the Android Studio assets for use on the robot.
- Robot code is modified to use the new model file and the labels created during the model creation process.
Upload Videos to FTC-ML
Each video must be individually uploaded to FTC-ML. As each video is uploaded, a background process is started on the FTC-ML server to extract all the video frames. Wait for each video to show that all frames have been extracted before uploading the next one. This can take a minute or two depending on how many frames are in the video.
Adding labels to frames in a video
The next step is to label all the frames of video. Check out that link; as you can see, you draw a bounding box around each object and give it a label. If you shot static videos, you can easily use the tracking feature to copy the bounding box and label to all the frames.
Things to note:
- According to this forum post by Danny Diaz, color is not always important to TensorFlow training (unless you have a colored pattern on your object). See point #3. Danny has another forum post that says similar things in point #5. Therefore, when you label the objects in your videos you can use the same name, e.g. TeamProp. For example, I trained some models to recognize redDuplo and blueDuplo as separate objects, but I probably should have labeled both as just “Duplo” or TeamProp. This assumes both team props are identical except for color. If you have two different found objects, like a blue duck and a red cowbell, then you will need to label each one separately.
- Ensure you ALWAYS use the same label or labels in all your videos.
- If a team prop is in a frame, you should either label it or exclude the frame. You don’t want video frames containing an unlabeled team prop, as that will confuse TensorFlow.
- You may have objects in the videos that you DO NOT label. That’s ok, you probably should not label the spike marks. You may have even added other objects into the background of your videos. TensorFlow will learn to consider all unlabeled things as background.
- It’s actually ok to have some videos with no labels. For example, if you have taken close-up videos of team props on spike marks, you should also take close-up videos of the spike marks by themselves and not label them. That way TensorFlow will consider bare spike marks to be background.
Producing Datasets
After all videos are labeled (that need to be labeled), select all your videos and Create a Dataset. The FTC-ML tool will combine all the video frames and labels into a dataset. When the Dataset is ready, check that only the expected labels are there. If you see one that is unexpected, say because a label was misspelled, you’ll have to find the video with the labeling error and correct all the frames that have it.
Training Models
Once you have a Dataset, you can train a model. If you created a recommended set of videos, you can probably leave the default number of training steps. Generally, you should have about 100 epochs of training, but don’t get fussy about adjusting these numbers; 200 epochs would be fine, but 10 is too few. There’s a sweet spot of about 1000 frames of video and 3000 steps that should work well.
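For reference, the number of epochs is roughly steps × batch size ÷ frames. Assuming the toolchain trains with a batch size of 32 (check the FTC-ML documentation for the current value), 3000 steps on 1000 frames works out to about 3000 × 32 ÷ 1000 ≈ 96 epochs, right in line with the 100-epoch guidance above.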
Evaluating Metrics
You can review the model’s metrics, and it can be interesting to look at them; sometimes you’ll spot potential problems. However, even “well trained” models can perform badly in practice if there are problems in the videos or the choice of team prop.
Download the Model
The next step is to download the model. The FTC-ML tool will typically give the model a filename like model_20240104_075711.tflite. You could give the file a more meaningful name like team123_props.tflite, or even MyCustomModel.tflite.
Using the TensorFlow Model
FTC-docs has some pages on using custom TensorFlow models. Go to the Java page if you are using OnBot Java or Android Studio. Go to the Blocks page if you are using Blocks. You will upload the model file to the Robot Controller if you are using OnBot Java or Blocks, or copy the model file into an assets folder in your Android Studio project.
In both cases, you end up changing the TensorFlow initialization process to reference the model file and to tell the SDK what labels are in the model. The default resolution for the VisionPortal is 640×480, which should help when TensorFlow downscales the webcam image to fit its 300×300 input.
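As a rough sketch of what that initialization might look like in Java (a fragment meant to live inside an opMode class, since it uses hardwareMap; the model file name, the TeamProp label, and the webcam name “Webcam 1” are assumptions, so substitute your own values):

```java
import android.util.Size;

import org.firstinspires.ftc.robotcore.external.hardware.camera.WebcamName;
import org.firstinspires.ftc.vision.VisionPortal;
import org.firstinspires.ftc.vision.tfod.TfodProcessor;

// These names are assumptions; use your own model file name, labels, and webcam name.
private static final String TFOD_MODEL_FILE = "/sdcard/FIRST/tflitemodels/team123_props.tflite";
private static final String[] LABELS = { "TeamProp" };

private TfodProcessor tfod;
private VisionPortal visionPortal;

private void initTfod() {
    // Build the TensorFlow processor around the custom model and its labels.
    tfod = new TfodProcessor.Builder()
            .setModelFileName(TFOD_MODEL_FILE)   // use setModelAssetName(...) for an Android Studio asset
            .setModelLabels(LABELS)
            .build();

    // Build the VisionPortal at 640x480, the same resolution the training videos used.
    visionPortal = new VisionPortal.Builder()
            .setCamera(hardwareMap.get(WebcamName.class, "Webcam 1"))
            .setCameraResolution(new Size(640, 480))
            .addProcessor(tfod)
            .build();
}
```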
Ideally, you should create an opMode that does simple telemetry display of the objects in your model. Run that opMode and verify it can detect your team props both in the camera stream and in the Telemetry. The image below shows that the model is able to detect the team props from the robot’s starting position.
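A minimal telemetry loop for such an opMode, assuming the tfod processor and initTfod() method from the sketch above, might look something like this:

```java
import java.util.List;

import org.firstinspires.ftc.robotcore.external.tfod.Recognition;

// Inside the opMode's loop, after initTfod() has been called:
List<Recognition> recognitions = tfod.getRecognitions();
telemetry.addData("# Objects Detected", recognitions.size());

for (Recognition recognition : recognitions) {
    // Report the label, confidence, and bounding box center of each detection.
    double x = (recognition.getLeft() + recognition.getRight()) / 2;
    double y = (recognition.getTop() + recognition.getBottom()) / 2;
    telemetry.addData("Label", "%s (%.0f%% conf.)", recognition.getLabel(), recognition.getConfidence() * 100);
    telemetry.addData("Position", "x=%.0f, y=%.0f", x, y);
}
telemetry.update();
```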
Notes
- This model was trained with labels redDuplo and blueDuplo but should have been trained using a label like “DuploProp” applied to both team props.
- If you have more than one label in the model and the opMode is showing the wrong label for a detected object, the problem is likely that the labels provided to the TensorFlow initialization are in the wrong order. Change the order of the labels in the initialization until all objects you are able to detect are correctly labeled.
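For example, with the two labels used in the model above, swapping the order of the array passed to the initialization is the usual fix (a sketch; your own label names will differ):

```java
// The order of this array must match the order the model was trained with.
// If red props are reported as blueDuplo (or vice versa), swap the entries.
private static final String[] LABELS = { "blueDuplo", "redDuplo" };
```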
Next Steps
If you trained a model to detect the team prop up close, you can make simple changes to your TFOD-pixel program to have it detect the team props instead.
If you trained the model to detect the team prop from the starting position, you can create an autonomous program that checks the bounding box of the TensorFlow detection. Depending on where the detection is in the image, the program can determine which spike mark has the team prop and drive directly there.
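A sketch of that decision logic is shown below. It reuses the tfod processor and Recognition import from the earlier sketches, and the one-third/two-thirds split of the 640-pixel-wide image is an assumption that you will need to tune for your own camera placement.

```java
enum SpikeMark { LEFT, CENTER, RIGHT }

// Map the horizontal center of the first team prop detection to a spike mark.
// The thresholds split the 640-pixel-wide image into thirds; tune them using
// telemetry with the robot parked in its real starting position.
private SpikeMark findTeamProp() {
    for (Recognition recognition : tfod.getRecognitions()) {
        double centerX = (recognition.getLeft() + recognition.getRight()) / 2;
        if (centerX < 640.0 / 3) {
            return SpikeMark.LEFT;
        } else if (centerX < 2 * 640.0 / 3) {
            return SpikeMark.CENTER;
        } else {
            return SpikeMark.RIGHT;
        }
    }
    // No detection: fall back to a default, often the spike mark the camera cannot see well.
    return SpikeMark.RIGHT;
}
```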