The first step in training TensorFlow to recognize an object is to create training videos. This is a step that can easily be done wrong, and there are almost too many points to list them all. Please review all the material on this page and the pages it links to, and make a plan for the videos you need to take. I’ll try to summarize the important points.
How to Record Video
Important points from the create videos page are:
- use the same camera resolution to take the videos as you will use on the robot; as noted in TensorFlow Quirks below, you probably want to use 640×480 resolution.
- ideally, just unplug the robot’s webcam from the Control Hub and plug the webcam (still mounted on the robot) into a laptop
- Your laptop likely includes a camera app, the Windows app is called Camera.
- you want to use the same webcam to take the videos as the robot will use, to avoid differences in camera lenses.
- they suggest you reduce the frame rate. I didn’t bother with that, but I did adjust the length of my video clips, typically cutting them to one or two seconds depending on how many videos I needed to take. The more videos, the shorter the clips.
- There’s no need to move the camera or object during the video; just record with nothing moving. That will make labeling the objects in the video much easier. I tried moving the team props between the spike marks while the video was running, but as soon as you touch the prop, the “tracking” feature that helps label all the frames loses its lock and you have to redraw the bounding boxes, which is time consuming.
TensorFlow Quirks
Resolution – don’t use hi-res cameras
On the FTC-docs page with suggestions on selecting objects for the Team Prop, they mention a TensorFlow “quirk”: the webcam image is downscaled to match the TensorFlow model size of 300×300 pixels. This is very important! I’ve trained a number of models for the Logitech C920 at 1920×1080 resolution and they did not work well in practice, though the training and initial testing looked OK. If you use a high resolution, you should use the TensorFlow digital zoom to crop the image tight to the object, or move the robot close to the object.
My recommendation: set a resolution of 640×480 both for TensorFlow usage and for the videos that you record. If you don’t set a resolution for Vision Portal, the default is 640×480.
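For reference, here’s a minimal Java sketch of how you might set the resolution (and the digital zoom mentioned above) with the VisionPortal builder. The model file name and label names are placeholders for whatever your FTC-ML training produces:

```java
import android.util.Size;
import org.firstinspires.ftc.robotcore.external.hardware.camera.WebcamName;
import org.firstinspires.ftc.vision.VisionPortal;
import org.firstinspires.ftc.vision.tfod.TfodProcessor;

// Inside your OpMode's initialization code:
TfodProcessor tfod = new TfodProcessor.Builder()
        .setModelFileName("TeamProps.tflite")                    // placeholder model name
        .setModelLabels(new String[] {"redDuplo", "blueDuplo"})  // placeholder labels
        .build();

VisionPortal portal = new VisionPortal.Builder()
        .setCamera(hardwareMap.get(WebcamName.class, "Webcam 1"))
        .setCameraResolution(new Size(640, 480))  // match the resolution of your training videos
        .addProcessor(tfod)
        .build();

// If you must run a higher resolution, crop in on the object instead:
// tfod.setZoom(2.0);
```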
Video Length – take lots of short videos
The FTC-ML TensorFlow tool will not accept videos with a file size greater than 100 MB, or a single video file with more than 1000 frames. I suggest you take lots of short videos, 1 to 2 seconds each, varying the position of the team prop, the background, or the lighting between clips.
There is a “sweet spot” for training FTC TensorFlow models, which is about 1000 frames of video in total. At 24 frames per second (fps) that’s about 42 seconds of video. So if you plan to take 20 videos, you can make them 2 seconds long. It’s worth testing your video capture method: my Windows laptop at 640×480 captured video at 24 fps. You don’t need a higher fps, but if you do happen to capture at 30 fps, just adjust the length of your videos; e.g. you only need 1000/30 ≈ 33 seconds of video in total. I’ve tried training with as few as 500 frames and it didn’t seem to work as well.
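The arithmetic is simple enough to fold into a one-line helper if you want to sanity-check a plan (the names here are made up for illustration):

```java
// Frame-budget arithmetic: split a target of ~1000 total frames across
// your planned videos to get a per-clip length in seconds.
static double secondsPerClip(int targetFrames, int fps, int numVideos) {
    return (double) targetFrames / (fps * numVideos);
}

// secondsPerClip(1000, 24, 20) ≈ 2.1 seconds per clip
// secondsPerClip(1000, 30, 30) ≈ 1.1 seconds per clip
```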
Light Level – Use a Light Meter (or find an app for that)
Varying the lighting is mentioned as good for TensorFlow videos, but this probably needs more explanation. Ideally you should include some videos with light levels that match what you would see at an event in your region. Many FTC events are held in school gyms where the lighting is fairly bright and even, with an illumination level of about 300-500 lux. See the Wikipedia page on Lux, which indicates that 320-500 lux is typical office lighting.
However, without a light meter it’s hard to know what the light level actually is. The human eye is VERY good at adjusting to different light levels; webcams and TensorFlow are not. What you can do is use a light meter phone app like the Android Light Meter – Lux Meter. In my garage the lighting is mixed at best, with a 4-foot fluorescent fixture over the field, and I have to add supplemental lighting to get into the 300 lux range. Likely you should add extra lighting for some or all of your videos.
Note: you should have some variation in lighting and NOT take all the videos in good lighting, because all events are different. At FIRST Championships the lighting is done with spotlights (more like theatre or stage lighting), which means the light on the field can be uneven and there can be shadows on the field, something you would not see at an event held in a gym. Your event might be held in a theatre with stage lighting, and scrimmages and league meets can be held in odd locations like school cafeterias or the lobby of a large building.
There is probably no point in taking videos in BAD lighting; FTC events usually have pretty good lighting. However, if the room you test your autonomous programs in has BAD lighting, then you probably have to take videos in that room with that lighting. Just be sure to include a second set of videos with good lighting (even if you have to take those videos elsewhere) so that when you go to an event, TensorFlow will still detect your team props.
Videos for TensorFlow
FTC has created an Optimizing Videos for increased TensorFlow Model Performance page that you should review carefully. Some highlights from that page:
- #3 and #4 – The TensorFlow model is designed around 300×300 pixels and modern webcams have much higher resolutions. Downscaling a high resolution to 300×300 loses a lot of detail. The implication is that you should set a low resolution.
- #5 – the more variations in object size, rotation, angle, and orientation, the better the recognition will be
- #8 – backgrounds are important; it would be good to use different backgrounds in some of the videos
- #9 – lighting effects can cause issues with object detection. They warn about training with only well-lit videos, but I find it’s usually the reverse: it’s hard to create a well-lit video to train with. So if you train with poorly lit videos and then go to an event in a gym with good lighting, object detection can suffer. See the Light Level section above.
- #10 – multiple objects. You might want to add some objects to the field of view that are not labelled; that can help train the model to ignore false positives, e.g. mistaking a red spike mark on the field for a red team prop, or recognizing something like a blue colored robot on the other side of the field as your blue prop.
TensorFlow False Positives
One recommendation is to include extra elements that are NOT your team props so the model doesn’t label something incorrectly. I think this is more of a problem if the webcam points down and only has grey tiles for a background; then anything that is NOT a grey tile might be considered a detectable object. So if you do close-up videos, be sure to include some shots of just spike marks on grey tiles.
In a model trained on red/blue Duplo, anything blue on a spike mark is considered the Team Prop.
The blueDuplo on the right spike mark is detected, and nothing in the upper background area is considered a team prop, so the TensorFlow model isn’t bad.
False positives like the above won’t happen in practice because only your own Team Props will be on the spike marks. But this does show that, if possible, you want things in the background that might be red or blue and that don’t get labelled, so that TensorFlow learns which objects it needs to detect. E.g. there’s a red bucket off the field in the upper right corner of the image above that didn’t get labeled in the videos and was not detected as a red Duplo prop. In my case, for the videos of the Duplo props I could have put blue and red Solo cups in the background, perhaps on the spike marks on the other side of the field.
Make A Video Plan
Prior to taking the videos, you should be clear on exactly how you plan to use TensorFlow. For example:
- if you are going to use TensorFlow with the robot at the starting positions then you should take videos from starting positions.
- If you created autonomous programs that drive up to the spike marks and look for a pixel, you might want to take videos of the team props up close so that you only need minimal changes to your autonomous programs.
If you’re not sure how you plan to use TensorFlow it might be a good idea to take both sets of videos and combine them in training. The more images of different sizes and orientations the better TensorFlow will be in terms of general detection. This would be especially important if you wanted to drive around the field and then try and recognize an object.
If possible, take video on an actual field or on bare concrete or grey carpet. In fact, taking videos on different backgrounds and floors helps TensorFlow focus on the objects and not assume they have to be on one particular background. If you have a field, you should probably take videos from both the red and blue sides of the field. You might even want to include videos from all four starting positions.
Try and include variations in lighting. Ideally, have one set taken with lighting similar to the venue where your event will be held. Likely you will need extra supplemental lighting to get bright, even illumination.
Also try and include some red/blue objects that are NOT your team props in the background, as well as shots of the red/blue spike marks, so that TensorFlow will learn not to label anything red or blue as a team prop.
Finally, if your team prop has different profiles like the chess rook example or my Duplo props, then you should include videos that show different profiles/orientations of the object.
The various combinations can add up to a lot of videos. Here are a couple of video plan examples.
Red/Blue Duplo Up Close Video Plan
If you already have working autonomous programs that use TensorFlow on the pixel like TFOD-pixel, you might want to take videos of your Team Props up close. In the TFOD-Pixel program the robot drives up to each spike mark so that it is close and looking down to optimize the TensorFlow recognition of the pixel. So we want videos of our team props on the spike marks from the spots where the robot stops to look for the pixel.
What I did was take the TFOD-pixel1 program (the intermediate version of TFOD-pixel that just drives the robot to the spike marks) and add a long Sleep block where it would normally stop to look at each spike mark. Then I ran the program, stopped it when it paused at a spike mark, unplugged the webcam from the robot, and plugged it into the laptop to record the video.
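If you program in Java rather than Blocks, the same trick looks something like the sketch below; driveToSpikeMark() is a hypothetical stand-in for your own navigation code:

```java
import com.qualcomm.robotcore.eventloop.opmode.Autonomous;
import com.qualcomm.robotcore.eventloop.opmode.LinearOpMode;

// Sketch of the pause-to-film idea: drive to the spot the robot normally
// looks from, then pause long enough to swap the webcam over to a laptop.
@Autonomous(name = "FilmingPose")
public class FilmingPose extends LinearOpMode {
    @Override
    public void runOpMode() {
        waitForStart();
        driveToSpikeMark(2);  // drive to the viewing position
        sleep(120_000);       // long pause: unplug the webcam here and record
    }

    private void driveToSpikeMark(int mark) {
        // hypothetical: replace with your own drive code
    }
}
```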
As you can see the blue prop is about the same size in the three images, but we are getting variations in orientation and position. Combined with rotating the team prop on each spike mark we’re getting lots of good images of the team prop. Looking down at the Duplo also shows lots of good surface texture. What’s also good about these images is that the Team Props are large in the image so we don’t have to worry about downscaling to 300×300.
One thing I learned about close-up videos is that I had to add a video of a blank spike mark so TensorFlow won’t think the spike marks themselves could be a team prop. This probably relates to the suggestion in FTC-docs that you include other objects in the background that are NOT labeled: TensorFlow needs to learn that the spike marks themselves are not an object to detect.
At each spike mark we rotated the team prop to get two orientations, then repeated with the other prop, and added one video of the empty spike mark, so there are five videos at each red spike mark. This resulted in six videos of the blue Duplo, six videos of the red Duplo, and three videos of just the spike marks.
You can include videos in which you don’t label anything; TensorFlow will learn to treat the things in those videos as background. In this case, that means TensorFlow won’t get confused and think a red spike mark is the red Duplo team prop.
Then we repeated the above videos on the blue spike marks. Ideally, when you take the second set of videos you should vary the lighting or background; changing spike marks is the minimum you should do. If taking videos off a field, change the floor you use if possible. If you can’t change the floor, at least change the lighting by adding one or more supplemental lights.
5 videos per red spike mark, times 3 spike marks, then repeated for blue: 5 × 3 × 2 = 30 videos. That’s a LOT of videos, so you can take really short ones; one second is enough. One second at 30 fps times 30 videos is 900 frames. This would be easier with a team prop like the Crown or Tower, which we wouldn’t need to rotate at each spike mark, so we’d get 3 × 3 × 2 = 18 videos. For 18 videos at 24 fps, each video should be 1000 / (18 × 24) ≈ 2.3 seconds long.
Red/Blue Duplo Starting Position Video Plan
Another plan is that we use TensorFlow from the starting position. That way we know at the start of autonomous which spike mark has the team prop and can drive directly there.
This is easily done with a Logitech C920 as it has a wide angle of view that easily includes all three spike marks. It is also possible with a Logitech C270 if you are careful. With the Pushbot robot and a C270 I was just able to see the black line on the outside spike marks if I was very careful to place the robot in the center of the tile (the webcam was mounted in the center of the robot). At an event you should enable the camera stream and verify you can see the center black lines on the spike marks during robot set up.
The Logitech C270 is JUST able to see the center of the outside spike marks, as you can see above, but a team prop on the outside marks will be cropped. TensorFlow will detect team props even if they are only partially visible (assuming you trained the model that way), so if we label the partially visible props at the edges of the frame, TensorFlow will still detect them.
As you can see, the Duplo props are not that large in the image when shot from the starting position. This is where using 640×480 resolution is important: the image will be downscaled to 300×300, just under half the width. The redDuplo in the center image is about 128 pixels tall in the original image; even cut roughly in half, that’s still around 60 pixels tall, which should be good enough for TensorFlow.
In this case we want each team prop on each blue spike mark, with a video of each orientation of the prop. So we need six videos (two orientations times three spike marks), assuming we include both props in each video. We then take another set of videos on the red spike marks for a total of 12 videos. We don’t need any videos of just spike marks because, as you can see above, there is always a blank spike mark in the image.
The top part of these images includes the field perimeter in the background. It probably would have been a good idea to add some red and blue items (like the Solo cups I have) in the background to help TensorFlow learn to ignore things on the other side of the field. At an event there would be team props or white pixels on the far spike marks, and robots in the starting positions.
This second set of videos should have a different lighting level and/or floor/background.
For 12 videos at 24 fps we can set the length of each video to 1000 / (12 × 24) ≈ 3.5 seconds.
12 videos is probably near the minimum you would want to use. It would be better to take another complete set of videos either off the field, or to use all four starting positions. This is where you could also have one set of 12 in good even lighting and 12 where the lighting is not quite so good.
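Once the model is trained, the runtime side of this plan is straightforward. Here’s a minimal sketch (assuming a TfodProcessor named tfod built with your custom model as shown earlier, a 640×480 image, and the redDuplo label used in this article) that maps a detection’s horizontal position to a spike mark:

```java
import java.util.List;
import org.firstinspires.ftc.robotcore.external.tfod.Recognition;

// Sketch: decide which spike mark holds the prop from the bounding box's
// horizontal center. The 640/3 ≈ 213-pixel column boundaries are a starting
// guess you'd tune by watching your own camera stream.
String spikeMark = "CENTER";  // what to assume if nothing is detected
List<Recognition> recognitions = tfod.getRecognitions();
for (Recognition r : recognitions) {
    if (!"redDuplo".equals(r.getLabel())) continue;
    double x = (r.getLeft() + r.getRight()) / 2.0;  // box center, in pixels
    if (x < 213)      spikeMark = "LEFT";
    else if (x > 427) spikeMark = "RIGHT";
    else              spikeMark = "CENTER";
}
telemetry.addData("Spike mark", spikeMark);
```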
What if you Don’t Have a Field
From the Centerstage Field Assembly and Setup Guide you can determine how the field is set up and where everything should be. For example, the four starting positions always have the spike marks on the tile in front of the starting position. You can then measure where the spike marks should be, mark them with tape, and take your videos.
The image above shows a setup for taking videos on a floor where you’ve measured where the spike marks should be and placed your team props, then connected a laptop to your robot’s webcam to record the video as shown. You can then vary the positions and orientations of the team props to get several videos. After that, you should change lighting levels, floors, or backgrounds and take more videos.
What if you have poor lighting
My garage initially had only two small CFL lights at one end, which would be really bad lighting if that were all I had. I added a four-foot fluorescent light fixture above the field (though it’s not centered), then another fixture with four small lights that I could point into the far corners of the field. Even that didn’t produce bright lighting; to get to 300 lux I added a photoflood light and bounced its light off the ceiling to create brighter, more even illumination. Note: I could only get to 300 lux for one quarter of the field at a time.
You might not be able to change the lighting at the location where your team has a practice field. For convenience you might want to take a set of videos in that lighting so that TensorFlow can be used where you will be testing your autonomous programs.
But you really SHOULD add a second set of videos with better lighting, since it’s very likely you’ll compete at an event with good lighting, likely a school gym or theatre. Bring extra lights to your location, if only for one meeting, and aim to get a set of videos where the light level is at or close to 300 lux. I would avoid spot lighting your team prop, for example by shining a flashlight at it; that will create shadows and light angles you would not see at an event.
If needed, take those videos at another location with good lighting. If you’re at a school, maybe you can take videos on the gym floor; not only will you have a different background/floor in the video, you should also have good lighting.
Backup Plan
It may be a good idea to have a backup plan in case TensorFlow doesn’t work for your team props at your event. For example, keep your TFOD-pixel program: if you find TensorFlow isn’t working on your team props, it should still work for the pixel. Or keep a program that just pushes the yellow and purple pixels to the backstage area and parks; 11 points is better than zero.
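If you’d rather build the fallback into a single program, a minimal sketch is to give detection a time limit and assume a default spike mark when nothing is seen; the three-second timeout, the default, and classifySpikeMark() are all arbitrary placeholders here:

```java
import com.qualcomm.robotcore.util.ElapsedTime;

// Sketch: wait a few seconds for any detection; if TensorFlow never finds
// a prop, fall back to a default spike mark and carry on with autonomous.
String spikeMark = "CENTER";              // arbitrary default
ElapsedTime timer = new ElapsedTime();
while (opModeIsActive() && timer.seconds() < 3.0) {
    if (!tfod.getRecognitions().isEmpty()) {
        spikeMark = classifySpikeMark();  // e.g. the position logic sketched earlier
        break;
    }
    sleep(50);                            // brief pause between checks
}
```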