An ML Data Annotation Tutorial Series
By Logan Spears, Innovation Chief at Sixgill, LLC
Introduction
Image classification, the “Hello World” of computer vision, assigns classes (or categories) to images. For example, when we create a machine learning (ML) model that takes a picture of a shark as input and outputs its species, that is image classification. HyperLabel enables image classification workflows with its “Select” label. Because HyperLabel is a downloadable data labeling toolset built by Sixgill, LLC, we’ll use our namesake, the sixgill shark, as the subject of this tutorial. So, let’s dive in and train a shark classifier to see it in action!
Data Gathering
Step one of training our shark classifier is to gather the data we will later label and train on, but where do we find pictures of sharks? Microsoft offers Bing’s Image Search API to developers with limited free use, which should cover our needs.
A frequent question from developers is: “Can I use copyrighted material to train machine learning models?” Our reading of US fair-use case law is that the answer is yes, so it’s not necessary to specify license criteria in your API calls!
After signing up for Bing’s Image Search API, we can call it to download images for specific search terms. The Node SDK does most of the work for us:
Find the gist here. Remember to set the imageType parameter to Photo so we don’t get clip art or drawings. Also set the count parameter to 150, the maximum number of images per request. After querying the API, we download the image at each result’s thumbnailUrl and save it to a directory. Once the directory contains images from the search terms “great white shark”, “sixgill shark”, and “hammerhead shark”, we can use HyperLabel to label the data.
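The gist itself isn't reproduced in this post, so here is a minimal TypeScript sketch of the same steps. It calls the REST endpoint directly instead of the Node SDK. It assumes the Bing Search v7 endpoint `https://api.bing.microsoft.com/v7.0/images/search` and key-based auth through the `Ocp-Apim-Subscription-Key` header. `buildSearchUrl`, `thumbnailUrls`, `planDownloads`, and `searchThumbnails` are hypothetical helpers. The file-naming scheme and the `.jpg` extension are also illustrative assumptions.

```typescript
// Minimal sketch, not the post's gist. Assumptions: the Bing Search v7 REST
// endpoint below and key-in-header auth; adjust to your own Azure resource.
const ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search";

// The only field of an image result this sketch uses.
interface ImageResult {
  thumbnailUrl: string;
}

interface ImagesResponse {
  value: ImageResult[];
}

// Any fetch-compatible function (e.g. Node 18+'s global fetch) can be passed in.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Photos only (no clip art or drawings), 150 results per request (the maximum).
function buildSearchUrl(term: string, count = 150, offset = 0): string {
  const params: Array<[string, string]> = [
    ["q", term],
    ["imageType", "Photo"],
    ["count", String(count)],
    ["offset", String(offset)],
  ];
  const query = params
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return `${ENDPOINT}?${query}`;
}

// Keep each result's thumbnailUrl, dropping exact duplicates.
function thumbnailUrls(response: ImagesResponse): string[] {
  const urls = response.value.map((r) => r.thumbnailUrl);
  return urls.filter((u, i) => urls.indexOf(u) === i);
}

// Pair each URL with a file name such as "sixgill-shark-0.jpg" (hypothetical
// naming scheme; the .jpg extension is an assumption about the thumbnails).
function planDownloads(term: string, urls: string[]): Array<{ url: string; file: string }> {
  const slug = term.trim().toLowerCase().replace(/\s+/g, "-");
  return urls.map((url, i) => ({ url, file: `${slug}-${i}.jpg` }));
}

// Query one search term and return its thumbnail URLs.
async function searchThumbnails(term: string, apiKey: string, fetchFn: FetchLike): Promise<string[]> {
  const res = await fetchFn(buildSearchUrl(term), {
    headers: { "Ocp-Apim-Subscription-Key": apiKey },
  });
  if (!res.ok) throw new Error(`Bing image search failed: HTTP ${res.status}`);
  return thumbnailUrls((await res.json()) as ImagesResponse);
}
```

With Node 18 or later you can pass the global `fetch` as `fetchFn` and call `searchThumbnails` once per search term. Then write each planned file with your preferred HTTP client and `fs` calls. Injecting `fetchFn` keeps the helpers free of network access, so they can be tested with a stub.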
Data Labeling
After gathering the raw dataset from the Bing Image Search API, we can use HyperLabel to label it for ML. Most of the returned images show the species we searched for, but the results also contain false positives, irrelevant examples, and other problematic images, which HyperLabel lets us filter out. If you don’t already have HyperLabel, download it from the Mac App Store or get it from Microsoft.
After installing HyperLabel and signing up for an account, let’s create a project by hitting Create Project on the project screen. Enter “Shark Classifier” for the Name and hit Next.
Next, add the directory of images created in the Data Gathering section as a HyperLabel data source. Hit Local Storage, set the Name to “Images”, use Finder to select that directory as the Folder Path, hit Save, and then hit Label Schema on the sources page.
Now, create a schema for labeling. Delete the example labels by hitting the trash can icon on each one. In the Add New Label section, enter “Shark” in the Name field, choose Select as the Type, and add the Options “Sixgill”, “Great White”, and “Hammerhead”. Then hit Save and Start Labeling.
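A Select label works as a single-choice field: each image gets exactly one value from a fixed list of options. The sketch below models the schema we just built under that reading. The `SelectLabel` type and the `validateSelection` helper are hypothetical and do not reflect how HyperLabel actually stores its schema.

```typescript
// Hypothetical model of the "Shark" Select label configured above. This is
// not HyperLabel's file format, only an illustration of what the label means.
interface SelectLabel {
  name: string;
  type: "Select";
  options: string[];
}

const sharkLabel: SelectLabel = {
  name: "Shark",
  type: "Select",
  options: ["Sixgill", "Great White", "Hammerhead"],
};

// A Select annotation is a single value that must exactly match one option.
function validateSelection(label: SelectLabel, value: string): boolean {
  return label.options.indexOf(value) !== -1;
}
```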
Now we are ready for labeling! For each image, select the correct classification from the labels drawer on the left and then hit Submit (or Cmd+S) to save it and move on to the next image.
Because these images come from image search results, not all of them are what we want. When we get an incorrect or unusable image, we can just select Skip at the bottom right of the labeler view. Skipped images are not included in the final dataset.
After labeling all the images in our dataset, we can open the Dashboard to view our statistics. We skipped 36 images, and labeling took only about 1 second per image!
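HyperLabel's Dashboard computes these numbers for us. To make them concrete, here is a small sketch of how such statistics can be derived from per-image records. The `LabelRecord` shape and the `summarize` function are hypothetical. In `LabelRecord`, `label` is `null` for a skipped image and `seconds` is the time spent on it.

```typescript
// Hypothetical per-image record; label is null when the image was skipped.
interface LabelRecord {
  file: string;
  label: string | null;
  seconds: number; // time spent on this image
}

interface Summary {
  labeled: number;
  skipped: number;
  perClass: Record<string, number>;
  avgSeconds: number; // averaged over every image viewed, skipped ones included
}

function summarize(records: LabelRecord[]): Summary {
  const perClass: Record<string, number> = {};
  let skipped = 0;
  let totalSeconds = 0;
  for (const r of records) {
    totalSeconds += r.seconds;
    if (r.label === null) {
      skipped++; // skipped images count toward time but not toward any class
      continue;
    }
    perClass[r.label] = (perClass[r.label] || 0) + 1;
  }
  return {
    labeled: records.length - skipped,
    skipped,
    perClass,
    avgSeconds: records.length ? totalSeconds / records.length : 0,
  };
}
```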
Once we are happy with our labeling, we can go to Review > Export > Create ML > Export now to export our data for training. Just save the output folder to your desktop.
Training
To train our model, we are going to use Apple’s Create ML tool, which accepts HyperLabel’s image classification export format and trains an image classification model automatically. Create ML requires Xcode. With Xcode open, Control-click Xcode’s Dock icon and choose Open Developer Tool > Create ML.
After starting Create ML, choose File > New, select the Image Classifier template, and hit Next. On the next screen, name the project “SharkClassifier” and hit Next.
If you named your HyperLabel project “Shark Classifier” and saved the export to your desktop, drag the “~/Desktop/Shark Classifier-classification/Shark” folder into the Create ML app, and it will load the data automatically. Then press Run to train the model.
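Create ML's image classifier reads training data organized as one subfolder per class, with each folder name used as the label. Suppose the HyperLabel export follows that layout, so the `Shark` folder holds one subfolder per option. The sketch below then computes where each kept image would land. `LabeledImage` and `createMLPaths` are hypothetical names. Skipped images are left out, just as HyperLabel leaves them out of the final dataset.

```typescript
// Hypothetical record for one image after labeling; null means "skipped".
interface LabeledImage {
  file: string;
  label: string | null;
}

// Group kept images by class: one folder per label, the layout Create ML's
// image classifier reads. Skipped images are excluded from the dataset.
function createMLPaths(root: string, images: LabeledImage[]): Record<string, string[]> {
  const folders: Record<string, string[]> = {};
  for (const img of images) {
    if (img.label === null) continue;
    const folder = folders[img.label] || (folders[img.label] = []);
    folder.push(`${root}/${img.label}/${img.file}`);
  }
  return folders;
}
```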
Our labeled dataset contained around 400 images, and training took only a few minutes. The results appear in the Validation section. Validation data is held out from training and used only to measure how well the model performs on images it hasn’t seen.
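Create ML takes care of holding out validation data, so we never have to write this ourselves. The sketch below only illustrates the idea of a held-out set and is not how Create ML chooses its split. It shuffles with a seeded Park-Miller generator so the split is repeatable, then reserves a fraction of the items for validation. The 20% fraction and the helper names are assumptions for the example.

```typescript
// Illustration only: Create ML performs its own validation split.
// Park-Miller "minimal standard" PRNG so the same seed gives the same split.
function parkMiller(seed: number): () => number {
  const M = 2147483647; // 2^31 - 1
  let state = Math.abs(Math.floor(seed)) % M;
  if (state === 0) state = 1; // state must stay in [1, M - 1]
  return () => {
    state = (state * 16807) % M; // product stays below 2^53, so it is exact
    return (state - 1) / (M - 1); // in [0, 1)
  };
}

// Fisher-Yates shuffle on a copy, driven by the supplied random source.
function shuffled<T>(items: T[], rand: () => number): T[] {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1)); // 0 <= j <= i
    const tmp = out[i] as T;
    out[i] = out[j] as T;
    out[j] = tmp;
  }
  return out;
}

// Hold out a fraction of the items for validation; the rest are for training.
function trainValidationSplit<T>(
  items: T[],
  validationFraction: number,
  seed: number
): { train: T[]; validation: T[] } {
  const mixed = shuffled(items, parkMiller(seed));
  const nValidation = Math.round(items.length * validationFraction);
  return { validation: mixed.slice(0, nValidation), train: mixed.slice(nValidation) };
}
```

Because the validation images never reach the training step, accuracy measured on them estimates how the model will do on new photos.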
A validation accuracy of 94% is great for a first pass! When we give the model an image of a Great White from outside our dataset, it classifies it correctly.
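Create ML reports accuracy for us, but the metric itself is simple: it is the share of validation images whose predicted label matches the true label. The sketch below computes it, along with a confusion table showing which species get mistaken for which. The function names and inputs are illustrative, not Create ML output.

```typescript
// Accuracy = correct predictions / total predictions.
function accuracy(truth: string[], predicted: string[]): number {
  if (truth.length !== predicted.length) throw new Error("truth and predicted differ in length");
  if (truth.length === 0) return 0;
  let correct = 0;
  truth.forEach((t, i) => {
    if (t === predicted[i]) correct++;
  });
  return correct / truth.length;
}

// confusion[actual][predicted] = number of images; off-diagonal cells are mistakes.
function confusionCounts(truth: string[], predicted: string[]): Record<string, Record<string, number>> {
  if (truth.length !== predicted.length) throw new Error("truth and predicted differ in length");
  const confusion: Record<string, Record<string, number>> = {};
  truth.forEach((actual, i) => {
    const guess = predicted[i] as string;
    const row = confusion[actual] || (confusion[actual] = {});
    row[guess] = (row[guess] || 0) + 1;
  });
  return confusion;
}
```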
Conclusion
HyperLabel enables developers to create ML datasets very quickly with little or no code (this project took about an hour from idea to trained model). HyperLabel supports many other use cases and labeling procedures. Give it a shot for free! Download it from the Mac App Store or get it from Microsoft.
About this blog series: The HyperLabel Fundamentals posts are intended to provide a practical guide to using HyperLabel for real-world machine learning problems. In my next post, I’ll show you how to use HyperLabel and object detection to compose poker hands from an image by detecting the individual cards’ rank and suit!
About Sixgill: Sixgill, LLC provides powerful end-to-end unified data automation, vision AI, device management, MLOps, data authenticity, and cloud-to-edge deployment for the Internet of Everything.