Last year I enjoyed a series of episodes of the Howard Stern radio show on Sirius XM that featured a smartphone app called the Ugly Meter. The app lets users snap a photo of their face, then scans it for symmetry, contours, and other elements to determine how good- (or bad-) looking a person is. Ratings are handed out on a scale of 1-10 and come with some not-so-nice quips about the person’s looks. The reactions were hilarious, and the app quickly climbed to second place on the U.S. list of top paid iPhone apps and has reportedly generated more than $500,000 for its developers.
The main reason I took note of this silliness is that the app is really a machine learning classifier that learned about “ugliness” from a large training set of images, each labeled with an integer from 1 to 10.
But that raises a question: who made these original classifications (judgments about a person’s appearance) in the first place? In other words, where did the training set come from?
Enter Amazon Mechanical Turk (AMT). AMT is a clearinghouse for Human Intelligence Tasks, i.e. things best done by humans equipped with the most powerful computer of all: the brain. Machine learning developers use AMT to collect results from Mechanical Turk workers, and average human computers (you and me) can earn a small payment for each classification completed. It is the perfect collaboration for tapping the power of the human brain to make classifications. Even astronomers use crowdsourcing to classify distant galaxies through Galaxy Zoo. A few years ago I spent a lot of my own cycles on that project.
AMT is widely used to generate data sets for machine learning applications, because many machine learning algorithms require a large amount of training data. For example, to build natural language systems, researchers traditionally pay linguistic experts for millions of annotations, while search engine companies employ hundreds or thousands of annotators for their classification, ranking, and other statistically trained systems.
AMT’s online crowdsourcing service hands simple tasks to humans. For example, there might be a set of images that need to be labeled as “happy” or “sad.” These labels then form the training set for a supervised learning algorithm, which is trained on the human-labeled images so that it can label new images automatically. This is likely how the Ugly Meter operates.
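To make the idea of training on human labels concrete, here is a minimal sketch in Python. It is not how the Ugly Meter or any AMT customer actually works; it assumes each image has already been reduced to two made-up numeric features, and it uses a simple nearest-centroid classifier as a stand-in for whatever supervised algorithm a real system would use. The function names `train_centroids` and `predict` are hypothetical.

```python
from math import dist

def train_centroids(examples):
    """Average the feature vectors of each label.

    examples: list of (feature_vector, label) pairs, where the labels
    are assumed to come from human annotators.
    Returns a dict mapping each label to its mean feature vector.
    """
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Label a new example with the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical features per image, e.g. (mouth curvature, eye openness),
# paired with the label a human worker assigned.
training_set = [
    ([0.9, 0.8], "happy"),
    ([0.8, 0.9], "happy"),
    ([0.1, 0.2], "sad"),
    ([0.2, 0.1], "sad"),
]

model = train_centroids(training_set)
new_label = predict(model, [0.7, 0.6])  # label for an unseen image
print(new_label)
```

The point of the sketch is only the division of labor: humans supply the labels in `training_set`, and the algorithm generalizes from them to images nobody has labeled.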
Classifications like ugliness are often subjective (beauty is in the eye of the beholder), while happy vs. sad is less so. Nevertheless, AMT makes it easy to ask many people for judgments, and accuracy can sometimes be gauged. Researchers have measured how well averaged Turker judgments correlate with an expert gold standard. With more judgments per example, accuracy increases, or at least the judgments converge to a consensus. Enough non-experts can match, or often beat, an expert’s reliability.
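The effect of collecting more judgments per example can be sketched with a simple majority vote. The Python below is an illustration under strong assumptions, not a description of any published study: labels are binary, every worker is right with the same probability, and workers judge independently of one another (real Turkers rarely satisfy that last assumption). The function names and the 0.7 accuracy figure are made up for the example.

```python
from collections import Counter
from math import comb

def majority_vote(labels):
    """Return the most common label among one item's judgments.

    Ties go to whichever label appeared first in the list.
    """
    return Counter(labels).most_common(1)[0][0]

def majority_accuracy(p, n):
    """Chance that a majority of n workers picks the right binary label.

    Assumes each worker is independently correct with probability p
    and that n is odd, so there are no ties.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Hypothetical raw judgments collected for two images.
judgments = {
    "img_001": ["happy", "happy", "sad"],
    "img_002": ["sad", "sad", "sad"],
}
consensus = {item: majority_vote(votes) for item, votes in judgments.items()}
print(consensus)

# Consensus accuracy as the number of judgments per image grows,
# for workers who are individually right 70% of the time.
for n in (1, 3, 5, 9):
    print(n, round(majority_accuracy(0.7, n), 3))
```

Under these assumptions the consensus is more reliable than any single worker as soon as three judgments are combined, which is the intuition behind paying many non-experts a little rather than one expert a lot.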