Can Artificial Intelligence Identify Pictures Better than Humans?

It's taken computers less than a century to learn what it took humans 540 million years to know.

By Ophir Tanz

Opinions expressed by Entrepreneur contributors are their own.


Computer-based artificial intelligence (AI) has been around since the 1940s, but the current innovation boom around everything from virtual personal assistants and visual search engines to real-time translation and driverless cars has led to new milestones in the field. And ever since IBM's Deep Blue beat Russian chess champion Garry Kasparov in 1997, machine-versus-human milestones have inevitably raised the question of whether AI can do things better than humans (it's the inevitable fear around Ray Kurzweil's singularity).

As image recognition experiments have shown, computers can identify hundreds of breeds of cats and dogs faster and more accurately than humans -- but does that mean that machines are better than us at recognizing what's in a picture? As with most comparisons of this sort, at least for now, the answer is a little bit yes and plenty of no.

Less than a decade ago, image recognition was a relatively sleepy subset of computer vision and AI, found mostly in photo organization apps, search engines and assembly line inspection. It ran on a mix of keywords attached to pictures and engineer-programmed algorithms. As far as the average user was concerned, it worked as advertised: Searching for donuts under "Images" in Google delivered page after page of doughy, pastry-filled pictures. But those results were possible only through laborious human intervention: manually tagging each and every picture with identifying keywords and feeding a definition of a donut's properties into an algorithm. It wasn't something that could easily scale.

More recently, however, advances using an AI training technology known as deep learning are making it possible for computers to find, analyze and categorize images without the need for additional human programming. Loosely based on human brain processes, deep learning implements large artificial neural networks -- hierarchical layers of interconnected nodes -- that rearrange themselves as new information comes in, enabling computers to literally teach themselves.
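To make "layers of nodes that rearrange themselves" concrete, here is a toy sketch of a two-layer neural network learning from examples. The task (XOR), the network size and every parameter here are illustrative assumptions -- real image-recognition networks are vastly larger and train on millions of labeled photos -- but the mechanics are the same: a forward pass produces a guess, and a backward pass adjusts every connection to reduce the error.

```python
import numpy as np

# Illustrative toy network: learn XOR. All parameters are assumptions
# chosen for this sketch, not values from any real vision system.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)   # input  -> hidden layer
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)   # hidden -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(5000):
    # Forward pass: each layer transforms the previous layer's output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: nudge every connection to shrink the error --
    # this is the "rearranging" the article describes.
    grad_out = (out - y) * out * (1 - out)
    grad_h = (grad_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ grad_out; b2 -= 0.5 * grad_out.sum(axis=0)
    W1 -= 0.5 * X.T @ grad_h;   b1 -= 0.5 * grad_h.sum(axis=0)

print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

No human ever tells the network what XOR "is"; it extracts the pattern from examples alone, which is exactly what makes the approach scale where hand-coded keyword tags could not.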

As with human brains, artificial neural networks enable computers to get smarter the more data they process. And, when you're running these deep learning techniques on supercomputers such as Baidu's Minwa, which has 72 processors and 144 graphics processing units (GPUs), you can input a phenomenal amount of data. Considering that more than three billion images are shared across the internet every day -- Google Photos alone saw uploads of 50 billion photos in its first four months of existence -- it's safe to say that the amount of training data available these days is staggering. So, is all this computing power and data making machines better than humans at image recognition?

There's no doubt that recent advances in computer vision have been impressive . . . and rapid. As recently as 2011, humans beat computers by a wide margin at identifying images, in a test involving approximately 50,000 images that needed to be categorized into one of 10 categories ("dogs," "trucks" and others). Researchers at Stanford University developed software to take the test: It was correct about 80 percent of the time, whereas the human opponent, Stanford PhD candidate and researcher Andrej Karpathy, scored 94 percent.

Then, in 2012, a team at the Google X research lab approached the task a different way, by feeding 10 million randomly selected thumbnail images from YouTube videos into an artificial neural network with more than 1 billion connections spread over 16,000 CPUs. After this three-day training period was over, the researchers gave the machine 20,000 randomly selected images with no identifying information. The computer looked for the most recurring images and accurately identified ones that contained faces 81.7 percent of the time, human body parts 76.7 percent of the time, and cats 74.8 percent of the time.

At the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2014, Google came in first place with a convolutional neural network approach that resulted in just a 6.6 percent error rate, almost half the previous year's rate of 11.7 percent. The accomplishment was not simply correctly identifying images containing dogs, but correctly identifying around 200 different dog breeds in images, something that only the most dedicated canine experts might be able to accomplish in a speedy fashion. Once again, Karpathy, a dedicated human labeler who trained on 500 images and identified 1,500 images, beat the computer with a 5.1 percent error rate.
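The error rates quoted here are ILSVRC's top-5 classification error: a guess counts as correct if the true label appears anywhere in the model's five highest-scoring predictions. A minimal sketch of that metric, using made-up scores rather than real model output:

```python
import numpy as np

def top5_error(scores, true_labels):
    """Fraction of examples whose true label is NOT among the five
    highest-scoring predicted classes (the ILSVRC top-5 metric)."""
    top5 = np.argsort(scores, axis=1)[:, -5:]  # indices of the 5 best scores
    hits = [label in row for label, row in zip(true_labels, top5)]
    return 1.0 - sum(hits) / len(hits)

# Two hypothetical images scored against 10 candidate classes.
scores = np.array([
    [0.1, 0.2, 0.9, 0.3, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0],  # true class 2: in top 5
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.0, 0.0, 0.0, 0.0, 0.01],  # true class 9: missed
])
print(top5_error(scores, [2, 9]))  # -> 0.5
```

Under this scoring, Google's 2014 system put the correct label among its top five guesses for 93.4 percent of test images, while Karpathy managed 94.9 percent.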

This record lasted until February 2015, when Microsoft announced it had beat the human record with a 4.94 percent error rate. And then just a few months later, in December, Microsoft beat its own record with a 3.5 percent classification error rate at the most recent ImageNet challenge.

Deep learning algorithms are helping computers beat humans in other visual formats as well. Last year, a team of researchers at Queen Mary University of London developed a program called Sketch-a-Net, which identifies objects in sketches. The program correctly identified 74.9 percent of the sketches it analyzed, while the human participants in the study correctly identified objects in sketches only 73.1 percent of the time. Not that impressive a margin, but as in the previous example with dog breeds, the computer correctly identified which type of bird was drawn in a sketch 42.5 percent of the time -- nearly twice the accuracy of the people in the study, who scored 24.8 percent.

These numbers are impressive, but they don't tell the whole story. "Even the smartest machines are still blind," said computer vision expert Fei-Fei Li in a 2015 TED Talk on image recognition. Yes, convolutional neural networks and deep learning have helped improve accuracy rates in computer vision -- they've even enabled machines to write surprisingly accurate captions for images -- but machines still stumble in plenty of situations, especially when more context, backstory or proportional relationships are required. Computers struggle when, say, only part of an object is in the picture -- a scenario known as occlusion -- and may have trouble telling the difference between an elephant's head and trunk and a teapot. Similarly, they stumble when distinguishing between a statue of a man on a horse and a real man on a horse, or mistake a toothbrush being held by a baby for a baseball bat. And let's not forget, we're just talking about the identification of basic everyday objects -- cats, dogs and so on -- in images.

Computers still aren't able to identify some seemingly simple (to humans) pictures, such as this picture of yellow and black stripes, which computers seem to think is a school bus. This technology is, unsurprisingly, still in its infancy. After all, it took the human brain 540 million years to evolve into its highly capable current form.

What computers are better at is sorting through vast amounts of data and processing it quickly, which comes in handy when, say, a radiologist needs to narrow down a list of X-rays with potential medical maladies or a marketer wants to find all the images relevant to his brand on social media. The things a computer is identifying may still be basic -- a cavity, a logo -- but it's identifying them from a much larger pool of pictures, and it's doing so quickly, without getting bored as a human might.

Humans still get nuance better and can probably tell you more about a given picture thanks to basic common sense. For everyday tasks, humans still have significantly better visual capabilities than computers.

That said, the promise of image recognition and computer vision at large is massive, especially when seen as part of the larger AI pie. Computers may not have common sense, but they do have direct access to real-time big data, sensors, GPS, cameras and the internet, to name just a few technologies. From robot disaster relief and large-object avoidance in cars to high-tech criminal investigations and augmented reality (AR) gaming leaps and bounds beyond Pokemon GO, the future of computer vision most likely lies in doing things that humans simply can't (or won't) do. One thing we can be certain of is this: It won't take 540 million years to get there.

Ophir Tanz

CEO and Founder of GumGum

Ophir Tanz is an entrepreneur, technologist and the CEO and founder of GumGum, a digital-marketing platform for the visual web. Tanz is an active member of the Los Angeles startup and advertising community, serving as a mentor and advisor to other startups around Silicon Beach. He holds a B.S. and a M.S. from Carnegie Mellon University.
