Have you ever felt as if life is streaming by you? Have you wished that the never-ending scroll of tweets, Facebook posts, 24-hour news cycles, videos and GIFs could be streamlined into a highlight reel—maybe even read to you?
Your wish is on the brink of coming true.
Scientists are working on “text analysis” (sometimes called “machine reading,” and often built on “deep learning”) to teach computers to scan articles much faster than a human could read them, summarize them, and, most importantly, draw new conclusions and make suggestions based on them.
You’ve probably seen an early stage of text analysis in the form of tag/word clouds or sentiment analysis. Word clouds read an article, count the instances of each word, and display the words that are used more often in a larger font. Sentiment analysis goes a little further—determining whether a given post, sentence, or article is expressing positive or negative feelings about the subject.
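The counting step behind a word cloud is simple enough to sketch in a few lines of Python. This is a minimal illustration, not how any particular word-cloud tool works: the function name `word_sizes` and the point sizes are made up for the example, and real tools usually also drop common words such as "the" and "and."

```python
import re
from collections import Counter

def word_sizes(text, min_pt=10, max_pt=48):
    """Map each word to a font size that grows with how often it appears.

    min_pt and max_pt are arbitrary illustrative values.
    """
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # Linear scaling: the most frequent word is drawn at max_pt,
    # and rarer words are drawn proportionally smaller.
    return {w: min_pt + (max_pt - min_pt) * c / top for w, c in counts.items()}

sizes = word_sizes("the cat and the hat and the bat")
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

Here "the" appears three times and gets the largest size, "and" appears twice and comes next, and the words used only once are drawn smallest.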
In an era where snark and sarcasm are prized, this isn’t always easy. For instance, have you heard this joke: A teacher tells the class, “In English, a double negative forms a positive. However, in some languages, such as Russian, a double negative remains a negative. But there isn’t a single language, not one, in which a double positive can express a negative.” To which a voice from the back of the room pipes up, “Yeah, right.”
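To see why sarcasm trips up the simplest approaches, here is a toy sentiment scorer in Python that just counts "positive" and "negative" words. Everything in it is an assumption made for illustration: the word lists are tiny and hand-picked, and the function `naive_sentiment` does not reflect how any commercial system works.

```python
import re

# Tiny hand-made word lists, purely illustrative (not a real sentiment lexicon).
POSITIVE = {"good", "great", "love", "right", "yeah", "best"}
NEGATIVE = {"bad", "awful", "hate", "wrong", "worst"}

def naive_sentiment(text):
    """Label text by counting positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(naive_sentiment("I hate this awful airline"))  # negative
print(naive_sentiment("Yeah, right."))               # positive
```

The scorer sees two agreeable words in the student's reply and labels it positive, missing the joke entirely. Catching the sarcasm takes context, which simple word counting cannot supply.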
News organizations and political analysts use text analysis to study subjects such as a collection of speeches about 9/11, the 800,000+ comments the Federal Communications Commission received about net neutrality, the growth of government regulation, and the impact of tobacco company lobbying efforts on lawmakers. Other companies look at places like Twitter and Facebook to see what people are saying about firms, such as which airlines have the best and worst reputations online.
With advanced artificial intelligence systems such as IBM’s Watson, however, text analysis is now going even further, analyzing everything from the New York Times to movie reviews to scientific papers, just to give the systems more practice with different kinds of articles and material. Organizations ranging from Google to the Defense Advanced Research Projects Agency (DARPA) are also conducting research aimed at determining what people are actually saying, identifying the context, and clearing up ambiguities. “Depending on context, much of the information that could support [Department of Defense] missions may be implicit rather than explicitly expressed,” DARPA writes on its website. “Having the capability to automatically extract operationally relevant information that is only referenced indirectly would greatly assist analysts in efficiently processing data.”
Context makes a big difference, notes the Allen Institute for Artificial Intelligence, funded by Microsoft co-founder Paul Allen. For example, if someone is talking about “apples,” do they mean the fruit or the computer? asks CEO Oren Etzioni. And while an AI system could pick up a large amount of information about apples from reading articles and other sources—a technique known as open information extraction—it could end up with a great deal of random, unfocused information. “Open Information Extraction suffers from Attention Deficit Disorder!” he concludes.
For its part, IBM is using a version of Watson to develop a system it calls the Debater, which quickly scans knowledge banks to come up with arguments for or against a particular statement. IBM calls it the machine that can argue. But it’s more than just a parlor trick. “Medical researchers could use its services to get a big head start when they’re trying to find cures and treatments for diseases,” writes Rick Newman for Yahoo! Finance. “Its ability to summarize terabytes of data in ways that are logical to humans could allow it to summarize all the research that’s been done on a given disease, while also identifying new and unexplored avenues of inquiry that could yield promising discoveries. The same goes for legal research, military intelligence and many types of science and technology.”
For example, Baylor College of Medicine recently worked with Watson on medical research papers involving a tumor-suppressing protein called p53. More than 70,000 scientific papers have been written about this protein, more than any scientist could read in a lifetime. But the Knowledge Integration Toolkit (KnIT) read all of those published through the year 2003, looking for conclusions about kinases, a type of enzyme that could activate the protein.
The KnIT system found 74 kinases that could be modifiers. Of these, 10 were already known and nine more were discovered almost a decade later. The KnIT system not only came up with those 19, but it also accurately predicted seven more that researchers were currently investigating.
In addition to analyzing a large body of written work, such systems could go even further. For example, future computer systems could analyze things written in real time, which might lead to your computer asking you if you really want to send that nasty email to your boss, writes Jack Schofield for ZDNet.
And that leaves us wondering: what would one of those systems make of this blog post?