Mind Over Subject Matter: Researchers Develop A Better Understanding of How Human Brains Manage So Much Information

“Synapse”, Image by Allan Ajifo

There is an old joke that goes something like this: What do you get for the man who has everything, and where would he put it all?¹ It often comes to mind whenever I experience the sensation of information overload caused by too much content presented from too many sources. Especially since the advent of the Web, almost everyone I know has had the same overwhelming experience: the amount of information they are inundated with every day seems increasingly difficult to parse, comprehend and retain.

The multitudes of screens, platforms, websites, newsfeeds, social media posts, emails, tweets, blogs, Post-its, newsletters, videos and print publications of all types, just to name a few, are relentlessly updated and uploaded around the globe, 24/7. Nonetheless, for each of us individually, a good deal of the substance conveyed by this flood of bits and ocean of ink somehow still manages to stick somewhere in our brains.

So, how does the human brain accomplish this?

Less Than 1% of the Data

A fascinating report on Phys.org on December 15, 2015 entitled Researchers Demonstrate How the Brain Can Handle So Much Data, by Tara La Bouff, describes the latest research into how this happens. I will summarize and annotate it, and pose a few organic material-based questions of my own.

To begin, people learn to identify objects and variations of them rather quickly. For example, a letter of the alphabet, no matter the font, or an individual, regardless of their clothing and grooming, is always recognizable. We can also identify objects even when our view of them is quite limited. This neurological processing proceeds reliably and accurately moment by moment throughout our lives.

A team of researchers at Georgia Institute of Technology (Georgia Tech)² recently discovered that we can make such visual categorizations with less than 1% of the original data. Furthermore, they created and validated an algorithm “to explain human learning”. Their results can also be applied to “machine learning³, data analysis and computer vision⁴”. The team’s full findings were published in the September 28, 2015 issue of Neural Computation in an article entitled Visual Categorization with Random Projection by Rosa I. Arriaga, David Rutter, Maya Cakmak and Santosh S. Vempala. (Dr. Cakmak is from the University of Washington, while the other three are from Georgia Tech.)

Dr. Vempala believes that the reason humans can quickly make sense of a very complex and robust world is because, as he observes, “It’s a computational problem”. His colleagues and team members examined “human performance in ‘random projection tests’”. These measure the degree to which we learn to identify an object. In their work, they showed their test subjects “original, abstract images” and then asked whether they could identify them again using a much smaller segment of the image. This led to the first of their two principal discoveries: the test subjects required only 0.15% of the original data to repeat their identifications.
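
The article does not include the researchers' actual method, but the core idea of random projection — compressing a high-dimensional image into a far smaller set of numbers while roughly preserving which images resemble each other — can be sketched in a few lines of Python. (The dimensions, data and scaling below are invented purely for illustration; this is not the team's code.)

```python
import random
import math

random.seed(42)

D, d = 10_000, 15  # original "pixels" vs. roughly 0.15% retained dimensions

# A random projection matrix: d rows of D Gaussian entries, scaled so that
# distances between projected vectors stay close to the originals on average.
R = [[random.gauss(0, 1) / math.sqrt(d) for _ in range(D)] for _ in range(d)]

def project(x):
    """Map a D-dimensional image vector down to just d numbers."""
    return [sum(r_i * x_i for r_i, x_i in zip(row, x)) for row in R]

def dist(a, b):
    """Ordinary Euclidean distance between two vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Two synthetic "images": a random pattern, a noisy near-copy of it,
# and a completely unrelated pattern.
img_a = [random.random() for _ in range(D)]
img_b = [v + random.gauss(0, 0.05) for v in img_a]   # near-duplicate
img_c = [random.random() for _ in range(D)]          # unrelated image

pa, pb, pc = project(img_a), project(img_b), project(img_c)

# Even after discarding over 99% of the data, the near-duplicate
# remains closer to the original than the unrelated image does.
print(dist(pa, pb) < dist(pa, pc))
```

The striking point, mirrored in the study's finding, is that similarity judgments survive this drastic compression.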

Algorithmic Agility

In the next phase of their work, the researchers prepared and applied an algorithm to enable computers (running a simple neural network, software capable of imitating very basic human learning characteristics) to undertake the same tasks. These digital counterparts “performed as well as humans”. In turn, the results of this research provided new insight into human learning.
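
The report does not describe the network's actual design, so purely as an illustrative sketch, here is perhaps the simplest possible "digital counterpart": a single artificial neuron (a perceptron) learning to sort small, 15-dimensional inputs into two categories. All data here is invented; the point is only to show a machine learning categories from mistakes, as a basic neural model does.

```python
import random

random.seed(0)

# Two toy "categories" of 15-dimensional inputs (hypothetical stand-ins
# for the study's abstract images after heavy compression).
def sample(center):
    """Draw a noisy example belonging to a category."""
    return [c + random.gauss(0, 0.1) for c in center]

center_a = [random.random() for _ in range(15)]
center_b = [random.random() for _ in range(15)]

train = [(sample(center_a), 1) for _ in range(40)] + \
        [(sample(center_b), -1) for _ in range(40)]
random.shuffle(train)

# A single artificial neuron: weighted sum plus threshold (perceptron rule).
w = [0.0] * 15
bias = 0.0
for _ in range(20):                       # a few passes over the data
    for x, label in train:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else -1
        if pred != label:                 # learn only from its mistakes
            w = [wi + label * xi for wi, xi in zip(w, x)]
            bias += label

# The trained neuron can now categorize fresh examples it has never seen.
fresh = sample(center_a)
print(1 if sum(wi * xi for wi, xi in zip(w, fresh)) + bias > 0 else -1)
```

A real experiment would of course use the projected image data itself rather than synthetic clusters, but the mistake-driven learning loop is the essential mechanism.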

The team’s objective was to devise a “mathematical definition” of typical and non-typical inputs. Next, they wanted to “predict which data” would be the most challenging for the test subjects and computers to learn. As it turned out, both performed with nearly equal results. Moreover, these results proved that which “data will be the hardest to learn over time” can be predicted.

In testing their theory, the team prepared three different groups of abstract images of merely 150 pixels each. (See the Phys.org link above containing these images.) Next, they drew up “small sketches” of them. The full image was shown to the test subjects for 10 seconds, after which they were shown 16 of the random sketches. Dr. Vempala was “surprised by how close the performance was” of the humans and the neural network.
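
As a rough model of this protocol (the real stimuli appear at the Phys.org link; the pixel values and sketch size below are invented), a “small sketch” can be treated as a tiny random subset of an image's 150 pixels, with everything else masked out:

```python
import random

random.seed(7)

# A hypothetical 150-pixel abstract image, as in the study's test sets.
image = [random.choice([0, 1]) for _ in range(150)]

def random_sketch(img, keep=10):
    """Keep only a small random subset of pixel positions, masking the
    rest — a stand-in for the "small sketches" shown to test subjects."""
    kept = set(random.sample(range(len(img)), keep))
    return [px if i in kept else None for i, px in enumerate(img)]

# Sixteen random sketches of the same image, mirroring the protocol.
sketches = [random_sketch(image) for _ in range(16)]
print(sum(px is not None for px in sketches[0]))
```

Each sketch exposes a different sliver of the image, which is what makes recognition from any one of them so remarkable.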

While the researchers cannot yet say with certainty that “random projection”, such as was demonstrated in their work, happens within our brains, the results lend support to it as a “plausible explanation” for this phenomenon.

My Questions

  • Might this research have any implications and/or applications in virtual reality and augmented reality systems that rely on both human vision and processing large quantities of data to generate their virtual imagery? (These 13 Subway Fold posts cover a wide range of trends and applications in VR and AR.)
  • Might this research also have any implications and/or applications in medical imaging and interpretation since this science also relies on visual recognition and continual learning?
  • What other markets, professions, universities and consultancies might be able to turn these findings into new entrepreneurial and scientific opportunities?

1.  I was unable to definitively source this online but I recall that I may have heard it from the comedian Steven Wright. Please let me know if you are aware of its origin. 

2.  For the work of Georgia Tech’s startup incubator see the Subway Fold post entitled Flashpoint Presents Its “Demo Day” in New York on April 16, 2015.

3.   These six Subway Fold posts cover a range of trends and developments in machine learning.

4.   Computer vision was recently taken up in an October 14, 2015 Subway Fold post entitled Visionary Developments: Bionic Eyes and Mechanized Rides Derived from Dragonflies.

Can Scientists Correlate the Language Used in Tweets with Twitter Users’ Incomes?

In the centuries since William Shakespeare wrote one of Juliet’s most enduring lines in Romeo and Juliet, that “A rose by any other name would smell as sweet”, it has almost always been interpreted as meaning that the mere names of people, by themselves, have no real effect upon who and what they are in this world.

This past week, the following trio of related articles was published that brought this to mind, specifically about the modern meanings, values and analytics of words as they appear online:

All of these are highly recommended and worth reading in their entirety for their informative and thought-provoking reports containing so many words about, well, so many words.

Then to reframe and update the original quote above to serve as a starting point here, I would like to ask whether a post by any other name in Twitter’s domain would smell as [s/t]weet? To try to answer this, I will focus on the first of these articles in order to summarize and annotate it, and then ask some of my own non-theatrical questions.

The Phys.org article nicely summarizes a study by a team of US and UK university scientists, published on PLOS|ONE.org and entitled Studying User Income through Language, Behaviour and Affect in Social Media, by Daniel Preotiuc-Pietro, Svitlana Volkova, Vasileios Lampos, Yoram Bachrach and Nikolaos Aletras. According to the article, a link exists between the language used in tweets and their authors’ income. (These additional ten Subway Fold posts covered other applications of demographic analyses of Twitter traffic.)

Methodology

Using only the actual tweets of Twitter users, which often contain “intimate details” despite the lack of privacy on this social media platform, the two researchers on the team from the University of Pennsylvania’s World Well-Being Project are actively investigating whether social media can be used as a “research tool” to replace more expensive surveys that can be “limited and potentially biased”. (The work of the World Well-Being Project, among others, was first covered in a closely related Subway Fold post on March 20, 2015 entitled Studies Link Social Media Data with Personality and Health Indicators.)

The full research team began this study by examining “Twitter users’ self-described occupations”. Then they gathered a “representative sampling” of 10 million tweets from 5,191 users spanning each of the nine distinct groups classified in the UK’s official Standard Occupational Classification guide, and calculated the average income for each group. Using this data, they built an algorithm upon “words that people in each code use distinctly”. That is, the algorithm parsed which words had the highest predictive value for determining which of the classification groups the users in the sample were likely to fall within.
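
The paper's actual model is far more sophisticated, but the underlying idea — scoring occupation groups by the words their members use distinctly — can be sketched with a toy, add-one-smoothed word-count classifier. (The tweets, group labels and wording below are entirely invented for illustration and bear no relation to the study's data.)

```python
from collections import Counter
import math

# Hypothetical toy corpus: a few "tweets" labeled with two occupation groups.
tweets = [
    ("quarterly earnings call moved to thursday", "managers"),
    ("reviewing the merger agreement before the board meets", "managers"),
    ("shift ran long but the tips were good tonight", "service"),
    ("double shift again, at least the regulars tip well", "service"),
]

# Count how often each word appears under each group label.
word_counts = {}
group_totals = Counter()
for text, group in tweets:
    for word in text.split():
        word_counts.setdefault(group, Counter())[word] += 1
        group_totals[group] += 1

def predict(text):
    """Score each group by how distinctly its users employ the tweet's
    words (an add-one-smoothed, naive-Bayes-style tally) and return the
    best-scoring group."""
    vocab = {w for counts in word_counts.values() for w in counts}
    scores = {}
    for group, counts in word_counts.items():
        score = 0.0
        for word in text.split():
            p = (counts[word] + 1) / (group_totals[group] + len(vocab))
            score += math.log(p)
        scores[group] = score
    return max(scores, key=scores.get)

print(predict("the board reviewed the earnings"))
```

Words like “board” and “earnings” appear only in one group's tweets here, so they carry the predictive weight, which is essentially the intuition behind building on “words that people in each code use distinctly”.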

Results

Some of the team’s results “validated what’s already known”, such as that a user’s words can indicate “age and gender” which, in turn, are linked to income. The lead researcher, Daniel Preoţiuc-Pietro, also cited the following unexpected results:

  • Higher earners on Twitter tend to:
    • write with “more fear and anger”
    • discuss “politics, corporations and the nonprofit world” more often
    • use it to distribute news
    • use it more for professional than personal purposes, while
  • Lower earners on Twitter tend to:
    • be optimists
    • swear more in their tweets
    • use it more for personal communication

This study will be used as the basis for future efforts to evaluate the correlations between user incomes and other data from the real world. (Please see also these eight Subway Fold posts on the distinctions between correlation and causation.)

My Questions

  • Might the inverse of these findings, that certain language could draw users with certain income levels, be used by online marketers, advertisers and content specialists to attract their desired demographic group(s)?
  • How could anyone concerned with search engine optimization (SEO) policies and results make use of this study in their content creation and meta-tagging strategies?
  • Does this type of data on the particularly sensitive subject of income risk segmenting users in some form of de facto discriminatory manner? If this possibility exists, how can researchers avoid it in the future?
  • Might a follow-up study find that certain words are used in tweets by authors who aspire to move up from one income level to the next? If so, how could this data be used by the same specialists mentioned in the first two questions above?