
The Difference Between Structured and Unstructured Data in Social Media

The Guardian recently used a useful analogy to explain the difference between content and metadata: the content is the letter, and the metadata is the envelope.

So, for example, in an email the “to”, “from” and “cc” fields are metadata, but the subject line is content.

Essentially, the metadata is structured and the content is unstructured.
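The email example above can be sketched in code. This is a minimal illustration with a hypothetical email record (all field values are invented): the envelope fields are structured and directly queryable, while the content only yields meaning once you interpret it.

```python
# A hypothetical email as a record: the header fields are structured
# metadata (fixed names, predictable values), while the subject and
# body are unstructured free text.
email = {
    # Structured metadata: the "envelope"
    "to": "research@example.com",
    "from": "gareth@example.com",
    "cc": ["team@example.com"],
    # Unstructured content: the "letter"
    "subject": "Thoughts on our social media study",
    "body": "I think we should look beyond retweet counts...",
}

# Structured fields can be queried directly...
assert email["from"] == "gareth@example.com"

# ...but the content only yields meaning through interpretation,
# here a deliberately crude keyword check:
mentions_metrics = "retweet" in email["body"].lower()
print(mentions_metrics)  # True
```

The asymmetry is the point: the first lookup is mechanical, while the second is already an act of interpretation, however simple.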

In social media research, the distinction between the two is not always made clear.

While you can glean information from the structured data, analysing the unstructured data is the only way to uncover insights. It’s also the most difficult part to do (and do well).

Edward Appleton, senior European consumer insight manager at Avery Dennison, defines an insight as:

1. Invariably below the surface. It isn’t immediately visible or apparent.

2. Not already common knowledge or part of prevailing wisdom.

3. Leading to a new opportunity or growth potential that can be effectively exploited.

The structured data can provide the what, where and when, but not the how or the why.

Unfortunately, attempts to standardise the measurement of social media often focus on the structured data: quantitative metrics like retweets, pins and likes. The kind of data that allows you to do a network analysis of who’s talking to whom, or (attempt to) measure ‘influence’ and ‘engagement’.
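To make the limits of this concrete, here is a minimal sketch using hypothetical tweet records (the field names and values are invented, not any real API's schema). The structured fields support counting and network-style tallies directly, but nothing here can tell you why anyone retweeted anything.

```python
from collections import Counter

# Hypothetical tweet records: only structured fields, no content.
tweets = [
    {"author": "alice", "reply_to": "bob",   "retweets": 12},
    {"author": "bob",   "reply_to": "alice", "retweets": 3},
    {"author": "carol", "reply_to": "alice", "retweets": 7},
]

# Who is talking to whom: a tally of (author, reply_to) pairs.
edges = Counter((t["author"], t["reply_to"]) for t in tweets)

# Aggregate "engagement": a simple sum over a structured field.
total_retweets = sum(t["retweets"] for t in tweets)

print(edges)           # the reply network, as counts
print(total_retweets)  # 22
```

Everything above is the what, where and when. The how and the why live in the tweet text, which this data does not even contain.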

All of which can be interesting and provides a valuable context to any subsequent analysis of the content. However, when it comes to establishing a single framework for how to approach the research, treating the structured and unstructured data as the same thing makes about as much sense as having a single framework for running a focus group and a survey.

It also betrays a digital dualism in viewing social media as a single entity, in which a throwaway tweet about a trending hashtag is treated the same as an Instagrammed photo, or an in-depth discussion on a message board.

It further fails to account for the many different ways different people use different sites.

Simply put, there will never be one way to interpret a conversation. It will always depend entirely on the context: how the discussion is being categorised, and for what purpose.

You cannot always make the unstructured data from social media do what you want it to do, which is why examining it in isolation does not always work.

Pursuing a single framework strikes me as being as erroneous as pursuing a single metric to measure influence. And you don’t find too many credible people continuing to advocate the latter.

Rather than imposing rigid standards, I think we should aim to explain clearly and transparently how we went about collecting, organising and interpreting the data in a way that makes sense to the original objective each time.

Creating broad guidelines, rather than a standardised framework, will also enable us to respond more quickly to emerging types of social media.

We should also always aim to distinguish between the structured data and the conversations themselves, which are not (until we categorise them in some manner) data as such.

You can bring order to unstructured data but you cannot impose order on it.
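A sketch of what "bringing order" might look like in practice, under stated assumptions: the categories and keyword rules below are hypothetical, chosen by the researcher for one specific objective. A different objective would call for different categories, which is exactly why no single framework can be imposed in advance.

```python
# Categories are defined by the researcher for a specific objective,
# not discovered automatically in the data. These rules are invented
# for illustration only.
categories = {
    "price":   ["cheap", "expensive", "cost"],
    "quality": ["broke", "reliable", "works"],
}

def categorise(post, rules):
    """Return the categories whose keywords appear in the post."""
    text = post.lower()
    return [cat for cat, words in rules.items()
            if any(word in text for word in words)]

print(categorise("Way too expensive, and it broke in a week", categories))
# ['price', 'quality']
```

The order emerges from the rules you bring to the text; change the research question and the same post may fall into entirely different categories, or none at all.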

image: data/shutterstock

Join The Conversation

  • Jul 7 Posted 3 years ago IF4IT

    Hi Gareth,

You may want to start looking at things like data-driven web site compilers/builders that marry both structured and unstructured data.  Such compilers know how to consume structured templates that allow users to embed unstructured content within them.  For example...

    • Product templates
    • Project templates
    • Service templates
    • Organization templates
    • Human Resource templates
    • etc. (the list is endless)

As a result of structuring raw content within templated frameworks, such compilers can derive metrics for things like quantity, completion, density comparisons, and much more.

Even more importantly, think of sites like Wikipedia: it can take months to create one loosely structured article, there is no consistency between articles, link densities are low, content and link quality are poor, and content is very expensive to generate.  At the opposite end of the spectrum, you can dump data from a source system (like your Human Resources database), feed it to a web site compiler, and have the compiler generate tens of thousands of neatly organized, consistent, high-quality pages, with heavy link density and many other advanced features that you could never code by hand on sites like wikis.  Advanced web site compilers like NOUNZ will even go so far as to autogenerate complex interactive visualizations from the content.

    Treating raw content like structured content could be the answer to the problem at the heart of your article.

    My Best,

