The U.S. Geological Survey (USGS) is a government organization that tracks earthquakes and their impacts across the world. The USGS provides information about seismic activity through the National Earthquake Information Center (NEIC), which draws on data gleaned from around 2,000 earthquake sensors, the majority of which are based in the U.S.
But in 2008, when a massive, 8.0 magnitude quake hit Sichuan Province in China, it was Twitter that provided the first indicators of the event to the wider world. Of course, with most of their own data coming from within the U.S., USGS were not overly surprised that Twitter was first with the news, but it got them thinking - maybe Twitter data could be used as an additional indicator in their research to help better detect and assess the impacts of quakes around the world, enabling them to respond faster to such events.
This started USGS on an investigation into how Twitter users discuss earthquakes and related activities, in order to determine the applicability of such data for their research. What they found was that Twitter's real-time pulse of activity could, in fact, refine and improve their data collection processes - and their learnings provide some interesting food for thought for anyone looking to use Twitter data, as outlined in a new post on Twitter's data blog.
Refining the Data
Two staffers from USGS, Paul Earle and Michelle Guy, started working with Twitter's Public API - the data that's available for all to use - to determine whether Twitter data could be used to detect and verify reports of quakes and their impacts, matching the Twitter activity with their own sources to form definitive links between tweets and actual seismic events. Through their research, they found that people in earthquake-affected regions tend to keep their messaging short - they don't have as much time to construct detailed missives.
Based on this, the team were able to narrow down their Twitter search, filtering out tweets with more than seven words. The team also recognized that people sharing links or the size of the earthquake were less likely to be offering first-hand reports, so they filtered out those tweets too (any tweets including a link or a number). The final, filtered stream, based upon this work, has ended up being a significant and valuable indicator of when earthquakes occur, globally.
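The filtering rules described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the USGS team's actual code: the keyword list, the regular expressions and the function name `is_likely_firsthand_report` are all assumptions made for the example. Only the three rules themselves (seven words or fewer, no links, no numbers) come from the post.

```python
import re

# Hypothetical keyword list for illustration; the post only names
# "terremoto" and "temblor" explicitly.
QUAKE_KEYWORDS = {"earthquake", "quake", "terremoto", "temblor"}

# Assumed patterns for spotting links and numbers in a tweet.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
DIGIT_PATTERN = re.compile(r"\d")

MAX_WORDS = 7  # from the post: tweets with more than seven words are dropped


def is_likely_firsthand_report(text: str) -> bool:
    """Return True if a tweet passes the three filters described in the post."""
    # Rule 1: people feeling a quake tend to keep messages short.
    if len(text.split()) > MAX_WORDS:
        return False
    # Rule 2: tweets sharing a link are less likely to be first-hand.
    if URL_PATTERN.search(text):
        return False
    # Rule 3: tweets containing a number (e.g. a magnitude) are dropped too.
    if DIGIT_PATTERN.search(text):
        return False
    # Finally, the tweet must mention a quake at all (assumed keyword match).
    lowered = text.lower()
    return any(keyword in lowered for keyword in QUAKE_KEYWORDS)
```

Run against a handful of sample tweets, a short "Earthquake!" passes, while "Magnitude 8.0 earthquake hits Sichuan" is dropped for containing a number. The point is how little logic is involved: three simple checks do most of the work.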
What's of most interest here is the way in which USGS have filtered their data streams to find the most relevant info. One of the biggest impediments to success in social media marketing is that, despite the ever-expanding streams of data flowing through social networks, filtering that information into workable, focused insights can seem impossible, or at least overwhelming, particularly for people just starting out in the field. All that information's useless if you can't do anything with it, and this is particularly true for Twitter, which has had trouble communicating its value proposition to non-users, who often see it as an uncontrollable, never-ending flow of mostly non-relevant content.
But it's in the ability to filter that info that true social media success lies. And what's more, using USGS as an example, the means of doing just that are not as complex as they may initially appear. The key lies in finding the right trigger point - the relevant conversations amidst the noise, the information you need to be tuning into and acting upon. USGS have found a few intuitive and simple data refinements to enable just that, and it's important to note that these are not highly complicated. The process they've followed is logical, examining the specifics of the data they need - not all the data available. And it works.
When assessing your own data, this example provides an interesting reference point: start with the wider conversation, then narrow it down to the points that are really relevant, focussing on the references that relate to what you really need to know. Social data can be overwhelming, but as shown here, there are ways to rationalize it and locate the most important conversations to fuel your reporting and tracking needs.
As noted in the Twitter post:
"While I was at the USGS office in Golden, Colo. interviewing Michelle and Paul, three earthquakes happened in a relatively short time. Using Twitter data, their system was able to pick up on an aftershock in Chile within one minute and 20 seconds - and it only took 14 Tweets from the filtered stream to trigger an email alert. The other two earthquakes, off Easter Island and Indonesia, weren't picked up because they were not widely felt."
Again, it's not the volume of data the team rely on, it's the relevance of the information - it only takes 14 tweets to provide an indicative measure of earthquake activity. When considering that 500 million tweets are sent per day, it can be easy to get swept up in the scale of it, to get lost in the impossibility of trying to translate all that into something actionable. But the key is not in translating everything, it's in refining it down to the smallest focus to enable effective analysis.
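To make the "14 tweets" idea concrete, here's a minimal sketch of a burst detector that raises a flag once enough filtered tweets arrive close together. The post doesn't describe how the USGS alert logic actually works, so treat this purely as an assumption: the sliding-window approach, the 60-second window and the `BurstDetector` class are all hypothetical. Only the threshold of 14 comes from the quote above.

```python
from collections import deque


class BurstDetector:
    """Flags a burst when enough tweets land inside a sliding time window.

    Assumes timestamps (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, threshold: int = 14, window_seconds: float = 60.0):
        self.threshold = threshold            # 14 is the figure quoted in the post
        self.window_seconds = window_seconds  # assumed value, not from the post
        self._timestamps = deque()

    def add(self, timestamp: float) -> bool:
        """Record one filtered tweet; return True if the threshold is reached."""
        self._timestamps.append(timestamp)
        # Drop tweets that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] < cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

Feeding it one tweet per second, the fourteenth call to `add` is the first to return True, whereas tweets spaced 100 seconds apart never trip it. In a real system that True would kick off something like the email alert mentioned in the post.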
Through their research, USGS have been able to do just that, proving, over time, that they can glean the data they need from very specific matches - and that's what all marketers and analysts should be looking to achieve. How do you sharpen your data down to its most relevant and actionable elements? What commonalities exist within your data that signal the key information?
This refinement is again underlined in the further notes in the post:
"The USGS monitors for earthquakes in many languages, and the words used can be a clue as to the magnitude and location of the earthquake. Chile has two words for earthquakes: terremoto and temblor; terremoto is used to indicate a bigger quake. This one in Chile started with people asking if it was a terremoto, but others realizing that it was a temblor.
As the USGS team notes, Twitter data augments their own detection work on felt earthquakes. If they're getting reports of an earthquake in a populated area but no Tweets from there, that's a good indicator to them that it's a false alarm."
The detail matters enormously in data analysis, and the actionable insights you glean from social networks will come down to how well you can refine the conversations into relevant data. In this sense, the term 'big data' may be misleading, because it's more the 'small data' contained within that's relevant (though without the large scale, much of it would be irrelevant).
Harnessing the Power of Insight
The USGS case study provides an insightful example of how social data can be used, and the importance of tracking the right information to convert social data into something practical. The first step in the process is ascertaining what it is, exactly, you need to know. In this case, the team have started with mentions of earthquakes then narrowed down the data by cross-matching posts for relevance. For your business, maybe you're tracking all mentions of your brand name (which you should be), maybe all mentions of your industry, or all of your target keywords in some capacity. Track too much and you'll likely never glean much insight, as you'll be setting yourself a mammoth task in monitoring all those mentions every day. But through refining, through working out the key messages you need to track that are actually actionable and relevant to your business interests, you can focus your efforts on the conversations and mentions that matter the most.
Maybe you sell car tyres - is it worth tracking mentions of 'car tyres', or is it of more value to find conversations about 'road trips'? If you sell coffee, do more people buy coffee based on specific mentions of that term, or is it better to focus on people who start work early? And if it is, what terms do they use on social that set them apart from the pack? While these are only rudimentary examples, you get the point - there's more data available to you than ever before regarding behaviors and actions. Finding the key conversations that lead to your brand could be of immense value to your overall marketing efforts.