
THROW THE BUM OUT!!! What Do Baseball and Quality Data Have in Common? More Than You Think!

August 12, 2019 by Rich Stimbra

By Michael Rinck – Vice President

The Oakland A’s brand of Moneyball brought to light the use of data as a competitive advantage for baseball teams. But for any data to be useful, it needs to be clean. We’re pretty sure cleaning research data doesn’t conjure up thoughts of baseball. Probably not right off the bat, anyway (pun intended). A deeper dive, however, reveals that the methods we employ to clean data are not unlike the myriad strategies used in the game of baseball itself [1]. We guard against stolen bases, throw curveballs, and strike out those who are less than honest with their responses. And yes, occasionally we consult the instant replay. How, you might ask? Allow us to throw out the first pitch and explain:

THROW THE BUM OUT

Cleaning data is the quality-control art of removing bad data that simply cannot be trusted. Just as a manager will pull a pitcher who can’t find the strike zone, don’t be afraid to toss erratic data.

You’ve invested a lot of time and money in creating a survey. You built it, and hoped they would come (okay, sorry, reaching here). You’ve also invested a lot of time in scoping out your sample and making sure you are targeting the right respondents for your research objectives. You collect the data. Now it’s time to step up to the plate and clean it—that is, remove bad or incomplete records.

Good data cleansing is not just about eliminating bad records but also about ensuring the consistency of those that remain. Data cleaning leads to high-quality data the way good pitching leads to strikeouts. When data is of excellent quality, it can be easily processed and analyzed, leading to insights that help the organization make better decisions.

If you’re a baseball fan (and even if you’re not), here are four easy-to-remember and important tips for cleaning data, followed by a short code sketch that puts all four into practice:

  1. Stealing Bases: Speeding is considered problematic survey behavior because respondents are not providing thoughtful, accurate answers. The data they provide may therefore be of poor quality and may have to be discarded so that survey estimates are not adversely affected. One rule of thumb is that the length of interview needs to be at least 40% of the average length of the survey. Thus, if the average length is 10 minutes, we might cut off a speeder at 4 minutes. Of course, there are exceptions to the rule; for example, we need to adjust time requirements for skip patterns, open-ends, loading time for images, etc.
  2. A Curve Ball for Straight-Liners: We can see if a respondent straight-lines a series of rating-scale questions. For example, if there are 20 statements being rated, we will check how many of the statements were given the same code. We can then run a distribution of those counts to decide what level of straight-lining warrants removal (e.g., terminate a respondent who gave the same code to 18 of the 20 statements).
  3. Three Strikes and You’re Out: In the screener we may ask: what brands have you consumed in the past 3 months? Later in the survey we may ask a follow-up question about exactly when they last consumed that brand (on a scale: Last Week / Last Month / 2-3 Months Ago / 6-12 Months Ago / Over a Year Ago). If the respondent’s answer falls outside the past 3 months, we will flag that respondent. Depending on the survey, we may have up to 10 flags programmed for various questions. A red herring question may also be included that counts as a flag; for example, we might ask about past-3-month consumption of a fictitious brand, and any respondent indicating consumption of a nonexistent brand earns a flag. During the soft-launch phase of fielding, we examine the distribution of flags to determine the level at which we should terminate a respondent (e.g., term a respondent if they have 3 flags, 2 flags, etc.).
  4. Going to Instant Replay—Checking Responses in Verbatim Questions: We can review the open-ends to verify that the respondent is answering the survey in a thoughtful manner. Short answers or garbage responses will count as flags.
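
To make these four checks concrete, here is a minimal sketch in Python (using pandas) of how the flags above might be scored against a table of completed interviews. The column names, thresholds, and termination rule are illustrative assumptions on our part, not a description of Merrill’s actual tooling or of any survey platform’s API:

    import pandas as pd

    # Illustrative thresholds -- in practice, tune these per study at soft launch.
    SPEEDER_FLOOR = 0.40       # minimum fraction of the average interview length
    STRAIGHTLINE_MAX = 18      # same code on 18 of 20 grid statements
    MIN_OPEN_END_CHARS = 10    # crude proxy for a thoughtful verbatim
    FLAGS_TO_TERMINATE = 3     # three strikes and you're out

    def count_flags(row, avg_loi_minutes, grid_cols):
        """Count quality flags for one respondent (column names are hypothetical)."""
        flags = 0
        # 1. Stealing bases: interview shorter than 40% of the average length.
        if row["loi_minutes"] < SPEEDER_FLOOR * avg_loi_minutes:
            flags += 1
        # 2. A curve ball for straight-liners: too many identical grid codes.
        if row[grid_cols].value_counts().max() >= STRAIGHTLINE_MAX:
            flags += 1
        # 3. Three strikes: screener said past-3-month use, follow-up says longer...
        if row["screener_past_3mo"] and row["last_consumed"] not in (
                "Last Week", "Last Month", "2-3 Months Ago"):
            flags += 1
        # ...and the red herring: claimed consumption of a fictitious brand.
        if row["consumed_fake_brand"]:
            flags += 1
        # 4. Instant replay: short or empty open-end responses.
        if len(str(row["open_end"]).strip()) < MIN_OPEN_END_CHARS:
            flags += 1
        return flags

    def clean(df, grid_cols):
        """Keep only respondents with fewer than FLAGS_TO_TERMINATE flags."""
        avg_loi = df["loi_minutes"].mean()
        n_flags = df.apply(count_flags, axis=1,
                           avg_loi_minutes=avg_loi, grid_cols=grid_cols)
        return df[n_flags < FLAGS_TO_TERMINATE]

The constants here are stand-ins; as the tips above note, the real cutoffs come from inspecting the soft-launch distributions for each study.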

To summarize, make sure that the data you include in the final dataset is accurate and reflects thoughtful, engaged respondents. Why spend all that time and money upfront only to include garbage data points? Garbage in = garbage out.

Utilize these top tips and you’ll avoid foul balls in your research. Let Merrill Research help you pick up your game!

Merrill Research—Experience You Can Count On.

_________________________________________________________

[1] George Will, Men at Work: The Craft of Baseball (1990)


Dead Data Tell No Tales: Can Your Survey Results Survive?

September 17, 2018 by Rich Stimbra

By David M. Schneer, Ph.D.

Sour Sample, Part 2: How to Prevent DOA Data

This blog entry continues the thoughts of our last post.  If you haven’t read it yet, you can see it here: When Data Goes Bad: How Sour Survey Samples Nearly Ruined the Research Industry

Yeah, we know. The subject of data cleaning is a real yawner. But dirty data done dirt cheap can result in the death of any study. Part of getting good-quality data is crafting a good survey; it is absolutely critical. Poorly constructed surveys – and not so much the respondents themselves – are mostly to blame for the rotten data that results. A well-crafted and clever survey, thoughtfully designed with the respondent experience in mind, should be everyone’s primary defense against deadly bad data. But that is a topic for another blog.

In this post, we seek to heed the words of Donato Diorio, renowned data scientist, who posited: “Without a systematic way to start and keep data clean, bad data will happen.”

Bad data is no good. We at Merrill Research identify bad data by evaluating individual responses and assigning “flags.” When people ask us what kind of business we are in, we tell them that we’re in the Red Flag business. Depending on the length and complexity of the survey, we may have as few as three or as many as eight or more flags programmed. We then monitor respondent flags throughout fielding and replace bad data along the way (a sketch of how a termination cutoff might be chosen follows the list below).

Some of our flags are old-school – things everyone should already be looking for. The power is in leveraging a variety of flags in clever ways to identify the zombies. Here are just a few flags we use to ensure our data are sound:

  1. Speeders: Individual respondents must complete the survey in at least 40% of the average overall length of interview. Of course, there are exceptions to this rule, as one needs to consider timing impact of skip patterns, open-ends, loading time for images, etc., but this is a good baseline.
  2. Straight-Liners: We check to see if a respondent straight-lines (provides the same response to) a series of rating-scale questions. We run a distribution of the codes to calculate what percentage of straight-lining will be used to remove a respondent (e.g., term a respondent out if they straight-lined a certain percentage of the statements).
  3. Inconsistent Responses: We may ask in the screener which brands respondents have consumed in the past 3 months. Later in the survey we may ask a follow-up question about exactly when they last consumed that brand. If the respondent’s answer falls outside the past 3 months, we will flag that respondent. Another way to use this method is to craft survey questions that test a respondent’s knowledge of the topic in question. If they are unable to accurately answer a question designed to test their basic knowledge of a topic, they earn yet another flag.
  4. Responses to Verbatim Questions: We review the open-ends to verify that the respondent is answering the survey in a thoughtful manner. Short answers or garbage responses are flagged.
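
The soft-launch monitoring described above lends itself to a small sketch as well. The heuristic below, which picks the lowest flag count whose removal stays within a chosen loss budget, is our own illustration of how a termination cutoff might be set, not a fixed Merrill rule:

    from collections import Counter

    def choose_termination_cutoff(flag_counts, max_loss=0.10):
        """Pick the lowest flag-count cutoff that terminates at most
        `max_loss` of the soft-launch completes (illustrative heuristic)."""
        dist = Counter(flag_counts)   # flag count -> number of respondents
        n = len(flag_counts)
        removed = 0
        # Walk down from the worst offenders, accumulating how many
        # respondents each candidate cutoff would terminate.
        for cutoff in sorted(dist, reverse=True):
            removed += dist[cutoff]
            if removed / n > max_loss:
                return cutoff + 1     # last cutoff that stayed within budget
        return 1                      # removing every flagged respondent is affordable

    # Example: flag counts from a hypothetical soft launch of 20 completes.
    counts = [0, 0, 0, 1, 0, 2, 0, 1, 3, 0, 0, 1, 0, 4, 0, 0, 2, 0, 1, 0]
    print(choose_termination_cutoff(counts))   # -> 3: term at 3 or more flags

Whatever rule is used, the point is the same: set the cutoff from the observed distribution, not by guesswork, and keep monitoring it as fielding continues.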

We may never achieve 100% confidence that every respondent is genuinely engaged with our survey instrument, but unless stringent data-cleaning measures are employed, conclusions drawn from the data will be increasingly questionable – think of the “IT professional” who just finished your survey but was actually a 15-year-old in New Jersey. At Merrill Research, we employ the most effective measures to make sure our data are as clean as possible, and we are always on the lookout for the latest methods to ensure data quality.

We all know data research isn’t sexy, but neither is staking your reputation on bad results. Let us help you keep your data from getting zombified.

Merrill Research—Experience You Can Count On.


