Introduction to Social Surveying: Pitfalls, Potential Problems and Preferred Practices
Agricultural and Resource Economics, University of Western Australia
Collecting valid information in a survey of people's attitudes, beliefs, intentions and preferences is much more difficult than most researchers outside the social sciences appreciate. The main problems are people-related, not statistical, and they include issues such as the ambiguity of communication by language, the attitudes of respondents to their participation in the survey, and the limits of human memory. This paper provides an overview of methodological issues in the process of conducting social surveys, other than those relating to sample selection and statistical analysis. There is a focus on the use of surveys in agriculture, horticulture and natural resource management. We outline common problems encountered in the conduct of surveys which lead to poor validity of results. A detailed procedure for developing and conducting surveys is recommended. Issues in the design of valid and reliable survey questions are also outlined.
Suppose you wanted to know:
The most practical way (and perhaps the only way) to attempt to answer these questions is through conducting a social survey. By a "social survey" we mean asking a consistent set of questions of a sample of people, and recording and analysing their responses.
Surveys are the standard tool for professionals who are interested in people's attitudes, beliefs, intentions and preferences. Biochemists have glassware and electronic equipment, agronomists have field trials, economists have computer models and social scientists have surveys. As a tool, a survey can be every bit as difficult and complicated as the tools of biochemists, agronomists and economists.
Our objectives are for the reader to
(a) appreciate that you should not conduct a survey at all unless you are prepared to invest a lot of time and care in it, or to pay a professional to do it properly. Without this, the survey is at best probably useless, and at worst misleading and damaging.
(b) learn to recognise poorly conducted surveys done by others, so that you are able to avoid giving undue weight to results of doubtful validity.
2. Why is it so difficult to conduct a good survey?
Contrary to the practice adopted by many untrained surveyors, conducting a valid survey is not just a matter of deciding on a few questions and going out and asking them of some people. For a variety of reasons, finding out "the truth" in a survey can be incredibly difficult. Some of the reasons for this are outlined below. If you are at all interested in "the truth", you need to know and respect how elusive it is, and know what to do to get close to it.
Of course you may not be interested in the truth. For example, a couple of years ago there was a survey sponsored by the Western Australian Liberal Party which asked people something like, "Are you happy to have your personal freedom over-ridden by interference from unions?" Of course the survey response was very negative, but the question was so loaded that it did not prove a thing about people's attitudes to unions, despite what the press release said. I'm assuming that most of you will be interested in surveys to get at the truth, rather than to generate propaganda.
Foddy (1993) outlines a set of reasons why surveys often do not get close to "the truth".
(a) Even simple factual questions are often answered incorrectly.
For example, one study found that 10 percent of respondents in a Philadelphia survey gave a different answer to the question, "What is your age in years?" when re-surveyed a week later. This is typical. In cases where it has been possible to check the answers to simple, factual questions, such as "Do you have a driving licence?", 5-17% of answers given are incorrect. Given this, one must wonder about the validity of answers to more subtle or complex questions.
(b) The relationship between what people say they do and what they actually do is sometimes poor.
A classic paper in this area is by LaPiere, who travelled with a Chinese couple in the US in the 1930s (when racial prejudice was apparently very strong). He recorded which restaurants they ate at and which hotels they stayed in. Afterwards he wrote to them all and asked whether they would accept members of the Chinese race as guests; 90 percent said no, even though in fact they all had.
(c) People's attitudes, beliefs, opinions, habits and interests often seem to be extraordinarily unstable.
Converse (1964) found that the correlation between people's political attitudes two years apart was very low. Gritching (1986) found that 18 percent of people changed their attitude to having a casino built in their neighbourhood during the course of a short interview.
The instability may be due to actual instability of attitudes, but it may reflect other things, such as the way the question is asked.
(d) Small changes in wording sometimes produce major changes in responses.
In July 1985 a survey of people in Melbourne found that 27 percent of people rated the performance of the leader of the opposition as good or very good. However when asked whether they "approve" of the performance of the opposition leader, 48 percent said that they did. In other words, of the people who did not rate his performance as at least "good", almost a third said that they did approve of his performance.
(e) Respondents commonly misinterpret questions.
Nuckols (1953) took nine questions used by professional surveyors and asked people to repeat the questions in their own words. 17 percent of the restatements were incorrect: those respondents would have been answering a different question to the one the surveyors thought they were asking.
(f) Answers to earlier questions can affect answers to later questions.
If you ask Germans to rate how "German" potatoes are, they rate them as more German if you first ask them how "German" rice is. Earlier questions can either reinforce or work against the response that would have been given without any preliminary questions.
(g) Changing the order in which response options are presented sometimes affects respondents' answers.
If people are asked to read the options for themselves, they tend to go for the first one. This is called a "primacy" effect. If the options are presented verbally, they tend to go for the last one: a "recency" effect.
(h) Answers are sometimes affected by the question format.
An example of an open ended question: "Which magazines do you buy?"
A closed question: "Which of the following magazines do you buy?"
One study found that, for a particular magazine, seven percent of respondents said they bought it when asked the open question, but 38 percent said so when asked the closed question.
(i) Cultural or ethnic differences can affect not only the interpretation of a question, but also people's willingness to give accurate answers.
For example, in a culture where governments and/or businesses are perceived as corrupt or exploitative, responses to questions from outsiders are likely to be affected by the risk that responses may be obtained and abused by government officials or others.
The first three factors (a), (b) and (c), are issues which are unavoidable to some extent. The other factors all have implications for how one should go about developing and conducting a survey.
For all of these reasons it is essential to invest a lot of care and effort into developing, testing, improving and re-testing your survey before you actually conduct it. You might need to spend three times as long preparing and testing your survey questions as you spend running the actual survey. Alternatively, employ a professional to do it properly.
Given that it is so difficult, you might ask "why bother?" The reason is that a survey is really the only way of obtaining some types of information, and in some cases information of imperfect quality is better than none at all.
3. Steps in the process of doing a survey
(a) Do I really need another survey?
Many of us are regularly surveyed for one purpose or another. For example,
Some segments of the population relevant to your industries endure the kinds of surveys we all get, but are additionally surveyed frequently in relation to their industry. This particularly applies to farmers, but probably also to horticultural growers. The makers of products and suppliers of services to these groups are constantly faxing, phoning and arriving on their doorstep. People get sick of completing so many surveys, so it's important to ask yourself: do I really need to conduct a survey to obtain this information?
It is possible that the information may be available from another source, such as
- a previous survey (much information is collected routinely and regularly but not used).
- published data (someone in another state or even another country might have done research already which will at least partly answer your questions).
- reliable interpersonal feedback from contact with farmers or growers at field days and other extension activities. This may be less feasible for surveys related to environmental management, where the target audience is often a more general group who are not easily geographically or socially defined.
(b) Statement of information goals and uses
Precisely what information do I want to know and what will I do with this information when I get it? Get specific before you go on. If you cannot, do some exploratory work before commencing the survey. Try writing a statement of the information goals of the survey. For example,
Goal: To answer the following questions:
(c) Collect Background Information
This step consists of familiarising yourself with the issues you have decided to conduct the survey on so that you have an understanding grounded in reality and a feel for the issue.
For example, CALM has asked you to survey forest users to ascertain their understanding of bush fires and safety regulations. They aim to use this information to design a public information campaign. You would start by:
(d) Focus Groups
A focus group is a small group of people (say 6 to 8) drawn from the population you will survey. You ask them open ended questions about the issue in question and record their responses. Focus groups are excellent for ensuring that you ask about the aspects of the issue which are most important to the relevant population, that you word questions in language appropriate to the audience, and that you pick up any new issues or problems which you weren't aware of.
For example: one of your aims is to find out the reasons for low attendance at agricultural field days. You believe it is linked to poor presentation of information, and so would have included several questions on this aspect in the questionnaire. However, during the focus group you find out that people are either highly impressed with the presentations or have never attended a field day, so they couldn't have based their decision on poor presentation of information. Instead, the timing of the event is a salient problem. You would refocus your questionnaire onto issues related to timing.
Procedure for focus groups
(i) Select a subsample. You should think about the important characteristics of your target group and make sure they are represented in this small sample.
For example, a recent survey of horticultural growers on the Swan Coastal Plain selected a sample which included growers from a range of locations, crops, farm sizes and ethnic groups, as these characteristics were thought to affect the information required on water and land management.
Sometimes, due to a lack of resources or money, researchers use a convenience sample for the focus group, for instance a Landcare group or a bush walking group with which they have established links, but this obviously reduces the representativeness of the data.
(ii) Create a prompt list. You will have formed a range of ideas which you think need to be included in the survey from your review of the literature and from talking to stakeholders, etc. Write down the key ideas to use as discussion starters with the focus group. Use the prompt list as a checklist to make sure that the issues you have identified ahead of time are covered. The order of the issues in the discussion is not important. Be prepared to give attention to issues and ideas which are not on your prompt list.
(iii) Facilitating the group. Your role is to get a discussion going in the general topic area and then observe and record the discussion. Occasionally you will prompt the discussion by asking an open-ended question to address the issues on your list. Tips:
(iv) Tape the discussion and write your observations down.
(v) Analyse the tape to identify the major issues for questions and suitable wording for the questions - wording close to that used by members of the focus group if possible.
(e) Select survey method (personal interview, phone, letter, fax)
Social survey data can be collected in a number of ways, as was illustrated in the opening examples: by post, fax, phone or in a face to face interview. There are a number of factors to consider when determining which is the most suitable method:
Cost: Phone, fax or letter are cheaper; face to face is most expensive and time consuming.
Response rate: It is common to have a response rate of 30% or less in postal surveys. Even if you got a response rate of 60%, there would be the question of what the distribution of replies would have looked like if everyone had responded. Remember that those who do respond may well be self-selecting on the basis of particular characteristics; amongst growers and farmers we might guess that the more highly educated give more priority to paperwork and are thus more likely to return a postal survey. If education is also associated with the attitudes we are examining, then we have a biased sample. Response rates for phone or personal interviews are higher: around 70%.
Why worry about a low return rate? To illustrate, suppose that in a postal survey of vegetable growers with a 55% return rate, 70% of those who returned the survey have already adopted a new water-saving irrigation method. However, 70% of the returned surveys are from growers with a TAFE qualification or higher, and growers with more education tend to adopt new technology faster. Of the 45% who failed to return the questionnaire, only 10% have TAFE or higher education and fewer than 20% have adopted the new technology. The apparent 70% adoption rate thus greatly overstates adoption in the full sample, as the calculation below shows.
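As a minimal sketch (in Python) of how the numbers in this hypothetical example combine, treating 20% as the non-respondents' adoption rate:

# Figures from the hypothetical example above.
response_rate = 0.55          # share of growers who returned the survey
adopt_respondents = 0.70      # adoption rate among respondents
adopt_nonrespondents = 0.20   # adoption rate among non-respondents (at most)

# Naive estimate: take the returned surveys at face value.
print(f"Apparent adoption rate: {adopt_respondents:.1%}")    # 70.0%

# Corrected estimate: weight each group by its share of the full sample.
overall = (response_rate * adopt_respondents
           + (1 - response_rate) * adopt_nonrespondents)
print(f"Adoption rate in full sample: {overall:.1%}")        # 47.5% at most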
Response rates for postal surveys are commonly increased by the use of incentives. A farmer we discussed this issue with had a variety of freebies he had accumulated from answering surveys: a cap from International, towels from Monsanto, etc.
A common and ethically sound way of increasing response rate and your public accountability as a researcher is to undertake to provide a summary of findings from the finished study.
Complexity of the information to be collected: If the survey requires complex information or large amounts of information, a personal interview may be the only feasible method. In all surveys, the length should be as short as possible, but in postal surveys brevity is especially important so as not to reduce the response rate.
Geographical factors: It may be prohibitively expensive, or simply infeasible, to reach graziers in the far north for face to face interviews because of their isolation.
Telephone ownership: British research shows that telephone owners differ from non-owners in terms of the education of the head of household, income and the number of persons in the household. We are not aware of any research on this in Australia, but would guess that by now a bigger issue would be unlisted numbers. This, however, can be overcome by generating phone numbers randomly in your sampling strategy, as sketched below.
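As a minimal sketch (in Python) of random generation of a phone sample - the eight-digit format is an assumption for illustration, and a real study would also need to screen out invalid and business numbers:

import random

def random_phone_sample(n, digits=8, seed=None):
    """Draw n telephone numbers by random digit dialling.
    Numbers are generated rather than taken from a directory,
    so unlisted numbers are just as likely to be selected."""
    rng = random.Random(seed)
    return ["".join(str(rng.randint(0, 9)) for _ in range(digits))
            for _ in range(n)]

print(random_phone_sample(5, seed=1))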
Literacy Levels: Around 10 percent of the population in Australia has severe literacy deficits and if you count those with low levels of competency you are probably looking at about 30 percent. Again we do not have the data on this for farmers or growers in Australia but imagine trying to reach the labourers in the market gardening areas of Wanneroo with a postal or telephone survey. Many of them are recent non-English speaking migrants.
Time: Telephone surveys are favoured for their speed over mail outs and face to face interviews.
Validity: In face to face interviews and telephone interviews, the interviewer is a threat to validity. Inappropriate non-verbal behaviour, failure to clarify vague replies, failure to stick to the wording, and failure to faithfully record the respondent's reply are all common problems.
A biased sample resulting from a low response rate is the most worrisome aspect of postal surveys. You could address this by following up a sample of non-respondents with a face to face interview and then comparing their results to the postal ones to see if there is any systematic difference.
A number of comparative studies have found a broad similarity of results in telephone and face to face surveys.
(f) Determine sampling method and select sample
Sampling is a huge issue which we do not address here. The aim is to get a sample which is as representative and unbiased as possible. This is by no means easy.
(g) Draft questions
There are many issues here. We'll talk more about how to design questions later.
(h) Pilot test the questionnaire
One important type of pilot testing is to run the draft survey with a small sample of the target population. This is useful for uncovering aspects of questions that will cause interviewers and respondents to have difficulty.
When doing this, consider the following questions.
1. Did any of the questions seem to make the respondent uncomfortable?
2. Did you have to repeat any of the questions?
3. Did the respondents misinterpret any of the questions?
Adapted from Foddy (1993).
Often this piloting is done with just one interviewer, who tries to record answers and their own impressions in addition to respondent comments. This overloads the interviewer. Two interviewers should conduct each pilot interview: one should conduct the interview, the other record impressions.
During pilot testing it is desirable to conduct some in-depth testing of some selected questions. There are several methods:
Another type of pilot testing which can be valuable is to attempt to analyse a set of fictional results to your survey. Often people don't sufficiently consider which statistical method or what type of summaries they are going to use until after the data has been collected, by which time it is too late to discover that you didn't ask for the right information to do the planned analysis. Attempting to actually do an analysis with artificial data (e.g. made up out of your head) will reduce this problem substantially. One way of doing this is sketched below.
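As a minimal sketch (in Python) of such a dry run - every variable name, category and value here is invented for illustration:

import random

random.seed(1)

# Invent plausible responses before any real data exists.
fake_data = [
    {
        "farm_size": random.choice(["small", "medium", "large"]),
        "rating": random.randint(1, 5),  # 1 = strongly disagree ... 5 = strongly agree
    }
    for _ in range(50)
]

# Attempt the planned analysis: mean rating by farm size.
for size in ("small", "medium", "large"):
    ratings = [r["rating"] for r in fake_data if r["farm_size"] == size]
    if ratings:
        print(size, round(sum(ratings) / len(ratings), 2))

# If this breakdown is what the report needs, the questionnaire must
# actually ask about farm size - a gap this exercise would expose early.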
(i) Redraft the questionnaire
Reconstruct the questions based on your experience with the pilot interviews and pilot analysis.
(j) Train interviewers
Emphasise the things they could do which would reduce the validity of results, such as inconsistent wording or providing guidance or reinforcement for particular types of responses. These things are to be avoided at all costs.
(k) Collect data
Finally we get to the bit which people think of as "doing a survey". In practice you can see that in a properly conducted survey this is a relatively minor component.
(l) Code the collected data and enter it into a computer
Usually responses are recorded in a way which is not suitable for direct entry into a computer. Someone has to go through and convert them into a useable form - e.g. assign a number to each response option. Also you must check for cases where a typist might have trouble interpreting or reading a response. Make the important judgements yourself rather than leaving them to a typist. A coding step of this kind is sketched below.
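As a minimal sketch (in Python) of such a coding step - the codebook and the missing-data code shown are arbitrary conventions, not a standard:

# A hypothetical codebook: one numeric code per response option.
CODEBOOK = {
    "strongly agree": 1,
    "agree": 2,
    "neutral": 3,
    "disagree": 4,
    "strongly disagree": 5,
}
MISSING = 9  # code for blank or unreadable answers

def code_response(raw):
    """Convert a written response to its numeric code; anything the
    codebook cannot resolve is flagged for the researcher to judge."""
    cleaned = raw.strip().lower()
    return CODEBOOK.get(cleaned, MISSING)

print(code_response("Agree"))          # 2
print(code_response("sort of ok?"))    # 9 - refer back to the researcher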
(m) Analyse the data
There are many possible useful approaches including:
(n) Write report
If it has been worth investing all the time, money and effort to get this far, it is worth writing a proper report of what you have done. Given that most surveys are done very badly, you should include details of your methodology so that people can judge the validity of your results. Include a copy of your survey form in an appendix.
4. Designing valid and reliable questions
(a) What are validity and reliability?
These are issues which are probably taken for granted in the kind of measurement you deal with in the biological and physical sciences, but they can be hard to achieve in the social sciences. When we are measuring attitudes and beliefs we are not dealing with tangible, observable entities like the number of plants germinated. There is nothing to calibrate our instruments against except our own theories and concepts. Despite the many, many problems with accurately and reliably measuring attitudes, the most common practice is for researchers to ignore the problems and treat the data as if it were implicitly accurate. This makes the whole exercise worthless.
Reliability: This is whether a person would give consistent answers to your survey at different times, in different places or in different contexts. What we are getting at here is whether the answers you are getting are affected by things like the respondent's mood on the day, their state of health or the weather.
Validity: This is whether the questionnaire actually measures what it sets out to measure.
Note that a reliable measure is not necessarily valid. A question might return the same answers consistently but not be measuring the attitude you wish to measure. For example:
Rate your level of agreement or disagreement with the following statement:
"Spray drift is minimised with the use of fine nozzles."
strongly agree -------------------------------------------- strongly disagree
If you are trying to measure whether farmers believe that they should be using fine nozzles, the question may be reliable but not valid. The farmer may consistently answer that they strongly agree that fine nozzles reduce spray drift, but there is no guarantee that they want to reduce spray drift.
We'll discuss some of the main factors affecting question validity later when we discuss question construction.
(b) Should you use closed or open ended questions?
Example of an open ended (qualitative) question:
How do you feel about empowering rangers to give spot fines to users of National Parks who are not abiding by park regulations?
A "closed question" might present the same issue as:
National Park Rangers should be empowered to give spot fines to people who are caught breaking park regulations. Do you:
strongly agree ___________ strongly disagree
1 2 3 4 5
Controversy has always raged over the collection of qualitative information from open ended questions. Researchers disagree over whether this kind of information is valuable. It has been argued that the open type of question fails to control what the respondent is supposed to be answering, that respondents wander from the topic, and that answers from different respondents cannot be meaningfully compared.
On the other hand, closed questions are said to impose a framework which may not be relevant to the respondent, with fixed response options forcing the respondent to adopt the researcher's frame of reference even when it is not meaningful to them.
In our view:
60% of surveyed farmers agreed to some extent with the notion that farmers ought to receive a government subsidy for fencing bush land. Typical of the comments made by farmers was the idea that: "The whole of society benefits by our Landcare efforts so we should receive financial assistance to carry it out."
For further discussion of the validity of group discussions in collecting qualitative information see Belson (1986) and Foddy (1993).
(c) If closed questions, which type of closed question format?
Our purpose here is firstly to alert you to the types of scales and formats used in closed questions and some advantages and disadvantages of each type.
The main types are:
The simplest kind of question is the agree/disagree question. This tends to be lower in validity than questions with a scale of responses, because it forces an extreme, cut-and-dried response when in fact most of us are not clearly polarised on many issues. You should normally avoid agree/disagree or yes/no questions.
In this approach, an example is given for each level in the scale. This can be useful for frequency of behaviours where each point is objectively numerically defined. For example:
Normally, I listen to the Country Hour on ABC radio
o every lunch time it is broadcast (5 days a week)
o about once a week
o about once a month
o a couple of times a year
However, for other behaviours it is a lengthy process to develop scales with behavioural or attitudinal pointers which accurately represent something.
This is quick and easy to administer and is understood by most of the population. A long list of items is gathered which supposedly distinguish between individuals on the behaviour in question. The respondent runs through and ticks those that apply to him/her. It is useful to have "applicable", "not applicable" and "don't know" boxes, so that the respondent is forced to consider every item. For example:
I subscribe to Agriculture WA Farm Notes
I usually read the Agriculture WA "Ag Memos"
I make a point of discussing new farming technology/ideas with neighbours.
I subscribe to at least one Farmers' journal
I usually buy the Countryman
I sometimes attend field days.
Ranking requires a less sophisticated type of judgement than the rating scale methods because it only asks the respondent to put items in order. It becomes too complex where the number of items is large.
"Rank the following in order of your preference as a source of technical information":
o My farm management consultant
o My neighbour
o Other family members
o The Department of Agriculture
o Radio information
o Written information from other sources.
Summated scales (or Likert scales)
Summated rating scales have been used much more than any other technique for measuring attitudes. This is probably because they seem easy to prepare and to use. With this method an attitude is presented as a continuum with instructions for the respondent to indicate their position on the continuum.
Level of generality/specificity
If our goal is to predict behaviour, then it is important that the question we ask is specific to the behaviour we are concerned with. If our goal in asking about attitudes towards the amount of land reserved as national parks is to find out how many people would vote against a government which undertook to increase the land in national parks, then this is what we should ask respondents to respond to, e.g. "I would vote for a government which pledged to increase the amount of land in national parks."
Choosing the Number of Categories
Studies carried out to detect how many categories we can reliably discriminate between and what would be the optimal number of categories in a rating scale suggest the following:
Overall, seven to nine categories produce the most reliable and valid ratings.
The Effects of Category Labels
You would hope that the labels used to indicate the categories or positions on the rating scale would not bias respondents, or would at least affect them in an even-handed way, but this is not the case. A number of studies indicate that responses to rating scales are usually biased towards the largest number or most positive verbal anchor. Also, respondents are more influenced by middle category labels than by end labels. One solution to this problem is to label only the end categories. It is suggested that if category labels create confusion, fewer labels result in less confusion.
Need to Clearly Define What is Being Measured
Take the following example:
Please indicate by circling a number how strongly you agree or disagree with the following statement:
Too much of our land in Australia is devoted to national parks:
1. Strongly agree
5. Strongly disagree
So what are we really measuring on this scale? If I tick "strongly disagree" what does that mean? Does it mean that:
(a) I feel very strongly about this statement (emotion/affect).
(b) I think that this is a very important issue? (intensity/cognitive)
(c) I am very sure that I disagree? (surety/cognitive)
(d) I would act to see that the amount of land in national parks is maintained or increased. (action/behaviour)
Research shows that these four dimensions are not necessarily in tune. For instance, I might respond with a lot of negative emotion to the statement, because I love bush picnics, but not see this matter as a very important one relative to, say, third world poverty or human rights abuses. Despite my negative emotional response, I might not be really sure whether I disagree with the statement, because I haven't spent much time thinking about it. And if someone asked me to write a letter or attend a rally, I might be too busy. This problem gets to the heart of the difficulty of defining attitudes.
Minimising the problem: Be clear in your own mind which dimension you are asking about and phrase the question to reflect this. You should emphasise feelings or action in the statement if these are the issues you are interested in. Also, you might do away with the middle neutral category and include a category to reflect the respondent's sureness and/or the importance to them.
Specifying the Standards of Comparison
It is possible for two different people to agree with a statement just as strongly as each other, but to give different ratings, such as one saying "agree" while the other says "strongly agree". There are no objective empirical values attached to these scales. Even within one person's ratings of different questions, "agree" may mean something different in terms of intensity from question to question.
Added to this is the problem that the perceived distance between categories probably differs between individuals, and within individuals across questions. We might assume that people answer as if the alternative responses were evenly spaced apart, when in truth an individual might perceive the gaps between adjacent categories as unequal.
One way to avoid this is to allow people to mark a point on a line, rather than select from a small number of response options. Another is to ask people about their level of uncertainty or conviction about an answer, for example:
How sure are you about your response to the above question?
very sure . . . . . . . . . . . . . . . very unsure
How important would you say this issue is to you relative to other public issues?
Include an additional category such as "don't know" or "have not had a chance to form an opinion about this."
Don't include the "don't know" response as the middle option on the Likert scale. "Don't know" is not the same as "neutral", but you risk it being misinterpreted as neutral by positioning it in the place where respondents will expect to find "neutral".
Batteries of Scales
It is usual for a social survey to include a number of rating scales which examine different aspects of the same question, rated on the same scale. Ratings on these various items might be summed to give an overall rating of how positive or negative the respondent feels. However, there are problems with this.
(a) Respondents do not know the range of issues to be included in this section before they answer. Typically this means that they will use an extreme category for an early question, and then later run into a question which they feel more strongly about than the earlier one. They then have nowhere to go on the scale to represent this stronger feeling.
Minimising this problem: Instruct the respondent to read through the whole range of items to be summed before you get them to answer, and try using seven to nine categories for these summed questions.
(b) Totalling these responses means that you are adding together things as though they were all part of one dimension, when in fact they may not be, so the total doesn't always make a lot of sense.
Minimising this problem: Don't sum across a great battery of scales which represent different dimensions. If you are interested in whether the farmer thinks roaded catchment dams are cheap, sum across several questions asking about cheapness only. If you are interested in whether the farmer feels the new style of catchment is reliable, ask several questions about this aspect and sum across these, as in the sketch below.
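As a minimal sketch (in Python) of dimension-wise summation - the item names and groupings are invented for illustration:

# One respondent's ratings on a battery of five-point items.
responses = {
    "cheap_to_build": 4,
    "cheap_to_maintain": 5,
    "fills_reliably": 2,
    "water_stays_clean": 3,
}

# Group items by the single dimension each one measures,
# and sum within a dimension only - never across the whole battery.
DIMENSIONS = {
    "cheapness": ["cheap_to_build", "cheap_to_maintain"],
    "reliability": ["fills_reliably", "water_stays_clean"],
}

scores = {dim: sum(responses[item] for item in items)
          for dim, items in DIMENSIONS.items()}
print(scores)   # {'cheapness': 9, 'reliability': 5}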
(d) Did you ask what you intended to ask?
When researchers have taken the time to go back and examine the validity of questions, the results have sometimes been surprising and dismaying. Belson (1981) looked at example problem questions derived from 24 different market researchers. When respondents were asked to restate the questions in their own words, on average only 30 percent of respondents had interpreted a question as intended. Further, Belson's analysis of key words such as "your", "usually", "weekday", "children", "generally", "regularly", etc. shows that many people interpret even these in an unintended way. What seems to happen is that respondents try to answer all questions, and if they are having trouble with a question they modify it so that it suits them, and then answer the modified question.
While piloting and then redrafting a questionnaire is essential, it is not sufficient. You must do more than this to ensure that questions are working as intended. Some methods are reviewed here. The list below shows a set of recommendations for good survey practice which have been derived, on a trial and error basis, by many social survey researchers.
Designing Good Survey Questions
1. Make sure that the topic has been clearly defined for yourself.
2. Be clear both about the information that is required about the topic and the reason for wanting this information (for yourself).
3. Make sure the topic has been properly defined for the respondent.
4. Make sure that the question is relevant to the respondents.
5. Make sure the question is not biased.
6. Eliminate complexities so that respondents can easily understand the question.
7. Avoid ambiguities in the question.
8. Ensure that the respondents understand what kind of answer is required.
Adapted from Foddy (1993).
Belson, W.A. (1986), Validity in Survey Research, Gower, Cambridge.
Foddy, W. (1993), Constructing Questions for Interviews and Questionnaires, Cambridge University Press.
Citation: Pannell, P.B. and Pannell, D.J. (1999). Introduction to Social Surveying: Pitfalls, Potential Problems and Preferred Practices. SEA Working Paper 99/04, http://www.general.uwa.edu.au/u/dpannell/seameth3.htm