# An Overview of Data Collection
In some ways, the term *data collection* gives the false impression that the data already exist in an accessible and usable form, and all the researcher needs to do is gather those data. In reality, survey data are usually produced or generated when a respondent completes a questionnaire or participates in an interview. So, in a way, data are a product of the data collection process.
Recall the following diagram showing the general process for conducting research using survey methods. As this diagram shows, all design aspects flow through the data collection process on the way to analysis and reporting results.
**Figure 1**
*General Process for Conducting a Research Study Using Survey Methods*
![[__/survey_process.svg]]
Data collection involves a number of key considerations:
- ***Mode of data collection***
- The *mode of data collection* is the medium through which responses are obtained from survey participants.
- The choice of mode also has implications for various practical issues, such as logistical feasibility and cost.
- Notice that the mode of data collection is relevant to all the other major considerations involved in data collection.
- ***Instrumentation***
- Broadly speaking, an *instrument* comprises the materials and procedures used to observe and record specific phenomena in a systematic and consistent way.
- In the social and behavioral sciences, this is typically either a questionnaire or an interview guide/script/protocol.
- For any empirical research study, there are important decisions to be made regarding instrumentation. This could involve one or more of the following:
- Selecting an instrument (if an appropriate instrument already exists)
- Creating and testing a new instrument (if no suitable instrument exists)
- If interviews are being used and data collection will involve interviewers other than the researchers, they must be trained and supervised (and compensated).
- This is of course closely tied to the mode of collection.
- ***Recruitment of participants***
- Recruitment is closely tied to the sampling frame and sampling methods.
- Recruitment procedures also have implications for unit nonresponse.
- Incentives or compensation can sometimes be used to motivate participation.
- The mode of data collection will also have implications here.
- ***Nonresponse***
- Anticipating likely types of nonresponse and planning mitigating features
- Mode has an impact on nonresponse as well.
# Modes of Data Collection
The *mode of data collection* refers to the specific method, medium, or channel through which data are gathered from respondents. This can include the technology used, the level of interviewer involvement, and how respondents interact with the data collection process. The choice of mode plays a crucial role in the planning and execution of survey methods, impacting aspects such as respondent behavior, response rates, data quality, cost of implementation, the feasibility of reaching specific populations, and various other logistical considerations. Hence, the choice of mode should align with the general aims of a study (its purpose and research questions), its target population, and its resource constraints.
The most commonly used modes of data collection in contemporary research include:
- Online (web) questionnaires
- Mobile questionnaires (smartphone apps)
- Face-to-face (in-person) interviews
- Telephone interviews
- Mixed-mode surveys
Issues that affect the choice of a data collection mode:
1. What is the most appropriate method to choose for a particular research question/design?
2. What is the impact of a particular method of data collection on survey errors?
3. Does the nature of the sampling frame limit the possible modes that could be used?
4. What are the costs involved with a particular mode?
## Types of Mode
There are a few different modes of data collection commonly used in social/behavioral research^[There are various specialized modes of data collection for other types of research (e.g., agricultural, geographic).]:
- *Paper questionnaires* (completed by respondents, an interviewer may or may not be present, postal or personal distribution)
- *Telephone interviews* (live or automated interviewer)
- *Face-to-face* (FTF) *interviews*
- *Computerized self-administered questionnaires* (CSAQ), better known as *online questionnaires* or *web questionnaires*
- This is a relatively new mode of data collection that has become increasingly popular due to the advent and proliferation of the internet and personal computers.
- The questionnaire can be accessed by respondents through a webpage; links are typically distributed via email or recruitment posts/ads.
- Ideally, these are accessible and easily usable on both conventional desktop/laptop computers and mobile devices.
Researchers may use a combination of any of these modes to minimize costs and/or errors.
Outdated or rarely used modes of data collection:
- *Disk by mail*: Respondents would receive a questionnaire on diskettes, complete them on a computer, and return the disk by mail. This mode became obsolete with the proliferation of the internet.
- *Email questionnaires*: The questionnaire could be sent by e-mail to the respondent, completed, then emailed back to the researcher.
- *Paper-and-pencil mail questionnaires*: Though still in limited use, these are rarely employed in contemporary research; their use is now largely restricted to populations with limited access to technology, such as certain demographic or rural studies.
- *Touch-tone data entry (TDE) questionnaires*: This is a form of telephone survey where respondents would use the phone’s touch-tone keys to answer questions. This method has largely fallen out of favor as more advanced technology (such as web surveys) has become available.
## Properties of Modes of Data Collection
Each of these modes can be characterized on a number of key features:
- Channels of communication and medium of interaction
- Degree of researcher interaction with respondents
- Degree of privacy
%% Complexity and nature of the questions/instrumentation: The ability of the mode to accommodate complex survey instruments, such as those involving visual aids or interactive components. %%
### Channels of communication
*Channels of communication* refers to the various sensory modalities that humans use to obtain information from the external world. All sensory modalities are valid channels, but the focus is primarily on the *visual* and *aural* modalities in survey research. There are various considerations involved with visual/aural differences, such as the primacy effect and recency effect.
- With a *primacy effect*, presenting an option first (or at least near the beginning of the list of possible response options) increases the chances that respondents will choose that option.
- With a *recency effect*, the opposite happens; putting an option at or near the end of the list increases its popularity.
- General rule: Primacy effects are more prevalent in visual modes of data collection, whereas recency effects are more common in auditory modes.
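In visual modes, a standard mitigation for these order effects is to randomize the order of nominal response options across respondents: this does not remove the primacy effect for any one respondent, but it spreads the effect evenly over the options. A minimal sketch in Python (the option labels are hypothetical; ordinal scales such as Likert items are typically reversed rather than shuffled):

```python
import random

def randomized_options(options, seed=None):
    """Return a per-respondent random ordering of nominal response options.

    Shuffling across respondents does not eliminate primacy/recency effects
    for any individual respondent, but it distributes them evenly so no
    single option is systematically advantaged.
    """
    rng = random.Random(seed)  # seed per respondent for reproducibility
    return rng.sample(options, k=len(options))

# Hypothetical nominal options (no inherent order, so shuffling is safe):
sources = ["Newspaper", "Television", "Radio", "Social media"]
for respondent_id in range(3):
    print(respondent_id, randomized_options(sources, seed=respondent_id))
```

In aural modes, the analogous tactic is to rotate the order in which the interviewer reads the options across respondents.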
This is closely tied to the medium of interaction—i.e., how researchers communicate the prompts and questions to the respondents, and how the respondents communicate their answers back to the researchers.
- Web and mail surveys are typically visual: the respondent reads the questions and writes or marks the answers on a form.
- Researcher-administered surveys and telephone self-administered surveys are principally aural (the interviewer reads the questions out loud and the respondent answers in like fashion).
- Any particular medium of interaction will also have specific technological requirements.
- Further, methodological research on both mail and web surveys has also shown that the visual layout of the question and answer elements on the page or screen affects the answers provided. Thus, principles of visual design and visual communication are germane to understanding how measurement error can be reduced in self-administered instruments.
- Of course, it is also possible for a study to use a mix of different media of interaction.
### Degree of researcher interaction with respondents
The degree of researcher interaction with respondents centers on whether the data-collection procedures involve *self-administration* (e.g., web questionnaires) or *researcher administration* (e.g., face-to-face interviews).
- Some survey designs involve direct face-to-face interactions between an interviewer, on the one hand, and the respondent or informant, on the other. For example, in a face-to-face interview, the interviewer reads the survey questions to the respondent.
- Other survey designs (e.g., a web survey) involve no interviewers at all.
- There are modes that require an intermediate level of interviewer involvement such as self-administered questionnaires (SAQs) completed as part of an interviewer-administered survey, with the interviewer present during completion.
- Interviewer presence does not necessarily imply physical presence—e.g., telephone and video conferencing (e.g., Zoom).
- There is also the issue of who will actually record participant responses in the data collection form:
- Form completed by the researcher (interviewer/observer)
- Form completed by the participants (self-administered)
### Degree of privacy
Naturally, the degree of researcher interaction leads to issues regarding privacy. There are also important issues involving the anonymity/confidentiality of participants (or a lack thereof). Each mode has implications for the extent to which the identities of respondents are concealed or revealed (e.g., anonymous online surveys vs. in-person interviews).
The particular mode of data collection dictates in part the possible settings in which data collection can be conducted, and these settings can differ in how much privacy they offer the respondent while completing the survey.
- Public vs. private settings
- Anonymity vs. confidentiality
The presence of an interviewer implies that at least that person will be aware that the respondent was asked a question. With oral response by the respondent, the interviewer knows the answer.
- In a public setting, it is always a possibility that others may overhear the questions or see the answers provided by the respondents.
- Both the presence of the interviewer and the presence of other people may affect respondents' behaviors, such as:
- Willingness to participate
- Willingness to answer certain types of questions
- Veracity of their responses
### Other aspects
There are also other specific aspects of and considerations for the modes of data collection^[Adapted from Cohen et al., 2018, Ch. 17]:
- *Time and logistical factors*: Different modes will naturally have differing time requirements for properly collecting responses.
- *Cost of implementation*: Each type of mode also has various implications related to financial and material resources required (e.g., face-to-face interviews tend to be more expensive).
- The *mode of distribution* refers to the method of delivery:
- Online
- Email
- Postal
- Telephone
- Face-to-face (personal-visit) interview
## Differences in Mode Use Between Quantitative and Qualitative Research
Quantitative research tends to employ modes of data collection that are better equipped to deal with issues related to response rates and nonresponse bias. In quantitative studies, modes like web surveys, telephone questionnaires, and self-administered paper surveys are common. These modes emphasize structured, standardized data collection to ensure uniform responses that can be statistically analyzed.
The mode in qualitative research is typically selected based on the need for deep engagement with the respondent. Qualitative research often requires more interactive modes, such as face-to-face or telephone interviews, to allow for in-depth probing, clarification, and open-ended responses. Moreover, modes like focus groups (which require in-person or video interaction) are widely used, facilitating dynamic discussion and rich qualitative data.
# Cognitive Aspects of Survey Methodology (CASM) Model
The *cognitive aspects of survey methodology* (otherwise known as the *CASM model*) refers to a framework for understanding how respondents process and answer survey questions. It was originally developed in the 1980s as part of efforts to improve the quality and reliability of survey data by applying principles from cognitive psychology to survey methodology. The model breaks down the response process into distinct stages that a respondent goes through when answering a question, emphasizing the cognitive processes involved.
The CASM model originated from interdisciplinary efforts that combined insights from cognitive psychology and survey research to better understand how respondents process survey questions. The following references represent the seminal contributions to the CASM model, reflecting the intersection of cognitive psychology and survey methodology research.
> [!note] Original sources for CASM
> **Jabine, T. B., Straf, M. L., Tanur, J. M., & Tourangeau, R. (Eds.) (1984). *Cognitive aspects of survey methodology: Building a bridge between disciplines.* National Academy Press.**
> - This edited volume presents the results of a landmark workshop on the cognitive aspects of survey methodology, bringing together survey methodologists and cognitive psychologists to explore how cognitive processes affect survey responses.
> - Of particular interest is the chapter, "Cognitive Sciences and Survey Methods" (Tourangeau, 1984, pp. 73-100). In this chapter, Tourangeau outlines the theoretical underpinnings of the CASM model, discussing how cognitive psychology can be applied to understand the survey response process.
>
> **Tourangeau, R. (1987). Attitude measurement: A cognitive perspective. In H. Hippler, N. Schwarz, & S. Sudman (Eds.), Social Information Processing and Survey Methodology (pp. 141–164). Springer-Verlag.**
> - This work further develops the CASM framework, applying cognitive principles to the specific domain of attitude measurement in surveys.
The CASM model describes the mental processes respondents engage in while answering survey questions (both open-ended and closed-ended). It emphasizes that responses are shaped by complex cognitive and social processes, which can affect both the accuracy and the validity of survey data. The model breaks down the response process into four major cognitive components: comprehension, retrieval, judgment, and response. Each stage involves unique cognitive demands, and together they provide a framework for understanding how respondents interpret, recall, and report information.
**Figure 2**
*The Cognitive Aspects of Survey Methodology (CASM) Model*
![[__/casm_model.svg]]
## Comprehension
In the comprehension stage, respondents interpret the question, understanding its wording, structure, and intent. Comprehension requires respondents to decode vocabulary, parse sentence structure, and apply context to interpret meaning. This stage is crucial because misunderstanding a question’s intent can lead to inaccurate answers.
Key Factors in Comprehension:
- Vocabulary and Language: Respondents interpret words based on their familiarity with the language used, which can vary by demographic factors such as age, education, and cultural background.
- Question Clarity and Structure: Ambiguities or complex sentence structures can confuse respondents, leading to misinterpretation.
- Contextual Inferences: Respondents might interpret questions based on what they perceive the survey’s purpose to be, which can influence how they understand and answer.
## Retrieval
Once the question is understood, respondents move to the retrieval stage, where they attempt to recall relevant information from memory. This process depends on the respondent’s memory accuracy, recency of experiences, and the cognitive effort required to access relevant details.
Key Factors in Retrieval:
- Memory Accessibility: Recent or salient events are more easily recalled, while less memorable information may be overlooked or misremembered.
- Search Strategy: Respondents employ different cognitive strategies to retrieve information, which may vary depending on the question type (e.g., episodic memories for specific events or semantic memories for general knowledge).
- Effort and Motivation: Some respondents may be motivated to exert cognitive effort to retrieve detailed information, while others may provide an answer based on partial recall or guesses.
## Judgment
In the judgment stage, respondents evaluate the information they have retrieved, processing it to form a judgment or opinion that will answer the question. This stage often involves weighing options, considering accuracy, and possibly adjusting the response based on social desirability or perceived survey expectations.
Key Factors in Judgment:
- Information Evaluation: Respondents judge the accuracy and relevance of retrieved information to fit the question context.
- Heuristics and Biases: Respondents may rely on mental shortcuts or heuristics (such as rounding estimates) to answer questions more efficiently.
- Social Desirability and Sensitivity: For questions on sensitive topics, respondents may adjust their answers to align with socially acceptable norms, which can introduce bias.
## Response
In the response stage, respondents select or formulate their final answer based on their judgment. This includes deciding how to translate their judgment into the response format provided, such as a scale, category, or open-ended response. Errors at this stage can occur if the response format is ambiguous or if respondents feel uncertain about their answer’s accuracy.
Key Factors in Response:
- Response Mapping: Respondents must map their judgment onto the response options available, which can be challenging if the options do not adequately reflect their thoughts or experiences.
- Editing and Final Adjustments: Some respondents may edit or adjust their answers based on additional self-evaluation, either to fit perceived survey expectations or to align with a socially desirable response.
- Consistency Across Responses: Respondents may try to ensure consistency in their answers throughout the survey, which can lead to over- or underreporting of certain behaviors or attitudes.
## Interactions Among the Components
These four stages are interconnected, and challenges in one stage can influence the others. For example:
- Miscomprehension in the first stage can affect retrieval, as respondents might recall information that is irrelevant to the intended question. This can lead to inaccuracies in judgment and, ultimately, a distorted response.
- Memory limitations in the retrieval stage may lead respondents to rely on heuristics during judgment, simplifying their response to compensate for incomplete recall.
- Social desirability in the judgment stage may prompt respondents to edit or alter their final response to fit perceived social norms, impacting data quality.
By examining these interactions, the CASM model helps researchers identify sources of error and improve survey design, such as by clarifying wording, providing clear response options, or developing questions that facilitate memory recall.
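The four stages and their interactions can be laid out as a simple pipeline. The following is a purely illustrative sketch (the keyword-matching "respondent" below is a hypothetical toy, not part of the CASM literature); it also shows how an error early in the chain propagates downstream:

```python
def answer_question(question, memory, options):
    """Toy walk-through of the four CASM stages for one closed-ended question."""
    # 1. Comprehension: decode the question's wording into (here) bare keywords.
    #    A misread keyword at this stage distorts everything that follows.
    keywords = set(question.lower().strip("?").split())

    # 2. Retrieval: pull candidate information from memory.
    retrieved = [m for m in memory if keywords & set(m.lower().split())]

    # 3. Judgment: integrate what was retrieved; with partial recall,
    #    respondents fall back on heuristics (here: "any match means yes").
    judgment = "yes" if retrieved else "no"

    # 4. Response: map the judgment onto the offered response options;
    #    if none fits, the respondent settles for the last option listed.
    return judgment if judgment in options else options[-1]

print(answer_question("Did you exercise last week?",
                      ["i exercise daily", "watched tv"], ["yes", "no"]))  # → yes
```

Real response behavior is, of course, far richer than this, but the sketch makes the dependency structure concrete: the output of each stage is the sole input to the next, so a failure at comprehension or retrieval cannot be repaired later in the chain.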
%% # Other Models of the Response Process %%
%% See Groves at al. (2009), pp. 223-224. %%