Data Preparation and Analysis

Preparing Data

Once all of the participants have completed the study measures and all of the data has been collected, the researcher must prepare the data for analysis.  Organizing the data correctly can save a great deal of time and prevent mistakes.  Most researchers use a database or statistical analysis program (e.g., Microsoft Excel, SPSS) that they can format to fit their needs in order to organize their data effectively.  A good researcher enters all of the data in the same format and in the same database, as doing otherwise can lead to confusion and difficulty during the statistical analysis later on.  Once the data has been entered, it is crucial that the researcher check it for accuracy.  This can be accomplished by spot-checking a random assortment of participant records, but that approach is not as effective as entering the data a second time and searching for discrepancies.  Double entry is particularly easy with numerical data because the researcher can simply use the database program to sum the columns of the spreadsheet and then look for differences in the totals.  Perhaps the best method of accuracy checking is to use a specialized computer program that cross-checks double-entered data for discrepancies, since this largely removes human error from the comparison itself, though such programs can be hard to come by and may require extra training to use correctly.
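As a rough sketch, the cross-check of double-entered data can be as simple as comparing the two entry passes record by record. The participant IDs, items, and values below are all invented for illustration:

```python
# Hypothetical illustration: two independent entries of the same raw forms,
# cross-checked for discrepancies. All IDs and values are made up.

entry_one = {  # first data-entry pass: participant_id -> item responses
    "P01": [4, 5, 3],
    "P02": [2, 2, 4],
    "P03": [5, 1, 3],
}
entry_two = {  # second, independent entry of the same forms
    "P01": [4, 5, 3],
    "P02": [2, 3, 4],  # a typo slipped into one of the passes
    "P03": [5, 1, 3],
}

def find_discrepancies(first, second):
    """Return (participant_id, item_index) pairs where the entries differ."""
    mismatches = []
    for pid in sorted(set(first) | set(second)):
        for i, (x, y) in enumerate(zip(first.get(pid, []), second.get(pid, []))):
            if x != y:
                mismatches.append((pid, i))
    return mismatches

print(find_discrepancies(entry_one, entry_two))  # → [('P02', 1)]
```

Every flagged pair points back to one raw form, so only the mismatched records need to be re-checked by hand rather than the entire dataset.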



Self-report measures, whether in survey form or in interview form, are easily affected by several biases that the participant may exhibit, and for this reason must be designed carefully by the researcher.  The researcher also must be cautious about what kinds of conclusions are drawn from self-report measures.  Some potential issues with self-report are:

  • Social desirability bias: Participants are usually uncomfortable or unwilling to share information that does not reflect well on them in their social environment, even if they know their responses are entirely anonymous.  For example, participants may understate or overstate the extent to which they experience a certain feeling, depending on how socially appropriate or desirable that feeling is.  Researchers must do their best to make it abundantly clear that the participant’s anonymity will be preserved, and honesty must be encouraged. Researchers should also lead with less-intimidating questions to make the participant feel more comfortable before asking anything that might be more difficult to answer honestly.  Another option is to structure the question in such a way as to normalize the behavior: “As you know, many people do X… To what extent do you do X?”
  • Self-evaluation biases: Participants will sometimes bend their answers on self-report measures to better reflect how they “think they should be” rather than how they actually are.  This is similar to the social desirability bias, but is more difficult to overcome because anonymity is not the issue here. Instead, bias results from the participant’s evaluation of him or herself.  The researcher’s best course of action is to encourage honesty and normalize the behavior or feeling as reviewed above.

  • Forgetfulness: Sometimes researchers ask participants about their past experiences or feelings without considering the fact that human memory is very plastic.  People’s recollections may be inaccurate, and it is important for a researcher to consider this when designing study measures.

Self-report measures do play an important role in research, although they should be used with caution.  They are essential in situations where the researcher is asking about a participant’s self-concept or seeking to study the specifics of a participant’s experience.  Self-report is also very useful for logistical reasons, as it is often the simplest method to implement and requires the fewest resources.


Study Design Measures

Study measures should:

  • Take into account the characteristics of the participant.
  • Use plain, non-technical language that someone with no experience in the field can understand.
  • Be respectful of the cultural context in which the participant has shaped his or her worldview.
  • Provide neither too much nor too little information.  Too much information can be an unnecessary distraction, while too little information leads to ambiguity and potential misinterpretation of the study measure.
  • Be brief and specific.
  • Avoid negations, as they can lead to mistakes and can be difficult to understand.
  • Avoid double-barrelled questions (questions that ask two questions in one, such as “Do you support the government’s decision to cut spending to police training and after school programs?”).  If a participant would answer “yes” to one part but “no” to another, requesting an answer of “yes or no” to the pair as a whole invalidates the measure.

  • Use multiple questions to assess the same construct.  For example, simply asking people whether they feel “good” about themselves is not an adequate measure of self-esteem.  Instead, asking several questions about body image, self-worth, and self-evaluation can paint a better picture of how they are really feeling.
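Combining several items into one construct score is often just an average, with negatively worded items reverse-scored first. The item names and the 1–5 response scale below are assumptions made up for this sketch:

```python
# Sketch of a multi-item construct score (hypothetical self-esteem items;
# the item names and the 1-5 response scale are invented for illustration).

responses = {
    "body_image": 4,
    "self_worth": 5,
    "self_evaluation": 3,
    "feels_inadequate": 2,  # negatively worded item, reverse-scored below
}

REVERSED = {"feels_inadequate"}
SCALE_MAX = 5  # on a 1-5 scale, reverse-scoring maps 5 -> 1, 4 -> 2, ...

def scale_score(items):
    """Average the items after reverse-scoring the negatively worded ones."""
    adjusted = [
        (SCALE_MAX + 1 - value) if name in REVERSED else value
        for name, value in items.items()
    ]
    return sum(adjusted) / len(adjusted)

print(scale_score(responses))  # → 4.0
```

Averaging several items this way dampens the noise in any single question, which is exactly why one “do you feel good about yourself?” item is not enough.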


Research Papers: Abstract

The abstract is the first section of any research paper. There is a trick, however: it is better to write the abstract last, after you have completed the entire work. The reason is simple. The abstract is a brief summary, usually around 200 words, of the research paper. It states the purpose of your research, the experiment, and the results, and therefore summarizes the work done. As with any summary, it is easier to write after the paper is finished. Your abstract should be a single paragraph, written concisely and set apart from the other parts of the research paper.


Study Measures

Measures are the items in a research study to which the participant responds. They can be survey questions, interview questions, or constructed situations, to name a few. When constructing interviews and surveys, it is important that the questions directly relate to the research questions. It is also important that the surveys and interviews are not overly time-consuming (ideally within a 20-30 minute limit). If an interpreter will be used, simple, easily interpreted questions that avoid ambiguity will lead to more accurate results. Lastly, before creating a new survey it is better to find out whether a similar study has already been conducted; if so, the previous surveys should be used so that the measures are standardized and the results comparable. Irrespective of the form that these measures take, there are several important design elements that are required to make the study effective.



Before we begin discussing the specifics of data analysis, let’s pause for a moment to discuss the validity of a study and what it means.  For the rest of this short course in research methods, we will stop to enumerate the various threats to validity that exist at each stage of the research process.  We have already seen one: failure to properly operationalize the variables of interest can lead the researcher to draw inappropriate conclusions about the research question.  If, for example, the researcher had chosen to operationalize “economically productive” as “the amount of money a person has in his or her savings,” the researcher would have observed an entirely different result.  People may have other sources of income (gifts, spouses, inheritances, etc.) that affect this variable, meaning that it is not a good measure of what is intended to be measured and therefore is not a good operationalization.  But what exactly is validity?

Generally speaking, validity refers to whether a study is well designed and provides results that can appropriately be generalized to the population of interest.   There is much more to validity, which we will discuss further in this course; Trochim’s “Research Methods Knowledge Base” provides a succinct and useful summary of each kind of validity. There are three types of validity with which a researcher should be concerned.


Ensuring Validity

Confounding Variables

A confounding variable is an extraneous variable that is statistically related to (or correlated with) the independent variable.  This means that as the independent variable changes, the confounding variable changes along with it.  Failing to take a confounding variable into account can lead to a false conclusion that the dependent variable is in a causal relationship with the independent variable.  Take, for example, a study that seeks to investigate the relationship between income levels and test scores.  Without controlling for other variables, the study finds that higher income correlates with better test scores and concludes that the two must be directly related. This is a flawed conclusion because there are many confounding variables lurking behind this supposedly clear-cut relationship. For example, perhaps individuals at one school received a better education than those at another school.  Without controlling for the confounding variables of education level and quality of education, the relationship between income level and test scores cannot be assumed.
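A toy simulation makes the trap concrete. In the made-up model below, a “school quality” confounder raises both income and test scores, while income has no direct effect on scores at all; the two still end up correlated (the Pearson correlation is computed by hand to keep the sketch self-contained):

```python
import random

# Toy simulation (all numbers invented): "quality" is a confounder that
# drives both income and test scores; income never feeds into scores.

random.seed(0)
incomes, scores = [], []
for _ in range(1000):
    quality = random.gauss(0, 1)                     # confounding variable
    incomes.append(50 + 10 * quality + random.gauss(0, 5))
    scores.append(70 + 5 * quality + random.gauss(0, 5))  # income unused

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(incomes, scores)
print(round(r, 2))  # clearly positive despite no direct causal link
```

Holding `quality` constant (for instance, comparing only participants from the same school) would make the income–score correlation evaporate, which is what “controlling for the confounder” means in practice.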


Sampling Challenges

Because researchers can seldom study the entire population, they must choose a subset of the population, which can result in several types of error.  Sometimes, there are discrepancies between the sample and the population on a certain parameter that are due to random differences.  This is known as sampling error and can occur through no fault of the researcher.
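Sampling error is easy to see in a toy simulation: repeated random samples from the very same population yield slightly different means through chance alone. All population values below are invented:

```python
import random
import statistics

# Illustration of sampling error: five random samples from one population
# give five slightly different means, with no fault on anyone's part.

random.seed(1)
population = [random.gauss(100, 15) for _ in range(10_000)]
true_mean = statistics.mean(population)

sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]
print(round(true_mean, 1), [round(m, 1) for m in sample_means])
```

Each sample mean wobbles around the population mean; larger samples shrink that wobble but never remove it, which is why sampling error is treated as unavoidable random noise rather than a design flaw.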

Far more problematic is systematic error, which refers to a difference between the sample and the population that is due to a systematic difference between the two rather than random chance alone.  The response rate problem refers to the fact that the sample can become self-selecting, and that there may be something about people who choose to participate in the study that affects one of the variables of interest.  It is very possible in this situation that the people who actively seek help happen to be more proactive than those who do not.  Because these two groups vary systematically on an attribute that is not the dependent variable (economic productivity), it is very possible that it is this difference in personality trait and not the independent variable (if they received corrective lenses or not) that produces any effects that the researcher observes on the dependent variable.  This would be considered a failure in internal validity.

Another type of systematic sampling error is coverage error, which refers to the fact that sometimes researchers mistakenly restrict their sampling frame to a subset of the population of interest.

First and foremost, a researcher must think very carefully about the population that will be included in the study and how to sample that population. Errors in sampling can often be avoided by good planning and careful consideration. However, in order to improve a sampling frame, a researcher can always seek more participants. In the case of the response rate problem, the researcher can actively work on increasing the response rate, or can try to determine whether there is in fact a difference between those who partake in the study and those who do not. The most important thing for a researcher to remember is to eliminate any and all variables that the researcher cannot control. While this is nearly impossible in field research, the closer a researcher comes to isolating the variable of interest, the better the results.


Sampling Methods

Probability Sampling refers to sampling in which the chance of any given individual being selected is known and individuals are sampled independently of each other.  This is also known as random sampling.  A researcher can simply use a random number generator to choose participants (known as simple random sampling), or include every nth individual (known as systematic sampling).  Researchers also may break their target population into strata and then apply these techniques within each stratum to draw conclusions.
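The three probability-sampling schemes just described can be sketched in a few lines. The population of 100 IDs and the urban/rural strata below are made-up stand-ins for a real sampling frame:

```python
import random

# Sketch of simple random, systematic, and stratified sampling
# (the population, strata, and sample sizes are all invented).

random.seed(42)
population = list(range(1, 101))  # participant IDs 1..100

# Simple random sampling: every individual equally likely to be chosen.
simple = random.sample(population, 10)

# Systematic sampling: every nth individual after a random start.
n = 10
start = random.randrange(n)
systematic = population[start::n]

# Stratified sampling: split into strata, then sample within each stratum.
strata = {"urban": population[:60], "rural": population[60:]}
stratified = {name: random.sample(group, 5) for name, group in strata.items()}

print(len(simple), len(systematic), sum(len(v) for v in stratified.values()))
```

Stratifying guarantees that each subgroup is represented in the sample, whereas a purely random draw can miss a small stratum entirely by chance.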

Non-Probability Sampling, or convenience sampling, refers to when researchers take whatever individuals happen to be easiest to access as participants in a study.  This is only done when the processes the researchers are testing are assumed to be so basic and universal that they can be generalized beyond such a narrow sample.



Once the researcher has chosen a hypothesis to test in a study, the next step is to select a pool of participants for that study.  However, any research project must be able to extend the implications of the findings beyond the individuals who actually participated in the study.  For obvious reasons, it is nearly impossible for a researcher to study every person in the population of interest. In the example that we have been using thus far, the population of interest is “the developing world.” The researcher must therefore make a decision to limit the research to a subset of that population, and this has important implications for the applicability of the study results. The researcher must put some careful forethought into exactly how and why a certain group of individuals will be studied.