Whether spanking is helpful or harmful to children continues to be the source of considerable debate among both researchers and the public. This article addresses 2 persistent issues, namely whether effect sizes for spanking are distinct from those for physical abuse, and whether effect sizes for spanking are robust to study design differences. Meta-analyses focused specifically on spanking were conducted on a total of unique effect sizes representing children.

Thirteen of 17 mean effect sizes were significantly different from zero and all indicated a link between spanking and increased risk for detrimental child outcomes. Effect sizes did not substantially differ between spanking and physical abuse or by study design characteristics. The question of whether parents should spank their children to correct misbehaviors sits at a nexus of arguments from ethical, religious, and human rights perspectives both in the U.

What has been learned from these hundreds of studies? These competing conclusions have left both social science researchers and the public at large confused about what outcomes can and cannot be attributed to spanking. As this body of work on spanking and physical punishment has accumulated, several nagging questions about the quality, consistency, and generalizability of the research have persisted. The goal of the current article is to address these two concerns with a new set of meta-analyses using the most recent research studies to date.

The majority of the studies discussed in our literature review use the term physical punishment, which we define as noninjurious, open-handed hitting with the intention of modifying child behavior. In our meta-analyses, however, we focused on the most common form of physical punishment, which is known in the U. The first and most widely cited of the meta-analyses was by Gershoff. All 11 meta-analyses were significant and all but one indicated an undesirable association.

Higher scores on any of these outcome measures indicated negative outcomes. The conclusion afforded by these meta-analyses is that physical punishment was associated significantly, albeit modestly, with more affective, cognitive, and behavioral problems in children, broadly defined. Using 26 studies, separate meta-analyses were conducted by comparison group rather than by outcome type.

When these forms of physical punishment were compared with other forms of discipline, conditional spanking was found to be associated with lower levels of noncompliance and antisocial behavior than disciplinary alternatives.

The authors concluded that, in general, physical punishment was no worse than other disciplinary techniques. This is of course also to say that physical punishment was no better than other disciplinary techniques in promoting beneficial outcomes for children. The fourth meta-analysis article by Ferguson focused solely on longitudinal studies and on the outcomes of externalizing behavior problems, internalizing behavior problems, and cognitive performance.

The main criticism of the Gershoff meta-analysis has been that it included harsh and potentially injurious behaviors, such as hitting children with objects, in its definition of physical punishment (Baumrind et al.). Baumrind, Larzelere, and Cowan reanalyzed the data from Gershoff, separating out what they deemed harsh or potentially abusive forms of physical punishment.

They concluded that only severe methods of physical punishment are harmful. However, both effect sizes are significant and positive, indicating that both are associated with more undesirable child outcomes. To help resolve this debate, our first research question was thus: Are past findings that physical punishment is associated with detrimental child outcomes driven by the inclusion of harsh or abusive methods, or is spanking on its own associated with these detrimental outcomes?

We addressed this question using two strategies.

This definition therefore excluded the use of objects, the use of methods that have a reasonable expectation of causing harm or injury e. By restricting our operationalization of physical punishment in this way, we were able to determine the extent to which ordinary spanking is linked with child outcomes. Our second strategy was to examine the ways in which the strength and direction of the associations between spanking and child outcomes compare with the strength and direction of the associations between clearly abusive methods and child outcomes.

We identified studies that assessed the same individuals for exposure to both ordinary spanking and to harsher methods in order to isolate the associations of one from the other.

A comparison of studies of spanking to studies of abuse would not be helpful in this regard, because there could be many selection factors that distinguish the individuals reporting spanking from those reporting harsher methods. Some have argued that parents who use harsh or abusive methods are fundamentally different from parents who use only spanking (Baumrind et al.).

By focusing on studies that assessed the extent to which individuals experience both spanking and abuse, we compared the unique association of spanking with child outcomes to the unique association of abusive behaviors with child outcomes for the same samples of children. The primary standard for determining causal relations among variables has been the randomized controlled experiment because potentially confounding selection factors that might distinguish naturally occurring groups e.

There also have been a few efforts to evaluate the effects of interventions designed to reduce spanking e. The circumstances of experimentally manipulated spanking thus are likely to be unusual, leading to concern that experiments with parental spanking may suffer from a lack of external validity. The next strongest approach to studying spanking is the use of studies that examine whether it predicts changes in child outcomes over time. Such prospective longitudinal designs meet one of the key criteria for establishing causality, namely temporal precedence of the spanking independent variable (Shadish et al.).

Longitudinal effect sizes of the bivariate links between spanking and later child outcomes do not rule out the potential for an elicitation effect, in which the child's initial behavior elicits the spanking. However, so few studies report a coefficient that controls only for initial child behavior, and not for a range of other covariates, that we are unable to meta-analyze them. Thus, while not a perfect solution, longitudinal bivariate coefficients are decidedly stronger methodologically than within-time coefficients. Our second research question was thus: Are associations between spanking and child outcomes only found in methodologically weak studies?

In order to address this question, we conducted moderator analyses that examined whether the direction and significance of the mean weighted effect sizes were similar across longitudinal, experimental, and cross-sectional studies. We also examined whether effect sizes varied according to five other dimensions of study design: measure of spanking; time period in which spanking was administered; index of spanking; whether the study assessed the associations of spanking with outcomes within a single group or employed comparisons between two or more groups; and independence of raters of spanking and outcome.

Using these dimensions of study quality as moderators allowed us to examine whether spanking is associated with child outcomes in some types of studies and not others, a finding which would undermine the generalizability of spanking research. Given the pervasive use of spanking around the world, and in light of concerns raised about spanking by professional organizations (American Academy of Pediatrics) and intergovernmental and human rights organizations (Committee on the Rights of the Child), there is a need for definitive conclusions about the potential consequences of spanking for children.

The purpose of the current study was to conduct a new set of meta-analyses to address the two unresolved debates described above and to do so while incorporating an additional 13 years of literature since the first meta-analysis was published (Gershoff). The studies for the present meta-analyses were identified from two main sources.

The primary source for studies was a comprehensive literature review of articles listed in four academic abstracting databases (ERIC, Medline, PsycInfo, and Sociological Abstracts) that had been published before June 1, These two methods yielded a total of 1, unique articles to be considered for inclusion in the current meta-analyses. Coding of studies involved a two-step process. In the initial step, the titles, abstracts, or full text of the 1, studies identified through the sources above were subjected to an initial screening.

Studies were excluded at this stage if they were not relevant to or usable in the meta-analyses; examples of studies excluded at this stage were literature reviews, studies of beliefs about rather than use of spanking, and studies that were not available in English.

This initial screening process eliminated 1, studies and retained potential studies. In the second step of coding, each of these potential studies was coded independently by each of the authors. Any disagreements in coding were resolved through follow-up discussion.

The reasons for exclusion of all 1, studies are listed in the Appendix. Only 75 studies met all four criteria and were retained for the meta-analyses. All of the unique studies used in the four previously published meta-analyses were considered for inclusion, but only 36 met all of our criteria.

Of the 88 studies in Gershoff, 23 were included in the present study. Paolucci and Violato analyzed 70 studies; 16 were included here. Of the 26 studies in Larzelere and Kuhn, 11 were included. Ferguson analyzed 45 studies; of these, 11 were included in the current meta-analyses. Reasons for study exclusion are available from the first author. All study-level effect sizes were calculated independently by each of the authors; for all effect sizes, agreement was achieved to at least the third decimal place.

When discrepancies occurred in effect size calculations, the discrepancy was discussed, and then each author independently recalculated the effect size. This process was repeated, if necessary, until consensus was achieved. Because meta-analyses are focused on simple effects, only bivariate comparisons or correlations can be used (Borenstein et al.). When both longitudinal and cross-sectional effect sizes were available, the appropriate longitudinal effect sizes were used in the meta-analyses in order to obtain the most methodologically robust effect size.

If a study reported multiple effect sizes for the same outcome, such as when bivariate associations were reported for subgroups but not the whole sample, the weighted average of these subgroup effect sizes was used as the effect size for that study for that outcome. We allowed studies that reported effect sizes for more than one of our target outcomes to contribute to each appropriate meta-analysis; however, each study (or dataset, in the case of multiple articles from one dataset) was permitted to contribute only one effect size to each analysis for a specific outcome, so that a single individual was only counted once in any given meta-analysis for a specific outcome.
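The subgroup-averaging step can be illustrated with a minimal sketch. The text does not state which weights were used, so weighting by subgroup sample size is an assumption here, and the function name and the example values are hypothetical.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Collapse subgroup effect sizes into one study-level effect size.

    Assumption: each subgroup is weighted by its sample size. The article
    says only that a "weighted average" was used.
    """
    if not effects or len(effects) != len(sample_sizes):
        raise ValueError("effects and sample_sizes must be non-empty and equal in length")
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(effects, sample_sizes)) / total_n


# Hypothetical study reporting effect sizes separately for two subgroups.
study_effect = weighted_mean_effect([0.30, 0.10], [100, 300])
print(round(study_effect, 3))
```

Collapsing subgroups this way keeps each study at one effect size per outcome, which is what prevents an individual from being counted twice in a given meta-analysis.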

Seven study characteristics were coded for each study to be used in moderator analyses: (a) study design (experimental, longitudinal, cross-sectional, or retrospective); (b) measure of spanking (observation, parent report, child report, child retrospective, or both parent and child reports); (c) index of spanking (when used [either observed or in an experiment], frequency, frequency and severity, ever in time period, or ever in life); (d) independence of the raters of spanking and the child outcome (same rater or different raters); (e) time period in which spanking was administered (observed, last week, last month, last year, ever, hypothetical, specific time period, or not specified); (f) the country in which the study was conducted U.

The authors independently coded these characteristics for each study. Any discrepancies were resolved through discussion.

The meta-analyses reported in this paper utilized the random effects model (Borenstein et al.). The random effects model for meta-analysis does not assume that there is a single underlying effect size of the studies being analyzed and rather allows effect sizes to differ across studies to account for the fact that study samples differ by characteristics such as age, gender, race, ethnicity, and nationality.

The random effects model calculates the mean effect size, an estimate of statistical significance, and a measure of the heterogeneity of effect sizes in terms of their variation around the estimated mean effect size. We conducted a separate meta-analysis for each child outcome as well as an overall meta-analysis for all of the studies together. A total of unique effect sizes were derived from data representing child measurement occasions; these studies included data from a total of unique children.
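A random effects mean can be sketched as follows. The article does not say which between-study variance estimator was used; the DerSimonian-Laird estimator below is an assumption chosen because it is a common default, and the function name and input values are hypothetical.

```python
import math


def random_effects_mean(effects, variances):
    """Random-effects pooled effect size (DerSimonian-Laird, an assumed estimator).

    effects   -- study-level effect sizes
    variances -- their sampling variances
    Returns (mean, standard_error, tau2), where tau2 is the estimated
    between-study variance.
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2


# Two hypothetical studies with equal sampling variance.
mean, se, tau2 = random_effects_mean([0.2, 0.4], [0.01, 0.01])
print(round(mean, 3), round(se, 3), round(tau2, 3))
```

Because the between-study variance is added to every study's sampling variance, the weights are more nearly equal than under a fixed-effect model, and the standard error of the mean is correspondingly larger when studies disagree.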

The study-level effect sizes, confidence intervals, and sample sizes are listed in Table 1. For between-subjects designs, the subsample sizes for the subgroup that was spanked and the subgroup that was not spanked are presented, whereas for within-subjects designs a single sample size is presented.

As a means of graphically representing the effect sizes, this table also includes bar graphs of the effect sizes and their corresponding confidence intervals both for the individual studies and for the random effects mean effect size for each outcome category. For the purposes of comparison and aggregation across meta-analyses, all of the study-level effect sizes were coded so that larger positive values corresponded to more detrimental child outcomes. This meant that for studies in which the outcome variable was a beneficial outcome e.
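The direction coding described here amounts to a sign flip for outcomes where higher scores are beneficial. The sketch below is illustrative only; the function name, the flag, and the example value are hypothetical.

```python
def align_direction(effect_size, higher_is_beneficial):
    """Recode an effect size so that positive values mean more detrimental outcomes.

    If higher scores on the outcome measure are beneficial, the sign of the
    effect size is reversed; otherwise it is returned unchanged.
    """
    return -effect_size if higher_is_beneficial else effect_size


# Hypothetical study: spanking relates negatively to a beneficial outcome.
coded = align_direction(-0.12, higher_is_beneficial=True)
print(coded)
```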

As the effect sizes and bar graphs in Table 1 indicate, the findings across studies were highly consistent. Of the individual effect sizes, were in the direction of a detrimental outcome with 78 of these statistically significant.

Table 2 summarizes the mean weighted effect sizes and confidence intervals for each outcome along with a Z test for significant difference from zero and an I² statistic that estimates the amount of variation in the mean weighted effect size that was attributable to underlying study heterogeneity. Spanking was significantly associated with 13 of the 17 outcomes examined. In each case, spanking was associated with a greater likelihood of detrimental child outcomes.
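The two summary statistics named for Table 2 can be sketched as follows. The formulas are the conventional ones (a two-sided normal test of the mean against zero, and I² computed from Cochran's Q); the article does not give its exact computations, so treat this as an assumption. Function names and input values are hypothetical.

```python
import math


def z_test(mean_effect, standard_error):
    """Two-sided Z test of a mean effect size against zero."""
    z = mean_effect / standard_error
    # For a standard normal variable, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def i_squared(effects, variances):
    """I² as a percentage: the share of variation in effect sizes attributed
    to heterogeneity rather than sampling error, computed as (Q - df) / Q."""
    w = [1.0 / v for v in variances]
    fixed_mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed_mean) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical inputs for illustration.
z, p = z_test(0.3, 0.1)
i2 = i_squared([0.2, 0.4], [0.01, 0.01])
print(round(z, 2), round(p, 4), round(i2, 1))
```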

In childhood, parental use of spanking was associated with low moral internalization, aggression, antisocial behavior, externalizing behavior problems, internalizing behavior problems, mental health problems, negative parent-child relationships, impaired cognitive ability, low self-esteem, and risk of physical abuse from parents. In adulthood, prior experiences of parental use of spanking were significantly associated with adult antisocial behavior, adult mental health problems, and with positive attitudes about spanking.

The remaining four meta-analyses were not significantly different from zero. The 13 statistically significant mean effect sizes ranged in size from. Bolded effect sizes are significantly different from zero. To address the concern that the findings of negative outcomes associated with spanking in past research were a result of the confounding of spanking with overly harsh or potentially abusive methods, we identified studies that reported bivariate associations for both spanking and physical abuse.

Each of these studies employed a within-subjects design; in each case, the same respondent (either a parent or the adult child recalling the behavior) reported both how often the parent used spanking and, in a separate question, how often the parent used abusive methods of discipline. Two of the studies contributed more than one effect size, yielding a total of 10 pairs of effect sizes for spanking and physical abuse. The effect sizes are presented in Table 3. In three cases, the effect size for spanking was larger than that for physical abuse.

Both were significantly different from zero and both were positive in sign, indicating that both spanking and physical abuse were associated with greater levels of detrimental child outcomes. We examined whether study-level effect sizes varied across seven study-level characteristics using meta-regression to calculate average effect sizes by study subgroup (Borenstein et al.).

The results from these moderator analyses are presented in Table 4. All of the comparisons were nonsignificant, indicating that the effect sizes did not vary by study characteristic.
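One way to picture a subgroup comparison of this kind is a between-groups Q statistic. This is a simplified stand-in, not the meta-regression the authors report: it uses fixed-effect inverse-variance weights within each subgroup, and the function name, subgroup labels, and values are all hypothetical.

```python
def subgroup_q_between(groups):
    """Compare mean effect sizes across study subgroups.

    groups maps a subgroup label to a pair (effects, variances).
    Returns (subgroup_means, q_between, df). Under the null hypothesis of no
    moderator effect, q_between is referred to a chi-square distribution
    with df degrees of freedom.
    """
    summaries = {}
    for label, (effects, variances) in groups.items():
        w = [1.0 / v for v in variances]
        group_mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
        summaries[label] = (group_mean, sum(w))
    total_w = sum(wg for _, wg in summaries.values())
    grand_mean = sum(m * wg for m, wg in summaries.values()) / total_w
    # Weighted squared deviations of subgroup means from the grand mean.
    q_between = sum(wg * (m - grand_mean) ** 2 for m, wg in summaries.values())
    means = {label: m for label, (m, _) in summaries.items()}
    return means, q_between, len(groups) - 1


# Hypothetical effect sizes grouped by study design.
means, q_b, df = subgroup_q_between({
    "longitudinal": ([0.2, 0.2], [0.01, 0.01]),
    "cross-sectional": ([0.4, 0.4], [0.01, 0.01]),
})
print(means, round(q_b, 3), df)
```

A small Q relative to its degrees of freedom corresponds to the pattern reported here, in which mean effect sizes do not differ reliably across levels of a study characteristic.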

