A thesis, to be accepted, requires proof, and the thesis of the instant book—that in a modern society the sociological characteristics of the society determine the proscriptions and prescriptions of law on activities significant to the society—will thus be credible only to the extent that it is backed by empirical evidence. Although a not-insubstantial body of quantitative social science research has uncovered links between jurisdiction characteristics and jurisdiction law,1 the evidence to date leaves many questions unanswered. The studies reported in Chaps. 2, 3, 4, 5, and 6 of this book augment this evidence. In particular, these studies investigate whether and how sociological attributes of states are related to what state law says about the topics covered in Sects. 1.3 and 1.4 of Chap. 1 in the first volume, Societal Agents in Law: A Macrosociological Approach. Because states are the cases in these studies, state-level numerical data were employed for the independent variables, and because each state was coded either 0 or 1 on the dependent variable, relationships between the independent variables and the dependent variable were estimated with logistic regression and evaluated with certain post-estimation techniques of logistic regression.2 The remainder of Chap. 1 is therefore devoted to selected aspects of logistic regression and to several matters that arise whenever any form of regression is employed to detect relationships between variables. The subjects covered in the pages that follow are common to all of the studies reported in Chaps. 2, 3, 4, 5, and 6.
1.1 Probability, Odds, and the Odds Ratio
Three important statistical concepts—probability, odds, and the odds ratio—are involved in logistic regression, because logistic regression estimates an odds ratio (as well as a regression coefficient) for every independent variable,3 because an odds ratio is built on odds, and because odds arise from probabilities.4 The basic concept, therefore, is probability. Precisely defined, the probability that a particular attribute is present, or that a particular event has occurred, is the proportion of cases (e.g., human individuals or governmental jurisdictions) in which the attribute or event is found. Conversely, the probability that the attribute is absent or that the event did not happen is the proportion of cases in which the attribute or event is not found. These two probabilities are, of course, equal when the attribute or event is present in half of the cases and missing from the other half, and because probability is measured as a proportion, each probability in such a situation is 0.5.
Odds are computed for a particular attribute or event from two probabilities: (i) the probability that the attribute exists or the event occurred, and (ii) the probability that the attribute does not exist or the event did not take place. Odds are formed by the ratio of the two probabilities. For example, the odds that the attribute is present or that the event has happened are the ratio of (i) to (ii). When the probability in (i) and the probability in (ii) are the same (i.e., 0.5), the odds are 1.000, and hence even, that the attribute is present or the event has happened. By contrast, the odds are below 1.000 to the degree that the probability in (i) is lower than the probability in (ii), and the odds are above 1.000 to the degree that the probability in (i) is higher than the probability in (ii).
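The probability-to-odds computation just described can be written out as a minimal sketch; the counts below are hypothetical and serve only to reproduce the even-odds case of 50 cases out of 100:

```python
# A minimal sketch of the probability-to-odds relationship; the counts
# (50 of 100 cases) are hypothetical illustrations, not data from the book.
def probability(present: int, total: int) -> float:
    """Probability as the proportion of cases in which the attribute is found."""
    return present / total

def odds(p: float) -> float:
    """Odds that the attribute is present: ratio of (i) p to (ii) 1 - p."""
    return p / (1.0 - p)

p = probability(50, 100)   # attribute present in 50 of 100 cases
print(p, odds(p))          # both probabilities are 0.5, so the odds are even

# Odds fall below 1.000 as p drops below 0.5 and rise above 1.000 as p exceeds it.
print(odds(0.25), odds(0.75))
```

The two trailing calls simply confirm the direction of movement stated above: odds of 0.333 for a probability of 0.25, and odds of 3.0 for a probability of 0.75.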
In the context of logistic regression using a binary dependent variable on which every case (e.g., U.S. state) is coded either 0 or 1, the odds ratio for an independent variable is the amount of change that can be expected in the odds that a case will be coded 1 rather than 0 on the dependent variable when the case rises one measurement unit or one measurement category on that independent variable while being held constant on the other independent variables being studied. Because an odds ratio is the numerical factor by which the odds are multiplied, it reveals how, and by how much, a single additional measurement unit or category of an independent variable alters the odds that a particular case will have the attribute of the dependent variable represented by the number 1.
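The multiplier character of the odds ratio can be sketched in a few lines. The coefficient and baseline odds below are invented for illustration; only the general relation that the odds ratio equals the exponential of the logistic regression coefficient is taken from the mechanics of logistic regression:

```python
import math

# Hypothetical numbers for illustration: b stands in for a logistic
# regression coefficient, and baseline_odds for a case's odds of being
# coded 1 before a one-unit increase in the independent variable.
b = 0.9
odds_ratio = math.exp(b)               # the factor by which the odds are multiplied

baseline_odds = 0.8
new_odds = baseline_odds * odds_ratio  # odds after a one-unit increase,
                                       # other independent variables held constant
print(odds_ratio, new_odds)
```

Because the odds ratio here exceeds 1.000, the one-unit increase raises the odds of being coded 1; an odds ratio below 1.000 would lower them by the same multiplicative logic.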
Let me illustrate odds ratios using a study whose dependent variable was whether adults in the United States deemed morality to be a private matter (coded 1) or a public issue (coded 0).5 U.S. law generally excludes from regulation activities that, while believed to involve morality, are regarded as being of a private character.6 In line with this principle, the Constitution of the United States has been interpreted to safeguard “zones of privacy”7 even though it does not explicitly mention privacy. The study, which applied logistic regression to data from a nationwide sample survey, found that the sex of respondents had an odds ratio of 3.631 with the other independent variables held constant, and since women were coded 1 and men were coded 0, the odds that morality would be deemed a private subject rather than a public subject were higher for a woman than for a man by a factor of 3.631. Otherwise expressed, the odds that morality would be assigned to the private sphere were 263.1% greater for a woman than for a man.8
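The percent-change reading of the odds ratio in the cited study can be recomputed directly; the only input is the reported factor of 3.631 for sex:

```python
# Recomputing the percent-change interpretation of the odds ratio reported
# in the cited study (3.631 for sex; women coded 1, men coded 0).
or_sex = 3.631
pct_greater = (or_sex - 1.0) * 100.0   # percentage by which the odds are greater
print(round(pct_greater, 1))           # 263.1, i.e., "263.1% greater"
```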
A point to be kept in mind is that an odds ratio is a factor that multiplies the odds. When two odds ratios are both above 1.000 or both below 1.000, their numerical values require no transformation and can be compared directly: The distance from 1.000 to the numerical value of a factor represents the magnitude of the impact on the dependent variable of a one-unit or one-category increase in the independent variable for which the factor was computed. However, the interpretation of a pair of odds ratios is not straightforward when the odds ratio for one independent variable is above 1.000 and the odds ratio for the other independent variable is below 1.000. Because both are multipliers, such a pair can be compared only by using the numerical value of one factor and the reciprocal of the numerical value of the other factor.9 The study described in the preceding paragraph furnishes an illustration. In that study, the factor for sex was 3.631 and the factor for years of formal schooling was 0.316; the former represents an effect that is marginally larger than the effect of the latter. The reciprocal of the numerical factor for sex is 1.0/3.631 = 0.275, which is slightly farther from 1.000 than 0.316 (the factor for years of formal schooling), and the reciprocal of the numerical factor for years of formal schooling is 1.0/0.316 = 3.165, which is not quite as far from 1.000 as 3.631 (the factor for sex).
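The reciprocal comparison can be verified arithmetically with the two factors from the cited study; no values beyond those reported above are assumed:

```python
# Comparing an odds ratio above 1.000 with one below 1.000 via reciprocals,
# using the two factors from the cited study: 3.631 (sex) and 0.316 (schooling).
or_sex = 3.631
or_schooling = 0.316

recip_sex = 1.0 / or_sex              # about 0.275, comparable with 0.316
recip_schooling = 1.0 / or_schooling  # about 3.165, comparable with 3.631

# On either common side of 1.000, distance from 1.000 measures effect size,
# and the sex effect comes out slightly larger than the schooling effect.
sex_larger_below = abs(recip_sex - 1.0) > abs(or_schooling - 1.0)
sex_larger_above = abs(or_sex - 1.0) > abs(recip_schooling - 1.0)
print(round(recip_sex, 3), round(recip_schooling, 3), sex_larger_below, sex_larger_above)
```

Either side of 1.000 yields the same conclusion, which is why the choice of which factor to invert does not matter for the ranking.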
Effects like those in the preceding paragraphs, of course, are for a gain of one measurement unit or one measurement category in the empirical indicator of an independent variable. However, the empirical indicators for the independent variables do not share the same measurement units or categories, and because of these differences, the odds ratios for the independent variables in a regression model cannot be arranged in order of magnitude. To obtain a ranking of the impacts of the independent variables whose indicators employed interval scales or ratio scales (discussed later), the regression coefficients that logistic regression estimated for these variables are transformed so that each coefficient is based on the standard deviation of its indicator.10 In other words, the regression coefficient for every such independent variable is standardized using the standard deviation of its indicator, and after being ...