eBook - ePub

Contemporary Issues in Exploratory Data Mining in the Behavioral Sciences

Author: John J. McArdle, Gilbert Ritschard

474 pages
English
ePUB (mobile friendly)
Available on iOS and Android

About this book

This book reviews the latest techniques in exploratory data mining (EDM) for the analysis of data in the social and behavioral sciences to help researchers assess the predictive value of different combinations of variables in large data sets. Methodological findings and conceptual models that explain reliable EDM techniques for predicting and understanding various risk mechanisms are integrated throughout. Numerous examples illustrate the use of these techniques in practice. Contributors provide insight through hands-on experiences with their own use of EDM techniques in various settings. Readers are also introduced to the most popular EDM software programs. A related website at http://mephisto.unige.ch/pub/edm-book-supplement/ offers color versions of the book's figures, a supplemental paper to chapter 3, and R commands for some chapters.

The results of EDM analyses can be perilous – they are often taken as predictions with little regard for cross-validating the results. This carelessness can be catastrophic in terms of money lost or patients misdiagnosed. This book addresses these concerns and advocates for the development of checks and balances for EDM analyses. Both the promises and the perils of EDM are addressed.

Editors McArdle and Ritschard taught the "Exploratory Data Mining" Advanced Training Institute of the American Psychological Association (APA). All contributors are top researchers from the US and Europe. The book is organized into two parts, methodology and applications; the techniques covered include decision, regression, and SEM tree models, growth mixture modeling, and time-based categorical sequential analysis. Some of the applications of EDM (and the corresponding data) explored include:

selection to college based on risky prior academic profiles

the decline of cognitive abilities in older persons

global perceptions of stress in adulthood

predicting mortality from demographics and cognitive abilities

risk factors during pregnancy and the impact on neonatal development

The book is intended as a reference for researchers, methodologists, and advanced students in the social and behavioral sciences, including psychology, sociology, business, econometrics, and medicine, who are interested in learning to apply the latest exploratory data mining techniques. Prerequisites include a basic class in statistics.

Information

Publisher

Routledge

Year

2013

ISBN

9781135044084

Subject

Psychology

Sub-subject

Research & Methodology in Psychology

Part I

Methodological Aspects

1	Exploratory Data Mining Using Decision Trees in the Behavioral Sciences
	John J. McArdle

Introduction

This first chapter starts off with a discussion of confirmatory versus exploratory analyses in behavioral research, and exploratory approaches are considered most useful. Decision Tree Analysis (DTA) is defined in historical and technical detail. Four real-life examples are presented to give a flavor of what is now possible with DTA: (1) Predicting Coronary Heart Disease from Age; (2) Some New Approaches to the Classification of Alzheimer’s Disease; (3) Exploring Predictors of College Academic Performances from High School; and (4) Exploring Patterns of Changes in Longitudinal WISC Data. In each case, current questions regarding DTA are raised. The discussion that follows considers the benefits and limitations of this exploratory approach, and the author concludes that confirmatory analyses should always be done first, but this should at all times be followed by exploratory analyses.

The term “exploratory” is considered by many as less than an approach to data analysis and more a confession of guilt—a dishonest act has been performed with one’s data. This becomes obvious when we reflexively recoil at the thought of exploratory methods, or when immediate rejections occur when one proposes research exploration in a research grant application, or when one tries to publish new results found by exploration. We need to face up to the fact that we now have a clear preference for confirmatory and a priori testing of well-formulated research hypotheses in psychological research. One radical interpretation of this explicit preference is that we simply do not yet trust one another.

Unfortunately, as many researchers know, quite the opposite is actually the truth. That is, it can be said that exploratory analyses predominate in our actual research activities. To be more extreme, we can assert there is actually no such thing as a true confirmatory analysis of data, nor should there be. Either way, we can try to be clearer about this problem. We need better responses when well-meaning students and colleagues ask, “Is it OK to do procedure X?” I assume they are asking, “Is there a well-known probability basis for procedure X, and will I be able to publish it?” Fear of rejection is strong among many good researchers, and one side effect is that rejection leaves scientific creativity only to the bold. As I will imply several times here, the only real requirement for a useful data analysis is that we remain honest (see McArdle, 2010).

When I was searching around for materials on this topic I stumbled upon the informative work by Berk (2009) where he starts out by saying:

As I was writing my recent book on regression analysis (Berk, 2003), I was struck by how few alternatives to conventional regression there were. In the social sciences, for example, one either did causal modeling econometric style, or largely gave up quantitative work … The life sciences did not seem quite as driven by causal modeling, but causal modeling was a popular tool. As I argued at length in my book, causal modeling as commonly undertaken is a loser.

There also seemed to be a more general problem. Across a range of scientific disciplines there was often too little interest in statistical tools emphasizing induction and description. With the primary goal of getting the “right” model and its associated p-values, the older and more interesting tradition of exploratory data analysis had largely become an under-the-table activity: the approach was in fact commonly used, but rarely discussed in polite company. How could one be a real scientist, guided by “theory” and engaged in deductive model testing, while at the same time snooping around in the data to determine which models to test? In the battle for prestige, model testing had won.

At the same time, I became aware of some new developments in applied mathematics, computer sciences, and statistics making data exploration a virtue. And with this virtue came a variety of new ideas and concepts, coupled with the very latest in statistical computing. These new approaches, variously identified as “data mining,” “statistical learning,” “machine learning,” and other names, were being tried in a number of natural and biomedical sciences, and the initial experience looked promising.

As I started to read more deeply, however, I was struck by how difficult it was to work across writings from such disparate disciplines. Even when the material was essentially the same, it was very difficult to tell if it was. Each discipline brought its own goals, concepts, naming conventions, and (maybe worst of all) notation to the table. Finally, there is the matter of tone. The past several decades have seen the development of a dizzying array of new statistical procedures, sometimes introduced with the hype of a big-budget movie. Advertising from major statistical software providers has typically made things worse. Although there have been genuine and useful advances, none of the techniques have ever lived up to their original billing. Widespread misuse has further increased the gap between promised performance and actual performance. In this book, the tone will be cautious, some might even say dark …

(p. xi)

The problems raised by Berk (2009) are pervasive and we need new ways to overcome them. In my own view, the traditional use of the simple independent-groups t-test should have provided our first warning message that something was wrong with the standard “confirmatory” mantras. For example, we know it is fine to calculate the classic test of the mean difference between two groups and calculate the “probability of equality” or “significance of the mean difference” under the typical assumptions (i.e., random sampling of persons, random assignment to groups, equal variance within cells). But we also know it is not appropriate to achieve significance by: (a) using another variable when the first variable fails to please, (b) getting data on more people until the observed difference is significant, (c) using various transformations of the data until we achieve significance, (d) tossing out outliers until we achieve significance, (e) examining possible differences in the variance instead of the means when we do not get what we want, or (f) accepting a significant difference in the opposite direction from the one we originally expected. I assume all good researchers do these kinds of things all the time. In my view, the problem is not with us but with the way we are taught to revere the apparent objectivity of the t-test approach. It is bound to be even more complex when we use this t-test procedure over and over again in hopes of isolating multivariate relationships.
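To make the cost of practice (b) concrete, the following is a minimal simulation sketch and is not taken from the chapter. It assumes two groups drawn from the same normal population (so any "significant" difference is a false alarm), a hypothetical schedule that starts at 10 persons per group and adds 5 per group after each non-significant test up to 50, and a single fixed critical value of 2.101 (the two-sided 5% point of t with 18 degrees of freedom), which is exact at the first look and conservative at every later one. The function names `pooled_t` and `simulate` are illustrative only.

```python
import math
import random
import statistics

# Two-sided 5% critical value of t with 18 df (two groups of 10).
# Assumption: reusing it at larger samples is conservative, because the
# exact critical value shrinks as the degrees of freedom grow.
T_CRIT = 2.101


def pooled_t(x, y):
    """Classic independent-groups t statistic with a pooled variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    std_err = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / std_err


def simulate(n_sims=2000, n_start=10, n_step=5, n_max=50, seed=1):
    """Return (fixed-n rate, optional-stopping rate) of false positives."""
    rng = random.Random(seed)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        # Both groups come from the same population: the null is true.
        x = [rng.gauss(0, 1) for _ in range(n_start)]
        y = [rng.gauss(0, 1) for _ in range(n_start)]
        significant = abs(pooled_t(x, y)) > T_CRIT
        fixed_hits += significant  # one test at the planned sample size
        # Practice (b): keep adding people until the test "works".
        while not significant and len(x) < n_max:
            x += [rng.gauss(0, 1) for _ in range(n_step)]
            y += [rng.gauss(0, 1) for _ in range(n_step)]
            significant = abs(pooled_t(x, y)) > T_CRIT
        peeking_hits += significant
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = simulate()
print(f"single planned test:    {fixed_rate:.3f}")
print(f"test after every batch: {peeking_rate:.3f}")
```

The first rate should sit near the nominal 5%, while the second should come out clearly higher, even though each individual test looks like the same "objective" t-test. The exact figures depend on the seed and on the assumed stopping schedule.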

For similar reasons, the one-way analysis of variance (ANOVA) should have been our next warning sign about the overall statistical dilemma. When we have three or more groups and perform a one-way ANOVA we can consider the resulting F-ratio as an indicator of “any group difference.” In practice, we can calculate ...