Statistics for Linguistics with R

A Practical Introduction

  1. 512 pages
  2. English

About This Book

This is the third, newly revised and extended edition of this successful book, which has already been translated into three languages. Like the previous editions, it is based entirely on the programming language and environment R and remains thoroughly hands-on, with thousands of lines of heavily annotated code for all computations and plots. This edition has been updated on the basis of many workshops and bootcamps the author has taught all over the world in the past few years. Its exposition has been didactically streamlined, and it adds two new chapters: one on mixed-effects modeling and one on classification and regression trees as well as random forests. It also features new discussion of curvature, orthogonal and other contrasts, interactions, collinearity, the effects and emmeans packages, autocorrelation/runs, programming, writing statistical functions, and simulations, along with many practical tips based on 10 years of teaching with these materials.

Information

Author
Stefan Th. Gries
Year
2021
ISBN
9783110718294
Edition
3

1 Some fundamentals of empirical research

When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind. It may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science. (William Thomson, Lord Kelvin, <http://hum.uchicago.edu/~jagoldsm/Webpage/index.html>)

1.1 Introduction

This book is an introduction to statistics. However, there are already very many introductions to statistics (and by now even a handful for linguists) – why do we need another (edition of this) one? Just like its previous two editions, this book is different from many other introductions to statistics in how it combines several characteristics, each of which is attested separately in other introductions, but not necessarily all combined:
  • it has been written especially for linguists: there are many, many introductions to statistics for psychologists, economists, biologists etc., but far fewer that, like this one, explain statistical concepts and methods on the basis of linguistic questions and for linguists, and it does so starting from scratch;
  • it explains how to do many of the statistical methods both ‘by hand’ and with statistical functions (and sometimes simulations), but it requires neither mathematical expertise nor hours of trying to understand complex equations – many introductions devote much time to mathematical foundations (and while knowing some of the math doesn’t hurt, it makes everything more difficult for the novice), others do not explain any foundations and immediately dive into some nicely designed software which often hides the logic of statistical tests behind a nice GUI;
  • it not only explains statistical concepts, tests, and graphs, but also the design of tables to store and analyze data and some very basic aspects of experimental design;
  • it only uses open source software: many introductions use in particular SPSS or MATLAB (although in linguistics, those days seem nearly over, thankfully), which come with many disadvantages: (i) users must buy expensive licenses that might be restricted in how many functions they offer, how many data points they can handle, how long they can be used, and/or how quickly bugs are fixed; (ii) students and professors may be able to use the software only on campus; (iii) they are at the mercy of the software company with regard to often really slow bugfixes and updates etc. – with R, I have written quite a few emails with bug reports and they were often fixed within a day!
  • while it provides a great deal of information – much of it resulting from years of teaching this kind of material, reviewing, and fighting recalcitrant data and reviewers – it does that in an accessible and (occasionally pretty) informal way: I try to avoid jargon wherever possible and some of what you read below is probably too close to how I say things during a workshop – this book is not exactly an exercise in formal writing and may reflect more of my style than you care to know. But, as a certain political figure once said in 2020, “it is what it is …” On the more unambiguously positive side of things, the use of software will be illustrated in great detail (both in terms of the amount of code you’re getting and the amount of very detailed commentary it comes with) and the book has grown so much in part because the text is now answering many questions and anticipating many errors in thinking I’ve encountered in bootcamps/classes over the last 10 years; the RMarkdown document I wrote this book in returned a ≈560 page PDF. In addition and as before, there are ‘think breaks’ (and the occasional warning), exercises (with answer keys on the companion website; over time, I am planning on adding to the exercises), and of course recommendations for further reading to dive into more details than I can provide here.
Chapter 1 introduces the foundations of quantitative studies: what are variables and hypotheses, what is the structure of quantitative studies, and what kind of reasoning underlies them, how do you obtain good experimental data, and in what kind of format should you store your data? Chapter 2 provides an overview of the programming language and environment R, which will be used in all other chapters for statistical graphs and analyses: how do you create, load, and manipulate different kinds of data for your analysis? Chapter 3 explains fundamental methods of descriptive statistics: how do you describe your data, what patterns can be discerned in them, and how can you represent such findings graphically? In addition, that chapter also introduces some very basic programming aspects of R (though not in as much detail as my R corpus book, Gries 2016). Chapter 4 explains fundamental methods of analytical statistics for monofactorial – one cause, one effect – kinds of situations: how do you test whether an obtained monofactorial result is likely to have just arisen by chance? Chapter 5 introduces multifactorial regression modeling, in particular linear and generalized linear modeling using fixed effects.
While a lot of the above is revised and contains much new information, Chapter 6, then, is completely new and discusses the increasingly popular method of mixed-effects modeling. Finally, Chapter 7 is also completely new and discusses tree-based approaches, specifically classification and regression as well as conditional inference trees and random forests.
Apart from the book itself, the companion website for this book at http://www.stgries.info/research/sflwr/sflwr.html is an important resource. You can access exercise files, data files, answer keys, and errata there, and at http://groups.google.com/group/statforling-with-r you will find a newsgroup “StatForLing with R”. If you become a member of that (admittedly very low-traffic) newsgroup, you can
  • ask questions about statistics relevant to this edition (and hopefully also get an answer from someone);
  • send suggestions for extensions and/or improvements or data for additional exercises;
  • inform me and other readers of the book about bugs you find (and of course receive such information from other readers). This also means that if R commands, or code, provided in the book differ from information provided on the website, then the latter is most likely going to be correct.
Lastly, just like in the last two editions, I have to mention one important truth right at the start: You cannot learn to do statistical analyses by reading a book about statistical analyses – you must do statistical analyses. There’s no way that you read this book (or any other serious introduction to statistics) in 15-minutes-in-bed increments or ‘on the side’ before turning off the light and somehow, magically, by effortless and pleasant osmosis, learn to do statistical analyses, and book covers or titles that tell you otherwise are just plain wrong (if nothing worse). I strongly recommend that, as of the beginning of Chapter 2, you work with this book directly a...

Table of contents

  1. Title Page
  2. Copyright
  3. Contents
  4. Statistics for Linguistics with R – Endorsements of the 3rd Edition
  5. Introduction
  6. 1 Some fundamentals of empirical research
  7. 2 Fundamentals of R
  8. 3 Descriptive statistics
  9. 4 Monofactorial tests
  10. 5 Fixed-effects regression modeling
  11. 6 Mixed-effects regression modeling
  12. 7 Tree-based approaches
  13. About the Author