Chapter 1
Introduction
Data science in education—you’re invited to the party!
Dear Data Scientists, Educators, and Data Scientists who are Educators:
This book is a warm welcome and an invitation. If you’re a data scientist in education or an educator in data science, your role isn’t exactly straightforward. This book is our contribution to a growing movement to merge the paths of data analysis and education. We wrote this book to make your first step on that path a little clearer and a little less scary.
Whether you’re a data scientist using your skills in an education job or an educator who wants to learn data science skills, we invite you to read this book and put these techniques to work in the real world. We think that your work in the education community will help decide how education and data science come together going forward.
Inspired by {bookdown}, this book is open source. Its contents are reproducible and publicly accessible for people worldwide. The online version of the book is hosted at datascienceineducation.com.
Learning data science in education
Over the coming chapters we’ll be learning together about what data science in education can look like. But to understand why we were compelled to write about the topic, we need to talk about why data science in education is not such a straightforward thing.
Learning data science in education is challenging because there isn’t yet a universal vision for the role, and the role itself is not straightforward. If education were a building, it would be multistoried, with many rooms. There are privately and publicly funded schools. There are more than 18 possible grade levels. Students can learn alone or with others in a classroom.
This imaginary building we call education also has rooms most residents never see, where business and finance staff plan the most efficient use of limited funds. The transportation department plans bus routes across vast distances. University administrators search for the best way to measure career readiness. Education consultants study how students perform on coursework and even how they feel about class materials.
There are a lot of ways one could do data science in education, but building consensus on ways one should do data science in education is just getting started. The “data science in education” community is still working out how it all fits together.
And for someone just getting started, it can all seem very overwhelming.
Even if we did have perfect clarity on the topic, there’s still the issue of helping education systems learn to use these new analytical tools. In many education settings, school administrators and their staff may never have had someone around who deeply understands education, knows how to write code, and uses statistical techniques, all at once. That combination of skills is one way to define data science in education (Conway 2010).
Making the path a little clearer
As data science in education grows, the way we talk about and conceptualize it also needs to grow; doing so can help us advance data science in education as a discipline and speak to the unique opportunities and concerns that arise with analyzing data in our domain.
We begin this book with a primer on data science in education, including a discussion of its unique challenges and of foundational skills in the programming language R. This opening part consists of this chapter, suggestions for how to use this text (Chapter 2), our definition of the process of data science and what it “looks like” in terms of who does data science and how they do it (Chapter 3), and a discussion of data science in education in the context of the wider fields of both education and data science (Chapter 4).
Next, you’ll take what you’ve learned and apply it in our data analysis in education walkthroughs. The walkthroughs in this book are our contribution towards a more example-driven approach to learning. They’re meant to make the ambiguous path of learning data science in education a little clearer by way of recognizable and actionable demonstrations.
These examples fall into four different themes, with chapters applying to each theme:
Build a foundation to use R and RStudio
Student perceptions of learning
Walkthrough 1: The Education Dataset Science Pipeline
Walkthrough 5: Text Analysis With Social Media Data
Walkthrough 7: The Role (and Usefulness) of Multilevel Models
Analyze student performance data
Get value from publicly available data
We’ll end the book by discussing how to bring data science skills into your education job, with strategic considerations for applying data science at work (Chapter 15), an overview of teaching data science (Chapter 16), a chapter on learning more (Chapter 17), and additional resources (Chapter 18).
We hope after reading this book you’ll feel like you’re not alone in learning to do data science in education. We hope your experience with this book is the right balance of challenging and fun. Finally, we hope you’ll take what you learned and share it with others who are looking to start this journey.
Conventions used in the book
The following typographical conventions are used in this book:
Package names are surrounded by curly brackets: {caret}
Function names are in constant width and followed by parentheses: clean_names()
Variable names are in constant width: var1
Chapter 2
How to use this book
We’ve heard it from fellow data scientists and experienced it ourselves—learning a programming language is hard. Like learning a foreign language, it is not just about mastering vocabulary. It’s also about learning the language’s norms, its underlying structure, and the metaphors that hold the whole thing together.
The beginning of the learning journey is particularly challenging because it feels slow. If you have experience as an educator or consultant, you already have efficient solutions you use in your day-to-day work. Introducing code to your workflow slows you down at first because you won’t be as fast as you are with your favorite spreadsheet software. However, you’re probably reading this book because you realize that learning how to analyze data using R is like investing in your own personal infrastructure—it takes time while you’re building the initial skills, but the investment pays off when you start solving complex problems faster and at scale. One person we spoke with shared this story about their learning journey:
The first six months were hard. I knew how quickly I could do a pivot table in Excel. It took longer in R because I had to go through the syntax and take the book out. I forced myself to do it, though. In the long-term, I’d be a better data scientist. I’m so glad I thought that way, but it was hard the first few months.
Our message is this: learning R for your education job is doable, challenging, and rewarding all at once. We wrote this book for you because we do this work every day. We’re not writing as education data science masters. We’re writing as people who learned R and data science after we chose education. And like you, improving the lives of students is our daily practice. Learning to use R and data science helped us do that. Join us in enjoying all that comes with R and data science—both the challenge of learning and the joy of solving problems in creative and efficient ways.
Different strokes for different data scientists in education
As we learned in the introduction, it’s tough to define data science in education because people are educated in all kinds of settings and at all ages. Education organizations depend on many different roles, and each role uses data science in its own way. A teacher’s approach to data analysis is different from an administrator’s or an operations manager’s.
We also know that most readers of this book are educators who work with data and want to expand their toolkit. You might even be an educator who doesn’t work with data but has discovered a love for learning about the lives of students through data. Either way, learning data science and R is probably not in your job description.
Like most professionals in education, you’ve got a full work schedule and challenging demands in the name of improving the student experience. Your busy workday probably doesn’t include regular time for professional development or self-driven learning. You also have a life outside of work, including family, hobbies, and relaxation. We struggle with this ourselves, so we’ve designed this book to be used in lots of different ways. The key to learning this material is to establish a routine that lets you engage with and practice the content every day, even if only for a few minutes at a time. A daily routine keeps the content present in your mind and helps shift your mindset so you start seeing even more opportunities for practice.
We want all readers to have a rewarding experience, and so we believe there should be different ways to use this book. Here are some of those ways:
Read the book cover to cover (and how to keep going)
We wrote this book assuming you’re at the start of your journey learning R and using data science in your education job. The book takes you from installing R to practicing more advanced data science skills like text analysis.
If you’ve never written a line of R code, we welcome you to the community! We wrote this book for you. Consider reading the book cover to cover and doing all the analysis walkthroughs. Remember that you’ll get more from a few minutes of practice every day than you will from long hours of practice once in a while. Typing code every day, even if it doesn’t always run, is a daily practice that invites learning and “aha” moments. We know how easy it is to avoid coding when it doesn’t feel successful (we’ve been there), so we’ve designed this book to deliver frequent small wins to keep the momentum going. But even then, we all eventually hit a wall in our learning. When that happens, take a break and then co...