Course Hero’s machine learning team is working to provide college students with better access to study resources. Here are three issues we’re tackling today.
My life is about two passions: education and data science. That’s why the algorithm that brought me to Course Hero couldn’t have been simpler. There probably isn’t a single company in the world that is better at bringing those two things together. Simply put:
Course Hero is an online platform where students and educators can share their study resources. To date, these Course Hero users have shared more than 20 million study resources.
Our aim is to make these study resources easily accessible, so students can learn deeply.
But that only tells part of the story — it’s what we do, but how do we do it? And how can we do it better? That’s a question we’re asking, investigating, and answering every day.
Massive amounts of data: The good, the “bad,” and the thorny problems
First, the good: Those “20 million-plus and growing” study resources comprise course notes, study guides, practice tests, and more, all 100% crowdsourced and tagged by users to specific courses at schools around the world. While other platforms also let students and educators share study resources, no other platform stores and provides access to as many. That’s what makes this collection a treasure trove for machine learning. Also good: all that data gives our algorithms plenty of examples to learn from, so they learn better and faster.
As for the bad: Well, too much data is never really a bad thing! However, it does present some interesting challenges. Here’s how my department is using machine learning to solve three of them.
#1: Tagging millions of pieces of content
One of our biggest challenges is tagging. For us to be successful in serving up the right resources to students at the right time, we need to be able to understand what our users have uploaded to the platform.
As with any platform built on user-generated content, students and educators who contribute their original notes and documents don’t always include tags. Or, if they do, their tags may be inaccurate or incomplete. As a result, these resources are not always easy for other students to find.
Machine learning is well suited to help here. Our platform can analyze the documents users upload and infer which tags are most appropriate.
This provides several benefits:
- It helps fill in the gaps when a student or educator doesn’t include the right tags on the front end (i.e., when a student uploads their content).
- It adds a deeper taxonomy on the back end that, while mostly meaningless to students, helps us automatically assign metadata that describes each document as specifically as possible.
- It enables students to spend less time searching (and looking at materials they don’t need), so they can spend their study time more productively.
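To make automatic tagging concrete, here is a toy sketch. It is not Course Hero’s production model: it assumes a simple nearest-centroid approach over word counts, where each tag’s “centroid” pools the words of every document users gave that tag, and a new document receives every tag whose centroid is similar enough. The function names, the 0.2 threshold, and the training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_centroids(tagged_docs: list[tuple[str, set[str]]]) -> dict[str, Counter]:
    """Pool the word counts of every training document that carries each tag."""
    centroids: dict[str, Counter] = defaultdict(Counter)
    for text, tags in tagged_docs:
        counts = Counter(tokenize(text))
        for tag in tags:
            centroids[tag].update(counts)
    return dict(centroids)

def suggest_tags(text: str, centroids: dict[str, Counter], threshold: float = 0.2) -> list[str]:
    """Return every tag whose centroid is at least `threshold` similar, best match first."""
    counts = Counter(tokenize(text))
    scored = sorted(((cosine(counts, c), tag) for tag, c in centroids.items()), reverse=True)
    return [tag for score, tag in scored if score >= threshold]

# Hypothetical training data: documents that users tagged themselves.
training = [
    ("Mitosis produces two identical daughter cells", {"biology"}),
    ("Meiosis and mitosis are types of cell division", {"biology"}),
    ("Supply and demand curves set the market price", {"economics"}),
]
centroids = build_centroids(training)
print(suggest_tags("Notes on cell division and mitosis", centroids))  # ['biology']
```

A nearest-centroid model is about the simplest classifier that can assign several tags to one document; a real system would more likely train a dedicated multi-label model on far richer features.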
#2: Generating quizzes from Q&A documents
Another challenge Course Hero is currently solving for is generating question-and-answer exercises to help students test themselves on facts.
A good deal of research shows that flashcards and quizzes are among the most effective ways to help students learn. And because so many of the documents contributed by students and educators already contain questions and answers, they’re a rich source of self-testing material that complements in-class learning.
So, we asked ourselves: Could we integrate machine learning into the platform to automatically comb through these documents and identify the questions and answers? Could the platform extract multiple-choice, true-or-false, fill-in-the-blank, and similar questions from user content?
The answer is yes. Our first Q&A generator is up and running, with exciting new enhancements on the way.
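For documents that already contain questions and answers, even plain pattern matching illustrates the extraction step. The sketch below is hypothetical and is not our Q&A generator: it assumes questions are marked like “Q1:” or “1.”, options like “a)”, and answers like “A:”, and it labels each item as multiple choice, true/false, fill in the blank, or short answer based on its shape.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class QAItem:
    question: str
    answer: str = ""
    options: list[str] = field(default_factory=list)
    kind: str = "short answer"

# Hypothetical markup conventions for questions, options, and answers.
QUESTION = re.compile(r"^\s*(?:Q\d*[:.]|\d+[.)])\s*(.+)$")
OPTION = re.compile(r"^\s*([a-d])[.)]\s*(.+)$", re.IGNORECASE)
ANSWER = re.compile(r"^\s*(?:A|Answer)\s*:\s*(.+)$", re.IGNORECASE)

def classify(item: QAItem) -> str:
    """Guess the exercise type from the item's shape."""
    if item.options:
        return "multiple choice"
    if item.answer.lower() in {"true", "false"}:
        return "true/false"
    if "___" in item.question:
        return "fill in the blank"
    return "short answer"

def extract_qa(text: str) -> list[QAItem]:
    """Scan a document line by line and collect complete question/answer items."""
    items: list[QAItem] = []
    current: Optional[QAItem] = None  # question still waiting for its answer
    for line in text.splitlines():
        if m := QUESTION.match(line):
            current = QAItem(question=m.group(1).strip())
        elif current is not None and (m := OPTION.match(line)):
            current.options.append(m.group(2).strip())
        elif current is not None and (m := ANSWER.match(line)):
            current.answer = m.group(1).strip()
            current.kind = classify(current)
            items.append(current)
            current = None
    return items

sample = """Q1: Mitosis produces ____ identical daughter cells.
A: two
Q2: DNA has a double-helix structure.
A: True
Q3: Which organelle contains most of a cell's DNA?
a) Ribosome
b) Nucleus
c) Golgi apparatus
A: b"""

for item in extract_qa(sample):
    print(item.kind, "|", item.question, "->", item.answer)
```

Real user documents are far messier than this sample, which is where learned models earn their keep over hand-written patterns.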
#3: Generating quizzes from blocks of text
While we’re excited about the platform finding and identifying questions and answers, we’ve got even bigger Q&A opportunities ahead. Many of the study materials contributed to the platform are not formatted as questions and answers but as long runs of paragraphs. Herein lies the next challenge: How can the platform generate meaningful questions and answers from blocks of text?
Teaching machines to understand context or nuance is a very challenging problem in machine learning. For example, let’s say we have several pages of notes on Barack Obama. The text may say very simply that he was the 44th President of the United States. It’s not hard to write code that will turn that sentence into “Who was the 44th President?” or “What number President was Barack Obama?” or “What country was Barack Obama the president of?”
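Here is a minimal sketch of that easy case: a single hand-written template (a hypothetical regular expression, not anything from our pipeline) that matches sentences shaped like “X was the Nth Y of Z” and emits the three questions above, each paired with its answer.

```python
import re

# Hypothetical template for sentences like "<Name> was the <ordinal> <Role> of <Place>."
SENTENCE_TEMPLATE = re.compile(
    r"^(?P<person>[A-Z]\w*(?: [A-Z]\w*)*) was the "
    r"(?P<ordinal>\d+(?:st|nd|rd|th)) (?P<role>[A-Z]\w*) of (?P<place>.+?)\.?$"
)

def questions_from_sentence(sentence: str) -> list[tuple[str, str]]:
    """Turn one template-matching sentence into (question, answer) pairs."""
    m = SENTENCE_TEMPLATE.match(sentence.strip())
    if m is None:
        return []  # the sentence doesn't fit this one template
    person, ordinal, role, place = m.group("person", "ordinal", "role", "place")
    return [
        (f"Who was the {ordinal} {role}?", person),
        (f"What number {role} was {person}?", ordinal),
        # Assumes the place is a country, which holds only for this toy example.
        (f"What country was {person} the {role.lower()} of?", place),
    ]

for question, answer in questions_from_sentence(
    "Barack Obama was the 44th President of the United States."
):
    print(question, "->", answer)
```

Any sentence outside the template, such as “He was born in Hawaii.”, yields nothing, and even if it matched, the template could not tell who “He” is.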
But let’s say the passage has more information, such as his birthplace, childhood, political career, major achievements, and so on. That’s a lot more information, and it often uses pronouns in place of proper names. What’s more, “he” won’t necessarily refer to Obama; it could refer to someone else named in the passage, such as Joe Biden.
This challenge, known in Natural Language Processing (NLP) as coreference resolution, is particularly interesting for the Course Hero team. As we develop tools to work out who (or what) a pronoun refers to, we can use that information to frame more intelligent questions and present answer alternatives so that students can quiz themselves.
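A deliberately naive baseline shows why this is hard. The sketch below is hypothetical and is not how we approach coreference: it links each “he,” “him,” or “his” to the most recently mentioned person from a supplied list of names, which stands in for real named-entity recognition. On a short passage about Obama and Biden, it attaches both pronouns to Biden.

```python
import re
from typing import Optional

PRONOUNS = {"he", "him", "his"}

def resolve_pronouns(text: str, people: list[str]) -> list[tuple[str, Optional[str]]]:
    """Naive recency heuristic: link each pronoun to the last person mentioned before it."""
    # Known full names come first in the alternation so "Joe Biden" matches as one token;
    # this hand-supplied list stands in for a real named-entity recognizer.
    names = sorted(people, key=len, reverse=True)
    token_re = re.compile("|".join([rf"\b{re.escape(n)}\b" for n in names] + [r"\b\w+\b"]))
    last_person: Optional[str] = None
    links = []
    for match in token_re.finditer(text):
        token = match.group(0)
        if token in people:
            last_person = token
        elif token.lower() in PRONOUNS:
            links.append((token, last_person))
    return links

passage = "Barack Obama chose Joe Biden as his running mate. He was born in Hawaii."
print(resolve_pronouns(passage, ["Barack Obama", "Joe Biden"]))
```

Both links are wrong: “his running mate” and “He was born in Hawaii” refer to Obama. Getting them right takes signals a recency rule ignores, such as grammatical role (Obama is the subject of the first sentence) and world knowledge.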
Solving the Natural Language part of the problem is just one piece of the puzzle. As we solve this, the platform will generate ever more valuable questions and answers and provide an increasingly meaningful study experience for the student. Generating quizzes from blocks of text is a thorny problem that is still on our desks today.
Why it matters: Using machine learning to help students succeed
Using machine learning to assist people in any capacity is extremely exciting. This includes helping consumers better navigate Amazon or find their new favorite show on Netflix. Those sorts of personal recommendation engines are one of the reasons I got into data science and machine learning.
But it is problems like the ones discussed here that get me excited about working at Course Hero. They’re not necessarily unique in the fields of data science and machine learning. The difference, for me, is in the ultimate impact the work will have.
By employing machine learning to help students study, we believe we’re helping them get the most out of their coursework. That, in turn, leads to a more meaningful college experience and, hopefully, it helps students become confident, prepared college graduates who are ready to take on the world.
I feel a particularly deep sense of personal fulfillment knowing that my work may be positively impacting a student’s life. At the end of the day, that’s what we’re all trying to do at Course Hero. And we’re learning how to do that better every day.
Learn about Course Hero careers
I’m really excited that Course Hero is growing its investment in machine learning faster than any other part of the company. When I started here in October 2017, I was the first machine learning engineer on staff. My team now has four people, including me, and we plan to more than double by the end of 2018. If solving these kinds of problems behind the scenes is also your passion, we’d love to talk to you about joining the Course Hero team!