Big data and analytics are being used in the education sector to personalize student curricula, but also to "cream" students who are likely to succeed and "park" those who are expected to fail. That could create policy failures.
About a year ago I wrote a blog post in this community, where I talked about the big ethical questions of big data. One of the use cases I discussed was education institutions using big data and analytics to protect their revenues. In particular I raised the following question: "... what if schools decide to offer their services only to students that are more likely to succeed ("creaming their customers") and instead "parking" those that are less likely to succeed?"
Two days ago I had precisely this conversation with a university CIO. They are starting to use a mix of business intelligence, student management, learning management and CRM tools to mine student profiles and predict which of three groups each student falls into:
- Excellent students: they need no help. In fact, they could be further sources of revenue if given the opportunity to continue their studies through master's degrees and PhDs.
- Average or somewhat struggling students: they need some help, so with a tutor (either a peer student or a junior faculty member) they can get to graduation, and the university collects the full tuition for the planned duration of their studies.
- Hopeless students: they will struggle big time, so the university should spend the least amount of money on them and let them fail sooner rather than later, so that resources can be freed up to recruit new excellent, or at least average, students.
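To make the three-way split concrete, here is a minimal Python sketch of how a predicted score might be mapped to the groups above. Everything in it is an assumption made for illustration: the `StudentPrediction` type, the `assign_group` function, the idea that the model outputs a graduation probability, and both cut-off values. The CIO did not describe the actual model or thresholds.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only for illustration; the conversation
# did not reveal where a university would actually draw these lines.
EXCELLENT_THRESHOLD = 0.85
STRUGGLING_THRESHOLD = 0.40


@dataclass
class StudentPrediction:
    student_id: str
    # Assumed output of some predictive model: probability of graduating, 0..1.
    graduation_probability: float


def assign_group(prediction: StudentPrediction) -> str:
    """Map a predicted graduation probability to one of the three groups."""
    p = prediction.graduation_probability
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    if p >= EXCELLENT_THRESHOLD:
        return "excellent"
    if p >= STRUGGLING_THRESHOLD:
        return "average_or_struggling"
    # The label below mirrors the wording used in this post.
    return "hopeless"


if __name__ == "__main__":
    for sid, prob in [("s1", 0.92), ("s2", 0.55), ("s3", 0.20)]:
        print(sid, assign_group(StudentPrediction(sid, prob)))
```

Even in a toy version like this, the two thresholds decide who gets support and who is left to fail, which is why the choice of cut-offs is as much an ethical decision as a technical one.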
I'm oversimplifying the discussion I had with this IT executive, but it's not too far from reality, and many universities (both public and private) have started applying this approach in the past two years. If you think about it, this is not too different from the risk models financial institutions have applied for years to make decisions on mortgages or insurance plans. From an individual university's standpoint it makes perfect sense: it increases ROI on precious resources such as faculty, facilities, career placement, alumni and so forth. From an educational system standpoint, however, it can create information asymmetries, and hence market failures. The first to be hurt are students. If they are not aware this is happening and find out only after spending two years on a chemistry degree, for instance, it could already be too late for them to switch to business or med school, where their talent might be better aligned; they might have already spent too much money, or feel they would reach the job market too late. The asymmetry could also negatively impact education policymakers, who, to compensate for it, could decide to increase funding for the students that require more time to find their way to graduation, in order to give them a fair chance to succeed. And this is not exactly a time when governments, particularly in Europe, have tons of money to increase spending. The ethical and economic question becomes even more daunting when applied to primary and secondary (or K-12) schools.
The problem can be addressed by removing some of the asymmetry. For instance, schools and universities could screen students early on and point them in the right direction. If they do not have an incentive to do so, for example because they struggle to collect and integrate enough data from the prior education levels and institutions that the student attended, which reduces the accuracy of predictive modeling, then policymakers could step in and favor integration by creating standards for data sharing at the national level. Alternatively, education institutions could use other data sets, such as social media profiling. Students should be informed too, so they can make a conscious decision; they should certainly retain the option to choose courses even if they are predicted to struggle. And whatever the model predicts, their privacy should be protected.
Information sharing is clearly at the heart of the problem. Education institution IT executives will have to adjust their information architectures to optimize the usage, production and retention of all available data.