The practice
Data mining and data analytics are well established in retail and other sectors. They are the process of analysing data from different perspectives and summarising it into useful information, or generating actionable insights that can guide business decisions, shape strategies and create new opportunities. The key tasks within data mining research are predictive modelling, descriptive modelling, the discovery of rules and patterns, exploratory data analysis and retrieval by content.
Data mining is really about finding hidden patterns in an organisation’s data that it can use to run its business better. It essentially produces two types of result:
1. Insight, e.g. new knowledge and/or a better understanding.
2. Predictive capability, e.g. identifying new links, anomalies, risks or potential.
The most important value of data analytics is that it helps organisations make decisions based on facts rather than on old assumptions or gut feelings.
Growth in data mining will be further driven by next-generation cloud technologies. These reduce data warehousing costs and allow very large data sets to be stored and analysed quickly.
Data mining has generated a range of fascinating insights, such as:
• Wal-Mart (the global US retailer which owns Asda) apparently found a statistically significant correlation between purchases of beer and purchases of nappies. It was theorised that fathers were stopping off to buy nappies for their babies and, since they could no longer go down to the pub as often, were buying beer as well. As a result of this insight, Wal-Mart placed nappies next to beer in the aisles, increasing sales of both.
• Credit card companies claim they can predict when a couple is going to divorce, two years in advance, with 98% accuracy.
• Loopt, a social mobile network, can predict with 90% accuracy where its users will be tomorrow.
• Hunch’s algorithm has found that users who prefer aisle seats on planes “spend more money on other people than on themselves”.
As consumers, we are used to Amazon.com and others combining our personal information and choices with those of other consumers to market new products to us, normally as suggestions about what others bought or what we might like. Specialised ‘mining’ algorithms work away seamlessly in the background.
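The beer-and-nappies story is the classic illustration of market basket analysis. A common way to score such a pattern is association-rule mining. It rates a rule like nappies → beer on three measures. Support is how often both items appear together. Confidence is how often beer appears given nappies. Lift is confidence relative to beer’s overall popularity, where a value above 1 means the items co-occur more often than chance would suggest. The sketch below is a minimal illustration under stated assumptions: the baskets are hypothetical, not real retail data, and the brute-force scan over item pairs stands in for a production algorithm.

```python
from itertools import combinations

# Hypothetical till receipts, invented for illustration (not Wal-Mart data).
baskets = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "beer", "wipes"},
    {"nappies", "wipes"},
    {"bread", "milk"},
    {"bread", "crisps"},
    {"milk", "beer"},
    {"bread", "milk", "crisps"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    joint = support(antecedent | consequent, baskets)
    confidence = joint / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return joint, confidence, lift


def pairwise_rules(baskets, min_support=0.25, min_lift=1.2):
    """Brute-force scan of every one-item -> one-item rule above both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s, c, l = rule_metrics({x}, {y}, baskets)
            if s >= min_support and l >= min_lift:
                rules.append((x, y, round(s, 3), round(c, 3), round(l, 3)))
    return rules


print(rule_metrics({"nappies"}, {"beer"}, baskets))  # (0.375, 0.75, 1.5)
for rule in pairwise_rules(baskets):
    print(rule)
```

With these toy baskets, beer appears in half of all baskets but in three quarters of those containing nappies, giving a lift of 1.5. Real systems typically use algorithms such as Apriori or FP-growth to avoid checking every combination of items.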
Of course, there will always be situations where data mining does not generate useful insights. The Home Office’s e-border system is tied into airlines’ ticketing networks. It makes judgments about passengers’ travel habits, friends and family to decide whether they are a security risk. In practice, the ‘suspect’ requests likely to lead to innocent holidaymakers being ‘red flagged’ as potential terrorists included ordering a vegetarian meal, asking for an over-wing seat or travelling with a foreign-born husband or wife. The system also flags anyone who buys a one-way ticket and makes a last-minute reservation, as well as those with a history of booking tickets and not turning up for flights.
The use of data mining in the public sector is an emerging field. In the US, clinical outcomes algorithms have been applied to large health information databases to generate models directly applicable to clinical treatment. These models have been used successfully to create mortality risk assessments for adult and paediatric intensive care units. Through pattern discovery, data mining has also found previously unknown medical correlations.
As Donald Rumsfeld, then US Secretary of Defense, infamously noted during a news briefing: “Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know”.
I would argue there are currently too many unknown unknowns in education.
The insight
Stereotypes are often wrong. Interesting relationships can be sought and discovered without prior knowledge of what those relationships might be and without resorting to more traditional directed queries or statistical tests of hypotheses.
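Looking for relationships without a prior hypothesis can be sketched as an exhaustive scan. The scan computes an association score for every pair of variables and ranks the pairs, so the strongest associations surface first. This minimal example uses Pearson correlation on a tiny table of student variables; the column names and values are hypothetical. Any real scan of this kind also needs to guard against spurious findings, because testing many pairs raises the chance that some correlation appears by coincidence.

```python
import math
from itertools import combinations

# Hypothetical per-student variables; names and values are invented.
table = {
    "attendance_pct": [95, 88, 72, 99, 81, 90],
    "homework_hours": [5, 4, 1, 6, 2, 4],
    "attainment": [68, 61, 45, 74, 52, 63],
    "clubs_joined": [1, 0, 2, 1, 0, 2],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_pairs(table):
    """Score every pair of columns by |r|, strongest first; no hypothesis is specified."""
    scores = [
        (abs(pearson(table[a], table[b])), a, b)
        for a, b in combinations(sorted(table), 2)
    ]
    return sorted(scores, reverse=True)


for strength, a, b in rank_pairs(table):
    print(f"{a} ~ {b}: |r| = {strength:.2f}")
```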
The educational applicability
Educational Data Mining should ultimately be concerned with developing methods for exploring the range of data that come from educational settings. It should also be concerned with using those methods to better understand both the settings in which students learn and the students themselves. Data mining development and practice are currently more advanced in the US.
The Bill and Melinda Gates Foundation is funding an interesting study at WCET, which was created to facilitate the sharing of resources among the higher education systems of 15 member states in the US. The grant focuses on identifying variables that affect student retention and progression and will involve data mining of over 400,000 student records.
“The New York City Department of Education’s Achievement Reporting and Innovation System (ARIS) provides a single place where educators can find important information to use to accelerate student learning. ARIS provides New York City educators with a secure online platform for:
• Exploring data they can use to improve student outcomes
• Sharing what they have learned by publishing documents and taking part in discussions and blogs
• Finding other educators facing similar challenges
• Creating collaborative communities to solve problems together
• Parents can log in to ARIS Parent Link”
ARIS is an $80 million Web-based data mining and business intelligence project. It gives all 80,000 of the city’s public school teachers access to, and training in, its analysis tools. The programme holds up to 100 TB in a data warehouse, including enrolment, assessment and biographical data for all 1.1 million New York City students, plus profile data for every staff member.
To date, most school data in England has been analysed at a local level, i.e. by the Local Authority or by the school or federation. This produces data sets that may be too small to reveal currently unknown correlations between variables.
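The point about scale can be made concrete with the standard significance test for a correlation coefficient, t = r × √((n − 2)/(1 − r²)), which is compared against a t distribution with n − 2 degrees of freedom. The sketch below holds a weak correlation of r = 0.1 fixed and varies the sample size n. Both the value of r and the sample sizes are illustrative assumptions, not figures from any English data set.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: the population correlation is zero.

    t = r * sqrt((n - 2) / (1 - r**2)), compared against a t distribution
    with n - 2 degrees of freedom.
    """
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Illustrative weak correlation and sample sizes (assumptions, not real figures).
R = 0.1
for n in (30, 300, 30_000):
    t = correlation_t_stat(R, n)
    # For large samples |t| > ~1.96 is significant at the two-sided 5% level;
    # the cut-off is a little higher for small n (about 2.05 at 28 df).
    print(f"n = {n:>6}: t = {t:.2f}")
```

With r = 0.1 the statistic is about 0.53 at n = 30 and 1.73 at n = 300, and neither is significant at the 5% level. At n = 30,000 it is about 17.4, so the same weak relationship becomes unmistakable once enough records are pooled.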
Data mining gives the education system the potential to generate insights that enable it to move beyond the negative stereotyping that has stubbornly persisted, be it:
• Around gender-related expectations about academic engagement and consequences for performance (research has previously shown that female primary school teachers may unwittingly perpetuate low expectations of boys’ academic achievement and encourage girls to work harder by letting them think they are cleverer)
• Around the performance of ethnic minorities (e.g. according to UCL research published in the British Medical Journal, the underperformance in examinations of UK medical students from ethnic minorities could be partly down to a psychological phenomenon called ‘stereotype threat’).
• Generally lower aspirations for students with a Statement of Special Educational Needs or subject to additional support through School Action or School Action Plus.
• Low aspirations based on ‘postcode’ or class.
Imagine if we could combine data at a national level. This could include students’ use of interactive learning environments, attendance, level of engagement, and attributes such as gender, ethnicity or eligibility for free school meals. It could also include hours of homework study, home postcode, participation in extracurricular activities, family circumstances, attainment, curriculum choices made, learning goals and skills. Further data could cover the teachers who taught each learner, the funding of the institution they attend and community context. Most of this data is already captured but is not easily mined, as it is held in fragmented databases by siloed institutions.
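At its simplest, combining data held in separate systems is a join on a shared identifier. The sketch below links three hypothetical silos on an illustrative pupil ID. The identifiers, field names and values are invented, and a real national linkage would also have to handle inconsistent identifiers, data protection and consent. Pupils missing from any one source drop out of the inner join, which is one practical cost of fragmentation.

```python
# Hypothetical silos keyed by an illustrative pupil ID; all names and values are invented.
attendance = {"P001": 0.96, "P002": 0.81, "P003": 0.90}
attainment = {"P001": 68, "P002": 52, "P004": 70}
fsm_eligible = {"P001": False, "P002": True, "P003": True, "P004": False}


def link_records(**sources):
    """Inner-join several {pupil_id: value} sources on the IDs they all share."""
    shared = set.intersection(*(set(src) for src in sources.values()))
    return {
        pid: {name: src[pid] for name, src in sources.items()}
        for pid in sorted(shared)
    }


linked = link_records(attendance=attendance, attainment=attainment, fsm=fsm_eligible)
all_ids = set().union(attendance, attainment, fsm_eligible)
print(f"linked {len(linked)} of {len(all_ids)} pupils")
for pid, record in linked.items():
    print(pid, record)
```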
Data mining in education has the potential to draw out ‘positive deviance’ in schools and other learning institutions. Positive deviance is based on the observation that every community contains certain individuals or groups whose uncommon behaviours and strategies enable them to find better solutions to problems than their peers. They do so despite having access to the same resources and facing similar or worse challenges.
Policy making, parental and teacher aspirations and the organisation of learning itself could all be improved, or better still transformed, by mining existing data to make and act upon previously unseen connections. The focus must be on using the insights generated to enhance teaching and learning by enabling teachers and policy makers to better understand cohorts of learners.
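One way to look for positive deviance in data is to model the outcome expected given a school’s circumstances, then look for schools that do markedly better than that expectation. The sketch below fits an ordinary least-squares line of mean attainment against the percentage of pupils eligible for free school meals. It flags schools whose residual exceeds 1.5 residual standard deviations. The school names, figures and the 1.5 threshold are hypothetical, and a single context variable is a deliberate simplification; a real analysis would account for many more factors.

```python
import math

# Hypothetical school-level data: (school, % pupils eligible for FSM, mean attainment).
schools = [
    ("A", 10, 72), ("B", 20, 66), ("C", 30, 60), ("D", 40, 54),
    ("E", 50, 48), ("F", 45, 63), ("G", 15, 69), ("H", 35, 57),
]


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def positive_deviants(schools, z=1.5):
    """Schools whose attainment beats the context-based prediction by > z residual SDs."""
    xs = [fsm for _, fsm, _ in schools]
    ys = [score for _, _, score in schools]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    sd = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
    return [
        (name, round(r, 1))
        for (name, _, _), r in zip(schools, residuals)
        if r > z * sd
    ]


print(positive_deviants(schools))  # [('F', 8.8)]
```

Here school F is the one flagged: it has a relatively high share of FSM-eligible pupils but attains well above the fitted line. It would then be a candidate for closer study of what it does differently.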