1. Educational Data Mining (Romero & Ventura, 2007)
1.1. Data Mining in Web-Based Systems
1.1.1. Types of Web Logs That Track Student Interaction with Web-Based Systems
1.1.1.1. Server log file
1.1.1.2. Client log file
1.1.1.3. Proxy log file
1.1.2. Solutions to comply with legal restrictions for web-based systems
1.1.2.1. Monitor learning pathways in a programme, such as selected courses or time to complete a course. (Yu, Own, and Lin, 2001)
1.1.2.2. Track Website Navigational Behavior of Users (Li and Zaïane, 2004)
1.1.2.3. Connect Log Files with Comments and File Attachments: (Avouris, Komis, Fiotakis, Margaritis, and Voyiatzaki, 2005)
1.1.2.4. Combine User Profile Data and Activity Data in a Composite Information Model (Monk, 2005)
1.1.2.5. Combine written feedback, informal discussions and polls (Ingram, 1999)
1.1.2.6. Track semantics recorded by web-based educational systems (Iksal and Choquet, 2005)
1.1.2.7. Extract data using Software Agents (Markham et al. 2003)
1.2. Data Mining in Learning Management Systems
1.2.1. Record student activities (reading, writing, assessments, discussion) (Mostow, 2004)
1.2.2. Access a database to view personal information, academic results & user interaction data.
1.2.3. Gain objective feedback: learn how the students learn.
1.3. Data Preprocessing
1.3.1. Definition: Transform the original data into a suitable shape to be used by a particular mining algorithm.
1.3.2. Types
1.3.2.1. Data Cleaning
1.3.2.2. User Identification
1.3.2.3. Session Identification
1.3.2.4. Transaction identification
1.3.2.5. Path completion
1.3.2.6. Data transformation and enrichment
1.3.2.7. Data integration
1.3.2.8. Data reduction
1.3.3. Issues to Consider
1.3.3.1. User authentication means that logins and logouts can be used to delimit user sessions.
1.3.3.2. Student interactions are recorded not only in log files but also in databases (which are more powerful).
1.3.3.3. Data transformation means numerical values are transformed into ranges to support understanding of the data.
1.3.3.4. Data filtration applies specific educational concepts (no. of attempts, no. of repeated readings, level of knowledge) and removes instances that violate them.
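The data-transformation step above can be sketched in a few lines: numeric interaction counts are discretized into named ranges so that mined patterns read in educational terms rather than raw numbers. The cut-points and attribute below are illustrative, not from the paper:

```python
# Hypothetical sketch of discretizing a numeric attribute into ranges.
def discretize(value, bins):
    """Return the label of the first bin whose upper bound the value
    does not exceed; bins is an ordered list of (upper_bound, label)."""
    for upper, label in bins:
        if value <= upper:
            return label
    return bins[-1][1]

# Invented cut-points: attempts per exercise mapped to coarse ranges.
ATTEMPT_BINS = [(1, "LOW"), (3, "MEDIUM"), (float("inf"), "HIGH")]

raw_attempts = [1, 2, 5, 3, 8]
labels = [discretize(a, ATTEMPT_BINS) for a in raw_attempts]
print(labels)  # ['LOW', 'MEDIUM', 'HIGH', 'MEDIUM', 'HIGH']
```

Rules mined afterwards can then refer to "HIGH number of attempts" instead of an opaque threshold.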
1.4. Data Mining Techniques
1.4.1. Measuring Usage Statistics
1.4.1.1. AccessWatch, Analog, Gwstat, WebStat (designed to analyze web server logs)
1.4.1.2. Synergo/ColAT (For educational data)
1.4.1.3. Total number of visits and number of visits per page (Pahl & Donnellan, 2003).
1.4.1.4. Connected-learner distribution over time, the most frequently accessed courses, and how learners spread their learning sessions over time (Zorrilla et al., 2005).
1.4.1.5. Specific statistics in AIWBES can show the average number of constraint violations, the average problem complexity, and the total time spent in attempts (Nilakant & Mitrovic, 2005).
1.4.1.6. More complex statistical procedures, such as regression analysis, correlation analysis, and multivariate methods (Zarzo, 2003), require more powerful statistical tools such as SPSS, SAS, S, R, Statistica, etc. (Klosgen & Zytkow, 2002).
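The basic usage statistics named above (total visits and visits per page, as in Pahl & Donnellan, 2003) amount to simple counting over log records; a minimal sketch with made-up log tuples of (user, page, timestamp):

```python
# Toy server-log entries (invented for illustration).
from collections import Counter

log = [
    ("alice", "/course/intro", "2023-01-01T09:00"),
    ("bob",   "/course/intro", "2023-01-01T09:05"),
    ("alice", "/course/quiz1", "2023-01-01T09:10"),
    ("bob",   "/course/quiz1", "2023-01-01T09:12"),
    ("carol", "/course/intro", "2023-01-01T10:00"),
]

total_visits = len(log)                              # overall visit count
visits_per_page = Counter(page for _, page, _ in log)  # per-page counts

print(total_visits)                      # 5
print(visits_per_page["/course/intro"])  # 3
```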
1.4.2. Visualization of data
1.4.2.1. Information obtained from usage statistics is not always easy for educators to interpret, so other techniques have to be used. p.140
1.4.2.1.1. Instructors can manipulate the graphical representations generated, which allow them to gain an understanding of their learners and become aware of what is happening in distance classes. p.141
1.4.2.2. Collaborative and social learning (in P2P systems)
1.4.2.3. Visualization tools in educational data: GISMO/CourseVis (Mazza & Milani, 2005) and Listen tool (Mostow et al., 2005).
1.4.3. Web mining techniques applied to educational systems
1.4.3.1. Clustering
1.4.3.2. Classification
1.4.3.3. Outlier detection
1.4.3.4. Association rule mining
1.4.3.5. Sequential pattern mining
1.4.3.6. Text mining
1.4.4. Web mining categories from the used DATA viewpoint
1.4.4.1. Web content mining
1.4.4.1.1. The process of extracting useful information from the contents of web documents
1.4.4.2. Web structure mining
1.4.4.2.1. The process of discovering structure information from the web
1.4.4.3. Web usage mining (WUM)
1.4.4.3.1. The discovery of meaningful patterns from data generated by client–server transactions on one or more web localities.
1.4.5. Web mining categories from the used SYSTEM viewpoint
1.4.5.1. Offline web mining
1.4.5.1.1. Used to discover patterns and other useful information to help educators to validate the learning models and restructure the web site
1.4.5.2. Online or integrated web mining
1.4.5.2.1. Patterns automatically discovered are fed into an intelligent software system or agent that could assist learners in their online learning endeavours.
1.4.6. Clustering
1.4.6.1. Process of grouping physical or abstract objects into classes of similar objects
1.4.6.2. Unsupervised classification
1.4.6.3. Example: Group together a set of pages with similar content, users with similar navigation behavior, or similar navigation sessions.
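The clustering idea above can be sketched with a toy k-means: each session is summarized by (pages visited, minutes online), and similar sessions are grouped without any predefined labels. The data, features, and k are invented; stdlib only:

```python
import math

def kmeans(points, k, iters=20):
    """Minimal k-means: naive initialization from the first k points."""
    centers = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center.
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Recompute each center as the mean of its cluster.
        centers = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical sessions: (pages visited, minutes online).
sessions = [(2, 5), (3, 6), (2, 4), (20, 60), (22, 55), (21, 58)]
centers, clusters = kmeans(sessions, k=2)
# Two groups emerge: short casual sessions vs. long study sessions.
```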
1.4.7. Classification
1.4.7.1. Predicts class labels
1.4.7.2. Classification is supervised (unlike clustering).
1.4.7.3. Example: Allows characterizing the properties of a group of user profiles, similar pages or learning sessions.
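In contrast to clustering, classification learns from examples that already carry labels. A minimal supervised sketch using 1-nearest-neighbour on invented, instructor-labeled sessions:

```python
import math

# Hypothetical training data: sessions summarized as
# (pages visited, minutes online), labeled by an instructor.
labeled = [
    ((2, 5),   "casual"),
    ((3, 6),   "casual"),
    ((20, 60), "intensive"),
    ((22, 55), "intensive"),
]

def classify(session):
    # 1-NN: predict the label of the closest labeled example.
    _, label = min(labeled, key=lambda ex: math.dist(ex[0], session))
    return label

print(classify((4, 7)))    # casual
print(classify((19, 50)))  # intensive
```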
1.4.8. Outlier
1.4.8.1. An observation unusually large or small relative to the other values in a dataset.
1.4.8.2. Outlier occurrences include:
1.4.8.2.1. An incorrect measurement entered into the computer
1.4.8.2.2. A measurement that comes from a different population
1.4.8.2.3. A rare event
1.4.8.3. Example: Outlier detection can detect students with learning problems.
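A simple way to operationalize "unusually large or small" is a standard-score rule: flag values more than two standard deviations from the mean. The quiz scores and threshold are illustrative, not from the paper:

```python
from statistics import mean, stdev

# Invented quiz scores for one class.
scores = [78, 82, 75, 80, 79, 81, 12, 77]

mu, sigma = mean(scores), stdev(scores)
# Flag observations more than 2 standard deviations from the mean.
outliers = [s for s in scores if abs(s - mu) > 2 * sigma]
print(outliers)  # [12] — a score that may signal a student in difficulty
```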
2. The State of Educational Data Mining in 2009: A Review and Future Visions (Baker & Yacef, 2009)
2.1. Terms
2.1.1. Knowledge Discovery in Databases (KDD)
2.1.2. Educational Data Mining (EDM)
2.2. Review of Romero & Ventura (Educational Data Mining Categories)
2.2.1. Statistics and visualization
2.2.2. Web mining
2.2.2.1. Clustering, classification, and outlier detection
2.2.2.2. Association rule mining and sequential pattern mining
2.2.2.3. Text mining
2.3. Review of Baker's Taxonomy (Educational Data Mining Categories)
2.3.1. Prediction
2.3.1.1. Density estimation
2.3.1.2. Regression
2.3.1.3. Classification
2.3.2. Clustering
2.3.3. Relationship mining
2.3.3.1. Association rule mining
2.3.3.2. Correlation mining
2.3.3.3. Sequential pattern mining
2.3.3.4. Causal data mining
2.3.4. Distillation of data for human judgment
2.3.5. Discovery with models
2.4. Examples of Student Modelling with EDM
2.4.1. A student's characteristics or state, such as the student's current knowledge, motivation, meta-cognition, and attitudes. p.6
2.4.2. Discover if a student is gaming the system (Baker et al., 2004)
2.4.3. Predict poor self-efficacy (McQuiggan et al. 2008)
2.4.4. Discover whether a student is bored or frustrated (D'Mello et al., 2008)
2.4.5. Predict student failure or non-retention in college courses or in college (Dekker et al., 2009; Romero et al., 2008; Superby et al., 2006)