
School Dropouts – A Theoretical Framework

Indus Foundation International Journals UGC Approved

http://indusedu.org/

The following article is a fragment of a PhD thesis. It outlines the main traits of the school dropout phenomenon, drawing on the different perspectives that authors take in defining the concept, explaining its causes, and understanding its consequences. This summary of an important collection of studies on the subject is meant to serve as a theoretical basis for research in the field and to provide the premises for developing prevention and intervention strategies.

Related Papers

Sirous Tabrizi


Washington Office of Superintendent of Public Instruction

Pete Bylsma

Children and Youth Services Review

Cuthbert Tukundane , Alexander Minnaert

Tanja Grbesic

Indus Foundation International Journals UGC Approved , Olusegun Akinbote

In Nigeria, inclusive practice is emerging; it is required by federal law and advocated by professionals. However, much of the responsibility for the successful inclusion of children with special needs rests on the shoulders of teachers. This paper examined pre-primary and primary school teachers' perception of and attitude towards inclusion in Ibadan, Oyo State, Nigeria. The study adopted a descriptive research design. A multi-stage sampling procedure was used to select 200 pre-primary and primary school teachers from 15 schools (10 regular and 5 special schools) to explore their perception of and attitude towards inclusive education. The Teachers' Perception of and Attitude towards Inclusion scale (α = 0.87) was the instrument used to collect the data. The data were analyzed using frequency counts, simple percentages, and t-tests. Four research questions were answered. The results revealed that the majority of the teachers have a positive perception of and attitude towards inclusion. The findings also showed no significant difference between special and regular school teachers in their perception of and attitude towards inclusion. The implications of the findings were discussed and recommendations were made.

Nicholas D Hartlep

Capella University Dissertation - School of Education (UMI)

Doc Debi Ash

Despite efforts to provide instructional intervention programs for students who are at risk of non-completion or who have left school without graduating, many programs are not achieving consistent success. To assess this situation, the nature of the instructional design strategies deployed within these programs was investigated, with a focus on whether participatory design principles and the student voice would enhance levels of engagement and motivation and increase the chances of graduating from high school. A mixed-method, post-test, two-group research design using dependent samples and two-tailed t-test quantitative analysis was implemented, using asynchronous groups, inventories of motivation and engagement, and written observations as data collection instruments. Three distinct phases (a traditionally based lesson, participants' voices in the redesign of that traditional lesson, and implementation of the redesigned lesson) led to a failure to reject the null hypothesis "there is no significant difference in motivation between standard and participatory design courses." Despite this, overall motivation levels improved in 100% of the individual categories, suggesting that participating in the design process can lead to improvements in motivation. The null hypothesis "there is no significant difference in engagement between standard and participatory design courses" was rejected, as there was a significant increase (p ≤ .05) in engagement upon implementation of participatory design principles. This indicates that end users' needs are crucial in promoting engagement and motivation and that participatory design principles can provide what traditional instructional models, to date, have not.
This study provides instructional designers, educators, and administrators with data to support the redesign of current intervention programs in order to bridge the graduation gap by utilizing the voice of the most important stakeholders: the actual students.

Journal of At-Risk Issues

David Chorney

National Dropout Prevention Center Network

Cathy Hammond , J. Smink

RELATED PAPERS

Indus Foundation International Journals UGC Approved

Gjergji D Theodhosi

Delia Duminică

Women, Gender, and Families of Color

Dorothy E Hines

ANITA WADHWA

International Journal on School Disaffection

Mohamad Bilal Ali

Mitsi Giazitzi

Claire Gordon , marina cino

Oirc Journals

Roxana Brișcariu (Toader)

Aimee Whiteside , Amy Garrett Dikkers , Somer Lewis

Educational Sciences: Theory & Practice

Mary Elizabeth Ochea

Soribel Genao

Majid Shirvani

Graham Daniel

International Journal of Educational Development

Anies Al-Hroub

Fumo Almasi

Children & Schools

Cynthia Franklin , Johnny S. Kim

Julie Tagle

Magdalena Balica , Delia Goia , Oana Benga , Oana Iftode

Ciprian Fartusnic

Flavius Marcau

Educational Research Review

Sofie Cabus

Irina Ilisei

Alexander Minnaert , Cuthbert Tukundane , Jacques Zeelen , Peter Kanyandago

Fondazione Hallgarten - Franchetti Centro Studi Villa Montesca

Intellectual and Developmental Disabilities

Courtney Wilt, Mary E. Morningstar

Journal of Applied Technical and Educational Sciences

Csilla Marianna Szabó

Vitomir Jovanovic

Alina Preda

Raphael Cohen-almagor

Daniel Caro

Kate Sirota

IOSR Journal Of Humanities And Social Science (IOSR-JHSS)

Esayas Teshome

Hanna Tomaszewska

Omar Ihssane Nouairi

  • Academia ©2024

A Research Proposal On School Dropout

School Dropout Research Proposal:

Essay Example on Research Questions On School Dropouts

School dropout refers to a student leaving school before completing it. Dropout is quite frequent, because students find various reasons and excuses to leave school. Many students leave because they are bored and want to find a job and earn money. Others leave because they come from low-income families and have to find a job to support their brothers and sisters financially.

Some students drop out because of problems connected with their family, abuse, or drugs. Young people are often involved in street gangs, which also leads them to drop out. Students with disabilities are also likely to drop out, because they feel uncomfortable among other students; moreover, they are common targets of mockery and abuse. Finally, some students drop out because they get married, or because girls become pregnant and decide to raise their child.

Unfortunately, students often do not realize the danger of such a decision, because dropping out has a strongly negative impact on a young person's future. People who have dropped out of school rarely achieve success in life, because nearly every employer requires employees to have a diploma. As a result, such a person is unlikely to get a promising job, will earn little money, will probably live in poverty, and may soon damage his or her health by being unable to afford healthy food and medicine.



School dropout is a serious problem for young people and for their parents, who are disappointed with their children and cannot persuade them to return to school. Students who have investigated the problem of high school dropout and understand it well can prepare a research paper on it if they manage to convince the professor of the importance of the topic. A successful school dropout research proposal has to be interesting, up to date, informative, logical, and brief. To persuade the professor that the topic deserves attention, a student should present the scope of the research, the research approach, the predictions and methods used, and the possible outcomes of the paper.

Writing a research proposal is not easy without experience and writing skills. Someone who does not know how to prepare a good proposal can look at a free example research proposal on high school dropout found on the web. By reading a good free sample research proposal on school dropout prepared by an experienced writer, one can learn the rules of composition, formatting, and analysis.


Cite this page

A Research Proposal On School Dropout. (2019, Nov 27). Retrieved from https://paperap.com/paper-on-research-proposal-on-school-dropout/


  • Solutions To School Dropout Pages: 3 (860 words)
  • Why Students Dropout Of School Essay Pages: 2 (364 words)
  • Education & US High School Dropout Issue Pages: 6 (1638 words)
  • College Dropout Samples Pages: 4 (1052 words)
  • The Effect of Dropout Rates on Minorities Pages: 4 (932 words)
  • A Proposal on Reducing School Stress for Secondary School Students Pages: 4 (1132 words)
  • Sample Proposal For School Uniform Pages: 2 (558 words)
  • Organizational Change Research Proposal Pages: 4 (975 words)
  • Research Proposal: Global Financial Crisis Pages: 14 (4051 words)
  • Research Proposal on Organizational Development Pages: 2 (526 words)

A Research Proposal On School Dropout

Free Research Paper Samples, Research Proposal Examples and Tips | UsefulResearchPapers.com

Research proposal on school dropout.

Posted June 10, 2013 by UsefulResearchPapers in Research Proposals


Similar Posts:

  • Research Paper on School Dropout
  • Research Proposal on Academic Performance
  • Research Proposal on Work Life Balance

Copyright © 2023 | WordPress Theme by MH Themes


A methodology to design, develop, and evaluate machine learning models for predicting dropout in school systems: the case of Chile

  • Published: 23 January 2023
  • Volume 28 , pages 10103–10149, ( 2023 )


  • Patricio Rodríguez   ORCID: orcid.org/0000-0002-7361-5132 1 ,
  • Alexis Villanueva 2 ,
  • Lioubov Dombrovskaia 3 &
  • Juan Pablo Valenzuela 1  


School dropout is a structural problem that permanently penalizes students and society in areas such as low-qualification jobs, higher poverty levels, lower life expectancy, lower pensions, and a higher economic burden for governments. Given these serious consequences and the surge of the problem due to the COVID-19 pandemic, in this paper we propose a methodology to design, develop, and evaluate a machine learning model for predicting dropout in school systems. In this methodology, we introduce the steps necessary to develop a robust model that estimates each student's individual risk of dropping out of school. As an advance over previous research, this proposal focuses on analyzing students' individual trajectories, incorporating the student's situation at school, family, and other levels, as well as changes and accumulation of events, to predict dropout. Following the methodology, we create a model for the Chilean case based mostly on administrative data from the educational system and on known factors associated with school dropout. Our results are better than those of previous research with a relevant sample size, with a predictive capability 20% higher for actual dropout cases. Also, in contrast to previous work, including non-individual dimensions makes a substantive contribution to predicting school leaving. We also illustrate applications of the model for the Chilean case to support public policy decision making, such as profiling schools for qualitative studies of pedagogic practices, profiling students' dropout trajectories, and simulating scenarios.

Similar content being viewed by others


The public–private debate: school sector differences in academic achievement from Year 3 to Year 9?

Sally A. Larsen, Alexander Q. Forbes, … William L. Coventry


Educational data mining: prediction of students' academic performance using machine learning algorithms

Mustafa Yağcı


Private funding in Australian public schools: a problem of equity

Greg Thompson, Anna Hogan & Mark Rahimi


1 Introduction

School failure has a central place in educational systems due to its enduring effects on students. It occurs when the school system fails to ensure that students reach certain levels of schooling, and students instead experience grade repetition and temporary or permanent dropout (OECD, 2020). The result is adults with low qualifications or students who do not complete secondary schooling.

The understanding of school failure has shifted from a view that treats dropout as a problem of the students themselves, attributing responsibility to them, to one that sees it as the expression of a systemic problem for which the school system and society are also responsible (OECD, 2010).

1.1 Impact of school failure on students

The evidence shows that school dropout permanently penalizes students and society as a whole in areas such as:

Low-qualification jobs, fewer skills to face the labor market, lower productivity (Gil et al., 2019; Lee-St John et al., 2018; Sahin et al., 2016), and higher unemployment (Lee-St John et al., 2018; Sahin et al., 2016).

Lower income (Lee-St John et al., 2018), higher poverty levels, and lower life expectancy (Sahin et al., 2016).

Lower pensions (Dussaillant, 2017) and a higher economic burden on the State for social protection (Höfter, 2006; Levin et al., 2012).

Higher crime rates (Lee-St John et al., 2018; Sahin et al., 2016) and lower social cohesion and citizen participation (Sahin et al., 2016).

Lower economic growth and, in social terms, lower tax payments (Gil et al., 2019; Lee-St John et al., 2018).

1.2 Dropping out in the world

School failure is a structural problem in most societies. Across OECD countries, an average of 27% of adults aged 25 to 64 have lower secondary education as their highest level of attainment, while Chile is above this average at 35%. Several other countries, such as Colombia, Mexico, and Spain, report even higher shares (OECD, 2020), as shown in Fig. 1.

Figure 1. Lower secondary education as the highest level attained by adults aged 25–64, showing differences by gender. Own elaboration based on OECD (2020) data.

Additionally, due to the COVID-19 pandemic, about 24 million learners, from pre-primary to university level, are at risk of not returning to school following the education disruption (UNESCO, 2020). Societies must therefore proactively address all the drivers of educational exclusion to strengthen the resilience of education systems in the face of this crisis (UNESCO, 2020).

1.3 Risk factors for school dropout

The causes of dropout are associated with the students, their family, the school, the educational system, and elements of the context or social environment in which students live (Boniolo & Najmias, 2018; Weybright et al., 2017). Since dropout is a gradual and cumulative process, some indicators warn of this risk of disengagement even early in the school trajectory (Boniolo & Najmias, 2018). These risk factors can be classified into the following levels:

Individual level: grade repetition and being over-age for one's grade (Boniolo & Najmias, 2018; Hirakawa & Taniguchi, 2021), school attendance (Hirakawa & Taniguchi, 2021; Sahin et al., 2016), academic performance (Hirakawa & Taniguchi, 2021), and specific learning needs (Gil et al., 2019). Socioemotional factors are also included, such as attitudes towards learning (Hirakawa & Taniguchi, 2021; Sahin et al., 2016; Zaff et al., 2017), non-academic problematic behaviors (Weybright et al., 2017), and school mobility (Sahin et al., 2016), as are sociodemographic factors such as gender, ethnicity, and nationality (Hirakawa & Taniguchi, 2021; Lee-St John et al., 2018).

Family level: such as socioeconomic status and parental involvement (Adelman et al., 2018 ; Boniolo & Najmias, 2018 ; Lee-St John et al., 2018 ; Sahin et al., 2016 ).

School level: its characteristics, socioeconomic and sociocultural composition (Hirakawa & Taniguchi, 2021 ), its resources (Dussaillant, 2017 ; Ecker-Lyster & Niileksela, 2016 ), the relationship between students and teachers (Gil et al., 2019 ) and participation in school activities (Gil et al., 2019 ).

Extra-school level: community factors, e.g., the geographic location of families' residences and the condition of their housing, access to playgrounds and green areas, or "urbanity" (Zaff et al., 2017), and having a network of high-achieving and aspirational peers (Hirakawa & Taniguchi, 2021). This level also includes contextual factors, understood as potential "pull factors" that encourage early entry into work (Kattan & Székely, 2017).

1.4 Predictive models for school dropout based on machine learning

Machine learning is a discipline that uses algorithms to automate tasks such as classification and regression. Algorithms learn from known datasets (the training sample) to estimate the true value of a target variable from predictor variables. To evaluate performance, the predictions are compared with the real values in out-of-sample data (the test or validation sample) (Sorensen, 2019).

A meta-analysis of academic literature and case studies from 2013 to 2017 on machine learning applications for predicting dropout found that algorithms such as neural networks and decision trees are mainly used, treating dropout prediction as a binary classification of the dropout/non-dropout dichotomy (Mduma et al., 2019).

For predicting school dropout, researchers have chosen algorithms from the decision tree family, such as CART (Jena & Dehuri, 2020), and decision tree ensembles (Bentéjac et al., 2021). Sorensen (2019) built a decision tree model that uses records from students' last year of primary education to estimate dropout at the secondary level, correctly predicting 63.3% of actual dropout cases using only academic and individual factors.

Similar dropout prediction models have been used to develop Early Warning Systems (EWS). These systems allow decision makers to identify students at risk of dropping out in time, respond to these alerts, and eventually help potential dropouts continue their learning at different levels (Lee & Chung, 2019).

The main difficulty in large-scale dropout prediction is the severe imbalance of the phenomenon: dropouts are a small minority of all students (Lee & Chung, 2019). It is therefore necessary to apply corrections, choose adequate performance evaluation metrics, and select a machine learning algorithm flexible enough to reduce overfitting (Lee & Chung, 2019; Sansone, 2019).
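Why imbalance makes plain accuracy misleading can be shown with a small Python sketch. The cohort is hypothetical (1,000 students with an assumed 3% dropout rate, not figures from the Chilean data): a trivial model that predicts "no dropout" for everyone scores 97% accuracy while detecting none of the actual dropouts.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def positive_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of actual dropouts (label 1) that the model flags as 1."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Hypothetical cohort: 30 dropouts (label 1) among 1,000 students.
y_true = [1] * 30 + [0] * 970
# Trivial majority-class "model": predicts that nobody drops out.
y_pred = [0] * len(y_true)

print(accuracy(y_true, y_pred))         # 0.97
print(positive_recall(y_true, y_pred))  # 0.0
```

This is the reason the methodology below evaluates models with class-wise recall and related metrics rather than overall accuracy.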

1.5 Purpose and structure of this article

Given the serious consequences and impact of school dropout and the surge of the problem due to school closures during the COVID-19 pandemic (Khan & Ahmed, 2021; Pereira de Souza et al., 2020), our objective is to develop a methodology to design, develop, and evaluate a machine learning model for predicting dropout in school systems. The aim of this methodology is to support and guide model development by practitioners and policy makers, especially in Latin American and African countries, where student dropout is higher than in other regions (UNESCO, 2020), so that they can implement national or subnational Early Warning Systems (EWS) that identify the students at highest risk of abandoning their studies.

This methodology sets out the steps needed to develop a robust model that estimates each student's individual risk of dropping out of school, and to generate applications that support public policy decision making. As an advance over previous research, this proposal focuses on analyzing students' individual trajectories, incorporating the student's situation at multiple levels (school, family, among others), as well as changes and accumulation of events, to predict dropout. In this way, we shift the computational burden from the machine learning model to the calculation of trajectories, which is a one-time effort compared with repeated model training. Since machine learning models are less transparent (Sorensen, 2019), in this paper we also provide a reliable way to explain the results and how they depend on context.

We develop a model for the Chilean educational system to illustrate a practical case, which is relevant for three reasons. First, reducing school dropout has been a policy priority for the last decade; second, data quality permits sophisticated analysis with a machine learning approach; and finally, Chile is a middle-income country, so this experience could be useful for similar countries or for less developed ones.

This paper is structured as follows. In Section 2, we present the methodology, defining the dimensions of robustness and describing each step to develop a predictive model. In Section 3, we address the model development for the Chilean education system and present the main results. In Section 4, we present public policy applications. Finally, Section 5 discusses the implications of this methodology.

2 A methodology for predicting school dropout using machine learning

The aim of this methodology is to produce a robust model to estimate each student's individual risk of dropping out of school, answering the research question stated in Section 1.5. A robust model is one that fulfills the following criteria (Studer et al., 2021):

Has good overall performance on the chosen metrics, allowing practical use in the context of application.

Is stable: its performance does not depend on assumptions, imputed data, or the way training and test samples are created, and it generalizes well, avoiding underfitting and overfitting.

Is computationally efficient: it has reasonable computation times for training and prediction, depending on the context of application.

Is easy to maintain: it requires the minimum number of variables, so that data are easy to obtain and every case is straightforward to create for training and prediction.

Its explanations are consistent with the domain of the model: the importance of the variables and the variance they explain are consistent with the literature on the topic.

The methodology comprises the steps shown in Fig. 2.

Figure 2. Steps of the proposed methodology for designing, developing, and evaluating a machine learning model for predicting dropout. Each step answers specific questions about model robustness and produces an outcome for the next phase.

In the following subsections, we describe each step.

2.1 Step 1: Creating student trajectories

We define the objective of the model as predicting the first time a student leaves school (regular dropout), in order to prevent the consequences described in Section 1.1. Every case must therefore be encoded to train the algorithm. Taking into account the risk factors identified previously (Section 1.3), we propose encoding each student's history in the school system over a continuous range of years as a single data vector. This is because each trajectory captures the student's situation at multiple levels (school, family, among others), and changes and accumulation of events are relevant to predicting dropout (Kattan & Székely, 2017). In this way, both dropout risk factors and protective factors can be clearly identified across the 12-year educational cycle.
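One way to encode a trajectory as a single data vector is sketched below. The feature names (`grade`, `attendance`, `gpa`, `school_ses`), the fill value for years without a record, and the helper `encode_trajectory` are all hypothetical; the actual variables depend on the data available in each school system.

```python
from typing import Dict, List, Optional

# Hypothetical per-year variables; a real model would use whatever
# individual, family, school and extra-school data the system provides.
FEATURES = ["grade", "attendance", "gpa", "school_ses"]


def encode_trajectory(
    history: Dict[int, Dict[str, float]],
    start_year: int,
    end_year: int,
    fill: float = -1.0,
) -> List[float]:
    """Concatenate a student's yearly records into one fixed-length vector.

    Years without a record are filled with `fill`, so every student maps to
    len(FEATURES) * (end_year - start_year + 1) values in the same order.
    """
    vector: List[float] = []
    for year in range(start_year, end_year + 1):
        record: Optional[Dict[str, float]] = history.get(year)
        for name in FEATURES:
            vector.append(record[name] if record is not None else fill)
    return vector


history = {
    2004: {"grade": 1, "attendance": 0.95, "gpa": 6.1, "school_ses": 2},
    2005: {"grade": 2, "attendance": 0.88, "gpa": 5.4, "school_ses": 2},
}
vec = encode_trajectory(history, 2004, 2006)  # 2006 has no record yet
```

A fixed position for each (year, variable) pair lets changes across years (for example, a drop in attendance) appear as differences between neighboring blocks of the same vector.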

Every case should be labeled as dropout (1, positive class) or not (0, negative class) so that binary classification can be used. The data show whether a student enrolled in a given year \(i\) is not enrolled in any school in the next year \(i+1\); in that case, we label the student as a dropout in year \(i\).
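The labeling rule can be sketched as a small function. It adds two assumptions not stated in the text: graduates are not counted as dropouts, and the last observed year cannot be labeled because year \(i+1\) is not yet in the data. The name `first_dropout_year` is hypothetical.

```python
from typing import Optional, Set


def first_dropout_year(
    enrolled_years: Set[int],
    last_observed_year: int,
    graduation_year: Optional[int] = None,
) -> Optional[int]:
    """Return the first year i in which the student is enrolled but is not
    enrolled in any school in year i + 1, or None if that never happens."""
    for year in sorted(enrolled_years):
        if graduation_year is not None and year >= graduation_year:
            break  # assumption: leaving after graduation is not dropout
        if year + 1 > last_observed_year:
            break  # assumption: year i + 1 unobserved, so no label yet
        if year + 1 not in enrolled_years:
            return year
    return None


def label(enrolled_years: Set[int], last_observed_year: int,
          graduation_year: Optional[int] = None) -> int:
    """1 (positive class) if the student ever drops out, else 0."""
    year = first_dropout_year(enrolled_years, last_observed_year, graduation_year)
    return 1 if year is not None else 0
```

For example, a student enrolled from 2004 to 2009 with data observed until 2016 gets a dropout year of 2009, while a student with a gap in 2007 is labeled at 2006, the first time they leave.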

Several datasets should be considered to incorporate dropout factors at the individual, family, school, and extra-school levels. This will facilitate explaining the model in step 6. The number of variables associated with these factors will depend on the availability and reliability of data in the school system, which is the most important challenge to face at the outset.

To reduce errors, data should be carefully cleaned. If there are several data sources, we should perform several consistency analyses to ensure data reliability, e.g., consistency of date of birth, sex, and school enrollment in each year of the period analyzed. If some data are missing and must be imputed (e.g., results of surveys on parents' income and education used to create a socioeconomic status variable), we need to analyze the impact of the chosen imputation methods on the results in step 4.
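A minimal sketch of two such consistency checks follows, assuming each source is a dictionary keyed by a hypothetical student identifier and that enrollments come as `(student_id, year, school_id)` rows. Flagged records are candidates for review rather than certain errors, since a mid-year school transfer can legitimately produce two enrollments in one year.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: student_id -> (birth_date, sex).
Demographics = Dict[str, Tuple[str, str]]


def inconsistent_demographics(source_a: Demographics, source_b: Demographics) -> Set[str]:
    """IDs present in both sources whose birth date or sex disagree."""
    shared = source_a.keys() & source_b.keys()
    return {sid for sid in shared if source_a[sid] != source_b[sid]}


def multiple_enrollments(rows: List[Tuple[str, int, str]]) -> Set[Tuple[str, int]]:
    """(student_id, year) pairs recorded in more than one school."""
    schools: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for student_id, year, school_id in rows:
        schools[(student_id, year)].add(school_id)
    return {key for key, ids in schools.items() if len(ids) > 1}


registry = {"s1": ("2000-03-01", "F"), "s2": ("2001-07-15", "M")}
survey = {"s1": ("2000-03-01", "F"), "s2": ("2001-07-16", "M"), "s3": ("2002-01-09", "F")}
rows = [("s1", 2010, "A"), ("s1", 2010, "B"), ("s2", 2010, "A"), ("s1", 2011, "B")]
print(inconsistent_demographics(registry, survey))  # {'s2'}
print(multiple_enrollments(rows))                    # {('s1', 2010)}
```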

2.2 Step 2: Creating training and test samples and choosing performance metrics

For problems involving temporal prediction such as this one, we do not use a traditional sample construction that divides all cases in a fixed proportion, such as 80% for training and 20% for testing. Here, we have dropouts up to year \(t\), and we need to predict whether a given student will leave school in year \(t+1\). Therefore, the training and test samples follow the same temporal logic (Sorensen, 2019).
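The temporal split can be sketched as follows. The `Case` record and `temporal_split` helper are hypothetical; the point is that, unlike a random 80/20 split, training uses only cases up to year \(t\) and testing uses only year \(t+1\), so the model never sees information from the year it must predict.

```python
from typing import List, NamedTuple, Tuple


class Case(NamedTuple):
    student_id: str
    year: int                     # year at which the trajectory is evaluated
    features: Tuple[float, ...]   # encoded trajectory
    label: int                    # 1 = leaves school after this year


def temporal_split(cases: List[Case], t: int) -> Tuple[List[Case], List[Case]]:
    """Train on cases observed up to year t; test on year t + 1 only."""
    train = [c for c in cases if c.year <= t]
    test = [c for c in cases if c.year == t + 1]
    return train, test


cases = [
    Case("a", 2010, (0.1,), 0),
    Case("b", 2011, (0.2,), 1),
    Case("c", 2012, (0.3,), 0),
    Case("d", 2013, (0.4,), 1),
]
train, test = temporal_split(cases, t=2011)
print([c.student_id for c in train], [c.student_id for c in test])  # ['a', 'b'] ['c']
```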

Dropout is a naturally imbalanced phenomenon, since significantly fewer students abandon school than graduate. Thus, specific techniques for training on imbalanced data should be used (Mduma et al., 2019). Two options to deal with the imbalance are undersampling the majority class (non-dropouts) or oversampling the minority class (dropouts).

Since each case encodes a student trajectory, we propose creating the training and test samples as follows. So that the training sample lets the model compare trajectories that lead to regular dropout with those that do not, we create a set of counterfactual trajectories for each student who drops out. For each student who drops out in grade-year \((i, m)\), we generate trajectories from students of the same cohort (Footnote 1) who do not drop out or who drop out at a later grade-year \((i+j, m+n)\) with \(j \ge 0, n \ge 1\). Each of these trajectories is therefore truncated, possibly several times, according to its counterfactual similarity (evaluated by year and grade reached) to a dropout case (see Fig. 3).
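The construction of counterfactual trajectories can be sketched as follows, reading grade-year \((i, m)\) as grade \(i\) in year \(m\). The `Student` record and `counterfactual_trajectories` helper are hypothetical, and the sketch assumes (beyond the text) that a peer must be enrolled in year \(m\) to be truncated there; any further matching on the grade reached would be an extra filter.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Student:
    student_id: str
    cohort: int                                 # year the student entered school
    grades: Dict[int, int]                      # year -> grade enrolled
    dropout: Optional[Tuple[int, int]] = None   # (grade i, year m) of first dropout


def counterfactual_trajectories(
    dropout_case: Student, peers: List[Student]
) -> List[Tuple[str, Dict[int, int]]]:
    """Truncate same-cohort peers' trajectories at the dropout's year m.

    Peers qualify if they never drop out, or drop out at grade i + j and
    year m + n with j >= 0 and n >= 1. Each result is a label-0 example.
    """
    if dropout_case.dropout is None:
        raise ValueError("dropout_case must have a dropout (grade, year)")
    grade_i, year_m = dropout_case.dropout
    result = []
    for peer in peers:
        if peer.student_id == dropout_case.student_id or peer.cohort != dropout_case.cohort:
            continue
        if peer.dropout is not None:
            peer_grade, peer_year = peer.dropout
            if peer_grade < grade_i or peer_year < year_m + 1:
                continue  # drops out earlier or at a lower grade
        if year_m not in peer.grades:
            continue  # assumption: the peer must be enrolled in year m
        truncated = {y: g for y, g in peer.grades.items() if y <= year_m}
        result.append((peer.student_id, truncated))
    return result


# The Fig. 3 scenario: A starts in 2004 and drops out in 6th grade in 2009.
a = Student("A", 2004, {2004 + k: 1 + k for k in range(6)}, dropout=(6, 2009))
b = Student("B", 2004, {2004 + k: 1 + k for k in range(12)})  # graduates
c = Student("C", 2004, {2004 + k: 1 + k for k in range(8)}, dropout=(8, 2011))
e = Student("E", 2004, {2004 + k: 1 + k for k in range(4)}, dropout=(4, 2007))
negatives = counterfactual_trajectories(a, [a, b, c, e])
print([sid for sid, _ in negatives])  # ['B', 'C']
```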

Figure 3. Example of two counterfactual trajectories for student A, who begins primary education in 2004 and drops out in 6th grade in 2009. Since students B and C (who also began school in 2004) either graduate or drop out at a higher grade and later year, their counterfactual trajectories are truncated at the same grade and year as student A's.

Subsequently, we will create a training sample up to year \(t\) that includes all the dropouts in year \(m\) (\(m\le t\)) plus a random undersample of all counterfactual trajectories, yielding a sample in which dropout cases have a higher proportion than the natural prevalence of the phenomenon, e.g., 30%, 40%, or 50%. Additionally, stratification criteria based on variables such as grade, sex, socioeconomic status, or school category can be used to subsample counterfactual trajectories and ensure their representativeness. The impact of these stratification procedures on the results should also be tested in step 4.
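The random stratified undersampling of counterfactual trajectories can be sketched as below, assuming trajectories are dicts and the stratum is given by a caller-supplied key function (e.g., sex, grade, or school category); the function and argument names are hypothetical. Per-stratum rounding means the number of negatives drawn can differ slightly from the exact target.

```python
import random
from collections import defaultdict


def undersample(positives, negatives, target_share, strata_key, seed=0):
    """Random stratified undersampling of counterfactual (negative) trajectories.

    target_share: desired proportion of dropouts in the training sample (e.g. 0.5).
    strata_key:   function mapping a trajectory to its stratum (e.g. its sex).
    Negatives are drawn from each stratum in proportion to its size in the pool.
    """
    rng = random.Random(seed)
    n_neg = round(len(positives) * (1 - target_share) / target_share)
    by_stratum = defaultdict(list)
    for tr in negatives:
        by_stratum[strata_key(tr)].append(tr)
    sample = []
    for members in by_stratum.values():
        quota = round(n_neg * len(members) / len(negatives))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return positives + sample


positives = [{"sex": "F", "dropout": 1} for _ in range(100)]
negatives = ([{"sex": "F", "dropout": 0} for _ in range(600)]
             + [{"sex": "M", "dropout": 0} for _ in range(400)])
balanced = undersample(positives, negatives, 0.5, lambda tr: tr["sex"])
print(len(balanced))  # 200
```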

For the test sample, all trajectories of students who should be in school in year \(t+1\) are included to measure the predictive performance of the model. The test sample remains imbalanced.

Models trained on imbalanced data should also be evaluated with appropriate performance metrics. We consider the following performance metrics for binary classification:

Recall: the hit rate of a class with respect to the total number of real cases belonging to that class. The false negative rate is 1 − recall. A high recall for the dropout class minimizes false negatives, ensuring that students who could potentially drop out are detected.

Precision: the hit rate of a class with respect to the total number of predictions for that class. 1 − precision is the false discovery rate (the share of cases predicted in the class that are wrong); the false positive rate is 1 − specificity.

Sensitivity: recall of the positive class in a binary classifier.

Specificity: recall of the negative class in a binary classifier.

F1 score: harmonic mean of the precision and the recall of a class, in this case, the positive one.

To balance the true positive and true negative rates, we use the geometric mean of sensitivity and specificity (GM score) to measure performance on the test sample (Márquez-Vera et al., 2016). We also consider the recall and precision of both classes, and the F1 score of the positive class (Mduma et al., 2019).
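The metrics above follow directly from the confusion matrix. The sketch below (function name hypothetical) computes them for the dropout class (1) and the non-dropout class (0), including the GM score as the geometric mean of sensitivity and specificity.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Per-class metrics for a binary classifier (dropout = positive class 1)."""
    sensitivity = tp / (tp + fn)   # recall of class 1
    specificity = tn / (tn + fp)   # recall of class 0
    precision_1 = tp / (tp + fp)
    precision_0 = tn / (tn + fn)
    f1_1 = 2 * precision_1 * sensitivity / (precision_1 + sensitivity)
    gm = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision_1": precision_1,
        "precision_0": precision_0,
        "f1_1": f1_1,
        "gm": gm,
    }


m = binary_metrics(tp=80, fp=100, tn=900, fn=20)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.8, 'specificity': 0.9, 'precision_1': 0.444, 'precision_0': 0.978, 'f1_1': 0.571, 'gm': 0.849}
```

With 100 dropouts and 1,000 non-dropouts, a specificity of 0.9 still produces 100 false positives, so the precision of the dropout class falls to 0.444 while recall is 0.8. The GM score alone does not reveal this, which is why precision and the F1 score of the positive class are reported alongside it.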

2.3 Step 3: Selecting predictive algorithms and training the model

In the third step, we choose an algorithm to train the model. Several algorithms can be used to create the model, such as decision trees and their ensembles, support vector machines (SVM), and neural networks, among others (Şara et al., 2015). However, international experience in applying machine learning to the prediction of school dropout strongly suggests the use of decision tree ensemble algorithms, since:

They are better suited to deal with both continuous and categorical variables (Jena & Dehuri, 2020 ).

They have shown a robust performance in exercises of a similar nature (Lee & Chung, 2019 ; Sansone, 2019 ; Sorensen, 2019 ).

Ensemble decision trees use strategies to avoid overfitting (Bentéjac et al., 2021 ).

In this regard, several gradient-boosted decision tree ensembles are available, such as eXtreme Gradient Boosting (XGBoost) (Chen & Guestrin, 2016), Light Gradient Boosting Machine (LightGBM) (Ke et al., 2017), and Categorical Boosting (CatBoost) (Prokhorenkova et al., 2019).
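To show the idea these libraries share, the sketch below implements plain first-order gradient boosting for binary log-loss with one-split trees (stumps) as weak learners. It is a simplified illustration, not XGBoost, LightGBM, or CatBoost: those add, among other things, second-order leaf values and regularization (XGBoost), histogram-based split finding and leaf-wise growth (LightGBM), or ordered boosting for categorical features (CatBoost). The toy features (attendance, repetitions) and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, targets):
    """Least-squares regression stump: the single (feature, threshold) split
    whose two leaf means best fit `targets`."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, targets) if row[f] <= thr]
            right = [t for row, t in zip(X, targets) if row[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lv, rv)
    _, f, thr, lv, rv = best
    return lambda row: lv if row[f] <= thr else rv


def gradient_boost(X, y, n_rounds=50, learning_rate=0.3):
    """First-order gradient boosting for binary log-loss with stumps."""
    p = sum(y) / len(y)
    base = math.log(p / (1 - p))                  # start from the prior log-odds
    stumps, margins = [], [base] * len(y)
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the margin.
        residuals = [yi - sigmoid(m) for yi, m in zip(y, margins)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        margins = [m + learning_rate * stump(row) for m, row in zip(margins, X)]
    return lambda row: sigmoid(base + learning_rate * sum(s(row) for s in stumps))


# Toy rows: [attendance in last cycle, number of repetitions]; 1 = dropout.
X = [[0.95, 0], [0.90, 0], [0.85, 1], [0.92, 0], [0.60, 2], [0.55, 1], [0.50, 3], [0.58, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = gradient_boost(X, y)
print(round(predict([0.52, 2]), 2), round(predict([0.93, 0]), 2))
```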

2.4 Step 4: Results and sensitivity analysis

The performance obtained on the test sample should be carefully analyzed to rule out underfitting or overfitting. Overfitting occurs when performance is very good on the training sample but poor on the test sample, and underfitting when performance is poor on both samples (Dos Santos et al., 2009).

In previous steps, several imputation methods were applied and assumptions were made, and the quality of the obtained trajectories may vary. Other decisions include how stratified random undersampling was used to create the training sample. The impact of these assumptions on the stability of the results should be tested.

Also, the model could perform better in some contexts, for example at some levels or in some categories of schools (e.g., public vs. private, urban vs. rural). These contexts can reveal the limitations of the model or where it can be used with more confidence.

2.5 Step 5: Model improvement

As the outcome of step 4, we obtain an initial model. If this model performs well, the question that arises is: can we refine it so that it is easier to maintain and computationally faster to train and to predict? Producing a model that is easy to maintain means reducing the number of variables involved, finding a subset that enables predictions at the same level of the performance metric. Fewer variables both reduce the effort of creating the trajectories for training and simplify obtaining the data for prediction.

There are at least three algorithms to determine the relevance of each variable in estimating the results and to discard irrelevant ones:

Naïve Recursive Feature Elimination (RFE): eliminates variables recursively until the minimum number that maximizes the performance of the model (given an objective function) is obtained (Misra & Yadav, 2020 ). In this case, we propose to use the GM score as an objective function (see Section  2.2 ).

Boruta: evaluates the importance of each variable with respect to a permuted version of it to determine its relevance (Kursa et al., 2010 ).

ShapRFE: a modified version of RFE that identifies the importance of each variable more robustly using SHAP (see Section 2.6) (Lundberg et al., 2019; Sharma et al., 2020).
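A minimal sketch of naïve RFE with a pluggable objective such as the GM score. `importance_fn` and `score_fn` are hypothetical callbacks standing in for training a model on a subset and evaluating it, and the toy scores in the example are invented for illustration. ShapRFE keeps a similar elimination loop but ranks variables by SHAP-based importance, while Boruta instead compares each variable against permuted "shadow" copies.

```python
def naive_rfe(features, importance_fn, score_fn, tol=0.0):
    """Recursive feature elimination driven by model importances.

    importance_fn(subset) -> {feature: importance} of a model trained on `subset`
    score_fn(subset)      -> objective on validation data, e.g. the GM score
    At each step the least important feature is removed. Returns the smallest
    subset whose score is within `tol` of the best score observed.
    """
    current = list(features)
    history = []                      # (score, subset), largest subset first
    while current:
        history.append((score_fn(current), list(current)))
        if len(current) == 1:
            break
        importances = importance_fn(current)
        current.remove(min(current, key=lambda f: importances[f]))
    best = max(score for score, _ in history)
    # The last qualifying entry is the smallest subset reaching the best score.
    return [subset for score, subset in history if score >= best - tol][-1]


importance = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}


def importance_fn(subset):
    return {f: importance[f] for f in subset}


def score_fn(subset):
    # Toy objective: 'a' and 'b' carry the signal; every variable costs a little.
    base = 0.8 if {"a", "b"} <= set(subset) else 0.6
    return base - 0.001 * len(subset)


print(naive_rfe(["a", "b", "c", "d"], importance_fn, score_fn))  # ['a', 'b']
```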

In addition, each machine learning algorithm has some parameters (hyperparameters) that can affect both the performance results and the computation time needed to train the model. There are hyperparameter optimization algorithms based on brute force, Bayesian statistics, and genetic algorithms, among others. In the present work, we consider two of them:

Tree-structured Parzen Estimator (TPE): a sequential, partly random optimization algorithm that improves performance by analyzing the history of hyperparameters already evaluated, modeling it with Parzen estimators to propose configurations that minimize a loss function (Bergstra et al., 2011).

Population Based Training (PBT): an evolutionary mechanism in which generations of hyperparameter configurations are created. PBT evaluates their performance and selects the best ones, creating a new generation of configurations with changes with respect to the previous one, and repeats the process until a given number of iterations is reached or no further improvement appears (Jaderberg et al., 2017).
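A toy sketch of the exploit/explore loop behind PBT, over a single hyperparameter with a made-up objective that peaks at a learning rate of 0.1. The function names, the perturbation factors (0.8 and 1.2), and the population are assumptions. Real PBT (Jaderberg et al., 2017) trains the population's models in parallel and copies their weights along with the hyperparameters, which this sketch omits.

```python
import math
import random


def pbt_search(objective, initial, generations=20, frac=0.25, seed=0):
    """Toy exploit/explore loop over one hyperparameter (maximizes `objective`).

    Each generation, the worst `frac` of the population is replaced by copies of
    randomly chosen members of the best `frac` (exploit), each multiplied by 0.8
    or 1.2 (explore). The best members always survive, so the best score never drops.
    """
    rng = random.Random(seed)
    population = list(initial)
    for _ in range(generations):
        ranked = sorted(population, key=objective, reverse=True)
        k = max(1, int(len(ranked) * frac))
        top = ranked[:k]
        newcomers = [rng.choice(top) * rng.choice([0.8, 1.2]) for _ in range(k)]
        population = ranked[:-k] + newcomers
    return max(population, key=objective)


def objective(lr):
    # Hypothetical validation score that peaks at a learning rate of 0.1.
    return -(math.log10(lr) + 1) ** 2


best = pbt_search(objective, [0.001, 0.003, 0.01, 0.03, 0.3, 0.5, 1.0, 3.0])
print(round(best, 3))
```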

A last option for model improvement is to use a corrective model that decreases the false positive rate after variable reduction and hyperparameter optimization. Reducing the false positive rate decreases the number of students wrongly identified as possible leavers. For this, a second model is trained with the true positive and false positive results of previous years. Thus, to correct the original model that predicts dropping out in \(t+1\), the corrective model is trained up to year \(t-1\). The two models are applied successively, and the overall result is positive only if both agree that a student is a positive case, as shown in Fig. 4.

figure 4

Improvement of the optimized model created in step 4, using a second corrective model
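The two ingredients of the corrective step, building its training labels and combining the two models, can be sketched as follows. The function names, the 0/1 list layout, and the corrective model's output in the example are hypothetical, and the year bookkeeping (training the corrective model up to \(t-1\)) is not shown.

```python
def corrective_training_set(rows, base_flags, dropped_out):
    """Training data for the corrective model: only cases the base model flagged
    as positive, labelled 1 for true positives and 0 for false positives."""
    return [(row, actual) for row, flag, actual in zip(rows, base_flags, dropped_out) if flag == 1]


def combine(base_flags, corrective_flags):
    """Successive application: a student is positive only if both models agree."""
    return [int(b == 1 and c == 1) for b, c in zip(base_flags, corrective_flags)]


def false_positives(flags, actual):
    return sum(1 for f, a in zip(flags, actual) if f == 1 and a == 0)


actual = [1, 0, 0, 1, 0, 1]
base = [1, 1, 1, 1, 0, 0]
corrective = [1, 0, 1, 1, 0, 0]   # hypothetical output of the corrective model
final = combine(base, corrective)
print(final)                                                          # [1, 0, 1, 1, 0, 0]
print(false_positives(base, actual), false_positives(final, actual))  # 2 1
```

Because the combination is a logical AND, the corrective model can only remove positives: false positives can drop, but a true positive it rejects becomes a false negative, which is the small loss in class 1 recall that this design accepts.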

2.6 Step 6: Explaining the model

Public accountability requires that non-discrimination and transparency criteria be fulfilled in decision making where automatic systems are involved (Buenadicha et al., 2019). Algorithmic discrimination refers to cases in which discrimination occurring in the real world is reproduced in data environments, e.g., by gender or ethnicity. Algorithmic transparency refers to the data that algorithms collect, how they manage it, how they analyze it, with whom they share it, what decisions are made based on it, and based on which factors.

Therefore, after the best model is obtained, an explanatory model should be created to understand how the model makes its predictions. We propose using SHAP (SHapley Additive exPlanations) (Lundberg et al., 2019), because this method allows us to estimate the contribution of each variable to individual predictions in a robust, consistent, and locally accurate way (Lundberg et al., 2018). It uses an optimized procedure for tree-based algorithms that allows interpreting and debugging the resulting model (Sharma et al., 2020; Yoshida, 2020).

Thus, the output of this explanatory model is the predicted probability decomposed into the specific contribution of each variable: together with a base value (the average prediction), the contributions of all variables for a given student sum to his/her predicted probability of dropping out. Using this method, students and schools can be profiled based on the contribution of each variable to the final dropout probability (Section 4).
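The additivity property can be checked on a toy model by computing exact Shapley values through brute-force enumeration. This is not the TreeSHAP algorithm used for tree ensembles, which computes this kind of attribution efficiently; here absent features are replaced by a single reference point (an assumption, since SHAP averages over background data), and the model and feature names are hypothetical.

```python
import math
from itertools import combinations


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x, by brute-force enumeration.

    Features outside a coalition take their `baseline` value. The cost is
    exponential in the number of features, so this only suits a handful of them.
    """
    n = len(x)

    def value(subset):
        return f([x[k] if k in subset else baseline[k] for k in range(n)])

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def model(z):
    # Hypothetical dropout model: z = [passed_last_grade, repetitions, overage]
    score = -2 + 1.5 * (1 - z[0]) + 0.4 * z[1] + 0.3 * z[1] * z[2]
    return 1 / (1 + math.exp(-score))


x, reference = [0, 2, 1], [1, 0, 0]
phi = shapley_values(model, x, reference)
print([round(p, 3) for p in phi])
# Base value plus all contributions reproduces the prediction:
print(abs(model(reference) + sum(phi) - model(x)) < 1e-9)  # True
```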

3 Applying the proposed methodology to the Chilean education system

For a better understanding of the model development, we first present an overview of the Chilean education system (Section 3.1). In Sections 3.2 to 3.7, we develop a model for the Chilean case.

3.1 Overview of the Chilean education system

In Chile, compulsory education lasts 12 years. Grades 1st to 8th correspond to primary education and grades 9th to 12th to secondary education, organized in three cycles: first cycle (1st–4th grades), second cycle (5th–8th grades), and third cycle (9th–12th grades).

In the early 1980s, Chile implemented a school choice system, introducing a per-student subsidy mechanism (voucher scheme). The per-student subsidy, meant to cover the school's operating costs, is the same for public and private schools. Students can attend the school of their choice without restrictions based on administrative boundaries. This policy was intended to stimulate competition between schools to attract and retain students, leading to improved efficiency and higher-quality educational services (Ladd & Fiske, 2020).

There are mainly three categories of schools Footnote 2: a) public schools, funded by the per-student subsidy paid by the state and run by each of the 345 municipalities; b) private-voucher schools, funded by the per-student subsidy paid by the state and operated by the private sector; and c) private fee-paying schools, financed solely by fees paid by parents and run by the private sector (Ladd & Fiske, 2020).

An ongoing system-wide reform of public school education calls for the de-municipalization of the public-school sector. It creates 70 new Local Education Services (LES) between 2018 and 2025, consolidating the administration of schools formerly under mayoral control (Anderson et al., 2021). These 70 LES report to a new agency responsible for public schools: the Directorate of Public Education.

Figure 5 shows the overall dropout incidence rate per level. The highest rate appears when students transition from primary to secondary education.

figure 5

Global dropout per level in 2019, showing differences by gender. Grades are shown according to the International Standard Classification of Education (UNESCO, 2012). Source: own elaboration based on Chilean Ministry of Education open data

Students can also enroll in adult education, which accepts students over 15 years old into primary and over 17 years old into secondary education. Since they can enroll in more than one grade per year in this modality, it is a de facto alternative for students who would otherwise drop out of regular education. National evidence shows that enrollment in this modality increases at higher grades. In 2019, 57,130 students left regular education for the adult modality. In contrast, only 36,230 students dropped out that same year.

3.2 Step 1: Creating student trajectories

With the available datasets, we created student trajectories as an analyzable artifact. This involves three procedures that produce a single observation summarizing the time-ordered sequence of each student's transit from the first grade of regular primary education to the last reported period: 1) dataset collection, 2) determining student sequences, and 3) summarizing sequences into trajectories.

3.2.1 Dataset collection

In previous works on measuring and predicting school dropout, data were collected from surveys and administrative sources to obtain longitudinal data. For example, Sorensen (2019) and Lee and Chung (2019) used data from administrative and secondary sources to identify and quantify variables associated with students' situations.

We chose to use administrative data obtained from secondary sources collected, organized, and published by the Ministry of Education (MINEDUC) since 2004 on its open data platform and, to a lesser extent, data related to the Education Quality Measurement System (SIMCE Footnote 3) census tests, as well as the parent and student surveys made available by the Agency for Quality in Education (AQE) for research purposes (Table 1).

In these datasets, all students are identified anonymously by masking their National Identification Number (Masked ID or MID). Thus, individual data can be cross-referenced, and we can trace the trajectories of every student.

Using administrative data to assemble student trajectories is a great opportunity to identify trends and patterns that lead to dropout. Still, some limitations need to be considered, mainly the exclusion of certain factors identified as relevant in the literature that are difficult to measure or absent from administrative data, e.g., contextual factors or non-academic problem behaviors.

When we consolidated all these data, we found several inconsistencies across the years, such as implausible birth years, inconsistent data for the same student, gender discrepancies, academic statuses reported without enrollments and vice versa, students skipping grades, students graduating from a non-final grade, and mismatches in the grade reported in the SIMCE/PDSI datasets.

To address these problems, these situations were operationalized to subsequently assess the consistency of the reported history for each student (see Table 2 ).

3.2.2 Determining students’ sequences

We generate sequence tables composed of time-ordered series for each student, in which the student's situation is described with respect to their trajectory: enrollment status and dropout incidence, among others. For each student in the sequence table, we calculate new variables relevant to the model, such as changes of grade, changes of school between years or within the same year, and grade repetitions. The procedure is as follows:

We standardized the available information on enrollment and academic status. These act as the articulating axes of the sequences and allow traceability. We then completed the sequences with data from other sources.

We assigned each student to a cohort from the first available period (2004).

After cohort assignment, the base sequences are created with the enrollment and academic status of the students, matching the enrollment-academic status pairs available for each year and MID. When either the enrollment or the academic status is missing, a placeholder record is created by duplicating the available one and filling in the unavailable columns with missing values.

When there is more than one enrollment or academic status, we apply the criteria in Table 2 to identify the unique enrollment-academic status pair that represents the period within the sequence.

After performing the above procedure, it was possible to trace the sequences of each student in the cohorts from 2004 onward, obtaining sequence tables. However, trajectories could only be created for students entering the first grade of primary education in 2004 or later; therefore, the number of students per year with a feasible trajectory increases each successive year and stabilizes after 12 years, when the students of the 2004 cohort reached their last grade.

Even after 12 years, it is not possible to create the trajectories of all students, because their cohort of origin cannot always be identified, as happens, for example, with foreigners who did not start school in the national system. Nevertheless, a traceability rate of 96.4% is achieved.

3.2.3 Summarizing sequences into student trajectories

Given the amount of data available, we opted for traditional supervised machine learning methods to generate the predictive model rather than the larger-scale alternatives traditionally used in forecasting exercises, such as models based on neural networks, because such models require larger training samples than those available and are less interpretable.

Then, we reduced each student's sequence table to a single observation describing their passage through school education. To this end, we generate a student trajectory by adding variables created from grouping the sequence table, summarizing the student's final situation, their most frequent values in some variables (for instance, the number of public schools attended), and other elements related to risk factors identified in the literature. Socioeconomic status is included among the family-level risk factors; it was calculated as the mean of the standardized declared household income and the maximum standardized parental schooling. A multilevel imputation was performed to deal with the high number of missing cases. Additionally, we included other sequence descriptors that capture relevant milestones of the summarized trajectory, such as the last year or grade reached.
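The socioeconomic status index described above can be sketched as follows. The sketch assumes complete data (the multilevel imputation is not shown), population standard deviations, and that the higher of the two parents' years of schooling is taken before standardizing; the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def ses_index(income, mother_school, father_school):
    """SES = mean of standardized household income and standardized
    maximum parental schooling (assumed order: max first, then standardize)."""
    z_income = zscores(income)
    z_school = zscores([max(m, f) for m, f in zip(mother_school, father_school)])
    return [(a + b) / 2 for a, b in zip(z_income, z_school)]


ses = ses_index(income=[300, 600, 900], mother_school=[8, 12, 16], father_school=[10, 12, 12])
print([round(s, 2) for s in ses])
```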

Since the original raw data contain inconsistencies resulting from the data collection procedures, we created a score to evaluate the quality of the trajectories and to analyze the consequences of including trajectories with lower consistency. We define 15 inconsistency indicators in 3 levels: from the data reported on enrollment and performance (10 indicators, level 1), from datasets provided by MINEDUC (2 indicators, level 2), and from data reported by other sources (3 indicators, level 3) (the complete criteria are available in Table 9 in Appendix A). The consistency score was normalized to mean 0 and standard deviation 1; its distribution per cohort is shown in Fig. 6.

figure 6

Inconsistencies per cohort. Each colored line represents an inconsistency level; levels increase in severity. The consistency of the student trajectories by cohort has gradually improved over the last 10 years, making levels two and three infrequent. Values are on a log10 scale

Each trajectory operationalizes regular dropout, which occurs when a student enrolled in a grade for children and youth in year \(t\) is either enrolled in adult education or out of the school system in year \(t+1\). We also include consistency descriptors to control and evaluate the quality with which the trajectories are calculated, using the procedure described in Section 2.3. Thus, 111 variables were considered and grouped according to their type (Table 3), including the SIMCE and IDPS ones. Complete variable descriptions are available in Appendix B.
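The regular-dropout definition can be sketched as a function over a student's yearly enrollment history. The status codes, the `last_year` cut-off for unobserved years, and the exclusion of students who graduated in year \(t\) are assumptions made for the sketch; the function name is hypothetical.

```python
def dropout_years(history, last_year, graduation_year=None):
    """Years in which a student met the regular-dropout definition.

    history: {year: 'regular' | 'adult'} with hypothetical status codes; a year
    missing from the dict means the student was not enrolled that year.
    A dropout is recorded in year t + 1 when the student was in regular
    (children and youth) education in year t and in adult education or out of
    the system in t + 1. Years after `last_year` are unobserved, and a student
    who graduated in year t is not counted (assumption).
    """
    years = []
    for t, status in sorted(history.items()):
        if status != "regular" or t == graduation_year or t + 1 > last_year:
            continue
        if history.get(t + 1) in ("adult", None):
            years.append(t + 1)
    return years


print(dropout_years({2015: "regular", 2016: "regular", 2017: "adult"}, last_year=2019))  # [2017]
print(dropout_years({2015: "regular", 2017: "regular"}, last_year=2019))                 # [2016, 2018]
```

The second example shows that a student who leaves, returns, and leaves again produces two dropout events, which is why the trajectory variables also count abandonments.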

Using administrative data limits the availability of contextual or family variables compared with the more readily available individual and school data. In the end, we generated 3,847,469 student trajectories.

Using these trajectories, a first visual exploratory analysis reveals differences in dropout performance among schools in the LES territories, by school dependency and total school enrollment (Fig. 7). As Fig. 7 shows, some territories perform worse than the regression predicts, and these should be the focus of public policies.

figure 7

Regression models built based on students’ trajectories by total enrollment in school categories. Several public schools on LES territories underperform. The private-voucher schools in the same territory have better results with greater enrollment

3.3 Step 2: Training and test samples and performance metrics

As explained in Section 2.2, we create the training and test samples with trajectories up to 2018 and 2019, respectively.

The training sample was created with data up to 2018. This sample is imbalanced, since it contains 3,847,469 trajectories, of which 345,874 (8.9%) lead to dropout, an imbalance ratio of 10.12. To deal with this problem, as proposed in Section 2.2, we create counterfactual trajectories for students who drop out. We opted for stratified subsampling using four variables: gender, category of the last school, last year, and registered grade. This reduced the negative class from 26,793,262 counterfactuals to 345,874, equal to the number of dropouts. Therefore, the training sample has 691,748 cases, with an imbalance ratio of 1.

The test sample uses all student trajectories that reached 2019, whether or not they led to dropout in 2020, totaling 2,802,156 trajectories, of which 47,632 (1.7%) lead to dropout, an imbalance ratio of 57.82. There is no intersection between the training and test samples, since their variables were constructed up to different years. We report recall and precision for each class, seeking better performance in the GM and F1 scores. Given the sample sizes, cross-validation is unnecessary.
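The imbalance ratio used here is the number of non-dropout trajectories divided by the number of dropout trajectories. A quick check with the reported counts (the function name `imbalance` is hypothetical):

```python
def imbalance(total, dropouts):
    """Return (share of dropouts, imbalance ratio = non-dropouts / dropouts)."""
    non_dropouts = total - dropouts
    return dropouts / total, non_dropouts / dropouts


train_share, train_ratio = imbalance(3_847_469, 345_874)
test_share, test_ratio = imbalance(2_802_156, 47_632)
balanced_share, balanced_ratio = imbalance(691_748, 345_874)
print(f"training: {train_share:.2%}, ratio {train_ratio:.3f}")
print(f"test:     {test_share:.2%}, ratio {test_ratio:.3f}")
print(f"after undersampling: ratio {balanced_ratio}")
```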

3.4 Step 3: Selecting predictive algorithms

We produced machine learning models using a basic decision tree algorithm as the simplest model and then we also tried 3 decision tree ensemble algorithms with gradient boosting: XGBoost , LightGBM and CatBoost (see Section  2.3 for justification). We trained and tested them using the samples created in the previous section.

3.5 Step 4: Results and Sensitivity analysis

The results of the four algorithms on the test sample, considering the 103 variables without missing data (i.e., excluding SIMCE and PDSI scores), are shown in Table 4.

The performance of the gradient-boosted tree algorithms is superior to that of the classic CART decision tree. LightGBM is slightly superior to CatBoost in GM score: although the CatBoost model achieves better recall for class 1 (i.e., fewer false negatives), it also has a higher false positive rate. Hereafter, the LightGBM model will be referred to as the base model.

3.5.1 Stability of performance with respect to trajectory consistency, training sample creation, and SIMCE and PDSI scores

Several models were also trained considering the internal consistency score of each trajectory. We concluded that all cases must be considered, since less consistent trajectories are also associated with a higher prevalence of dropout, and discarding them affects the final performance of the model.

The stability of the performance was evaluated on 100 different random samples of counterfactual trajectories. The largest variation was only 0.014, for the F1 score, as can be seen in Fig. 8.

figure 8

Distribution and range of model performance results in multiple training samples

Variables related to performance on SIMCE tests and PDSI scores were also considered, but their contribution to the performance of the model was very low compared with the cost of obtaining these datasets and their large amount of missing data.

3.5.2 Performance in different grades, school categories, and sizes

Since the base model is stable with respect to trajectory consistency and the random choice of counterfactuals, and SIMCE and PDSI variables did not introduce significant performance improvements, we finally address the question of how the base model performs in different contexts, defined by combinations of grades and school categories, with its practical use in mind (Table 5).

Performance tends to improve at higher grades (where the natural prevalence of dropping out in Chile is higher, as Fig. 5 shows) and in public schools (as shown in Fig. 7), with the best performance in secondary education in public schools and the worst relative performance in primary education in the private fee-paying sector. Even so, these results are better than those of previous research with a relevant sample size (Lee & Chung, 2019), with a predictive capability 20% higher for actual dropout cases, while also addressing the problem of class imbalance.

If we analyze the classification error of this model by school size, public and private-voucher schools follow the same pattern. Figure 9 shows the results for public schools: false negative rates are minimal for schools of any size, and false positive rates decrease for large schools of 500 students or more.

figure 9

False negative rates (left) and false positive rates (right) by public school size for secondary education and 7th to 12th grades

3.6 Step 5: Model improvement

To determine the relevance of each variable in estimating each student's dropout probability, and because the naïve version of RFE has problems dealing with noise from irrelevant variables, we used two feature selection approaches (see Section 2.5): 1) Boruta followed by naïve RFE, and 2) ShapRFE. Both approaches proved more effective than naïve RFE alone, which discarded only 64 of the 103 original variables, whereas the proposed approaches discarded 83 and 87, respectively. In Table 6, we compare the performance of the three models with 103, 30, and 26 variables.

Both methods allowed us to create simpler and more efficient models while maintaining performance. Table 7 shows the contribution of the variables selected by the two methods, by type (as in Table 3), considering two values: 1) the aggregated contribution, which is the sum of the importance of each variable in the set, and 2) the average contribution, which is the aggregated contribution divided by the number of variables of that type. The contributions per type of the 26 relevant variables alone are depicted in Appendix B.
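The aggregated and average contributions per type can be computed as below. The importance values, the type labels, and the variable codes other than LAST_GRADE_APPRVD and OVRAGE are hypothetical, and importance could be, for example, the mean absolute SHAP value of each variable (an assumption).

```python
from collections import defaultdict


def contribution_by_type(importance, var_type):
    """Return {type: (aggregated, average)}: the sum of the importances of the
    variables of each type, and that sum divided by the number of variables."""
    aggregated = defaultdict(float)
    count = defaultdict(int)
    for var, imp in importance.items():
        aggregated[var_type[var]] += imp
        count[var_type[var]] += 1
    return {t: (aggregated[t], aggregated[t] / count[t]) for t in aggregated}


importance = {"LAST_GRADE_APPRVD": 0.5, "OVRAGE": 0.25, "SCHOOL_X": 0.125, "TRAJ_Y": 0.125}
var_type = {"LAST_GRADE_APPRVD": "individual", "OVRAGE": "individual",
            "SCHOOL_X": "school", "TRAJ_Y": "trajectory"}
print(contribution_by_type(importance, var_type))
# {'individual': (0.75, 0.375), 'school': (0.125, 0.125), 'trajectory': (0.125, 0.125)}
```

The average contribution corrects for types with many variables: a type can lead in aggregated contribution simply because it contributes more variables.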

Individual-level factors made the greatest contribution, consistent with the literature. School factors and trajectory descriptors are also relevant in both approaches. Therefore, in contrast to previous work, the inclusion of non-individual dimensions makes a substantive contribution to the prediction of school dropout.

Since our final model trains in just 16 s on a desktop computer (Table 6) and takes less than 1 s to predict 2.8 million cases, we considered this performance reasonable and did not optimize the hyperparameters of the LightGBM algorithm.

Finally, we generated a corrective model using the procedure described in Section 2.5. To do so, we took the 26 variables and identified the true and false positives up to 2018. False positives were coded as 0 and true positives as 1. The results of the correction are shown in Table 8.

In all grades and school categories, the recall for class 0, the precision for class 1, and all F1 scores improved, at a minimal cost to the recall for class 1. In absolute terms, for secondary education, false positives decreased from 6.1% to 4.9% in public schools and from 6.16% to 5.15% in private-voucher schools. For 7th–12th grades, the reduction was from 6.35% to 5.25% in public schools, and from 6.1% to 5.19% in private-voucher schools.

3.7 Step 6: Explaining the model

The SHAP method decomposes each individual predicted probability into the specific contribution of each variable. Thus, together with the base value, all variable contributions for a given student sum to their predicted probability of dropping out. SHAP values were computed from the initial model without the false positive correction. Figure 10 shows the contribution of each variable of the final model for two cases: one where the model predicts a high probability (0.99) and another where it predicts a low one (0.01).

figure 10

Individual variable contribution for two cases using SHAP values. SHAP values per variable for a student with low dropout probability (0.01) are shown in green , while SHAP values for a student with high dropout probability (0.99) are shown in red

Figure  11 shows the individual contribution of each variable selected for the final model with 26 variables for all 2019 cases.

figure 11

Beeswarm plot for final model. Every point shows the impact of each variable in a dropout prediction per student in the 2019 test sample. The colors denote the value of the variable in its own scale (high values in red, low ones in blue). Absolute mean contributions to predictions are ordered from left (higher) to right (lower). Variable codes are in Appendix B

For example, LAST_GRADE_APPRVD is a binary variable indicating whether a student passed (1) or not (0) their last year at school. Figure 11 shows two things: 1) LAST_GRADE_APPRVD is the most important variable in predicting dropping out, and 2) in all cases, lower LAST_GRADE_APPRVD values (0, shown in blue) have a positive contribution to the dropout probability, while higher values (1, shown in red) have a negative contribution. This analysis can be repeated for each school, allowing us to identify the most important variables for dropout at the local level. For example, the school in Fig. 12a has a 0.5% dropout rate, while the school in Fig. 12b has a 26.6% dropout rate. The variable importance rankings differ between schools, and the variables contribute in different ways.

figure 12

Beeswarm profiles of two schools with different dropout rates. The colors denote the value of the variable in its own scale (high values in red, low ones in blue). The school on the left (a) has a dropout rate of 0.5% while the school on the right (b) has 26.6%. Absolute mean contributions to predictions are ordered from above (higher) to below (lower). Variable codes are in Appendix B

Further implications for public policy are discussed in Section 4.1.

4 Public policy applications

The most straightforward application of this model is developing an EWS. However, as stated in Section 1.5, we can also envision other applications of these models for decision and public policy making: profiling schools for qualitative studies of pedagogic practices, profiling students' dropout trajectories, and simulating scenarios.

4.1 Profiling schools for qualitative studies of pedagogic practices

As stated in Section 2.6, model explanations can guide further qualitative research on pedagogic practices in schools. For example, the variable CL_STUDENT is a binary variable indicating whether a student is Chilean (1) or not (0) (see Appendix B). As the general results for the test sample show (Fig. 11), being a foreign student increases the chance of dropping out. In a school with a low dropout rate (Fig. 12a), whether a student is Chilean makes little difference, since the contribution of the variable to the dropout probability is negative. However, in the school of Fig. 12b, CL_STUDENT is the most important variable, and being foreign has a positive contribution. Therefore, pedagogic practices with foreign students can be investigated further in both schools, and the question that arises is: which pedagogic practices can be replicated (school a) or avoided (school b) in similar contexts?

Additionally, any significant difference between the expected and observed numbers of dropouts at the school or LES level could indicate changes in local school retention policies with better or worse results.

4.2 Profiling students’ dropout trajectories

Second, since the SHAP values of every variable are continuous, we used clustering algorithms to identify typologies of trajectories leading to dropout. We used the SHAP values of the 39,844 true positives calculated for 2019 in a clustering model.

The SHAP values were rescaled to the range between -1 and 1, preserving the direction of their contribution to the predictions while normalizing the different impact levels of the variables. Given the high dimensionality of the data (26 variables), we used UMAP (McInnes et al., 2020) to reduce it to two dimensions. From this, 20 clusters were found using DBSCAN (Ester et al., 1996). The detailed characterization of the clusters, based on the original domain of each variable, can be found in Table 10 in the Appendix.
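One way to rescale SHAP values to [-1, 1] while preserving their sign is to divide each variable's column by its maximum absolute value. The text does not specify the exact method, so this max-abs scaling is an assumption, and the function name and toy matrix are hypothetical.

```python
def max_abs_scale(matrix):
    """Scale each column (one variable's SHAP values) into [-1, 1] by dividing
    by its maximum absolute value; signs, and thus directions, are preserved."""
    n_cols = len(matrix[0])
    # Guard against all-zero columns, which are left unchanged.
    scales = [max(abs(row[j]) for row in matrix) or 1.0 for j in range(n_cols)]
    return [[row[j] / scales[j] for j in range(n_cols)] for row in matrix]


shap_matrix = [[0.2, -0.05], [-0.4, 0.1], [0.1, 0.02]]   # rows = students, columns = variables
print(max_abs_scale(shap_matrix))
```

The scaled matrix would then be embedded in two dimensions with UMAP (for example, `umap.UMAP(n_components=2)` from the umap-learn package) and clustered with scikit-learn's `DBSCAN`; the parameters behind the 20 clusters are not reported in this section.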

There are 3 main categories of trajectories: 1) students who completed and passed their last level (23.5%); 2) students who completed their last level but did not pass (30.4%); and 3) students who did not complete their last level (46.1%). As Fig. 13 shows, within these 3 categories there are also subcategories based on just 5 variables: student is Chilean (CL_STUDNT), PPS beneficiary (PPS_BENFNC), overage (OVRAGE), last grade at school (LAST_GRADE), and number of abandonments in the last cycle (NUM_ABN_LAST_CYCL). Category 1 has 5 clusters, Category 2 has 6 clusters, and Category 3 has 9 clusters.

Fig. 13 Typology of trajectories based on the clustering model. Each box indicates a split on the variable shown; branches to the left lead to lower values and branches to the right to higher values.
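The three top-level categories depend only on whether the last level was completed and approved. The following minimal sketch shows that split. The boolean inputs are hypothetical stand-ins for the administrative status fields, and the finer splits on the five variables of Fig. 13 are not reproduced.

```python
from collections import Counter


def trajectory_category(completed_last_level, approved_last_level):
    """1 = completed and approved, 2 = completed but not approved, 3 = not completed."""
    if not completed_last_level:
        return 3
    return 1 if approved_last_level else 2


def category_shares(trajectories):
    """Percentage of trajectories in each category, rounded to one decimal."""
    counts = Counter(trajectory_category(c, a) for c, a in trajectories)
    total = sum(counts.values())
    return {cat: round(100 * counts[cat] / total, 1) for cat in (1, 2, 3)}


# Hypothetical (completed, approved) pairs for four dropout trajectories.
sample = [(True, True), (True, False), (False, False), (False, False)]
print(category_shares(sample))  # {1: 25.0, 2: 25.0, 3: 50.0}
```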

Some 67% of the students’ trajectories are concentrated in just seven clusters (Table 9): one from category 1 (cluster 6), two from category 2 (clusters 0 and 5) and four from category 3 (clusters 1, 4, 10 and 13). In these clusters, the school categories, grades and sex expected from the incidence of the phenomenon predominate (see Figs. 5 and 7). Their characteristics, from most to fewest trajectories, are as follows, according to the variables in Table 12 in Appendix C:

Cluster 4 (13.0%): Last grade not completed, mainly students in 9th grade, almost only Chileans, with averages: SES of 0.22, attendance in last cycle of 82.13%, z score of −0.99, repetitions of 2.1, school changes of 2.47 and last school effectiveness of 41.96%.

Cluster 5 (11.7%): Last grade completed but not approved, mainly students in 9th and 10th grade, almost only Chileans, with averages: SES of 0.24, attendance in last cycle of 75.7%, z score of −1.64, repetitions of 2.87, school changes of 2.51 and last school effectiveness of 43.42%.

Cluster 1 (11.6%): Last grade not completed, mainly students in 10th and 11th grade, almost only Chileans, with averages: SES of 0.31, attendance in last cycle of 85.37%, z score of −0.63, repetitions of 0.55, school changes of 2.09 and last school effectiveness of 45.57%.

Cluster 0 (9.4%): Last grade completed but not approved, mainly students in 9th grade, almost only Chileans, with averages: SES of 0.29, attendance in last cycle of 74.52%, z score of −1.66, repetitions of 1.56, school changes of 1.91 and last school effectiveness of 46.26%.

Cluster 6 (7.3%): Last grade completed and approved, mainly students in 7th and 8th grade, almost only Chileans, with averages: SES of 0.25, attendance in last cycle of 85.35%, z score of −0.88, repetitions of 2.20, school changes of 2.42 and last school effectiveness of 43.51%.

Cluster 10 (7.1%): Last grade not completed, mainly students in 4th grade, only Chileans, with averages: SES of 0.47, attendance in last cycle of 88.53%, z score of −0.08, repetitions of 0.16, school changes of 0.59 and last school effectiveness of 50%.

Cluster 13 (7.0%): Last grade not completed, only foreign students in 1st and 2nd grade, with averages: SES of 0.33, attendance in last cycle of 92.56%, z score of +0.05, repetitions of 0.03, school changes of 0.08 and last school effectiveness of 45.26%.
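Profiles like the ones above are per-cluster summaries of the original (unscaled) variables. The seven shares listed add up to 67.1%, which is the 67% quoted in the text. A minimal sketch of such a summary follows. The field names (`last_grade`, `ses`, `attendance`) and the records are hypothetical stand-ins for the variables of Table 12.

```python
from collections import Counter, defaultdict
from statistics import mean


def characterize(students, numeric_vars):
    """Per cluster: share of trajectories (%), modal last grade, and mean of each numeric variable."""
    by_cluster = defaultdict(list)
    for s in students:
        by_cluster[s["cluster"]].append(s)
    total = len(students)
    profile = {}
    for cluster, members in by_cluster.items():
        summary = {
            "share_pct": round(100 * len(members) / total, 1),
            "modal_grade": Counter(m["last_grade"] for m in members).most_common(1)[0][0],
        }
        for var in numeric_vars:
            summary[var] = round(mean(m[var] for m in members), 2)
        profile[cluster] = summary
    return profile


students = [
    {"cluster": 4, "last_grade": 9, "ses": 0.20, "attendance": 80.0},
    {"cluster": 4, "last_grade": 9, "ses": 0.30, "attendance": 84.0},
    {"cluster": 10, "last_grade": 4, "ses": 0.50, "attendance": 90.0},
]
print(characterize(students, ["ses", "attendance"]))
```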

In all these clusters, the last schools were predominantly public, except in clusters 0 (48.6% voucher vs 42.7% public schools), 1 (44.5% voucher vs 44.2% public schools) and 10 (45.6% voucher vs 32.7% public schools). Apparently, the clustering model grouped in Cluster 13 all trajectories of foreign students whose identity number changed from a provisional to an official one. This causes an abnormal incidence of dropout in first and second grade, since these trajectories were truncated by an administrative anomaly.
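The administrative anomaly behind Cluster 13 can be illustrated with a simplified dropout rule. Under this rule, a student ID that stops appearing in enrollment records before the last observed year, without graduating, looks like a dropout. If a foreign student's provisional ID is replaced by an official one, the provisional trajectory ends early. It is then counted as a dropout unless the two IDs are linked. The sketch is hypothetical: the ID formats, the `id_map` linkage table and the rule itself are simplifications, not the study's dropout definition.

```python
def apparent_dropouts(enrollments, last_year, graduated=frozenset()):
    """IDs whose last enrollment year precedes `last_year` and who did not graduate."""
    last_seen = {}
    for student_id, year in enrollments:
        last_seen[student_id] = max(year, last_seen.get(student_id, year))
    return {sid for sid, year in last_seen.items() if year < last_year and sid not in graduated}


def relink(enrollments, id_map):
    """Replace provisional IDs with official ones using a (hypothetical) linkage table."""
    return [(id_map.get(sid, sid), year) for sid, year in enrollments]


# One foreign student: two years under a provisional ID, then under an official ID.
records = [("PROV-001", 2017), ("PROV-001", 2018), ("OFFICIAL-001", 2019)]
print(apparent_dropouts(records, last_year=2019))  # {'PROV-001'}: a false dropout
print(apparent_dropouts(relink(records, {"PROV-001": "OFFICIAL-001"}), last_year=2019))  # set()
```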

As Sansone (2019) showed, this kind of unsupervised learning reveals the heterogeneity of students at risk of dropping out. It makes it possible to identify subpopulations among students and thus to design programs appropriate to each group, understanding both their peculiarities and the key factors associated with their situation. Policymakers could exploit this to customize the treatment of each cluster of students.

4.3 Simulating scenarios: External shocks

Third, predictive models can be used to evaluate the impact on dropout of external shocks, such as an economic recession, a natural catastrophe, or a pandemic.

In this case, we simulate the effect of a pandemic such as COVID-19 on the increase in dropout, following the procedure described in Fig. 14. The shock is created by applying scenario assumptions that alter the input data (scenario data). The model's results for the unchanged data (base case) are then compared with those for the scenario data. Since we know the model's prediction error (see Tables 5 and 8), we can correct the final quantities to avoid overestimation.

Fig. 14 Procedure for simulating a scenario using the predictive model. In step 1, the prediction is created under business as usual (base). In step 2, the original dataset is altered according to the scenario assumptions, creating a modified dataset that is used for prediction (scenario). In step 3, since the prediction error is known, the results are corrected with it, yielding the difference in dropout.
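Step 3 can be sketched as follows. The section does not spell out the correction formula, so this is an assumption. Precision is the share of predicted dropouts that are real, and recall is the share of real dropouts the model catches. An estimate of actual dropouts is therefore predicted × precision / recall. Applying it to both the base and the scenario predictions gives the corrected difference. All numbers below are hypothetical.

```python
def corrected_count(predicted_positives, precision, recall):
    """Estimate actual dropouts from a predicted count using test-set error.

    True positives ~= precision * predicted; actual positives ~= true positives / recall.
    """
    return predicted_positives * precision / recall


def scenario_effect(base_predicted, scenario_predicted, precision, recall):
    """Additional dropouts attributable to the scenario after error correction."""
    return (corrected_count(scenario_predicted, precision, recall)
            - corrected_count(base_predicted, precision, recall))


# Hypothetical figures: 50,000 predicted dropouts in the base case, 60,000 under the scenario.
print(round(scenario_effect(50_000, 60_000, precision=0.6, recall=0.8)))
```

With these made-up inputs, the correction shrinks the raw gap of 10,000 to about 7,500. This happens because, in this example, the model over-predicts (precision 0.6) more than it under-detects (recall 0.8).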

For illustration, we analyzed the marginal increase in dropout caused by lower attendance. We assumed that all variables behave as in 2019, when students attended in person, and only scaled each student's LAST_CYCL_AVG_ATTNDNC variable by a fixed factor. This gives the results shown in Fig. 15.

Fig. 15 Additional dropouts resulting from simulating a decrease in attendance by a given factor. Note that a factor of zero leads to repetition, but not necessarily to dropout.
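The attendance scenario amounts to three steps: copy the input data, multiply each student's LAST_CYCL_AVG_ATTNDNC by the factor, and re-run the model. In the sketch below, the trained model is replaced by a hypothetical logistic toy function with made-up coefficients. The 0.5 decision threshold is likewise an assumption. Only the scenario mechanics mirror the text.

```python
import math


def toy_dropout_probability(student):
    """Hypothetical stand-in for the trained model: lower attendance raises risk."""
    z = 5.0 - 0.06 * student["LAST_CYCL_AVG_ATTNDNC"] + 1.0 * student["OVRAGE"]
    return 1.0 / (1.0 + math.exp(-z))


def apply_attendance_factor(students, factor):
    """Scenario data: scale attendance by `factor`, keep every other variable as in the base year."""
    return [{**s, "LAST_CYCL_AVG_ATTNDNC": s["LAST_CYCL_AVG_ATTNDNC"] * factor} for s in students]


def predicted_dropouts(students, predict, threshold=0.5):
    """Number of students whose predicted probability reaches the threshold."""
    return sum(predict(s) >= threshold for s in students)


base = [
    {"LAST_CYCL_AVG_ATTNDNC": 95.0, "OVRAGE": 1},
    {"LAST_CYCL_AVG_ATTNDNC": 90.0, "OVRAGE": 0},
    {"LAST_CYCL_AVG_ATTNDNC": 88.0, "OVRAGE": 0},
    {"LAST_CYCL_AVG_ATTNDNC": 86.0, "OVRAGE": 0},
    {"LAST_CYCL_AVG_ATTNDNC": 80.0, "OVRAGE": 0},
]
scenario = apply_attendance_factor(base, 0.91)  # 9% lower attendance
extra = (predicted_dropouts(scenario, toy_dropout_probability)
         - predicted_dropouts(base, toy_dropout_probability))
print(extra)
```

The raw difference printed here would still go through the error correction of step 3 before being reported as additional dropouts.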

In the first semester of 2022, Mineduc monthly attendance data show that attendance was on average approximately 9% lower (equivalent to a factor of 0.91) than in 2018 and 2019, in both public and private voucher schools. Therefore, if there is no intervention and this trend persists without worsening, the simulation estimates 10,501 additional dropouts at the end of 2022.

5 Discussion

In this paper, we proposed a methodology to design, develop, and evaluate a predictive model for regular school dropout using:

1) individual student trajectories as cases;
2) procedures for creating training and test samples and for choosing performance metrics that account for class imbalance;
3) machine learning algorithms suited to this kind of problem;
4) sensitivity analyses to test how results depend on prior assumptions and to determine the contexts where the model works better;
5) methods to reduce the number of variables, easing maintenance and reducing false positives; and
6) explanatory techniques to calculate the individual contribution of each variable to the dropout probability.
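A few lines of arithmetic show why metrics must account for class imbalance. This is a generic sketch, and the metrics below are common choices, not necessarily the ones used in the study. Consider a hypothetical cohort of 1,000 students with 50 dropouts. A model that never predicts dropout reaches 95% accuracy while detecting no dropouts at all.

```python
def imbalance_aware_metrics(tp, fp, fn, tn):
    """Accuracy plus metrics that stay informative when dropouts are a small minority."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual dropouts detected
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical cohort: 50 dropouts among 1,000 students; the model predicts "no dropout" for everyone.
print(imbalance_aware_metrics(tp=0, fp=0, fn=50, tn=950))
```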

Following the methodology, we developed a model for the Chilean case (Section 3). It is based mostly on administrative data from the educational system and on known factors associated with school dropout. Our results are better than those of previous research with a relevant sample size (Lee & Chung, 2019), with a predictive capability 20% higher for actual dropout cases. Also, in contrast to previous work, including non-individual dimensions contributes substantially to predicting school leaving. Contrary to Sorensen (2019), who found better results using SVM, gradient boosting decision trees worked best for us, which underlines the importance of trying different algorithms in step 3.

Long-term policies can be devised to manage risk factors, such as academic lag, and to reduce their prevalence in future cohorts of students. At school level, the model can identify students with a higher dropout risk who require support and protection strategies to ensure positive school trajectories. Examples are students who have recently repeated a grade, have high levels of absenteeism, have accumulated more than one repetition and are overage. In Chile, this risk is exacerbated when the student is male, a migrant or started his education abroad. Results show that these efforts will have greater impact in public schools with lower socioeconomic levels, in secondary education. All of these indicators are easy to construct at school level. Also, in the case of Chile, these analyses will help the Directorate of Public Education understand the challenges of the territories that will soon become part of the new public education system.
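The school-level indicators listed above are simple enough to compute from enrollment, attendance and grade records. A minimal sketch follows. The field names, the 85% attendance cut-off for "high absenteeism" and the overage definition are hypothetical choices, not thresholds taken from the model.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """Hypothetical per-student fields a school could assemble from its own records."""
    repeated_last_year: bool
    attendance_pct: float
    total_repetitions: int
    years_overage: int


def risk_flags(student, min_attendance_pct=85.0):
    """List the indicators from the text that apply to one student.

    `min_attendance_pct` is an illustrative cut-off for high absenteeism.
    """
    flags = []
    if student.repeated_last_year:
        flags.append("recent repetition")
    if student.attendance_pct < min_attendance_pct:
        flags.append("high absenteeism")
    if student.total_repetitions > 1:
        flags.append("more than one repetition")
    if student.years_overage > 0:
        flags.append("overage")
    return flags


print(risk_flags(StudentRecord(True, 78.0, 2, 2)))   # all four indicators
print(risk_flags(StudentRecord(False, 95.0, 0, 0)))  # []
```

Such flags do not replace the model's probability. They give school staff a transparent checklist that matches the factors the model found important.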

The major contributions of this study are:

As Sorensen (2019) states, machine learning is less transparent and more technologically demanding. However, techniques such as SHAP, proposed in this paper, provide a reliable way to explain results and how they depend on context. Cloud computing infrastructure also significantly reduces computational cost, but in our case it was not necessary. This is because the computational burden shifts from the machine learning model to the calculation of trajectories, which is a one-time development compared with multiple model trainings.

The public policy applications envisioned in Section 4 include profiling schools for qualitative studies of pedagogic practices, profiling students' dropout trajectories and simulating the impact of events such as pandemics or natural disasters. Simulations estimate the decrease or increase in dropout, providing information for calculating the return on investment of school retention policies.
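A toy model can be used to check the additivity that makes SHAP explanations readable. The sketch below computes exact Shapley values by enumerating feature subsets, with "absent" features set to a single baseline. It illustrates the property that contributions sum to the prediction minus the baseline prediction. However, it is exponential in the number of features and is not how the shap library's TreeExplainer computes values for tree ensembles. The model `f` is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of this size without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def f(z):
    """Hypothetical model with an interaction between features 0 and 2."""
    return 0.5 * z[0] + 0.2 * z[1] + 0.3 * z[0] * z[2]


x, baseline = [1, 1, 1], [0, 0, 0]
phi = shapley_values(f, x, baseline)
print([round(p, 3) for p in phi])  # [0.65, 0.2, 0.15]
```

The interaction term (0.3) is split evenly between features 0 and 2, and the three contributions add up exactly to f(x) − f(baseline) = 1.0.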

Some limitations of this study are that the administrative nature of the available data limits the possibility of transforming the prediction into concrete action and, at the same time, gives a constrained view of the school trajectory. In addition, since certain data are hard to obtain, it is difficult to assess their potential contribution to the predictive value.

For the Chilean dropout prediction, future work includes developing a model to predict dropout within the same year. This is not currently possible with the available public data, since students' attendance and grades are not reported monthly. Another challenge is adapting the model to 2020 and 2021, when students received mostly remote classes during the COVID-19 pandemic. Attendance was measured differently (whether students attended at least one online class per day), and the curriculum was shortened and adapted to the circumstances. The continuity of the attendance and school performance measurements was therefore broken, and these should be treated as additional, separate variables in the model. Additionally, the pandemic affected socio-economic status through parents' unemployment or death. To date, no updated income data are available, since the SIMCE test and its surveys were suspended in 2020 and 2021.

School failure was a diminishing problem, but the COVID-19 pandemic will push the poorest students out of the system, especially women. Therefore, early warning systems (EWS) with evidence-based strategies at school and territorial level should be developed to prevent children from abandoning their studies. The proposed methodology comprises the necessary steps to develop models with high predictive power if proper data are available.

We expect that the methodology and case presented in this article will help practitioners and public decision makers create their own models to predict school failure. We also hope they will motivate them to capture, clean and systematize data so that such systems can be developed.

Data availability

Special thanks to the Center of Studies of the Ministry of Education, the Agency of Quality of Education and JUNJI for providing special datasets to develop the model.

The data that support the findings of this study are available from Ministry of Education – Open data platform, but restrictions apply to the availability of these data, which were used under licence for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Ministry of Education, Agency of Quality of Education and/or JUNJI.

Year in which a student enrolls in first grade.

There is a fourth school category (delegated administration), in which public property schools whose administration is delegated to private agents receive basal funding through charters (Browne, 2017). Nevertheless, this category has only 70 schools (41,578 students in 2019, 1.4% of all students that year) and differs markedly in ownership, funding, and administration. We therefore decided to omit it from most of the reports in this article.

https://www.agenciaeducacion.cl/simce/

Adelman, M., Haimovich, F., Ham, A., & Vazquez, E. (2018). Predicting school dropout with administrative data: New evidence from Guatemala and Honduras. Education Economics, 26 (4), 356–372. https://doi.org/10.1080/09645292.2018.1433127


Anderson, S., Uribe, M., & Valenzuela, J. P. (2021). Reforming public education in Chile: The creation of local education services. Educational Management Administration & Leadership, 1741143220983327. https://doi.org/10.1177/1741143220983327

Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54 (3), 1937–1967. https://doi.org/10.1007/s10462-020-09896-5

Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for Hyper-Parameter Optimization. Advances in Neural Information Processing Systems , 24 . https://papers.nips.cc/paper/2011/hash/86e8f7ab32cfd12577bc2619bc635690-Abstract.html .

Boniolo, P., & Najmias, C. (2018). School dropout and school lag in Argentina: A social classes approach. Tempo Social , 30 (3), 217–247. https://doi.org/10.11606/0103-2070.ts.2018.121349 .

Browne, M. (2017). Análisis del Sistema de Administración Delegada creada por el DL N o 3166 de 1980. Ministerio de Educación-SETP . http://biblioteca.digital.gob.cl/handle/123456789/897 . Accessed 20 Aug 2022.

Buenadicha, C., Galdon, G., Hermosilla, M., Loewe, D., & Pombo, C. (2019). La gestión ética de los datos . Inter-American Development Bank. https://doi.org/10.18235/0001623 .

Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , 785–794. https://doi.org/10.1145/2939672.2939785 .

Dos Santos, E. M., Sabourin, R., & Maupin, P. (2009). Overfitting cautious selection of classifier ensembles with genetic algorithms. Information Fusion, 10 (2), 150–162. https://doi.org/10.1016/j.inffus.2008.11.003

Dussaillant, F. (2017). Deserción escolar en Chile. Propuestas para la investigación y la política pública. Documento N o 18 , 1–18. Available at: https://gobierno.udd.cl/cpp/files/2020/10/18-Deserción.pdf . Accessed 20 Aug 2022.

Ecker-Lyster, M., & Niileksela, C. (2016). Keeping Students on Track to Graduate: A Synthesis of School Dropout Trends, Prevention, and Intervention Initiatives. The Journal of at-Risk Issues, 19 (2), 24–31.


Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining , 226–231. Available at: https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf .

Gil, A. J., Antelm-Lanzat, A. M., Cacheiro-González, M. L., & Pérez-Navío, E. (2019). School dropout factors: A teacher and school manager perspective. Educational Studies, 45 (6), 756–770. https://doi.org/10.1080/03055698.2018.1516632

Hirakawa, Y., & Taniguchi, K. (2021). School dropout in primary schools in rural Cambodia: School-level and student-level factors. Asia Pacific Journal of Education, 41 (3), 527–542. https://doi.org/10.1080/02188791.2020.1832042

Höfter, R. H. (2006). Private health insurance and utilization of health services in Chile. Applied Economics, 38 (4), 423–439. https://doi.org/10.1080/00036840500392797

Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W. M., Donahue, J., Razavi, A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C., & Kavukcuoglu, K. (2017). Population Based Training of Neural Networks. ArXiv:1711.09846 [Cs]. http://arxiv.org/abs/1711.09846.

Jena, M., & Dehuri, S. (2020). DecisionTree for Classification and Regression: A State-of-the Art Review. Informatica , 44 (4), 4. https://doi.org/10.31449/inf.v44i4.3023 .

Kattan, R. B., & Székely, M. (2017). Analyzing Upper Secondary Education Dropout in Latin America through a Cohort Approach. Journal of Education and Learning, 6 (4), 12–39. https://doi.org/10.5539/jel.v6n4p12

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17) (pp. 3149–3157). Curran Associates Inc., Red Hook, NY, USA.

Khan, M. J., & Ahmed, J. (2021). Child education in the time of pandemic: Learning loss and dropout. Children and Youth Services Review, 127 , 106065. https://doi.org/10.1016/j.childyouth.2021.106065

Kursa, M. B., Jankowski, A., & Rudnicki, W. R. (2010). Boruta – A System for Feature Selection. Fundamenta Informaticae, 101 (4), 271–285. https://doi.org/10.3233/FI-2010-288


Ladd, H., & Fiske, E. (2020). International perspectives on school choice . Routledge.

Lee, S., & Chung, J. Y. (2019). The Machine Learning-Based Dropout Early Warning System for Improving the Performance of Dropout Prediction. Applied Sciences, 9 (15), 3093. https://doi.org/10.3390/app9153093

Lee-St John, T. J., Walsh, M. E., Raczek, A. E., Vuilleumier, C. E., Foley, C., Heberle, A., Sibley, E., & Dearing, E. (2018). The Long-Term Impact of Systemic Student Support in Elementary School: Reducing High School Dropout. Aera Open , 4 (4). https://doi.org/10.1177/2332858418799085 .

Levin, H. M., Belfield, C., Hollands, F., & Bowden, A. B. (2012). Cost-Effectiveness analysis of interventions that improve high school completion . Center for Benefit-Cost Studies of Education 34. https://repository.upenn.edu/cbcse/34 . Accessed 20 Aug 2022

Lundberg, S. M., Erion, G. G., & Lee, S.-I. (2019). Consistent Individualized Feature Attribution for Tree Ensembles. ArXiv:1802.03888 [Cs, Stat]. http://arxiv.org/abs/1802.03888.

Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., Liston, D. E., Low, D.K.-W., Newman, S.-F., Kim, J., & Lee, S.-I. (2018). Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2 (10), 749–760. https://doi.org/10.1038/s41551-018-0304-0

Márquez-Vera, C., Cano, A., Romero, C., Noaman, A. Y. M., Fardoun, H. M., & Ventura, S. (2016). Early dropout prediction using data mining: A case study with high school students. Expert Systems, 33 (1), 107–124. https://doi.org/10.1111/exsy.12135

McInnes, L., Healy, J., & Melville, J. (2020). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction ( arXiv:1802.03426 ). arXiv. https://doi.org/10.48550/arXiv.1802.03426 .

Mduma, N., Kalegele, K., & Machuve, D. (2019). A Survey of Machine Learning Approaches and Techniques for Student Dropout Prediction. Data Science Journal, 18 , 14. https://doi.org/10.5334/dsj-2019-014

Misra, P., & Yadav, A. (2020). Improving the classification accuracy using recursive feature elimination with cross-validation. International Journal on Emerging Technologies , 11 (3), 659-665.

Şara, N.-B., Halland, R., Igel, C., & Alstrup, S. (2015). High-school dropout prediction using machine learning: A Danish large-scale study. In M. Verleysen (Ed.), Proceedings of ESANN 2015: 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (pp. 319–324).

OECD. (2010). Overcoming school failure: Policies that work. OECD project description, (April). Available at https://www.oecd.org/education/school/45171670.pdf

OECD. (2020). Education at a Glance 2020: OECD Indicators . Organisation for Economic Co-operation and Development. https://www.oecd-ilibrary.org/education/education-at-a-glance-2020_69096873-en . Accessed 20 Aug 2022.

Pereira de Souza, C. M., Pereira, J. M., & de Jesus Ranke, M. da C. (2020). Reflexes of the Pandemic in school dropout/exit: The democratization of access and permanence. Revista Brasileira De Educacao Do Campo-Brazilian Journal of Rural Education , 5 , e10844. https://doi.org/10.20873/uft.rbec.e10844 .

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2019). CatBoost: Unbiased boosting with categorical features ( arXiv:1706.09516 ). arXiv. https://doi.org/10.48550/arXiv.1706.09516 .

Sahin, S., Arseven, Z., & Kilic, A. (2016). Causes of Student Absenteeism and School Dropouts. International Journal of Instruction , 9 (1), 195–210. https://doi.org/10.12973/iji.2016.9115a .

Sansone, D. (2019). Beyond Early Warning Indicators: High School Dropout and Machine Learning. Oxford Bulletin of Economics and Statistics, 81 (2), 456–485. https://doi.org/10.1111/obes.12277

Sharma, P., Mirzan, S. R., Bhandari, A., Pimpley, A., Eswaran, A., Srinivasan, S., & Shao, L. (2020). Evaluating Tree Explanation Methods for Anomaly Reasoning: A Case Study of SHAP TreeExplainer and TreeInterpreter. In G. Grossmann & S. Ram (Eds.), Advances in Conceptual Modeling (pp. 35–45). Springer International Publishing. https://doi.org/10.1007/978-3-030-65847-2_4 .

Sorensen, L. C. (2019). “Big Data” in Educational Administration: An Application for Predicting School Dropout Risk. Educational Administration Quarterly, 55 (3), 404–446. https://doi.org/10.1177/0013161X18799439

Studer, S., Bui, T. B., Drescher, C., Hanuschkin, A., Winkler, L., Peters, S., & Müller, K.-R. (2021). Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology. Machine Learning and Knowledge Extraction, 3 (2), 392–413. https://doi.org/10.3390/make3020020

UNESCO. (2012). International Standard Classification of Education ISCED 2011 . UNESCO Institute of Statistics, Montreal. Available at http://uis.unesco.org/sites/default/files/documents/international-standard-classification-of-education-isced-2011-en.pdf . Accessed 20 Aug 2022

UNESCO. (2020). UNESCO COVID-19 education response: How many students are at risk of not returning to school? Advocacy paper. UNESCO Paris. Available at https://unesdoc.unesco.org/ark:/48223/pf0000373992 . Accessed 20 Aug 2022.

Valenzuela, J. P., & Allende, C. (2014). Trayectorias de mejoramiento en el Sistema Escolar Chileno: Las escuelas de educación básica 2002 - 2010. Apuntes sobre Mejoramiento Escolar N°1, Enero 2014. Anillo de Ciencias Sociales sobre Mejoramiento de la Efectividad Escolar en Chile. https://www.mejoramientoescolar.cl/download.php?file=recursos/nota_tecnica.pdf . Accessed 20 Aug 2022.

Weybright, E. H., Caldwell, L. L., Wegner, L., & Smith, E. A. (2017). Predicting secondary school dropout among South African adolescents: A survival analysis approach. South African Journal of Education , 37 (2), 1–11. https://doi.org/10.15700/saje.v37n2a1353 .

Yoshida, S. (2020). Verification of Usefulness of SHAP values in Interpretation of Decision Tree Models. The Japanese Society for Artificial Intelligence. https://confit.atlas.jp/guide/event/jsai2020/subject/3E5-GS-2-04/detail. Accessed 20 Aug 2022.

Zaff, J. F., Donlan, A., Gunning, A., Anderson, S. E., Mcdermott, E., & Sedaca, M. (2017). Factors that Promote High School Graduation: A Review of the Literature. Educational Psychology Review , 447–476. https://doi.org/10.1007/s10648-016-9363-5 .


We acknowledge the support of ANID/PIA/Basal Funds for Centers of Excellence FB0003 and ANID-FONDEF IT17I0006 grants.

Author information

Authors and affiliations.

Institute of Education and Center for Advanced Research in Education, Universidad de Chile, Periodista José Carrasco Tapia 75, 8330014, Santiago, Región Metropolitana, Chile

Patricio Rodríguez & Juan Pablo Valenzuela

Center for Advanced Research in Education, Universidad de Chile, Santiago, Chile

Alexis Villanueva

Department of Informatics, Universidad Técnica Federico Santa María, Santiago, Chile

Lioubov Dombrovskaia


Corresponding author

Correspondence to Patricio Rodríguez .

Ethics declarations

Competing interests.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix B List of variables used and their significance in the final model

Rights and permissions.

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Rodríguez, P., Villanueva, A., Dombrovskaia, L. et al. A methodology to design, develop, and evaluate machine learning models for predicting dropout in school systems: the case of Chile. Educ Inf Technol 28 , 10103–10149 (2023). https://doi.org/10.1007/s10639-022-11515-5


Received : 22 August 2022

Accepted : 05 December 2022

Published : 23 January 2023

Issue Date : August 2023

DOI : https://doi.org/10.1007/s10639-022-11515-5


  • Data science applications in education
  • Machine learning
  • Learning analytics
  • Student retention

Indiana officials propose new ‘streamlined’ high school diplomas for Hoosier students

Casey Smith, Indiana Capital Chronicle, Apr 3, 2024

A proposal to streamline Indiana’s high school diplomas and reduce options to just two primary graduation paths was recently announced by state education officials.

The plan is part of an ongoing statewide effort to “reinvent” the high school experience and better prepare Hoosiers for their lives post-graduation — whether they want to pursue college or other skills training, or choose to directly enter the workforce.

The new options will take effect beginning with the Class of 2029 — for students that are currently in seventh grade. Indiana Secretary of Education Katie Jenner said some Hoosier schools will likely roll out the revamped graduation requirements sooner, though.

“How do we make the four years of high school as valuable as possible for students? What does that look like in a country where high school education has not changed, for most, in over 100 years? And yet the world around us, technology, is advancing — the world around us is changing,” Jenner said, noting that Indiana’s diploma has not been “significantly updated” since the late 1980s.

“We don’t have a roadmap. A lot of discussion is going to be key — feedback is going to be key — and it will not be perfect out of the gate,” Jenner added. “But we cannot let the perfect get in the way of the good on this. We have to start somewhere.”

The Indiana Department of Education (IDOE) presented the proposal before the State Board of Education (SBOE). Paramount to the plan, according to state officials, is maximized “flexibility” for students to personalize learning pathways and experiences, including with college courses taken while still in high school, as well as the ability to count internships, apprenticeships, military experience and other work-based learning toward their graduation requirements.

If approved, Indiana’s diplomas would, for the first time, be aligned to the state’s current graduation pathways. They would also model five key Indiana Graduate Prepared to Succeed (Indiana GPS) characteristics: academic mastery, career and postsecondary readiness (which includes credentials and experiences), communication and collaboration, work ethic, and civic, financial and digital literacy.

The rulemaking process to finalize the diploma model is expected to take at least four months, Jenner said. Before the board takes a final vote, which is anticipated in August or September, multiple rounds of public comment and changes to improve the overall plan are expected. By law, board members must give their stamp of approval by December.

“Nobody in the state, higher ed industry, or K-12 is thrilled with the outcomes that we’re getting today,” said Scott Bess, who sits on the education board.

He emphasized that the proposed diploma model places “a high value” on credential attainment and student experiences, “but we’re not telling schools and students how to get them.”

“And that’s really the biggest issue with the current system we have. It’s very prescriptive,” Bess continued. “This is the best shot we’ve got, because the diploma drives the majority of the behavior at the high school level, and giving schools flexibility, I have full trust. I don’t think there’s anybody running a school, in any community, that doesn’t have the best interest of their students at heart.”

Two ‘streamlined’ degrees

Currently, Hoosier students can work toward one of numerous diploma designations, including the general, Core 40, academic honors, or technical honors options. Some schools additionally offer the International Baccalaureate diploma, which is dependent on successful completion of specific assessments and examinations during grades 11 and 12.

State education officials conceded that the existing diploma system is outdated and confusing for both parents and students. Jenner said it especially lacks options for students to get hands-on training or earn high-value credentials, given the various course requirements. The move now, however, is to be less “course obsessed.”

As laid out in the proposal, Indiana’s future diplomas would include the “Indiana GPS Diploma” — a more flexible, personalized version of the current Core 40 diploma — and the “Indiana GPS Diploma Plus.”

For all students, regardless of the diploma type they earn, learning in grades nine and 10 would be focused mostly on “essential knowledge and skills.” All students would complete a set of foundational courses covering English, math, science, career and postsecondary preparedness, physical education, and civic, financial and digital literacy. Even so, students would have some ability to choose how they satisfy the priority courses.

Additional flexibility and personalization is then available to students in grades 11 and 12.

For students pursuing the proposed Indiana GPS Diploma, a minimum of 20 additional “points” must be earned through a combination of courses and experiences. Those points can be earned through traditional classes, but also from job training, JROTC or other military experience, and by earning certain scores on the SAT or ACT, for example. Additional points can be earned by taking more rigorous courses, like Advanced Placement (AP) and dual credit classes.

Overall, at least two points must come from math and four from English-related coursework.

On the other hand, students pursuing the proposed Indiana GPS Diploma Plus must complete additional coursework necessary to earn their chosen credential of value. They’ll also be required to complete a “high-quality work-based learning experience.” What those work-based learning experiences can or should look like is still under consideration, according to state education officials.

All students will use their individual graduation plan — already required by law to be completed by the end of eighth grade — to determine initial course sequences in high school.

Indiana will also continue to offer the federally required alternate diploma, which is designed for students with the most significant cognitive disabilities. Per state statute, no more than 1% of the state’s students can receive the diploma type annually.

“In many respects, this has felt like pushing a massive boulder up a steep mountain because there’s no roadmap anywhere in the nation that we can just grab and take,” Jenner said. “This is going to be one of the toughest, most challenging conversations that we’ve had in years.”

Roughly two-thirds of Indiana high school students earn college credit before graduation. Still, Jenner pointed to thousands of Hoosier students who each year are just a few courses short of earning a credential while still in high school. She said many kids “are not aware” of opportunities in the first place “because we have never systematically set it up this way.”

As of now, only about 5% of Indiana students successfully earn a credential before leaving the K-12 system.

“A lot of that … is because of barriers that they’re facing, requirements that they have to have, a checklist that they have to have, versus really focusing on their purpose and dreams ahead and getting that credential,” Jenner said.

Getting students trained

Recent legislation approved by state lawmakers set the stage for a diploma overhaul that seeks to implement new requirements that are more “flexible and relevant to students, employers, and communities.” Part of the goal, too, is improving access to high-quality work-based learning opportunities and increasing the number of postsecondary credentials earned by students before they graduate from high school.

But before SBOE officials give final approval for the new diploma requirements, the state will host two public comment periods.

The first public comment period will open later this spring, officials said. Hoosiers can provide immediate feedback, too, using IDOE’s online Jotform, which asks for additional solution ideas, as well as tools and resources that would be helpful in implementing new diplomas.

Jenner said the state education department is also thinking about ways to help school counselors handle the changes, in addition to funding asks for the 2025 legislative session.

COMMENTS

  1. (PDF) School dropout

    show that 3.4% of students dropped out that year. Alternatively, status dropout is the percentage of 16- to 24-year-olds who are not enrolled in school and have not earned a high school ...

  2. Understanding Why Students Drop Out of High School, According to Their

    Research on school dropout extends from early 20th-century pioneers until now, marking trends of causes and prevention. However, specific dropout causes reported by students from several nationally representative studies have never been examined together, which, if done, could lead to a better understanding of the dropout problem. ...

  3. Student Engagement and School Dropout: Theories, Evidence ...

    Research suggests that many youth who drop out of school had gone through a short- or ... three important gaps emerge between the state of empirical evidence and theoretical proposals. They pertain to (1) the trajectories of student engagement associated with student dropout, (2) the short- and long-term processes of student disengagement ...

  4. PDF School Dropout Indicators, Trends, and Interventions for School ...

    Abstract. School counselors are expected to develop programs that promote academic success for all students, including those at risk for dropping out of school. Knowledge of key indicators of potential dropouts and current trends in dropout prevention research may assist school counselors in better understanding this complex issue.

  5. PDF Why Students Drop Out of School: A Review of 25 Years of Research

    California Dropout Research Project Report #15, October 2008, by Russell W. Rumberger and Sun Ah Lim, University of California, Santa Barbara ... persistence. Most models have attempted to explain why students drop out of high school. Some have attempted to explain engagement, an important precursor to dropping out. Another

  6. Perspectives on the Factors Affecting Students' Dropout Rate During

    Eckstrom et al. (1986) have given a much broader perspective regarding dropout phenomena which helps us to visualize beyond the factors associated with students like the socio-cultural background of the family, the behavior of teachers and role cum distance from school, and most effectively, the persistent affordability of the family which ...

  7. School dropout prediction and feature importance exploration ...

    Designing early warning systems through machine learning (ML) models to identify students at risk of dropout can improve targeting mechanisms and lead to efficient social policy interventions in education. School dropout is a culmination of various factors that drive children to leave school, and timely policy responses are most needed to address these underlying factors and improve school ...

  8. Risk Factors for Dropping out of High School: A Review of ...

    School dropout is still a relevant problem in the US, Canada, and European countries. The factors that determine, or contribute to, this phenomenon are still not clear and, especially in recent years, have been extensively investigated in the US, using large databases. This article presents a review of international literature from an inter-disciplinary perspective, takes a developmental ...

  9. PDF Research Title: Major Causes of Girls' School Dropout from General

    In this study, I analyzed the factors contributing to the dropout of girls and concluded that the major factors are (1) economic factors: e.g. poverty and migration to foreign countries to earn (2) household level factors: e.g. child labor (3) school level factors: e.g. distance and (4) cultural factors: e.g. early marriage.

  10. PDF Factors contributing to school dropout among the girls

    Most past reviews on the school dropout issue have been carried out regardless of the gender of the students. A few of the studies in this context have been done on girls' dropout outcome based on particular regions and cultures of the world. In this paper, we identify the factors and illustrate a conceptual model for girls' dropout from school.

  11. School Dropouts

    Raluca Ungureanu, International Journal of Research in Engineering and Social Sciences, ISSN 2249-9482, Impact Factor: 6.301, Volume 07 Issue 1, January 2017, Page 21-27 School Dropouts - A Theoretical Framework Raluca Ungureanu (Alexandru Ioan Cuza University of Iasi, Romania) Abstract: The following article is a fragment of a PhD thesis and is aimed at outlining the main traits of the ...

  12. (PDF) SCHOOL DROPOUTS: REASONS AND PROSPECTIVE SOLUTIONS ...

    The percentage of students who drop out of school has either climbed or stayed the same during the past few years, despite an increase in enrollment rates. The secondary level dropout rate (17% ...

  13. PDF INVESTIGATING THE CAUSES OF SCHOOL DROPOUT IN NINE ...

    Voice of Research, Volume 10, Issue 4, March 2022, p. 154: ... school dropout. It was finally recommended that the Government of Rwanda and school managers should manage to put into consideration all the strategies to fight against school dropout emanated from the findings of this study. Keywords: School dropout and Nine years basic ...

  14. PDF Causes of Student Absenteeism and School Dropouts

    The aim of the study is to investigate the causes of student absenteeism and school dropouts at primary, secondary and high school level in Düzce Province and to develop suggestions for solving these problems. A "case study" design, which is one of the qualitative research approaches, was used in this study.

  15. A Qualitative Study of Unveiling School Dropout Complexity ...

    This study investigates school dropout, particularly the shift to open high schools in Türkiye during the pandemic, through a multi-stakeholder lens. Using grounded theory, data was collected via semi-structured interviews with 12 students, 15 teachers, and 20 school administrators. Results reveal a model linking themes: predictive reasons for transferring to open high school, both pandemic ...

  16. PDF Factors Contributing to Secondary School Dropouts and the Dropouts

    The purpose of this study is to determine the factors that cause students drop out of school in secondary education. This is a descriptive survey study and the quantitative research method was employed. 2985 secondary school dropouts participated in the study and the data was collected through "Secondary School Dropout Questionnaire".

  17. A Research Proposal On School Dropout Essay Example

    A successful school dropout research proposal has to be interesting, up-to-date, informative, logical and brief. A student should present the scope of the research, research approach, predictions and methods used for the research and possible achievements of the paper if he wants to persuade the professor that the topic is worth paying ...

  18. A methodology to design, develop, and evaluate machine ...

    School dropout is a structural problem which permanently penalizes students and society in areas such as low qualification jobs, higher poverty levels and lower life expectancy, lower pensions, and higher economic burden for governments. Given these high consequences and the surge of the problem due to COVID-19 pandemic, in this paper we propose a methodology to design, develop, and evaluate a ...

  19. Indiana officials propose new 'streamlined' high school diplomas for

    A proposal to streamline Indiana's high school diplomas and reduce options to just two primary graduation paths was recently announced by state education officials. The plan is part of an ...

  20. PDF Update to Graduation and Dropout Rate Reporting

    TEA determines each student's status at each district.a. For those who did not return to a district, TEA determines whether the student is a graduate, other leaver, TxCHSE recipient, or dropout. A dropout is a student who attends Grade 7-12 in a public school in a particular school year, does not return the following fall, is not expelled ...