A study of the changes in our education system and their impacts on different educational outcomes

  • William Ming (@CPWill)

from class Sat 7.00pm

Feedback from Instructor

Overall Comments

  • Well done! Comprehensive analysis that is insightful and well-substantiated. It was great to see how you applied a range of skills learnt from the DS102 in this project.
  • Your project was an enjoyable read and I really applaud your effort in documenting your purpose-driven methodology and your code chunks. As a result, I found your notebook very coherent and easy to follow.
  • The stacked bar plot under Step 3 looks great, but you might want to consider having the student progress line plot on another graph. It is mildly confusing for readers to see the plots superimposed, and the different scales on the left and right y-axes confound the analysis a little. I think it could suffice to plot the line plots separately and point out how these trends correlate with what's shown by the stacked bar plot.
  • Do keep in mind that correlation does not imply causality - even when there is strong correlation between two variables, it would be challenging to even make inferences that imply cause-and-effect relationships, such as government education expenditure on literacy rate for instance. Literacy rates could very well be directly improved by government expenditure on supporting underprivileged groups (that improved overall standard of living), or even cultural factors that gave rise to such a trend.
  • Overall great job on the project! :)

Score Breakdown

Component            Score
Executive Summary    3/3
Problem Statement    3/3
Methodology          12.5/14
Total                18.5/20

Executive Summary

Singapore has a strong education system which has markedly improved over the decades, largely as a result of changes in the government's education policy. This study investigates how changes in the education system and policies have impacted 3 key educational outcomes: literacy rate, percentage of passes, and progress to higher levels of education. This was done using statistical and data visualisation methods from the matplotlib and statsmodels Python libraries, such as scatter plots and correlation analysis.

It was found that the government's increased spending on education is correlated with a higher literacy rate. From this, it can be inferred that basic reading education has been improved due to the increased expenditure. Another change in the education system - decreasing class size - has helped more O Level students pass English and Mathematics, but this effect is minimal for Mother Tongue. Hence, it is recommended that research is done to examine the difficulties faced in MTL so as to craft more effective MTL promotion campaigns. Finally, although more teachers have higher academic qualifications, the study on the impacts of this trend is inconclusive. Further research must be done to confirm the various inferences made from the findings of this study.

Research Topic & Hypothesis

Education in Singapore has undeniably improved since the 1980s and this is due in large part to the education policies implemented by the Singapore government. This study aims to understand how different changes in the education policies and system have impacted key educational outcomes. This investigation is structured into 3 parts, each answering one of the following questions:

Part 1: How effective has the increasing education expenditure been in increasing our national literacy rate?
Part 2: Do smaller class sizes help students achieve better academic results?
Part 3: How have the academic qualifications of teachers changed and does this have any impact on students?

Part 1 looks at education at a national level, whereas Parts 2 and 3 zoom in specifically on secondary school education.

Datasets used:

Part 1:

  • government-expenditure-on-education.csv
  • literacy-rate-annual.csv

Part 2:

  • pupils-per-teacher-in-secondary-schools.csv from Data.gov.sg retrieved on 1 Jun 2019
  • percentage-of-gce-o-level-students-who-passed-english-language.csv from Data.gov.sg retrieved on 1 Jun 2019
  • percentage-of-gce-o-level-students-who-passed-mathematics.csv from Data.gov.sg retrieved on 1 Jun 2019
  • percentage-of-gce-o-level-students-who-passed-mtl.csv from Data.gov.sg retrieved on 1 Jun 2019

Part 3:

  • teachers-in-schools-academic-qualification.csv from Data.gov.sg retrieved on 1 Jun 2019
  • percentage-of-o-level-cohort-that-progressed-to-post-secondary-education.csv from Data.gov.sg retrieved on 1 Jun 2019
  • percentage-of-n-level-cohort-that-progressed-to-post-secondary-education.csv from Data.gov.sg retrieved on 1 Jun 2019

Python libraries and modules used:

In [1]:
#import the relevant libraries and modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm
from sklearn import linear_model
from functools import reduce

%matplotlib inline

Part 1: Education Expenditure and Literacy Rate


In [2]:
#read relevant csv files into dataframes
govt_df = pd.read_csv('government-expenditure-on-education.csv')
literacy_df = pd.read_csv('literacy-rate-annual.csv')

#preview both dataframes (call reconstructed from the output shown below)
display(govt_df.head(), govt_df.describe(), literacy_df.head(), literacy_df.describe())

   year  total_expenditure_on_education
0  1981                          942517
1  1982                         1358430
2  1983                         1611647
3  1984                         1769728
4  1985                         1812376

              year  total_expenditure_on_education
count    37.000000                    3.700000e+01
mean   1999.000000                    5.656212e+06
std      10.824355                    3.732883e+06
min    1981.000000                    9.425170e+05
25%    1990.000000                    2.056374e+06
50%    1999.000000                    4.857488e+06
75%    2008.000000                    8.229694e+06
max    2017.000000                    1.268000e+07

   year  level_1                          value
0  1960  Literacy Rate (15 Years & Over)   52.6
1  1961  Literacy Rate (15 Years & Over)   53.8
2  1962  Literacy Rate (15 Years & Over)   55.2
3  1963  Literacy Rate (15 Years & Over)   56.7
4  1964  Literacy Rate (15 Years & Over)   58.4

              year      value
count    59.000000  59.000000
mean   1989.000000  83.361017
std      17.175564  13.194618
min    1960.000000  52.600000
25%    1974.500000  75.400000
50%    1989.000000  88.400000
75%    2003.500000  93.650000
max    2018.000000  97.300000
In [3]:
#merge the 2 dataframes, only including years for which both expenditure and literacy data are available
govt_lit_df = govt_df.merge(literacy_df, how = 'inner', on = 'year')

#express expenditure in millions
govt_lit_df['expenditure_in_millions'] = govt_lit_df['total_expenditure_on_education']/1000000

#remove the unnecessary columns and rename the columns
govt_lit_df.drop(columns = ['level_1','total_expenditure_on_education'], inplace = True)
govt_lit_df.columns = ['year','literacy_rate','expenditure_in_millions']

   year  literacy_rate  expenditure_in_millions
0  1981           83.1                 0.942517
1  1982           83.8                 1.358430
2  1983           84.4                 1.611647
3  1984           85.0                 1.769728
4  1985           85.7                 1.812376
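
The inner merge in In [3] keeps only the years that appear in both dataframes. A minimal sketch of that behaviour, using a few rows copied from the tables above; the variable names `expenditure` and `literacy` and the choice of rows are illustrative and not part of the original notebook:

```python
import pandas as pd

# A few rows copied from the tables above; the selection is illustrative only.
expenditure = pd.DataFrame({
    'year': [1981, 1982, 1983],
    'total_expenditure_on_education': [942517, 1358430, 1611647],
})
literacy = pd.DataFrame({
    'year': [1982, 1983, 1984],
    'value': [83.8, 84.4, 85.0],
})

# how='inner' keeps only years found in both frames (1982 and 1983 here);
# validate='one_to_one' raises an error if a year is duplicated in either frame.
merged = expenditure.merge(literacy, how='inner', on='year', validate='one_to_one')
print(merged)
```

Passing `validate` is optional, but it can catch duplicated years before they silently multiply rows in the merged result.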


First, line charts are plotted for government expenditure on education vs year and literacy rate vs year to examine how each variable has changed over the years. By plotting on a shared x-axis, we can see how both variables change over the same period of time.

In [4]:
fig = plt.figure(figsize = (16,8))
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()

govt_lit_df.plot(kind='line', marker = 'o', color = 'r', x = 'year', y = 'expenditure_in_millions', 
                 ax = ax2, grid = True, xticks = range(1981,2019,2), yticks = range (0,17, 2))
govt_lit_df.plot(kind='line', marker = 'x', color = 'b', x = 'year', y = 'literacy_rate', ax = ax1, 
                 yticks = range(82,99,2), grid = True)

ax1.set_ylabel('Literacy Rate (%)', fontsize = 13, labelpad = 15)
ax2.set_ylabel('Total Expenditure on Education (million SGD)', fontsize = 13, rotation = 270, labelpad = 20)
ax1.set_xlabel('Year', fontsize = 13)

ax1.legend(['Literacy Rate (%)'], fontsize = 13, loc = (0.02,0.93))
ax2.legend(['Total Expenditure on Education (million SGD)'], fontsize = 13, loc = (0.02,0.88))

ax1.set_title('Change in Total Expenditure on Education and Literacy Rate from 1981 to 2017', fontsize = 16)

As shown in the graph, from 1981 to 2017, both government expenditure on education and literacy rate have increased. The former suggests that the government is putting a greater focus on improving education while the latter suggests that improvements in basic reading and writing education have been successful, translating to higher literacy rates.

Although both variables have increased in tandem over the years, there are exceptions to this pattern. For example, from 1983 to 1989, government expenditure on education remained relatively constant around 1.8 million SGD, yet the literacy rate rose from around 84.5% to around 88.5% over the same period. A similar phenomenon is seen from 2000 to 2005. This could be explained by other factors that affect the literacy rate. Further research would be required to determine which factors matter most.

Nonetheless, there is still an obvious positive correlation between government expenditure on education and literacy rate. Therefore, we shall quantify the strength of this correlation by finding the correlation coefficient and visualise this by drawing the regression line.

In [5]:
#create input data for OLS model
X = govt_lit_df[['expenditure_in_millions']]
y = govt_lit_df['literacy_rate']

#statsmodel OLS model
X = sm.add_constant(X)
model = sm.OLS(y,X)
result = model.fit()
govt_lit_df['fitted_LiteracyRate'] = result.fittedvalues
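#show the regression summary (call reconstructed from the output shown below)
display(result.summary())
#the original print statement was truncated; Series.corr is assumed here to compute Pearson's r
print('The Pearson product-moment correlation coefficient is ',
      govt_lit_df['expenditure_in_millions'].corr(govt_lit_df['literacy_rate']))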
OLS Regression Results

Dep. Variable:        literacy_rate      R-squared:             0.859
Model:                OLS                Adj. R-squared:        0.855
Method:               Least Squares      F-statistic:           212.7
Date:                 Wed, 12 Jun 2019   Prob (F-statistic):    1.94e-16
Time:                 01:45:54           Log-Likelihood:        -68.617
No. Observations:     37                 AIC:                   141.2
Df Residuals:         35                 BIC:                   144.5
Df Model:             1
Covariance Type:      nonrobust

                            coef  std err        t  P>|t|  [0.025  0.975]
const                    85.7899    0.479  179.121  0.000  84.818  86.762
expenditure_in_millions   1.0348    0.071   14.583  0.000   0.891   1.179

Omnibus:              5.091              Durbin-Watson:         0.093
Prob(Omnibus):        0.078              Jarque-Bera (JB):      4.856
Skew:                 -0.865             Prob(JB):              0.0882
Kurtosis:             2.603              Cond. No.              12.6

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The Pearson product-moment correlation coefficient is  0.9266459815516203

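An R-squared value of 0.859 and a correlation coefficient of +0.93 imply that there is a strong positive correlation between government expenditure on education and Singapore's literacy rate. The very small p-values for the t-tests (displayed as 0.000) and the F-test (1.94e-16) suggest that this correlation is statistically significant. However, the Durbin-Watson statistic of 0.093 indicates strongly autocorrelated residuals, as expected when both series trend upwards over time, so these p-values should be treated with caution.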
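
For a regression with a single predictor, R-squared equals the square of the Pearson correlation coefficient, which is why 0.859 and 0.93 describe the same relationship (0.9266² ≈ 0.859). A minimal sketch that checks this identity with NumPy on the five rows of govt_lit_df printed earlier (1981 to 1985); the small sample is only for illustration, so its coefficient will differ from the full-data value:

```python
import numpy as np

# First five rows of govt_lit_df as printed earlier (1981 to 1985).
expenditure = np.array([0.942517, 1.358430, 1.611647, 1.769728, 1.812376])
literacy = np.array([83.1, 83.8, 84.4, 85.0, 85.7])

# Pearson correlation coefficient, read off the 2x2 correlation matrix.
r = np.corrcoef(expenditure, literacy)[0, 1]

# Least-squares line, then R-squared = 1 - SS_residual / SS_total.
slope, intercept = np.polyfit(expenditure, literacy, deg=1)
fitted = slope * expenditure + intercept
ss_res = np.sum((literacy - fitted) ** 2)
ss_tot = np.sum((literacy - literacy.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f'r = {r:.4f}, r^2 = {r**2:.4f}, R-squared = {r_squared:.4f}')
```

The identity holds only for simple linear regression with an intercept; with more than one predictor, R-squared no longer corresponds to a single pairwise correlation.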

In [6]:
fig2 = plt.figure(figsize=(12,6))
ax3 = fig2.add_subplot(111)

govt_lit_df.plot(kind = 'scatter', x = 'expenditure_in_millions', y = 'literacy_rate', ax = ax3)
govt_lit_df.plot(kind = 'line', x = 'expenditure_in_millions', y = 'fitted_LiteracyRate', ax = ax3, color = 'r')
ax3.set_title('Literacy Rate(%) vs Expenditure on Education(in million SGD)')
ax3.set_xlabel('Government Expenditure on Education (in million SGD)')
ax3.set_ylabel('Literacy Rate(%)')

As expected from the summary statistics, there is a strong positive correlation between government expenditure on education and literacy rate. Most of the points in the scatter plot are evenly distributed around the line of best fit (coloured red in the figure above).

Part 2: Class Size and Academic Performance

As mentioned above, Part 2 focuses on secondary school education. In this section, average class size is measured using the pupil-to-teacher ratio (PTTR). A smaller PTTR means that there are fewer students per teacher and hence implies a smaller average class size. Academic performance is measured by the percentage of O Level students who passed the subject. The subjects studied are English, Mathematics and Mother Tongue.


In [7]:
#read relevant csv files into dataframes
pttr_df = pd.read_csv('pupils-per-teacher-in-secondary-schools.csv')
english_df = pd.read_csv('percentage-of-gce-o-level-students-who-passed-english-language.csv')
math_df = pd.read_csv('percentage-of-gce-o-level-students-who-passed-mathematics.csv')
mtl_df = pd.read_csv('percentage-of-gce-o-level-students-who-passed-mtl.csv')

#only english_df.head() is displayed as the other 2 subject dataframes are similar
display(pttr_df.head(), pttr_df.describe(), english_df.head())
   year  sec_pupil_to_teacher
0  1981                  20.6
1  1982                  20.6
2  1983                  20.7
3  1984                  22.6
4  1985                  21.8

              year  sec_pupil_to_teacher
count    37.000000             37.000000
mean   1999.000000             18.775676
std      10.824355              3.328113
min    1981.000000             11.600000
25%    1990.000000             17.900000
50%    1999.000000             19.600000
75%    2008.000000             21.200000
max    2017.000000             23.100000

   year  race     percentage_passed_olevel_el
0  1997  Malay                           59.8
1  1997  Chinese                         69.9
2  1997  Indian                          79.8
3  1997  Others                          79.2
4  1997  Overall                         69.4
In [8]:
#merges all three subject dataframes on the 'year' and 'race' columns
def merge_df(df1,df2):
    return df1.merge(df2, how = 'inner', on = ['year','race'])
subject_list = [english_df,math_df,mtl_df]
subject_dfs_list = [x[x['race']=='Overall'] for x in subject_list]
merged_subject_df = reduce(merge_df,subject_dfs_list)

   year  race     percentage_passed_olevel_el  percentage_passed_olevels_math  percentage_passed_olevels_mtl
0  1997  Overall                         69.4                            85.2                           95.0
1  1998  Overall                         70.2                            84.9                           93.7
2  1999  Overall                         73.8                            86.3                           94.4
3  2000  Overall                         76.3                            87.1                           94.6
4  2001  Overall                         79.8                            86.6                           96.2
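
In [8] relies on `functools.reduce` to fold a list of dataframes into one by merging them pairwise from left to right. A small sketch of the same pattern on toy frames built from the 1997 and 1998 rows above; the frame names `el`, `maths` and `mtl`, the shortened column names and the helper `merge_pair` are placeholders:

```python
from functools import reduce
import pandas as pd

# Toy frames using the 1997 and 1998 'Overall' values shown above.
el = pd.DataFrame({'year': [1997, 1998], 'race': ['Overall'] * 2, 'el': [69.4, 70.2]})
maths = pd.DataFrame({'year': [1997, 1998], 'race': ['Overall'] * 2, 'math': [85.2, 84.9]})
mtl = pd.DataFrame({'year': [1997, 1998], 'race': ['Overall'] * 2, 'mtl': [95.0, 93.7]})

def merge_pair(left, right):
    # Inner join on the shared key columns.
    return left.merge(right, how='inner', on=['year', 'race'])

# reduce computes merge_pair(merge_pair(el, maths), mtl).
combined = reduce(merge_pair, [el, maths, mtl])
print(combined)
```

The pattern scales to any number of frames without extra code, which is its main appeal over chaining `.merge` calls by hand.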
In [9]:
#merge the PTTR dataframe with the subjects dataframe and drop the unnecessary column
pttr_subject_df = pttr_df.merge(merged_subject_df,how='inner',on='year')
pttr_subject_df = pttr_subject_df.drop(columns='race')

   year  sec_pupil_to_teacher  percentage_passed_olevel_el  percentage_passed_olevels_math  percentage_passed_olevels_mtl
0  1997                  19.8                         69.4                            85.2                           95.0
1  1998                  20.1                         70.2                            84.9                           93.7
2  1999                  19.4                         73.8                            86.3                           94.4
3  2000                  19.2                         76.3                            87.1                           94.6
4  2001                  19.6                         79.8                            86.6                           96.2


A heatmap of the correlation between all pairs of variables in pttr_subject_df will be used to examine the strength of correlation between PTTR and academic performance. Scatter plots are then used to further visualise this correlation.

In [10]:
#get an array of the pearson correlation coefficients between all variables in the dataframe
correlation_df = pttr_subject_df.corr()

#plot a heatmap
fig3 = plt.figure(figsize=(14,14))
ax4 = fig3.add_subplot(211)
sns.heatmap(correlation_df, cmap = 'RdBu_r', ax = ax4)
ax4.set_title('Correlation between year, PTTR and percentage O level passes by subjects', fontsize = 16)

#plot the various scatter plots
ax5 = fig3.add_subplot(234)
pttr_subject_df.plot(kind='scatter', x='sec_pupil_to_teacher', y='percentage_passed_olevel_el', ax=ax5)
ax5.set_title('Percentage O Level EL pass vs PTTR')

ax6 = fig3.add_subplot(235)
pttr_subject_df.plot(kind='scatter', x='sec_pupil_to_teacher', y='percentage_passed_olevels_math', ax=ax6)
ax6.set_title('Percentage O Level Math pass vs PTTR')

ax7 = fig3.add_subplot(236)
pttr_subject_df.plot(kind='scatter', x='sec_pupil_to_teacher', y='percentage_passed_olevels_mtl', ax=ax7)
ax7.set_title('Percentage O Level MTL pass vs PTTR')

fig3.subplots_adjust(wspace = 0.5,hspace = 0.8)

#correlation coefficients for reference
print('Pearson Correlation Coefficient between:')
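#the print arguments were truncated in the source; the lookups into correlation_df below are reconstructed
print('1) PTTR and Percentage of O Level Students who passed English: ',
      correlation_df.loc['sec_pupil_to_teacher', 'percentage_passed_olevel_el'])
print('2) PTTR and Percentage of O Level Students who passed Math: ',
      correlation_df.loc['sec_pupil_to_teacher', 'percentage_passed_olevels_math'])
print('3) PTTR and Percentage of O Level Students who passed MTL: ',
      correlation_df.loc['sec_pupil_to_teacher', 'percentage_passed_olevels_mtl'])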
Pearson Correlation Coefficient between:
1) PTTR and Percentage of O Level Students who passed English:  -0.7163134536635376
2) PTTR and Percentage of O Level Students who passed Math:  -0.7773872779478243
3) PTTR and Percentage of O Level Students who passed MTL:  -0.33356330988327393

The heatmap and the negative sign of the correlation coefficients imply that, overall, PTTR is inversely correlated with pass percentages for the three O Level subjects. The scatter plots for English and Mathematics support this, whereas the scatter plot for Mother Tongue Language (MTL) suggests that there is very little correlation between the pass percentage for O Level MTL and PTTR.
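
Whether a correlation of a given size is distinguishable from zero depends on the number of observations. A rough check is the t-statistic t = r·sqrt((n − 2) / (1 − r²)). The sketch below applies it to the three coefficients printed above, assuming n = 21 yearly observations (1997 to 2017). The notebook does not state n, so this value and the helper name `t_statistic` are assumptions. The check also ignores the autocorrelation typical of yearly series, so it is indicative only.

```python
import math

def t_statistic(r, n):
    """t-statistic for testing whether a Pearson correlation r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n = 21                # assumed number of yearly observations (1997 to 2017)
t_critical = 2.093    # two-sided 5% critical value of Student's t with 19 degrees of freedom

# Coefficients copied from the printed output above.
coefficients = {
    'English': -0.7163134536635376,
    'Math': -0.7773872779478243,
    'MTL': -0.33356330988327393,
}

results = {}
for subject, r in coefficients.items():
    t = t_statistic(r, n)
    results[subject] = abs(t) > t_critical
    print(f'{subject}: t = {t:.2f}, significant at 5%: {results[subject]}')
```

Under these assumptions, the English and Mathematics correlations clear the threshold while the MTL correlation does not, which is consistent with the reading of the scatter plots.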

Overall, the percentage of O Level students who passed English, Mathematics and MTL has increased as the PTTR in secondary schools decreased over the years. This may be because a smaller PTTR (implying a smaller class size) encourages student participation and ensures that each student gets more individual attention from teachers. This correlation is slightly stronger for Mathematics than for English, but it is much weaker for MTL. There are likely other factors impeding improvements in MTL, such as the quality of MTL education, teachers and teaching methods. Moreover, the strong correlation found for English and Mathematics does not necessarily mean that PTTR is the main cause of the better performance. Further research must be done to determine what has improved students' performance in these subjects.

It is also worth noting that the increases in percentage pass for English and Mathematics over the years are much larger than the increase in percentage pass for MTL (seen in the heatmap). This may be due to the increasing focus on STEM (Science, Technology, Engineering and Mathematics) and the success of English-promoting campaigns such as the Speak Good English Movement. The improvements in English and Mathematics are particularly important as our increasingly globalised and digitalised economy necessitates better communication and analytical skills.

On the other hand, this observation also suggests that government efforts to promote the learning of MTL were not very effective. Further research could be done to examine how the learning of MTL could be improved. It is important to continue to protect our MTLs alongside English because of their cultural value, especially in Singapore's multi-racial society.

Part 3: Academic Qualifications of Teachers and Students' Progress to Post-Secondary Education


In [11]:
#read the relevant csv files into dataframes
teacher_acad_df = pd.read_csv('teachers-in-schools-academic-qualification.csv')
o_postsec_df = pd.read_csv('percentage-of-o-level-cohort-that-progressed-to-post-secondary-education.csv')
n_postsec_df = pd.read_csv('percentage-of-n-level-cohort-that-progressed-to-post-secondary-education.csv')

display(teacher_acad_df.head(), o_postsec_df.head())
#note MF means Male
year sex level_of_school academic_qualification number_of_teachers
   year  race     percentage_o_level_progressed_to_post_secondary_education
0  2008  Malay                                                         96.9
1  2008  Chinese                                                       97.8
2  2008  Indian                                                        96.4
3  2008  Others                                                        89.6
4  2008  Overall                                                       97.4
In [12]:
#filter for secondary school teachers only
teacher_acad_df = teacher_acad_df[teacher_acad_df['level_of_school'] == 'SECONDARY'].drop(columns='level_of_school')

#filter for overall percentage rather than percentages for each race 
#and merge for all years between the two datasets
o_postsec_df = o_postsec_df[o_postsec_df['race'] == 'Overall'].drop(columns='race')
n_postsec_df = n_postsec_df[n_postsec_df['race'] == 'Overall'].drop(columns='race')
postsec_df = n_postsec_df.merge(o_postsec_df, how = 'outer', on = 'year')

display(teacher_acad_df.head(), postsec_df)
    year  sex  academic_qualification  number_of_teachers
2   1982  MF   BELOW GCE 'O' LEVEL                     68
3   1982  F    BELOW GCE 'O' LEVEL                     23
8   1982  MF   GCE 'O' LEVEL                         2574
9   1982  F    GCE 'O' LEVEL                         1141
14  1982  MF   GCE 'A' LEVEL/DIPLOMA                 2500

    year  percentage_progressed_to_post_sec_education  percentage_o_level_progressed_to_post_secondary_education
0   2007                                         88.3                                                        NaN
1   2008                                         88.8                                                       97.4
2   2009                                         89.9                                                       97.4
3   2010                                         90.8                                                       97.8
4   2011                                         92.4                                                       97.8
5   2012                                         93.7                                                       98.0
6   2013                                         95.1                                                       98.1
7   2014                                         95.3                                                       98.1
8   2015                                         95.7                                                       98.2
9   2016                                         95.7                                                       98.6
10  2017                                          NaN                                                       98.1
In [13]:
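#find the total number of teachers with each academic qualification in each year
#(only the numeric count column is summed, so the text column 'sex' is left out explicitly)
teacher_acad_df = teacher_acad_df.groupby(['year','academic_qualification'])['number_of_teachers'].sum().reset_index()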
teacher_acad_pivot = teacher_acad_df.pivot(index = 'year', columns = 'academic_qualification', values = 'number_of_teachers')

#flatten the multilevel index to prepare for plotting stacked bar graph
teacher_acad_flat = teacher_acad_pivot.rename_axis(None,axis=1).reset_index()

#create a new dataframe to populate with percentage values for percentage stacked bar graph
teacher_acad_percent = pd.DataFrame(teacher_acad_flat['year'])

qual_list = list(teacher_acad_df['academic_qualification'].unique())
teacher_acad_flat['Total'] = 0
for qual in qual_list:
    teacher_acad_flat['Total'] += teacher_acad_flat[qual]
for qual in qual_list:
    teacher_acad_percent[qual + ' %'] = teacher_acad_flat[qual]/teacher_acad_flat['Total']*100

0  1982  91  4082  3715  1147  148  3760  4
1  1983  31  3867  3344  1162  166  3692  4
2  1984  30  3439  2524  1252  165  4014  2
3  1985  25  3290  2479  1336  166  4510  2
4  1986  23  2999  2259  1519  155  4952  2

0  1982  0.702866  31.528539  28.693906   8.859195  1.143122  29.041477  0.030895
1  1983  0.252731  31.526170  27.262351   9.473341  1.353334  30.099462  0.032610
2  1984  0.262559  30.098022  22.089970  10.957465  1.444075  35.130404  0.017504
3  1985  0.211721  27.862466  20.994241  11.314363  1.405827  38.194444  0.016938
4  1986  0.193131  25.182635  18.968847  12.755059  1.301537  41.581997  0.016794
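
The two loops in In [13] divide each qualification count by the row total. The same percentages can be obtained in one vectorised step with `DataFrame.div`. A sketch on the first two rows of counts printed above; the column labels `q1` to `q7` are placeholders, because the real qualification headers are not shown in the output:

```python
import pandas as pd

# Counts for 1982 and 1983 copied from the output above; q1..q7 are placeholder labels.
counts = pd.DataFrame(
    [[91, 4082, 3715, 1147, 148, 3760, 4],
     [31, 3867, 3344, 1162, 166, 3692, 4]],
    index=pd.Index([1982, 1983], name='year'),
    columns=['q1', 'q2', 'q3', 'q4', 'q5', 'q6', 'q7'],
)

# Divide every row by its own total (axis=0 aligns the totals with the row index).
percent = counts.div(counts.sum(axis=1), axis=0) * 100
print(percent.round(6))
```

One caveat, stated as an assumption about the dataset: if the `MF` rows already hold the combined male and female count, summing over `sex` before this step counts female teachers twice, and filtering to the `MF` rows first would avoid that.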


In [14]:
fig4 = plt.figure(figsize = (20,14))
ax5 = fig4.add_subplot(111)
ax6 = ax5.twinx()

#percentage stacked bar graph to show change in academic qualifications of secondary school teachers
teacher_acad_percent[teacher_acad_percent['year'] >= 2007].plot(kind = 'bar', x = 'year', stacked=True, ax = ax5)

#line graphs to show change in percentage of O/N Level students who progressed to post secondary education
postsec_df.plot(kind='line', y = 'percentage_o_level_progressed_to_post_secondary_education',
                  ax = ax6, color = 'b', lw = 5)
postsec_df.plot(kind='line', y = 'percentage_progressed_to_post_sec_education',
                  ax = ax6, color = 'k', lw = 5)

#set various properties of the graph

ax5.set_title("Teachers' academic qualifications and Students' progress to post secondary education from 2007 to 2017",
             fontsize = 18)
ax5.set_xlabel("Year", fontsize = 14)
ax5.set_ylabel("Percentage of Teachers with each academic qualification", fontsize = 14)
ax6.set_ylabel("Percentage of O/N Level Students", fontsize = 14, rotation = 270, labelpad = 20)

ax5.legend(loc=2, fontsize = 10)
ax6.legend(['% progressed to post secondary education (O LEVELS)',
            '% progressed to post secondary education (N LEVELS)'], 
            loc = 1, fontsize = 10)


From 2007 to 2017, the academic qualifications of secondary school teachers have improved, with a larger percentage of teachers holding degrees. More specifically, the percentage of teachers with Masters and Honours degrees has increased. This has been accompanied by a very small increase in the percentage of O Level students who went on to post-secondary education (97.4% in 2008 to 98.1% in 2017), as well as a relatively sharp increase in the percentage of N Level students who went on to post-secondary education (88.3% in 2007 to 95.7% in 2016).

The latter phenomenon could be due to more academically-qualified teachers being able to use their greater knowledge in helping students better grasp and maintain interest in school subjects, encouraging them to pursue post-secondary education. However, overall, this investigation does not seem to be very conclusive, and further research should be done to see how the academic qualifications of teachers might impact the students.
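
As the instructor's feedback suggests, the progression rates may be easier to read on their own axes than superimposed on the stacked bars. A minimal sketch of such a plot, using the values printed by In [12]; the shortened column names `n_level` and `o_level` are illustrative choices:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from the postsec_df output (None marks years with no data).
progress = pd.DataFrame({
    'year': list(range(2007, 2018)),
    'n_level': [88.3, 88.8, 89.9, 90.8, 92.4, 93.7, 95.1, 95.3, 95.7, 95.7, None],
    'o_level': [None, 97.4, 97.4, 97.8, 97.8, 98.0, 98.1, 98.1, 98.2, 98.6, 98.1],
})

# Change between the first and last available year for each stream, in percentage points.
change = {col: progress[col].dropna().iloc[-1] - progress[col].dropna().iloc[0]
          for col in ['n_level', 'o_level']}

# One panel per stream, each with its own y-axis scale.
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
progress.plot(x='year', y='n_level', marker='o', ax=axes[0], legend=False)
axes[0].set_title('N Level cohort progressing to post-secondary (%)')
progress.plot(x='year', y='o_level', marker='o', ax=axes[1], legend=False)
axes[1].set_title('O Level cohort progressing to post-secondary (%)')
fig.tight_layout()
print(change)
```

Giving each stream its own y-axis makes the small O Level movement visible without a second scale competing with the stacked bar percentages.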

Summary and Conclusions

To summarise, the following insights were drawn from the 3 part investigation:

  • there is an increasing trend in Singapore's literacy rate
  • government expenditure on education and Singapore's literacy rate are strongly positively correlated
  • class sizes (measured by pupil-to-teacher ratio) and the percentage pass for English and Mathematics in secondary schools have a negative correlation
  • class sizes do not have much correlation with percentage pass for Mother Tongue
  • percentage pass for English and Mathematics has increased greatly over the years, whereas the increase is minimal for MTL
  • the academic qualifications of secondary school teachers have improved
  • the change in the percentage of students who progress to post-secondary education varies widely between streams

From these insights, the following inferences could be made, although more research would be required to confirm these inferences:

  • the government's expenditure on education has likely been effective in improving basic reading and writing education, hence improving literacy rate
  • smaller class sizes may result in better academic results due to greater attention being given to each student
  • campaigns promoting the English Language and the focus on STEM have been successful, at least among students
  • further improvements may be required for campaigns promoting MTLs
  • the impact of teachers' academic qualifications on students is largely inconclusive

Based on this, further research in the following areas is recommended:

  • the effect of smaller class sizes on academic results in other educational levels, and on other non-academic indicators
  • the problems that students face in learning MTL (so as to craft more effective campaigns and policies to promote MTL)
  • whether teachers' academic qualifications have any impact on students in other educational levels

Published by William Ming on August 12, 2020
