from class Sat 7.00pm
Feedback from Instructor
Singapore has a strong education system which has improved markedly over the decades, largely as a result of changes in the government's education policy. This study investigates how changes in the education system and policies have affected three key educational outcomes: literacy rate, O Level pass percentages, and progression to post-secondary education. This was done using statistical and data visualisation methods from the matplotlib, seaborn and statsmodels Python libraries, such as line charts, scatter plots, correlation coefficients and linear regression.
It was found that the government's increased spending on education is strongly correlated with a higher literacy rate. This suggests, although it does not prove, that the increased expenditure has improved basic reading and writing education. Another change in the education system, decreasing class size (measured by the pupil-to-teacher ratio), is associated with more O Level students passing English and Mathematics, but this association is weak for Mother Tongue. Hence, it is recommended that research be done to examine the difficulties students face in Mother Tongue Language (MTL) so as to craft more effective MTL promotion campaigns. Finally, although more teachers now have higher academic qualifications, the study of the impact of this trend is inconclusive. Further research is needed to confirm the various inferences made from the findings of this study.
Education in Singapore has undeniably improved since the 1980s and this is due in large part to the education policies implemented by the Singapore government. This study aims to understand how different changes in the education policies and system have impacted key educational outcomes. This investigation is structured into 3 parts, each answering one of the following questions:
- Part 1: How effective has the increasing education expenditure been in increasing our national literacy rate?
- Part 2: Do smaller class sizes help students achieve better academic results?
- Part 3: How have the academic qualifications of teachers changed, and does this have any impact on students?
Part 1 looks at education at a national level, whereas Parts 2 and 3 focus specifically on secondary school education.
- government-expenditure-on-education.csv from Data.gov.sg, retrieved on 1 Jun 2019
- literacy-rate-annual.csv from Data.gov.sg, retrieved on 1 Jun 2019
- pupils-per-teacher-in-secondary-schools.csv from Data.gov.sg, retrieved on 1 Jun 2019
- percentage-of-gce-o-level-students-who-passed-english-language.csv from Data.gov.sg, retrieved on 1 Jun 2019
- percentage-of-gce-o-level-students-who-passed-english-mathematics.csv from Data.gov.sg, retrieved on 1 Jun 2019
- percentage-of-gce-o-level-students-who-passed-mtl.csv from Data.gov.sg, retrieved on 1 Jun 2019
- teachers-in-schools-academic-qualification.csv from Data.gov.sg, retrieved on 1 Jun 2019
- percentage-of-o-level-cohort-that-progressed-to-post-secondary-education.csv from Data.gov.sg, retrieved on 1 Jun 2019
- percentage-of-n-level-cohort-that-progressed-to-post-secondary-education.csv from Data.gov.sg, retrieved on 1 Jun 2019
```python
# import the relevant libraries and modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from sklearn import linear_model
from functools import reduce
%matplotlib inline
```
```python
# read relevant csv files into dataframes
govt_df = pd.read_csv('government-expenditure-on-education.csv')
literacy_df = pd.read_csv('literacy-rate-annual.csv')
display(govt_df.head(), govt_df.describe())
display(literacy_df.head(), literacy_df.describe())
```
First five rows of literacy_df:

| | Year | Category | Literacy rate (%) |
|---|---|---|---|
| 0 | 1960 | Literacy Rate (15 Years & Over) | 52.6 |
| 1 | 1961 | Literacy Rate (15 Years & Over) | 53.8 |
| 2 | 1962 | Literacy Rate (15 Years & Over) | 55.2 |
| 3 | 1963 | Literacy Rate (15 Years & Over) | 56.7 |
| 4 | 1964 | Literacy Rate (15 Years & Over) | 58.4 |
```python
# merge the 2 dataframes, only including years for which both expenditure and literacy data are available
govt_lit_df = govt_df.merge(literacy_df, how='inner', on='year')

# rescale expenditure: the raw figures appear to be in thousands of SGD, so dividing
# by 1,000,000 gives billions of SGD (the column name is kept for the code below)
govt_lit_df['expenditure_in_millions'] = govt_lit_df['total_expenditure_on_education'] / 1000000

# remove the unnecessary columns and rename the remaining ones
govt_lit_df.drop(columns=['level_1', 'total_expenditure_on_education'], inplace=True)
govt_lit_df.columns = ['year', 'literacy_rate', 'expenditure_in_millions']
govt_lit_df.head()
```
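The inner merge above keeps only the years that appear in both files. A minimal sketch with made-up toy frames (not the Data.gov.sg data; the column names spend and rate are hypothetical) shows this behaviour of pandas' DataFrame.merge with how='inner':

```python
import pandas as pd

# Hypothetical toy data: expenditure known for 1981-1983, literacy for 1982-1984
spend = pd.DataFrame({'year': [1981, 1982, 1983], 'spend': [1.0, 1.1, 1.2]})
lit = pd.DataFrame({'year': [1982, 1983, 1984], 'rate': [84.0, 84.6, 85.1]})

# how='inner' keeps only the years present in BOTH dataframes
merged = spend.merge(lit, how='inner', on='year')
print(merged)
```

Only 1982 and 1983 survive; years with data in just one file are silently dropped, which is why the analysis period starts in 1981 rather than 1960.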
First, line charts are plotted for government expenditure on education vs year and literacy rate vs year to examine how each variable has changed over the years. By plotting on a shared x-axis, we can see how both variables change over the same period of time.
```python
fig = plt.figure(figsize=(16, 8))
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis (year)

# expenditure on the right-hand axis, literacy rate on the left-hand axis
govt_lit_df.plot(kind='line', marker='o', color='r', x='year', y='expenditure_in_millions',
                 ax=ax2, grid=True, xticks=range(1981, 2019, 2), yticks=range(0, 17, 2))
govt_lit_df.plot(kind='line', marker='x', color='b', x='year', y='literacy_rate',
                 ax=ax1, yticks=range(82, 99, 2), grid=True)

ax1.set_ylabel('Literacy Rate (%)', fontsize=13, labelpad=15)
ax2.set_ylabel('Total Expenditure on Education (billion SGD)', fontsize=13, rotation=270, labelpad=20)
ax1.set_xlabel('Year', fontsize=13)
ax1.legend(['Literacy Rate (%)'], fontsize=13, loc=(0.02, 0.93))
ax2.legend(['Total Expenditure on Education (billion SGD)'], fontsize=13, loc=(0.02, 0.88))
ax1.set_title('Change in Total Expenditure on Education and Literacy Rate from 1981 to 2017', fontsize=16)
plt.show()
```
As shown in the graph, both government expenditure on education and the literacy rate increased from 1981 to 2017. The former suggests that the government has placed a greater focus on education, while the latter suggests that basic reading and writing education has become more effective.
Although both variables have increased in tandem over the years, some periods deviate from this pattern. For example, from 1983 to 1989, government expenditure on education remained relatively constant at around 1.8 billion SGD, yet the literacy rate rose from around 84.5% to around 88.5% over the same period. A similar pattern is seen from 2000 to 2005. This could be explained by other factors that affect the literacy rate, and further research would be required to determine which of these factors are most important.
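To locate such periods systematically rather than by eye, one could flag years in which spending barely changed while literacy still rose. The sketch below is an illustration only: the helper name flat_spending_years and the 5% tolerance are hypothetical choices, and the toy numbers are invented; only the column names match govt_lit_df.

```python
import pandas as pd

def flat_spending_years(df, rel_tol=0.05):
    """Hypothetical helper: years in which expenditure changed by less than
    rel_tol (as a fraction of the previous year) while literacy still rose."""
    out = df.sort_values('year')
    spend = out['expenditure_in_millions']
    spend_change = spend.diff() / spend.shift()   # relative change vs previous year
    lit_change = out['literacy_rate'].diff()      # change in percentage points
    mask = spend_change.abs().lt(rel_tol) & lit_change.gt(0)  # first year is NaN -> False
    return out.loc[mask, 'year'].tolist()

# Invented toy numbers, NOT the Data.gov.sg figures
toy = pd.DataFrame({
    'year': [1983, 1984, 1985, 1986],
    'expenditure_in_millions': [1.80, 1.82, 2.40, 2.41],
    'literacy_rate': [84.5, 85.0, 85.6, 86.1],
})
print(flat_spending_years(toy))
```

On the real data it could be called as flat_spending_years(govt_lit_df); the tolerance is a judgment call and would need to be justified.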
Nonetheless, there is still a clear positive correlation between government expenditure on education and literacy rate. We therefore quantify the strength of this correlation by computing the correlation coefficient and fitting a linear regression, and visualise it by drawing the regression line.
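For a single predictor, the regression line has a closed-form least-squares solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. This is the same line that statsmodels' OLS fits when a constant is added with sm.add_constant. A small sketch (the helper ols_line and the toy data are hypothetical) checks the formula against np.polyfit:

```python
import numpy as np

def ols_line(x, y):
    """Hypothetical helper: least-squares intercept and slope for y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c, y_c = x - x.mean(), y - y.mean()
    slope = (x_c * y_c).sum() / (x_c ** 2).sum()  # cov(x, y) / var(x)
    intercept = y.mean() - slope * x.mean()       # line passes through (x-bar, y-bar)
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
a, b = ols_line(x, y)
b_np, a_np = np.polyfit(x, y, 1)  # polyfit returns the highest-degree coefficient first
print(a, b)
print(a_np, b_np)
```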
```python
# create input data for the OLS model
X = govt_lit_df[['expenditure_in_millions']]
y = govt_lit_df['literacy_rate']

# statsmodels OLS model (add_constant adds the intercept term)
X = sm.add_constant(X)
model = sm.OLS(y, X)
result = model.fit()
govt_lit_df['fitted_LiteracyRate'] = result.fittedvalues
display(result.summary())

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry [0, 1] is Pearson's r
r = np.corrcoef(govt_lit_df['expenditure_in_millions'], govt_lit_df['literacy_rate'])[0, 1]
print('The Pearson product-moment correlation coefficient is', r)
```
OLS regression summary (excerpt): Prob (F-statistic) = 1.94e-16 (Date: Wed, 12 Jun 2019)
The Pearson product-moment correlation coefficient is 0.9266459815516203
An R-squared value of 0.859 and a correlation coefficient of +0.93 indicate a strong positive correlation between government expenditure on education and Singapore's literacy rate. The very small p-values for the t-tests (shown as 0 at the displayed precision) and for the F-test (1.94e-16) indicate that this correlation is statistically significant.
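With a single predictor and an intercept, R-squared equals the square of Pearson's r, which is why the two figures above agree (0.9266 squared is about 0.859). The sketch below checks this identity with NumPy on invented toy data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.9, 4.2, 4.8, 6.1])

r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 matrix is Pearson's r

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
ss_res = ((y - fitted) ** 2).sum()    # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()  # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(r ** 2, r_squared)
```

The identity does not hold for models with several predictors, where R-squared is instead the squared correlation between y and the fitted values.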
```python
fig2 = plt.figure(figsize=(12, 6))
ax3 = fig2.add_subplot(111)

# scatter plot of the observed data
govt_lit_df.plot(kind='scatter', x='expenditure_in_millions', y='literacy_rate', ax=ax3)

# regression line; sort by expenditure so the line is drawn left to right without doubling back
govt_lit_df.sort_values('expenditure_in_millions').plot(
    kind='line', x='expenditure_in_millions', y='fitted_LiteracyRate', ax=ax3, color='r')

ax3.set_title('Literacy Rate (%) vs Expenditure on Education (billion SGD)')
ax3.set_xlabel('Government Expenditure on Education (billion SGD)')
ax3.set_ylabel('Literacy Rate (%)')
plt.show()
```
As expected from the summary statistics, there is a strong positive correlation between government expenditure on education and literacy rate. Most of the points in the scatter plot lie close to, and evenly on both sides of, the line of best fit (shown in red).
As mentioned above, Part 2 focuses on secondary school education. In this section, average class size is approximated by the pupil-to-teacher ratio (PTTR). A smaller PTTR means that there are fewer students per teacher, which suggests a smaller average class size. Academic performance is measured by the percentage of O Level students who passed each subject. The subjects studied are English, Mathematics and Mother Tongue Language (MTL).
```python
# read relevant csv files into dataframes
pttr_df = pd.read_csv('pupils-per-teacher-in-secondary-schools.csv')
english_df = pd.read_csv('percentage-of-gce-o-level-students-who-passed-english-language.csv')
math_df = pd.read_csv('percentage-of-gce-o-level-students-who-passed-mathematics.csv')
mtl_df = pd.read_csv('percentage-of-gce-o-level-students-who-passed-mtl.csv')

# only english_df.head() is displayed as the other 2 subject dataframes are similar
display(pttr_df.head(), pttr_df.describe(), english_df.head())
```
```python
# merge all three subject dataframes on the 'year' and 'race' columns
def merge_df(df1, df2):
    return df1.merge(df2, how='inner', on=['year', 'race'])

subject_list = [english_df, math_df, mtl_df]
# keep only the 'Overall' rows (all races combined) of each subject dataframe
subject_dfs_list = [x[x['race'] == 'Overall'] for x in subject_list]
merged_subject_df = reduce(merge_df, subject_dfs_list)
display(merged_subject_df.head())
```
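functools.reduce applies merge_df pairwise from left to right, so the three-way merge is equivalent to merge_df(merge_df(english, math), mtl). Because each step is an inner join, only years present in all three subject files survive. A toy sketch with invented numbers (the frame and value-column names are hypothetical; year and race match the notebook):

```python
from functools import reduce
import pandas as pd

# Invented toy frames that share the 'year' and 'race' key columns
el = pd.DataFrame({'year': [2015, 2016], 'race': ['Overall', 'Overall'], 'el': [86.0, 87.0]})
ma = pd.DataFrame({'year': [2015, 2016], 'race': ['Overall', 'Overall'], 'math': [80.0, 81.0]})
mtl = pd.DataFrame({'year': [2016, 2017], 'race': ['Overall', 'Overall'], 'mtl': [90.0, 91.0]})

def merge_df(df1, df2):
    return df1.merge(df2, how='inner', on=['year', 'race'])

# reduce computes merge_df(merge_df(el, ma), mtl)
merged = reduce(merge_df, [el, ma, mtl])
print(merged)  # only 2016 appears in all three frames
```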
```python
# merge the PTTR dataframe with the subjects dataframe and drop the unnecessary column
pttr_subject_df = pttr_df.merge(merged_subject_df, how='inner', on='year')
pttr_subject_df = pttr_subject_df.drop(columns='race')
display(pttr_subject_df.head())
```
A heatmap of the correlation between all pairs of variables in pttr_subject_df is used to examine the strength of the correlation between PTTR and academic performance. Scatter plots are then used to visualise this correlation further.
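Each cell of the heatmap is a pairwise Pearson coefficient from DataFrame.corr(), so the matrix is symmetric with 1s on the diagonal. A small sketch on invented data (the column names are hypothetical) shows how to read it: a steadily falling ratio against a steadily rising pass rate gives a coefficient near -1, while a noisier pass rate gives a weaker one.

```python
import numpy as np
import pandas as pd

# Invented toy data: a falling ratio, a rising pass rate, and a noisy pass rate
toy = pd.DataFrame({
    'pttr': [18.0, 17.0, 16.0, 15.0, 14.0],
    'el_pass': [80.0, 82.0, 83.0, 85.0, 88.0],
    'mtl_pass': [90.0, 89.0, 91.0, 90.0, 91.0],
})
corr = toy.corr()  # pairwise Pearson correlation by default
print(corr.round(2))
```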
```python
# matrix of Pearson correlation coefficients between all pairs of variables in the dataframe
correlation_df = pttr_subject_df.corr()

# plot a heatmap in the top half of the figure; fix the colour scale to [-1, 1]
# so that white corresponds to zero correlation on the diverging colour map
fig3 = plt.figure(figsize=(14, 14))
ax4 = fig3.add_subplot(211)
sns.heatmap(correlation_df, cmap='RdBu_r', vmin=-1, vmax=1, ax=ax4)
ax4.set_title('Correlation between year, PTTR and percentage O level passes by subjects', fontsize=16)

# plot the three scatter plots side by side in the bottom half
ax5 = fig3.add_subplot(234)
pttr_subject_df.plot(kind='scatter', x='sec_pupil_to_teacher', y='percentage_passed_olevel_el', ax=ax5)
ax5.set_title('Percentage O Level EL pass vs PTTR')
ax6 = fig3.add_subplot(235)
pttr_subject_df.plot(kind='scatter', x='sec_pupil_to_teacher', y='percentage_passed_olevels_math', ax=ax6)
ax6.set_title('Percentage O Level Math pass vs PTTR')
ax7 = fig3.add_subplot(236)
pttr_subject_df.plot(kind='scatter', x='sec_pupil_to_teacher', y='percentage_passed_olevels_mtl', ax=ax7)
ax7.set_title('Percentage O Level MTL pass vs PTTR')
fig3.subplots_adjust(wspace=0.5, hspace=0.8)
plt.show()

# correlation coefficients for reference
print('Pearson Correlation Coefficient between:')
print('1) PTTR and Percentage of O Level Students who passed English: ',
      correlation_df.loc['sec_pupil_to_teacher', 'percentage_passed_olevel_el'])
print('2) PTTR and Percentage of O Level Students who passed Math: ',
      correlation_df.loc['sec_pupil_to_teacher', 'percentage_passed_olevels_math'])
print('3) PTTR and Percentage of O Level Students who passed MTL: ',
      correlation_df.loc['sec_pupil_to_teacher', 'percentage_passed_olevels_mtl'])
```
Pearson Correlation Coefficient between:
1) PTTR and Percentage of O Level Students who passed English: -0.7163134536635376
2) PTTR and Percentage of O Level Students who passed Math: -0.7773872779478243
3) PTTR and Percentage of O Level Students who passed MTL: -0.33356330988327393
The heatmap and the negative correlation coefficients indicate that, overall, PTTR is inversely correlated with the pass percentages for all three O Level subjects. The scatter plots for English and Mathematics support this, whereas the scatter plot for Mother Tongue Language (MTL), together with its coefficient of only -0.33, suggests that the pass percentage for O Level MTL is only weakly correlated with PTTR.
Overall, the percentage of O Level students who passed English, Mathematics and MTL increased as the PTTR in secondary schools decreased over the years. This may be because a smaller PTTR (implying a smaller class size) encourages student participation and ensures that each student gets more individual attention from teachers. The correlation is slightly stronger for Mathematics (-0.78) than for English (-0.72), but much weaker for MTL (-0.33). Other factors, such as the quality of MTL teachers and teaching methods, are likely impeding improvements in MTL. Moreover, the strong correlations found for English and Mathematics do not necessarily mean that PTTR is the main cause of the better performance, since both PTTR and pass percentages also changed steadily over time. Further research must be done to determine what has improved students' performance in these subjects.
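One rough way to probe whether the correlation between PTTR and pass percentages is mostly a shared time trend is to correlate year-on-year changes (first differences) instead of levels. This is a swapped-in technique, not part of the original analysis, and the helper diff_corr and the toy numbers below are hypothetical. In the toy data the levels are strongly negatively correlated, yet the year-on-year changes are positively correlated, showing how a common trend can dominate a correlation of levels.

```python
import pandas as pd

def diff_corr(df, x, y):
    """Hypothetical helper: Pearson r between year-on-year changes in x and y."""
    changes = df.sort_values('year')[[x, y]].diff().dropna()
    return changes[x].corr(changes[y])

# Invented toy data; column names match pttr_subject_df for convenience
toy = pd.DataFrame({
    'year': [2000, 2001, 2002, 2003, 2004, 2005],
    'sec_pupil_to_teacher': [20.0, 19.0, 17.5, 17.0, 15.5, 15.0],
    'percentage_passed_olevels_math': [70.0, 72.0, 73.0, 75.0, 76.0, 78.0],
})
level_r = toy['sec_pupil_to_teacher'].corr(toy['percentage_passed_olevels_math'])
change_r = diff_corr(toy, 'sec_pupil_to_teacher', 'percentage_passed_olevels_math')
print(level_r, change_r)  # strongly negative in levels, positive in changes
```

On the real data, diff_corr(pttr_subject_df, 'sec_pupil_to_teacher', 'percentage_passed_olevels_math') would give the corresponding figure for Mathematics; with only a handful of years, any such coefficient should be read cautiously.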
It is also worth noting that the increases in pass percentage for English and Mathematics over the years appear to be much larger than the increase for MTL. The heatmap supports this only indirectly: it shows how strongly each pass percentage correlates with year, which measures how consistent the upward trend is rather than how large it is. The larger improvement may be due to the increasing focus on STEM (Science, Technology, Engineering and Mathematics) and the success of English-promoting campaigns such as the Speak Good English Movement. The improvements in English and Mathematics are particularly important as our increasingly globalised and digitalised economy demands better communication and analytical skills.
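The size of each increase can be read directly from the pass percentages rather than from the heatmap; for example, a pass rate that rose by only 0.1 points every year would still have a correlation of 1 with year. A minimal sketch, assuming a dataframe with a year column such as pttr_subject_df; the helper name overall_change and the toy values are hypothetical:

```python
import pandas as pd

def overall_change(df, cols):
    """Hypothetical helper: change in each column from the earliest to the latest year."""
    ordered = df.sort_values('year')
    return ordered[cols].iloc[-1] - ordered[cols].iloc[0]

# Invented toy data (rows deliberately out of order)
toy = pd.DataFrame({
    'year': [2010, 2000, 2005],
    'percentage_passed_olevel_el': [90.0, 70.0, 80.0],
    'percentage_passed_olevels_mtl': [90.0, 88.0, 89.0],
})
changes = overall_change(toy, ['percentage_passed_olevel_el', 'percentage_passed_olevels_mtl'])
print(changes)
```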
On the other hand, this observation also suggests that government efforts to promote the learning of MTL may have been less effective. Further research could examine how the learning of MTL could be improved. It remains important to protect our MTLs alongside English because of their cultural value, especially in Singapore's multi-racial society.
```python
# read the relevant csv files into dataframes
teacher_acad_df = pd.read_csv('teachers-in-schools-academic-qualification.csv')
o_postsec_df = pd.read_csv('percentage-of-o-level-cohort-that-progressed-to-post-secondary-education.csv')
n_postsec_df = pd.read_csv('percentage-of-n-level-cohort-that-progressed-to-post-secondary-education.csv')

# note: 'F' means female; 'MF' most likely means male and female combined (the total)
# rather than male only, so check the dataset's metadata before summing the two
display(teacher_acad_df.head(), o_postsec_df.head())
```
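First five rows of teacher_acad_df:

| | Year | Sex | Level of school | Academic qualification | Number of teachers |
|---|---|---|---|---|---|
| 0 | 1982 | MF | PRIMARY | BELOW GCE 'O' LEVEL | 278 |
| 1 | 1982 | F | PRIMARY | BELOW GCE 'O' LEVEL | 162 |
| 2 | 1982 | MF | SECONDARY | BELOW GCE 'O' LEVEL | 68 |
| 3 | 1982 | F | SECONDARY | BELOW GCE 'O' LEVEL | 23 |
| 4 | 1982 | MF | PRE-UNIVERSITY | BELOW GCE 'O' LEVEL | 1 |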
```python
# filter for secondary school teachers only
teacher_acad_df = teacher_acad_df[teacher_acad_df['level_of_school'] == 'SECONDARY'].drop(columns='level_of_school')

# keep the overall percentage rather than the percentages for each race, then merge
# with an outer join so that years present in either dataset are kept
o_postsec_df = o_postsec_df[o_postsec_df['race'] == 'Overall'].drop(columns='race')
n_postsec_df = n_postsec_df[n_postsec_df['race'] == 'Overall'].drop(columns='race')
postsec_df = n_postsec_df.merge(o_postsec_df, how='outer', on='year')
display(teacher_acad_df.head(), postsec_df)
```
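First five rows of teacher_acad_df after filtering for secondary school teachers:

| | Year | Sex | Academic qualification | Number of teachers |
|---|---|---|---|---|
| 2 | 1982 | MF | BELOW GCE 'O' LEVEL | 68 |
| 3 | 1982 | F | BELOW GCE 'O' LEVEL | 23 |
| 8 | 1982 | MF | GCE 'O' LEVEL | 2574 |
| 9 | 1982 | F | GCE 'O' LEVEL | 1141 |
| 14 | 1982 | MF | GCE 'A' LEVEL/DIPLOMA | 2500 |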
```python
# find the total number of teachers with each academic qualification in each year
# CAUTION: if 'MF' is the male+female total, summing the MF and F rows double counts
# female teachers; keeping only the MF rows before this step would avoid that
teacher_acad_df = (teacher_acad_df
                   .groupby(['year', 'academic_qualification'], as_index=False)['number_of_teachers']
                   .sum())
teacher_acad_pivot = teacher_acad_df.pivot(index='year', columns='academic_qualification',
                                           values='number_of_teachers')

# flatten the column index to prepare for plotting the stacked bar graph
teacher_acad_flat = teacher_acad_pivot.rename_axis(None, axis=1).reset_index()
display(teacher_acad_flat.head())

# create a new dataframe of percentages for the percentage stacked bar graph
teacher_acad_percent = pd.DataFrame(teacher_acad_flat['year'])
qual_list = list(teacher_acad_df['academic_qualification'].unique())
teacher_acad_flat['Total'] = 0
for qual in qual_list:
    teacher_acad_flat['Total'] += teacher_acad_flat[qual]
for qual in qual_list:
    teacher_acad_percent[qual + ' %'] = teacher_acad_flat[qual] / teacher_acad_flat['Total'] * 100
display(teacher_acad_percent.head())
```
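The two loops above compute each qualification's share of the yearly total. An equivalent vectorised sketch uses DataFrame.div with axis=0 so that each row is divided by its own total; the toy counts and this variant are illustrative, not the notebook's code.

```python
import numpy as np
import pandas as pd

# Invented toy counts of teachers by qualification, indexed by year
counts = pd.DataFrame({
    'year': [2007, 2008],
    'HONOURS DEGREE': [30, 45],
    'PASS DEGREE': [50, 60],
    'MASTERS DEGREE': [20, 15],
}).set_index('year')

# divide each row by its own row total; axis=0 aligns the totals with the row index
percent = counts.div(counts.sum(axis=1), axis=0) * 100
print(percent)
```

One difference from the loop: sum(axis=1) skips missing values, whereas the loop's running total becomes NaN if any qualification is missing for a year. Call reset_index() afterwards to turn year back into a column.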
Columns of teacher_acad_flat: year, BELOW GCE 'O' LEVEL, GCE 'A' LEVEL/DIPLOMA, GCE 'O' LEVEL, HONOURS DEGREE, MASTERS DEGREE, PASS DEGREE, PHD (number of teachers).
Columns of teacher_acad_percent: year, followed by the same seven qualifications with a ' %' suffix (percentage of secondary school teachers in that year).
```python
fig4 = plt.figure(figsize=(20, 14))
ax5 = fig4.add_subplot(111)
ax6 = ax5.twinx()

# percentage stacked bar graph to show the change in academic qualifications of secondary school teachers
bar_df = teacher_acad_percent[teacher_acad_percent['year'] >= 2007].reset_index(drop=True)
bar_df.plot(kind='bar', x='year', stacked=True, ax=ax5)

# bars are drawn at positions 0, 1, 2, ...; align the line data with those positions by year
line_df = bar_df[['year']].merge(postsec_df, how='left', on='year')

# line graphs to show the change in percentage of O/N Level students who progressed to post-secondary education
line_df.plot(kind='line', y='percentage_o_level_progressed_to_post_secondary_education', ax=ax6, color='b', lw=5)
line_df.plot(kind='line', y='percentage_progressed_to_post_sec_education', ax=ax6, color='k', lw=5)

# set various properties of the graph
ax5.set_ylim((0, 120))
ax6.set_ylim((88, 102))
ax5.set_title("Teachers' academic qualifications and Students' progress to post secondary education from 2007 to 2017", fontsize=18)
ax5.set_xlabel("Year", fontsize=14)
ax5.set_ylabel("Percentage of Teachers with each academic qualification", fontsize=14)
ax6.set_ylabel("Percentage of O/N Level Students", fontsize=14, rotation=270, labelpad=20)
ax5.legend(loc=2, fontsize=10)
ax6.legend(['% progressed to post secondary education (O LEVELS)',
            '% progressed to post secondary education (N LEVELS)'], loc=1, fontsize=10)
plt.show()
```
From 2007 to 2017, the academic qualifications of secondary school teachers improved, with a larger percentage of teachers holding degrees; in particular, the percentages of teachers with Masters and Honours degrees increased. Over the same period, the percentage of O Level students who went on to post-secondary education rose only marginally (from 97.8% to 98%), while the percentage of N Level students who did so rose relatively sharply (from 88% to 96%).
The latter trend could be because more academically qualified teachers can use their greater knowledge to help students grasp, and stay interested in, school subjects, encouraging them to pursue post-secondary education. The small change for O Level students may partly reflect that their progression rate was already close to 100%. Overall, however, this part of the investigation is inconclusive, and further research should be done to see how the academic qualifications of teachers might affect students.
To summarise, the following insights were drawn from the three-part investigation:
From these insights, the following inferences could be made, although more research would be required to confirm these inferences:
Based on this, further research in the following areas is recommended: