The survey aimed to assess the best syntax for the implementation of Refinement Types in Java.
Refinement types have been proposed as an incremental approach for program verification that is directly embedded in the programming language. We propose the usage of Refinement Types within the Java programming language not only as a means for program verification but also for fault localization and efficient mutation in the context of software repair.
The survey was conducted in the context of the master thesis "LiquidJava: Extending Java with Refinements" by Catarina Gamboa, advised by professor Alcides Fonseca, at Lasige, Faculdade de Ciências da Universidade de Lisboa.
The survey was open for 7 weeks, from 2020/12/03 to 2021/01/22, and had a total of 53 answers which will be analysed in thoughout this document.
import re
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
def load_answers (csv_file):
lines = open(csv_file, 'r').readlines()
ans_matrix = []
for l in lines:
l = l.rstrip('\n')
ans = l.split(",")
#only get answers and not lines with the final comment
if(ans != [] and ans[0]!= "" and
(ans[0][1:5] == "2020" or ans[0][1:5] == "2021")):
ans = ans[1:18] #remove timestamp, comments and email
for i in range(len(ans)) :
ans[i] = re.sub('\=.*', '', ans[i])
ans[i] = re.sub('"', '', ans[i])
ans_matrix.append(ans)
return ans_matrix
answers_array = load_answers('Refinements in Java - Syntax.csv')
print("Number of answers:", len(answers_array),"\n")
notFamiliar, vagFamiliar, familiar, veryFamiliar = \
'Not familiar', 'Vaguely familiar', 'Familiar', 'Very familiar'
notAccept, accept, pref = 'Not acceptable', 'Acceptable', 'Preferable'
fJava, fFun, fJML, fRefs = "fam_java","fam_functional","fam_jml", "fam_refs"
wildC, var, mets, alias, ghost = "wildcard_", "variables_", "methods_", "alias_", "ghost_"
column_names = [fJava, fFun, fJML, fRefs]
column_names += [wildC + str(i) for i in range(3)]
column_names += [var + str(i) for i in range(2)]
column_names += [mets + str(i) for i in range(2)]
column_names += [alias + str(i) for i in range(3)]
column_names += [ghost + str(i) for i in range(3)]
answers_pd = pd.DataFrame(answers_array, columns=column_names)
answers_pd[:5]#showcase
This section aims to understand the background of respondents of the survey.
import random
import math
def getFamiliaritySize(pd, colName):
a = len(pd.loc[pd[colName]== notFamiliar])
b = len(pd.loc[pd[colName]== vagFamiliar])
c = len(pd.loc[pd[colName]== familiar])
d = len(pd.loc[pd[colName]== veryFamiliar])
return [d,c,b,a]
def donutChart(pd, title, colName):
# The slices will be ordered and plotted counter-clockwise.
labels = [notFamiliar, vagFamiliar, familiar, veryFamiliar][::-1]
sizes = getFamiliaritySize(pd, colName)
colors = ['yellowgreen', 'gold', 'turquoise', 'tomato']#lightcoral
explode = (0, 0, 0, 0) # explode a slice if required
patches, texts, autotexts = plt.pie(sizes, explode=explode, labels=labels, colors=colors,
autopct='%1.1f%%', shadow=True)
[autotexts[i].set_fontsize(14) for i in range(4)]
#draw a circle at the center of pie to make it look like a donut
centre_circle = plt.Circle((0,0),0.4,color='black', fc='white',linewidth=1.25)
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
fig.suptitle(title, fontsize=18, fontweight='bold')
# Set aspect ratio to be equal so that pie is drawn as a circle.
plt.axis('equal')
def horizontalBarsPlot(title, pd, colName, palette="rocket", percentage=True):
lSize = getFamiliaritySize(pd, colName)
maxY = max(lSize)
x = 8 if maxY > 10 else 4
f, axs = plt.subplots(1, 1, figsize=(x, 6), sharex=True)
f.suptitle(title, fontsize=16)
sns.set_theme(style="white", context="talk")
x = np.array(["Not Familiar", "Vaguely Familiar","Familiar","Very Familiar"][::-1])
y1 = np.array(lSize)
splot = sns.barplot(x=y1, y=x, palette=palette, ax=axs)
i = 0
for p in splot.patches:
height = int(lSize[i]) if not percentage else'{:.1f}'.format((lSize[i]/sum(lSize))*100)+"%"
splot.annotate(height,
(p.get_x()+lSize[i]+1 , p.get_height()+i-0.7),
ha = 'left', va = 'center',
xytext = (0, 9),
textcoords = 'offset points')
i+=1
axs.set_xlim([0, maxY+5])
Once the survey aims to access the best syntax for the usage of Refinements in Java it is important that the respondents have some familiarity with the language, otherwise the answers may not be demonstrative of the population that uses Java and that might use the Refinements for Java.
horizontalBarsPlot("Familiarity with Java", answers_pd, fJava, palette="tab10", percentage=True)
The chart above shows that some respondents marked the option "Not familiar" with Java which raises the question if these answers should be used in the study. To access this, we can analyse the other familiarity answers of these respondents.
answers_pd.loc[answers_pd[fJava]== notFamiliar]
The 5.7% of respondants not familiar with Java correspond to 3 of the answers of the survey. From these three, two marked the remaining background questions as "Not Familiar" which leads to the remotion of these two answers as the reponders are not inserted in the target population.
answers_pd = answers_pd.drop(38)
answers_pd = answers_pd.drop(40)
answers_pd = answers_pd.drop(50)
The first implementations of Refinement Types were developed in functional languages. In this survey, some of the proposed syntaxes follow similar lines to other implementations of refinements types, hence, have some similarities to functional languages syntaxes.
horizontalBarsPlot("Familiarity with Functional Languages", answers_pd, fFun, palette="tab10")
Java Modeling Language is a specification language created for Java.
With the addition of the refinements we are also adding specifications to Java code, which means that the syntax chosen will serve as a specification language.
horizontalBarsPlot("Familiarity with JML", answers_pd, fJML, palette="tab10", percentage=True)
horizontalBarsPlot("", answers_pd, fRefs, palette="tab10", percentage=True)
From the reponders that are Familiar or Very Familiar with Refinement Types, all claim to be at least Vaguely familiar with Functional Languages, and 62.5% of them identify as Very familiar, which is expected, once the research on refinement types has been done mostly in funcional languages.
refinements_pd = pd.concat([answers_pd.loc[answers_pd[fRefs]== veryFamiliar], answers_pd.loc[answers_pd[fRefs]== familiar]])
horizontalBarsPlot("Familiar with Functional Languages given Familiarity with Refinements",
refinements_pd, fFun, palette="tab10", percentage=True)
#trocar titulo
def getListSizes(pd, colName):
a = len(pd.loc[answers_pd[colName]== notAccept])
b = len(pd.loc[answers_pd[colName]== accept])
c = len(pd.loc[answers_pd[colName]== pref])
return [a,b,c]
def barsPlot(title, labelNames, pd, column_prefix, indexes, palette="rocket", zoomX=0, zoomY=0, percentage=False):
size = len(indexes)
figXsize = 7*size if size < 3 else 6*size;
f, axs = plt.subplots(1, size, figsize=(figXsize+zoomX, 5+zoomY), sharex=True)
f.suptitle(title, fontsize=16)
sns.set_theme(style="white", context="talk")
maxY = 0
for i in range(size):
lSize = getListSizes(pd, column_prefix+str(indexes[i]))
x = np.array(["Not Accept.", "Acceptable","Preferable"])
y1 = np.array(lSize)
splot = sns.barplot(x=x, y=y1, palette=palette, ax=axs[i])
axs[i].axhline(0, color="k", clip_on=False)
axs[i].set_xlabel(labelNames[i], fontsize=15+zoomX)
maxY = max(lSize) if max(lSize) > maxY else maxY
for p in splot.patches:
height = int(p.get_height()) if not percentage else '{:.1f}'.format((p.get_height()/sum(lSize))*100)+"%"
splot.annotate(height,
(p.get_x() + p.get_width() / 2., p.get_height()),
ha = 'center', va = 'center',
xytext = (0, 9),
textcoords = 'offset points')
for i in range(size):
axs[i].set_ylim([0, maxY+4])
labelNames = ['@Refinement("\\v > 0")', '@Refinement("? > 0")', '@Refinement("_ > 0")']
barsPlot("Wildcard Variables", labelNames, answers_pd, wildC, range(3))
From the gathered answers the first option can be discarded as a possible syntax due to the high rate of "Not Acceptable" answers compared to the others.
To choose between the second and the third options, a more detailed approach needs to be taken.
The "?" option has the lowest rate of "Not Acceptable" and the highest rate of "Acceptable" but contains less "Preferable" answers than the "_" option. While the 3rd option has an higher rate of "Not Acceptable" answers, a lower rate of "Acceptable" ones and the highest number of "Preferable". This leads to an inconclusive answers to which option should be used in the refinements.
To differentiate the last two approaches, it was decided analyse the sideffects of removing the answers that marked the first option(already excluded) as "Preferable". The remotion effects are shown in the plot below.
answers_pd2 = answers_pd.copy()
answers_pd2.drop(answers_pd2 .loc[answers_pd2["wildcard_0"] == pref].index, inplace=True)
labelNames = labelNames[1:]
barsPlot('Wildcard Variables excluding marked as "Preferable" in 1st', labelNames, answers_pd2, wildC, [1,2])
Without counting with the answers that prefered the 1st option, the rate of "Not Acceptable" answers is the same for the 2nd and 3rd options, while the "Preferable" rate stays the same.
This leads to the choice of _ as the Wildcard Variable.
labelNames = ['@Refinement("negativeGrade < 10")\nint negativeGrade = 8;',
""'@Refinement("{x | x < 10}")\nint negativeGrade = 8;']
barsPlot("Variable Refinements", labelNames, answers_pd, var, range(2), palette="viridis")
From this answers we can conclude that the first option was the preferable one. Although, it is possible to see that the second option was also acceptable, and the rate of "Not Acceptable" answers was quite low in both options.
This leads to the choice of the first option as the Syntax for the Refinements in Variables.
labelNames = ['@Refinement("\\v >= 0 && \\v <= 100") \n static int percentageFromGrade( \n @Refinement("grade >= 0") int grade,\n @Refinement("scale > 0") int scale){...}',
'@Refinement("{grade >= 0} -> {scale > 0} -> {\\v >= 0 && \\v <= 100}")\n static int percentageFromGrade(int grade, int scale){...}']
barsPlot("Refinements in Methods", labelNames, answers_pd, mets, range(2))
This answers show a preference for the first syntax over the second, once the first has an higher rate of "Preferable" answers and a lower rate of "Not Acceptable" answers.
By this analysis the chosen option is the first one.
labelNames = ['@Refinement("PtGrade refines Integer where \n(\\v >= 0 && \\v <= 20)")\n...\n@Refinement("positiveGrade == PtGrade")',
'@Refinement("type PtGrade(int x) { x >= 0 && x <= 20}")\n...\n@Refinement("PtGrade(positiveGrade)")',
'@Positive (2 files)']
barsPlot("Refinement Alias", labelNames, answers_pd, alias, range(3), palette='viridis')
For the alising of refinements, the second option shows the best answers, once it has the lowest number of "Not Acceptable" answers and the highest of "Preferable" ones.
This concludes that the second option is the chosen for the refinement alias.
labelNames = ['Above the class',
'Above any method',
'Above class attributes']
barsPlot("Ghost Functions", labelNames, answers_pd, alias, range(3))
The second choice is a clear choice for the positioning of the ghost functions, once it obtained the lowest number of "Not Acceptable" answers and the highest number of "Preferable" answers.
The chosen option for the ghost functions is the second one.
Along this survey there were two refinements options strongly inspired by the syntax used in functional languages and in refinements for functional languages - specifically LiquidHaskell:
In this subsection we try to evaluate if the background of familiarity with functional languages is linked to the preference on each of this syntaxes. For this we select a total of 26 answers out of the 50 used throughout the study.
functional_pd = pd.concat([answers_pd.loc[answers_pd[fFun]== veryFamiliar], answers_pd.loc[answers_pd[fFun]== familiar]])
len(functional_pd)
def stackPlot(pd1, pd2, colName, title, xleg):
#Set general plot properties
sns.set_style("white")
sns.set_context({"figure.figsize": (6, 4)})
colorBot = '#e35430'
colorUp = "#ff8630"
sns.set_theme(style="white", context="talk")
#Plot 1 - background - "total" (top) series
lSize = getListSizes(pd1, colName)
x = np.array(["Not Accept.", "Acceptable","Preferable"])
y1 = np.array(lSize)
splot = sns.barplot(x=x, y=y1, color = colorUp)
#Plot 2 - overlay - "bottom" series
lSize = getListSizes(pd2, colName)
x = np.array(["Not Accept.", "Acceptable","Preferable"])
y1 = np.array(lSize)
bottom_plot = sns.barplot(x = x, y = y1, color = colorBot)
topbar = plt.Rectangle((0,0),1,1,fc=colorUp, edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc=colorBot, edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['Familiar with Functional Languages', 'Remaining sample'], loc=0, ncol = 1, prop={'size':10})
l.draw_frame(False)
plt.suptitle(title, size=14)
#Optional code - Make plot look nicer
sns.despine(left=False)
bottom_plot.set_ylabel("Number of answers")
bottom_plot.set_xlabel(xleg)
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
item.set_fontsize(12)
stackPlot(answers_pd, functional_pd, var+"0", "Refinements in Variables - option 1", '@Refinement("negative < 10")\nint negativeGrade = 8;')
plt.figure().clear()
stackPlot(answers_pd, functional_pd, var+"1", "Refinements in Variables - option 2", '@Refinement("{x | x < 10}")\nint negativeGrade = 8;')
For these options there is no clear difference between the responders that are acquainted with functional languages and the global sample once both graphic show simiar results and a preference for the first option.
However we can see that all the answers that mark the first option as Not Acceptable are from responders that are familiar with functional languages and that prefered the second option.
answers_pd.loc[answers_pd[var+"0"]== notAccept][[fFun, var+'0', var+'1']]
labelNames = ['@Refinement("\\v >= 0") \n static int percentageFromGrade( \n @Refinement("grade >= 0") int grade,\n @Refinement("scale > 0") int scale){...}',
'@Refinement("{grade >= 0} -> {scale > 0} -> {\\v >= 0 && \\v <= 100}")\n static int percentageFromGrade(int grade, int scale){...}']
stackPlot(answers_pd, functional_pd, mets+"0", "Refinements in Methods - option 1", labelNames[0])
plt.figure().clear()
stackPlot(answers_pd, functional_pd, mets+"1", "Refinements in Methods - option 2", labelNames[1])
Again, the results of the responders acquainted to Functional Languages are similar to the global sample. The charts above represent the percentage of option answers from the total of each sample. The percentages show that both samples have similar distribution results with the highest difference of 10.4% in the Acceptable answers on the first option.
Neither of the cases strongly inspired by functional languages show a discrepancy between the total of responders and the responders familiar with Functional Languages, which leads to the conclusion that there the familiarity with functional languages was not a big influence in the choices of the syntax.
listFamiliarCat = [notFamiliar, vagFamiliar, familiar, veryFamiliar][::-1]
listPreferenceCat = [notAccept, accept, pref]
def setCategorical(df, column_names, cats, jitter=False):
df = df.copy()
for cat in column_names:
df[cat] = df[cat].astype(pd.api.types.CategoricalDtype(categories = cats, ordered=True))
cat_columns = df.select_dtypes(['category']).columns
df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes if not jitter else np.random.uniform(x.cat.codes-0.15, x.cat.codes+0.15))
return df
def pairPlot(df, y, first_columns, cats=listFamiliarCat, pal='tab10',hue=None):
df[y] = df[y].astype(pd.api.types.CategoricalDtype(categories = cats, ordered=True))
h = y if hue == None else hue
sns.set_theme(style="ticks")
sns.pairplot(df, y_vars=[y], x_vars=first_columns, hue=h, palette=pal, diag_kind = None)
sns.pairplot(df, y_vars=[y], x_vars=column_names[4:9], hue=h, palette=pal, diag_kind = None)
sns.pairplot(df, y_vars=[y], x_vars=column_names[9:-3], hue=h, palette=pal, diag_kind = None)
sns.pairplot(df, y_vars=[y], x_vars=column_names[-2:], hue=h, palette=pal, diag_kind = None)
answersCat_pd = setCategorical(answers_pd, column_names[4:], [notAccept, accept, pref], jitter=True)
answersCat_pd[:5]
allCat_pd = setCategorical(answersCat_pd, [fFun, fJML, fRefs],
[notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(allCat_pd, fJava, column_names[1:4])
all_pd = setCategorical(answersCat_pd, [fJava, fJML, fRefs],
[notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(all_pd, fFun, [column_names[0]]+column_names[2:4])
all_pd = setCategorical(answersCat_pd, [fJava, fFun, fRefs],
[notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(all_pd, fJML, column_names[:2]+[column_names[3]])
all_pd = setCategorical(answersCat_pd, [fJava, fFun, fJML],
[notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(all_pd, fRefs, column_names[:3])
This section includes some comments considered interesting.