Analysis of the Survey

Refinements in Java - Syntax

The survey aimed to assess the best syntax for the implementation of Refinement Types in Java.

Refinement types have been proposed as an incremental approach for program verification that is directly embedded in the programming language. We propose the usage of Refinement Types within the Java programming language not only as a means for program verification but also for fault localization and efficient mutation in the context of software repair.

The survey was conducted in the context of the master thesis "LiquidJava: Extending Java with Refinements" by Catarina Gamboa, advised by professor Alcides Fonseca, at Lasige, Faculdade de Ciências da Universidade de Lisboa.

The survey was open for 7 weeks, from 2020/12/03 to 2021/01/22, and had a total of 53 answers which will be analysed in thoughout this document.

In [1]:
import re
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def load_answers (csv_file):
    lines = open(csv_file, 'r').readlines()
    ans_matrix = []
    for l in lines:
        l = l.rstrip('\n')

        ans = l.split(",")
        #only get answers and not lines with the final comment
        if(ans != [] and ans[0]!= "" and
           (ans[0][1:5] == "2020" or ans[0][1:5] == "2021")):
            ans = ans[1:18] #remove timestamp, comments and email
            for i in range(len(ans)) :
                ans[i] = re.sub('\=.*', '', ans[i])
                ans[i] = re.sub('"', '', ans[i])
            ans_matrix.append(ans)
    return ans_matrix

answers_array = load_answers('Refinements in Java - Syntax.csv')
print("Number of answers:", len(answers_array),"\n")
Number of answers: 53

In [2]:
notFamiliar, vagFamiliar, familiar, veryFamiliar = \
'Not familiar', 'Vaguely familiar', 'Familiar', 'Very familiar'
notAccept, accept, pref = 'Not acceptable', 'Acceptable', 'Preferable'
fJava, fFun, fJML, fRefs = "fam_java","fam_functional","fam_jml", "fam_refs"
wildC, var, mets, alias, ghost = "wildcard_", "variables_", "methods_", "alias_", "ghost_"


column_names = [fJava, fFun, fJML, fRefs]
column_names += [wildC + str(i) for i in range(3)]
column_names += [var + str(i) for i in range(2)]
column_names += [mets + str(i) for i in range(2)]
column_names += [alias + str(i) for i in range(3)]
column_names += [ghost + str(i) for i in range(3)]
answers_pd = pd.DataFrame(answers_array, columns=column_names)
answers_pd[:5]#showcase
Out[2]:
fam_java fam_functional fam_jml fam_refs wildcard_0 wildcard_1 wildcard_2 variables_0 variables_1 methods_0 methods_1 alias_0 alias_1 alias_2 ghost_0 ghost_1 ghost_2
0 Familiar Vaguely familiar Familiar Vaguely familiar Acceptable Preferable Preferable Preferable Acceptable Preferable Acceptable Preferable Acceptable Preferable Preferable Acceptable Preferable
1 Familiar Vaguely familiar Not familiar Not familiar Not acceptable Preferable Acceptable Acceptable Preferable Acceptable Not acceptable Preferable Not acceptable Acceptable Acceptable Acceptable Acceptable
2 Familiar Not familiar Vaguely familiar Not familiar Acceptable Preferable Not acceptable Preferable Acceptable Acceptable Preferable Acceptable Acceptable Preferable Not acceptable Not acceptable Preferable
3 Very familiar Vaguely familiar Not familiar Not familiar Not acceptable Preferable Acceptable Preferable Not acceptable Acceptable Not acceptable Acceptable Preferable Not acceptable Not acceptable Preferable Acceptable
4 Familiar Familiar Not familiar Vaguely familiar Preferable Acceptable Not acceptable Preferable Acceptable Preferable Acceptable Acceptable Acceptable Not acceptable Preferable Acceptable Acceptable

1. Background Information - Familiarity with tools

This section aims to understand the background of respondents of the survey.

In [3]:
import random
import math
def getFamiliaritySize(pd, colName):
    a = len(pd.loc[pd[colName]== notFamiliar])
    b = len(pd.loc[pd[colName]== vagFamiliar])
    c = len(pd.loc[pd[colName]== familiar])
    d = len(pd.loc[pd[colName]== veryFamiliar])
    return [d,c,b,a]

def donutChart(pd, title, colName):
    # The slices will be ordered and plotted counter-clockwise.
    labels = [notFamiliar, vagFamiliar, familiar, veryFamiliar][::-1]
    sizes = getFamiliaritySize(pd, colName)
    colors = ['yellowgreen', 'gold', 'turquoise', 'tomato']#lightcoral
    explode = (0, 0, 0, 0)  # explode a slice if required

    patches, texts, autotexts = plt.pie(sizes, explode=explode, labels=labels, colors=colors,
            autopct='%1.1f%%', shadow=True)
    [autotexts[i].set_fontsize(14) for i in range(4)]

    #draw a circle at the center of pie to make it look like a donut
    centre_circle = plt.Circle((0,0),0.4,color='black', fc='white',linewidth=1.25)
    fig = plt.gcf()
    fig.gca().add_artist(centre_circle)
    fig.suptitle(title, fontsize=18, fontweight='bold')
    # Set aspect ratio to be equal so that pie is drawn as a circle.
    plt.axis('equal')


def horizontalBarsPlot(title, pd, colName, palette="rocket", percentage=True):
    lSize = getFamiliaritySize(pd, colName)
    maxY = max(lSize)
    x = 8 if maxY > 10 else 4
    f, axs = plt.subplots(1, 1, figsize=(x, 6), sharex=True)
    f.suptitle(title, fontsize=16)
    sns.set_theme(style="white", context="talk")

    x = np.array(["Not Familiar", "Vaguely Familiar","Familiar","Very Familiar"][::-1])
    y1 = np.array(lSize)
    splot = sns.barplot(x=y1, y=x, palette=palette, ax=axs)


    i = 0
    for p in splot.patches:
        height = int(lSize[i]) if not percentage else'{:.1f}'.format((lSize[i]/sum(lSize))*100)+"%"
        splot.annotate(height,
               (p.get_x()+lSize[i]+1 , p.get_height()+i-0.7),
               ha = 'left', va = 'center',
               xytext = (0, 9),
               textcoords = 'offset points')
        i+=1
    axs.set_xlim([0, maxY+5])

1.1 Familiarity with Java

Once the survey aims to access the best syntax for the usage of Refinements in Java it is important that the respondents have some familiarity with the language, otherwise the answers may not be demonstrative of the population that uses Java and that might use the Refinements for Java.

In [4]:
horizontalBarsPlot("Familiarity with Java", answers_pd, fJava, palette="tab10", percentage=True)

The chart above shows that some respondents marked the option "Not familiar" with Java which raises the question if these answers should be used in the study. To access this, we can analyse the other familiarity answers of these respondents.

In [5]:
answers_pd.loc[answers_pd[fJava]== notFamiliar]
Out[5]:
fam_java fam_functional fam_jml fam_refs wildcard_0 wildcard_1 wildcard_2 variables_0 variables_1 methods_0 methods_1 alias_0 alias_1 alias_2 ghost_0 ghost_1 ghost_2
38 Not familiar Not familiar Not familiar Not familiar Preferable Preferable Preferable Acceptable Preferable Acceptable Preferable Preferable Preferable Acceptable Acceptable Preferable Preferable
40 Not familiar Not familiar Not familiar Not familiar Acceptable Preferable Acceptable Acceptable Preferable Preferable Acceptable Acceptable Acceptable Preferable Acceptable Preferable Acceptable
50 Not familiar Very familiar Not familiar Vaguely familiar Not acceptable Acceptable Preferable Preferable Acceptable Preferable Acceptable Not acceptable Preferable Acceptable Acceptable Acceptable Acceptable

The 5.7% of respondants not familiar with Java correspond to 3 of the answers of the survey. From these three, two marked the remaining background questions as "Not Familiar" which leads to the remotion of these two answers as the reponders are not inserted in the target population.

In [6]:
answers_pd = answers_pd.drop(38)
answers_pd = answers_pd.drop(40)
answers_pd = answers_pd.drop(50)

1.2 Familiarity with Functional Languages

The first implementations of Refinement Types were developed in functional languages. In this survey, some of the proposed syntaxes follow similar lines to other implementations of refinements types, hence, have some similarities to functional languages syntaxes.

In [7]:
horizontalBarsPlot("Familiarity with Functional Languages", answers_pd, fFun, palette="tab10")

1.3 Familiarity with JML

Java Modeling Language is a specification language created for Java.

With the addition of the refinements we are also adding specifications to Java code, which means that the syntax chosen will serve as a specification language.

In [8]:
horizontalBarsPlot("Familiarity with JML", answers_pd, fJML, palette="tab10", percentage=True)

1.4 Familiarity with Refinement Types

In [9]:
horizontalBarsPlot("", answers_pd, fRefs, palette="tab10", percentage=True)

From the reponders that are Familiar or Very Familiar with Refinement Types, all claim to be at least Vaguely familiar with Functional Languages, and 62.5% of them identify as Very familiar, which is expected, once the research on refinement types has been done mostly in funcional languages.

In [10]:
refinements_pd = pd.concat([answers_pd.loc[answers_pd[fRefs]== veryFamiliar], answers_pd.loc[answers_pd[fRefs]== familiar]])
horizontalBarsPlot("Familiar with Functional Languages given Familiarity with Refinements",
                   refinements_pd, fFun, palette="tab10", percentage=True)
#trocar titulo

2. Preferences on the Syntax

In [11]:
def getListSizes(pd, colName):
    a = len(pd.loc[answers_pd[colName]== notAccept])
    b = len(pd.loc[answers_pd[colName]== accept])
    c = len(pd.loc[answers_pd[colName]== pref])
    return [a,b,c]


def barsPlot(title, labelNames, pd, column_prefix, indexes, palette="rocket", zoomX=0, zoomY=0, percentage=False):
    size = len(indexes)
    figXsize = 7*size if size < 3 else 6*size;
    f, axs = plt.subplots(1, size, figsize=(figXsize+zoomX, 5+zoomY), sharex=True)
    f.suptitle(title, fontsize=16)
    sns.set_theme(style="white", context="talk")
    maxY = 0
    for i in range(size):
        lSize = getListSizes(pd, column_prefix+str(indexes[i]))
        x = np.array(["Not Accept.", "Acceptable","Preferable"])
        y1 = np.array(lSize)
        splot = sns.barplot(x=x, y=y1, palette=palette, ax=axs[i])
        axs[i].axhline(0, color="k", clip_on=False)
        axs[i].set_xlabel(labelNames[i], fontsize=15+zoomX)
        maxY = max(lSize) if max(lSize) > maxY else maxY
        for p in splot.patches:
            height =  int(p.get_height()) if not percentage else '{:.1f}'.format((p.get_height()/sum(lSize))*100)+"%"
            splot.annotate(height,
                   (p.get_x() + p.get_width() / 2., p.get_height()),
                   ha = 'center', va = 'center',
                   xytext = (0, 9),
                   textcoords = 'offset points')

    for i in range(size):
        axs[i].set_ylim([0, maxY+4])

2.1 Wildcard Variable

In [12]:
labelNames = ['@Refinement("\\v > 0")', '@Refinement("? > 0")', '@Refinement("_ > 0")']
barsPlot("Wildcard Variables", labelNames, answers_pd, wildC, range(3))

From the gathered answers the first option can be discarded as a possible syntax due to the high rate of "Not Acceptable" answers compared to the others.

To choose between the second and the third options, a more detailed approach needs to be taken.

The "?" option has the lowest rate of "Not Acceptable" and the highest rate of "Acceptable" but contains less "Preferable" answers than the "_" option. While the 3rd option has an higher rate of "Not Acceptable" answers, a lower rate of "Acceptable" ones and the highest number of "Preferable". This leads to an inconclusive answers to which option should be used in the refinements.

To differentiate the last two approaches, it was decided analyse the sideffects of removing the answers that marked the first option(already excluded) as "Preferable". The remotion effects are shown in the plot below.

In [13]:
answers_pd2 = answers_pd.copy()
answers_pd2.drop(answers_pd2 .loc[answers_pd2["wildcard_0"] == pref].index, inplace=True)
labelNames = labelNames[1:]
barsPlot('Wildcard Variables excluding marked as "Preferable" in 1st', labelNames, answers_pd2, wildC, [1,2])

Without counting with the answers that prefered the 1st option, the rate of "Not Acceptable" answers is the same for the 2nd and 3rd options, while the "Preferable" rate stays the same.

This leads to the choice of _ as the Wildcard Variable.

2.2 Variable Refinements

In [14]:
labelNames = ['@Refinement("negativeGrade < 10")\nint negativeGrade = 8;',
              ""'@Refinement("{x | x < 10}")\nint negativeGrade = 8;']
barsPlot("Variable Refinements", labelNames, answers_pd, var, range(2), palette="viridis")

From this answers we can conclude that the first option was the preferable one. Although, it is possible to see that the second option was also acceptable, and the rate of "Not Acceptable" answers was quite low in both options.

This leads to the choice of the first option as the Syntax for the Refinements in Variables.

2.3 Refinements in Methods

In [15]:
labelNames = ['@Refinement("\\v >= 0 && \\v <= 100") \n static int percentageFromGrade(        \n           @Refinement("grade >= 0") int grade,\n              @Refinement("scale > 0")  int scale){...}',
              '@Refinement("{grade >= 0} -> {scale > 0} -> {\\v >= 0 && \\v <= 100}")\n static int percentageFromGrade(int grade, int scale){...}']
barsPlot("Refinements in Methods", labelNames, answers_pd, mets, range(2))

This answers show a preference for the first syntax over the second, once the first has an higher rate of "Preferable" answers and a lower rate of "Not Acceptable" answers.

By this analysis the chosen option is the first one.

2.4 Refinement Alias

In [16]:
labelNames = ['@Refinement("PtGrade refines Integer where \n(\\v >= 0 && \\v <= 20)")\n...\n@Refinement("positiveGrade == PtGrade")',
              '@Refinement("type PtGrade(int x) { x >= 0 && x <= 20}")\n...\n@Refinement("PtGrade(positiveGrade)")',
              '@Positive (2 files)']
barsPlot("Refinement Alias", labelNames, answers_pd, alias, range(3), palette='viridis')

For the alising of refinements, the second option shows the best answers, once it has the lowest number of "Not Acceptable" answers and the highest of "Preferable" ones.

This concludes that the second option is the chosen for the refinement alias.

2.5 Ghost/Uninterpreted Functions

In [17]:
labelNames = ['Above the class',
              'Above any method',
              'Above class attributes']
barsPlot("Ghost Functions", labelNames, answers_pd, alias, range(3))

The second choice is a clear choice for the positioning of the ghost functions, once it obtained the lowest number of "Not Acceptable" answers and the highest number of "Preferable" answers.

The chosen option for the ghost functions is the second one.

3. Other considerations

3.1 Refinements inspired in Functional Languages and Responders Answers

Along this survey there were two refinements options strongly inspired by the syntax used in functional languages and in refinements for functional languages - specifically LiquidHaskell:

  • @Refinement("{x | x < 10}") - This is similar to the syntax to add a refinement to a variable in LiquidHaskell
  • @Refinement("{grade >= 0} -> {scale > 0} -> {\v >= 0 && \v <= 100}") - The use of -> is a common way to specify the function types in functional languages, where the item after the last arrow refers to the return and all the items before refere to the parameters of the function. In LiquidHaskell the use of the arrow is combined with the braces to identify the the refinements for each of the parameters and the return

In this subsection we try to evaluate if the background of familiarity with functional languages is linked to the preference on each of this syntaxes. For this we select a total of 26 answers out of the 50 used throughout the study.

In [18]:
functional_pd = pd.concat([answers_pd.loc[answers_pd[fFun]== veryFamiliar], answers_pd.loc[answers_pd[fFun]== familiar]])
len(functional_pd)
Out[18]:
25
In [19]:
def stackPlot(pd1, pd2, colName, title, xleg):
    #Set general plot properties
    sns.set_style("white")
    sns.set_context({"figure.figsize": (6, 4)})
    colorBot = '#e35430'
    colorUp = "#ff8630"
    sns.set_theme(style="white", context="talk")

    #Plot 1 - background - "total" (top) series
    lSize = getListSizes(pd1, colName)
    x = np.array(["Not Accept.", "Acceptable","Preferable"])
    y1 = np.array(lSize)
    splot = sns.barplot(x=x, y=y1, color = colorUp)

    #Plot 2 - overlay - "bottom" series
    lSize = getListSizes(pd2, colName)
    x = np.array(["Not Accept.", "Acceptable","Preferable"])
    y1 = np.array(lSize)
    bottom_plot = sns.barplot(x = x, y = y1, color = colorBot)

    topbar = plt.Rectangle((0,0),1,1,fc=colorUp, edgecolor = 'none')
    bottombar = plt.Rectangle((0,0),1,1,fc=colorBot,  edgecolor = 'none')
    l = plt.legend([bottombar, topbar], ['Familiar with Functional Languages', 'Remaining sample'], loc=0, ncol = 1, prop={'size':10})
    l.draw_frame(False)
    plt.suptitle(title, size=14)

    #Optional code - Make plot look nicer
    sns.despine(left=False)
    bottom_plot.set_ylabel("Number of answers")
    bottom_plot.set_xlabel(xleg)

    #Set fonts to consistent 16pt size
    for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
                 bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
        item.set_fontsize(12)

3.1.1 Refinements in Variables

In [20]:
stackPlot(answers_pd, functional_pd, var+"0", "Refinements in Variables - option 1", '@Refinement("negative < 10")\nint negativeGrade = 8;')
plt.figure().clear()
stackPlot(answers_pd, functional_pd, var+"1", "Refinements in Variables - option 2", '@Refinement("{x | x < 10}")\nint negativeGrade = 8;')

For these options there is no clear difference between the responders that are acquainted with functional languages and the global sample once both graphic show simiar results and a preference for the first option.

However we can see that all the answers that mark the first option as Not Acceptable are from responders that are familiar with functional languages and that prefered the second option.

In [21]:
answers_pd.loc[answers_pd[var+"0"]== notAccept][[fFun, var+'0', var+'1']]
Out[21]:
fam_functional variables_0 variables_1
27 Familiar Not acceptable Preferable
31 Very familiar Not acceptable Preferable
49 Very familiar Not acceptable Preferable

3.1.2 Refinements in Methods

In [22]:
labelNames = ['@Refinement("\\v >= 0") \n static int percentageFromGrade(        \n           @Refinement("grade >= 0") int grade,\n              @Refinement("scale > 0")  int scale){...}',
              '@Refinement("{grade >= 0} -> {scale > 0} -> {\\v >= 0 && \\v <= 100}")\n static int percentageFromGrade(int grade, int scale){...}']
stackPlot(answers_pd, functional_pd, mets+"0", "Refinements in Methods - option 1", labelNames[0])
plt.figure().clear()
stackPlot(answers_pd, functional_pd, mets+"1", "Refinements in Methods - option 2", labelNames[1])

Again, the results of the responders acquainted to Functional Languages are similar to the global sample. The charts above represent the percentage of option answers from the total of each sample. The percentages show that both samples have similar distribution results with the highest difference of 10.4% in the Acceptable answers on the first option.

Conclusion

Neither of the cases strongly inspired by functional languages show a discrepancy between the total of responders and the responders familiar with Functional Languages, which leads to the conclusion that there the familiarity with functional languages was not a big influence in the choices of the syntax.

3.2 Further Comparison between Background and Answers

In [23]:
listFamiliarCat = [notFamiliar, vagFamiliar, familiar, veryFamiliar][::-1]
listPreferenceCat = [notAccept, accept, pref]

def setCategorical(df, column_names, cats, jitter=False):
    df = df.copy()
    for cat in column_names:
        df[cat] = df[cat].astype(pd.api.types.CategoricalDtype(categories = cats, ordered=True))
        cat_columns = df.select_dtypes(['category']).columns

    df[cat_columns] = df[cat_columns].apply(lambda x: x.cat.codes if not jitter else np.random.uniform(x.cat.codes-0.15, x.cat.codes+0.15))
    return df

def pairPlot(df, y, first_columns, cats=listFamiliarCat, pal='tab10',hue=None):
    df[y] = df[y].astype(pd.api.types.CategoricalDtype(categories = cats, ordered=True))
    h = y if hue == None else hue
    sns.set_theme(style="ticks")
    sns.pairplot(df, y_vars=[y], x_vars=first_columns, hue=h, palette=pal, diag_kind = None)
    sns.pairplot(df, y_vars=[y], x_vars=column_names[4:9], hue=h, palette=pal, diag_kind = None)
    sns.pairplot(df, y_vars=[y], x_vars=column_names[9:-3], hue=h, palette=pal, diag_kind = None)
    sns.pairplot(df, y_vars=[y], x_vars=column_names[-2:], hue=h, palette=pal, diag_kind = None)
In [24]:
answersCat_pd = setCategorical(answers_pd, column_names[4:], [notAccept, accept, pref], jitter=True)
answersCat_pd[:5]
Out[24]:
fam_java fam_functional fam_jml fam_refs wildcard_0 wildcard_1 wildcard_2 variables_0 variables_1 methods_0 methods_1 alias_0 alias_1 alias_2 ghost_0 ghost_1 ghost_2
0 Familiar Vaguely familiar Familiar Vaguely familiar 1.019940 2.144094 1.900379 2.126033 1.130255 2.087837 0.953225 2.031033 0.979932 2.041004 1.901341 0.872308 1.965238
1 Familiar Vaguely familiar Not familiar Not familiar -0.021367 2.049559 1.008319 0.947915 1.952277 0.879520 -0.109455 1.930870 0.097811 0.972254 1.083418 1.086225 1.134627
2 Familiar Not familiar Vaguely familiar Not familiar 0.953364 1.866290 0.144448 2.073993 0.942039 1.007189 2.139496 1.064986 0.980584 1.966876 -0.032436 -0.060492 2.124879
3 Very familiar Vaguely familiar Not familiar Not familiar -0.095493 1.955942 0.871363 1.901381 0.006440 0.876084 0.148863 0.913969 1.885618 -0.142547 0.099636 1.963288 0.953422
4 Familiar Familiar Not familiar Vaguely familiar 1.956723 1.031686 0.108748 2.042009 1.116475 2.071959 1.092969 1.089871 1.077602 0.132336 2.065416 0.939014 0.922902
3.2.1 Pair Plot - Familiarity with Java and answers
In [25]:
allCat_pd = setCategorical(answersCat_pd, [fFun, fJML, fRefs],
                                   [notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(allCat_pd, fJava, column_names[1:4])
3.2.2 Pair Plot - Familiarity with Functional Languages and Answers
In [26]:
all_pd = setCategorical(answersCat_pd, [fJava, fJML, fRefs],
                                   [notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(all_pd, fFun, [column_names[0]]+column_names[2:4])
3.2.3 Pair Plot - Familiarity with JML and Answers
In [27]:
all_pd = setCategorical(answersCat_pd, [fJava, fFun, fRefs],
                                   [notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(all_pd, fJML, column_names[:2]+[column_names[3]])
3.2.4 Pair Plot - Familiarity with Refinements and Answers
In [28]:
all_pd = setCategorical(answersCat_pd, [fJava, fFun, fJML],
                                   [notFamiliar, vagFamiliar, familiar, veryFamiliar], jitter=True)
pairPlot(all_pd, fRefs, column_names[:3])

4. Final Comments

This section includes some comments considered interesting.

  • Would it be possible to **"reflect"** java functions into "ghost" versions (to be called in refinements)? In a similar way to Liquid Haskell's --reflect. It does seem like doing this with imperative functions might be strange though.
  • I'd use **@RefinementFunction("int", "len", "List")** for ghost functions instead of @Refinement (and varargs if there is more than one argument). Syntactically it's significantly different from other refinements. At the very least I wouldn't bind ghost function arguments, i.e. "int len(List)" instead of "int len(List xs)". Similarly, I'd use **@RefinementType("PtGrade", "int", "_ >= 0 && _ <= 20")** instead of the provided examples (and _1 _2 or %1 %2 if types accept more than one argument).
In [ ]: