Mayaryin/GenderInfluenceInCodeGeneration

Gender Influence in Code Generation

Examining Prompting Styles and Bias in Large Language Models

Master's Thesis Research Project — An empirical study investigating whether and how gender shapes the way people prompt AI coding assistants, and whether these differences translate into measurable differences in generated code quality.


Overview

This repository contains the full analytical pipeline for a study in which participants solved real-world programming tasks using LLMs (ChatGPT, Claude). Their conversations were collected via an online survey, stored in a structured database, and analysed across three dimensions:

| Dimension | What is measured |
| --- | --- |
| Prompt Linguistics | Writing style, tone, length, grammar, politeness, n-grams, request type, sentiment |
| Code Quality | Pylint score, Radon cyclomatic complexity, maintainability index |
| Gender Prediction | Logistic Regression, Support Vector Machine, and a fine-tuned RoBERTa classifier trained on user prompts |

Research Questions

  1. Prompting Style — Do cisgender men and women differ in how they write prompts to LLMs (length, formality, politeness markers, sentence structure, request framing)?
  2. Code Quality — Does the gender of the prompter correlate with the quality of the LLM-generated code?
  3. Gender Predictability — Can a machine learning model reliably predict a user's gender from their prompts alone?

Data Pipeline

Online Survey (LimeSurvey)
        │
        ▼
Playwright Scraper ──► Raw chat HTML from ChatGPT / Claude share links
        │
        ▼
Importer ──► SQLite (giicg.db)
        │
        ├── Language Detection  (xlm-roberta-base-language-detection)
        ├── Translation DE/IT → EN  (HuggingFace Helsinki-NLP)
        ├── Spelling Correction  (oliverguhr/spelling-correction-english-base)
        └── Contraction Expansion
        │
        ▼
Prompt Parser (GPT-4o via LangChain)
    Segments each user message into:
      conversational | code | other
        │
        ▼
Analysis Notebooks
    ├── Linguistic analyses (spaCy, statsmodels, scipy, pingouin)
    ├── Code quality  (Pylint, Radon)
    └── Gender prediction  (RoBERTa fine-tune, LIME)
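
The storage step of the pipeline above can be sketched as follows. This is a minimal illustration, not the repository's importer: the table names, column names, and helper functions are hypothetical, and the real layout of giicg.db may differ. Only the three segment kinds (conversational, code, other) come from the pipeline description.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the actual layout of giicg.db.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    conversation_id TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    text TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS segments (
    id INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL REFERENCES messages(id),
    kind TEXT NOT NULL CHECK (kind IN ('conversational', 'code', 'other')),
    text TEXT NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open (or create) a database and make sure the sketch schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_message(conn, conversation_id, role, text):
    """Insert one chat message and return its row id."""
    cur = conn.execute(
        "INSERT INTO messages (conversation_id, role, text) VALUES (?, ?, ?)",
        (conversation_id, role, text),
    )
    conn.commit()
    return cur.lastrowid


def add_segment(conn, message_id, kind, text):
    """Attach one parsed segment to a message."""
    conn.execute(
        "INSERT INTO segments (message_id, kind, text) VALUES (?, ?, ?)",
        (message_id, kind, text),
    )
    conn.commit()


def conversational_text(conn, message_id):
    """Join the conversational segments of a message in insertion order."""
    rows = conn.execute(
        "SELECT text FROM segments "
        "WHERE message_id = ? AND kind = 'conversational' ORDER BY id",
        (message_id,),
    ).fetchall()
    return " ".join(row[0] for row in rows)
```

Keeping segments in their own table would let the linguistic notebooks query only the conversational parts of a prompt, while the code quality notebooks read only the code parts.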

Statistical Methods

  • Group comparison: Welch's t-test, Mann-Whitney U, Fisher's exact test
  • Effect sizes: Cohen's d, odds ratio
  • Multiple testing correction: Bonferroni
  • Normality checks: Shapiro-Wilk
  • Explainability: LIME (Local Interpretable Model-agnostic Explanations)
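
To make the quantities in the list above concrete, here is a small standard-library sketch of Welch's t statistic, Cohen's d, the sample odds ratio, and the Bonferroni correction. It is meant for illustration only; the function names are hypothetical, and the notebooks presumably rely on scipy, statsmodels, and pingouin (for example, `scipy.stats.ttest_ind` with `equal_var=False` performs Welch's test and also returns a p-value, which this sketch does not compute).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def odds_ratio(table):
    """Sample odds ratio (a*d)/(b*c) of a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and the reject decisions."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p <= alpha for p in adjusted]
```

Unlike Student's t-test, Welch's variant does not assume equal variances in the two groups, which is why the degrees of freedom are estimated from the data rather than fixed at the combined sample size minus two.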

Notebooks

notebooks/prompt_analysis/ — Linguistic Prompt Analyses

| Notebook | Description |
| --- | --- |
| 00_Power_Analysis.ipynb | Sample size and statistical power calculation |
| 00_Mask_Prompts.ipynb | Anonymisation of prompts for modelling |
| 01_PromptLength_Raw_Prompt.ipynb | Token & character length analysis (raw prompts) |
| 01_PromptLength_Conversational.ipynb | Length analysis on conversational prompt segments |
| 02_Top_Used_Ngrams.ipynb | Most frequent uni-/bi-/trigrams by gender group |
| 03_Grammar_Spelling.ipynb | Grammatical error rates and spelling correction analysis |
| 03_Punctuation.ipynb | Punctuation usage patterns |
| 03_Word Count Analyses.ipynb | Vocabulary richness and word-count statistics |
| 04_Request_Type.ipynb | Informational vs. involved request classification |
| 04_Sentiment.ipynb | Sentiment polarity analysis |
| 06_Involved_Informational.ipynb | Deep-dive into involved/informational language dimensions |
| 10_Communication_Objectives_Quality.ipynb | LLM-judged communication quality |
| 10_Rating_Communication_Quality.ipynb | Manual & automated communication quality ratings |
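
As an illustration of the kind of counting behind an n-gram comparison by group, the sketch below tallies the most frequent n-grams per group label. It is an assumption-laden simplification: the function names are hypothetical, and tokenisation is a plain lowercase whitespace split rather than the spaCy pipeline listed in the tech stack.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_group(prompts, n=2, k=3):
    """Return the k most common n-grams for each group.

    `prompts` is an iterable of (group_label, text) pairs. Tokenisation is a
    lowercase whitespace split, a stand-in for a proper tokenizer.
    """
    counts = {}
    for group, text in prompts:
        tokens = text.lower().split()
        counts.setdefault(group, Counter()).update(ngrams(tokens, n))
    return {group: counter.most_common(k) for group, counter in counts.items()}
```

Raw counts like these depend on how much text each group contributed, so a comparison between groups would normally use relative frequencies or a test on a contingency table rather than the counts alone.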

notebooks/code_analysis/ — Code Quality Analyses

| Notebook | Description |
| --- | --- |
| 01_Select_Prompt_Candidates.ipynb | Filter & select prompts with extractable code |
| 02_Translate.ipynb | Translate non-English code comments / docstrings |
| 03_Run_Prompts.ipynb | Re-run prompts against LLM APIs for reproducibility |
| 04_Parse_code.ipynb | Extract and store code blocks from messages |
| 05_Satisfaction.ipynb | User satisfaction ratings analysis |
| 06_Scores_On_All_Codeblocks.ipynb | Aggregate Pylint + Radon scores |
| 07_Pylint_Radon.ipynb | Detailed linting and complexity analysis |
| 08_Pylint_Codes.ipynb | Breakdown of individual Pylint error/warning codes |
| 09_Code_Quality_X_Gender_Request_Type.ipynb | Code quality by gender × request type |
| 10_LLM_as_a_Judge_CoT.ipynb | Chain-of-thought LLM evaluation of code quality |
| 11_Code_Quality_Correlations.ipynb | Correlations between code quality metrics |
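
Before code can be scored, it has to be pulled out of the assistant's replies. The sketch below shows one plausible way to extract fenced code blocks from a message with a regular expression. It assumes the replies use Markdown-style triple-backtick fences; the names `extract_code_blocks` and `CODE_BLOCK` are hypothetical and the parsing notebook may work differently.

```python
import re

# The fence is built programmatically so this example can itself be fenced.
FENCE = "`" * 3

# Matches one fenced block, capturing the optional language tag and the body.
CODE_BLOCK = re.compile(
    FENCE + r"([\w+-]*)[ \t]*\n(.*?)" + FENCE,
    re.DOTALL,
)


def extract_code_blocks(message, language=None):
    """Return (language, code) pairs for every fenced block in `message`.

    If `language` is given, keep only blocks whose tag matches it
    (case-insensitively). Untagged blocks have an empty tag.
    """
    blocks = [(lang.lower(), code) for lang, code in CODE_BLOCK.findall(message)]
    if language is not None:
        blocks = [block for block in blocks if block[0] == language.lower()]
    return blocks
```

Filtering on the language tag matters here because Pylint and Radon only analyse Python, so shell snippets or untagged blocks would need separate handling.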

notebooks/prediction/ — Gender Prediction

| Notebook | Description |
| --- | --- |
| 07_Gender_Prediction.ipynb | Baseline gender prediction experiments |
| 08_Dataset_for_Roberta.ipynb | Dataset preparation for RoBERTa fine-tuning |
| 08_Roberta_Per_Prompt_*.ipynb | Per-prompt RoBERTa fine-tuning (standard, masked, Colab) |
| 08_Roberta_Per_User.ipynb | Per-user aggregated prediction |
| 08_Roberta_Hyperparam_Search.ipynb | Hyperparameter optimisation |
| 08_Lime_Explainability*.ipynb | LIME-based model explainability (masked & unmasked) |
| 09_Test_Roberta.ipynb | Final model evaluation |
| Push_Model_To_Hub.ipynb | Upload fine-tuned model to HuggingFace Hub |
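
One way to move from per-prompt to per-user predictions is to average the classifier's per-prompt probabilities for each user and threshold the mean. The sketch below illustrates that idea only. The aggregation rule, the 0.5 threshold, and the function name are assumptions; the rule actually used in the per-user notebook is not documented here.

```python
from collections import defaultdict


def aggregate_per_user(predictions, threshold=0.5):
    """Average per-prompt probabilities per user and threshold the mean.

    `predictions` is an iterable of (user_id, probability) pairs, where the
    probability is the classifier's score for the positive class. Both the
    mean rule and the default threshold are illustrative assumptions.
    """
    by_user = defaultdict(list)
    for user_id, prob in predictions:
        by_user[user_id].append(prob)

    result = {}
    for user_id, probs in by_user.items():
        mean_prob = sum(probs) / len(probs)
        result[user_id] = (mean_prob, int(mean_prob >= threshold))
    return result
```

Averaging probabilities keeps the classifier's confidence in play, whereas a majority vote over hard labels would treat a 0.51 and a 0.99 prediction identically.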

Tech Stack

| Category | Libraries |
| --- | --- |
| Data & Storage | pandas, numpy, SQLite |
| Web Scraping | playwright, beautifulsoup4 |
| NLP | spacy (en_core_web_sm), transformers, wtpsplit, contractions |
| LLM APIs | openai (GPT-4o), anthropic (Claude), langchain |
| ML / Deep Learning | torch, scikit-learn, adapters, datasets |
| Statistics | scipy, statsmodels, pingouin |
| Code Analysis | pylint, radon |
| Explainability | lime |
| Visualisation | matplotlib, seaborn |

⚙️ Setup

1. Install dependencies

pip install -r requirements.txt

2. Download spaCy language model

python -m spacy download en_core_web_sm

3. Configure API keys

Create a .env file in the project root:

OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here

4. Launch notebooks

jupyter notebook
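
To check that the keys from step 3 are visible before running a notebook that calls an LLM API, something like the following standard-library sketch can help. How the notebooks actually read the .env file is not shown in this README (the python-dotenv package and its `load_dotenv` function are a common choice), so `load_env` and `missing_keys` are hypothetical helpers written for illustration.

```python
import os


def load_env(path=".env"):
    """Read KEY=VALUE lines from `path` into os.environ.

    Values already present in the environment are kept. Blank lines,
    comment lines starting with '#', and lines without '=' are skipped.
    """
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key, value = key.strip(), value.strip().strip("'\"")
            loaded[key] = os.environ.setdefault(key, value)
    return loaded


def missing_keys(required=("OPENAI_API_KEY", "ANTHROPIC_API_KEY")):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Calling `load_env()` followed by `missing_keys()` at the top of a notebook gives an early, readable failure instead of an authentication error halfway through a run.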

📄 License

This project is part of a Master's thesis. All code is provided for academic transparency. The survey dataset is not redistributed due to participant privacy.
