Examining Prompting Styles and Bias in Large Language Models
Master's Thesis Research Project — An empirical study investigating whether and how gender shapes the way people prompt AI coding assistants, and whether these differences translate into measurable differences in generated code quality.
Overview
This repository contains the full analytical pipeline for a study in which participants solved real-world programming tasks using LLMs (ChatGPT, Claude). Their conversations were collected via an online survey, stored in a structured database, and analysed across three dimensions:
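Prompting Style: linguistic analyses of user prompts (spaCy, statsmodels, scipy, pingouin)
Code Quality: Pylint score, Radon cyclomatic complexity, maintainability index
Gender Prediction: Logistic Regression, Support Vector Machine, and a fine-tuned RoBERTa classifier, all trained on user prompts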
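To give a feel for what a cyclomatic-complexity score measures, here is a minimal sketch that counts decision points with Python's standard `ast` module. It is a simplified approximation written for illustration, not Radon's implementation and not the code used in this repository; the function name `approx_cyclomatic_complexity` and the `SAMPLE` snippet are hypothetical.

```python
import ast

# Node types that each add one decision point.
# Simplified rule set (assumption); Radon applies a more complete one.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> dict:
    """Return an approximate cyclomatic complexity per function in `source`."""
    tree = ast.parse(source)
    scores = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # a function with no branches has one path
            for child in ast.walk(node):
                if isinstance(child, _BRANCH_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two extra decision points
                    complexity += len(child.values) - 1
            scores[node.name] = complexity
    return scores


# Hypothetical stand-in for a piece of LLM-generated code.
SAMPLE = '''
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

if __name__ == "__main__":
    print(approx_cyclomatic_complexity(SAMPLE))
```

Higher values indicate more independent paths through a function, which is why the metric is commonly read as a proxy for how hard the code is to test and maintain.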
Research Questions
Prompting Style — Do cisgender men and women differ in how they write prompts to LLMs (length, formality, politeness markers, sentence structure, request framing)?
Code Quality — Does the gender of the prompter correlate with the quality of the LLM-generated code?
Gender Predictability — Can a machine learning model reliably predict a user's gender from their prompts alone?
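The surface features named in the first research question (length, politeness markers, sentence structure) could be extracted along the following lines. This is a rough sketch using only the standard library; the marker list and the `prompt_features` function are hypothetical and are not the lexicon or feature set used in the thesis, which relies on spaCy.

```python
import re

# Hypothetical marker list for illustration; not the study's actual lexicon.
POLITENESS_MARKERS = ("please", "thank you", "thanks", "could you", "would you")


def prompt_features(prompt: str) -> dict:
    """Compute a few simple surface features of a single user prompt."""
    words = re.findall(r"[A-Za-z']+", prompt.lower())
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    text = " ".join(words)
    politeness = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
        for marker in POLITENESS_MARKERS
    )
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "politeness_markers": politeness,
        "ends_with_question": prompt.strip().endswith("?"),
    }


if __name__ == "__main__":
    print(prompt_features("Could you please fix this function? Thanks!"))
```

Features like these can then be compared between groups or fed to a classifier; a tokenizer-based approach would handle abbreviations and code fragments more robustly than the regular expressions used here.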
Data Pipeline
Online Survey (LimeSurvey)
│
▼
Playwright Scraper ──► Raw chat HTML from ChatGPT / Claude share links
│
▼
Importer ──► SQLite (giicg.db)
│
├── Language Detection (xlm-roberta-base-language-detection)
├── Translation DE/IT → EN (HuggingFace Helsinki-NLP)
├── Spelling Correction (oliverguhr/spelling-correction-english-base)
└── Contraction Expansion
│
▼
Prompt Parser (GPT-4o via LangChain)
Segments each user message into:
conversational | code | other
│
▼
Analysis Notebooks
├── Linguistic analyses (spaCy, statsmodels, scipy, pingouin)
├── Code quality (Pylint, Radon)
└── Gender prediction (RoBERTa fine-tune, LIME)
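As an illustration of the contraction-expansion step in the pipeline above, the sketch below uses a small lookup table and a regular expression. The mapping and the `expand_contractions` function are assumptions made for this example; the repository's own implementation may differ, and ambiguous forms such as "it's" (it is / it has) are resolved naively here.

```python
import re

# Small illustrative mapping (assumption); a real pipeline would use a fuller list.
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "doesn't": "does not",
    "i'm": "I am",
    "it's": "it is",
    "i've": "I have",
}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace known contractions with their expanded forms."""

    def _replace(match):
        original = match.group(0)
        expanded = CONTRACTIONS[original.lower()]
        # Preserve a leading capital, e.g. "Don't" -> "Do not".
        if original[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    # Normalise typographic apostrophes first so "don’t" also matches.
    return _PATTERN.sub(_replace, text.replace("\u2019", "'"))


if __name__ == "__main__":
    print(expand_contractions("Don't worry, it's fine and I can't wait"))
```

Expanding contractions before feature extraction keeps token counts comparable across participants who write "do not" and those who write "don't".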
Statistical Methods
Group comparison: Welch's t-test, Mann-Whitney U, Fisher's exact test
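For readers unfamiliar with Welch's t-test, the following sketch computes the test statistic and the Welch-Satterthwaite degrees of freedom with the standard library only. The function name `welch_t` and the sample values are made up for illustration and are not study data; in practice a call such as `scipy.stats.ttest_ind(a, b, equal_var=False)` also returns the p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


if __name__ == "__main__":
    # Made-up values, e.g. prompt lengths in two groups.
    group_a = [2, 4, 6, 8]
    group_b = [1, 2, 3]
    t_stat, df = welch_t(group_a, group_b)
    print(f"t = {t_stat:.3f}, df = {df:.3f}")
```

Unlike Student's t-test, this variant does not assume equal variances in the two groups, which makes it a safer default when group sizes or spreads differ.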
This project is part of a Master's thesis. All code is provided for academic transparency. The survey dataset is not redistributed due to participant privacy.