TestMaker

I built a research-focused tool that uses large language models (LLMs) to generate JUnit tests automatically from natural-language problem descriptions. It supports zero-shot, few-shot, and chain-of-thought prompting strategies through its LangChain and Ollama integration.
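The three prompting strategies differ mainly in what surrounds the problem description. The sketch below is a minimal illustration of that difference, not the tool's actual prompts. The instruction text, the `Example` type, and the `build_prompt` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Example:
    """A worked problem/test pair used as a few-shot demonstration (hypothetical)."""
    problem: str
    junit_test: str

# Hypothetical instruction shared by every strategy.
SYSTEM = ("You are a Java testing assistant. Write a JUnit 5 test class "
          "for the problem described below. Return only Java code.")

# Hypothetical chain-of-thought cue.
COT_HINT = ("Before writing code, reason step by step about typical inputs, "
            "boundary values, and invalid inputs, then write one test per case.")

def build_prompt(problem: str, strategy: str,
                 examples: Optional[Sequence[Example]] = None) -> str:
    """Assemble a prompt for 'zero_shot', 'few_shot', or 'chain_of_thought'."""
    parts = [SYSTEM]
    if strategy == "few_shot":
        # Prepend solved demonstrations so the model can imitate their format.
        for i, ex in enumerate(examples or [], start=1):
            parts.append(f"Example {i} problem:\n{ex.problem}\n"
                         f"Example {i} tests:\n{ex.junit_test}")
    elif strategy == "chain_of_thought":
        # Ask for explicit reasoning about test cases before the code.
        parts.append(COT_HINT)
    elif strategy != "zero_shot":
        raise ValueError(f"unknown strategy: {strategy!r}")
    parts.append(f"Problem:\n{problem}")
    return "\n\n".join(parts)
```

Zero-shot sends only the instruction and the problem. The resulting string could then be sent to a locally served Ollama model, for example through LangChain's Ollama chat integration. That network call is left out here so the sketch runs offline.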

To evaluate effectiveness, I engineered a scalable benchmarking pipeline that ran 26K+ Java submissions against 15K+ generated test cases. It measured compilation success, test reliability, and redundancy, and it categorized failures.
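The sketch below shows one way the four measurements could be computed from per-submission run results. It is minimal and rests on several assumptions. The `SubmissionResult` schema and the failure labels (`"assertion"`, `"timeout"`, `"compilation_error"`) are hypothetical. "Unreliable" is taken to mean that a test fails against a known-correct reference solution. "Redundant" is taken to mean that a test's pass/fail pattern across all compiled submissions duplicates that of an earlier test.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class SubmissionResult:
    """Outcome of running one submission against the generated tests (hypothetical schema)."""
    submission_id: str
    compiled: bool
    # test_id -> None if the test passed, else a failure label such as "assertion".
    outcomes: Dict[str, Optional[str]]

def compilation_rate(results: List[SubmissionResult]) -> float:
    """Fraction of submissions that compiled together with the test suite."""
    return sum(r.compiled for r in results) / len(results) if results else 0.0

def failure_categories(results: List[SubmissionResult]) -> Counter:
    """Count failures by label; a non-compiling submission counts once."""
    counts: Counter = Counter()
    for r in results:
        if not r.compiled:
            counts["compilation_error"] += 1
            continue
        counts.update(label for label in r.outcomes.values() if label is not None)
    return counts

def unreliable_tests(reference: SubmissionResult) -> List[str]:
    """Tests that fail against a known-correct reference solution."""
    return sorted(t for t, label in reference.outcomes.items() if label is not None)

def redundant_tests(results: List[SubmissionResult]) -> List[str]:
    """Tests whose pass/fail pattern over compiled submissions repeats an earlier test's."""
    compiled = [r for r in results if r.compiled]
    test_ids = sorted({t for r in compiled for t in r.outcomes})
    seen, redundant = set(), []
    for t in test_ids:
        # A test missing from a run is treated as a failure via the "missing" default.
        signature = tuple(r.outcomes.get(t, "missing") is None for r in compiled)
        if signature in seen:
            redundant.append(t)
        else:
            seen.add(signature)
    return redundant
```

Comparing pass/fail signatures rather than test source catches tests that are worded differently but never distinguish one submission from another.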

The platform supports per-model performance comparison and dynamic test tagging for analyzing edge cases and reasoning patterns.
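Dynamic tagging and per-model comparison fit together naturally: tag each generated test, then aggregate outcomes by model and tag. The sketch below assumes simple regex rules over a test's Java source. The rule set, the tag names, the `(model, source, passed)` record shape, and the model names used in the tests are all hypothetical. The real platform may tag tests differently, for example with an LLM classifier.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical rules: tag name -> regex searched in a test method's Java source.
TAG_RULES = {
    "null_input": re.compile(r"\bnull\b"),
    "empty_input": re.compile(r'""|new\s+\w+\[\s*0\s*\]|\.isEmpty\(\)'),
    "boundary": re.compile(r"\b(Integer|Long)\.(MAX|MIN)_VALUE\b"),
    "exception": re.compile(r"\bassertThrows\b"),
}

def tag_test(java_source: str) -> Set[str]:
    """Return every matching tag, or {'general'} if no rule fires."""
    tags = {name for name, rx in TAG_RULES.items() if rx.search(java_source)}
    return tags or {"general"}

def pass_rate_by_model_and_tag(
        records: Iterable[Tuple[str, str, bool]]) -> Dict[Tuple[str, str], float]:
    """records: (model, java_source, passed) tuples -> pass rate per (model, tag)."""
    totals = defaultdict(lambda: [0, 0])  # (model, tag) -> [passed, total]
    for model, src, passed in records:
        for tag in tag_test(src):
            cell = totals[(model, tag)]
            cell[0] += int(passed)
            cell[1] += 1
    return {key: p / n for key, (p, n) in totals.items()}
```

A test can carry several tags, so it counts toward each of them. Per-tag rates for one model therefore do not have to average to that model's overall pass rate.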

GitHub Repository
