Capstone project evaluation in Indian engineering programmes is one of the most under-resourced assessment areas. Here is how AI-driven executable code review brings real depth and consistency to the rubric.
Two hundred capstone projects, due the second week of April. Four faculty on the evaluation panel. Each project is supposed to be a full semester of work, an actual functioning system, a 60-page report, a 20-minute demo, and a 30-minute viva.
In reality, each faculty member gets fifteen minutes per project. They skim the report, watch the demo if it runs, ask one viva question, and award marks they cannot defend in any meaningful detail. The students with polished demos get higher marks than the students with deeper work. Everyone knows this is a problem; the panel does not have time to fix it.
AI-driven capstone evaluation closes this gap, not by replacing the panel, but by giving the panel a structured, executable, defensible signal before they walk into the viva room.
Why Capstone Evaluation Is Especially Hard
Three specific reasons capstone assessment is the most stubborn evaluation problem in Indian engineering education.
Volume against panel size. 200 projects, four faculty, a two-week window. The maths is the same problem the Controller of Examinations (COE) faces with exam evaluation, except the artefact is software, not paper.
The skill gap inside the panel. A panel that has to evaluate a deep-learning project, a distributed-systems project, an embedded-IoT project, and a mobile-app project may not have deep expertise in all four. Faculty default to surface assessment because they cannot fairly assess depth in a domain they do not actively work in.
The demo bias. Students with strong front-end polish present better than students with strong back-end engineering. The panel sees the front-end. The back-end goes unrewarded.
What AI-Driven Capstone Evaluation Does
It runs structured analysis on the project artefacts (the code, the documentation, the demo recording, and the architecture diagram) and produces a multi-dimensional evaluation that the panel reviews before the viva. The viva becomes the final layer of judgement, not the only layer of assessment.
Five analysis layers run in parallel.
Layer 1: Executable code review. The system clones the repository, builds the project, runs the test suite, and reports pass/fail, coverage, and obvious correctness issues. A project that does not build is immediately visible. For a project that builds and whose tests pass, the analysis moves on to depth.
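As a concrete sketch of this layer: the pipeline below clones a repository, runs a build and a test command, and returns a small result record. It assumes a Makefile-style project with `make build` and `make test` targets and a hypothetical coverage.txt artefact; a real deployment would dispatch per stack (Maven, Gradle, npm, go test) and run everything in a sandboxed, network-restricted environment.

```python
# Illustrative sketch only: assumes `make build` / `make test` targets and a
# coverage.txt artefact written by the test target. Not the product pipeline.
import subprocess
import tempfile
from pathlib import Path

def run(cmd, cwd):
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, timeout=600)

def executable_review(repo_url: str) -> dict:
    workdir = Path(tempfile.mkdtemp())
    clone = run(["git", "clone", repo_url, str(workdir)], cwd=".")
    if clone.returncode != 0:
        return {"clones": False, "log": clone.stderr[-2000:]}
    build = run(["make", "build"], cwd=workdir)
    if build.returncode != 0:
        return {"clones": True, "builds": False, "log": build.stderr[-2000:]}
    tests = run(["make", "test"], cwd=workdir)
    coverage = workdir / "coverage.txt"  # hypothetical artefact
    return {
        "clones": True,
        "builds": True,
        "tests_pass": tests.returncode == 0,
        "coverage": coverage.read_text().strip() if coverage.exists() else None,
    }
```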
Layer 2: Static code analysis. Architectural patterns, code quality (linter pass, cyclomatic complexity, dead code, duplication), language-idiomatic usage, and dependency hygiene. A project that uses 47 dependencies for what should have been 6 is flagged.
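One small slice of this layer, sketched below: a dependency-hygiene check for a Go project, counting direct entries in go.mod against a per-category budget. The budget numbers are illustrative, not a standard.

```python
# Sketch: count direct (non-indirect) dependencies declared in go.mod and
# flag against a per-category budget. Budgets are illustrative only.
from pathlib import Path

DIRECT_DEP_BUDGET = {"distributed-systems": 10, "ml": 15, "mobile-app": 20}

def direct_go_deps(repo: Path) -> int:
    count, in_block = 0, False
    for raw in (repo / "go.mod").read_text().splitlines():
        line = raw.strip()
        if line.startswith("require ("):
            in_block = True
        elif in_block and line == ")":
            in_block = False
        elif line and "// indirect" not in line and (in_block or line.startswith("require ")):
            count += 1
    return count

def dependency_flag(repo: Path, category: str) -> dict:
    deps = direct_go_deps(repo)
    budget = DIRECT_DEP_BUDGET.get(category, 15)
    return {"direct_deps": deps, "budget": budget, "flag": deps > budget}
```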
Layer 3: Domain-specific rubric. Faculty define what "strong" looks like for each project category. For an ML project, this includes appropriate model selection, data hygiene, evaluation methodology, and reproducibility. For a distributed system, it includes consistency-model choices, failure handling, and observability. For an embedded project, it includes power profile, real-time guarantees, and the hardware-software boundary.
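The rubric itself is just structured data that the faculty own. A sketch of what two category rubrics might look like, with illustrative criteria and weights:

```python
# Sketch: a faculty-authored rubric as plain data. Criteria, weights, and
# category names are illustrative; each department defines its own.
RUBRICS = {
    "ml": [
        ("model_selection", 0.25, "Model choice justified against the problem and the data"),
        ("data_hygiene", 0.25, "Clean train/test split, leakage checks, documented preprocessing"),
        ("evaluation", 0.30, "Appropriate metrics, baselines, error analysis"),
        ("reproducibility", 0.20, "Pinned dependencies, seeds, one-command rerun"),
    ],
    "distributed-systems": [
        ("consistency_model", 0.30, "Stated and justified consistency guarantees"),
        ("failure_handling", 0.40, "Node failure, leader loss, partition behaviour"),
        ("observability", 0.30, "Logs and metrics that expose cluster state"),
    ],
}

def weighted_score(category: str, criterion_scores: dict) -> float:
    """Combine per-criterion scores in [0, 1] into a single rubric score."""
    return sum(w * criterion_scores.get(name, 0.0) for name, w, _ in RUBRICS[category])
```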
Layer 4: Documentation depth. The README, the architecture document, the test plan, the API documentation. Quality, not length. A 6-page README that explains the system well beats a 60-page report that is mostly screenshots.
Layer 5: Demo evaluation. The recorded demo is analysed for what the project actually does (versus what the report claims), the edge-case handling it shows, and the gap between claimed scope and demonstrated scope.
Plagiarism and LLM-Detection Layer
A separate but linked layer addresses academic-integrity questions specifically.
Code plagiarism detection against public repositories and against the cohort's own submissions. Cross-cohort and cross-year comparisons surface obvious copies.
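The cross-submission comparison can be sketched as token-shingle overlap. Production tools use fingerprinting (MOSS-style winnowing) and normalise identifiers first; the sketch below only shows the underlying idea, and a high overlap score is a flag for the panel, not a verdict.

```python
# Sketch: pairwise similarity as Jaccard overlap of token shingles. Real
# detectors use winnowing fingerprints and identifier normalisation; this
# only illustrates the overlap idea behind the flag.
import re

def shingles(source: str, k: int = 8) -> set:
    tokens = re.findall(r"[A-Za-z_]\w*|\S", source)
    return {tuple(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 0))}

def similarity(code_a: str, code_b: str) -> float:
    a, b = shingles(code_a), shingles(code_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0
```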
LLM-generated content detection for the report. AI-generated prose has signature patterns (uniform sentence length, specific vocabulary distributions, characteristic transition phrases). The detector produces a likelihood score, not a verdict; the panel decides.
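One of those signature patterns, uniform sentence length, can be computed as a single feature. A real detector combines many such features into a calibrated likelihood; the sketch below is illustrative only.

```python
# Sketch: one crude feature a detector might use, the uniformity of sentence
# lengths. A real detector combines many features into a calibrated score;
# on its own this is never a verdict.
import re
import statistics

def sentence_length_variation(report_text: str) -> float:
    """Coefficient of variation of sentence lengths; lower means more uniform."""
    sentences = [s for s in re.split(r"[.!?]+\s+", report_text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2 or statistics.mean(lengths) == 0:
        return 1.0
    return statistics.stdev(lengths) / statistics.mean(lengths)
```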
Commit-history analysis for the code. A project that materialised entirely in three commits the night before the deadline tells a different story than one that has 120 commits across the semester. Commit history is a useful, hard-to-game signal.
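The commit profile is also cheap to compute once the repository is cloned. A sketch that buckets commits by ISO week and flags a history where most of the work landed in a single week (the 0.8 threshold is illustrative):

```python
# Sketch: bucket commit timestamps by ISO week and flag a history where most
# of the work landed in a single week. The 0.8 threshold is illustrative.
import subprocess
from collections import Counter
from datetime import datetime, timezone

def commit_profile(repo_path: str) -> dict:
    timestamps = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ct"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    weeks = Counter()
    for ts in timestamps:
        iso = datetime.fromtimestamp(int(ts), tz=timezone.utc).isocalendar()
        weeks[f"{iso[0]}-W{iso[1]:02d}"] += 1
    total = sum(weeks.values())
    peak_share = max(weeks.values()) / total if total else 0.0
    return {
        "total_commits": total,
        "active_weeks": len(weeks),
        "peak_week_share": round(peak_share, 2),
        "flag_burst": total > 0 and peak_share > 0.8,
    }
```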
None of these are accusatorial automation. They are flags that surface to the panel for judgement.
The Per-Student Project Report
Before the viva, each panel member gets a structured per-student report.
Project: "Distributed key-value store with leader election."
Executable status: Builds, 87% test coverage, 12 of 14 tests passing.
Architecture: Implements Raft consensus, leader election present, log replication present, snapshot mechanism missing.
Code quality: Strong. Idiomatic Go. Modular structure. Minor concurrency anti-pattern in heartbeat handler.
Documentation: Architecture doc complete. API doc present. README clear. Test plan absent.
Demo: Shows three-node cluster, demonstrates leader failure and re-election. Does not show network partition handling.
Integrity: Code plagiarism: low (one common pattern shared with another submission, likely from class material). LLM-generated report content: low. Commit history: 87 commits across 14 weeks, evenly distributed.
Recommended viva focus: Why the snapshot mechanism was not implemented. How the student would handle a network partition. Why the concurrency pattern in the heartbeat handler was chosen over a channel-based alternative.
This is the report that lets the panel ask viva questions that go to the depth of the student's actual work, instead of asking the seven generic questions the panel asks everyone.
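For integration with a submission or records system, the same report can be carried as a small structured record. A sketch, with hypothetical field names rather than a fixed schema:

```python
# Sketch of the per-student report as a structured record; field names are
# illustrative, not the product's schema.
from dataclasses import dataclass, field

@dataclass
class CapstoneReport:
    project_title: str
    builds: bool
    tests_passed: int
    tests_total: int
    coverage_pct: float
    architecture_findings: list = field(default_factory=list)
    code_quality_notes: list = field(default_factory=list)
    documentation_gaps: list = field(default_factory=list)
    demo_gaps: list = field(default_factory=list)
    integrity_flags: dict = field(default_factory=dict)  # e.g. {"plagiarism": "low"}
    recommended_viva_focus: list = field(default_factory=list)
```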
The Panel's Role
The panel's job is not to accept the AI report; it is to use the report to ask better questions and produce a more defensible final mark.
In practice, the panel's conversation in the viva becomes more substantive. The questions are project-specific. The student is challenged on the gaps the report surfaces. The mark is signed off by a named faculty member who took responsibility for it, with the AI report as one of multiple inputs.
What This Replaces
It replaces three patterns most faculty privately know are problems.
The "demo bias" pattern. Polished UI wins over deep engineering. AI evaluation surfaces depth.
The "panel-skill-gap" pattern. A panel without deep ML expertise can still fairly assess an ML project, because the structured layer covers the depth dimensions.
The "demoralised faculty" pattern. Faculty asked to evaluate 50 projects in 12 hours stop trying. Faculty with structured reports per project rediscover the actual academic conversation that capstone is supposed to be.
What It Does Not Replace
The viva. The judgement on whether the project genuinely advanced the student's capability. The mentoring relationship between supervisor and student. The institutional decision on whether a project qualifies for honours, distinction, or a particular award. AI surfaces signal; the academic judgement stays with the faculty.
The Compliance Layer
Student code repositories and project reports are personal academic data. Under the DPDP Act, retention follows UGC and university policy. Code analysis runs inside the institution's tenant; no student code is shared with public model providers. The audit trail of every analysis run, every panel access, and every viva mark is logged and exportable.
What Implementation Takes
For a department of 200-400 students with active capstones, implementation runs about 60 days for the first cycle. Rubric design with faculty in month one, integration with the project submission system in month two. The first cohort goes through with the AI report supplementing the existing viva process. By the second cycle, the panel rhythm has reorganised around the structured reports.
For the integrated skills assessment module including capstone evaluation, mock interviews, and continuous diagnostics, see the product page. For the broader placement-readiness context, see why static aptitude tests are failing.
Frequently asked questions
What is AI-driven capstone evaluation?
It is a structured analysis pipeline that runs on the artefacts of a capstone project — the code, the documentation, the demo recording, the architecture diagram — and produces a multi-dimensional evaluation report. The faculty panel reviews the report before the viva, asks deeper questions during the viva, and produces a final mark with the AI report as one of multiple inputs.
Does the AI decide the final mark?
No. The AI produces a structured analysis across five layers: executable code review, static analysis, domain-specific rubric, documentation depth, and demo evaluation. The panel reviews these, conducts the viva, and signs off the final mark. The AI surfaces signal; the academic judgement stays with the faculty.
How does it handle plagiarism and AI-generated work?
Three layers. Code plagiarism detection compares against public repositories and the cohort's own submissions. LLM-content detection on the report produces a likelihood score based on signature patterns in AI-generated prose. Commit-history analysis looks at how the code materialised over the semester. None of these are accusatorial — they are flags that surface to the panel for judgement.
What does the per-student report contain?
Executable status (builds, tests, coverage), architecture analysis, code quality, documentation depth, demo evaluation, integrity flags, and a recommended viva focus. The report is designed to let the panel ask project-specific questions in the viva instead of seven generic questions every student gets.
Is student code shared with public model providers?
No, in any responsible deployment. Code analysis runs inside the institution's tenant on a controlled environment. Repositories, reports, and student data are not sent to public model APIs. This is what makes capstone evaluation DPDP-compliant and what makes the analysis defensible against IP-leakage concerns.



