AI is reshaping education and training, with apprenticeship providers increasingly exploring its potential to streamline assessment, improve feedback, and support learner progression. Yet, as with any emerging technology, the use of AI in marking is not without risks. From concerns about accuracy to the safeguarding of learner data, providers need to understand the limitations of different types of AI and the regulatory implications of adopting them.
The risks of general-purpose large language models (LLMs)
When it comes to marking apprenticeship work, some providers have experimented with LLMs such as ChatGPT. These tools are attractive because of their ability to rapidly generate human-like text, making them appear well-suited to creating feedback. However, several risks limit their suitability in regulated apprenticeship environments.
Hallucinations are the most widely recognised issue. Generative AI models can produce confident, authoritative-sounding responses that are factually inaccurate, irrelevant, or inconsistent. In a marking context, this poses a significant threat: inaccurate feedback risks misleading learners and undermining their confidence. Worse still, it could result in incorrect judgements being recorded against standards, with implications for learner outcomes and organisational compliance.
Another major concern is data privacy. General-purpose generative AI tools are not designed for secure, sector-specific use. Feeding learner work or provider information into open platforms introduces risks around data storage, ownership, and potential reuse. For apprenticeship providers, whose regulatory obligations require the safeguarding of learner information, this is an unacceptable exposure. In addition, assignments may discuss confidential employer information, creating the risk of a breach that could damage relationships between employers and providers.
A closer look at generative AI tools
Most providers recognise that tools like ChatGPT lack alignment with apprenticeship curricula and rubrics. Their outputs may be eloquent yet poorly aligned with Ofsted or DfE requirements, leaving tutors with extra work to validate and correct them. To address this, there has been a rise in purpose-built tools for assessment and marking. These platforms are designed with secure data handling, integration with apprenticeship standards, and structured rubrics in mind. They offer a significant step forward in addressing the privacy concerns associated with general-purpose AI.
However, at their core, many of these tools still rely on generative AI. While they can produce tailored feedback and help reduce repetitive marking tasks, the same risks of hallucination and inconsistency remain. Generative systems can recycle information in unhelpful ways, produce biased outputs, or misinterpret learner submissions, especially in nuanced contexts.
In high-stakes environments such as apprenticeships, where compliance with Ofsted and DfE requirements is essential, even a small margin of error can risk the reputation of the provider and the integrity of the qualification. Providers need a way to combine the efficiencies of AI with the rigour, transparency, and reliability that regulators demand. The answer is to keep tutors in the driving seat.
The value of classification AI
Classification AI offers a fundamentally different and safer approach for providers. Aptem’s marking aid is not powered by generative AI but by classification AI, an important distinction. Unlike generative models, classification AI does not invent content: it analyses learner submissions against clearly defined categories, rubrics, and criteria (a simplified sketch of this approach follows the list below). This means:
- No hallucinations: Classification AI cannot make things up, eliminating the risk of fabricated or misleading feedback.
- Consistency and fairness: It produces stable, repeatable outcomes, reducing bias and ensuring learners are treated equitably.
- Transparency: Every classification can be explained, enabling tutors and auditors to see exactly how the system reached its conclusion.
- Regulatory alignment: Because it is structured around rubrics and standards, classification AI meets Ofsted and DfE requirements for objectivity and reliability in marking.
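To make the distinction concrete, below is a minimal, hypothetical sketch in Python of how rubric-based classification works. It is not Aptem’s implementation: all names are illustrative, and simple keyword matching stands in for whatever trained classifier a real system would use. The structural point is what matters: each judgement is constrained to a predefined label and is recorded alongside the evidence that produced it, so the system can explain its conclusions but can never generate new content.

```python
from dataclasses import dataclass

# Hypothetical rubric criterion: a fixed category that submissions are
# classified against. Codes and keywords are illustrative only.
@dataclass
class Criterion:
    code: str                      # e.g. a knowledge criterion's reference
    description: str
    evidence_keywords: list[str]   # indicators of evidence for this criterion

def classify_submission(text: str, rubric: list[Criterion]) -> list[dict]:
    """Classify a submission against each rubric criterion.

    Returns one record per criterion containing a fixed outcome label
    and the exact evidence that produced it, so every judgement is
    traceable. Nothing here generates text: outcomes can only be one
    of the predefined labels.
    """
    results = []
    lowered = text.lower()
    for criterion in rubric:
        matched = [kw for kw in criterion.evidence_keywords if kw in lowered]
        results.append({
            "criterion": criterion.code,
            "outcome": "met" if matched else "not yet met",  # fixed label set
            "evidence": matched,   # exactly why the system decided this
        })
    return results

# Illustrative usage with a made-up criterion and submission.
rubric = [
    Criterion(
        code="K1",
        description="Explains relevant data protection legislation",
        evidence_keywords=["gdpr", "data protection act"],
    ),
]
submission = "The GDPR requires organisations to protect personal data..."
for record in classify_submission(submission, rubric):
    print(record)  # {'criterion': 'K1', 'outcome': 'met', 'evidence': ['gdpr']}
```

A real classifier would be far more sophisticated, but the constraint is the same: because the output space is fixed in advance, there is nothing for the system to hallucinate, and every outcome can be traced back to the criterion and evidence that produced it.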
Equally important, Aptem’s marking aid is a walled garden. Unlike open platforms, all data remains securely within the provider’s ecosystem. Learner submissions and organisational practices are never shared externally, protecting intellectual property and institutional reputation and supporting compliance with GDPR.
Classification AI: a reliable foundation for AI innovation in marking
Classification AI offers a safer, more reliable foundation for innovation. By combining transparency, compliance, and consistency with efficiency gains, Aptem’s marking aid demonstrates how apprenticeship providers can harness AI responsibly to improve learner outcomes and the tutor experience, without compromising on trust, quality, or regulatory obligations. This feature from Aptem Enhance augments the tutor’s role rather than attempting to replace it. It provides a robust, compliant, and scalable way to ensure that marking and feedback remain consistent, high-quality, and learner-focused.