EU AI Act Articles 10 and 14: An Operational Governance Manual for High-Risk AI
A practical EU AI Act governance manual for high-risk AI teams: Article 10 data controls, Article 14 human oversight, audit trails, bias testing, and enforcement dates.
High-risk AI governance is no longer a policy slide. Under the EU AI Act, Regulation (EU) 2024/1689, teams need operating controls for data quality, bias, traceability, human oversight, and audit evidence before high-risk systems reach users.
Start with classification: when an AI system becomes high-risk
The first operational question is whether the system is high-risk. The EU AI Act applies progressively after 1 August 2024, with high-risk rules mainly landing in 2026 and 2027, so classification must happen before model training or launch.
In practice, classification should happen in a product risk workshop. Bring product, legal, data science, security, and operations together. Ask what decision the AI influences, who is affected, and whether a person can realistically challenge it.
A recruitment ranking tool in Rotterdam, a credit scoring model in Madrid, or an education assessment system in Berlin may look like normal automation. Under Annex III, those systems can trigger high-risk duties because they affect access to work, money, or learning.
The high-risk decision tree
Use two gates. First, check if the AI is a safety component in a regulated product. Second, test Annex III: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, administration of justice, or democratic processes.
Do not classify only the model. Classify the full AI system: data inputs, model logic, user interface, API outputs, workflow actions, and the business decision it supports. A low-risk model can become high-risk inside a sensitive process.
The safest workflow is a written decision record. Note why the system is or is not high-risk, which Annex III domain was checked, what evidence supports the view, and who approved the classification before launch.
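As one illustration, the decision record can live next to the code as a small structured object, so the rationale survives staff turnover and tool changes. The field names and example values in the sketch below are assumptions, not terminology from the Act.

```python
# A minimal sketch of a written classification decision record.
# Field names and example values are illustrative assumptions, not AI Act terminology.
from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class ClassificationDecision:
    system_name: str
    intended_purpose: str
    annex_iii_domains_checked: list[str]
    is_high_risk: bool
    rationale: str                 # why the system is or is not high-risk
    evidence_refs: list[str]       # workshop notes, vendor docs, legal memos
    approved_by: str
    approved_on: str = field(default_factory=lambda: date.today().isoformat())


decision = ClassificationDecision(
    system_name="cv-ranking-service",
    intended_purpose="Rank applicants for recruiter review",
    annex_iii_domains_checked=["employment"],
    is_high_risk=True,
    rationale="Influences access to work; outputs affect interview shortlists.",
    evidence_refs=["risk-workshop-notes.md", "vendor-dpia.pdf"],
    approved_by="Head of Legal",
)

# Store alongside the technical file so auditors can find it later.
print(json.dumps(asdict(decision), indent=2))
```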
Article 5 red lines: systems to stop before launch
Article 5 is the stop sign. Ban checks should cover manipulative systems, exploitation of vulnerabilities, social scoring, untargeted facial scraping, workplace emotion recognition, sensitive biometric categorisation, and real-time remote biometric identification outside narrow law-enforcement exceptions.
The red-line review should happen before procurement and before proof-of-concept work. Teams should not spend weeks testing a use case that later fails because it relies on prohibited manipulation, social scoring, or sensitive biometric categorisation.
A useful question is simple: would the system change how a person is treated without their awareness, consent, or fair appeal route? When the answer is yes, legal review should happen before any vendor demo, dataset pull, or pilot.
Article 10: make data governance measurable
Article 10 requires training, validation, and testing data to be relevant, representative, sufficiently error-free, and complete. Turn those words into gates: source approval, lineage records, missingness checks, label QA, and bias review.
Article 10 is not asking for perfect data. It asks for data governance that matches the intended purpose. A fraud model, a hiring screen, and a medical triage tool each need different evidence because their users, harms, and error costs differ.
The practical point is this: data governance becomes proof. When a regulator, customer, or affected person asks why an output happened, the team should show where the data came from, why it was suitable, and how errors were controlled.
Dataset quality KPIs for high-risk AI
Create a quality scorecard before training. Track intended-use fit, target-population coverage, missing critical fields, label accuracy, edge-case coverage, and known data gaps. A model should not move forward when its evidence is weak.
A good scorecard includes thresholds and owners. For example, decide who approves missing-data exceptions, who validates label quality, who checks subgroup coverage, and when new data must be collected instead of patched.
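A minimal sketch of such a gate, assuming the metrics are computed upstream in the data pipeline. The thresholds, metric names, and owner roles below are illustrative choices a team would set itself, not regulatory minimums.

```python
# A minimal scorecard gate run before training is approved.
# Threshold values and owner roles are illustrative assumptions, not regulatory minimums.
SCORECARD_THRESHOLDS = {
    "target_population_coverage": 0.90,  # share of intended user segments represented
    "missing_critical_fields": 0.02,     # max tolerated missingness on key features
    "label_accuracy": 0.95,              # measured on a double-annotated QA sample
    "edge_case_coverage": 0.80,          # share of documented edge cases present
}

METRIC_OWNERS = {
    "target_population_coverage": "data-science-lead",
    "missing_critical_fields": "data-engineering-lead",
    "label_accuracy": "annotation-qa-owner",
    "edge_case_coverage": "domain-expert",
}


def evaluate_scorecard(measured: dict[str, float]) -> list[str]:
    """Return the list of failed gates; an empty list means training may proceed."""
    failures = []
    for metric, threshold in SCORECARD_THRESHOLDS.items():
        value = measured.get(metric)
        if value is None:
            failures.append(f"{metric}: not measured (owner: {METRIC_OWNERS[metric]})")
        elif metric == "missing_critical_fields" and value > threshold:
            failures.append(f"{metric}: {value:.2%} exceeds {threshold:.2%}")
        elif metric != "missing_critical_fields" and value < threshold:
            failures.append(f"{metric}: {value:.2%} below {threshold:.2%}")
    return failures


# Example: one failed gate should block training until the owner signs off or new data arrives.
print(evaluate_scorecard({
    "target_population_coverage": 0.93,
    "missing_critical_fields": 0.04,
    "label_accuracy": 0.97,
    "edge_case_coverage": 0.85,
}))
```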
Representativeness is often where teams fail. The dataset may look large but still miss older applicants, rural users, disabled people, non-native speakers, or edge cases that appear rarely but matter deeply when harm occurs.
Lineage: prove which data trained which model
For every dataset, record source, consent or legal basis, collection date, preprocessing steps, version hash, and owner. Link the data version to the model version so an auditor can reconstruct the decision path months later.
Lineage should be boring by design. Every transformation should leave a record: filtering, deduplication, normalisation, feature engineering, imputation, labelling rules, and synthetic data generation where used.
Use version IDs that engineering and legal can both understand. A Git hash is useful, but add plain-language release notes: which dataset changed, why it changed, and whether the change affects model behaviour.
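One way to make that concrete is a lineage record emitted by the training pipeline. The sketch below is illustrative: the dataset name, legal-basis reference, and field layout are assumptions, and in practice the hash would be streamed over the stored dataset file rather than an in-memory stand-in.

```python
# A minimal lineage record linking a dataset version to a model version.
# Names, references, and field layout are illustrative assumptions.
import hashlib
import json
from datetime import date


def dataset_hash(blob: bytes) -> str:
    """Content hash so the exact dataset bytes can be matched months later.
    In a real pipeline, stream the stored dataset file through this in chunks."""
    return hashlib.sha256(blob).hexdigest()


example_export = b"applicant_id,age_band,shortlisted\n48213,40+,1\n"  # stand-in for the real export

lineage_record = {
    "dataset_id": "applicants-2025-q1",
    "dataset_hash": dataset_hash(example_export),
    "source": "internal ATS export",
    "legal_basis": "legitimate interest assessment ref LIA-114",
    "collected_on": "2025-01-31",
    "preprocessing_steps": ["dedupe", "normalise_dates", "impute_tenure_median"],
    "model_version": "cv-ranker-v3.2.0",
    "git_commit": "a1b2c3d",  # training pipeline commit
    "release_note": "Added Q1 applicants; no change to feature set.",
    "owner": "data-engineering-lead",
    "recorded_on": date.today().isoformat(),
}

print(json.dumps(lineage_record, indent=2))
```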
Bias control belongs before training, not after harm
Run a bias gate before training: historical bias, measurement bias, representation gaps, aggregation errors, and proxy features. A postcode, school, device type, or work history field can quietly stand in for protected traits.
Bias work should include people who understand the domain. A data scientist can measure disparate impact, but a recruiter, teacher, clinician, or case worker may recognise a proxy feature that looks harmless in a spreadsheet.
A useful review asks: who benefits from false positives, who suffers from false negatives, and whose history is missing from the data? That framing makes fairness testing concrete for business teams.
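As a sketch of one pre-training check, the snippet below compares selection rates across subgroups in the historical labels. The 0.8 screening ratio is a common rule of thumb, not a threshold from the Act, and the column names are illustrative.

```python
# A minimal subgroup selection-rate check on historical labels, run before training.
# The 0.8 screening ratio is a common heuristic, not an EU AI Act threshold.
from collections import defaultdict


def selection_rate_ratios(records: list[dict], group_key: str, outcome_key: str) -> dict[str, float]:
    """Selection rate per group, divided by the highest group's rate."""
    positives, totals = defaultdict(int), defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        positives[group] += int(row[outcome_key])
    rates = {g: positives[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}


historical = [
    {"age_band": "18-39", "shortlisted": 1}, {"age_band": "18-39", "shortlisted": 1},
    {"age_band": "18-39", "shortlisted": 0}, {"age_band": "40+", "shortlisted": 1},
    {"age_band": "40+", "shortlisted": 0},  {"age_band": "40+", "shortlisted": 0},
]

ratios = selection_rate_ratios(historical, "age_band", "shortlisted")
flagged = {g: r for g, r in ratios.items() if r < 0.8}
print(ratios, flagged)  # flagged groups go to the bias review before any training run
```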
Mitigation: document the fix, not just the metric
When bias appears, choose a traceable mitigation: resample, collect more data, remove a proxy feature, re-label examples, adjust thresholds, or add fairness constraints. Record who approved the change and why it was proportionate.
Mitigation should not mean hiding bad results. If a model performs poorly for one group, record the trade-off: collect better data, narrow the use case, add human review, or block the system from making that category of decision.
The strongest evidence is a before-and-after record. Show the original risk, the chosen mitigation, the validation result, and the residual risk accepted by the accountable owner.
Technical documentation is regulatory evidence
Treat the technical file as the system’s memory. Include intended purpose, model design, datasets, tests, limitations, foreseeable misuse, risk controls, human oversight, cybersecurity, and post-market monitoring evidence.
The documentation should be useful during an incident, not only during an audit. A support lead should be able to read it and understand what the system does, where it is weak, and when it must be escalated.
Keep model cards, test reports, data sheets, risk assessments, release notes, and monitoring plans together. Scattered evidence is the enemy of audit readiness when a regulator asks for proof quickly.
FRIA: connect model risk to fundamental rights
A Fundamental Rights Impact Assessment should name affected people, context, privacy risks, discrimination risks, necessity, mitigation, and review cadence. It makes rights protection operational, not abstract.
A FRIA is strongest when it includes real context: affected groups, decision stakes, appeal options, human alternatives, and consultation notes. It should explain why AI is justified, not simply describe the technology.
For example, a school AI tool in Copenhagen should assess pupils, parents, teachers, data privacy, appeal routes, and equality impacts. The same template will not fit an insurance pricing model in Milan.
Logging: the audit trail must survive pressure
Logs should capture timestamp, system ID, model version, data version, input hash, output, confidence, override flag, and human overseer. Store enough evidence to explain a decision without exposing unnecessary personal data.
Logging also protects the business. If a rejected applicant, patient, tenant, or customer challenges an outcome, the team can show the version used, the input class, the review step, and whether a human changed the result.
Good logs avoid two extremes. They are detailed enough to explain decisions, but limited enough to avoid becoming a new privacy risk. Hash sensitive inputs where possible and control access to raw records.
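A minimal log-entry sketch along those lines, with the input hashed rather than stored in clear. The field names, reason codes, and salting approach are assumptions, not a prescribed schema.

```python
# A minimal per-decision audit log entry.
# Field names, reason codes, and the hashing approach are illustrative assumptions.
import hashlib
import json
from datetime import datetime, timezone


def hash_input(payload: dict, salt: str) -> str:
    """Hash the raw input so a decision can be matched later without storing personal data in the log."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((salt + canonical).encode()).hexdigest()


log_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "system_id": "cv-ranking-service",
    "model_version": "cv-ranker-v3.2.0",
    "data_version": "applicants-2025-q1",
    "input_hash": hash_input({"applicant_id": 48213, "features": "..."}, salt="per-system-secret"),
    "output": "shortlist",
    "confidence": 0.62,
    "override_flag": True,                       # a human changed the outcome
    "override_reason_code": "MISSING_CRITICAL_DATA",
    "human_overseer": "reviewer-207",
}

print(json.dumps(log_entry, indent=2))  # ship to append-only, access-controlled storage
```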
Article 14: human oversight must be designed in
Article 14 says oversight must prevent or minimise risks to health, safety, and fundamental rights. Do not bolt this on later. Design review screens, escalation routes, stop controls, and override logging from day one.
Oversight fails when humans only rubber-stamp outputs. Give reviewers the context they need: key factors, confidence, uncertainty, alternatives, warning flags, and a clear route to disagree with the AI.
The interface should make intervention easy. If the stop button is hidden, the confidence score is unclear, or the audit reason is free text only, human oversight will degrade during busy operational moments.
Choose HITL or HOTL by consequence
Use human-in-the-loop (HITL) review when a person must approve each output before action, such as medical triage or hiring rejection. Use human-on-the-loop (HOTL) monitoring when a trained overseer watches operations and can halt, reverse, or escalate the system.
The choice is not philosophical. It depends on consequence, speed, reversibility, and user vulnerability. A reversible workflow may use monitoring. An irreversible denial of opportunity usually needs active approval.
Define escalation thresholds in advance. Low confidence, protected-group disparity, missing critical data, unusual inputs, or conflicting records should move the case to a trained person before the system acts.
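Those thresholds can be encoded as a simple pre-action check, as in the sketch below. The confidence floor, flag names, and case fields are illustrative values a team would agree in advance, not defaults from the Act.

```python
# A minimal escalation check, run before an automated action is executed.
# Threshold values and field names are illustrative assumptions agreed in advance.
CONFIDENCE_FLOOR = 0.70


def escalation_reasons(case: dict) -> list[str]:
    """Return reasons to route the case to a trained reviewer; an empty list means the system may act."""
    reasons = []
    if case["confidence"] < CONFIDENCE_FLOOR:
        reasons.append("low_confidence")
    if case["missing_critical_fields"]:
        reasons.append("missing_critical_data")
    if case["out_of_distribution"]:
        reasons.append("unusual_input")
    if case["subgroup_disparity_flag"]:
        reasons.append("protected_group_disparity")
    return reasons


case = {
    "confidence": 0.55,
    "missing_critical_fields": ["employment_history"],
    "out_of_distribution": False,
    "subgroup_disparity_flag": False,
}

reasons = escalation_reasons(case)
if reasons:
    print("Escalate to human reviewer:", reasons)   # HITL path
else:
    print("Proceed under monitoring")               # HOTL path
```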
The override protocol
A useful override has four steps: detect the anomaly, halt or override the AI action, complete the decision manually, and log the reason. The reason field should be structured, searchable, and reviewed after incidents.
The best override protocols are tested in drills. Ask reviewers to handle messy cases: incomplete files, conflicting evidence, demographic edge cases, model drift, and a senior stakeholder pressuring them to accept the AI output.
Do not punish thoughtful overrides. If staff learn that disagreeing with the model creates friction, they will stop intervening. Review override patterns as safety signals, not only as productivity exceptions.
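One lightweight way to treat overrides as safety signals is to aggregate reason codes per model version, as sketched below. The log shape and reason codes are assumptions carried over from the logging example above.

```python
# A minimal sketch of reviewing override patterns as safety signals.
# Assumes log entries shaped like the audit-log example; reason codes are illustrative.
from collections import Counter

override_logs = [
    {"model_version": "cv-ranker-v3.2.0", "override_flag": True,  "override_reason_code": "MISSING_CRITICAL_DATA"},
    {"model_version": "cv-ranker-v3.2.0", "override_flag": True,  "override_reason_code": "SUSPECTED_BIAS"},
    {"model_version": "cv-ranker-v3.2.0", "override_flag": False, "override_reason_code": None},
    {"model_version": "cv-ranker-v3.2.0", "override_flag": True,  "override_reason_code": "SUSPECTED_BIAS"},
]

reasons = Counter(e["override_reason_code"] for e in override_logs if e["override_flag"])
override_rate = sum(e["override_flag"] for e in override_logs) / len(override_logs)

# A rising override rate or a recurring reason code is an input to the next risk review,
# not evidence that reviewers are being difficult.
print(f"override rate: {override_rate:.0%}", reasons.most_common())
```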
Oversight competence and sandbox testing
Train overseers on system limits, automation bias, confidence scores, protected groups, and escalation. Use Article 57 regulatory sandboxes where useful to test oversight before full production deployment.
Training should include examples from the actual workflow. A generic AI literacy course is useful, but overseers also need case studies, screenshots, escalation rules, and practice with the system they will supervise.
Refresh training after model changes, data changes, incidents, or regulatory updates. A reviewer trained on last year’s workflow may miss new risks after a model release or vendor integration.
Compliance timeline: dates teams should plan around
Plan around 1 August 2024 entry into force, 2 February 2025 prohibited-practice rules, 2 August 2025 GPAI rules, and high-risk duties phased in from 2 August 2026 to 2 August 2027, subject to official EU implementation updates.
Because Brussels may refine parts of the timetable through later implementation packages, teams should maintain a living compliance calendar. Use official EU updates as the source of truth before launch or conformity work.
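A minimal sketch of such a calendar follows; the milestone dates reflect the phased application described above and should be reconciled against official EU updates before any launch or conformity decision relies on them.

```python
# A minimal living compliance calendar, to be reconciled against official EU updates.
# Dates reflect the phased application described in this manual; later milestones may be refined.
from datetime import date

MILESTONES = {
    date(2024, 8, 1): "Entry into force",
    date(2025, 2, 2): "Prohibited practices apply",
    date(2025, 8, 2): "GPAI obligations apply",
    date(2026, 8, 2): "Most high-risk (Annex III) obligations apply",
    date(2027, 8, 2): "High-risk obligations for Annex I regulated products apply",
}

today = date.today()
in_force = {d.isoformat(): m for d, m in MILESTONES.items() if d <= today}
upcoming = {d.isoformat(): m for d, m in MILESTONES.items() if d > today}
print("in force:", in_force)
print("upcoming:", upcoming)
```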
Penalty exposure: why governance is cheaper than repair
Penalty exposure is real: prohibited practices can reach €35 million or 7% of worldwide annual turnover. Other high-risk failures can reach €15 million or 3%, making governance cheaper than emergency remediation.
The audit-ready evidence pack
Keep one evidence pack: risk classification, technical file, data scorecards, bias tests, FRIA, oversight training, override logs, post-market monitoring, CE marking where relevant, and the EU Declaration of Conformity.
The evidence pack should be maintained continuously, not assembled after a complaint. Assign an owner, review date, storage location, and change log so the file stays current as the system evolves.
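As one illustration, the pack can be tracked with a simple index recording locations, owners, and review dates. The paths, roles, and 90-day cadence below are assumptions for the sketch, not requirements.

```python
# A minimal index for the evidence pack, assuming documents live in controlled storage.
# Paths, owner roles, and the review cadence are illustrative assumptions.
from datetime import date, timedelta

EVIDENCE_PACK = {
    "risk_classification": {"path": "evidence/classification-decision.json", "owner": "legal"},
    "technical_file": {"path": "evidence/technical-file/", "owner": "engineering-lead"},
    "data_scorecards": {"path": "evidence/scorecards/", "owner": "data-science-lead"},
    "bias_tests": {"path": "evidence/bias-reports/", "owner": "data-science-lead"},
    "fria": {"path": "evidence/fria.pdf", "owner": "dpo"},
    "oversight_training": {"path": "evidence/training-records/", "owner": "operations-lead"},
    "override_logs": {"path": "logs/overrides/", "owner": "operations-lead"},
    "post_market_monitoring": {"path": "evidence/monitoring-plan.md", "owner": "product-owner"},
    "declaration_of_conformity": {"path": "evidence/doc-of-conformity.pdf", "owner": "legal"},
}

REVIEW_INTERVAL = timedelta(days=90)
last_reviewed = {"bias_tests": date(2025, 1, 15)}  # example: one potentially stale entry

overdue = [name for name, reviewed in last_reviewed.items()
           if date.today() - reviewed > REVIEW_INTERVAL]
print("overdue reviews:", overdue)
```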
A strong pack helps sales, security, and customer success too. Enterprise buyers increasingly ask how AI decisions are governed, who can override them, and whether evidence exists when something goes wrong.
How Feluda.ai teams should operationalise this
In Feluda.ai workflows, assign every high-risk action an owner, approval step, data boundary, audit trail, and escalation rule. Governance should feel like a reliable workflow, not a separate compliance spreadsheet.
For deeper implementation, pair this manual with a practical audit trails guide. High-risk AI needs reviewable records that connect data, model behaviour, human decisions, and business accountability.
For approval design, connect this governance manual to an AI approval workflows guide. High-risk AI reviews should be fast enough for operations and strict enough for legal accountability.
For data boundary work, pair Article 10 controls with an AI data privacy guide. High-risk datasets should be useful for validation without exposing more personal data than the workflow truly needs.