VTB experts share crucial strategies for preventing fabricated facts and errors in neural networks.
Generative neural networks frequently produce text that appears credible but contains inaccuracies or entirely fabricated information. This phenomenon, often termed "hallucinations," stems from the models' reliance on statistical probabilities rather than a genuine understanding of context or meaning. To mitigate these risks, experts from VTB recommend several key practices: formulating precise queries, diligently verifying outputs manually, and integrating additional validation tools. Furthermore, it is vital to train AI systems on high-quality, pre-verified data and to implement robust protective mechanisms capable of halting the model if it generates unreliable content.

Identifying and combating AI "hallucinations," the generation of plausible but factually incorrect text, is paramount. Such errors can lead to misinformation for users and pose substantial financial and reputational threats to businesses.
Lev Merkushov, Head of AI Solutions Development at VTB, explained: "A neural network might suggest non-existent books or invent product features. It doesn't fact-check; instead, it predicts the most probable responses. Therefore, it's crucial to phrase requests clearly. However, the ultimate control measure remains human verification."
According to Alexey Pustynnikov, the team lead for model development, understanding the underlying causes of these errors is fundamental to preventing them. Language models do not grasp the intrinsic meaning of information and do not perform real-time verification. Consequently, they may distort facts, invent data, or fail to follow instructions.
"Hallucinations can be categorized into three types," Pustynnikov clarified. "First, factual errors, where the model gets known data wrong, like providing an incorrect date. Second, fabrication, where the AI invents or exaggerates information. Third, instruction execution errors, where the model disregards context or makes logical mistakes, for instance, claiming that two plus two equals six."
The roots of these inaccuracies lie in the models' training methodologies. They generate responses based on probabilities rather than comprehension. When information is scarce, the AI tends to "fill in" the gaps. The limitations of the training dataset also contribute; models may be unaware of events post-training and cannot verify facts in real-time. Errors can also arise from insufficient processing of uncommon topics or inaccuracies embedded within the training data itself.
Merkushov further stated: "Complex and abstract tasks are another factor that elevates the risk of errors. To minimize these, tasks must be formulated with utmost precision. The 'chain-of-thought' method, which involves breaking down a query into smaller, manageable steps, proves highly effective. Furthermore, systems that consult verified databases before generating an answer are employed. Models are also fine-tuned using specialized datasets, and protective mechanisms, known as AI guardrails, monitor the model's output and can intervene if errors occur."
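The two ideas Merkushov mentions, breaking a query into small steps and halting output when a step cannot be verified, can be illustrated with a minimal sketch. The sub-questions, the stub answer table, and the halting rule below are illustrative assumptions, not VTB's actual system:

```python
# Sketch: chain-of-thought decomposition plus a guardrail-style halt.
# solve_step() is a stand-in for a model call on one narrow sub-question;
# its answer table is a hypothetical stub, not a real model.

def solve_step(step: str) -> str:
    answers = {
        "What is 15% of 200?": "30",
        "Add 30 to 200.": "230",
    }
    return answers.get(step, "unknown")

def chain_of_thought(steps):
    """Answer each sub-step in order; halt if any step is unverifiable."""
    trace = []
    for step in steps:
        answer = solve_step(step)
        if answer == "unknown":
            # Guardrail: stop rather than let the model "fill in" a gap.
            return trace, "[halted: unverifiable step]"
        trace.append((step, answer))
    return trace, trace[-1][1]

steps = ["What is 15% of 200?", "Add 30 to 200."]
trace, final = chain_of_thought(steps)
print(final)  # → 230
```

The point of the decomposition is that each intermediate result is small enough to check individually, so an error is caught at the step where it occurs instead of surfacing only in the final answer.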
VTB actively implements cascading solutions, where multiple AI models sequentially process and cross-validate information, checking each other's outputs. This multi-layered approach is utilized in various applications such as speech recognition, forecasting cash withdrawals, and managing ATMs. For generative AI, the bank is developing cascading models specifically for intelligent search functionalities within its corporate knowledge bases.
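One simple form of the cross-validation described above can be sketched as follows: two independent models answer the same query, and the result is accepted only when they agree. The stub model functions and the escalation rule are assumptions for illustration; the article does not describe VTB's actual cascade logic:

```python
# Sketch: a two-model cascade that accepts an answer only on agreement.
# model_a and model_b are hypothetical stubs standing in for real models.

def model_a(query: str) -> str:
    return {"atm cash forecast": "high demand"}.get(query, "unknown")

def model_b(query: str) -> str:
    return {"atm cash forecast": "high demand"}.get(query, "no data")

def cascade(query: str) -> str:
    first, second = model_a(query), model_b(query)
    if first == second:
        return first  # models agree: accept the answer
    # Disagreement signals a possible hallucination in one model.
    return "[escalated for review]"

print(cascade("atm cash forecast"))  # → high demand
print(cascade("unseen query"))       # → [escalated for review]
```

Agreement between independently trained models is not proof of correctness, but disagreement is a cheap, automatic signal that an output needs a second look.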
Significant emphasis is placed on the quality of the training data. Pustynnikov added: "Our filtering process ensures that only texts with minimal unreliable information are included. Sources undergo rigorous expert review, which, while improving data quality, also increases the costs associated with model training."
Experts underscore that the successful deployment of AI necessitates not only advanced technology but also a responsible commitment to data quality, algorithmic transparency, and meticulous oversight of results. This comprehensive approach empowers businesses to minimize errors and build stronger client trust.