
Corporate adoption of generative AI technologies has accelerated rapidly, with 35% of companies incorporating GenAI into their operations in 2022. But security and ethical regulations have not kept pace, and careless use of GenAI can spread harmful information and lead to unethical decisions based on AI inaccuracies.
As aggregation engines, GenAI models depend on the learning data they are trained on. Biases are inherited and amplified, and if left unaddressed they pose a risk to the model’s integrity. Knowing how to capture, triage, and process learning data securely and responsibly is therefore a critical prerequisite for effective and ethical use.
In this blog, we examine 2 major risk areas inherent to the GenAI learning process, why they occur, and measures businesses can take to identify and prevent them.

Ghosts in the Machine: AI Hallucinations
What is an AI hallucination?
AI hallucinations occur when Large Language Models (LLMs) generate false information. False information encompasses both deviations from external facts and “internal” errors – contradictions within the AI’s own contextual logic.
AI hallucinations illustrate a fundamental GenAI limitation: models can only produce content based on their learning data and cannot evaluate their outputs against reality.
There are 4 broad types of AI hallucination:
• Sentence contradictions, where a generated sentence contradicts an earlier sentence in the same output
• Prompt contradictions, where generated content runs contrary to the prompt’s specifications
• Factual contradictions, where fictitious information is presented as factual
• Random contradictions, where the output introduces information with no connection to the inputs or prior outputs
Why do AI hallucinations occur?
While the precise causes differ from model to model, there are general factors that affect the likelihood of hallucinations. These include:
• Data provenance and quality. Inaccurate information in the AI’s learning data will manifest in its output. Incorporating data sets from less reputable sources increases the likelihood of assimilating incorrect information.
• Generation and learning processes. Training procedures can introduce errors over time. Biases towards specific words and phrases can create faulty patterns as minor inaccuracies accumulate over successive generations.
• Input quality. Even accurate data sets and thorough training procedures will struggle with inconsistent or contradictory prompts.
Minimizing hallucinations
Hallucinations can be difficult to spot because LLMs are trained to sound fluent and plausible. Deploy the following preventative and reactive countermeasures to minimize hallucination instances:
• Fact checking. Data Science teams maintaining the AI application and its learning data should conduct frequent checks to remove blatantly erroneous results.
• Clear and specific prompts. Providing context helps the AI eliminate nonsensical interpretations and guides the application towards the intended output. Practices include:
○ Limiting possible output formats and types
○ Providing relevant, factual data sources as references
○ Framing the query within a role (e.g. “you are a programmer tasked with coding”) to clarify tone and positioning
• Filtering and ranking methodologies. Experimenting with the model’s built-in parameters can reveal setting configurations that produce the desired content.
• Multi-shot prompting. Providing complete examples of the target format, tone, and positioning can help the model recognize patterns and refine generated content. A brief sketch combining these prompting practices follows this list.
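
The minimal sketch below illustrates how several of these practices – role framing, factual references supplied in the prompt, a complete worked example, and conservative sampling parameters – can be combined in a single request. It assumes the openai Python client (v1+); the model name, parameter values, and reference data are illustrative assumptions rather than recommendations.

```python
# Minimal sketch: a structured prompt combining role framing, in-prompt
# reference data, a complete worked example (multi-shot), and conservative
# sampling parameters. Assumes the `openai` Python client (v1+); model name
# and parameter values are illustrative, not recommendations.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [
    # Role framing clarifies tone and positioning, and constrains the output.
    {
        "role": "system",
        "content": (
            "You are a financial analyst. Answer only from the reference data "
            "provided. If the data does not contain the answer, reply 'Unknown'. "
            "Respond in a single sentence."
        ),
    },
    # A complete example of the target format helps the model recognize the pattern.
    {"role": "user", "content": "Reference: Q1 revenue was $2.1M.\nQuestion: What was Q1 revenue?"},
    {"role": "assistant", "content": "Q1 revenue was $2.1M, per the reference data."},
    # The actual query, with relevant factual context supplied in the prompt.
    {"role": "user", "content": "Reference: Q2 revenue was $2.4M.\nQuestion: What was Q2 revenue?"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative model name
    messages=messages,
    temperature=0.2,       # lower temperature favours more deterministic output
    top_p=0.9,             # restricts sampling to higher-probability tokens
    max_tokens=100,        # caps output length to limit format drift
)

print(response.choices[0].message.content)
```

Tightening parameters such as temperature and top_p reduces variability but also creativity; the right balance depends on the use case and should be found through experimentation.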

At the Point of Capture: Data Collection, Privacy, and Compliance
The act of selecting, capturing, moving, and storing data for learning purposes is fraught with legal and ethical risks. Major dangers and best practices to manage them include:
• Copyright and legal exposure. The vast volumes of data involved in GenAI learning risk producing outputs based on stolen intellectual property. Such theft can provoke legal action, leading to costly reputational and financial damage.
○ Get ahead on compliance by formulating internal ethical practices regulating the target type, source, capture, transit, and storage of data bound for GenAI applications. Internal guidelines will serve as a basis for future adaptation to legal precedents as industry regulations catch up.
• Data privacy and consent. Many GenAI learning data sets inadvertently incorporate Personally Identifiable Information (PII) without individual consent. Text prompts can then elicit this data, posing a serious risk to data privacy. And because many LLMs are proprietary, it is difficult to locate personal information once it has been embedded.
○ Institute frequent, regular checks to ensure deployed LLMs are not embedding PII in their data sets (a simple pattern-based scan is sketched after this list). Alternatively, favor open source LLMs with transparent data processes over proprietary ones. Establish communications channels through which individuals can request PII deletion.
• Changes to workforce roles. Workers are increasingly displaced as GenAI assumes routine, low-level tasks such as writing, coding, and analysis. Businesses must develop pathways for workers to stay relevant and continue contributing value.
○ Prepare employees for new roles created by generative AI applications, such as prompt engineering. Review organizational and operational structures to map affected roles and resultant skill gaps, building a comprehensive view of the changes needed. Retraining also retools the workforce for growth.
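
As a rough illustration of the routine PII checks described above, the sketch below scans a sample of candidate training text for common identifier patterns. The regular expressions and sample documents are illustrative assumptions; dedicated PII-detection tooling and human review remain preferable in production data pipelines.

```python
# Minimal sketch: scan candidate training text for common PII patterns
# before it enters a GenAI learning data set. The regexes below are
# illustrative and far from exhaustive; dedicated PII-detection tooling
# is preferable in production.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_pii(documents):
    """Return (document_index, pii_type, match) tuples for every hit found."""
    hits = []
    for index, text in enumerate(documents):
        for pii_type, pattern in PII_PATTERNS.items():
            for match in pattern.findall(text):
                hits.append((index, pii_type, match))
    return hits

if __name__ == "__main__":
    sample_documents = [
        "Contact the analyst at jane.doe@example.com for the Q2 figures.",
        "Quarterly revenue grew 12% year over year.",
    ]
    for index, pii_type, match in scan_for_pii(sample_documents):
        print(f"Document {index}: possible {pii_type} -> {match}")
```

Flagged documents can then be quarantined for review or redaction before the data set is used for training.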
Ultimately, both AI hallucinations and data collection challenges are inherent to GenAI’s reliance on learning data.
Securing learning data is a bespoke challenge, with risks and requirements that shift with enterprise needs. Expert guidance is recommended to manage learning data safely and optimize AI deployments.
Talk to a Wavestone expert for guidance on the challenges of generative AI and how to leverage GenAI applications for enhanced business performance.