2026-01-13

Reliable AI Pitfalls

To achieve reliable AI outcomes, businesses must navigate a complex landscape of technical bottlenecks and strategic organisational shifts. The sources describe this current state as a mix of rapid progress and significant "choke points" that require targeted investment to overcome.

Technical Challenges

The transition from generative AI assistants to reliable agentic systems hinges on several technical hurdles:

  • Evaluation and Validation: A primary challenge is the lack of established, systematic methods for measuring AI accuracy. Because each agent or domain draws on its own "corpus of knowledge", testing it against known answers is highly complex.
  • Interoperability and Standardisation: For AI agents to work effectively across different systems and databases, a dominant AI protocol (similar to TCP/IP for the internet) must emerge to standardise communication and prevent vendor lock-in.
  • Infrastructure for Real-Time Data: AI agents require transactional databases, such as Postgres, that are optimised for high-frequency, low-latency operations to store and access real-time data points instantly.
  • Scaling Context and Memory: While context windows are expected to grow from thousands to millions of tokens, the exponentially higher cost associated with processing more tokens remains a significant barrier for enterprise-wide adoption.
  • Security and Safety: Businesses must defend against adversarial attacks, prompt injection, and hallucinations. Furthermore, connecting agents to external data expands the "attack surface", necessitating sophisticated product research into defensive capabilities.
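
The evaluation problem above can be sketched as a "golden set" harness: run the agent over question–answer pairs with known answers and score the results. This is a minimal illustration, not any particular framework's API; `toy_agent` and the exact-match scorer are hypothetical stand-ins (real systems typically need fuzzier, domain-specific scoring).

```python
# Minimal sketch of a golden-set evaluation harness. The agent and scorer
# here are illustrative only; real scoring is rarely exact-match.

def exact_match_score(expected: str, actual: str) -> float:
    """Crude scorer: 1.0 on a normalised exact match, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def evaluate(agent, golden_set: list[dict]) -> float:
    """Run the agent over question/answer pairs and return mean accuracy."""
    scores = [
        exact_match_score(case["expected"], agent(case["question"]))
        for case in golden_set
    ]
    return sum(scores) / len(scores)

# Toy agent standing in for a real system:
def toy_agent(question: str) -> str:
    return {"What is 2 + 2?": "4"}.get(question, "unknown")

golden_set = [
    {"question": "What is 2 + 2?", "expected": "4"},
    {"question": "Capital of France?", "expected": "Paris"},
]
print(evaluate(toy_agent, golden_set))  # → 0.5
```

The hard part in practice is the golden set itself: because each domain has its own corpus of knowledge, those question–answer pairs must be curated per agent.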

Strategic Challenges

Beyond technical capability, the reliability of AI is determined by how an enterprise prepares its data and its people:

  • Data Strategy and Governance: AI outcomes are directly limited by the quality and governance of the underlying data. Organisations must ensure that proprietary data is not exposed and that users only see information they are permitted to access.
  • Documenting Decision-making Logic: A significant strategic hurdle is that many companies do not explicitly understand their own decision-making processes. Since higher-order business logic is often informal or undocumented, agents currently work best in tactical, structured scenarios rather than nuanced strategic ones.
  • Establishing Feedback Loops: Reliability is improved through continuous feedback loops where users accept, modify, or reject AI outputs, allowing the system to refine its responses over time.
  • Workforce Upskilling: The "AI-augmented workforce" requires a shift from technical execution to strategic thinking and orchestration. Workers must move from simply writing code or performing tasks to clearly describing desired outcomes and managing "digital subordinates".
  • Management of Agents: Successful adoption requires the ongoing management of agents, not just their rollout. This includes defining what "a job well done" looks like for an AI and measuring it against ethical guardrails and expected outcomes.

To understand the evolution of these systems, one might liken the development of agentic AI to grading a student: to ensure they are learning correctly, you must verify every step of their workflow, provide a clear rubric for success, and only allow them to progress once they achieve a "minimum acceptable grade-point average" for the entire task.
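
The grading analogy can be made concrete: score each workflow step against a rubric, average the scores, and gate progression on a minimum threshold. The step names and the 0.8 threshold below are assumptions for illustration only.

```python
# Sketch of the "grading a student" idea: per-step rubric scores, averaged
# into a GPA, gated on a minimum threshold. Steps and values are hypothetical.

MIN_ACCEPTABLE_GPA = 0.8  # assumed pass threshold for the whole task

def grade_workflow(step_scores: dict[str, float],
                   threshold: float = MIN_ACCEPTABLE_GPA) -> tuple[float, bool]:
    """Average per-step rubric scores (0.0–1.0) and gate on the threshold."""
    gpa = sum(step_scores.values()) / len(step_scores)
    return gpa, gpa >= threshold

steps = {
    "retrieve_data": 0.9,
    "draft_response": 0.85,
    "apply_policy_checks": 0.7,
}
gpa, passed = grade_workflow(steps)
print(round(gpa, 2), passed)  # → 0.82 True
```

Grading every step, rather than only the final answer, is what catches an agent that reaches a plausible result through a flawed process.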