Why Your Modernisation Strategy Must Account for AI Before You Need It
The architectural decisions you make today will determine whether you can adopt AI in 12 months — or spend 18 months rebuilding to get there.
Introduction
“In the last 18 months, I have watched three enterprise customers complete successful cloud modernisations — and immediately begin re-architecting for AI. Not because the modernisation failed. Because it succeeded in solving the problem they had in 2024, without anticipating the problem they would have in 2026. The CRM we containerised on ECS Fargate runs beautifully. It just cannot feed a Bedrock knowledge base without a 6-month data engineering project that nobody budgeted for. That gap is preventable. This post is about how to prevent it.”
1. The Modernisation Debt You Don’t See Coming
Every enterprise I work with in presales is having two conversations simultaneously: “How do we modernise our legacy platforms?” and “How do we adopt AI?” The problem is that these conversations happen in different rooms, with different stakeholders, different timelines, and different budgets.
The result is predictable: organisations invest 12–18 months modernising their infrastructure (containerising applications, automating deployments, implementing observability) and then discover that their freshly modernised platform is fundamentally unprepared for AI workloads. The data is in the wrong format, the wrong place, or the wrong structure.
The lesson is clear: modernisation without AI-readiness is incomplete modernisation. Not because every enterprise needs AI today — but because the cost of retrofitting AI-ready data patterns later is 3–5x higher than embedding them from the start.
2. Why This Matters Now: The Data Foundation Is the AI Bottleneck
The market has shifted. AI is no longer an R&D experiment — it’s a board-level strategic priority. But the gap between AI ambition and AI execution is almost always a data problem, not a model problem.
According to Gartner’s February 2025 research, the data foundation gap is so severe that organisations will abandon 60% of AI projects unsupported by AI-ready data through 2026 (Gartner, “Lack of AI-Ready Data Puts AI Projects at Risk,” February 2025). The differentiator is not which model you choose. It’s whether your data is structured, accessible, and governed in a way that AI systems can consume.
Three market forces are making AI-ready data foundations urgent:
- Agentic AI demands structured, accessible enterprise data: Gartner’s Top Strategic Technology Trends for 2025 predicts that 33% of enterprise software applications will include agentic AI by 2028, up from less than 1% in 2024. These agents need semantically indexed, permission-aware, real-time data access. If your data isn’t ready, your agents are useless.
- The cost of retrofitting compounds across layers: Adding vector search, embedding pipelines, and governance to an existing platform means reworking every layer. Designing for it during modernisation adds ~15% to initial project cost. Retrofitting later adds 3–5x that amount.
- Competitive differentiation is shifting to data quality: Every enterprise has access to the same foundation models. The organisations that win are those whose proprietary data is clean, structured, and accessible to AI systems. Your data is your moat, but only if it’s AI-ready.
3. What “AI-Ready” Actually Means: A Practical Definition
“AI-ready data” is not a marketing term — it’s a set of specific architectural properties that determine whether your data can be consumed by AI systems without significant re-engineering.
| Property | What It Means | Why AI Needs It |
|---|---|---|
| Structured & catalogued | Data has metadata, schemas, and lineage tracking | AI systems need to discover and understand data without human interpretation |
| Semantically searchable | Data can be queried by meaning, not just keywords | RAG, agents, and copilots search by intent, not exact match |
| Embeddable | Data can be converted to vector representations | Foundation models consume embeddings, not raw database rows |
| Governed & permissioned | Access controls, retention policies, PII classification | AI systems must respect the same data boundaries as humans |
| Fresh & synchronised | Data reflects current state, not stale snapshots | AI answers are only as good as the data they’re grounded in |
| Multi-modal accessible | Text, documents, images, structured data all queryable | Modern AI is multi-modal — your data layer should be too |
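To make the table actionable in a discovery workshop, the six properties can be treated as a simple checklist per dataset. The sketch below is only an illustration of that idea: the class name, field names, and the sample assessment are my own assumptions, not part of any AWS service.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DatasetReadiness:
    """One flag per property from the table above (names are illustrative)."""
    catalogued: bool
    semantically_searchable: bool
    embeddable: bool
    governed: bool
    fresh: bool
    multi_modal: bool

    def gaps(self) -> list[str]:
        """Return the names of the properties this dataset does not yet meet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical assessment of a freshly containerised CRM database.
crm = DatasetReadiness(
    catalogued=True,
    semantically_searchable=False,
    embeddable=False,
    governed=True,
    fresh=True,
    multi_modal=False,
)
print(crm.gaps())  # ['semantically_searchable', 'embeddable', 'multi_modal']
```

The list of gaps is usually a better conversation starter than a single readiness score, because each gap maps to one of the layers described next.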
The reference architecture below connects the four layers that together form an AI-ready data foundation. The layers are sequential — data flows from enterprise sources through ingestion and storage into AI consumption — with governance enforced at every stage.
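[Architecture diagram: the four layers of the AI-ready data foundation (ingestion, vector storage and semantic search, AI consumption, governance)]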

Layer 1: Data Ingestion & Integration
The problem this solves: Enterprise data lives in dozens of sources — CRM databases, ERP systems, document repositories, SaaS applications, operational logs. AI systems need unified access without building point-to-point integrations for each source.
- AWS Glue — Serverless ETL for batch data integration, schema discovery, and data cataloguing
- Amazon Kinesis Data Streams — Real-time data ingestion for operational events
- AWS Glue Data Catalog — Centralised metadata repository that makes data discoverable to AI systems
- Zero-ETL integrations — Direct data flow between operational databases and analytics environments without pipeline management
Strategic decision: Why Glue + Zero-ETL over custom pipelines?
Custom ETL pipelines give you maximum control — but they also give you maximum maintenance burden. AWS Glue handles the undifferentiated work — schema discovery, job scheduling, error handling, auto-scaling — while Zero-ETL integrations eliminate pipelines entirely for supported source-destination pairs.
The real value for AI readiness: Glue Data Catalog creates a metadata layer that AI systems can query to understand what data exists, where it lives, and what it means — without human intervention. This is the foundation for autonomous AI agents that can discover and access enterprise data on their own.
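As a rough sketch of what "AI systems can query the catalogue" looks like in practice, the snippet below turns Glue Data Catalog table metadata into plain-text descriptions that an agent could be given as context. The function names are hypothetical, the sample dictionary only mimics the shape of a `get_tables` response, and pagination is ignored for brevity.

```python
def describe_tables(get_tables_response: dict) -> list[str]:
    """Turn a Glue GetTables response into one plain-text line per table."""
    lines = []
    for table in get_tables_response.get("TableList", []):
        descriptor = table.get("StorageDescriptor", {})
        columns = ", ".join(
            f"{col['Name']} ({col.get('Type', 'unknown')})"
            for col in descriptor.get("Columns", [])
        )
        location = descriptor.get("Location", "unknown location")
        lines.append(f"{table['Name']} at {location}: {columns}")
    return lines


def discover(database: str) -> list[str]:
    """Fetch table metadata from the Glue Data Catalog (needs AWS credentials)."""
    import boto3  # imported lazily so describe_tables can be tried offline

    glue = boto3.client("glue")
    return describe_tables(glue.get_tables(DatabaseName=database))


# Offline example using the same shape that GetTables returns.
sample = {
    "TableList": [{
        "Name": "customer_interactions",
        "StorageDescriptor": {
            "Location": "s3://crm-data-lake/ai-ready/customer-interactions/",
            "Columns": [
                {"Name": "customer_id", "Type": "string"},
                {"Name": "body", "Type": "string"},
            ],
        },
    }]
}
print(describe_tables(sample))
```

With credentials in place, `discover("crm_database")` would return the same kind of descriptions for the real catalogue.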
# AWS Glue job: prepare CRM interaction data for analytics and embedding pipelines
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job bootstrap: JOB_NAME is passed in by the Glue runtime
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read from the CRM database via the Glue Data Catalog
crm_data = glueContext.create_dynamic_frame.from_catalog(
    database="crm_database",
    table_name="customer_interactions",
)

# Keep and type only the fields that downstream consumers need;
# apply_mapping drops every column that is not listed here
transformed = crm_data.apply_mapping([
    ("customer_id", "string", "customer_id", "string"),
    ("interaction_date", "timestamp", "interaction_date", "timestamp"),
    ("interaction_type", "string", "interaction_type", "string"),
    ("subject", "string", "subject", "string"),
    ("body", "string", "body", "string"),
    ("resolution_status", "string", "resolution_status", "string"),
])

# Write to S3 in Parquet, partitioned so that analytics engines and
# embedding pipelines can both read it by interaction type
glueContext.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={
        "path": "s3://crm-data-lake/ai-ready/customer-interactions/",
        "partitionKeys": ["interaction_type"],
    },
    format="parquet",
)

job.commit()
Key design decision: Writing to Parquet format partitioned by interaction type serves dual purposes — analytics tools (Athena, Redshift Spectrum) can query it efficiently, AND embedding generation pipelines can process it in parallel by partition. One write, two AI consumption patterns.
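The embedding side of that dual purpose could look roughly like the sketch below, which concatenates the text fields of one interaction and requests an embedding from Titan Text Embeddings V2 (the same model named in the knowledge base configuration later in this post). The helper names and the sample row are assumptions for illustration; the client is expected to be a `bedrock-runtime` client created with boto3.

```python
import json

MODEL_ID = "amazon.titan-embed-text-v2:0"


def interaction_text(row: dict) -> str:
    """Concatenate the free-text fields of one CRM interaction into one passage."""
    parts = [row.get("subject") or "", row.get("body") or ""]
    return "\n".join(p.strip() for p in parts if p.strip())


def embed(client, text: str, dimensions: int = 1024) -> list[float]:
    """Request an embedding for the text through a bedrock-runtime client."""
    body = json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})
    response = client.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(response["body"].read())["embedding"]


# Hypothetical row, shaped like the Parquet records written by the Glue job.
row = {"subject": "Delivery delayed", "body": "Order arrived five days late."}
print(interaction_text(row))
```

In a real pipeline, `embed(boto3.client("bedrock-runtime"), interaction_text(row))` would run once per record, with one worker per `interaction_type` partition.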
Layer 2: Vector Storage & Semantic Search
Traditional databases answer “find records where customer_id = 12345.” AI systems need to answer “find interactions similar to this customer complaint about delivery delays.” That requires vector representations of your data — embeddings that capture semantic meaning.
| Option | Best For | Trade-off |
|---|---|---|
| S3 Vectors | Massive scale (billions of vectors), cost-sensitive workloads, AI agent memory | Highest scale, lowest cost per vector — GA since Dec 2025 |
| Aurora pgvector | Applications already using PostgreSQL, need transactional + vector in one DB | Familiar tooling, but vector performance limited at very large scale |
| OpenSearch | Hybrid search (keyword + semantic), log analytics + AI | Excellent search, but higher operational cost |
| Bedrock Knowledge Bases | Fastest time-to-value, fully managed RAG, no infrastructure management | Least control, but zero operational burden |
My recommendation: Start with Bedrock Knowledge Bases for your first AI use case — it gets you from zero to working RAG in days, not months. Then evaluate S3 Vectors or Aurora pgvector for production workloads where you need more control. The mistake I see most often: teams spending 3 months evaluating vector databases before validating that their AI use case delivers business value.
# Create an S3 Vectors bucket and index for CRM customer interaction embeddings
aws s3vectors create-vector-bucket \
    --vector-bucket-name crm-ai-embeddings
# The dimension must match the embedding model (1024 for Titan Text Embeddings V2)
aws s3vectors create-index \
    --vector-bucket-name crm-ai-embeddings \
    --index-name customer-interactions \
    --data-type float32 \
    --dimension 1024 \
    --distance-metric cosine
# Metadata (customer_id, interaction_type, interaction_date, resolution_status)
# is attached to each vector at write time with put-vectors
The metadata attached to each vector is the critical AI-readiness pattern. In S3 Vectors, metadata keys are filterable by default; the index-level metadata configuration is only needed to mark keys as non-filterable. By attaching structured metadata to each vector, you enable filtered vector search: “find similar complaints, but only for enterprise customers in the last 90 days.” Without metadata, vector search returns semantically similar results with no business context filtering.
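To show what filtered vector search means mechanically, here is a small in-memory illustration: filter on metadata first, then rank the remaining vectors by cosine similarity. It is not the S3 Vectors API, just the same idea in plain Python; the `segment` field, the tiny two-dimensional vectors, and the record keys are all made up for the example.

```python
import math
from datetime import date, timedelta


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query, records, *, segment, since, top_k=3):
    """Apply the metadata filter first, then rank the survivors by similarity."""
    candidates = [
        r for r in records
        if r["metadata"]["segment"] == segment
        and r["metadata"]["interaction_date"] >= since
    ]
    ranked = sorted(candidates, key=lambda r: cosine(query, r["vector"]), reverse=True)
    return [r["key"] for r in ranked[:top_k]]


today = date(2026, 1, 15)  # fixed date so the example is reproducible
records = [
    {"key": "int-001", "vector": [0.9, 0.1],
     "metadata": {"segment": "enterprise", "interaction_date": today - timedelta(days=10)}},
    {"key": "int-002", "vector": [0.8, 0.2],
     "metadata": {"segment": "smb", "interaction_date": today - timedelta(days=5)}},
    {"key": "int-003", "vector": [0.95, 0.05],
     "metadata": {"segment": "enterprise", "interaction_date": today - timedelta(days=200)}},
    {"key": "int-004", "vector": [0.1, 0.9],
     "metadata": {"segment": "enterprise", "interaction_date": today - timedelta(days=30)}},
]
hits = filtered_search([1.0, 0.0], records, segment="enterprise",
                       since=today - timedelta(days=90))
print(hits)  # ['int-001', 'int-004']
```

Note that `int-003` is the closest vector overall but is excluded because it falls outside the 90-day window, which is exactly the business context that unfiltered similarity search would miss.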
Layer 3: AI Consumption — Knowledge Bases & Agents
Raw data and vector embeddings are not useful to end users. The AI consumption layer connects your data foundation to the applications and agents that deliver business value.
- Amazon Bedrock Knowledge Bases — Managed RAG that connects foundation models to your enterprise data
- Amazon Bedrock AgentCore — Platform for building, deploying, and managing AI agents at scale (launched 2025). Provides memory, tool execution, and multi-agent orchestration.
- Amazon Q Business — Enterprise AI assistant that connects to corporate data sources with zero custom development. Unlike Bedrock Knowledge Bases (which requires developer effort to build an application layer), Q Business provides a ready-made conversational interface for non-technical employees — think of it as “enterprise ChatGPT over your internal data” that IT can deploy in days without writing application code.
- Amazon Bedrock Guardrails — Content filtering, topic blocking, and PII redaction for AI outputs
- AWS Lambda — Serverless compute for AI orchestration and custom logic
{
  "name": "crm-customer-knowledge-base",
  "roleArn": "arn:aws:iam::<account-id>:role/<knowledge-base-service-role>",
  "knowledgeBaseConfiguration": {
    "type": "VECTOR",
    "vectorKnowledgeBaseConfiguration": {
      "embeddingModelArn": "arn:aws:bedrock:ap-south-1::foundation-model/amazon.titan-embed-text-v2:0"
    }
  },
  "storageConfiguration": {
    "type": "OPENSEARCH_SERVERLESS",
    "opensearchServerlessConfiguration": {
      "collectionArn": "arn:aws:aoss:ap-south-1:<account-id>:collection/<collection-id>",
      "vectorIndexName": "crm-interactions-index",
      "fieldMapping": {
        "vectorField": "embedding",
        "textField": "content",
        "metadataField": "metadata"
      }
    }
  }
}
Orchestrating AI Agents Over Your Data Foundation (Bedrock Agents and AgentCore):
AgentCore provides the runtime for AI agents that can autonomously discover, reason over, and act on enterprise data. Here’s a simplified agent configuration that connects to the CRM knowledge base. Its field names follow the Amazon Bedrock Agents API rather than AgentCore itself, and it is a combined view: in practice the agent is created with CreateAgent, while knowledge bases and action groups are attached through separate calls (AssociateAgentKnowledgeBase and CreateAgentActionGroup).
{
  "agentName": "crm-customer-insights-agent",
  "foundationModel": "anthropic.claude-sonnet-4-20250514-v1:0",
  "instruction": "You are a customer insights agent. Use the CRM knowledge base to answer questions about customer interaction history, resolution patterns, and service trends. Always cite specific interaction records when providing answers.",
  "idleSessionTTLInSeconds": 1800,
  "knowledgeBases": [
    {
      "knowledgeBaseId": "<knowledge-base-id>",
      "description": "CRM customer interaction history including support tickets, complaints, and resolutions"
    }
  ],
  "actionGroups": [
    {
      "actionGroupName": "CRMActions",
      "description": "Actions for retrieving and summarising CRM data",
      "actionGroupExecutor": {
        "lambda": "arn:aws:lambda:ap-south-1:<account-id>:function:crm-agent-actions"
      }
    }
  ],
  "memoryConfiguration": {
    "enabledMemoryTypes": ["SESSION_SUMMARY"],
    "storageDays": 30
  }
}
This agent configuration demonstrates the key agent capabilities: it connects to the knowledge base for RAG-grounded answers, has action groups for executing business logic (e.g., creating follow-up tickets), and uses session memory so conversations maintain context across interactions.
A customer support agent can now ask “What similar issues have we resolved for this customer segment?” and get answers grounded in actual CRM interaction history — not hallucinated responses from a general-purpose model. Meanwhile, Amazon Q Business gives non-technical employees the same AI-powered access through a conversational interface — no custom application development required.
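For teams building their own application layer, a grounded question against the knowledge base might be issued roughly as follows. This is a minimal sketch using the `retrieve_and_generate` operation of the `bedrock-agent-runtime` client; the helper names are mine, and the knowledge base ID and model ARN are placeholders to be filled in by the caller.

```python
def build_rag_request(question: str, knowledge_base_id: str, model_arn: str) -> dict:
    """Build keyword arguments for bedrock-agent-runtime's retrieve_and_generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }


def ask(client, question: str, knowledge_base_id: str, model_arn: str) -> tuple[str, int]:
    """Return the grounded answer and the number of citations that back it."""
    response = client.retrieve_and_generate(
        **build_rag_request(question, knowledge_base_id, model_arn)
    )
    return response["output"]["text"], len(response.get("citations", []))
```

Called with `boto3.client("bedrock-agent-runtime")`, `ask(...)` returns the answer text plus a citation count, which is a cheap first check that a response is actually grounded in CRM records rather than in the model's general knowledge.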
Layer 4: Governance & Security
AI systems that access enterprise data must respect the same access controls, data classification, and audit requirements as human users. Without governance, AI becomes a data exfiltration risk.
The most common AI governance failure: An AI system is deployed with access to a broad data lake, and six months later someone realises it can surface PII from HR records in customer-facing responses. Retrofitting governance after deployment means re-engineering data access patterns, re-testing AI outputs, and potentially recalling responses that already reached end users.
Designing governance from the start means:
- Data classification — happens during ingestion (Layer 1), not after AI deployment
- Access controls — inherited by AI services through IAM roles — the same permission model your human users follow
- PII identification — tagged by Amazon Macie before it enters the vector store
- Bedrock Guardrails — filter AI outputs at the application layer as a defence-in-depth measure
{
  "name": "crm-ai-guardrail",
  "blockedInputMessaging": "Sorry, I cannot help with that request.",
  "blockedOutputsMessaging": "Sorry, I cannot share that information.",
  "sensitiveInformationPolicyConfig": {
    "piiEntitiesConfig": [
      {"type": "EMAIL", "action": "ANONYMIZE"},
      {"type": "PHONE", "action": "ANONYMIZE"},
      {"type": "NAME", "action": "ANONYMIZE"},
      {"type": "ADDRESS", "action": "BLOCK"}
    ]
  },
  "topicPolicyConfig": {
    "topicsConfig": [{
      "name": "internal-financials",
      "definition": "Questions about company revenue, margins, or financial performance",
      "type": "DENY"
    }]
  }
}
This guardrail configuration demonstrates defence-in-depth: even if the underlying data contains PII, the AI output layer anonymises or blocks sensitive information before it reaches the end user.
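If the model call happens outside Bedrock's built-in guardrail integration (for example, in custom Lambda orchestration), the same guardrail can be applied to the output text explicitly. The sketch below assumes a `bedrock-runtime` client and uses its `apply_guardrail` operation; the function name and the default `DRAFT` version are illustrative choices.

```python
def screen_output(client, text: str, guardrail_id: str, version: str = "DRAFT") -> str:
    """Run model output through a guardrail and return the text that is safe to show."""
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
        source="OUTPUT",
        content=[{"text": {"text": text}}],
    )
    if response["action"] == "GUARDRAIL_INTERVENED":
        # The guardrail returns the masked or blocked replacement text.
        outputs = response.get("outputs") or []
        return outputs[0]["text"] if outputs else ""
    return text
```

Keeping this as a single choke point in the application layer means every response path goes through the same policy, which is the defence-in-depth property described above.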
5. The CRM Modernisation Revisited: What We’d Add for AI-Readiness
In my previous posts, I documented the modernisation of an enterprise CRM platform — containerisation, CI/CD automation, observability, security hardening. That architecture is solid for operational excellence. But if we were designing it today with AI-readiness as a requirement, here’s what would change:
| Layer | Current State | AI-Ready Addition | Business Value |
|---|---|---|---|
| Data storage | RDS MySQL (transactional) | + S3 data lake with Parquet exports | Analytics + AI consumption without impacting production DB |
| Search | SQL queries only | + OpenSearch with vector search | Natural language search across CRM records |
| Knowledge access | Manual reports, dashboards | + Bedrock Knowledge Base over CRM data | AI-powered customer insights on demand |
| Employee AI access | None | + Amazon Q Business connected to CRM data | Non-technical staff get AI answers without custom apps |
| Data governance | IAM + encryption | + Lake Formation + Macie + Guardrails | AI-safe data access with PII protection |
| Integration | Direct database queries | + Glue ETL + Data Catalog | Discoverable, catalogued data for AI agents |
If we add these capabilities as a retrofit in 12 months, the estimated effort is 8–12 weeks of engineering work — because we’d need to build data export pipelines from production RDS, design a new data lake schema, implement embedding generation, and add governance controls that don’t exist today.
If we’d included them in the original design, the incremental effort would have been 2–3 weeks, because the data flows, access patterns, and governance model would have been designed holistically from the start. That roughly 4x cost multiplier (8–12 weeks against 2–3 weeks) sits within the 3–5x range cited earlier, and it is the AI-readiness tax that enterprises pay when they treat modernisation and AI as separate initiatives.
6. A Presales Perspective: How to Position AI-Ready Data in Customer Conversations
In presales engagements, the AI-readiness conversation has become the most effective way to expand modernisation scope — not by overselling AI, but by helping customers avoid a predictable and expensive mistake.
The conversation I have most often goes like this:
Customer: “We want to modernise our CRM/ERP/portal. We’re not thinking about AI yet — that’s a 2027 initiative.”
My response: “That’s fine — you don’t need to build AI capabilities today. But let me show you what happens if we design the data layer without considering AI, and then you want to add it in 18 months.”
Then I walk through the retrofit cost comparison: 2–3 weeks of incremental design now vs. 8–12 weeks of re-engineering later. The math is simple, and it resonates with CFOs who hate paying twice for the same outcome.
Three questions that surface AI-readiness gaps in discovery:
- “If your CEO asked tomorrow for AI-powered insights from this platform’s data, how long would it take to deliver?” — Most customers answer “months” or “I don’t know.” That gap is the opportunity.
- “Where does your unstructured data live, and can any system search it by meaning rather than keywords?” — This surfaces the vector search gap that blocks RAG and agent use cases.
- “Who controls what data your AI systems can access, and how is that audited?” — This surfaces the governance gap that blocks enterprise AI deployment in regulated industries.
7. The Business Case: Why CFOs Should Fund AI-Ready Data Now
For technology leaders building the business case, here are the numbers that resonate in CFO conversations:
Cost avoidance
- Retrofit cost for AI-readiness after modernisation: 8–12 weeks engineering effort (~$150K–$300K for mid-market enterprises)
- Incremental cost to include AI-readiness during modernisation: 2–3 weeks (~$40K–$75K)
- Net savings per platform: $110K–$225K — and most enterprises have 3–5 platforms that will need AI capabilities
Time-to-value acceleration
- Time to first AI use case without AI-ready data: 6–9 months (data re-engineering + AI development)
- Time to first AI use case with AI-ready data: 4–8 weeks (AI development only — data is already prepared)
- Competitive advantage: 4–7 months faster to market with AI-powered features
Risk reduction
- Organisations that deploy AI without governance face an average of 2–3 data incidents in the first year
- Each incident costs $50K–$500K in remediation depending on regulatory exposure
- Governance-first design reduces incident probability by ~80%
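For anyone who wants to rerun the cost-avoidance arithmetic with their own figures, here is a minimal sketch. It uses only the ranges quoted above and pairs the low estimates together and the high estimates together, which is how the $110K–$225K figure is derived; the function and variable names are my own.

```python
def net_savings(retrofit: tuple[int, int], upfront: tuple[int, int]) -> tuple[int, int]:
    """Pair low with low and high with high, as the figures above do."""
    return retrofit[0] - upfront[0], retrofit[1] - upfront[1]


retrofit_cost = (150_000, 300_000)  # retrofit after modernisation, per platform
upfront_cost = (40_000, 75_000)     # included during modernisation, per platform

low, high = net_savings(retrofit_cost, upfront_cost)
print(low, high)  # 110000 225000

# Scale to the 3-5 platforms most enterprises will need to make AI-ready.
for platforms in (3, 5):
    print(platforms, low * platforms, high * platforms)

# Effort multiplier implied by the week estimates (8-12 weeks vs 2-3 weeks).
print(8 / 2, 12 / 3)  # 4.0 4.0
```

Across three to five platforms, the same ranges give $330K–$675K and $550K–$1.125M respectively, which is usually the number that gets a CFO's attention.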
8. Cross-Layer Architecture Summary
| Layer | AWS Services | Purpose | AI Enablement |
|---|---|---|---|
| Ingestion | Glue, Kinesis, Zero-ETL | Unified data integration | Makes data discoverable and processable |
| Storage | S3 Data Lake, RDS, S3 Vectors | Structured + vector storage | Supports both operational and AI query patterns |
| Semantic Search | OpenSearch, Aurora pgvector, Bedrock KB | Meaning-based retrieval | Enables RAG, agents, and natural language access |
| AI Consumption | Bedrock, AgentCore, Q Business, Lambda | Application layer | Delivers AI value to end users |
| Governance | Lake Formation, Macie, Guardrails, IAM | Access control + compliance | Ensures AI respects data boundaries |
These layers are not independent — they form a pipeline. Data flows from ingestion through storage into semantic search, consumed by AI applications, governed at every stage. Skip any layer and the pipeline breaks.
9. Lessons for Technology Leaders
- Modernisation without AI-readiness is incomplete modernisation — Design for the requirements your board will mandate in 12–18 months, not just today’s requirements.
- The data foundation is the AI bottleneck, not the model — Every enterprise has access to the same foundation models. Your competitive advantage is the quality, structure, and accessibility of your proprietary data.
- Start with cataloguing, prove with managed services, optimise with purpose-built infrastructure — Glue Data Catalog first, Bedrock Knowledge Bases to prove value, then S3 Vectors or pgvector for production scale.
- Governance enables AI adoption — it doesn’t slow it down — Enterprises that skip governance don’t deploy faster. They deploy and retract. Build governance in from day one.
- The 15% investment today saves the 3–5x retrofit tomorrow — Frame it as insurance to your CFO. It gets funded.
- AI-ready data is a presales differentiator — When you help customers avoid the retrofit tax, you establish trust and expand engagement scope.
Conclusion
“The best time to build an AI-ready data foundation is during your cloud modernisation. The second-best time is now. But there is a third option that most enterprises choose: waiting until the AI use case is approved, the budget is allocated, and the business is asking why it’s taking 18 months to get a Bedrock pilot into production. That third option is the most expensive — and the most common.”
About the Author
Rajat Jindal is VP – Presales at AeonX Digital Technology Limited, where he architects winning cloud strategies for enterprise customers and translates modernisation into measurable business value. He is a strong advocate of AWS, committed to sharing thought leadership that helps technology leaders make faster, better-informed decisions.
