AINLQ — Natural Language Query System
A web application that translates plain English questions into SQL queries, executes them, and renders the results as interactive charts — automatically selecting the best visualization type. Ships with 5 industry datasets (Restaurant, Ecommerce, SaaS, HR, Auto Repair) so users can explore a domain close to their own business.
The novel contribution — #[AiSafe] attribute pattern: The biggest risk in connecting an LLM to your database isn’t SQL injection (parameterized queries handle that). It’s schema exposure — the AI needs to see your table structure to write SQL, but dumping the full schema exposes passwords, tokens, PII, and internal fields in the prompt. The #[AiSafe] PHP attribute solves this: only fields explicitly marked with the attribute are visible to the AI. Unmarked fields don’t exist as far as the LLM is concerned. It’s opt-in exposure, not opt-out hiding — the safe default is invisible.
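The pattern above can be sketched in a few lines. The attribute name `#[AiSafe]` comes from the project; the `Customer` entity, its fields, and the reflection filter shown here are hypothetical illustrations of how opt-in exposure works, not the project's actual code.

```php
<?php
// The attribute itself: a bare marker that can only target properties.
#[\Attribute(\Attribute::TARGET_PROPERTY)]
final class AiSafe {}

// A hypothetical entity: two fields opt in, two stay invisible.
final class Customer
{
    #[AiSafe]
    public string $name = '';         // visible to the LLM

    #[AiSafe]
    public string $city = '';         // visible to the LLM

    public string $email = '';        // never reaches the prompt
    public string $passwordHash = ''; // never reaches the prompt
}

// Only properties carrying #[AiSafe] survive the filter.
$visible = [];
foreach ((new ReflectionClass(Customer::class))->getProperties() as $prop) {
    if ($prop->getAttributes(AiSafe::class) !== []) {
        $visible[] = $prop->getName();
    }
}
// $visible === ['name', 'city'] — email and passwordHash simply don't exist
// from the model's point of view.
```

Because exposure requires an explicit annotation, adding a new sensitive column to an entity is safe by default: forgetting the attribute hides the field rather than leaking it.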
The full pipeline:
- User picks an industry dataset and types a question in plain English
- Schema extraction dynamically reads Doctrine entity metadata, filtering to only #[AiSafe] fields
- Schema + question are sent to GPT-4o-mini with prompt engineering that enforces column naming conventions per visualization type (so results map cleanly to ECharts)
- Response is sanitized (handles markdown fences, various GPT response formats) and parsed as JSON
- Chart type is validated against a server-side whitelist (6 allowed types)
- Parameterized SQL executes via Doctrine (no raw string concatenation)
- Results render as bar, line, pie, gauge, KPI metric, or data table — with the generated SQL displayed below for full transparency
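The sanitize-and-validate steps in the middle of this pipeline can be sketched as below. The function name, the JSON shape, and the chart-type identifiers are assumptions for illustration; the six allowed types mirror the render targets listed above.

```php
<?php
// Hypothetical sketch: strip markdown fences from the model's reply,
// parse it as JSON, and reject any chart type outside the whitelist.
const ALLOWED_CHART_TYPES = ['bar', 'line', 'pie', 'gauge', 'metric', 'table'];

function parseLlmResponse(string $raw): array
{
    // Models often wrap JSON in ```json ... ``` fences; strip them first.
    $clean = preg_replace('/^```(?:json)?\s*|\s*```$/m', '', trim($raw));

    // Fail loudly on malformed JSON rather than guessing.
    $data = json_decode($clean, true, 512, JSON_THROW_ON_ERROR);

    // Server-side whitelist: the model proposes, the server disposes.
    if (!in_array($data['chartType'] ?? null, ALLOWED_CHART_TYPES, true)) {
        throw new InvalidArgumentException('Unsupported chart type');
    }
    return $data;
}

$reply  = "```json\n{\"chartType\": \"bar\", \"sql\": \"SELECT 1\"}\n```";
$parsed = parseLlmResponse($reply);
// $parsed['chartType'] === 'bar'
```

Validating the chart type server-side means a prompt-injected or hallucinated response can at worst pick a different visualization, never a different code path.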
Architecture highlights:
- Service-oriented pipeline — NLQProcessorService orchestrates, SchemaExtractorService handles dynamic Doctrine metadata introspection, clean separation between AI orchestration and presentation
- Multi-dataset architecture — 5 industry domains with 17 total entities, each with realistic seeded data featuring seasonality patterns, day-of-week weighting, and popularity distributions. Users pick a dataset and the schema extraction scopes to that domain automatically.
- Cost tracking — CostTrackingService enforces a daily API budget and returns a friendly rate-limit message when the cap is hit: a necessary guardrail, since you can't put an open LLM endpoint on the internet without one.
- Chart type intelligence — prompt engineering instructs the LLM to select visualization type based on the question’s nature, with enforced column naming conventions so the frontend doesn’t need to guess at data mapping
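The budget guardrail from the cost-tracking highlight can be sketched as follows. This is a minimal in-memory sketch in the spirit of CostTrackingService; the class name, per-request cost figures, and cap value are hypothetical (a real service would persist spend per day and estimate cost from token counts).

```php
<?php
// Hypothetical daily budget guard: tracks spend per calendar day and
// refuses requests that would push the total over the cap.
final class DailyBudgetGuard
{
    /** @var array<string, float> spend per calendar day (Y-m-d => USD) */
    private array $spend = [];

    public function __construct(private readonly float $dailyCapUsd) {}

    public function tryCharge(float $costUsd, ?string $day = null): bool
    {
        $day ??= date('Y-m-d');
        $current = $this->spend[$day] ?? 0.0;
        if ($current + $costUsd > $this->dailyCapUsd) {
            return false; // caller shows the friendly rate-limit message
        }
        $this->spend[$day] = $current + $costUsd;
        return true;
    }
}

$guard = new DailyBudgetGuard(dailyCapUsd: 1.00);
$guard->tryCharge(0.60); // true  — within budget
$guard->tryCharge(0.60); // false — would exceed the $1.00 daily cap
```

Checking before the API call, rather than after, means a burst of traffic stops at the cap instead of one request past it.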
Tech stack: Symfony 7.2, PHP 8.2+, Doctrine ORM 3.3, OpenAI GPT-4o-mini, ECharts, Bootstrap 5, Docker, Apache.