About E2M

E2M Solutions works as a trusted white-label partner for digital agencies. We support agencies with consistent and reliable delivery through services such as website design, web development, ecommerce, SEO, AI SEO, PPC, AI automation, and content writing. Founded on strong business ethics, we are an equal opportunity organization powered by 300+ experienced professionals, partnering with 400+ digital agencies across the US, UK, Canada, Europe, and Australia. At E2M, we value ownership, consistency, and people who are committed to doing meaningful work and growing together.

Job Summary

We're looking for an AI QA Engineer to own quality across our growing portfolio of AI-powered products: custom dashboards, AI CRMs, RFP platforms, lead-gen systems, agentic workflows, and automations delivered to agency clients across the US, UK, and AU. This role spans both the AI layer (prompts, agents, RAG, model outputs) and the full application stack around it (web app UI, APIs, databases, integrations, multi-tenant access). You'll catch the issues that traditional QA misses, as well as the ones that pure ML evaluation misses: the messy middle where a working model still ships a broken product.

Key Responsibilities

AI & LLM Quality

Design test strategies for LLM-powered features, agents, RAG systems, and workflow automations (n8n, custom pipelines, MCP-based systems)
Build and maintain golden datasets, evaluation suites, and prompt regression tests so model or prompt changes don't silently break production
Validate output quality across factual accuracy, format adherence, tone, hallucination rate, instruction-following, and refusal behavior
Test agentic and multi-step workflows end-to-end: tool calls, handoffs, error recovery, loop detection, and cost and latency under load
Stress-test RAG pipelines: retrieval accuracy, chunking quality, context relevance, citation correctness
Run adversarial testing: prompt injection, jailbreak attempts, malformed inputs, and edge-case client data

Custom App & Platform Quality

Own end-to-end QA for custom-built applications: dashboards, AI CRMs, client portals, and internal tools
Functional, regression, cross-browser, and responsive testing across web UIs (Next.js, React, and similar stacks)
API testing for REST endpoints, webhooks, and third-party integrations (CRMs, email, calendars, payment, file storage)
Database and data-integrity testing, especially for multi-tenant architectures where one client's data must never leak to another
Auth and access-control testing: SSO, OAuth, role-based permissions, session handling
Performance, load, and stability testing on production-bound features
Testing of file upload, processing, and storage flows, a recurring failure point in AI products

Cross-Cutting

Monitor production systems for drift, degradation, errors, and cost/token anomalies; set up observability and alerting
Collaborate with AI engineers, full-stack developers, prompt designers, and delivery leads to catch issues before they hit clients
Automate testing in Python and/or JavaScript; integrate into CI/CD pipelines
Document findings, build QA dashboards, and contribute to E2M's internal AI QA playbook

Required Skills & Qualifications

Bachelor's degree in Computer Science, IT, Engineering, or related field
3+ years of QA experience covering both web applications and AI-enabled products, including at least 1 year of hands-on testing of LLM-powered features, AI workflows, or agent systems
Strong Python skills for evaluation harnesses and automation; working JavaScript/TypeScript familiarity for web app testing
Hands-on experience with at least one LLM evaluation framework: Promptfoo, DeepEval, RAGAs, LangSmith, Braintrust, or similar
Web app testing experience with Playwright, Cypress, or Selenium
API testing with Postman, REST Assured, or equivalent
Solid understanding of prompt engineering, prompt regression testing, and golden-dataset design
Database fundamentals (SQL, basic schema reasoning), enough to verify data flows and spot multi-tenancy bugs
Familiarity with LLM observability tools (Langfuse, Helicone, LangSmith) and token/cost/latency monitoring
Git, CI/CD principles, and at least one cloud platform (AWS, GCP, or Azure)
Working knowledge of vector databases, embeddings, and how RAG systems fail

Nice to Have

Experience testing multi-tenant SaaS platforms or client-facing custom apps
Workflow automation platforms (n8n, Make, Zapier) or agent frameworks
Exposure to MCP (Model Context Protocol) servers or tool-calling architectures
Familiarity with Supabase, Postgres, or similar backends commonly used in our stack
Experience with Claude, OpenAI, or open-source LLM APIs in production
Agency or client-services background; you understand what "client-ready" means
Prompt injection / red-teaming experience
SOC 2 or compliance-testing exposure

Soft Skills

Sharp critical eye: you assume AI output (and the app around it) is broken until you've verified it
Comfortable working in ambiguity; AI QA doesn't have settled playbooks yet
Clear written communication: bug reports for AI failures need precise reproduction steps
Collaborative across AI engineers, full-stack devs, delivery managers, and account leads
Proactive about catching issues before they reach the client demo

Required Skills

Python, Generative AI, QA

AI QA Engineer (AI Products, LLM Workflows & Custom Apps)