About E2M
E2M Solutions works as a trusted white-label partner for digital agencies. We support agencies with consistent and reliable delivery through services such as website design, web development, ecommerce, SEO, AI SEO, PPC, AI automation, and content writing. Founded on strong business ethics, we are an equal opportunity organization powered by 300+ experienced professionals, partnering with 400+ digital agencies across the US, UK, Canada, Europe, and Australia. At E2M, we value ownership, consistency, and people who are committed to doing meaningful work and growing together.

Job Summary
We're looking for an AI QA Engineer to own quality across our growing portfolio of AI-powered products: custom dashboards, AI CRMs, RFP platforms, lead-gen systems, agentic workflows, and automations delivered to agency clients across the US, UK, and Australia. This role spans both the AI layer (prompts, agents, RAG, model outputs) and the full application stack around it (web app UI, APIs, databases, integrations, multi-tenant access). You'll catch the issues that traditional QA misses and the ones that pure ML evaluation misses: the messy middle where a working model still ships a broken product.

Key Responsibilities

AI & LLM Quality
  • Design test strategies for LLM-powered features, agents, RAG systems, and workflow automations (n8n, custom pipelines, MCP-based systems)
  • Build and maintain golden datasets, evaluation suites, and prompt regression tests so model or prompt changes don't silently break production
  • Validate output quality across factual accuracy, format adherence, tone, hallucination rate, instruction-following, and refusal behavior
  • Test agentic and multi-step workflows end-to-end: tool calls, handoffs, error recovery, loop detection, and cost and latency under load
  • Stress-test RAG pipelines: retrieval accuracy, chunking quality, context relevance, citation correctness
  • Run adversarial testing: prompt injection, jailbreak attempts, malformed inputs, and edge-case client data
Custom App & Platform Quality
  • Own end-to-end QA for custom-built applications: dashboards, AI CRMs, client portals, and internal tools
  • Functional, regression, cross-browser, and responsive testing across web UIs (Next.js, React, similar stacks)
  • API testing for REST endpoints, webhooks, and third-party integrations (CRMs, email, calendars, payment, file storage)
  • Database and data-integrity testing, especially for multi-tenant architectures where one client's data must never leak to another
  • Auth and access-control testing: SSO, OAuth, role-based permissions, session handling
  • Performance, load, and stability testing on production-bound features
  • File upload, processing, and storage flows, which are a recurring failure point in AI products
Cross-Cutting
  • Monitor production systems for drift, degradation, errors, and cost/token anomalies; set up observability and alerting
  • Collaborate with AI engineers, full-stack developers, prompt designers, and delivery leads to catch issues before they hit clients
  • Automate testing in Python and/or JavaScript; integrate into CI/CD pipelines
  • Document findings, build QA dashboards, and contribute to E2M's internal AI QA playbook

Required Skills & Qualifications
  • Bachelor's degree in Computer Science, IT, Engineering, or related field
  • 3+ years of QA experience covering both web applications and AI-enabled products, with at least 1 year of hands-on testing of LLM-powered features, AI workflows, or agent systems
  • Strong Python skills for evaluation harnesses and automation; working JavaScript/TypeScript familiarity for web app testing
  • Hands-on experience with at least one LLM evaluation framework: Promptfoo, DeepEval, RAGAs, LangSmith, Braintrust, or similar
  • Web app testing experience with Playwright, Cypress, or Selenium
  • API testing with Postman, REST Assured, or equivalent
  • Solid understanding of prompt engineering, prompt regression testing, and golden-dataset design
  • Database fundamentals (SQL, basic schema reasoning), enough to verify data flows and spot multi-tenancy bugs
  • Familiarity with LLM observability tools (Langfuse, Helicone, LangSmith) and token/cost/latency monitoring
  • Git, CI/CD principles, and at least one cloud platform (AWS, GCP, or Azure)
  • Working knowledge of vector databases, embeddings, and how RAG systems fail

Nice to Have
  • Experience testing multi-tenant SaaS platforms or client-facing custom apps
  • Workflow automation platforms (n8n, Make, Zapier) or agent frameworks
  • Exposure to MCP (Model Context Protocol) servers or tool-calling architectures
  • Familiarity with Supabase, Postgres, or similar backends commonly used in our stack
  • Experience with Claude, OpenAI, or open-source LLM APIs in production
  • Agency or client-services background; you understand what "client-ready" means
  • Prompt injection / red-teaming experience
  • SOC 2 or compliance-testing exposure

Soft Skills
  • Sharp critical eye: you assume AI output (and the app around it) is broken until you've verified it
  • Comfortable working in ambiguity; AI QA doesn't have settled playbooks yet
  • Clear written communication: bug reports for AI failures need precise reproduction steps
  • Comfortable collaborating with AI engineers, full-stack devs, delivery managers, and account leads
  • Proactive about catching issues before they reach the client demo

Required Skills

Python, Generative AI, QA