
Senior AI Systems Engineer

Flock Safety

🌍 North America 🏠 Remote ⏱ Full-Time 💼 Senior Level 🗓 1 week ago

WHO IS FLOCK?

Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security. Our hardware and software suite connects cities, law enforcement, businesses, schools, and neighborhoods in a nationwide public-private safety network. Trusted by over 5,000 communities, 4,500 law enforcement agencies, and 1,000 businesses, Flock delivers real-time intelligence while prioritizing privacy and responsible innovation.

We’re a high-performance, low-ego team driven by urgency, collaboration, and bold thinking. Working at Flock means tackling big challenges, moving fast, and continuously improving. It’s intense but deeply rewarding for those who want to make an impact.

With nearly $700M in venture funding and a $7.5B valuation, we’re scaling intentionally and seeking top talent to help build the impossible. If you value teamwork, ownership, and solving tough problems, Flock could be the place for you.

THE OPPORTUNITY

We’re hiring a Sr. AI Systems Engineer to help support our emerging product, Night Shift, an AI copilot that amplifies the impact of investigators by automating the tedious, repetitive steps involved in working a case. This role sits within the Machine Learning team and will work closely with partners in Engineering (Backend, Frontend, and Design) in a fast-paced environment. You will be one of the earliest technical contributors to our system architecture for agentic AI, and will own our AI evaluation framework. The outcome we’re after is clear and ambitious: measurably faster, more accurate leads for every officer and every shift.

THE SKILLSET

ML Platform expertise: 5+ years building and shipping ML/LLM systems to production; experience in the following areas:

- ML Inference (PyTorch, TensorRT, NVIDIA Triton), ideally in multimodal domains (text/image/video)

- LLM Inference (LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs)

- Compute orchestration (Kubernetes, Prefect, Ray)

- Cloud Infrastructure (AWS, Terraform, VPC, Networking)

- Observability (Prometheus, Grafana, OpenTelemetry, LangSmith/Langfuse)

- Data (ClickHouse, Postgres, Redis)

- Web services (Express/FastAPI, REST, SSE, JWTs)

- Backend JS (e.g. Node.js) familiarity required; TypeScript and Python familiarity welcome

Familiarity with Agentic Systems: Hands-on experience with LLM agents including:

- Agent Design: tool use (via MCP), retrieval, memory, grounding/attribution for claims, and guardrails.

- Architectural patterns: planning and hand-off for multi-agent systems, context management

- RAG: vector/hybrid search (e.g. pgvector, turbopuffer, Chroma), re-rankers (e.g. Cohere, Jina AI)

Experience with LLM Evaluations at scale: You’ve built offline/online eval harnesses and are familiar with the methodologies and metrics to measure:

- Search, retrieval, and recommendation performance

- Agentic task success, trajectory quality, preference learning (SFT, DPO, RLHF, LLM-as-judge)

- Safety & robustness (security, compliance, red-teaming, regression testing)

- Cost, performance and latency trade-offs

Feeling uneasy that you haven’t ticked every box? That’s okay; we’ve felt that way too. Studies have shown women and minorities are less likely to apply unless they meet all qualifications. We encourage you to break the status quo and apply to roles that would make you excited to come to work every day.

90 DAYS AT FLOCK

We are a results-oriented culture and believe job descriptions are a thing of the past. We prescribe 90-day plans and believe that good days lead to good weeks, which lead to good months. This serves as a preview of the 90-day plan you would receive if you were hired for this role at Flock Safety.

The First 30 Days

- Immerse yourself in the current system design and agent/tooling landscape. Understand the core customer use cases and data flows.

- Support the team by shipping a few quick wins (e.g., refining tool APIs, prompt engineering, fixing bugs)

- Stand up the foundational eval and observability scaffolding (datasets, metrics, KPIs, reporting)

- Propose a technical architecture and implementation plan for an agent evaluation framework.

The First 60 Days

- Deliver the MVP evaluation harness to produce initial metrics, enable debugging, and perform regression testing.

- Take on a system feature and demonstrate its improvement against your MVP evaluation suite.

90 Days & Beyond

- Productionize the evaluation and observability platform and make it the source of truth for quality and safety (e.g., online/offline tracing, alerting, dashboards, evaluations, and a PR-gated regression suite).

- Own the roadmap for evolving the agent evaluation platform

- Lead deeper R&D threads (e.g., lightweight fine-tuned projection layers, specialized embeddings, multimodal understanding) that can improve system performance on core metrics.

If you’re excited to build AI that tangibly amplifies real-
