Machine Learning Evaluation Specialist - Remote

G2i

🌍 North America 🏠 Remote ⏱ Part-time 💼 Mid-level 🗓 1 week ago

List of accepted countries and locations: https://docs.google.com/document/d/1FK0v1X3O3rqY0oB2k5xt0u5eiYaoYYKv_E4XS3kHXUs/edit?tab=t.0#heading=h.8jwvoue7ks7z

Important for US applicants: This is a 1099 independent contractor role and is not compatible with F-1 OPT, STEM OPT, or other visa statuses that require W-2 employment, guaranteed hours, or employer sponsorship. We are unable to provide offer letters or employment verification for this role.

Help design the hardest ML problems state-of-the-art AI hasn't solved yet.

We're hiring domain experts to build evaluation tasks that challenge the frontier of AI. This is not an ML engineering role — it's a research role. You'll use deep expertise in your field to create problems that general ML knowledge can't touch.

What you'll do

- Propose and frame original, research-grade ML problems rooted in your domain

- Design evaluation tasks that require specialized knowledge well beyond standard pipelines

- Assess AI-generated solutions for correctness, creativity, and methodological rigor — and explain exactly where and why they fall short

- Document problem difficulty, required domain knowledge, and expected failure modes

What you need

- Graduate-level expertise (MS or PhD preferred) in a scientific or technical domain that intersects with ML

- Strong working knowledge of ML methods — model selection, feature engineering, evaluation metrics

- Deep familiarity with active research problems in your field — you know where general ML knowledge runs out

- Excellent written communication — you can articulate complex problems clearly and precisely. This cannot be overstated.

- Self-motivated and comfortable working independently on intellectually demanding tasks

What you don't need

- No prior AI training or RLHF experience required

- No software engineering background needed — domain expertise and research instincts are what matter

Domains we're especially...
