AI/ML or LLM Scoring & Evaluation Specialist (Remote)

Position: LLM Scoring & Evaluation Specialist
Location: Remote
Engagement: Contract (long term)
Must-Have Skills: Claude, GPT, NLP, LLM frameworks, evaluation libraries, Jupyter, Python
About the Role:
We are seeking an analytical and detail-oriented LLM Scoring & Evaluation Specialist to support multiple projects involving Large Language Models (LLMs). This role will be pivotal in ensuring the quality, accuracy, and relevance of AI-generated outputs across different use cases and domains. You’ll work closely with cross-functional teams to establish and implement evaluation frameworks, perform human-in-the-loop assessments, and help fine-tune models for optimal performance.
________________________________________
Key Responsibilities:
• Perform manual scoring and qualitative evaluations of LLM-generated responses across multiple use cases.
• Develop and maintain scoring guidelines and rubrics to ensure consistency and objectivity.
• Collaborate with data scientists, product managers, and engineering teams to align scoring with project goals.
• Assist in the creation and labeling of high-quality evaluation datasets for prompt tuning or model fine-tuning.
• Utilize NLP-based metrics and tools (e.g., ROUGE, BLEU, cosine similarity) for automated scoring support (see the sketch after this list).
• Document scoring patterns, common model errors, and improvement opportunities.
• Contribute to prompt experimentation and help compare the effectiveness of different prompt strategies.
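The posting does not prescribe specific tooling for the automated-scoring bullet above. The following is a minimal Python sketch, assuming the rouge-score, nltk, and scikit-learn packages (assumptions for illustration, not requirements named in the posting), of how ROUGE, BLEU, and cosine-similarity support for manual evaluation might look:

```python
# Minimal sketch of automated scoring support (ROUGE-L, BLEU, cosine similarity).
# Assumes: pip install rouge-score nltk scikit-learn
# The reference/candidate texts below are illustrative placeholders.

from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def score_response(reference: str, candidate: str) -> dict:
    """Compute ROUGE-L, BLEU, and TF-IDF cosine similarity for one LLM response."""
    # ROUGE-L F1 between the reference answer and the model output.
    rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = rouge.score(reference, candidate)["rougeL"].fmeasure

    # Sentence-level BLEU with smoothing (short responses often have zero n-gram overlap).
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )

    # Lexical cosine similarity over TF-IDF vectors of the two texts.
    tfidf = TfidfVectorizer().fit_transform([reference, candidate])
    cosine = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

    return {"rouge_l": rouge_l, "bleu": bleu, "cosine": cosine}


if __name__ == "__main__":
    reference = "The capital of France is Paris."
    candidate = "Paris is the capital city of France."
    print(score_response(reference, candidate))
```

In practice, such lexical metrics would complement, not replace, the rubric-based manual scoring described in the responsibilities above.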
________________________________________
Qualifications:
• Prior experience with LLMs (e.g., GPT, Claude, LLaMA) or AI/NLP projects is highly preferred.
• Strong analytical skills and attention to detail, especially in assessing language quality.
• Familiarity with prompt engineering, generative AI, or conversational AI tools is a plus.
• Hands-on experience with Python, Jupyter, or evaluation libraries (optional but desirable).
• Experience working with evaluation frameworks or annotation tools (Label Studio, Prodigy, etc.) is a bonus.
• Excellent written and verbal communication skills.
________________________________________
Nice to Have:
• Experience in user research, linguistics, content writing/editing, or QA of content-heavy systems.
• Exposure to Agile workflows or cross-functional project teams.
• Interest in AI ethics, bias mitigation, and model safety evaluation.