---
library_name: peft
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: question-answering
tags:
- peft
- Universal
- ORKGSyn
- 33 disciplines
---
Large Language Models (LLMs) have become pivotal in powering scientific question answering across modern search engines, yet the robustness of their evaluation remains largely underexplored. To address this gap, we introduce **YESciEval**, an open-source framework that combines fine-grained rubric-based assessment with reinforcement learning to mitigate optimism bias in LLM evaluators.

YESciEval provides a comprehensive library for evaluating the quality of synthesized scientific answers using predefined rubrics and LLM-based judge models. The framework lets you assess answers against key criteria with pretrained judges and parse LLM outputs into structured JSON for detailed analysis.

**The `YESciEval-ASK-Llama-3.1-8B` model is a multidisciplinary judge fine-tuned on the [ORKGSyn](https://data.uni-hannover.de/dataset/yescieval-corpus) dataset from the Open Research Knowledge Graph.**

## Usage

First, install the YESciEval library via pip:

```bash
pip install yescieval
```

Get started with YESciEval in just a few lines of code. The example below shows how to prepare the inputs, create a rubric, load a judge, and evaluate an answer.

```python
from yescieval import Readability, AskAutoJudge

# Sample papers in the format {"title": "abstract", ...}
papers = {
    "A Study on AI": "This paper discusses recent advances in artificial intelligence, including deep learning.",
    "Machine Learning Basics": "An overview of supervised learning methods such as decision trees and SVMs.",
    "Neural Networks Explained": "Explains backpropagation and gradient descent for training networks.",
    "Ethics in AI": "Explores ethical concerns in automated decision-making systems.",
    "Applications of AI in Healthcare": "Details how AI improves diagnostics and personalized medicine."
}

# Question and synthesized answer
question = "How is AI used in modern healthcare systems?"
answer = (
    "AI is being used in healthcare for diagnosing diseases, predicting patient outcomes, "
    "and assisting in treatment planning. It also supports personalized medicine and medical imaging."
)

# Step 1: Create a rubric
rubric = Readability(papers=papers, question=question, answer=answer)

# Step 2: Load a judge model
judge = AskAutoJudge()
judge.from_pretrained(token="your_huggingface_token")

# Step 3: Evaluate the answer
result = judge.evaluate(rubric=rubric)
print("Raw Evaluation Output:")
print(result)
```

A total of nine evaluation rubrics are defined as part of the YESciEval framework and can be used via `yescieval`. The following example shows how to import them:

```python
from yescieval import (
    Informativeness, Correctness, Completeness, Coherence, Relevancy,
    Integration, Cohesion, Readability, Conciseness
)
```

A complete list of rubrics is available on the YESciEval [📚 Rubrics](https://yescieval.readthedocs.io/rubrics.html) page. For more detailed documentation, visit [📚 https://yescieval.readthedocs.io](https://yescieval.readthedocs.io).

## Citation

If you find our work helpful, please cite us:

```bibtex
@article{d2025yescieval,
  title={YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering},
  author={D'Souza, Jennifer and Giglou, Hamed Babaei and M{\"u}nch, Quentin},
  journal={arXiv preprint arXiv:2505.14279},
  year={2025}
}
```
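## Evaluating Multiple Rubrics

As a further illustration of the structured-output workflow described above, the sketch below evaluates one answer against several rubrics and parses each raw judge output into JSON. It assumes that every rubric class shares the `(papers, question, answer)` constructor shown in the quickstart and that the judge's raw output contains a single JSON object; both are assumptions about the output format, not guarantees of the library.

```python
import json
import re

from yescieval import AskAutoJudge, Coherence, Conciseness, Readability

# Inputs as in the quickstart above
papers = {
    "Applications of AI in Healthcare": "Details how AI improves diagnostics and personalized medicine."
}
question = "How is AI used in modern healthcare systems?"
answer = "AI supports diagnostics, outcome prediction, and personalized medicine."

def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of a raw judge response.

    A minimal sketch: real outputs may wrap the JSON in extra text,
    so we match the outermost braces before decoding.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in judge output")
    return json.loads(match.group(0))

# Load the judge once and reuse it for every rubric
judge = AskAutoJudge()
judge.from_pretrained(token="your_huggingface_token")

results = {}
for rubric_cls in (Readability, Coherence, Conciseness):
    # Assumes all rubric classes take the same constructor arguments
    rubric = rubric_cls(papers=papers, question=question, answer=answer)
    raw = judge.evaluate(rubric=rubric)
    results[rubric_cls.__name__] = extract_json(str(raw))

print(json.dumps(results, indent=2))
```

Greedy matching on the outermost braces keeps nested objects intact; if a judge emits several JSON objects in one response, a stricter parser would be needed.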