Evaluation: LLM As A Judge
Name: test-one-shot
Dataset: qa_test.parquet
Type: text
Model: OpenAI/GPT-4o
Output column: is_correct
Prompt template:
Are these two lists the same? Answer with true or false, one word, all lowercase.

List 1: {answer_spans}
List 2: {prediction}
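The judge step can be sketched as follows. This assumes each row of qa_test.parquet has `answer_spans` and `prediction` columns (inferred from the template placeholders) and that the one-word reply is mapped to a boolean stored in `is_correct`. The GPT-4o request itself is left as a hypothetical `call_model` callable, since the tool's own client code is not shown; all helper names are illustrative.

```python
from typing import Callable, Optional

# Template copied from the evaluation above; {answer_spans} and {prediction}
# are filled in once per row.
JUDGE_TEMPLATE = (
    "Are these two lists the same? Answer with true or false, one word, all lowercase.\n"
    "\n"
    "List 1: {answer_spans}\n"
    "List 2: {prediction}"
)


def build_judge_prompt(answer_spans: str, prediction: str) -> str:
    """Fill the judge template for one row."""
    return JUDGE_TEMPLATE.format(answer_spans=answer_spans, prediction=prediction)


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the judge's one-word reply to a boolean; None if it is neither word."""
    # Tolerate surrounding whitespace, a trailing period, and stray capitals.
    word = reply.strip().strip(".").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None


def judge_row(row: dict, call_model: Callable[[str], str]) -> Optional[bool]:
    """Compute is_correct for one row.

    call_model is a hypothetical stand-in for the GPT-4o request: it takes
    the filled prompt and returns the model's text reply.
    """
    prompt = build_judge_prompt(str(row["answer_spans"]), str(row["prediction"]))
    return parse_verdict(call_model(prompt))
```

Returning `None` for anything other than `true` or `false` keeps malformed judge replies visible instead of silently counting them as incorrect.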
Started: Oct 17, 2024, 11:52 PM UTC
Finished: Oct 17, 2024, 11:53 PM UTC
Progress: 100 rows processed, 5041 tokens used
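Dividing the reported token total by the row count gives the average usage per judged row. This is a minimal sketch that assumes the 5041 figure covers every request in the run (prompt and completion combined), which the page does not break down; `tokens_per_row` is a hypothetical helper.

```python
def tokens_per_row(total_tokens: int, rows: int) -> float:
    """Average tokens per row, assuming the total covers all requests in the run."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return total_tokens / rows


# Figures taken from the run summary above.
print(tokens_per_row(5041, 100))  # 50.41
```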
Status: completed
Output: 7 columns, 100 rows
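Once the run finishes, accuracy is the share of rows the judge marked `true`. A minimal sketch, assuming the `is_correct` values come back either as booleans or as the lowercase strings the prompt asks for; `summarize` and `to_bool` are hypothetical helpers, not part of the tool.

```python
from typing import Iterable, Optional, Union

Verdict = Union[bool, str, None]


def to_bool(value: Verdict) -> Optional[bool]:
    """Normalize a stored verdict (bool or 'true'/'false' text) to a bool, else None."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        word = value.strip().lower()
        if word in ("true", "false"):
            return word == "true"
    return None


def summarize(verdicts: Iterable[Verdict]) -> dict:
    """Accuracy over rows with a valid verdict, plus a count of unparseable ones."""
    parsed = [to_bool(v) for v in verdicts]
    valid = [p for p in parsed if p is not None]
    return {
        "rows": len(parsed),
        "valid": len(valid),
        "invalid": len(parsed) - len(valid),
        # sum() counts True as 1 and False as 0.
        "accuracy": sum(valid) / len(valid) if valid else None,
    }
```

With pandas, the column could be passed in as `summarize(pd.read_parquet(path)["is_correct"])`, where `path` is wherever the 7-column output is exported (not shown on this page). Reporting invalid replies separately avoids inflating or deflating accuracy when the judge ignores the one-word instruction.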