Evaluations/c964c5f6-7fbf-4e98-94ad-c6ae6de17b2d
test-one-shot
qa_test.parquet
text text
OpenAI OpenAI/GPT-4o
is_correct
Are these two lists the same?

List 1: {answer_spans}
List 2: {prediction}
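The prompt above contains two placeholders, `{answer_spans}` and `{prediction}`, which are presumably filled from columns of the same name in each dataset row. A minimal sketch of that substitution, assuming plain string formatting; the helper `render_prompt` and the example row are hypothetical and are not taken from the evaluation tool itself:

```python
# Template text copied from the evaluation prompt above.
PROMPT_TEMPLATE = (
    "Are these two lists the same?\n"
    "\n"
    "List 1: {answer_spans}\n"
    "List 2: {prediction}"
)


def render_prompt(row: dict) -> str:
    """Fill the template from one row (hypothetical helper).

    Assumes the row has 'answer_spans' and 'prediction' keys,
    matching the placeholder names in the template.
    """
    return PROMPT_TEMPLATE.format(
        answer_spans=row["answer_spans"],
        prediction=row["prediction"],
    )


# Hypothetical row for illustration; not data from qa_test.parquet.
example_row = {"answer_spans": ["Paris"], "prediction": ["Paris"]}
rendered = render_prompt(example_row)
print(rendered)
```

The model's reply to each rendered prompt would then be recorded under the `is_correct` column; how the tool parses that reply is not shown on this page.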
Oct 17, 2024, 11:51 PM UTC
5 row sample
5 rows processed, 341 tokens used
Sample Results completed
7 columns, 1-5 of 100 rows