Evaluations/b875fe36-cdca-4e04-ac7f-f17280684e48
one-shot-results
qa_train.parquet
text
OpenAI/GPT-4o
is_correct
are these lists equivalent? answer with one word "true" or "false"

List 1: {answer_spans}
List 2: {prediction}
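
A minimal sketch of how the prompt above might be filled in per row and how a one-word reply could be mapped to the is_correct column. The helper names render_prompt and parse_is_correct are hypothetical, the row is assumed to be a dict with answer_spans and prediction keys (taken from the placeholders), and the model call itself is left out because the page does not show how it is made.

```python
from typing import Optional

# Template copied from the evaluation prompt; placeholders match its {…} fields.
PROMPT_TEMPLATE = (
    'are these lists equivalent? answer with one word "true" or "false"\n'
    "\n"
    "List 1: {answer_spans}\n"
    "List 2: {prediction}"
)


def render_prompt(row: dict) -> str:
    """Fill the template from one dataset row (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        answer_spans=row["answer_spans"],
        prediction=row["prediction"],
    )


def parse_is_correct(reply: str) -> Optional[bool]:
    """Map a one-word model reply to a boolean; None if it is neither word."""
    word = reply.strip().strip(".\"'").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None  # assumption: unexpected replies are left unscored


# Example row with made-up values, for illustration only.
example_row = {"answer_spans": ["Paris"], "prediction": ["paris"]}
prompt = render_prompt(example_row)
```

Returning None for anything other than the two expected words keeps malformed replies from being silently counted as incorrect.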
Oct 17, 2024, 4:42 PM UTC
5 rows processed, 228 tokens used
Sample Results completed
7 columns, 1-5 of 300 rows