Evaluation df8baca0-3e33-47d2-8982-95614a499ffd - ox/CommaQA

Total running cost: $0.0000

Kind	Prompt	Rows	Type	Model	Target	Status	Runtime	Run	By	Tokens
Run	Are these two lists the same? Answer with true or false, one word, all lowercase. List 1: {answer_spans} List 2: {prediction}	100	text → text	OpenAI/GPT-4o	b9f34df75a65e1bb	completed	00:01:11	10 months ago	ox	5041 tokens
Sample	Are these two lists the same? Answer with true or false, one word, all lowercase. List 1: {answer_spans} List 2: {prediction}	5	text → text	OpenAI/GPT-4o	Sample - N/A	completed	00:00:03	10 months ago	ox	264 tokens
Sample	Are these two lists the same? List 1: {answer_spans} List 2: {prediction}	5	text → text	OpenAI/GPT-4o	Sample - N/A	completed	00:00:04	10 months ago	ox	341 tokens
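
The prompts in the table are templates: {answer_spans} and {prediction} are filled in from each dataset row before the text is sent to the model. The sketch below shows one way that substitution and the reading of a one-word true/false reply could work. It is only an illustration under assumptions. The function names render_prompt and parse_verdict, the example row values, and the handling of unexpected replies are all hypothetical and do not come from the evaluation itself, and no model call is made.

```python
from typing import Optional

# Template copied from the "Run" row of the table above.
PROMPT_TEMPLATE = (
    "Are these two lists the same? Answer with true or false, one word, "
    "all lowercase. List 1: {answer_spans} List 2: {prediction}"
)


def render_prompt(row: dict) -> str:
    """Fill the template placeholders from one dataset row (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(
        answer_spans=row["answer_spans"],
        prediction=row["prediction"],
    )


def parse_verdict(response: str) -> Optional[bool]:
    """Map a one-word reply to a boolean; return None for anything else.

    Tolerating stray whitespace and capitalization is an assumption here,
    not documented behavior of the evaluation.
    """
    text = response.strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    return None


# Made-up row for illustration only; not taken from the dataset.
example_row = {
    "answer_spans": ["Paris", "Lyon"],
    "prediction": ["Lyon", "Paris"],
}
prompt = render_prompt(example_row)
print(prompt)
```

Constraining the reply to a single lowercase word, as the first two prompts do, keeps this kind of parsing trivial. The third prompt omits that constraint, so its replies would likely need looser handling.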