Evaluations/LLM As A Judge/Iteration history
History
Total running cost: $0.0000
| | Prompt | Rows | Type | Model | Target | Status | Runtime | Run | By | Tokens | Cost |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Run | Are these two lists the same? Answer with true or false, one word, all lowercase. List 1: {answer_spans} List 2: {prediction} | 100 | text | OpenAI/GPT-4o | b9f34df75a65e1bb | completed | 00:01:11 | 8 months ago | ox | 5041 tokens | |
| Sample | Are these two lists the same? Answer with true or false, one word, all lowercase. List 1: {answer_spans} List 2: {prediction} | 5 | text | OpenAI/GPT-4o | Sample - N/A | completed | 00:00:03 | 8 months ago | ox | 264 tokens | |
| Sample | Are these two lists the same? List 1: {answer_spans} List 2: {prediction} | 5 | text | OpenAI/GPT-4o | Sample - N/A | completed | 00:00:04 | 8 months ago | ox | 341 tokens | |
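
The judge prompts above are templates with two placeholders, {answer_spans} and {prediction}. The following is a minimal sketch of how such a template might be filled in and how the one-word reply might be turned into a boolean. The function names `build_judge_prompt` and `parse_verdict` are hypothetical, the placeholder substitution is assumed to behave like Python's `str.format`, and no model call is made; this is not the evaluation tool's actual implementation.

```python
# Hypothetical sketch: the template text is copied from the history above;
# everything else (names, parsing rules) is an assumption for illustration.
JUDGE_TEMPLATE = (
    "Are these two lists the same? "
    "Answer with true or false, one word, all lowercase. "
    "List 1: {answer_spans} List 2: {prediction}"
)


def build_judge_prompt(answer_spans, prediction):
    """Fill the two placeholders with the reference and predicted lists."""
    return JUDGE_TEMPLATE.format(answer_spans=answer_spans, prediction=prediction)


def parse_verdict(reply):
    """Map the judge's reply to True/False, or None if it is not a clean verdict."""
    word = reply.strip().rstrip(".").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None  # free-form answers are left for manual review


prompt = build_judge_prompt(["Paris", "Lyon"], ["Paris", "Lyon"])
print(prompt)
print(parse_verdict("true"))
```

Constraining the reply to a single lowercase word, as the first two prompts do, keeps a parser like this trivial; the unconstrained third prompt may produce free-form answers, which this sketch would return as None.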