A model or factory/constructor function to be evaluated. It can be a Runnable instance, a factory function that returns a Runnable, or a user-defined function or factory.
The name of the dataset against which the evaluation will be performed. This dataset should already be defined and contain the relevant data for evaluation.
Optional
options: RunOnDatasetParams(Optional) Additional parameters for the evaluation process:
evaluators
(RunEvalType[]): Evaluators to apply to a dataset run.formatEvaluatorInputs
(EvaluatorInputFormatter): Convert the evaluation data into formats that can be used by the evaluator.projectName
(string): Name of the project for logging and tracking.projectMetadata
(Record<string, unknown>): Additional metadata for the project.client
(Client): Client instance for LangSmith service interaction.maxConcurrency
(number): Maximum concurrency level for dataset processing.A promise that resolves to an EvalResults
object. This object includes
detailed results of the evaluation, such as execution time, run IDs, and feedback
for each entry in the dataset.
// Example usage for evaluating a model on a dataset
async function evaluateModel() {
const chain = /* ...create your model or chain...*//
const datasetName = 'example-dataset';
const client = new Client(/* ...config... *//);
const results = await runOnDataset(chain, datasetName, {
evaluators: [/* ...evaluators... *//],
client,
});
console.log('Evaluation Results:', results);
}
evaluateModel();
In this example, runOnDataset
is used to evaluate a language model (or a chain of models) against
a dataset named 'example-dataset'. The evaluation process is configured using RunOnDatasetParams["evaluators"]
, which can
include both standard and custom evaluators. The Client
instance is used to interact with LangChain services.
The function returns the evaluation results, which can be logged or further processed as needed.
Generated using TypeDoc
Evaluates a given model or chain against a specified LangSmith dataset.
This function fetches example records from the specified dataset, runs the model or chain against each example, and returns the evaluation results.