Interface LlamaCppInputs

Note that `modelPath` is the only required parameter. For testing, you can supply it through the `LLAMA_PATH` environment variable.

interface LlamaCppInputs {
    modelPath: string;
    batchSize?: number;
    contextSize?: number;
    embedding?: boolean;
    f16Kv?: boolean;
    gbnf?: string;
    gpuLayers?: number;
    jsonSchema?: object;
    logitsAll?: boolean;
    maxTokens?: number;
    prependBos?: boolean;
    seed?: null | number;
    temperature?: number;
    threads?: number;
    topK?: number;
    topP?: number;
    trimWhitespaceSuffix?: boolean;
    useMlock?: boolean;
    useMmap?: boolean;
    vocabOnly?: boolean;
}
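As a minimal sketch of how these inputs might be assembled, the snippet below builds an object with only `modelPath` set from a `LLAMA_PATH` entry plus a few optional overrides. The `LlamaCppInputsSketch` type is a trimmed local mirror of the interface above, and `inputsFromEnv` is a hypothetical helper, not part of the library; the model path is a placeholder.

```typescript
// Trimmed local mirror of the interface above so the sketch type-checks on its own.
// In a real project the type would be imported from the library instead.
interface LlamaCppInputsSketch {
  modelPath: string;
  contextSize?: number;
  maxTokens?: number;
  temperature?: number;
}

// Hypothetical helper: reads the required model path from an environment-like map
// and merges any optional settings on top of it.
function inputsFromEnv(
  env: Record<string, string | undefined>,
  overrides: Omit<LlamaCppInputsSketch, "modelPath"> = {},
): LlamaCppInputsSketch {
  const modelPath = env["LLAMA_PATH"];
  if (!modelPath) {
    throw new Error("LLAMA_PATH is not set");
  }
  return { ...overrides, modelPath };
}

// Placeholder path; every other field is optional and may be omitted.
const exampleInputs = inputsFromEnv(
  { LLAMA_PATH: "/models/example.gguf" },
  { temperature: 0.8 },
);
```

In Node.js, `process.env` could be passed as the first argument; taking the map as a parameter keeps the sketch free of runtime-specific globals.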

Hierarchy

LlamaBaseCppInputs
Toolkit
- LlamaCppInputs

Properties

modelPath

modelPath: string

Path to the model on the filesystem.

`Optional` batchSize

batchSize?: number

Prompt processing batch size.

`Optional` contextSize

contextSize?: number

Text context size.

`Optional` embedding

embedding?: boolean

Embedding mode only.

`Optional` f16Kv

f16Kv?: boolean

Use fp16 for KV cache.

`Optional` gbnf

gbnf?: string

GBNF string to be used to format output. Also known as grammar.
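For illustration, a GBNF grammar is a plain string of production rules starting from `root`. The sketch below constrains output to the literal `yes` or `no`; the inline object type and the model path are placeholders assumed for the example.

```typescript
// A tiny GBNF grammar: the root rule accepts exactly "yes" or "no".
const yesNoGrammar = 'root ::= "yes" | "no"';

// Only the two fields relevant here are shown; the path is a placeholder.
const grammarInputs: { modelPath: string; gbnf?: string } = {
  modelPath: "/models/example.gguf",
  gbnf: yesNoGrammar,
};
```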

`Optional` gpuLayers

gpuLayers?: number

Number of layers to store in VRAM.

`Optional` jsonSchema

jsonSchema?: object

JSON schema to be used to format output. Also known as grammar.
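A hedged example of what a `jsonSchema` value might look like: a small object schema with two typed properties. Which schema keywords are honoured depends on the underlying grammar converter, so this sketch sticks to `type` and `properties`; the field names and model path are made up for illustration.

```typescript
// Hypothetical schema asking for an object with a string and a number field.
const answerSchema = {
  type: "object",
  properties: {
    answer: { type: "string" },
    confidence: { type: "number" },
  },
};

// Only the two fields relevant here are shown; the path is a placeholder.
const schemaInputs: { modelPath: string; jsonSchema?: object } = {
  modelPath: "/models/example.gguf",
  jsonSchema: answerSchema,
};
```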

`Optional` logitsAll

logitsAll?: boolean

The llama_eval() call computes all logits, not just the last one.

`Optional` maxTokens

maxTokens?: number

`Optional` prependBos

prependBos?: boolean

Add the beginning-of-sentence token.

`Optional` seed

seed?: null | number

If null, a random seed will be used.

`Optional` temperature

temperature?: number

The randomness of the responses, e.g. 0.1 deterministic, 1.5 creative, 0.8 balanced, 0 disables.

`Optional` threads

threads?: number

Number of threads to use to evaluate tokens.

`Optional` topK

topK?: number

Consider only the n most likely tokens, where n ranges from 1 to the vocabulary size; 0 disables the filter (uses the full vocabulary). Note: only applies when temperature > 0.

`Optional` topP

topP?: number

Selects the smallest token set whose cumulative probability exceeds P, where P is between 0 and 1; 1 disables the filter. Note: only applies when temperature > 0.
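To make the `topK` and `topP` descriptions concrete, here is an illustrative sketch of the two filters applied to a toy probability distribution. It is not llama.cpp's sampler; the function names are hypothetical, and the top-p loop stops once the running total reaches P.

```typescript
// Pair each token index with its probability and sort from most to least likely.
function rankTokens(probs: number[]): { index: number; prob: number }[] {
  return probs
    .map((prob, index) => ({ index, prob }))
    .sort((a, b) => b.prob - a.prob);
}

// Keep the k most likely token indices; k <= 0 keeps the full vocabulary.
function topKFilter(probs: number[], k: number): number[] {
  const ranked = rankTokens(probs);
  const kept = k <= 0 ? ranked : ranked.slice(0, k);
  return kept.map((token) => token.index);
}

// Keep the smallest prefix of ranked tokens whose cumulative probability reaches p.
function topPFilter(probs: number[], p: number): number[] {
  const kept: number[] = [];
  let cumulative = 0;
  for (const token of rankTokens(probs)) {
    kept.push(token.index);
    cumulative += token.prob;
    if (cumulative >= p) break;
  }
  return kept;
}

const exampleProbs = [0.5, 0.3, 0.15, 0.05];
const afterTopK = topKFilter(exampleProbs, 2); // keeps indices 0 and 1
const afterTopP = topPFilter(exampleProbs, 0.9); // keeps indices 0, 1 and 2
```

With `k = 0` or `p = 1`, both filters keep every token, which matches the "disables" wording above.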

`Optional` trimWhitespaceSuffix

trimWhitespaceSuffix?: boolean

Trim whitespace from the end of the generated text. Disabled by default.

`Optional` useMlock

useMlock?: boolean

Force system to keep model in RAM.

`Optional` useMmap

useMmap?: boolean

Use mmap if possible.

`Optional` vocabOnly

vocabOnly?: boolean

Only load the vocabulary, no weights.
