mistral_v0_2.lib.generate

class mistral_v0_2.lib.generate.Beam(ids: Array, score: Array, kv_cache: Array | None)

Bases: NamedTuple

Represents a single beam in beam search.

  • ids (Array): The token IDs in the current beam, that is, the sequence of tokens generated so far in this beam.

  • score (Array): The cumulative score of the tokens in ids. This score is used to rank beams and decide which ones to keep during the beam search process.

  • kv_cache (KVCache): The key-value cache the model has produced for the tokens in this beam so far.

ids: Array

Alias for field number 0

kv_cache: Array | None

Alias for field number 2

score: Array

Alias for field number 1

mistral_v0_2.lib.generate.generate(params, tokenizer, sentences, max_length, max_new_tokens, *, key=None, top_k=None, top_p=None, temperature=1.0, beam_nums=None)

Generates text completions or continuations for a given list of input sentences using the specified language model parameters and tokenizer.

This function utilizes various generation strategies (such as sampling and beam search) based on the provided parameters to generate text that is contextually relevant to the input sentences. The generation can be customized using parameters like top_k, top_p, and temperature to control the diversity and creativity of the output.

Parameters:
  • params (MistralLMParams) – Parameters of the Mistral model to be used for text generation.

  • tokenizer (AutoTokenizer) – The tokenizer used for encoding input sentences and decoding output tokens.

  • sentences (list[str]) – A list of input sentences for which text completions are generated.

  • max_length (int) – The maximum length of the generated text (in tokens), including the input sentence.

  • max_new_tokens (int) – The maximum number of new tokens to generate, excluding the input sentence.

  • key (Array | None, optional) – A pseudo-random number generator (PRNG) key used for sampling. Defaults to None.

  • top_k (int | None, optional) – The number of highest probability vocabulary tokens to keep for top-k sampling. Defaults to None, indicating no top-k sampling.

  • top_p (float | None, optional) – The cumulative probability threshold for top-p (nucleus) sampling. Defaults to None, indicating no top-p sampling.

  • temperature (float, optional) – A scaling factor to apply to logits before sampling, affecting the distribution sharpness. Defaults to 1.0, indicating no scaling.

  • beam_nums (int | None, optional) – The number of beams for beam search. Defaults to None, indicating that beam search is not used.

Returns:

An array of generated token IDs corresponding to the continuations of the sentences in the batch.

Return type:

Array

Example

>>> generate(params, tokenizer, sentences, max_length, max_new_tokens, key=subkey, top_k=5, top_p=0.8, temperature=0.9)
>>> generate(params, tokenizer, sentences, max_length, max_new_tokens, beam_nums=5)

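The first example call passes key=subkey without showing where subkey comes from. A minimal sketch, assuming the usual JAX pattern of splitting a root key before each random operation (the seed 0 is arbitrary):

```python
import jax
import jax.numpy as jnp

# Root key from an arbitrary seed.
key = jax.random.PRNGKey(0)

# Split before each sampling call: keep `key` for later splits,
# hand `subkey` to the function that consumes randomness.
key, subkey = jax.random.split(key)
```

Reusing the same key for two sampling calls would produce identical draws, which is why a fresh subkey is split off each time.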
Performs a greedy search to select the token with the highest logit.

Parameters:

logits (Array) – An array of logits representing the model’s predictions for the next token in the sequence.

Returns:

The indices of the selected logits.

Return type:

Array
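
Greedy selection as described above amounts to an argmax over the vocabulary axis. A minimal sketch, assuming the vocabulary is the last axis of logits; the name greedy_pick is hypothetical and is not the library's function:

```python
import jax.numpy as jnp


def greedy_pick(logits):
    """Hypothetical helper: index of the largest logit along the last axis."""
    return jnp.argmax(logits, axis=-1)


# Two sequences in a batch, three-token toy vocabulary.
logits = jnp.array([[0.1, 2.5, -1.0],
                    [3.0, 0.2, 0.4]])
next_ids = greedy_pick(logits)  # one index per row
```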

mistral_v0_2.lib.generate.prob_beams_n(input_beam, beam_nums, ids_out, score_out, kv_cache_out, prob_beams, ids_beams, kv_cache)

Expands one input beam into multiple output beams based on the probabilities of its candidate next tokens.

Parameters:
  • input_beam (Beam) – The current beam to expand.

  • beam_nums (int) – The number of beams to select for the next step of generation.

  • ids_out (Array | None) – An array holding the token IDs of previously selected beams. If provided, it will be updated with the new selections.

  • score_out (Array | None) – An array holding the scores of previously selected beams. If provided, it will be updated with the new scores.

  • kv_cache_out (Array | None) – An array to store the updated KVCache of the selected beams. If provided, it will be updated accordingly.

  • prob_beams (Array) – The probabilities of the next tokens for the input beam.

  • ids_beams (Array) – The token IDs corresponding to prob_beams.

  • kv_cache (KVCache) – The previous KVCache, to be updated based on the tokens added to the beams.

Returns:

A tuple containing the updated ids_out, score_out, and kv_cache_out, in that order. These arrays hold the selected output beams’ token IDs, scores, and KVCache.

Return type:

tuple[Array | None, Array | None, Array | None]

mistral_v0_2.lib.generate.process_fun(input_beam, ids_beams, ids_scores)

Updates an input beam with a set of new token IDs and scores, expanding the beam sequence and updating scores.

Parameters:
  • input_beam (Beam) – The input beam to be updated.

  • ids_beams (Array) – An array of new candidate token IDs to be appended to the input beam’s sequence. Each entry is a potential next token.

  • ids_scores (Array) – An array of scores, one for each candidate token ID in ids_beams.

Returns:

A tuple containing two elements:
  • The first element is an array of updated token IDs, with each sequence extended by one of the candidate token IDs.

  • The second element is an array of updated scores, with each score adjusted based on the corresponding candidate token’s score.

Return type:

tuple[Array, Array]
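
The expansion described above can be sketched as follows. This is not the library's implementation: the name expand_beam is hypothetical, the beam is assumed to hold a 1-D ids array and a scalar score, and scores are assumed to be log-probabilities that combine by addition (the documentation only says each score is "adjusted").

```python
import jax.numpy as jnp


def expand_beam(ids, score, cand_ids, cand_scores):
    """Hypothetical sketch: one output sequence per candidate token."""
    k = cand_ids.shape[0]
    # Copy the existing sequence once per candidate: shape (k, seq_len).
    repeated = jnp.tile(ids[None, :], (k, 1))
    # Append one candidate to each copy: shape (k, seq_len + 1).
    new_ids = jnp.concatenate([repeated, cand_ids[:, None]], axis=1)
    # Assumption: scores are log-probabilities, so they add.
    new_scores = score + cand_scores
    return new_ids, new_scores


ids = jnp.array([1, 42, 7])
score = jnp.array(-0.5)
cand_ids = jnp.array([10, 11])
cand_scores = jnp.array([-0.1, -2.0])
new_ids, new_scores = expand_beam(ids, score, cand_ids, cand_scores)
```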

mistral_v0_2.lib.generate.sampling(sampling_logits, tokens_ids, key=None)

Performs sampling from the given logits to select the next token in the sequence.

Parameters:
  • sampling_logits (Array) – The logits to sample from.

  • tokens_ids (Array) – The indices of tokens corresponding to the logits.

  • key (Array | None, optional) – A pseudo-random number generator (PRNG) key used for sampling. Defaults to None.

Returns:

An array containing the selected token IDs after sampling.

Return type:

Array
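
When the logits have already been filtered, a sampled position indexes into the filtered set and has to be mapped back to a vocabulary ID through tokens_ids. A minimal sketch of that idea, assuming a categorical draw with jax.random.categorical; the name sample_token is hypothetical and the library's function may differ in detail:

```python
import jax
import jax.numpy as jnp


def sample_token(sampling_logits, tokens_ids, key):
    """Hypothetical sketch: draw a position, then look up its token ID."""
    # Position within the (possibly filtered) logits, one per batch row.
    choice = jax.random.categorical(key, sampling_logits, axis=-1)
    # Translate positions back to vocabulary IDs.
    return jnp.take_along_axis(tokens_ids, choice[..., None], axis=-1)[..., 0]


key = jax.random.PRNGKey(0)
sampling_logits = jnp.array([[2.0, 1.0, 0.5]])
tokens_ids = jnp.array([[318, 257, 11]])  # made-up vocabulary IDs
picked = sample_token(sampling_logits, tokens_ids, key)
```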

mistral_v0_2.lib.generate.sort_beams(ids_out, score_out, kv_cache_out, beam_nums)

Sorts and trims a list of output beams to the top beam_nums beams based on their scores.

After expanding each input beam into multiple output beams, this function is used to sort all generated output beams by their scores and keep only the top beam_nums beams.

Parameters:
  • ids_out (Array | None) – An array containing the token IDs for all output beams generated in the current step.

  • score_out (Array | None) – An array containing the scores for each output beam. These scores are used to rank the beams.

  • kv_cache_out (Array | None) – An array containing KVCache associated with each output beam.

  • beam_nums (int) – The number of top beams to retain after sorting. This parameter determines how many of the highest-scoring beams are kept for the next step of the beam search.

Returns:

A list of the beam_nums highest-scoring output beams, each a Beam.

Return type:

list[Beam]
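
The sort-and-trim step can be sketched as follows. This is a simplified illustration rather than the library's code: the name keep_top_beams is hypothetical, the KVCache is left out, and it returns plain arrays instead of a list of Beam objects.

```python
import jax.numpy as jnp


def keep_top_beams(ids_out, score_out, beam_nums):
    """Hypothetical sketch: keep the beam_nums highest-scoring rows."""
    # Negate so that argsort's ascending order ranks the best score first.
    order = jnp.argsort(-score_out)[:beam_nums]
    return ids_out[order], score_out[order]


# Four candidate beams with made-up IDs and log-probability scores.
ids_out = jnp.array([[1, 5], [1, 9], [1, 3], [1, 7]])
score_out = jnp.array([-1.2, -0.3, -2.5, -0.9])
top_ids, top_scores = keep_top_beams(ids_out, score_out, beam_nums=2)
```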

mistral_v0_2.lib.generate.top_k_logits(logits, *, temperature=1.0, top_k)

Filters the logits to keep only the top_k highest values, optionally scaling by temperature.

Parameters:
  • logits (Array) – The logits array from a model’s output.

  • temperature (float, optional) – A scaling factor for logits. Defaults to 1.0 (no scaling).

  • top_k (int) – The number of top elements to select from the logits.

Returns:

A tuple containing two arrays:
  • The top_k logits after applying temperature.

  • The indices of top_k logits.

Return type:

tuple[Array, Array]
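
A minimal sketch of top-k filtering with temperature, using jax.lax.top_k. The name top_k_filter is hypothetical, and dividing the logits by temperature is an assumption based on the common convention; the documentation only calls temperature a scaling factor.

```python
import jax
import jax.numpy as jnp


def top_k_filter(logits, *, temperature=1.0, top_k):
    """Hypothetical sketch: the k largest logits and their indices."""
    values, indices = jax.lax.top_k(logits, top_k)
    # Assumption: temperature divides the logits (values > 1 flatten
    # the distribution, values < 1 sharpen it).
    return values / temperature, indices


logits = jnp.array([[1.0, 4.0, 2.0, 3.0]])
values, indices = top_k_filter(logits, temperature=2.0, top_k=2)
```

For a positive temperature the ranking is unchanged by scaling, so selecting first and scaling afterwards gives the same result as the reverse order.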

mistral_v0_2.lib.generate.top_p_logits(logits, *, tokens_ids=None, temperature=1.0, top_p)

Filters the logits based on the top_p cumulative probability threshold, optionally scaling by temperature.

Parameters:
  • logits (Array) – The logits array from a model’s output.

  • tokens_ids (Array | None, optional) – The indices of tokens corresponding to the logits. Defaults to None.

  • temperature (float, optional) – A scaling factor for logits. Defaults to 1.0 (no scaling).

  • top_p (float) – The cumulative probability threshold for selecting top logits.

Returns:

A tuple containing two arrays:
  • The logits kept by the top_p filter, after applying temperature.

  • The indices of those logits.

Return type:

tuple[Array, Array]
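
Nucleus (top-p) filtering keeps the smallest set of highest-probability tokens whose cumulative probability reaches top_p. The sketch below shows one common way to do this and is not necessarily how the library does it: the name top_p_filter is hypothetical, the tokens_ids argument is omitted, temperature is assumed to divide the logits, and dropped tokens are masked with -inf so the output keeps a fixed shape.

```python
import jax
import jax.numpy as jnp


def top_p_filter(logits, *, temperature=1.0, top_p):
    """Hypothetical sketch: mask everything outside the nucleus."""
    scaled = logits / temperature
    # Sort tokens from most to least likely.
    order = jnp.argsort(-scaled, axis=-1)
    sorted_logits = jnp.take_along_axis(scaled, order, axis=-1)
    probs = jax.nn.softmax(sorted_logits, axis=-1)
    # Probability mass accumulated strictly before each token.
    mass_before = jnp.cumsum(probs, axis=-1) - probs
    # Keep a token while the mass before it is still below top_p,
    # so the most likely token is always kept.
    keep = mass_before < top_p
    return jnp.where(keep, sorted_logits, -jnp.inf), order


# Probabilities 0.15, 0.5, 0.05, 0.3 expressed as logits.
logits = jnp.log(jnp.array([[0.15, 0.5, 0.05, 0.3]]))
filtered, order = top_p_filter(logits, top_p=0.7)
```

Here the two most likely tokens (probabilities 0.5 and 0.3) together cover the 0.7 threshold, so the remaining two are masked out.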