surveyeval.core_evaluation_lenses module

Core set of instrument evaluation lenses.

class surveyeval.core_evaluation_lenses.BiasEvaluationLens(evaluation_engine: EvaluationEngine)

Bases: EvaluationLens

Lens for evaluating a survey excerpt for any of the following:

  • Stereotypical representations of gender, ethnicity, origin, religion, or other social categories.

  • Distorted or biased representations of events, topics, groups, or individuals.

  • Use of discriminatory or insensitive language towards certain groups or topics.

  • Implicit or explicit assumptions, whether made in the text or adopted without question, that could be based on prejudice.

  • Prejudiced descriptions or evaluations of abilities, characteristics, or behaviors.

__init__(evaluation_engine: EvaluationEngine)

Override the default constructor to provide a lens-specific prompt template and follow-up questions.

Parameters:

evaluation_engine (EvaluationEngine) – EvaluationEngine instance to use for evaluation.

async a_evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', **kwargs) → dict

Override default a_evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument to evaluate.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict
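The return value described above can be handled with a small helper. The following is a minimal sketch that relies only on the documented keys (result, error, response, history); the helper name summarize_evaluation and the sample dicts are hypothetical and are not part of surveyeval.

```python
from typing import Any


def summarize_evaluation(result: dict[str, Any]) -> str:
    """Summarize a result dict shaped like the documented return value."""
    if result.get("result") == "error":
        # "error" is documented as present only when result is "error".
        return f"Evaluation failed: {result.get('error', 'unknown error')}"

    # history is documented as a list of [prompt, response] pairs.
    history = result.get("history", [])
    lines = [f"Evaluation succeeded after {len(history)} exchange(s)."]
    for turn, (prompt, response) in enumerate(history, start=1):
        lines.append(f"  {turn}. prompt={prompt[:40]!r} response={response[:40]!r}")
    return "\n".join(lines)


# Hand-written sample data for illustration (not real library output).
sample_ok = {
    "result": "success",
    "response": {},
    "history": [["Is this excerpt biased?", "No issues found."]],
}
sample_err = {"result": "error", "error": "timeout", "response": {}, "history": []}

print(summarize_evaluation(sample_ok))
print(summarize_evaluation(sample_err))
```

Checking result before touching response keeps error handling in one place, since error is only present on failure.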

evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', **kwargs) → dict

Override default evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument to evaluate.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict

format_result(result: dict | None = None, minimum_importance: int = 0) → str

Format the evaluation result as a human-readable string.

Parameters:
  • result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

  • minimum_importance (int) – Minimum importance score for filtering results (defaults to 0, which doesn’t filter).

Returns:

Formatted evaluation result.

Return type:

str
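To illustrate how a minimum_importance threshold might behave, here is a minimal sketch that filters and formats a list of recommendations in the standardized shape (importance, replacement_original, replacement_suggested, explanation). It is not the library's implementation: the function name format_recommendations, the inclusive threshold, and the output layout are all assumptions.

```python
def format_recommendations(recommendations: list[dict], minimum_importance: int = 0) -> str:
    """Format recommendations whose importance meets the threshold."""
    # Assumption: the threshold is inclusive, so the default of 0 keeps everything.
    kept = [r for r in recommendations if r["importance"] >= minimum_importance]
    if not kept:
        return "No recommendations at or above the requested importance."
    lines = []
    for rec in kept:
        lines.append(
            f"[{rec['importance']}/5] {rec['replacement_original']!r} -> "
            f"{rec['replacement_suggested']!r}: {rec['explanation']}"
        )
    return "\n".join(lines)


# Hand-written sample data for illustration.
sample_recommendations = [
    {
        "importance": 4,
        "replacement_original": "feel blue",
        "replacement_suggested": "feel sad",
        "explanation": "Idiom may not translate literally.",
    },
    {
        "importance": 2,
        "replacement_original": "utilize",
        "replacement_suggested": "use",
        "explanation": "Simpler wording.",
    },
]

print(format_recommendations(sample_recommendations, minimum_importance=3))
```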

standardize_result(result: dict | None = None) → list[dict]

Reorganize the evaluation result into a list of recommendations in a standardized format.

Parameters:

result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

Returns:

List of recommendations, each of which is a dict with the following keys: importance (int 1-5), replacement_original (str), replacement_suggested (str), explanation (str).

Return type:

list[dict]
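The standardized format described above can be checked before recommendations are passed to downstream code. The sketch below validates a single recommendation against the four documented keys and the 1-5 importance range; the helper validation_errors and the REQUIRED_KEYS table are hypothetical names introduced for illustration.

```python
# Documented keys of a standardized recommendation and their expected types.
REQUIRED_KEYS = {
    "importance": int,
    "replacement_original": str,
    "replacement_suggested": str,
    "explanation": str,
}


def validation_errors(recommendation: dict) -> list[str]:
    """Return a list of problems found in one recommendation (empty if valid)."""
    errors = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in recommendation:
            errors.append(f"missing key: {key}")
        elif not isinstance(recommendation[key], expected_type):
            errors.append(f"{key} should be {expected_type.__name__}")
    # importance is documented as an int from 1 to 5.
    importance = recommendation.get("importance")
    if isinstance(importance, int) and not 1 <= importance <= 5:
        errors.append("importance should be between 1 and 5")
    return errors


# Hand-written sample data for illustration.
good = {
    "importance": 3,
    "replacement_original": "housewife",
    "replacement_suggested": "homemaker",
    "explanation": "Gendered term.",
}
bad = {"importance": 7, "replacement_original": "a"}

print(validation_errors(good))
print(validation_errors(bad))
```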

class surveyeval.core_evaluation_lenses.PhrasingEvaluationLens(evaluation_engine: EvaluationEngine)

Bases: EvaluationLens

Lens for identifying phrasing issues that might be flagged during piloting or cognitive interviewing.

__init__(evaluation_engine: EvaluationEngine)

Override the default constructor to provide a lens-specific prompt template and follow-up questions.

Parameters:

evaluation_engine (EvaluationEngine) – EvaluationEngine instance to use for evaluation.

async a_evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', survey_question: str = '', **kwargs) → dict

Override default a_evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument (for context).

  • survey_question (str) – Specific question to focus on.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict
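Because a_evaluate is a coroutine, several questions from the same excerpt could be evaluated concurrently. The sketch below shows one way to do that with asyncio.gather. To stay self-contained it uses a hypothetical StubPhrasingLens that mimics only the documented signature and return shape; the context and location strings are placeholders, and a real lens would need an EvaluationEngine.

```python
import asyncio


class StubPhrasingLens:
    """Hypothetical stand-in that mimics only the documented a_evaluate signature."""

    async def a_evaluate(self, chat_history=None, survey_context="", survey_locations="",
                         survey_excerpt="", survey_question="", **kwargs) -> dict:
        await asyncio.sleep(0)  # placeholder for the real asynchronous evaluation
        return {
            "result": "success",
            "response": {"question": survey_question},
            "history": [[survey_question, "stub response"]],
        }


async def evaluate_questions(lens, excerpt: str, questions: list[str]) -> list[dict]:
    """Evaluate each question against the same excerpt concurrently."""
    tasks = [
        lens.a_evaluate(
            survey_context="Example household survey",  # placeholder context
            survey_locations="Example location",        # placeholder location
            survey_excerpt=excerpt,
            survey_question=question,
        )
        for question in questions
    ]
    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*tasks)


questions = ["How old are you?", "How many people live in your household?"]
results = asyncio.run(
    evaluate_questions(StubPhrasingLens(), "\n".join(questions), questions)
)
for question, result in zip(questions, results):
    print(question, "->", result["result"])
```

The full excerpt is passed on every call so that each focused question keeps its surrounding context.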

evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', survey_question: str = '', **kwargs) → dict

Override default evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument (for context).

  • survey_question (str) – Specific question to focus on.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict

format_result(result: dict | None = None, minimum_importance: int = 0) → str

Format the evaluation result as a human-readable string.

Parameters:
  • result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

  • minimum_importance (int) – Minimum importance score for filtering results (defaults to 0, which doesn’t filter).

Returns:

Formatted evaluation result.

Return type:

str

standardize_result(result: dict | None = None) → list[dict]

Reorganize the evaluation result into a list of recommendations in a standardized format.

Parameters:

result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

Returns:

List of recommendations, each of which is a dict with the following keys: importance (int 1-5), replacement_original (str), replacement_suggested (str), explanation (str).

Return type:

list[dict]

class surveyeval.core_evaluation_lenses.TranslationEvaluationLens(evaluation_engine: EvaluationEngine)

Bases: EvaluationLens

Lens for identifying translation issues that could lead to differing response patterns from respondents.

__init__(evaluation_engine: EvaluationEngine)

Override the default constructor to provide a lens-specific prompt template and follow-up questions.

Parameters:

evaluation_engine (EvaluationEngine) – EvaluationEngine instance to use for evaluation.

async a_evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', **kwargs) → dict

Override default a_evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument to evaluate.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict

evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', **kwargs) → dict

Override default evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument to evaluate.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict

format_result(result: dict | None = None, minimum_importance: int = 0) → str

Format the evaluation result as a human-readable string.

Parameters:
  • result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

  • minimum_importance (int) – Minimum importance score for filtering results (defaults to 0, which doesn’t filter).

Returns:

Formatted evaluation result.

Return type:

str

standardize_result(result: dict | None = None) → list[dict]

Reorganize the evaluation result into a list of recommendations in a standardized format.

Parameters:

result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

Returns:

List of recommendations, each of which is a dict with the following keys: importance (int 1-5), replacement_original (str), replacement_suggested (str), explanation (str).

Return type:

list[dict]
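Since each standardized recommendation pairs replacement_original with replacement_suggested, the list could be used to produce a revised excerpt. The following is a minimal sketch of that idea; apply_recommendations is a hypothetical helper, and replacing only the first exact match of each original string is an assumption made to keep the example simple.

```python
def apply_recommendations(excerpt: str, recommendations: list[dict],
                          minimum_importance: int = 1) -> str:
    """Apply suggested replacements to an excerpt, most important first."""
    revised = excerpt
    ordered = sorted(recommendations, key=lambda r: r["importance"], reverse=True)
    for rec in ordered:
        if rec["importance"] < minimum_importance:
            continue  # skip low-importance suggestions
        original = rec["replacement_original"]
        # Only replace text that still appears verbatim in the excerpt.
        if original and original in revised:
            revised = revised.replace(original, rec["replacement_suggested"], 1)
    return revised


# Hand-written sample data for illustration.
excerpt = "How often do you take medicine? Do you feel blue?"
recommendations = [
    {
        "importance": 4,
        "replacement_original": "feel blue",
        "replacement_suggested": "feel sad",
        "explanation": "Idiom may not translate literally.",
    },
    {
        "importance": 2,
        "replacement_original": "medicine",
        "replacement_suggested": "medication",
        "explanation": "More precise term.",
    },
]

print(apply_recommendations(excerpt, recommendations, minimum_importance=3))
```

Suggestions whose original text no longer appears verbatim are skipped rather than raising an error, which keeps the helper safe when two recommendations touch the same passage.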

class surveyeval.core_evaluation_lenses.ValidatedInstrumentEvaluationLens(evaluation_engine: EvaluationEngine)

Bases: EvaluationLens

Lens for identifying validated questions, instruments, or tools that either were used or could be used to measure what the excerpt is attempting to measure.

__init__(evaluation_engine: EvaluationEngine)

Override the default constructor to provide a lens-specific prompt template and follow-up questions.

Parameters:

evaluation_engine (EvaluationEngine) – EvaluationEngine instance to use for evaluation.

async a_evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', **kwargs) → dict

Override default a_evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument to evaluate.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict

evaluate(chat_history: list | None = None, survey_context: str = '', survey_locations: str = '', survey_excerpt: str = '', **kwargs) → dict

Override default evaluate method.

Parameters:
  • chat_history (list | None) – Chat history to use for the evaluation chain (or None to start with no history).

  • survey_context (str) – Information about the survey context.

  • survey_locations (str) – Information about the survey location(s).

  • survey_excerpt (str) – Excerpt from the survey instrument to evaluate.

  • kwargs (Any) – Keyword arguments to use for formatting the task system prompt and question.

Returns:

A dict with result (“success” or “error”), error (if result is “error”), response (a dict), and history (a list with the full history of the evaluation chain, each item of which is a list with two strings, a prompt and a response).

Return type:

dict

format_result(result: dict | None = None, minimum_importance: int = 0) → str

Format the evaluation result as a human-readable string.

Parameters:
  • result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

  • minimum_importance (int) – Minimum importance score for filtering results (defaults to 0, which doesn’t filter).

Returns:

Formatted evaluation result.

Return type:

str

standardize_result(result: dict | None = None) → list[dict]

Reorganize the evaluation result into a list of recommendations in a standardized format.

Parameters:

result (dict | None) – Evaluation result to format (or None to use the evaluation_result attribute).

Returns:

List of recommendations, each of which is a dict with the following keys: importance (int 1-5), replacement_original (str), replacement_suggested (str), explanation (str).

Return type:

list[dict]
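All four lenses document the same standardized recommendation format, so their outputs could be combined into a single prioritized list. The sketch below merges recommendations keyed by lens name, collapsing duplicates that propose the same replacement. The function merge_recommendations and the extra lenses key are assumptions introduced here, not part of the library.

```python
def merge_recommendations(per_lens: dict[str, list[dict]]) -> list[dict]:
    """Merge standardized recommendations from several lenses, highest importance first."""
    merged: dict[tuple[str, str], dict] = {}
    for lens_name, recommendations in per_lens.items():
        for rec in recommendations:
            key = (rec["replacement_original"], rec["replacement_suggested"])
            entry = merged.get(key)
            if entry is None:
                # First time this replacement is seen: copy it and record its source lens.
                merged[key] = {**rec, "lenses": [lens_name]}
            else:
                # Duplicate suggestion: keep the highest importance and note the extra lens.
                entry["lenses"].append(lens_name)
                entry["importance"] = max(entry["importance"], rec["importance"])
    return sorted(merged.values(), key=lambda r: r["importance"], reverse=True)


# Hand-written sample data for illustration.
per_lens = {
    "bias": [
        {
            "importance": 3,
            "replacement_original": "housewife",
            "replacement_suggested": "homemaker",
            "explanation": "Gendered term.",
        },
    ],
    "phrasing": [
        {
            "importance": 5,
            "replacement_original": "housewife",
            "replacement_suggested": "homemaker",
            "explanation": "Term may confuse respondents.",
        },
        {
            "importance": 2,
            "replacement_original": "utilize",
            "replacement_suggested": "use",
            "explanation": "Simpler wording.",
        },
    ],
}

for rec in merge_recommendations(per_lens):
    print(rec["importance"], rec["replacement_original"], rec["lenses"])
```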