surveyeval.survey_parser module

Utility functions for reading and parsing survey files.

class surveyeval.survey_parser.SurveyInterface(openai_api_key: str | None = None, openai_model: str | None = None, temperature: float = 0.0, reasoning_effort: str | None = None, total_response_timeout_seconds: int = 600, number_of_retries: int = 2, seconds_between_retries: int = 5, azure_api_key: str | None = None, azure_api_engine: str | None = None, azure_api_base: str | None = None, azure_api_version: str | None = None, langsmith_api_key: str | None = None, langsmith_project: str = 'surveyeval', langsmith_endpoint: str = 'https://api.smith.langchain.com', json_retries: int = 2, anthropic_api_key: str | None = None, anthropic_model: str | None = None, bedrock_model: str | None = None, bedrock_region: str = 'us-east-1', bedrock_aws_profile: str | None = None, max_tokens: int = 4096)

Bases: object

Interface for interacting with surveys.

__init__(openai_api_key: str | None = None, openai_model: str | None = None, temperature: float = 0.0, reasoning_effort: str | None = None, total_response_timeout_seconds: int = 600, number_of_retries: int = 2, seconds_between_retries: int = 5, azure_api_key: str | None = None, azure_api_engine: str | None = None, azure_api_base: str | None = None, azure_api_version: str | None = None, langsmith_api_key: str | None = None, langsmith_project: str = 'surveyeval', langsmith_endpoint: str = 'https://api.smith.langchain.com', json_retries: int = 2, anthropic_api_key: str | None = None, anthropic_model: str | None = None, bedrock_model: str | None = None, bedrock_region: str = 'us-east-1', bedrock_aws_profile: str | None = None, max_tokens: int = 4096)

Initialize a new survey interface backed by an LLM that helps parse survey contents. The caller must supply LLM parameters for either OpenAI (direct or via Azure) or Anthropic (direct or via AWS Bedrock).

Parameters:
  • openai_api_key (str) – OpenAI API key for accessing the LLM. Default is None.

  • openai_model (str) – OpenAI model name. Default is None.

  • temperature (float) – Temperature setting for the LLM. Default is 0.0.

  • reasoning_effort (str) – Reasoning effort setting for the LLM (e.g., “low”, “medium”, “high”). Only supported by certain models. Default is None.

  • total_response_timeout_seconds (int) – Timeout for LLM responses in seconds. Default is 600.

  • number_of_retries (int) – Number of retries for LLM calls. Default is 2.

  • seconds_between_retries (int) – Seconds between retries for LLM calls. Default is 5.

  • azure_api_key (str) – API key for Azure LLM. Default is None.

  • azure_api_engine (str) – Azure API engine name (deployment name; assumed to be the same as the OpenAI model name). Default is None.

  • azure_api_base (str) – Azure API base URL. Default is None.

  • azure_api_version (str) – Azure API version. Default is None.

  • langsmith_api_key (str) – API key for LangSmith. Default is None.

  • langsmith_project (str) – LangSmith project name. Default is ‘surveyeval’.

  • langsmith_endpoint (str) – LangSmith endpoint URL. Default is ‘https://api.smith.langchain.com’.

  • json_retries (int) – Number of automatic retries for invalid JSON responses. Default is 2.

  • anthropic_api_key (str) – API key for Anthropic. Default is None.

  • anthropic_model (str) – Anthropic model name. Default is None.

  • bedrock_model (str) – AWS Bedrock model name. Default is None.

  • bedrock_region (str) – AWS Bedrock region. Default is ‘us-east-1’.

  • bedrock_aws_profile (str) – AWS profile for Bedrock access. Default is None.

  • max_tokens (int) – Maximum tokens for LLM responses. Default is 4096.
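
The provider choice described above can be sketched as a small helper that picks exactly one group of constructor arguments. This is only an illustration: the helper names (build_interface_kwargs, make_interface) and the environment variable names are assumptions made for this example, not something surveyeval defines or reads. Only the keyword argument names come from the signature above.

```python
import os


def build_interface_kwargs(env) -> dict:
    """Pick one provider's constructor arguments from an environment-like mapping.

    The environment variable names are hypothetical; only the keyword
    argument names match the documented SurveyInterface signature.
    """
    # Settings shared by every provider (these repeat the documented defaults).
    common = {"temperature": 0.0, "max_tokens": 4096, "json_retries": 2}

    if env.get("AZURE_API_KEY"):
        # OpenAI via Azure
        return {**common,
                "azure_api_key": env["AZURE_API_KEY"],
                "azure_api_engine": env.get("AZURE_API_ENGINE"),
                "azure_api_base": env.get("AZURE_API_BASE"),
                "azure_api_version": env.get("AZURE_API_VERSION")}
    if env.get("OPENAI_API_KEY"):
        # OpenAI direct
        return {**common,
                "openai_api_key": env["OPENAI_API_KEY"],
                "openai_model": env.get("OPENAI_MODEL")}
    if env.get("ANTHROPIC_API_KEY"):
        # Anthropic direct
        return {**common,
                "anthropic_api_key": env["ANTHROPIC_API_KEY"],
                "anthropic_model": env.get("ANTHROPIC_MODEL")}
    if env.get("BEDROCK_MODEL"):
        # Anthropic via AWS Bedrock
        return {**common,
                "bedrock_model": env["BEDROCK_MODEL"],
                "bedrock_region": env.get("BEDROCK_REGION", "us-east-1"),
                "bedrock_aws_profile": env.get("BEDROCK_AWS_PROFILE")}
    raise ValueError("no LLM provider credentials found")


def make_interface(env=os.environ):
    """Construct a SurveyInterface from the selected provider arguments."""
    # Imported lazily so build_interface_kwargs works without surveyeval installed.
    from surveyeval.survey_parser import SurveyInterface
    return SurveyInterface(**build_interface_kwargs(env))
```

Keeping the argument selection separate from the constructor call makes it possible to check the configuration without contacting any LLM service.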

static output_parsed_data_to_xlsform(data: dict, form_id: str, form_title: str, output_file: str)

Output parsed data to an XLSForm file.

Parameters:
  • data (dict) – Parsed data to output.

  • form_id (str) – Form ID to set in the XLSForm file.

  • form_title (str) – Form title to set in the XLSForm file.

  • output_file (str) – Path to the output XLSForm file.

parse_survey_contents(survey_contents: str | dict, survey_context: str = '', max_chunk_size: int = 3000, min_chunk_size: int = 1000) → dict

Parse raw survey contents into structured data.

Parameters:
  • survey_contents (str | dict) – Raw survey contents, typically as returned by read_survey_contents(). If a string, it should be in Markdown format. If a dict, it should already be in the parsed data structure.

  • survey_context (str) – Context for the survey contents, if any. Default is an empty string. Survey context can be used to provide additional information to the parser, such as the type of survey or the survey’s purpose.

  • max_chunk_size (int) – Maximum chunk size for LLM processing. Default is 3000.

  • min_chunk_size (int) – Minimum chunk size for LLM processing. Default is 1000.

Returns:

A dict containing modules, which is itself a dict of questions organized by module.

Return type:

dict
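
As a rough illustration of working with the returned structure, the sketch below counts questions per module. It assumes the result stores its modules under a "modules" key and that each module's value is a sized collection of questions. The documentation above does not spell out either detail, so check the actual output before relying on this. The helper name count_questions_by_module is hypothetical.

```python
def count_questions_by_module(parsed: dict) -> dict:
    """Return {module_name: number_of_questions} for a parsed survey.

    Assumes parsed["modules"] maps module names to sized collections of
    questions; that layout is an assumption, not documented behavior.
    """
    modules = parsed.get("modules", {})
    return {name: len(questions) for name, questions in modules.items()}


# Made-up sample data, used only to show the call shape.
sample = {"modules": {"Household": ["q1", "q2"], "Health": ["q3"]}}
print(count_questions_by_module(sample))
```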

read_survey_contents(file_path: str, use_llm: bool = True) → str | dict

Read the raw contents of a survey file.

Parameters:
  • file_path (str) – Path to the survey file.

  • use_llm (bool) – Whether to use the LLM to read the survey contents. Default is True. If False, text is extracted from the file using local methods instead.

Returns:

Raw contents of the survey file, either as a dict (XLSForm or REDCap) or as a Markdown string.

Return type:

str | dict
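
The three methods documented in this module can be chained into a read, parse, and export pipeline. The sketch below is one possible arrangement, not a prescribed workflow. The function name survey_file_to_xlsform is hypothetical, and the interface argument is assumed to be an already constructed SurveyInterface (or any object exposing the same three methods).

```python
def survey_file_to_xlsform(interface, file_path: str, form_id: str,
                           form_title: str, output_file: str,
                           survey_context: str = "",
                           use_llm: bool = True) -> dict:
    """Read a survey file, parse it, and write the result as an XLSForm."""
    # Step 1: raw contents, either a Markdown string or a dict
    # (XLSForm or REDCap), per the read_survey_contents() documentation.
    contents = interface.read_survey_contents(file_path, use_llm=use_llm)

    # Step 2: structured data; chunk sizes are left at their defaults.
    parsed = interface.parse_survey_contents(contents,
                                             survey_context=survey_context)

    # Step 3: output_parsed_data_to_xlsform() is a static method; calling
    # it through the instance is valid Python and keeps the sketch short.
    interface.output_parsed_data_to_xlsform(data=parsed,
                                            form_id=form_id,
                                            form_title=form_title,
                                            output_file=output_file)
    return parsed
```

Returning the parsed dict lets the caller inspect or post-process the structure after the XLSForm file has been written.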