{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Code Generation\n",
    "\n",
    "In this example, we are building a workflow for code generation. The benchmark dataset used is [HumanEval](https://github.com/openai/human-eval).\n",
    "\n",
    "The workflow is adopted from [Agents framework](https://github.com/aiwaves-cn/agents/tree/master/examples/humaneval), including two agents:\n",
    "- **Draft agent**: completes the function body as an initial draft.\n",
    "- **Refine agent**: checks and refines the function body.\n",
    "\n",
    "> **Note**: function is not executed in the refine agent.\n",
    "\n",
    "![codegen](../imgs/codegen.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1) Setup\n",
    "\n",
    "First, let's set the environment for workflow execution. We use openai model in this example, please set your key in `.env` file as:\n",
    "\n",
    "OPENAI_API_KEY=\"your-openai-key\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2) Check Codegen Workflow\n",
    "\n",
    "The implementation is based on `langchain` and is avaibale in `workflow.py`. Try it out with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<result>\n",
      "    balance = 0\n",
      "    for char in brackets:\n",
      "        if char == '(':\n",
      "            balance += 1\n",
      "        elif char == ')':\n",
      "            balance -= 1\n",
      "        if balance < 0:\n",
      "            return False\n",
      "    return balance == 0\n",
      "</result>\n"
     ]
    }
   ],
   "source": [
    "%run workflow.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3) Optimize The Workflow\n",
    "\n",
    "The workflow entry point is already registered using annotation `cognify.register_workflow`.\n",
    "\n",
    "Here we configure the optimization pipeline:\n",
    "1. Define the evaluation method\n",
    "2. Define the data loader\n",
    "3. Config the optimizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 Tell Cognify how to evaluate the generation\n",
    "\n",
    "To evaluate the generation, we first parse the function body since the useful content is wrapped with `<result`>`</result`> tags.\n",
    "\n",
    "Then we execute the function with predefine set of test cases.\n",
    "\n",
    "If pass all tests, the score of this generation is `1.0`, otherwise `0.0`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cognify\n",
    "from humaneval.humaneval import check_correctness_thread\n",
    "\n",
    "@cognify.register_evaluator\n",
    "def pass_test(problem, finalized_code):\n",
    "    split_completion = finalized_code.split('\\n')\n",
    "    parsed_lines = []\n",
    "    for line in split_completion:\n",
    "        if \"<result>\" in line or \"</result>\" in line or \"```\" in line or \"python\" in line:\n",
    "            continue\n",
    "        parsed_lines.append(line)\n",
    "    completion = '\\n'.join(parsed_lines)\n",
    "\n",
    "    result = check_correctness_thread(problem, completion, timeout=3.0)\n",
    "    return 1.0 if result[\"passed\"] else 0.0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Tell Cognify what data to use\n",
    "\n",
    "The data is available in `humaneval` folder. The raw data looks like follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'task_id': 'HumanEval/0',\n",
       " 'prompt': 'from typing import List\\n\\n\\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\\n    \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\\n    given threshold.\\n    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\\n    False\\n    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\\n    True\\n    \"\"\"\\n',\n",
       " 'entry_point': 'has_close_elements',\n",
       " 'canonical_solution': '    for idx, elem in enumerate(numbers):\\n        for idx2, elem2 in enumerate(numbers):\\n            if idx != idx2:\\n                distance = abs(elem - elem2)\\n                if distance < threshold:\\n                    return True\\n\\n    return False\\n',\n",
       " 'test': \"\\n\\nMETADATA = {\\n    'author': 'jt',\\n    'dataset': 'test'\\n}\\n\\n\\ndef check(candidate):\\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\\n    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\\n    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\\n    assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\\n    assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\\n\\n\"}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from humaneval.humaneval import HumanEvalDataset\n",
    "raw_dataset = HumanEvalDataset()\n",
    "\n",
    "problem = raw_dataset.data[0]\n",
    "problem"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Our workflow takes as input this `problem` dictionary and generates `finalized_code`.\n",
    "\n",
    "The evaluator function expects both `problem` and the `finalized_code`.\n",
    "\n",
    "> **Note:**\n",
    ">\n",
    "> Cognify will also forward workflow input to the evalautor function (if required in the function signature):\n",
    "> - to cater for cases like *llm as a judge* where the question is also needed in the evaluation\n",
    "\n",
    "Thus we only need to pass `problem` as input and set ground truth to empty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from humaneval.humaneval import HumanEvalDataset\n",
    "import random\n",
    "\n",
    "@cognify.register_data_loader\n",
    "def load_data():\n",
    "    raw_dataset = HumanEvalDataset()\n",
    "    size = len(raw_dataset.data)\n",
    "    # shuffle the data\n",
    "    random.seed(42)\n",
    "    random.shuffle(raw_dataset.data)\n",
    "    \n",
    "    data = []\n",
    "    for i in range(size):\n",
    "        problem = raw_dataset.data[i]\n",
    "        input = {'problem': problem}\n",
    "        ground_truth = {}\n",
    "        data.append((input, ground_truth))\n",
    "    train, val, test = data[:40], data[40:60], data[60:]\n",
    "    return train, val, test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.3 Config the optimizer\n",
    "\n",
    "Let's use the predefined search space for code generation, the search space includes:\n",
    "\n",
    "- Top Layer:\n",
    "    - whether to spawn multiple workers for each agent\n",
    "- Bottom Layer:\n",
    "    - 4 fewshot examples to add for each agent\n",
    "    - whether to apply Chain-of-thought to each agent\n",
    "\n",
    "> **Note:** \n",
    "> workers spawned in top-layer is treated as new tunable targets in the bottom layer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "## search\n",
    "from cognify.hub.search import codegen\n",
    "\n",
    "search_settings = codegen.create_search(evaluator_batch_size=40)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4) Start the Optimization\n",
    "\n",
    "You can save the above configs in `config.py` file and use Cognify's CLI to fire the optimization with:\n",
    "\n",
    "```console\n",
    "$ cognify optimize workflow.py\n",
    "```\n",
    "\n",
    "Alternatively you can run the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train, val, dev = load_data()\n",
    "\n",
    "opt_cost, pareto_frontier, opt_logs = cognify.optimize(\n",
    "    script_path=\"workflow.py\",\n",
    "    control_param=search_settings,\n",
    "    train_set=train,\n",
    "    val_set=val,\n",
    "    eval_fn=pass_test,\n",
    "    force=True, # This will overwrite the existing results\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Optimization Results\n",
    "\n",
    "Cognfiy will output each optimized workflow to a `.cog` file. For this workflow, the optimizer chooses the following optimizations:\n",
    "- add chain-of-thought reasoning to the code completion step\n",
    "- ensemble the code refinement step\n",
    "- add few-shot examples to the ensembled code refinement step\n",
    "\n",
    "The final optimized workflow is depicted below, with optimizations highlighted in green.\n",
    "\n",
    "![codegen-opt](../imgs/codegen_optimized.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 4 selected few-shot examples for the code refinement module include both the incomplete and completed code, as well as the finalized output. Here is an example:\n",
    "\n",
    "> **Incomplete Function**\n",
    "> ```python\n",
    "> def valid_date(date):\n",
    ">     \"\"\"You have to write a function which validates a given date string and\n",
    ">     returns True if the date is valid otherwise False.\n",
    ">     The date is valid if all of the following ... (truncated for brevity)\n",
    ">     \"\"\"\n",
    "> ```\n",
    "> \n",
    "> **Completed Code**\n",
    "> ```python\n",
    "> def valid_date(date):\n",
    ">     if not date:\n",
    ">         return False\n",
    ">     try:\n",
    ">         month, day, year = map(int, date.split('-'))\n",
    ">     except ValueError:\n",
    ">         return False\n",
    ">     if month < 1 or month > 12:\n",
    ">         ... (truncated for brevity)\n",
    "> ```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check out more details on [how to interpret optimization results](https://cognify-ai.readthedocs.io/en/latest/user_guide/tutorials/interpret.html#detailed-transformation-trace)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "fresh_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}