{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Code Generation\n", "\n", "In this example, we are building a workflow for code generation. The benchmark dataset used is [HumanEval](https://github.com/openai/human-eval).\n", "\n", "The workflow is adopted from [Agents framework](https://github.com/aiwaves-cn/agents/tree/master/examples/humaneval), including two agents:\n", "- **Draft agent**: completes the function body as an initial draft.\n", "- **Refine agent**: checks and refines the function body.\n", "\n", "> **Note**: function is not executed in the refine agent.\n", "\n", "![codegen](../imgs/codegen.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1) Setup\n", "\n", "First, let's set the environment for workflow execution. We use openai model in this example, please set your key in `.env` file as:\n", "\n", "OPENAI_API_KEY=\"your-openai-key\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2) Check Codegen Workflow\n", "\n", "The implementation is based on `langchain` and is avaibale in `workflow.py`. Try it out with:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " balance = 0\n", " for char in brackets:\n", " if char == '(':\n", " balance += 1\n", " elif char == ')':\n", " balance -= 1\n", " if balance < 0:\n", " return False\n", " return balance == 0\n", "\n" ] } ], "source": [ "%run workflow.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3) Optimize The Workflow\n", "\n", "The workflow entry point is already registered using annotation `cognify.register_workflow`.\n", "\n", "Here we configure the optimization pipeline:\n", "1. Define the evaluation method\n", "2. Define the data loader\n", "3. Config the optimizer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Tell Cognify how to evaluate the generation\n", "\n", "To evaluate the generation, we first parse the function body since the useful content is wrapped with `` tags.\n", "\n", "Then we execute the function with predefine set of test cases.\n", "\n", "If pass all tests, the score of this generation is `1.0`, otherwise `0.0`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import cognify\n", "from humaneval.humaneval import check_correctness_thread\n", "\n", "@cognify.register_evaluator\n", "def pass_test(problem, finalized_code):\n", " split_completion = finalized_code.split('\\n')\n", " parsed_lines = []\n", " for line in split_completion:\n", " if \"\" in line or \"\" in line or \"```\" in line or \"python\" in line:\n", " continue\n", " parsed_lines.append(line)\n", " completion = '\\n'.join(parsed_lines)\n", "\n", " result = check_correctness_thread(problem, completion, timeout=3.0)\n", " return 1.0 if result[\"passed\"] else 0.0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Tell Cognify what data to use\n", "\n", "The data is available in `humaneval` folder. The raw data looks like follows:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'task_id': 'HumanEval/0',\n", " 'prompt': 'from typing import List\\n\\n\\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\\n \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\\n given threshold.\\n >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\\n False\\n >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\\n True\\n \"\"\"\\n',\n", " 'entry_point': 'has_close_elements',\n", " 'canonical_solution': ' for idx, elem in enumerate(numbers):\\n for idx2, elem2 in enumerate(numbers):\\n if idx != idx2:\\n distance = abs(elem - elem2)\\n if distance < threshold:\\n return True\\n\\n return False\\n',\n", " 'test': \"\\n\\nMETADATA = {\\n 'author': 'jt',\\n 'dataset': 'test'\\n}\\n\\n\\ndef check(candidate):\\n assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\\n assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\\n assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\\n assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\\n assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\\n assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\\n assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\\n\\n\"}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from humaneval.humaneval import HumanEvalDataset\n", "raw_dataset = HumanEvalDataset()\n", "\n", "problem = raw_dataset.data[0]\n", "problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our workflow takes as input this `problem` dictionary and generates `finalized_code`.\n", "\n", "The evaluator function expects both `problem` and the `finalized_code`.\n", "\n", "> **Note:**\n", ">\n", "> Cognify will also forward workflow input to the evalautor function (if required in the function signature):\n", "> - to cater for cases like *llm as a judge* where the question is also needed in the evaluation\n", "\n", "Thus we only need to pass `problem` as input and set ground truth to empty." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from humaneval.humaneval import HumanEvalDataset\n", "import random\n", "\n", "@cognify.register_data_loader\n", "def load_data():\n", " raw_dataset = HumanEvalDataset()\n", " size = len(raw_dataset.data)\n", " # shuffle the data\n", " random.seed(42)\n", " random.shuffle(raw_dataset.data)\n", " \n", " data = []\n", " for i in range(size):\n", " problem = raw_dataset.data[i]\n", " input = {'problem': problem}\n", " ground_truth = {}\n", " data.append((input, ground_truth))\n", " train, val, test = data[:40], data[40:60], data[60:]\n", " return train, val, test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Config the optimizer\n", "\n", "Let's use the predefined search space for code generation, the search space includes:\n", "\n", "- Top Layer:\n", " - whether to spawn multiple workers for each agent\n", "- Bottom Layer:\n", " - 4 fewshot examples to add for each agent\n", " - whether to apply Chain-of-thought to each agent\n", "\n", "> **Note:** \n", "> workers spawned in top-layer is treated as new tunable targets in the bottom layer." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "## search\n", "from cognify.hub.search import codegen\n", "\n", "search_settings = codegen.create_search(evaluator_batch_size=40)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4) Start the Optimization\n", "\n", "You can save the above configs in `config.py` file and use Cognify's CLI to fire the optimization with:\n", "\n", "```console\n", "$ cognify optimize workflow.py\n", "```\n", "\n", "Alternatively you can run the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train, val, dev = load_data()\n", "\n", "opt_cost, pareto_frontier, opt_logs = cognify.optimize(\n", " script_path=\"workflow.py\",\n", " control_param=search_settings,\n", " train_set=train,\n", " val_set=val,\n", " eval_fn=pass_test,\n", " force=True, # This will overwrite the existing results\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Optimization Results\n", "\n", "Cognfiy will output each optimized workflow to a `.cog` file. For this workflow, the optimizer chooses the following optimizations:\n", "- add chain-of-thought reasoning to the code completion step\n", "- ensemble the code refinement step\n", "- add few-shot examples to the ensembled code refinement step\n", "\n", "The final optimized workflow is depicted below, with optimizations highlighted in green.\n", "\n", "![codegen-opt](../imgs/codegen_optimized.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 4 selected few-shot examples for the code refinement module include both the incomplete and completed code, as well as the finalized output. Here is an example:\n", "\n", "> **Incomplete Function**\n", "> ```python\n", "> def valid_date(date):\n", "> \"\"\"You have to write a function which validates a given date string and\n", "> returns True if the date is valid otherwise False.\n", "> The date is valid if all of the following ... (truncated for brevity)\n", "> \"\"\"\n", "> ```\n", "> \n", "> **Completed Code**\n", "> ```python\n", "> def valid_date(date):\n", "> if not date:\n", "> return False\n", "> try:\n", "> month, day, year = map(int, date.split('-'))\n", "> except ValueError:\n", "> return False\n", "> if month < 1 or month > 12:\n", "> ... (truncated for brevity)\n", "> ```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check out more details on [how to interpret optimization results](https://cognify-ai.readthedocs.io/en/latest/user_guide/tutorials/interpret.html#detailed-transformation-trace)." ] } ], "metadata": { "kernelspec": { "display_name": "fresh_env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 2 }