{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Code Generation\n",
"\n",
"In this example, we are building a workflow for code generation. The benchmark dataset used is [HumanEval](https://github.com/openai/human-eval).\n",
"\n",
"The workflow is adopted from [Agents framework](https://github.com/aiwaves-cn/agents/tree/master/examples/humaneval), including two agents:\n",
"- **Draft agent**: completes the function body as an initial draft.\n",
"- **Refine agent**: checks and refines the function body.\n",
"\n",
"> **Note**: function is not executed in the refine agent.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1) Setup\n",
"\n",
"First, let's set the environment for workflow execution. We use openai model in this example, please set your key in `.env` file as:\n",
"\n",
"OPENAI_API_KEY=\"your-openai-key\""
]
},
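{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of how the key could be read from `.env` using only the standard library: `load_env_file` and `DEMO_API_KEY` are hypothetical names used for illustration, not part of Cognify, and a package such as `python-dotenv` handles more of the `.env` syntax.\n",
"\n",
"```python\n",
"import os\n",
"import tempfile\n",
"\n",
"def load_env_file(path):\n",
"    # Hypothetical helper: copy KEY=VALUE lines from a .env-style file into os.environ.\n",
"    with open(path, encoding='utf-8') as handle:\n",
"        for raw_line in handle:\n",
"            line = raw_line.strip()\n",
"            # Skip blank lines, comments, and lines without an assignment.\n",
"            if not line or line.startswith('#') or '=' not in line:\n",
"                continue\n",
"            key, _, value = line.partition('=')\n",
"            # Strip optional double quotes; variables that are already set win.\n",
"            os.environ.setdefault(key.strip(), value.strip().strip('\"'))\n",
"\n",
"# Demo on a throwaway file so the sketch does not touch a real .env.\n",
"with tempfile.TemporaryDirectory() as tmp_dir:\n",
"    env_path = os.path.join(tmp_dir, '.env')\n",
"    with open(env_path, 'w', encoding='utf-8') as handle:\n",
"        print('DEMO_API_KEY=\"your-demo-key\"', file=handle)\n",
"    load_env_file(env_path)\n",
"\n",
"print(os.environ['DEMO_API_KEY'])\n",
"```"
]
},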
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2) Check Codegen Workflow\n",
"\n",
"The implementation is based on `langchain` and is avaibale in `workflow.py`. Try it out with:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
" balance = 0\n",
" for char in brackets:\n",
" if char == '(':\n",
" balance += 1\n",
" elif char == ')':\n",
" balance -= 1\n",
" if balance < 0:\n",
" return False\n",
" return balance == 0\n",
"\n"
]
}
],
"source": [
"%run workflow.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3) Optimize The Workflow\n",
"\n",
"The workflow entry point is already registered using annotation `cognify.register_workflow`.\n",
"\n",
"Here we configure the optimization pipeline:\n",
"1. Define the evaluation method\n",
"2. Define the data loader\n",
"3. Config the optimizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 Tell Cognify how to evaluate the generation\n",
"\n",
"To evaluate the generation, we first parse the function body since the useful content is wrapped with `` tags.\n",
"\n",
"Then we execute the function with predefine set of test cases.\n",
"\n",
"If pass all tests, the score of this generation is `1.0`, otherwise `0.0`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cognify\n",
"from humaneval.humaneval import check_correctness_thread\n",
"\n",
"@cognify.register_evaluator\n",
"def pass_test(problem, finalized_code):\n",
" split_completion = finalized_code.split('\\n')\n",
" parsed_lines = []\n",
" for line in split_completion:\n",
" if \"\" in line or \"\" in line or \"```\" in line or \"python\" in line:\n",
" continue\n",
" parsed_lines.append(line)\n",
" completion = '\\n'.join(parsed_lines)\n",
"\n",
" result = check_correctness_thread(problem, completion, timeout=3.0)\n",
" return 1.0 if result[\"passed\"] else 0.0"
]
},
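{
"cell_type": "markdown",
"metadata": {},
"source": [
"The internals of `check_correctness_thread` are not shown in this notebook. As a rough sketch of the HumanEval scoring scheme it builds on, the prompt, the completion, the test code, and a call to `check(entry_point)` are concatenated and executed. `run_check` and `toy_problem` are hypothetical names for illustration, and unlike the evaluator above this sketch has no timeout or sandbox.\n",
"\n",
"```python\n",
"def run_check(problem, completion):\n",
"    # Assemble prompt + completion + tests, then invoke check(entry_point).\n",
"    program = '\\n'.join([\n",
"        problem['prompt'] + completion,\n",
"        problem['test'],\n",
"        'check(' + problem['entry_point'] + ')',\n",
"    ])\n",
"    try:\n",
"        exec(program, {})  # no sandbox or timeout: only for trusted code\n",
"    except Exception:\n",
"        # Any failed assertion or runtime/syntax error counts as a failed generation.\n",
"        return 0.0\n",
"    return 1.0\n",
"\n",
"# Hand-written stand-in with the same keys as a HumanEval problem.\n",
"toy_problem = {\n",
"    'prompt': 'def add(a, b):\\n',\n",
"    'entry_point': 'add',\n",
"    'test': 'def check(candidate):\\n    assert candidate(1, 2) == 3\\n',\n",
"}\n",
"\n",
"good = run_check(toy_problem, '    return a + b\\n')\n",
"bad = run_check(toy_problem, '    return a - b\\n')\n",
"print(good, bad)  # 1.0 0.0\n",
"```"
]
},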
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2 Tell Cognify what data to use\n",
"\n",
"The data is available in `humaneval` folder. The raw data looks like follows:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'task_id': 'HumanEval/0',\n",
" 'prompt': 'from typing import List\\n\\n\\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\\n \"\"\" Check if in given list of numbers, are any two numbers closer to each other than\\n given threshold.\\n >>> has_close_elements([1.0, 2.0, 3.0], 0.5)\\n False\\n >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)\\n True\\n \"\"\"\\n',\n",
" 'entry_point': 'has_close_elements',\n",
" 'canonical_solution': ' for idx, elem in enumerate(numbers):\\n for idx2, elem2 in enumerate(numbers):\\n if idx != idx2:\\n distance = abs(elem - elem2)\\n if distance < threshold:\\n return True\\n\\n return False\\n',\n",
" 'test': \"\\n\\nMETADATA = {\\n 'author': 'jt',\\n 'dataset': 'test'\\n}\\n\\n\\ndef check(candidate):\\n assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True\\n assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False\\n assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) == True\\n assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.8) == False\\n assert candidate([1.0, 2.0, 3.0, 4.0, 5.0, 2.0], 0.1) == True\\n assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 1.0) == True\\n assert candidate([1.1, 2.2, 3.1, 4.1, 5.1], 0.5) == False\\n\\n\"}"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from humaneval.humaneval import HumanEvalDataset\n",
"raw_dataset = HumanEvalDataset()\n",
"\n",
"problem = raw_dataset.data[0]\n",
"problem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our workflow takes as input this `problem` dictionary and generates `finalized_code`.\n",
"\n",
"The evaluator function expects both `problem` and the `finalized_code`.\n",
"\n",
"> **Note:**\n",
">\n",
"> Cognify will also forward workflow input to the evalautor function (if required in the function signature):\n",
"> - to cater for cases like *llm as a judge* where the question is also needed in the evaluation\n",
"\n",
"Thus we only need to pass `problem` as input and set ground truth to empty."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"from humaneval.humaneval import HumanEvalDataset\n",
"import random\n",
"\n",
"@cognify.register_data_loader\n",
"def load_data():\n",
" raw_dataset = HumanEvalDataset()\n",
" size = len(raw_dataset.data)\n",
" # shuffle the data\n",
" random.seed(42)\n",
" random.shuffle(raw_dataset.data)\n",
" \n",
" data = []\n",
" for i in range(size):\n",
" problem = raw_dataset.data[i]\n",
" input = {'problem': problem}\n",
" ground_truth = {}\n",
" data.append((input, ground_truth))\n",
" train, val, test = data[:40], data[40:60], data[60:]\n",
" return train, val, test"
]
},
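{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the shape of the returned splits concrete, here is a small sketch on stand-in data. The `problems` list is a hypothetical placeholder for the 164 HumanEval problems, and a local `random.Random` instance is used here, instead of the global seed in the loader above, so the shuffle leaves the original list untouched.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"# Stand-in records; the real loader reads them from HumanEvalDataset().\n",
"problems = [{'task_id': 'HumanEval/' + str(i)} for i in range(164)]\n",
"\n",
"# Shuffle a copy with a dedicated, seeded generator for reproducibility.\n",
"rng = random.Random(42)\n",
"shuffled = problems[:]\n",
"rng.shuffle(shuffled)\n",
"\n",
"# Each sample is an (input, ground_truth) pair; the ground truth stays empty\n",
"# because the evaluator only needs the problem and the generated code.\n",
"data = [({'problem': problem}, {}) for problem in shuffled]\n",
"train, val, test = data[:40], data[40:60], data[60:]\n",
"\n",
"print(len(train), len(val), len(test))  # 40 20 104\n",
"```"
]
},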
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.3 Config the optimizer\n",
"\n",
"Let's use the predefined search space for code generation, the search space includes:\n",
"\n",
"- Top Layer:\n",
" - whether to spawn multiple workers for each agent\n",
"- Bottom Layer:\n",
" - 4 fewshot examples to add for each agent\n",
" - whether to apply Chain-of-thought to each agent\n",
"\n",
"> **Note:** \n",
"> workers spawned in top-layer is treated as new tunable targets in the bottom layer."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"## search\n",
"from cognify.hub.search import codegen\n",
"\n",
"search_settings = codegen.create_search(evaluator_batch_size=40)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4) Start the Optimization\n",
"\n",
"You can save the above configs in `config.py` file and use Cognify's CLI to fire the optimization with:\n",
"\n",
"```console\n",
"$ cognify optimize workflow.py\n",
"```\n",
"\n",
"Alternatively you can run the following:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"train, val, dev = load_data()\n",
"\n",
"opt_cost, pareto_frontier, opt_logs = cognify.optimize(\n",
" script_path=\"workflow.py\",\n",
" control_param=search_settings,\n",
" train_set=train,\n",
" val_set=val,\n",
" eval_fn=pass_test,\n",
" force=True, # This will overwrite the existing results\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Optimization Results\n",
"\n",
"Cognfiy will output each optimized workflow to a `.cog` file. For this workflow, the optimizer chooses the following optimizations:\n",
"- add chain-of-thought reasoning to the code completion step\n",
"- ensemble the code refinement step\n",
"- add few-shot examples to the ensembled code refinement step\n",
"\n",
"The final optimized workflow is depicted below, with optimizations highlighted in green.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 4 selected few-shot examples for the code refinement module include both the incomplete and completed code, as well as the finalized output. Here is an example:\n",
"\n",
"> **Incomplete Function**\n",
"> ```python\n",
"> def valid_date(date):\n",
"> \"\"\"You have to write a function which validates a given date string and\n",
"> returns True if the date is valid otherwise False.\n",
"> The date is valid if all of the following ... (truncated for brevity)\n",
"> \"\"\"\n",
"> ```\n",
"> \n",
"> **Completed Code**\n",
"> ```python\n",
"> def valid_date(date):\n",
"> if not date:\n",
"> return False\n",
"> try:\n",
"> month, day, year = map(int, date.split('-'))\n",
"> except ValueError:\n",
"> return False\n",
"> if month < 1 or month > 12:\n",
"> ... (truncated for brevity)\n",
"> ```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Check out more details on [how to interpret optimization results](https://cognify-ai.readthedocs.io/en/latest/user_guide/tutorials/interpret.html#detailed-transformation-trace)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "fresh_env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}