This research seems to be more focused on whether the bots would interoperate in different roles to coordinate on a task than on creating the actual software. The idea is to reduce “hallucinations” by providing each bot with a more specific task. The paper goes into more detail about this:
Similar to hallucinations encountered when using LLMs for natural language querying, directly generating entire software systems using LLMs can result in severe code hallucinations, such as incomplete implementations, missing dependencies, and undiscovered bugs. These hallucinations may stem from the lack of specificity in the task and the absence of cross-examination in decision-making. To address these limitations, as Figure 1 shows, we establish a virtual chat-powered software technology company – CHATDEV – which comprises recruited agents with diverse social identities, such as chief officers, professional programmers, test engineers, and art designers. When presented with a task, the diverse agents at CHATDEV collaborate to develop the required software, including an executable system, environmental guidelines, and user manuals. This paradigm revolves around leveraging large language models as the core thinking component, enabling the agents to simulate the entire software development process, circumventing the need for additional model training and mitigating undesirable code hallucinations to some extent.
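Below is a minimal Python sketch of what this role-based, cross-examining chat loop might look like. Every name in it (Agent, chat_phase, the query_llm stub, the <DONE> marker) is my own illustration of the pattern, not ChatDev's actual code or API.

```python
# Hypothetical sketch of a role-paired agent chat, in the spirit of the paper.

def query_llm(system_prompt: str, history: list[str]) -> str:
    # Stand-in for a real LLM call; wire in a provider of your choice.
    return f"({system_prompt.split('.')[0]}) responding to: {history[-1][:40]}"

class Agent:
    def __init__(self, role: str, duty: str):
        # Each agent gets a narrow role -- the paper's lever for reducing
        # hallucinations is giving every bot a more specific task.
        self.system_prompt = f"You are the {role}. Your duty: {duty}."

    def respond(self, history: list[str]) -> str:
        return query_llm(self.system_prompt, history)

def chat_phase(instructor: Agent, assistant: Agent, task: str,
               max_turns: int = 4) -> list[str]:
    # One phase of the simulated company: the instructor directs, the
    # assistant executes, and each reply is cross-examined by the other
    # role on the next turn.
    history = [f"Task: {task}"]
    for _ in range(max_turns):
        history.append(instructor.respond(history))
        reply = assistant.respond(history)
        history.append(reply)
        if "<DONE>" in reply:  # assumed end-of-phase marker
            break
    return history

# The "company" is then just a pipeline of such phases: design, coding, testing.
cto = Agent("CTO", "decide the language and architecture")
programmer = Agent("Programmer", "write complete, runnable code")
tester = Agent("Test Engineer", "find bugs and demand fixes")

task = "a simple to-do list app"
for line in chat_phase(cto, programmer, task) + chat_phase(tester, programmer, task):
    print(line)
```

The point is the structure, not the stub: each bot's prompt pins it to one duty, and pairing roles in a back-and-forth supplies the cross-examination the paper says single-shot generation lacks.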
I assume the endgame of this is the boardroom “suggestion guy” as a bot, asking: “Is this based on real facts? / Does this actually function?”