Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot | Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (2024)

research-article

Open Access

Authors:
Ionut Daniel Fagadau, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0009-0007-8464-8435)
Leonardo Mariani, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0000-0001-9527-7042)
Daniela Micucci, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0000-0003-1261-2234)
Oliviero Riganelli, University of Milano - Bicocca, Milan, Italy (https://orcid.org/0000-0003-2120-2894)

ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, April 2024, Pages 24–34
https://doi.org/10.1145/3643916.3644409

Published: 13 June 2024

ABSTRACT

Generative AI is changing the way developers interact with software systems, providing services that can produce and deliver new content, crafted to satisfy the actual needs of developers. For instance, developers can ask for new code directly from within their IDEs by writing natural language prompts, and integrated services based on generative AI, such as Copilot, immediately respond to prompts by providing ready-to-use code snippets. Formulating the prompt appropriately, and incorporating the useful information while avoiding any information overload, can be an important factor in obtaining the right piece of code. The task of designing good prompts is known as prompt engineering.

In this paper, we systematically investigate the influence of eight prompt features, covering both the style and the content of prompts, on the correctness, complexity, size, and similarity to the developers' code of the generated code. We specifically consider the task of using Copilot with 124,800 prompts, obtained by systematically combining the eight considered prompt features, to generate the implementation of 200 Java methods. Results show that some prompt features, such as the presence of examples and the summary of the purpose of the method, can significantly influence the quality of the result.
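
As an illustration of what "systematically combining" prompt features can look like, the following minimal Python sketch enumerates every on/off combination of a few optional prompt fragments for a single Java method signature. It is not the authors' tooling. Only the summary and the examples are features named in the abstract; the `params` feature, the fragment texts, the `maxOf` signature, and the `build_prompts` helper are hypothetical. The reported totals work out to 124,800 / 200 = 624 prompts per method on average, which a plain on/off enumeration like this one does not reproduce, so the study's actual combination scheme is evidently richer.

```python
from itertools import product

# Hypothetical prompt fragments for one method. Only "summary" and "examples"
# are feature names taken from the abstract; "params" is a placeholder.
FEATURES = {
    "summary": "Returns the largest value in the list.",
    "examples": "Example: maxOf(List.of(1, 5, 3)) returns 5.",
    "params": "Parameter: values, a non-empty list of integers.",
}

# Hypothetical Java method signature that the generated code should complete.
SIGNATURE = "public static int maxOf(List<Integer> values)"


def build_prompts(signature, features):
    """Return one prompt per on/off combination of the given features.

    Keys are tuples of the enabled feature names; values are prompt strings
    made of Java line comments followed by the opening of the method.
    """
    names = list(features)
    prompts = {}
    for mask in product([False, True], repeat=len(names)):
        enabled = tuple(name for name, on in zip(names, mask) if on)
        comment_lines = [f"// {features[name]}" for name in enabled]
        prompts[enabled] = "\n".join(comment_lines + [signature + " {"])
    return prompts


prompts = build_prompts(SIGNATURE, FEATURES)
print(len(prompts))  # 2 ** 3 combinations for three on/off features
print(prompts[("summary", "examples")])
```

Each resulting prompt could then be submitted to a code assistant and the generated method compared across variants, which is the kind of comparison the abstract describes.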

Index Terms

Analyzing Prompt Influence on Automated Method Generation: An Empirical Study with Copilot
1. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments
      1. Integrated and visual development environments

Published in
ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
April 2024
487 pages
ISBN: 9798400705861
DOI: 10.1145/3643916
Chair: Igor Steinmacher
Co-chair: Mario Linares-Vasquez
Program Chair: Kevin Patrick Moran
Program Co-chair: Olga Baysal
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
prompt engineering
code generation
copilot