Skip to content

Commit 8c0eac6

Browse files
author
Miltos Allamanis
committed
Merge "retrieval"->"search" tag
1 parent a9c6915 commit 8c0eac6

5 files changed

Lines changed: 5 additions & 5 deletions

File tree

_publications/derezendemartins2020concra.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ bibkey: derezendemartins2020concra
88
additional_links:
99
- {name: "ArXiV", url: "https://arxiv.org/abs/2009.01959"}
1010
- {name: "code", url: "https://github.com/mrezende/concra"}
11-
tags: ["retrieval"]
11+
tags: ["search"]
1212
---
1313
Software developers routinely search for code using general-purpose search engines. However, these search engines cannot find code semantically unless it has an accompanying description. We propose a technique for semantic code search: A Convolutional Neural Network approach to code retrieval (CoNCRA). Our technique aims to find the code snippet that most closely matches the developer's intent, expressed in natural language. We evaluated our approach's efficacy on a dataset composed of questions and code snippets collected from Stack Overflow. Our preliminary results showed that our technique, which prioritizes local interactions (words nearby), improved the state-of-the-art (SOTA) by 5% on average, retrieving the most relevant code snippets in the top 3 (three) positions by almost 80% of the time. Therefore, our technique is promising and can improve the efficacy of semantic code retrieval.
1414

_publications/husain2019codesearchnet.markdown

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ additional_links:
99
- {name: "ArXiV", url: "https://arxiv.org/abs/1909.09436"}
1010
- {name: "Code and other info", url: "https://github.com/github/CodeSearchNet"}
1111
- {name: "Leaderboard", url: "https://app.wandb.ai/github/codesearchnet/benchmark"}
12-
tags: ["dataset", "retrieval", "search"]
12+
tags: ["dataset", "search"]
1313
---
1414
Semantic code search is the task of retrieving relevant code given a natural language query. While related to other information retrieval tasks, it requires bridging the gap between the language used in code (often abbreviated and highly technical) and natural language more suitable to describe vague concepts and ideas.
1515

_publications/luan2019aroma.markdown

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,6 @@ authors: Sifei Luan, Di Yang, Celeste Barnaby, Koushik Sen, Satish Chandra
55
conference: PACMPL
66
year: 2015
77
bibkey: luan2019aroma
8-
tags: ["retrieval"]
8+
tags: ["search"]
99
---
1010
Programmers often write code that has similarity to existing code written somewhere. A tool that could help programmers to search such similar code would be immensely useful. Such a tool could help programmers to extend partially written code snippets to completely implement necessary functionality, help to discover extensions to the partial code which are commonly included by other programmers, help to cross-check against similar code written by other programmers, or help to add extra code which would fix common mistakes and errors. We propose Aroma, a tool and technique for code recommendation via structural code search. Aroma indexes a huge code corpus including thousands of open-source projects, takes a partial code snippet as input, searches the corpus for method bodies containing the partial code snippet, and clusters and intersects the results of the search to recommend a small set of succinct code snippets which both contain the query snippet and appear as part of several methods in the corpus. We evaluated Aroma on 2000 randomly selected queries created from the corpus, as well as 64 queries derived from code snippets obtained from Stack Overflow, a popular website for discussing code. We implemented Aroma for 4 different languages, and developed an IDE plugin for Aroma. Furthermore, we conducted a study where we asked 12 programmers to complete programming tasks using Aroma, and collected their feedback. Our results indicate that Aroma is capable of retrieving and recommending relevant code snippets efficiently.

_publications/sun2020pscs.markdown

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,6 @@ year: 2020
77
bibkey: sun2020pscs
88
additional_links:
99
- {name: "ArXiV", url: "https://arxiv.org/abs/2008.03042"}
10-
tags: ["AST", "retrieval", "search"]
10+
tags: ["AST", "search"]
1111
---
1212
To obtain code snippets for reuse, programmers prefer to search for related documents, e.g., blogs or Q&A, instead of code itself. The major reason is due to the semantic diversity and mismatch between queries and code snippets. Deep learning models have been proposed to address this challenge. Compared with approaches using information retrieval techniques, deep learning models do not suffer from the information loss caused by refining user intention into keywords. However, the performance of previous works is not satisfactory because they ignore the importance of code structure. When the semantics of code (e.g., identifier names, APIs) are ambiguous, code structure may be the only feature for the model to utilize. In that case, previous works relearn the structural information from lexical tokens of code, which is extremely difficult for a model without any domain knowledge. In this work, we propose PSCS, a path-based neural model for semantic code search. Our model encodes both the semantics and structures of code represented by AST paths. We train and evaluate our model over 330k-19k query-function pairs, respectively. The evaluation results demonstrate that PSCS achieves a SuccessRate of 47.6% and a Mean Reciprocal Rank (MRR) of 30.4% when considering the top-10 results with a match. The proposed approach significantly outperforms both DeepCS, the first approach that applies deep learning to code search task, and CARLCS, a state-of-the-art approach that introduces a co-attentive representation learning model on the basis of DeepCS. The importance of code structure is demonstrated with an ablation study on code features, which enlightens model design for further studies.

_publications/wang2020trans.markdown

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,6 @@ year: 2020
77
bibkey: wang2020trans
88
additional_links:
99
- {name: "ArXiV", url: "https://arxiv.org/abs/2003.03238"}
10-
tags: ["retrieval", "documentation"]
10+
tags: ["search", "documentation"]
1111
---
1212
Code summarization and code search have been widely adopted in sofwaredevelopmentandmaintenance. However, fewstudieshave explored the efcacy of unifying them. In this paper, we propose TranS^3 , a transformer-based framework to integrate code summarization with code search. Specifcally, for code summarization,TranS^3 enables an actor-critic network, where in the actor network, we encode the collected code snippets via transformer- and tree-transformer-based encoder and decode the given code snippet to generate its comment. Meanwhile, we iteratively tune the actor network via the feedback from the critic network for enhancing the quality of the generated comments. Furthermore, we import the generated comments to code search for enhancing its accuracy. To evaluatetheefectivenessof TranS^3 , we conduct a set of experimental studies and case studies where the experimental results suggest that TranS^3 can signifcantly outperform multiple state-of-the-art approaches in both code summarization and code search and the study results further strengthen the efcacy of TranS^3 from the developers' points of view.

0 commit comments

Comments
 (0)