
Commit 5093558

bdqnghi authored and mallamanis committed
fix tag
1 parent 5299f3d commit 5093558

2 files changed: 2 additions & 2 deletions


_publications/bui2021efficient.markdown

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ year: 2021
 bibkey: bui2020efficient
 additional_links:
    - {name: "ArXiV", url: "https://arxiv.org/abs/2009.02731"}
-tags: ["self-supervised, pretraining, code-search"]
+tags: ["pretraining", "search"]
 ---
We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data in code retrieval and code summarization tasks. The pre-trained Corder model can be used in two ways: (1) it can produce vector representations of code that can be applied to code retrieval tasks without labeled data; (2) it can be fine-tuned for tasks that still require labeled data, such as code summarization. The key innovation is that we train the source code model to recognize similar and dissimilar code snippets through a contrastive learning objective. To do so, we use a set of semantic-preserving transformation operators to generate code snippets that are syntactically diverse but semantically equivalent. Through extensive experiments, we show that code models pretrained by Corder substantially outperform the other baselines on code-to-code retrieval, text-to-code retrieval, and code-to-text summarization tasks.
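
For intuition, the contrastive objective described in this abstract is, in spirit, an InfoNCE/NT-Xent-style loss over pairs of transformed snippets. The sketch below is a minimal PyTorch illustration, not Corder's actual implementation: the function name, temperature value, and single-direction loss form are assumptions, and z1/z2 stand for batch embeddings of two semantic-preserving transformations of the same snippets.

import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    # Illustrative sketch (not the paper's code): z1[i] and z2[i] embed two
    # semantically equivalent transformations of snippet i; every other
    # snippet in the batch serves as a negative.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)  # positives lie on the diagonal

Maximizing the diagonal similarities while minimizing the rest is what pushes syntactically diverse but semantically equivalent snippets toward nearby vectors.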

_publications/deze2021mulcode.markdown

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ year: 2021
 bibkey: deze2021mulcode
 additional_links:
    - {name: "PDF", url: "https://yuyue.github.io/res/paper/mulcode_saner2021.pdf"}
-tags: ["representation, multi task"]
+tags: ["representation"]
 ---
Recent years have witnessed a significant rise in Deep Learning (DL) techniques applied to source code. Researchers exploit DL for a multitude of tasks and achieve impressive results. However, most tasks are explored separately, resulting in a lack of generalization of the solutions. In this work, we propose MulCode, a multi-task learning approach for source code understanding that learns a unified representation space across tasks, using a pre-trained BERT model for the token sequence and a Tree-LSTM model for the abstract syntax tree. Furthermore, we integrate the two source code views into a hybrid representation via an attention mechanism and set learnable uncertainty parameters to adjust the tasks' relationship. We train and evaluate MulCode on three downstream tasks: comment classification, author attribution, and duplicate function detection. In all tasks, MulCode outperforms the state-of-the-art techniques. Moreover, experiments on three unseen tasks demonstrate the generalization ability of MulCode compared with state-of-the-art embedding methods.
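
The "learnable uncertainty parameters" mentioned above are commonly realized as homoscedastic-uncertainty task weighting in the style of Kendall et al.; the sketch below assumes that formulation and uses hypothetical names, so it should be read as an illustration rather than MulCode's actual code.

import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    # Illustrative sketch (assumed formulation, not the paper's code):
    # combine per-task losses with learnable log-variances so training
    # itself down-weights noisier tasks.
    def __init__(self, num_tasks=3):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros((), device=self.log_vars.device)
        for loss, log_var in zip(task_losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total

For the three downstream tasks here (comment classification, author attribution, duplicate function detection), forward would receive the three per-task losses and return the single scalar that is optimized jointly.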
