
Commit 5093558

bdqnghi authored and mallamanis committed
fix tag
1 parent 5299f3d commit 5093558

2 files changed: 2 additions & 2 deletions


_publications/bui2021efficient.markdown

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ year: 2021
 bibkey: bui2020efficient
 additional_links:
    - {name: "ArXiV", url: "https://arxiv.org/abs/2009.02731"}
-tags: ["self-supervised, pretraining, code-search"]
+tags: ["pretraining", "search"]
 ---
We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data in code retrieval and code summarization tasks. The pre-trained Corder model can be used in two ways: (1) it can produce vector representations of code that can be applied to code retrieval tasks without labeled data; (2) it can be fine-tuned for tasks that still require labeled data, such as code summarization. The key innovation is that we train the source code model to recognize similar and dissimilar code snippets through a contrastive learning objective. To do so, we use a set of semantic-preserving transformation operators to generate code snippets that are syntactically diverse but semantically equivalent. Through extensive experiments, we show that code models pretrained by Corder substantially outperform the other baselines on code-to-code retrieval, text-to-code retrieval, and code-to-text summarization tasks.
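
For intuition, the contrastive objective described in this abstract is, in spirit, an InfoNCE/NT-Xent-style loss over pairs of transformed snippets. The sketch below is a minimal PyTorch illustration, not Corder's actual implementation: the function name, temperature value, and single-direction loss form are assumptions, and z1/z2 stand for batch embeddings of two semantic-preserving transformations of the same snippets.

import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    # Illustrative sketch (not the paper's code): z1[i] and z2[i] embed two
    # semantically equivalent transformations of snippet i; every other
    # snippet in the batch serves as a negative.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)  # positives lie on the diagonal

Maximizing the diagonal similarities while minimizing the rest is what pushes syntactically diverse but semantically equivalent snippets toward nearby vectors.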

_publications/deze2021mulcode.markdown

Lines changed: 1 addition & 1 deletion
@@ -7,6 +7,6 @@ year: 2021
 bibkey: deze2021mulcode
 additional_links:
    - {name: "PDF", url: "https://yuyue.github.io/res/paper/mulcode_saner2021.pdf"}
-tags: ["representation, multi task"]
+tags: ["representation"]
 ---
Recent years have witnessed a significant rise in Deep Learning (DL) techniques applied to source code. Researchers exploit DL for a multitude of tasks and achieve impressive results. However, most tasks are explored separately, resulting in a lack of generalization of the solutions. In this work, we propose MulCode, a multi-task learning approach for source code understanding that learns a unified representation space across tasks, using a pre-trained BERT model for the token sequence and a Tree-LSTM model for the abstract syntax tree. Furthermore, we integrate the two source code views into a hybrid representation via an attention mechanism and set learnable uncertainty parameters to adjust the tasks' relationship. We train and evaluate MulCode on three downstream tasks: comment classification, author attribution, and duplicate function detection. In all tasks, MulCode outperforms the state-of-the-art techniques. Moreover, experiments on three unseen tasks demonstrate the generalization ability of MulCode compared with state-of-the-art embedding methods.
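
The "learnable uncertainty parameters" mentioned above are commonly realized as homoscedastic-uncertainty task weighting in the style of Kendall et al.; the sketch below assumes that formulation and uses hypothetical names, so it should be read as an illustration rather than MulCode's actual code.

import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    # Illustrative sketch (assumed formulation, not the paper's code):
    # combine per-task losses with learnable log-variances so training
    # itself down-weights noisier tasks.
    def __init__(self, num_tasks=3):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros((), device=self.log_vars.device)
        for loss, log_var in zip(task_losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total

For the three downstream tasks here (comment classification, author attribution, duplicate function detection), forward would receive the three per-task losses and return the single scalar that is optimized jointly.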
