Commit 8818a20

Author: Miltos Allamanis (committed)
Message: Add recent paper.
1 parent e663af3 commit 8818a20

1 file changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+---
+layout: publication
+title: "DOBF: A Deobfuscation Pre-Training Objective for Programming Languages"
+authors: Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample
+conference:
+year: 2021
+bibkey: roziere2021dobf
+additional_links:
+- {name: "ArXiV", url: "https://arxiv.org/abs/2102.07492"}
+tags: ["pretraining"]
+---
+Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-training objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 13% in unsupervised code translation, and 24% in natural language code search. Incidentally, we found that our pre-trained model is able to de-obfuscate fully obfuscated source files, and to suggest descriptive variable names.
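The objective the abstract describes amounts to masking identifier names and training a model to recover the original name dictionary. A minimal sketch of the input-construction side, assuming a simple Python AST rewrite with `FUNC_i`/`VAR_i` placeholders (the `obfuscate` helper is hypothetical and is not the paper's implementation, which targets multiple languages and leaves some identifiers unmasked):

```python
import ast
import builtins

def obfuscate(source: str) -> tuple[str, dict[str, str]]:
    """Mask every function and variable name in `source` with
    uninformative placeholders (FUNC_i / VAR_i). A DOBF-style model
    is trained to recover the returned name dictionary."""
    mapping: dict[str, str] = {}

    def mask(name: str, prefix: str) -> str:
        if name not in mapping:
            count = sum(v.startswith(prefix) for v in mapping.values())
            mapping[name] = f"{prefix}_{count}"
        return mapping[name]

    class Renamer(ast.NodeTransformer):
        def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
            node.name = mask(node.name, "FUNC")
            self.generic_visit(node)  # also rename args and body names
            return node

        def visit_arg(self, node: ast.arg) -> ast.arg:
            node.arg = mask(node.arg, "VAR")
            return node

        def visit_Name(self, node: ast.Name) -> ast.Name:
            # Leave builtins like sum/len readable; mask everything else.
            if node.id not in dir(builtins):
                node.id = mask(node.id, "VAR")
            return node

    obfuscated = ast.unparse(Renamer().visit(ast.parse(source)))
    return obfuscated, mapping

code = "def mean(values):\n    total = sum(values)\n    return total / len(values)"
masked, names = obfuscate(code)
# names == {'mean': 'FUNC_0', 'values': 'VAR_0', 'total': 'VAR_1'}
```

The (obfuscated code, name dictionary) pair then forms one training example: the model sees `masked` and must predict `names`, which forces it to infer identifier semantics from code structure alone.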
