Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -28,4 +28,5 @@ obj/**
tmp/**
.python-version
.vs/
test-results.xml
!build/
Binary file added images/JavascriptProfiler.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 1 addition & 1 deletion news/3 Code Health/4852.md
Original file line number Diff line number Diff line change
@@ -1 +1 @@
Hash imports so we can get every import and not just known ones. We have to unhash them on the other side to determine if a specific package or not.
Hash imports so we can get every import and not just known ones. Should happen at most once every 5 seconds.
48 changes: 42 additions & 6 deletions src/client/telemetry/importTracker.ts
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,9 @@ import * as path from 'path';
import { TextDocument } from 'vscode';

import { sendTelemetryEvent } from '.';
import { noop, sleep } from '../../test/core';
import { noop } from '../../test/core';
import { IDocumentManager } from '../common/application/types';
import { isTestExecution } from '../common/constants';
import { IHistoryProvider } from '../datascience/types';
import { ICodeExecutionManager } from '../terminals/types';
import { EventName } from './constants';
Expand All @@ -18,9 +19,14 @@ import { IImportTracker } from './types';
const ImportRegEx = /^(?!['"#]).*from\s+([a-zA-Z0-9_\.]+)\s+import.*(?!['"])|^(?!['"#]).*import\s+([a-zA-Z0-9_\., ]+).*(?!['"])/;

@brettcannon brettcannon Mar 21, 2019

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const ImportRegEx = /^(?!['"#]).*from\s+([a-zA-Z0-9_\.]+)\s+import.*(?!['"])|^(?!['"#]).*import\s+([a-zA-Z0-9_\., ]+).*(?!['"])/;
const ImportRegEx = /^\s*(?from|import)\s+([^.]+)/;

This should capture only absolute imports and only the top-level package which is all we would be interested in. #WontFix

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would miss something like
import sklearn.foo

And it would not handle the case where an import was in a string or a comment.


In reply to: 267956790 [](ancestors = 267956790)

@brettcannon brettcannon Mar 21, 2019

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right because I forgot to strip out the trailing \s. 😄

As for missing imports in a string or comment, why would we want that? If the imports aren't being used then why would we care about that case? And that should be rare enough of a case that it wouldn't skew learning what the most popular public packages used are. #WontFix

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant the opposite, your string would capture ones in a string I believe. Oh maybe it wouldn't. Let me try it.


In reply to: 267960402 [](ancestors = 267960402)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it does work with strings.

But makes it more difficult to parse 'from x import b' statements. I have to second parse to get rid of the 'import' value. I think I'll just leave the one I have there.


In reply to: 267961343 [](ancestors = 267961343,267960402)

const MAX_DOCUMENT_LINES = 1000;

// Capture isTestExecution on module load so that a test can turn it off and still
// have this value set.
const testExecution = isTestExecution();

@injectable()
export class ImportTracker implements IImportTracker {

private pendingDocs = new Map<string, NodeJS.Timer>();
private sentMatches: Set<string> = new Set<string>();
// tslint:disable-next-line:no-require-imports
private hashFn = require('hash.js').sha256;

@brettcannon brettcannon Mar 21, 2019

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you aren't using HMAC? Either way you need to add a salt to the hash. #Pending

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would the salt be stored? Seems like just obfuscation.


In reply to: 267956915 [](ancestors = 267956915)

@brettcannon brettcannon Mar 21, 2019

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use HMAC you can initialize it with the salt. #Pending

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that if we deem it necessary to add another layer of obfuscation. We're internally going to generate a table of known hashes, so generating the table with the salt in them shouldn't be any more work.


In reply to: 267961221 [](ancestors = 267961221)

Expand All @@ -44,21 +50,51 @@ export class ImportTracker implements IImportTracker {
}

public async activate(): Promise<void> {
// Act like all of our open documents just opened. Don't do this now though. We don't want
// to hold up the activate.
await sleep(1000);
// Act like all of our open documents just opened. Our timeout will make sure this is delayed
this.documentManager.textDocuments.forEach(d => this.onOpenedOrSavedDocument(d));
}

private getDocumentLines(document: TextDocument) : string [] {
return Array.apply(null, {length: Math.min(document.lineCount, MAX_DOCUMENT_LINES)}).map((a, i) => {
const line = document.lineAt(i);
if (line && !line.isEmptyOrWhitespace) {
return line.text;
}
return undefined;
}).filter((f: string | undefined) => f);
}

private onOpenedOrSavedDocument(document: TextDocument) {
// Make sure this is a python file.
if (path.extname(document.fileName) === '.py') {
// Parse the contents of the document, looking for import matches on each line
const lines = document.getText().splitLines({ trim: true, removeEmptyEntries: true });
this.lookForImports(lines.slice(0, Math.min(lines.length, MAX_DOCUMENT_LINES)), EventName.KNOWN_IMPORT_FROM_FILE);
this.scheduleDocument(document);
}
}

private scheduleDocument(document: TextDocument) {
// If already scheduled, cancel.
if (this.pendingDocs.has(document.fileName)) {
clearTimeout(this.pendingDocs.get(document.fileName));
this.pendingDocs.delete(document.fileName);
}

// Now schedule a new one.
if (testExecution) {
// During a test, check right away. It needs to be synchronous.
this.checkDocument(document);
} else {
// Wait five seconds to make sure we don't already have this document pending.
this.pendingDocs.set(document.fileName, setTimeout(() => this.checkDocument(document), 5000));
}
}

private checkDocument(document: TextDocument) {
this.pendingDocs.delete(document.fileName);
const lines = this.getDocumentLines(document);
this.lookForImports(lines, EventName.KNOWN_IMPORT_FROM_FILE);
}

private onExecutedCode(code: string) {
const lines = code.splitLines({ trim: true, removeEmptyEntries: true });
this.lookForImports(lines, EventName.KNOWN_IMPORT_FROM_EXECUTION);
Expand Down
1 change: 1 addition & 0 deletions src/test/datascience/editor-integration/helpers.ts
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ export function createDocument(inputText: string, fileName: string, fileVersion:
const testRange = new Range(index, 0, index, line.length);
textLine.setup(l => l.text).returns(() => line);
textLine.setup(l => l.range).returns(() => testRange);
textLine.setup(l => l.isEmptyOrWhitespace).returns(() => line.trim().length === 0);
return textLine;
});
document.setup(d => d.lineAt(TypeMoq.It.isAnyNumber())).returns((index: number) => textLines[index].object);
Expand Down