Skip to content

Python: rework documentation#5357

Merged
RasmusWL merged 18 commits into
github:mainfrom
yoff:python-rework-documentation
Mar 17, 2021
Merged

Python: rework documentation#5357
RasmusWL merged 18 commits into
github:mainfrom
yoff:python-rework-documentation

Conversation

@yoff
Copy link
Copy Markdown
Contributor

@yoff yoff commented Mar 8, 2021

This PR is a step towards the structure suggested in https://github.com/github/codeql-python-team/issues/319. It

  • rearranges sections
  • removes the unused sections
  • adds a new section on data flow
  • removes all references to the old data-flow implementation from surrounding documents

Remaining sections can then be added later. The section on data flow is an adaptation from the one for C# with exercises removed. It has running examples and should be accurate if not comprehensive. For example, it only introduces LocalSourceNode very briefly (but does illustrate the concept with an example).

I am not sure if we are ready to merge this PR as is or if we would rather build a tower of PRs on top, adding the remaining planned sections. But I open it up for review now, so people (especially from the doc team) can get involved.

yoff added 2 commits March 7, 2021 14:02
@github-actions github-actions Bot added the Python label Mar 9, 2021
@yoff yoff marked this pull request as ready for review March 10, 2021 15:48
@yoff yoff requested a review from a team as a code owner March 10, 2021 15:48
@yoff yoff added the ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. label Mar 10, 2021
@github-actions
Copy link
Copy Markdown
Contributor

Hello @github/docs-content-codeql - this PR is ready for docs review.

@yoff yoff added the no-change-note-required This PR does not need a change note label Mar 10, 2021
@yoff
Copy link
Copy Markdown
Contributor Author

yoff commented Mar 10, 2021

Does this need a change-note? It is not really tied to the code, right?

@adityasharad adityasharad added ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. and removed ready-for-doc-review This PR requires and is ready for review from the GitHub docs team. labels Mar 10, 2021
@github-actions
Copy link
Copy Markdown
Contributor

Hello @github/docs-content-codeql - this PR is ready for docs review.

Copy link
Copy Markdown
Contributor

@shati-patel shati-patel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this update! A couple of small suggestions, but otherwise LGTM. It's great to see more consistency across the languages 😃

Does this need a change-note? It is not really tied to the code, right?

You're right, the changes you've made here don't need a change note! I'm assuming there was a change note when you added the new libraries themselves?

I am not sure if we are ready to merge this PR as is or if we would rather build a tower of PRs on top, adding the remaining planned sections. But I open it up for review now, so people (especially from the doc team) can get involved.

From my perspective, it's fine to merge as-is. Assuming the information/examples are accurate (I defer to the Python team for that 😅), this looks like a good change in itself. We can always expand on the details and add more sections later!

Comment thread .vscode/settings.json Outdated
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
.. code-block:: ql

import csharp
import semmle.code.csharp.dataflow.flowsources.PublicCallableParameter
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import semmle.code.csharp.dataflow.flowsources.PublicCallableParameter
import semmle.code.python.dataflow.flowsources.PublicCallableParameter

I'm assuming the rest of the import path is correct? Or should it be semmle.python.dataflow.new.RemoteFlowSources?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be the latter, thank you for catching all these non-runnable examples!

Comment thread docs/codeql/codeql-language-guides/codeql-for-python.rst Outdated
yoff and others added 2 commits March 11, 2021 15:57
Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>
@yoff yoff requested a review from shati-patel March 11, 2021 14:59
shati-patel
shati-patel previously approved these changes Mar 11, 2021
Copy link
Copy Markdown
Contributor

@shati-patel shati-patel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

tausbn
tausbn previously requested changes Mar 11, 2021
Copy link
Copy Markdown
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks good to me, but I feel like we should be emphasising LocalSourceNode more as the right abstraction to use. If we can get people started on using these early, then type trackers can be presented as using LocalSourceNode everywhere (and hence reap the corresponding performance benefits).

from DataFlow::CallCfgNode call, DataFlow::ParameterNode p
where
call = API::moduleImport("os").getMember("open").getACall() and
DataFlow::localFlow(p, call.getArg(0))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hot take: we should be able to use flowsTo here, and hence ParameterNode should in fact extend LocalSourceNode.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make a lot of sense, morally all parameters are local sources. I am not sure if some technical SSA/CfgNode discrepancy has to be overcome first..

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging by this, all ParameterNodes are in fact LocalSourceNodes already.

from DataFlow::CallCfgNode call, DataFlow::ParameterNode p
where
call = API::moduleImport("os").getMember("open").getACall() and
TaintTracking::localTaint(p, call.getArg(0))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of my comment in the data-flow version of this example, I wonder if we should have taintFlowsTo as a method on LocalSourceNode and perhaps discourage the use of localFlow/localTaint entirely.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be good.

@yoff
Copy link
Copy Markdown
Contributor Author

yoff commented Mar 12, 2021

The desirable api enhancements identified here are in #5398. We will have to wait until that is merged and live until we update the documentation to refer to them. I have added a section calling attention to local source nodes. I hope this addresses the concern for now.

also remove references to `parameterNode` which is not available yet.
@yoff yoff requested review from shati-patel and tausbn March 13, 2021 07:16
Copy link
Copy Markdown
Contributor

@shati-patel shati-patel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more small edits, but the docs look good overall 🥇

Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
yoff and others added 2 commits March 15, 2021 17:56
Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>
Copy link
Copy Markdown
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good 💪 A few minor things, then it's ready to be merged from my point of view 🥇

Personally, I don't feel strongly about flowsTo vs DataFlow::localFlow (or the taint version of this). I suggest we don't postpone getting the PR merged because of this, and can solve this in a follow up discussion next week.

Many of the built-in path queries included in CodeQL follow a simple structure, which depends on how the language you are analyzing is modeled with CodeQL.

For C/C++, C#, Java, and JavaScript you should use the following template:
You should use the following template:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice that we're able to do this now 💪

Only problem for python is that you import python is not enough, and you need to do import semmle.python.dataflow.new.DataFlow as well. I looked through other languages:

  • for C#/JS/Go: Works by only import <language>
  • Python/C++/Java: Need to import DataFlow library manually

So I don't think it's breaking, and I'll make a follow up PR to try and improve on this situation 👍

Comment thread docs/codeql/codeql-language-guides/codeql-library-for-python.rst
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
@yoff
Copy link
Copy Markdown
Contributor Author

yoff commented Mar 16, 2021

Sorry to keep updating this, but I was reminded by https://github.com/github/codeql-python-team/issues/319 that we needed to describe attributes and saw a natural place to do it. I think now that the remaining description of type trackers are best left to a separate PR, so this is the final tweak. 🤞

Comment on lines +296 to +304
- ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process.
- ``Concepts::FileSystemAccess`` - a data flow node that performs a file system access, including reading and writing data, creating and deleting files and folders, checking and updating permissions, and so on.
- ``Concepts::Path::PathNormalization`` - a data-flow node that performs path normalization. This is often needed in order to safely access paths.
- ``Concepts::Decoding`` - a data-flow node that decodes data from a binary or textual format. A decoding (automatically) preserves taint from input to output. However, it can also be a problem in itself, for example if it allows code execution or could result in denial-of-service.
- ``Concepts::Encoding`` - a data-flow node that encodes data to a binary or textual format. An encoding (automatically) preserves taint from input to output.
- ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Python code.
- ``Concepts::SqlExecution`` - a data-flow node that executes SQL statements.
- ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server.
- ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates a HTTP response on a server.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's great to mention Concepts 👍

Do you want to build an exhaustive list here? (meaning we need to update it every time we add a new concept) otherwise we could give a few examples, and say that more can be found in the file 🤷

(I'm leaning towards it being a pain point to list them all manually, but interested to see how you feel about it)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure, actually. When I saw that the list was not very long, I just took them all, as I felt it could be useful to have such an index somewhere, and this is probably one of the first places people would look.

yoff and others added 2 commits March 16, 2021 19:58
…on.rst

Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
@yoff yoff requested a review from RasmusWL March 16, 2021 23:10
@calumgrant
Copy link
Copy Markdown
Contributor

@xcorail would really like to open up Python to the bug bounty programme this week so could this be merged asap. Am I right in thinking that once this has been merged, the documentation is in a good enough state again?

Copy link
Copy Markdown
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

juuuust a few last things 😊

Comment thread docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst Outdated
Comment thread docs/codeql/codeql-language-guides/codeql-library-for-python.rst
@yoff
Copy link
Copy Markdown
Contributor Author

yoff commented Mar 17, 2021

Am I right in thinking that once this has been merged, the documentation is in a good enough state again?

This and #5389, I would say.

…on.rst

Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
@yoff yoff requested a review from RasmusWL March 17, 2021 13:23
@RasmusWL RasmusWL dismissed tausbn’s stale review March 17, 2021 13:25

We're going to postpone flowsTo vs DataFlow::localFlow discussion, so we can get this merged now 👍

@RasmusWL RasmusWL merged commit 1ecee2d into github:main Mar 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation no-change-note-required This PR does not need a change note Python ready-for-doc-review This PR requires and is ready for review from the GitHub docs team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants