Python: rework documentation#5357
Conversation
Initilally a copy of the one from C#
python queries.
|
Hello @github/docs-content-codeql - this PR is ready for docs review. |
|
Does this need a change-note? It is not really tied to the code, right? |
|
Hello @github/docs-content-codeql - this PR is ready for docs review. |
shati-patel
left a comment
There was a problem hiding this comment.
Thanks for this update! A couple of small suggestions, but otherwise LGTM. It's great to see more consistency across the languages 😃
Does this need a change-note? It is not really tied to the code, right?
You're right, the changes you've made here don't need a change note! I'm assuming there was a change note when you added the new libraries themselves?
I am not sure if we are ready to merge this PR as is or if we would rather build a tower of PRs on top, adding the remaining planned sections. But I open it up for review now, so people (especially from the doc team) can get involved.
From my perspective, it's fine to merge as-is. Assuming the information/examples are accurate (I defer to the Python team for that 😅), this looks like a good change in itself. We can always expand on the details and add more sections later!
| .. code-block:: ql | ||
|
|
||
| import csharp | ||
| import semmle.code.csharp.dataflow.flowsources.PublicCallableParameter |
There was a problem hiding this comment.
| import semmle.code.csharp.dataflow.flowsources.PublicCallableParameter | |
| import semmle.code.python.dataflow.flowsources.PublicCallableParameter |
I'm assuming the rest of the import path is correct? Or should it be semmle.python.dataflow.new.RemoteFlowSources?
There was a problem hiding this comment.
Yes, it should be the latter, thank you for catching all these non-runnable examples!
Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>
tausbn
left a comment
There was a problem hiding this comment.
Generally this looks good to me, but I feel like we should be emphasising LocalSourceNode more as the right abstraction to use. If we can get people started on using these early, then type trackers can be presented as using LocalSourceNode everywhere (and hence reap the corresponding performance benefits).
| from DataFlow::CallCfgNode call, DataFlow::ParameterNode p | ||
| where | ||
| call = API::moduleImport("os").getMember("open").getACall() and | ||
| DataFlow::localFlow(p, call.getArg(0)) |
There was a problem hiding this comment.
Hot take: we should be able to use flowsTo here, and hence ParameterNode should in fact extend LocalSourceNode.
There was a problem hiding this comment.
That would make a lot of sense, morally all parameters are local sources. I am not sure if some technical SSA/CfgNode discrepancy has to be overcome first..
There was a problem hiding this comment.
Judging by this, all ParameterNodes are in fact LocalSourceNodes already.
| from DataFlow::CallCfgNode call, DataFlow::ParameterNode p | ||
| where | ||
| call = API::moduleImport("os").getMember("open").getACall() and | ||
| TaintTracking::localTaint(p, call.getArg(0)) |
There was a problem hiding this comment.
In light of my comment in the data-flow version of this example, I wonder if we should have taintFlowsTo as a method on LocalSourceNode and perhaps discourage the use of localFlow/localTaint entirely.
|
The desirable api enhancements identified here are in #5398. We will have to wait until that is merged and live until we update the documentation to refer to them. I have added a section calling attention to local source nodes. I hope this addresses the concern for now. |
also remove references to `parameterNode` which is not available yet.
shati-patel
left a comment
There was a problem hiding this comment.
A few more small edits, but the docs look good overall 🥇
Co-authored-by: Shati Patel <42641846+shati-patel@users.noreply.github.com>
RasmusWL
left a comment
There was a problem hiding this comment.
Looks really good 💪 A few minor things, then it's ready to be merged from my point of view 🥇
Personally, I don't feel strongly about flowsTo vs DataFlow::localFlow (or the taint version of this). I suggest we don't postpone getting the PR merged because of this, and can solve this in a follow up discussion next week.
| Many of the built-in path queries included in CodeQL follow a simple structure, which depends on how the language you are analyzing is modeled with CodeQL. | ||
|
|
||
| For C/C++, C#, Java, and JavaScript you should use the following template: | ||
| You should use the following template: |
There was a problem hiding this comment.
Really nice that we're able to do this now 💪
Only problem for python is that you import python is not enough, and you need to do import semmle.python.dataflow.new.DataFlow as well. I looked through other languages:
- for C#/JS/Go: Works by only
import <language> - Python/C++/Java: Need to import
DataFlowlibrary manually
So I don't think it's breaking, and I'll make a follow up PR to try and improve on this situation 👍
|
Sorry to keep updating this, but I was reminded by https://github.com/github/codeql-python-team/issues/319 that we needed to describe attributes and saw a natural place to do it. I think now that the remaining description of type trackers are best left to a separate PR, so this is the final tweak. 🤞 |
| - ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process. | ||
| - ``Concepts::FileSystemAccess`` - a data flow node that performs a file system access, including reading and writing data, creating and deleting files and folders, checking and updating permissions, and so on. | ||
| - ``Concepts::Path::PathNormalization`` - a data-flow node that performs path normalization. This is often needed in order to safely access paths. | ||
| - ``Concepts::Decoding`` - a data-flow node that decodes data from a binary or textual format. A decoding (automatically) preserves taint from input to output. However, it can also be a problem in itself, for example if it allows code execution or could result in denial-of-service. | ||
| - ``Concepts::Encoding`` - a data-flow node that encodes data to a binary or textual format. An encoding (automatically) preserves taint from input to output. | ||
| - ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Python code. | ||
| - ``Concepts::SqlExecution`` - a data-flow node that executes SQL statements. | ||
| - ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server. | ||
| - ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates a HTTP response on a server. |
There was a problem hiding this comment.
I think it's great to mention Concepts 👍
Do you want to build an exhaustive list here? (meaning we need to update it every time we add a new concept) otherwise we could give a few examples, and say that more can be found in the file 🤷
(I'm leaning towards it being a pain point to list them all manually, but interested to see how you feel about it)
There was a problem hiding this comment.
I am unsure, actually. When I saw that the list was not very long, I just took them all, as I felt it could be useful to have such an index somewhere, and this is probably one of the first places people would look.
…on.rst Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
|
@xcorail would really like to open up Python to the bug bounty programme this week so could this be merged asap. Am I right in thinking that once this has been merged, the documentation is in a good enough state again? |
This and #5389, I would say. |
…on.rst Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
We're going to postpone flowsTo vs DataFlow::localFlow discussion, so we can get this merged now 👍
This PR is a step towards the structure suggested in https://github.com/github/codeql-python-team/issues/319. It
Remaining sections can then be added later. The section on data flow is an adaptation from the one for C# with exercises removed. It has running examples and should be accurate if not comprehensive. For example, it only introduces
LocalSourceNodevery briefly (but does illustrate the concept with an example).I am not sure if we are ready to merge this PR as is or if we would rather build a tower of PRs on top, adding the remaining planned sections. But I open it up for review now, so people (especially from the doc team) can get involved.