Writing documentation#
Getting started#
General file structure#
All documentation is built from docs/source. The docs/source
directory contains configuration files for Sphinx and reStructuredText
(ReST; .rst) files that are rendered to documentation pages.
Documentation is created in the following ways. First, API documentation
(docs/source/api) is generated by Sphinx from
the docstrings of the classes in the Teacher library. All files in docs/source/api are created
when the documentation is built. See Writing docstrings below.
Second, Teacher has narrative docs written in ReST in subdirectories of
docs/source/users/. If you would like to add new documentation that is suited
to an .rst file rather than a gallery or tutorial example, choose an
appropriate subdirectory to put it in, and add the file to the table of
contents of index.rst of the subdirectory. See
Writing ReST pages below.
Note
Don’t directly edit the .rst files in docs/source/api.
Sphinx regenerates the files in this directory when building the documentation.
Setting up the doc build#
The documentation for Teacher is generated from reStructuredText (ReST) using the Sphinx documentation generation tool.
To build the documentation you will need to set up Teacher for development.
Building the docs#
The documentation sources are found in the docs/source/ directory in the trunk.
The configuration file for Sphinx is docs/source/conf.py. It controls which
directories Sphinx parses, how the docs are built, and how the extensions are
used. To build the documentation in html format, cd into docs/ and run:
make html
Other useful invocations include
# Delete built files. May help if you get errors about missing paths or
# broken links.
make clean
# Build pdf docs.
make latexpdf
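If you prefer to drive the build from Python rather than make, the following is a minimal sketch. It assumes Sphinx is installed and that the html target of the Makefile is roughly equivalent to a plain sphinx-build call; the helper name build_command is hypothetical and not part of Teacher.

```python
import sys
from pathlib import Path


def build_command(docs_dir="docs", builder="html"):
    """Return the argument list for a Sphinx build (hypothetical helper)."""
    docs = Path(docs_dir)
    source = docs / "source"  # holds conf.py and the .rst files
    output = docs / "build" / builder  # assumed output location
    # "python -m sphinx" is equivalent to the sphinx-build command.
    return [sys.executable, "-m", "sphinx", "-b", builder, str(source), str(output)]


print(" ".join(build_command()))
```

To actually run the build, pass the returned list to subprocess.run(..., check=True) from the repository root.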
Showing locally built docs#
The built docs are available in the folder docs/build/html.
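As a small convenience, the start page can be located from Python. This is a sketch only: it assumes the build was run from the repository root and that the start page is named index.html (the Sphinx default), and built_index_uri is a hypothetical helper.

```python
from pathlib import Path


def built_index_uri(docs_dir="docs"):
    """Return a file:// URI for the locally built start page."""
    index = Path(docs_dir) / "build" / "html" / "index.html"
    # as_uri() requires an absolute path, so resolve it first.
    return index.resolve().as_uri()


print(built_index_uri())
```

The returned URI can be passed to webbrowser.open() from the standard library to view the docs in a browser.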
Writing ReST pages#
Most documentation is either in the docstrings of individual
classes and methods, in explicit .rst files, or in examples and tutorials.
All of these use the ReST syntax and are processed by Sphinx.
The Sphinx reStructuredText Primer is a good introduction to using ReST. More complete information is available in the reStructuredText reference documentation.
This section contains additional information and conventions on how ReST is used in the Teacher documentation.
Formatting and style conventions#
It is useful to strive for consistency in the Teacher documentation. Here are some formatting and style conventions that are used.
Section formatting#
For everything but top-level chapters, use Upper lower (sentence case) for
section titles, e.g., Possible hangups rather than Possible
Hangups.
We aim to follow the recommendations from the Python documentation and the Sphinx reStructuredText documentation for section markup characters, i.e.:
- # with overline, for parts. This is reserved for the main title in index.rst. All other pages should start with “chapter” or lower.
- * with overline, for chapters
- =, for sections
- -, for subsections
- ^, for subsubsections
- ", for paragraphs
This may not yet be applied consistently in existing docs. Please open an issue to report any inconsistencies.
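The markup characters listed above can be captured in a small helper. This is an illustrative sketch only; SECTION_MARKUP and format_heading are hypothetical names, not part of Teacher or Sphinx.

```python
# Markup characters per level, following the convention listed above.
# The second item says whether the level also gets an overline.
SECTION_MARKUP = {
    "part": ("#", True),
    "chapter": ("*", True),
    "section": ("=", False),
    "subsection": ("-", False),
    "subsubsection": ("^", False),
    "paragraph": ('"', False),
}


def format_heading(title, level="section"):
    """Return *title* as a ReST heading for the given level."""
    char, overline = SECTION_MARKUP[level]
    rule = char * len(title)  # the rule must be at least as long as the title
    lines = [rule, title, rule] if overline else [title, rule]
    return "\n".join(lines)


print(format_heading("Possible hangups", "subsection"))
```

Making the rule exactly as long as the title keeps headings tidy and avoids the "title underline too short" warning from docutils.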
Function arguments#
Function arguments and keywords within docstrings should be referred to using
the *emphasis* role. This will keep Teacher’s documentation consistent
with Python’s documentation:
Here is a description of *argument*
Do not use the `default role`:
Do not describe `argument` like this. As per the next section,
this syntax will (unsuccessfully) attempt to resolve the argument as a
link to a class or method in the library.
nor the ``literal`` role:
Do not describe ``argument`` like this.
Referring to other documents and sections#
Sphinx allows internal references between documents.
Documents can be linked with the :doc: role:
See the :doc:`/users/installing/index`
will render as:
See the Installation
Sections can also be given reference names. For instance, from the Installation page:
.. _install_from_source:
======================
Installing from source
======================
If you are interested in contributing to Teacher development,
running the latest source code, or just like to build everything
yourself, it is not difficult to build Teacher from source.
and refer to it using the standard reference syntax:
See :ref:`install_from_source`
will give the following link: Installing from source
To maximize internal consistency in section labeling and references,
use hyphen-separated, descriptive labels for section references. Hyphens are
preferred because underscores are widely used by Sphinx itself.
Keep in mind that contents may be reorganized later, so
avoid top-level names in references like user or devel
unless necessary.
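A quick way to check new labels against this convention is a regular expression. The following is a sketch under one extra assumption that the text does not state, namely that labels are lowercase; is_conventional_label is a hypothetical helper.

```python
import re

# Lowercase words separated by single hyphens, e.g. "install-from-source".
# The lowercase requirement is an assumption made for this sketch.
LABEL_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_conventional_label(label):
    """Return True if *label* is a hyphen-separated, lowercase label."""
    return LABEL_PATTERN.fullmatch(label) is not None


for candidate in ("install-from-source", "install_from_source"):
    print(candidate, is_conventional_label(candidate))
```

Labels that use underscores are flagged by this check.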
Referring to other code#
To link to other methods, classes, or modules in Teacher you can use backticks, for example:
`teacher.fuzzy.FuzzySet`
generates a link like this: teacher.fuzzy.FuzzySet.
Note: We use the Sphinx setting default_role = 'obj' so that you don’t
have to use qualifiers like :class:, :func:, :meth: and the like.
Often, you don’t want to show the full package and module name. As long as the target is unambiguous you can simply leave them out:
`.FuzzySet`
and the link still works: .FuzzySet.
Other packages can also be linked via intersphinx:
`numpy.mean`
will return this link: numpy.mean. This works for Python, NumPy, SciPy,
and pandas (full list is in docs/source/conf.py). If external linking fails,
you can check the full list of referenceable objects with the following
commands:
python -m sphinx.ext.intersphinx 'https://docs.python.org/3/objects.inv'
python -m sphinx.ext.intersphinx 'https://numpy.org/doc/stable/objects.inv'
python -m sphinx.ext.intersphinx 'https://docs.scipy.org/doc/scipy/objects.inv'
python -m sphinx.ext.intersphinx 'https://pandas.pydata.org/pandas-docs/stable/objects.inv'
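For orientation, the settings mentioned in this section typically look like the following in a Sphinx conf.py. This is a sketch assembled from the URLs above, not a copy of Teacher's actual configuration, which may list other extensions and projects.

```python
# Sketch of the relevant part of a Sphinx conf.py (assumed, not copied
# from the Teacher repository).
extensions = [
    "sphinx.ext.autodoc",      # pulls API docs from docstrings
    "sphinx.ext.intersphinx",  # enables links to other projects' docs
]

# Resolve bare `name` references as Python objects.
default_role = "obj"

# Project name -> (base URL of the docs, inventory location or None).
# None means "fetch objects.inv from the base URL".
intersphinx_mapping = {
    "python": ("https://docs.python.org/3/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
    "scipy": ("https://docs.scipy.org/doc/scipy/", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
}
```

Each base URL is the address used in the commands above with the trailing objects.inv removed.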
Including files#
Files can be included verbatim. For instance the LICENSE file is included
at License agreement using
.. literalinclude:: ../../../../LICENSE
Writing docstrings#
Most of the API documentation is written in docstrings. These are comment blocks in source code that explain how the code works.
Note
Some parts of the documentation do not yet conform to the current documentation style. If in doubt, follow the rules given here and not what you may see in the source code. Pull requests updating docstrings to the current style are very welcome.
All new or edited docstrings should conform to the numpydoc docstring guide.
Much of the ReST syntax discussed above (Writing ReST pages) can be
used for links and references. These docstrings eventually populate the
docs/source/api directory and form the reference documentation for the
library.
Example docstring#
An example docstring looks like:
def generate_dataset(df, columns, class_name, discrete, name):
    """Generate the dataset suitable for LORE usage

    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        Pandas DataFrame with the original data to prepare
    columns : list
        List of the columns used in the dataset
    class_name : str
        Name of the class column
    discrete : list
        List with all the columns to be considered to have discrete values
    name : str
        Name of the dataset

    Returns
    -------
    dataset : dict
        Dataset as a dictionary with the following elements:
            name : Name of the dataset
            df : Pandas DataFrame with the original data
            columns : list of the columns of the DataFrame
            class_name : name of the class variable
            possible_outcomes : list with all the values of the class column
            type_features : dict with all the variables grouped by type
            features_type : dict with the type of each feature
            discrete : list with all the columns to be considered to have discrete values
            continuous : list with all the columns to be considered to have continuous values
            idx_features : dict with the column name of each column once arranged in a NumPy array
            label_encoder : label encoder for the discrete values
            X : NumPy array with all the columns except for the class
            y : NumPy array with the class column
    """
See the ~.datasets documentation for how this renders.
The Sphinx website also contains plenty of documentation concerning ReST markup and working with Sphinx in general.
Formatting conventions#
The basic docstring conventions are covered in the numpydoc docstring guide and the Sphinx documentation. Some Teacher-specific formatting conventions to keep in mind:
Quote positions#
The quotes for single line docstrings are on the same line (pydocstyle D200):
def _compare_rules_FID3(factual, counter_rule):
    """Compare two rules according to the `FID3` algorithm"""
The quotes for multi-line docstrings are on separate lines (pydocstyle D213):
def i_counterfactual(instance, rule_list, class_val, df_numerical_columns):
    """
    Returns a list that contains the counterfactual with respect to the instance
    for each of the different class values not predicted, as explained in [ref]

    [...]
    """
Function arguments#
Function arguments and keywords within docstrings should be referred to
using the *emphasis* role. This will keep Teacher’s documentation
consistent with Python’s documentation:
If *linestyles* is *None*, the default is 'solid'.
Do not use the `default role` or the ``literal`` role:
Neither `argument` nor ``argument`` should be used.
Quotes for strings#
Teacher does not have a convention on whether to use single-quotes or double-quotes. There is a mixture of both in the current code.
Use simple single or double quotes when giving string values, e.g.
'entropy' uses fuzzy entropy to compute the fuzzy sets.
No ``'extra'`` literal quotes.
The use of extra literal quotes around the text is discouraged. While they slightly improve the rendered docs, they are cumbersome to type and difficult to read in plain-text docs.
Parameter type descriptions#
The main goal for parameter type descriptions is to be readable and understandable by humans. If the possible types are too complex, use a simplification for the type description and explain the type more precisely in the text.
Generally, the numpydoc docstring guide conventions apply. The following rules expand on them where the numpydoc conventions are not specific.
- Use float for a type that can be any number.
- Use (float, float) to describe a 2D position. The parentheses should be
  included to make the tuple-ness more obvious.
- Use array-like for homogeneous numeric sequences, which could
  typically be a numpy.array. Dimensionality may be specified using 2D,
  3D, n-dimensional. If you need to have variables denoting the
  sizes of the dimensions, use capital letters in brackets
  ((M, N) array-like). When referring to them in the text they are easier
  to read and no special formatting is needed. Use array instead of
  array-like for return types if the returned object is indeed a numpy array.
- float is the implicit default dtype for array-likes. For other dtypes
  use array-like of int.
Some possible uses:
2D array-like
(N,) array-like
(M, N) array-like
(M, N, 3) array-like
array-like of int
Non-numeric homogeneous sequences are described as lists, e.g.:
list of str
list of `.Rule`
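Put together, these type descriptions might appear in a docstring as follows. The function column_means_by_name is a hypothetical example written for this guide and is not part of Teacher; it uses plain Python so that it runs without NumPy.

```python
def column_means_by_name(data, names):
    """
    Compute the mean of every column of a table.

    Parameters
    ----------
    data : (M, N) array-like
        Input values with M rows and N columns.
    names : list of str
        The N column names, in column order.

    Returns
    -------
    dict
        Mapping from each column name to the mean of that column.
    """
    columns = list(zip(*data))  # transpose rows into columns
    if len(columns) != len(names):
        raise ValueError("names must contain one entry per column")
    return {name: sum(col) / len(col) for name, col in zip(names, columns)}


print(column_means_by_name([[1, 2], [3, 4]], ["a", "b"]))
```

Because M and N are introduced in the type description, the parameter text can refer to them without further formatting.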
Referencing types#
Generally, the rules from Referring to other code apply. More specifically:
Use full references `~teacher.fuzzy.FuzzySet` with an
abbreviation tilde in parameter types. While the full name helps the
reader of plain text docstrings, the HTML does not need to show the full
name as it links to it. Hence, the ~-shortening keeps it more readable.
Use abbreviated links `.FuzzySet` in the text.
norm : `~teacher.fuzzy.FuzzySet`, optional
A `.FuzzySet` is used to represent a membership function in a range of the discourse universe
Default values#
As opposed to the numpydoc guide, parameters need not be marked as optional if they have a simple default:
- use {name} : {type}, default: {val} when possible.
- use {name} : {type}, optional and describe the default in the text if it cannot be explained sufficiently in the recommended manner.
The default value should provide semantic information targeted at a human reader. In simple cases, it restates the value in the function signature. If applicable, units should be added.
Prefer:
interval : int, default: 1000ms
over:
interval : int, default: 1000
If None is only used as a sentinel value for “parameter not specified”, do not document it as the default. Depending on the context, give the actual default, or mark the parameter as optional if not specifying has no particular effect.
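The following sketch shows both forms in one docstring. The function run_animation and its parameters are hypothetical and only serve to illustrate the convention.

```python
def run_animation(frames, interval=1000, label=None):
    """
    Summarize a hypothetical animation run.

    Parameters
    ----------
    frames : int
        Number of frames to show.
    interval : int, default: 1000ms
        Delay between frames.
    label : str, optional
        Text appended to the summary. If omitted, no label is added.

    Returns
    -------
    str
        A short description of the run.
    """
    total_ms = frames * interval
    # None is only a sentinel for "no label given", so it is not
    # documented as the default above.
    suffix = "" if label is None else f" ({label})"
    return f"{frames} frames, {total_ms}ms{suffix}"


print(run_animation(3))
```

Here interval has a simple default that is restated with its unit, while label is marked optional because None only signals that nothing was passed.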
Inheriting docstrings#
If a subclass overrides a method but does not change the semantics, we can reuse the parent docstring for the method of the child class. Documentation tools such as inspect.getdoc and Sphinx autodoc fall back to the parent docstring automatically if the subclass method does not have a docstring.
Use a plain comment # docstring inherited to denote the intention to reuse
the parent docstring. That way we do not accidentally create a docstring in
the future:
class A:
    def foo(self):
        """The parent docstring."""
        pass


class B(A):
    def foo(self):
        # docstring inherited
        pass
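To see the fallback in action, the lookup can be checked with inspect.getdoc from the standard library. This is a small sketch using the same two classes; the overriding method itself carries no __doc__, and the parent docstring is found when a tool looks it up.

```python
import inspect


class A:
    def foo(self):
        """The parent docstring."""


class B(A):
    def foo(self):
        # docstring inherited
        pass


# The override itself carries no docstring ...
print(B.foo.__doc__)            # None
# ... but inspect.getdoc() falls back to the parent class.
print(inspect.getdoc(B().foo))  # The parent docstring.
```

Sphinx autodoc behaves the same way as long as its autodoc_inherit_docstrings option is left at its default.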