Writing documentation#

Getting started#

General file structure#

All documentation is built from the docs/source. The docs/source directory contains configuration files for Sphinx and reStructuredText (ReST; .rst) files that are rendered to documentation pages.

Documentation is created in three ways. First, API documentation (docs/source/api) is created by Sphinx from the docstrings of the classes in the Teacher library. All docs/source/api are created when the documentation is built. See Writing docstrings below.

Second, Teacher has narrative docs written in ReST in subdirectories of docs/source/users/. If you would like to add new documentation that is suited to an .rst file rather than a gallery or tutorial example, choose an appropriate subdirectory to put it in, and add the file to the table of contents of index.rst of the subdirectory. See Writing ReST pages below.

Note

Don’t directly edit the .rst files in docs/source/api. Sphinx regenerates these files in these directories when building documentation.

Setting up the doc build#

The documentation for Teacher is generated from reStructuredText (ReST) using the Sphinx documentation generation tool.

To build the documentation you will need to set up Teacher for development.

Building the docs#

The documentation sources are found in the docs/source/ directory in the trunk. The configuration file for Sphinx is docs/source/conf.py. It controls which directories Sphinx parses, how the docs are built, and how the extensions are used. To build the documentation in html format, cd into docs/ and run:

make html

Other useful invocations include

# Delete built files.  May help if you get errors about missing paths or
# broken links.
make clean

# Build pdf docs.
make latexpdf

Showing locally built docs#

The built docs are available in the folder docs/build/html.

Writing ReST pages#

Most documentation is either in the docstrings of individual classes and methods, in explicit .rst files, or in examples and tutorials. All of these use the ReST syntax and are processed by Sphinx.

The Sphinx reStructuredText Primer is a good introduction into using ReST. More complete information is available in the reStructuredText reference documentation.

This section contains additional information and conventions how ReST is used in the Teacher documentation.

Formatting and style conventions#

It is useful to strive for consistency in the Teacher documentation. Here are some formatting and style conventions that are used.

Section formatting#

For everything but top-level chapters, use Upper lower for section titles, e.g., Possible hangups rather than Possible Hangups

We aim to follow the recommendations from the Python documentation and the Sphinx reStructuredText documentation for section markup characters, i.e.:

  • # with overline, for parts. This is reserved for the main title in index.rst. All other pages should start with “chapter” or lower.

  • * with overline, for chapters

  • =, for sections

  • -, for subsections

  • ^, for subsubsections

  • ", for paragraphs

This may not yet be applied consistently in existing docs. Please open an issue to notify any inconsistencies.

Function arguments#

Function arguments and keywords within docstrings should be referred to using the *emphasis* role. This will keep Teacher’s documentation consistent with Python’s documentation:

Here is a description of *argument*

Do not use the `default role`:

Do not describe `argument` like this.  As per the next section,
this syntax will (unsuccessfully) attempt to resolve the argument as a
link to a class or method in the library.

nor the ``literal`` role:

Do not describe ``argument`` like this.

Referring to other documents and sections#

Sphinx allows internal references between documents.

Documents can be linked with the :doc: directive:

See the :doc:`/users/installing/index`

will render as:

See the Installation

Sections can also be given reference names. For instance from the Installation link:

.. _install_from_source:

======================
Installing from source
======================

If you are interested in contributing to Teacher development,
running the latest source code, or just like to build everything
yourself, it is not difficult to build Teacher from source.

and refer to it using the standard reference syntax:

See :ref:`install_from_source`

will give the following link: Installing from source

To maximize internal consistency in section labeling and references, use hyphen separated, descriptive labels for section references. Keep in mind that contents may be reorganized later, so avoid top level names in references like user or devel unless necessary

In addition, since underscores are widely used by Sphinx itself, use hyphens to separate words.

Referring to other code#

To link to other methods, classes, or modules in Teacher you can use back ticks, for example:

`teacher.fuzzy.FuzzySet`

generates a link like this: teacher.fuzzy.FuzzySet.

Note: We use the sphinx setting default_role = 'obj' so that you don’t have to use qualifiers like :class:, :func:, :meth: and the likes.

Often, you don’t want to show the full package and module name. As long as the target is unambiguous you can simply leave them out:

`.FuzzySet`

and the link still works: .FuzzySet.

Other packages can also be linked via intersphinx:

`numpy.mean`

will return this link: numpy.mean. This works for Python, Numpy, Scipy, and Pandas (full list is in doc/conf.py). If external linking fails, you can check the full list of referenceable objects with the following commands:

python -m sphinx.ext.intersphinx 'https://docs.python.org/3/objects.inv'
python -m sphinx.ext.intersphinx 'https://numpy.org/doc/stable/objects.inv'
python -m sphinx.ext.intersphinx 'https://docs.scipy.org/doc/scipy/objects.inv'
python -m sphinx.ext.intersphinx 'https://pandas.pydata.org/pandas-docs/stable/objects.inv'

Including files#

Files can be included verbatim. For instance the LICENSE file is included at License agreement using

.. literalinclude:: ../../../../LICENSE

Writing docstrings#

Most of the API documentation is written in docstrings. These are comment blocks in source code that explain how the code works.

Note

Some parts of the documentation do not yet conform to the current documentation style. If in doubt, follow the rules given here and not what you may see in the source code. Pull requests updating docstrings to the current style are very welcome.

All new or edited docstrings should conform to the numpydoc docstring guide. Much of the ReST syntax discussed above (Writing ReST pages) can be used for links and references. These docstrings eventually populate the docs/source/api directory and form the reference documentation for the library.

Example docstring#

An example docstring looks like:

def generate_dataset(df, columns, class_name, discrete, name):
    """Generate the dataset suitable for LORE usage

    Parameters
    ----------
    df : pandas.core.frame.DataFrame
        Pandas DataFrame with the original data to prepare
    columns : list
        List of the columns used in the dataset
    class_name : str
        Name of the class column
    discrete : list
        List with all the columns to be considered to have discrete values
    name : str
        Name of the dataset

    Returns
    -------
    dataset : dict
        Dataset as a dictionary with the following elements:
            name : Name of the dataset
            df : Pandas DataFrame with the original data
            columns : list of the columns of the DataFrame
            class_name : name of the class variable
            possible_outcomes : list with all the values of the class column
            type_features : dict with all the variables grouped by type
            features_type : dict with the type of each feature
            discrete : list with all the columns to be considered to have discrete values
            continuous : list with all the columns to be considered to have continuous values
            idx_features : dict with the column name of each column once arranged in a NumPy array
            label_encoder : label encoder for the discrete values
            X : NumPy array with all the columns except for the class
            y : NumPy array with the class column
    """

See the ~.datasets documentation for how this renders.

The Sphinx website also contains plenty of documentation concerning ReST markup and working with Sphinx in general.

Formatting conventions#

The basic docstring conventions are covered in the numpydoc docstring guide and the Sphinx documentation. Some Teacher-specific formatting conventions to keep in mind:

Quote positions#

The quotes for single line docstrings are on the same line (pydocstyle D200):

def _compare_rules_FID3(factual, counter_rule):
    """Compare two rules according to the `FID3` algorithm"""

The quotes for multi-line docstrings are on separate lines (pydocstyle D213):

def i_counterfactual(instance, rule_list, class_val, df_numerical_columns):
    """Returns a list that contains the counterfactual with respect to the instance
    for each of the different class values not predicted, as explained in [ref]

    [...]
    """

Function arguments#

Function arguments and keywords within docstrings should be referred to using the *emphasis* role. This will keep Teacher’s documentation consistent with Python’s documentation:

If *linestyles* is *None*, the default is 'solid'.

Do not use the `default role` or the ``literal`` role:

Neither `argument` nor ``argument`` should be used.

Quotes for strings#

Teacher does not have a convention whether to use single-quotes or double-quotes. There is a mixture of both in the current code.

Use simple single or double quotes when giving string values, e.g.

'entropy' uses fuzzy entropy to compute the fuzzy sets.

No ``'extra'`` literal quotes.

The use of extra literal quotes around the text is discouraged. While they slightly improve the rendered docs, they are cumbersome to type and difficult to read in plain-text docs.

Parameter type descriptions#

The main goal for parameter type descriptions is to be readable and understandable by humans. If the possible types are too complex use a simplification for the type description and explain the type more precisely in the text.

Generally, the numpydoc docstring guide conventions apply. The following rules expand on them where the numpydoc conventions are not specific.

Use float for a type that can be any number.

Use (float, float) to describe a 2D position. The parentheses should be included to make the tuple-ness more obvious.

Use array-like for homogeneous numeric sequences, which could typically be a numpy.array. Dimensionality may be specified using 2D, 3D, n-dimensional. If you need to have variables denoting the sizes of the dimensions, use capital letters in brackets ((M, N) array-like). When referring to them in the text they are easier read and no special formatting is needed. Use array instead of array-like for return types if the returned object is indeed a numpy array.

float is the implicit default dtype for array-likes. For other dtypes use array-like of int.

Some possible uses:

2D array-like
(N,) array-like
(M, N) array-like
(M, N, 3) array-like
array-like of int

Non-numeric homogeneous sequences are described as lists, e.g.:

list of str
list of `.Rule`

Referencing types#

Generally, the rules from referring-to-other-code apply. More specifically:

Use full references `~teacher.fuzzy.FuzzySet` with an abbreviation tilde in parameter types. While the full name helps the reader of plain text docstrings, the HTML does not need to show the full name as it links to it. Hence, the ~-shortening keeps it more readable.

Use abbreviated links `.FuzzySet` in the text.

norm : `~teacher.fuzzy.FuzzySet`, optional
     A `.FuzzySet` is used to represent a membership function in a range of the discourse universe

Default values#

As opposed to the numpydoc guide, parameters need not be marked as optional if they have a simple default:

  • use {name} : {type}, default: {val} when possible.

  • use {name} : {type}, optional and describe the default in the text if it cannot be explained sufficiently in the recommended manner.

The default value should provide semantic information targeted at a human reader. In simple cases, it restates the value in the function signature. If applicable, units should be added.

Prefer:
    interval : int, default: 1000ms
over:
    interval : int, default: 1000

If None is only used as a sentinel value for “parameter not specified”, do not document it as the default. Depending on the context, give the actual default, or mark the parameter as optional if not specifying has no particular effect.

Inheriting docstrings#

If a subclass overrides a method but does not change the semantics, we can reuse the parent docstring for the method of the child class. Python does this automatically, if the subclass method does not have a docstring.

Use a plain comment # docstring inherited to denote the intention to reuse the parent docstring. That way we do not accidentally create a docstring in the future:

class A:
    def foo():
        """The parent docstring."""
        pass

class B(A):
    def foo():
        # docstring inherited
        pass