Documenting your Python project. Part 1 - Code documentation


Introduction

Documentation is often the most overlooked aspect of software development, yet it’s crucial for maintainability, collaboration, and long-term project success. In this series, we’ll explore different aspects of documenting Python projects, starting with the foundation: code documentation.

Good code documentation serves multiple purposes:

  • Onboarding: New developers can understand your codebase quickly
  • Maintenance: Future you (or your team) can modify code without breaking existing functionality
  • API clarity: Users of your code know exactly what to expect
  • Quality assurance: Documentation often reveals design flaws or inconsistencies

1. The Motivation Behind Code Documentation

Why Document Code?

The primary goal of code documentation is to explain why something is implemented the way it is, not just what it does. The code itself should be self-explanatory for the “what” part.

Consider this example:

# Bad: Commenting the obvious
def calculate_total(items):
    total = 0
    for item in items:  # Loop through each item
        total += item.price  # Add price to total
    return total  # Return the total

# Good: Explaining the business logic
def calculate_total(items):
    """Calculate total with tax-exempt status for educational materials."""
    total = 0
    for item in items:
        # Educational materials are tax-exempt in our jurisdiction
        if item.category == 'educational':
            total += item.price
        else:
            total += item.price * 1.08  # 8% sales tax
    return total

The “Why” vs “What” Principle

The golden rule of commenting is: “Comment why you are doing it, not what you are doing.”

# Bad: Explaining what the code does
def process_user_data(user_data):
    # Convert string to lowercase
    user_data = user_data.lower()
    # Remove whitespace
    user_data = user_data.strip()
    # Replace spaces with underscores
    user_data = user_data.replace(' ', '_')
    return user_data

# Good: Explaining why we're doing it
def process_user_data(user_data):
    """Normalize user input for consistent database storage.
    
    Our database schema requires lowercase, underscore-separated values
    for the username field to ensure case-insensitive lookups work correctly.
    """
    user_data = user_data.lower()
    user_data = user_data.strip()
    user_data = user_data.replace(' ', '_')
    return user_data

2. What is a Docstring?

A docstring is a string literal that appears as the first statement in a module, function, class, or method definition. It serves as documentation for that object and is accessible at runtime through the __doc__ attribute.

def greet(name: str) -> str:
    """Return a personalized greeting message.
    
    This function creates a friendly greeting that can be used
    in user interfaces to make the application feel more personal.
    
    Args:
        name: The name of the person to greet
        
    Returns:
        A formatted greeting string
        
    Raises:
        ValueError: If name is empty or contains only whitespace
    """
    if not name or not name.strip():
        raise ValueError("Name cannot be empty")
    return f"Hello, {name}! Welcome to our application."

Accessing Docstrings

You can access docstrings programmatically:

# Get the docstring
print(greet.__doc__)

# Or use the help function
help(greet)
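
Beyond __doc__ and help(), the standard library's inspect.getdoc() returns the docstring with its indentation normalized. For a method that has no docstring of its own, it falls back to the one inherited from a parent class. A minimal sketch; Task and BackupTask are hypothetical classes used only for illustration:

```python
import inspect


def greet(name: str) -> str:
    """Return a personalized greeting message.

    Args:
        name: The name of the person to greet
    """
    return f"Hello, {name}!"


class Task:  # hypothetical base class
    def run(self) -> str:
        """Run the task and return its status."""
        return "ok"


class BackupTask(Task):  # hypothetical subclass
    def run(self) -> str:  # overrides run() without its own docstring
        return "backup ok"


# getdoc() strips the common leading indentation and trailing blank lines
cleaned = inspect.getdoc(greet)
print(cleaned)

# The override has no __doc__, but getdoc() finds the inherited docstring
task = BackupTask()
print(task.run.__doc__)          # None
print(inspect.getdoc(task.run))  # Run the task and return its status.
```

This is usually the better choice when you display docstrings yourself, for example in a command-line help screen.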

3. Docstring Formats in the Python World

Python has several established docstring formats, each with its own conventions and tooling support.

Google Style Guide

The Google style is clean, readable, and widely adopted:

def fetch_user_data(user_id: int, include_private: bool = False) -> dict:
    """Fetch user data from the database.
    
    Retrieves user information from the primary database. For security reasons,
    private fields are only included when explicitly requested.
    
    Args:
        user_id: The unique identifier of the user
        include_private: Whether to include private user data (default: False)
        
    Returns:
        A dictionary containing user information with the following keys:
            - id: User ID
            - name: Full name
            - email: Email address
            - created_at: Account creation timestamp
            - private_data: Dict of private fields (only if include_private=True)
            
    Raises:
        UserNotFoundError: If the user_id doesn't exist
        DatabaseConnectionError: If database connection fails
        
    Example:
        >>> user = fetch_user_data(123, include_private=True)
        >>> print(user['name'])
        John Doe
    """
    pass

NumPy Style Guide

NumPy style is popular in scientific computing and data science:

def calculate_statistics(data: list[float]) -> dict[str, float]:
    """
    Calculate basic statistical measures for a dataset.
    
    Parameters
    ----------
    data : list[float]
        Input data array. Must contain at least one element.
        
    Returns
    -------
    dict[str, float]
        Dictionary containing:
            - mean: Arithmetic mean of the data
            - median: Middle value when data is sorted
            - std: Standard deviation
            - min: Minimum value
            - max: Maximum value
            
    Raises
    ------
    ValueError
        If data is empty or contains non-numeric values
        
    See Also
    --------
    scipy.stats.describe : More comprehensive statistics
        
    Examples
    --------
    >>> data = [1, 2, 3, 4, 5]
    >>> stats = calculate_statistics(data)
    >>> print(stats['mean'])
    3.0
    """
    pass

reStructuredText (reST)

reStructuredText is the markup Sphinx uses natively; parameters and return values are written as field lists:

def validate_email(email: str) -> bool:
    """Validate email address format.
    
    :param email: The email address to validate
    :type email: str
    :return: True if email is valid, False otherwise
    :rtype: bool
    :raises ValueError: If email parameter is not a string
    
    The validation checks for:
    
    * Basic email format (user@domain)
    * Valid characters in local and domain parts
    * Domain has at least one dot
    * No consecutive dots
    
    Example:
    
    .. code-block:: python
    
        >>> validate_email("user@example.com")
        True
        >>> validate_email("invalid-email")
        False
    """
    pass

Epytext

Epytext is the Javadoc-inspired markup used by Epydoc; its fields start with @:

def send_notification(message: str, recipients: list[str]) -> bool:
    """Send notification to multiple recipients.
    
    @param message: The notification message to send
    @type message: str
    @param recipients: List of recipient email addresses
    @type recipients: list[str]
    @return: True if all notifications were sent successfully
    @rtype: bool
    @raise ConnectionError: If the notification service is unavailable
    """
    pass
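
Whichever format you pick, lines starting with >>> in a docstring follow the conventions of the standard library's doctest module, so they can be executed and checked against the output written beneath them. The sketch below assumes a hypothetical slugify function; note that a bare expression is compared against its repr (with quotes), while print() output appears without quotes:

```python
import doctest


def slugify(title: str) -> str:
    """Convert a title to a lowercase, underscore-separated slug.

    Example:
        >>> slugify("  Hello World ")
        'hello_world'
        >>> print(slugify("Docs First"))
        docs_first
    """
    return title.strip().lower().replace(" ", "_")


def run_examples() -> doctest.TestResults:
    """Run the docstring examples of slugify and return the result counts."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass the function explicitly so the examples can call it by name
    for test in finder.find(slugify, name="slugify", globs={"slugify": slugify}):
        runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples()
print(results.failed, results.attempted)
```

For a whole module, running python -m doctest your_module.py performs the same check from the command line.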

4. PyCharm Integration

PyCharm provides excellent support for docstring generation and validation:

Auto-generating Docstrings

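  1. Place your cursor on the line directly below a function or method signature
  2. Type three double quotes (""") and press Enter, or press Alt+Enter on the function name and choose “Insert documentation string stub”
  3. Set your preferred format (Google, NumPy, reST, etc.) under Settings/Preferences → Tools → Python Integrated Tools → Docstrings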

PyCharm will generate a template based on your function signature. With the Epytext format selected, for example, the stub looks like this:

def process_payment(amount: float, currency: str, payment_method: str) -> dict:
    """
    
    @param amount:
    @param currency:
    @param payment_method:
    @return:
    """
    pass

Live Templates

Create custom live templates for common docstring patterns:

  1. Go to Settings/Preferences → Editor → Live Templates
  2. Create a new template for your team’s docstring format
  3. Use variables like $PARAM$, $RETURN$, etc.

Docstring Validation

PyCharm can validate your docstrings against your chosen format and highlight inconsistencies.
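
Outside the IDE, a very basic presence check can be sketched with the standard library's ast module. This is not how PyCharm implements its validation; it is a minimal illustration that only reports public functions and classes with no docstring at all. SOURCE is a hypothetical module used for the demonstration:

```python
import ast

# Hypothetical module source used only for this illustration
SOURCE = '''
"""Example module."""

def documented():
    """Has a docstring."""

def undocumented():
    return 1

class Service:
    def run(self):
        return None

def _private_helper():
    return 2
'''


def missing_docstrings(source: str) -> list[str]:
    """Return sorted names of public functions and classes without a docstring."""
    checked = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    missing = []
    for node in ast.walk(ast.parse(source)):
        # Skip other node types and names that are private by convention
        if not isinstance(node, checked) or node.name.startswith("_"):
            continue
        if ast.get_docstring(node) is None:
            missing.append(node.name)
    return sorted(missing)


print(missing_docstrings(SOURCE))  # ['Service', 'run', 'undocumented']
```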

5. VSCode Integration

VSCode offers several extensions for Python docstring support:

Python Docstring Generator

Install the “Python Docstring Generator” extension:

  1. Ctrl+Shift+P → “Generate Docstring”
  2. Select format (Google, NumPy, Sphinx, etc.)
  3. Auto-generate docstrings

AutoDocstring

The “AutoDocstring” extension provides:

  • Hotkey generation: Ctrl+Shift+2 (Windows/Linux) or Cmd+Shift+2 (Mac)
  • Multiple formats: Google, NumPy, Sphinx, and others
  • Smart parameter detection: Automatically detects types from type hints

Configuration

Add to your VSCode settings:

{
    "autoDocstring.docstringFormat": "google",
    "autoDocstring.startOnNewLine": true,
    "autoDocstring.includeExtendedSummary": true
}

6. Best Practices

Consistency

Choose one format and stick to it throughout your project. Mixing formats creates confusion and inconsistency.

Keep It Updated

Docstrings should evolve with your code. An outdated docstring is worse than none, because readers trust it and act on behavior the code no longer has.
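
One way to catch drift early is to compare the parameters a docstring lists with the function's actual signature. The following is a rough sketch that assumes Google-style "Args:" sections indented by four spaces; documented_args, docstring_drift, and resize are hypothetical names, and a real project would more likely rely on a dedicated linter:

```python
import inspect
import re

# Matches entries such as "    name: ...", "    name (int): ..." or "    **kwargs: ..."
ARG_ENTRY = re.compile(r"^ {4}\*{0,2}(\w+)(?: \([^)]*\))?:")


def documented_args(func) -> set[str]:
    """Return the parameter names listed in the Google-style Args section."""
    doc = inspect.getdoc(func) or ""
    names = set()
    in_args = False
    for line in doc.splitlines():
        if line.strip() == "Args:":
            in_args = True
        elif in_args and line and not line.startswith(" "):
            break  # reached the next section, such as "Returns:"
        elif in_args:
            match = ARG_ENTRY.match(line)
            if match:
                names.add(match.group(1))
    return names


def docstring_drift(func) -> tuple[set[str], set[str]]:
    """Return (undocumented parameters, documented names not in the signature)."""
    params = set(inspect.signature(func).parameters) - {"self", "cls"}
    documented = documented_args(func)
    return params - documented, documented - params


def resize(image, width: int, height: int):
    """Resize an image.

    Args:
        image: Source image
        size: Target size as a (width, height) tuple
    """


missing, stale = docstring_drift(resize)
print(sorted(missing))  # ['height', 'width']
print(sorted(stale))    # ['size']
```

Here the signature was changed from a single size argument to width and height, but the docstring was not, which is exactly the kind of mismatch a check in CI can surface.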

Use Type Hints

Type hints complement docstrings and make your code more self-documenting:

from typing import Any, Dict, Optional

def create_user(
    username: str,
    email: str,
    age: Optional[int] = None,
    preferences: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Create a new user account.
    
    Args:
        username: Unique username for the account
        email: Valid email address
        age: User's age (optional)
        preferences: User preferences dictionary (optional)
        
    Returns:
        Dictionary containing the created user data
    """
    pass

Document Exceptions

Always document what exceptions your functions can raise:

def divide_numbers(a: float, b: float) -> float:
    """Divide two numbers.
    
    Args:
        a: Dividend
        b: Divisor
        
    Returns:
        Result of division
        
    Raises:
        ZeroDivisionError: If b is zero
        TypeError: If arguments are not numbers
    """
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError("Arguments must be numbers")
    if b == 0:
        raise ZeroDivisionError("Cannot divide by zero")
    return a / b

Code Example

Here’s a complete example showing good documentation practices:

import logging
from html import escape

log = logging.getLogger(__name__)

class ReportGenerator:
    """Generate HTML reports from structured data.

    This class provides functionality to create formatted HTML reports
    with headings, paragraphs, and tables. It's designed to work with
    data from various sources and can generate reports in different formats.

    Attributes:
        document: List containing HTML elements of the report
        title: The main title of the report
        author: The author of the report
    """

    def __init__(self, title: str, author: str = "Unknown"):
        """Initialize the report generator.

        Args:
            title: The main title for the report
            author: The author name (default: "Unknown")
        """
        self.document: list[str] = []
        self.title = title
        self.author = author
        log.info("Initialized report '%s' by %s", title, author)

    def add_heading(self, text: str, level: int, **kwargs: str) -> None:
        """Add a heading to the report document.

        The heading is rendered as an HTML h1 to h6 tag. The heading text
        is given by 'text' and the tag level by 'level'.

        Args:
            text: Text of the heading
            level: Level of the heading. Integer from 1 to 6
            **kwargs: Additional HTML attributes for the heading tag

        Raises:
            ValueError: If level is not between 1 and 6
        """
        if not 1 <= level <= 6:
            raise ValueError("Heading level must be between 1 and 6")

        # Build the tag as a plain string; the standard-library html module
        # provides escape() but no tag classes such as H1.
        attributes = "".join(
            f' {name}="{escape(str(value))}"' for name, value in kwargs.items()
        )
        self.document.append(f"<h{level}{attributes}>{escape(text)}</h{level}>")
        log.debug("Heading %d: '%s' added.", level, text)
    
    def generate_html(self) -> str:
        """Generate the complete HTML document.
        
        Returns:
            Complete HTML document as a string with proper DOCTYPE,
            head section, and body containing all added elements.
        """
        html_content = "\n".join(str(element) for element in self.document)
        return f"""
<!DOCTYPE html>
<html>
<head>
    <title>{self.title}</title>
    <meta charset="utf-8">
</head>
<body>
    {html_content}
    <footer>Generated by {self.author}</footer>
</body>
</html>
        """.strip()

Conclusion

Good code documentation is an investment in your project’s future. It makes your code more maintainable, helps new team members onboard faster, and serves as a contract for your API users.

Remember:

  • Document why, not what
  • Choose a consistent format and stick to it
  • Keep documentation updated with code changes
  • Use type hints to complement docstrings
  • Leverage IDE tools for efficient documentation generation

In the next part of this series, we’ll explore architectural documentation and how to document your project’s overall structure and design decisions.

Resources