Analyzer

`ScanPyImports.analyzer`

A module for processing and analyzing data on imported modules.

Classes:

Data –

Construct the base DataFrame of the import statement data.
DataAnalyzer –

Process and analyze the data on imported modules.

Classes

`Data(path)`

Construct the base DataFrame of the import statement data.

Parameters:

path (str) –

Path to the directory.

Source code in ScanPyImports/analyzer.py

def __init__(self, path: str) -> None:
    """
    Initialize a Data object with the given path.

    Args:
        path: Path to the directory.
    """
    self.path = path
    self.directory = Directory(path)

    if not self.directory.exists:
        raise ValueError("The provided directory path does not exist. Try another path!")

    self._df: Optional[pd.DataFrame] = None

    # Documentation
    self.path: str
    """Path to the directory."""
    self.directory: Directory 
    """An instance of [Directory][ScanPyImports.scan.Directory] initiated with the given path."""


    super().__init__()

Attributes

`path: str` `instance-attribute`

Path to the directory.

`directory: Directory` `instance-attribute`

An instance of Directory initiated with the given path.

`df: Optional[pd.DataFrame]` `property`

DataFrame containing import data or None if the directory does not exist.

DataFrame content

The DataFrame contains the following columns:

- imported_0, imported_1, ...: The imported packages and modules.
    - imported_0 is the top-level package or library being imported.
    - imported_1 is the module or submodule within that package.
    - imported_2 and onward are further nested modules or submodules, if present in the import statement.
- original: The original line of text containing the import statement.
- alias: Alias of the submodule, if any.
- path: Full path of the file containing the import.
- file: File name.
- filename: File name without extension.
- extension: File extension.
- directory: Directory path of the file.
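To make the column layout concrete, here is a minimal sketch of how one row per imported name could be derived with the standard library alone. The function `import_rows` is hypothetical and is not part of ScanPyImports; it only mirrors the column names listed above, and details such as whether `extension` keeps the leading dot are assumptions.

```python
import ast
from pathlib import Path
from typing import Dict, List, Optional


def import_rows(source: str, file_path: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: one dict per imported name, keyed like the columns above."""
    path = Path(file_path)
    lines = source.splitlines()
    rows: List[Dict[str, Optional[str]]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c as x" -> parts ["a", "b", "c"], alias "x"
            targets = [(alias.name.split("."), alias.asname) for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # "from a.b import c" -> parts ["a", "b", "c"]
            base = node.module.split(".") if node.module else []
            targets = [(base + [alias.name], alias.asname) for alias in node.names]
        else:
            continue
        for parts, alias in targets:
            row: Dict[str, Optional[str]] = {
                f"imported_{i}": part for i, part in enumerate(parts)
            }
            row.update(
                original=lines[node.lineno - 1].strip(),
                alias=alias,
                path=str(path),
                file=path.name,
                filename=path.stem,
                extension=path.suffix,  # assumption: the leading dot is kept
                directory=str(path.parent),
            )
            rows.append(row)
    return rows


rows = import_rows("import numpy as np\nfrom os import path\n", "proj/app.py")
```

Each dict in `rows` corresponds to one DataFrame row; shorter import paths simply lack the deeper `imported_N` keys.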

The data is created in a private method. One could modify that code to retrieve a dictionary or a JSON file instead of a DataFrame.
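As a rough illustration of the dictionary or JSON alternative, the sketch below serializes a list of row dictionaries with the standard `json` module. The sample `records` and the helper `records_to_json` are made up for this example. If a DataFrame is already available, pandas offers `DataFrame.to_dict(orient="records")` and `DataFrame.to_json(orient="records")` for the same purpose.

```python
import json
from typing import Dict, List, Optional

# Hypothetical sample rows using a subset of the documented column names.
records: List[Dict[str, Optional[str]]] = [
    {"imported_0": "pandas", "imported_1": None, "alias": "pd", "file": "app.py"},
    {"imported_0": "os", "imported_1": "path", "alias": None, "file": "app.py"},
]


def records_to_json(rows: List[Dict[str, Optional[str]]]) -> str:
    """Serialize row dictionaries to a JSON string (None becomes null)."""
    return json.dumps(rows, indent=2)


as_json = records_to_json(records)
restored = json.loads(as_json)  # round-trips back to the same list of dicts
```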


`DataAnalyzer(path, to_exclude=None)`

Bases: Data

A class to process the data on imported modules.

Parameters:

path (str) –

Path to the directory.
to_exclude (List[str], default: None ) –

List of packages' names to exclude from the analysis.

Methods:

get_frequencies –

Return frequency of imported modules.

Source code in ScanPyImports/analyzer.py

def __init__(self, path: str, to_exclude: Optional[List[str]] = None) -> None:
    """Initiate DataAnalyzer.

    Args:
        path: Path to the directory.
        to_exclude: List of packages' names to exclude from the analysis.

    Methods:
        get_frequencies: Return frequency of imported modules.

    """
    super().__init__(path)
    self._clean_df = None
    self._own_processed_df = None
    # Fall back to a fresh empty list when nothing is excluded.
    self.to_exclude = to_exclude if to_exclude else []

    # Documentation
    self.to_exclude: List[str]
    """List of packages' names to exclude from the analysis."""
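The `None` default combined with the fallback to `[]` gives every instance its own list. The sketch below shows that behavior with a hypothetical stand-in class, `ExcludeHolder`, which copies only this one line of the constructor and is not part of the library.

```python
from typing import List, Optional


class ExcludeHolder:
    """Hypothetical stand-in that mirrors only the to_exclude handling."""

    def __init__(self, to_exclude: Optional[List[str]] = None) -> None:
        # A new list is created per instance when no list is passed in.
        self.to_exclude: List[str] = to_exclude if to_exclude else []


first = ExcludeHolder()
second = ExcludeHolder()
first.to_exclude.append("os")  # does not leak into `second`
third = ExcludeHolder(["sys", "re"])
```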

Attributes

`path: str` `instance-attribute`

Path to the directory.

`directory: Directory` `instance-attribute`

An instance of Directory initiated with the given path.

`df: Optional[pd.DataFrame]` `property`

DataFrame containing import data or None if the directory does not exist.

DataFrame content

The DataFrame contains the following columns:

- imported_0, imported_1, ...: The imported packages and modules.
    - imported_0 is the top-level package or library being imported.
    - imported_1 is the module or submodule within that package.
    - imported_2 and onward are further nested modules or submodules, if present in the import statement.
- original: The original line of text containing the import statement.
- alias: Alias of the submodule, if any.
- path: Full path of the file containing the import.
- file: File name.
- filename: File name without extension.
- extension: File extension.
- directory: Directory path of the file.

The data is created in a private method. One could modify that code to retrieve a dictionary or a JSON file instead of a DataFrame.

`to_exclude: List[str]` `instance-attribute`

List of packages' names to exclude from the analysis.

`clean_df: pd.DataFrame` `property`

A cleaned copy of df with minor changes applied.

`own_processed_df: pd.DataFrame` `property`

A copy of the DataFrame (df) after processing own-created modules.

Own-created modules

Own-created modules are defined as Python scripts that are imported as modules and reside in the same folder as the script containing the import statement.

A natural extension would be to also include own-created packages residing in the same folder as the .py or .ipynb file where the import statement resides.

In the returned DataFrame, own-created modules are dropped and replaced by the import statements residing inside the own-created module script, provided they relate to external libraries.
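The definition above can be sketched as a simple file-system check: a module counts as own-created when a script with that name sits next to the importing file. The helpers `is_own_module`, `split_imports`, and `demo` are hypothetical and only illustrate the rule; the library's actual detection and replacement logic may differ.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def is_own_module(module_name: str, importing_file: Path) -> bool:
    """True if `<module_name>.py` resides in the same folder as the importing file."""
    return (importing_file.parent / f"{module_name}.py").is_file()


def split_imports(names: List[str], importing_file: Path) -> Tuple[List[str], List[str]]:
    """Separate top-level import names into (own-created, external)."""
    own = [name for name in names if is_own_module(name, importing_file)]
    external = [name for name in names if name not in own]
    return own, external


def demo() -> Tuple[List[str], List[str]]:
    """Build a throwaway folder with a script and one own-created module."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "helpers.py").write_text("import numpy\n")
        script = folder / "main.py"
        script.write_text("import helpers\nimport pandas\n")
        return split_imports(["helpers", "pandas"], script)


own, external = demo()
```

In this toy folder, `helpers` is classified as own-created, so under the rule described above it would be dropped and replaced by the external imports found inside `helpers.py`.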

Functions

`get_frequencies(exclude=True, process_own_modules=True)`

Get the frequency of imported modules.

Parameters:

exclude (bool, default: True ) –

Whether to exclude the packages listed in to_exclude.
process_own_modules (bool, default: True ) –

Whether to process own-created modules.

Returns:

Series –

pd.Series: Series of import frequencies sorted in descending order.

Source code in ScanPyImports/analyzer.py

def get_frequencies(self, exclude: bool = True, process_own_modules: bool = True) -> pd.Series:
    """
    Get the frequency of imported modules.

    Args:
        exclude: Whether to exclude the packages listed in [to_exclude][ScanPyImports.analyzer.DataAnalyzer.to_exclude].
        process_own_modules: Whether to process own-created modules.

    Returns:
        pd.Series: Series of import frequencies sorted in descending order.
    """
    df = self.own_processed_df if process_own_modules else self.clean_df
    count_series = (df.groupby('imported_0')
                    .size()
                    .sort_values(ascending=False))

    count_series.index.name = 'Imports'

    if exclude:
        count_series = count_series[~count_series.index.isin(self.to_exclude)]

    return count_series
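For readers without pandas at hand, the same counting idea can be sketched with `collections.Counter`. This is a standard-library analogue, not the method itself: the function `count_imports` and its inputs are hypothetical, and it filters excluded names before counting, whereas the method above filters the counted Series afterwards. The resulting counts are the same either way.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def count_imports(
    top_level: Iterable[str], to_exclude: Optional[List[str]] = None
) -> Dict[str, int]:
    """Count top-level import names, most frequent first, skipping excluded ones."""
    excluded = set(to_exclude or [])
    counts = Counter(name for name in top_level if name not in excluded)
    # most_common() orders entries by descending count.
    return dict(counts.most_common())


frequencies = count_imports(["pandas", "numpy", "pandas", "os"], to_exclude=["os"])
```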
