
Analyzer

ScanPyImports.analyzer

A module for processing and analyzing data on imported modules.

Classes:

  • Data

    Construct the base DataFrame of the import statement data.

  • DataAnalyzer

    Process and analyze the data on imported modules.

Classes

Data(path)

Construct the base DataFrame of the import statement data.

Parameters:

  • path (str) –

    Path to the directory.

Source code in ScanPyImports/analyzer.py
def __init__(self, path: str) -> None:
    """
    Initialize a Data object with the given path.

    Args:
        path: Path to the directory.
    """
    self.path = path
    self.directory = Directory(path)

    if not self.directory.exists:
        raise ValueError("The provided directory path does not exist. Try another path!")

    self._df: Optional[pd.DataFrame] = None

    # Documentation
    self.path: str
    """Path to the directory."""
    self.directory: Directory 
    """An instance of [Directory][ScanPyImports.scan.Directory] initiated with the given path."""


    super().__init__()
Attributes
path: str instance-attribute

Path to the directory.

directory: Directory instance-attribute

An instance of Directory initialized with the given path.

df: Optional[pd.DataFrame] property

DataFrame containing import data or None if the directory does not exist.

DataFrame content

The DataFrame contains the following columns:

  • imported_0, imported_1, ...: The imported packages and modules.
    • imported_0 is the top-level package or library being imported.
    • imported_1 is the module or submodule within that package.
    • imported_2 and higher are further nested modules or submodules, if present in the import statement.
  • original: The original line of text containing the import statement.
  • alias: Alias (if any) of the submodule.
  • path: Full path of the file containing the import.
  • file: File name.
  • filename: File name without extension.
  • extension: File extension.
  • directory: Directory path of the file.

The data is created in a private method. One could modify that code to retrieve a dictionary or a JSON file instead of a DataFrame.
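To make the column layout concrete, here is a minimal sketch of how a single import statement could be turned into dictionary rows with the columns listed above, and then dumped as JSON. It is not the library's private method: `import_rows` is a hypothetical helper, it relies only on the standard-library `ast` and `pathlib` modules, and it assumes that `extension` keeps its leading dot.

```python
import ast
import json
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def import_rows(line: str, file_path: str) -> List[Dict[str, Optional[str]]]:
    """Turn one import statement into row dictionaries (hypothetical helper)."""
    p = PurePosixPath(file_path)
    file_info = {
        "path": str(p),
        "file": p.name,
        "filename": p.stem,
        "extension": p.suffix,  # assumption: the leading dot is kept
        "directory": str(p.parent),
    }

    node = ast.parse(line.strip()).body[0]
    if isinstance(node, ast.Import):
        # "import a.b as c" -> dotted name "a.b", alias "c"
        targets = [(a.name, a.asname) for a in node.names]
    elif isinstance(node, ast.ImportFrom):
        # "from a import b" -> dotted name "a.b"
        base = node.module or ""
        targets = [(f"{base}.{a.name}" if base else a.name, a.asname)
                   for a in node.names]
    else:
        return []  # not an import statement

    rows: List[Dict[str, Optional[str]]] = []
    for dotted, alias in targets:
        # One imported_<i> key per level of the dotted path.
        row: Dict[str, Optional[str]] = {
            f"imported_{i}": part for i, part in enumerate(dotted.split("."))
        }
        row.update(original=line, alias=alias, **file_info)
        rows.append(row)
    return rows


rows = import_rows("import numpy.linalg as la", "project/src/main.py")
print(json.dumps(rows, indent=2))
```

A list of such dictionaries can be serialized directly with `json.dumps`, which is one way to obtain the dictionary or JSON output mentioned above.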


DataAnalyzer(path, to_exclude=None)

Bases: Data

A class to process the data on imported modules.

Parameters:

  • path (str) –

    Path to the directory.

  • to_exclude (List[str], default: None ) –

    List of packages' names to exclude from the analysis.

Methods:

  • get_frequencies

    Return frequency of imported modules.

Source code in ScanPyImports/analyzer.py
def __init__(self, path: str, to_exclude: Optional[List[str]] = None) -> None:
    """Initialize a DataAnalyzer.

    Args:
        path: Path to the directory.
        to_exclude: List of packages' names to exclude from the analysis.

    Methods:
        get_frequencies: Return frequency of imported modules.
    """
    super().__init__(path)
    self._clean_df = None
    self._own_processed_df = None
    # Fall back to an empty list when no exclusions are given.
    self.to_exclude = to_exclude if to_exclude else []

    # Documentation
    self.to_exclude: List[str]
    """List of packages' names to exclude from the analysis."""
Attributes
path: str instance-attribute

Path to the directory.

directory: Directory instance-attribute

An instance of Directory initialized with the given path.

df: Optional[pd.DataFrame] property

DataFrame containing import data or None if the directory does not exist.

DataFrame content

The DataFrame contains the following columns:

  • imported_0, imported_1, ...: The imported packages and modules.
    • imported_0 is the top-level package or library being imported.
    • imported_1 is the module or submodule within that package.
    • imported_2 and higher are further nested modules or submodules, if present in the import statement.
  • original: The original line of text containing the import statement.
  • alias: Alias (if any) of the submodule.
  • path: Full path of the file containing the import.
  • file: File name.
  • filename: File name without extension.
  • extension: File extension.
  • directory: Directory path of the file.

The data is created in a private method. One could modify that code to retrieve a dictionary or a JSON file instead of a DataFrame.

to_exclude: List[str] instance-attribute

List of packages' names to exclude from the analysis.

clean_df: pd.DataFrame property

A cleaned copy of df with some minor changes applied.

own_processed_df: pd.DataFrame property

A copy of the DataFrame (df) after processing own-created modules.

Own-created modules

Own-created modules are defined as Python scripts that are imported as modules and reside in the same folder as the script containing the import statement.

A natural extension would be to also include own-created packages residing in the same folder as the .py or .ipynb file where the import statement resides.

In the returned DataFrame, rows for own-created modules are dropped and replaced by the import statements found inside the own-created module's script, provided those imports refer to external libraries.
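The replacement rule described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code behind own_processed_df: `top_level_imports` and `expand_own_modules` are hypothetical helpers, only .py files are considered, relative imports are ignored, and own-created modules are expanded one level deep.

```python
import ast
import tempfile
from pathlib import Path
from typing import List


def top_level_imports(file: Path) -> List[str]:
    """Collect the top-level package name of every absolute import in a file."""
    names: List[str] = []
    for node in ast.walk(ast.parse(file.read_text())):
        if isinstance(node, ast.Import):
            names += [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.append(node.module.split(".")[0])
    return names


def expand_own_modules(file: Path) -> List[str]:
    """Replace own-created modules by the external imports they contain."""
    folder = file.parent
    result: List[str] = []
    for name in top_level_imports(file):
        sibling = folder / f"{name}.py"
        if sibling.is_file():
            # Own-created module: drop it and keep only the imports inside
            # it that do not point at another script in the same folder.
            result += [n for n in top_level_imports(sibling)
                       if not (folder / f"{n}.py").is_file()]
        else:
            result.append(name)
    return result


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "helpers.py").write_text("import numpy\n")
    main = Path(tmp) / "main.py"
    main.write_text("import helpers\nimport pandas\n")
    expanded = expand_own_modules(main)

print(sorted(expanded))
```

In the demonstration, `helpers` sits next to `main.py`, so it is treated as own-created and replaced by the `numpy` import it contains, while `pandas` is kept as is.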

Functions
get_frequencies(exclude=True, process_own_modules=True)

Get the frequency of imported modules.

Parameters:

  • exclude (bool, default: True ) –

    Whether to exclude the packages listed in to_exclude.

  • process_own_modules (bool, default: True ) –

    Whether to process own-created modules.

Returns:

  • Series

    Import frequencies, indexed by top-level package name (index name 'Imports') and sorted in descending order.

Source code in ScanPyImports/analyzer.py
def get_frequencies(self, exclude: bool = True, process_own_modules: bool = True) -> pd.Series:
    """
    Get the frequency of imported modules.

    Args:
        exclude: Whether to exclude the packages listed in [to_exclude][ScanPyImports.analyzer.DataAnalyzer.to_exclude].
        process_own_modules: Whether to process own-created modules.

    Returns:
        pd.Series: Series of import frequencies sorted in descending order.
    """
    df = self.own_processed_df if process_own_modules else self.clean_df
    count_series = (df.groupby('imported_0')
                    .size()
                    .sort_values(ascending=False))

    count_series.index.name = 'Imports'

    if exclude:
        count_series = count_series[~count_series.index.isin(self.to_exclude)]

    return count_series
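
The group, count, sort, and filter steps of get_frequencies can also be followed without pandas. The sketch below is a plain-Python analogue built on `collections.Counter`, not part of ScanPyImports: `count_top_level` is a hypothetical function, and each row is assumed to be a dictionary with an `imported_0` key.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def count_top_level(
    rows: Iterable[Dict[str, str]],
    to_exclude: Optional[List[str]] = None,
    exclude: bool = True,
) -> List[Tuple[str, int]]:
    """Count rows per top-level package, most frequent first."""
    # Equivalent of groupby('imported_0').size()
    counts = Counter(row["imported_0"] for row in rows)
    # most_common() returns (name, count) pairs in descending count order.
    ranked = counts.most_common()
    if exclude and to_exclude:
        excluded = set(to_exclude)
        ranked = [(name, n) for name, n in ranked if name not in excluded]
    return ranked


rows = [
    {"imported_0": "numpy"},
    {"imported_0": "pandas"},
    {"imported_0": "numpy"},
    {"imported_0": "os"},
]
print(count_top_level(rows, to_exclude=["os"]))
```

As in get_frequencies, the exclusion is applied after counting, so passing `exclude=False` returns the full ranking without recomputing anything else.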