benford package¶
benford.benford module¶

class
benford.benford.
Base
(data, decimals, sign='all', sec_order=False)[source]¶ Bases:
pandas.core.frame.DataFrame
Internalizes and prepares the data for Analysis.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
Raises: TypeError
– if not receiving int or float as input.
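As a rough illustration of what sign and decimals mean, the sketch below filters a sequence by sign, discards zeros, and scales the absolute values by 10**decimals so that digits can be read from integers. The helper name prepare and the scaling approach are assumptions made for this example, not the library's implementation.

```python
def prepare(values, decimals=2, sign="all"):
    """Hypothetical helper: filter by sign and turn values into integers."""
    if sign not in ("all", "pos", "neg"):
        raise ValueError("sign must be 'all', 'pos' or 'neg'")
    if sign == "pos":
        kept = [v for v in values if v > 0]
    elif sign == "neg":
        kept = [v for v in values if v < 0]
    else:
        kept = [v for v in values if v != 0]  # 'all' still discards zeros
    # Scale so that `decimals` decimal places become part of the integer.
    return [int(round(abs(v) * 10 ** decimals)) for v in kept]


print(prepare([12.34, -0.5, 0, 7], decimals=2))
```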

class
benford.benford.
Test
(base, digs, confidence, limit_N=None, sec_order=False)[source]¶ Bases:
pandas.core.frame.DataFrame
Transforms the original number sequence into a DataFrame reduced by the occurrences of the chosen digits, creating other computed columns.
Parameters:  base – The Base object with the data prepared for Analysis
 digs – Tells which test to perform: 1: first digit; 2: first two digits; 3: first three digits; 22: second digit; -2: last two digits.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
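For the first-digits tests (digs equal to 1, 2 or 3), the expected proportion of a leading group d under Benford's Law is log10(1 + 1/d). The sketch below computes those expected values and the found proportions for a list of positive integers. The function names are hypothetical and only illustrate the idea; they are not part of the package.

```python
import math
from collections import Counter


def expected_proportions(digs):
    """Benford's expected proportion for each leading group of `digs` digits."""
    start, stop = 10 ** (digs - 1), 10 ** digs
    return {d: math.log10(1 + 1 / d) for d in range(start, stop)}


def found_proportions(integers, digs):
    """Observed share of each leading-digit group among positive integers."""
    # Entries with fewer than `digs` digits cannot take part in the test.
    leading = [int(str(n)[:digs]) for n in integers if n >= 10 ** (digs - 1)]
    counts = Counter(leading)
    return {d: counts[d] / len(leading) for d in sorted(counts)}


expected = expected_proportions(1)
found = found_proportions([123, 187, 2450, 31, 1999, 905], digs=1)
print(round(expected[1], 4), found)
```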

N
¶ Number of records in the sample to consider in computations

ddf
¶ Degrees of Freedom to look up for the critical chi-square value

chi_square
¶ Chi-square statistic for the given test

KS
¶ Kolmogorov-Smirnov statistic for the given test

MAD
¶ Mean Absolute Deviation for the given test

confidence
¶ Confidence level to consider when setting some critical values

digs
¶ numerical representation of the test at hand. 1: F1D; 2: F2D; 3: F3D; 22: SD; -2: L2D.
Type: int

sec_order
¶ True if the test is a Second Order one
Type: bool

update_confidence
(new_conf, check=True)[source]¶ Sets a new confidence level for the Benford object, so as to be used to produce critical values for the tests
Parameters:  new_conf – new confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics.
 check – checks the value provided for the confidence. Defaults to True

critical_values
¶ a dictionary with the critical values for the test at hand, according to the current confidence level.
Type: dict

show_plot
(save_plot=None, save_plot_kwargs=None)[source]¶ Draws the test plot.
Parameters:  save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when save_plot is a string with the figure file path/name.

report
(high_Z='pos', show_plot=True, save_plot=None, save_plot_kwargs=None)[source]¶ Handles the report specific to the test, considering its statistics and according to the current confidence level.
Parameters:  high_Z (str, int) – chooses which Z scores are used when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n shows the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
 show_plot – calls the show_plot method, to draw the test plot
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
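To make the relation between the confidence level, limit_N and the Z scores more concrete, here is a minimal sketch. It assumes the Z statistic commonly used in Benford analysis (with a continuity correction) and a two-tailed critical value; the function names and the limit_n argument are hypothetical, and the package may compute these values differently.

```python
import math
from statistics import NormalDist


def z_score(found, expected, n, limit_n=None):
    """Z statistic for one digit, with a continuity correction."""
    if limit_n is not None:
        n = min(n, limit_n)  # cap the sample size, as limit_N is described
    diff = abs(found - expected)
    correction = 1 / (2 * n)
    if correction < diff:  # apply the correction only when it shrinks the gap
        diff -= correction
    return diff / math.sqrt(expected * (1 - expected) / n)


def critical_z(confidence):
    """Two-tailed critical Z for a confidence level given in percent."""
    return NormalDist().inv_cdf(1 - (1 - confidence / 100) / 2)


z = z_score(found=0.35, expected=math.log10(2), n=1000)
print(round(z, 2), round(critical_z(95), 2))
```

With a very large sample even tiny deviations produce high Z scores, which is why capping N can be useful.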

class
benford.benford.
Summ
(base, test)[source]¶ Bases:
pandas.core.frame.DataFrame
Gets the base object and outputs a Summation test object
Parameters:  base – The Base object with the data prepared for Analysis
 test – The test for which to compute the summation

MAD
= None¶ Mean Absolute Deviation for the test

confidence
= None¶ Confidence level to consider when setting some critical values

show_plot
(save_plot=None, save_plot_kwargs=None)[source]¶ Draws the Summation test plot
Parameters:  save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when save_plot is a string with the figure file path/name.

report
(high_diff=None, show_plot=True, save_plot=None, save_plot_kwargs=None)[source]¶ Gives the report on the Summation test.
Parameters:  high_diff – Number of records to show after ordering by the absolute differences between the found and the expected proportions
 show_plot – calls the show_plot method, to draw the Summation test plot
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

class
benford.benford.
Mantissas
(data, confidence=95, limit_N=None)[source]¶ Bases:
object
Computes and holds the mantissas of the logarithms of the records
Parameters:  data – sequence to compute mantissas from. numpy 1D array, pandas Series or pandas DataFrame column.
 confidence – confidence level for computing the critical values to compare with some statistics

data
= None¶ pandas DataFrame with the mantissas
Type: (DataFrame)

stats
¶

update_confidence
(new_conf, check=True)[source]¶ Sets a new confidence level for the Benford object, so as to be used to produce critical values for the tests
Parameters:  new_conf – new confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics.
 check – checks the value provided for the confidence. Defaults to True

report
(show_plot=True, save_plot=None, save_plot_kwargs=None)[source]¶ Displays the Mantissas test stats.
Parameters:  show_plot – shows the Ordered Mantissas plot and the Arc Test plot. Defaults to True.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

show_plot
(figsize=(12, 6), save_plot=None, save_plot_kwargs=None)[source]¶ Plots the ordered mantissas and a line with the expected inclination.
Parameters:  figsize (tuple) – figure size dimensions
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when save_plot is a string with the figure file path/name.

arc_test
(grid=True, figsize=12, save_plot=None, save_plot_kwargs=None)[source]¶ Adds two columns to the Mantissas DataFrame, holding the “X” and “Y” coordinates of each mantissa, plots them in a scatter plot and calculates the gravity center of the circle.
Parameters:  grid – show grid of the plot. Defaults to True.
 figsize (int) – size of the figure to be displayed. Since it is a square, there is no need to provide a tuple, as is usually the case with matplotlib.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when save_plot is a string with the figure file path/name.
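The idea behind the mantissas and the Arc Test can be sketched in a few lines. The mantissa is taken here as the fractional part of log10 of the absolute value, and each mantissa m is mapped to the point (cos 2πm, sin 2πm) on the unit circle. For a Benford-compliant sequence the mantissas should be close to uniform on [0, 1), so the gravity center should fall near the origin. The function names are hypothetical and the package's own computation may differ in detail.

```python
import math


def mantissas(values):
    """Fractional part of log10(|x|) for every non-zero value."""
    return [math.log10(abs(v)) % 1 for v in values if v != 0]


def arc_center(mants):
    """Map each mantissa to the unit circle and average the coordinates."""
    xs = [math.cos(2 * math.pi * m) for m in mants]
    ys = [math.sin(2 * math.pi * m) for m in mants]
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Powers of 10**0.25 cycle through mantissas 0, 0.25, 0.5 and 0.75.
sample = [10 ** (k / 4) for k in range(8)]
print(arc_center(mantissas(sample)))
```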

class
benford.benford.
Benford
(data, decimals=2, sign='all', confidence=95, mantissas=True, sec_order=False, summation=False, limit_N=None, verbose=True)[source]¶ Bases:
object
Initializes a Benford Analysis object and computes the proportions for the digits. The test DataFrames are attributes, i.e., obj.F1D is the First Digit DataFrame, obj.F2D the First Two Digits one, and so on: F3D for First Three Digits, SD for Second Digit and L2D for Last Two Digits.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a tuple with a pandas DataFrame and the name (str) of the chosen column. Values must be integers or floats.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to 95.
 mantissas (bool) – opts for also running the Mantissas test. Defaults to True.
 sec_order – runs the Second Order tests, which are the Benford’s tests performed on the differences between the ordered sample (a value minus the one before it, and so on). If the original series is Benford compliant, this new sequence should also follow Benford. The Second Order can also be called separately, through the method sec_order().
 summation – creates the Summation DataFrames for the First, First Two, and First Three Digits. The summation tests can also be called separately, through the method summation().
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 verbose – gives some information about the data and the registries used and discarded for each test.

data
¶ the raw data provided for the analysis

chosen
¶ the column of the DataFrame to be analysed or the data itself

sign
¶ which number sign(s) to include in the analysis
Type: str

confidence
¶ current confidence level

limit_N
¶ sample size to use in computations
Type: int

verbose
¶ verbose or not
Type: bool

base
¶ the Base, preprocessed object

tests
¶ keeps track of the tests the instance has
Type: list of str

update_confidence
(new_conf, tests=None)[source]¶ Sets (a) new confidence level(s) for the Benford object, so as to be used to produce critical values for the tests.
Parameters:  new_conf – new confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics.
 tests (list of str) – list of test names (strings) to have their confidence updated. If only one, provide a one-element list, like [‘F1D’]. Defaults to None, in which case it will use the instance’s tests list attribute.
Raises: ValueError
– if the tests argument is not a list or None.

all_confidences
¶ a dictionary with a confidence level for each computed test, when applicable.
Type: dict

mantissas
()[source]¶ Adds a Mantissas object to the tests, with all its statistics and plotting capabilities.

sec_order
()[source]¶ Runs the Second Order tests, which are the Benford’s tests performed on the differences between the ordered sample (a value minus the one before it, and so on). If the original series is Benford compliant, this new sequence should also follow Benford.
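The sequence analysed by the Second Order tests can be sketched as follows: sort the sample and take the difference between each value and the one before it. Dropping zero differences is an assumption of this example (a zero has no leading digit), and the function name is hypothetical.

```python
def second_order(values):
    """Differences between consecutive entries of the sorted sample."""
    ordered = sorted(values)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return [d for d in diffs if d != 0]  # assumption: zeros are discarded


print(second_order([5, 1, 9, 5, 2]))
```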

class
benford.benford.
Source
(data, decimals=2, sign='all', sec_order=False, verbose=True, inform=None)[source]¶ Bases:
pandas.core.frame.DataFrame
Prepares the data for Analysis. pandas DataFrame subclass.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
 sec_order – choice for the Second Order Test, which computes the differences between the ordered entries before running the Tests.
 verbose (bool) – tells the number of registries that are being subjected to the analysis; defaults to True.
Raises: ValueError
– if the sign arg is not in [‘all’, ‘pos’, ‘neg’].
TypeError
– if not receiving int or float as input.

verbose
= None¶ verbose or not
Type: (bool)

mantissas
(report=True, show_plot=True, figsize=(15, 8), save_plot=None, save_plot_kwargs=None)[source]¶ Calculates the mantissas, their mean and variance, and compares them with the mean and variance of a Benford’s sequence.
Parameters:  report – prints the mantissas mean, variance, skewness and kurtosis for the sequence studied, along with reference values.
 show_plot – plots the ordered mantissas and a line with the expected inclination. Defaults to True.
 figsize – tuple that sets the figure dimensions.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

first_digits
(digs, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, simple=False, bhat_coeff=False, bhat_dist=False, kl_diverg=False, ret_df=False)[source]¶ Performs the Benford First Digits test with the series of numbers provided, and populates the mapping dict for future selection of the original series.
Parameters:  digs (int) – number of first digits to consider. Must be 1 (first digit), 2 (first two digits) or 3 (first three digits).
 verbose (bool) – tells the number of registries that are being subjected to the analysis; defaults to True
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to None.
 high_Z (str, int) – chooses which Z scores are used when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n shows the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MAD (bool) – calculates the Mean Absolute Difference between the found and the expected distributions; defaults to False.
 MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
 bhat_coeff (bool) – computes the Bhattacharyya Coefficient between the found and the expected (Benford) digits distribution; defaults to False.
 bhat_dist (bool) – calculates the Bhattacharyya Distance between the found and the expected (Benford) digits distribution; defaults to False.
 kl_diverg (bool) – calculates the Kullback-Leibler Divergence between the found and the expected (Benford) digits distribution; defaults to False.
 show_plot (bool) – draws the test plot. Defaults to True.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
 ret_df – returns the test DataFrame. Defaults to False. True if run by the test function.
Returns:  DataFrame with the Expected and Found proportions, and the Z scores of
the differences

second_digit
(confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, bhat_coeff=False, bhat_dist=False, kl_diverg=False, show_plot=True, save_plot=None, save_plot_kwargs=None, simple=False, ret_df=False)[source]¶ Performs the Benford Second Digit test with the series of numbers provided.
Parameters:  verbose (bool) – tells the number of registries that are being subjected to the analysis; defaults to True
 MAD (bool) – calculates the Mean Absolute Difference between the found and the expected distributions; defaults to False.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to None.
 high_Z (str, int) – chooses which Z scores are used when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n shows the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
 bhat_coeff (bool) – computes the Bhattacharyya Coefficient between the found and the expected (Benford) digits distribution; defaults to False.
 bhat_dist (bool) – calculates the Bhattacharyya Distance between the found and the expected (Benford) digits distribution; defaults to False.
 kl_diverg (bool) – calculates the Kullback-Leibler Divergence between the found and the expected (Benford) digits distribution; defaults to False.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
 ret_df – returns the test DataFrame. Defaults to False. True if run by the test function.
Returns:  DataFrame with the Expected and Found proportions, and the Z scores of
the differences
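For reference, the expected proportions of the Second Digit test follow from the first-two-digits law: the probability of a second digit d2 is the sum of log10(1 + 1/(10·d1 + d2)) over all first digits d1 from 1 to 9. The sketch below, with a hypothetical function name, computes them.

```python
import math


def expected_second_digit():
    """Benford's expected proportion for each second digit 0..9."""
    return {
        d2: sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))
        for d2 in range(10)
    }


for digit, prop in expected_second_digit().items():
    print(digit, round(prop, 4))
```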

last_two_digits
(confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, bhat_coeff=False, bhat_dist=False, kl_diverg=False, show_plot=True, save_plot=None, save_plot_kwargs=None, simple=False, ret_df=False)[source]¶ Performs the Benford Last Two Digits test with the series of numbers provided.
Parameters:  verbose (bool) – tells the number of registries that are being subjected to the analysis; defaults to True
 MAD (bool) – calculates the Mean Absolute Difference between the found and the expected distributions; defaults to False.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to None.
 high_Z (str, int) – chooses which Z scores are used when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n shows the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
 bhat_coeff (bool) – computes the Bhattacharyya Coefficient between the found and the expected (Benford) digits distribution; defaults to False.
 bhat_dist (bool) – calculates the Bhattacharyya Distance between the found and the expected (Benford) digits distribution; defaults to False.
 kl_diverg (bool) – calculates the Kullback-Leibler Divergence between the found and the expected (Benford) digits distribution; defaults to False.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame with the Expected and Found proportions, and the Z scores of
the differences
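In the Last Two Digits test the 100 possible endings (00 to 99) are usually expected to be equally likely, that is, 1/100 each. A minimal sketch of the found proportions, assuming the input is already a list of integers (for example after scaling by the chosen decimals), could look like this; the function name is hypothetical.

```python
from collections import Counter

EXPECTED_L2D = 1 / 100  # uniform expectation for each ending 00..99


def last_two_digits(integers):
    """Found proportion of each two-digit ending among the integers."""
    endings = [abs(n) % 100 for n in integers]
    counts = Counter(endings)
    return {e: counts[e] / len(endings) for e in sorted(counts)}


found = last_two_digits([1234, 5034, 199, 700])
print({e: round(p - EXPECTED_L2D, 2) for e, p in found.items()})
```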

summation
(digs=2, top=20, show_plot=True, save_plot=None, save_plot_kwargs=None, ret_df=False)[source]¶ Performs the Summation test. In a Benford series, the sums of the entries beginning with the same digits tend to be the same.
Parameters:  digs – tells the first digits to use. 1 first; 2 first two; 3 first three. Defaults to 2.
 top – chooses how many top values to show. Defaults to 20.
 show_plot – plots the results. Defaults to True.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame with the Expected and Found proportions, and their
absolute differences
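Since the sums per leading-digit group are expected to be roughly equal, each group should hold about 1/9, 1/90 or 1/900 of the grand total for digs equal to 1, 2 or 3. The sketch below illustrates this with positive integers; the function name and the returned structure are assumptions made for the example.

```python
from collections import defaultdict


def summation(integers, digs=2):
    """Share of the total per leading group and its gap to the expectation."""
    sums = defaultdict(int)
    for n in integers:
        if n >= 10 ** (digs - 1):  # needs at least `digs` digits
            sums[int(str(n)[:digs])] += n
    total = sum(sums.values())
    expected = 1 / (9 * 10 ** (digs - 1))
    return {
        d: (s / total, abs(s / total - expected))
        for d, s in sorted(sums.items())
    }


for group, (share, gap) in summation([120, 130, 250, 900], digs=1).items():
    print(group, round(share, 3), round(gap, 3))
```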

duplicates
(top_Rep=20, inform=None)[source]¶ Performs a duplicates test and maps the duplicates count in descending order.
Parameters:  verbose (bool) – tells how many duplicated entries were found and prints the top numbers according to the top_Rep argument. Defaults to True.
 top_Rep – int or None. Chooses how many duplicated entries will be shown with the top repetitions. Defaults to 20. If None, returns all the ordered repetitions.
Returns:  DataFrame with the duplicated records and their occurrence counts,
in descending order (if verbose is False; if True, prints to terminal).
Raises: ValueError
– if the top_Rep arg is not int or None.
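A duplicates count of this kind can be sketched with collections.Counter: count every value, keep those that occur more than once, and order them by count, descending. The function below is a hypothetical stand-in that returns a list of (value, count) pairs rather than a DataFrame.

```python
from collections import Counter


def duplicates(values, top_rep=20):
    """Repeated values with their counts, most frequent first."""
    if top_rep is not None and not isinstance(top_rep, int):
        raise ValueError("top_rep must be an int or None")
    repeated = [(v, c) for v, c in Counter(values).most_common() if c > 1]
    return repeated if top_rep is None else repeated[:top_rep]


print(duplicates([3.5, 1, 3.5, 2, 1, 3.5]))
```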

class
benford.benford.
Roll_mad
(data, test, window, decimals=2, sign='all')[source]¶ Bases:
object
Applies the MAD to sequential subsets of the Series, returning another Series.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – tells which test to use. 1: First Digits; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
 window – size of the subset to be used.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.

test
= None¶ the test (F1D, SD, F2D…) used for the MAD calculation and critical values

show_plot
(figsize=(15, 8), save_plot=None, save_plot_kwargs=None)[source]¶ Shows the rolling MAD plot
Parameters:  figsize – the figure dimensions.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when save_plot is a string with the figure file path/name.
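The rolling MAD idea can be illustrated for the First Digit test: compute the Mean Absolute Deviation between found and expected proportions on each window of consecutive records. This sketch assumes positive integers and a window that slides one record at a time; both the function names and the stepping are assumptions, not the package's implementation.

```python
import math
from collections import Counter

BENFORD_F1D = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def mad_first_digit(integers):
    """Mean absolute deviation of first-digit proportions from Benford."""
    counts = Counter(int(str(n)[0]) for n in integers)
    n = len(integers)
    return sum(abs(counts[d] / n - p) for d, p in BENFORD_F1D.items()) / 9


def rolling_mad(integers, window):
    """MAD of every run of `window` consecutive records."""
    return [
        mad_first_digit(integers[i:i + window])
        for i in range(len(integers) - window + 1)
    ]


series = [2 ** k for k in range(1, 41)]  # powers of 2 are close to Benford
print([round(m, 3) for m in rolling_mad(series, window=30)])
```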

class
benford.benford.
Roll_mse
(data, test, window, decimals=2, sign='all')[source]¶ Bases:
object
Applies the MSE to sequential subsets of the Series, returning another Series.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – tells which test to use. 1: First Digits; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
 window – size of the subset to be used.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.

show_plot
(figsize=(15, 8), save_plot=None, save_plot_kwargs=None)[source]¶ Shows the rolling MSE plot
Parameters:  figsize – the figure dimensions.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when save_plot is a string with the figure file path/name.

benford.benford.
first_digits
(data, digs, decimals=2, sign='all', verbose=True, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶ Performs the Benford First Digits test on the series of numbers provided.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
 digs (int) – number of first digits to consider. Must be 1 (first digit), 2 (first two digits) or 3 (first three digits).
 verbose (bool) – tells the number of registries that are being subjected to the analysis and returns the analysis DataFrame sorted by the highest Z score down. Defaults to True.
 MAD (bool) – calculates the Mean Absolute Difference between the found and the expected distributions; defaults to False.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
 high_Z (str, int) – chooses which Z scores are used when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n shows the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
 chi_square – calculates the chi-square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’s degrees of freedom. Defaults to False. Requires confidence != None.
 KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with the Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame with the Expected and Found proportions, and the Z scores of
the differences if the confidence is not None.
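The chi_square and KS options compare the whole found distribution with the expected one. The sketch below uses the standard definitions: Pearson's chi-square computed from proportions and the sample size, and the Kolmogorov-Smirnov statistic as the largest gap between the cumulative distributions. The function names are hypothetical, and the critical values the package compares against are not reproduced here.

```python
import math


def chi_square(found, expected, n):
    """Pearson chi-square from proportions and the sample size."""
    return n * sum((f - e) ** 2 / e for f, e in zip(found, expected))


def kolmogorov_smirnov(found, expected):
    """Largest gap between the two cumulative distributions."""
    gap = cum_f = cum_e = 0.0
    for f, e in zip(found, expected):
        cum_f += f
        cum_e += e
        gap = max(gap, abs(cum_f - cum_e))
    return gap


benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
uniform = [1 / 9] * 9
# The First Digit test has 9 categories, hence 8 degrees of freedom.
print(chi_square(uniform, benford, n=1000), kolmogorov_smirnov(uniform, benford))
```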

benford.benford.
second_digit
(data, decimals=2, sign='all', verbose=True, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶ Performs the Benford Second Digits test on the series of numbers provided.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 decimals – number of decimal places to consider. Defaults to 2. If the data are integers, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place to the right, at a cost in performance.
 sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
 verbose (bool) – tells the number of registries that are being subjected to the analysis and returns the analysis DataFrame sorted by the highest Z score down. Defaults to True.
 MAD (bool) – calculates the Mean Absolute Difference between the found and the expected distributions; defaults to False.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
 high_Z (str, int) – chooses which Z scores are used when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n shows the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
 chi_square – calculates the chi-square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’ degrees of freedom. Defaults to False. Requires confidence != None.
 KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame with the Expected and Found proportions, and the Z scores of
the differences if the confidence is not None.
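For reference, the expected second-digit proportions that this test compares against follow directly from Benford's law. The sketch below is independent of the package; the helper name expected_second_digit is hypothetical.

```python
import math

def expected_second_digit():
    """Benford's expected proportion for each second digit, 0 through 9."""
    # The second digit is d whenever the first two digits are 10*k + d, k = 1..9.
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }

expected = expected_second_digit()
for digit, prob in expected.items():
    print(digit, round(prob, 4))
```

The proportions decrease slowly from digit 0 to digit 9 and add up to 1.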

benford.benford.
last_two_digits
(data, decimals=2, sign='all', verbose=True, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶ Performs the Last Two Digits test on the series of numbers provided.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
 verbose (bool) – reports the number of records being subjected to the analysis and returns the analysis DataFrame sorted by the highest Z score down. Defaults to True.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
 high_Z (str, int) – chooses which Z scores to use when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which will highlight only values higher than the expected frequencies; ‘all’ will highlight both extremes (positive and negative); an integer will use the first n entries, positive and negative, regardless of whether Z is higher than the confidence or not.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions. Defaults to False.
 MSE (bool) – calculates the Mean Square Error of the sample. Defaults to False.
 chi_square – calculates the chi-square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’ degrees of freedom. Defaults to False. Requires confidence != None.
 KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame with the Expected and Found proportions, and the Z scores of
the differences if the confidence is not None.
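In the Last Two Digits test, each of the 100 endings 00 to 99 is expected about 1/100 of the time. A minimal sketch of the digit extraction, assuming values are scaled by 10**decimals and rounded before the digits are taken; the helper name is hypothetical and the package's own preprocessing may differ.

```python
from collections import Counter

def last_two_digits(values, decimals=2):
    """Last two digits of each nonzero value after scaling by 10**decimals."""
    digits = []
    for v in values:
        if v != 0:  # zeros are left out, as with sign='all'
            scaled = abs(round(v * 10 ** decimals))
            digits.append(scaled % 100)
    return digits

data = [12.34, 5.00, 199.99, 0.07, 1234.5]
digits = last_two_digits(data)
found = Counter(digits)   # occurrences of each two-digit ending
expected = 1 / 100        # uniform expected proportion
print(digits)
```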

benford.benford.
mantissas
(data, report=True, show_plot=True, arc_test=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶ Extracts the mantissas of the logarithms of the records.
Parameters:  data – sequence to compute mantissas from: numpy 1D array, pandas Series or pandas DataFrame column.
 report – prints the mean, variance, skewness and kurtosis of the mantissas for the sequence studied, along with reference values.
 show_plot – plots the ordered mantissas and a line with the expected slope. Defaults to True.
 arc_test – draws the Arc Test plot. Defaults to True.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns: Series with the data mantissas.
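As a rough illustration, the mantissa of a record is the fractional part of its base-10 logarithm. For a Benford-compliant set the mantissas are expected to be close to uniform on [0, 1), so the mean should be near 0.5 and the variance near 1/12. The sketch below uses only the standard library; the function body is an assumption, not the package's code.

```python
import math

def mantissas(values):
    """Fractional part of log10(|x|) for every nonzero x."""
    result = []
    for x in values:
        if x != 0:
            log = math.log10(abs(x))
            result.append(log - math.floor(log))
    return result

# Powers of 2 are a classic example of a Benford-like sequence.
data = [2 ** k for k in range(1, 201)]
m = mantissas(data)
mean = sum(m) / len(m)
variance = sum((x - mean) ** 2 for x in m) / len(m)
print(round(mean, 3), round(variance, 3))
```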

benford.benford.
summation
(data, digs=2, decimals=2, sign='all', top=20, verbose=True, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶ Performs the Summation test. In a Benford series, the sums of the entries beginning with the same digits tend to be the same. Works only with the First Digits (1, 2 or 3) tests.
Parameters:  digs – tells the first digits to use: 1 first; 2 first two; 3 first three. Defaults to 2.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 top – chooses how many top values to show. Defaults to 20.
 show_plot – plots the results. Defaults to True.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame with the Summation test, whether sorted in descending order
(if verbose == True) or not.
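The idea behind the Summation test can be sketched without the package: group the records by their first digits, add up each group, and compare each group's share of the total with the equal share expected (1/9 for one digit, 1/90 for two). The helper names below are hypothetical, and summing absolute values is an assumption.

```python
from collections import defaultdict

def first_digits(x, digs):
    """First `digs` significant digits of a nonzero number, as an int."""
    text = f"{abs(x):.15e}"              # e.g. '1.234500000000000e+03'
    return int(text.replace(".", "")[:digs])

def summation(values, digs=1):
    """Share of the total sum held by each group of leading digits."""
    sums = defaultdict(float)
    for v in values:
        if v != 0:
            sums[first_digits(v, digs)] += abs(v)
    total = sum(sums.values())
    return {d: s / total for d, s in sorted(sums.items())}

data = [120.0, 130.0, 250.0, 500.0]
shares = summation(data, digs=1)
print(shares)
```

Here the records starting with 5 hold half of the total, well above the 1/9 that an even split would give.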

benford.benford.
mad
(data, test, decimals=2, sign='all', verbose=False)[source]¶ Calculates the Mean Absolute Deviation of the Series
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – informs which base test to use for the mad.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
Returns: the Mean Absolute Deviation of the Series
Return type: float
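A minimal sketch of the computation for the First Digit test, assuming the MAD is the average of the absolute differences between found and expected proportions over the nine digits. It does not use the package, and the helper names are hypothetical.

```python
import math
from collections import Counter

def first_digit(x):
    """First significant digit of a nonzero number."""
    return int(f"{abs(x):e}"[0])

def mad_first_digit(values):
    """Mean of |found - expected| over the digits 1 through 9."""
    values = [v for v in values if v != 0]
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    deviations = [
        abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)
    ]
    return sum(deviations) / len(deviations)

data = [2 ** k for k in range(1, 301)]
result = mad_first_digit(data)
print(round(result, 4))
```

A value close to zero indicates that the found proportions track the expected ones closely.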

benford.benford.
mse
(data, test, decimals=2, sign='all', verbose=False)[source]¶ Calculates the Mean Squared Error of the Series
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – informs which base test to use for the mse.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
Returns: the Mean Squared Error of the Series
Return type: float

benford.benford.
bhattacharyya_distance
(data, test, decimals, sign='all', verbose=False)[source]¶ Computes the Bhattacharyya Distance between the Found and the Expected (Benford) digits distributions, according to the test chosen (First, Second, First Two…)
Parameters:  data (ndarray, Series) – sequence to be evaluated, with values being integers or floats.
 test (int, str) – informs which base test to be used.
 decimals (int) – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign (str, optional) – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to “all”.
Returns: the Bhattacharyya Distance between the distributions
Return type: float
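For two discrete distributions, the Bhattacharyya Distance is the negative natural logarithm of the sum of the square roots of the paired probabilities. A standalone sketch under that textbook definition; the function takes already computed proportions, unlike the package function above, which takes raw data.

```python
import math

def bhattacharyya(found, expected):
    """-ln of the Bhattacharyya coefficient of two discrete distributions."""
    coefficient = sum(math.sqrt(f * e) for f, e in zip(found, expected))
    return -math.log(coefficient)

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
uniform = [1 / 9] * 9

same = bhattacharyya(benford, benford)       # identical distributions
different = bhattacharyya(uniform, benford)  # uniform first digits
print(same, different)
```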

benford.benford.
kullback_leibler_divergence
(data, test, decimals, sign='all', verbose=False)[source]¶ Computes the Kullback-Leibler Divergence between the Found and the Expected (Benford) digits distributions, according to the test chosen (First, Second, First Two…).
Parameters:  data (ndarray, Series) – sequence to be evaluated, with values being integers or floats.
 test (int, str) – informs which base test to be used.
 decimals (int) – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign (str, optional) – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to “all”.
Returns: the Kullback-Leibler Divergence between the distributions
Return type: float
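A standalone sketch of the divergence for two discrete distributions. The direction used here (Found relative to Expected) is an assumption; the documentation above does not say which one the package uses. The function takes already computed proportions.

```python
import math

def kl_divergence(found, expected):
    """Sum of f * ln(f / e); terms with f == 0 contribute nothing."""
    return sum(
        f * math.log(f / e) for f, e in zip(found, expected) if f > 0
    )

benford = [math.log10(1 + 1 / d) for d in range(1, 10)]
uniform = [1 / 9] * 9

same = kl_divergence(benford, benford)
different = kl_divergence(uniform, benford)
print(same, different)
```

Unlike the Bhattacharyya Distance, this measure is not symmetric, so swapping the arguments generally changes the result.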

benford.benford.
mad_summ
(data, test, decimals=2, sign='all', verbose=False)[source]¶ Calculate the Mean Absolute Deviation of the Summation Test
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – informs which base test to use for the summation mad.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
Returns: the Mean Absolute Deviation of the Summation Test
Return type: float

benford.benford.
rolling_mad
(data, test, window, decimals=2, sign='all', show_plot=False, save_plot=None, save_plot_kwargs=None)[source]¶ Applies the MAD to sequential subsets of the records.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – tells which test to use. 1: First Digits; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
 window – size of the subset to be used.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns: Series with sequentially computed MADs.
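The rolling computation can be pictured as sliding a fixed-size window over the records and computing the MAD of each subset. The sketch below does this for the First Digit test with a step of one record; the step size and the helper names are assumptions, not the package's implementation.

```python
import math
from collections import Counter

EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit(x):
    """First significant digit of a nonzero number."""
    return int(f"{abs(x):e}"[0])

def mad(window):
    """Mean absolute deviation of first-digit proportions in one window."""
    counts = Counter(first_digit(v) for v in window)
    n = len(window)
    return sum(abs(counts[d] / n - EXPECTED[d - 1]) for d in range(1, 10)) / 9

def rolling_mad(values, window):
    """MAD of every run of `window` consecutive records."""
    return [mad(values[i:i + window]) for i in range(len(values) - window + 1)]

data = [1.07 ** k for k in range(1, 401)]
result = rolling_mad(data, window=100)
print(len(result), round(max(result), 4))
```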

benford.benford.
rolling_mse
(data, test, window, decimals=2, sign='all', show_plot=False, save_plot=None, save_plot_kwargs=None)[source]¶ Applies the MSE to sequential subsets of the records.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – tells which test to use. 1: First Digits; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
 window – size of the subset to be used.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns: Series with sequentially computed MSEs.

benford.benford.
duplicates
(data, top_Rep=20, verbose=True, inform=None)[source]¶ Performs a duplicates test and maps the duplicates count in descending order.
Parameters:  data – sequence to take the duplicates from. pandas Series or numpy ndarray.
 verbose (bool) – tells how many duplicated entries were found and prints the top numbers according to the top_Rep argument. Defaults to True.
 top_Rep – chooses how many duplicated entries will be shown, starting with the most repeated. int or None. Defaults to 20. If None, returns all the ordered repetitions.
Returns: DataFrame with the duplicated records and their respective counts
Raises: ValueError
– if the top_Rep arg is not int or None.
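The behaviour described above can be approximated with the standard library. In this sketch the function returns a list of (value, count) pairs rather than a DataFrame, and the name top_rep is a hypothetical stand-in for top_Rep.

```python
from collections import Counter

def duplicates(values, top_rep=20):
    """Values occurring more than once, most repeated first."""
    if top_rep is not None and not isinstance(top_rep, int):
        raise ValueError("top_rep must be an int or None")
    counts = Counter(values)
    repeated = [(v, c) for v, c in counts.most_common() if c > 1]
    return repeated if top_rep is None else repeated[:top_rep]

data = [10.5, 3.2, 10.5, 7.0, 3.2, 10.5, 8.1]
top = duplicates(data, top_rep=2)
print(top)
```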

benford.benford.
second_order
(data, test, decimals=2, sign='all', verbose=True, MAD=False, confidence=None, high_Z='pos', limit_N=None, MSE=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶ Performs the chosen test on the differences between consecutive values of the ordered sequence, hence Second Order.
Parameters:  data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
 test – the test to be performed: 1 or ‘F1D’: First Digit; 2 or ‘F2D’: First Two Digits; 3 or ‘F3D’: First Three Digits; 22 or ‘SD’: Second Digits; -2 or ‘L2D’: Last Two Digits.
 decimals – number of decimal places to consider. Defaults to 2. If integers, set to 0. If set to infer, it will remove the zeros and consider up to the fifth decimal place to the right, at some cost in performance.
 sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
 verbose (bool) – reports the number of records being subjected to the analysis and returns the analysis DataFrame sorted by the highest Z score down. Defaults to True.
 MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions. Defaults to False.
 confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
 high_Z (str, int) – chooses which Z scores to use when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which will highlight only values higher than the expected frequencies; ‘all’ will highlight both extremes (positive and negative); an integer will use the first n entries, positive and negative, regardless of whether Z is higher than the confidence or not.
 limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
 MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
 chi_square – calculates the chi-square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’ degrees of freedom. Defaults to False. Requires confidence != None.
 KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
 show_plot (bool) – draws the test plot.
 save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
 save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.
Returns:  DataFrame of the test chosen, but applied to Second Order pre-processed data.
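A minimal sketch of the Second Order pre-processing, assuming it consists of sorting the records, taking the differences between neighbours, and discarding zero differences before the chosen digit test is applied. The function below returns a plain list and is not the package's implementation.

```python
def second_order(values):
    """Nonzero differences between consecutive values of the sorted data."""
    ordered = sorted(values)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return [d for d in diffs if d != 0]

data = [40, 10, 25, 25, 70, 12]
diffs = second_order(data)
print(diffs)
```

The resulting differences would then be fed to the digit test in place of the original records.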
benford.expected module¶

class
benford.expected.
First
(digs, plot=True, save_plot=None, save_plot_kwargs=None)[source]¶ Bases:
pandas.core.frame.DataFrame
Holds the expected probabilities of the First, First Two, or First Three digits according to Benford’s distribution.
Parameters:  digs – 1, 2 or 3; tells which of the first digits to consider: 1 for the First Digit, 2 for the First Two Digits and 3 for the First Three Digits.
 plot – option to plot a bar chart of the Expected proportions. Defaults to True.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when plot=True.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.
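The expected proportions held by this class come from Benford's formula, log10(1 + 1/d), evaluated over every possible leading digit combination d. A standalone sketch that returns a plain dict instead of a DataFrame; the function name is hypothetical.

```python
import math

def expected_first(digs):
    """Benford's expected proportion for each combination of first `digs` digits."""
    start, stop = 10 ** (digs - 1), 10 ** digs
    return {d: math.log10(1 + 1 / d) for d in range(start, stop)}

first_one = expected_first(1)   # digits 1 through 9
first_two = expected_first(2)   # digits 10 through 99
print(round(first_one[1], 5), round(first_two[10], 5))
```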

class
benford.expected.
Second
(plot=True, save_plot=None, save_plot_kwargs=None)[source]¶ Bases:
pandas.core.frame.DataFrame
Holds the expected probabilities of the Second Digits according to Benford’s distribution.
Parameters:  plot – option to plot a bar chart of the Expected proportions. Defaults to True.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when plot=True.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.

class
benford.expected.
LastTwo
(num=False, plot=True, save_plot=None, save_plot_kwargs=None)[source]¶ Bases:
pandas.core.frame.DataFrame
Holds the expected probabilities of the Last Two Digits according to Benford’s distribution.
Parameters:  plot – option to plot a bar chart of the Expected proportions. Defaults to True.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when plot=True.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.
benford.stats module¶

benford.stats.
Z_score
(frame, N)[source]¶ Computes the Z statistics for the proportions studied
Parameters:  frame – DataFrame with the expected proportions and the already calculated Absolute Differences between the found and expected proportions
 N – sample size
Returns: Series of computed Z scores
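A sketch of a Z statistic for a single proportion, as commonly used in Benford analysis: the absolute difference, reduced by a continuity correction of 1/(2N) when that correction is smaller than the difference, divided by the standard error of the expected proportion. Whether the package applies the correction in exactly this way is an assumption. The function works on scalars rather than on a DataFrame.

```python
import math

def z_score(found, expected, n):
    """Z statistic of one found proportion against its expected value."""
    abs_dif = abs(found - expected)
    correction = 1 / (2 * n)
    if correction < abs_dif:      # continuity correction, applied only if smaller
        abs_dif -= correction
    return abs_dif / math.sqrt(expected * (1 - expected) / n)

z = z_score(found=0.33, expected=0.30103, n=1000)
print(round(z, 2))
```

Because N sits in the denominator's square root, very large samples inflate Z, which is the motivation for the limit_N parameter in the test functions.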

benford.stats.
chi_sq
(frame, ddf, confidence, verbose=True)[source]¶ Computes the chi-square statistic of the found distributions and compares it with the critical chi-square of such a sample, according to the confidence level chosen and the degrees of freedom (len(sample) - 1).
Parameters:  frame – DataFrame with Found, Expected and their difference columns.
 ddf – Degrees of freedom to consider.
 confidence – Confidence level to look up critical value.
 verbose – prints the chi-square result and compares it to the critical chi-square for the sample. Defaults to True.
Returns:  The computed chi-square statistic and the critical chi-square (according to the degrees of freedom and confidence level), for comparison. None if confidence is None.
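A standalone sketch of the statistic, assuming the usual goodness-of-fit form on proportions, N * sum((found - expected)^2 / expected). The found proportions are made-up illustration data, and the critical value is the tabulated chi-square for 8 degrees of freedom at 95% confidence; how the package looks up its critical values is not shown here.

```python
import math

def chi_square(found, expected, n):
    """Chi-square statistic computed from proportions and sample size n."""
    return n * sum((f - e) ** 2 / e for f, e in zip(found, expected))

expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
found = [0.32, 0.17, 0.12, 0.10, 0.08, 0.07, 0.05, 0.05, 0.04]  # illustrative

statistic = chi_square(found, expected, n=1000)
critical = 15.507  # 9 digits - 1 = 8 degrees of freedom, 95% confidence
print(round(statistic, 2), statistic > critical)
```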

benford.stats.
chi_sq_2
(frame)[source]¶ Computes the chi-square statistic of the found distributions
Parameters: frame – DataFrame with Found, Expected and their difference columns.
Returns: The computed chi-square statistic

benford.stats.
kolmogorov_smirnov
(frame, confidence, N, verbose=True)[source]¶ Computes the Kolmogorov-Smirnov test of the found distributions and compares it with the critical value for such a sample, according to the confidence level chosen.
Parameters:  frame – DataFrame with Found and Expected distributions.
 confidence – Confidence level to look up critical value.
 N – Sample size
 verbose – prints the KS result and the critical value for the sample. Defaults to True.
Returns:  The supremum, which is the greatest absolute difference between the Found and the expected proportions, and the Kolmogorov-Smirnov critical value according to the confidence level, for comparison
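A standalone sketch of the statistic, following the KS parameter description elsewhere in this reference ("comparing the cumulative distribution of the sample with Benford's"): accumulate both distributions and take the largest absolute gap. The found proportions are made-up illustration data, and 1.36 / sqrt(N) is the commonly tabulated large-sample critical value at 95%; the package's own critical values may differ.

```python
import math
from itertools import accumulate

def kolmogorov_smirnov(found, expected):
    """Largest absolute gap between the two cumulative distributions."""
    cum_found = accumulate(found)
    cum_expected = accumulate(expected)
    return max(abs(f - e) for f, e in zip(cum_found, cum_expected))

expected = [math.log10(1 + 1 / d) for d in range(1, 10)]
found = [0.32, 0.17, 0.12, 0.10, 0.08, 0.07, 0.05, 0.05, 0.04]  # illustrative

supremum = kolmogorov_smirnov(found, expected)
critical = 1.36 / math.sqrt(1000)  # asymptotic 95% value for N = 1000
print(round(supremum, 4), supremum > critical)
```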

benford.stats.
kolmogorov_smirnov_2
(frame)[source]¶ Computes the Kolmogorov-Smirnov test of the found distributions
Parameters: frame – DataFrame with Found and Expected distributions.
Returns: The supremum, which is the greatest absolute difference between the Found and the expected proportions

benford.stats.
mad
(frame, test, verbose=True)[source]¶ Computes the Mean Absolute Deviation (MAD) between the found and the expected proportions.
Parameters:  frame – DataFrame with the Absolute Deviations already calculated.
 test – Test to compute the MAD from (F1D, SD, F2D…)
 verbose – prints the MAD result and compares to limit values of conformity. Defaults to True.
Returns:  The Mean of the Absolute Deviations between the found and expected
proportions.

benford.stats.
mse
(frame, verbose=True)[source]¶ Computes the test’s Mean Square Error
Parameters:  frame – DataFrame with the already computed Absolute Deviations between the found and expected proportions
 verbose – Prints the MSE. Defaults to True.
Returns: Mean of the squared differences between the found and the expected proportions.
benford.viz module¶

benford.viz.
plot_expected
(df, digs, save_plot=None, save_plot_kwargs=None)[source]¶ Plots the Expected Benford Distributions
Parameters:  df – DataFrame with the Expected Proportions
 digs – Test’s digit
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.
plot_digs
(df, x, y_Exp, y_Found, N, figsize, conf_Z, text_x=False, save_plot=None, save_plot_kwargs=None)[source]¶ Plots the digits tests results
Parameters:  df – DataFrame with the data to be plotted
 x – sequence to be used in the x axis
 y_Exp – sequence of the expected proportions to be used in the y axis (line)
 y_Found – sequence of the found proportions to be used in the y axis (bars)
 N – length of sequence, to be used when plotting the confidence levels
 figsize – tuple to state the size of the plot figure
 conf_Z – Confidence level
 save_pic – file path to save figure
 text_x – forces all x tick labels to be shown. Defaults to False.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.
plot_sum
(df, figsize, li, text_x=False, save_plot=None, save_plot_kwargs=None)[source]¶ Plots the summation test results
Parameters:  df – DataFrame with the data to be plotted
 figsize – sets the dimensions of the plot figure
 li – value with which to draw the horizontal line
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.
plot_ordered_mantissas
(col, figsize=(12, 12), save_plot=None, save_plot_kwargs=None)[source]¶ Plots the ordered mantissas and compares them to the expected straight line that should be formed in a Benford-compliant set.
Parameters:  col (Series) – column of mantissas to plot.
 figsize (tuple) – sets the dimensions of the plot figure.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.
plot_mantissa_arc_test
(df, gravity_center, grid=True, figsize=12, save_plot=None, save_plot_kwargs=None)[source]¶ Draws the Mantissa Arc Test after computing X and Y circular coordinates for every mantissa and the center of gravity for the set
Parameters:  df (DataFrame) – pandas DataFrame with the mantissas and the X and Y coordinates.
 gravity_center (tuple) – coordinates for plotting the gravity center
 grid (bool) – show grid. Defaults to True.
 figsize (int) – figure dimensions. No need to be a tuple, since the figure is a square.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.
plot_roll_mse
(roll_series, figsize, save_plot=None, save_plot_kwargs=None)[source]¶ Shows the rolling MSE plot
Parameters:  roll_series – pd.Series resulting from rolling mse.
 figsize – the figure dimensions.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.
plot_roll_mad
(roll_mad, figsize, save_plot=None, save_plot_kwargs=None)[source]¶ Shows the rolling MAD plot
Parameters:  roll_mad – pd.Series resulting from rolling mad.
 figsize – the figure dimensions.
 save_plot – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
 save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html