benford package

benford.benford module

class benford.benford.Base(data, decimals, sign='all', sec_order=False)

Bases: pandas.core.frame.DataFrame

Internalizes and prepares the data for analysis.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
decimals – number of decimal places to consider. Defaults to 2. For integer data, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place, at the cost of performance.
sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only the negative entries; ‘all’: all entries except zeros. Defaults to ‘all’.

Raises:

TypeError – if not receiving int or float as input.
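A minimal sketch of what the decimals and sign options amount to, assuming the preparation keeps the requested sign, drops zeros and shifts the decimal point by decimals places so that digits can later be read from integers. prepare is a hypothetical helper, not part of the benford API; the real Base class works on pandas objects and also handles ‘infer’, which this sketch leaves out.

```python
def prepare(data, decimals=2, sign="all"):
    """Hypothetical helper: filter by sign, drop zeros, scale values to integers."""
    if sign not in ("all", "pos", "neg"):
        raise ValueError("sign must be 'all', 'pos' or 'neg'")
    prepared = []
    for x in data:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(x, bool) or not isinstance(x, (int, float)):
            raise TypeError("values must be int or float")
        if x == 0 or (sign == "pos" and x < 0) or (sign == "neg" and x > 0):
            continue  # zeros are always discarded
        # shift the decimal point; round() absorbs float error such as 0.29 * 100
        scaled = int(round(abs(x) * 10 ** decimals))
        if scaled:  # values too small for the chosen decimals carry no digits
            prepared.append(scaled)
    return prepared
```

For example, prepare([1.23, -4.5, 0, 0.29]) keeps 123, 450 and 29; with sign="neg" only 450 survives.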

class benford.benford.Test(base, digs, confidence, limit_N=None, sec_order=False)

Bases: pandas.core.frame.DataFrame

Reduces the original number sequence to a DataFrame with the occurrences of the chosen digits, adding other computed columns.

Parameters:

base – The Base object with the data prepared for analysis.
digs – Tells which test to perform: 1: first digit; 2: first two digits; 3: first three digits; 22: second digit; -2: last two digits.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
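Each digs code corresponds to a set of expected proportions under Benford’s Law: log10(1 + 1/n) for the first one, two or three digits, a sum of such terms over every possible first digit for the second digit, and a flat 1/100 for the last two digits. The hypothetical expected_proportions helper sketches these in plain Python; the package computes its own tables internally.

```python
import math

def expected_proportions(digs):
    """Expected Benford proportions keyed by digit value (hypothetical helper)."""
    if digs in (1, 2, 3):
        # first `digs` digits: P(n) = log10(1 + 1/n) for n in [10**(digs-1), 10**digs)
        return {n: math.log10(1 + 1 / n) for n in range(10 ** (digs - 1), 10 ** digs)}
    if digs == 22:
        # second digit d: sum the probabilities of every first digit k followed by d
        return {d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
                for d in range(10)}
    if digs == -2:
        # last two digits 00-99 are expected to be uniform
        return {n: 1 / 100 for n in range(100)}
    raise ValueError("digs must be 1, 2, 3, 22 or -2")
```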

N: Number of records in the sample to consider in computations

ddf: Degrees of Freedom to look up for the critical chi-square value

chi_square: Chi-square statistic for the given test

KS: Kolmogorov-Smirnov statistic for the given test

MAD: Mean Absolute Deviation for the given test

confidence: Confidence level to consider when setting some critical values

digs

numerical representation of the test at hand. 1: F1D; 2: F2D; 3: F3D; 22: SD; -2: L2D.

Type:	int

sec_order

True if the test is a Second Order one

Type:	bool
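To make the N, ddf, chi_square, MAD and KS attributes concrete, here is a plain-Python sketch assuming the usual textbook definitions: the chi-square sums (observed - expected)²/expected over the digit counts, ddf is the number of digit groups minus one, MAD averages the absolute differences between proportions, and KS is the largest gap between the cumulative distributions. digit_test_stats is hypothetical, and the library’s exact computations may differ in details.

```python
def digit_test_stats(counts, expected):
    """Hypothetical sketch of N, ddf, chi-square, MAD and KS for a digit test.

    counts maps each digit group to its observed count;
    expected maps each digit group to its Benford proportion.
    """
    keys = sorted(expected)
    N = sum(counts.get(k, 0) for k in keys)
    found = [counts.get(k, 0) / N for k in keys]
    exp = [expected[k] for k in keys]
    # chi-square on counts: (N*f - N*e)**2 / (N*e) simplifies to N * (f - e)**2 / e
    chi_square = sum(N * (f - e) ** 2 / e for f, e in zip(found, exp))
    ddf = len(keys) - 1
    mad = sum(abs(f - e) for f, e in zip(found, exp)) / len(keys)
    # KS: largest gap between the cumulative found and expected proportions
    cum_f = cum_e = ks = 0.0
    for f, e in zip(found, exp):
        cum_f += f
        cum_e += e
        ks = max(ks, abs(cum_f - cum_e))
    return {"N": N, "ddf": ddf, "chi_square": chi_square, "MAD": mad, "KS": ks}
```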

update_confidence(new_conf, check=True)

Sets a new confidence level, to be used when producing critical values for the tests.

Parameters:

new_conf – new confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics.
check – checks the value provided for the confidence. Defaults to True.

critical_values

a dictionary with the critical values for the test at hand, according to the current confidence level.

Type:	dict
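One way to derive critical values from a confidence level, assuming a two-sided standard normal critical value for the Z scores and the asymptotic one-sample Kolmogorov-Smirnov approximation sqrt(-0.5 ln(alpha/2)) / sqrt(N). critical_values here is a hypothetical standard-library sketch; the package may rely on precomputed tables, so its numbers can differ slightly.

```python
import math
from statistics import NormalDist

def critical_values(confidence, N):
    """Hypothetical sketch: Z and KS critical values for a confidence level given in %."""
    alpha = 1 - confidence / 100
    # two-sided critical value of the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # asymptotic one-sample Kolmogorov-Smirnov critical value, c(alpha) / sqrt(N)
    ks_crit = math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(N)
    return {"Z": z_crit, "KS": ks_crit}
```

At 95% confidence this gives the familiar Z of about 1.96 and a KS constant of about 1.358 before dividing by sqrt(N).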

show_plot(save_plot=None, save_plot_kwargs=None)

Draws the test plot.

Parameters:

save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when save_plot is a string with the figure file path/name.

report(high_Z='pos', show_plot=True, save_plot=None, save_plot_kwargs=None)

Produces the report specific to the test, considering its statistics and the current confidence level.

Parameters:

high_Z (str or int) – chooses which Z scores to use when displaying results, according to the chosen confidence level. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
show_plot – calls the show_plot method to draw the test plot.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.
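The Z scores and the high_Z options can be sketched as follows, assuming Nigrini-style Z statistics with a 1/(2N) continuity correction that is applied only when it is smaller than the absolute difference. z_scores and select are hypothetical helpers, and the fixed crit=1.96 merely stands in for the critical value at 95% confidence.

```python
import math

def z_scores(found, expected, N):
    """Hypothetical sketch of Z statistics with a 1/(2N) continuity correction."""
    z = {}
    for k, e in expected.items():
        diff = abs(found.get(k, 0.0) - e)
        correction = 1 / (2 * N)
        if correction < diff:  # only apply the correction when it is smaller than the gap
            diff -= correction
        z[k] = diff / math.sqrt(e * (1 - e) / N)
    return z

def select(z, found, expected, high_Z="pos", crit=1.96):
    """Mimics the documented high_Z options on a dict of Z scores."""
    ranked = sorted(z, key=z.get, reverse=True)
    if isinstance(high_Z, int):
        return ranked[:high_Z]  # top n, whether or not Z exceeds crit
    if high_Z == "all":
        return [k for k in ranked if z[k] > crit]
    # 'pos': significant groups that were found MORE often than expected
    return [k for k in ranked if z[k] > crit and found.get(k, 0.0) > expected[k]]
```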

class benford.benford.Summ(base, test)

Bases: pandas.core.frame.DataFrame

Takes the Base object and outputs a Summation test object.

Parameters:

base – The Base object with the data prepared for analysis.
test – The test for which to compute the summation.

MAD = None: Mean Absolute Deviation for the test

confidence = None: Confidence level to consider when setting some critical values

show_plot(save_plot=None, save_plot_kwargs=None)

Draws the Summation test plot.

Parameters:

save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when save_plot is a string with the figure file path/name.

report(high_diff=None, show_plot=True, save_plot=None, save_plot_kwargs=None)

Gives the report on the Summation test.

Parameters:

high_diff – number of records to show after ordering by the absolute differences between the found and the expected proportions.
show_plot – calls the show_plot method to draw the Summation test plot.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.
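A minimal sketch of the Summation test, assuming it groups the amounts by their leading digits, divides each group’s sum by the grand total and compares the result with the flat expectation of 1/(number of groups). summation_table is hypothetical and reads digits from the integer part of each value, whereas the package first applies the decimals setting.

```python
from collections import defaultdict

def summation_table(values, digs=2, high_diff=None):
    """Hypothetical sketch of the Summation test on the integer part of each value."""
    sums = defaultdict(float)
    for v in values:
        v = abs(v)
        if v >= 10 ** (digs - 1):  # needs at least `digs` integer digits
            sums[int(str(int(v))[:digs])] += v
    total = sum(sums.values())
    groups = range(10 ** (digs - 1), 10 ** digs)
    expected = 1 / len(groups)  # every leading-digit group should add up to about the same amount
    rows = []
    for g in groups:
        found = sums[g] / total
        rows.append((g, found, expected, abs(found - expected)))
    rows.sort(key=lambda r: r[3], reverse=True)  # largest absolute differences first
    return rows if high_diff is None else rows[:high_diff]
```

Each row is (group, found, expected, absolute difference), so high_diff plays the role described above of limiting the report to the largest gaps.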

class benford.benford.Mantissas(data, confidence=95, limit_N=None)

Bases: object

Computes and holds the mantissas of the logarithms of the records.

Parameters:

data – sequence to compute mantissas from: a numpy 1D array, a pandas Series or a pandas DataFrame column.
confidence – confidence level for computing the critical values to compare with some statistics.

data = None

pandas DataFrame with the mantissas

Type:	(DataFrame)

stats
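The mantissa statistics can be sketched as follows. The mantissa is taken as the fractional part of log10 of each absolute value; for Benford data it should be roughly uniform on [0, 1), which is where the reference values (mean 0.5, variance 1/12, skewness 0, excess kurtosis -1.2) come from. mantissa_stats is a hypothetical helper that uses population moments.

```python
import math

def mantissa_stats(values):
    """Hypothetical sketch: moments of the log10 mantissas vs. a uniform reference."""
    m = [math.log10(abs(v)) % 1 for v in values if v != 0]
    n = len(m)
    mean = sum(m) / n
    var = sum((x - mean) ** 2 for x in m) / n
    skew = sum((x - mean) ** 3 for x in m) / n / var ** 1.5
    kurt = sum((x - mean) ** 4 for x in m) / n / var ** 2 - 3  # excess kurtosis
    # a uniform distribution on [0, 1) has these moments
    reference = {"mean": 0.5, "var": 1 / 12, "skew": 0.0, "kurt": -1.2}
    return {"mean": mean, "var": var, "skew": skew, "kurt": kurt}, reference
```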

update_confidence(new_conf, check=True)

Sets a new confidence level, to be used when producing critical values for the tests.

Parameters:

new_conf – new confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics.
check – checks the value provided for the confidence. Defaults to True.

report(show_plot=True, save_plot=None, save_plot_kwargs=None)

Displays the Mantissas test stats.

Parameters:

show_plot – shows the Ordered Mantissas plot and the Arc Test plot. Defaults to True.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.

show_plot(figsize=(12, 6), save_plot=None, save_plot_kwargs=None)

Plots the ordered mantissas and a line with the expected inclination.

Parameters:

figsize (tuple) – figure size dimensions
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when save_plot is a string with the figure file path/name.

arc_test(grid=True, figsize=12, save_plot=None, save_plot_kwargs=None)

Adds two columns to the Mantissas DataFrame with the “X” and “Y” coordinates of each mantissa on the unit circle, draws them in a scatter plot and calculates the gravity center of the circle.

Parameters:

grid – shows the plot grid. Defaults to True.
figsize (int) – size of the figure to be displayed. Since the figure is square, there is no need to provide a tuple, as is usually the case with matplotlib.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.
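The arc test places each mantissa m on the unit circle at (cos 2πm, sin 2πm); the gravity center is the mean of those points and, for Benford-like data, should sit close to the origin. The hypothetical arc_center function below returns the coordinates, the center and its distance from the origin, leaving out the plotting the package adds.

```python
import math

def arc_center(mantissas):
    """Hypothetical sketch of the arc test: unit-circle points and their gravity center."""
    xs = [math.cos(2 * math.pi * m) for m in mantissas]
    ys = [math.sin(2 * math.pi * m) for m in mantissas]
    n = len(mantissas)
    center = (sum(xs) / n, sum(ys) / n)
    # evenly spread mantissas (Benford-like data) pull the center toward (0, 0)
    distance = math.hypot(center[0], center[1])
    return xs, ys, center, distance
```

Mantissas concentrated on one value, by contrast, push the center out to the circle itself, at distance 1.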

class benford.benford.Benford(data, decimals=2, sign='all', confidence=95, mantissas=True, sec_order=False, summation=False, limit_N=None, verbose=True)

Bases: object

Initializes a Benford Analysis object and computes the proportions for the digits. The test DataFrames are attributes: obj.F1D is the First Digit DataFrame, obj.F2D the First Two Digits one, obj.F3D the First Three Digits one, obj.SD the Second Digit one and obj.L2D the Last Two Digits one.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a tuple with a pandas DataFrame and the name (str) of the chosen column. Values must be integers or floats.
decimals – number of decimal places to consider. Defaults to 2. For integer data, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place, at the cost of performance.
sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only the negative entries; ‘all’: all entries except zeros. Defaults to ‘all’.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to 95.
mantissas (bool) – opts for also running the Mantissas test. Defaults to True.
sec_order – runs the Second Order tests, which are Benford’s tests performed on the differences between the entries of the ordered sample (each value minus the one before it). If the original series is Benford-compliant, this new sequence should also follow Benford. The Second Order tests can also be called separately, through the method sec_order().
summation – creates the Summation DataFrames for the First, First Two, and First Three Digits. The summation tests can also be called separately, through the method summation().
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
verbose – gives some information about the data and the records used and discarded for each test.
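The Second Order transformation itself is short. Assuming it sorts the values, takes consecutive differences and discards the zero differences produced by ties (zeros have no first digit), a plain-Python sketch with the hypothetical name second_order_diffs is:

```python
def second_order_diffs(values):
    """Hypothetical sketch: differences between consecutive sorted values."""
    ordered = sorted(values)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    # ties produce zero differences, which have no first digit, so drop them
    return [d for d in diffs if d != 0]
```

The resulting sequence would then go through the same digit tests as the original data.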

data: the raw data provided for the analysis

chosen: the column of the DataFrame to be analysed or the data itself

sign

which number sign(s) to include in the analysis

Type:	str

confidence: current confidence level

limit_N

sample size to use in computations

Type:	int

verbose

whether to print verbose output

Type:	bool

base: the Base, pre-processed object

tests

keeps track of the tests the instance has

Type:	list of str

update_confidence(new_conf, tests=None)

Sets (a) new confidence level(s) for the Benford object, to be used when producing critical values for the tests.

Parameters:

new_conf – new confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics.
tests (list of str) – list of test names (strings) whose confidence should be updated. For a single test, provide a one-element list, like [‘F1D’]. Defaults to None, in which case the instance’s .tests list attribute is used.

Raises:

ValueError – if the tests argument is not a list or None.

all_confidences

a dictionary with a confidence level for each computed test, when applicable.

Type:	dict

mantissas(): Adds a Mantissas object to the tests, with all its statistics and plotting capabilities.

sec_order(): Runs the Second Order tests, which are Benford’s tests performed on the differences between the entries of the ordered sample (each value minus the one before it). If the original series is Benford-compliant, this new sequence should also follow Benford.

summation(): Creates the Summation test DataFrames from the Base object.

class benford.benford.Source(data, decimals=2, sign='all', sec_order=False, verbose=True, inform=None)

Bases: pandas.core.frame.DataFrame

Prepares the data for analysis. pandas DataFrame subclass.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
decimals – number of decimal places to consider. Defaults to 2. For integer data, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place, at the cost of performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
sec_order – choice for the Second Order test, which computes the differences between the ordered entries before running the tests.
verbose (bool) – tells the number of records that are being subjected to the analysis; defaults to True.

Raises:

ValueError – if the sign arg is not in [‘all’, ‘pos’, ‘neg’]
TypeError – if not receiving int or float as input.

verbose = None

whether to print verbose output

Type:	(bool)

mantissas(report=True, show_plot=True, figsize=(15, 8), save_plot=None, save_plot_kwargs=None)

Calculates the mantissas, their mean and variance, and compares them with the mean and variance of a Benford sequence.

Parameters:

report – prints the mantissas’ mean, variance, skewness and kurtosis for the sequence studied, along with reference values.
show_plot – plots the ordered mantissas and a line with the expected inclination. Defaults to True.
figsize – tuple that sets the figure dimensions.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.

first_digits(digs, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, simple=False, bhat_coeff=False, bhat_dist=False, kl_diverg=False, ret_df=False)

Performs the Benford First Digits test with the series of numbers provided, and populates the mapping dict for future selection of the original series.

Parameters:

digs (int) – number of first digits to consider. Must be 1 (first digit), 2 (first two digits) or 3 (first three digits).
verbose (bool) – tells the number of records that are being subjected to the analysis; defaults to True.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to None.
high_Z (str or int) – chooses which Z scores to use when displaying results, according to the chosen confidence level. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions; defaults to False.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
bhat_coeff (bool) – computes the Bhattacharyya Coefficient between the found and the expected (Benford) digits distributions; defaults to False.
bhat_dist (bool) – calculates the Bhattacharyya Distance between the found and the expected (Benford) digits distributions; defaults to False.
kl_diverg (bool) – calculates the Kullback-Leibler Divergence between the found and the expected (Benford) digits distributions; defaults to False.
show_plot (bool) – draws the test plot. Defaults to True.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.
ret_df – returns the test DataFrame. Defaults to False. True if run by the test function.

Returns:

DataFrame with the Expected and Found proportions, and the Z scores of the differences.
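Extracting the first digs digits can be sketched on positive integers, such as those produced by scaling the data by its decimals. The assumption here is that values with fewer than digs digits are discarded rather than padded; first_digits_found is a hypothetical helper returning the found proportions together with the resulting N.

```python
from collections import Counter

def first_digits_found(values, digs):
    """Hypothetical sketch: found proportions of the first `digs` digits of positive ints."""
    if digs not in (1, 2, 3):
        raise ValueError("digs must be 1, 2 or 3")
    # values with fewer than `digs` digits are discarded, not padded
    kept = [v for v in values if v >= 10 ** (digs - 1)]
    counts = Counter(int(str(v)[:digs]) for v in kept)
    N = len(kept)
    return {d: c / N for d, c in sorted(counts.items())}, N
```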

second_digit(confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, bhat_coeff=False, bhat_dist=False, kl_diverg=False, show_plot=True, save_plot=None, save_plot_kwargs=None, simple=False, ret_df=False)

Performs the Benford Second Digit test with the series of numbers provided.

Parameters:

verbose (bool) – tells the number of records that are being subjected to the analysis; defaults to True.
MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions; defaults to False.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to None.
high_Z (str or int) – chooses which Z scores to use when displaying results, according to the chosen confidence level. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
bhat_coeff (bool) – computes the Bhattacharyya Coefficient between the found and the expected (Benford) digits distributions; defaults to False.
bhat_dist (bool) – calculates the Bhattacharyya Distance between the found and the expected (Benford) digits distributions; defaults to False.
kl_diverg (bool) – calculates the Kullback-Leibler Divergence between the found and the expected (Benford) digits distributions; defaults to False.
show_plot (bool) – draws the test plot.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.
ret_df – returns the test DataFrame. Defaults to False. True if run by the test function.

Returns:

DataFrame with the Expected and Found proportions, and the Z scores of the differences.
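The optional distance measures compare two discrete distributions. This sketch assumes the standard definitions: MSE as the mean squared difference of proportions, the Bhattacharyya coefficient as the sum of sqrt(found × expected), the Bhattacharyya distance as minus its natural logarithm, and the Kullback-Leibler divergence of the found from the expected distribution. The KL direction and log base the package uses are assumptions, and distances is a hypothetical helper.

```python
import math

def distances(found, expected):
    """Hypothetical sketch of MSE, Bhattacharyya coefficient/distance and KL divergence."""
    keys = sorted(expected)
    f = [found.get(k, 0.0) for k in keys]
    e = [expected[k] for k in keys]
    mse = sum((a - b) ** 2 for a, b in zip(f, e)) / len(keys)
    bhat_coeff = sum(math.sqrt(a * b) for a, b in zip(f, e))
    bhat_dist = -math.log(bhat_coeff)
    # KL(found || expected) in nats; groups never found contribute 0 by convention
    kl_diverg = sum(a * math.log(a / b) for a, b in zip(f, e) if a > 0)
    return {"MSE": mse, "bhat_coeff": bhat_coeff, "bhat_dist": bhat_dist,
            "kl_diverg": kl_diverg}
```

Identical distributions give a coefficient of 1 and a distance and divergence of 0; all three move away from those values as the found digits drift from Benford.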

last_two_digits(confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, bhat_coeff=False, bhat_dist=False, kl_diverg=False, show_plot=True, save_plot=None, save_plot_kwargs=None, simple=False, ret_df=False)

Performs the Benford Last Two Digits test with the series of numbers provided.

Parameters:

verbose (bool) – tells the number of records that are being subjected to the analysis; defaults to True.
MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions; defaults to False.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show, as well as to calculate critical values for the tests’ statistics. Defaults to None.
high_Z (str or int) – chooses which Z scores to use when displaying results, according to the chosen confidence level. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
bhat_coeff (bool) – computes the Bhattacharyya Coefficient between the found and the expected (Benford) digits distributions; defaults to False.
bhat_dist (bool) – calculates the Bhattacharyya Distance between the found and the expected (Benford) digits distributions; defaults to False.
kl_diverg (bool) – calculates the Kullback-Leibler Divergence between the found and the expected (Benford) digits distributions; defaults to False.
show_plot (bool) – draws the test plot.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame with the Expected and Found proportions, and the Z scores of the differences.
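The Last Two Digits test compares the last two digits (00 to 99) with a flat 1% expectation. In this hypothetical sketch, values below 10 are assumed to be discarded because they do not have two digits; the package’s exact filtering may differ.

```python
from collections import Counter

def last_two_found(values):
    """Hypothetical sketch: found proportions of the last two digits (00-99)."""
    kept = [v for v in values if v >= 10]  # assumption: one-digit values are discarded
    counts = Counter(v % 100 for v in kept)
    return {d: counts.get(d, 0) / len(kept) for d in range(100)}
```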

summation(digs=2, top=20, show_plot=True, save_plot=None, save_plot_kwargs=None, ret_df=False)

Performs the Summation test. In a Benford series, the sums of the entries beginning with the same digits tend to be the same.

Parameters:

digs – tells the first digits to use: 1: first; 2: first two; 3: first three. Defaults to 2.
top – chooses how many top values to show. Defaults to 20.
show_plot – plots the results. Defaults to True.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame with the Expected and Found proportions, and their absolute differences.

duplicates(top_Rep=20, inform=None)

Performs a duplicates test and maps the duplicates count in descending order.

Parameters:

verbose (bool) – tells how many duplicated entries were found and prints the top numbers according to the top_Rep argument. Defaults to True.
top_Rep – int or None. Chooses how many duplicated entries will be shown with the top repetitions. Defaults to 20. If None, returns all the ordered repetitions.

Returns:

DataFrame with the duplicated records and their occurrence counts, in descending order (if verbose is False; if True, prints to terminal).

Raises:

ValueError – if the top_Rep arg is not int or None.
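The duplicates test reduces to counting repeated values. Here is a minimal sketch with collections.Counter that mirrors the documented top_Rep behaviour (an int limits the output, None returns everything, anything else raises ValueError); find_duplicates is a hypothetical name and returns (value, count) pairs instead of a DataFrame.

```python
from collections import Counter

def find_duplicates(values, top_Rep=20):
    """Hypothetical sketch: repeated values with their counts, most frequent first."""
    if top_Rep is not None and not isinstance(top_Rep, int):
        raise ValueError("top_Rep must be int or None")
    repeated = [(v, c) for v, c in Counter(values).most_common() if c > 1]
    return repeated if top_Rep is None else repeated[:top_Rep]
```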

class benford.benford.Roll_mad(data, test, window, decimals=2, sign='all')

Bases: object

Applies the MAD to sequential subsets of the Series, returning another Series.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – tells which test to use. 1: First Digits; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
window – size of the subset to be used.
decimals – number of decimal places to consider. Defaults to 2. For integer data, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place, at the cost of performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.

test = None: the test (F1D, SD, F2D…) used for the MAD calculation and critical values

show_plot(figsize=(15, 8), save_plot=None, save_plot_kwargs=None)

Shows the rolling MAD plot.

Parameters:

figsize – the figure dimensions.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when save_plot is a string with the figure file path/name.
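The rolling MAD can be pictured as the first-digit MAD recomputed over each window of consecutive records. This hypothetical rolling_mad sketch assumes overlapping windows that advance one record at a time and takes already extracted first digits as input; other tests would only swap in their own expected proportions.

```python
import math

def rolling_mad(first_digits, window):
    """Hypothetical sketch: first-digit MAD over each sliding window of records."""
    expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    mads = []
    # overlapping windows that advance one record at a time
    for start in range(len(first_digits) - window + 1):
        chunk = first_digits[start:start + window]
        mad = sum(abs(chunk.count(d) / window - p) for d, p in expected.items()) / 9
        mads.append(mad)
    return mads
```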

class benford.benford.Roll_mse(data, test, window, decimals=2, sign='all')

Bases: object

Applies the MSE to sequential subsets of the Series, returning another Series.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – tells which test to use. 1: First Digits; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
window – size of the subset to be used.
decimals – number of decimal places to consider. Defaults to 2. For integer data, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place, at the cost of performance.
sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.

show_plot(figsize=(15, 8), save_plot=None, save_plot_kwargs=None)

Shows the rolling MSE plot.

Parameters:

figsize – the figure dimensions.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when save_plot is a string with the figure file path/name.

benford.benford.first_digits(data, digs, decimals=2, sign='all', verbose=True, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)

Performs the Benford First Digits test on the series of numbers provided.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
decimals – number of decimal places to consider. Defaults to 2. For integer data, set to 0. If set to ‘infer’, it will remove the zeros and consider up to the fifth decimal place, at the cost of performance.
sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
digs (int) – number of first digits to consider. Must be 1 (first digit), 2 (first two digits) or 3 (first three digits).
verbose (bool) – tells the number of records that are being subjected to the analysis and returns the analysis DataFrame sorted by the highest Z score down. Defaults to True.
MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions; defaults to False.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
high_Z (str or int) – chooses which Z scores to use when displaying results, according to the chosen confidence level. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z is higher than the critical value.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
chi_square – calculates the chi_square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’s degrees of freedom. Defaults to False. Requires confidence != None.
KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with the Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
show_plot (bool) – draws the test plot.
save_plot (str) – string with the path/name of the file in which the generated plot will be saved. Uses matplotlib.pyplot.savefig(). File format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame with the Expected and Found proportions, and the Z scores of the differences if the confidence is not None.

benford.benford.second_digit(data, decimals=2, sign='all', verbose=True, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)[source]¶

Performs the Benford Second Digits test on the series of numbers provided.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
decimals – number of decimal places to consider. Defaluts to 2. If integers, set to 0. If set to -infer-, it will remove the zeros and consider up to the fifth decimal place to the right, but will loose performance.
sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
verbose (bool) – tells the number of registries that are being subjected to the analysis and returns tha analysis DataFrame sorted by the highest Z score down. Defaults to True.
MAD (bool) – calculates the Mean Absolute Difference between the found and the expected distributions; defaults to False.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
high_Z (int) – chooses which Z scores to be used when displaying results, according to the confidence level chosen. Defaluts to ‘pos’, which will highlight only values higher than the expexted frequencies; ‘all’ will highlight both extremes (positive and negative); and an integer, which will use the first n entries, positive and negative, regardless of whether Z is higher than the confidence or not.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
chi_square – calculates the chi_square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’s degrees of freedom. Defaults to False. Requires confidence != None.
KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
show_plot (bool) – draws the test plot. Defaults to True.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame with the Expected and Found proportions, and the Z scores of the differences if confidence is not None.
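As an illustration of what this test counts, the minimal stdlib sketch below (not the library's implementation) reads the second significant digit of each nonzero value and tallies the found proportions. The helpers `second_digit_of` and `found_second_digit_proportions` are hypothetical; the sketch ignores the `decimals` and `sign` handling the library performs, and single-digit values yield a second digit of 0 here, which the library may treat differently.

```python
from collections import Counter


def second_digit_of(x):
    """Second significant digit of a nonzero number (hypothetical helper)."""
    # f"{123.45:.10e}" == '1.2345000000e+02': index 2 is the first digit
    # after the point, i.e. the second significant digit.
    return int(f"{abs(x):.10e}"[2])


def found_second_digit_proportions(data):
    """Proportion of each second digit 0-9 among the nonzero entries."""
    digits = [second_digit_of(x) for x in data if x != 0]
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(10)]


sample = [123.45, -0.0571, 1000, 19.9, 0]  # the zero is excluded, as with sign='all'
print(found_second_digit_proportions(sample))
```

Comparing these proportions with the expected second-digit probabilities gives the differences on which the Z scores are computed.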

benford.benford.last_two_digits(data, decimals=2, sign='all', verbose=True, confidence=None, high_Z='pos', limit_N=None, MAD=False, MSE=False, chi_square=False, KS=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)

Performs the Last Two Digits test on the series of numbers provided.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. ‘pos’: only the positive entries; ‘neg’: only negative entries; ‘all’: all entries but zeros. Defaults to ‘all’.
verbose (bool) – reports the number of records being analyzed and returns the analysis DataFrame sorted by Z score, highest first. Defaults to True.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
high_Z (str, int) – chooses which Z scores to use when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z exceeds the confidence threshold.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions. Defaults to False.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
chi_square – calculates the chi_square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’s degrees of freedom. Defaults to False. Requires confidence != None.
KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
show_plot (bool) – draws the test plot. Defaults to True.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame with the Expected and Found proportions, and the Z scores of the differences if confidence is not None.
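A minimal stdlib sketch of the digit extraction behind this test follows. It assumes, without confirmation from this page, that the last two digits are taken after scaling each value to an integer by 10**decimals, and it compares the found endings with a uniform expectation of 1/100 per ending. Both helper names are hypothetical.

```python
from collections import Counter


def last_two_digits_of(x, decimals=2):
    """Last two digits of |x| once it is scaled to an integer by 10**decimals."""
    # round() absorbs binary floating-point noise in the scaled value.
    return round(abs(x) * 10 ** decimals) % 100


def last_two_found_vs_expected(data, decimals=2):
    """Map each ending 0-99 to (found proportion, uniform expected proportion)."""
    endings = Counter(last_two_digits_of(x, decimals) for x in data if x != 0)
    n = sum(endings.values())
    return {d: (endings.get(d, 0) / n, 1 / 100) for d in range(100)}


print(last_two_digits_of(1234.56))    # cents of a monetary value
print(last_two_digits_of(1200, 0))    # integers: use decimals=0
```

With decimals=2, the endings of monetary values are their cents, which is why rounded or fabricated amounts tend to show spikes at endings such as 00 or 50.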

benford.benford.mantissas(data, report=True, show_plot=True, arc_test=True, save_plot=None, save_plot_kwargs=None, inform=None)

Extracts the mantissas of the logarithms of the records.

Parameters:

data – sequence to compute mantissas from: a numpy 1D array, a pandas Series or a pandas DataFrame column.
report – prints the mantissas' mean, variance, skewness and kurtosis for the sequence studied, along with reference values. Defaults to True.
show_plot – plots the ordered mantissas and a line with the expected inclination. Defaults to True.
arc_test – draws the Arc Test plot. Defaults to True.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

Series with the data mantissas.
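The mantissa of a record is the fractional part of log10 of its absolute value. In a Benford-conforming set the mantissas are roughly uniform on [0, 1), whose mean is 1/2, variance 1/12, skewness 0 and excess kurtosis -1.2. The stdlib sketch below computes the mantissas and compares the first two moments with those uniform values; it is an illustration, not the library's code, and the function names are hypothetical.

```python
import math
import statistics


def mantissas(data):
    """Fractional part of log10(|x|) for every nonzero entry."""
    logs = (math.log10(abs(x)) for x in data if x != 0)
    return [v - math.floor(v) for v in logs]


def mantissa_report(mants):
    """(found, uniform reference) pairs for the mean and the variance."""
    return {
        "mean": (statistics.fmean(mants), 0.5),
        "variance": (statistics.pvariance(mants), 1 / 12),
    }


print(mantissas([100, 200, 0.05]))
```

Note that 100 and 1000 share the mantissa 0: the mantissa keeps the digit pattern and discards the order of magnitude.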

benford.benford.summation(data, digs=2, decimals=2, sign='all', top=20, verbose=True, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)

Performs the Summation test. In a Benford series, the sums of the entries beginning with the same digits tend to be the same. Works only with the First Digits tests (1, 2 or 3).

Parameters:

digs – tells the first digits to use: 1- first; 2- first two; 3- first three. Defaults to 2.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
top – chooses how many top values to show. Defaults to 20.
show_plot – plots the results. Defaults to True.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame with the Summation test results, sorted in descending order if verbose is True.
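The idea of the Summation test can be sketched with the standard library: group the records by their first digits, sum each group, and compare every group's share of the total with the equal share 1/(9 · 10**(digs - 1)), which is 1/90 for digs=2. This sketch sums absolute values and skips zeros; whether the library handles signs the same way is an assumption, and the helper names are hypothetical.

```python
from collections import defaultdict


def first_digits_of(x, digs=2):
    """Leading `digs` significant digits of a nonzero number, as an int."""
    # f"{123.45:.10e}" == '1.2345000000e+02'; dropping the point gives '12345000000e+02'.
    return int(f"{abs(x):.10e}".replace(".", "")[:digs])


def summation_proportions(data, digs=2):
    """Share of the total absolute sum held by each first-digits group."""
    sums = defaultdict(float)
    for x in data:
        if x != 0:
            sums[first_digits_of(x, digs)] += abs(x)
    total = sum(sums.values())
    found = {k: v / total for k, v in sorted(sums.items())}
    expected = 1 / (9 * 10 ** (digs - 1))  # every group should hold an equal share
    return found, expected


found, expected = summation_proportions([10, 10, 25, 99])
print(found, expected)
```

Groups whose share stands far above the expected one point to a few unusually large amounts sharing the same leading digits.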

benford.benford.mad(data, test, decimals=2, sign='all', verbose=False)

Calculates the Mean Absolute Deviation of the Series

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – informs which base test to use for the MAD.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.

Returns:	the Mean Absolute Deviation of the Series
Return type:	float
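For the First Digit test, the MAD is the average of the nine gaps |found - expected|, where the expected proportion of digit d is log10(1 + 1/d). The stdlib sketch below computes it directly from raw values; `first_digit_mad` is a hypothetical name, and the sketch covers only the first-digit case without the library's `decimals` and `sign` handling.

```python
import math
from collections import Counter


def first_digit_mad(data):
    """MAD between found and Benford-expected first-digit proportions (sketch)."""
    digits = [int(f"{abs(x):.10e}"[0]) for x in data if x != 0]
    counts = Counter(digits)
    n = len(digits)
    deviations = [
        abs(counts.get(d, 0) / n - math.log10(1 + 1 / d))  # |found - expected|
        for d in range(1, 10)
    ]
    return sum(deviations) / len(deviations)


print(first_digit_mad([1, 12, 300, 0.9, 1500, 2.5]))
```

Because the MAD does not grow with the sample size, it is often preferred to the chi-square statistic for large datasets.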

benford.benford.mse(data, test, decimals=2, sign='all', verbose=False)

Calculates the Mean Squared Error of the Series

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – informs which base test to use for the MSE.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.

Returns:	the Mean Squared Error of the Series
Return type:	float
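Given the found and expected proportions of a test, the Mean Squared Error is the average of the squared gaps. A minimal sketch on plain lists (a hypothetical stand-in for the library's DataFrame columns):

```python
def mse(found, expected):
    """Mean of the squared differences between found and expected proportions."""
    if len(found) != len(expected):
        raise ValueError("found and expected must have the same length")
    return sum((f - e) ** 2 for f, e in zip(found, expected)) / len(expected)


print(mse([0.5, 0.5], [0.4, 0.6]))
```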

benford.benford.bhattacharyya_distance(data, test, decimals, sign='all', verbose=False)

Computes the Bhattacharyya Distance between the Found and the Expected (Benford) digit distributions, according to the test chosen (First, Second, First Two…).

Parameters:

data (ndarray, Series) – sequence to be evaluated, with values being integers or floats.
test (int, str) – informs which base test to use.
decimals (int, str) – number of decimal places to consider. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign (str, optional) – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to “all”.

Returns:	the Bhattacharyya Distance between the distributions
Return type:	float
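The Bhattacharyya Distance is -ln of the Bhattacharyya coefficient, the sum of sqrt(p_i · q_i) over the digit categories. It is 0 for identical distributions and grows as they diverge. A minimal sketch on plain proportion lists (not the library's code):

```python
import math


def bhattacharyya_distance(found, expected):
    """-ln of the Bhattacharyya coefficient sum(sqrt(p_i * q_i))."""
    coefficient = sum(math.sqrt(p * q) for p, q in zip(found, expected))
    # A coefficient of 0 (no overlap at all) would make math.log raise ValueError.
    return -math.log(coefficient)


print(bhattacharyya_distance([0.5, 0.5], [0.9, 0.1]))
```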

benford.benford.kullback_leibler_divergence(data, test, decimals, sign='all', verbose=False)

Computes the Kullback-Leibler Divergence between the Found and the Expected (Benford) digit distributions, according to the test chosen (First, Second, First Two…).

Parameters:

data (ndarray, Series) – sequence to be evaluated, with values being integers or floats.
test (int, str) – informs which base test to use.
decimals (int, str) – number of decimal places to consider. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign (str, optional) – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to “all”.

Returns:	the Kullback-Leibler Divergence between the distributions
Return type:	float
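The Kullback-Leibler Divergence is sum(p_i · ln(p_i / q_i)). It is not symmetric, and this page does not say which distribution plays P or which log base is used, so the sketch below assumes found as P, expected as Q and natural logarithms. Categories with p_i = 0 contribute nothing by convention.

```python
import math


def kullback_leibler_divergence(found, expected):
    """sum p_i * ln(p_i / q_i), with found as P and expected as Q (assumed order)."""
    return sum(p * math.log(p / q) for p, q in zip(found, expected) if p > 0)


print(kullback_leibler_divergence([0.5, 0.5], [0.25, 0.75]))
```

Swapping the arguments generally changes the result, so the argument order matters when comparing values across tools.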

benford.benford.mad_summ(data, test, decimals=2, sign='all', verbose=False)

Calculates the Mean Absolute Deviation of the Summation Test

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – informs which base test to use for the summation MAD.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.

Returns:	the Mean Absolute Deviation of the Summation Test
Return type:	float

benford.benford.rolling_mad(data, test, window, decimals=2, sign='all', show_plot=False, save_plot=None, save_plot_kwargs=None)

Applies the MAD to sequential subsets of the records.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – tells which test to use. 1: First Digit; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
window – size of the subset to be used.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
show_plot (bool) – draws the test plot. Defaults to False.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

Series with sequentially computed MADs.
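The rolling idea can be sketched generically: slide a window of `window` records over the data and apply a statistic to each slice. The sketch assumes overlapping windows advanced one record at a time, as pandas rolling windows are; this page does not state the step the library uses. `rolling_metric` is a hypothetical helper, and `metric` would be any callable that computes the MAD (or MSE) of a slice.

```python
def rolling_metric(data, window, metric):
    """Apply `metric` to every contiguous slice of length `window` (step of one record)."""
    if not 0 < window <= len(data):
        raise ValueError("window must be between 1 and len(data)")
    return [metric(data[i:i + window]) for i in range(len(data) - window + 1)]


# Any slice-level statistic works; sum is used here only to show the windowing.
print(rolling_metric([1, 2, 3, 4], 2, sum))
```

A jump in the rolling MAD points to the stretch of records where conformity to Benford's distribution breaks down.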

benford.benford.rolling_mse(data, test, window, decimals=2, sign='all', show_plot=False, save_plot=None, save_plot_kwargs=None)

Applies the MSE to sequential subsets of the records.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – tells which test to use. 1: First Digit; 2: First Two Digits; 3: First Three Digits; 22: Second Digit; and -2: Last Two Digits.
window – size of the subset to be used.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
show_plot (bool) – draws the test plot. Defaults to False.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

Series with sequentially computed MSEs.

benford.benford.duplicates(data, top_Rep=20, verbose=True, inform=None)

Performs a duplicates test and maps the duplicates count in descending order.

Parameters:

data – sequence to take the duplicates from: a pandas Series or numpy ndarray.
verbose (bool) – reports how many duplicated entries were found and prints the top entries according to the top_Rep argument. Defaults to True.
top_Rep (int or None) – chooses how many duplicated entries to show, ordered by number of repetitions. Defaults to 20. If None, returns all the ordered repetitions.

Returns:	DataFrame with the duplicated records and their respective counts
Raises:	`ValueError` – if the top_Rep arg is not int or None.
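The duplicates count can be sketched with `collections.Counter`: keep the values that occur more than once, most repeated first, and truncate to top_Rep unless it is None. `duplicate_counts` is a hypothetical name; the library returns a DataFrame rather than a list of pairs.

```python
from collections import Counter


def duplicate_counts(data, top_Rep=20):
    """(value, count) pairs for values seen more than once, most repeated first."""
    if top_Rep is not None and not isinstance(top_Rep, int):
        raise ValueError("top_Rep must be int or None")
    repeated = [(value, count) for value, count in Counter(data).most_common() if count > 1]
    return repeated if top_Rep is None else repeated[:top_Rep]


print(duplicate_counts([5, 3, 5, 7, 3, 5]))
```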

benford.benford.second_order(data, test, decimals=2, sign='all', verbose=True, MAD=False, confidence=None, high_Z='pos', limit_N=None, MSE=False, show_plot=True, save_plot=None, save_plot_kwargs=None, inform=None)

Performs the chosen test on the Second Order of the data: the sequence is sorted, each entry is subtracted from the next one, and the test is run on these differences. Hence Second Order.

Parameters:

data – sequence of numbers to be evaluated. Must be a numpy 1D array, a pandas Series or a pandas DataFrame column, with values being integers or floats.
test – the test to be performed - 1 or ‘F1D’: First Digit; 2 or ‘F2D’: First Two Digits; 3 or ‘F3D’: First three Digits; 22 or ‘SD’: Second Digits; -2 or ‘L2D’: Last Two Digits.
decimals – number of decimal places to consider. Defaults to 2. If the values are integers, set to 0. If set to ‘infer’, it removes the zeros and considers up to the fifth decimal place to the right, at some cost in performance.
sign – tells which portion of the data to consider. pos: only the positive entries; neg: only negative entries; all: all entries but zeros. Defaults to all.
verbose (bool) – reports the number of records being analyzed and returns the analysis DataFrame sorted by Z score, highest first. Defaults to True.
MAD (bool) – calculates the Mean Absolute Deviation between the found and the expected distributions. Defaults to False.
confidence (int, float) – confidence level to draw lower and upper limits when plotting and to limit the top deviations to show. Defaults to None.
high_Z (str, int) – chooses which Z scores to use when displaying results, according to the confidence level chosen. Defaults to ‘pos’, which highlights only values higher than the expected frequencies; ‘all’ highlights both extremes (positive and negative); an integer n uses the first n entries, positive and negative, regardless of whether Z exceeds the confidence threshold.
limit_N (int) – sets a limit to N as the sample size for the calculation of the Z scores if the sample is too big. Defaults to None.
MSE (bool) – calculates the Mean Square Error of the sample; defaults to False.
chi_square – calculates the chi_square statistic of the sample and compares it with a critical value, according to the confidence level chosen and the series’s degrees of freedom. Defaults to False. Requires confidence != None.
KS – calculates the Kolmogorov-Smirnov test, comparing the cumulative distribution of the sample with Benford’s, according to the confidence level chosen. Defaults to False. Requires confidence != None.
show_plot (bool) – draws the test plot. Defaults to True.
save_plot (str) – path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when show_plot=True.
save_plot_kwargs (dict) – any of the kwargs accepted by matplotlib.pyplot.savefig() (see https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html). Only available when show_plot=True and save_plot is a string with the figure file path/name.

Returns:

DataFrame of the chosen test, applied to the Second Order pre-processed data.
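The Second Order pre-processing can be sketched as follows: sort the records, take the differences between consecutive entries, and drop zero differences, which have no leading digit (consistent with sign='all' excluding zeros). The helper name is hypothetical and the library's exact handling of ties is an assumption.

```python
def second_order_differences(data):
    """Differences between consecutive entries of the sorted records, zeros dropped."""
    ordered = sorted(data)
    diffs = [b - a for a, b in zip(ordered, ordered[1:])]
    return [d for d in diffs if d != 0]  # zero gaps carry no leading digit


print(second_order_differences([5, 1, 3, 3, 10]))
```

The digit test of choice is then run on these differences instead of on the raw records.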

benford.expected module

class benford.expected.First(digs, plot=True, save_plot=None, save_plot_kwargs=None)

Bases: pandas.core.frame.DataFrame

Holds the expected probabilities of the First, First Two, or First Three digits according to Benford’s distribution.

Parameters:

digs – 1, 2 or 3 - tells which of the first digits to consider: 1 for the First Digit, 2 for the First Two Digits and 3 for the First Three Digits.
plot – option to plot a bar chart of the Expected proportions. Defaults to True.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when plot=True.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.
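Benford's law gives the probability of the leading digits d as log10(1 + 1/d), where d runs over 1-9 for one digit, 10-99 for two and 100-999 for three. The stdlib sketch below builds those probabilities as a plain dict instead of the DataFrame this class holds; `expected_first` is a hypothetical name.

```python
import math


def expected_first(digs=1):
    """Benford probabilities for the first `digs` digits (1, 2 or 3)."""
    if digs not in (1, 2, 3):
        raise ValueError("digs must be 1, 2 or 3")
    start, stop = 10 ** (digs - 1), 10 ** digs
    return {d: math.log10(1 + 1 / d) for d in range(start, stop)}


print(round(expected_first(1)[1], 5))  # share of numbers starting with 1
```

For every digs the probabilities sum to 1, because the terms telescope: the sum of log10((d + 1)/d) over the range is log10(stop/start) = 1.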

class benford.expected.Second(plot=True, save_plot=None, save_plot_kwargs=None)

Bases: pandas.core.frame.DataFrame

Holds the expected probabilities of the Second Digits according to Benford’s distribution.

Parameters:

plot – option to plot a bar chart of the Expected proportions. Defaults to True.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when plot=True.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.
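The expected second-digit probability is obtained by summing, over every possible first digit k, the probability that the first two digits equal 10k + d. A minimal sketch returning a dict (hypothetical name, not the class's implementation):

```python
import math


def expected_second():
    """Benford probability of each second digit 0-9."""
    # P(second digit == d) = sum over first digits k of log10(1 + 1/(10k + d)).
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


print({d: round(p, 4) for d, p in expected_second().items()})
```

The second-digit distribution is much flatter than the first-digit one, falling only from about 0.120 for 0 to about 0.085 for 9.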

class benford.expected.LastTwo(num=False, plot=True, save_plot=None, save_plot_kwargs=None)

Bases: pandas.core.frame.DataFrame

Holds the expected probabilities of the Last Two Digits according to Benford’s distribution.

Parameters:

plot – option to plot a bar chart of the Expected proportions. Defaults to True.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension. Only available when plot=True.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html Only available when plot=True and save_plot is a string with the figure file path/name.

benford.stats module

benford.stats.Z_score(frame, N)

Computes the Z statistics for the proportions studied

Parameters:

frame – DataFrame with the expected proportions and the already calculated Absolute Differences between the found and expected proportions.
N – sample size.

Returns:	Series of computed Z scores
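For a single digit, the Z statistic commonly used in Benford analysis (Nigrini's formulation) subtracts a continuity correction of 1/(2N) from the absolute difference, but only when the correction is smaller than that difference, and divides by the standard error of the expected proportion. The sketch below follows that formulation on scalars; that the library applies the correction in exactly this way is an assumption.

```python
import math


def z_score(found, expected, n):
    """Z statistic for one digit's proportion, with the 1/(2N) continuity correction."""
    abs_dif = abs(found - expected)
    correction = 1 / (2 * n)
    # The correction is applied only when it is smaller than the deviation itself.
    numerator = abs_dif - correction if correction < abs_dif else abs_dif
    return numerator / math.sqrt(expected * (1 - expected) / n)


# Digit 1 found in 35% of 1,000 records, against Benford's 30.1%.
print(round(z_score(0.35, math.log10(2), 1000), 2))
```

At 95% confidence, a Z above about 1.96 flags the digit as a significant deviation.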

benford.stats.chi_sq(frame, ddf, confidence, verbose=True)

Computes the chi-square statistic of the found distribution and compares it with the critical chi-square value for the sample, according to the confidence level chosen and the degrees of freedom (the number of digit categories in the frame minus 1).

Parameters:

frame – DataFrame with Found, Expected and their difference columns.
ddf – Degrees of freedom to consider.
confidence – Confidence level to look up critical value.
verbose – prints the chi-square result and compares it to the critical chi-square for the sample. Defaults to True.

Returns:

The computed chi-square statistic and the critical chi-square value (according to the degrees of freedom and confidence level), for comparison. None if confidence is None.
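With proportions instead of counts, the chi-square statistic is N · sum((found - expected)² / expected), and a goodness-of-fit test against k digit categories has k - 1 degrees of freedom. The sketch below computes both on plain lists; the critical value itself would come from a chi-square table or, for example, `scipy.stats.chi2.ppf(confidence, ddf)`, which this sketch does not call.

```python
def chi_square(found, expected, n):
    """Chi-square statistic from found/expected proportions and sample size n."""
    return n * sum((f - e) ** 2 / e for f, e in zip(found, expected))


def degrees_of_freedom(expected):
    """k - 1 for a goodness-of-fit test over k digit categories."""
    return len(expected) - 1


print(chi_square([0.6, 0.4], [0.5, 0.5], 100), degrees_of_freedom([0.5, 0.5]))
```

Since the statistic scales with N, even tiny deviations become significant in very large samples, which is why the MAD is often reported alongside it.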

benford.stats.chi_sq_2(frame)

Computes the chi-square statistic of the found distributions

Parameters:	frame – DataFrame with Found, Expected and their difference columns.
Returns:	The computed Chi square statistic

benford.stats.kolmogorov_smirnov(frame, confidence, N, verbose=True)

Computes the Kolmogorov-Smirnov statistic of the found distribution and compares it with the critical Kolmogorov-Smirnov value for the sample, according to the confidence level chosen.

Parameters:

frame – DataFrame with the Found and Expected distributions.
confidence – Confidence level to look up critical value.
N – Sample size
verbose – prints the KS result and the critical value for the sample. Defaults to True.

Returns:

The supremum, i.e. the greatest absolute difference between the cumulative Found and Expected proportions, and the Kolmogorov-Smirnov critical value according to the confidence level, for comparison.
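The Kolmogorov-Smirnov statistic compares cumulative distributions: accumulate the found and expected proportions over the ordered digits and take the largest absolute gap. The critical value shown uses the common large-sample approximation 1.36/sqrt(N) at 95% confidence; the exact constants the library looks up for each confidence level are not given on this page.

```python
import math
from itertools import accumulate


def ks_statistic(found, expected):
    """Supremum of |cumulative found - cumulative expected|."""
    return max(abs(f - e) for f, e in zip(accumulate(found), accumulate(expected)))


def ks_critical_95(n):
    """Large-sample approximation of the 95% KS critical value."""
    return 1.36 / math.sqrt(n)


print(ks_statistic([0.5, 0.5], [0.3, 0.7]), ks_critical_95(100))
```

The sample is rejected as Benford-conforming at that level when the statistic exceeds the critical value.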

benford.stats.kolmogorov_smirnov_2(frame)

Computes the Kolmogorov-Smirnov test of the found distributions

Parameters:	frame – DataFrame with the Found and Expected distributions.
Returns:	The supremum, i.e. the greatest absolute difference between the cumulative Found and Expected proportions

benford.stats.mad(frame, test, verbose=True)

Computes the Mean Absolute Deviation (MAD) between the found and the expected proportions.

Parameters:

frame – DataFrame with the Absolute Deviations already calculated.
test – Test to compute the MAD from (F1D, SD, F2D…)
verbose – prints the MAD result and compares to limit values of conformity. Defaults to True.

Returns:

The mean of the absolute deviations between the found and expected proportions.

benford.stats.mse(frame, verbose=True)

Computes the test’s Mean Square Error

Parameters:

frame – DataFrame with the already computed Absolute Deviations between the found and expected proportions.
verbose – prints the MSE. Defaults to True.

Returns:	Mean of the squared differences between the found and the expected proportions.

benford.viz module

benford.viz.plot_expected(df, digs, save_plot=None, save_plot_kwargs=None)

Plots the Expected Benford Distributions

Parameters:

df – DataFrame with the Expected Proportions
digs – Test’s digit
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.plot_digs(df, x, y_Exp, y_Found, N, figsize, conf_Z, text_x=False, save_plot=None, save_plot_kwargs=None)

Plots the digits tests results

Parameters:

df – DataFrame with the data to be plotted
x – sequence to be used in the x axis
y_Exp – sequence of the expected proportions to be used in the y axis (line)
y_Found – sequence of the found proportions to be used in the y axis (bars)
N – length of the sequence, used when plotting the confidence levels
figsize – tuple to state the size of the plot figure
conf_Z – Confidence level
text_x – forces all x tick labels to be shown. Defaults to False.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.plot_sum(df, figsize, li, text_x=False, save_plot=None, save_plot_kwargs=None)

Plots the summation test results

Parameters:

df – DataFrame with the data to be plotted
figsize – sets the dimensions of the plot figure
li – value with which to draw the horizontal line
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.plot_ordered_mantissas(col, figsize=(12, 12), save_plot=None, save_plot_kwargs=None)

Plots the ordered mantissas and compares them to the expected straight line that a Benford-compliant set should form.

Parameters:

col (Series) – column of mantissas to plot.
figsize (tuple) – sets the dimensions of the plot figure.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.plot_mantissa_arc_test(df, gravity_center, grid=True, figsize=12, save_plot=None, save_plot_kwargs=None)

Draws the Mantissa Arc Test after computing the X and Y circular coordinates of every mantissa and the center of gravity of the set.

Parameters:

df (DataFrame) – pandas DataFrame with the mantissas and the X and Y coordinates.
gravity_center (tuple) – coordinates for plotting the gravity center
grid (bool) – show grid. Defaults to True.
figsize (int) – figure dimensions. No need to be a tuple, since the figure is a square.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html
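The coordinates this plot expects can be sketched as follows, assuming the usual Mantissa Arc Test mapping of each mantissa m to the unit-circle point (cos 2πm, sin 2πm). In a Benford-conforming set the mantissas spread evenly around the circle, so the center of gravity, the mean of those points, should lie close to the origin. `arc_test_coordinates` is a hypothetical helper, not the library's code.

```python
import math


def arc_test_coordinates(mantissas):
    """Unit-circle points for each mantissa and their center of gravity."""
    points = [(math.cos(2 * math.pi * m), math.sin(2 * math.pi * m)) for m in mantissas]
    n = len(points)
    center = (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)
    return points, center


_, center = arc_test_coordinates([0, 0.25, 0.5, 0.75])
print(center)  # evenly spread mantissas: both coordinates are ~0 up to float rounding
```

A center of gravity far from the origin signals mantissas clustered on part of the circle, i.e. a departure from Benford's distribution.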

benford.viz.plot_roll_mse(roll_series, figsize, save_plot=None, save_plot_kwargs=None)

Shows the rolling MSE plot

Parameters:

roll_series – pd.Series resulting from the rolling MSE.
figsize – the figure dimensions.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html

benford.viz.plot_roll_mad(roll_mad, figsize, save_plot=None, save_plot_kwargs=None)

Shows the rolling MAD plot

Parameters:

roll_mad – pd.Series resulting from the rolling MAD.
figsize – the figure dimensions.
save_plot – string with the path/name of the file in which the generated plot will be saved, using matplotlib.pyplot.savefig(). The file format is inferred from the file name extension.
save_plot_kwargs – dict with any of the kwargs accepted by matplotlib.pyplot.savefig() https://matplotlib.org/api/_as_gen/matplotlib.pyplot.savefig.html