DataLinter.OutputInterface.WARN_LEVEL_TO_NUM
DataLinter.DataInterface.build_data_context
DataLinter.KnowledgeBaseInterface.kb_load
DataLinter.KnowledgeBaseInterface.kb_query
DataLinter.LinterCore.applicable
DataLinter.LinterCore.build_data_iterator
DataLinter.LinterCore.build_linting_context
DataLinter.LinterCore.build_linting_context
DataLinter.LinterCore.get_experiment_parameters
DataLinter.LinterCore.get_linter_kwargs
DataLinter.LinterCore.lint
DataLinter.LinterCore.linter_is_enabled
DataLinter.LinterCore.load_config
DataLinter.LinterCore.process_output
DataLinter.LinterCore.reconcile_contexts
DataLinter.OutputInterface.score
DataLinter.cli_linting_workflow
DataLinter.printable_version
DataLinter.version
DataLinter.cli_linting_workflow — Method
Basic flow for running the linter in a command line interface environment such as a Unix shell.
DataLinter.printable_version — Method
printable_version()
Returns a pretty version string that includes the git commit and date.
DataLinter.version — Method
version()
Returns the current DataLinter version using the Project.toml and git. If the Project.toml or git is not available, the version defaults to an empty string.
DataLinter.LinterCore.applicable — Method
Function that checks whether a linter is applicable. Two conditions must hold: the iterable type must match, and if linter.linting_ctx==true, a linting context must exist. That context can be specified in the config, derived from the presence of code, or both.
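The rule above can be sketched as follows. This is a minimal illustration, not DataLinter's implementation: the SketchLinter type, its iterable_type field, and the sketch_applicable function are hypothetical names introduced here, and the context is represented as either nothing or any other value.

```julia
# Hypothetical stand-in type; this is NOT DataLinter's real Linter definition.
struct SketchLinter
    name::Symbol
    iterable_type::Symbol   # e.g. :column or :row (assumed representation)
    linting_ctx::Bool       # true if the linter requires a linting context
end

# A linter applies when the iterable type matches and, if it requires a
# linting context, one is available (from config, from code, or both).
function sketch_applicable(linter::SketchLinter, iterable_type::Symbol, ctx)
    type_ok = linter.iterable_type == iterable_type
    ctx_ok = !linter.linting_ctx || ctx !== nothing
    return type_ok && ctx_ok
end

plain_linter = SketchLinter(:negative_values, :column, false)
ctx_linter = SketchLinter(:needs_context, :column, true)

a = sketch_applicable(plain_linter, :column, nothing)    # no context needed
b = sketch_applicable(ctx_linter, :column, nothing)      # context required but missing
c = sketch_applicable(ctx_linter, :column, Dict("k" => "v"))
d = sketch_applicable(plain_linter, :row, nothing)       # iterable type mismatch
```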
DataLinter.LinterCore.build_linting_context — Method
Function that builds a LintingContext from a linter configuration.
DataLinter.LinterCore.build_linting_context — Method
Function that builds a LintingContext from code and a code query.
DataLinter.LinterCore.lint — Method
lint(data_ctx::AbstractDataContext, kb::Union{Nothing, AbstractKnowledgeBase}; config=nothing, debug=false, linters=["all"])
Main linting function. Lints the data provided by data_ctx using knowledge from kb. A configuration for the available linters can be provided in config. If debug=true, performance information for each linter is shown. By default, all available linters are used.
DataLinter.LinterCore.reconcile_contexts — Method
reconcile_contexts(code_ctx, config_ctx)
Function that reconciles the contexts obtained from code and from the .toml configuration file. The basic approach is to take all available data from code_ctx and fill in whatever is missing from config_ctx.
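A minimal sketch of this "code first, config as fallback" merge, assuming each context is represented as a Dict (or nothing when absent). The representation, the key names, and the function sketch_reconcile are assumptions made for illustration; the package's actual context type may differ.

```julia
# Hypothetical representation: each context is a Dict, or nothing when absent.
function sketch_reconcile(code_ctx, config_ctx)
    code_ctx === nothing && return config_ctx
    config_ctx === nothing && return code_ctx
    # Keep only the entries that the code context actually provides.
    available = filter(kv -> kv.second !== nothing, code_ctx)
    # merge gives precedence to its last argument, so code values win
    # and config values fill the gaps.
    return merge(config_ctx, available)
end

# Key names below are illustrative only.
code_ctx = Dict{String,Any}("analysis_type" => "classification", "target_variable" => nothing)
config_ctx = Dict{String,Any}("analysis_type" => "regression", "target_variable" => "y")

result = sketch_reconcile(code_ctx, config_ctx)
```

Here result takes "analysis_type" from the code context and falls back to the config for "target_variable".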
DataLinter.LinterCore.get_experiment_parameters — Method
Function that reads linter configuration parameters.
DataLinter.LinterCore.get_linter_kwargs — Method
Function that reads linter configuration parameters.
DataLinter.LinterCore.linter_is_enabled — Method
Function that returns whether a linter is enabled in the config or not.
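The three configuration helpers above read from a configuration shaped like the load_config example on this page: a "linters" table of booleans and a "parameters" table of per-linter options. A rough sketch of such lookups follows; sketch_is_enabled and sketch_kwargs are hypothetical names, and treating a missing linter as disabled is an assumption rather than documented behavior.

```julia
# Abridged config, shaped like the load_config example output on this page.
config = Dict{String,Any}(
    "linters" => Dict{String,Any}("enum_detector" => true, "uncommon_signs" => true),
    "parameters" => Dict{String,Any}(
        "enum_detector" => Dict{String,Any}("distinct_max_limit" => 5, "distinct_ratio" => 0.001),
        "uncommon_signs" => Dict{String,Any}(),
    ),
)

# Assumption: a linter that is absent from the config counts as disabled.
function sketch_is_enabled(config, name)
    linters = get(config, "linters", Dict{String,Any}())
    return get(linters, name, false) === true
end

# Collect a linter's parameters as Symbol-keyed keyword arguments.
function sketch_kwargs(config, name)
    all_params = get(config, "parameters", Dict{String,Any}())
    params = get(all_params, name, Dict{String,Any}())
    return Dict{Symbol,Any}(Symbol(k) => v for (k, v) in params)
end

kwargs = sketch_kwargs(config, "enum_detector")
```

A Symbol-keyed Dict like kwargs can be splatted into a call as keyword arguments, for example f(x; kwargs...).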
DataLinter.LinterCore.load_config — Method
load_config(configpath::AbstractString)
Loads a linting configuration file located at configpath. The configuration file contains options regarding which linters are enabled and linter parameter values.
Examples
julia> using DataLinter
using Pkg
configpath = joinpath(dirname((Pkg.project()).path), "config", "default.toml")
DataLinter.LinterCore.load_config(configpath)
Dict{String, Any} with 2 entries:
"parameters" => Dict{String, Any}("uncommon_signs"=>Dict{String, Any}(), "enum_detector"=>Dict{String, Any}("distinct_max_limit"=>5, "distinct_ratio"=>0.001), "empty_example"=>Dict{String, Any}(), "negative_…
"linters" => Dict{String, Any}("uncommon_signs"=>true, "enum_detector"=>true, "empty_example"=>true, "negative_values"=>true, "tokenizable_string"=>true, "number_as_string"=>true, "int_as_float"=>true, "l…
DataLinter.DataInterface.build_data_context — Method
build_data_context(;data=nothing, code=nothing)
Builds a data context object using data and code, if available. The data context represents the context in which the linter runs: the data it lints and, optionally, the code associated with the data, i.e. some algorithm that will be applied to that data.
Examples
julia> using DataLinter
ncols, nrows = 3, 10
data = [rand(nrows) for _ in 1:ncols]
ctx = DataLinter.build_data_context(data)
SimpleDataContext 0.00040435791015625 MB of data
julia> kb = DataLinter.kb_load("")
DataLinter.LinterCore.lint(ctx, kb)
38-element Vector{Pair{Tuple{DataLinter.LinterCore.Linter, String}, Union{Nothing, Bool}}}:
(Linter (name=datetime_as_string, f=is_datetime_as_string), "column: x2") => nothing
(Linter (name=datetime_as_string, f=is_datetime_as_string), "column: x3") => nothing
(Linter (name=datetime_as_string, f=is_datetime_as_string), "column: x1") => nothing
(Linter (name=tokenizable_string, f=is_tokenizable_string), "column: x2") => nothing
...
DataLinter.LinterCore.build_data_iterator — Method
Function that returns a data structure amenable to use in the data linters. It contains a row iterator, a column iterator, and metadata.
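To make the three components concrete, here is a rough sketch of a structure holding a column iterator, a row iterator, and metadata for data given as a vector of columns. The type SketchDataIterator, its field names, and the metadata keys are hypothetical and are not the package's actual definitions; the x1, x2, ... column names mirror the ones appearing in the lint output above.

```julia
# Hypothetical structure; NOT DataLinter's actual type.
struct SketchDataIterator{C,R}
    column_iterator::C
    row_iterator::R
    metadata::Dict{Symbol,Any}
end

function sketch_build_data_iterator(columns::Vector{<:AbstractVector})
    names = [Symbol("x", i) for i in eachindex(columns)]
    column_iterator = zip(names, columns)   # yields (name, column) tuples
    row_iterator = zip(columns...)          # yields one tuple per row
    metadata = Dict{Symbol,Any}(
        :ncols => length(columns),
        :nrows => isempty(columns) ? 0 : length(first(columns)),
    )
    return SketchDataIterator(column_iterator, row_iterator, metadata)
end

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 columns, 2 rows
it = sketch_build_data_iterator(data)

rows = collect(it.row_iterator)
colnames = [name for (name, _) in it.column_iterator]
```

Column-wise linters would walk column_iterator, while row-wise linters would walk row_iterator.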
DataLinter.KnowledgeBaseInterface.kb_load — Function
Loads a knowledge base.
DataLinter.KnowledgeBaseInterface.kb_query — Function
Runs a query over a knowledge base.
DataLinter.OutputInterface.WARN_LEVEL_TO_NUM — Constant
Structure that maps a warning level to a numeric value. This can be used to obtain a numeric estimate of the issues found in a dataset.
DataLinter.LinterCore.process_output — Method
process_output(lintout; buffer=stdout, show_stats=false, show_passing=false, show_na=false)
Processes linting output for display. The function takes the linter output lintout and prints lints to buffer. When set to true, show_stats, show_passing and show_na additionally print statistics over the checks, the checks that pass, and the checks that could not be applied, respectively.
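The filtering behavior can be sketched over a simplified output vector. This sketch assumes that false marks a failed check, true a passing one, and nothing a check that could not be applied; that reading of the Union{Nothing, Bool} values, the string-based linter names, the printed format, and the function sketch_process_output are all assumptions for illustration.

```julia
# Simplified sketch: entries are (linter_name, location) => result.
# Assumed meaning: false = failed, true = passed, nothing = not applicable.
function sketch_process_output(lintout; buffer=stdout, show_stats=false,
                               show_passing=false, show_na=false)
    n_fail = 0
    n_pass = 0
    n_na = 0
    for ((name, location), result) in lintout
        if result === nothing
            n_na += 1
            show_na && println(buffer, "n/a   ", name, " (", location, ")")
        elseif result
            n_pass += 1
            show_passing && println(buffer, "pass  ", name, " (", location, ")")
        else
            n_fail += 1
            println(buffer, "fail  ", name, " (", location, ")")
        end
    end
    show_stats && println(buffer, "failed=", n_fail, " passed=", n_pass, " n/a=", n_na)
    return nothing
end

lintout = [
    ("datetime_as_string", "column: x1") => nothing,
    ("negative_values", "column: x2") => false,
    ("int_as_float", "column: x3") => true,
]

io = IOBuffer()
sketch_process_output(lintout; buffer=io, show_stats=true)
report = String(take!(io))
```

Writing to an IOBuffer instead of stdout makes the report easy to capture and inspect.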
DataLinter.OutputInterface.score — Method
Returns a score corresponding to the severity of the issues found in the dataset. The score is based on the WARN_LEVEL_TO_NUM mapping.
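As a rough illustration of how a level-to-number mapping can yield a score, the sketch below sums the numeric value of each issue found. The level names, their values, and the plain summation are assumptions; the real WARN_LEVEL_TO_NUM contents and the formula used by score are not given in this page.

```julia
# Hypothetical mapping; the real WARN_LEVEL_TO_NUM entries may differ.
const SKETCH_WARN_LEVEL_TO_NUM = Dict("info" => 1, "warning" => 2, "error" => 3)

# Assumed scoring rule: add up the numeric value of every issue's level.
function sketch_score(issue_levels)
    return mapreduce(level -> SKETCH_WARN_LEVEL_TO_NUM[level], +, issue_levels; init=0)
end

issues = ["warning", "error", "info", "warning"]
total = sketch_score(issues)
clean = sketch_score(String[])   # a dataset with no issues
```

Under this assumed rule, a higher total indicates more or more severe issues, and a dataset with no issues scores 0.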