CINI- Longitudinal Impact Study (PW-Bengali Tool)

PID 43

Data Quality

This module lets you execute data quality rules against your project data to check for discrepancies. Listed below are some pre-defined data rules that you may run. You may also create your own rules, or edit, delete, or reorder the rules you have already created. To find discrepancies for a given rule, click the Execute button next to it, or click the Execute All Rules button to run all the rules at once. The module reports the total number of discrepancies found for each rule, and you can view the details of those discrepancies by clicking the View link next to each total.

The pre-defined rules listed in red text cannot be modified, reordered, or removed. They are there if you wish to use them. You may also build and execute your own rules at the bottom of the table below. Rules can be set up using a literal logic format (e.g., [age] > 65) that will be evaluated as a boolean value (true or false) after an existing record's value for that field is substituted (e.g., assuming a record's value is 23 for 'age', 23 > 65 evaluates as false). The logic will be applied to all existing records in the project, and for any record for which the logic evaluates as true, it will return it as a discrepancy for that rule. Similar to branching logic and calculated fields, REDCap variable/field names may be utilized in the rule logic by placing the variable name inside square brackets [ ]. Also, for longitudinal projects, you may reference a field on one specific event by prepending the variable name in the logic with the unique event name in square brackets.
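The substitution step described above can be illustrated with a minimal Python sketch. It is not REDCap's implementation: the function name `find_discrepancies`, the record layout (a dict of record ID to field values), the supported operator set, and the handling of blank values are all assumptions made for illustration, and only rules of the form `[field] OPERATOR number` are handled.

```python
import operator
import re

# Hypothetical mini-evaluator: handles only "[field] OP number" rules.
# The operator set below is an assumption for this sketch.
_OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    "<>": operator.ne,
}
# Longer operators come first so ">=" is not matched as ">".
_RULE = re.compile(r"^\s*\[(\w+)\]\s*(>=|<=|<>|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def find_discrepancies(rule, records):
    """Return the IDs of records for which the rule evaluates as true."""
    match = _RULE.match(rule)
    if match is None:
        raise ValueError(f"unsupported rule logic: {rule!r}")
    field, op, literal = match.groups()
    compare = _OPS[op]
    threshold = float(literal)

    flagged = []
    for record_id, values in records.items():
        raw = values.get(field, "")
        if raw == "":
            # Assumption: a blank value never satisfies a comparison.
            continue
        # Substitute the record's value for [field], then evaluate.
        if compare(float(raw), threshold):
            flagged.append(record_id)
    return flagged


records = {"1": {"age": "23"}, "2": {"age": "70"}}
flagged = find_discrepancies("[age] > 65", records)
print(flagged)
```

Here record 1 has an age of 23, so 23 > 65 evaluates as false and the record is not reported; record 2 is reported as a discrepancy.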

Checking the 'real-time execution' checkbox for any custom data quality rule will enable the rule to be executed invisibly on data entry forms whenever a user clicks the Save button to create or modify a record. After clicking Save, it will execute all relevant data quality rules invisibly (i.e. behind the scenes) and will display a warning pop-up message if any of the rules have been violated, in which it will display a list of the data quality rules that were violated and also display the fields involved with their data values. If no rules were violated, then it will save the record as usual and not display a pop-up message. Just like the results that are returned when executing rules on the Data Quality page itself, results displayed on data entry forms for 'real-time execution' can be excluded (if desired) so that they will not be displayed again if they are still in violation in the future.

Special functions may also be used within the logic (similar to functions in calculated fields), all of which are listed on the Help & FAQ page. If Data Access Groups exist for this project, then discrepancies will also be stratified by group (assuming the user viewing this page is not in a group). Any user within a Data Access Group will only be able to see the discrepancies for their own group. Also, users who do not have privileges to view or edit data on specific data entry forms will not be able to view data from those forms when it appears in any results on this page as a data quality discrepancy.

If a discrepancy has been found for a given rule, any individual discrepancy in the list of results may be excluded from those results in the future. Excluding a result merely prevents it from being included in the count of discrepancies if the rule is executed again in the future. Excluded results can be accessed again by clicking the 'view' link at the top of the results table for that rule, after which they can be un-excluded, if desired.

Uploading a CSV file of Data Quality Rules can only add new rules to the project. To obtain the format for the upload file, export the CSV file of your existing Data Quality Rules.

Note: You cannot edit or delete existing Data Quality Rules using the CSV file. To do that, click the edit/delete icon next to a given rule.

Select your CSV file of Data Quality Rules to be added:

Displayed below is a preview of all new data quality rules you are about to commit. Please look over the additions, and then approve them by clicking the Upload button.

Note: These new data quality rules will be added to the existing set of data quality rules.

Data Quality Rules

	Rule #	Rule Name	Rule Logic (Show discrepancy only if...)	Real-time execution?	Total Discrepancies	Delete rule?

pd-3	A	Blank values*	-
pd-6	B	Blank values* (required fields only)	-
pd-4	C	Field validation errors (incorrect data type)	-
pd-9	D	Field validation errors (out of range)	-
pd-5	E	Outliers for numerical fields (numbers, integers, sliders, calc fields)**	-
pd-7	F	Hidden fields that contain values***	-
pd-8	G	Multiple choice fields with invalid values	-
pd-10	H	Incorrect values for calculated fields	-
pd-11	I	Fields containing "missing data codes"	-
		Enter descriptive name for new rule (e.g., Participants below age 18)	Enter logic for new rule (e.g., [age] < 18)	Execute in real time on data entry forms?

* The Blank Values rules above automatically exclude fields hidden by branching logic when searching for blank values. If a field is hidden by branching logic on a data entry form or survey, then it is expected that such a field would not have a value. Thus for these cases, the values for those hidden fields will not be classified as missing. Note: This rule will not return any fields that are blank but also have Missing Data Codes. If you wish to find fields with Missing Data Codes, it is recommended to execute Rule I instead.
** The term 'outlier' refers to a value that is more than two standard deviations from the mean.
*** The term 'hidden fields' refers to any fields on a survey or data entry form that are not being displayed because branching logic is hiding them, which assumes that the field's value should be blank/null.
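
The 'outlier' definition in the second footnote (more than two standard deviations from the mean) can be sketched as follows. This is an illustration only: the function name `find_outliers` is hypothetical, and the use of the population standard deviation is an assumption, since the page does not say whether the population or the sample form is used.

```python
from statistics import mean, pstdev


def find_outliers(values, num_sd=2.0):
    """Return the values lying more than num_sd standard deviations from the mean."""
    mu = mean(values)
    # Assumption: population standard deviation; the source does not specify.
    sigma = pstdev(values)
    return [v for v in values if abs(v - mu) > num_sd * sigma]


values = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = find_outliers(values)
print(outliers)  # [50]
```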

You may hide certain results from displaying again and again by excluding them. Simply click the 'exclude' link for a result in the table of discrepancies for a rule, and that result will not be counted next time in the number of discrepancies for that rule, nor will it be displayed in the table of results. Results that have been excluded can be viewed again by clicking the 'view' link at the top of the results table for that rule, in which it will display the number of excluded results if any should exist. Results may have their exclusion status removed by clicking the 'remove exclusion' link in the results table for an excluded result.
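
The effect of excluding a result can be pictured with a small sketch. The data shapes here are hypothetical (each result is a `(record, field)` pair and exclusions are kept in a set); the page does not describe how exclusions are stored.

```python
def visible_discrepancies(results, excluded):
    """Return (count, results) for a rule after dropping excluded results."""
    visible = [result for result in results if result not in excluded]
    return len(visible), visible


# Hypothetical results for one rule: (record ID, field name) pairs.
results = [("101", "age"), ("102", "age"), ("103", "age")]
excluded = {("102", "age")}

count, shown = visible_discrepancies(results, excluded)
print(count, shown)

# Removing the exclusion makes the result count again on the next run.
excluded.discard(("102", "age"))
count_after, _ = visible_discrepancies(results, excluded)
print(count_after)
```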

Data issues found with the Data Quality module can be resolved using the Data Resolution Workflow. After executing any given data quality rule and viewing the results in the pop-up window, you will see a button in the 'Resolve issue' column that allows you and other users in the project to leave comments and/or complete a formal data resolution process for documenting details of the data issue, including the origin of the issue, who resolved the issue, and how it was resolved (if applicable). Once the data resolution process has been opened for an item, it will then appear under 'Resolve Issues', where you can view all open items that need to be resolved.

Enabling the 'real-time execution' functionality for a custom rule is a great way to add more data validation on a data entry form to ensure that data are getting entered correctly *at the moment* they are entered, as opposed to checking the quality of the data retroactively by executing the rules here on this page. Using the 'real-time execution' feature is an excellent way to be proactive about maintaining the quality and integrity of your data.

NOTE: 'Real-time execution' can be enabled only for custom rules, not for the pre-defined rules. Also, the 'real-time execution' functionality does not work on survey pages, nor is it executed when performing data imports (either via the Data Import Tool or via the API). Thus, it only works on data entry forms.
