You can use Analytics to identify instances of duplicate products, customers, or most dimensions on a report. For example, if you want to find duplicate products, you'll have to run the Dusty Inventory report and remove the filters for Is Dusty and Quantity on Hand is greater than zero.
Using Match+Index to find duplicates
-
Add System ID as a dimension to distinguish instances of products with the same name from one another.
-
Then, to rule out case-sensitivity, use the Upper calculation from the cell-based functions to convert all content in the product description to upper case.
-
Click Save, and the text is displayed in capital letters.
-
Next, use Analytics to find the first instance of each description by using the Match calculation from the table-based functions. Start by creating the calculation in brackets.
-
Copy the uppercase and items descriptions.
-
Paste the uppercase and items descriptions.
-
Click Run, and the report will display the row that each description first shows up on. The First Instance numbers are identical to the row numbers.
-
Enter Title and Description and click Save As New.
Identifying instances different from row numbers
After using Match+Index to find duplicates, you can identify where the first instances are not the same as the row ID. This means that the first instance is higher on the table, and, therefore, the current instance is a duplicate.
-
Create an IF statement. If the row is the same as the first instance, return a 0, if not, return a 1. Start the IF statement.
-
Copy and paste the First Instance calculation.
-
Compare it to the row.
-
Click Save, and the report displays the new calculation, highlighting duplicates over others.
-
Enter a Title and Description and click Save As New.
Adding columns for duplicates
Now that you've identified the duplicates, you can pull them out in a meaningful way.
-
To start this, you can have the duplicates rise in sequence. Do this by using the
running_totalfunction from the math functions. -
Use the
running_totalto add the 1s from the duplicates. -
Click Save, and on the report, each new duplicate on the list comes with a new number in sequence.
-
Click Save As New.
Finding which rows have duplicates
Now, with each new duplicate comes a new integer: 1, 2, 3, and so on. This is similar to the sequence for rows. That means you can use Analytics, by row, to find where on the table Duplicate 1 occurs, again, using the Match function.
-
Start it with brackets.
-
First, get Analytics to look at the Row.
-
Then, copy the running total of duplicates.
-
And ask it to look at this sequence.
-
Now. in this example, the report displays that the first duplicate is in row 298, the second duplicate is in row 409, the third is in row 700, and so on.
-
Enter a Title and Description and click Save As New.
Adding the duplicated product descriptions to your report
-
Finally, use the Index function to pull up the descriptions. Start our Index function with brackets.
-
Then, refer to the Descriptions column.
-
For the matched position of rising duplicates.
-
Click Save and you'll get a list of product descriptions that have duplicates.
-
Add a Title, Description, and click Save As New.
Making visualization changes
-
There are a few visual things you can do now to clean up the report. Firstly, remove all the calculations except the final two.
-
Also, hide the first set of dimensions from the Visualization.
This is an important step because the Dimensions on the left-hand side have no bearing on the results on the right.
-
Finally, get Analytics to look for the System IDs for duplicate products, just by copying and pasting the final calculation.
-
Substitute the index column from Description for the System ID.
-
Click Save, and you'll get one report that finds the results you need for your next steps.
This process will work only for the first 5,000 rows of results in Analytics. Remember you can filter results by starting letters of Description, so that if duplicates are not in the same Category or Vendor, you can still identify them.