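To identify duplicates in SAS, you can use PROC SORT with the ‘dupout’ option. ‘dupout’ creates a new dataset containing the observations that PROC SORT removes as duplicates, that is, every repeat after the first occurrence. It works together with either ‘noduprecs’ or ‘nodupkey’, which define what counts as a duplicate.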

data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

proc sort data=example dupout=dups noduprecs;
    by a;
run;

/* dups Dataset */
    a    b
    1    2
    1    2
    2    6
    2    6
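One detail worth knowing: ‘noduprecs’ compares each observation only with the previous one written to the output, so identical records are caught only when the sort places them next to each other. The sketch below uses a small hypothetical dataset, example2, where two identical 1 2 records are separated by 1 3. Sorting by a alone (with PROC SORT's default EQUALS behavior, which keeps the original order within a BY group) leaves them apart. Sorting by _all_ makes every variable part of the sort key, so identical records become adjacent. The output dataset names are only illustrative.

```sas
/* Hypothetical data: the two 1 2 records are not adjacent */
data example2;
input a b;
datalines;
1 2
1 3
1 2
;
run;

/* Sorted by a only: order stays 1 2, 1 3, 1 2, so nothing is flagged */
proc sort data=example2 out=sorted_a dupout=dups_a noduprecs;
    by a;
run;

/* Sorted by every variable: 1 2, 1 2, 1 3, so one 1 2 goes to dups_all */
proc sort data=example2 out=sorted_all dupout=dups_all noduprecs;
    by _all_;
run;
```

With this data, dups_a should end up with 0 observations and dups_all with 1.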

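You can also use the ‘nodupkey’ option to identify duplicates based only on the BY variables. Observations with the same BY values count as duplicates even if their other columns differ, which is why 2 8 ends up in dups below.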

data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

proc sort data=example dupout=dups nodupkey;
    by a;
run;

/* dups Dataset */
    a    b
    1    2
    1    2
    2    6
    2    6
    2    8
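To make ‘nodupkey’ compare more than one column, list every key column on the BY statement. In this minimal sketch (the dataset name dups_ab is just illustrative), an observation only counts as a duplicate when both a and b match, so 2 8 is no longer flagged.

```sas
/* Recreate the sample data used throughout this article */
data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

/* Duplicates are now defined by the combination of a and b */
proc sort data=example dupout=dups_ab nodupkey;
    by a b;
run;

/* dups_ab should hold 1 2, 1 2, 2 6, 2 6 */
proc print data=dups_ab;
run;
```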

When working with data, being able to identify duplicate observations can be very valuable.

PROC SORT is most often used to sort data in SAS, but with the right options you can also use it to identify duplicates.

When using PROC SORT in SAS, you can use the ‘dupout’ option to write duplicate observations to a separate dataset. Alongside it, specify ‘nodupkey’ to treat observations as duplicates when their BY values match, or ‘noduprecs’ to treat them as duplicates only when every variable matches.

Below is a simple example showing how to identify duplicate observations with ‘dupout’ and ‘noduprecs’ in PROC SORT.

data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

proc sort data=example dupout=dups noduprecs;
    by a;
run;

/* dups Dataset */
    a    b
    1    2
    1    2
    2    6
    2    6

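The dups dataset holds exactly the observations that ‘nodup’ (an alias for ‘noduprecs’) removes. For comparison, this is the example dataset after sorting with ‘nodup’ and no ‘dupout’: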

data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

proc sort data=example nodup;
    by a;
run;

/* example After PROC SORT */
    a    b
    1    2
    2    6
    2    8
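In the examples above, PROC SORT replaces example with the de-duplicated data because no OUT= option is given. A minimal sketch, assuming you want to keep example intact: adding OUT= sends the kept observations to a new dataset (the name deduped is only illustrative), while ‘dupout’ still collects the removed ones. Together the two outputs account for every original observation, here 3 in deduped plus 4 in dups for the 7 rows of example.

```sas
data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

/* example is left unchanged; kept rows go to deduped, removed rows to dups */
proc sort data=example out=deduped dupout=dups nodup;
    by a;
run;

/* Count the observations in each dataset to confirm 3 + 4 = 7 */
proc sql;
    select count(*) as n_deduped from deduped;
    select count(*) as n_dups from dups;
    select count(*) as n_example from example;
quit;
```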

You can also identify duplicate observations by their BY values with the ‘nodupkey’ option. The example below shows how to identify duplicates with ‘nodupkey’ and ‘dupout’. Because only a is on the BY statement, 2 8 counts as a duplicate of 2 6.

data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

proc sort data=example dupout=dups nodupkey;
    by a;
run;

/* dups Dataset */
    a    b
    1    2
    1    2
    2    6
    2    6
    2    8

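In the same way, the dups dataset holds exactly the observations that ‘nodupkey’ removes. For comparison, this is the example dataset after sorting with ‘nodupkey’ alone: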

data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

proc sort data=example nodupkey;
    by a;
run;

/* example After PROC SORT */
    a    b
    1    2
    2    6
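‘dupout’ leaves the first occurrence of each duplicate in the main dataset, so in the examples above dups holds two copies of 1 2 rather than all three. If you want every observation that belongs to a duplicated group, one alternative is a DATA step technique rather than a PROC SORT option: sort first, then use the FIRST. and LAST. variables that the BY statement creates. An observation that is both first and last in its group has no duplicates. A minimal sketch, with the dataset names sorted and all_dups chosen only for illustration:

```sas
data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;

/* Sort so that identical a/b combinations are grouped together */
proc sort data=example out=sorted;
    by a b;
run;

/* Keep every observation whose a/b group has more than one member */
data all_dups;
    set sorted;
    by a b;
    if not (first.b and last.b);
run;
```

With the sample data, all_dups should keep the three 1 2 and three 2 6 observations and drop 2 8.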

Hopefully this article has been useful for learning how to identify duplicates in SAS with PROC SORT.

Last Update: March 11, 2024