When using PROC SORT in SAS, you can use the ‘nodupkey’ option to remove observations with duplicate BY values. In other words, you can remove duplicates by key variables.
data example;
input a b;
datalines;
1 2
1 3
1 4
2 5
2 6
2 7
2 8
;
run;
proc sort data=example nodupkey;
by a;
run;
/* example after PROC SORT:
a b
1 2
2 5
*/
Removing duplicates is a common and valuable step when working with data.
In SAS, one frequent case is removing observations that share the same values in certain columns of a dataset.
PROC SORT is mainly used to sort data in SAS, but several of its options also remove duplicates as part of the sort.
With the ‘nodupkey’ option, PROC SORT compares only the BY variables, so two observations count as duplicates whenever their BY values match, even if their other columns differ.
By default, PROC SORT keeps the first observation in each BY group and removes the rest. Which observation counts as first depends on the EQUALS/NOEQUALS option of PROC SORT and the SORTEQUALS system option.
Below is a simple example showing you how to use ‘nodupkey’ with PROC SORT in SAS.
data example;
input a b;
datalines;
1 2
1 3
1 4
2 5
2 6
2 7
2 8
;
run;
proc sort data=example nodupkey;
by a;
run;
/* example after PROC SORT:
a b
1 2
2 5
*/
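Note that without an OUT= option, PROC SORT overwrites the input dataset, so the removed observations are lost. The following is a minimal sketch, not part of the original example, that uses the OUT= and DUPOUT= options to leave ‘example’ unchanged and to save the removed observations for review. The dataset names deduped and removed are hypothetical.

```sas
/* Sketch: keep the input dataset intact and capture the dropped rows. */
/* The output dataset names deduped and removed are hypothetical.      */
proc sort data=example      /* input dataset, left unchanged           */
          out=deduped       /* one observation per value of a          */
          dupout=removed    /* observations dropped by nodupkey        */
          nodupkey;
    by a;
run;

/* Print both results to compare them */
proc print data=deduped;
run;

proc print data=removed;
run;
```

Assuming the default settings, deduped should contain the same two observations shown above, and removed should contain the five observations that were dropped.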
The Difference Between nodupkey and nodup Options When Using PROC SORT in SAS
PROC SORT has many options that change its behavior.
Another useful option is ‘nodup’ (an alias for ‘noduprecs’). ‘nodup’ compares all variables in an observation, not just the BY variables, and removes an observation only when it is identical to the previous observation written to the output.
This is the key difference: ‘nodupkey’ removes duplicates based on the BY variables alone, while ‘nodup’ compares entire observations. Because ‘nodup’ only checks neighboring observations, identical observations that are not next to each other after sorting can remain in the output.
In the example above, if we used ‘nodup’, we would get back the entire dataset since there are no duplicate observations.
data example;
input a b;
datalines;
1 2
1 3
1 4
2 5
2 6
2 7
2 8
;
run;
proc sort data=example nodup;
by a;
run;
/* example after PROC SORT:
a b
1 2
1 3
1 4
2 5
2 6
2 7
2 8
*/
Below is a slightly different dataset with some duplicates. Let’s see what happens when we use ‘nodup’ now.
data example;
input a b;
datalines;
1 2
1 2
1 2
2 6
2 6
2 6
2 8
;
run;
proc sort data=example nodup;
by a;
run;
/* example after PROC SORT:
a b
1 2
2 6
2 8
*/
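Because ‘nodup’ compares each observation only with the previous one written to the output, exact duplicates that are not adjacent after sorting may survive. The sketch below illustrates this with a hypothetical dataset named example2 and shows one common workaround: sorting by all variables with by _all_. It assumes the default behavior in which PROC SORT preserves the input order within each BY group; the output names by_a and by_all are also hypothetical.

```sas
/* Hypothetical data: the two "1 2" rows are separated by "1 3" */
data example2;
    input a b;
    datalines;
1 2
1 3
1 2
;
run;

/* Sorting only by a can leave the two "1 2" rows non-adjacent, */
/* so nodup may keep all three observations                     */
proc sort data=example2 out=by_a nodup;
    by a;
run;

/* Sorting by every variable places identical observations      */
/* next to each other, so nodup can remove the extra copy       */
proc sort data=example2 out=by_all nodup;
    by _all_;
run;
```

With by _all_, the result should contain one copy each of 1 2 and 1 3.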
Hopefully this article has helped you learn how to use the ‘nodupkey’ option with PROC SORT in SAS to remove duplicates by key.