```stata
duplicates report latitude longitude
* If I am correct, then the only number in the "Copies" column will be 1.
* But it looks like I was not correct.

* duplicates tag will create a variable containing the number of "extra" rows:
* if there are N rows with the same (latitude, longitude) pair, the result is N - 1.
* I can then examine the problem more closely, e.g. with `browse if n_duplicates != 0`
* (`duplicates examples` is another option)
duplicates tag latitude longitude, g(n_duplicates)

* If instead I want to know the total number of rows with that combination,
* not the number of "extra" rows, I can do:
by latitude longitude, sort: gen n_rows_in_this_group = _N
```
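To make the difference between the two counts concrete, here is a minimal sketch on a small made-up dataset. The coordinates are hypothetical. Because `clear` drops whatever data is in memory, run it in a fresh session rather than partway through the analysis above. Three rows share the pair (1, 2) and one row is unique. As a result, `duplicates tag` should give 2 and 0, while `_N` should give 3 and 1:

```stata
* Hypothetical toy data: three identical (1, 2) rows and one unique (3, 4) row.
* Warning: -clear- drops the dataset currently in memory.
clear
input latitude longitude
1 2
1 2
1 2
3 4
end

* Number of "extra" rows for each (latitude, longitude) pair
duplicates tag latitude longitude, generate(n_duplicates)

* Total number of rows for each (latitude, longitude) pair
by latitude longitude, sort: gen n_rows_in_this_group = _N

* The two counts always differ by exactly one
assert n_duplicates == n_rows_in_this_group - 1
assert n_duplicates == 2 if latitude == 1 & longitude == 2
assert n_duplicates == 0 if latitude == 3 & longitude == 4

list
```

If any `assert` line fails, Stata stops with an error. Running the block silently to the final `list` therefore confirms the relationship described in the comments.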

For especially large datasets, the [**Gtools**](https://gtools.readthedocs.io/en/latest/index.html) version of the various duplicates commands, [gduplicates](https://gtools.readthedocs.io/en/latest/usage/gduplicates/index.html), is a great option, since Gtools commands are designed to run much faster than their built-in counterparts on large data.