responses2.Rmd

---
output: github_document
---

## Responses to review 2 of stats19

Thanks for the detailed review.
We think all the suggestions make sense, and link to previous discussions about combining the 3 stage (`dl`, `read`, `format`) process into a single function call, which we were thinking of calling `get_stats19()`.
There reason for splitting the process up is to ensure maximum transparency and to give the user control over what the package is doing.
However, as long as it is properly documents, we think the benefits of a `get_stats19()` function will outweigh any possible negatives we can think of so we plan to go ahead and create this function.

We would very much welcome a pull request that address some of the other issues you mention.

Responses to other issues/questions/comments are provided below.
We suspect that some of the comments refer to an older version of the package, which is completely understandable (the package has evolved since the initial submission!) and explains why some of the responses are short / questions.

- [x]Clearly much effort has gone into this package. I greatly support the sentiment behind making these data available in R having done the same for other data sets, myself. This package should be a great benefit to researchers in this area. I really appreciate the slim dependencies. It will make this package much easier to maintain into the future.

Thanks for the comments, we have indeed tried to keep dependencies to a minimum but consider `readr` and `tibble` worthwhile.
`readxl` and `curl` have been demoted to `Suggests`, as detailed in another comment.

- [x]I found the package to be well documented, the vignette is helpful in illustrating some use cases for the data along with how to access it and the code is clear.

Thanks. If you think of other was we can communicate the value of the data, do let us know (I think the second mapping figure could be improved...).

- [ ] Some of the functionality I find to be mildly confusing like downloading and then importing the files into the R session and then formatting. As a user I'd prefer it all in one step, but there are likely use cases I'm not aware of that mean that this is useful so some examples of this would be useful I think.

We have long been planning to add a `get_stats19()` function as per https://github.com/ITSLeeds/stats19/issues/11 
The review comment, combined with further discussion, has triggered us to re-prioritise it.
It's been beneficial to polish each of the component functions first, however, and good to document each stage for maximum transparency, however, so we plan to keep the `dl`, `read` and `format` functions exported.

- [ ] My general comments on the code follow and sections for the DESCRTIPTION and Vignette as well. I've commented quite a bit on grammar and spelling as I think that the polish of a package is important as it lends to the perception of the quality.

Agreed.

--------------------------------------------------------------------------------

- [ ] Per rOpenSci policy, avoid start-up messages, rely on the documentation for
citation information:
https://ropensci.github.io/dev_guide/building.html#recommended-scaffolding.

The guidance is to 'Only use package startup messages when necessary'.
A case can be made that this is necessary.
As with `osmdata`, the package provides access to data that has a license that requires it to be cited.
The `osmdata` load message is as follows:

```{r}
library(osmdata)
```

We fully agree with the reasoning behind remove package startup messages however.
As a compromise, we've shortened the startup from 4 lines to 2:

```{r}
# before:
# Data provided under the conditions of the Open Government License.
# If you use data from this package, mention the source
# (UK Department for Transport), cite the package, and link to:
# www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.
```

```{r}
# after:
library(stats19)
```

- [ ] Avoid long lines >80 chars

running `goodpractice::gp()` found the following lines with > 80 lines:

```
    R/format.R:62:1
    R/format.R:67:1
    R/read.R:141:1
    R/utils.R:167:1
```

All these have been fixed.

- [ ] Inconsistent use of white spaces in code, see `find_file_name()` `if` statements for examples.

- [ ] The package does not pass `R CMD check`. *curl*, *readxl* and *tibble* are all listed as an Imports in DESCRIPTION but not imported from. With *curl* being used in tests, this means it should be in Suggests, I think. The others should be removed.

`curl` is used in the tests and `readxl` is used in the examples.
These have been demoted to `Suggests`.
`tibble` has been removed from the DESCRIPTION file.

- [ ] I don't think it's good form to have an example that won't work on Windows in the help file for `stats19_schema`, from data.R - line 17? Most of what I see there would be better served in a `data_raw` folder showing how the data were created with the documentation actually documenting what the variables are not how they were created, see <http://r-pkgs.had.co.nz/data.html> and for an example, <https://github.com/ropensci/GSODR/tree/master/data-raw>.

- [ ] I would suggest to use proper formatting in help files, when naming packages, e.g. \pkg{stats19} and when referring to documented functions or data, e.g. \code{\link{stats19_schema}}, or with single quotes around abbreviations, e.g. 'DfT'. @ColinFay has an excellent page that outlines the formatting options and when/how to use them, <https://colinfay.me/writing-r-extensions/writing-r-documentation-files.html>. This will greatly enhance the users' experience when using the help files by making them more readable.

- [ ] I also would suggest making use of `@seealso` in documentation. For example, the `dl_stats19()` example works great in the help files, but from there I have the data but it's not in R. Using the `@seealso` you can let the user know about the `read_*()` functions.

- [ ] I downloaded files using `dl_stats19()`, selecting "Casualties", and then ran `read_collisions()` and got 
Is it possible to be more descriptive and say that I've used the wrong `read_*()` based on the file/data found and offer to import it?

- [ ] Missing "." after "e.g." in dl.R on lines 8 and 9, there may be others that I didn't spy.

- [ ] Capitalisation in help files is inconsistent, e.g. lines 123-125 of read.R, parameter descriptions are mixed upper and lower case for first word after parameter itself. This applies to other functions where the descriptions are given in all lower case for other functions or upper case.

- [ ] Testing the functionality, I get this, when I expect it to tell me that `deaths` is not a valid input. But then when I hit escape, I expect it simply exit, not provide a warning message on the way out as well.
```r
dl_stats19(year = 1979, type = "deaths")
No files of that type found for that year.
This will download 240 MB+ (1.8 GB unzipped).
Files identified: Stats19-Data1979-2004.zip

Download now (y = enter, n = esc)? 

Warning message:
In find_file_name(years = year, type = type) :
  Coordinates unreliable in this data.
```

- [ ] I got caught out when using the interactive features. I read "y = enter" but hit "y" thinking that would work as well as hitting "enter", but R cancelled the operation anyway just as if I'd hit "esc"

- [ ] Per a recent conversation with CRAN, you should use `donttest()` rather than `dontrun()` for examples you don't want to be run on CRAN. Then set .travis.yml to run them by using `r_check_args: --as-cran --run-donttest`. **This may not be appropriate in all cases, e.g. interactive functions.**


- [ ] When validating user inputs and using `stop()` it's nice to use `call. = FALSE` to simplify the error message that the user receives.

- [ ] Consider using [`hoardr`](https://ropensci.github.io/hoardr/) for managing user-saved files on disk that aren't in `tempdir()`?

- [ ] When using `utils::download.file()`, you should use `mode = "wb"` or Windows users may end up with corrupted downloads in my experience. `curl::curl_download()` does the same thing but uses more updated ways of doing it and defaults to using a binary mode (wb).

- [ ] I don't think that there is much need for the `Attempting download from` or `Reading in: ` message. If it takes that long, I would suggest to use a progress bar to show progress. But this is just a personal observation.

- [ ] Consider setting up a `pkgdown` site? It's easy to do and you can automate deployment with your Travis-CI so it's less to remember.

#### Tests

- [ ] I'm unclear how the interactive portion of the package functions is handled in testing? There are ways to handle this, but I don't see any implemented and when I run `devtools::test()` I'm asked to provide my own input.

- [ ] Suggest using `skip_on_cran()` since some of the tests can take some time to execute due to download times.

#### DESCRIPTION File

- [ ] In the DESCRIPTION file, Mark's author entry is missing his ORCID.

- [ ] More information in the DESCRIPTION's Description field would be desirable, a link to the data's website or other information to give more background perhaps.

- [ ] STATS19 should be in "'" in DESCRIPTION for CRAN, i.e., 'STATS19', I think.

- [ ] Check spelling in DESCRIPTION file, see: "analysie"

- [ ] The Description should include a link to the DfT website.

- [ ] Language field should be set, `Language: en-GB`

#### README File(s)

- [ ] Use `remotes::install_github()` in place of `devtools::install_github()` in README.

- [ ] The code style is inconsistent in the README.Rmd file in the code chunks, e.g. line 85 is missing space around `=`.

- [ ] The example in the README showing two steps seems necessarily confusing to new users. If there is a good reason for having the raw data in R, document in a vignette why this is useful and show the two-step process, but if normal users won't do this, I wouldn't show it in the quick-start.

- [ ] Line 43 of README uses inconsistent "(" around the phrases with the other two `read_*` function description.

#### Vignette

- [ ] Run spell-check on it.

- [ ] The term "attach"" has a specific meaning in R. Suggest rewording the portion about installation and loading the package to omit the use of "attach", since you're not using `attach()` in the R sense (and really shouldn't use it anyway).

- [ ] I would describe why a user might want or need to install the Development version from GitHub in the vignette. Presumably if they are reading the vignette, they've already installed the package from CRAN (in the future).

- [ ] Try to consistently use `function()` to identify functions in the vignette text. This also means that if/when you use pkgdown to build a site, the functions are linked to the help file.

- [ ] In the introduction, the description of why there are `read_*()` and `format_*()` functions is confusing. To me, it reads as if `format` is only a parameter for `read_*()` in the introduction. I was left wondering why it's documented there or why the `format_*()`s even exist until I reached the end of the vignette.

- [ ] There is a comma out of place in Vignette,

- [ ] Format: Each of the read_*() functions has a format parameter which, when TRUE, adds

should be 

- [ ] Format: Each of the read_*() functions has a format parameter, which, when TRUE, adds 

- [x] I'm unsure about including a package that's not on CRAN in the vignette (`ukboundaries`), something like this should be listed in Suggests, but it's not on CRAN, @sckott do you have any thoughts?

This is a good point.
Fixed, by adding a much more useful dataset, representing the juristictions of polic forces across England and Wales. 

- [ ] The first figures in the `sf` section after the join aren't immediately clear to me. The axis lack labels, I'm not really sure what I'm looking at.

#### Meta

- [ ] The contributing guidelines mention a `pkgdown` website, this does not exist