Clean up Santa Clara County street import #51

Open
7 tasks
1ec5 opened this issue May 23, 2024 · 4 comments

Comments

@1ec5
Member

1ec5 commented May 23, 2024

In 2020, Stanford Libraries republished a public domain dataset of streets throughout Santa Clara County that the Santa Clara County Planning Office used to publish on its open data portal.[1] Last August (2023), @jeffreyameyer imported an extract of this dataset into OHM, 1,886 features in all, ahead of a presentation at Stanford. The import covers the Stanford campus, downtown Mountain View, and some major streets in that part of the San Francisco Peninsula. This issue tracks cleaning up the import to follow OHM norms.

[Image: A map of Stanford and vicinity with 1,886 street features highlighted.]

The dataset has a date_creat field, but this only indicates when the feature was added to the database in ArcGIS, generally between 2004 and 2008. By contrast, the import tagged every street as if it started on March 1 in various years in the 19th and 20th centuries.[2] These seem to be estimates based on some old maps, but the placeholder month and day leave me a bit uncertain about that.

Aside from dates, most of the other attributes need to be cleaned up. For example, on this stretch of San Antonio Road:

  • Delete group=t. I have no idea what it means,[3] but it doesn’t appear in the dataset and isn’t an established OHM tag.
  • Document the mapping from functional classification codes (from the fcc field) to highway=* tags, for the benefit of future imports.
  • Expand abbreviations in name=* based on the streetpref, streetsuff, and streettype fields.
  • Replace oneway=ft and oneway=tf with oneway=yes and oneway=-1, respectively. (Better yet, delete oneway=tf and reverse those ways.)
  • Delete roadlabel=*, which is redundant to name=* but less polished.
  • Delete streetname=*, streetpref=*, streetsuff=*, and streettype=*. (Alternatively, propose a more structured tagging scheme for street names that isn’t specific to this dataset.)
  • Replace surface=asphalt with surface=paved. The surface field’s PAV value doesn’t specify the kind of pavement, and I don’t think we’d be able to track minute changes in pavement material over time without massive effort.
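
The issue doesn't say which code list the fcc field uses, so the following is only a hypothetical starting point for that documentation. It assumes the values are U.S. Census Bureau TIGER-style Census Feature Class Codes (CFCC, such as A41), and the highway=* choices are one possible reading rather than the mapping the import actually used; CFCC_TO_TAGS and tags_for_fcc are made-up names.

```python
from typing import Optional

# HYPOTHETICAL mapping, assuming fcc holds TIGER-style CFCC codes (e.g. "A41").
# The highway=* values are one possible reading, not the import's mapping.
CFCC_TO_TAGS: dict[str, dict[str, str]] = {
    "A1": {"highway": "motorway"},        # primary highway with limited access
    "A2": {"highway": "trunk"},           # primary road without limited access (trunk vs. primary is a judgment call)
    "A3": {"highway": "secondary"},       # secondary and connecting road
    "A4": {"highway": "residential"},     # local, neighborhood, and rural road
    "A63": {"highway": "motorway_link"},  # access ramp
    "A73": {"highway": "service", "service": "alley"},
    "A74": {"highway": "service", "service": "driveway"},  # driveway or private road
}


def tags_for_fcc(fcc: Optional[str]) -> dict[str, str]:
    """Return highway tags for an fcc code, or {} if the code is unmapped."""
    if not fcc:
        return {}
    code = fcc.strip().upper()
    # Prefer an exact three-character code (A63), then fall back to its
    # two-character class (A41 -> A4).
    for key in (code[:3], code[:2]):
        if key in CFCC_TO_TAGS:
            return dict(CFCC_TO_TAGS[key])
    return {}  # unmapped: leave for manual review
```

Keeping the table as plain data means the same structure could double as the published documentation for future imports.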

Footnotes

  1. This dataset has been superseded by a continuously updated Road Centerlines dataset, also in the public domain.

  2. February 29 exists only in leap years, so for any non-leap year, every software package in our stack interprets start_date=YYYY-02-29 as March 1 of that year.

  3. Unfortunately, the original dataset is no longer available online, and although it came with an FGDC metadata file, this file says nothing about each attribute.

@1ec5
Member Author

1ec5 commented May 23, 2024

> These seem to be estimates based on some old maps, but the placeholder month and day leave me a bit uncertain about that.

By the way, #47 has an idea for dating streets with more certainty back to 1992. But if we want to stick with this outdated county dataset, we should replace the start_date=* and start_date:source=arbitrary tags with a more pessimistic start_date=* and start_date:edtf=* based on the date_creat field. Then mappers can selectively work their way backward through time, with the ability to choose between this import or a different source. (For example, it should be possible to source state-maintained highways more rigorously without relying on this import.)
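
A minimal sketch of that replacement, assuming date_creat has already been parsed into a Python date and that EDTF's "one of a set" notation ([..YYYY-MM-DD], meaning "this date or some earlier one") is an acceptable way to express "existed by the time it entered the database". The function name is hypothetical, and it only touches ways still carrying the arbitrary placeholder.

```python
from datetime import date


def pessimistic_start_tags(tags: dict[str, str], date_creat: date) -> dict[str, str]:
    """Swap an arbitrary placeholder start date for an "on or before date_creat" range."""
    if tags.get("start_date:source") != "arbitrary":
        return dict(tags)  # leave dates that came from a real source alone
    # Drop the placeholder date and its "arbitrary" source marker.
    out = {k: v for k, v in tags.items() if k not in ("start_date", "start_date:source")}
    iso = date_creat.isoformat()
    # The street existed no later than the day it was digitized, so the
    # latest plausible start is that day; EDTF records "or earlier".
    out["start_date"] = iso
    out["start_date:edtf"] = f"[..{iso}]"
    return out
```

Mappers who later find an older source would then overwrite both tags for that way, which is the "work backward through time" step described above.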

@1ec5
Member Author

1ec5 commented May 23, 2024

@jeffreyameyer do you remember how the start dates came about? Were the years for real but with placeholder months and days?

@jeffreyameyer
Collaborator

jeffreyameyer commented May 23, 2024

OK, clearly I've left some incomplete work. My apologies! But I do think things can be cleaned up quickly. Please see my notes and comments below.

The years were largely set by choosing an arbitrary (sorry!) old year, then slowly comparing against old maps and adjusting backward as the maps got older. Roads that stopped showing up as you went back in time didn't get older years; those that did show up continued to get older years. This is not a foolproof method, but it is directionally useful, and edtf tags are indeed a better solution than the "arbitrary" markings.
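
The comparison was done by hand, but the rule it applies can be written down. A minimal sketch, assuming each road has a list of hand-recorded (map year, road shown?) pairs; the function name is hypothetical. The start year is the oldest map in the unbroken run, counting back from the newest map, that still shows the road.

```python
from typing import Optional, Sequence


def earliest_attested_year(observations: Sequence[tuple[int, bool]]) -> Optional[int]:
    """Return the oldest map year in the unbroken run of maps showing the road.

    observations: (map_year, road_shown) pairs in any order.
    Returns None if the newest map doesn't show the road (or there are no maps).
    """
    earliest = None
    # Start at the newest map and step backward in time.
    for year, shown in sorted(observations, reverse=True):
        if not shown:
            break  # the road drops off here, so stop looking further back
        earliest = year
    return earliest
```

For example, with maps from 1990 and 1960 showing a road, a 1940 map omitting it, and a 1920 map showing something in the same place, this returns 1960: the 1940 gap stops the walk, just as roads that stopped showing up didn't get older years.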

  • Delete group=t. This was a personal tag used for grouping related items I was working through, as well as filtering them out while editing. Also deleted: group=b
  • Document the mapping from functional classification codes (from the fcc field) to highway=* tags, for the benefit of future imports.
  • Expand abbreviations in name=* based on the streetpref, streetsuff, and streettype fields. Many of these fields were set by just changing the all-caps field name in the source data to lowercase in OSM and would have benefited from some pre-processing in QGIS.
  • Replace oneway=ft and oneway=tf with oneway=yes and oneway=-1, respectively. (Better yet, delete oneway=tf and reverse those ways.) and then tag with oneway=yes?
  • Delete roadlabel=*, which is redundant to name=* but less polished.
  • Delete streetname=*, streetpref=*, streetsuff=*, and streettype=*. (Alternatively, propose a more structured tagging scheme for street names that isn’t specific to this dataset.)
  • Replace surface=asphalt with surface=paved. The surface field’s PAV value doesn’t specify the kind of pavement, and I don’t think we’d be able to track minute changes in pavement material over time without massive effort.
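
The pre-processing mentioned above could be roughly this, done in one pass so that name=* is rebuilt before the structured fields are deleted. This is a sketch under several assumptions: streetpref and streetsuff are taken to be directional prefix and suffix, the abbreviation tables are a hypothetical subset, and clean_tags and expand are made-up names.

```python
from typing import Optional

# HYPOTHETICAL abbreviation tables; check them against the dataset's actual values.
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}
STREET_TYPES = {"AVE": "Avenue", "BLVD": "Boulevard", "CT": "Court", "DR": "Drive",
                "LN": "Lane", "RD": "Road", "ST": "Street", "WAY": "Way"}

# Tags slated for deletion in the checklist.
DROP_KEYS = {"group", "roadlabel", "streetname", "streetpref", "streetsuff", "streettype"}


def expand(value: Optional[str], table: dict[str, str]) -> str:
    """Expand a known abbreviation; otherwise just title-case the value."""
    value = (value or "").strip()
    return table.get(value.upper(), value.title())


def clean_tags(tags: dict[str, str]) -> dict[str, str]:
    out = {k: v for k, v in tags.items() if k not in DROP_KEYS}
    # Rebuild name=* from the structured fields (ASSUMPTION: pref/suff are
    # directionals); ways without a streetname keep their existing name.
    if (tags.get("streetname") or "").strip():
        parts = [
            expand(tags.get("streetpref"), DIRECTIONS),
            tags["streetname"].strip().title(),
            expand(tags.get("streettype"), STREET_TYPES),
            expand(tags.get("streetsuff"), DIRECTIONS),
        ]
        out["name"] = " ".join(p for p in parts if p)
    # The source's PAV value only says "paved", not which material.
    if out.get("surface") == "asphalt":
        out["surface"] = "paved"
    return out
```

For instance, a way tagged streetname=SAN ANTONIO, streettype=RD would come out as name=San Antonio Road. Since str.title() turns MCKINLEY into "Mckinley", the rebuilt names would still need a review pass.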

@1ec5
Member Author

1ec5 commented May 24, 2024

> (Better yet, delete oneway=tf and reverse those ways.) and then tag with oneway=yes?

Yes, both the TF and FT values appear to indicate one-way streets. The dataset represents a two-way street by setting the field to null.
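
Given that both codes mean one-way and a null value means two-way, the conversion could look like this. It's a sketch that assumes a way is represented as a plain list of node IDs plus a tag dict (a hypothetical stand-in for however the edit is actually scripted), and normalize_oneway is a made-up name. In a real editor, reversing a way should also flip direction-dependent tags such as *:forward and *:backward, which this sketch doesn't handle.

```python
def normalize_oneway(node_ids: list[int], tags: dict[str, str]) -> tuple[list[int], dict[str, str]]:
    """Return (nodes, tags) with the dataset's FT/TF oneway codes converted.

    "ft" means traffic follows the way's direction, so it becomes oneway=yes.
    "tf" means traffic runs against it, so the way is reversed and also
    tagged oneway=yes. Ways without a oneway tag (two-way) pass through.
    """
    out = dict(tags)
    nodes = list(node_ids)  # copy so the caller's list isn't mutated
    value = out.get("oneway", "").lower()
    if value == "tf":
        nodes.reverse()  # flip the way so traffic now follows its direction
        out["oneway"] = "yes"
    elif value == "ft":
        out["oneway"] = "yes"
    return nodes, out
```

Reversing TF ways instead of tagging oneway=-1 leaves every one-way street with the same, simpler tag, which is the "better yet" option from the checklist.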
