Skip to content

Refactoring: Uri object to replace string

John Erik Halse edited this page Feb 3, 2016 · 1 revision

#Change API to use UsableURI wherever possible instead of strings.

##Motivation Most places in Open Wayback where a URI is expected, it is represented as a string. This makes the API fragile because you never know if the URI is valid, meaning that every class ideally needs to validate the URI before using it. If so, lots of code is duplicated across classes and processing time wasted by validating a URI that is probably validated before. If it is not validated everywhere, then we’re not able to ensure that validation or normalization of the URI is done the same for every possible configuration.

Another problem is that there is a chance that changes to how URIs are handled, are not updated everywhere they should. One such example could be support for international domain names (IDN). The rules for rewriting UTF-8 host names to puny-code and back, should be implemented once and be available in every class using it.

##Changes The proposal is to replace most places in the API where a URI is expected as a String to use UsableURI instead. This will probably also require additions to the UsableURI class to support some of the standard rewriting and splitting of URIs. Maybe the UsableURI class should add more support for SURTs as well.

I also question the use of the factory-pattern for generating UsableURI instances. If UsableURI was immutable this pattern would be ok, but it isn't. A lot of set-methods inherited from org.apache.commons.httpclient.URI makes it possible to alter the URI without enforcing the logic in the factory class. It would be better to override methods in UsableURI to make sure values always are parsed the same way. If we ommit the factory class, it would probably be a good idea to make the constructor private and add static instantiation methods to avoid confusion when constructing UsableURI from uri string or SURT string.

public class UsableURI extends LaxURI implements CharSequence, Serializable {
  public static UsableURI fromUriString(String uri)
  public static UsableURI fromUri(java.net.URI uri)
  public static UsableURI fromSurtString(String surt)
  ...
}

If we would like to keep the factory, then UsableURI should be immutable. This could be achieved in two ways:

  • Override all setter-methods with private no-ops to make UsableURI immutable.
  • Let UsableURI be an interface and only expose methods which doesn't change the value without creating new instances.