Perusing Pair (and BiStream)
It's 2020 and I'm still talking about Pairs. ;-)
There are plenty of Stack Overflow questions asking about a generic Pair class in Java:
- https://stackoverflow.com/questions/521171/a-java-collection-of-value-pairs-tuples
- https://stackoverflow.com/questions/7316571/java-pairt-n-class-implementation
- https://stackoverflow.com/questions/5303539/didnt-java-once-have-a-pair-class/16089354
- https://stackoverflow.com/questions/24328679/does-java-se-8-have-pairs-or-tuples/24329071
And at least 4 or 5 libraries provide anything from a whole slew of tuple implementations down to at least a Pair class.
At Google, we are largely biased against Pair (and all of those tuple types).
Where I had to use them, my own Flume code has disgusted me enough (both Java and C++). Some extreme examples can be found in this post
Code using these nested Pair classes also tends to read horribly:
emitFn.emit(in.getFirst(), Iterables.getOnlyElement(in.getSecond().getFirst()));
...
return String.format("%s (SourceId=%d)\t Status:%s\t AllocationCount:%d",
    getNetworkName(input.getFirst().getFirst()),
    input.getFirst().getFirst(),
    input.getFirst().getSecond(),
    input.getSecond());

if ((next_mid_iterator->second.first.first > mid_iterator->second.first.first)
    || (next_mid_iterator->second.first.second <= mid_iterator->second.first.second)) {
  ...
}
So what exactly is wrong about Pair? I'd like to think along two aspects.
People have different tastes. One may opt for the first/second terminology, or _1/_2, or left/right, car/cdr, foo/bar, a/b, yin/yang, head/tail, night/day, gandalf/saruman.
Whatever names you choose, they have one thing in common: they don't mean anything.
And that is why the above Pair usage code is horrible. "second.first.first" is likely the first second best thing since and before the second goto was first invented.
If you can come up with a logical meaning for these "second.first.first" thingies, and you want yourself weeks later to understand the code, by all means try to name them what they are, for example: value.name.first_name.
Granted, Java didn't make it easy to create proper classes with proper field names. But I think programmers are also partially responsible, because oftentimes you don't really need hashCode()/equals()/getters/setters if you are just trying to have a place to define fields and document their semantics/invariants etc. There is nothing wrong with the following simple class:
class Name {
  public final String firstName;
  public final String lastName;

  Name(String firstName, String lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
  }
}
"But that exposes the fields as public. Breaks encapsulation!" you say. Yes, you are right that it doesn't provide abstractions through getters. But YAGNI anyone? Plus, consider this:
- Neither does Pair<String, String> provide any encapsulation. It's just worse, because it not only exposes public access, it even sticks the <String, String> thing on your nose and forces you to carry it around wherever you go.
- Modern IDEs have the "Encapsulate Field" auto refactoring. If it turns out you need to wrap the fields behind getters, great. It means two things:
  - You made a good choice not using Pair in the first place, because by now it would have been more difficult to add any abstraction.
  - You need to use the "Encapsulate Field" auto refactoring. It will take care of updating your callers.
The YAGNI optimism only goes so far: it holds for locally-used, private/inner classes where you know you won't need to store the object as a hash map key or in a Set. It won't work if you justifiably need equals()/hashCode() (or in C++, many might need operator==, operator< etc.).
For these other uncooperative use cases, code generators like AutoValue give a way out so we can create proper value classes almost as easily as we had wished:
@AutoValue
abstract class Name {
  public abstract String firstName();
  public abstract String lastName();

  static Name of(String firstName, String lastName) {
    return new AutoValue_Name(firstName, lastName);
  }
}
(In the not-too-distant future, we may even be able to use tuples)
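To make concrete what "justifiably need equals()/hashCode()" costs without a generator, here is a minimal hand-written sketch of the same Name value class. It is not AutoValue's actual generated code; the null checks, the toString() format, and the NameDemo class are assumptions added for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hand-written value class; a generator such as AutoValue would derive this boilerplate. */
final class Name {
  final String firstName;
  final String lastName;

  Name(String firstName, String lastName) {
    this.firstName = Objects.requireNonNull(firstName);
    this.lastName = Objects.requireNonNull(lastName);
  }

  @Override public boolean equals(Object obj) {
    if (!(obj instanceof Name)) {
      return false;
    }
    Name that = (Name) obj;
    return firstName.equals(that.firstName) && lastName.equals(that.lastName);
  }

  @Override public int hashCode() {
    return Objects.hash(firstName, lastName);
  }

  @Override public String toString() {
    return "Name{firstName=" + firstName + ", lastName=" + lastName + "}";
  }
}

/** Hypothetical demo class: uses Name as a hash map key. */
class NameDemo {
  public static void main(String[] args) {
    Map<Name, String> nicknames = new HashMap<>();
    nicknames.put(new Name("Ada", "Lovelace"), "countess");
    // A distinct but equal instance finds the same entry.
    System.out.println(nicknames.get(new Name("Ada", "Lovelace")));
  }
}
```

Every field added later has to be threaded through equals(), hashCode() and toString() by hand, which is exactly the chore a generator removes.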
To be fair, even with the Pair class, this problem could be alleviated in the age of lambda. For example, why not add a method like:
class Pair<A, B> {
  public <R> R as(BiFunction<? super A, ? super B, R> output) {
    return output.apply(first, second);
  }
}
Code like the following isn't hard to read:
parseUserNameAndDomain("foo@gmail.com")
.as((userId, domain) -> ...);
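For readers who want to try the idea, the following is a minimal runnable sketch. The Pair class here is a toy written for this example and does not belong to any particular library; the body of parseUserNameAndDomain (splitting at the first '@') and the PairAsDemo class are also assumptions.

```java
import java.util.function.BiFunction;

/** Toy Pair, only to illustrate the as() idea. */
final class Pair<A, B> {
  private final A first;
  private final B second;

  Pair(A first, B second) {
    this.first = first;
    this.second = second;
  }

  /** Hands both values to the caller's function, so the caller gets to name them. */
  <R> R as(BiFunction<? super A, ? super B, ? extends R> output) {
    return output.apply(first, second);
  }
}

class PairAsDemo {
  /** Hypothetical parser: splits "user@domain" at the first '@'. */
  static Pair<String, String> parseUserNameAndDomain(String email) {
    int at = email.indexOf('@');
    if (at < 0) {
      throw new IllegalArgumentException("not an email address: " + email);
    }
    return new Pair<>(email.substring(0, at), email.substring(at + 1));
  }

  public static void main(String[] args) {
    String summary = parseUserNameAndDomain("foo@gmail.com")
        .as((userId, domain) -> userId + " at " + domain);
    System.out.println(summary);  // foo at gmail.com
  }
}
```

The call site never mentions first or second; the lambda parameters carry the names.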
The type Pair<Person, Person> is both under-specified and over-specified:
- It underspecifies the relationship between the two Person objects. Are they husband/wife? doctor/patient? interviewer/interviewee?
- It overspecifies the implementation details. If for example it represents a marriage between two persons, I need a Marriage type, not a type that hides its identity but taunts me with a riddle: "Hey, I have two Person objects in me, guess what I am?"
But before I go any further, I'd like to clarify some easy confusions first.
No. When there is just one thing, the "relationship" argument is moot. Relationship is at least between two things.
That said, it can still be bad if you are over-using primitives to represent higher-level logical entities, especially if this logical entity will be used in multiple places. For example, if your code tends to use "user id" concept over and over again, it's probably a better idea to create a UserId type. Don't use String just because the user id happens to be represented/encoded as a String.
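A UserId type along these lines could be as small as the sketch below. The validation rule, the of() factory, and the profileUrl() helper are hypothetical, chosen only to show how the type documents intent at call sites.

```java
/** Hypothetical wrapper type: signatures now say "user id" instead of just "String". */
final class UserId {
  private final String value;

  private UserId(String value) {
    this.value = value;
  }

  /** Rejects null or empty ids; this validation rule is an assumption for illustration. */
  static UserId of(String value) {
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("user id must not be empty");
    }
    return new UserId(value);
  }

  String value() {
    return value;
  }

  @Override public boolean equals(Object obj) {
    return obj instanceof UserId && value.equals(((UserId) obj).value);
  }

  @Override public int hashCode() {
    return value.hashCode();
  }

  @Override public String toString() {
    return value;
  }
}

class UserIdDemo {
  /** The parameter type tells the reader which kind of String is expected. */
  static String profileUrl(UserId id) {
    return "/users/" + id.value();
  }

  public static void main(String[] args) {
    System.out.println(profileUrl(UserId.of("jdoe")));  // /users/jdoe
  }
}
```

Passing an arbitrary String where a UserId is expected now fails at compile time rather than at some later lookup.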
In a Map, the relationship between the two types is defined. They are keys and the values associated with the key.
And yes, while Map<String, String> may be okay as an internal implementation detail when it's used once or twice, with the context clearly in scope, it can be bad if it ever gets used across packages or referenced multiple times, because not knowing which String means what can be a readability problem. You'd be better off with Map<UserId, UserId> if they are some kind of user id mapping, or wrap the Map inside a higher-level abstraction class.
Unlike Map, BiStream doesn't define a relationship between the two types. So unless seeing the two types gives the readers an immediate clue of the relationship (like in BiStream<UserId, User>), BiStream<Integer, Integer> would be bad.
That said, BiStream typically forms a chain of operations, where at each line the BiStream's type changes. The BiStream<Integer, Integer> type may be only an invisible intermediary type, like this:
BiStream.zip(indexesFrom(0), visits) // BiStream<Integer, Integer>
.map((index, visit) -> ...)
...
.collect(...);
If the context is clear enough that we don't even bother spelling out the type explicitly, it can't hurt us.
There are situations where a semantic-free pair type is precisely what's needed. This happened in a real-life project. We had a layered application with a bunch of domain types (Order, LineItem etc.) and then a bunch of corresponding DTO types (OrderDto, LineItemDto). At the DTO -> Domain boundary, the translation code sometimes needs to accept or return a list of Pair<OrderDto, Order> objects.
There was no relationship untold upon seeing Pair<OrderDto, Order>; and that this thing has a pair of OrderDto and Order is exactly the semantics we needed to convey.
In such a case, I'd use either BiStream<FooDto, Foo> or BiCollection<FooDto, Foo>, depending on whether I need it to be streamed once or accessed repeatedly.
Going back to the root of the problem, people need Pair because they have methods that need to return two values.
As argued above, some of these cases are not really two-valued binary use cases, because what happens to be two things today may evolve to 3 or 4 things tomorrow. What the programmer really needs is a higher-level abstraction. For example, you'll want to return a Marriage object, not a Pair<Person, Person> object, because in the future Marriage may evolve to also need other information such as, say, Asset? Jurisdiction? Diamond? Anniversary? ExpirationDate? :)
True two-valued binary use cases do exist though, you know, when you'd have a hard time coming up with any better type names than FooAndBar, DomainAndDto etc. Some other real world examples I can think of:
- Split a flag string in the form of "--mode=dry_run" into the flag name/value pair.
- Calculate the quotient and remainder of a division.
- Find a list element and its current index in the list.
So here's my suggestion if you really need to return two values:
When dealing with a collection or a stream of these pairs, use BiStream or BiCollection.
Or else, consider using the lambda approach (similar to what JDK 12's Collectors.teeing() API does):
/** Splits string by delimiter */
<R> R split(..., BiFunction<String, String, R> output) {
  ...
  return output.apply(before, after);
}

/** Finds the element and its index */
<R> R locate(Id id, BiFunction<Integer, ? super T, R> output) {
  ...
  return output.apply(index, element);
}
The benefit is that the callers can call the method to create the appropriate type as it fits:
Flag flag = split(..., Flag::new);
locate(id, (index, element) -> ...);
Even when the caller has no appropriate type to use, they can use Pair or Map.Entry easily anyway:
Map.Entry<Integer, V> found = locate(id, Map::entry);
Pair<String, String> nameValue = split(..., Pair::new);
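Putting the pieces together, here is a self-contained sketch of the lambda approach applied to the "--mode=dry_run" flag example. The Flag class, the Flags holder, and the exact parsing rule (strip a leading "--", split at the first '=') are assumptions made for this illustration. AbstractMap.SimpleImmutableEntry::new is used so the code also compiles on Java 8; on Java 9 or later, Map::entry works the same way.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical caller-side type with meaningful field names. */
final class Flag {
  final String name;
  final String value;

  Flag(String name, String value) {
    this.name = name;
    this.value = value;
  }
}

class Flags {
  /** Splits "--name=value" at the first '=' and passes both parts to output. */
  static <R> R split(String flag, BiFunction<String, String, R> output) {
    int eq = flag.indexOf('=');
    if (!flag.startsWith("--") || eq < 0) {
      throw new IllegalArgumentException("malformed flag: " + flag);
    }
    return output.apply(flag.substring(2, eq), flag.substring(eq + 1));
  }

  public static void main(String[] args) {
    // The caller picks the result type; split() never needs to know about it.
    Flag flag = split("--mode=dry_run", Flag::new);
    System.out.println(flag.name + " -> " + flag.value);  // mode -> dry_run

    // No suitable type at hand? Fall back to Map.Entry.
    Map.Entry<String, String> entry =
        split("--mode=dry_run", AbstractMap.SimpleImmutableEntry::new);
    System.out.println(entry.getKey() + " -> " + entry.getValue());  // mode -> dry_run
  }
}
```

The two-valued method itself stays free of any pair type; whether a Flag, a Map.Entry, or a plain String comes out is the caller's decision.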
As a bonus, all two-valued methods with such signature can be method referenced and used together with BiStream. For example, one can split a stream of strings using:
ImmutableListMultimap<String, String> keyValues = readLines().stream()
.collect(toBiStream(Substring.first('=')::splitThenTrim))
.collect(ImmutableListMultimap::toImmutableListMultimap);
So, I guess the question is: why not?