Pimp RDD with minBy/maxBy and key/value RDD with min/maxByKey/Value
xavierguihot committed Jun 24, 2018
1 parent 6de5724 commit c77a945
Showing 8 changed files with 245 additions and 2 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -133,6 +133,9 @@ rdd.toList // equivalent to rdd.collect.toList - alias: rdd.collectAsList
rdd.toMap // RDD((1, "a"), (2, "b"), (2, "c")) => Map((1, "a"), (2, "c"))
rdd.duplicates // RDD(1, 3, 2, 1, 7, 8, 8, 1, 2) => RDD(1, 2, 8)
rdd.reduceWithCount // RDD("a", "b", "c", "a", "d", "a", "c") => RDD(("a", 3), ("b", 1), ("c", 2), ("d", 1))
rdd.maxBy(_._2) // RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")) => (2, "c") or (4, "c")
rdd.minBy(_._2) // RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")) => (1, "a")
rdd.maxByKey; rdd.minByKey; rdd.maxByValue, ... // RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")).maxByKey => (4, "c")

```
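
The `(2, "c") or (4, "c")` result documented for `maxBy(_._2)` indicates that ties are not resolved deterministically. Below is a minimal sketch of why that can happen, assuming the helpers are built on a pairwise reduce (an assumption: the `SparkHelper.scala` change itself is not shown in this excerpt). `ReduceMaxBySketch` and its members are hypothetical names, and a plain `Seq` stands in for the RDD so the snippet runs without Spark.

```scala
// Hypothetical sketch: maxBy/minBy expressed as a pairwise reduce.
// A local Seq stands in for the RDD; RDD.reduce accepts the same (T, T) => T function.
object ReduceMaxBySketch {

  // Keep whichever of two elements has the larger projection f(_).
  def maxBy[T, U](elems: Seq[T])(f: T => U)(implicit ord: Ordering[U]): T =
    elems.reduce((x, y) => if (ord.gteq(f(x), f(y))) x else y)

  // Keep whichever of two elements has the smaller projection f(_).
  def minBy[T, U](elems: Seq[T])(f: T => U)(implicit ord: Ordering[U]): T =
    elems.reduce((x, y) => if (ord.lteq(f(x), f(y))) x else y)

  // Same sample data as the README examples.
  val pairs: Seq[(Int, String)] = Seq((1, "a"), (2, "c"), (3, "b"), (4, "c"))

  val smallest: (Int, String) = minBy(pairs)(_._2) // (1, "a")
  val largest: (Int, String) = maxBy(pairs)(_._2)  // (2, "c"): Seq.reduce runs left to right
}
```

On a local `Seq` the reduce runs left to right, so the first of the tied elements wins. On an RDD the order in which partition results are combined is not fixed, which would explain why either `(2, "c")` or `(4, "c")` may come back.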

76 changes: 76 additions & 0 deletions docs/com/spark_helper/SparkHelper$$PairRDDExtensions.html
@@ -347,6 +347,82 @@ <h4 class="signature">
</a>
</span>
<div class="fullcomment"><dl class="attributes block"> <dt>Definition Classes</dt><dd>Any</dd></dl></div>
</li><li name="com.spark_helper.SparkHelper.PairRDDExtensions#maxByKey" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="maxByKey()(implicitord:Ordering[K]):(K,V)"></a>
<a id="maxByKey()(Ordering[K]):(K,V)"></a>
<h4 class="signature">
<span class="modifier_kind">
<span class="modifier"></span>
<span class="kind">def</span>
</span>
<span class="symbol">
<span class="name">maxByKey</span><span class="params">()</span><span class="params">(<span class="implicit">implicit </span><span name="ord">ord: <span class="extype" name="scala.Ordering">Ordering</span>[<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.K">K</span>]</span>)</span><span class="result">: (<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.K">K</span>, <span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.V">V</span>)</span>
</span>
</h4><span class="permalink">
<a href="../../index.html#com.spark_helper.SparkHelper$$PairRDDExtensions@maxByKey()(implicitord:Ordering[K]):(K,V)" title="Permalink" target="_top">
<img src="../../lib/permalink.png" alt="Permalink" />
</a>
</span>
<p class="shortcomment cmt">Returns the element of this RDD with the largest key as defined by the
implicit Ordering[K].</p><div class="fullcomment"><div class="comment cmt"><p>Returns the element of this RDD with the largest key as defined by the
implicit Ordering[K].</p><pre>RDD((<span class="num">1</span>, <span class="lit">"a"</span>), (<span class="num">2</span>, <span class="lit">"c"</span>), (<span class="num">3</span>, <span class="lit">"b"</span>), (<span class="num">4</span>, <span class="lit">"c"</span>)).maxByKey <span class="cmt">// (4, "c")</span></pre></div><dl class="paramcmts block"><dt>returns</dt><dd class="cmt"><p>the element with the largest key</p></dd></dl></div>
</li><li name="com.spark_helper.SparkHelper.PairRDDExtensions#maxByValue" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="maxByValue()(implicitord:Ordering[V]):(K,V)"></a>
<a id="maxByValue()(Ordering[V]):(K,V)"></a>
<h4 class="signature">
<span class="modifier_kind">
<span class="modifier"></span>
<span class="kind">def</span>
</span>
<span class="symbol">
<span class="name">maxByValue</span><span class="params">()</span><span class="params">(<span class="implicit">implicit </span><span name="ord">ord: <span class="extype" name="scala.Ordering">Ordering</span>[<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.V">V</span>]</span>)</span><span class="result">: (<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.K">K</span>, <span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.V">V</span>)</span>
</span>
</h4><span class="permalink">
<a href="../../index.html#com.spark_helper.SparkHelper$$PairRDDExtensions@maxByValue()(implicitord:Ordering[V]):(K,V)" title="Permalink" target="_top">
<img src="../../lib/permalink.png" alt="Permalink" />
</a>
</span>
<p class="shortcomment cmt">Returns the element of this RDD with the largest value as defined by the
implicit Ordering[V].</p><div class="fullcomment"><div class="comment cmt"><p>Returns the element of this RDD with the largest value as defined by the
implicit Ordering[V].</p><pre>RDD((<span class="num">1</span>, <span class="lit">"a"</span>), (<span class="num">2</span>, <span class="lit">"c"</span>), (<span class="num">3</span>, <span class="lit">"b"</span>), (<span class="num">4</span>, <span class="lit">"c"</span>)).maxByValue <span class="cmt">// (2, "c") or (4, "c")</span></pre></div><dl class="paramcmts block"><dt>returns</dt><dd class="cmt"><p>the element with the largest value</p></dd></dl></div>
</li><li name="com.spark_helper.SparkHelper.PairRDDExtensions#minByKey" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="minByKey()(implicitord:Ordering[K]):(K,V)"></a>
<a id="minByKey()(Ordering[K]):(K,V)"></a>
<h4 class="signature">
<span class="modifier_kind">
<span class="modifier"></span>
<span class="kind">def</span>
</span>
<span class="symbol">
<span class="name">minByKey</span><span class="params">()</span><span class="params">(<span class="implicit">implicit </span><span name="ord">ord: <span class="extype" name="scala.Ordering">Ordering</span>[<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.K">K</span>]</span>)</span><span class="result">: (<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.K">K</span>, <span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.V">V</span>)</span>
</span>
</h4><span class="permalink">
<a href="../../index.html#com.spark_helper.SparkHelper$$PairRDDExtensions@minByKey()(implicitord:Ordering[K]):(K,V)" title="Permalink" target="_top">
<img src="../../lib/permalink.png" alt="Permalink" />
</a>
</span>
<p class="shortcomment cmt">Returns the element of this RDD with the smallest key as defined by the
implicit Ordering[K].</p><div class="fullcomment"><div class="comment cmt"><p>Returns the element of this RDD with the smallest key as defined by the
implicit Ordering[K].</p><pre>RDD((<span class="num">1</span>, <span class="lit">"a"</span>), (<span class="num">2</span>, <span class="lit">"c"</span>), (<span class="num">3</span>, <span class="lit">"b"</span>), (<span class="num">4</span>, <span class="lit">"c"</span>)).minByKey <span class="cmt">// (1, "a")</span></pre></div><dl class="paramcmts block"><dt>returns</dt><dd class="cmt"><p>the element with the smallest key</p></dd></dl></div>
</li><li name="com.spark_helper.SparkHelper.PairRDDExtensions#minByValue" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="minByValue()(implicitord:Ordering[V]):(K,V)"></a>
<a id="minByValue()(Ordering[V]):(K,V)"></a>
<h4 class="signature">
<span class="modifier_kind">
<span class="modifier"></span>
<span class="kind">def</span>
</span>
<span class="symbol">
<span class="name">minByValue</span><span class="params">()</span><span class="params">(<span class="implicit">implicit </span><span name="ord">ord: <span class="extype" name="scala.Ordering">Ordering</span>[<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.V">V</span>]</span>)</span><span class="result">: (<span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.K">K</span>, <span class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions.V">V</span>)</span>
</span>
</h4><span class="permalink">
<a href="../../index.html#com.spark_helper.SparkHelper$$PairRDDExtensions@minByValue()(implicitord:Ordering[V]):(K,V)" title="Permalink" target="_top">
<img src="../../lib/permalink.png" alt="Permalink" />
</a>
</span>
<p class="shortcomment cmt">Returns the element of this RDD with the smallest value as defined by
the implicit Ordering[V].</p><div class="fullcomment"><div class="comment cmt"><p>Returns the element of this RDD with the smallest value as defined by
the implicit Ordering[V].</p><pre>RDD((<span class="num">1</span>, <span class="lit">"a"</span>), (<span class="num">2</span>, <span class="lit">"c"</span>), (<span class="num">3</span>, <span class="lit">"b"</span>), (<span class="num">4</span>, <span class="lit">"c"</span>)).minByValue <span class="cmt">// (1, "a")</span></pre></div><dl class="paramcmts block"><dt>returns</dt><dd class="cmt"><p>the element with the smallest value</p></dd></dl></div>
</li><li name="scala.AnyRef#ne" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="ne(x$1:AnyRef):Boolean"></a>
<a id="ne(AnyRef):Boolean"></a>
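
The four key/value methods documented above share one shape: project a pair onto its key or its value, then take the min or max under the implicit `Ordering`. A minimal sketch of that relationship follows, assuming they can be written as thin wrappers over a generic `maxBy`/`minBy` (an assumption, not confirmed by this excerpt). `PairMinMaxSketch` is a hypothetical name, and Scala's own `Seq.maxBy`/`Seq.minBy` stand in for the RDD versions.

```scala
// Hypothetical sketch: key/value variants as projections onto _._1 or _._2.
// Seq.maxBy / Seq.minBy from the Scala standard library stand in for the RDD helpers.
object PairMinMaxSketch {

  def maxByKey[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[K]): (K, V) =
    pairs.maxBy(_._1)

  def minByKey[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[K]): (K, V) =
    pairs.minBy(_._1)

  def maxByValue[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[V]): (K, V) =
    pairs.maxBy(_._2)

  def minByValue[K, V](pairs: Seq[(K, V)])(implicit ord: Ordering[V]): (K, V) =
    pairs.minBy(_._2)

  // Same sample data as the Scaladoc examples.
  val pairs: Seq[(Int, String)] = Seq((1, "a"), (2, "c"), (3, "b"), (4, "c"))
}
```

With the sample data, `maxByKey` gives `(4, "c")`, and both `minByKey` and `minByValue` give `(1, "a")`. `maxByValue` has two candidates tied on `"c"`, matching the `(2, "c") or (4, "c")` note in the documentation.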
38 changes: 38 additions & 0 deletions docs/com/spark_helper/SparkHelper$$RDDExtensions.html
@@ -367,6 +367,44 @@ <h4 class="signature">
</a>
</span>
<div class="fullcomment"><dl class="attributes block"> <dt>Definition Classes</dt><dd>Any</dd></dl></div>
</li><li name="com.spark_helper.SparkHelper.RDDExtensions#maxBy" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="maxBy[U](f:T=&gt;U)(implicitord:Ordering[U]):T"></a>
<a id="maxBy[U]((T)⇒U)(Ordering[U]):T"></a>
<h4 class="signature">
<span class="modifier_kind">
<span class="modifier"></span>
<span class="kind">def</span>
</span>
<span class="symbol">
<span class="name">maxBy</span><span class="tparams">[<span name="U">U</span>]</span><span class="params">(<span name="f">f: (<span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.T">T</span>) ⇒ <span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.maxBy.U">U</span></span>)</span><span class="params">(<span class="implicit">implicit </span><span name="ord">ord: <span class="extype" name="scala.Ordering">Ordering</span>[<span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.maxBy.U">U</span>]</span>)</span><span class="result">: <span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.T">T</span></span>
</span>
</h4><span class="permalink">
<a href="../../index.html#com.spark_helper.SparkHelper$$RDDExtensions@maxBy[U](f:T=&gt;U)(implicitord:Ordering[U]):T" title="Permalink" target="_top">
<img src="../../lib/permalink.png" alt="Permalink" />
</a>
</span>
<p class="shortcomment cmt">Returns the max of this RDD by the given function as defined by the
implicit Ordering[U].</p><div class="fullcomment"><div class="comment cmt"><p>Returns the max of this RDD by the given function as defined by the
implicit Ordering[U].</p><pre>RDD((<span class="num">1</span>, <span class="lit">"a"</span>), (<span class="num">2</span>, <span class="lit">"c"</span>), (<span class="num">3</span>, <span class="lit">"b"</span>), (<span class="num">4</span>, <span class="lit">"c"</span>)).maxBy(_._2) <span class="cmt">// (2, "c") or (4, "c")</span></pre></div><dl class="paramcmts block"><dt>returns</dt><dd class="cmt"><p>the max of this RDD by the given function</p></dd></dl></div>
</li><li name="com.spark_helper.SparkHelper.RDDExtensions#minBy" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="minBy[U](f:T=&gt;U)(implicitord:Ordering[U]):T"></a>
<a id="minBy[U]((T)⇒U)(Ordering[U]):T"></a>
<h4 class="signature">
<span class="modifier_kind">
<span class="modifier"></span>
<span class="kind">def</span>
</span>
<span class="symbol">
<span class="name">minBy</span><span class="tparams">[<span name="U">U</span>]</span><span class="params">(<span name="f">f: (<span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.T">T</span>) ⇒ <span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.minBy.U">U</span></span>)</span><span class="params">(<span class="implicit">implicit </span><span name="ord">ord: <span class="extype" name="scala.Ordering">Ordering</span>[<span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.minBy.U">U</span>]</span>)</span><span class="result">: <span class="extype" name="com.spark_helper.SparkHelper.RDDExtensions.T">T</span></span>
</span>
</h4><span class="permalink">
<a href="../../index.html#com.spark_helper.SparkHelper$$RDDExtensions@minBy[U](f:T=&gt;U)(implicitord:Ordering[U]):T" title="Permalink" target="_top">
<img src="../../lib/permalink.png" alt="Permalink" />
</a>
</span>
<p class="shortcomment cmt">Returns the min of this RDD by the given function as defined by the
implicit Ordering[U].</p><div class="fullcomment"><div class="comment cmt"><p>Returns the min of this RDD by the given function as defined by the
implicit Ordering[U].</p><pre>RDD((<span class="num">1</span>, <span class="lit">"a"</span>), (<span class="num">2</span>, <span class="lit">"c"</span>), (<span class="num">3</span>, <span class="lit">"b"</span>), (<span class="num">4</span>, <span class="lit">"c"</span>)).minBy(_._2) <span class="cmt">// (1, "a")</span></pre></div><dl class="paramcmts block"><dt>returns</dt><dd class="cmt"><p>the min of this RDD by the given function</p></dd></dl></div>
</li><li name="scala.AnyRef#ne" visbl="pub" data-isabs="false" fullComment="yes" group="Ungrouped">
<a id="ne(x$1:AnyRef):Boolean"></a>
<a id="ne(AnyRef):Boolean"></a>
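
Both signatures above take the `Ordering[U]` as a second, implicit parameter list, so the comparison can be swapped without touching the projection function. The following is a small illustration of that mechanism, using Scala's standard `Seq.maxBy`/`Seq.minBy` (which have the same parameter shape) as a stand-in for the RDD methods; `OrderingSketch` is a hypothetical name.

```scala
// Hypothetical sketch: how the implicit Ordering[U] drives the result.
// Seq.maxBy / Seq.minBy stand in for the RDD helpers with the same signature shape.
object OrderingSketch {

  val pairs: Seq[(Int, String)] = Seq((1, "a"), (2, "c"), (3, "b"), (4, "c"))

  // Default: the implicit Ordering[Int] found by the compiler compares keys numerically.
  val largestKey: (Int, String) = pairs.maxBy(_._1) // (4, "c")

  // Explicit: passing a reversed ordering makes minBy return the largest key instead.
  val minUnderReverse: (Int, String) =
    pairs.minBy(_._1)(Ordering[Int].reverse) // (4, "c")
}
```

Passing the ordering explicitly is mainly useful when the projected type has no natural ordering, or when a different one (reversed, by length, and so on) is wanted for a single call.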
5 changes: 4 additions & 1 deletion docs/com/spark_helper/SparkHelper$.html
@@ -90,7 +90,10 @@ <h4 id="signature" class="signature">
rdd.toList <span class="cmt">// equivalent to rdd.collect.toList - alias: rdd.collectAsList</span>
rdd.toMap <span class="cmt">// RDD((1, "a"), (2, "b"), (2, "c")) => Map((1, "a"), (2, "c"))</span>
rdd.duplicates <span class="cmt">// RDD(1, 3, 2, 1, 7, 8, 8, 1, 2) => RDD(1, 2, 8)</span>
rdd.reduceWithCount <span class="cmt">// RDD("a", "b", "c", "a", "d", "a", "c") => RDD(("a", 3), ("b", 1), ("c", 2), ("d", 1))</span></pre><p>Source <a href="https://github.com/xavierguihot/spark_helper/blob/master/src
rdd.reduceWithCount <span class="cmt">// RDD("a", "b", "c", "a", "d", "a", "c") => RDD(("a", 3), ("b", 1), ("c", 2), ("d", 1))</span>
rdd.maxBy(_._2) <span class="cmt">// RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")) => (2, "c") or (4, "c")</span>
rdd.minBy(_._2) <span class="cmt">// RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")) => (1, "a")</span>
rdd.maxByKey; rdd.minByKey; rdd.maxByValue, ... <span class="cmt">// RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")).maxByKey => (4, "c")</span></pre><p>Source <a href="https://github.com/xavierguihot/spark_helper/blob/master/src
/main/scala/com/spark_helper/SparkHelper.scala">SparkHelper</a>
</p></div><dl class="attributes block"> <dt>Since</dt><dd><p>2017-02</p></dd><dt>To do</dt><dd><span class="cmt"><p>sc.parallelize[T](elmts: T*) instead of sc.parallelize[T](elmts: Array[T])</p></span></dd></dl><div class="toggleContainer block">
<span class="toggle">Linear Supertypes</span>
5 changes: 4 additions & 1 deletion docs/com/spark_helper/package.html
@@ -344,7 +344,10 @@ <h4 class="signature">
rdd.toList <span class="cmt">// equivalent to rdd.collect.toList - alias: rdd.collectAsList</span>
rdd.toMap <span class="cmt">// RDD((1, "a"), (2, "b"), (2, "c")) => Map((1, "a"), (2, "c"))</span>
rdd.duplicates <span class="cmt">// RDD(1, 3, 2, 1, 7, 8, 8, 1, 2) => RDD(1, 2, 8)</span>
rdd.reduceWithCount <span class="cmt">// RDD("a", "b", "c", "a", "d", "a", "c") => RDD(("a", 3), ("b", 1), ("c", 2), ("d", 1))</span></pre><p>Source <a href="https://github.com/xavierguihot/spark_helper/blob/master/src
rdd.reduceWithCount <span class="cmt">// RDD("a", "b", "c", "a", "d", "a", "c") => RDD(("a", 3), ("b", 1), ("c", 2), ("d", 1))</span>
rdd.maxBy(_._2) <span class="cmt">// RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")) => (2, "c") or (4, "c")</span>
rdd.minBy(_._2) <span class="cmt">// RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")) => (1, "a")</span>
rdd.maxByKey; rdd.minByKey; rdd.maxByValue, ... <span class="cmt">// RDD((1, "a"), (2, "c"), (3, "b"), (4, "c")).maxByKey => (4, "c")</span></pre><p>Source <a href="https://github.com/xavierguihot/spark_helper/blob/master/src
/main/scala/com/spark_helper/SparkHelper.scala">SparkHelper</a>
</p></div><dl class="attributes block"> <dt>Since</dt><dd><p>2017-02</p></dd><dt>To do</dt><dd><span class="cmt"><p>sc.parallelize[T](elmts: T*) instead of sc.parallelize[T](elmts: Array[T])</p></span></dd></dl></div>
</li><li name="com.spark_helper.monitoring" visbl="pub" data-isabs="false" fullComment="no" group="Ungrouped">
18 changes: 18 additions & 0 deletions docs/index/index-m.html
@@ -13,6 +13,24 @@
<body><div class="entry">
<div class="name">Monitor</div>
<div class="occurrences"><a href="../com/spark_helper/package.html" class="extype" name="com.spark_helper">spark_helper</a> </div>
</div><div class="entry">
<div class="name">maxBy</div>
<div class="occurrences"><a href="../com/spark_helper/SparkHelper$$RDDExtensions.html" class="extype" name="com.spark_helper.SparkHelper.RDDExtensions">RDDExtensions</a> </div>
</div><div class="entry">
<div class="name">maxByKey</div>
<div class="occurrences"><a href="../com/spark_helper/SparkHelper$$PairRDDExtensions.html" class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions">PairRDDExtensions</a> </div>
</div><div class="entry">
<div class="name">maxByValue</div>
<div class="occurrences"><a href="../com/spark_helper/SparkHelper$$PairRDDExtensions.html" class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions">PairRDDExtensions</a> </div>
</div><div class="entry">
<div class="name">minBy</div>
<div class="occurrences"><a href="../com/spark_helper/SparkHelper$$RDDExtensions.html" class="extype" name="com.spark_helper.SparkHelper.RDDExtensions">RDDExtensions</a> </div>
</div><div class="entry">
<div class="name">minByKey</div>
<div class="occurrences"><a href="../com/spark_helper/SparkHelper$$PairRDDExtensions.html" class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions">PairRDDExtensions</a> </div>
</div><div class="entry">
<div class="name">minByValue</div>
<div class="occurrences"><a href="../com/spark_helper/SparkHelper$$PairRDDExtensions.html" class="extype" name="com.spark_helper.SparkHelper.PairRDDExtensions">PairRDDExtensions</a> </div>
</div><div class="entry">
<div class="name">monitoring</div>
<div class="occurrences"><a href="../com/spark_helper/package.html" class="extype" name="com.spark_helper">spark_helper</a> </div>