[SPARK-49944][DOCS] Fix broken main.js import and fix image links for streaming documentation

### What changes were proposed in this pull request?

We prepend the `rel_path_to_root` Jekyll variable to all asset paths that require it.
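
For context, the variable expands to a chain of `../` segments that walks from the current page back up to the documentation root. Below is a minimal Liquid sketch of the idea, with a hypothetical depth computation (illustrative only; the actual logic in Spark's layouts may differ):

```liquid
{% comment %}
Hypothetical sketch: compute a "../" prefix from the page's URL depth.
The actual computation in Spark's layouts may differ.
{% endcomment %}
{% assign depth = page.url | split: "/" | size | minus: 2 %}
{% assign rel_path_to_root = "" %}
{% for i in (1..depth) %}
  {% assign rel_path_to_root = rel_path_to_root | append: "../" %}
{% endfor %}

<!-- For /streaming/getting-started.html, depth is 1, so this renders "../js/main.js" -->
<script src="{{ rel_path_to_root }}js/main.js"></script>
```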

### Why are the changes needed?

Currently, our imports of `main.js` and AnchorJS are broken in the Spark 4.0.0-preview2 docs. Also, images aren't appearing on the Structured Streaming doc pages. See the [ASF issue](https://issues.apache.org/jira/browse/SPARK-49944) for more detail.
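
Concretely, the preview docs are served under a versioned prefix, so a root-absolute path escapes the documentation root. For a page such as `docs/4.0.0-preview2/streaming/getting-started.html`, the resolution looks roughly like this (URLs shown for illustration):

```html
<!-- Before: a root-absolute src resolves against the site root and 404s -->
<script src="/js/main.js"></script>
<!-- -> https://spark.apache.org/js/main.js (does not exist) -->

<!-- After: {{ rel_path_to_root }} renders "../" for this page -->
<script src="../js/main.js"></script>
<!-- -> https://spark.apache.org/docs/4.0.0-preview2/js/main.js -->
```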

You can see how the pages are broken [here](https://spark.apache.org/docs/4.0.0-preview2/streaming/getting-started.html); here's a screenshot, for example:

<img width="1168" alt="image" src="https://github.com/user-attachments/assets/d0dbc970-a5aa-445a-ae21-f4e32973f031">

### Does this PR introduce _any_ user-facing change?

The preview documentation will now have correctly rendered code blocks, and images will appear.

### How was this patch tested?

Local testing. Please build the docs site if you would like to verify; it now looks like this:

<img width="1271" alt="image" src="https://github.com/user-attachments/assets/08b69f58-d6f4-41b0-bcb5-1af80782c133">
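
(For reference, a local build can typically be produced from the `docs/` directory with Jekyll, e.g. `SKIP_API=1 bundle exec jekyll serve`, per `docs/README.md`; the exact invocation may vary by environment.)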

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48438 from neilramaswamy/nr/fix-broken-streaming-links-images.

Authored-by: Neil Ramaswamy <neil.ramaswamy@databricks.com>
Signed-off-by: Kent Yao <yao@apache.org>
neilramaswamy authored and yaooqinn committed Oct 22, 2024
1 parent 2c904e4 commit e7cdb5a
Showing 4 changed files with 13 additions and 12 deletions.
6 changes: 3 additions & 3 deletions docs/_layouts/global.html
@@ -28,7 +28,7 @@
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,wght@0,400;0,500;0,700;1,400;1,500;1,700&Courier+Prime:wght@400;700&display=swap" rel="stylesheet">
<link href="{{ rel_path_to_root }}css/custom.css" rel="stylesheet">
-<script src="/js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>
+<script src="{{ rel_path_to_root}}js/vendor/modernizr-2.6.1-respond-1.1.0.min.js"></script>

<link rel="stylesheet" href="{{ rel_path_to_root }}css/pygments-default.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.css" />
@@ -198,8 +198,8 @@ <h1 class="title">{{ page.title }}</h1>
crossorigin="anonymous"></script>
<script src="https://code.jquery.com/jquery.js"></script>

-<script src="/js/vendor/anchor.min.js"></script>
-<script src="/js/main.js"></script>
+<script src="{{ rel_path_to_root }}js/vendor/anchor.min.js"></script>
+<script src="{{ rel_path_to_root}}js/main.js"></script>

<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/docsearch.js@2/dist/cdn/docsearch.min.js"></script>
<script type="text/javascript">
10 changes: 5 additions & 5 deletions docs/streaming/apis-on-dataframes-and-datasets.md
@@ -436,7 +436,7 @@ Imagine our [quick example](./getting-started.html#quick-example) is modified an

The result tables would look something like the following.

-![Window Operations](/img/structured-streaming-window.png)
+![Window Operations](../img/structured-streaming-window.png)

Since this windowing is similar to grouping, in code, you can use `groupBy()` and `window()` operations to express windowed aggregations. You can see the full code for the below examples in
[Python]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/python/sql/streaming/structured_network_wordcount_windowed.py)/[Scala]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala)/[Java]({{site.SPARK_GITHUB_URL}}/blob/v{{site.SPARK_VERSION_SHORT}}/examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredNetworkWordCountWindowed.java).
@@ -512,7 +512,7 @@ naturally in our window-based grouping – Structured Streaming can maintain the
for partial aggregates for a long period of time such that late data can update aggregates of
old windows correctly, as illustrated below.

-![Handling Late Data](/img/structured-streaming-late-data.png)
+![Handling Late Data](../img/structured-streaming-late-data.png)

However, to run this query for days, it's necessary for the system to bound the amount of
intermediate in-memory state it accumulates. This means the system needs to know when an old
@@ -605,7 +605,7 @@ the engine will keep updating counts of a window in the Result Table until the w
than the watermark, which lags behind the current event time in column "timestamp" by 10 minutes.
Here is an illustration.

-![Watermarking in Update Mode](/img/structured-streaming-watermark-update-mode.png)
+![Watermarking in Update Mode](../img/structured-streaming-watermark-update-mode.png)

As shown in the illustration, the maximum event time tracked by the engine is the
*blue dashed line*, and the watermark set as `(max event time - '10 mins')`
@@ -628,7 +628,7 @@ This is illustrated below.
Note that using `withWatermark` on a non-streaming Dataset is no-op. As the watermark should not affect
any batch query in any way, we will ignore it directly.

-![Watermarking in Append Mode](/img/structured-streaming-watermark-append-mode.png)
+![Watermarking in Append Mode](../img/structured-streaming-watermark-append-mode.png)

Similar to the Update Mode earlier, the engine maintains intermediate counts for each window.
However, the partial counts are not updated to the Result Table and not written to sink. The engine
@@ -641,7 +641,7 @@ appended to the Result Table only after the watermark is updated to `12:11`.

Spark supports three types of time windows: tumbling (fixed), sliding and session.

-![The types of time windows](/img/structured-streaming-time-window-types.jpg)
+![The types of time windows](../img/structured-streaming-time-window-types.jpg)

Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals. An input
can only be bound to a single window.
7 changes: 4 additions & 3 deletions docs/streaming/getting-started.md
@@ -448,14 +448,15 @@ table, and Spark runs it as an *incremental* query on the *unbounded* input
table. Let’s understand this model in more detail.

## Basic Concepts
+
Consider the input data stream as the "Input Table". Every data item that is
arriving on the stream is like a new row being appended to the Input Table.

-![Stream as a Table](/img/structured-streaming-stream-as-a-table.png "Stream as a Table")
+![Stream as a Table](../img/structured-streaming-stream-as-a-table.png "Stream as a Table")

A query on the input will generate the "Result Table". Every trigger interval (say, every 1 second), new rows get appended to the Input Table, which eventually updates the Result Table. Whenever the result table gets updated, we would want to write the changed result rows to an external sink.

-![Model](/img/structured-streaming-model.png)
+![Model](../img/structured-streaming-model.png)

The "Output" is defined as what gets written out to the external storage. The output can be defined in a different mode:

@@ -476,7 +477,7 @@ will continuously check for new data from the socket connection. If there is
new data, Spark will run an "incremental" query that combines the previous
running counts with the new data to compute updated counts, as shown below.

-![Model](/img/structured-streaming-example-model.png)
+![Model](../img/structured-streaming-example-model.png)

**Note that Structured Streaming does not materialize the entire table**. It reads the latest
available data from the streaming data source, processes it incrementally to update the result,
2 changes: 1 addition & 1 deletion docs/streaming/performance-tips.md
@@ -26,7 +26,7 @@ license: |

Asynchronous progress tracking allows streaming queries to checkpoint progress asynchronously and in parallel to the actual data processing within a micro-batch, reducing latency associated with maintaining the offset log and commit log.

-![Async Progress Tracking](/img/async-progress.png)
+![Async Progress Tracking](../img/async-progress.png)

## How does it work?

