Gladia Home Assignement

Instructions

I have reported the instructions provided in Instructions.md.

Results

Task 1: Technical Understanding and API Integration

API Integration

No need to know React, just run the app and browse!

Load the React app locally

git clone git@github.com:j5py/gladia-home-assignment.git

Install the app
```
cd gladia-home-assignment; npm install
```
Run the app
```
npm start
```
Interact with the app that should now have automatically opened on localhost:3000
Review the code written to implement the API

Technical Write-Up

How to Generate a Bullet Point Summary of an Audio File

The following Shell commands can be executed from a Unix/Linux terminal. The syntax may need to be adapted for Command Prompt, for example, on Windows OS. However, you can use Git Bash (especially from a Visual Studio Code terminal). Using curl and a terminal facilitates testing, the documentation also provides examples in Python, JavaScript, PHP, Go, and Java.

Quick Overview of the Necessary Steps

Upload the file to Gladia and get its URL in return
Request Gladia to proceed with the transcription of the file at the given URL
A new URL allows you to obtain a dataset, including the text of the transcription

Upload Your File to Gladia

curl --request POST \
  --url https://api.gladia.io/v2/upload \
  --header 'Content-Type: multipart/form-data' \
  --header 'x-gladia-key: <your_gladia_api_key>' \
  --form audio=@<your_audio_file>

Placeholder	Example	Comment
<your_gladia_api_key>		Get a key in less than a minute and for free after creating your account
<your_audio_file>	./audio.wav	In case the audio file is in the current directory from which you run curl

After a short delay for the request to be processed, you will receive a JSON dataset, and one key particularly interests us here: audio_url. Note the corresponding value for the next step.

Request Gladia to Process the Transcription of the Uploaded File

curl --request POST \
  --url https://api.gladia.io/v2/transcription \
  --header 'Content-Type: application/json' \
  --header 'x-gladia-key: <your_gladia_api_key>' \
  --data '{
    "audio_to_llm": true,
    "audio_to_llm_config": {
      "prompts": [
        "Extract the key points from the transcription as bullet points"
      ]
    },
    "audio_url": "<your_url_received_after_last_upload>"
  }'

JSON dataset	Comment
`audio_to_llm`	This parameter allows the subsequent configuration point to be taken into account
`audio_to_llm_config`	Edit the content of the `prompts` list (or array) as comma-separated strings

Remember to edit the placeholders, but retain the double quotes in the --data option to ensure it matches the JSON format. Check cURL description of Create Transcription to see an example of a custom prompt request, and Audio to LLM for several prompts in a single request. Finally, this time it's the value of result_url that you need to note from the response.

Check that the Requested Transcription is Available

curl --request GET \
  --url <your_result_url> \
  --header 'x-gladia-key: <your_gladia_api_key>'

Note that the value of the key status may initially return queued or processing. In those case, you will need to repeat the request until you obtain done in a JSON dataset that may vary in size depending on the length of the uploaded audio.

Filter the Dataset to Obtain the Summary

First, let's continue with curl to complete the process. This request is identical to the previous one, but it allows for filtering of a potentially large result.

curl --request GET \
  --url <your_result_url> \
  --header 'x-gladia-key: <your_gladia_api_key>' | \
  grep -o '"response":"[^"]*' | \
  grep -o '[^:]*$' | \
  tr -d '"'

Note that this command will only filter the first occurrence of response corresponding to the first prompt. Subsequent occurrences will be ignored if your previous request contained more than one prompt (in this case, you might consider jq).

Snippet	Description
\	Allows you to split a long command across multiple lines
\|	Pipes output to the next command
grep -o '"response":"[^"]*' \|	captures everything from "response":" up to the next double quote
grep -o '[^:]*$' \|	Captures the content after the last colon
tr -d '"'	Removes double quotes from the output

Now with JavaScript, you can also request the same URL as in the previous curl command to achieve a more immediate reading of the response access. You can open Chrome DevTools and paste your edited version of the code below to execute it from your browser's console.

fetch('<your_result_url>', {
    method: 'GET',
    headers: { 'x-gladia-key': '<your_gladia_api_key>' }
}).then(
    jsonString => jsonString.json()
).then(
    jsObject => console.log(jsObject.result.audio_to_llm.results[0].results.response)
)

Knowing that the access can be read with <receivedObject>.result.audio_to_llm.results[0].results.response in JavaScript helps us understand that the value we are looking for is part of a much larger structure, which is why it can be tedious to find using the raw result of curl. In the case where your previous request contained more than one prompt, you will need to iterate through the results in this way, for example: <receivedObject>.result.audio_to_llm.results.forEach(<item> => ...

Task 2: Customer Support Simulation

Mock Tickets

Ticket 1

"I keep getting a 401 Unauthorized error when trying to access the API."

Re:

Good Day,

Thank you for your message. I'm glad to assist you with this matter.

I would first recommend checking if the x-gladia-key value used in your request exists and fully matches one of the API keys listed on your Account page. A simple visual check can help spot if the key is altered, but it's good practice to use the 'copy to clipboard' feature and paste the key into your request code to ensure you have the correct value.

Please let me know if this allows you to solve the issue. If you continue to receive the 401 status code, I will first need you to provide me with a few details about your technical environment, such as the code snippet of the request and/or a screenshot of your test.

Kind regards

Ticket 2

"I receive an error when trying some of my files: 413 file too large. Why is that and what can I do?"

Re:

Good Day,

Thank you for contacting support. I'm here to help you with your issue.

This is the expected behavior of the Gladia API for performance reasons. Currently, the API has limitations of 135 minutes and 500 MB in size, but we provide you with a few tools for splitting audio files. You need to initiate the setup of one of these tools based on your environment. Please do not hesitate to confirm when it is resolved, or reach out if you encounter any issues with the configuration.

Kind regards

Ticket 3

"I'm seeing a significant delay in receiving the transcription results. Is there a way to speed this up?"

Re:

Good Day,

Thank you for reaching out to us. We appreciate your inquiry.

We can provide high concurrency based on your needs. If you are experiencing a significant delay in receiving results, it is likely due to handling high volumes of requests that exceed the default values. We can offer you additional computational capacity to manage a large number of requests or operations simultaneously.

Could you please share your availability for a meeting to discuss your usage requirements and evaluate the solution?

Kind regards

Ticket 4

"Do you support Chinese?"

Re:

Good Day,

Thanks for reaching out! I'm here to help you with any questions you may have.

Absolutely! Chinese, identified by the ISO 639 language code zh, is one of the available options under Create Transcription > Body > language and is also one of the possible values of target_languages in the examples accessible through Translation > Sample code.

Please let me know if you are making progress. I am available to assist you if you encounter any difficulties. You may also refer to the complete list of supported languages for any future needs.

Kind regards

Ticket 5

"We received complaints from our customers about some very inaccurate transcriptions yesterday, can you check what happened?"

Re:

Good Day,

I appreciate you getting in touch. Let's work together to resolve your concern.

Could you please specify which functionality of Gladia or API key is subject to inaccurate transcriptions, and could you provide us with any records of the requests and responses?

Here are some ways to fine-tune the configuration by functionality to prevent inaccuracy:

Speech Recognition
- Multiple languages detection : Is not activated by default and allows switching the transcription language, but strong accents may cause misinterpretation.
- Custom vocabulary and Custom spelling can significantly enhance the accuracy of transcriptions in specific contexts, such as professional environments where the use of acronyms and internal expressions is commonplace.
Live Speech Recognition
- Recovering from WebSocket disconnects could help prevent missing parts that would cause the transcription to appear erroneous.
- Chunk Size Limitation and Thirty Seconds Behavior should also be considered to ensure the complete input/output is captured.
Speaker Diarization
- Speaker diarization may also improve accuracy by determining the identity of each speaker and attributing their statements accordingly.

If it is complex for you to specify the context of the feedback observed, we could schedule a meeting to identify it at your convenience.

Kind regards

Task 3: Technical Documentation and Communication

Documentation Review

The added section to enhance the original documentation is italicized.

Audio to LLM

Harness the power of the LLM command prompt to access a range of processing capabilities beyond transcription, all in a single request. In addition to speech-to-text, you will fully leverage the value of its content. Ask any question or analysis, as you would do with an assistant.

You only need to write the different treatments you wish to apply to the transcription, with the prompts listed as comma-separated values in the prompts list (or array). In addition to the complete transcription, you will obtain a list of results: each will contain the prompt sent to the LLM and its associated output.

The Audio to LLM feature applies your own prompts to the audio transcription.

import requests
import time


def make_fetch_request(url, headers, method='GET', data=None):
    if method == 'POST':
        response = requests.post(url, headers=headers, json=data)
    else:
        response = requests.get(url, headers=headers)
    return response.json()


gladia_key = "YOUR_GLADIA_API_TOKEN"
request_data = {
  "audio_url": "YOUR_AUDIO_URL",
  "audio_to_llm": True,
  "audio_to_llm_config": {
    "prompts": [
      "Extract the key points from the transcription as bullet points",
      "Generate a title from this transcription"
    ]
  }
}
gladia_url = "https://api.gladia.io/v2/transcription/"

headers = {
    "x-gladia-key": gladia_key,
    "Content-Type": "application/json"
}

print("- Sending initial request to Gladia API...")
initial_response = make_fetch_request(
    gladia_url, headers, 'POST', request_data)

print("Initial response with Transcription ID:", initial_response)
result_url = initial_response.get("result_url")

if result_url:
    while True:
        print("Polling for results...")
        poll_response = make_fetch_request(result_url, headers)

        if poll_response.get("status") == "done":
            print("- Transcription done: \n")
            audio_to_llm_results = poll_response.get("result", {}).get(
                "audio_to_llm", {})
            print(audio_to_llm_results)
            break
        else:
            print("Transcription status:", poll_response.get("status"))
        time.sleep(1)

With this code, your output will look like this:

{
  "success": true,
  "is_empty": false,
  "results": [
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Extract the key points from the transcription as bullet points",
        "response": "The main entities key points from the transcription are:\n- ..."
      },
      "exec_time": 1.7726809978485107,
      "error": null
    },
    {
      "success": true,
      "is_empty": false,
      "results": {
        "prompt": "Generate a title from this transcription",
        "response": "The Great Title"
      },
      "exec_time": 1.7832809978258485,
      "error": null
    }
  ],
  "exec_time": 6.127103805541992,
  "error": null
}

You’ll find your custom prompts results of your audio under the results key.

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
public		public
src		src
.gitignore		.gitignore
Instructions.md		Instructions.md
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gladia Home Assignement

Instructions

Results

Task 1: Technical Understanding and API Integration

API Integration

Technical Write-Up

How to Generate a Bullet Point Summary of an Audio File

Quick Overview of the Necessary Steps

Upload Your File to Gladia

Request Gladia to Process the Transcription of the Uploaded File

Check that the Requested Transcription is Available

Filter the Dataset to Obtain the Summary

Task 2: Customer Support Simulation

Mock Tickets

Ticket 1

Re:

Ticket 2

Re:

Ticket 3

Re:

Ticket 4

Re:

Ticket 5

Re:

Task 3: Technical Documentation and Communication

Documentation Review

Audio to LLM

About

Releases

Packages

Languages

j5py/gladia-home-assignment

Folders and files

Latest commit

History

Repository files navigation

Gladia Home Assignement

Instructions

Results

Task 1: Technical Understanding and API Integration

API Integration

Technical Write-Up

How to Generate a Bullet Point Summary of an Audio File

Quick Overview of the Necessary Steps

Upload Your File to Gladia

Request Gladia to Process the Transcription of the Uploaded File

Check that the Requested Transcription is Available

Filter the Dataset to Obtain the Summary

Task 2: Customer Support Simulation

Mock Tickets

Ticket 1

Re:

Ticket 2

Re:

Ticket 3

Re:

Ticket 4

Re:

Ticket 5

Re:

Task 3: Technical Documentation and Communication

Documentation Review

Audio to LLM

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages