QuestAPI is a REST API that provides Question and Answers data to the Atelier e-Commerce platform.
QuestAPI provides an automatic Extract, Transform and Load process for the required CSV data, which populates a Mongo instance without any need for manual user interaction.
That process, combined with tests covering 77% of the code base, and a simple straightforward API that interacts with Mongo and Redis instances which are containerized with Docker, means that the service is easily scalable both vertically and horizontally in the future.
QuestAPI includes an automated ETL process built-in. It uses the Node.js Fs module as well as the Mongoose ODM to populate a Mongo database.
On initilization, QuestAPI will check the current connected Mongo instance for the required data, and invoke init() if it does not contain it.
const Promise = require('bluebird');
const mongoose = require('mongoose');
mongoose.promise = Promise;
const db = (async () => {
try {
await mongoose.connect(URI, { useNewUrlParser: true, useUnifiedTopology: true, useCreateIndex: true });
console.log('Successfully Connected to DB'
const connection = mongoose.connection;
const collectionSearch = await connection.db.listCollections().toArray();
if (isEmpty(collectionSearch)) {
console.log('Initializing DB with Data');
init();
}
return connection;
} catch(err) {
console.log(err);
console.log('Error connecting to DB');
}
})();
Once it is verified that the data is not within the database, Init() is invoked, starting the ETL process.
const path = require('path');
const populateDB = require('../csvParse/populateDB.js');
const parseQuestion = require('../csvParse/parseQuestion.js');
const init = () => {
populateDB.insertQuestions(path.join(__dirname, '../data/questions.csv'), parseQuestion);
}
This is first function of the ETL chain, but the next two functions follow the same pattern as this one.
PATTERN :
- Extract data from the CSV file using a readStream
- Transform the data to fit the Mongoose Schema
- Load the data into Mongo with Mongoose Queries
- Repeat
-
IMPORTANT : This implementation only moves as fast as Mongoose Queries can resolve, but it works within the constraints of the V8 engine's memory heap.
-
Each chunk received from the stream is about 530 rows from the CSV.
-
Each function also invokes timer() which is a HoF. The resulting function is invoked again at the end of the process, giving us the function duration.
-
For the transformations, we abstract that logic into separate functions that return the objects to be used by the Mongoose Query.
-
A log is also scheduled every 1000 queries for human sanity checks.
-
This implementation ignores insertion errors and continues on with { ordered: false }
{
insertQuestions: async (filePath, transformer) => {
const startTime = timer();
console.log('STEP 1/3: INSERT QUESTIONS')
let firstLineRead = false;
let columns = '';
let iteration = 0;
const questionReadStream = fs.createReadStream(filePath, {encoding: 'utf8'});
questionReadStream.on('data', async (res) => {
questionReadStream.pause();
let docs = [];
if (!firstLineRead) {
columns = determineColumns(res);
docs = transformRowsIntoObjects('', res, transformer);
firstLineRead = true;
} else {
docs = transformRowsIntoObjects(columns, res, transformer);
}
try {
const insertionQuery = await Question.insertMany(docs, { ordered: false });
if (++iteration % 1000 === 0) console.log(`Questions Iteration Count: ${iteration}`);
} catch (err) {
console.log('Insertion Error: continuing { ordered: false }');
}
questionReadStream.resume();
});
questionReadStream.on('error', (err) => {
console.log(err);
});
questionReadStream.on('end', () => {
console.log(`INSERT QUESTIONS ENDING. TIME ELAPSED: ${startTime()}`)
populateDB.insertAnswers(path.join(__dirname, '../data/answers.csv'), parseAnswer) // Continue Population
});
},
}
Since this ETL process relies on the speed of Mongoose Queries resolving, optimizations are focused on the readStream and chunk-size to Query-speed comparisons.
This process clocked in at an average of 45 minutes (with a fastest completion time of 30 minutes, and a slowest completion of 60 minutes). This process was run on over 12 million rows of CSV data. The times are taken from 7 iterations of this ETL process.
LAST UPDATED : 05/15/2021
FUTURE OPTIMIZATION : Horizontal scaling and load-balancing. This is easily accompishable with the arcitecture of the service being containerized.
The most significant optimization implemented after deployment was the addition of a Redis cache that operated with short expiration timers. I decided this in order to cache the most popular products. On page load, the Questions route would be the route that has a 100% chance of being called. Since this was the case, I wanted to focus on optimizing the most products that were the most popular, handling strong upticks in traffic.
The results of the tests indicate that that was successful, but now we need to scale horizontally in order to handle varied requests.
all stress tests done with Loader.io on an AWS EC2 T2 Micro Instance
The most expensive route is to GET a question since it must also be populated with all of a single product's questions and answers, denormalized.
These next two results are for repeated requests against the Questions at 500 clients per second.
As you can see, adding Redis caching decreased the response time to 2% of its original and in increase to 100% successful responses instead of the previous 21% success rate.
--
These next two results are for repeated requests against the Answer endpoint at 750 clients per second.
The post-Redis optimized results show that we have a response time 1% of the original and nearly 100% of the possible successful repsonses completed.
- Database Initializes with data
- Client sends GET request
- Server checks Redis cache which operates off of LRU timers
- If the requested information is within the Redis cache, return response to the Client
- If the requested information is not within the Redis cache, Query the Database
- Once the request returns from Mongo, add it to the Redis cache with an expiration timer
- Return the response to the Client
All example requests are made with SuperAgent
Retrieves a list of questions for the requested product, not including reported questions.
Header | Type | Description |
---|---|---|
product_id | String | Specifies the product for which to receive questions |
const res = await request.get('/api/qa/questions').set('product_id', '1');
Response: Status 200 OK
{
"product_id": "1",
"results": [
{
"_id": "609bec05262cc03dd6ef9ce7",
"question_id": 1,
"product_id": "1",
"question_body": "What fabric is the top made of?",
"question_date": "2020-07-27T16:18:34-05:00",
"asker_name": "yankeelover",
"asker_email": "first.last@gmail.com",
"reported": false,
"question_helpfulness": 1,
"__v": 0,
"answers": {
"5": {
"photos": [
{
"photo_id": 1,
"url": "https://images.unsplash.com/photo-1530519729491-aea5b51d1ee1?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1651&q=80"
}
],
"answer_id": 5,
"answer_body": "Something pretty soft but I can't be sure",
"answer_date": "2020-09-13T04:49:20-05:00",
"answerer_name": "metslover",
"reported": false,
"answer_helpfulness": 5,
"answerer_email": "first.last@gmail.com",
"__v": 0
},
"7": {
"photos": [],
"answer_id": 7,
"answer_body": "Its the best! Seriously magic fabric",
"answer_date": "2021-02-27T12:45:24-06:00",
"answerer_name": "metslover",
"reported": false,
"answer_helpfulness": 7,
"answerer_email": "first.last@gmail.com",
"__v": 0
},
"8": {
"photos": [],
"answer_id": 8,
"answer_body": "DONT BUY IT! It's bad for the environment",
"answer_date": "2020-09-19T16:49:22-05:00",
"answerer_name": "metslover",
"reported": false,
"answer_helpfulness": 8,
"answerer_email": "first.last@gmail.com",
"__v": 0
},
}
}
]
}
Returns answers for a given question, not including reported answers.
Parameter | Type | Description |
---|---|---|
question_id | Integer | Specifies the product for which to receive answers |
const res = await request.get('/api/qa/questions/1/answers');
Response: Status 200 OK
{
"question": "1",
"page": 0,
"count": 5,
"results": [
{
"photos": [
{
"photo_id": 1,
"url": "https://images.unsplash.com/photo-1530519729491-aea5b51d1ee1?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1651&q=80"
},
{
"photo_id": 2,
"url": "https://images.unsplash.com/photo-1511127088257-53ccfcc769fa?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1650&q=80"
},
{
"photo_id": 3,
"url": "https://images.unsplash.com/photo-1500603720222-eb7a1f997356?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1653&q=80"
}
],
"answer_id": 5,
"answer_body": "Something pretty soft but I can't be sure",
"answer_date": "2020-09-13T04:49:20-05:00",
"answerer_name": "metslover",
"reported": false,
"answer_helpfulness": 5,
"answerer_email": "first.last@gmail.com",
"__v": 0
},
{
"photos": [],
"answer_id": 7,
"answer_body": "Its the best! Seriously magic fabric",
"answer_date": "2021-02-27T12:45:24-06:00",
"answerer_name": "metslover",
"reported": false,
"answer_helpfulness": 7,
"answerer_email": "first.last@gmail.com",
"__v": 0
},
]
}
Creates a question for the specified product.
Body Parameters
Parameter | Type | Description |
---|---|---|
body | String | Text of the question being asked |
name | String | name of the question asker |
String | Email address of the question asker | |
product_id | Integer | Required: ID of the product for which the question is submitted |
const res = await request.post('/api/qa/questions')
.send({ product_id: '25' })
.send({ body: 'Does this make me look hip?!' })
.send({ name: 'CoolGuy221' })
.send({ email: 'coolguy@email.com' });;
Response: Status 201 Created
Creates an answer for the specified question.
Parameters
Parameter | Type | Description |
---|---|---|
question_id | Integer | Required: ID of the question being answered |
Body Parameters
Parameter | Type | Description |
---|---|---|
body | String | Text of the question being askwer |
name | String | name of the question asker |
String | Email address of the question asker | |
photos | [String] | List of photo urls |
const res = await request.post('/api/qa/questions/30/answers')
.send({ body: 'Hey I have an answer to your question!'})
.send({ name: 'I\'m a question asker' })
.send({ email: 'answer@email.com' })
.send({ photos: [ 'url1', 'url2' ]});
Response: Status 201 Created
Adds to the helpfulness count of a specified question.
Parameter | Type | Description |
---|---|---|
question_id | Integer | Required: ID of the question being marked as helpful |
const res = await request.put('/api/qa/questions/1/helpful');
Response: Status 204 No Content
Reports a question.
Parameter | Type | Description |
---|---|---|
question_id | Integer | Required: ID of the question being reported |
const res = await request.put('/api/qa/questions/1/report');
Response: Status 204 No Content
Adds to the helpfulness count of a specified answer.
Parameter | Type | Description |
---|---|---|
answer_id | Integer | Required: ID of the question being marked as helpful |
const res = await request.put('/api/qa/answers/2/helpful');
Response: Status 204 No Content
Reports an answer.
Parameter | Type | Description |
---|---|---|
answer_id | Integer | Required: ID of the question being reported |
const res = await request.put('/api/qa/answers/2/helpful');
Response: Status 204 No Content