A real-time monitoring system built with Next.js, Prometheus, and Express that tracks service availability and performance metrics across multiple endpoints.
- 🔍 Real-time service monitoring
- 📈 Latency tracking and visualization
- 🔄 Automatic retry mechanism for failed requests
- 🚦 Status indicators with tooltips
- 🎯 Group-based service organization
- Prometheus Exporter (Express Server)
  - Runs on port 3002
  - Collects metrics:
    - service_up: Service availability (1 for up, 0 for down)
    - service_response_time: Response time in milliseconds
  - Implements retry logic for failed requests
  - Handles concurrent service checks
- Prometheus Server
  - Runs on port 9090
  - Scrapes metrics from the exporter
  - Stores time-series data
  - Handles metric queries via PromQL
- Next.js Frontend
  - Server-side rendered React (Next.js) application
  - Real-time metric visualization
  - Responsive status indicators
  - Latency graphs
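As an illustration of what Prometheus would scrape from the exporter, the sketch below renders the two metrics in the Prometheus text exposition format by hand. The `renderMetrics` helper and the `name` and `group` labels are assumptions made for this example; the actual exporter may use a client library and different labels.

```javascript
// Hypothetical helper, not taken from server/exporter.js.
// Label values must escape backslash, double quote, and newline.
function escapeLabel(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}

// Builds the label set for one check result (label names are assumed).
function labelsFor(result) {
  return `name="${escapeLabel(result.name)}",group="${escapeLabel(result.group)}"`;
}

// Renders check results as Prometheus text exposition output.
// All samples of one metric are kept together under its HELP/TYPE lines.
function renderMetrics(results) {
  const lines = [
    "# HELP service_up Service availability (1 for up, 0 for down)",
    "# TYPE service_up gauge",
  ];
  for (const r of results) {
    lines.push(`service_up{${labelsFor(r)}} ${r.up ? 1 : 0}`);
  }
  lines.push(
    "# HELP service_response_time Response time in milliseconds",
    "# TYPE service_response_time gauge"
  );
  for (const r of results) {
    lines.push(`service_response_time{${labelsFor(r)}} ${r.responseTimeMs}`);
  }
  return lines.join("\n") + "\n";
}

console.log(
  renderMetrics([{ name: "api", group: "core", up: true, responseTimeMs: 42 }])
);
```

With metrics in this shape, a PromQL query such as `service_up == 0` would return the series for services currently reported as down.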
- Concurrent service checking (3 services at a time)
- Configurable retry mechanism:
  - Maximum retries: 3
  - Retry delay: 2000ms (doubles with each retry)
  - Timeout: 10 seconds per request
- Service status changes have a 5-minute grace period
- A service is considered "down" if:
  - The response status is outside 200-299, OR the response doesn't match the expected format/content, AND
  - The service remains in the failed state for more than 5 minutes
- Notification logic:
  - DOWN notifications are sent only after the 5-minute grace period has elapsed
  - RECOVERY notifications are sent immediately when a service recovers from a confirmed downtime
  - Transient failures (<5 minutes) are ignored to reduce noise
  - Downtime duration is included in recovery notifications
- Each service maintains its own state:
  - Current up/down status
  - Grace period status
  - Last state change timestamp
  - Pending alert status
- Transient failures during the grace period:
  - Do not trigger notifications
  - Are automatically cleared if the service recovers
  - Reset the grace period without generating alerts
- Service checks: Every 60 seconds
- Realtime view: 1 second refresh
- Service status indicator bars: Refresh every 5 minutes
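The grace-period and notification rules above can be read as a small per-service state machine. The following is a minimal sketch under stated assumptions: the `applyCheck` function, the state field names, and the choice to measure downtime from the first failed check are all hypothetical and not taken from the project's source.

```javascript
// Grace period from the rules above: 5 minutes.
const GRACE_PERIOD_MS = 5 * 60 * 1000;

// Hypothetical per-service state mirroring the four tracked fields.
function initialState(nowMs) {
  return { up: true, inGracePeriod: false, lastChange: nowMs, pendingAlert: false };
}

// Applies one check result and returns the next state plus an optional notification.
function applyCheck(state, ok, nowMs) {
  if (ok) {
    if (!state.up) {
      // Recovery from a confirmed downtime: notify immediately, include the duration.
      // Assumption: downtime is measured from the first failed check.
      return {
        state: { up: true, inGracePeriod: false, lastChange: nowMs, pendingAlert: false },
        notification: { type: "RECOVERY", downtimeMs: nowMs - state.lastChange },
      };
    }
    // Healthy, or a transient failure that cleared during the grace period: no alert.
    return {
      state: { ...state, inGracePeriod: false, pendingAlert: false },
      notification: null,
    };
  }
  if (state.up && !state.inGracePeriod) {
    // First failure: start the grace period and remember when it began.
    return {
      state: { up: true, inGracePeriod: true, lastChange: nowMs, pendingAlert: true },
      notification: null,
    };
  }
  if (state.inGracePeriod && nowMs - state.lastChange > GRACE_PERIOD_MS) {
    // Still failing after more than 5 minutes: confirm the downtime and notify once.
    return {
      state: { ...state, up: false, inGracePeriod: false, pendingAlert: false },
      notification: { type: "DOWN" },
    };
  }
  // Either still inside the grace period or already reported as down.
  return { state, notification: null };
}
```

Keeping the function pure (state in, state and notification out) makes the timing rules easy to test with fixed timestamps instead of real clocks.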
- Install Dependencies
  npm install
- Configure Endpoints
  - Edit server/endpoints.json to define monitored services
  - Each service can specify:
    - URL
    - Expected response format
    - Custom headers
    - Request body
    - Group assignment
- Start Prometheus
  ./prometheus
- Start the Exporter and Next.js
  npm run start:all
Example server/endpoints.json:
{
  "urls": [
    {
      "group": "Group Name",
      "servers": [
        {
          "url": "https://api.example.com",
          "name": "Service Name",
          "help": "Service description",
          "expectedResponse": {
            "field": "value"
          },
          "body": {
            "key": "value"
          }
        }
      ]
    }
  ]
}
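To show how this grouped structure might be consumed, here is a small sketch that flattens it into one list of services, each tagged with its group. The `flattenEndpoints` helper is hypothetical, and treating `url` and `name` as required fields is an assumption for illustration.

```javascript
// Hypothetical helper: turns the grouped config into a flat list of services.
function flattenEndpoints(config) {
  if (!config || !Array.isArray(config.urls)) {
    throw new Error('config must contain a "urls" array');
  }
  return config.urls.flatMap((group) =>
    (group.servers ?? []).map((server) => {
      // Assumption: every service needs at least a url and a name.
      if (typeof server.url !== "string" || typeof server.name !== "string") {
        throw new Error(`service in group "${group.group}" needs "url" and "name"`);
      }
      // Copy the group name onto each service so checks can be organized by group.
      return { ...server, group: group.group };
    })
  );
}

const services = flattenEndpoints({
  urls: [
    {
      group: "Group Name",
      servers: [
        {
          url: "https://api.example.com",
          name: "Service Name",
          expectedResponse: { field: "value" },
        },
      ],
    },
  ],
});
console.log(services.map((s) => `${s.group}: ${s.name}`));
```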
- PORT: Exporter port (default: 3002)
- PROMETHEUS_URL: Prometheus server URL (default: http://localhost:9090)
- ENDPOINTS_FILE: Path to endpoints configuration (default: ./endpoints.json)
- SLACK_WEBHOOK_URL: Slack webhook URL for notifications (optional)
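One way these settings and their defaults could be read at startup is sketched below. The `loadSettings` function and the shape of the returned object are assumptions for illustration, not the project's actual code.

```javascript
// Hypothetical loader: reads the settings listed above and applies the documented defaults.
function loadSettings(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "3002", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    prometheusUrl: env.PROMETHEUS_URL ?? "http://localhost:9090",
    endpointsFile: env.ENDPOINTS_FILE ?? "./endpoints.json",
    // Optional: notifications would simply be skipped when no webhook is configured.
    slackWebhookUrl: env.SLACK_WEBHOOK_URL || null,
  };
}

console.log(loadSettings({ PORT: "4000" }));
```

Passing the environment in as a parameter keeps the function easy to exercise without modifying `process.env`.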
├── app/ # Next.js app directory
│ ├── api/ # API routes
│ ├── layout.tsx # Root layout
│ └── page.tsx # Main page
├── components/ # React components
├── hooks/ # Custom React hooks
├── lib/ # Utility functions
├── server/ # Backend services
│ ├── exporter.js # Prometheus exporter
│ └── endpoints.json # Service configuration
└── scripts/ # Helper scripts
- Network Errors
  - Automatic retry with exponential backoff
  - Maximum 3 retry attempts
  - Failed services marked as down after all retries exhausted
- Response Validation
  - Checks HTTP status codes
  - Validates response format against expected schema
  - Handles partial matches for text responses
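The retry and validation behavior described above might look roughly like the sketch below. It assumes a 2000ms base delay that doubles on each retry, substring matching for text responses, and field-subset matching for JSON responses; `retryDelays`, `matchesExpected`, and `checkWithRetry` are hypothetical names, and the `attempt` callback is expected to enforce its own request timeout.

```javascript
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 2000;

// Exponential backoff schedule: the delay doubles with each retry.
function retryDelays(maxRetries = MAX_RETRIES, baseDelayMs = BASE_DELAY_MS) {
  return Array.from({ length: maxRetries }, (_, i) => baseDelayMs * 2 ** i);
}

// True when every field in `expected` is present with the same value in `actual`.
function containsFields(actual, expected) {
  if (expected === null || typeof expected !== "object") return actual === expected;
  if (actual === null || typeof actual !== "object") return false;
  return Object.entries(expected).every(([key, value]) =>
    containsFields(actual[key], value)
  );
}

// Text responses match partially (substring); objects match on the expected fields only.
function matchesExpected(body, expected) {
  if (typeof expected === "string") {
    return typeof body === "string" && body.includes(expected);
  }
  return containsFields(body, expected);
}

// Runs `attempt` once, then up to MAX_RETRIES more times with backoff.
// `attempt` is a hypothetical async function that resolves to true when the
// status is 2xx and the body matches; a thrown error counts as a failure.
async function checkWithRetry(
  attempt,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
) {
  const delays = retryDelays();
  for (let i = 0; ; i++) {
    try {
      if (await attempt()) return true;
    } catch (err) {
      // Network error or timeout: treated like a failed check.
    }
    if (i >= delays.length) return false; // all retries exhausted
    await sleep(delays[i]);
  }
}

console.log(retryDelays());
```

The `sleep` parameter is injectable so the backoff schedule can be verified without actually waiting.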
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
MIT License - feel free to use this project for any purpose.
For issues and feature requests, please create an issue in the repository.