diff --git a/README.md b/README.md index 41613859e7..2c2283f711 100644 --- a/README.md +++ b/README.md @@ -165,7 +165,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack. | [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no | | [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no | | [instance\_target\_capacity\_type](#input\_instance\_target\_capacity\_type) | Default lifecycle used for runner instances, can be either `spot` or `on-demand`. | `string` | `"spot"` | no | -| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
`memory_size`: Memory size linit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metric = optional(string, null) # deprectaed
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | +| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
`features`: Enable or disable features of the termination watcher.
`memory_size`: Memory size limit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metric = optional(string, null) # deprecated
features = optional(object({
enable_spot_termination_handler = optional(bool, true)
enable_spot_termination_notification_watcher = optional(bool, true)
}), {})
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | | [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win). | `list(string)` |
[
"m5.large",
"c5.large"
]
| no | | [job\_queue\_retention\_in\_seconds](#input\_job\_queue\_retention\_in\_seconds) | The number of seconds the job is held in the queue before it is purged. | `number` | `86400` | no | | [job\_retry](#input\_job\_retry) | Experimental! Can be removed / changed without trigger a major release.Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the insances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the reate limit of the GitHub app.

`enable`: Enable or disable the job retry feature.
`delay_in_seconds`: The delay in seconds before the job retry check lambda will check the job status.
`delay_backoff`: The backoff factor for the delay.
`lambda_memory_size`: Memory size limit in MB for the job retry check lambda.
`lambda_timeout`: Time out of the job retry check lambda in seconds.
`max_attempts`: The maximum number of attempts to retry the job. |
object({
enable = optional(bool, false)
delay_in_seconds = optional(number, 300)
delay_backoff = optional(number, 2)
lambda_memory_size = optional(number, 256)
lambda_timeout = optional(number, 30)
max_attempts = optional(number, 1)
})
| `{}` | no | @@ -257,6 +257,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack. | Name | Description | |------|-------------| | [binaries\_syncer](#output\_binaries\_syncer) | n/a | +| [instance\_termination\_handler](#output\_instance\_termination\_handler) | n/a | | [instance\_termination\_watcher](#output\_instance\_termination\_watcher) | n/a | | [queues](#output\_queues) | SQS queues. | | [runners](#output\_runners) | n/a | diff --git a/docs/configuration.md b/docs/configuration.md index 0eae2195ec..4b5f33507e 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -215,21 +215,35 @@ In case the setup does not work as intended, trace the events through this seque ### Termination watcher -This feature is in early stage and therefore disabled by default. +This feature is in an early stage and therefore disabled by default. To enable the watcher, set `instance_termination_watcher.enable = true`. -The termination watcher is currently watching for spot termination notifications. The module is only taken events into account for instances tagged with `ghr:environment` by default when deployment the module as part of one of the main modules (root or multi-runner). The module can also be deployed stand-alone, in that case the tag filter needs to be tunned. +The termination watcher currently watches for spot terminations. By default, when the module is deployed as part of one of the main modules (root or multi-runner), only events for instances tagged with `ghr:environment` are taken into account. The module can also be deployed stand-alone; in that case, the tag filter needs to be tuned. + +### Termination notification + +The watcher listens for spot termination warnings and creates a log message and optionally a metric. The watcher is disabled by default. 
The feature is enabled once the watcher is enabled; it can be disabled explicitly by setting `instance_termination_watcher.features.enable_spot_termination_notification_watcher = false`. - Logs: The module will log all termination notifications. For each warning it will look up instance details and log the environment, instance type and time the instance is running. As well some other details. - Metrics: Metrics are disabled by default, this to avoid costs. Once enabled a metric will be created for each warning with at least dimensions for the environment and instance type. THe metric name space can be configured via the variables. The metric name used is `SpotInterruptionWarning`. -#### Log example +### Termination handler + +!!! warning + This feature will only work once CloudTrail is enabled. + +The termination handler listens for spot terminations by capturing the `BidEvictedEvent` via CloudTrail. The handler will log and optionally create a metric for each termination. The intent is to enhance the logic to inform the user about the termination via the GitHub Job or Workflow run. The watcher is disabled by default. The feature is enabled once the watcher is enabled; it can be disabled explicitly by setting `instance_termination_watcher.features.enable_spot_termination_handler = false`. + +- Logs: The module will log all terminations. For each termination it will look up instance details and log the environment, instance type and time the instance has been running, as well as some other details. +- Metrics: Metrics are disabled by default to avoid costs. Once enabled, a metric will be created for each termination with at least dimensions for the environment and instance type. The metric namespace can be configured via the variables. The metric name used is `SpotTermination`. + +### Log example (both warnings and terminations) Below an example of the the log messages created. 
``` { "level": "INFO", - "message": "Received spot notification warning:", + "message": "Received spot notification for ${metricName}", "environment": "default", "instanceId": "i-0039b8826b3dcea55", "instanceType": "c5.large", diff --git a/examples/default/main.tf b/examples/default/main.tf index a775872137..90c889319b 100644 --- a/examples/default/main.tf +++ b/examples/default/main.tf @@ -78,7 +78,7 @@ module "runners" { # Let the module manage the service linked role # create_service_linked_role_spot = true - instance_types = ["m5.large", "c5.large"] + instance_types = ["m7a.large", "m5.large"] # override delay of events in seconds delay_webhook_event = 5 @@ -122,7 +122,7 @@ module "runners" { # metric = { # enable_spot_termination_warning = true # enable_job_retry = false - # enable_github_app_rate_limit = true + # enable_github_app_rate_limit = false # } # } diff --git a/lambdas/functions/termination-watcher/src/ConfigResolver.ts b/lambdas/functions/termination-watcher/src/ConfigResolver.ts index 477eb613c9..9e98b2a20a 100644 --- a/lambdas/functions/termination-watcher/src/ConfigResolver.ts +++ b/lambdas/functions/termination-watcher/src/ConfigResolver.ts @@ -2,6 +2,7 @@ import { createChildLogger } from '@aws-github-runner/aws-powertools-util'; export class Config { createSpotWarningMetric: boolean; + createSpotTerminationMetric: boolean; tagFilters: Record; prefix: string; @@ -11,6 +12,7 @@ export class Config { logger.debug('Loading config from environment variables', { env: process.env }); this.createSpotWarningMetric = process.env.ENABLE_METRICS_SPOT_WARNING === 'true'; + this.createSpotTerminationMetric = process.env.ENABLE_METRICS_SPOT_TERMINATION === 'true'; this.prefix = process.env.PREFIX ?? 
''; this.tagFilters = { 'ghr:environment': this.prefix }; diff --git a/lambdas/functions/termination-watcher/src/ec2.test.ts b/lambdas/functions/termination-watcher/src/ec2.test.ts new file mode 100644 index 0000000000..3a38339dc2 --- /dev/null +++ b/lambdas/functions/termination-watcher/src/ec2.test.ts @@ -0,0 +1,93 @@ +import { EC2Client, DescribeInstancesCommand, DescribeInstancesResult } from '@aws-sdk/client-ec2'; +import { mockClient } from 'aws-sdk-client-mock'; +import { getInstances, tagFilter } from './ec2'; + +const ec2Mock = mockClient(EC2Client); + +describe('getInstances', () => { + beforeEach(() => { + ec2Mock.reset(); + }); + + it('should return the instance when found', async () => { + const instanceId = 'i-1234567890abcdef0'; + const instance = { InstanceId: instanceId }; + ec2Mock.on(DescribeInstancesCommand).resolves({ + Reservations: [{ Instances: [instance] }], + }); + + const result = await getInstances(new EC2Client({}), [instanceId]); + expect(result).toEqual([instance]); + }); + + describe('should return null when the instance is not found', () => { + it.each([{ Reservations: [] }, {}, { Reservations: undefined }])( + 'with %p', + async (item: DescribeInstancesResult) => { + const instanceId = 'i-1234567890abcdef0'; + ec2Mock.on(DescribeInstancesCommand).resolves(item); + + const result = await getInstances(new EC2Client({}), [instanceId]); + expect(result).toEqual([]); + }, + ); + }); +}); + +describe('tagFilter', () => { + describe('should return true when the instance matches the tag filters', () => { + it.each([{ Environment: 'production' }, { Environment: 'prod' }])( + 'with %p', + (tagFilters: Record) => { + const instance = { + Tags: [ + { Key: 'Name', Value: 'test-instance' }, + { Key: 'Environment', Value: 'production' }, + ], + }; + + const result = tagFilter(instance, tagFilters); + expect(result).toBe(true); + }, + ); + }); + + it('should return false when the instance does not have all the tags', () => { + const instance = { + 
Tags: [{ Key: 'Name', Value: 'test-instance' }], + }; + const tagFilters = { Name: 'test', Environment: 'prod' }; + + const result = tagFilter(instance, tagFilters); + expect(result).toBe(false); + }); + + it('should return false when the instance does not have any tags', () => { + const instance = {}; + const tagFilters = { Name: 'test', Environment: 'prod' }; + + const result = tagFilter(instance, tagFilters); + expect(result).toBe(false); + }); + + it('should return true if the tag filters are empty', () => { + const instance = { + Tags: [ + { Key: 'Name', Value: 'test-instance' }, + { Key: 'Environment', Value: 'production' }, + ], + }; + const tagFilters = {}; + + const result = tagFilter(instance, tagFilters); + expect(result).toBe(true); + }); + + it('should return false if instance is null', () => { + const instance = null; + const tagFilters = { Name: 'test', Environment: 'prod' }; + + const result = tagFilter(instance, tagFilters); + expect(result).toBe(false); + }); +}); diff --git a/lambdas/functions/termination-watcher/src/ec2.ts b/lambdas/functions/termination-watcher/src/ec2.ts new file mode 100644 index 0000000000..56ee7b3848 --- /dev/null +++ b/lambdas/functions/termination-watcher/src/ec2.ts @@ -0,0 +1,13 @@ +import { DescribeInstancesCommand, EC2Client, Instance } from '@aws-sdk/client-ec2'; + +export async function getInstances(ec2: EC2Client, instanceId: string[]): Promise { + const result = await ec2.send(new DescribeInstancesCommand({ InstanceIds: instanceId })); + const instances = result.Reservations?.[0]?.Instances; + return instances ?? 
[]; +} + +export function tagFilter(instance: Instance | null, tagFilters: Record): boolean { + return Object.keys(tagFilters).every((key) => { + return instance?.Tags?.find((tag) => tag.Key === key && tag.Value?.startsWith(tagFilters[key])); + }); +} diff --git a/lambdas/functions/termination-watcher/src/lambda.test.ts b/lambdas/functions/termination-watcher/src/lambda.test.ts index 6214d315a2..2478999c26 100644 --- a/lambdas/functions/termination-watcher/src/lambda.test.ts +++ b/lambdas/functions/termination-watcher/src/lambda.test.ts @@ -3,14 +3,16 @@ import { Context } from 'aws-lambda'; import { mocked } from 'jest-mock'; import { handle as interruptionWarningHandlerImpl } from './termination-warning'; -import { interruptionWarning } from './lambda'; -import { SpotInterruptionWarning, SpotTerminationDetail } from './types'; +import { handle as terminationHandlerImpl } from './termination'; +import { interruptionWarning, termination } from './lambda'; +import { BidEvictedDetail, BidEvictedEvent, SpotInterruptionWarning, SpotTerminationDetail } from './types'; jest.mock('./termination-warning'); +jest.mock('./termination'); process.env.POWERTOOLS_METRICS_NAMESPACE = 'test'; process.env.POWERTOOLS_TRACE_ENABLED = 'true'; -const event: SpotInterruptionWarning = { +const spotInstanceInterruptionEvent: SpotInterruptionWarning = { version: '0', id: '1', 'detail-type': 'EC2 Spot Instance Interruption Warning', @@ -25,6 +27,42 @@ const event: SpotInterruptionWarning = { }, }; +const bidEvictedEvent: BidEvictedEvent = { + version: '0', + id: '186d7999-3121-e749-23f3-c7caec1084e1', + 'detail-type': 'AWS Service Event via CloudTrail', + source: 'aws.ec2', + account: '123456789012', + time: '2024-10-09T11:48:46Z', + region: 'eu-west-1', + resources: [], + detail: { + eventVersion: '1.10', + userIdentity: { + accountId: '123456789012', + invokedBy: 'sec2.amazonaws.com', + }, + eventTime: '2024-10-09T11:48:46Z', + eventSource: 'ec2.amazonaws.com', + eventName: 
'BidEvictedEvent', + awsRegion: 'eu-west-1', + sourceIPAddress: 'ec2.amazonaws.com', + userAgent: 'ec2.amazonaws.com', + requestParameters: null, + responseElements: null, + requestID: 'ebf032e3-5009-3484-aae8-b4946ab2e2eb', + eventID: '3a15843b-96c2-41b1-aac1-7d62dc754547', + readOnly: false, + eventType: 'AwsServiceEvent', + managementEvent: true, + recipientAccountId: '123456789012', + serviceEventDetails: { + instanceIdSet: ['i-12345678901234567'], + }, + eventCategory: 'Management', + }, +}; + const context: Context = { awsRequestId: '1', callbackWaitsForEmptyEventLoop: false, @@ -48,6 +86,10 @@ const context: Context = { // Docs for testing async with jest: https://jestjs.io/docs/tutorial-async describe('Handle sport termination interruption warning', () => { + beforeEach(() => { + jest.clearAllMocks(); + }); + it('should not throw or log in error.', async () => { const mock = mocked(interruptionWarningHandlerImpl); mock.mockImplementation(() => { @@ -55,7 +97,7 @@ describe('Handle sport termination interruption warning', () => { resolve(); }); }); - await expect(interruptionWarning(event, context)).resolves.not.toThrow(); + await expect(interruptionWarning(spotInstanceInterruptionEvent, context)).resolves.not.toThrow(); }); it('should not throw only log in error in case of an exception.', async () => { @@ -63,7 +105,33 @@ describe('Handle sport termination interruption warning', () => { const error = new Error('An error.'); const mock = mocked(interruptionWarningHandlerImpl); mock.mockRejectedValue(error); - await expect(interruptionWarning(event, context)).resolves.toBeUndefined(); + await expect(interruptionWarning(spotInstanceInterruptionEvent, context)).resolves.toBeUndefined(); + + expect(logSpy).toHaveBeenCalledTimes(1); + }); +}); + +describe('Handle spot termination (BidEvictedEvent)', () => { + beforeEach(() => { + jest.clearAllMocks(); + }); + + it('should not throw or log in error.', async () => { + const mock = mocked(terminationHandlerImpl); + 
mock.mockImplementation(() => { + return new Promise((resolve) => { + resolve(); + }); + }); + await expect(termination(bidEvictedEvent, context)).resolves.not.toThrow(); + }); + + it('should not throw only log in error in case of an exception.', async () => { + const logSpy = jest.spyOn(logger, 'error'); + const error = new Error('An error.'); + const mock = mocked(terminationHandlerImpl); + mock.mockRejectedValue(error); + await expect(termination(bidEvictedEvent, context)).resolves.toBeUndefined(); expect(logSpy).toHaveBeenCalledTimes(1); }); diff --git a/lambdas/functions/termination-watcher/src/lambda.ts b/lambdas/functions/termination-watcher/src/lambda.ts index a4f5696666..77949dd954 100644 --- a/lambdas/functions/termination-watcher/src/lambda.ts +++ b/lambdas/functions/termination-watcher/src/lambda.ts @@ -4,7 +4,8 @@ import { logMetrics } from '@aws-lambda-powertools/metrics/middleware'; import { Context } from 'aws-lambda'; import { handle as handleTerminationWarning } from './termination-warning'; -import { SpotInterruptionWarning, SpotTerminationDetail } from './types'; +import { handle as handleTermination } from './termination'; +import { BidEvictedDetail, BidEvictedEvent, SpotInterruptionWarning, SpotTerminationDetail } from './types'; import { Config } from './ConfigResolver'; const config = new Config(); @@ -24,6 +25,18 @@ export async function interruptionWarning( } } +export async function termination(event: BidEvictedEvent, context: Context): Promise { + setContext(context, 'lambda.ts'); + logger.logEventIfEnabled(event); + logger.debug('Configuration of the lambda', { config }); + + try { + await handleTermination(event, config); + } catch (e) { + logger.error(`${(e as Error).message}`, { error: e as Error }); + } +} + const addMiddleware = () => { const middleware = middy(interruptionWarning); diff --git a/lambdas/functions/termination-watcher/src/metric-event.test.ts b/lambdas/functions/termination-watcher/src/metric-event.test.ts new file 
mode 100644 index 0000000000..88a0b82f20 --- /dev/null +++ b/lambdas/functions/termination-watcher/src/metric-event.test.ts @@ -0,0 +1,82 @@ +import { Instance } from '@aws-sdk/client-ec2'; +import 'aws-sdk-client-mock-jest'; +import { SpotInterruptionWarning, SpotTerminationDetail } from './types'; +import { createSingleMetric } from '@aws-github-runner/aws-powertools-util'; +import { MetricUnit } from '@aws-lambda-powertools/metrics'; +import { metricEvent } from './metric-event'; + +jest.mock('@aws-github-runner/aws-powertools-util', () => ({ + ...jest.requireActual('@aws-github-runner/aws-powertools-util'), + // eslint-disable-next-line @typescript-eslint/no-unused-vars + createSingleMetric: jest.fn((name: string, unit: string, value: number, dimensions?: Record) => { + return { + addMetadata: jest.fn(), + }; + }), +})); + +const event: SpotInterruptionWarning = { + version: '0', + id: '1', + 'detail-type': 'EC2 Spot Instance Interruption Warning', + source: 'aws.ec2', + account: '123456789012', + time: '2015-11-11T21:29:54Z', + region: 'us-east-1', + resources: ['arn:aws:ec2:us-east-1b:instance/i-abcd1111'], + detail: { + 'instance-id': 'i-abcd1111', + 'instance-action': 'terminate', + }, +}; + +const instance: Instance = { + InstanceId: event.detail['instance-id'], + InstanceType: 't2.micro', + Tags: [ + { Key: 'Name', Value: 'test-instance' }, + { Key: 'ghr:environment', Value: 'test' }, + { Key: 'ghr:created_by', Value: 'niek' }, + ], + State: { Name: 'running' }, + LaunchTime: new Date('2021-01-01'), +}; + +describe('create metric and metric logs', () => { + beforeEach(() => { + jest.clearAllMocks(); + }); + + it('should log and create a metric', async () => { + const metricName = 'SpotInterruptionWarning'; + await metricEvent(instance, event, metricName, console); + expect(createSingleMetric).toHaveBeenCalledTimes(1); + expect(createSingleMetric).toHaveBeenCalledWith(metricName, MetricUnit.Count, 1, { + InstanceType: instance.InstanceType ? 
instance.InstanceType : 'unknown', + Environment: instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? 'unknown', + }); + }); + + it('should log and create a metric for instance with limited data', async () => { + const metricName = 'SpotInterruptionWarning'; + const instanceMinimalData: Instance = { + ...instance, + InstanceId: undefined, + InstanceType: undefined, + LaunchTime: undefined, + Tags: undefined, + }; + + await metricEvent(instanceMinimalData, event, metricName, console); + expect(createSingleMetric).toHaveBeenCalledTimes(1); + expect(createSingleMetric).toHaveBeenCalledWith(metricName, MetricUnit.Count, 1, { + InstanceType: instanceMinimalData.InstanceType ? instanceMinimalData.InstanceType : 'unknown', + Environment: instanceMinimalData.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? 'unknown', + }); + }); + + it('should log and NOT create a metric', async () => { + await expect(metricEvent(instance, event, undefined, console)).resolves.not.toThrow(); + expect(createSingleMetric).not.toHaveBeenCalled(); + }); +}); diff --git a/lambdas/functions/termination-watcher/src/metric-event.ts b/lambdas/functions/termination-watcher/src/metric-event.ts new file mode 100644 index 0000000000..ece33213a6 --- /dev/null +++ b/lambdas/functions/termination-watcher/src/metric-event.ts @@ -0,0 +1,34 @@ +import { createSingleMetric } from '@aws-github-runner/aws-powertools-util'; +import { Instance } from '@aws-sdk/client-ec2'; +import { MetricUnit } from '@aws-lambda-powertools/metrics'; +import { Logger } from '@aws-sdk/types'; +import { EventBridgeEvent } from 'aws-lambda'; + +export async function metricEvent( + instance: Instance, + event: EventBridgeEvent, + metricName: string | undefined, + logger: Logger, +): Promise<void> { + const instanceRunningTimeInSeconds = instance.LaunchTime + ? 
(new Date(event.time).getTime() - new Date(instance.LaunchTime).getTime()) / 1000 + : undefined; + logger.info(`Received spot notification for ${metricName}`, { + instanceId: instance.InstanceId, + instanceType: instance.InstanceType ?? 'unknown', + instanceName: instance.Tags?.find((tag) => tag.Key === 'Name')?.Value, + instanceState: instance.State?.Name, + instanceLaunchTime: instance.LaunchTime, + instanceRunningTimeInSeconds, + tags: instance.Tags, + }); + if (metricName) { + const metric = createSingleMetric(metricName, MetricUnit.Count, 1, { + InstanceType: instance.InstanceType ? instance.InstanceType : 'unknown', + Environment: instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? 'unknown', + }); + metric.addMetadata('InstanceId', instance.InstanceId ?? 'unknown'); + metric.addMetadata('InstanceType', instance.InstanceType ? instance.InstanceType : 'unknown'); + metric.addMetadata('Environment', instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? 
'unknown'); + } +} diff --git a/lambdas/functions/termination-watcher/src/termination-warning.test.ts b/lambdas/functions/termination-watcher/src/termination-warning.test.ts index 2fa399c5a6..69544764c7 100644 --- a/lambdas/functions/termination-watcher/src/termination-warning.test.ts +++ b/lambdas/functions/termination-watcher/src/termination-warning.test.ts @@ -1,24 +1,29 @@ -import { DescribeInstancesCommand, EC2Client, Instance, Reservation } from '@aws-sdk/client-ec2'; +import { EC2Client, Instance } from '@aws-sdk/client-ec2'; import { mockClient } from 'aws-sdk-client-mock'; import 'aws-sdk-client-mock-jest'; import { handle } from './termination-warning'; import { SpotInterruptionWarning, SpotTerminationDetail } from './types'; -import { createSingleMetric } from '@aws-github-runner/aws-powertools-util'; -import { MetricUnit } from '@aws-lambda-powertools/metrics'; +import { metricEvent } from './metric-event'; +import { mocked } from 'jest-mock'; +import { getInstances } from './ec2'; -jest.mock('@aws-github-runner/aws-powertools-util', () => ({ - ...jest.requireActual('@aws-github-runner/aws-powertools-util'), - // eslint-disable-next-line @typescript-eslint/no-unused-vars - createSingleMetric: jest.fn((name: string, unit: string, value: number, dimensions?: Record) => { - return { - addMetadata: jest.fn(), - }; - }), +jest.mock('./metric-event', () => ({ + metricEvent: jest.fn(), })); -const mockEC2Client = mockClient(EC2Client); +jest.mock('./ec2', () => ({ + ...jest.requireActual('./ec2'), + getInstances: jest.fn(), +})); + +mockClient(EC2Client); -const config = { createSpotWarningMetric: true, tagFilters: { 'ghr:environment': 'test' }, prefix: 'runners' }; +const config = { + createSpotWarningMetric: true, + createSpotTerminationMetric: false, + tagFilters: { 'ghr:environment': 'test' }, + prefix: 'runners', +}; const event: SpotInterruptionWarning = { version: '0', @@ -47,112 +52,36 @@ const instance: Instance = { LaunchTime: new Date('2021-01-01'), 
}; -const reservations: Reservation[] = [ - { - Instances: [instance], - }, -]; - describe('handle termination warning', () => { beforeEach(() => { jest.clearAllMocks(); }); it('should log and create an metric', async () => { - mockEC2Client.on(DescribeInstancesCommand).resolves({ Reservations: reservations }); - + mocked(getInstances).mockResolvedValue([instance]); await handle(event, config); - expect(createSingleMetric).toHaveBeenCalled(); - expect(createSingleMetric).toHaveBeenCalledWith('SpotInterruptionWarning', MetricUnit.Count, 1, { - InstanceType: instance.InstanceType ? instance.InstanceType : '_FAIL_', - Environment: instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? '_FAIL_', - }); + + expect(metricEvent).toHaveBeenCalled(); + expect(metricEvent).toHaveBeenCalledWith(instance, event, 'SpotInterruptionWarning', expect.anything()); }); it('should log details and not create a metric', async () => { - mockEC2Client.on(DescribeInstancesCommand).resolves({ Reservations: reservations }); + mocked(getInstances).mockResolvedValue([instance]); await handle(event, { ...config, createSpotWarningMetric: false }); - expect(createSingleMetric).not.toHaveBeenCalled(); - }); - - it('should log and create matric for custom filters.', async () => { - const tags: Record = { 'ghr:custom': 'runners', 'ghr:created_by': 'niek' }; - mockEC2Client.on(DescribeInstancesCommand).resolves({ - Reservations: [ - { - Instances: [ - { - ...instance, - InstanceType: undefined, - LaunchTime: undefined, - InstanceId: undefined, - Tags: Object.keys(tags).map((key) => ({ Key: key, Value: tags[key] })), - }, - ], - }, - ], - }); - - await handle(event, { createSpotWarningMetric: true, tagFilters: tags, prefix: '' }); - expect(createSingleMetric).toHaveBeenCalled(); - }); - - it('should log and create matric for filter only with prefix match.', async () => { - // esnure instances contians tag with key gh:environment - const tagValue = instance.Tags?.find((tag) => tag.Key === 
'ghr:environment')?.Value; - if (!tagValue) { - fail('Tag ghr:environment not found on instance, required for this test.'); - } - expect(tagValue?.length).toBeGreaterThan(2); - - mockEC2Client.on(DescribeInstancesCommand).resolves({ Reservations: reservations }); - - await handle(event, { - createSpotWarningMetric: true, - tagFilters: { 'ghr:environment': tagValue.substring(0, tagValue.length - 1) }, - prefix: '', - }); - expect(createSingleMetric).toHaveBeenCalled(); - }); - - it('should not log and not create matric for custom filters without a match.', async () => { - // esnure instances contians tag with key gh:environment - expect(instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value).toBeDefined(); - mockEC2Client.on(DescribeInstancesCommand).resolves({ Reservations: reservations }); - - await handle(event, { createSpotWarningMetric: true, tagFilters: { 'ghr:environment': '_INVALID_' }, prefix: '' }); - expect(createSingleMetric).not.toHaveBeenCalled(); - }); - - it('should log and create matric if filter is empty', async () => { - mockEC2Client.on(DescribeInstancesCommand).resolves({ Reservations: reservations }); - - await handle(event, { createSpotWarningMetric: true, tagFilters: {}, prefix: '' }); - expect(createSingleMetric).toHaveBeenCalled(); - }); - - it('should not create a metric if no instance is found.', async () => { - mockEC2Client.on(DescribeInstancesCommand).resolves({ - Reservations: [ - { - Instances: [], - }, - ], - }); - - await handle(event, config); - expect(createSingleMetric).not.toHaveBeenCalled(); + expect(metricEvent).toHaveBeenCalledWith(instance, event, undefined, expect.anything()); }); it('should not create a metric if filter not matched.', async () => { - mockEC2Client.on(DescribeInstancesCommand).resolves({ Reservations: reservations }); + mocked(getInstances).mockResolvedValue([instance]); await handle(event, { createSpotWarningMetric: true, + createSpotTerminationMetric: false, tagFilters: { 'ghr:environment': 
'_NO_MATCH_' }, prefix: 'runners', }); - expect(createSingleMetric).not.toHaveBeenCalled(); + + expect(metricEvent).not.toHaveBeenCalled(); }); }); diff --git a/lambdas/functions/termination-watcher/src/termination-warning.ts b/lambdas/functions/termination-watcher/src/termination-warning.ts index 59706fef7b..1bc3a9e0c9 100644 --- a/lambdas/functions/termination-watcher/src/termination-warning.ts +++ b/lambdas/functions/termination-watcher/src/termination-warning.ts @@ -1,54 +1,37 @@ -import { createChildLogger, createSingleMetric, getTracedAWSV3Client } from '@aws-github-runner/aws-powertools-util'; +import { createChildLogger, getTracedAWSV3Client } from '@aws-github-runner/aws-powertools-util'; import { SpotInterruptionWarning, SpotTerminationDetail } from './types'; -import { DescribeInstancesCommand, EC2Client } from '@aws-sdk/client-ec2'; +import { EC2Client, Instance } from '@aws-sdk/client-ec2'; import { Config } from './ConfigResolver'; -import { MetricUnit } from '@aws-lambda-powertools/metrics'; +import { tagFilter, getInstances } from './ec2'; +import { metricEvent } from './metric-event'; const logger = createChildLogger('termination-warning'); async function handle(event: SpotInterruptionWarning, config: Config): Promise { logger.debug('Received spot notification warning:', { event }); const ec2 = getTracedAWSV3Client(new EC2Client({ region: process.env.AWS_REGION })); - const instance = - (await ec2.send(new DescribeInstancesCommand({ InstanceIds: [event.detail['instance-id']] }))).Reservations?.[0] - .Instances?.[0] ?? 
null; - logger.debug('Received spot notification warning for:', { instance }); + const instances = await getInstances(ec2, [event.detail['instance-id']]); + logger.debug('Received spot notification warning for:', { instances }); - // check if all tags in config.tagFilter are present on the instance - const matchFilter = Object.keys(config.tagFilters).every((key) => { - return instance?.Tags?.find((tag) => tag.Key === key && tag.Value?.startsWith(config.tagFilters[key])); - }); + await createMetricForInstances(instances, event, config); +} + +async function createMetricForInstances( + instances: Instance[], + event: SpotInterruptionWarning, + config: Config, +): Promise { + for (const instance of instances) { + const matchFilter = tagFilter(instance, config.tagFilters); - if (matchFilter && instance) { - const instanceRunningTimeInSeconds = instance.LaunchTime - ? (new Date(event.time).getTime() - new Date(instance.LaunchTime).getTime()) / 1000 - : undefined; - logger.info('Received spot notification warning:', { - instanceId: instance.InstanceId, - instanceType: instance.InstanceType ?? 'unknown', - instanceName: instance.Tags?.find((tag) => tag.Key === 'Name')?.Value, - instanceState: instance.State?.Name, - instanceLaunchTime: instance.LaunchTime, - instanceRunningTimeInSeconds, - tags: instance.Tags, - }); - if (config.createSpotWarningMetric) { - const metric = createSingleMetric('SpotInterruptionWarning', MetricUnit.Count, 1, { - InstanceType: instance.InstanceType ? instance.InstanceType : 'unknown', - Environment: instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? 'unknown', - }); - metric.addMetadata('InstanceId', instance.InstanceId ?? 'unknown'); - metric.addMetadata('InstanceType', instance.InstanceType ? instance.InstanceType : 'unknown'); - metric.addMetadata( - 'Environment', - instance.Tags?.find((tag) => tag.Key === 'ghr:environment')?.Value ?? 
'unknown', + if (matchFilter) { + metricEvent(instance, event, config.createSpotWarningMetric ? 'SpotInterruptionWarning' : undefined, logger); + } else { + logger.debug( + `Received spot termination notification warning but ` + + `details are not available or instance not matching the tag filter (${JSON.stringify(config.tagFilters)}).`, ); } - } else { - logger.debug( - `Received spot termination notification warning for instance ${event.detail['instance-id']} but ` + - `details are not available or instance not matching the tag fileter (${config.tagFilters}).`, - ); } } diff --git a/lambdas/functions/termination-watcher/src/termination.test.ts b/lambdas/functions/termination-watcher/src/termination.test.ts new file mode 100644 index 0000000000..c0c9a9f571 --- /dev/null +++ b/lambdas/functions/termination-watcher/src/termination.test.ts @@ -0,0 +1,108 @@ +import { EC2Client, Instance } from '@aws-sdk/client-ec2'; +import { mockClient } from 'aws-sdk-client-mock'; +import 'aws-sdk-client-mock-jest'; +import { handle } from './termination'; +import { BidEvictedDetail, BidEvictedEvent } from './types'; +import { metricEvent } from './metric-event'; +import { mocked } from 'jest-mock'; +import { getInstances } from './ec2'; + +jest.mock('./metric-event', () => ({ + metricEvent: jest.fn(), +})); + +jest.mock('./ec2', () => ({ + ...jest.requireActual('./ec2'), + getInstances: jest.fn(), +})); + +mockClient(EC2Client); + +const config = { + createSpotWarningMetric: false, + createSpotTerminationMetric: true, + tagFilters: { 'ghr:environment': 'test' }, + prefix: 'runners', +}; + +const event: BidEvictedEvent = { + version: '0', + id: '186d7999-3121-e749-23f3-c7caec1084e1', + 'detail-type': 'AWS Service Event via CloudTrail', + source: 'aws.ec2', + account: '123456789012', + time: '2024-10-09T11:48:46Z', + region: 'eu-west-1', + resources: [], + detail: { + eventVersion: '1.10', + userIdentity: { + accountId: '123456789012', + invokedBy: 'sec2.amazonaws.com', + }, + eventTime: 
'2024-10-09T11:48:46Z', + eventSource: 'ec2.amazonaws.com', + eventName: 'BidEvictedEvent', + awsRegion: 'eu-west-1', + sourceIPAddress: 'ec2.amazonaws.com', + userAgent: 'ec2.amazonaws.com', + requestParameters: null, + responseElements: null, + requestID: 'ebf032e3-5009-3484-aae8-b4946ab2e2eb', + eventID: '3a15843b-96c2-41b1-aac1-7d62dc754547', + readOnly: false, + eventType: 'AwsServiceEvent', + managementEvent: true, + recipientAccountId: '123456789012', + serviceEventDetails: { + instanceIdSet: ['i-12345678901234567'], + }, + eventCategory: 'Management', + }, +}; + +const instance: Instance = { + InstanceId: event.detail.serviceEventDetails.instanceIdSet[0], + InstanceType: 't2.micro', + Tags: [ + { Key: 'Name', Value: 'test-instance' }, + { Key: 'ghr:environment', Value: 'test' }, + { Key: 'ghr:created_by', Value: 'niek' }, + ], + State: { Name: 'running' }, + LaunchTime: new Date('2021-01-01'), +}; + +describe('handle termination warning', () => { + beforeEach(() => { + jest.clearAllMocks(); + }); + + it('should log and create a metric', async () => { + mocked(getInstances).mockResolvedValue([instance]); + await handle(event, config); + + expect(metricEvent).toHaveBeenCalled(); + expect(metricEvent).toHaveBeenCalledWith(instance, event, 'SpotTermination', expect.anything()); + }); + + it('should log details and not create a metric', async () => { + mocked(getInstances).mockResolvedValue([instance]); + + await handle(event, { ...config, createSpotTerminationMetric: false }); + expect(metricEvent).toHaveBeenCalledWith(instance, event, undefined, expect.anything()); + }); + + it('should not create a metric if filter not matched.', async () => { + mocked(getInstances).mockResolvedValue([instance]); + + await handle(event, { + createSpotWarningMetric: false, + createSpotTerminationMetric: true, + tagFilters: { 'ghr:environment': '_NO_MATCH_' }, + prefix: 'runners', + }); + + expect(metricEvent).not.toHaveBeenCalled(); + }); +}); diff --git 
a/lambdas/functions/termination-watcher/src/termination.ts b/lambdas/functions/termination-watcher/src/termination.ts new file mode 100644 index 0000000000..4efc625245 --- /dev/null +++ b/lambdas/functions/termination-watcher/src/termination.ts @@ -0,0 +1,40 @@ +import { createChildLogger, getTracedAWSV3Client } from '@aws-github-runner/aws-powertools-util'; +import { BidEvictedDetail, BidEvictedEvent } from './types'; +import { EC2Client } from '@aws-sdk/client-ec2'; +import { Config } from './ConfigResolver'; +import { metricEvent } from './metric-event'; +import { getInstances, tagFilter } from './ec2'; + +const logger = createChildLogger('termination-handler'); + +export async function handle(event: BidEvictedEvent, config: Config): Promise { + logger.debug('Received spot termination (BidEvictedEvent):', { event }); + + const instanceIds = event.detail.serviceEventDetails?.instanceIdSet; + await createMetricForInstances(instanceIds, event, config); +} + +async function createMetricForInstances( + instanceIds: string[], + event: BidEvictedEvent, + config: Config, +): Promise { + const ec2 = getTracedAWSV3Client(new EC2Client({ region: process.env.AWS_REGION })); + + const instances = await getInstances(ec2, instanceIds); + logger.debug('Received spot notification termination for:', { instances }); + + // check if all tags in config.tagFilter are present on the instance + for (const instance of instances) { + const matchFilter = tagFilter(instance, config.tagFilters); + + if (matchFilter) { + metricEvent(instance, event, config.createSpotTerminationMetric ? 
'SpotTermination' : undefined, logger); + } else { + logger.debug( + `Received spot termination but ` + + `details are not available or instance not matching the tag filter (${JSON.stringify(config.tagFilters)}).`, + ); + } + } +} diff --git a/lambdas/functions/termination-watcher/src/types.d.ts b/lambdas/functions/termination-watcher/src/types.d.ts index 33d22263b5..d242221142 100644 --- a/lambdas/functions/termination-watcher/src/types.d.ts +++ b/lambdas/functions/termination-watcher/src/types.d.ts @@ -8,3 +8,37 @@ interface SpotTerminationDetail { 'instance-id': string; 'instance-action': string; } + +// eslint-disable-next-line @typescript-eslint/no-empty-object-type +export interface BidEvictedEvent + extends EventBridgeEvent<'AWS Service Event via CloudTrail', BidEvictedDetail> {} + +interface BidEvictedDetail { + eventVersion: string; + userIdentity: UserIdentity; + eventTime: string; + eventSource: string; + eventName: string; + awsRegion: string; + sourceIPAddress: string; + userAgent: string; + requestParameters: null; + responseElements: null; + requestID: string; + eventID: string; + readOnly: boolean; + eventType: string; + managementEvent: boolean; + recipientAccountId: string; + serviceEventDetails: ServiceEventDetails; + eventCategory: string; +} + +interface UserIdentity { + accountId: string; + invokedBy: string; +} + +interface ServiceEventDetails { + instanceIdSet: string[]; +} diff --git a/modules/multi-runner/README.md b/modules/multi-runner/README.md index fbd427367c..bb594e83f0 100644 --- a/modules/multi-runner/README.md +++ b/modules/multi-runner/README.md @@ -136,7 +136,7 @@ module "multi-runner" { | [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no | | [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. 
Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). |
object({
key_base64 = string
id = string
webhook_secret = string
})
| n/a | yes | | [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no | -| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the spot termination watcher lambda function. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
`memory_size`: Memory size linit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metrics = optional(string, null) # deprecated
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | +| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the spot termination watcher lambda function. This feature is in beta; changes will not trigger a major release while it remains in beta.

`enable`: Enable or disable the spot termination watcher.
`memory_size`: Memory size limit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metrics = optional(string, null) # deprecated
features = optional(object({
enable_spot_termination_handler = optional(bool, true)
enable_spot_termination_notification_watcher = optional(bool, true)
}), {})
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | | [key\_name](#input\_key\_name) | Key pair name | `string` | `null` | no | | [kms\_key\_arn](#input\_kms\_key\_arn) | Optional CMK Key ARN to be used for Parameter Store. | `string` | `null` | no | | [lambda\_architecture](#input\_lambda\_architecture) | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions. | `string` | `"arm64"` | no | diff --git a/modules/multi-runner/variables.tf b/modules/multi-runner/variables.tf index 13b6f838a0..553cc04594 100644 --- a/modules/multi-runner/variables.tf +++ b/modules/multi-runner/variables.tf @@ -634,8 +634,12 @@ variable "instance_termination_watcher" { EOF type = object({ - enable = optional(bool, false) - enable_metrics = optional(string, null) # deprecated + enable = optional(bool, false) + enable_metrics = optional(string, null) # deprecated + features = optional(object({ + enable_spot_termination_handler = optional(bool, true) + enable_spot_termination_notification_watcher = optional(bool, true) + }), {}) memory_size = optional(number, null) s3_key = optional(string, null) s3_object_version = optional(string, null) diff --git a/modules/termination-watcher/README.md b/modules/termination-watcher/README.md index 849380777f..c3ab80ff33 100644 --- a/modules/termination-watcher/README.md +++ b/modules/termination-watcher/README.md @@ -65,34 +65,29 @@ yarn run dist ## Providers -| Name | Version | -|------|---------| -| [aws](#provider\_aws) | ~> 5.27 | +No providers. 
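The handler changes earlier in this patch replace the inline tag check with a shared `tagFilter` helper imported from `./ec2`, whose body is not part of this section. A minimal sketch of what such a check could look like, assuming it keeps the semantics of the removed inline code (every configured key must be present, values are matched by prefix). The names `TagLike`, `InstanceLike`, and `tagFilterSketch` are hypothetical stand-ins, not the module's API:

```typescript
// Minimal stand-ins for the EC2 SDK types; only the tags matter for this sketch.
interface TagLike {
  Key?: string;
  Value?: string;
}

interface InstanceLike {
  InstanceId?: string;
  Tags?: TagLike[];
}

// Hypothetical version of the tag check, modelled on the inline filter removed
// from termination-warning.ts; the real helper in ./ec2 may differ.
// Every key in `tagFilters` must be present on the instance, and the tag value
// must start with the configured value (a prefix match, not strict equality).
function tagFilterSketch(instance: InstanceLike | null, tagFilters: Record<string, string>): boolean {
  if (!instance) {
    return false; // no instance details available, so nothing can match
  }
  return Object.entries(tagFilters).every(([key, prefix]) =>
    (instance.Tags ?? []).some((tag) => tag.Key === key && tag.Value?.startsWith(prefix) === true),
  );
}

// Usage: a runner tagged for the "test-runners" environment.
const runner: InstanceLike = {
  InstanceId: 'i-12345678901234567',
  Tags: [
    { Key: 'Name', Value: 'test-instance' },
    { Key: 'ghr:environment', Value: 'test-runners' },
  ],
};

const matchesPrefix = tagFilterSketch(runner, { 'ghr:environment': 'test' });
const matchesOther = tagFilterSketch(runner, { 'ghr:environment': '_NO_MATCH_' });
const missingInstance = tagFilterSketch(null, { 'ghr:environment': 'test' });
```

Because the comparison uses `startsWith`, a filter value of `test` also accepts `test-runners`; a stricter deployment would need a more specific filter value.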
## Modules | Name | Source | Version | |------|--------|---------| -| [termination\_warning\_watcher](#module\_termination\_warning\_watcher) | ../lambda | n/a | +| [termination\_handler](#module\_termination\_handler) | ./termination | n/a | +| [termination\_notification](#module\_termination\_notification) | ./notification | n/a | ## Resources -| Name | Type | -|------|------| -| [aws_cloudwatch_event_rule.spot_instance_termination_warning](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule) | resource | -| [aws_cloudwatch_event_target.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource | -| [aws_iam_role_policy.lambda_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource | -| [aws_lambda_permission.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission) | resource | +No resources. ## Inputs | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| -| [config](#input\_config) | Configuration for the spot termination watcher lambda function.

`aws_partition`: Partition for the base arn if not 'aws'
`architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions.
`environment_variables`: Environment variables for the lambda.
`lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
`lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
`logging_kms_key_id`: Specifies the kms key id to encrypt the logs with
`logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
`memory_size`: Memory size linit in MB of the lambda.
`prefix`: The prefix used for naming resources.
`role_path`: The path that will be added to the role, if not set the environment name will be used.
`role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
`runtime`: AWS Lambda runtime.
`s3_bucket`: S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`security_group_ids`: List of security group IDs associated with the Lambda function.
`subnet_ids`: List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`.
`tag_filters`: Map of tags that will be used to filter the resources to be tracked. Only for which all tags are present and starting with the same value as the value in the map will be tracked.
`tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`timeout`: Time out of the lambda in seconds.
`tracing_config`: Configuration for lambda tracing.
`zip`: File location of the lambda zip file. |
object({
aws_partition = optional(string, null)
architecture = optional(string, null)
enable_metric = optional(string, null)
environment_variables = optional(map(string), {})
lambda_tags = optional(map(string), {})
log_level = optional(string, null)
logging_kms_key_id = optional(string, null)
logging_retention_in_days = optional(number, null)
memory_size = optional(number, null)
metrics = optional(object({
enable = optional(bool, false)
namespace = optional(string, "GitHub Runners")
metric = optional(object({
enable_spot_termination_warning = optional(bool, true)
}), {})
}), {})
prefix = optional(string, null)
principals = optional(list(object({
type = string
identifiers = list(string)
})), [])
role_path = optional(string, null)
role_permissions_boundary = optional(string, null)
runtime = optional(string, null)
s3_bucket = optional(string, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
security_group_ids = optional(list(string), [])
subnet_ids = optional(list(string), [])
tag_filters = optional(map(string), null)
tags = optional(map(string), {})
timeout = optional(number, null)
tracing_config = optional(object({
mode = optional(string, null)
capture_http_requests = optional(bool, false)
capture_error = optional(bool, false)
}), {})
zip = optional(string, null)
})
| n/a | yes | +| [config](#input\_config) | Configuration for the spot termination watcher.

`aws_partition`: Partition for the base arn if not 'aws'
`architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions.
`environment_variables`: Environment variables for the lambda.
`features`: Features to enable the different lambda functions that handle spot termination events.
`lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
`lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
`logging_kms_key_id`: Specifies the KMS key ID used to encrypt the logs.
`logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
`memory_size`: Memory size limit in MB of the lambda.
`prefix`: The prefix used for naming resources.
`role_path`: The path that will be added to the role, if not set the environment name will be used.
`role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
`runtime`: AWS Lambda runtime.
`s3_bucket`: S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`security_group_ids`: List of security group IDs associated with the Lambda function.
`subnet_ids`: List of subnets in which the action runners will be launched; the subnets need to be subnets in the `vpc_id`.
`tag_filters`: Map of tags used to filter the resources to be tracked. Only resources that have all tags present, each with a value starting with the value in the map, will be tracked.
`tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`timeout`: Time out of the lambda in seconds.
`tracing_config`: Configuration for lambda tracing.
`zip`: File location of the lambda zip file. |
object({
aws_partition = optional(string, null)
architecture = optional(string, null)
enable_metric = optional(string, null)
environment_variables = optional(map(string), {})
features = optional(object({
enable_spot_termination_handler = optional(bool, true)
enable_spot_termination_notification_watcher = optional(bool, true)
}), {})
lambda_tags = optional(map(string), {})
log_level = optional(string, null)
logging_kms_key_id = optional(string, null)
logging_retention_in_days = optional(number, null)
memory_size = optional(number, null)
metrics = optional(object({
enable = optional(bool, false)
namespace = optional(string, "GitHub Runners")
metric = optional(object({
enable_spot_termination = optional(bool, true)
enable_spot_termination_warning = optional(bool, true)
}), {})
}), {})
prefix = optional(string, null)
principals = optional(list(object({
type = string
identifiers = list(string)
})), [])
role_path = optional(string, null)
role_permissions_boundary = optional(string, null)
runtime = optional(string, null)
s3_bucket = optional(string, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
security_group_ids = optional(list(string), [])
subnet_ids = optional(list(string), [])
tag_filters = optional(map(string), null)
tags = optional(map(string), {})
timeout = optional(number, null)
tracing_config = optional(object({
mode = optional(string, null)
capture_http_requests = optional(bool, false)
capture_error = optional(bool, false)
}), {})
zip = optional(string, null)
})
| n/a | yes | ## Outputs | Name | Description | |------|-------------| -| [lambda](#output\_lambda) | n/a | +| [spot\_termination\_handler](#output\_spot\_termination\_handler) | n/a | +| [spot\_termination\_notification](#output\_spot\_termination\_notification) | n/a | diff --git a/modules/termination-watcher/main.tf b/modules/termination-watcher/main.tf index acf41f83be..1cf8ccb275 100644 --- a/modules/termination-watcher/main.tf +++ b/modules/termination-watcher/main.tf @@ -15,41 +15,3 @@ locals { metrics_namespace = var.config.metrics.namespace }) } - -module "termination_warning_watcher" { - source = "../lambda" - lambda = local.config -} - - -resource "aws_cloudwatch_event_rule" "spot_instance_termination_warning" { - name = "${var.config.prefix != null ? format("%s-", var.config.prefix) : ""}spot-instance-termination" - description = "Spot Instance Termination Warning" - - event_pattern = < +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | >= 1.3.0 | +| [aws](#requirement\_aws) | ~> 5.27 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | ~> 5.27 | + +## Modules + +| Name | Source | Version | +|------|--------|---------| +| [termination\_warning\_watcher](#module\_termination\_warning\_watcher) | ../../lambda | n/a | + +## Resources + +| Name | Type | +|------|------| +| [aws_cloudwatch_event_rule.spot_instance_termination_warning](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule) | resource | +| [aws_cloudwatch_event_target.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource | +| [aws_iam_role_policy.lambda_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource | +| [aws_lambda_permission.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission) | resource | + 
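The `notification` and `termination` submodules in this change pass `ENABLE_METRICS_SPOT_WARNING`, `ENABLE_METRICS_SPOT_TERMINATION`, and `TAG_FILTERS` to the lambdas as environment variables. The resolver that reads them (`./ConfigResolver`) is not shown in this section, so the following is only a sketch of how those values could be turned into the config shape used by the tests. `WatcherConfigSketch` and `resolveConfigSketch` are hypothetical names, and the sketch assumes Terraform renders the boolean expressions as the strings `true` / `false`:

```typescript
// Hypothetical config shape, mirroring the fields used in the test fixtures
// (prefix is left out because its environment variable is not shown here).
interface WatcherConfigSketch {
  createSpotWarningMetric: boolean;
  createSpotTerminationMetric: boolean;
  tagFilters: Record<string, string>;
}

// Reads the variables set by the Terraform submodules from an env-like map.
function resolveConfigSketch(env: Record<string, string | undefined>): WatcherConfigSketch {
  // Assumption: boolean values arrive as the strings "true" / "false".
  const flag = (name: string): boolean => env[name] === 'true';

  // tag_filters defaults to null, and jsonencode(null) produces the string "null".
  // JSON.parse turns that into null, so fall back to an empty filter map.
  const parsed: unknown = JSON.parse(env['TAG_FILTERS'] ?? 'null');
  const tagFilters =
    parsed !== null && typeof parsed === 'object' && !Array.isArray(parsed)
      ? (parsed as Record<string, string>)
      : {};

  return {
    createSpotWarningMetric: flag('ENABLE_METRICS_SPOT_WARNING'),
    createSpotTerminationMetric: flag('ENABLE_METRICS_SPOT_TERMINATION'),
    tagFilters,
  };
}

// Usage: the notification lambda only receives the warning flag.
const resolved = resolveConfigSketch({
  ENABLE_METRICS_SPOT_WARNING: 'true',
  TAG_FILTERS: '{"ghr:environment":"test"}',
});

// Usage: tag_filters left at its default of null.
const resolvedDefault = resolveConfigSketch({ TAG_FILTERS: 'null' });
```

Treating an unset flag as `false` keeps each lambda from emitting a metric it was not configured for, which matches the two submodules setting only their own flag.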
+## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [config](#input\_config) | Configuration for the termination notification watcher | `any` | n/a | yes | + +## Outputs + +| Name | Description | +|------|-------------| +| [lambda](#output\_lambda) | n/a | + \ No newline at end of file diff --git a/modules/termination-watcher/notification/main.tf b/modules/termination-watcher/notification/main.tf new file mode 100644 index 0000000000..e8eef31874 --- /dev/null +++ b/modules/termination-watcher/notification/main.tf @@ -0,0 +1,49 @@ +locals { + name = "spot-termination-notification" + + config = merge(var.config, { + name = local.name, + handler = "index.interruptionWarning", + environment_variables = { + ENABLE_METRICS_SPOT_WARNING = var.config.metrics != null ? var.config.metrics.enable && var.config.metrics.metric.enable_spot_termination_warning : false + TAG_FILTERS = jsonencode(var.config.tag_filters) + } + }) +} + +module "termination_warning_watcher" { + source = "../../lambda" + lambda = local.config +} + +resource "aws_cloudwatch_event_rule" "spot_instance_termination_warning" { + name = "${var.config.prefix != null ? 
format("%s-", var.config.prefix) : ""}spot-notify" + description = "Spot Instance Termination Warning" + + event_pattern = < +## Requirements + +| Name | Version | +|------|---------| +| [terraform](#requirement\_terraform) | >= 1.3.0 | +| [aws](#requirement\_aws) | ~> 5.27 | + +## Providers + +| Name | Version | +|------|---------| +| [aws](#provider\_aws) | ~> 5.27 | + +## Modules + +| Name | Source | Version | +|------|--------|---------| +| [termination\_handler](#module\_termination\_handler) | ../../lambda | n/a | + +## Resources + +| Name | Type | +|------|------| +| [aws_cloudwatch_event_rule.spot_instance_termination](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_rule) | resource | +| [aws_cloudwatch_event_target.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudwatch_event_target) | resource | +| [aws_iam_role_policy.lambda_policy](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/iam_role_policy) | resource | +| [aws_lambda_permission.main](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_permission) | resource | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [config](#input\_config) | Configuration for the termination handler | `any` | n/a | yes | + +## Outputs + +| Name | Description | +|------|-------------| +| [lambda](#output\_lambda) | n/a | + \ No newline at end of file diff --git a/modules/termination-watcher/termination/main.tf b/modules/termination-watcher/termination/main.tf new file mode 100644 index 0000000000..20557d96d2 --- /dev/null +++ b/modules/termination-watcher/termination/main.tf @@ -0,0 +1,53 @@ +locals { + name = "spot-termination-handler" + + config = merge(var.config, { + name = local.name, + handler = "index.termination", + environment_variables = { + ENABLE_METRICS_SPOT_TERMINATION = var.config.metrics != null ? 
var.config.metrics.enable && var.config.metrics.metric.enable_spot_termination : false + TAG_FILTERS = jsonencode(var.config.tag_filters) + } + }) +} + +module "termination_handler" { + source = "../../lambda" + lambda = local.config +} + +resource "aws_cloudwatch_event_rule" "spot_instance_termination" { + name = "${var.config.prefix != null ? format("%s-", var.config.prefix) : ""}spot-termination" + description = "Spot Instance Termination (BidEvictedEvent)" + + event_pattern = <