feat(telemetry): add new Telemetry service (OTel, Syslog, Fluent, Metrics, Traces) and unified ingestion pipeline

- Add Telemetry service entrypoint
  - Telemetry/Index.ts: app bootstrap, route mounting, infrastructure init, and Telemetry SDK init.

- Unified queue + worker
  - Telemetry/Jobs/TelemetryIngest/ProcessTelemetry.ts: a single worker that dispatches queued jobs to type-specific processors (logs, traces, metrics, syslog, Fluent logs); a dispatch sketch follows this list.
  - Telemetry/Services/Queue/TelemetryQueueService.ts: central queue API and job payload types.
  - Per-type Queue wrappers (LogsQueueService, MetricsQueueService, TracesQueueService, FluentLogsQueueService, SyslogQueueService).
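
A minimal sketch of the dispatch pattern, assuming a jobType discriminator on the payload. The queue name, payload shape, and processor method names are illustrative; the worker options (concurrency, lockDuration, maxStalledCount) mirror the ones visible in the diff below.

    import { QueueJob, QueueName } from "Common/Server/Infrastructure/Queue";
    import QueueWorker from "Common/Server/Infrastructure/QueueWorker";
    import {
      TELEMETRY_CONCURRENCY,
      TELEMETRY_LOCK_DURATION_MS,
    } from "../../Config";

    // Illustrative discriminator; the concrete values are defined by the
    // job payload types in TelemetryQueueService.
    type TelemetryJobType = "Logs" | "Traces" | "Metrics" | "Syslog" | "FluentLogs";

    QueueWorker.getWorker(
      QueueName.Telemetry, // assumed queue name
      async (job: QueueJob): Promise<void> => {
        const jobType: TelemetryJobType = job.data["jobType"] as TelemetryJobType;
        switch (jobType) {
          case "Logs":
            // await OtelLogsIngestService.processLogsJob(job.data); // illustrative
            break;
          case "Traces":
          case "Metrics":
          case "Syslog":
          case "FluentLogs":
            // ...dispatch to the matching processor in the same way.
            break;
          default:
            throw new Error(`Unknown telemetry job type: ${jobType}`);
        }
      },
      {
        concurrency: TELEMETRY_CONCURRENCY,
        lockDuration: TELEMETRY_LOCK_DURATION_MS, // batches can exceed BullMQ's 30s default lock
        maxStalledCount: 2, // allow a couple of stall recoveries before failing
      },
    );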

- OpenTelemetry ingestion middleware and proto support
  - Telemetry/Middleware/OtelRequestMiddleware.ts: detects the OTLP endpoint type (logs/traces/metrics), decodes protobuf bodies with protobufjs, and sets the product type; see the decode sketch after this group.
  - Telemetry/ProtoFiles/OTel/v1/*.proto: include common.proto, logs.proto, metrics.proto, resource.proto, traces.proto for OTLP v1 messages.
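
A sketch of the decode step. File names follow Telemetry/ProtoFiles/OTel/v1; the fully qualified message names are assumptions based on the standard OTLP v1 schema, and the middleware's actual routing may differ.

    import protobuf from "protobufjs";

    // Map an OTLP HTTP path to the proto file and message describing its body.
    const otlpTypeByPath: Record<string, { file: string; message: string }> = {
      "/v1/logs": { file: "logs.proto", message: "opentelemetry.proto.logs.v1.LogsData" },
      "/v1/traces": { file: "traces.proto", message: "opentelemetry.proto.trace.v1.TracesData" },
      "/v1/metrics": { file: "metrics.proto", message: "opentelemetry.proto.metrics.v1.MetricsData" },
    };

    export async function decodeOtlpBody(path: string, rawBody: Buffer): Promise<object> {
      const entry = otlpTypeByPath[path];
      if (!entry) {
        throw new Error(`Unsupported OTLP path: ${path}`);
      }
      // protobuf.load resolves imports (common.proto, resource.proto) relative to the file.
      const root: protobuf.Root = await protobuf.load(`./ProtoFiles/OTel/v1/${entry.file}`);
      const MessageType: protobuf.Type = root.lookupType(entry.message);
      return MessageType.toObject(MessageType.decode(rawBody), {
        longs: String, // keep 64-bit nanosecond timestamps as strings
        defaults: true,
      });
    }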

- Ingest services
  - Telemetry/Services/OtelLogsIngestService.ts: parse incoming OTLP logs, map attributes, convert timestamps, and batch insert logs (the batching pattern shared by these services is sketched after this list).
  - Telemetry/Services/OtelTracesIngestService.ts: parse OTLP traces, build span rows, extract exceptions, batch insert spans and exceptions, save telemetry exception summary.
  - Telemetry/Services/OtelMetricsIngestService.ts: parse OTLP metrics, normalize datapoints, batch insert metrics, and index the metric name -> service map.
  - Telemetry/Services/SyslogIngestService.ts: syslog ingestion endpoints, parser integration, map syslog fields to attributes and logs.
  - Telemetry/Services/FluentLogsIngestService.ts: ingest Fluentd style logs, normalize entries and insert into log backend.
  - Telemetry/Services/OtelIngestBaseService.ts: helpers to resolve service name from attributes/headers.
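
These services share one batching pattern, visible in the diff further down: buffer rows, flush whenever the buffer reaches the configured batch size, and force-flush the remainder at the end of a job. Condensed (insertLogBatch stands in for the actual LogService bulk insert):

    import { TELEMETRY_LOG_FLUSH_BATCH_SIZE } from "../Config";
    import { JSONObject } from "Common/Types/JSON";

    // Drain the buffer in fixed-size batches; with force=true, flush whatever remains.
    // splice() both yields the batch and shrinks the buffer, which keeps memory bounded.
    async function flushLogsBuffer(
      logs: Array<JSONObject>,
      force: boolean = false,
    ): Promise<void> {
      while (
        logs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE ||
        (force && logs.length > 0)
      ) {
        const batchSize: number = Math.min(logs.length, TELEMETRY_LOG_FLUSH_BATCH_SIZE);
        const batch: Array<JSONObject> = logs.splice(0, batchSize);
        await insertLogBatch(batch);
      }
    }

    declare function insertLogBatch(batch: Array<JSONObject>): Promise<void>;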

- Syslog parser and utilities
  - Telemetry/Utils/SyslogParser.ts: robust RFC 5424 and RFC 3164 parser with structured-data extraction and sanitization (header split sketched below).
  - Telemetry/Tests/Utils/SyslogParser.test.ts: unit tests for parser behavior.
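
For orientation, an RFC 5424 line reads `<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG`. A minimal header split, assuming a single structured-data element; the real parser also covers RFC 3164, multiple SD elements, and sanitization.

    type SyslogHeader = {
      facility: number;
      severity: number;
      timestamp: string;
      hostname: string;
      appName: string;
      message: string;
    };

    // Split the RFC 5424 header; "-" is the nil value per the RFC.
    // Groups: 1 PRI, 2 timestamp, 3 hostname, 4 app-name, 5 procid, 6 msgid, 7 message.
    function parseRfc5424(line: string): SyslogHeader | null {
      const match: RegExpMatchArray | null = line.match(
        /^<(\d{1,3})>1 (\S+) (\S+) (\S+) (\S+) (\S+) (?:-|\[.*?\]) ?(.*)$/,
      );
      if (!match) {
        return null;
      }
      const pri: number = parseInt(match[1] as string, 10);
      return {
        facility: Math.floor(pri / 8), // PRI = facility * 8 + severity
        severity: pri % 8,
        timestamp: match[2] as string,
        hostname: match[3] as string,
        appName: match[4] as string,
        message: match[7] as string,
      };
    }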

- Telemetry exception utilities
  - Telemetry/Utils/Exception.ts: generate an exception fingerprint and upsert telemetry exception status (saveOrUpdateTelemetryException); a fingerprint sketch follows.
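
A plausible shape for the fingerprint helper. The exact fields hashed by Utils/Exception.ts are not spelled out here; SHA-256 over type, message, and a normalized stack is an assumption.

    import { createHash } from "crypto";

    // Deterministic fingerprint so repeated occurrences of the same exception
    // upsert one telemetry-exception row instead of creating duplicates.
    function generateExceptionFingerprint(
      exceptionType: string,
      message: string,
      stackTrace: string,
    ): string {
      // Strip line/column numbers so recompiled builds keep the same fingerprint.
      const normalizedStack: string = stackTrace.replace(/:\d+:\d+/g, "");
      return createHash("sha256")
        .update(`${exceptionType}\n${message}\n${normalizedStack}`)
        .digest("hex");
    }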

- Queue & job integration
  - New integration with Common/Server/Infrastructure/Queue and QueueWorker, job ID generation, and telemetry job types.
  - Ingest endpoints now enqueue jobs instead of processing payloads synchronously (see the enqueue sketch below).
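
The produce side, sketched. The payload shape and the commented-out addJob call are assumptions; the concrete API lives in TelemetryQueueService.

    import { randomUUID } from "crypto";

    // Illustrative payload shape; the real job payload types are declared in
    // Telemetry/Services/Queue/TelemetryQueueService.ts.
    type TelemetryJobPayload = {
      jobType: "Logs" | "Traces" | "Metrics" | "Syslog" | "FluentLogs";
      projectId: string;
      body: Record<string, unknown>;
    };

    // Stand-in for the central queue API: ingest endpoints enqueue and return
    // immediately; the unified worker picks the job up asynchronously.
    async function enqueueTelemetryJob(payload: TelemetryJobPayload): Promise<string> {
      const jobId: string = randomUUID(); // the service uses its own job-id helper
      // await Queue.addJob(QueueName.Telemetry, jobId, payload.jobType, payload); // assumed signature
      return jobId;
    }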

- Config, build and dev tooling
  - Add Telemetry/package.json, package-lock.json, tsconfig.json, nodemon.json, jest config.
  - New script configs and dependencies (protobufjs, ts-node, jest, nodemon, etc.).

- Docker / environment updates
  - docker-compose.base.yml, docker-compose.dev.yml, docker-compose.yml: rename service from open-telemetry-ingest -> telemetry and wire TELEMETRY_* envs.
  - config.example.env: rename and consolidate environment variables (OPEN_TELEMETRY_* -> TELEMETRY_*, update hostnames and ports).
  - Tests/Scripts/status-check.sh: update ready-check target to telemetry/status/ready.

- Other
  - Telemetry/Services/Queue/*: export helpers and legacy-compatible job interface shims.
  - Memory cleanup and batching safeguards across ingest services.
  - Logging and CaptureSpan instrumentation added to key code paths.

BREAKING CHANGES / MIGRATION NOTES:
- Environment variables and docker service names changed:
  - Replace OPEN_TELEMETRY_... vars with TELEMETRY_... (PORT, HOSTNAME, CONCURRENCY, DISABLE_TELEMETRY, etc.); a fallback shim is sketched after these notes.
  - docker-compose entries moved from "open-telemetry-ingest" to "telemetry" and image name changed to oneuptime/telemetry.
  - Update any deployment automation and monitoring checks referencing the old service name or endpoints.
- Consumers: OTLP endpoints and behavior remain supported, but ingestion is now queued and processed asynchronously.
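
If you need a grace period while rolling this out, a small fallback shim like the following (an assumption for migration convenience, not part of this commit) keeps the old names working; 3403 is the port used throughout the compose files.

    // Prefer the new TELEMETRY_* names; fall back to the removed OPEN_TELEMETRY_* ones.
    function telemetryEnv(newKey: string, legacyKey: string, fallback: string): string {
      return process.env[newKey] ?? process.env[legacyKey] ?? fallback;
    }

    const telemetryPort: string = telemetryEnv("TELEMETRY_PORT", "OPEN_TELEMETRY_INGEST_PORT", "3403");
    const telemetryHost: string = telemetryEnv("SERVER_TELEMETRY_HOSTNAME", "SERVER_OPEN_TELEMETRY_INGEST_HOSTNAME", "localhost");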

Testing / Running:
- Install deps in Telemetry/ (npm install) after syncing the Common workspace.
- Run dev: npx nodemon (configured via nodemon.json), or build and start using the provided scripts.
- Run tests with jest; the Telemetry test suite includes SyslogParser unit tests (example shape below).
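
As a shape reference only (SyslogParser's exported API is not spelled out in this message, so the import path and method names below are assumptions), a test could look like:

    import SyslogParser from "../../Utils/SyslogParser"; // assumed default export

    describe("SyslogParser", () => {
      test("parses an RFC 5424 message", () => {
        // Message adapted from RFC 5424, section 6.5.
        const parsed = SyslogParser.parse(
          "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed for lonvick",
        );
        expect(parsed?.hostname).toBe("mymachine.example.com");
        expect(parsed?.severity).toBe(2); // PRI 34 => facility 4, severity 2
      });
    });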

Files added/modified (high level):
- Added many files under Telemetry/: Index, Jobs, Middleware, ProtoFiles, Services, Utils, Tests, package and config artifacts.
- Modified docker-compose.*, config.example.env, and the status-check script to use the new telemetry service and TELEMETRY_* vars.
Author: Nawaz Dhandala, 2025-11-07 21:36:47 +00:00
Commit: f49b1995df (parent: 351fc4828b)
63 changed files with 204 additions and 209 deletions

@ -382,7 +382,7 @@ jobs:
max_attempts: 3
command: sudo docker build --no-cache -f ./ServerMonitorIngest/Dockerfile .
docker-build-open-telemetry-ingest:
docker-build-telemetry:
runs-on: ubuntu-latest
env:
CI_PIPELINE_ID: ${{github.run_number}}
@ -403,7 +403,7 @@ jobs:
with:
timeout_minutes: 45
max_attempts: 3
command: sudo docker build --no-cache -f ./OpenTelemetryIngest/Dockerfile .
command: sudo docker build --no-cache -f ./Telemetry/Dockerfile .
docker-build-incoming-request-ingest:
runs-on: ubuntu-latest

@ -319,7 +319,7 @@ jobs:
max_attempts: 3
command: cd ServerMonitorIngest && npm install && npm run compile && npm run dep-check
compile-open-telemetry-ingest:
compile-telemetry:
runs-on: ubuntu-latest
env:
CI_PIPELINE_ID: ${{github.run_number}}
@ -329,12 +329,12 @@ jobs:
with:
node-version: latest
- run: cd Common && npm install
- name: Compile Open Telemetry Ingest
- name: Compile Telemetry
uses: nick-fields/retry@v3
with:
timeout_minutes: 30
max_attempts: 3
command: cd OpenTelemetryIngest && npm install && npm run compile && npm run dep-check
command: cd Telemetry && npm install && npm run compile && npm run dep-check
compile-incoming-request-ingest:

@ -1108,7 +1108,7 @@ jobs:
open-telemetry-ingest-docker-image-deploy:
telemetry-docker-image-deploy:
needs: [generate-build-number, read-version]
runs-on: ubuntu-latest
steps:
@ -1117,8 +1117,8 @@ jobs:
uses: docker/metadata-action@v4
with:
images: |
oneuptime/open-telemetry-ingest
ghcr.io/oneuptime/open-telemetry-ingest
oneuptime/telemetry
ghcr.io/oneuptime/telemetry
tags: |
type=raw,value=release,enable=true
type=semver,value=${{needs.read-version.outputs.major_minor}}.${{needs.generate-build-number.outputs.build_number}},pattern={{version}},enable=true
@ -1144,7 +1144,7 @@ jobs:
max_attempts: 3
command: npm run prerun
# Build and deploy open-telemetry-ingest.
# Build and deploy telemetry.
- name: Login to Docker Hub
uses: nick-fields/retry@v3
@ -1171,22 +1171,22 @@ jobs:
VERSION="${{needs.read-version.outputs.major_minor}}.${{needs.generate-build-number.outputs.build_number}}"
docker buildx build \
--no-cache \
--file ./OpenTelemetryIngest/Dockerfile \
--file ./Telemetry/Dockerfile \
--platform linux/amd64,linux/arm64 \
--push \
--tag oneuptime/open-telemetry-ingest:${VERSION} \
--tag ghcr.io/oneuptime/open-telemetry-ingest:${VERSION} \
--tag oneuptime/telemetry:${VERSION} \
--tag ghcr.io/oneuptime/telemetry:${VERSION} \
--build-arg GIT_SHA=${{ github.sha }} \
--build-arg APP_VERSION=${VERSION} \
--build-arg IS_ENTERPRISE_EDITION=false \
.
docker buildx build \
--no-cache \
--file ./OpenTelemetryIngest/Dockerfile \
--file ./Telemetry/Dockerfile \
--platform linux/amd64,linux/arm64 \
--push \
--tag oneuptime/open-telemetry-ingest:enterprise-${VERSION} \
--tag ghcr.io/oneuptime/open-telemetry-ingest:enterprise-${VERSION} \
--tag oneuptime/telemetry:enterprise-${VERSION} \
--tag ghcr.io/oneuptime/telemetry:enterprise-${VERSION} \
--build-arg GIT_SHA=${{ github.sha }} \
--build-arg APP_VERSION=${VERSION} \
--build-arg IS_ENTERPRISE_EDITION=true \
@ -2340,7 +2340,7 @@ jobs:
- test-docker-image-deploy
- probe-ingest-docker-image-deploy
- server-monitor-ingest-docker-image-deploy
- open-telemetry-ingest-docker-image-deploy
- telemetry-docker-image-deploy
- incoming-request-ingest-docker-image-deploy
- probe-docker-image-deploy
- admin-dashboard-docker-image-deploy
@ -2371,7 +2371,7 @@ jobs:
"test",
"probe-ingest",
"server-monitor-ingest",
"open-telemetry-ingest",
"telemetry",
"incoming-request-ingest",
"probe",
"admin-dashboard",
@ -2438,7 +2438,7 @@ jobs:
test-e2e-release-saas:
runs-on: ubuntu-latest
needs: [open-telemetry-ingest-docker-image-deploy, publish-mcp-server, copilot-docker-image-deploy, docs-docker-image-deploy, api-reference-docker-image-deploy, workflow-docker-image-deploy, llm-docker-image-deploy, accounts-docker-image-deploy, admin-dashboard-docker-image-deploy, app-docker-image-deploy, dashboard-docker-image-deploy, probe-ingest-docker-image-deploy, server-monitor-ingest-docker-image-deploy, isolated-vm-docker-image-deploy, home-docker-image-deploy, worker-docker-image-deploy, otel-collector-docker-image-deploy, probe-docker-image-deploy, status-page-docker-image-deploy, test-docker-image-deploy, test-server-docker-image-deploy, publish-npm-packages, e2e-docker-image-deploy, helm-chart-deploy, generate-build-number, read-version, nginx-docker-image-deploy, incoming-request-ingest-docker-image-deploy]
needs: [telemetry-docker-image-deploy, publish-mcp-server, copilot-docker-image-deploy, docs-docker-image-deploy, api-reference-docker-image-deploy, workflow-docker-image-deploy, llm-docker-image-deploy, accounts-docker-image-deploy, admin-dashboard-docker-image-deploy, app-docker-image-deploy, dashboard-docker-image-deploy, probe-ingest-docker-image-deploy, server-monitor-ingest-docker-image-deploy, isolated-vm-docker-image-deploy, home-docker-image-deploy, worker-docker-image-deploy, otel-collector-docker-image-deploy, probe-docker-image-deploy, status-page-docker-image-deploy, test-docker-image-deploy, test-server-docker-image-deploy, publish-npm-packages, e2e-docker-image-deploy, helm-chart-deploy, generate-build-number, read-version, nginx-docker-image-deploy, incoming-request-ingest-docker-image-deploy]
env:
CI_PIPELINE_ID: ${{github.run_number}}
steps:
@ -2525,7 +2525,7 @@ jobs:
test-e2e-release-self-hosted:
runs-on: ubuntu-latest
# After all the jobs runs
needs: [open-telemetry-ingest-docker-image-deploy, publish-mcp-server, copilot-docker-image-deploy, incoming-request-ingest-docker-image-deploy, docs-docker-image-deploy, api-reference-docker-image-deploy, workflow-docker-image-deploy, llm-docker-image-deploy, accounts-docker-image-deploy, admin-dashboard-docker-image-deploy, app-docker-image-deploy, dashboard-docker-image-deploy, probe-ingest-docker-image-deploy, server-monitor-ingest-docker-image-deploy, isolated-vm-docker-image-deploy, home-docker-image-deploy, worker-docker-image-deploy, otel-collector-docker-image-deploy, probe-docker-image-deploy, status-page-docker-image-deploy, test-docker-image-deploy, test-server-docker-image-deploy, publish-npm-packages, e2e-docker-image-deploy, helm-chart-deploy, generate-build-number, read-version, nginx-docker-image-deploy]
needs: [telemetry-docker-image-deploy, publish-mcp-server, copilot-docker-image-deploy, incoming-request-ingest-docker-image-deploy, docs-docker-image-deploy, api-reference-docker-image-deploy, workflow-docker-image-deploy, llm-docker-image-deploy, accounts-docker-image-deploy, admin-dashboard-docker-image-deploy, app-docker-image-deploy, dashboard-docker-image-deploy, probe-ingest-docker-image-deploy, server-monitor-ingest-docker-image-deploy, isolated-vm-docker-image-deploy, home-docker-image-deploy, worker-docker-image-deploy, otel-collector-docker-image-deploy, probe-docker-image-deploy, status-page-docker-image-deploy, test-docker-image-deploy, test-server-docker-image-deploy, publish-npm-packages, e2e-docker-image-deploy, helm-chart-deploy, generate-build-number, read-version, nginx-docker-image-deploy]
env:
CI_PIPELINE_ID: ${{github.run_number}}
steps:

@ -1245,7 +1245,7 @@ on:
--build-arg IS_ENTERPRISE_EDITION=true \
.
open-telemetry-ingest-docker-image-deploy:
telemetry-docker-image-deploy:
needs: [read-version, generate-build-number]
runs-on: ubuntu-latest
steps:
@ -1254,8 +1254,8 @@ on:
uses: docker/metadata-action@v4
with:
images: |
oneuptime/open-telemetry-ingest
ghcr.io/oneuptime/open-telemetry-ingest
oneuptime/telemetry
ghcr.io/oneuptime/telemetry
tags: |
type=raw,value=test,enable=true
type=raw,value=${{needs.read-version.outputs.major_minor}}.${{needs.generate-build-number.outputs.build_number}}-test,enable=true
@ -1309,26 +1309,26 @@ on:
VERSION="${{needs.read-version.outputs.major_minor}}.${{needs.generate-build-number.outputs.build_number}}-test"
docker buildx build \
--no-cache \
--file ./OpenTelemetryIngest/Dockerfile \
--file ./Telemetry/Dockerfile \
--platform linux/amd64,linux/arm64 \
--push \
--tag oneuptime/open-telemetry-ingest:test \
--tag oneuptime/open-telemetry-ingest:${VERSION} \
--tag ghcr.io/oneuptime/open-telemetry-ingest:test \
--tag ghcr.io/oneuptime/open-telemetry-ingest:${VERSION} \
--tag oneuptime/telemetry:test \
--tag oneuptime/telemetry:${VERSION} \
--tag ghcr.io/oneuptime/telemetry:test \
--tag ghcr.io/oneuptime/telemetry:${VERSION} \
--build-arg GIT_SHA=${{ github.sha }} \
--build-arg APP_VERSION=${VERSION} \
--build-arg IS_ENTERPRISE_EDITION=false \
.
docker buildx build \
--no-cache \
--file ./OpenTelemetryIngest/Dockerfile \
--file ./Telemetry/Dockerfile \
--platform linux/amd64,linux/arm64 \
--push \
--tag oneuptime/open-telemetry-ingest:enterprise-test \
--tag oneuptime/open-telemetry-ingest:enterprise-${VERSION} \
--tag ghcr.io/oneuptime/open-telemetry-ingest:enterprise-test \
--tag ghcr.io/oneuptime/open-telemetry-ingest:enterprise-${VERSION} \
--tag oneuptime/telemetry:enterprise-test \
--tag oneuptime/telemetry:enterprise-${VERSION} \
--tag ghcr.io/oneuptime/telemetry:enterprise-test \
--tag ghcr.io/oneuptime/telemetry:enterprise-${VERSION} \
--build-arg GIT_SHA=${{ github.sha }} \
--build-arg APP_VERSION=${VERSION} \
--build-arg IS_ENTERPRISE_EDITION=true \
@ -2236,7 +2236,7 @@ on:
test-helm-chart:
runs-on: ubuntu-latest
needs: [infrastructure-agent-deploy, publish-mcp-server, llm-docker-image-deploy, publish-terraform-provider, open-telemetry-ingest-docker-image-deploy, copilot-docker-image-deploy, docs-docker-image-deploy, worker-docker-image-deploy, workflow-docker-image-deploy, isolated-vm-docker-image-deploy, home-docker-image-deploy, api-reference-docker-image-deploy, test-server-docker-image-deploy, test-docker-image-deploy, probe-ingest-docker-image-deploy, server-monitor-ingest-docker-image-deploy, probe-docker-image-deploy, dashboard-docker-image-deploy, admin-dashboard-docker-image-deploy, app-docker-image-deploy, accounts-docker-image-deploy, otel-collector-docker-image-deploy, status-page-docker-image-deploy, nginx-docker-image-deploy, e2e-docker-image-deploy, incoming-request-ingest-docker-image-deploy]
needs: [infrastructure-agent-deploy, publish-mcp-server, llm-docker-image-deploy, publish-terraform-provider, telemetry-docker-image-deploy, copilot-docker-image-deploy, docs-docker-image-deploy, worker-docker-image-deploy, workflow-docker-image-deploy, isolated-vm-docker-image-deploy, home-docker-image-deploy, api-reference-docker-image-deploy, test-server-docker-image-deploy, test-docker-image-deploy, probe-ingest-docker-image-deploy, server-monitor-ingest-docker-image-deploy, probe-docker-image-deploy, dashboard-docker-image-deploy, admin-dashboard-docker-image-deploy, app-docker-image-deploy, accounts-docker-image-deploy, otel-collector-docker-image-deploy, status-page-docker-image-deploy, nginx-docker-image-deploy, e2e-docker-image-deploy, incoming-request-ingest-docker-image-deploy]
env:
CI_PIPELINE_ID: ${{github.run_number}}
steps:

@ -1,4 +1,4 @@
name: OpenTelemetryIngest Test
name: Telemetry Test
on:
pull_request:
@ -17,5 +17,5 @@ jobs:
- uses: actions/setup-node@v4
with:
node-version: latest
- run: cd OpenTelemetryIngest && npm install && npm run test
- run: cd Telemetry && npm install && npm run test

.vscode/launch.json
@ -205,8 +205,8 @@
},
{
"address": "127.0.0.1",
"localRoot": "${workspaceFolder}/OpenTelemetryIngest",
"name": "OpenTelemetryIngest: Debug with Docker",
"localRoot": "${workspaceFolder}/Telemetry",
"name": "Telemetry: Debug with Docker",
"port": 9938,
"remoteRoot": "/usr/src/app",
"request": "attach",

@ -165,8 +165,8 @@ export const ProbeIngestHostname: Hostname = Hostname.fromString(
);
export const OpenTelemetryIngestHostname: Hostname = Hostname.fromString(
`${process.env["SERVER_OPEN_TELEMETRY_INGEST_HOSTNAME"] || "localhost"}:${
process.env["OPEN_TELEMETRY_INGEST_PORT"] || 80
`${process.env["SERVER_TELEMETRY_HOSTNAME"] || "localhost"}:${
process.env["TELEMETRY_PORT"] || 80
}`,
);

@ -28,9 +28,7 @@ export const AdminDashboardRoute: Route = new Route("/admin");
export const ProbeIngestRoute: Route = new Route("/probe-ingest");
export const OpenTelemetryIngestRoute: Route = new Route(
"/open-telemetry-ingest",
);
export const TelemetryRoute: Route = new Route("/telemetry");
export const IncomingRequestIngestRoute: Route = new Route(
"/incoming-request-ingest",

@ -16,7 +16,7 @@ import {
StatusPageRoute,
WorkflowRoute,
IncomingRequestIngestRoute,
OpenTelemetryIngestRoute,
TelemetryRoute,
} from "../ServiceRoute";
import Hostname from "../Types/API/Hostname";
import Protocol from "../Types/API/Protocol";
@ -75,7 +75,7 @@ export const WORKFLOW_HOSTNAME: Hostname = Hostname.fromString(HOST);
export const PROBE_INGEST_HOSTNAME: Hostname = Hostname.fromString(HOST);
export const OPEN_TELEMETRY_INGEST_HOSTNAME: Hostname =
export const TELEMETRY_HOSTNAME: Hostname =
Hostname.fromString(HOST);
export const INCOMING_REQUEST_INGEST_HOSTNAME: Hostname =
@ -115,10 +115,10 @@ export const STATUS_PAGE_API_URL: URL = new URL(
new Route(StatusPageApiRoute.toString()),
);
export const OPEN_TELEMETRY_INGEST_URL: URL = new URL(
export const TELEMETRY_URL: URL = new URL(
HTTP_PROTOCOL,
OPEN_TELEMETRY_INGEST_HOSTNAME,
new Route(OpenTelemetryIngestRoute.toString()),
TELEMETRY_HOSTNAME,
new Route(TelemetryRoute.toString()),
);
export const IDENTITY_URL: URL = new URL(

@ -2,8 +2,8 @@ import { BASE_URL } from "../../Config";
import { Page, expect, test } from "@playwright/test";
import URL from "Common/Types/API/URL";
test.describe("check live and health check of the open-telemetry-ingest", () => {
test("check if open-telemetry-ingest status is ok", async ({
test.describe("check live and health check of telemetry", () => {
test("check if telemetry status is ok", async ({
page,
}: {
page: Page;
@ -11,14 +11,14 @@ test.describe("check live and health check of the open-telemetry-ingest", () =>
page.setDefaultNavigationTimeout(120000); // 2 minutes
await page.goto(
`${URL.fromString(BASE_URL.toString())
.addRoute("/open-telemetry-ingest/status")
.addRoute("/telemetry/status")
.toString()}`,
);
const content: string = await page.content();
expect(content).toContain('{"status":"ok"}');
});
test("check if open-telemetry-ingest is ready", async ({
test("check if telemetry is ready", async ({
page,
}: {
page: Page;
@ -26,14 +26,14 @@ test.describe("check live and health check of the open-telemetry-ingest", () =>
page.setDefaultNavigationTimeout(120000); // 2 minutes
await page.goto(
`${URL.fromString(BASE_URL.toString())
.addRoute("/open-telemetry-ingest/status/ready")
.addRoute("/telemetry/status/ready")
.toString()}`,
);
const content: string = await page.content();
expect(content).toContain('{"status":"ok"}');
});
test("check if open-telemetry-ingest is live", async ({
test("check if telemetry is live", async ({
page,
}: {
page: Page;
@ -41,7 +41,7 @@ test.describe("check live and health check of the open-telemetry-ingest", () =>
page.setDefaultNavigationTimeout(120000); // 2 minutes
await page.goto(
`${URL.fromString(BASE_URL.toString())
.addRoute("/open-telemetry-ingest/status/live")
.addRoute("/telemetry/status/live")
.toString()}`,
);
const content: string = await page.content();

@ -105,12 +105,12 @@ Usage:
value: {{ $.Release.Name }}-app.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: SERVER_PROBE_INGEST_HOSTNAME
value: {{ $.Release.Name }}-probe-ingest.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: OPEN_TELEMETRY_INGEST_HOSTNAME
value: {{ $.Release.Name }}-open-telemetry-ingest.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: TELEMETRY_HOSTNAME
value: {{ $.Release.Name }}-telemetry.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: SERVER_INCOMING_REQUEST_INGEST_HOSTNAME
value: {{ $.Release.Name }}-incoming-request-ingest.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: SERVER_OPEN_TELEMETRY_INGEST_HOSTNAME
value: {{ $.Release.Name }}-open-telemetry-ingest.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: SERVER_TELEMETRY_HOSTNAME
value: {{ $.Release.Name }}-telemetry.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: SERVER_TEST_SERVER_HOSTNAME
value: {{ $.Release.Name }}-test-server.{{ $.Release.Namespace }}.svc.{{ $.Values.global.clusterDomain }}
- name: SERVER_OTEL_COLLECTOR_HOSTNAME
@ -130,8 +130,8 @@ Usage:
value: {{ $.Values.probeIngest.ports.http | squote }}
- name: SERVER_MONITOR_INGEST_PORT
value: {{ $.Values.serverMonitorIngest.ports.http | squote }}
- name: OPEN_TELEMETRY_INGEST_PORT
value: {{ $.Values.openTelemetryIngest.ports.http | squote }}
- name: TELEMETRY_PORT
value: {{ $.Values.telemetry.ports.http | squote }}
- name: INCOMING_REQUEST_INGEST_PORT
value: {{ $.Values.incomingRequestIngest.ports.http | squote }}
- name: TEST_SERVER_PORT

@ -2,11 +2,11 @@
KEDA ScaledObjects for various services
*/}}
{{/* OpenTelemetry Ingest KEDA ScaledObject */}}
{{- if and .Values.keda.enabled .Values.openTelemetryIngest.keda.enabled (not .Values.openTelemetryIngest.disableAutoscaler) }}
{{- $metricsConfig := dict "enabled" .Values.openTelemetryIngest.keda.enabled "minReplicas" .Values.openTelemetryIngest.keda.minReplicas "maxReplicas" .Values.openTelemetryIngest.keda.maxReplicas "pollingInterval" .Values.openTelemetryIngest.keda.pollingInterval "cooldownPeriod" .Values.openTelemetryIngest.keda.cooldownPeriod "triggers" (list (dict "query" "oneuptime_telemetry_queue_size" "threshold" .Values.openTelemetryIngest.keda.queueSizeThreshold "port" .Values.openTelemetryIngest.ports.http)) }}
{{- $openTelemetryIngestKedaArgs := dict "ServiceName" "open-telemetry-ingest" "Release" .Release "Values" .Values "MetricsConfig" $metricsConfig "DisableAutoscaler" .Values.openTelemetryIngest.disableAutoscaler }}
{{- include "oneuptime.kedaScaledObject" $openTelemetryIngestKedaArgs }}
{{/* Telemetry KEDA ScaledObject */}}
{{- if and .Values.keda.enabled .Values.telemetry.keda.enabled (not .Values.telemetry.disableAutoscaler) }}
{{- $metricsConfig := dict "enabled" .Values.telemetry.keda.enabled "minReplicas" .Values.telemetry.keda.minReplicas "maxReplicas" .Values.telemetry.keda.maxReplicas "pollingInterval" .Values.telemetry.keda.pollingInterval "cooldownPeriod" .Values.telemetry.keda.cooldownPeriod "triggers" (list (dict "query" "oneuptime_telemetry_queue_size" "threshold" .Values.telemetry.keda.queueSizeThreshold "port" .Values.telemetry.ports.http)) }}
{{- $telemetryKedaArgs := dict "ServiceName" "telemetry" "Release" .Release "Values" .Values "MetricsConfig" $metricsConfig "DisableAutoscaler" .Values.telemetry.disableAutoscaler }}
{{- include "oneuptime.kedaScaledObject" $telemetryKedaArgs }}
{{- end }}
{{/* Incoming Request Ingest KEDA ScaledObject */}}

@ -1,12 +1,12 @@
# OneUptime open-telemetry-ingest Deployment
# OneUptime telemetry Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ printf "%s-%s" $.Release.Name "open-telemetry-ingest" }}
name: {{ printf "%s-%s" $.Release.Name "telemetry" }}
namespace: {{ $.Release.Namespace }}
labels:
app: {{ printf "%s-%s" $.Release.Name "open-telemetry-ingest" }}
app: {{ printf "%s-%s" $.Release.Name "telemetry" }}
app.kubernetes.io/part-of: oneuptime
app.kubernetes.io/managed-by: Helm
appname: oneuptime
@ -16,11 +16,11 @@ metadata:
spec:
selector:
matchLabels:
app: {{ printf "%s-%s" $.Release.Name "open-telemetry-ingest" }}
{{- if $.Values.openTelemetryIngest.replicaCount }}
replicas: {{ $.Values.openTelemetryIngest.replicaCount }}
app: {{ printf "%s-%s" $.Release.Name "telemetry" }}
{{- if $.Values.telemetry.replicaCount }}
replicas: {{ $.Values.telemetry.replicaCount }}
{{- else }}
{{- if or (not $.Values.autoscaling.enabled) ($.Values.openTelemetryIngest.disableAutoscaler) }}
{{- if or (not $.Values.autoscaling.enabled) ($.Values.telemetry.disableAutoscaler) }}
replicas: {{ $.Values.deployment.replicaCount }}
{{- end }}
{{- end }}
@ -28,7 +28,7 @@ spec:
template:
metadata:
labels:
app: {{ printf "%s-%s" $.Release.Name "open-telemetry-ingest" }}
app: {{ printf "%s-%s" $.Release.Name "telemetry" }}
{{- if $.Values.deployment.includeTimestampLabel }}
date: "{{ now | unixEpoch }}"
{{- end }}
@ -38,9 +38,9 @@ spec:
- name: greenlockrc
emptyDir:
sizeLimit: "1Gi"
{{- if $.Values.openTelemetryIngest.podSecurityContext }}
{{- if $.Values.telemetry.podSecurityContext }}
securityContext:
{{- toYaml $.Values.openTelemetryIngest.podSecurityContext | nindent 8 }}
{{- toYaml $.Values.telemetry.podSecurityContext | nindent 8 }}
{{- else if $.Values.podSecurityContext }}
securityContext:
{{- toYaml $.Values.podSecurityContext | nindent 8 }}
@ -55,22 +55,22 @@ spec:
{{- if $.Values.tolerations }}
tolerations: {{- $.Values.tolerations | toYaml | nindent 8 }}
{{- end }}
{{- if $.Values.openTelemetryIngest.nodeSelector }}
{{- if $.Values.telemetry.nodeSelector }}
nodeSelector:
{{- toYaml $.Values.openTelemetryIngest.nodeSelector | nindent 8 }}
{{- toYaml $.Values.telemetry.nodeSelector | nindent 8 }}
{{- else if $.Values.nodeSelector }}
nodeSelector:
{{- toYaml $.Values.nodeSelector | nindent 8 }}
{{- end }}
containers:
- image: {{ include "oneuptime.image" (dict "Values" $.Values "ServiceName" "open-telemetry-ingest") }}
name: {{ printf "%s-%s" $.Release.Name "open-telemetry-ingest" }}
- image: {{ include "oneuptime.image" (dict "Values" $.Values "ServiceName" "telemetry") }}
name: {{ printf "%s-%s" $.Release.Name "telemetry" }}
{{- if $.Values.startupProbe.enabled }}
# Startup probe
startupProbe:
httpGet:
path: /status/live
port: {{ $.Values.openTelemetryIngest.ports.http }}
port: {{ $.Values.telemetry.ports.http }}
periodSeconds: {{ $.Values.startupProbe.periodSeconds }}
failureThreshold: {{ $.Values.startupProbe.failureThreshold }}
{{- end }}
@ -79,7 +79,7 @@ spec:
livenessProbe:
httpGet:
path: /status/live
port: {{ $.Values.openTelemetryIngest.ports.http }}
port: {{ $.Values.telemetry.ports.http }}
periodSeconds: {{ $.Values.livenessProbe.periodSeconds }}
timeoutSeconds: {{ $.Values.livenessProbe.timeoutSeconds }}
initialDelaySeconds: {{ $.Values.livenessProbe.initialDelaySeconds }}
@ -89,14 +89,14 @@ spec:
readinessProbe:
httpGet:
path: /status/ready
port: {{ $.Values.openTelemetryIngest.ports.http }}
port: {{ $.Values.telemetry.ports.http }}
periodSeconds: {{ $.Values.readinessProbe.periodSeconds }}
initialDelaySeconds: {{ $.Values.readinessProbe.initialDelaySeconds }}
timeoutSeconds: {{ $.Values.readinessProbe.timeoutSeconds }}
{{- end }}
{{- if $.Values.openTelemetryIngest.containerSecurityContext }}
{{- if $.Values.telemetry.containerSecurityContext }}
securityContext:
{{- toYaml $.Values.openTelemetryIngest.containerSecurityContext | nindent 12 }}
{{- toYaml $.Values.telemetry.containerSecurityContext | nindent 12 }}
{{- else if $.Values.containerSecurityContext }}
securityContext:
{{- toYaml $.Values.containerSecurityContext | nindent 12 }}
@ -106,32 +106,31 @@ spec:
{{- include "oneuptime.env.common" . | nindent 12 }}
{{- include "oneuptime.env.runtime" (dict "Values" $.Values "Release" $.Release) | nindent 12 }}
- name: PORT
value: {{ $.Values.openTelemetryIngest.ports.http | quote }}
value: {{ $.Values.telemetry.ports.http | quote }}
- name: DISABLE_TELEMETRY
value: {{ $.Values.openTelemetryIngest.disableTelemetryCollection | quote }}
- name: OPEN_TELEMETRY_INGEST_CONCURRENCY
value: {{ $.Values.openTelemetryIngest.concurrency | squote }}
value: {{ $.Values.telemetry.disableTelemetryCollection | quote }}
- name: TELEMETRY_CONCURRENCY
value: {{ $.Values.telemetry.concurrency | squote }}
ports:
- containerPort: {{ $.Values.openTelemetryIngest.ports.http }}
- containerPort: {{ $.Values.telemetry.ports.http }}
protocol: TCP
name: http
{{- if $.Values.openTelemetryIngest.resources }}
{{- if $.Values.telemetry.resources }}
resources:
{{- toYaml $.Values.openTelemetryIngest.resources | nindent 12 }}
{{- toYaml $.Values.telemetry.resources | nindent 12 }}
{{- end }}
restartPolicy: {{ $.Values.image.restartPolicy }}
---
# OneUptime open-telemetry-ingest Service
{{- $openTelemetryIngestPorts := dict "port" $.Values.openTelemetryIngest.ports.http -}}
{{- $openTelemetryIngestServiceArgs := dict "ServiceName" "open-telemetry-ingest" "Ports" $openTelemetryIngestPorts "Release" $.Release "Values" $.Values -}}
{{- include "oneuptime.service" $openTelemetryIngestServiceArgs }}
# OneUptime telemetry Service
{{- $telemetryPorts := dict "port" $.Values.telemetry.ports.http -}}
{{- $telemetryServiceArgs := dict "ServiceName" "telemetry" "Ports" $telemetryPorts "Release" $.Release "Values" $.Values -}}
{{- include "oneuptime.service" $telemetryServiceArgs }}
---
# OneUptime open-telemetry-ingest autoscaler
{{- if and (not $.Values.openTelemetryIngest.disableAutoscaler) (not (and $.Values.keda.enabled $.Values.openTelemetryIngest.keda.enabled)) }}
{{- $openTelemetryIngestAutoScalerArgs := dict "ServiceName" "open-telemetry-ingest" "Release" $.Release "Values" $.Values -}}
{{- include "oneuptime.autoscaler" $openTelemetryIngestAutoScalerArgs }}
{{- end }}
---
# OneUptime telemetry autoscaler
{{- if and (not $.Values.telemetry.disableAutoscaler) (not (and $.Values.keda.enabled $.Values.telemetry.keda.enabled)) }}
{{- $telemetryAutoScalerArgs := dict "ServiceName" "telemetry" "Release" $.Release "Values" $.Values -}}
{{- include "oneuptime.autoscaler" $telemetryAutoScalerArgs }}
{{- end }}

@ -1735,7 +1735,7 @@
},
"additionalProperties": false
},
"openTelemetryIngest": {
"telemetry": {
"type": "object",
"properties": {
"replicaCount": {

@ -697,7 +697,7 @@ probeIngest:
# Cooldown period after scaling (in seconds)
cooldownPeriod: 300
openTelemetryIngest:
telemetry:
replicaCount: 1
disableTelemetryCollection: false
disableAutoscaler: false

@ -34,7 +34,7 @@ receivers:
exporters:
otlphttp:
endpoint: "http://{{ .Env.SERVER_OPEN_TELEMETRY_INGEST_HOSTNAME }}:{{ .Env.OPEN_TELEMETRY_INGEST_PORT }}/otlp"
endpoint: "http://{{ .Env.SERVER_TELEMETRY_HOSTNAME }}:{{ .Env.TELEMETRY_PORT }}/otlp"
headers: {"Content-Type": "application/json"}
auth:
authenticator: headers_setter

@ -1,50 +0,0 @@
let concurrency: string | number =
process.env["OPEN_TELEMETRY_INGEST_CONCURRENCY"] || 100;
if (typeof concurrency === "string") {
const parsed: number = parseInt(concurrency, 10);
concurrency = !isNaN(parsed) && parsed > 0 ? parsed : 100;
}
export const OPEN_TELEMETRY_INGEST_CONCURRENCY: number = concurrency as number;
type ParseBatchSizeFunction = (envKey: string, defaultValue: number) => number;
const parseBatchSize: ParseBatchSizeFunction = (
envKey: string,
defaultValue: number,
): number => {
const value: string | undefined = process.env[envKey];
if (!value) {
return defaultValue;
}
const parsed: number = parseInt(value, 10);
if (isNaN(parsed) || parsed <= 0) {
return defaultValue;
}
return parsed;
};
export const OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE: number =
parseBatchSize("OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE", 1000);
export const OPEN_TELEMETRY_INGEST_METRIC_FLUSH_BATCH_SIZE: number =
parseBatchSize("OPEN_TELEMETRY_INGEST_METRIC_FLUSH_BATCH_SIZE", 750);
export const OPEN_TELEMETRY_INGEST_TRACE_FLUSH_BATCH_SIZE: number =
parseBatchSize("OPEN_TELEMETRY_INGEST_TRACE_FLUSH_BATCH_SIZE", 750);
export const OPEN_TELEMETRY_INGEST_EXCEPTION_FLUSH_BATCH_SIZE: number =
parseBatchSize("OPEN_TELEMETRY_INGEST_EXCEPTION_FLUSH_BATCH_SIZE", 500);
/*
* Some telemetry batches can be large and take >30s (BullMQ default lock) to process.
* Allow configuring a longer lock duration (in ms) to avoid premature stall detection.
*/
// 10 minutes.
export const OPEN_TELEMETRY_INGEST_LOCK_DURATION_MS: number = 10 * 60 * 1000;

Telemetry/Config.ts (new file)
@ -0,0 +1,49 @@
let concurrency: string | number = process.env["TELEMETRY_CONCURRENCY"] || 100;
if (typeof concurrency === "string") {
const parsed: number = parseInt(concurrency, 10);
concurrency = !isNaN(parsed) && parsed > 0 ? parsed : 100;
}
export const TELEMETRY_CONCURRENCY: number = concurrency as number;
type ParseBatchSizeFunction = (envKey: string, defaultValue: number) => number;
const parseBatchSize: ParseBatchSizeFunction = (
envKey: string,
defaultValue: number,
): number => {
const value: string | undefined = process.env[envKey];
if (!value) {
return defaultValue;
}
const parsed: number = parseInt(value, 10);
if (isNaN(parsed) || parsed <= 0) {
return defaultValue;
}
return parsed;
};
export const TELEMETRY_LOG_FLUSH_BATCH_SIZE: number =
parseBatchSize("TELEMETRY_LOG_FLUSH_BATCH_SIZE", 1000);
export const TELEMETRY_METRIC_FLUSH_BATCH_SIZE: number =
parseBatchSize("TELEMETRY_METRIC_FLUSH_BATCH_SIZE", 750);
export const TELEMETRY_TRACE_FLUSH_BATCH_SIZE: number =
parseBatchSize("TELEMETRY_TRACE_FLUSH_BATCH_SIZE", 750);
export const TELEMETRY_EXCEPTION_FLUSH_BATCH_SIZE: number =
parseBatchSize("TELEMETRY_EXCEPTION_FLUSH_BATCH_SIZE", 500);
/*
* Some telemetry batches can be large and take >30s (BullMQ default lock) to process.
* Allow configuring a longer lock duration (in ms) to avoid premature stall detection.
*/
// 10 minutes.
export const TELEMETRY_LOCK_DURATION_MS: number = 10 * 60 * 1000;

@ -1,5 +1,5 @@
#
# OneUptime-OpenTelemetryIngest Dockerfile
# OneUptime-Telemetry Dockerfile
#
# Pull base image nodejs image.
@ -65,11 +65,11 @@ WORKDIR /usr/src/app
RUN PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=0 npx playwright install --with-deps
# Install app dependencies
COPY ./OpenTelemetryIngest/package*.json /usr/src/app/
COPY ./Telemetry/package*.json /usr/src/app/
RUN npm install
# Expose ports.
# - 3403: OneUptime-OpenTelemetryIngest
# - 3403: OneUptime-Telemetry
EXPOSE 3403
{{ if eq .Env.ENVIRONMENT "development" }}
@ -77,7 +77,7 @@ EXPOSE 3403
CMD [ "npm", "run", "dev" ]
{{ else }}
# Copy app source
COPY ./OpenTelemetryIngest /usr/src/app
COPY ./Telemetry /usr/src/app
# Bundle app source
RUN npm run compile
# Set permission to write logs and cache in case container run as non root

@ -13,13 +13,13 @@ import Realtime from "Common/Server/Utils/Realtime";
import App from "Common/Server/Utils/StartServer";
import Telemetry from "Common/Server/Utils/Telemetry";
import "./Jobs/TelemetryIngest/ProcessTelemetry";
import { OPEN_TELEMETRY_INGEST_CONCURRENCY } from "./Config";
import { TELEMETRY_CONCURRENCY } from "./Config";
import type { StatusAPIOptions } from "Common/Server/API/StatusAPI";
import "ejs";
const app: ExpressApplication = Express.getExpressApp();
const APP_NAME: string = "open-telemetry-ingest";
const APP_NAME: string = "telemetry";
const ROUTE_PREFIXES: Array<string> = [`/${APP_NAME}`, "/"];
app.use(ROUTE_PREFIXES, OTelIngestAPI);
@ -44,7 +44,7 @@ const init: PromiseVoidFunction = async (): Promise<void> => {
});
logger.info(
`OpenTelemetryIngest Service - Queue concurrency: ${OPEN_TELEMETRY_INGEST_CONCURRENCY}`,
`Telemetry Service - Queue concurrency: ${TELEMETRY_CONCURRENCY}`,
);
// init the app

@ -13,8 +13,8 @@ import { QueueJob, QueueName } from "Common/Server/Infrastructure/Queue";
import QueueWorker from "Common/Server/Infrastructure/QueueWorker";
import ObjectID from "Common/Types/ObjectID";
import {
OPEN_TELEMETRY_INGEST_CONCURRENCY,
OPEN_TELEMETRY_INGEST_LOCK_DURATION_MS,
TELEMETRY_CONCURRENCY,
TELEMETRY_LOCK_DURATION_MS,
} from "../../Config";
// Set up the unified worker for processing telemetry queue
@ -81,8 +81,8 @@ QueueWorker.getWorker(
}
},
{
concurrency: OPEN_TELEMETRY_INGEST_CONCURRENCY,
lockDuration: OPEN_TELEMETRY_INGEST_LOCK_DURATION_MS,
concurrency: TELEMETRY_CONCURRENCY,
lockDuration: TELEMETRY_LOCK_DURATION_MS,
// allow a couple of stall recoveries before marking failed if genuinely stuck
maxStalledCount: 2,
},

@ -22,7 +22,7 @@ import OTelIngestService, {
import LogService from "Common/Server/Services/LogService";
import OtelIngestBaseService from "./OtelIngestBaseService";
import FluentLogsQueueService from "./Queue/FluentLogsQueueService";
import { OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE } from "../Config";
import { TELEMETRY_LOG_FLUSH_BATCH_SIZE } from "../Config";
export default class FluentLogsIngestService extends OtelIngestBaseService {
private static readonly DEFAULT_SERVICE_NAME: string = "Fluentd";
@ -150,7 +150,7 @@ export default class FluentLogsIngestService extends OtelIngestBaseService {
dbLogs.push(logRow);
processed++;
if (dbLogs.length >= OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE) {
if (dbLogs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE) {
await this.flushLogsBuffer(dbLogs);
}
} catch (processingError) {
@ -302,12 +302,12 @@ export default class FluentLogsIngestService extends OtelIngestBaseService {
force: boolean = false,
): Promise<void> {
while (
logs.length >= OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE ||
logs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE ||
(force && logs.length > 0)
) {
const batchSize: number = Math.min(
logs.length,
OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE,
TELEMETRY_LOG_FLUSH_BATCH_SIZE,
);
const batch: Array<JSONObject> = logs.splice(0, batchSize);

@ -22,7 +22,7 @@ import logger from "Common/Server/Utils/Logger";
import CaptureSpan from "Common/Server/Utils/Telemetry/CaptureSpan";
import LogsQueueService from "./Queue/LogsQueueService";
import OtelIngestBaseService from "./OtelIngestBaseService";
import { OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE } from "../Config";
import { TELEMETRY_LOG_FLUSH_BATCH_SIZE } from "../Config";
import LogService from "Common/Server/Services/LogService";
export default class OtelLogsIngestService extends OtelIngestBaseService {
@ -31,12 +31,12 @@ export default class OtelLogsIngestService extends OtelIngestBaseService {
force: boolean = false,
): Promise<void> {
while (
logs.length >= OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE ||
logs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE ||
(force && logs.length > 0)
) {
const batchSize: number = Math.min(
logs.length,
OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE,
TELEMETRY_LOG_FLUSH_BATCH_SIZE,
);
const batch: Array<JSONObject> = logs.splice(0, batchSize);
@ -307,7 +307,7 @@ export default class OtelLogsIngestService extends OtelIngestBaseService {
totalLogsProcessed++;
if (
dbLogs.length >= OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE
dbLogs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE
) {
await this.flushLogsBuffer(dbLogs);
}

@ -26,7 +26,7 @@ import MetricType from "Common/Models/DatabaseModels/MetricType";
import TelemetryService from "Common/Models/DatabaseModels/TelemetryService";
import MetricsQueueService from "./Queue/MetricsQueueService";
import OtelIngestBaseService from "./OtelIngestBaseService";
import { OPEN_TELEMETRY_INGEST_METRIC_FLUSH_BATCH_SIZE } from "../Config";
import { TELEMETRY_METRIC_FLUSH_BATCH_SIZE } from "../Config";
import OneUptimeDate from "Common/Types/Date";
import MetricService from "Common/Server/Services/MetricService";
@ -43,12 +43,12 @@ export default class OtelMetricsIngestService extends OtelIngestBaseService {
force: boolean = false,
): Promise<void> {
while (
metrics.length >= OPEN_TELEMETRY_INGEST_METRIC_FLUSH_BATCH_SIZE ||
metrics.length >= TELEMETRY_METRIC_FLUSH_BATCH_SIZE ||
(force && metrics.length > 0)
) {
const batchSize: number = Math.min(
metrics.length,
OPEN_TELEMETRY_INGEST_METRIC_FLUSH_BATCH_SIZE,
TELEMETRY_METRIC_FLUSH_BATCH_SIZE,
);
const batch: Array<JSONObject> = metrics.splice(0, batchSize);
@ -313,7 +313,7 @@ export default class OtelMetricsIngestService extends OtelIngestBaseService {
if (
dbMetrics.length >=
OPEN_TELEMETRY_INGEST_METRIC_FLUSH_BATCH_SIZE
TELEMETRY_METRIC_FLUSH_BATCH_SIZE
) {
await this.flushMetricsBuffer(dbMetrics);
}

@ -30,8 +30,8 @@ import Text from "Common/Types/Text";
import TracesQueueService from "./Queue/TracesQueueService";
import OtelIngestBaseService from "./OtelIngestBaseService";
import {
OPEN_TELEMETRY_INGEST_EXCEPTION_FLUSH_BATCH_SIZE,
OPEN_TELEMETRY_INGEST_TRACE_FLUSH_BATCH_SIZE,
TELEMETRY_EXCEPTION_FLUSH_BATCH_SIZE,
TELEMETRY_TRACE_FLUSH_BATCH_SIZE,
} from "../Config";
type ParsedUnixNano = {
@ -63,12 +63,12 @@ export default class OtelTracesIngestService extends OtelIngestBaseService {
force: boolean = false,
): Promise<void> {
while (
spans.length >= OPEN_TELEMETRY_INGEST_TRACE_FLUSH_BATCH_SIZE ||
spans.length >= TELEMETRY_TRACE_FLUSH_BATCH_SIZE ||
(force && spans.length > 0)
) {
const batchSize: number = Math.min(
spans.length,
OPEN_TELEMETRY_INGEST_TRACE_FLUSH_BATCH_SIZE,
TELEMETRY_TRACE_FLUSH_BATCH_SIZE,
);
const batch: Array<JSONObject> = spans.splice(0, batchSize);
@ -85,12 +85,12 @@ export default class OtelTracesIngestService extends OtelIngestBaseService {
force: boolean = false,
): Promise<void> {
while (
exceptions.length >= OPEN_TELEMETRY_INGEST_EXCEPTION_FLUSH_BATCH_SIZE ||
exceptions.length >= TELEMETRY_EXCEPTION_FLUSH_BATCH_SIZE ||
(force && exceptions.length > 0)
) {
const batchSize: number = Math.min(
exceptions.length,
OPEN_TELEMETRY_INGEST_EXCEPTION_FLUSH_BATCH_SIZE,
TELEMETRY_EXCEPTION_FLUSH_BATCH_SIZE,
);
const batch: Array<JSONObject> = exceptions.splice(0, batchSize);
@ -363,14 +363,14 @@ export default class OtelTracesIngestService extends OtelIngestBaseService {
if (
dbSpans.length >=
OPEN_TELEMETRY_INGEST_TRACE_FLUSH_BATCH_SIZE
TELEMETRY_TRACE_FLUSH_BATCH_SIZE
) {
await this.flushSpansBuffer(dbSpans);
}
if (
dbExceptions.length >=
OPEN_TELEMETRY_INGEST_EXCEPTION_FLUSH_BATCH_SIZE
TELEMETRY_EXCEPTION_FLUSH_BATCH_SIZE
) {
await this.flushExceptionsBuffer(dbExceptions);
}

@ -22,7 +22,7 @@ import LogService from "Common/Server/Services/LogService";
import logger from "Common/Server/Utils/Logger";
import OtelIngestBaseService from "./OtelIngestBaseService";
import SyslogQueueService from "./Queue/SyslogQueueService";
import { OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE } from "../Config";
import { TELEMETRY_LOG_FLUSH_BATCH_SIZE } from "../Config";
import {
ParsedSyslogMessage,
ParsedSyslogStructuredData,
@ -216,7 +216,7 @@ export default class SyslogIngestService extends OtelIngestBaseService {
dbLogs.push(logRow);
processed++;
if (dbLogs.length >= OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE) {
if (dbLogs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE) {
await this.flushLogsBuffer(dbLogs);
}
} catch (processingError) {
@ -394,12 +394,12 @@ export default class SyslogIngestService extends OtelIngestBaseService {
force: boolean = false,
): Promise<void> {
while (
logs.length >= OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE ||
logs.length >= TELEMETRY_LOG_FLUSH_BATCH_SIZE ||
(force && logs.length > 0)
) {
const batchSize: number = Math.min(
logs.length,
OPEN_TELEMETRY_INGEST_LOG_FLUSH_BATCH_SIZE,
TELEMETRY_LOG_FLUSH_BATCH_SIZE,
);
const batch: Array<JSONObject> = logs.splice(0, batchSize);

@ -49,7 +49,7 @@ bash $scriptDir/endpoint-status.sh "Admin Dashboard (Ready Check)" $HOST_TO_CHEC
bash $scriptDir/endpoint-status.sh "ProbeIngest (Ready Check)" $HOST_TO_CHECK/probe-ingest/status/ready
bash $scriptDir/endpoint-status.sh "OpenTelemetry Ingest (Ready Check)" $HOST_TO_CHECK/open-telemetry-ingest/status/ready
bash $scriptDir/endpoint-status.sh "Telemetry (Ready Check)" $HOST_TO_CHECK/telemetry/status/ready
bash $scriptDir/endpoint-status.sh "ProbeIngest (Status Check)" $HOST_TO_CHECK/probe-ingest/status

@ -92,15 +92,14 @@ REDIS_TLS_SENTINEL_MODE=false
# Hostnames. Usually does not need to change.
PROBE_INGEST_HOSTNAME=probe-ingest:3400
FLUENT_LOGS_HOSTNAME=open-telemetry-ingest:3403
INCOMING_REQUEST_INGEST_HOSTNAME=incoming-request-ingest:3402
OPEN_TELEMETRY_INGEST_HOSTNAME=otel-telemetry-ingest:3403
TELEMETRY_HOSTNAME=telemetry:3403
SERVER_ACCOUNTS_HOSTNAME=accounts
SERVER_APP_HOSTNAME=app
SERVER_PROBE_INGEST_HOSTNAME=probe-ingest
SERVER_SERVER_MONITOR_INGEST_HOSTNAME=server-monitor-ingest
SERVER_OPEN_TELEMETRY_INGEST_HOSTNAME=otel-telemetry-ingest
SERVER_TELEMETRY_HOSTNAME=telemetry
SERVER_INCOMING_REQUEST_INGEST_HOSTNAME=incoming-request-ingest
SERVER_TEST_SERVER_HOSTNAME=test-server
SERVER_STATUS_PAGE_HOSTNAME=status-page
@ -116,7 +115,7 @@ SERVER_DOCS_HOSTNAME=docs
APP_PORT=3002
PROBE_INGEST_PORT=3400
SERVER_MONITOR_INGEST_PORT=3404
OPEN_TELEMETRY_INGEST_PORT=3403
TELEMETRY_PORT=3403
INCOMING_REQUEST_INGEST_PORT=3402
TEST_SERVER_PORT=3800
ACCOUNTS_PORT=3003
@ -244,7 +243,7 @@ WORKFLOW_TIMEOUT_IN_MS=5000
# Concurrency settings
# Max number of telemetry jobs processed concurrently by OpenTelemetry Ingest worker
OPEN_TELEMETRY_INGEST_CONCURRENCY=100
TELEMETRY_CONCURRENCY=100
# Max number of jobs processed concurrently by Fluent Logs worker
FLUENT_LOGS_CONCURRENCY=100
@ -313,7 +312,7 @@ LLM_SERVER_HUGGINGFACE_MODEL_NAME=
DISABLE_TELEMETRY_FOR_ACCOUNTS=true
DISABLE_TELEMETRY_FOR_APP=true
DISABLE_TELEMETRY_FOR_PROBE_INGEST=true
DISABLE_TELEMETRY_FOR_OPEN_TELEMETRY_INGEST=true
DISABLE_TELEMETRY_FOR_TELEMETRY=true
DISABLE_TELEMETRY_FOR_FLUENT_LOGS=true
DISABLE_TELEMETRY_FOR_INCOMING_REQUEST_INGEST=true
DISABLE_TELEMETRY_FOR_TEST_SERVER=true

@ -33,7 +33,7 @@ x-common-variables: &common-variables
SERVER_APP_HOSTNAME: app
SERVER_ALERT_HOSTNAME: alert
SERVER_PROBE_INGEST_HOSTNAME: probe-ingest
SERVER_OPEN_TELEMETRY_INGEST_HOSTNAME: open-telemetry-ingest
SERVER_TELEMETRY_HOSTNAME: telemetry
SERVER_INCOMING_REQUEST_INGEST_HOSTNAME: incoming-request-ingest
SERVER_TEST_SERVER_HOSTNAME: test-server
SERVER_STATUS_PAGE_HOSTNAME: status-page
@ -52,7 +52,7 @@ x-common-variables: &common-variables
APP_PORT: ${APP_PORT}
HOME_PORT: ${HOME_PORT}
PROBE_INGEST_PORT: ${PROBE_INGEST_PORT}
OPEN_TELEMETRY_INGEST_PORT: ${OPEN_TELEMETRY_INGEST_PORT}
TELEMETRY_PORT: ${TELEMETRY_PORT}
INCOMING_REQUEST_INGEST_PORT: ${INCOMING_REQUEST_INGEST_PORT}
TEST_SERVER_PORT: ${TEST_SERVER_PORT}
ACCOUNTS_PORT: ${ACCOUNTS_PORT}
@ -471,16 +471,16 @@ services:
options:
max-size: "1000m"
open-telemetry-ingest:
telemetry:
networks:
- oneuptime
restart: always
environment:
<<: *common-runtime-variables
PORT: ${OPEN_TELEMETRY_INGEST_PORT}
DISABLE_TELEMETRY: ${DISABLE_TELEMETRY_FOR_OPEN_TELEMETRY_INGEST}
PORT: ${TELEMETRY_PORT}
DISABLE_TELEMETRY: ${DISABLE_TELEMETRY_FOR_TELEMETRY}
# Max concurrent telemetry jobs the worker will process
OPEN_TELEMETRY_INGEST_CONCURRENCY: ${OPEN_TELEMETRY_INGEST_CONCURRENCY}
TELEMETRY_CONCURRENCY: ${TELEMETRY_CONCURRENCY}
logging:
driver: "local"
options:

@ -358,9 +358,9 @@ services:
context: .
dockerfile: ./ServerMonitorIngest/Dockerfile
open-telemetry-ingest:
telemetry:
volumes:
- ./OpenTelemetryIngest:/usr/src/app:cached
- ./Telemetry:/usr/src/app:cached
# Use node modules of the container and not host system.
# https://stackoverflow.com/questions/29181032/add-a-volume-to-docker-but-exclude-a-sub-folder
- /usr/src/app/node_modules/
@ -370,11 +370,11 @@ services:
- '9938:9229' # Debugging port.
extends:
file: ./docker-compose.base.yml
service: open-telemetry-ingest
service: telemetry
build:
network: host
context: .
dockerfile: ./OpenTelemetryIngest/Dockerfile
dockerfile: ./Telemetry/Dockerfile
incoming-request-ingest:
volumes:

@ -114,11 +114,11 @@ services:
file: ./docker-compose.base.yml
service: server-monitor-ingest
open-telemetry-ingest:
image: oneuptime/open-telemetry-ingest:${APP_TAG}
telemetry:
image: oneuptime/telemetry:${APP_TAG}
extends:
file: ./docker-compose.base.yml
service: open-telemetry-ingest
service: telemetry
incoming-request-ingest:
image: oneuptime/incoming-request-ingest:${APP_TAG}