- Add Telemetry service entrypoint
  - Telemetry/Index.ts: app bootstrap, route mounting, infrastructure init, and Telemetry SDK init.
- Unified queue + worker
  - Telemetry/Jobs/TelemetryIngest/ProcessTelemetry.ts: single worker that dispatches queued jobs to type-specific processors (logs, traces, metrics, syslog, fluent logs); see the sketch after this list.
  - Telemetry/Services/Queue/TelemetryQueueService.ts: central queue API and job payload types.
  - Per-type queue wrappers (LogsQueueService, MetricsQueueService, TracesQueueService, FluentLogsQueueService, SyslogQueueService).
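For orientation, here is a minimal sketch of that dispatch pattern, assuming the queue infrastructure is BullMQ-style; the job-type names, processor bodies, and connection settings are illustrative, not the actual exports:

```typescript
// Minimal sketch: one worker consumes the unified queue and routes by job type.
import { Worker, Job } from "bullmq";

type TelemetryJobType = "logs" | "traces" | "metrics" | "syslog" | "fluent-logs";

interface TelemetryJobPayload {
  type: TelemetryJobType;
  body: unknown; // raw payload captured at ingest time
}

// Hypothetical stand-ins for the per-type ingest services.
const processors: Record<TelemetryJobType, (body: unknown) => Promise<void>> = {
  logs: async (_body) => { /* OtelLogsIngestService */ },
  traces: async (_body) => { /* OtelTracesIngestService */ },
  metrics: async (_body) => { /* OtelMetricsIngestService */ },
  syslog: async (_body) => { /* SyslogIngestService */ },
  "fluent-logs": async (_body) => { /* FluentLogsIngestService */ },
};

new Worker<TelemetryJobPayload>(
  "telemetry-ingest",
  async (job: Job<TelemetryJobPayload>) => {
    const processor = processors[job.data.type];
    if (!processor) {
      throw new Error(`Unknown telemetry job type: ${job.data.type}`);
    }
    await processor(job.data.body);
  },
  { connection: { host: "redis", port: 6379 } }
);
```

The point of the single worker is that concurrency, retries, and shutdown are configured once rather than per telemetry type.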
- OpenTelemetry ingestion middleware and proto support
  - Telemetry/Middleware/OtelRequestMiddleware.ts: detect the OTLP endpoint (logs/traces/metrics), decode protobuf bodies with protobufjs, and set the product type; a sketch follows this list.
  - Telemetry/ProtoFiles/OTel/v1/*.proto: common.proto, logs.proto, metrics.proto, resource.proto, and traces.proto for OTLP v1 messages.
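A condensed sketch of what the middleware does, assuming Express-style handlers and a raw Buffer body for protobuf requests; the message names come from the OTLP v1 protos bundled above, while the file paths and the productType field are illustrative:

```typescript
import express from "express";
import protobuf from "protobufjs";

// OTLP v1 payload message per endpoint (defined in the bundled .proto files).
const OTLP_MESSAGES: Record<string, string> = {
  "/v1/logs": "opentelemetry.proto.logs.v1.LogsData",
  "/v1/traces": "opentelemetry.proto.trace.v1.TracesData",
  "/v1/metrics": "opentelemetry.proto.metrics.v1.MetricsData",
};

let root: protobuf.Root | undefined;

export async function otelRequestMiddleware(
  req: express.Request,
  _res: express.Response,
  next: express.NextFunction
): Promise<void> {
  const messageName: string | undefined = OTLP_MESSAGES[req.path];
  if (!messageName) {
    return next(); // not an OTLP endpoint
  }
  // Load the bundled proto files once and cache the parsed root.
  root ??= await protobuf.load([
    "ProtoFiles/OTel/v1/logs.proto",
    "ProtoFiles/OTel/v1/traces.proto",
    "ProtoFiles/OTel/v1/metrics.proto",
  ]);
  if (req.headers["content-type"] === "application/x-protobuf") {
    // Assumes express.raw() ran upstream so req.body is a Buffer.
    req.body = root.lookupType(messageName).decode(req.body as Buffer).toJSON();
  }
  (req as any).productType = req.path.split("/").pop(); // "logs" | "traces" | "metrics"
  next();
}
```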
- Ingest services
  - Telemetry/Services/OtelLogsIngestService.ts: parse incoming OTLP logs, map attributes, convert timestamps, and batch-insert logs; see the normalization sketch after this list.
  - Telemetry/Services/OtelTracesIngestService.ts: parse OTLP traces, build span rows, extract exceptions, batch-insert spans and exceptions, and save the telemetry exception summary.
  - Telemetry/Services/OtelMetricsIngestService.ts: parse OTLP metrics, normalize datapoints, batch-insert metrics, and index the metric name -> service map.
  - Telemetry/Services/SyslogIngestService.ts: syslog ingestion endpoints, parser integration, and mapping of syslog fields to attributes and logs.
  - Telemetry/Services/FluentLogsIngestService.ts: ingest Fluentd-style logs, normalize entries, and insert them into the log backend.
  - Telemetry/Services/OtelIngestBaseService.ts: helpers to resolve the service name from attributes/headers.
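As one example of the normalization these services perform, here is a sketch of turning an OTLP log record into an insertable row; the OTLP JSON shapes follow the spec (nanosecond timestamps encoded as strings, attributes as key/value pairs), while the row shape and helper names are illustrative:

```typescript
// Simplified OTLP JSON shapes (the real types carry more value variants).
interface OtelKeyValue {
  key: string;
  value: { stringValue?: string; intValue?: string; boolValue?: boolean };
}

interface OtelLogRecord {
  timeUnixNano: string;
  severityText?: string;
  body?: { stringValue?: string };
  attributes?: OtelKeyValue[];
}

// Flatten OTLP key/value attributes into a plain map.
export function toAttributeMap(
  attrs: OtelKeyValue[] = []
): Record<string, string | number | boolean> {
  const map: Record<string, string | number | boolean> = {};
  for (const { key, value } of attrs) {
    map[key] =
      value.stringValue ??
      (value.intValue !== undefined ? Number(value.intValue) : value.boolValue ?? "");
  }
  return map;
}

export function toLogRow(record: OtelLogRecord, serviceName: string) {
  return {
    serviceName,
    // OTLP timestamps are nanoseconds since epoch as strings; convert to ms.
    time: new Date(Number(BigInt(record.timeUnixNano) / 1_000_000n)),
    severity: record.severityText ?? "UNSPECIFIED",
    body: record.body?.stringValue ?? "",
    attributes: toAttributeMap(record.attributes),
  };
}
```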
- Syslog parser and utilities
  - Telemetry/Utils/SyslogParser.ts: robust RFC 5424 and RFC 3164 parser with structured-data extraction and sanitization; a simplified sketch follows this list.
  - Telemetry/Tests/Utils/SyslogParser.test.ts: unit tests for parser behavior.
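To give a feel for the RFC 5424 side, a trimmed-down sketch of the parsing approach; the real parser also covers the RFC 3164 fallback, structured-data extraction, and sanitization, and this regex is deliberately simplified:

```typescript
// Simplified RFC 5424 line: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
const RFC5424: RegExp =
  /^<(\d{1,3})>\d+ (\S+) (\S+) (\S+) (\S+) (\S+) (-|\[.+?\]) ?(.*)$/;

export interface SyslogMessage {
  facility: number;
  severity: number;
  timestamp: string;
  hostname: string;
  appName: string;
  message: string;
}

export function parseRfc5424(line: string): SyslogMessage | null {
  const m: RegExpExecArray | null = RFC5424.exec(line);
  if (!m) {
    return null; // the real parser falls back to RFC 3164 here
  }
  const pri: number = parseInt(m[1], 10);
  return {
    facility: Math.floor(pri / 8), // PRI encodes facility * 8 + severity
    severity: pri % 8,
    timestamp: m[2],
    hostname: m[3],
    appName: m[4],
    message: m[8],
  };
}
```

For example, PRI 34 decodes to facility 4 (auth) and severity 2 (critical), since 34 = 4 * 8 + 2.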
- Telemetry exception utilities
  - Telemetry/Utils/Exception.ts: generate an exception fingerprint and upsert the telemetry exception status (saveOrUpdateTelemetryException); a fingerprinting sketch follows.
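A sketch of the fingerprinting idea, assuming a SHA-256 hash over the exception's identifying fields (the exact hash inputs and field names are assumptions):

```typescript
import crypto from "crypto";

export interface TelemetryExceptionInput {
  serviceId: string;
  exceptionType?: string;
  message?: string;
  stackTrace?: string;
}

// Identical exceptions from the same service collapse to one fingerprint, so
// saveOrUpdateTelemetryException can upsert a single status row per exception
// instead of inserting a new row for every occurrence.
export function getExceptionFingerprint(ex: TelemetryExceptionInput): string {
  return crypto
    .createHash("sha256")
    .update(
      [ex.serviceId, ex.exceptionType ?? "", ex.message ?? "", ex.stackTrace ?? ""].join("\n")
    )
    .digest("hex");
}
```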
- Queue & job integration
- New integration with Common/Server/Infrastructure/Queue and QueueWorker, job id generation and telemetry job types.
- Telemetry services add ingestion jobs instead of processing synchronously.
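The enqueue side of that change looks roughly like this, again assuming a BullMQ-style queue underneath Common/Server/Infrastructure/Queue; the queue name, job id scheme, and payload shape are illustrative:

```typescript
import { Queue } from "bullmq";

const telemetryQueue: Queue = new Queue("telemetry-ingest", {
  connection: { host: "redis", port: 6379 },
});

// The HTTP handler now only validates and enqueues; the QueueWorker picks the
// job up later, so ingest latency is decoupled from processing time.
export async function enqueueOtelLogs(
  body: unknown,
  projectId: string
): Promise<void> {
  await telemetryQueue.add(
    "logs", // job name doubles as the telemetry job type
    { type: "logs", projectId, body },
    { jobId: `${projectId}:logs:${Date.now()}`, removeOnComplete: true }
  );
}
```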
- Config, build and dev tooling
  - Add Telemetry/package.json, package-lock.json, tsconfig.json, nodemon.json, and jest config.
  - New script configs and dependencies (protobufjs, ts-node, jest, nodemon, etc.).
- Docker / environment updates
  - docker-compose.base.yml, docker-compose.dev.yml, docker-compose.yml: rename the service from open-telemetry-ingest to telemetry and wire up the TELEMETRY_* envs.
  - config.example.env: rename and consolidate environment variables (OPEN_TELEMETRY_* -> TELEMETRY_*; update hostnames and ports).
  - Tests/Scripts/status-check.sh: update the ready-check target to telemetry/status/ready.
- Other
  - Telemetry/Services/Queue/*: export helpers and legacy-compatible job interface shims.
  - Memory cleanup and batching safeguards across ingest services; a batching sketch follows this list.
  - Logging and capture spans added to key code paths.
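The batching safeguard amounts to chunked inserts plus dropping references once a batch is flushed; a sketch with an assumed chunk size:

```typescript
// Flush rows in fixed-size chunks so a single huge payload neither produces an
// oversized insert nor pins the whole array in memory (chunk size assumed).
const MAX_BATCH_SIZE: number = 500;

export async function batchInsert<T>(
  rows: T[],
  insert: (chunk: T[]) => Promise<void>
): Promise<void> {
  for (let i = 0; i < rows.length; i += MAX_BATCH_SIZE) {
    await insert(rows.slice(i, i + MAX_BATCH_SIZE));
  }
  rows.length = 0; // release references so the GC can reclaim the batch
}
```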
BREAKING CHANGES / MIGRATION NOTES:
- Environment variables and docker service names changed:
  - Replace OPEN_TELEMETRY_* vars with TELEMETRY_* (PORT, HOSTNAME, CONCURRENCY, DISABLE_TELEMETRY, etc.).
  - docker-compose entries moved from "open-telemetry-ingest" to "telemetry", and the image name changed to oneuptime/telemetry.
- Update any deployment automation and monitoring checks referencing the old service name or endpoints.
- Consumers: OTLP endpoints and behavior remain supported, but ingestion is now queued and processed asynchronously.
Testing / Running:
- Install deps in Telemetry/ (npm install) after syncing the Common workspace.
- Run dev: npx nodemon (uses nodemon.json), or build and start with the provided scripts.
- Run tests with jest (the Telemetry suite includes the SyslogParser unit tests).
Files added/modified (high level):
- Added many files under Telemetry/: Index, Jobs, Middleware, ProtoFiles, Services, Utils, Tests, and package/config artifacts.
- Modified docker-compose.*, config.example.env, and the status-check script to use the new TELEMETRY service/vars.
- Move Fluent/Fluent Bit logs ingestion into open-telemetry-ingest:
  - Add OpenTelemetryIngest/API/Fluent.ts (routes for /fluentd and queue endpoints); a route sketch follows this list.
  - Add queue service, job worker, and processor:
    - OpenTelemetryIngest/Services/Queue/FluentLogsQueueService.ts
    - OpenTelemetryIngest/Jobs/TelemetryIngest/ProcessFluentLogs.ts
  - Register the Fluent API and job processing in OpenTelemetryIngest/Index.ts
  - Introduce QueueName.FluentLogs and related queue usage
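A sketch of the Fluentd route wiring, using the /fluentd path from this change; the handler body, queue name, and auth header are illustrative:

```typescript
import express from "express";
import { Queue } from "bullmq";

const router: express.Router = express.Router();
const fluentLogsQueue: Queue = new Queue("fluent-logs", {
  connection: { host: "redis", port: 6379 },
});

// Fluentd / Fluent Bit HTTP outputs POST record batches here; the handler
// enqueues and acks immediately so the forwarder never blocks on processing.
router.post("/fluentd", express.json({ limit: "10mb" }), async (req, res) => {
  await fluentLogsQueue.add("fluent-logs", {
    records: Array.isArray(req.body) ? req.body : [req.body],
    // Auth/context headers are forwarded so the worker can resolve the project
    // (header name here is an assumption).
    token: req.headers["x-ingest-token"],
  });
  res.status(200).json({ queued: true });
});

export default router;
```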
- Remove legacy FluentIngest service and configuration:
  - Delete fluent-ingest docker-compose dev/base entries and the docker-compose.yml service
  - Remove fluent-ingest-related Helm values, the KEDA ScaledObject, the ingress host, and schema entries
  - Remove the FLUENTD_HOST env/values and replace FLUENT_INGEST_HOSTNAME with FLUENT_LOGS_HOSTNAME (pointing to open-telemetry-ingest)
  - Update config.example.env keys (FLUENT_LOGS_CONCURRENCY, DISABLE_TELEMETRY_FOR_FLUENT_LOGS)
  - Remove FluentIngestRoute and FLUENT_INGEST_URL/hostname usages from UI config/templates
  - Remove the VS Code launch debug config for Fluent Ingest
  - Remove the Fluent ingest E2E status-check entry in Tests/Scripts/status-check.sh
- Update the docs/architecture diagram and Helm templates to reflect the "FluentLogs" / Fluent Bit flow
- Misc:
  - Remove FLUENTD_HOST environment injection from docker-compose.base.yml
  - Clean up related values.schema.json and values.yaml entries
This consolidates log ingestion under the OpenTelemetry ingest service and removes the separate FluentIngest service and its configuration.
The status-check.sh script now points its readiness checks for the Dashboard, Status Page, Accounts, and other services at each service's "/status/ready" endpoint, so a check passes only once the service is actually ready to handle requests.
The endpoint-status.sh script's error message for non-200 HTTP responses has also been improved: the previous message claimed the app usually takes a few minutes to boot, which was inaccurate; the updated message drops that claim and describes the retry behavior instead.