Unveiling Trae: ByteDance's AI IDE and Its Extensive Data Collection System

Agentic AI Security Series - Edition 1

Welcome to the first article in our Agentic AI Security Series, where Unit 221B examines the security implications of AI-powered development tools. Our comprehensive technical analysis of Trae, ByteDance's AI coding assistant, reveals a sophisticated telemetry architecture operating within the application. While offering free access to Claude 3.7 Sonnet and GPT-4o, the IDE implements an extensive data collection system with multiple communication channels, persistent tracking capabilities, and comprehensive monitoring features. This analysis documents the technical components of this telemetry infrastructure, its network behavior, and the implications for developers considering such tools for their workflows.

Analysis Environment

Trae version 1.0.10282 for macOS

About Trae.ai - From Their Perspective

Trae.ai Interface Screenshot

Trae is an adaptive AI IDE developed by ByteDance that transforms how developers work. Launched in early 2025, it positions itself as a revolutionary coding environment that collaborates with developers to ship faster. The application is distributed through ByteDance's Singapore-based subsidiary SPRING(SG)PTE.LTD.

According to Trae's official descriptions, the platform integrates two major international models - Claude 3.7 Sonnet and GPT-4o - both currently available for free. Trae is designed to compete functionally with products like Cursor and GitHub Copilot while offering a more accessible experience, especially for Chinese-speaking developers, with its bilingual interface supporting both English and Simplified Chinese.

Like other AI-powered IDEs, Trae is built on Microsoft's Visual Studio Code, allowing users to migrate plugins and settings directly from VS Code or Cursor. The platform is available for both macOS and Windows. This analysis was conducted on the macOS version only. We have not analyzed the Windows version; we expect its telemetry to be broadly similar, though implementation details may differ because of platform-specific requirements.

Trae highlights several key features in its promotional materials:

  • Intelligent Q&A and Assistance: Providing real-time support for code explanations, comments, error fixing, and programming patterns
  • Real-time Code Suggestions: Analyzing code logic to provide optimization suggestions within the editor
  • Code Snippet Generation: Converting natural language descriptions into functional code, including multi-file project-level implementations
  • Builder Mode: Allowing developers to create applications from scratch through AI-guided project setup and task breakdowns
  • Multimodal Interaction: Supporting image upload for design references or error screenshots
  • Integrated Webview: Displaying web pages directly within the IDE for seamless web development
  • Comprehensive IDE Features: Including standard tools for code writing, project management, plugin management, and version control

ByteDance positions Trae as democratizing programming for developers of all skill levels - emphasizing that even those with minimal coding experience can build functional applications with its assistance.

Executive Summary

Trae has rapidly emerged as a formidable competitor to established AI coding assistants like Cursor and GitHub Copilot. Its main selling point? It's completely free - offering Claude 3.7 Sonnet and GPT-4o without any subscription fees. Unit 221B's technical analysis, using network traffic interception, binary analysis, and runtime monitoring, has identified a sophisticated telemetry framework that continuously transmits data to multiple ByteDance servers. From a cybersecurity perspective, this represents a complex data collection operation with significant security and privacy implications.

Key Findings:

  • Persistent connections to at least five unique ByteDance domains, creating multiple data transmission vectors
  • Continuous telemetry transmission even during idle periods, indicating an always-on monitoring system
  • Regular update checks and configuration pulls from ByteDance servers, allowing for dynamic control
  • Permanent device identification via machineId parameter, which appears to be derived from hardware identifiers, enabling long-term tracking capabilities
  • Local WebSocket channels observed collecting full file content, with portions potentially transmitted to remote servers
  • Complex local microservice architecture with redundant pathways for code data, suggesting a deliberate system design
  • JWT tokens and authentication data observed in multiple communication channels, presenting potential credential exposure concerns
  • Use of binary MessagePack format observed in data transfers, adding complexity to security analysis
  • Extensive behavioral tracking mechanisms capable of building detailed user activity profiles
  • Sophisticated data segregation across multiple endpoints, consistent with enterprise-grade telemetry systems

The free Claude and GPT-4o integration comes bundled with this telemetry framework. Our investigation into Trae serves as a case study in how ostensibly free AI developer tools can operate as sophisticated data collection systems. The level of instrumentation we've uncovered rivals enterprise-grade telemetry platforms (see definition below), with persistent tracking capabilities that survive even application reinstallation.

Note: In this analysis, we use the term "enterprise-grade telemetry" to describe data collection systems that exhibit characteristics typically found in corporate software: (1) architecturally complex with multiple specialized endpoints, (2) employing data segregation across different categories, (3) implementing redundant collection pathways, (4) featuring centralized management capabilities, (5) utilizing persistent device tracking, and (6) maintaining scheduled, consistent reporting intervals. These characteristics distinguish such systems from simpler telemetry implementations common in consumer applications.

For security professionals and development teams handling sensitive intellectual property, understanding these hidden data flows is critical when evaluating AI coding tools. The techniques documented here reflect industry-wide patterns that merit close attention in any secure development environment.

Technical Architecture of Trae's Telemetry System

Our analysis reveals a multi-layered approach to data collection, with specialized endpoints for different types of telemetry data. The system employs a distributed architecture that segments data collection across domains:

Domain | Primary Function | Data Type | Protocol
mon-va.byteoversea.com | Primary telemetry collection | Application state, user behavior, performance metrics | HTTPS (POST)
maliva-mcs.byteoversea.com | Configuration and heartbeat | System status, feature flags, configuration | HTTPS (POST)
api.trae.ai | Core API services | Device registration, configuration queries | HTTPS (GET/POST)
api-sg-central.trae.ai | Regional API services | Regional backend interactions, device logs | HTTPS (POST)
bytegate-sg.byteintlapi.com | Feature gate management | Feature flags, workspace configuration | HTTPS (POST)
lf3-static.bytednsdoc.com | Static resource delivery | Control URLs, configuration data | HTTPS (GET)
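As a minimal sketch, the domain table above can be expressed as a lookup that labels a hostname seen in a proxy or DNS log with the role we observed for it. The roles are copied from the table; the names `TRAE_DOMAIN_ROLES` and `classify_host` are illustrative and not part of Trae.

```python
from typing import Optional

# Roles copied from the domain table; identifiers here are illustrative only.
TRAE_DOMAIN_ROLES = {
    "mon-va.byteoversea.com": "Primary telemetry collection",
    "maliva-mcs.byteoversea.com": "Configuration and heartbeat",
    "api.trae.ai": "Core API services",
    "api-sg-central.trae.ai": "Regional API services",
    "bytegate-sg.byteintlapi.com": "Feature gate management",
    "lf3-static.bytednsdoc.com": "Static resource delivery",
}


def classify_host(hostname: str) -> Optional[str]:
    """Return the observed role for a hostname, or None if it is not in the table."""
    # Normalize case and drop a trailing dot, as seen in some DNS logs.
    normalized = hostname.strip().rstrip(".").lower()
    return TRAE_DOMAIN_ROLES.get(normalized)
```

A log-processing script could call `classify_host` on each destination hostname and group traffic volume by the returned role.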

Network Traffic Analysis

Through network traffic analysis, Unit 221B's research team captured and analyzed the communication between Trae and ByteDance servers. The following patterns emerged:

1. Continuous Network Connections

Our traffic analysis confirms that Trae establishes and maintains persistent connections to ByteDance servers, even during periods of complete inactivity by the user. These connections use HTTPS POST requests to transmit compressed data at regular intervals:

// Network connection sample - observed during 20-minute monitoring session
[18:32:26.799] Server connect mon-va.byteoversea.com:443 (23.43.85.213:443)
[::1]:62515: POST https://mon-va.byteoversea.com/monitor_browser/collect/batch/?biz_id=marscode_nativeide_us
 << HTTP/2.0 204 No Content 0b

// Subsequent connection occurs approximately 30 seconds later

2. Device Identification and Registration

Trae implements a permanent device identification system using a machine ID that persists across installations. This ID appears to be a cryptographic hash that is passed in all configuration and API requests:

[18:32:26.799] Server connect api.trae.ai:443 (23.43.85.213:443)
[::1]:62515: GET https://api.trae.ai/icube/api/v1/native/config/query?machineId=[REDACTED_MACHINE_ID]… HTTP/2.0
 << HTTP/2.0 200 OK 1.3k

The machineId parameter (redacted above) appears to be a SHA-256 hash derived from hardware identifiers. Because such a value does not change when the application is reinstalled, it acts as a persistent fingerprint of the user's system that can be tracked across installations and sessions.
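To illustrate why a hardware-derived hash survives reinstallation, here is a hedged sketch of how such an identifier could be computed. We did not confirm which inputs Trae hashes; the MAC address below is a hypothetical stand-in, and `derive_machine_id` is an illustrative name.

```python
import hashlib


def derive_machine_id(hardware_identifier: str) -> str:
    """Hash a stable hardware identifier into a 64-character hex fingerprint."""
    normalized = hardware_identifier.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical input: a MAC address stands in for whatever Trae actually hashes.
first_install = derive_machine_id("AA:BB:CC:DD:EE:FF")
after_reinstall = derive_machine_id("aa:bb:cc:dd:ee:ff")
```

Because the input comes from the hardware rather than from application state, both calls yield the same value, so deleting and reinstalling the application would not change the identifier.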

3. Multi-Region Server Architecture

Trae employs a geographically distributed server infrastructure, with region-specific endpoints for different functions. For example, we observed connections to Singapore-based servers for API requests:

[18:32:31.122] Server connect api-sg-central.trae.ai:443 (23.43.85.219:443)
[::1]:62519: POST https://api-sg-central.trae.ai/icube/api/v1/device/log/check HTTP/2.0
 << HTTP/2.0 200 OK 61b

This distributed architecture allows ByteDance to segregate data collection by region while maintaining centralized control through their global infrastructure.

4. WebSocket Communication

In addition to standard HTTPS traffic, our analysis identified persistent WebSocket connections established locally, likely for inter-process communication or real-time feature updates. Deep analysis of these connections revealed the following patterns of internal data handling:

  • AI Completion Endpoint: ws://127.0.0.1:51000/module/aicompletion/0

    This connection uses the Language Server Protocol (LSP) format with JSON payloads for AI code completion. Key data flows include:

    • Client Initialization: Sends extensive system details including hardware specs, OS version, unique device identifiers, and a large teaConfig object containing analytics configuration.
    • Authentication Data: Transmits user info including a JWT token and authentication details, creating potential security risks if intercepted.
    • Complete Document Content: Sends the full content of every opened file during textDocument/didOpen events and again with every change via textDocument/didChange, essentially logging all code edits in their entirety.
    • Continuous Activity Monitoring: Exchanges $/ping and $/pong messages every 20 seconds to maintain awareness of active editing sessions.

    The volume and sensitivity of data flowing through this local channel closely parallels the external telemetry patterns, suggesting an integrated collection strategy.

  • Manager Endpoint: ws://127.0.0.1:51000/manager/

    This channel uses binary MessagePack format (WebSocket opcode 2) to coordinate a complex microservice architecture running locally:

    • Service Discovery: Reveals a sophisticated internal architecture with multiple services (ai-agent, ckg, aicompletion) running on specific local ports.
    • Redundant Data Transmission: Many of the same sensitive details (user IDs, device info, authentication tokens) sent over the AI completion channel are duplicated here in binary format.
    • Code Snapshot Transmission: Notably, this channel sends update_snapshot requests to the ai-agent service containing complete file content packaged as "snapshots," labeled with created_by: "ai", effectively creating a second pathway for full code content to move through the system.
    • Parallel Authentication Flow: Sends credentials and user details largely redundant with those from the other WebSocket, suggesting a design choice with potential implications for both performance and security models.

    MessagePack is a standard serialization format, not encryption, but its binary encoding is harder to read by eye than the plain JSON on the AI completion channel, which makes manual inspection more difficult.
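To make the document flow on the AI completion channel concrete, the sketch below builds the two Language Server Protocol notifications named above. The field layout follows the public LSP specification; the file URI and contents are hypothetical, and we are not claiming these are Trae's exact payloads.

```python
import json


def build_did_open(uri: str, language_id: str, text: str) -> dict:
    """Build an LSP textDocument/didOpen notification (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didOpen",
        "params": {
            "textDocument": {
                "uri": uri,
                "languageId": language_id,
                "version": 1,
                "text": text,  # the entire file, not a diff
            }
        },
    }


def build_full_did_change(uri: str, version: int, text: str) -> dict:
    """Build a didChange notification that uses full-document synchronization."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didChange",
        "params": {
            "textDocument": {"uri": uri, "version": version},
            # A change event without a "range" replaces the whole document.
            "contentChanges": [{"text": text}],
        },
    }


# Hypothetical file path and contents, for illustration only.
source = "const LOG_LEVELS = { ERROR: 0, WARN: 1 };\n"
opened = build_did_open("file:///project/src/utils/logger.js", "javascript", source)
changed = build_full_did_change(
    opened["params"]["textDocument"]["uri"], 2, source + "// edit\n"
)
wire_bytes = len(json.dumps(changed).encode("utf-8"))
```

With full-document synchronization, every edit resends the complete file, so the bytes on the channel grow with file size rather than with the size of the change.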

Example WebSocket Traffic (Decoded)

# Authentication data sent over AI completion WebSocket
{
  "userInfo": {
    "name": "[REDACTED_USERNAME]",
    "token": "[REDACTED_JWT_TOKEN]",
    "region": "US",
    "is_internal": false,
    "user_id": "[REDACTED_USER_ID]"
  },
  "authInfo": {
    "jwtTokenType": "Cloud-IDE-JWT"
  },
  "teaConfig": {
    "icube_uid": "[REDACTED_USER_ID]",
    "user_id": "[REDACTED_USER_ID]",
    "biz_user_id": "[REDACTED_USER_ID]",
    "user_is_login": true,
    "device_id": "[REDACTED_DEVICE_ID]",
    "machine_id": "[REDACTED_MACHINE_ID]",
    "arch": "arm64",
    "system": "darwin",
    "build_version": "1.0.10282",
    "region": "US"
  }
}

# Code content sent over manager WebSocket (MessagePack decoded)
{
  "service": "snapshot",
  "method": "update_snapshot",
  "data": {
    "snapshot_diff_data": {
      "file_infos": [{
        "file_path": "/[REDACTED_PATH]/src/utils/logger.js",
        "current_content": "",
        "new_content": "import fs from 'fs';\nimport path from 'path';\nimport os from 'os';\n\n// Define log levels\nconst LOG_LEVELS = {\n  ERROR: 0,\n  WARN: 1,\n...[FULL FILE CONTENT]",
        "created_by": "ai",
        "file_action": "added"
      }]
    },
    "project_id": "[REDACTED_PROJECT_ID]",
    "chat_session_id": "[REDACTED_SESSION_ID]"
  }
}
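The MessagePack-decoded snapshot above can be reproduced with any MessagePack library. To show that the binary format is straightforward to decode, here is a dependency-free sketch covering only a subset of the specification (small integers, unsigned integers, nil, booleans, strings, fixarray and fixmap). The sample frame is hand-built for illustration and is not captured Trae traffic.

```python
from typing import Any, Tuple


def _read_str(data: bytes, pos: int, length: int) -> Tuple[str, int]:
    return data[pos:pos + length].decode("utf-8"), pos + length


def _read_array(data: bytes, pos: int, count: int) -> Tuple[list, int]:
    items = []
    for _ in range(count):
        item, pos = unpack(data, pos)
        items.append(item)
    return items, pos


def _read_map(data: bytes, pos: int, count: int) -> Tuple[dict, int]:
    result = {}
    for _ in range(count):
        key, pos = unpack(data, pos)
        value, pos = unpack(data, pos)
        result[key] = value
    return result, pos


def unpack(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one MessagePack value at pos; return (value, next position)."""
    tag = data[pos]
    pos += 1
    if tag <= 0x7F:                       # positive fixint
        return tag, pos
    if tag <= 0x8F:                       # fixmap, entry count in low 4 bits
        return _read_map(data, pos, tag & 0x0F)
    if tag <= 0x9F:                       # fixarray, length in low 4 bits
        return _read_array(data, pos, tag & 0x0F)
    if tag <= 0xBF:                       # fixstr, length in low 5 bits
        return _read_str(data, pos, tag & 0x1F)
    if tag == 0xC0:                       # nil
        return None, pos
    if tag in (0xC2, 0xC3):               # false / true
        return tag == 0xC3, pos
    if tag in (0xCC, 0xCD, 0xCE, 0xCF):   # uint 8 / 16 / 32 / 64
        width = 1 << (tag - 0xCC)
        return int.from_bytes(data[pos:pos + width], "big"), pos + width
    if tag in (0xD9, 0xDA, 0xDB):         # str 8 / 16 / 32
        width = 1 << (tag - 0xD9)
        length = int.from_bytes(data[pos:pos + width], "big")
        return _read_str(data, pos + width, length)
    raise ValueError(f"unsupported MessagePack tag 0x{tag:02x}")


# Hand-built sample frame (not captured traffic).
frame = (
    bytes([0x82])                          # fixmap with 2 entries
    + bytes([0xA7]) + b"service"           # fixstr, 7 bytes
    + bytes([0xA8]) + b"snapshot"          # fixstr, 8 bytes
    + bytes([0xA6]) + b"method"            # fixstr, 6 bytes
    + bytes([0xAF]) + b"update_snapshot"   # fixstr, 15 bytes
)
decoded, end = unpack(frame)
```

For real captures, a complete MessagePack library is the better choice, since this sketch raises `ValueError` on types it does not cover (floats, binary blobs, signed integers, large containers).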

This internal traffic shows how Trae handles user activity, code content, and system information locally. The local WebSocket traffic illustrates how data flows from user editing sessions through internal channels.

Security Consideration: The confirmed internal movement of full document content through multiple channels establishes that the application processes complete file contents locally. While we can directly observe code flowing through these internal channels, the external telemetry's compressed or encrypted nature makes it difficult to conclusively verify whether any of this data is transmitted to remote servers. The transmission of authentication tokens (JWT) and credentials through multiple local channels creates additional attack vectors for potential credential interception. The duplicate transmission of file contents through separate channels represents a design choice that may impact both performance and security models.
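The credential concern is easier to see once you recall that the payload of a standard signed JWT is only base64url-encoded, not encrypted. The sketch below assumes a conventional three-part signed token; the sample token and its claim names are synthetic and hypothetical, not Trae's.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims in a JWT's payload segment without verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (base64url, no padding)."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


# Synthetic token for illustration; the claim names are hypothetical.
sample_token = ".".join([
    _b64url({"alg": "HS256", "typ": "JWT"}),
    _b64url({"sub": "user-123", "exp": 1743379200}),
    "signature-not-checked",
])
claims = decode_jwt_payload(sample_token)
```

Any process able to read the local channel could therefore read the token's claims and, until the token expires, replay the token itself.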

ByteDance Telemetry Implementation

Through binary analysis and decompilation of the Trae application, we identified ByteDance's proprietary telemetry framework implemented as part of the Electron application. The primary components include:

Core Telemetry Packages

  • @byted-icube/slardar - Core telemetry collection system [1]
    • Evidence: Configuration references in network requests
    • Purpose: Error reporting, performance monitoring, analytics
  • @byted-icube/tea - User behavior analytics
    • Evidence: Captured in WebSocket traffic as part of teaConfig object
    • Purpose: Tracking user interactions and activity metrics
    • Note: This appears to be an internal ByteDance package, with limited external documentation available
  • @byted/device-register - Persistent device tracking
    • Evidence: Device ID generation and transmission
    • Purpose: Cross-session user identification

[1] Note: ByteDance's Slardar telemetry framework has been independently documented in security research as a known component of ByteDance applications, where it's described as enabling "remote configuration, feature flagging, and policy enforcement." Similar telemetry mechanisms have been observed in multiple ByteDance products.

Electron Integration

The telemetry system is integrated into Trae's Electron runtime. In the startup log excerpt below, the update scheduler initializes (updateInterval -> 60) and a telemetry POST to mon-va.byteoversea.com is issued with a matching timestamp:

[main 2025-03-30T22:46:40.370Z] ICUBE:update scheduleCheckForUpdates, updateInterval -> 60
[18:46:40.370][127.0.0.1:64408] server connect mon-va.byteoversea.com:443 (147.160.190.227:443)
127.0.0.1:64408: POST https://mon-va.byteoversea.com/monitor_browser/collect/batch/

Server Infrastructure

Our network analysis revealed the following server infrastructure supporting Trae's telemetry system:

Domain | IP Addresses | Location | Provider
mon-va.byteoversea.com | 147.160.190.227, 147.160.190.228, 71.18.74.198, 71.18.1.198 | United States (Virginia) | Akamai Edge Network
maliva-mcs.byteoversea.com | 184.25.58.58 | United States | Akamai Edge Network
api.trae.ai | 23.43.85.213, 23.43.85.216 | United States | Akamai Edge Network
api-sg-central.trae.ai | 23.43.85.219 | Singapore | Akamai Edge Network

ByteDance leverages Akamai's global edge network to distribute their telemetry collection infrastructure, allowing them to potentially manage data flows across multiple jurisdictions. [2] This relationship has been documented in network traffic analysis showing ByteDance services routing through Akamai's content delivery network infrastructure.

[2] Note: The relationship between ByteDance and Akamai has been confirmed through independent network analysis. In 2024, research from Kentik (reported by Data Center Dynamics) documented TikTok traffic shifting to "third-party CDNs provided by vendors such as Akamai and Fastly," establishing the business relationship between these companies.

Frequency of Data Collection

Our traffic capture revealed the frequency of network connections. The following timeline illustrates the regular pattern of telemetry transmissions:

18:37:09.878 - Telemetry POST to mon-va.byteoversea.com
18:37:39.930 - Server disconnect
18:37:44.366 - Server reconnect, new telemetry POST
18:38:41.014 - Server disconnect
18:39:10.110 - Server reconnect, new telemetry POST
18:39:40.872 - Server disconnect
18:40:09.880 - Server reconnect, new telemetry POST
18:41:07.887 - Server disconnect
18:41:09.880 - Server reconnect, new telemetry POST

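Across the capture, Trae connected to telemetry servers approximately every 30 seconds, even during periods of complete inactivity. In the excerpt above, successive POSTs are between roughly 35 and 86 seconds apart, and connections typically close about 30 or 60 seconds after a POST. Each reconnect involves a POST request to the telemetry endpoint, indicating regular data transmission regardless of user activity.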
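The spacing in the timeline can be checked directly. This short sketch computes the gaps between the five POST timestamps listed above; only the timestamps come from our capture, and the helper name is illustrative.

```python
from datetime import datetime
from typing import List

# POST timestamps copied from the timeline above.
POST_TIMES = [
    "18:37:09.878",
    "18:37:44.366",
    "18:39:10.110",
    "18:40:09.880",
    "18:41:09.880",
]


def gaps_in_seconds(timestamps: List[str]) -> List[float]:
    """Return the time between consecutive timestamps, in seconds."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(parsed, parsed[1:])]


post_gaps = gaps_in_seconds(POST_TIMES)
```

For this excerpt the gaps come out at about 34.5, 85.7, 59.8 and 60.0 seconds, so a detection rule tuned to a strict 30-second interval should allow for skipped or delayed reports.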

Feature Gates and Control Infrastructure

Our analysis uncovered ByteDance's sophisticated feature gate system that controls Trae's functionality remotely:

[18:32:43.459] Server connect bytegate-sg.byteintlapi.com:443 (23.43.85.219:443)
[::1]:62575: POST https://bytegate-sg.byteintlapi.com/api/v1/workspace/feature_gates/values HTTP/2.0
 << HTTP/2.0 200 OK 805b

[::1]:62573: GET https://lf3-static.bytednsdoc.com/obj/eden-cn/lkpkbvsj/ljhwZthlaukjlkulzlp/marketplace/controlUrl.json HTTP/2.0
 << HTTP/2.0 200 OK 11.3k

This system allows ByteDance to remotely enable or disable features, potentially targeting specific regions, users, or workspaces. The control infrastructure provides centralized management of the application's behavior, allowing ByteDance to modify functionality without pushing updates.

Data Flow to Microsoft Services

Interestingly, Trae also connects to Microsoft telemetry services, potentially as part of its Visual Studio Code core:

[18:46:41.393] Server connect mobile.events.data.microsoft.com:443 (20.189.173.18:443)
127.0.0.1:64412: POST https://mobile.events.data.microsoft.com/OneCollector/1.0?cors=true&content-type=application/x-json-stream
              << 200 OK 9b

This creates a scenario where user information may be subject to the telemetry policies of both ByteDance and Microsoft. This dual-layer architecture is consistent with how VS Code-based applications typically operate, as documented by Microsoft, but with the important distinction that ByteDance has added its own extensive telemetry layer on top of Microsoft's baseline collection.

Privacy Consideration: The combination of ByteDance telemetry and Microsoft telemetry creates a multi-layered tracking architecture that may exceed data minimization expectations in some privacy frameworks. While Microsoft publicly commits that user data "is not used to train foundation models," we found no equivalent public commitments from ByteDance regarding limitations on data use from Trae.

Business Model Analysis

The extensive data collection infrastructure in Trae can be analyzed in terms of common technology business models that leverage user data:

  1. Data Collection for AI Development - Free AI coding tools can potentially provide valuable training data from real-world code and development patterns, which could improve future AI models.
  2. User Experience Research - Data about how developers interact with AI tools can inform product improvements and feature development.
  3. Market Research - Usage patterns may provide insights into software development trends and tool preferences.
  4. Freemium Strategy - Free tools with comprehensive telemetry often lead to premium offerings or enterprise solutions in technology business models.

This approach of offering free tools with extensive telemetry capabilities reflects common practices in the technology industry, where data collection often forms part of the value exchange for free services.

API Endpoint Analysis

Our technical analysis identified the primary API endpoints used by Trae and their functions:

Endpoint | Function | Data Transmitted
/monitor_browser/collect/batch/ | Telemetry collection | Application state, user actions, performance metrics
/icube/api/v1/native/config/query | Configuration retrieval | Machine ID, application state, version information
/icube/api/v1/device/log/check | Device logging status | Device information, log configuration
/api/v1/workspace/feature_gates/values | Feature gate configuration | Workspace information, user context
/api/sdk/check_update | Update checks | Current version, build ID, user ID, platform
ws://127.0.0.1:51000/module/aicompletion/0 | AI completion WebSocket | Full file contents, user credentials, system information, editing activity
ws://127.0.0.1:51000/manager/ | Internal service manager | MessagePack-encoded snapshots with file contents, credentials, routing between internal services

The API structure reveals a sophisticated platform designed for comprehensive monitoring and remote control. The WebSocket endpoints further reveal how data flows internally between components before potentially being aggregated for external transmission.

TTPs for Enterprise Detection

For security teams looking to identify and monitor Trae's telemetry activities within their enterprise networks, Unit 221B has compiled the following Tactics, Techniques, and Procedures (TTPs) based on our analysis:

Network Indicators

Category | Indicators | Detection Guidance
Domain Patterns | *.byteoversea.com, *.trae.ai, *.byteintlapi.com, *.bytednsdoc.com | Configure network monitoring to flag connections to these domain patterns. While some legitimate ByteDance services may use these domains, in corporate environments without approved ByteDance applications, these connections may indicate unauthorized software.
Specific Endpoints | mon-va.byteoversea.com, maliva-mcs.byteoversea.com, api.trae.ai, api-sg-central.trae.ai, bytegate-sg.byteintlapi.com, lf3-static.bytednsdoc.com | These specific domains are directly associated with Trae's telemetry infrastructure. Monitor or block these endpoints in environments with stringent data security requirements.
IP Addresses | 23.43.85.0/24 (Akamai Edge Network), 147.160.190.224/28 (Akamai Edge Network), 71.18.74.198, 71.18.1.198, 184.25.58.58 | While these IPs are associated with Akamai's edge network and may change, persistent connections to these ranges combined with other indicators may help identify the application.
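The network indicators above can be checked programmatically. The sketch below is one possible approach using only the standard library. The ranges are written in normalized network form (23.43.85.0/24 and 147.160.190.224/28 are the networks that contain the observed addresses), and treating the bare apex domain as a match is our design choice rather than something the wildcard patterns strictly imply.

```python
import ipaddress

# Networks containing the addresses observed in our capture.
OBSERVED_RANGES = [
    ipaddress.ip_network("23.43.85.0/24"),
    ipaddress.ip_network("147.160.190.224/28"),
    ipaddress.ip_network("71.18.74.198/32"),
    ipaddress.ip_network("71.18.1.198/32"),
    ipaddress.ip_network("184.25.58.58/32"),
]

DOMAIN_SUFFIXES = ("byteoversea.com", "trae.ai", "byteintlapi.com", "bytednsdoc.com")


def in_observed_ranges(address: str) -> bool:
    """True if the IP address falls inside any observed range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in OBSERVED_RANGES)


def matches_domain_pattern(hostname: str) -> bool:
    """True if the hostname is a listed domain or one of its subdomains."""
    host = hostname.rstrip(".").lower()
    # Require a dot boundary so "nottrae.ai" does not match "trae.ai".
    return any(host == s or host.endswith("." + s) for s in DOMAIN_SUFFIXES)
```

Because the IP ranges belong to a shared CDN, an IP match alone is weak evidence; combining it with a hostname match reduces false positives.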

Traffic Patterns

Behavior | Detection Method
Regular 30-second POST intervals to ByteDance domains | Monitor for cyclical HTTP POST requests occurring approximately every 30 seconds to the domains listed above, particularly to mon-va.byteoversea.com/monitor_browser/collect/batch/ endpoints.
Persistent connections during idle periods | Look for sustained network connections to telemetry endpoints even when workstations are idle, particularly distinguishing from regular background processes.
204 No Content responses | The telemetry endpoints often respond with HTTP 204 (No Content) status codes, especially from the monitor_browser/collect/batch endpoint.
Multiple parallel connections | Trae establishes connections to multiple ByteDance domains simultaneously, creating a distinctive network signature of parallel connections to different ByteDance infrastructure.
Local WebSocket traffic on port 51000 | For endpoint monitoring solutions, watch for local WebSocket connections on port 51000, particularly to ws://127.0.0.1:51000/module/aicompletion/0 and ws://127.0.0.1:51000/manager/

HTTP Request Patterns

Telemetry Collection Requests

POST https://mon-va.byteoversea.com/monitor_browser/collect/batch/?biz_id=marscode_nativeide_us HTTP/2
[Headers]
content-type: application/json
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Trae/1.0.10282 Chrome/110.0.5481.52 Electron/26.6.0 Safari/537.36

[Body contains compressed JSON]

Configuration Requests

GET https://api.trae.ai/icube/api/v1/native/config/query?machineId=[HASH]&platform=darwin&version=1.0.10282&language=en-US HTTP/2
[Headers]
user-agent: Trae/1.0.10282 Electron/26.6.0
accept: application/json
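When reviewing proxy logs, it helps to break a configuration request into its parts so the identifying parameters stand out. This is a minimal sketch using the URL shape shown above; `[HASH]` is the placeholder from the sample request, and the function name is illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# URL shape copied from the request above; "[HASH]" is a placeholder.
CONFIG_URL = (
    "https://api.trae.ai/icube/api/v1/native/config/query"
    "?machineId=[HASH]&platform=darwin&version=1.0.10282&language=en-US"
)


def summarize_request(url: str) -> dict:
    """Split a logged URL into host, path and a flat dict of query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value for readability.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"host": parts.hostname, "path": parts.path, "params": params}


summary = summarize_request(CONFIG_URL)
```

Grouping logged requests by `summary["params"]["machineId"]` would show how consistently the same device identifier accompanies requests over time.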

Detection Implementation

Firewall Rules

For organizations looking to restrict Trae's telemetry, consider implementing the following firewall rules:

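# Vendor-neutral pseudo-syntax; translate to your firewall or DNS filter's rule language
# Block primary telemetry domains (wildcards cover every subdomain)
block domain *.byteoversea.com
block domain *.trae.ai
block domain *.byteintlapi.com
block domain *.bytednsdoc.com

# Block specific high-value endpoints (for filters without wildcard support)
block domain mon-va.byteoversea.com
block domain maliva-mcs.byteoversea.com
block domain api.trae.ai
block domain api-sg-central.trae.ai
block domain bytegate-sg.byteintlapi.com
block domain lf3-static.bytednsdoc.com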

SIEM Detection Rules

For Security Information and Event Management (SIEM) systems, implement the following detection logic:

# Pseudo-code for SIEM rule
rule "Trae Telemetry Detection" {
    events:
        $e1 = network_connection(
            destination_domain MATCHES "*.byteoversea.com" OR
            destination_domain MATCHES "*.trae.ai" OR
            destination_domain MATCHES "*.byteintlapi.com"
        )

    # Evaluate each host/destination pair separately so that parallel
    # connections to different domains do not distort the gap check
    group_by: $e1.source_ip, $e1.destination_domain

    timewindow: 2 minutes

    condition:
        # Look for the characteristic 30-second interval pattern
        count($e1) >= 3 AND
        # Check for consistent time gaps between events
        (max_time_between($e1) <= 40 seconds AND min_time_between($e1) >= 25 seconds)

    actions:
        alert("Potential Trae telemetry traffic detected from " + $e1.source_ip)
}
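For teams without a SIEM rule language at hand, the same logic can be prototyped offline against exported connection logs. The following is a sketch under stated assumptions: events arrive as (epoch seconds, source IP, destination domain) tuples, the thresholds mirror the pseudo-code above, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WATCHED_SUFFIXES = ("byteoversea.com", "trae.ai", "byteintlapi.com")
Event = Tuple[float, str, str]  # (epoch seconds, source IP, destination domain)


def is_watched(domain: str) -> bool:
    """True if the domain equals or is a subdomain of a watched suffix."""
    host = domain.rstrip(".").lower()
    return any(host == s or host.endswith("." + s) for s in WATCHED_SUFFIXES)


def find_beacons(
    events: Iterable[Event],
    window: float = 120.0,
    min_gap: float = 25.0,
    max_gap: float = 40.0,
    min_events: int = 3,
) -> List[Tuple[str, str]]:
    """Return (source_ip, domain) pairs that show evenly spaced connections."""
    by_flow: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for timestamp, source_ip, domain in events:
        if is_watched(domain):
            by_flow[(source_ip, domain.rstrip(".").lower())].append(timestamp)

    alerts = []
    for flow, times in by_flow.items():
        times.sort()
        # Examine every run of min_events consecutive connections.
        for start in range(len(times) - min_events + 1):
            run = times[start:start + min_events]
            gaps = [later - earlier for earlier, later in zip(run, run[1:])]
            in_window = run[-1] - run[0] <= window
            if in_window and all(min_gap <= gap <= max_gap for gap in gaps):
                alerts.append(flow)
                break
    return alerts
```

Grouping by source and destination before measuring gaps keeps unrelated connections from masking the periodic pattern. Widening `max_gap` may be worthwhile where reports are occasionally skipped.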

Process and File Indicators

For endpoint detection systems, monitor for these process and file indicators:

  • Process name: Trae or Trae Helper
  • Binary paths:
    • macOS: /Applications/Trae.app/Contents/MacOS/Trae
  • Configuration storage:
    • macOS: ~/Library/Application Support/Trae/
  • Device identifier storage: Look for files containing persistent machine IDs or device tracking information in the application's storage directories
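The file indicators above lend themselves to a quick presence check on a macOS workstation. This sketch only tests whether the listed paths exist; the paths are copied from the list, and the function and dictionary names are illustrative.

```python
from pathlib import Path
from typing import Dict

# macOS paths copied from the indicator list above.
INDICATOR_PATHS = {
    "binary": Path("/Applications/Trae.app/Contents/MacOS/Trae"),
    "config_dir": Path.home() / "Library" / "Application Support" / "Trae",
}


def check_indicators(paths: Dict[str, Path] = INDICATOR_PATHS) -> Dict[str, bool]:
    """Report which indicator paths exist on this machine."""
    return {name: path.exists() for name, path in paths.items()}
```

A positive result shows only that the application is or was installed. It says nothing about whether it is currently running or transmitting.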

Data Leakage Prevention

To prevent potential data leakage through Trae, security teams should consider:

  1. Network Segmentation: Place development workstations using AI coding tools in isolated network segments with controlled internet access
  2. Transparent Proxies: Implement TLS inspection for connections to known telemetry domains to monitor data transmission content
  3. Endpoint Controls: Use application allowlisting to prevent unauthorized installation of AI coding tools
  4. Traffic Anomaly Detection: Monitor for abnormal data transfer patterns, especially to CDN and edge networks that might serve as exfiltration paths
  5. Regular Telemetry Audits: Periodically analyze network traffic from development environments to identify new or changing telemetry patterns

These TTPs should help security teams effectively detect and monitor Trae's telemetry activities within their networks, enabling informed decisions about the use of such tools in their environments based on their specific security requirements and risk tolerance.

Conclusion

Our technical analysis of Trae documents an application with comprehensive telemetry capabilities integrated throughout its architecture. The application provides Claude 3.7 Sonnet and GPT-4o AI features while maintaining regular network connections to ByteDance servers via multiple endpoints.

The regular cadence of these connections and the variety of data categories being collected indicates a significant investment in telemetry infrastructure, likely supporting both product improvement and user analytics objectives. This approach reflects common practices in free AI tools where data collection often forms part of the underlying business model.

From a security perspective, Trae represents a case study in how modern AI development tools can function as sophisticated data collection platforms. The multi-layered telemetry architecture, redundant data flows, and persistent tracking capabilities demonstrate advanced techniques that security professionals should be aware of when evaluating tools for sensitive development environments.

The use of AI coding tools requires careful consideration of data collection practices. At Unit 221B, we believe in empowering developers and organizations to make informed decisions about the tools they use. Understanding these data flows is crucial for managing sensitive data and maintaining control over development environments. Weighing the convenience of AI tools against their data handling practices should be an informed choice.

This analysis documents the application behavior observed through technical analysis of network traffic conducted in March 2025. The observed behavior and configurations may change in future software updates.

Unit 221B will continue monitoring the evolution of AI coding assistant tools and their security implications as part of our ongoing threat intelligence work. The balance between AI capability and data collection practices represents one of the key security considerations for development teams in 2025.

Sources and References

This analysis is based on technical analysis conducted by our research team, but several external sources provide supporting context:

  1. ByteDance's Telemetry Infrastructure: Previous security research has documented ByteDance's use of proprietary telemetry frameworks including Slardar, which aligns with our findings. The Citizen Lab's analysis of TikTok and Douyin (2021) documented similar patterns of "first-party trackers" that collect device information, with more extensive data collection in ByteDance's Chinese-market applications.
  2. VS Code Telemetry Architecture: As a VS Code fork, Trae inherits core elements of VS Code's telemetry system, which Microsoft documents. VS Code's multi-layered approach to telemetry includes usage data, error telemetry, and crash reports. Our analysis shows Trae extends this architecture with additional ByteDance-specific components.
  3. WebSocket Protocol Patterns: The internal WebSocket communications using both JSON and MessagePack formats align with known VS Code architecture patterns. progrium's vscode-protocol project has previously documented similar communication patterns in the VS Code protocol.
  4. Microsoft's Commercial Data Collection Practices: Microsoft's published statements on data collection in their AI coding tools provide industry context for comparison. Unlike Microsoft's commitments that "your data is not used to train foundation models," the multi-layered telemetry in Trae lacks similar transparency around data usage limitations.
  5. Industry Context for AI Coding Tools: Media reports in publications like Visual Studio Magazine have confirmed that Trae is indeed developed by ByteDance and distributed through their Singapore subsidiary SPRING(SG)PTE.LTD, corroborating our attribution findings.

For organizations interested in further research on telemetry patterns in AI coding tools, we recommend conducting network traffic analysis with appropriate tools to observe telemetry endpoints and data collection patterns through direct observation.

At Unit 221B, we believe in providing security professionals and developers with technical insights needed to make informed technology decisions. Understanding the data collection capabilities and communication patterns in modern AI development tools is essential when evaluating their appropriate use, particularly in environments with specific security or privacy requirements. This analysis aims to contribute to that understanding by documenting observable telemetry mechanisms in popular development tools.