Analysis of Key Features in Network Traffic and Research into Novel Identification Techniques

Date: May 18, 2023
Author: Hasan
Topic: Network Security Research

Table of Contents

Abstract
1. Introduction
2. Analysis and Identification Challenges Faced by Network Infrastructure
3. Key Identification Characteristics of Specific Communication Protocols
4. Debunking Common Misconceptions and TLS Connection Risk Assessment
- 4.1 Debunking Common Misconceptions
- 4.2 TLS Connection Risk Assessment
5. Novel Identification Technique: Application of "Client-Side Reverse Probing"
6. Analysis of High-Performance Communication Protocol Technical Characteristics
7. Conclusion

Disclaimer: The technical analysis and discussion presented in this paper are strictly for academic research purposes, aimed at enhancing the understanding of network protocol technological characteristics. Any network activities must comply with relevant laws and regulations, and no illegal actions should be undertaken. The author does not assume responsibility for actions performed by any individual based on the content of this document.

Abstract

In increasingly complex network environments, some communication traffic uses specific protocol encapsulation to meet differing network management or security policy requirements. This paper analyzes the technical characteristics by which specific communication protocols can be identified (e.g., the communication patterns of MTProto FakeTLS and Shadow TLS under specific configurations) and proposes a novel identification technique for network infrastructure such as network monitoring devices and Deep Packet Inspection systems. It first outlines the challenges network infrastructure faces in traffic analysis and identification. It then details the typical observable features of specific communication protocols in terms of connection lifetime, traffic direction and volume, connection count, and communication pattern. Building on this, the paper challenges the common assertion that TLS encryption makes traffic completely unidentifiable, and assesses identification risk based on the TLS handshake, SNI, TLS version, and server certificate. Finally, it introduces a novel identification technique named "Client-Side Reverse Probing." The method simulates a legitimate server interaction and analyzes the client's response to an abnormal communication sequence, which makes it possible to identify communication patterns unique to specific protocols such as Shadow TLS v2, whose authentication is unidirectional (the server authenticates the client, while the client's validation of the server is insufficient). A practical case study is used to examine the effectiveness of this technique, and the paper closes with an overview of communication protocols that offer significant performance advantages in specific technical scenarios.

Keywords: Communication Protocol Feature Recognition, Network Monitoring Technology, TLS Fingerprinting, SNI, FakeTLS, Shadow TLS, QUIC, Network Probing Techniques

1. Introduction

Any network communication protocol, regardless of its design, may possess identifiable characteristics due to its inherent technical features.

In today's interconnected landscape, needs such as network traffic management, data transmission optimization, and privacy protection have spurred the evolution of numerous advanced communication protocols. These protocols optimize their communication patterns to navigate traffic management and Deep Packet Inspection (DPI) technologies prevalent in network environments. Network infrastructure components, such as firewalls, load balancers, and Intrusion Detection Systems (IDS), play pivotal roles in analyzing and managing network traffic. However, the evolution of communication protocols also presents significant challenges to traditional traffic identification methods. This research focuses on analyzing the intrinsic technical characteristics of specific communication protocols, evaluating the efficacy of existing identification techniques, and exploring more robust novel techniques to address the increasingly complex problem of network traffic recognition.

The objective of this research is to gain a deep understanding of the underlying technical principles of communication protocols and to enhance technical capabilities in network security analysis. This work does not constitute any encouragement or guidance for illegal network activities. Network communication behavior must always adhere to the laws and regulations of the respective jurisdiction.

2. Analysis and Identification Challenges Faced by Network Infrastructure

Network infrastructure typically employs the following primary strategies for analyzing and identifying traffic:

2.1 Passive Analysis

These methods do not actively intervene in network traffic but analyze it solely by observing existing communication patterns.

Traffic Feature Analysis: Focuses on observable characteristics such as protocol behavior, packet structure, and temporal statistics.
Technical Characteristic Analysis: Uses implementation-specific signatures of a protocol (as typically demonstrated in a Proof-of-Concept, PoC), and is particularly effective on protocol interactions in the early communication stages (e.g., the TLS ClientHello).

2.2 Active Probing

"Active Probing" in this context is an analytical method used for academic research, referring to the act of sending specific messages and observing the target system's response to infer its true information or protocol type.

Protocol Interaction Simulation: For example, actively probing specific communication protocols (such as Shadowsocks, V2Ray, etc., used here purely as technical analysis examples).
TLS Handshake Analysis: Because the TLS 1.3 handshake is encrypted after the ServerHello, the server certificate cannot be observed passively. The analysis device instead connects to the server itself and retrieves the certificate in order to identify the protocol.

2.3 Packet Replay

Involves re-injecting previously captured legitimate traffic packets into the network and observing whether the target exhibits an anomalous response.

3. Key Identification Characteristics of Specific Communication Protocols

Traffic from certain communication protocols often reveals non-typical communication patterns across multiple dimensions. These patterns become critical indicators for network analysis devices:

3.1 Long-lived Connections

Unlike the short-lived connection pattern of most Web applications (especially HTTP/1.x), many communication protocols employing long-connection mechanisms tend to maintain TCP connections for extended durations, facilitating continuous data transfer and session management.

3.2 Bidirectional Flow

The vast majority of Web applications (e.g., HTTP) follow a "request-response" model: the client sends relatively small requests and the server returns the bulk of the data, so traffic is dominated by one direction. This asymmetric pattern accounts for a high proportion of Web usage. Conversely, certain specific communication protocols (especially when using WebSocket) often exhibit pronounced bidirectional data exchange. If a specific communication protocol masquerades as a Web service but shows high bidirectionality, this may serve as an identification clue.

3.3 High Volume Traffic

Even when a specific communication protocol is camouflaged as an application such as a Web chat room, the genuine application does not usually generate the massive data throughput that the protocol does. Anomalous overall traffic volume is therefore an important detection metric.

3.4 Numerous Connections

If a single client establishes a substantial number of concurrent TCP connections with the same target (via domain name or IP address) within a short period, and these connections do not align with typical Web browsing behavior (e.g., a single webpage creates only one WebSocket connection), the suspicion level rises significantly.

3.5 Point-to-Point Communication Pattern

When a client establishes a large volume of encrypted, long-lived communications with a specific target (via domain name or IP address), particularly without other legitimate justification, this communication pattern appears anomalous and is readily flagged as characteristic of a specific communication protocol.

Note: These characteristics are present in many communication protocols. Individual characteristics can be adjusted, but eliminating the patterns entirely is technically very difficult.
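
The indicators in this section can be combined into a simple per-pair summary. The following is a minimal sketch, not a production detector: the `FlowRecord` schema, the function name, and every threshold are hypothetical values chosen for illustration, and the connection count is taken over whatever observation window the caller supplies rather than over strictly concurrent connections.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Summary of one TCP connection as seen by a passive observer (hypothetical schema)."""
    client: str
    server: str
    start: float      # seconds since some epoch
    end: float        # seconds since the same epoch
    bytes_up: int     # client -> server
    bytes_down: int   # server -> client


def flow_indicators(flows, client, server,
                    long_lived_s=600.0,          # assumed threshold
                    min_total_bytes=50_000_000,  # assumed threshold
                    max_connections=20,          # assumed threshold
                    symmetry_floor=0.3):         # assumed threshold
    """Report which of the section 3 indicators fire for one client/server pair."""
    pair = [f for f in flows if f.client == client and f.server == server]
    if not pair:
        return {}
    up = sum(f.bytes_up for f in pair)
    down = sum(f.bytes_down for f in pair)
    larger = max(up, down)
    # Symmetry is 0 for purely one-directional traffic and 1 for equal volume both ways.
    symmetry = min(up, down) / larger if larger else 0.0
    return {
        "long_lived": any(f.end - f.start >= long_lived_s for f in pair),
        "bidirectional": symmetry >= symmetry_floor,
        "high_volume": up + down >= min_total_bytes,
        "many_connections": len(pair) >= max_connections,
    }
```

No single indicator is conclusive on its own; the sketch only makes explicit that each one is a cheap statistic over connection metadata and needs no access to payload contents.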

4. Debunking Common Misconceptions and TLS Connection Risk Assessment

4.1 Debunking Common Misconceptions

"Simple application of TLS enables complete anonymity and unidentifiability": This is a pervasive misconception. While TLS encryption ensures data confidentiality, elements such as the handshake process, record padding, and traffic patterns can still leak information. See the subsequent TLS Connection Risk Assessment for details.
"The impact of whitelisting specific IPs/domains on analysis": This perspective is limited. Firewalls, acting as middleboxes, can observe and process all incoming and outgoing traffic. Theoretically, a firewall possesses the capability to forge source IP addresses. Paradoxically, whitelisting specific IPs or domains, absent a legitimate technical justification, might draw more analytical scrutiny to that communication.
"You might not have been affected by specific network traffic management measures": This experience may stem from the fact that the user's traffic characteristics have not yet been identified by existing detection mechanisms, or the network environment they are in has not implemented concentrated traffic analysis against them.
"Domain name filing in mainland China is a prerequisite for providing certain services, but this does not inherently mean that all cross-border data transmission or network behaviors are free from surveillance."
- The existence of domestic registered domain names (e.g., ".cn") is primarily aimed at complying with local regulations to provide Web services within mainland China. Scrutiny and restrictions on cross-border data transmission are generally less related to the domain's filing status and more dependent on the content and destination of the transmission.

4.2 TLS Connection Risk Assessment

The security of a TLS connection is not absolute; various stages of the handshake process can serve as identification points:

ClientHello: The TLS ClientHello message contains a set of parameters, such as supported cipher suites, TLS version, compression methods, and various extensions (like SNI). The combination of these parameters forms a "TLS Fingerprint." Using the default TLS fingerprint of standard libraries (such as Go's default TLS library, curl, wget), when paired with the traffic characteristics mentioned above, is highly likely to be identified as characteristic of a specific communication protocol.
- Empirical Observation: Studies suggest that network censorship systems in certain regions actively block connections to non-whitelisted SNIs when the client presents an uncommon TLS fingerprint (such as that of curl or wget), while allowing traffic with Go TLS or browser fingerprints to pass.
- Technical Analysis: A TLS fingerprint that closely matches that of a legitimate end-user device, derived from an analysis of real user behavior data, can help evade simple fingerprint-based identification.
- Update Risk: It must be noted that the libraries or samples used to generate the fingerprint might be outdated, and the contained browser fingerprints could be obsolete, which itself could become a new identification risk. Maintaining up-to-date TLS implementations or selecting actively maintained projects remains the direction for technical optimization.
Server Name Indication (SNI): The SNI extension is transmitted in plaintext during the early stages of the TLS handshake, indicating the domain name requested by the client.
- Domain Characteristics: Using free domains, short-term registered domains, or unusual domains carries a higher characteristic risk than using commercially registered, reputable domains.
TLS Connection Version:
- TLS 1.3: High difficulty of identification. All handshake messages after the ServerHello are encrypted, so middlebox analysis equipment must actively probe the server to retrieve its certificate.
- TLS 1.2: Medium difficulty of identification. The certificate exchange is transmitted in plaintext, allowing middlebox analysis equipment to readily intercept it for validation.
- SSLv3/TLS 1.0/TLS 1.1: Low difficulty of identification but poses security risks. These older protocols are not recommended due to known security vulnerabilities, and traffic samples for these older versions are extremely rare due to the widespread adoption of TLS 1.2 and later.
TLS Server Certificate:
- Risk Levels:
  - Self-signed Certificate: Highest risk value.
  - Free Certificates (Cloudflare Free SSL, Let's Encrypt): High risk value. Due to their ease of acquisition, they are widely used by many communication protocols and are easily associated with suspicious traffic.
  - Commercial Certificates: Lower risk value. Certificates purchased separately and at a higher cost are less likely to be used for specific communication protocols.
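
To make the notion of a "TLS Fingerprint" concrete, the sketch below builds a fingerprint in the style of the publicly documented JA3 method: the ClientHello version, cipher suites, extensions, supported groups, and point formats are joined into one string and hashed with MD5, with GREASE placeholder values (RFC 8701) left out. It assumes the ClientHello has already been parsed into lists of integers; the function names are illustrative and this is not necessarily the fingerprinting method used by any particular monitoring system.

```python
import hashlib

# GREASE values defined in RFC 8701: 0x0a0a, 0x1a1a, ..., 0xfafa.
GREASE = {(i << 12) | 0x0A00 | (i << 4) | 0x0A for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Join the ClientHello parameters into a JA3-style string, skipping GREASE values."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(curves),
        join(point_formats),
    ])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3-style string (32 hex characters)."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()
```

For example, `ja3_string(771, [0x0a0a, 4865, 4866], [0, 10, 11], [29, 23], [0])` yields `771,4865-4866,0-10-11,29-23,0`. Because the order of cipher suites and extensions is preserved, two clients that offer the same parameters in a different order produce different fingerprints, which is why a library's default ClientHello is so recognizable.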

5. Novel Identification Technique: Application of "Client-Side Reverse Probing" in Protocol Feature Recognition (e.g., MTProto FakeTLS)

5.1 Technical Dissection

Protocols such as MTProto FakeTLS and Shadow TLS (v1/v2) incorporate specific mechanisms in their design:

MTProto FakeTLS Client Validation: MTProto FakeTLS validates the client by calculating an HMAC-SHA256 over the ClientHello message (excluding the random field), using the secret as the key. Because the random field differs on every connection, each handshake is intended to be valid only once. If validation fails, the server falls back to handing the connection to a real Web server.
- Identification Method: The MTProto FakeTLS handshake may deviate from standard practice in certain details. It can be identified by capturing the third message sent by the server (`hostCert`) and checking whether its length falls within a fixed range (1024–4096 bytes, per the faketls source code of the mtg project).
Shadow TLS v1 Characteristics: Shadow TLS v1 lacks a client validation mechanism, performing only server validation.
- Identification Method: Due to the absence of client validation, when a standard TLS request is used to analyze the server, the server will process the request directly without requiring a specific response, making identification relatively straightforward.
Shadow TLS v2 Client Validation: Shadow TLS v2 validates the client with a challenge-response scheme: the raw data returned by the server serves as the challenge, and the client computes an HMAC over it with the password as the response. This mechanism is more secure than that of MTProto FakeTLS because it relies on the randomness of the data returned by the server and so achieves "one-time authentication." Analysis equipment cannot generate the correct challenge response without knowing the password.
- Security Feature: More secure than the MTProto handshake mechanism as it effectively prevents replay or manipulation attacks on the handshake data.
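
The two client-validation schemes described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the wire format of either project: it assumes the FakeTLS-style digest is carried in the 32-byte random field of the ClientHello (at offset 11 of the TLS record), it omits any additional fields a real implementation may mix in, and the hash function and 8-byte truncation used for the challenge response are assumptions. All function names are hypothetical.

```python
import hashlib
import hmac

RANDOM_OFFSET = 11  # 5-byte record header + 4-byte handshake header + 2-byte version
RANDOM_LEN = 32


def faketls_style_digest(secret: bytes, client_hello: bytes) -> bytes:
    """HMAC-SHA256 over the ClientHello with its random field zeroed out."""
    if len(client_hello) < RANDOM_OFFSET + RANDOM_LEN:
        raise ValueError("ClientHello too short")
    masked = bytearray(client_hello)
    masked[RANDOM_OFFSET:RANDOM_OFFSET + RANDOM_LEN] = bytes(RANDOM_LEN)
    return hmac.new(secret, bytes(masked), hashlib.sha256).digest()


def faketls_style_verify(secret: bytes, client_hello: bytes) -> bool:
    """Server side: compare the presented random field with the expected digest."""
    expected = faketls_style_digest(secret, client_hello)
    presented = client_hello[RANDOM_OFFSET:RANDOM_OFFSET + RANDOM_LEN]
    return hmac.compare_digest(expected, presented)


def challenge_response(password: bytes, server_data: bytes, length: int = 8) -> bytes:
    """Shadow TLS v2 style: truncated HMAC over data received from the server.

    The server data acts as the challenge; SHA-1 is an assumption of this sketch.
    """
    return hmac.new(password, server_data, hashlib.sha1).digest()[:length]


def hostcert_length_in_range(length: int) -> bool:
    """Length heuristic for the third server message, using the range quoted above."""
    return 1024 <= length <= 4096
```

The difference between the two designs is visible in the inputs: the first digest depends only on bytes the client itself chose, while the second depends on bytes chosen by the server, so a captured response cannot be reused against a fresh challenge.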

5.2 Novel Identification Technique: Client-Side Reverse Probing

The security design of many communication protocols primarily focuses on "server authentication of the client," while the client's validation of the server is often overlooked, leading to insufficient or unidirectional authentication.

Network analysis devices can leverage this vulnerability by simulating a normal server interaction process and analyzing the client's response behavior to an anomalous communication sequence. This reverse probing mechanism aims to distinguish whether the client's behavior is consistent with a truly legitimate endpoint device.

5.3 Case Study: Identifying Shadow TLS v2 Traffic Characteristics

Protocol Design Reference: Shadow TLS Official Documentation

Identification Principle:

Connection Capture: The network analysis device randomly selects an ongoing TCP connection.
ClientHello Identification: The ClientHello message sent by the client is captured.
SNI Correlation: Based on the SNI field in the ClientHello, the traffic of this connection is redirected to the actual TLS server that the target SNI points to.
Handshake and Response Simulation:
- When the Shadow TLS v2 client receives the handshake response from the actual TLS server, it will insert 8 bytes of challenge response (an HMAC that it computes over the data it previously received from the server) before the Application Data.
- At this point, because the target server is a genuine TLS server and not a simulated Shadow TLS server, the Shadow TLS client will complete the TLS handshake through its built-in logic.
Triggering Anomalous Response: However, when the client attempts to send Application Data containing the HMAC signature, the real TLS server will send an Alert of type unexpected_message (10) or decode_error (50), because the format and encryption of the data do not conform to the standard TLS protocol, and will subsequently terminate the TCP connection (via FIN or RST).
Successful Identification: By observing the aforementioned anomaly-terminated connection, the communication behavior where the client exhibited Shadow TLS v2 protocol characteristics can be accurately identified. Analyzing this process helps in understanding the protocol's behavior in different network environments, providing a reference for building more robust traffic identification and analysis models.
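
The ClientHello Identification and SNI steps above depend on reading the plaintext server name out of the first TLS record. A minimal sketch of that parsing step follows; it assumes the whole ClientHello arrives in a single TLS record, and the helper `build_client_hello` exists only to produce sample input for the example.

```python
import struct


def extract_sni(record: bytes):
    """Return the host name from a ClientHello's SNI extension, or None."""
    # 0x16 = handshake record, 0x01 = ClientHello handshake message.
    if len(record) < 9 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 9 + 2 + 32  # record header, handshake header, version, random
    try:
        pos += 1 + record[pos]                                # session_id
        pos += 2 + struct.unpack_from("!H", record, pos)[0]   # cipher_suites
        pos += 1 + record[pos]                                # compression_methods
        ext_total = struct.unpack_from("!H", record, pos)[0]
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # list length (2), name type (1), name length (2), name
                if record[pos + 2] != 0:
                    return None
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5:pos + 5 + name_len].decode("ascii", "replace")
            pos += ext_len
    except (IndexError, struct.error):
        return None
    return None


def build_client_hello(sni: str) -> bytes:
    """Build a minimal ClientHello record carrying only an SNI extension (sample data)."""
    name = sni.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name
    ext_data = struct.pack("!H", len(entry)) + entry
    ext = struct.pack("!HH", 0, len(ext_data)) + ext_data
    body = (b"\x03\x03" + bytes(32) + b"\x00"
            + struct.pack("!H", 2) + b"\x13\x01"
            + b"\x01\x00"
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake
```

Calling `extract_sni(build_client_hello("example.com"))` returns `"example.com"`, while a record that is not a ClientHello returns `None`.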

Key Advantage: Specific communication protocols often generate a large number of concurrent connections. Therefore, randomly sampling only a small number of connections for the client-side reverse probing described above allows for efficient and relatively accurate identification of communication behavior exhibiting this protocol's characteristics.
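
The sampling argument can be made quantitative with a simple model. The sketch below assumes that each sampled connection independently belongs to the protocol with probability `fraction`, which is an idealization introduced here and not a measurement from this paper; both function names are hypothetical.

```python
import math


def detection_probability(fraction: float, samples: int) -> float:
    """Probability that at least one of `samples` probed connections uses the protocol."""
    if not 0.0 <= fraction <= 1.0 or samples < 0:
        raise ValueError("fraction must be in [0, 1] and samples non-negative")
    return 1.0 - (1.0 - fraction) ** samples


def samples_needed(fraction: float, target: float) -> int:
    """Smallest number of samples whose detection probability reaches `target`."""
    if not 0.0 < fraction < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("fraction and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))
```

Under this model, if one in ten of a client's connections uses the protocol, 29 sampled connections are enough to reach a 95% chance of probing at least one of them, and the number falls quickly as the fraction grows.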

6. Analysis of High-Performance Communication Protocol Technical Characteristics

From a purely technical research perspective, QUIC, a transport protocol built on top of UDP, offers faster connection establishment, improved congestion control, and better tolerance of packet loss. Because it is encapsulated in UDP datagrams, traffic identification methods built around TCP do not apply to it directly.
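
As an illustration of how QUIC's encapsulation differs from TCP-based traffic, the sketch below reads the unencrypted fields of a QUIC long header (RFC 9000) from a UDP payload: the header form bit, the version, and the connection IDs. The function name is hypothetical, and the packet-type names apply to QUIC version 1 only.

```python
import struct

# Long packet type codes as defined for QUIC version 1 (RFC 9000).
LONG_PACKET_TYPES = {0: "Initial", 1: "0-RTT", 2: "Handshake", 3: "Retry"}


def parse_quic_long_header(datagram: bytes):
    """Return the visible long-header fields of a UDP payload, or None."""
    if len(datagram) < 7:
        return None
    first = datagram[0]
    if not first & 0x80:  # header form bit clear: short header, nothing to parse here
        return None
    version = struct.unpack_from("!I", datagram, 1)[0]
    dcid_len = datagram[5]
    dcid_end = 6 + dcid_len
    if dcid_end >= len(datagram):
        return None
    scid_len = datagram[dcid_end]
    scid_end = dcid_end + 1 + scid_len
    if scid_end > len(datagram):
        return None
    if version == 0:
        packet_type = "Version Negotiation"
    else:
        packet_type = LONG_PACKET_TYPES[(first >> 4) & 0x03]
    return {
        "version": version,
        "fixed_bit": bool(first & 0x40),
        "packet_type": packet_type,
        "dcid": datagram[6:dcid_end],
        "scid": datagram[dcid_end + 1:scid_end],
    }
```

Only these few header fields are readable without deriving keys; everything that follows is protected, which is why analysis of QUIC traffic tends to rely on the Initial packet and on flow-level statistics.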

For users with specific technical requirements (e.g., needing to test the performance and anti-interference capabilities of various transport protocols in a scientific research lab environment), the QUIC protocol and its related implementations (such as Hysteria, TUIC, etc.) warrant further research and technical characteristic observation due to their advanced technical architecture.

Please Note: Any network communication activities must adhere to the laws and regulations of the respective region. This paper discusses protocol characteristics solely from a technical viewpoint and does not constitute advice or encouragement for any illegal network activities.

7. Conclusion

This paper systematically analyzed the key technical characteristics exhibited by specific communication protocols and examined the points in a TLS connection that expose it to identification. Building on existing traffic identification methods, we proposed a "Client-Side Reverse Probing" mechanism, which simulates server interaction and analyzes the client's reaction to server responses under specific conditions, thereby identifying the communication patterns of specific protocols such as Shadow TLS v2. The case study indicates that this method can identify the communication behavior of such protocols. As network monitoring technology continues to evolve, the security of communication protocols requires ongoing attention and improvement. Future research will continue to explore more advanced traffic identification and analysis techniques while promoting the development of more robust communication protocols.