Fake Base Station Detection
Summary of research project conducted on the POWDER platform
The work is led by Dr. Imtiaz Karim, a postdoc at Purdue University, and involves Professor Bertino and graduate student Kazi Mubasshir. It has been partially supported by NSF grant 2112471 – AI Institute for Future Edge Networks and Distributed Intelligence (AI-EDGE) and by a supplement from the PAWR program.
Cellular networks have become an integral part of our lives, and we rely on them for a range of applications, from making phone calls to accessing the internet. However, the proliferation of stingrays, also known as fake base stations, poses a significant security threat to the users of these networks.
Stingray is the generic name for an electronic surveillance tool that simulates a cell phone tower to force mobile phones and other devices to connect to it instead of to a legitimate cell tower. Phones periodically and automatically broadcast their presence to the nearest cell tower so that the phone carrier’s network can provide them with service in that location. A stingray masquerades as a cell tower in order to get phones to ping it instead of legitimate cell towers and, in doing so, reveal their identities. In the past, it did this by emitting a signal that was stronger than the signal generated by the legitimate cell towers around it. Authentication was introduced to counter this, but security researchers found that a stingray can still collect user data before the phone determines that it is not communicating with an authentic cell tower and switches to one that is authenticated. During that window, the phone or other device reveals information about itself and its user to the operator of the stingray. Other common names for the tool are “cell-site simulator”, “IMSI catcher”, and “fake base station”.
Attackers can use stingrays/fake base stations as stepping stones to launch various multi-step attacks, such as signal counterfeiting, numb attacks, detach/downgrade attacks, energy depletion attacks, and panic attacks. These attacks can cause severe damage to individuals, organizations, and even governments, so it is important to detect fake base stations hiding in a cellular network. Prior efforts have used network scanning devices to scan for malicious signals and thereby identify fake base stations. This approach is highly impractical because of the huge infrastructure cost of deploying such devices across every area that needs coverage.
A more practical alternative is an in-device solution that detects fake base stations from the network traces observed by the phone itself, and machine learning algorithms have proven to be very successful at this task. This approach requires high-quality datasets, but creating one for fake base station detection is challenging: it is illegal to operate fake base stations in public areas, and there are no publicly available datasets that meet our requirements. We therefore chose POWDER, which provides a controlled environment in which to simulate different cellular network scenarios and create a high-quality dataset for training machine learning models.
We created various topologies in POWDER and ran experiments to capture the packets transmitted between mobile devices and base stations, both legitimate and fake. Each packet has multiple fields, and each field has multiple attributes. Relevant features were extracted from the packets to represent the characteristics of the network traffic. These include parameters such as signal strength, packet timing, protocol usage, sequence patterns, and statistical properties. Careful consideration was given to selecting features that effectively capture the differences between legitimate and fake base stations. The extracted features must then be represented in a format that machine learning algorithms can process. This typically involves organizing the features into a table in which each row corresponds to a sample (packet trace) and each column represents a specific feature. Additional preprocessing steps, such as normalization and dimensionality reduction techniques like Principal Component Analysis (PCA), were applied to improve the dataset representation. The unprocessed dataset has a size of 2.5 gigabytes. After preprocessing, we have a tabular dataset with 200 features and 2,500 rows, which is sufficient for classical machine learning algorithms to learn to detect fake base stations from a network trace and to generalize to new traces.
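The tabular representation and normalization described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual feature extractor: the packet record layout (timestamp, protocol, signal strength), the protocol names, the sample values, and the helper names `extract_features` and `zscore_columns` are all assumptions made for the example. The real dataset has 200 features, while this sketch computes five.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical packet records: (timestamp_s, protocol, signal_strength_dbm).
# The layout and values are illustrative and are not taken from the POWDER dataset.
trace_a = [(0.00, "RRC", -80.0), (0.05, "NAS", -81.0), (0.30, "RRC", -79.0)]
trace_b = [(0.00, "RRC", -60.0), (0.01, "RRC", -59.0), (0.02, "NAS", -61.0)]

PROTOCOLS = ("RRC", "NAS")  # assumed protocol vocabulary for this example


def extract_features(trace):
    """Turn one packet trace into a fixed-length feature row."""
    times = [t for t, _, _ in trace]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    counts = Counter(proto for _, proto, _ in trace)
    row = [
        float(len(trace)),             # number of packets in the trace
        mean(gaps) if gaps else 0.0,   # mean inter-arrival time (packet timing)
        mean(s for _, _, s in trace),  # mean signal strength
    ]
    # Fraction of packets per protocol (protocol usage).
    row += [counts[p] / len(trace) for p in PROTOCOLS]
    return row


def zscore_columns(rows):
    """Normalize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero deviation; divide by 1.0 to avoid a zero division.
    sigmas = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in rows]


# One row per trace, one column per feature.
table = [extract_features(t) for t in (trace_a, trace_b)]
normalized = zscore_columns(table)
```

Normalizing per column keeps features with large numeric ranges, such as signal strength in dBm, from dominating distance-based classifiers. A dimensionality reduction step such as PCA would operate on the normalized table.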
Using the dataset, we trained supervised machine learning models, which allowed us to detect the presence of fake base stations in cellular networks with high accuracy. To assess the performance of a model, the dataset was divided into training and testing subsets. The training set is used to train the model, while the testing set evaluates the model’s performance on unseen data. The split ratio must be chosen so that the model has enough data to learn from and the testing set is large enough to reveal overfitting or underfitting.
The pipeline has multiple components. The segmentation component divides the packet traces into distinct segments or sessions. This is important because a single capture may contain multiple interactions or activities, and analyzing them separately can improve the accuracy of the detection process. Segmentation can be based on factors such as time intervals and protocol types. The recognition component is the core of the pipeline and involves training a machine learning algorithm to recognize patterns or signatures indicative of a fake base station. This involves supervised learning, where the algorithm is trained using labeled instances of fake and legitimate base stations. Several machine learning techniques, namely decision tree, random forest, support vector machine, and k-nearest-neighbor, were employed for recognition and classification. The trained model can then be used to predict the presence of a fake base station in unseen packet traces.
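The segmentation, splitting, and recognition steps can be illustrated with a small, self-contained Python sketch. It is not the project's pipeline: the one-second gap threshold, the 25% test fraction, the synthetic two-dimensional feature vectors, and the helper names `segment_by_gap`, `train_test_split`, and `knn_predict` are assumptions chosen for the example. Of the classifiers listed above, only k-nearest-neighbor is shown because it is the shortest to write without external libraries.

```python
import random
from collections import Counter
from math import dist


def segment_by_gap(timestamps, max_gap=1.0):
    """Split sorted packet timestamps into sessions wherever the
    silence between two packets exceeds max_gap seconds."""
    segments = []
    for t in timestamps:
        if segments and t - segments[-1][-1] <= max_gap:
            segments[-1].append(t)
        else:
            segments.append([t])
    return segments


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, x, k=3):
    """Label x by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda sample: dist(sample[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Segmentation example on hypothetical timestamps (seconds).
sessions = segment_by_gap([0.0, 0.2, 0.4, 5.0, 5.1, 12.0])

# Synthetic, well-separated feature vectors standing in for real traces.
rng = random.Random(42)
legit = [((rng.uniform(-1, 1), rng.uniform(-1, 1)), "legitimate") for _ in range(20)]
fake = [((rng.uniform(4, 6), rng.uniform(4, 6)), "fake") for _ in range(20)]

train, test = train_test_split(legit + fake)
accuracy = sum(knn_predict(train, x) == y for x, y in test) / len(test)
print(f"sessions: {len(sessions)}, held-out accuracy: {accuracy:.2f}")
```

The synthetic clusters are deliberately easy to separate, so the accuracy printed here says nothing about performance on real traces. It only shows how held-out samples are scored after training.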
In conclusion, our work highlights the significance of a high-quality dataset in detecting fake base stations in cellular networks using machine learning algorithms. By carefully preparing the dataset, including appropriate feature extraction, representation, and splitting, researchers can improve the effectiveness and reliability of a machine learning pipeline for detecting fake base stations. It is crucial to consider the quality, diversity, and representativeness of the dataset to enhance the pipeline’s performance and generalizability in real-world scenarios. The POWDER platform proved to be a valuable tool for creating such a dataset, which can aid researchers and practitioners in developing effective solutions for detecting fake base stations and securing cellular networks against attacks.
References
[1] https://theintercept.com/2020/07/31/protests-surveillance-stingrays-dirtboxes-phone-tracking/
[2] https://techcrunch.com/2020/08/05/crocodile-hunter-4g-stingray-cell/