Javier E. Rodriguez, PE
School of Cybersecurity and Privacy
Georgia Institute of Technology
javirodz@gatech.edu
Abstract
This paper presents the design and implementation of a behavioral ransomware detector built with machine learning. The system models the input and output (I/O) patterns of a virtual machine in two states: at steady state, and while its files are being encrypted by a typical ransomware attack. The model relies on key performance indicators (KPIs) that are available in most modern storage arrays, together with Azure Machine Learning Studio in the Microsoft Azure cloud. This combination keeps the method accessible to practitioners who do not have specialized knowledge of ransomware internals or machine learning.
Data collection takes place at the storage array level through the Nutanix distributed data cluster. Observing I/O at this layer makes the measurement invisible to adversarial ransomware running inside the guest operating system. Because the method is behavioral, it can be expressed as anomaly detection, which allows it to provide a general detection capability against previously unseen, zero day ransomware.
The experiments show that, once a virtual machine reaches steady state I/O, the model reacts to the anomalies caused by active encryption with very high accuracy.
1. Introduction
According to several industry reports, including the CrowdStrike Global Threat Report [1] and guidance from the Cybersecurity and Infrastructure Security Agency (CISA) [2], ransomware remains one of the most visible cybersecurity risks. The practice will continue for as long as it stays profitable. Estimates of its cost vary, but they consistently exceed the billion dollar mark, and industry coverage describes a threat that keeps evolving in both scale and technique [3].
The average ransom payment nearly doubled in the year preceding this study, yet that figure is small next to the cost of downtime. PurpleSec reported that the average cost of downtime per incident in 2020 was approximately $283,000 [4]. The growth in attacks reached every sector, public and private. Readers should treat that figure as a vendor reported estimate and verify it against a primary source before citing it independently.
The central difficulty in detecting ransomware is that it uses the same libraries and system calls as legitimate applications and routine operating system tasks. By taking a generic approach built on off the shelf tools, the system described here aims to address that difficulty without depending on signatures specific to any one family.
2. Background
Ransomware is a class of malware that denies a user access to their data until a ransom is paid [5]. The threat actor demands payment in exchange for restoring the data, which may be anything held in the system’s storage. The goal is to prevent victims from carrying out their normal activities (see Figure 1).
![Figure 1. Steps in ransomware activity [6].](https://i0.wp.com/vwannabe.com/wp-content/uploads/2026/06/fig01_ransomware_steps.jpg?resize=467%2C79&ssl=1)
Figure 1. Steps in ransomware activity [6].
Ransomware is commonly divided into two basic types, locker and crypto, with hybrid variants in some cases [6]. The approach in this study targets crypto ransomware, which encrypts the victim’s original data and renders it unavailable. The scheme typically includes a ransom note with instructions for paying and for obtaining the key needed to decrypt the data. This is an attack against the availability of the system.
In some cases the threat actor also exfiltrates the data and threatens to publish it unless the ransom is paid. That tactic attacks the confidentiality of the data and can expose victims to regulatory fines, for example when the data includes payment card information or medical history.
2.1 Technology Overview
Figure 2 shows the technology used in the laboratory setting. The top layer, labeled App, holds the virtual machine running the Windows 10 operating system. The left side of the figure shows the conceptual layout of a hyperconverged system, and the right side shows the physical equipment used in the experiments.
![Figure 2. The legacy three tier infrastructure consists of three layers: the compute layer, the storage area network or fabric (SAN), and the storage array or arrays [7].](https://i0.wp.com/vwannabe.com/wp-content/uploads/2026/06/fig02_hci_vs_three_tier.jpg?resize=512%2C257&ssl=1)
Figure 2. The legacy three tier infrastructure consists of three layers: the compute layer, the storage area network or fabric (SAN), and the storage array or arrays [7].
Hyperconverged infrastructure (HCI) is a software defined, unified system that combines the conventional data center elements of storage, compute, networking, and management. It uses software and x86 servers in place of expensive, purpose built hardware, which reduces data center complexity and improves scalability through a single, simple console. The laboratory equipment used in these experiments is a Nutanix hyperconverged cluster running the Acropolis hypervisor. The second part of the laboratory environment runs in the Azure cloud and is described in the Analysis section.
3. Literature Review
Dozens of studies address ransomware detection using a wide range of techniques. A substantial body of work applies machine learning and dynamic analysis to the problem, including wrapper based feature selection [8], network traffic analysis [9], [10], software defined networking [11], behavioral classification of variants [12], layered machine learning defenses [13], finite state machine models [14], and broad surveys of the field [15], [6], [16], [17]. Behavior based automated malware analysis has also been studied in depth [18], [19], [20], [21]. Hypervisor based and disk or storage level monitoring has been explored as well [22], [23], [24], [25], along with self healing, ransomware aware file systems [26] and data centric stopping techniques [27]. The approach in this study aims to distinguish itself by collecting data only at the storage array level and by using the hypervisor to keep the attacker unaware that it is being observed. Two studies that rely on dynamic behavior are worth discussing in more detail.
The detection system described by Kharraz et al. [28] is based on the disk access actions performed by a process. It observes the change in entropy between a read and a write to the same region of a file, the proportion of file content that is overwritten, and whether the process deletes files. It also collects metadata about disk access, including whether a process writes to many files and whether those files span very different types or come from a single application. It measures the time between write requests and assigns higher risk as that interval shortens. These features are combined into a risk score through a linear function whose weights are determined by recursive feature elimination.
In a second study, Baek et al. [24] proposed a detection model based on a set of lightweight behavioral features that describe the overwriting pattern of ransomware, a pattern that is largely invariant across families.

Figure 3. Ransomware overwriting pattern contrasted with valid applications [24].
4. Methodology
This section presents the design of the I/O pattern analyzer. By drawing on the key performance indicators present in most modern storage arrays, the detection model is independent of the operating system installed on top of the hypervisor. The design has two goals: first, to create an efficient monitoring tool, and second, to remain hidden beneath the operating system layer so that it resists ransomware evasion techniques.
4.1 Threat Model
The threat model considered in this experiment is an attacker who can infect the operating system of the virtual machine. The attacker has evaded the static detection techniques and has begun the encryption process. The attacker has no access, physical or remote, to the hypervisor. Framing the problem this way follows established threat modeling guidance for ransomware [29].
4.2 Ransomware
There are close to 400 families of ransomware. The behavioral characteristics of each family matter in the design of a detection model. The characteristics relevant to this experiment are the way the data is encrypted and the type of evasion techniques used. Other characteristics, such as the network flow and the attack vector, are outside the scope of this study.
After a certain period, a guest operating system reaches a steady state of I/O access patterns. A typical application is unlikely to behave in the same way a malicious payload does, at least not continuously. Everything a ransomware executable does requires resources such as CPU and memory, and it requires access to files, because the primary goal of a crypto locker is to encrypt all of the data in a way that makes it unusable to the victim.
| I/O Access pattern | I/O Characteristics | Typical Applications |
|---|---|---|
| Streaming Reads | 100% Reads; Large contiguous requests; 1-64 concurrent requests. It may be threaded. | Media Servers (Video-on-demand, etc.). Virtual Tape Libraries (VTL), Application Servers |
| Streaming Writes | 100% Writes; Large contiguous requests; 1-64 concurrent requests. It may be threaded. | Media Capture, VTL, Medical Imaging, Archiving, Backup, Video Surveillance, Reference Data |
| OLTP | Typically, 2KB to 16KB request sizes; Read modify, write, verify operations resulting in 2 reads for every write; Primarily random accesses. Large number of concurrent requests. When running SQL statements in parallel, Database will typically perform large random I/Os. | Databases (SAP, Oracle, SQL), Online Transaction Servers |
| File Server | Moderate distribution of request sizes from 4KB to 64KB. However, 4KB and 64KB comprise 70% of requests; it is primarily random; Generally, four reads for every write operation, a large number of concurrent requests during peak operational periods. | File and Printer Servers, e-mail (Exchange, Notes), Decision Support Systems |
| Web Server | A wide distribution of request sizes from 512 bytes to 512KB; Primarily random accesses; a Large number of concurrent requests during peak operational periods | Web Services, Blogs, RSS Feeds, Shopping Carts, Search Engines, Storage Services |
| Workstations | Primarily small to medium request sizes; 80% sequential and 20% random; Generally, four reads for every writes operation. 1-4 concurrent requests. | Business Productivity, Scientific/Engineering Applications |
Table 1. Application I/O characteristics by access pattern [30].
Table 1 summarizes common application types together with their typical I/O patterns and behavior. Other characteristics, such as streaming versus batch access, serial versus random access, and the block size histogram, also change during a ransomware attack. Figures 4 and 5 show examples of how data processing and access patterns differ.

Figure 4. Data processing model [30].

Figure 5. Access pattern contrast [31].
Kharraz et al. divide the characteristics of ransomware I/O access patterns into three main categories:
The attacker overwrites the user’s file with the encrypted version.
The attacker reads the file, writes a new encrypted file, and then deletes the original.
The attacker reads the file, writes a new encrypted file, and then overwrites the original.

Figure 6. I/O pattern categories according to Kharraz et al. [28].
Most families use a specific file extension for the encrypted output. For example, some Mespinoza variants of ransomware use the .pysa extension. Taking these access patterns into account, some families list and then randomly encrypt the files, which is a more advanced evasion technique. Detecting this kind of malicious behavior reliably requires several orthogonal methods of monitoring, a point expanded in the Discussion and Limitations section.
With the experimental setup ready, data collection began. A simulated ransomware script (see Appendix III) traverses the Documents folder in the Windows 10 test VM and encrypts, from top to bottom, every file with one of the following extensions:
“.pptx”, “txt”, “csv”, “.db”, “.mdb”, “.log”, “.sav”, “.sql”, “.xml”,”.key”, “.cert”,
“.pem”, “.doc”, “.pdf”, “.email”, “.eml”, “.msg”, “.oft”, “.ost”,
“.pst”, “.vcf”, “.apk”, “.bat”, “.pl”, “ps1”, “.pl”, “.vsd”, “.vss”, “.vst”, “.vdx”,
“.vsx”, “.vtx”, “.vsw”, “.vsl”, “.dot”, “.xls”, “.py”, “.jpg”, “.jpeg”, “.png”,
“.pgp”, “.tiff”, “sys”, “.pfx”, “plist”, “.vmx”, “.gif”, “.lic”, “.kit”, “.ctx”,
“.sh”, “.conf”, “.ttf”, “.ico”, “.exe”, “.dmg”, “kdbx”, “.java”, “.jar”, “.yml”, “.json”,
“kdb”, “.dll”, “.img”, “.msi”, “.wsf”, “.htm”, “.php”, “.vb”, “.c”, “.pcap”
A complete traversal of the Documents folder takes approximately seventeen minutes. Appendix IV shows a timestamped sequence of performance metric snapshots captured during a traversal.
4.3 Testbed
Configuring a laboratory setting involves several considerations. The first is to provide an environment that resembles production. In this case, several tools were used to populate a Windows 10 VM with the data needed for a ransomware attack. Building such a testbed is not trivial, and additional observations appear in the Future Work section. The design followed prudent practices for malware experiments [32], and drew on isolated analysis environments such as Cuckoo Sandbox [33], [34].
For this scenario, the operating system is assumed to be free of ransomware during the time it takes to reach steady state. That period is when the monitor reads the I/O patterns to create a clean baseline.
The operating system used for these experiments is Windows 10 Enterprise. Figure 7 shows the layout of the laboratory. The guest operating system runs on a Type 1 hypervisor, which in this case is Nutanix Acropolis. The hypervisor isolates the guest VM and prevents the ransomware from reaching anything outside the experimental environment.

Figure 7. Analysis layout with a Type 1 hypervisor.
To populate the test VM with files, two Python programs were developed to generate data. The first program, shown in Figure 8, accepts a root folder path as a starting point and creates folders to a chosen depth.
def gen_tree(depth, parent_dir): while depth > 0: depth = depth - 1 new_directory = random_line('words.txt') path = os.path.join(parent_dir, new_directory) try: os.mkdir(path) except OSError as error: pass parent_dir = path return path
Figure 8. Python routine that creates a folder tree using common English words.
The code uses a list of the one thousand most common words in the English language [35]. According to Kharraz et al. [28], some ransomware variants compute the entropy of a file or folder name and will not trigger if the name appears too random, so realistic names matter.
For each path created, a second Python program generates Word files using the python-docx library [36]. To add images, two techniques were combined, one from Arrington [37] and one from Zita [38]. Finally, to increase the data volume, older documents were added, including PowerPoint presentations, PDF files, and additional images. Because only simulated ransomware was used, there was no risk of anyone stealing real data.
The sample data consisted of 16,447 files (see Appendix I). The final number of encrypted files was lower, approximately 12,000, because the system stalled on very large files such as .zip and .ova archives.
A second important consideration is the hardware isolation of the system in which the ransomware is triggered. Hardware isolation refers mainly to the network and to the ability of the monitoring environment to inject the ransomware without any risk of spreading it. To close network access, an isolated virtual switch with no uplink connections was used. The setup is flexible enough to move eth0 to a switch with internet access when software needs to be added to the operating system. In Figure 9, the br0 virtual switch has physical uplinks to a physical switch, while br1 is isolated.

Figure 9. Laboratory network diagram.
Because the test VMs run in a virtual environment, the console is available at any time without the risk of spreading the ransomware. With the environment and configuration described, the next section covers the key performance indicators that are available and how the data is collected.
4.4 Features
Feature identification is a broad subject. Features are the foundation of the dataset, and the dataset is only as useful as the features selected. The insight gained from the observations improves when the features chosen are well suited to the problem. This experiment had a rich set of features available; Appendix II lists them in full.
Dataset quality improves when features are selected through a formal process such as feature engineering [39]. In this case, a combination of heuristics and the findings of the research papers reviewed for this problem guided the selection. Table 2 lists the features used.
| # | Selected feature |
|---|---|
| 1 | ctl_random_ops_per_sec |
| 2 | ctl_read_io_bandwidth_kBps |
| 3 | ctl_write_io_bandwidth_kBps |
| 4 | ctl_num_read_iops |
| 5 | ctl_num_write_iops |
| 6 | hv_avg_read_io_latency_usecs |
| 7 | hv_avg_write_io_latency_usecs |
| 8 | ctl_total_read_io_size_kbytes |
| 9 | ctl_read_size_histogram_4kB |
| 10 | ctl_read_size_histogram_8kB |
| 11 | ctl_read_size_histogram_16kB |
| 12 | ctl_read_size_histogram_32kB |
| 13 | ctl_read_size_histogram_64kB |
| 14 | ctl_read_size_histogram_512kB |
| 15 | ctl_read_size_histogram_1024kB |
| 16 | ctl_write_size_histogram_4kB |
| 17 | ctl_write_size_histogram_8kB |
| 18 | ctl_write_size_histogram_16kB |
| 19 | ctl_write_size_histogram_32kB |
| 20 | ctl_write_size_histogram_64kB |
| 21 | ctl_write_size_histogram_512kB |
| 22 | ctl_write_size_histogram_1024kB |
Table 2. Features selected for the detection model.
The intuition behind this selection is that, during a ransomware attack, the I/O statistics rise above their normal levels and the characteristics of the steady state I/O pattern change. The two clearest signals were the block size and the randomness of access. For a complete list of candidate features, see Appendix II.
4.5 Dataset
A dataset is a collection of data samples. The dataset in this experiment contains measurements collected every 120 seconds through a REST API. There are several ways to collect this data, as shown in Figure 10. A REST API request was chosen because the results can be written to a comma separated value (.csv) file for use in training. Most modern storage arrays expose the same measurements, so the results apply to enterprises of any size without vendor lock in.

Figure 10. Monitoring tools for the Nutanix Acropolis cluster.
Figure 10 shows Prism, the built in monitoring tool, which includes I/O and network flow monitoring. The hyperconverged system provides an HTML5 user interface, a REST API, and a command line utility. The experiment assumes that the operating system, in this case Windows 10 Enterprise, has reached a steady state I/O pattern and is free of any ransomware infection.
When building a machine learning dataset, the ground truth data is split into a training dataset and a testing dataset. The algorithm is trained on the training data and then evaluated on its ability to perform on the testing data [40].

Figure 11. REST API access to the Nutanix data platform [41].
To retrieve the performance indicators, the Nutanix cluster is queried with a request that includes:
the unique identifier of the virtual disk (UUID);
the metric, or KPI, being requested;
the start time and end time in microseconds, using the 24 hour Unix epoch format; and
the interval in seconds, where the minimum for this version of Nutanix is 120 seconds.
Appendix V contains the code used to retrieve the KPI through the REST API.
One of the main challenges in behavioral detection is distinguishing a valid application from an actual ransomware attack. Some families go further and become adversarial by using several evasion techniques. One technique that would make this approach less robust [42] is for the malware to observe its environment and imitate normal behavior.
To model a valid application workload, the experiment used DISKSPD, a command line tool for micro benchmarking [43], [44]. The following options were used:
diskspd –b8K –d30 –o4 –t8 –h –r –w25 –L –Z1G –c20G C:\iotest.dat > iotestResults.txt
This command runs a 30 second random I/O test against a 20 GB file on the C: drive, with a 25 percent write and 75 percent read ratio and an 8 KB block size. It uses eight worker threads, each with four outstanding I/Os, and a write entropy seed of 1 GB, and it saves the results to a text file. The equivalent utility on Linux is fio [45], [46].
The DISKSPD emulator was used to model a SQL database as a representative application workload. The Future Work section returns to the need to model many suitable applications in order to build a more robust model.
5. Analysis
With the data collected (see Appendix VI for an example), it must be prepared before it can train the model. Two columns were added. The first is a VM identifier, which keeps the model ready for future experiments with additional test VMs. The second is the target metric, a column that indicates whether the data was collected during a ransomware attack. The value is zero for a normal operating system and one for an operating system under a ransomware attack. The word controller was shortened to ctl_, because the dataset import process appears to limit the length of a feature name and the characters it can contain.
5.1 Azure Machine Learning
There are two ways to apply Azure Machine Learning here. In the first, the collected data trains a model that classifies an operating system as either clean or infected. This is a binary classification problem, which can use Azure Automated Machine Learning. In the second, the collected data serves as a baseline and Azure Anomaly Detection identifies departures from it. Open source workbenches such as WEKA [47] offer comparable modeling capabilities, but a managed cloud service keeps the workflow accessible without local setup [48].
5.2 Automated Machine Learning
Automated machine learning, also called automated ML or AutoML, automates the time consuming, iterative tasks of model development. It lets data scientists, analysts, and developers build models at scale with efficiency and productivity while preserving model quality [49]. The automated workflow used here was as follows:
A .csv file with the collected data was uploaded. The data includes the I/O information of the test VM both with and without ransomware [50].
The target metric for the classification is the Ranso column.
Azure Automated ML evaluated several algorithms, trained the corresponding models, and recommended the best model based on accuracy. This process is time consuming.
The recommended pipeline used MaxAbsScaler with a random forest [51].
The data was flagged as imbalanced, most likely because there were far more samples without ransomware than with it.
Automated ML ran for close to an hour and recommended a random forest (Figures 12 and 13).

Figure 12. Top ranked algorithms reported by Azure Automated ML.
Figure 13. Lowest ranked algorithms reported by Azure Automated ML.


Figure 14. Supervised learning pipeline using a two class decision forest.
In Figure 14, the trained model uses the two class decision forest. The steps are:
upload the .csv to create a dataset, in this case Win10WithRanso;
normalize the data using min and max scaling;
split the data into 70 percent for training and 30 percent for evaluation;
train a model using the two class decision forest algorithm; and
score and evaluate the model (see the Findings section for details).
At this point there is a trained model that can detect at least the type of ransomware in which encryption proceeds by reading, overwriting, and renaming the file. Because there are many families of ransomware, broader coverage would require additional models.
5.3 Azure Anomaly Detection
Because a ransomware event is not common, collecting data about it is difficult, and by the nature of malicious activity the datasets are imbalanced. To handle imbalanced data, Azure Machine Learning provides a category called anomaly detection.
The collected data fits that category well: it is numerical data gathered as a uniformly spaced time series. Azure ML can detect trends and spikes and report the changes as anomaly scores. It uses principal component analysis (PCA), a technique often used in exploratory data analysis because it reveals the inner structure of the data and explains its variance [52].

Figure 15. Learning pipeline for the anomaly detection approach.
Once the model is trained with data collected while the operating system has no ransomware, future KPI readings can be evaluated through a deployed API. The Python code below tests the model with new data, and the JSON sample that follows shows the source KPI data.
def test_model(sample_file_path = '_samples.json'): service_name = 'ransomaly' ws = Workspace.get( name='RansoML', subscription_id='e7af3a72-63c8-4a9c-a78c-d28c017f238a', resource_group='Ranso' ) service = Webservice(ws, service_name) with open(sample_file_path, 'r') as f: sample_data = json.load(f) score_result = service.run(json.dumps(sample_data)) print(f'Inference result = {score_result}') return score_result
Figure 16. Python source used to query the anomaly model.
[ { "VM": 1, "ctl_random_ops_per_sec": 612, "ctl_read_io_bandwidth_kBps": 3438, "ctl_write_io_bandwidth_kBps": 1187, "ctl_num_read_iops": 428, "ctl_num_write_iops": 145, "hv_avg_read_io_latency_usecs": 0, "hv_avg_write_io_latency_usecs": 0, "ctl_total_read_io_size_kbytes": 412656, "ctl_read_size_histogram_4kB": 0, "ctl_read_size_histogram_8kB": 41125, "ctl_read_size_histogram_16kB": 0, "ctl_read_size_histogram_32kB": 43, "ctl_read_size_histogram_64kB": 0, "ctl_read_size_histogram_512kB": 0, "ctl_read_size_histogram_1024kB": 0, "ctl_write_size_histogram_4kB": 127, "ctl_write_size_histogram_8kB": 13831, "ctl_write_size_histogram_16kB": 5, "ctl_write_size_histogram_32kB": 13, "ctl_write_size_histogram_64kB": 8, "ctl_write_size_histogram_512kB": 0, "ctl_write_size_histogram_1024kB": 1280, "Ranso": 0 },
Figure 17. Sample JSON file with source KPI data.
6. Findings
This section interprets the data and points to directions for further research. The results are presented with a confusion matrix, the standard way to evaluate a classification model [53]. In these matrices, cases where both the predicted and actual values are one (true positives) appear at the top left, and cases where both the predicted and actual values are zero (true negatives) appear at the bottom right.
Data was collected for two ransomware encryption events. The first dataset contained 1,441 rows collected while the operating system was normal and nine rows collected during encryption. Azure Machine Learning trained the model by splitting the data into 70 percent for training and 30 percent for evaluation. Figure 18 shows the results.

Figure 18. Confusion matrix for the first run.
As Figure 18 shows, the model produced a 100 percent true positive rate, and on only one occasion it predicted no encryption while encryption was in progress. In a second, fully independent experiment, the model was retrained using 63 rows collected during encryption. This time the output was 100 percent true positives and 100 percent true negatives, as shown in Figure 19.

Figure 19. Confusion matrix for the second run.
The more actions that are considered, and the more of them that are present during ransomware activity, the higher the identification rate. Collecting all of these actions, however, requires letting the ransomware run freely long enough to encrypt and destroy many files. In these experiments, the ransomware encrypted approximately seven hundred files per minute.
A second model was configured with Azure anomaly detection. For ransomware encryption, it identified the anomaly in 100 percent of cases. The anomaly model was trained on data from the operating system without ransomware. Table 3 shows the output of the anomaly model on the collected data.
| Ranso | Scored Label | Probability |
|---|---|---|
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
| 1 | 1 | 0.938535 |
| 1 | 1 | 0.927582 |
| 1 | 1 | 0.923774 |
| 1 | 1 | 0.93596 |
| 1 | 1 | 0.914103 |
| 1 | 1 | 0.930306 |
| 1 | 1 | 0.911739 |
| 1 | 1 | 0.938852 |
| 1 | 1 | 0.761196 |
Table 3. Output of the anomaly model on the collected data.
The decision threshold can be tuned between zero and one; the default is 0.5. In this experiment the lowest certainty probability was 0.76, which stayed well above the default threshold.
7. Discussion and Limitations
Both trained models, the binary classifier and the anomaly detector, effectively detect an attack. Once detection occurs, rapid response techniques and good operational practices can support recovery. A short script can take a snapshot of the system as soon as an anomaly is detected, and on most modern storage arrays a snapshot does not affect performance.
After several weeks of research, reading, and testing, a number of limitations of this approach became clear:
The model was trained on the behavior of one specific VM, so it is not a generic model. Addressing this would require an automated training process that builds one model per VM.
Collecting data from a VM that is in production is difficult. In the laboratory it was possible to infect the test VM and take it down to collect data, but a production approach would need to clone the production VM, infect the clone, aggregate the data from both the production VM and the infected clone into a dataset, train and deploy the model, and then periodically update the model by repeating those steps.
The number of observations in these experiments is small. To confirm the encouraging early findings, the observations and the collected data should be extended to many operating systems running multiple workloads.
For the two reasons above, the anomaly detector is likely a better choice than the two class algorithm.
8. Future Work
This section offers a few ideas for stronger protection and higher detection accuracy. Rather than relying on a single way to detect ransomware after the static defenses have been defeated, the proposal is to combine several subsystems that together form a layered defense around the environment:
I/O to the storage array. This is the approach presented in this study. It would be worth adding both per VM and total storage array performance, since that combination did not appear in the reviewed material.
File decoys. A honey file technique helps to reduce false positive results [54], [55].
Compression and deduplication. Both measurements drop during encryption, because encrypted files are poor candidates for compression and deduplication. The idea is promising, although by the time the change is visible it may be too late to stop the encryption.
Backup verification. Most attacks try to stop the backup system; the challenge is that backup systems vary from one environment to another.
Network communication. This signal supports the overall strategy and is very effective when combined with the other layers.
There is also a newer way to consume storage in a virtualized environment. The underlying storage is divided into chunks using virtual volumes (VVols) [56]. As the limitations show, it would help for the system to be aware of the specific files the ransomware accesses. Correlating file metadata with VVol utilization could give the model more insight and raise confidence in detection.

Figure 20. Proposed system for future work.
In Figure 20, the proposed system has two dynamic behavior monitors on the left. When an anomaly occurs, the message queue receives the alert. The top right shows a backup process monitor, and the lower right shows the honey file, or canary file, check. The bottom center shows the two outputs for a positive ransomware detection: on the left, the process that snapshots the system, and on the right, the alert module.
Acknowledgments
I thank my family for their support and patience during this research. I also thank Dr. Mustaque Ahamad for his guidance during the semester; his feedback and recommendations helped me meet the learning objectives of the course. Finally, I thank my fellow students, whose positive attitude and strong work in the weekly progress reports kept motivation high.
Appendix
Appendix I. File Extension Detailed Count
The sample dataset used to exercise the simulated ransomware contained the file types listed below, grouped by extension with total size and file count.
| File Extension | Total Size(Mb) | File Count |
|---|---|---|
| Total | 107.846 | 421 |
| .al | 0.672 | 1 |
| .at | 0.053 | 2 |
| .backup | 0.121 | 2 |
| .bak | 34.416 | 7 |
| .basex | 0.007 | 10 |
| .bash_history | 0.01 | 1 |
| .bashrc | 0.003 | 1 |
| .bat | 0.035 | 17 |
| .bin | 3.099 | 6 |
| .boot | 0 | 1 |
| .bz2 | 0.832 | 1 |
| .c | 2.46 | 27 |
| .c32 | 0.338 | 4 |
| .cat | 0.706 | 5 |
| .cfg | 0.288 | 6 |
| .changed | 0 | 1 |
| .class | 1.942 | 794 |
| .clb | 0.046 | 2 |
| .com | 0.121 | 1 |
| .common | 0.005 | 1 |
| .conf | 0.69 | 6 |
| .config | 0.087 | 3 |
| .controlio | 0.172 | 5 |
| .cpgz | 24.645 | 1 |
| .cpp | 0.029 | 1 |
| .crash | 0.024 | 1 |
| .crdownload | 292.116 | 3 |
| .crit | 0 | 1 |
| .css | 0.141 | 43 |
| .csv | 3.082 | 114 |
| .ctd | 0.009 | 4 |
| .ctx | 0.575 | 5 |
| .db | 0.406 | 1 |
| .deb | 57.375 | 5 |
| .debug | 0.176 | 4 |
| .default | 0 | 1 |
| .der | 0.001 | 1 |
| .dir | 0.034 | 1 |
| .diskdefines | 0 | 1 |
| .dll | 0.787 | 2 |
| .dmg | 88.35 | 3 |
| .doc | 15.569 | 32 |
| .docm | 0.316 | 2 |
| .docx | 518.988 | 486 |
| .dotx | 0.398 | 3 |
| .drt | 0.776 | 1 |
| .dtd | 0.088 | 3 |
| .dump | 0.031 | 2 |
| .dylib | 0.073 | 2 |
| .EFI | 2.367 | 2 |
| .eml | 0.053 | 2 |
| .ena | 0.46 | 3 |
| .ent | 0.029 | 3 |
| .eps | 14.637 | 17 |
| .epub | 5.389 | 1 |
| .err | 0.019 | 6 |
| .exe | 2074.402 | 27 |
| .factoryio | 0.386 | 3 |
| .FCD | 0 | 4 |
| .flake8 | 0.001 | 1 |
| .gif | 0.119 | 43 |
| .gpg | 5.252 | 3 |
| .grp | 20.764 | 2 |
| .gz | 1374.106 | 205 |
| .h | 0.06 | 5 |
| .hpp | 0.008 | 1 |
| .htc | 0.002 | 2 |
| .htm | 0.028 | 3 |
| .html | 5.557 | 25 |
| .icns | 0.249 | 2 |
| .ico | 0.049 | 2 |
| .ics | 0.019 | 6 |
| .img | 9.604 | 5 |
| .in | 0 | 1 |
| .info | 0 | 2 |
| .ini | 0.002 | 7 |
| .input_i | 0.312 | 2 |
| .iso | 20275.221 | 16 |
| .jar | 156.356 | 342 |
| .jnlp | 0.056 | 15 |
| .jpeg | 3.906 | 18 |
| .jpg | 157.814 | 162 |
| .jpg_large | 0.325 | 1 |
| .js | 4.105 | 106 |
| .json | 24.039 | 164 |
| .kdb | 0.008 | 4 |
| .kdbx | 0.038 | 13 |
| .keystream | 0.855 | 3 |
| .kit | 0.033 | 2 |
| .lbb | 0.279 | 3 |
| .len | 0 | 7 |
| .lic | 0.209 | 93 |
| .license | 0.001 | 1 |
| .lock | 0 | 1 |
| .log | 204.64 | 302 |
| .lst | 0.01 | 8 |
| .manifest | 0.071 | 3 |
| .md | 0.053 | 15 |
| .md5 | 0 | 2 |
| .mgmtd | 0.71 | 1 |
| .mod | 2.012 | 236 |
| .mp4 | 493.219 | 1 |
| .mpp | 0.133 | 1 |
| .msg | 0.092 | 1 |
| .msi | 125.717 | 7 |
| .names | 0.004 | 1 |
| .nar | 107.019 | 32 |
| .ndp-proxy | 0.006 | 1 |
| .netconfig | 0.011 | 1 |
| .netrwhist | 0 | 1 |
| .nib | 0.346 | 22 |
| .notice | 0 | 1 |
| .nvram | 0.071 | 1 |
| .old | 1881.357 | 11 |
| .omsg | 0.084 | 1 |
| .one | 0.836 | 1 |
| .out | 31.379 | 27 |
| .ova | 20349.079 | 9 |
| .pak | 92.225 | 2 |
| .pcap | 0.003 | 1 |
| .pcf | 0.001 | 2 |
| 1264.323 | 1457 | |
| .pem | 0.003 | 2 |
| .pf2 | 0.005 | 1 |
| .pfx | 0.002 | 1 |
| .pg_dump | 0.017 | 1 |
| .pkg | 16.106 | 2 |
| .plist | 0.005 | 2 |
| .png | 108.406 | 1415 |
| .policy | 0.013 | 1 |
| .potx | 5.945 | 1 |
| .ppt | 34.654 | 4 |
| .pptx | 529.425 | 171 |
| .profile | 0 | 1 |
| .properties | 0.024 | 8 |
| .psd | 1.427 | 10 |
| .pxd | 0.005 | 3 |
| .py | 6.193 | 647 |
| .pyc | 0.54 | 63 |
| .pyd | 4.313 | 16 |
| .pyi | 0.463 | 184 |
| .pyx | 0.085 | 3 |
| .rar | 5.278 | 1 |
| .rdp | 0.011 | 6 |
| .rll | 1.357 | 1 |
| .rpc | 0.041 | 2 |
| .rpm | 1764.365 | 27 |
| .rpm-utils | 0.002 | 1 |
| .rsrc | 0.001 | 1 |
| .rtf | 1.823 | 102 |
| .run | 140.476 | 1 |
| .s | 0.018 | 2 |
| .sample | 0.02 | 12 |
| .sb | 1987.047 | 2 |
| .SET | 0.037 | 3 |
| .sh | 223.587 | 20 |
| .SHData | 0.295 | 12 |
| .size | 0 | 1 |
| .slf | 0.007 | 7 |
| .so | 12.772 | 59 |
| .sql | 0.004 | 1 |
| .sqlite | 1.125 | 1 |
| .st | 0.013 | 9 |
| .strings | 0 | 2 |
| .symbolMap | 50.915 | 6 |
| .sys | 0.066 | 1 |
| .tar | 933.62 | 15 |
| .template | 1.885 | 8 |
| .tex | 0 | 1 |
| .tgz | 1520.12 | 27 |
| .thrift | 0.001 | 1 |
| .tif | 2.037 | 3 |
| .tiff | 5.499 | 75 |
| .TORRENT | 0.018 | 1 |
| .ts | 0.221 | 14 |
| .ttf | 3.288 | 21 |
| .txt | 91.105 | 567 |
| .url | 0 | 1 |
| .vdx | 8.448 | 7 |
| .vib | 103.035 | 6 |
| .viminfo | 0.011 | 1 |
| .vmdk | 26319.876 | 4 |
| .vmsd | 0 | 2 |
| .vmsn | 0.028 | 1 |
| .vmx | 0.006 | 2 |
| .vmxf | 0.004 | 2 |
| .vscodeignore | 0.001 | 41 |
| .vsd | 116.928 | 50 |
| .vsdx | 11.482 | 14 |
| .vss | 216.994 | 13 |
| .vssx | 5.846 | 2 |
| .war | 3.833 | 2 |
| .warn | 0 | 2 |
| .x32 | 0.199 | 1 |
| .x64 | 0.2 | 1 |
| .xls | 64.106 | 203 |
| .xlsm | 98.673 | 121 |
| .xlsx | 242.441 | 415 |
| .xml | 70.862 | 4454 |
| .xq | 8.21 | 1735 |
| .xqm | 0.196 | 31 |
| .xsd | 0.122 | 20 |
| .xslt | 0.001 | 2 |
| .yml | 0.001 | 1 |
| .zip | 16207.155 | 232 |
Appendix II. Available Features
The Nutanix platform exposes the performance indicators below at the VM, cluster, and storage container levels. The subset used for the model appears in Table 2.
| VM | Cluster | Storage Container |
|---|---|---|
| CPU Usage (%) | CPU Usage (%) | Storage Controller IOPS (IOPS) |
| CPU Ready Time (%) | Memory Usage (%) | Storage Controller Read IOPS (IOPS) |
| Memory Usage (%) | Controller IOPS (IOPS) | Storage Controller Write IOPS (IOPS) |
| Storage Controller IOPS (IOPS) | Controller Read IOPS (IOPS) | Storage Controller Latency (ms) |
| Storage Controller Read IOPS (IOPS) | Controller Write IOPS (IOPS) | Storage Controller Read Latency (ms) |
| Storage Controller Write IOPS (IOPS) | Controller AVG Latency (ms) | Storage Controller Write Latency (ms) |
| Storage Controller Latency (ms) | Controller AVG Read Latency (ms) | Storage Controller I/O Bandwidth (Mbps) |
| Storage Controller Read Latency (ms) | Controller AVG Write Latency (ms) | Storage Controller Read Bandwidth (Mbps) |
| Storage Controller Write Latency (ms) | Controller I/O Bandwidth (Mbps) | Storage Controller Write Bandwidth (Mbps) |
| Storage Controller I/O Bandwidth (Mbps) | Controller Read Bandwidth (Mbps) | |
| Storage Controller Read Bandwidth (Mbps) | Controller Write Bandwidth (Mbps) | |
| Storage Controller Write Bandwidth (Mbps) | ||
| Disk Usage (GiB) | Virtual Disk | |
| Disk Usage (%) | Random I/O (%) | |
| Snapshot Usage (GiB) | Read Source Cache (KBps) | |
| Shared Data (GiB) | Read Working Set size (MiB) | |
| I/O Working Set size (MiB) | Write Working Set size (MiB) | |
| Read I/O Working Set size (MiB) | Union Working Set Size | |
| Write I/O Working Set size (MiB) | ||
| Read Size Distribution (bytes/%) | ||
| Write Size Distribution (bytes/%) | ||
| Network Receive Packets Dropped (# packets) | ||
| Network Transmit Packets Dropped (# packets) | ||
| Network Rx (KiB) | ||
| Network Tx (KiB) |
Appendix III. Simulated Ransomware Script (Python)
The script below traverses a target folder and, for each file whose extension matches the encryption list, encrypts the file in place using a symmetric key. It was used only against the isolated test VM, and the equivalent PowerShell technique is described by Rayner [57].
import osfrom cryptography.fernet import Fernetdef encrypt_file(filename): # process one file here #Generate a key key = Fernet.generate_key() #Save the key to the file my_key.key with open('my_key.key', 'wb') as my_key: my_key.write(key) # Initialize fernet object fernet_object = Fernet(key) # Read the file with open(filename, 'rb') as original_file: original = original_file.read() # Encrypt the file encrypted = fernet_object.encrypt(original) # Overwrite the file try: with open(filename, 'wb') as encrypted_file: encrypted_file.write(encrypted) except: passdef decrypt_file(filename): # Read the key from the file "my_key.key" with open('my_key.key', 'rb') as my_key: key = my_key.read() # Initialize fernet object fernet_object = Fernet(key) # Read the encrypted file with open(filename, 'rb') as encrypted_file: encrypted = encrypted_file.read() # Decrypt the file decrypted = fernet_object.decrypt(encrypted) # Overwrite the file with open(filename, 'wb') as decrypted_file: decrypted_file.write(decrypted)def get_file_list(root_folder): file_list = [] # for root, dirs, files in os.walk(root_folder, topdown=False): #to list bottom-up for root, dirs, files in os.walk(root_folder): for name in files: #print("Filename ", os.path.join(root, name)) file_list.append(os.path.join(root, name)) # for folder in dirs: # print("Folder :",os.path.join(root, folder)) return file_listdef test_file_extension(file_name): encryptable = False extensions = [".pptx", "txt", "csv", ".db", ".mdb", ".log", ".sav", ".sql", ".xml",".key", ".cert", ".pem", ".doc", ".pdf", ".email", ".eml", ".msg", ".oft", ".ost", ".pst", ".vcf", ".apk", ".bat", ".pl", "ps1", ".pl", ".vsd" , ".vss" , ".vst" , ".vdx" , ".vsx" , ".vtx" , ".vsw" , ".vsl", ".dot", ".xls", ".py", ".jpg", ".jpeg", ".png", ".pgp", ".tiff", "sys", ".pfx", "plist", ".vmx", ".gif", ".lic", ".kit", ".ctx", ".sh", ".conf", ".ttf", ".ico", ".exe", ".dmg", "kdbx", ".java", ".jar", ".yml", ".json", "kdb", ".dll", ".img", ".msi", ".wsf", ".htm", ".php", ".vb", ".c", ".pcap"] for ext in extensions: if ext in file_name.rpartition('\\')[2]: encryptable = True return encryptableif __name__ == '__main__': root_folder = 'C:\\Users\\Win\\Documents\\' if(os.path.exists(root_folder)): file_list = get_file_list(root_folder) count = 0 #''' for file_name in file_list: #print(file_name) if test_file_extension(file_name): print(file_name) #encrypt_file(file_name) #os.rename(file_name, file_name + ".pysa") #''' ''' # To decrypt: uncomment lines 65 and 70 and comment lines 72 and 78 for file_name in file_list: decrypt_file(file_name) print(file_name.split('.')) #os.rename(file_name, file_name.split('.pysa') ''' else: print("Folder does not exist")
Appendix IV. Sequence of Graphical Data During Ransomware Encryption
The snapshots below show the Prism performance metrics captured at successive timestamps while the simulated ransomware encrypted the Documents folder.

Figure A4.1. Performance metrics snapshot 1 of 9 during encryption.

Figure A4.2. Performance metrics snapshot 2 of 9 during encryption.

Figure A4.3. Performance metrics snapshot 3 of 9 during encryption.

Figure A4.4. Performance metrics snapshot 4 of 9 during encryption.

Figure A4.5. Performance metrics snapshot 5 of 9 during encryption.

Figure A4.6. Performance metrics snapshot 6 of 9 during encryption.

Figure A4.7. Performance metrics snapshot 7 of 9 during encryption.

Figure A4.8. Performance metrics snapshot 8 of 9 during encryption.

Figure A4.9. Performance metrics snapshot 9 of 9 during encryption.
Appendix V. Python Code to Retrieve KPI Using the REST API
import pprintimport jsonimport osimport randomimport timeimport requestsimport sysimport traceback# This block initializes the parameters for the request.class AHVRestApi(): def __init__(self): # Initializes the options and the logfile from GFLAGS. self.serverIpAddress = "NUTANIX SERVER IP ADDRESS" self.username = "USERNAME" self.password = "PASSWORD" # Base URL at which REST services are hosted in Prism Gateway. BASE_URL = 'https://%s:9440/api/nutanix/v2.0/' self.base_url = BASE_URL % self.serverIpAddress self.session = self.get_server_session(self.username, self.password) def getVirtualDiskInformation(self, virtual_disk_id, start_time_usecs, end_time_usecs, interval_secs, metric ): URL = self.base_url + "virtual_disks/"+virtual_disk_id+"/stats/?metrics="+metric+ \ "&start_time_in_usecs="+start_time_usecs+"" \ "&end_time_in_usecs="+end_time_usecs+"" \ "&interval_in_secs="+interval_secs serverResponse = self.session.get(URL) return json.loads(serverResponse.text)if __name__ == "__main__": try: ahvRestApi = AHVRestApi() ckoo_virtual_disk_id = 'c2193bad-29f2-4156-94d8-7bfc928f25c0' #win10_virtual_disk_id = '8a337f0a-d6d4-4157-a26a-93729680fb70' #old id win10_virtual_disk_id = '5065fba7-0671-409c-a746-eba05c38dda9' win2019_virtual_disk_id = 'd2e69200-82c8-4f7f-bc4a-8de856f905cc' #start_time_usecs = 1614429000000000 #Saturday, February 27, 2021 7:30:00 AM GMT-05:00 #start_time_usecs = 1614774600000000 #Saturday, March 3, 2021 7:30:00 AM GMT-05:00 #start_time_usecs = 1615077000000000 #Saturday, March 6, 2021 7:30:00 AM GMT-05:00 start_time_usecs = 1616247600000000 #Wed, March 17, 2021 10:45:00 AM GMT-05:00 end_time_usecs = 1616248620000000 #Wed, March 17, 2021 1:15:00 PM GMT-05:00 interval_secs = "120" metrics = ["controller.random_ops_per_sec", "controller_read_io_bandwidth_kBps", "controller_write_io_bandwidth_kBps", "controller_num_read_iops", "controller_num_write_iops", "hypervisor_avg_read_io_latency_usecs", "hypervisor_avg_write_io_latency_usecs", "controller_total_read_io_size_kbytes", "controller.read_size_histogram_4kB", "controller.read_size_histogram_8kB", "controller.read_size_histogram_16kB", "controller.read_size_histogram_32kB", "controller.read_size_histogram_64kB", "controller.read_size_histogram_512kB", "controller.read_size_histogram_1024kB", "controller.write_size_histogram_4kB", "controller.write_size_histogram_8kB", "controller.write_size_histogram_16kB", "controller.write_size_histogram_32kB", "controller.write_size_histogram_64kB", "controller.write_size_histogram_512kB", "controller.write_size_histogram_1024kB" ] with open("data.txt",'w') as my_file: for metric in metrics: win10_virtual_disk = ahvRestApi.getVirtualDiskInformation(win10_virtual_disk_id, str(start_time_usecs), str(end_time_usecs), interval_secs, metric) this_value = win10_virtual_disk['stats_specific_responses'][0]['values'] print(metric + "," + str(this_value) + "\n") my_file.write(metric + "," + str(this_value) + "\n") except Exception as ex: print(ex) ex sys.exit(1)
Appendix VI. Collected Data
Figure A6.1 shows an example of the data collected through the REST API and prepared for training.

Figure A6.1. Example of the collected and prepared dataset.
Appendix VII. DISKSPD Output
Command Line: C:\DISKSPD\x86\diskspd.exe -b8k -d30 -o4 -t4 -h -r -w25 -Z1G -L -c20G c:\iotest.datInput parameters:timespan: 1-------------duration: 30swarm up time: 5scool down time: 0smeasuring latencyrandom seed: 0path: 'c:\iotest.dat'think time: 0msburst size: 0software cache disabledhardware write cache disabled, writethrough onwrite buffer size: 1073741824performing mix test (read/write ratio: 75/25)block size: 8192using random I/O (alignment: 8192)number of outstanding I/O operations: 4thread stride size: 0threads per file: 4using I/O Completion PortsIO priority: normalSystem information:computer name: Winstart time: 2021/02/27 13:53:11 UTCResults for timespan 1:*******************************************************************************actual test time: 30.01sthread count: 4proc count: 2CPU | Usage | User | Kernel | Idle------------------------------------------- 0| 23.02%| 7.92%| 15.10%| 76.98% 1| 24.43%| 14.64%| 9.79%| 75.57%-------------------------------------------avg.| 23.72%| 11.28%| 12.45%| 76.28%Total IOthread | bytes | I/Os | MiB/s | I/O per s | AvgLat | LatStdDev | file----------------------------------------------------------------------------------------------------- 0 | 84271104 | 10287 | 2.68 | 342.73 | 11.662 | 16.062 | c:\iotest.dat (20GiB) 1 | 81010688 | 9889 | 2.57 | 329.47 | 12.127 | 17.012 | c:\iotest.dat (20GiB) 2 | 84172800 | 10275 | 2.67 | 342.33 | 11.676 | 16.164 | c:\iotest.dat (20GiB) 3 | 80904192 | 9876 | 2.57 | 329.04 | 12.142 | 17.595 | c:\iotest.dat (20GiB)-----------------------------------------------------------------------------------------------------total: 330358784 | 40327 | 10.50 | 1343.58 | 11.897 | 16.710Read IOthread | bytes | I/Os | MiB/s | I/O per s | AvgLat | LatStdDev | file----------------------------------------------------------------------------------------------------- 0 | 62570496 | 7638 | 1.99 | 254.48 | 11.267 | 16.182 | c:\iotest.dat (20GiB) 1 | 60710912 | 7411 | 1.93 | 246.91 | 11.872 | 16.114 | c:\iotest.dat (20GiB) 2 | 63102976 | 7703 | 2.01 | 256.64 | 11.461 | 16.900 | c:\iotest.dat (20GiB) 3 | 60448768 | 7379 | 1.92 | 245.85 | 12.000 | 18.401 | c:\iotest.dat (20GiB)-----------------------------------------------------------------------------------------------------total: 246833152 | 30131 | 7.84 | 1003.88 | 11.645 | 16.920Write IOthread | bytes | I/Os | MiB/s | I/O per s | AvgLat | LatStdDev | file----------------------------------------------------------------------------------------------------- 0 | 21700608 | 2649 | 0.69 | 88.26 | 12.802 | 15.654 | c:\iotest.dat (20GiB) 1 | 20299776 | 2478 | 0.65 | 82.56 | 12.891 | 19.429 | c:\iotest.dat (20GiB) 2 | 21069824 | 2572 | 0.67 | 85.69 | 12.321 | 13.705 | c:\iotest.dat (20GiB) 3 | 20455424 | 2497 | 0.65 | 83.19 | 12.560 | 14.952 | c:\iotest.dat (20GiB)-----------------------------------------------------------------------------------------------------total: 83525632 | 10196 | 2.65 | 339.70 | 12.643 | 16.050total: %-ile | Read (ms) | Write (ms) | Total (ms)---------------------------------------------- min | 0.442 | 1.430 | 0.442 25th | 7.628 | 8.612 | 7.870 50th | 9.215 | 10.198 | 9.463 75th | 10.993 | 11.980 | 11.277 90th | 14.605 | 15.977 | 14.952 95th | 22.197 | 24.319 | 22.712 99th | 68.312 | 70.150 | 68.5433-nines | 285.154 | 274.683 | 285.1544-nines | 468.722 | 467.886 | 468.7225-nines | 473.159 | 472.866 | 473.1596-nines | 473.159 | 472.866 | 473.1597-nines | 473.159 | 472.866 | 473.1598-nines | 473.159 | 472.866 | 473.1599-nines | 473.159 | 472.866 | 473.159 max | 473.159 | 472.866 | 473.159
References
[1] CrowdStrike, 2020 Global Threat Report. Sunnyvale, CA, USA: CrowdStrike, Inc., 2020.
[2] Cybersecurity and Infrastructure Security Agency, “Protecting against ransomware,” Security Tip ST19-001, Apr. 11, 2019. [Online]. Available: https://www.cisa.gov/news-events/news/protecting-against-ransomware
[3] The Hacker News, “Everything you need to know about evolving threat of ransomware,” thehackernews.com, Feb. 2021. [Online]. Available: https://thehackernews.com/2021/02/everything-you-need-to-know-about.html
[4] PurpleSec, “Ransomware statistics, data, and trends,” 2021. [Online]. Available: https://purplesec.us/resources/cyber-security-statistics/ransomware/
[5] G. Hull, H. John, and B. Arief, “Ransomware deployment methods and analysis: Views from a predictive model and human responses,” Crime Science, vol. 8, no. 2, 2019, doi: 10.1186/s40163-019-0097-9.
[6] E. Berrueta, D. Morato, E. Magana, and M. Izal, “A survey on detection techniques for cryptographic ransomware,” IEEE Access, vol. 7, pp. 144925-144944, 2019, doi: 10.1109/ACCESS.2019.2945839.
[7] B. Scott, “Case for HCI in the modern datacenter,” MyPureSupport Community, 2017. [Online]. Available: https://community.mypuresupport.com/case-for-hci-over-legacy-3-tier/
[8] M. S. Abbasi, H. Al-Sahaf, and I. Welch, “Particle swarm optimization: A wrapper-based feature selection method for ransomware detection and classification,” in Applications of Evolutionary Computation (EvoApplications 2020), Lecture Notes in Computer Science, vol. 12104. Cham, Switzerland: Springer, 2020, pp. 181-196, doi: 10.1007/978-3-030-43722-0_12.
[9] O. M. K. Alhawi, J. Baldwin, and A. Dehghantanha, “Leveraging machine learning techniques for Windows ransomware network traffic detection,” in Cyber Threat Intelligence, Advances in Information Security, vol. 70. Cham, Switzerland: Springer, 2018, pp. 93-106, doi: 10.1007/978-3-319-73951-9_5.
[10] R. Moussaileb, N. Cuppens, J.-L. Lanet, and H. Le Bouder, “Ransomware network traffic analysis for pre-encryption alert,” in Foundations and Practice of Security (FPS 2019), Lecture Notes in Computer Science, vol. 12056. Cham, Switzerland: Springer, 2020, pp. 20-38, doi: 10.1007/978-3-030-45371-8_2.
[11] G. Cusack, O. Michel, and E. Keller, “Machine learning-based detection of ransomware using SDN,” in Proc. 2018 ACM Int. Workshop on Security in Software Defined Networks & Network Function Virtualization (SDN-NFV Sec), 2018, pp. 1-6, doi: 10.1145/3180465.3180467.
[12] H. Daku, P. Zavarsky, and Y. Malik, “Behavioral-based classification and identification of ransomware variants using machine learning,” in Proc. 2018 17th IEEE Int. Conf. Trust, Security and Privacy in Computing and Communications / 12th IEEE Int. Conf. Big Data Science and Engineering (TrustCom/BigDataSE), 2018, pp. 1560-1564, doi: 10.1109/TrustCom/BigDataSE.2018.00224.
[13] S. K. Shaukat and V. J. Ribeiro, “RansomWall: A layered defense system against cryptographic ransomware attacks using machine learning,” in Proc. 2018 10th Int. Conf. Communication Systems & Networks (COMSNETS), 2018, pp. 356-363, doi: 10.1109/COMSNETS.2018.8328219.
[14] G. Ramesh and A. Menen, “Automated dynamic approach for detecting ransomware using finite-state machine,” Decision Support Systems, vol. 138, art. 113400, 2020, doi: 10.1016/j.dss.2020.113400.
[15] B. A. S. Al-rimy, M. A. Maarof, and S. Z. M. Shaid, “Ransomware threat success factors, taxonomy, and countermeasures: A survey and research directions,” Computers & Security, vol. 74, pp. 144-166, 2018, doi: 10.1016/j.cose.2018.01.001.
[16] D. W. Fernando, N. Komninos, and T. Chen, “A study on the evolution of ransomware detection using machine learning and deep learning techniques,” IoT, vol. 1, no. 2, pp. 551-604, 2020, doi: 10.3390/iot1020030.
[17] O. Or-Meir, N. Nissim, Y. Elovici, and L. Rokach, “Dynamic malware analysis in the modern era: a state of the art survey,” ACM Computing Surveys, vol. 52, no. 5, art. 88, pp. 1-48, 2019, doi: 10.1145/3329786.
[18] A. Mohaisen, O. Alrawi, and M. Mohaisen, “AMAL: High-fidelity, behavior-based automated malware analysis and classification,” Computers & Security, vol. 52, pp. 251-266, 2015, doi: 10.1016/j.cose.2015.04.001.
[19] D. Sgandurra, L. Munoz-Gonzalez, R. Mohsen, and E. C. Lupu, “Automated dynamic analysis of ransomware: Benefits, limitations and use for detection,” arXiv:1609.03020, Sep. 2016.
[20] M. E. Ahmed, H. Kim, S. Camtepe, and S. Nepal, “Peeler: Profiling kernel-level events to detect ransomware,” in Computer Security: ESORICS 2021, Lecture Notes in Computer Science, vol. 12972. Cham, Switzerland: Springer, 2021, pp. 240-260, doi: 10.1007/978-3-030-88418-5_12.
[21] A. Y. Huang, “Towards robust malware detection,” M.Eng. thesis, Dept. Electr. Eng. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, USA, 2018.
[22] A. Fattori, A. Lanzi, D. Balzarotti, and E. Kirda, “Hypervisor-based malware protection with AccessMiner,” Computers & Security, vol. 52, pp. 33-50, 2015, doi: 10.1016/j.cose.2015.03.007.
[23] N. Paul, S. Gurumurthi, and D. Evans, “Towards disk-level malware detection,” in Proc. Workshop on Code Based Software Security Assessments (CoBaSSA), 2005.
[24] S. Baek, Y. Jung, A. Mohaisen, S. Lee, and D. Nyang, “SSD-Insider: Internal defense of solid-state drive against ransomware with perfect data recovery,” in Proc. 2018 IEEE 38th Int. Conf. Distributed Computing Systems (ICDCS), 2018, pp. 875-884, doi: 10.1109/ICDCS.2018.00089.
[25] W. Xie, N. Chen, and B. Chen, “Poster: Incorporating malware detection into flash translation layer,” in Proc. 2020 IEEE Symp. Security and Privacy (Poster Session), 2020.
[26] A. Continella, A. Guagnelli, G. Zingaro, G. De Pasquale, A. Barenghi, S. Zanero, and F. Maggi, “ShieldFS: A self-healing, ransomware-aware filesystem,” in Proc. 32nd Annu. Computer Security Applications Conf. (ACSAC), 2016, pp. 336-347, doi: 10.1145/2991079.2991110.
[27] N. Scaife, H. Carter, P. Traynor, and K. R. B. Butler, “CryptoLock (and drop it): Stopping ransomware attacks on user data,” in Proc. 2016 IEEE 36th Int. Conf. Distributed Computing Systems (ICDCS), 2016, pp. 303-312, doi: 10.1109/ICDCS.2016.46.
[28] A. Kharraz, W. Robertson, D. Balzarotti, L. Bilge, and E. Kirda, “Cutting the Gordian knot: A look under the hood of ransomware attacks,” in Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2015), Lecture Notes in Computer Science, vol. 9148. Cham, Switzerland: Springer, 2015, pp. 3-24, doi: 10.1007/978-3-319-20550-2_1.
[29] D. Sebayan, “How threat modeling can prevent your next ransomware attack,” ThreatModeler, 2019. [Online]. Available: https://threatmodeler.com/
[30] Datacadamia, “I/O: workload (access pattern),” 2019. [Online]. Available: https://datacadamia.com/io/access_pattern
[31] J. Layton, “IO patterns: what you do not know can hurt you,” Enterprise Storage Forum, 2013. [Online]. Available: https://www.enterprisestorageforum.com/management/io-patterns-what-you-dont-know-can-hurt-you/
[32] C. Rossow, C. J. Dietrich, C. Grier, C. Kreibich, V. Paxson, N. Pohlmann, H. Bos, and M. van Steen, “Prudent practices for designing malware experiments: Status quo and outlook,” in Proc. 2012 IEEE Symp. Security and Privacy, 2012, pp. 65-79, doi: 10.1109/SP.2012.14.
[33] Cuckoo Foundation, “Preparing the host: Cuckoo Sandbox v2.0.7 book,” 2019. [Online]. Available: https://cuckoo.readthedocs.io/en/latest/installation/host/
[34] D. Murchison, “Home lab series: Cuckoo Sandbox on ESXi,” murchisd.github.io, Jan. 25, 2019. [Online]. Available: https://murchisd.github.io/pr0j3cts/2019/01/25/Cuckoo-Sandbox-and-ESXi.html
[35] EF Education First, “1000 most common words in English,” 2015. [Online]. Available: https://www.ef.com/wwen/english-resources/english-vocabulary/top-1000-words/
[36] S. Canny, “python-docx documentation,” 2013. [Online]. Available: https://python-docx.readthedocs.io/
[37] A. Arrington, “Automate Google image downloads with Python,” Medium, Apr. 19, 2020. [Online]. Available: https://medium.com/@austin_9875/automate-google-image-downloads-with-python-91b633130ba9
[38] C. Zita, “How to download Google images using Python (2021),” Level Up Coding (Medium), Jan. 25, 2021. [Online]. Available: https://levelup.gitconnected.com/how-to-download-google-images-using-python-2021-82e69c637d59
[39] DataRobot, “Feature variables,” DataRobot AI Wiki, 2019. [Online]. Available: https://www.datarobot.com/wiki/
[40] W. Arbash, “Dataset vs ground-truth dataset,” wao.ai, 2019. [Online]. Available: https://wao.ai/blog/dataset-vs-ground-truth-dataset
[41] Nutanix, “API reference,” Nutanix.dev, 2020. [Online]. Available: https://www.nutanix.dev/api-reference/
[42] Wikipedia contributors, “Robustness (computer science),” Wikipedia, The Free Encyclopedia, 2019. [Online]. Available: https://en.wikipedia.org/wiki/Robustness_(computer_science)
[43] G. Berry, “Using Microsoft DiskSpd to test your storage subsystem,” SQLPerformance.com, Aug. 4, 2015. [Online]. Available: https://sqlperformance.com/2015/08/io-subsystem/diskspd-test-storage
[44] J. Yi, “Use DISKSPD to test workload storage performance,” Azure Stack HCI Documentation, Microsoft Learn, 2020. [Online]. Available: https://learn.microsoft.com/azure-stack/hci/manage/diskspd-overview
[45] B. Sjerps, “Pinpointing I/O bottlenecks on Linux,” Dirty Cache, Mar. 4, 2011. [Online]. Available: https://bartsjerps.wordpress.com/2011/03/04/io-bottleneck-linux/
[46] Wikipedia contributors, “Memory access pattern,” Wikipedia, The Free Encyclopedia, 2020. [Online]. Available: https://en.wikipedia.org/wiki/Memory_access_pattern
[47] G. Holmes, A. Donkin, and I. H. Witten, “WEKA: A machine learning workbench,” in Proc. 2nd Australia and New Zealand Conf. Intelligent Information Systems (ANZIIS), 1994, pp. 357-361.
[48] Microsoft, “Create machine learning models,” Microsoft Learn Training, 2020. [Online]. Available: https://learn.microsoft.com/training/paths/create-machine-learn-models/
[49] Microsoft, “What is automated machine learning (AutoML)?,” Azure Machine Learning Documentation, Microsoft Learn, 2020. [Online]. Available: https://learn.microsoft.com/azure/machine-learning/concept-automated-ml
[50] Microsoft, “Tutorial: Train a classification model with no-code automated ML in the Azure Machine Learning studio,” Microsoft Learn, 2020. [Online]. Available: https://learn.microsoft.com/azure/machine-learning/tutorial-first-experiment-automated-ml
[51] F. Lazzeri, “How to select algorithms for Azure Machine Learning,” Microsoft Learn, 2020. [Online]. Available: https://learn.microsoft.com/azure/machine-learning/how-to-select-algorithms
[52] Microsoft, “PCA-based anomaly detection (ML Studio classic),” Azure Machine Learning Studio Module Reference, 2019. [Online]. Available: https://learn.microsoft.com/previous-versions/azure/machine-learning/studio-module-reference/pca-based-anomaly-detection
[53] Microsoft, “Train and evaluate classification models,” Microsoft Learn Training, 2020. [Online]. Available: https://learn.microsoft.com/training/modules/train-evaluate-classification-models/
[54] C. Moore, “Detecting ransomware with honeypot techniques,” in Proc. 2016 Cybersecurity and Cyberforensics Conf. (CCC), 2016, pp. 77-81, doi: 10.1109/CCC.2016.14.
[55] Kaspersky, “What is a honeypot?,” Kaspersky Resource Center, 2020. [Online]. Available: https://usa.kaspersky.com/resource-center/threats/what-is-a-honeypot
[56] C. Hosterman, “The case for vVols and ransomware,” codyhosterman.com, Mar. 17, 2020. [Online]. Available: https://www.codyhosterman.com/2020/03/the-case-for-vvols-and-ransomware/
[57] T. Rayner, “Simulating a ransomware attack with PowerShell,” CanITPro Blog, Microsoft TechNet, Jan. 27, 2016. [Online]. Available: https://learn.microsoft.com/archive/blogs/canitpro/simulating-a-ransomware-attack-with-powershell













