CoopIS 2024 Abstracts


Area 1 - Human Aspects and Social Interaction in Information Systems

Full Papers
Paper Nr: 23
Title:

Using Eye-Tracking to Detect Search and Inference During Process Model Comprehension

Authors:

Amine Abbad-Andaloussi, Clemens Schreiber and Barbara Weber

Abstract: Understanding process models involves different cognitive processes. These processes typically manifest in users’ visual behavior and can thus be captured using eye-tracking. In this paper, we focus on the detection of two essential behaviors: information search and inference. Using a set of eye-tracking features that allow us to discern these two behaviors, we train several machine learning (ML) models to predict whether the user is engaged in a search phase or an inference phase. Following a cross-validation approach inspired by the leave-one-out method, our ML models attain 85% precision, 82% recall, and an F1 score of 80%. The outcome of this work enables the creation of novel adaptive systems that detect whether the user is in a search or inference phase and accordingly provide adequate support. Moreover, it opens up new opportunities to better understand how different process model, tool, user, and task-related factors affect users’ search and inference behaviors.
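
The following is a minimal, illustrative sketch of the kind of pipeline the abstract describes: classifying windows of eye-tracking features as search vs. inference with participant-wise cross-validation. The features, data, and classifier are placeholder assumptions, not the authors’ actual setup; scikit-learn’s LeaveOneGroupOut stands in for the leave-one-out-inspired scheme.

```python
# Sketch: classify feature windows as "search" vs. "inference" from eye-tracking
# features, validated with leave-one-participant-out splits. Feature names and
# data are illustrative placeholders, not the paper's features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_validate

rng = np.random.default_rng(0)
n = 600
X = np.column_stack([
    rng.normal(220, 60, n),   # mean fixation duration (ms), assumed feature
    rng.normal(4.5, 1.5, n),  # saccade amplitude (deg), assumed feature
    rng.poisson(12, n),       # fixation count per window, assumed feature
])
y = rng.integers(0, 2, n)        # 0 = search, 1 = inference (synthetic labels)
groups = rng.integers(0, 20, n)  # participant id, one group per participant

scores = cross_validate(
    RandomForestClassifier(n_estimators=200, random_state=0),
    X, y, groups=groups, cv=LeaveOneGroupOut(),
    scoring=("precision", "recall", "f1"),
)
print({k: v.mean() for k, v in scores.items() if k.startswith("test_")})
```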

Area 2 - Inductive Learning, Machine-Learning and Knowledge Discovery

Full Papers
Paper Nr: 50
Title:

FleX: Interpreting Graph Neural Networks with Subgraph Extraction and Flexible Objective Estimation

Authors:

Duy Nguyen, Thanh Le and Bac Le

Abstract: Graph Neural Networks (GNNs) have shown remarkable results in graph-related tasks, yet interpreting their decision-making process remains challenging. Most existing methods for interpreting GNNs focus on finding a subgraph that preserves the model’s predictions. However, removing edges alters the original structure, making the optimization process heavily dependent on the loss function. In this paper, we propose FleX, a novel approach that transcends these limitations by using a distillation model to estimate prediction values after subgraph extraction. Our method combines implicit and explicit edge masking techniques to identify the most relevant subgraph. We introduce a flexible loss estimation strategy that allows for a more robust optimization process. Experimental results demonstrate that FleX outperforms most existing Graph XAI models across various benchmark datasets, achieving superior performance in interpreting GNNs. This approach enhances the interpretability of GNNs while maintaining high accuracy, contributing to more trustworthy and explainable graph-based machine learning models.

Paper Nr: 78
Title:

Automating Pathway Extraction from Clinical Guidelines: A Conceptual Model, Datasets and Initial Experiments

Authors:

Daniel Grathwol, Han van der Aa and Hugo A. López

Abstract: Clinical pathways are structured, multidisciplinary care plans utilized by healthcare providers to standardize the management of specific clinical problems. Designed to bridge the gap between evidence and practice, clinical pathways aim to enhance clinical outcomes and improve efficiency, often reducing hospital stays and lowering healthcare costs. However, maintaining pathways with up-to-date, evidence-based recommendations is complex and time-consuming. It requires the integration of clinical guidelines, algorithmic procedures, and tacit knowledge from various institutions. A critical aspect of updating clinical pathways involves extracting procedural information from clinical guidelines, which are textual documents that detail medical procedures. This paper explores how Large Language Models (LLMs) can facilitate this extraction to support clinical pathway development and maintenance. Concretely, we present a conceptual model for using LLMs in this extraction task, provide a dataset comprising thousands of clinical guidelines for academic research, and share the results of initial experiments demonstrating the efficacy of LLMs in extracting relevant pathway information from these guidelines.

Short Papers
Paper Nr: 84
Title:

Discovering Order-Inducing Features in Event Knowledge Graphs

Authors:

Christoffer O. Back and Jakob G. Simonsen

Abstract: Event knowledge graphs (EKG) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG discovery from uncurated data through a principled, probabilistic framing based on feature-derived partial orders on events. We derive an EKG discovery algorithm based on statistical inference rather than ad-hoc or heuristic strategies or manual analysis by domain experts. This approach comes at the computational cost of exploring a large, non-convex hypothesis space. In particular, evaluating the likelihood term involves counting the number of linear extensions of posets, which in general is #P-complete. Fortunately, bound estimates suffice for model comparison and admit incorporation into a bespoke branch-and-bound algorithm. We show that the posterior probability as defined is antitonic w.r.t. search depth for branching rules that are monotonic w.r.t. model inclusion. This allows pruning of large portions of the search space, which we show experimentally leads to rapid convergence toward optimal solutions that are consistent with manually built EKGs.
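
To make the computational obstacle concrete: counting linear extensions exactly is #P-complete, and even a straightforward recursive counter blows up quickly, which is why the approach relies on bound estimates inside branch-and-bound. The sketch below is illustrative only and is not the paper’s algorithm.

```python
# Illustrative only: exact counting of linear extensions by repeatedly removing
# minimal elements. Exponential in general (the problem is #P-complete), which
# is why the paper works with bound estimates instead.
from functools import lru_cache

def count_linear_extensions(elements, less_than):
    """elements: tuple of items; less_than: set of (a, b) cover pairs meaning a < b."""
    @lru_cache(maxsize=None)
    def count(remaining):
        if not remaining:
            return 1
        total = 0
        rem = set(remaining)
        for x in remaining:
            # x is minimal if no other remaining element precedes it
            if not any((y, x) in less_than for y in rem if y != x):
                total += count(tuple(e for e in remaining if e != x))
        return total
    return count(tuple(elements))

# Example: the "diamond" poset a < b, a < c, b < d, c < d has 2 linear extensions.
order = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}
print(count_linear_extensions(("a", "b", "c", "d"), order))  # -> 2
```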

Area 3 - Knowledge Graphs, Data, Information, and Knowledge Engineering

Full Papers
Paper Nr: 34
Title:

A User-Driven Hybrid Neuro-Symbolic Approach for Knowledge Graph Creation from Relational Data

Authors:

Jan-David Stütz, Oliver Karras, Allard Oelen and Sören Auer

Abstract: In all kinds of organizations, relational data is prevalent and ubiquitous in a plethora of systems. However, the integration and exchange of such data is cumbersome, time-consuming, and error-prone. Semantic technologies, such as ontologies, knowledge graphs (KGs), and linked data, were developed to facilitate this but require comprehensive technical skills and complex methods for mapping relational data to semantic formalisms. Naturally, this process lacks speed, scalability, and automation. This work presents a novel user-driven neuro-symbolic approach to transform relational data into KGs. In our approach, users are supported by neural models (in particular Large Language Models) and symbolic formalisms (ontologies and mappings) to automate various mapping tasks and thus speed up and scale up the transformation from relational to linked data. We implemented our approach in a comprehensive intelligent assistant dubbed LXS. Our experimental evaluation, conducted primarily with participants from Robert Bosch GmbH, demonstrates enhanced mapping quality compared to manual creation, a competing application, and AI-only generation. Additionally, it significantly reduces user interaction time by nearly half, independent of the user’s experience level. Qualitatively, users also appreciated the attractiveness and novelty of the user interface. Furthermore, the neuro-symbolic approach of LXS contributes to a more trustworthy human-AI interaction, since it keeps users in the loop and provides transparency in the transformation process.
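
As an illustration of the mapping step that such an assistant automates, the sketch below turns one relational row into RDF triples with rdflib. The table layout, column names, and vocabulary are invented for this example and are not LXS’s actual mapping formalism.

```python
# Minimal sketch of mapping one relational row to RDF triples with rdflib.
# Table layout, column names, and the example vocabulary are invented here;
# the paper's LXS assistant automates choices like these with LLM support.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF, XSD

EX = Namespace("http://example.org/")                     # hypothetical base namespace
row = {"id": 42, "name": "Ada Lovelace", "dept": "R&D"}   # one relational row

g = Graph()
subject = URIRef(EX[f"employee/{row['id']}"])
g.add((subject, RDF.type, FOAF.Person))
g.add((subject, FOAF.name, Literal(row["name"], datatype=XSD.string)))
g.add((subject, EX.department, Literal(row["dept"])))

print(g.serialize(format="turtle"))
```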

Short Papers
Paper Nr: 60
Title:

Enhancing Temporal Knowledge Graph Reasoning with Contrastive Learning and Self-Attention Mechanisms

Authors:

Bao T. Kim and Thanh Le

Abstract: Recent advancements in reasoning over Temporal Knowledge Graphs have leveraged historical data to forecast future events more effectively. Traditional models primarily rely on the recurrence and periodicity of events, using past occurrences to predict future ones. These methods often use a self-relation mechanism to account for the influence of timestamps in predictions but typically overlook the significance of interconnected entities within the temporal framework. Addressing this oversight, we introduce a new model called CA-GCN, which is based on a relational graph convolution network. This model not only taps into historical data through a self-attention mechanism but also integrates previously unseen static information. It further extracts insights from the graph’s structure using contrastive learning techniques. The embeddings generated by our model are utilized to train a linear binary classifier, aimed at identifying entities crucial for future predictions. Our model demonstrates substantial improvements, showing up to a 5.78% increase in Mean Reciprocal Rank (MRR) and a 10.88% rise in Hits@1 accuracy, when tested across several standard datasets such as ICEWS14, ICEWS18, YAGO, and WIKI. These results indicate that CA-GCN significantly outperforms existing models, providing enhanced predictive accuracy in various evaluation metrics.
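
For readers unfamiliar with the contrastive ingredient mentioned above, the sketch below shows a generic InfoNCE-style loss over two embedding views of the same entities; it illustrates the principle only and is not the CA-GCN objective.

```python
# Generic InfoNCE-style contrastive loss over two embedding "views" of the same
# entities; illustrates the contrastive-learning ingredient only, not the
# actual CA-GCN objective.
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.1):
    """view_a, view_b: (n, d) embeddings of the same n entities under two views."""
    a = F.normalize(view_a, dim=1)
    b = F.normalize(view_b, dim=1)
    logits = a @ b.t() / temperature     # (n, n) pairwise similarity matrix
    targets = torch.arange(a.size(0))    # entity i in view_a matches i in view_b
    return F.cross_entropy(logits, targets)

a = torch.randn(8, 32)
b = a + 0.05 * torch.randn(8, 32)        # slightly perturbed second view
print(info_nce(a, b).item())
```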

Paper Nr: 63
Title:

Graph Convolution Transformer for Extrapolated Reasoning on Temporal Knowledge Graphs

Authors:

Hoa Dao, Nguyen Phan and Thanh Le

Abstract: Extrapolation on Temporal Knowledge Graphs presents a critical challenge, driven by its applications in predicting future events by analyzing historical data. While recent methods leverage graph structure and temporal dynamics, they often struggle to prioritize neighborhood messages and capture evolving temporal attributes at local and global scales. To address these issues, we introduce a novel forecasting architecture, named Graph Convolution Transformer, which incorporates a time-aware self-attention mechanism. Our approach integrates a Fact Graph Transformer to structure historical data and a Temporal Transformer with advanced position encoding for enhanced time series representation. Also, we propose Query-ConvTransE in the decoder to handle query-based data. Extensive evaluations across six benchmark datasets demonstrate that the model outperforms prior approaches, improving the Mean Reciprocal Rank metric by roughly 2% to 3%, with a notable advancement of 5.23% on the GDELT dataset experiment.

Area 4 - Process Analytics and Technology

Full Papers
Paper Nr: 21
Title:

Handling Catastrophic Forgetting: Online Continual Learning for Next Activity Prediction

Authors:

Tamara Verbeek and Marwan Hassani

Abstract: Predictive business process monitoring focuses on predicting future process trajectories, including next-activity predictions. This is crucial in dynamic environments where processes change or face uncertainty. However, current frameworks often assume a static environment, overlooking dynamic characteristics and concept drifts. This results in catastrophic forgetting, where training that focuses solely on the new data distribution degrades performance on previously learned data distributions. Continual learning addresses, among other challenges, the mitigation of catastrophic forgetting. This paper proposes a novel approach called Continual Next Activity Prediction with Prompts (CNAPwP), which adapts the DualPrompt algorithm to next-activity prediction to improve accuracy and adaptability while mitigating catastrophic forgetting. New datasets with recurring concept drifts are introduced, alongside a task-specific forgetting metric that measures the prediction accuracy gap between initial and subsequent task encounters. Extensive testing on both synthetic and real-world datasets shows that this approach outperforms five competing methods, demonstrating its potential applicability in real-world scenarios. An open-source implementation of our method, together with the datasets and results, is available at: https://github.com/TamaraVerbeek/CNAPwP

Paper Nr: 25
Title:

SwiftMend: An Approach to Detect and Repair Activity Label Quality Issues in Process Event Streams

Authors:

Savandi Kalukapuge, Arthur M. ter Hofstede and Moe T. Wynn

Abstract: Process mining (PM) techniques extract insights from event logs to discover, monitor, and improve business processes. The quality of the input data significantly impacts the reliability and accuracy of these insights. Existing approaches to detect and repair data quality issues are limited to offline data pre-processing. Given the potential of real-time process analysis to provide valuable business-related insights, online PM has gained interest. However, process-data quality (PDQ) issues in process event streams (PES) beyond anomalous events or traces have not yet been addressed. Existing PDQ management approaches lack the adaptability and incremental processing capabilities necessary for streaming event data and evolving processes. This paper presents a novel approach for dynamically detecting and repairing synonymous, polluted, and distorted activity labels in PES, which are common issues affecting the quality of PM outcomes. By incrementally maintaining a stabilised activity control-flow context using memory-efficient approximate data structures, the approach detects and merges similar labels or splits dissimilar labels on the fly. An incremental hierarchical clustering algorithm, incorporating decaying and forgetting mechanisms, is employed for the dynamic repair of similar activities, ensuring efficiency and adaptability. The approach is validated using publicly available real-life logs from two hospitals.

Paper Nr: 41
Title:

Event Log Extraction for Process Mining Using Large Language Models

Authors:

Vinicius Stein Dani, Marcus Dees, Henrik Leopold, Kiran Busch, Iris Beerepoot, Jan M. van der Werf and Hajo A. Reijers

Abstract: Process mining is a discipline that enables organizations to discover and analyze their work processes. A prerequisite for conducting a process mining initiative is the so-called event log, which is not always readily available. In such cases, extracting an event log involves various time-consuming tasks, such as creating tailor-made structured query language (SQL) scripts to extract an event log from a relational database. With this work, we investigate the use of large language models (LLMs) to support event log extraction, particularly by leveraging LLMs’ ability to produce SQL scripts. In this paper, we report on how effectively an LLM can assist with event log extraction for process mining. Despite the intrinsically non-deterministic nature of LLMs, our results show the potential of future LLM-assisted event log extraction tools, especially when domain and data knowledge are available. Such tools can broaden access to event log extraction within an organization by reducing the reliance on specialized technical skills for producing relational database query scripts and by minimizing manual effort.
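
A minimal sketch of the idea follows, with a placeholder `call_llm` function and an invented schema, prompt, and canned response; none of this reflects the setup evaluated in the paper.

```python
# Sketch of asking an LLM for an event-log extraction query. `call_llm` is a
# placeholder for whatever chat-completion client is used; the schema, prompt,
# and canned response are invented examples.
SCHEMA = """
orders(order_id, created_at, customer_id)
order_events(event_id, order_id, status, changed_at, changed_by)
"""

PROMPT = f"""You are given this relational schema:
{SCHEMA}
Write a SQL query that produces an event log with the columns
case_id, activity, timestamp, resource (one row per status change),
ordered by case_id and timestamp."""

def call_llm(prompt: str) -> str:
    # Placeholder: plug in a real chat-completion client here. The canned
    # answer below only illustrates the expected shape of the output.
    return ("SELECT order_id AS case_id, status AS activity, "
            "changed_at AS timestamp, changed_by AS resource "
            "FROM order_events ORDER BY case_id, timestamp;")

sql = call_llm(PROMPT)
print(sql)  # always review generated SQL before running it on the database
```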

Paper Nr: 46
Title:

All Optimal k-Bounded Alignments Using the FM-Index

Authors:

Astrid Rivera-Partida, Abel Armas-Cervantes, Luciano García-Bañuelos and Luis Rodríguez-Flores

Abstract: Alignments are a popular technique in process mining to compare pairs of process executions. Given a pair of process executions, an optimal alignment represents the commonalities with the minimum number of differences. The compared process executions are event sequences representing process model runs or traces in an event log. Alignments are used in different process mining operations, such as conformance checking and the comparison of event logs, a.k.a. variants analysis or log delta analysis. Given an event sequence, several optimal alignments can exist, but the majority of alignment techniques focus on computing a single (optimal) solution, often due to the exponential complexity associated with computing all optimal alignments. To tackle this problem, we present a novel approach to compute all k-bounded optimal alignments, which uses a text indexing technique called the FM-Index. Given a bound k, our approach computes optimal alignments with up to k differences. The approach is evaluated in the context of conformance checking and variants analysis using synthetic and real-life event logs. The results show the feasibility of computing all optimal alignments in reasonable time.
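
To illustrate what “k-bounded” means (not the FM-Index technique itself), the sketch below computes the minimum number of asynchronous moves between a log trace and a single model run with a plain dynamic program and checks it against a bound k. The paper instead computes all optimal alignments against a process model via the FM-Index.

```python
# Illustration of the k-bounded notion only: minimum number of asynchronous
# moves between a trace and one model run, checked against a bound k.
def alignment_cost(trace, run):
    INF = float("inf")
    n, m = len(trace), len(run)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sync = dp[i - 1][j - 1] if trace[i - 1] == run[j - 1] else INF
            dp[i][j] = min(dp[i - 1][j] + 1,   # log-only move (cost 1)
                           dp[i][j - 1] + 1,   # model-only move (cost 1)
                           sync)               # synchronous move (cost 0)
    return dp[n][m]

trace = ["a", "b", "d"]
run = ["a", "b", "c", "d"]
k = 2
cost = alignment_cost(trace, run)
print(cost, cost <= k)  # 1 True: one model-only move on "c"
```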

Paper Nr: 61
Title:

A Decomposed Hybrid Approach to Business Process Modeling with LLMs

Authors:

Ali Nour Eldin, Nour Assy, Olan Anesini, Benjamin Dalmas and Walid Gaaloul

Abstract: This paper proposes a hybrid and decomposed approach to automate process model generation from textual descriptions using Large Language Models (LLMs). Leveraging LLMs with prompting techniques is promising due to the scarcity of training data. While recent approaches explore LLMs’ potential in process modeling, the inherent complexity of this task limits their applicability to real-world scenarios where descriptions by non-experts may be complex or incomplete. Our approach addresses these challenges by modularizing the task into distinct steps within a hybrid pipeline: the LLM analyzes, clarifies, and completes the textual description, and extracts process entities and relationships. The process model is then constructed using a structured algorithm. This hybrid methodology integrates LLMs’ natural language understanding with a deterministic approach for robust model creation. Evaluation results demonstrate that our approach uses fewer tokens and generates more accurate and understandable models than existing methods.

Paper Nr: 69
Title:

Towards Fairness-Aware Predictive Process Monitoring: Evaluating Bias Mitigation Techniques

Authors:

Mickaelle Caldeira da Silva, Marcelo Fantinato and Sarajane M. Peres

Abstract: Predictive process monitoring (PPM) faces fairness issues due to biases in historical data, causing discriminatory practices. Balancing fairness and performance in PPM is crucial but underexplored, possibly requiring the adaptation of ML fairness techniques to process mining. This study assesses Reweighing, Adversarial Debiasing, and Equalized Odds Postprocessing to reduce discrimination in PPM models and understand their trade-offs. Using synthetic event logs of a hiring process with varying discrimination levels, we analyzed the models’ performance and fairness metrics. Reweighing improved fairness with minimal performance loss, Adversarial Debiasing greatly boosted fairness but reduced accuracy and recall, and Equalized Odds Postprocessing kept performance without notable fairness gains. Our study offers insights into applying fairness techniques in PPM, advancing equitable and effective predictive models.
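
For context, the sketch below shows the classic reweighing scheme evaluated here: each training instance is weighted by P(group) · P(label) / P(group, label) so that the protected attribute and the outcome become statistically independent in the weighted training set. Column names and data are illustrative, not the study’s event logs.

```python
# Sketch of the classic reweighing scheme: weight each training instance by
# P(group) * P(label) / P(group, label). Columns and data are illustrative.
import pandas as pd

df = pd.DataFrame({
    "gender":  ["f", "f", "m", "m", "m", "f"],   # protected attribute (example)
    "outcome": [1, 0, 1, 1, 0, 1],               # e.g., "case ends in hiring"
})

p_group = df["gender"].value_counts(normalize=True)
p_label = df["outcome"].value_counts(normalize=True)
p_joint = df.groupby(["gender", "outcome"]).size() / len(df)

df["weight"] = df.apply(
    lambda r: p_group[r["gender"]] * p_label[r["outcome"]]
              / p_joint[(r["gender"], r["outcome"])],
    axis=1,
)
print(df)  # pass `weight` as sample_weight when fitting the PPM classifier
```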

Paper Nr: 70
Title:

Assisted Data Annotation for Business Process Information Extraction from Textual Documents

Authors:

Julian Neuberger, Han van der Aa, Lars Ackermann, Daniel Buschek, Jannic Herrmann and Stefan Jablonski

Abstract: Machine-learning based generation of process models from natural language process descriptions provides a solution for the time-intensive and expensive process discovery phase. Many organizations have to carry out this phase before they can utilize business process management and its benefits. Yet, research in this direction is severely restrained by an apparent lack of large and high-quality datasets. This lack of data can be attributed to, among other things, an absence of proper tool assistance for creating business process information extraction datasets, resulting in high workloads and inferior data quality. We explore two assistance features to support dataset creation: a recommendation system for identifying process information in the text and a visualization of the current state of already identified process information as a graphical business process model. A controlled user study with 31 participants shows that assisting dataset creators with recommendations reduces all aspects of workload, by up to 51.0%, and significantly improves annotation quality, by up to 38.9% in F1 score. We make all data and code available to encourage further research on additional novel assistance strategies.

Paper Nr: 76
Title:

Unsupervised Anomaly Detection of Prefixes in Event Streams Using Online Autoencoders

Authors:

Zyrako Musaj and Marwan Hassani

Abstract: In this work, we address the problem of unsupervised online detection of anomalies in traces of logs. Our input is an event log containing multiple traces, where each trace is an ordered and finite sequence of activities. This problem is challenging because abnormal sequence patterns must be identified without labeled data and without the option of discarding individual events, since each instance is represented by a specific sequence of events. This requires methods that can adapt to evolving data streams and provide timely and accurate anomaly detection while efficiently managing limited memory resources. This paper presents an efficient unsupervised-learning method for online anomaly detection. We leverage a limited data structure to store prefixes. Event stream prefixes are transformed into vector representations using word2vec or one-hot encoding, which are fed into an online autoencoder. The discrepancy between input and output generates a reconstruction error, serving as an anomaly score. We also introduce Progressive Anomaly Labelling (PAL), a dynamic method for real-time anomaly detection that labels suffixes as anomalous once their prefix has been labelled as such. Our approach excels in detecting control-flow and data-flow anomalies, early anomaly identification, and reduced execution time, outperforming state-of-the-art online anomaly detection techniques. The implementation and the datasets are publicly available at https://github.com/zyrako4/sequence-online-ad
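
A minimal sketch of the core mechanism follows, with placeholder encoding, architecture, and threshold; the paper additionally uses word2vec representations and the PAL labelling scheme, which are not reproduced here.

```python
# Minimal sketch: encode a trace prefix, update a small autoencoder online, and
# use reconstruction error as the anomaly score. Encoding, architecture, and
# threshold are placeholders, not the paper's exact setup.
import torch
import torch.nn as nn

ACTIVITIES = ["register", "check", "approve", "reject", "notify"]
IDX = {a: i for i, a in enumerate(ACTIVITIES)}

def encode(prefix):
    """Bag-of-activities count vector for a prefix."""
    v = torch.zeros(len(ACTIVITIES))
    for a in prefix:
        v[IDX[a]] += 1.0
    return v

model = nn.Sequential(nn.Linear(len(ACTIVITIES), 3), nn.ReLU(),
                      nn.Linear(3, len(ACTIVITIES)))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
THRESHOLD = 0.5  # illustrative; the paper derives labels dynamically (PAL)

def observe(prefix):
    x = encode(prefix)
    err = loss_fn(model(x), x)
    opt.zero_grad(); err.backward(); opt.step()   # single online update
    return err.item() > THRESHOLD                 # True -> flag prefix as anomalous

stream = [["register"], ["register", "check"], ["reject", "reject", "reject"]]
print([observe(p) for p in stream])
```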

Paper Nr: 85
Title:

Autoencoder-Based Detection of Delays, Handovers and Workloads over High-Level Events

Authors:

Irne Verwijst, Robin Mennens, Roeland Scheepens and Marwan Hassani

Abstract: Detecting delays, anomalous work handovers, and high workloads is a challenging process mining task that is typically performed at the case level. However, process mining users would benefit from analyzing such behaviors at the process level where instances of such behavior are called high-level events. We propose a novel framework for high-level event mining that leverages anomaly detection and clustering methods to identify and analyze high-level events in an unsupervised setting. Our framework, called High-level Event Mining Machine Learning Approach (HEMMLA), utilizes an autoencoder-based anomaly detection method and requires no predefined time window or anomaly thresholds. An extensive experimental evaluation over real and synthetic datasets highlights the high scalability of our approach. An additional user study over real datasets underlines the ability of our framework to detect more interesting and explainable anomalies than the state-of-the-art.

Paper Nr: 86
Title:

Conversationally Actionable Process Model Creation

Authors:

Nataliia Klievtsova, Timotheus Kampik, Juergen Mangler and Stefanie Rinderle-Ma

Abstract: With the recent success of large language models, the idea of AI-augmented Business Process Management systems is becoming more feasible. One of their essential characteristics is the ability to be conversationally actionable, allowing humans to interact with the system effectively. However, most current research focuses on single-prompt execution and evaluation of results, rather than on continuous interaction between the user and the system. In this work, we explore the feasibility of using chatbots to empower domain experts in the creation and redesign of process models in an effective and iterative way. In particular, we experiment with the prompt design for a selection of redesign tasks on a collection of process models from the literature. The most effective prompt is then selected for a user study with domain experts and process modelers in order to assess the support provided by the chatbot in conversationally creating and redesigning a manufacturing process model. The results from the prompt design experiment and the user study are promising with respect to the correctness of the models and user satisfaction.

Short Papers
Paper Nr: 51
Title:

Predictive Process Approach for Email Response Recommendations

Authors:

Ralph B. Nader, Marwa Elleuch, Ikram Garfatta, Walid Gaaloul and Boualem Benatallah

Abstract: Process prediction requires analyzing traces to forecast future activities in a process. Traces can be found in information systems’ logs, such as email systems used by business actors. While email traces can aid in process prediction, their unstructured textual nature poses challenges for existing techniques. Additionally, predicting process-oriented emails goes beyond identifying future business process (BP) activities, as it also involves recommending the emails needed for BP actors to perform these activities. Current approaches to email prediction primarily focus on email management, with limited attention to BP contexts, and often only reach the BP discovery or email classification stages. This paper presents an overview of a novel process-activity aware email response recommendation system, designed to enhance both relevance and efficiency in business communications by offering BP knowledge and tailored response templates for incoming emails. The system provides specific recommendations on activities to include in responses, their intent (speech act), and associated business data. Unlike existing approaches, this work uniquely leverages unstructured email data to predict process activities for email responses and incorporates BP knowledge to offer BP-oriented guidance.

Paper Nr: 56
Title:

Achieving Fairness in Predictive Process Analytics via Adversarial Learning

Authors:

Massimiliano de Leoni and Alessandro Padella

Abstract: Predictive business process analytics has become important for organizations, offering real-time operational support for their processes. However, these algorithms often produce unfair predictions because they rely on biased variables (e.g., gender or nationality), namely variables embodying discrimination. This paper addresses the challenge of integrating a debiasing phase into predictive business process analytics to ensure that predictions are not influenced by biased variables. Our framework, which leverages adversarial debiasing, is evaluated on four use cases, showing a significant reduction in the contribution of biased variables to the predicted value. The proposed technique is also compared with the state of the art in fairness in process mining, illustrating that our framework achieves a higher level of fairness while retaining better prediction quality.
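
A compact sketch of the adversarial debiasing idea follows (illustrative only, not the paper’s framework): a predictor outputs the target, an adversary tries to recover the protected attribute from that output, and a gradient-reversal layer pushes the predictor to discard that information.

```python
# Sketch of adversarial debiasing with a gradient-reversal layer. Architecture
# and data are illustrative, not the paper's framework.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # reverse the gradient flowing to the predictor

predictor = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
opt = torch.optim.Adam(list(predictor.parameters()) + list(adversary.parameters()), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

X = torch.randn(64, 10)                      # case features (synthetic)
y = torch.randint(0, 2, (64, 1)).float()     # prediction target
s = torch.randint(0, 2, (64, 1)).float()     # protected attribute (e.g., gender)

for _ in range(100):
    y_hat = predictor(X)
    s_hat = adversary(GradReverse.apply(y_hat, 1.0))
    loss = bce(y_hat, y) + bce(s_hat, s)     # adversary loss flows back reversed
    opt.zero_grad(); loss.backward(); opt.step()
```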

Paper Nr: 65
Title:

Collaboration Miner: Discovering Collaboration Petri Nets

Authors:

Janik-Vasily Benzin and Stefanie Rinderle-Ma

Abstract: Collaborative discovery techniques mine models that represent the behavior of collaborating cases within multiple process orchestrations that interact via collaboration concepts such as organizations, agents, and services. In this work, we rely on collaboration Petri nets as models and propose the Collaboration Miner (CM) to improve the quality of the discovered models. Moreover, CM can discover heterogeneous collaboration concepts and types, such as resource sharing and message exchange, resulting in fitting and precise collaboration Petri nets. The evaluation, based on 26 artificial and real-world event logs, shows that CM achieves its design goals: it makes no assumptions on collaboration concepts and types and yields fitting and precise models.

Area 5 - Services and Cloud in Information Systems

Full Papers
Paper Nr: 30
Title:

TALOS: Task Level Autoscaler for Apache Flink

Authors:

Ourania Ntouni and Euripides G. Petrakis

Abstract: Apache Flink must scale its computational resources at run-time to comply with the real-time response requirements of fast-paced and changing workloads. TALOS is a task autoscaler designed to optimize the performance of Flink jobs while minimizing infrastructure usage costs in the cloud. Most autoscaling methods solve the resource adaptation problem by allocating new resources to the entire Flink job (i.e., pipeline). These solutions are suboptimal since not all tasks are equally stressed and thus not all need to be scaled, leading to over- or under-provisioning of resources. TALOS monitors each task individually to decide how to scale it based on its own data processing needs, without being affected by the performance of other upstream or downstream tasks. TALOS provides a better performance-to-cost ratio compared to the state-of-the-art Autoscaler of the Flink Kubernetes Operator. Both agents are tested on sophisticated workloads running a click fraud detection application for several hours.
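
As a toy illustration of per-task (rather than per-pipeline) scaling decisions, the sketch below derives each task’s parallelism from its own utilization relative to a target, independently of upstream or downstream tasks. The formula and numbers are illustrative, not TALOS’s actual policy.

```python
# Toy sketch of per-task scaling: each task's parallelism is derived from its
# own utilization relative to a target utilization, independently of other
# tasks in the pipeline.
import math

TARGET_UTILIZATION = 0.7   # fraction of time a subtask should spend busy

def desired_parallelism(current_parallelism: int, busy_ratio: float) -> int:
    """Scale so the observed load would land at the target utilization."""
    needed = current_parallelism * busy_ratio / TARGET_UTILIZATION
    return max(1, math.ceil(needed))

tasks = {"source": (2, 0.35), "enrich": (4, 0.95), "sink": (2, 0.60)}
for name, (parallelism, busy) in tasks.items():
    print(name, parallelism, "->", desired_parallelism(parallelism, busy))
# Only the overloaded "enrich" task is scaled up; the others keep or shed slots.
```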

Paper Nr: 36
Title:

Self-Organising Approach to Anomaly Mitigation in the Cloud-to-Edge Continuum

Authors:

Bruno Faria, David P. Abreu, Karima Velasquez and Marília Curado

Abstract: The cloud-to-edge continuum paradigm has permeated various application domains, including critical urban-city safety systems. In these contexts, anomalies can compromise public safety, for example, by disrupting the communication between smart city infrastructure and vehicles, which aims to prevent accidents at pedestrian crossings. Given these environments’ heterogeneous and large-scale nature, manual recovery from anomalies is not feasible. Machine Learning techniques have emerged as an alternative, supporting a zero-touch approach that enables self-organising and self-healing solutions for anomaly prediction, detection, and mitigation. This paper proposes an Artificial Intelligence-driven, self-organising approach for anomaly management in the cloud-to-edge continuum, integrating both reactive and proactive mechanisms. We evaluate different Machine Learning models, including Random Forest Classifiers, Neural Networks, and Convolutional Neural Networks, to predict node performance anomalies. The simulation results obtained using the COSCO framework showcase the effectiveness of our method. It achieves an F1 score of 73% for multiclass classification, predicting different levels of anomaly severity, and 87% for binary classification, distinguishing between normal and abnormal states.

Area 6 - Applications of AI augmented Information Systems

Short Papers
Paper Nr: 33
Title:

IML4DQ: Interactive Machine Learning for Data Quality with Applications in Credit Risk

Authors:

Elena Tiukhova, Adriano Salcuni, Can Oguz, Fabio Forte, Bart Baesens and Monique Snoeck

Abstract: Data Quality (DQ) has gained popularity in recent years due to the increasing reliance on data in machine learning (ML). The DQ domain itself can benefit from ML, which is able to learn from large amounts of data, saving the time and resources required by manual DQ assurance. To extend the accessibility of ML solutions and incorporate human input, Interactive ML (IML) integrates ML with a user interface (UI) that facilitates a human-in-the-loop approach. Both high-quality data and human involvement are critical in credit risk management (CRM), where poor DQ can lead to incorrect decisions, causing both ethical issues and financial losses. This paper introduces IML4DQ, a novel IML-based solution designed to ensure DQ in CRM through a dedicated UI. The IML4DQ design is grounded in established IML practices and key UI design principles. A rigorous evaluation using behavioral change theories reveals new insights into the significance of instrumental attitude and government- and management-based norms in shaping attitudes towards DQ in CRM, as well as a positive attitude towards automating DQ processes with IML.

Area 7 - Internet of Things, Cyber Physical Systems and Digital Twins

Short Papers
Paper Nr: 35
Title:

Optimizing B-trees for Memory-Constrained Flash Embedded Devices

Authors:

Nadir Ould-Khessal, Scott Fazackerley and Ramon Lawrence

Abstract: Small devices collecting data for agricultural, environmental, and industrial monitoring enable Internet of Things (IoT) applications. Given their critical role in data collection, there is a need for optimizations that improve on-device data processing. Edge computing allows processing of the data closer to where it is collected and reduces the amount of network transmission. The B-tree has been optimized for flash storage on servers and solid-state drives, but these optimizations often require hardware and memory resources not available on embedded devices. The contribution of this work is the development and experimental evaluation of multiple B-tree variants for memory-constrained embedded devices. Experimental results demonstrate that even the smallest devices can perform efficient B-tree indexing, and that there is a significant performance advantage to using storage-specific optimizations.