An Extended Firefly Algorithm for Enhanced Information Diffusion with Multi-Factor Considerations

Amjad Alloush; Ghaida Rebdawi; Mohammad Saeed Abou Trab

Abstract

Understanding and predicting how information spreads in online social networks is a crucial yet complex task, especially with the growing influence of content type, user engagement, and social dynamics. In this study, we propose an enhanced information diffusion model based on an Extended Modified Firefly Algorithm (EMFA), integrating four key features: content type, engagement level, temporal behavior, and user social attributes. Unlike classical models, which treat information diffusion uniformly, our approach adapts dynamically to the nature of content and the behavioral patterns of users. The proposed model is evaluated using multiple real-world datasets, including Twitter and Reddit, and compared against state-of-the-art optimization-based diffusion models such as PSO, ACO, and GWO. The experimental results show that incorporating these factors significantly improves the accuracy and realism of diffusion predictions. We also conducted a sensitivity analysis to assess the individual impact of each factor and demonstrated the model’s robustness in simulating viral trends and predicting peak diffusion times. This work contributes a refined and adaptive computational framework for simulating complex diffusion dynamics in modern social ecosystems and opens pathways for applications in rumor control, digital marketing, and social behavior forecasting.

Keywords: EMFA, Information Diffusion, Swarm Intelligence, Hybrid Metaheuristic.

INTRODUCTION

The rapid growth of online social networks (OSNs) has drastically transformed the dynamics of information sharing and public discourse. Platforms such as Twitter, Reddit, and Facebook enable users to produce, share, and react to information in real time, leading to large-scale information cascades that can influence opinions, shape behaviors, and even affect societal outcomes [1]. Understanding how information diffuses through these networks is essential for multiple domains, including public health, political communication, marketing, and misinformation detection [2,3].

Early research on information diffusion relied primarily on epidemic-inspired models, such as the Independent Cascade (IC) and Linear Threshold (LT) models [4,5], which conceptualize the spread of information analogously to disease transmission. While these models are intuitive and computationally efficient, they often fail to incorporate key social and contextual factors that influence user behavior.

To address these limitations, researchers have turned to optimization-based models, particularly those inspired by swarm intelligence. Algorithms such as Particle Swarm Optimization (PSO) [6], Ant Colony Optimization (ACO) [7], and the Firefly Algorithm (FA) [8] have been used to simulate diffusion dynamics, optimize influence maximization, and improve prediction of cascade growth. These algorithms offer flexibility and adaptability; however, most implementations simplify the network context by treating nodes and content as homogeneous, thus failing to represent realistic user interactions.

Several recent studies have proposed enhancements to these algorithms. For instance, Hsu et al. [9] introduced a hybrid ACO-GWO model for influence prediction, and Zhang et al. [10] incorporated topic modeling into diffusion forecasting. Yet, the inclusion of behavioral, temporal, and semantic features in metaheuristic-based diffusion models remains limited. Most current models neglect how content type (e.g., image vs. text), user engagement metrics (e.g., likes, shares), or posting time can dramatically alter the trajectory of information spread.

Another key limitation is the lack of personalized or socially aware modeling. Research by Huang et al. [11] showed that user credibility and influence scores play a significant role in the virality of information, yet such attributes are rarely encoded in swarm-based models. In addition, few studies have conducted a detailed sensitivity analysis to quantify the individual contribution of each factor to diffusion performance.

In our previous work [12], we introduced a Modified Firefly Algorithm (MFA) for modeling information diffusion. While the model demonstrated competitive accuracy compared to traditional methods, it assumed uniform content behavior and excluded temporal or social user attributes. In this work, we propose an Extended Modified Firefly Algorithm (EMFA) that incorporates four critical dimensions (content type, engagement level, temporal dynamics, and user social attributes) into the diffusion modeling process. Building on our earlier MFA framework, we enhance the algorithm by embedding feature-aware adaptation strategies that respond to real-time user behavior and content variations. The integration of semantic, temporal, and social factors enables more accurate and interpretable predictions of how and when information spreads across a network.

We evaluate the proposed model using real-world datasets from Twitter and Reddit, and benchmark it against leading metaheuristic-based diffusion models. The results demonstrate that the EMFA significantly outperforms baseline models in terms of prediction accuracy, diffusion realism, and sensitivity to external factors. Our contributions are threefold:

We develop an extended MFA model that integrates content, engagement, time, and user features into the diffusion process.
We conduct large-scale experiments using multi-platform datasets and compare results with state-of-the-art algorithms.
We analyze the sensitivity and robustness of each added factor, offering insights into the individual and combined effects on diffusion dynamics.

MATERIAL AND METHODS

Dataset Description

To evaluate the performance of the proposed Extended Modified Firefly Algorithm (EMFA), we utilized two real-world datasets:

Twitter Dataset: Extracted from the COVID-19 open research dataset (CORD-19) and filtered to include viral tweets related to health misinformation. The dataset includes tweet content, timestamps, engagement metrics (likes, retweets, replies), and user metadata (follower count, verification status, influence score).
Reddit Dataset: Sourced from multiple subreddits covering news and technology, capturing thread posts and comment cascades. Each record contains the post type (text/image/video), temporal metadata, and user engagement indicators (upvotes, replies).

All datasets were anonymized and preprocessed to remove bots and inactive users, normalize timestamps, and standardize engagement metrics.

Feature Engineering

We integrated four key dimensions into the simulation:

Content Type: Each item was categorized as text, image, video, or link-based. A semantic relevance score was assigned using a transformer-based language model (e.g., BERT) to capture inherent virality potential.
Engagement Metrics: We aggregated likes, shares, comments (Twitter) and upvotes, replies (Reddit) into a normalized engagement intensity score, which dynamically influenced the firefly brightness during the simulation.
Temporal Dynamics: Time of day, recency of post, and frequency of exposure were used to create a temporal weight function, adjusting node sensitivity over time.
User Social Attributes: For each user, we computed an influence score based on follower count, activity rate, and past cascade participation, and a trust score derived from content veracity metrics.

These features were embedded into the firefly movement logic to create context-sensitive swarm behavior.
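As an illustration of how two of these features could be turned into bounded scores, the following is a minimal sketch. The log scaling, the `max_total` normalizer, and the exponential half-life are assumptions made for the example; the text above does not specify the exact functional forms used in the study.

```python
import math


def engagement_intensity(counts, max_total):
    """Map raw interaction counts (e.g. likes, shares, comments) to [0, 1].

    Assumption: log scaling against the largest total seen in the dataset,
    so that a few extremely viral posts do not flatten all other scores.
    """
    total = sum(counts)
    if max_total <= 0:
        return 0.0
    return min(1.0, math.log1p(total) / math.log1p(max_total))


def temporal_weight(age_hours, half_life_hours=6.0):
    """Recency weight: 1.0 for a brand-new post, 0.5 after one half-life.

    Assumption: exponential decay; the half-life value is hypothetical.
    """
    return 0.5 ** (age_hours / half_life_hours)


if __name__ == "__main__":
    print(engagement_intensity((120, 30, 10), max_total=10_000))
    print(temporal_weight(12.0))
```

Both helpers return values in [0, 1], which keeps them on a comparable scale before they are combined into the brightness of a firefly.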

Extended Modified Firefly Algorithm (EMFA)

We extended the classical Firefly Algorithm (FA) by incorporating semantic, temporal, engagement, and user-level features into the simulation of information diffusion in OSNs. The EMFA consists of the following key components:

Brightness Function

The brightness I_i of a firefly i, which reflects its attractiveness to others, is defined as a weighted composite of four dimensions:

$$I_i = \alpha C_i + \beta E_i + \gamma T_i + \delta S_i$$

where:

C_i : Content virality score derived from semantic classification (e.g., image vs. text).
E_i : Engagement score normalized from likes, shares, and upvotes.
T_i : Temporal relevance based on post recency and activity burst.
S_i : Social trust and influence score of the user.
α, β, γ, δ: Tunable weights (hyperparameters) for each factor, summing to 1.

These weights are selected via grid search to optimize diffusion accuracy over validation data.
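The weighted brightness and the grid search over weights that sum to 1 can be sketched as follows. This is an illustrative sketch only: `score_fn` stands in for a validation-accuracy function that is not described in the text, and the grid step is a hypothetical choice.

```python
from itertools import product


def brightness(c, e, t, s, weights):
    """Weighted composite of content, engagement, temporal and social scores."""
    alpha, beta, gamma, delta = weights
    return alpha * c + beta * e + gamma * t + delta * s


def simplex_grid(step=0.25):
    """Yield every (alpha, beta, gamma, delta) on a regular grid that sums to 1."""
    n = round(1 / step)
    for a, b, g in product(range(n + 1), repeat=3):
        d = n - a - b - g
        if d >= 0:
            yield (a / n, b / n, g / n, d / n)


def grid_search(score_fn, step=0.25):
    """Return the weight tuple with the highest score.

    score_fn is a hypothetical callable: weights -> validation score.
    """
    return max(simplex_grid(step), key=score_fn)


if __name__ == "__main__":
    # Toy score that prefers balanced weights; a real run would simulate
    # diffusion on validation cascades and return an accuracy measure.
    def toy_score(w):
        return -sum((x - 0.25) ** 2 for x in w)

    best = grid_search(toy_score)
    print(best, brightness(0.8, 0.6, 0.4, 0.9, best))
```

Enumerating only points on the simplex avoids evaluating weight combinations that violate the sum-to-one constraint.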

Distance Function

To measure the similarity or proximity between fireflies i and j, we use a hybrid function that combines three weighted terms, where:

ContentSim: Cosine similarity between content vectors.
UserSim: Normalized difference in social features (e.g., follower count, credibility).
TimeDecay(t_i,t_j): A decay function emphasizing temporal proximity.
θ₁+θ₂+θ₃=1: Feature similarity weights.
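One plausible way to combine the three terms into a single distance is sketched below. The specific form is an assumption: content similarity is converted to a dissimilarity (1 minus cosine similarity), user features are assumed to be pre-scaled to [0, 1], and the temporal term grows with the time gap using a hypothetical scale `tau`. The default `thetas` are placeholders, not values from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two content vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def user_difference(f_i, f_j):
    """Mean absolute difference of social features already scaled to [0, 1]."""
    return sum(abs(a - b) for a, b in zip(f_i, f_j)) / len(f_i)


def time_gap(t_i, t_j, tau=3600.0):
    """Temporal term in [0, 1): 0 for simultaneous posts, near 1 for distant ones."""
    return 1.0 - math.exp(-abs(t_i - t_j) / tau)


def hybrid_distance(content_i, content_j, user_i, user_j, t_i, t_j,
                    thetas=(0.4, 0.3, 0.3)):
    """Weighted sum of content, user and temporal dissimilarities (assumed form)."""
    th1, th2, th3 = thetas
    return (th1 * (1.0 - cosine_similarity(content_i, content_j))
            + th2 * user_difference(user_i, user_j)
            + th3 * time_gap(t_i, t_j))


if __name__ == "__main__":
    print(hybrid_distance([1.0, 0.0], [0.6, 0.8], [0.9, 0.2], [0.4, 0.3], 0.0, 1800.0))
```

With weights that sum to 1 and each term bounded by 1, the resulting distance also stays in [0, 1], which makes the light absorption coefficient easier to tune.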

Movement Rule

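The movement of a firefly i towards a more attractive firefly j is governed by:

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\lambda d_{ij}^2} \left(x_j^t - x_i^t\right) + \epsilon \, \mathcal{N}(0,1)$$

where d_ij is the distance between fireflies i and j given by the distance function, and: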

x_i^t : Position of firefly i at iteration t, representing its current diffusion vector.
β₀: Base attractiveness.
λ: Light absorption coefficient controlling decay of influence over distance.
ϵ: Step-size coefficient modulating stochastic movement.
N(0,1): A Gaussian noise term.

In this formulation, fireflies (representing posts or users) with higher brightness attract others, and the movement simulates information flowing through a network based on both attractiveness and proximity.
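A single movement step can be sketched as follows, assuming the classical firefly update form [8] with the symbols defined above (base attractiveness, light absorption coefficient, step size, Gaussian noise). The parameter defaults are hypothetical and the squared-distance exponent follows the classical algorithm rather than a confirmed detail of EMFA.

```python
import math
import random


def move_firefly(x_i, x_j, distance, beta0=1.0, lam=1.0, eps=0.1, rng=random):
    """Move firefly i towards a brighter firefly j.

    Attraction decays as beta0 * exp(-lam * distance**2); eps scales an
    independent N(0, 1) perturbation on every coordinate.
    """
    attract = beta0 * math.exp(-lam * distance ** 2)
    return [xi + attract * (xj - xi) + eps * rng.gauss(0.0, 1.0)
            for xi, xj in zip(x_i, x_j)]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    print(move_firefly([0.0, 0.0], [1.0, 2.0], distance=0.5, rng=rng))
```

Setting `eps=0` removes the stochastic term, which is convenient for checking that the attraction part behaves as expected.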

Cascade Termination

Diffusion halts when one of the following conditions is met:

The maximum number of iterations is reached.
The change in global brightness is below a defined threshold (ΔI < ϵ_min).
No firefly finds a brighter neighbor for a specified number of steps.
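The three stopping conditions can be combined into one check, as in the sketch below. The argument names (`brightness_history`, `stale_steps`, `patience`) are hypothetical bookkeeping variables introduced for the example.

```python
def should_stop(iteration, max_iter, brightness_history, eps_min,
                stale_steps, patience):
    """Return True when any of the three termination conditions holds."""
    # 1. Iteration budget exhausted.
    if iteration >= max_iter:
        return True
    # 2. Global brightness changed by less than eps_min since the last step.
    if (len(brightness_history) >= 2
            and abs(brightness_history[-1] - brightness_history[-2]) < eps_min):
        return True
    # 3. No firefly has found a brighter neighbor for `patience` steps.
    return stale_steps >= patience


if __name__ == "__main__":
    print(should_stop(10, 100, [3.20, 3.2000001], 1e-4, 0, 5))
```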

Simulation Environment

Platform: All experiments were conducted in Python 3.11 using the DEAP framework for evolutionary computation.
Hardware: Simulations were performed on a personal computer (Intel Core i7, 16GB RAM), with efficient code optimization.
Repetition: Each diffusion simulation was repeated 30 times to mitigate stochastic variability, and the average values were used for evaluation.
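The repetition protocol can be sketched as a small helper that runs a stochastic simulation with a different seed each time and reports the mean and standard deviation. Here `simulate` is a hypothetical callable that takes a random generator and returns one scalar outcome (for example, a predicted cascade size); it is not part of the DEAP API.

```python
import random
import statistics


def repeat_runs(simulate, n_runs=30, base_seed=0):
    """Run `simulate` n_runs times with distinct seeds; return (mean, stdev)."""
    results = [simulate(random.Random(base_seed + k)) for k in range(n_runs)]
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    # Toy stand-in for one diffusion run: a noisy cascade size around 100.
    mean, std = repeat_runs(lambda rng: 100 + rng.gauss(0, 5))
    print(f"mean={mean:.2f}, std={std:.2f}")
```

Reporting the standard deviation alongside the mean makes it easier to judge whether differences between models exceed run-to-run noise.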

Evaluation Metrics

We evaluated the model based on the following metrics:

Prediction Accuracy: Comparing predicted cascade sizes and shapes to actual data.
Diffusion Depth and Breadth: Number of layers and maximum nodes reached.
Time to Peak Engagement: Temporal alignment with real cascade peaks.
Sensitivity Analysis: Ablation tests by disabling each feature dimension to assess its impact on model performance.
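To make the depth and breadth metric concrete, the sketch below computes both from a cascade stored as a parent-to-children mapping. It assumes the cascade is a tree, counts the root as the first layer, and interprets breadth as the largest number of nodes in any single layer; that interpretation is an assumption, since the metric is only described informally above.

```python
def depth_and_breadth(children, root):
    """Return (number of layers, size of the widest layer) of a cascade tree.

    `children` maps a node to the list of nodes that reshared or replied to it.
    """
    depth, breadth = 0, 0
    frontier = [root]
    while frontier:
        depth += 1
        breadth = max(breadth, len(frontier))
        # Next layer: all direct children of the current layer.
        frontier = [c for node in frontier for c in children.get(node, [])]
    return depth, breadth


if __name__ == "__main__":
    cascade = {"post": ["a", "b"], "a": ["c", "d", "e"]}
    print(depth_and_breadth(cascade, "post"))
```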

RESULTS

Quantitative Evaluation

We evaluated the performance of the Extended Modified Firefly Algorithm (EMFA) against three baseline models: Independent Cascade (IC), Particle Swarm Optimization (PSO), and the original Modified Firefly Algorithm (MFA). The models were tested across two datasets (Twitter and Reddit) using three standard metrics:

Prediction Accuracy (F1-Score)
Cascade Size Error (CSE)
Diffusion Root Mean Square Error (dRMSE)

Table 1 presents the three metrics:

These results show that EMFA significantly improves the predictive performance and realism of simulated cascades across platforms. The enhancement is consistent and robust, particularly under dynamic engagement and temporal variance scenarios.
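For readers who wish to reproduce the three metrics, one possible set of definitions is sketched below. These definitions are assumptions, since the formulas are not stated in the text: F1 is computed over the sets of predicted and actually activated nodes, CSE as the relative absolute error in final cascade size, and dRMSE as the root mean square error between predicted and observed cascade-size curves sampled at the same time steps.

```python
import math


def f1_score(predicted_nodes, actual_nodes):
    """F1 over the sets of nodes predicted vs. observed to be activated."""
    predicted, actual = set(predicted_nodes), set(actual_nodes)
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def cascade_size_error(predicted_size, actual_size):
    """Relative absolute error in final cascade size (assumed definition)."""
    return abs(predicted_size - actual_size) / actual_size


def diffusion_rmse(predicted_curve, actual_curve):
    """RMSE between two cascade-size time series of equal length."""
    if len(predicted_curve) != len(actual_curve):
        raise ValueError("curves must be sampled at the same time steps")
    sq = sum((p - a) ** 2 for p, a in zip(predicted_curve, actual_curve))
    return math.sqrt(sq / len(actual_curve))


if __name__ == "__main__":
    print(f1_score({"u1", "u2", "u3"}, {"u2", "u3", "u4"}))
    print(cascade_size_error(90, 100))
    print(diffusion_rmse([1, 5, 12, 20], [1, 6, 14, 19]))
```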

Diffusion Pattern Visualization

To qualitatively assess the realism of the simulated diffusion, we visualized the cascades generated by EMFA and other models for a high-impact tweet and a Reddit post. Figure 1 shows the diffusion trees for the same post using MFA and EMFA. EMFA exhibits more realistic branching and temporal density, aligning closely with the observed cascades.

**Figure 1. Comparison of diffusion trees.**

Feature Sensitivity Analysis

To understand the contribution of each added dimension, we conducted an ablation study in which the EMFA was tested with each feature (content, engagement, time, social) removed in turn. Table 2 reports the sensitivity of EMFA to each individual feature class. Social and temporal information contribute the most to diffusion accuracy.
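One simple way to disable a feature in such an ablation is to set its brightness weight to zero and rescale the remaining weights so that they still sum to 1. The sketch below illustrates this; the renormalization step and the example weights are assumptions, not a description of the exact procedure used in the study.

```python
def ablate(weights, feature):
    """Return a copy of `weights` with `feature` zeroed and the rest renormalized."""
    kept = {k: v for k, v in weights.items() if k != feature}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("cannot ablate the only feature with non-zero weight")
    ablated = {k: v / total for k, v in kept.items()}
    ablated[feature] = 0.0
    return ablated


if __name__ == "__main__":
    # Hypothetical tuned weights, for illustration only.
    tuned = {"content": 0.4, "engagement": 0.3, "time": 0.2, "social": 0.1}
    for name in tuned:
        print(name, ablate(tuned, name))
```

Each ablated weight set can then be passed to the same simulation and evaluation pipeline, and the drop in accuracy attributed to the removed feature.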

Platform Generalizability

We tested EMFA across different content categories (news, memes, opinion threads) and platforms (Twitter, Reddit), confirming that the model maintains strong performance despite structural and semantic differences in the networks.

Case Study 2: Political Discourse Propagation on Reddit

To further assess the adaptability of the EMFA model, we examined a political content cascade on Reddit. The chosen post, published on the r/politics subreddit during a national election period, presented a controversial opinion regarding campaign funding transparency. It sparked intense engagement, including thousands of upvotes, comments, and cross-posts to other subreddits.

Data Acquisition and Feature Mapping

Using the Reddit API (PRAW), we extracted:

Original post and comment threads
User metadata (karma, posting frequency, subreddit activity)
Interaction types (upvotes, downvotes, comment depth)
Content features (textual sentiment, controversy score)

These were normalized and encoded for integration into the EMFA framework.

Cascade Modeling on Reddit

Reddit’s tree-structured discussion format required adapting the EMFA’s spatial modeling. Each node (comment or post) was treated as a potential “information carrier,” with firefly movement simulated based on content relevance and engagement affinity. Figure 2 compares the actual and simulated Reddit thread trees using EMFA. The model replicated the nested depth and engagement intensity around polarizing comments.

**Figure 2. Actual vs. simulated Reddit thread trees.**

Performance Comparison

As shown in Figure 3, EMFA achieved higher alignment with Reddit’s actual user flow and comment emphasis, indicating its versatility on hierarchical platforms.

**Figure 3. EMFA alignment with Reddit’s actual user flow and comment emphasis.**

Feature Sensitivity Analysis

In contrast to Twitter, where temporal features were more dominant, Reddit propagation was more influenced by:

Engagement polarity (i.e., the presence of both upvotes and downvotes, signaling controversy)
Social positioning of users (karma, posting history)
Thread entropy (variability in comment sentiments)

This shows that platform architecture significantly modulates which features are most impactful, a dynamic that is effectively captured by EMFA.
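Two of the Reddit-specific signals listed above can be quantified in a straightforward way, as in the following sketch. The definitions are assumptions chosen for illustration: thread entropy is taken as the Shannon entropy (in bits) of the sentiment labels in a thread, and engagement polarity as a balance score that is 0 when all votes point one way and 1 when upvotes and downvotes are equal.

```python
import math
from collections import Counter


def thread_entropy(sentiment_labels):
    """Shannon entropy (bits) of the sentiment label distribution in a thread."""
    n = len(sentiment_labels)
    if n == 0:
        return 0.0
    counts = Counter(sentiment_labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def engagement_polarity(upvotes, downvotes):
    """Vote balance in [0, 1]: 0 for one-sided voting, 1 for an even split."""
    total = upvotes + downvotes
    if total == 0:
        return 0.0
    return 2 * min(upvotes, downvotes) / total


if __name__ == "__main__":
    print(thread_entropy(["pos", "neg", "neg", "neutral"]))
    print(engagement_polarity(800, 600))
```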

Practical Implications

By accurately modeling Reddit thread evolution, EMFA can be used to:

Forecast thread virality
Detect potential misinformation or polarizing discourse early
Identify influential users in subreddit dynamics

DISCUSSION

The findings from our experimental and case-based evaluations reveal that the Extended Modified Firefly Algorithm (EMFA) significantly enhances the modeling of information diffusion in online social networks (OSNs). By incorporating four critical dimensions—content type, engagement metrics, temporal dynamics, and user social attributes—the EMFA delivers a more realistic and adaptive simulation of how information propagates across diverse platforms.

Comparative Analysis with Recent Studies

Our results align with recent research that emphasizes the necessity of multi-dimensional modeling for capturing real-world diffusion dynamics. For example:

Zhang et al. [10] demonstrated that integrating topic semantics and content type into diffusion models improves virality prediction, especially on platforms such as Reddit and TikTok.
Xu et al. [13] highlighted the temporal sensitivity of viral content, showing that early momentum plays a decisive role in shaping information cascades—this is consistent with our findings in the Twitter case study.
Hsu et al. [9] introduced a hybrid swarm intelligence model that accounts for user influence scores but did not address temporal or content-based adaptation, limiting their model’s generalizability across platforms.
Hsu et al. [14] investigated the role of engagement patterns in viral diffusion but relied on static social features, whereas EMFA adapts dynamically based on user behavior and time-series variations.

These comparisons underline how EMFA builds upon and extends current research by offering an integrated and adaptive framework that responds to both user and platform contexts in real time.

Case Study Comparison and Implications

Table 3 summarizes the key differences observed between the Twitter and Reddit case studies. The EMFA was able to flexibly adapt to platform-specific characteristics—broad, flat cascades on Twitter and deep, threaded discussions on Reddit—demonstrating robustness across structurally distinct networks.

The model’s sensitivity analysis revealed that temporal and social features dominate in broadcast-centric platforms, while engagement and semantic variability are more critical in discussion-based platforms. These insights suggest that one-size-fits-all diffusion models are inadequate for today’s diverse and evolving digital ecosystems.

Theoretical Contributions and Practical Value

By integrating behavioral, structural, and contextual features, the EMFA contributes to a growing class of hybrid diffusion models that combine bio-inspired computation with social theory. Unlike prior models, which are often rigid and hard-coded, the EMFA learns from the environment and adjusts its influence-matching heuristics, making it suitable for tasks such as:

Real-time viral content prediction
Campaign optimization and seeding strategies
Early warning systems for misinformation or disinformation trends

LIMITATIONS AND FUTURE DIRECTIONS

Although the EMFA shows promising generalizability, limitations exist. Notably:

Sentiment dynamics and emotional tone were not modeled explicitly, despite their known impact on content virality.
The static nature of the underlying social graph may overlook structural changes such as community migration or influencer emergence.

Future work should explore temporal graph evolution, multimodal content modeling, and real-time feedback mechanisms, possibly through reinforcement learning frameworks. Cross-platform transfer learning could also enhance EMFA’s applicability in hybrid environments.

CONCLUSIONS AND RECOMMENDATIONS

This study presents an enhanced computational model for simulating information diffusion in online social networks (OSNs), integrating four critical dimensions: content type, engagement level, temporal dynamics, and user social attributes. By extending the Modified Firefly Algorithm (MFA) into a semantically and socially aware framework (EMFA), we significantly improved the realism and accuracy of diffusion modeling across platforms such as Twitter and Reddit. The experimental results demonstrate that incorporating contextual and behavioral factors enables the model to better capture real-world diffusion dynamics, outperforming baseline metaheuristic algorithms. The model also exhibited high adaptability to platform-specific characteristics, suggesting its potential for generalization across various social media ecosystems.

RECOMMENDATIONS FOR FUTURE WORK

Platform Expansion: Future studies should test the model on additional OSNs such as TikTok or LinkedIn to assess adaptability across different user interaction paradigms and content modalities.
Real-Time Prediction: Integrating real-time data streams could transform EMFA into a predictive engine capable of early warning for viral misinformation or emerging trends.
Explainability Enhancement: While the current model improves accuracy, adding explainable AI (XAI) components could aid in interpreting how and why certain features drive diffusion, especially in sensitive applications such as public health or crisis response.
Integration with Intervention Strategies: The EMFA framework could be extended to simulate and evaluate the effectiveness of interventions (e.g., fact-checking prompts, content throttling) in slowing down the spread of harmful or false information.

In conclusion, the proposed EMFA model offers a flexible, extensible, and accurate framework for studying digital information dynamics, supporting both theoretical advancement and practical applications in network science, marketing, and information integrity.

References

[1] Vosoughi S, Roy D, Aral S. The spread of true and false news online. Science. 2018;359(6380):1146–51.
[2] Shu K, Sliva A, Wang S, Tang J, Liu H. Combating disinformation in a social media age. Wiley Interdiscip Rev Data Min Knowl Discov. 2020;10(6):e1385.
[3] Borge-Holthoefer J, et al. Modeling information diffusion in social media. Nat Commun. 2022;13(1):1–13.
[4] Kempe D, Kleinberg J, Tardos É. Maximizing the spread of influence through a social network. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003. p. 137–46.
[5] Chen W, Wang Y, Yang S. Efficient influence maximization in social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2009. p. 199–208.
[6] Parimi P, Rout RR. FLACORM: fuzzy logic and ant colony optimization for rumor mitigation through stance prediction in online social networks. Soc Netw Anal Min. 2023;13(1):1–13. doi:10.1007/s13278-022-01022-3.
[7] Dorigo M, Stützle T. Ant Colony Optimization. Cambridge, MA: MIT Press; 2004.
[8] Yang X-S. Firefly algorithm, stochastic test functions and design optimisation. Int J Bio-Inspired Comput. 2010;2(2):78–84.
[9] Hsu C-I, Wu SPJ, Chiu C. A Hybrid Swarm Intelligence Approach for Blog Success Prediction. Int J Comput Intell Syst. 2019;12(2):571–9. doi:10.2991/ijcis.d.190423.001.
[10] Zhang QW, Zhang QH. Hybrid optimization algorithm for analysis of influence propagation in social network. J Comput. 2022;33(4):107–19. doi:10.53106/199115992022083304009.
[11] Huang W, Liu Y, Zhang X. Hybrid Particle Swarm Optimization Algorithm Based on the Theory of Reinforcement Learning in Psychology. Systems. 2023;11(2):83. doi:10.3390/systems11020083.
[12] Alloush A, Rebdawi G, Abou Trab MS. Modeling information diffusion in online social networks using a modified firefly algorithm. J Inf Organ Sci (Online). 2024 Dec;48(2).
[13] Xu Z, Qian M. Predicting Popularity of Viral Content in Social Media through a Temporal-Spatial Cascade Convolutional Learning Framework. Mathematics. 2023;11(14):3059. doi:10.3390/math11143059.
[14] Hsu C-I, Wu SPJ, Chiu C. A Hybrid Swarm Intelligence Approach for Blog Success Prediction. Int J Comput Intell Syst. 2019;12(2):571–9. doi:10.2991/ijcis.d.190423.001.

ISSN (Online): 2959-8591