Adaptive proxy selection with Thompson sampling and Bayesian methods
Publisher
Universidade Federal de Goiás
Abstract
This study investigated proxy selection strategies for automated data capture systems, comparing traditional approaches with adaptive Bayesian strategies. The main goal was to evaluate the operational efficiency, stability, and adaptive capacity of different selection algorithms in both controlled and real environments. The methodology combined controlled simulations of four distinct scenarios (intermittent proxies, blocked proxies, permanently failed proxies, and heterogeneous proxies) with experimental validation in a real operational environment, where 10 different robots captured public data from various domains over one week, processing 549,114 requests. Seven strategies were evaluated: four Bayesian (Beta, Gamma, Normal, and Chi-Square), one deterministic (Exponential Backoff), and two basic (Round Robin and Random). The simulation results showed that the Bayesian strategies consistently outperformed the others: the Beta distribution achieved success rates above 99% in critical scenarios and remained the best-performing strategy in the real environment, with an average success rate of 76.00%. The stability analysis revealed significantly lower coefficients of variation for the Bayesian strategies (0.191–0.334) than for the basic ones (0.498–0.668).
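The coefficient of variation (CV) is the standard deviation divided by the mean, so a lower CV means a strategy's performance fluctuates less relative to its own average. The abstract does not state which series the CV was computed over; the minimal sketch below assumes a series of per-window success rates, uses the population standard deviation (one of two common conventions), and runs on hypothetical values rather than data from the study.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Return CV = population standard deviation / mean of the series."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


# Hypothetical per-window success rates, for illustration only.
stable = [0.80, 0.75, 0.78, 0.77]
erratic = [0.90, 0.30, 0.70, 0.20]

print(round(coefficient_of_variation(stable), 3))   # steadier series
print(round(coefficient_of_variation(erratic), 3))  # more volatile series
```

With these made-up series, the steadier one has a CV of about 0.023 and the erratic one about 0.545. Because the CV is dimensionless, it can compare the stability of strategies whose average success rates differ.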
The temporal analysis showed that the basic approaches wasted 2.5 times as many resources as the Bayesian strategies, demonstrating the superior operational efficiency of the latter. The Beta distribution stood out for its exceptional ability to differentiate between resources and adapt over time, as evidenced by a detailed analysis of its probability distributions. Beyond direct applications in data capture, the techniques developed here show significant potential for adaptive anti-scraping systems, where the ability to identify suspicious behavioral patterns and adapt dynamically to evasion techniques can strengthen protection against automated activities that violate web resource usage policies. The study concludes that Bayesian strategies, particularly the Beta distribution, provide significant operational advantages for data capture systems and have transformative potential for the development of adaptive countermeasures in web protection.
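The title names Thompson sampling and the abstract singles out the Beta distribution, but it does not give the authors' priors, update rule, or handling of blocked proxies. The sketch below is a standard Beta-Bernoulli Thompson sampler under stated assumptions: each request is a binary success or failure, every proxy starts from a uniform Beta(1, 1) prior, and the proxies and their success probabilities are hypothetical. All class and function names are illustrative, not taken from the study.

```python
import random
from dataclasses import dataclass


@dataclass
class ProxyArm:
    """Beta posterior over one proxy's success probability."""
    name: str
    alpha: float = 1.0  # 1 + observed successes (uniform Beta(1, 1) prior)
    beta: float = 1.0   # 1 + observed failures


class BetaThompsonSelector:
    """Hypothetical Beta-Bernoulli Thompson sampling proxy selector."""

    def __init__(self, names, rng=None):
        self.arms = {n: ProxyArm(n) for n in names}
        self.rng = rng or random.Random()

    def choose(self) -> str:
        # Draw one plausible success rate per proxy from its posterior,
        # then route the request through the proxy with the highest draw.
        draws = {n: self.rng.betavariate(a.alpha, a.beta)
                 for n, a in self.arms.items()}
        return max(draws, key=draws.get)

    def update(self, name: str, success: bool) -> None:
        # Conjugate update: a success increments alpha, a failure increments beta.
        arm = self.arms[name]
        if success:
            arm.alpha += 1
        else:
            arm.beta += 1

    def posterior_mean(self, name: str) -> float:
        arm = self.arms[name]
        return arm.alpha / (arm.alpha + arm.beta)


def simulate(true_rates, n_requests, seed=0):
    """Route n_requests through simulated proxies; return (selector, successes)."""
    rng = random.Random(seed)
    selector = BetaThompsonSelector(true_rates, rng=rng)
    successes = 0
    for _ in range(n_requests):
        proxy = selector.choose()
        ok = rng.random() < true_rates[proxy]  # simulated request outcome
        selector.update(proxy, ok)
        successes += ok
    return selector, successes


if __name__ == "__main__":
    # Hypothetical proxies with fixed success probabilities (not study data).
    rates = {"proxy-a": 0.95, "proxy-b": 0.60, "proxy-c": 0.10}
    sel, ok = simulate(rates, 2000)
    print(f"overall success rate: {ok / 2000:.3f}")
    for name in rates:
        print(name, round(sel.posterior_mean(name), 3))
```

Sampling from each posterior, rather than always picking the highest posterior mean, keeps some traffic flowing to proxies whose estimates are still uncertain. Because this posterior never forgets old outcomes, it adapts slowly once a proxy has accumulated many observations; decaying alpha and beta toward the prior is one common extension for intermittent or temporarily blocked proxies, although the abstract does not say whether the study used it.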