(L-R): Doctoral student Zhewei Wang, assistant professor Marcos Vasconcelos, Ph.D., and postdoctoral scholar Amit Dutta, Ph.D., pose with robots in Vasconcelos's lab in the Center for Advanced Power Systems in the FSU Research Foundation Building A in Tallahassee, Florida. Top-k algorithms have many applications, including in multi-robot systems, where the controller can communicate with only a small subset of agents. The MINDS lab uses these principles to design robots and AI agents to work on complex problems with minimal communication. (Scott Holstein/FAMU-FSU College of Engineering)
Key Points
- A FAMU-FSU College of Engineering researcher has developed a novel algorithm that identifies the most valuable data in large distributed networks without exposing private information.
- Marcos Vasconcelos, Ph.D., an assistant professor of electrical and computer engineering, created the method using smoothed quantile estimation: a technique that speeds up data ranking by up to 10 times compared with previous approaches.
- The research, published in IEEE Transactions on Automatic Control, addresses a core challenge in modern networked systems: how to surface critical information efficiently while keeping individual data points confidential.
- The work was supported by the Commonwealth Cyber Initiative at Virginia Tech and by the FAMU-FSU College of Engineering, and has potential applications in machine learning, smart infrastructure and disaster-response networks.
A researcher at the FAMU-FSU College of Engineering is helping solve a fundamental problem in today’s data-driven world: how to quickly and securely identify the most valuable information inside vast, interconnected networks without exposing private data or wasting energy.
Marcos Vasconcelos, Ph.D., an assistant professor in the Department of Electrical and Computer Engineering at Florida State University and the joint college, has developed a new algorithm that could reshape how information is ranked and how privacy is protected in digital networks.
What Is the Challenge of Ranking Private Data Across a Network?
His approach produces results faster, helps preserve the privacy of individual data points and cuts energy use in networked systems. The algorithm applies to any setting in which distributed data must be ranked without being fully disclosed.
The work was published in IEEE Transactions on Automatic Control.
Why Is Privacy-Preserving Data Ranking So Difficult?
Vasconcelos uses a clear example to frame the problem: “Imagine a room full of people, each holding a private number—maybe a salary, a health statistic or a test score—something you’d prefer to keep confidential. Now suppose you want to find out who has the highest score without revealing anyone’s private information. How do you do that?”
“We developed an algorithm that solves this challenge,” Vasconcelos said. “It uses a distributed top-k solution, which is useful in fields ranging from machine learning and signal processing to control systems.”
Top-k algorithms identify the most significant elements, such as the highest, lowest or most frequent values, from a dataset without fully sorting or disclosing all the underlying data. Traditional methods often require participants to share their values with a central authority or with one another, which raises serious concerns about privacy and scalability.
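For contrast, here is a minimal sketch of the centralized baseline described above, in which every value is collected in one place before the top k are selected. The function name `top_k` and the sample scores are illustrative assumptions, not taken from the research; the sketch uses Python's standard `heapq.nlargest`, which selects the k largest items without sorting the whole list.

```python
import heapq
from typing import List


def top_k(values: List[float], k: int) -> List[float]:
    """Return the k largest values, largest first, without sorting everything."""
    return heapq.nlargest(k, values)


# Hypothetical test scores, all visible to one central party.
scores = [71.0, 94.5, 63.2, 88.0, 79.9, 91.3]
print(top_k(scores, 3))  # [94.5, 91.3, 88.0]
```

The selection itself is cheap. The privacy concern is that `scores` must be fully visible to whoever runs the function, which is exactly what a distributed method avoids.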
“Because we can streamline the sorting process, our algorithm operates up to 10 times faster than previous methods,” Vasconcelos said. “Its applications go far beyond this simple example: from machine learning and smart infrastructure to signal processing, distributed top-k algorithms are now ubiquitous in modern engineering.”
How Does the New Algorithm Work?
Vasconcelos and his collaborator drew on quantile regression and advanced smoothing techniques, both of which offer distinct advantages for distributed data ranking.
Unlike standard regression, which predicts an average outcome, quantile regression can reveal outcomes across an entire distribution, from the highest and lowest values to any point in between. Used together, quantile regression and smoothing filter out noise and surface the most important patterns in complex datasets, enabling identification of key information without sacrificing privacy.
“I realized that quantile regression could be distributed across multiple computers,” Vasconcelos said. “By carefully choosing the quantile, we can solve the top-k problem for any k using distributed optimization, a method pioneered in the 1980s by control theorists John Tsitsiklis and Dimitri Bertsekas at MIT. In this approach, multiple processors work together to solve challenging problems, all while preserving the privacy of local data.”
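The link between choosing a quantile and finding the top k can be written out in standard textbook form. This is a sketch rather than the paper's exact formulation, and the notation is assumed for illustration: n agents, private values x_i, a shared threshold θ and a quantile level τ.

```latex
% Pinball (quantile) loss at level \tau \in (0,1)
\rho_\tau(u) = \max\{\tau u,\ (\tau - 1)\,u\}

% Each agent i holds a private value x_i; the network jointly solves
\theta^\star \in \arg\min_{\theta \in \mathbb{R}} \ \sum_{i=1}^{n} \rho_\tau(x_i - \theta)

% With distinct values sorted as x_{(1)} < \dots < x_{(n)}, 1 \le k \le n-1,
% and the quantile level chosen as \tau = 1 - k/n, the minimizers form the interval
\theta^\star \in \left[\, x_{(n-k)},\ x_{(n-k+1)} \,\right]
```

Any threshold strictly inside that interval has exactly k values above it, so agreeing on the threshold is enough to identify the top k. Because the objective is a sum of one term per agent, each agent can work with its own term locally.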
The original version of the algorithm converged slowly because it relied on what is known as the “pinball loss” function, which created optimization challenges. Working with Xu Zhang, a professor at Xidian University, Vasconcelos applied a signal-processing technique called convolution to smooth the objective function and accelerate results, making the algorithm up to 10 times faster than traditional methods.
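To illustrate what smoothing by convolution does to the pinball loss, the sketch below convolves it with a uniform kernel of half-width `h`. The kernel choice, the parameter `h` and the function names are assumptions made for illustration; the article does not say which kernel the researchers used.

```python
def pinball(u: float, tau: float) -> float:
    """Pinball loss at quantile level tau; it has a sharp corner at u = 0."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def smoothed_pinball(u: float, tau: float, h: float) -> float:
    """Pinball loss convolved with a uniform kernel on [-h, h] (assumed kernel).

    Outside the band |u| >= h the result equals the original loss;
    inside it, the corner is replaced by a quadratic.
    """
    if abs(u) >= h:
        return pinball(u, tau)
    return ((1.0 - tau) * (u - h) ** 2 + tau * (u + h) ** 2) / (4.0 * h)


def smoothed_pinball_grad(u: float, tau: float, h: float) -> float:
    """Derivative of the smoothed loss with respect to u.

    It ramps linearly from tau - 1 to tau across the band instead of jumping.
    """
    return min(tau, max(tau - 1.0, u / (2.0 * h) + tau - 0.5))


for u in (-1.0, -0.25, 0.0, 0.25, 1.0):
    print(u, pinball(u, 0.8), smoothed_pinball(u, 0.8, 0.5),
          smoothed_pinball_grad(u, 0.8, 0.5))
```

The original loss has a slope that jumps from `tau - 1` to `tau` at zero, which forces gradient-type methods to take cautious steps. The smoothed version has a continuous slope, which is the property that generally allows faster convergence.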
“Since no one had developed top-k algorithms using this technique, we were forging a new path,” Vasconcelos said. “A major advantage is that our method identifies the top-k data without revealing the actual numbers. Participants only indicate if their value is above or below a threshold, so the top-k can be found without disclosing individual results.”
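The above-or-below idea can be sketched in a few lines. This is not the researchers' algorithm: to keep the example short, a simple bisection search run by a hypothetical coordinator stands in for their distributed optimization. The `Agent` class, the `find_top_k` function and the assumption that bounds `lo` and `hi` on the values are known in advance are all illustrative.

```python
from typing import List, Set


class Agent:
    """Holds one private value and answers only above/below questions."""

    def __init__(self, value: float) -> None:
        self._value = value  # never sent to anyone

    def is_above(self, threshold: float) -> bool:
        return self._value > threshold


def find_top_k(agents: List[Agent], k: int, lo: float, hi: float,
               max_rounds: int = 200) -> Set[int]:
    """Bisect on a threshold until exactly k agents report being above it.

    Assumes all values are distinct and lie within [lo, hi].
    Returns the indices of the top-k agents, not their values.
    """
    for _ in range(max_rounds):
        theta = (lo + hi) / 2.0
        above = {i for i, agent in enumerate(agents) if agent.is_above(theta)}
        if len(above) == k:
            return above
        if len(above) > k:
            lo = theta  # too many agents above: raise the threshold
        else:
            hi = theta  # too few agents above: lower the threshold
    raise RuntimeError("no separating threshold found; values may be tied")


# Hypothetical private salaries, in thousands.
agents = [Agent(v) for v in (52.0, 87.5, 61.0, 93.2, 70.4)]
print(sorted(find_top_k(agents, k=2, lo=0.0, hi=200.0)))  # [1, 3]
```

The coordinator learns which agents are in the top two but only ever sees yes-or-no answers, never a salary.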
By averaging and smoothing data points, the approach produces a more reliable and interpretable representation of the information landscape. The mathematics underlying the method is complex, but the practical outcome is straightforward: the algorithm converges quickly, scales to large networks and is well-suited for the volume and distribution of data in modern systems.
What Are the Real-World Applications of This Research?
This work could benefit systems that must act on the most urgent or relevant information under tight constraints. Disaster-response networks, for example, could use the algorithm to surface only the most critical signals without broadcasting sensitive details, and digital platforms could use it to make better-informed decisions without exposing personal data.
“While our research is primarily theoretical, we have shown that it can be used for real-world applications,” Vasconcelos said. “Looking ahead, I am excited to explore projects where humans and AI agents can work together even more effectively.”
The research also fits within a broader shift in the field. Privacy-enhancing technologies, including differential privacy, homomorphic encryption and secure multi-party computation, have become increasingly important tools across industries handling sensitive data at scale. The algorithm developed by Vasconcelos addresses a complementary need: fast, scalable ranking in distributed systems without centralized data exposure.
This work was supported by the Commonwealth Cyber Initiative at Virginia Tech and by the FAMU-FSU College of Engineering.
Frequently Asked Questions
What is a privacy-preserving algorithm and why does it matter?
A privacy-preserving algorithm is a computational method that analyzes or ranks data without exposing the underlying individual values. In networked systems where many devices or users hold sensitive information, such as health statistics, financial scores or sensor readings, these algorithms allow important decisions to be made without requiring anyone to share private data with a central server or with other participants. As data volumes grow and privacy regulations tighten, this type of approach is increasingly important in engineering and computing.
What is a distributed top-k algorithm?
A distributed top-k algorithm identifies the k most significant values—such as the highest scores, the most urgent sensor readings or the most relevant data points—from a dataset spread across many connected devices or agents. Unlike centralized approaches, which require all data to flow to a single location, distributed top-k algorithms allow each participant to keep their data local. Only minimal information, such as whether a value falls above or below a threshold, is shared. The result is a ranked selection without full data disclosure.
How does Marcos Vasconcelos’s algorithm differ from previous approaches?
The original version of the algorithm relied on a loss function called the pinball loss, which made the optimization converge slowly. Vasconcelos and collaborator Xu Zhang of Xidian University applied convolution-based smoothing, a technique drawn from signal processing, to smooth this objective function. The smoothed version of the problem converges much more quickly, empirically up to 10 times faster than earlier methods, while maintaining the same privacy protections and accuracy.
What is quantile regression and how is it used here?
Quantile regression is a statistical method that, rather than predicting an average outcome, can estimate outcomes at any point in a distribution—including the highest and lowest values. Vasconcelos’s research shows that this technique can be reformulated as a distributed optimization problem, allowing a network of computers to jointly identify top-k data points by converging on an optimal threshold value. The approach draws on distributed optimization methods developed in the 1980s by MIT researchers John Tsitsiklis and Dimitri Bertsekas.
Where was this research published and who funded it?
The research was published in IEEE Transactions on Automatic Control, one of the leading peer-reviewed journals in the field of control systems and automation, published by the Institute of Electrical and Electronics Engineers (IEEE). The work was supported by the Commonwealth Cyber Initiative at Virginia Tech and by the FAMU-FSU College of Engineering. The FAMU-FSU College of Engineering is the joint engineering school of Florida A&M University and Florida State University, located in Tallahassee, Florida.
What are the practical applications of this research?
The algorithm has broad potential applications wherever distributed data must be ranked without exposing private values. These include disaster-response networks that need to surface only the most urgent sensor signals, machine learning pipelines that select the most informative training data from distributed sources, and smart infrastructure systems (such as power grids or transportation networks) that must make real-time decisions based on data collected across many nodes. The researchers note that the work is primarily theoretical at this stage, though they have demonstrated its applicability to real-world scenarios.
Editor’s Note: This article was edited with a custom prompt for Claude Sonnet 4.6, an AI assistant created by Anthropic. The AI optimized the article for SEO/GEO discoverability, improved clarity, structure and readability while preserving the original reporting and factual content. All information and viewpoints remain those of the author and publication. This article was edited and fact-checked by college staff before being published. This disclosure is part of our commitment to transparency in our editorial process. Last edited: 05/18/2026.
