Doing research is about asking questions, and researchers at the University of Rochester ask a lot of questions. For example, can artificial intelligence help create sustainable energy sources? Scientists at Rochester’s Laboratory for Laser Energetics are part of a team that thinks machine learning models will help design high-performing implosions that move us closer to creating fusion energy sources.
But research isn’t the sole domain of Rochester’s faculty. Each year, undergraduate and graduate students are engaged in research projects across disciplines—often without funding. That’s why the River Campus Libraries (RCL) launched a Data Grant Program, which enables undergraduate and graduate students in Arts, Sciences & Engineering, the Warner School of Education and Human Development, and Simon Business School to purchase the data sets they need to complete their research.
On September 12, the spring 2023 recipients presented their projects in Rush Rhees Library's Evans Lam Square. Below are summaries of their research.
Commercial VPNs and privacy
A virtual private network (VPN) creates a secure, encrypted connection between your device and a remote server operated by the VPN provider. This means your IP address is hidden, your location is untraceable, and any data you’re receiving or transmitting is unreadable.
But do VPNs actually work? More than 1.5 billion users around the globe would say “Yes.” Still, they’re not impenetrable. Computer science major Steven Oufan Hai '24 had his doubts about their effectiveness against an easily deployed tactic that does not require extensive computing resources.
Hai set out to measure how well commercial, for-profit VPN services protect consumers against fingerprinting attacks, which try to figure out what websites someone is visiting by analyzing internet traffic patterns. To find out, he made himself the attacker.
Using the funds from his data set grant, he purchased subscriptions to 10 popular commercial VPNs, along with mini-PCs for data collection. He then conducted fingerprinting attacks as he visited the 43 most-visited websites. Using simple models with little configuration, he achieved prediction accuracies of roughly 26, 64, and 98 percent across three models.
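To make the idea of a fingerprinting attack concrete, here is a minimal sketch in Python. It is not Hai's method, models, or data: the traces are synthetic, the three features (packet count, outgoing bytes, incoming bytes) are an assumption, and the nearest-centroid classifier is simply one of the simplest models that could stand in for "simple models with little configuration." All function names are hypothetical.

```python
import random
from collections import defaultdict


def trace_features(trace):
    """Summarize one encrypted trace without reading any payload.

    A trace is a list of signed packet sizes: positive = outgoing,
    negative = incoming (an assumed, simplified representation).
    """
    out_bytes = sum(p for p in trace if p > 0)
    in_bytes = -sum(p for p in trace if p < 0)
    return (float(len(trace)), float(out_bytes), float(in_bytes))


def synthetic_trace(site_id, rng):
    """Hypothetical generator: each site has its own typical page size."""
    n_packets = 50 + 20 * site_id + rng.randint(-5, 5)
    trace = []
    for _ in range(n_packets):
        if rng.random() < 0.2:
            trace.append(rng.randint(60, 200))       # small outgoing request
        else:
            trace.append(-rng.randint(1000, 1500))   # large incoming response
    return trace


def train_centroids(samples):
    """samples: list of (features, site_id). Returns site_id -> mean features."""
    grouped = defaultdict(list)
    for features, site_id in samples:
        grouped[site_id].append(features)
    return {
        site_id: tuple(sum(col) / len(col) for col in zip(*rows))
        for site_id, rows in grouped.items()
    }


def predict(centroids, features):
    """Guess the site whose average trace is closest to this one."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda s: sq_dist(centroids[s], features))


def run_experiment(n_sites=5, visits_per_site=40, seed=0):
    """Train on half of the synthetic visits, test on the rest; return accuracy."""
    rng = random.Random(seed)
    train, test = [], []
    for site_id in range(n_sites):
        for visit in range(visits_per_site):
            sample = (trace_features(synthetic_trace(site_id, rng)), site_id)
            (train if visit % 2 == 0 else test).append(sample)
    centroids = train_centroids(train)
    correct = sum(predict(centroids, f) == s for f, s in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"accuracy on synthetic traces: {run_experiment():.2f}")
```

The point of the sketch is that the attacker never decrypts anything: traffic volume and direction alone can separate sites when their pages differ in size. Real traffic is far noisier than this toy generator, so the accuracy it prints says nothing about any actual VPN.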
With his findings, Hai aims to encourage VPN services to consider implementing fingerprinting countermeasures. He’s now in the process of creating a new dataset in an open-world setting that closely mimics real-world browsing.
“This research project would not be possible without the data set grant,” Hai says. “I would need to revise my research questions and methodology completely, and I would have to focus on free privacy-enhancing technologies and adopt methodologies that do not require extensive data collection.”
Improvements in knowledge distillation
Let’s say you were to encounter an alien. You blurt out “I come in peace.” The alien asks, “What is peace?” You explain that you don’t intend to hurt it. But that prompts a question about what “hurt” means. So, you explain that to hurt something is to cause it pain. The alien asks, “What is pain?”
This extraterrestrial conversation is a rough example of frame semantics, a linguistic theory that says to understand a word, one must understand all related words. And frame semantics plays a big role in how Alexander Martin '24 aims to improve how we distill knowledge from large amounts of unstructured text.
Martin’s project tackles the understanding of events in text, a critical natural language processing task for large language model-based chatbots such as ChatGPT. But current approaches focus on sentence-level tasks, while Martin is working at the document level.
Using a FrameNet ontology (a group of lexical databases based on frame semantics), Martin is developing a method for document-level event extraction and a report generation process. The model he’s creating is poised to help people easily summarize information on any topic and spend less time searching for what they need. Say your house was flooded during a hurricane.
“First, you might look to find all the event information about floods that were caused by the hurricane,” Martin says. “In terms of the model, I might say ‘summarize any flooding events.’ However, there might be a large number of floods caused by the hurricane. So, you might want to say ‘summarize any flooding events in Atlanta,’ and then the model would only extract information about floods in that location.”
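The narrowing Martin describes, from every flooding event to only those in one location, can be pictured as a filter over already-extracted event records. The Python sketch below is an illustration only and does not reflect his model: the `Event` record, the `select_events` helper, the "Flooding" label, and the sample events are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    frame: str        # event type; "Flooding" is an illustrative label
    location: str     # where the event took place
    description: str  # text span the event was (hypothetically) extracted from


def select_events(events: List[Event], frame: str,
                  location: Optional[str] = None) -> List[Event]:
    """Keep events of one type, optionally restricted to one location."""
    return [
        e for e in events
        if e.frame == frame and (location is None or e.location == location)
    ]


# Made-up records standing in for the output of an extraction step.
sample_events = [
    Event("Flooding", "Atlanta", "placeholder report A"),
    Event("Flooding", "Savannah", "placeholder report B"),
    Event("Power_outage", "Atlanta", "placeholder report C"),
]

if __name__ == "__main__":
    # "summarize any flooding events"
    print(len(select_events(sample_events, "Flooding")))
    # "summarize any flooding events in Atlanta"
    print(len(select_events(sample_events, "Flooding", "Atlanta")))
```

The hard part of the research is producing records like these from whole documents; the filter only shows why a more specific request returns less to read.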
Martin’s work is being prepared for publication in October.
Labor market vs. nurse staffing regulations
A recent study released by the National Council of State Boards of Nursing found that about 100,000 registered nurses have left the workforce due to burnout and stress, and more than 600,000 intend to leave within the next five years. And it’s not just the fault of COVID-19. The fact is, ongoing shortages, poor working conditions, and the general demands of the job have had nurses feeling this fatigue for decades.
Nurses leaving in droves burden those left behind and put patient outcomes at risk. That’s why, in 1999, California became the first state to establish a minimum registered nurse-to-patient ratio for acute-care, acute-psychiatric, and specialty hospitals. It’s also why, in 2014, Massachusetts implemented the ICU Nurse Staffing Law, establishing patient assignment limits for RNs in intensive care units in acute hospitals. And it’s why economics students Woosuk Choi, a fifth-year PhD candidate, and So Yong Kim, a third-year PhD candidate, decided to put nurse staffing regulations under the microscope.
Using nationwide hospital-level data on financial reports and staffing, Choi and Kim are studying how policies like those implemented by California and Massachusetts—and now being considered by New York and Oregon—affect nurse wages and employment. They’re also looking into the unintended consequences, such as changes to the labor composition and shifts in capital investment. Here’s what their analysis of regulation implementation has shown:
- The hospitals that had fewer nurses per patient hired more nurses. These hospitals increased the number of hours nurses worked.
- On average, nurses’ wages decreased.
- Other non-medical, skilled workers saw a reduction in their working hours by approximately six percent. Meanwhile, unskilled workers experienced a three percent decrease in their wages.
- Hospitals invested more in building construction and equipment acquisition.
“The library grant has been immensely valuable to our research,” Choi says. “As far as we know, the American Hospital Association Annual Survey data is the only nationwide dataset that provides historical records and allows for serial cross-sectional studies each year. Without this grant, we would have had to use other publicly available data sources.”
Kim adds that the American Hospital Association data has a significant advantage over alternatives, such as the National Sample Survey of Registered Nurses (NSSRN) or financial reports from California's Department of Health Care Access and Information. “The NSSRN lacks hospital-specific information,” she says, “and the California financial reports would not have allowed us to conduct cross-state analyses. Therefore, the grant has played a crucial role in enabling us to access the most comprehensive and relevant data for our research.” ∎
Image by Jan Alexander from Pixabay. For more student research, take a look at the fall 2022 grant recipients. For questions about data purchasing or the data set grant program, please contact Kathy Wu, social science librarian for business, economics, government information, and law. And if you are interested in supporting the grant program, please contact Pamela Jackson, senior director of advancement for the River Campus Libraries.