$1.2 million NSF grant to create a search engine for online privacy research


UNIVERSITY PARK, Pa. – A team of researchers led by Penn State recently received a $1.2 million grant from the National Science Foundation (NSF) to create a search engine and other resources that can make the web safer for users by helping scientists navigate billions of online documents to more effectively collect and classify privacy-related documentation.

The search engine – called PrivaSeer – will use a type of artificial intelligence (AI) called natural language processing, or NLP, to help researchers collect, review and analyze privacy documents, including privacy policies, terms of use, cookie policies, privacy bills and laws, regulatory guidelines and other related texts on the web.

NLP combines linguistics, computing and AI to program computers that can better process and analyze large amounts of data in natural language.

Ultimately, the search engine could help researchers better understand online privacy and online privacy trends, while also helping users browse the web more safely and securely, according to Shomir Wilson, assistant professor of information sciences and technology, Penn State, and Institute for Computational and Data Sciences (ICDS) affiliate.

“Privacy policies are documents we come across in our day-to-day lives when we visit websites and, in theory, we are supposed to read them,” Wilson said. “But in practice, few people do. It is impractical and it does not match the way people use the internet. People often don’t have the legal knowledge to understand these documents either.”

Wilson, who is the principal investigator (PI) of the project, said the search engine is necessary because, although a lot of material on organizations’ privacy and data practices is available on the web, researchers face a major challenge in identifying and collecting these documents. According to the researchers, the current way of collecting this information requires scientists to perform careful manual searches.

“There has been previous work on privacy policies, but one thing researchers have come across is that there is a lack of good data on these policies,” Wilson said.

The search engine can also offer information on how policies change and help users navigate the complex area of online privacy, according to C. Lee Giles, David Reese Professor of Information Sciences and Technology, Penn State, and a co-PI of the project.

“One of the reasons for having a privacy policy search engine is that you can get a feel for how different companies treat their users’ privacy now and over time,” said Giles, who is also affiliated with ICDS. “It can also let users know how they want to react to these businesses.”

The researchers said PrivaSeer will also advance NLP techniques for large-scale interpretation of these privacy documents. This technology will help scientists analyze the state of privacy on an unprecedented scale.

Creating the search engine poses several challenges for the team, according to Giles.

“One of the challenges of building a privacy policy search engine is crawling the web for those pages,” Giles said. “There is no list of URLs for this. Do we try a URL, for example ‘https://company.com/privacy.html’, or something different? After the page is returned, how do we know it is a privacy page?”
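
The two questions Giles raises, which URLs to try and how to recognize a privacy page once it comes back, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the PrivaSeer team's method: the candidate paths, the cue phrases, the threshold and the function names `candidate_privacy_urls` and `looks_like_privacy_page` are all hypothetical, and a real system would likely rely on a trained NLP classifier rather than keyword counting.

```python
from urllib.parse import urljoin

# Hypothetical guesses at common locations; not taken from the PrivaSeer project.
CANDIDATE_PATHS = ["privacy.html", "privacy", "privacy-policy", "legal/privacy"]

# Hypothetical cue phrases that tend to appear in privacy policies.
CUE_PHRASES = [
    "privacy policy",
    "personal information",
    "personal data",
    "cookies",
    "third parties",
    "we collect",
]


def candidate_privacy_urls(base_url):
    """Return guessed privacy-page URLs for a site's base URL."""
    # A trailing slash makes urljoin append to the base instead of replacing its last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path) for path in CANDIDATE_PATHS]


def looks_like_privacy_page(text, min_hits=3):
    """Crude keyword heuristic: count how many distinct cue phrases occur in the text."""
    lowered = text.lower()
    hits = sum(1 for phrase in CUE_PHRASES if phrase in lowered)
    return hits >= min_hits


if __name__ == "__main__":
    for url in candidate_privacy_urls("https://company.com"):
        print(url)
    sample = "Privacy Policy: we collect personal information and use cookies."
    print(looks_like_privacy_page(sample))
```

The sketch makes no network requests. A crawler would fetch each candidate URL, extract the page text and then apply a check like `looks_like_privacy_page`, and a keyword heuristic of this kind would misclassify many pages, which is part of why the problem is hard.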

In addition to the search engine, the team also plans to develop corpora – large sets of text data – and application programming interfaces, or APIs.

Other co-PIs include Florian Schaub, assistant professor of information and of electrical engineering and computer science, University of Michigan, and Gabriela Zanfir-Fortuna, director of global privacy at the Future of Privacy Forum.
