Collecting and storing information about individuals and their actions and habits is easier than ever before. Advances in information technology make the storage, cataloging, and use of such information trivial. Many educational institutions hold both paper and electronic data about individuals, either collected directly for organizational purposes or stored as a result of providing services to individuals. Because of privacy concerns, such data must often be de-identified or anonymized before it is used or studied.
Educational institutions may have a number of reasons for using de-identified data for business, academic, or operational functions. For instance, data can be made available for institutional use, without identifying the underlying data subjects, for research purposes, institutional effectiveness studies, performance and operational studies, information technology security and operational reviews, and for public health purposes. Other uses of de-identified data may require retaining a unique identifier for each individual in the data set without revealing who that individual actually is. For example, a researcher may need to know that certain actions were all taken by the same individual in order to draw conclusions about how individuals use the data or service. A website designer may want to determine how long individuals stay on the site, or how they traverse the site to find the information they seek. Systems development, test, and training environments may require data that simulates real production data without containing real data elements such as Social Security numbers. In such cases, de-identification is complicated by the need to replace unique identifiers such as Social Security numbers or IP addresses with alternate unique identifiers that cannot be used to identify the actual individual.
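One possible way to replace a unique identifier with an alternate one is a keyed hash, so that the same input always maps to the same pseudonym while the original value is not stored. The sketch below is only an illustration under stated assumptions: the function names, the record layout, the sample identifiers, and the secret key are all hypothetical, and a keyed hash is just one of several techniques an institution might choose. Identifiers with few possible values, such as Social Security numbers, can be guessed by brute force if the key ever leaks, so the key would need to be protected as carefully as the original data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, length: int = 16) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so
    actions by one individual can still be linked together.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


def deidentify_records(records, field, secret_key):
    """Return copies of the records with one field replaced by its pseudonym."""
    cleaned = []
    for record in records:
        copy = dict(record)  # leave the caller's data untouched
        copy[field] = pseudonymize(str(record[field]), secret_key)
        cleaned.append(copy)
    return cleaned


# Hypothetical key and made-up records, for illustration only.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"
sample_records = [
    {"ssn": "000-00-0001", "action": "login"},
    {"ssn": "000-00-0002", "action": "login"},
    {"ssn": "000-00-0001", "action": "search"},
]

for row in deidentify_records(sample_records, "ssn", SECRET_KEY):
    print(row)
```

In the output, the first and third rows carry the same pseudonym, which is what lets a researcher see that both actions came from one individual without seeing the original number.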
Special challenges with logs, network traffic, web traffic, etc.
The challenge of ensuring that data is fully de-identified or anonymized is compounded when the data consists of huge sets of systems operations data in unstructured formats. No search terms can reliably find and remove every potential instance of personally identifiable data (for example, names and addresses). Anonymizing tcpdump packet captures is extremely difficult because the packet contents reveal a great deal of information about the users. In flow dumps, even if address information is anonymized, traffic and pattern analysis may still reveal information that is personally identifiable. In addition, there is currently a debate as to whether an IP address, when it appears in log or traffic data, constitutes personally identifiable data. Some have chosen to truncate the last one or two octets of the IP address in order to avoid that debate; however, others believe that truncated addresses are still not sufficiently de-identified.
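The octet truncation mentioned above can be sketched for IPv4 addresses as follows. This is a minimal illustration, not a recommended policy: the function name is hypothetical, the sample address comes from a range reserved for documentation, and, as noted above, some consider truncated addresses insufficiently de-identified.

```python
import ipaddress


def truncate_ipv4(address: str, octets_to_drop: int = 1) -> str:
    """Zero out the last one or two octets of an IPv4 address."""
    if octets_to_drop not in (1, 2):
        raise ValueError("octets_to_drop must be 1 or 2")
    ip = ipaddress.IPv4Address(address)  # raises ValueError on bad input
    prefix_length = 32 - 8 * octets_to_drop
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{ip}/{prefix_length}", strict=False)
    return str(network.network_address)


print(truncate_ipv4("192.0.2.137"))      # 192.0.2.0
print(truncate_ipv4("192.0.2.137", 2))   # 192.0.0.0
```

Dropping one octet still leaves each address in a group of at most 256 hosts, which is one reason the debate over whether this is enough continues.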
“Consumers will be most shocked to learn that companies are instantaneously combining the details of their online lives with information from previously unconnected offline databases without their knowledge, let alone consent,” said Ed Mierzwinski, U.S. PIRG consumer program director. “In just the last few years, a growing and barely regulated network of sellers and marketers has gained massive information advantages over consumers.”
Internet2, Interim IPv6 Netflow Anonymization Policy, Version 1.0
The Top 10 Online Privacy Threats
http://www.mywot.com/en/blogs
The Internet and other digital media have transformed global communications, commerce and communities. We have always-on, always-everywhere connectivity via computers, cell phones and other devices. But concerns over how accountable and responsive this new media society will be to its ‘netizens’ are constantly being raised. We take a look at ten of the hot issues in online privacy today and give you some tools to protect your privacy.
Behavioral targeting
Behavioral marketing is essentially about learning your purchase patterns and activities on the Web. Your Internet service provider (ISP) has access to where you go online, and combined with third-party cookies that gather information about your behavior, this gives marketers an effective means to serve you targeted advertising.
Whether real or imaginary, the loss of privacy disturbs people. If the advertising industry gave its customers choice, control and transparency into its tracking and profiling practices, it would go a long way toward gaining consumer trust.
Cloud computing
Cloud computing refers to the concept of huge data centers operating in a networked infrastructure collectively known as “the cloud.” In other words, your data is stored on someone else’s server and accessed via the Internet. When you store your data with programs hosted on someone else's hardware (for instance your email and calendar on Gmail, your photos on Flickr, your online computer backup on Mozy, or your health records on Microsoft’s HealthVault), the responsibility for protecting that information from hackers and internal data breaches falls into the hands of the hosting company rather than the individual user.
Cookies
Cookies are small files that websites store through your browser. They can be used for authentication, session tracking, and remembering your preferences or shopping cart contents. For example, the information in a cookie might be a login ID for your online email account so you don't have to log in again on each page. Some cookies are temporary, and some may stay on the hard drive and be used when you visit the site again. What people object to are third-party cookies used to gather consumer behavior for marketing analysis. The purpose is to profile users as they move around on various sites and deliver precise, personalized advertising as they surf the Internet. Even if anonymous, these profiles have been the subject of privacy concerns.
Privacy advocates suggest that consumers be alerted to how they are profiled and targeted, with a full and fair description of all marketing practices, so they can control what their consumer experience will be. Browsers such as Mozilla Firefox, Internet Explorer and Opera block third-party cookies if requested by the user.
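The difference between temporary cookies and cookies that stay on the hard drive can be seen in the header a site sends when it sets a cookie. The following sketch uses Python's standard http.cookies module and treats a cookie as persistent when it carries an Expires or Max-Age attribute. The function name and the sample header values are made up for illustration, and the rule is a simplification of what browsers actually do.

```python
from http.cookies import SimpleCookie


def classify_cookies(set_cookie_header: str) -> dict:
    """Label each cookie in a Set-Cookie style string as session or persistent."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    labels = {}
    for name, morsel in jar.items():
        # A cookie without Expires or Max-Age normally lasts only
        # until the browser is closed.
        has_lifetime = bool(morsel["expires"] or morsel["max-age"])
        labels[name] = "persistent" if has_lifetime else "session"
    return labels


# Hypothetical header values.
print(classify_cookies("sessionid=abc123; Path=/"))
print(classify_cookies("prefs=dark; Max-Age=31536000; Path=/"))
```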
Online Shopping
Many consumers are reluctant to make purchases on the Internet out of fear that the personal information they provide will be misused or compromised. Well-written privacy policies are essential for telling the consumer, in plain language, what will be done with their information and whether any of it will be disclosed to third parties. Another way for a website to convey integrity and trust to users is to have the site certified by an organization that provides a seal of approval.
Phishing
Cybercriminals use “phishing,” or e-mail scams, to bait people with legitimate-looking requests from what appear to be reliable sources. Banks and other financial institutions, news outlets and stores are the organizations most commonly impersonated in this deceit. Bad guys use sneaky social engineering to collect personal information (Social Security numbers, passwords and PINs) that can be used to access bank and credit card accounts, resulting in stolen funds and identity theft.
Apart from the nuisance of spam, phishing doesn't necessarily harm your computer, but it can do a lot of damage if it results in identity theft. Do not give sensitive information to anyone, whether on the phone, in person or through email, unless you are sure that they are who they claim to be and that they should have access to the information. Phishing cases should be handled seriously and reported to local police. You can also file a report with the Anti-Phishing Working Group.
Photo and video sharing
Digital cameras and camera phone applications that can upload photos or video content directly to the web make publishing personal content increasingly easy. Privacy advocates are concerned because much of a user’s personal life and social environment is revealed in these multimedia collections. Integrating photo sharing within social networking communities has also provided the opportunity for tagging, annotating and linking images to the identities of the people in them. The persistence of multimedia can be problematic. Researchers found that nearly half of the social networking sites don't immediately delete pictures when a user requests they be removed. Even after you think you have deleted a photo, you can still find it in Google's caching system, which is remarkably efficient at archiving copies of web content long after it has been removed from the web.
Social networking
Participation in social networking sites has increased dramatically in recent years. Services such as Facebook, Twitter or Friendster have millions of members whose online profiles share personal and sensitive information freely and publicly with vast networks of friends, and with an unknown number of strangers. Risks range from identity theft and online or physical stalking to embarrassment and blackmail.
Spyware and adware
Applications which alter your computer’s settings, such as your browser's home page, cause annoying pop-ups and insert advertisements into web pages are known as spyware or adware. These apps can be programs, cookies or registry entries that secretly gather information about your online activity. This class of advertising methods is considered unethical and perhaps even illegal. Often these applications are disguised as a simple service like a search bar, and some of them can be malicious, open dangerous security holes and lead to frequent crashes or hangs.
Web bugs
A web bug is a tiny graphic on a webpage or email message that monitors the user who is reading the page or email. Advertising networks use web bugs to collect information enabling them to build a profile around your tendencies and interests. This profile is then identified by ad network cookies which track your movements and behavior across sites.
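As a rough illustration of what a web bug looks like in a page's markup, the sketch below scans HTML for images declared as one pixel (or zero pixels) wide and high. It is a heuristic under stated assumptions only: the class and function names and the sample URLs are hypothetical, and real web bugs may be sized or hidden in other ways (for example through style sheets) that this check would miss.

```python
from html.parser import HTMLParser


class WebBugFinder(HTMLParser):
    """Collect the sources of images declared as 0 or 1 pixel in size."""

    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        tiny = ("0", "1")
        if attributes.get("width") in tiny and attributes.get("height") in tiny:
            self.suspects.append(attributes.get("src"))


def find_web_bugs(html: str) -> list:
    """Return the src of every tiny image found in the given HTML."""
    finder = WebBugFinder()
    finder.feed(html)
    return finder.suspects


# Made-up page fragment for illustration.
sample_html = (
    '<p>Hello</p>'
    '<img src="https://tracker.example/pixel.gif" width="1" height="1">'
    '<img src="/logo.png" width="200" height="80">'
)
print(find_web_bugs(sample_html))
```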
Web browsing history
Search engines gather detailed information including your entire search history and browsing habits, as well as the time and date of each search and the Internet Protocol (IP) address of the computer submitting it, which can indicate its location. This data can be personally identifiable or can be made personally identifiable. The information collected is used for marketing and consumer profiling purposes to achieve precisely targeted advertising: to get the proper advertisements to the proper users. It is also used by search engines to carry out research and generate statistical usage data.