Help with my homework
This assignment helps you practice “joins” using an email dataset. Only “inner joins” will be needed. Connect to a different database: \c cinf201_enron
Background and disclaimer
Enron Corporation, once an energy, commodities, and services company, is infamous for having engaged in corporate fraud and corruption that resulted in its bankruptcy in late 2001. Enron employed 20,000 people and claimed revenues of $111 billion at the end of 2000 (source: https://en.wikipedia.org/wiki/Enron). The scandal was so disastrous that a federal law known as the Sarbanes-Oxley Act (https://en.wikipedia.org/wiki/Sarbanes–Oxl….) of 2002 was passed, which sets accounting standards for U.S. public companies.
During the investigation of Enron’s accounting practices by the Federal Energy Regulatory Commission (FERC), over 600,000 emails generated by 158 employees were collected. When the investigation ended, FERC deemed the emails to be in the public domain, so they may be used for historical research and academic purposes. The email database has been available on the web for more than a decade from Carnegie Mellon University (https://www.cs.cmu.edu/~./enron/). Over the years it has been reviewed, and various emails have been removed in response to privacy concerns.
Carnegie Mellon’s site has an important disclaimer:
I am distributing this dataset as a resource for researchers who are interested in improving current email tools, or understanding how email is currently used. This data is valuable; to my knowledge it is the only substantial collection of “real” email that is public. The reason other datasets are not public is because of privacy concerns. In using this dataset, please be sensitive to the privacy of the people involved (and remember that many of these people were certainly not involved in any of the actions which precipitated the investigation.)
Why are we using this dataset?
I believe coursework should be as realistic as possible in order to model the environment that students will face outside of school. In our case, that means using real datasets and answering meaningful questions about those data. Sometimes, the data that we are asked to analyze in the real world is confidential and possibly even embarrassing to certain individuals. The Enron email dataset includes confidential emails that were never intended to be seen by the general public. During the course of this assignment, I expect all students to exhibit maturity and respect for the individuals represented in these emails.
Tasks
All tasks should be performed with a single query.
Show the message date and subject for all emails sent by one of Enron’s CEOs, Jeffrey Skilling (email [email protected]); order by date (earliest first).
Show the email address and recipient type (TO/CC/BCC) for recipients of the message with subject ‘FERC Issues’; do not apply any particular sort to the results.
Show the date, subject, and sender email of the 10 most recent emails sent by anyone with the last name “Williams”.
Show the date, subject, sender email, and all TO recipients of all emails that have a subject containing the word “bankruptcy”. Limit results to messages sent between November 1, 2001 and the end of the year (include Nov 1, but do not include Dec 31), and sent from email addresses that end with ‘@enron.com’. Order by date, newest last.
Show a sorted distinct list of people (firstname, lastname) who ever sent a BCC email to a CEO, President, or Vice President.
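The tasks above share one pattern: start from the message, inner join to the sender, inner join to the recipients, then filter and sort. Below is a minimal sketch of that pattern on a toy in-memory SQLite database, driven from Python so it runs anywhere. Everything in it is an assumption made for illustration: the table names (person, message, recipient), the column names, and the sample rows are hypothetical and will not match the real cinf201_enron schema, so inspect the actual tables before adapting it. It is not a solution to any task.

```python
import sqlite3

# Hypothetical toy schema and data: the real cinf201_enron tables,
# columns, and values will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    firstname TEXT, lastname TEXT, email TEXT
);
CREATE TABLE message (
    message_id INTEGER PRIMARY KEY,
    sender_id  INTEGER REFERENCES person(person_id),
    sent_date  TEXT,   -- ISO 8601 text, so string comparison follows date order
    subject    TEXT
);
CREATE TABLE recipient (
    message_id INTEGER REFERENCES message(message_id),
    person_id  INTEGER REFERENCES person(person_id),
    rtype      TEXT    -- 'TO', 'CC', or 'BCC'
);
INSERT INTO person VALUES
    (1, 'Ada', 'Example', 'ada@example.com'),
    (2, 'Bob', 'Sample',  'bob@example.com'),
    (3, 'Cy',  'Demo',    'cy@example.org');
INSERT INTO message VALUES
    (10, 1, '2001-11-01', 'Budget review'),
    (11, 1, '2001-12-31', 'Year-end budget'),
    (12, 3, '2001-11-15', 'Budget question'),
    (13, 1, '2001-10-31', 'Lunch');
INSERT INTO recipient VALUES
    (10, 2, 'TO'), (10, 3, 'CC'),
    (11, 2, 'TO'),
    (12, 1, 'TO'),
    (13, 2, 'TO');
""")

# person is joined twice under different aliases: once as the sender (s)
# and once as the recipient (rp). Each INNER JOIN keeps only matching rows.
query = """
SELECT m.sent_date, m.subject,
       s.email  AS sender_email,
       rp.email AS recipient_email
FROM message AS m
INNER JOIN person    AS s  ON s.person_id  = m.sender_id
INNER JOIN recipient AS r  ON r.message_id = m.message_id
INNER JOIN person    AS rp ON rp.person_id = r.person_id
WHERE r.rtype = 'TO'
  AND m.subject LIKE '%budget%'       -- SQLite LIKE ignores ASCII case
  AND m.sent_date >= '2001-11-01'     -- lower bound included
  AND m.sent_date <  '2001-12-31'     -- upper bound excluded
  AND s.email LIKE '%@example.com'
ORDER BY m.sent_date ASC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Two details are worth checking against the real database. First, a range that includes its start date but excludes its end date is written with `>=` and `<` rather than BETWEEN, which includes both ends. Second, pattern matching differs between systems: in PostgreSQL, LIKE is case-sensitive and ILIKE is the case-insensitive form, whereas SQLite's LIKE ignores ASCII case by default.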