Algorithms for Modern Data Sets (COMP 3801)
Weekly Schedule
Assignments
Announcements
Instructor: Anil
Maheshwari
Office: HP 5125B (None of us can access it!)
E-mail: anil@scs.carleton.ca
Lectures: Some prerecorded
videos and reference material will be posted in cuLearn and its
only meant for students in the course. Please do not share
with anybody. I have already posted prerecorded lectures for
Weeks 1-6. After the Fall break the lectures will be "online" via
zoom. You may attend and participate in live recordings in
zoom. Official Lecture Timings are Mondays and Wednesdays from
8:35 to 9:55AM
Office hours: Likely during the lecture slot on
one of the days. Tentatively, we will have office hours from
8:35-9:55 on Wednesdays during the class. After the Fall Break,
we will find another time slot.
Teaching Assistant:
Yunkai Wang (He will mark Assignments and Tests)
Course objectives: Algorithm
design techniques for modern data sets arising in, for example,
data mining, web analytics, epidemic spreads, search engines and
social networks. Topics may include data mining, hashing,
streaming, clustering, recommendation systems, link analysis,
dimensionality reduction, online, social networking, game
theoretic and probabilistic algorithms.
Caution: Note that you need a minimum of B+ in COMP 2804 to
register in this course. The contents of this course are fairly
broad, and will cover a spectrum of techniques from the design and
analysis of algorithms. It is assumed that you have a very good
grasp on the analysis of algorithms (O-notation, recurrences, and
complexity analysis), elementary probability theory including
expectation and indicator random variables (contents of COMP
2804), the knowledge of basic data structures (lists, trees,
hashing), and the knowledge of discrete mathematics (counting,
permutations and combinations, proof techniques:
induction, contradiction, ..). Note that there will not be time to
review these material, and to appreciate the contents of this
course, you must have a very good grasp on these topics.
Textbooks:
Useful References related to various topics:
- PageRank:
- Probability:
- Locality Sensitive Hashing
- Data Streaming
- Adwords
- Chapter on Advertisement on the Web from the
MMDS Book.
- SVD/Dimensionality Reduction:
- Chapter in the MMDS book on Dimensionality
Reduction
- Try a few SVDs using Wolfram Alpha.
- Randomized Load Balancing & Perfect
Hashing
- http://pages.cs.wisc.edu/~shuchi/courses/787-F09/scribe-notes/lec7.pdf
- Kleinberg&Tardos Algorithm Design Book,
Chapter 13.
Topics
We will likely cover parts of MMDS Chapters 3, 4, 5, 7, 8, 9,
11, and possibly 10 and 6. In addition to this there will
be more material on Data Streaming from some research articles.
In broad terms, some combination of the following topics:
Link Analysis
Mining Data Streams - Frequency and Moment Estimates
Finding Similar Items - Locality Sensitive Hashing
Advertising on the web - Adwords & Online matching
Recommendation Systems - Collaborative Filtering
Dimensionality Reduction - Eigenvalues, PCA, SVD
Clustering - K-Means
Mining Social Networks - Community Detection, Partitioning of
Graphs, Dynamic Graph Algorithms
Frequent Itemset
+ some probability+linear algebra as and when required.
Grading Scheme (Tentative):
As decided by the majority of students:
- 4-5 Assignments: 50%
- 2-3 Tests: 50% (Best 2 out of 3)
Tentative plan is to hold tests on
Saturdays: October 17, November 20, and December 5.
Weekly Schedule
Please see cuLearn for video recordings and additional
details.
The topics reflect the title of the Videos (in most cases)
Week
#
|
Topic
|
References
|
Week
1 |
What
is this course about? (This video will
be replaced by the Online Lecture of Sept. 9th)
Introduction, Course Logistics
Majority and Heavy Hitters |
cuLearn Video
|
Week
2 |
Count-Min
Sketch
Bloom Filters |
cuLearn Video |
Week
3 |
Probability
Basics
Balls and Bins Experiments
|
cuLearn Video |
Week
4
|
Hashing
Estimating Frequency
Moment F_0 |
cuLearn Video |
Week
5 |
Frequency
Moment F_2
Stream
Statistics over Sliding Window |
cuLearn Video |
Week
6 |
Locality
Sensitive Hashing
Matching
Fingerprints |
cuLearn Video |
Week
7 |
Linear
Algebra
Eigenvalues and Eigenvectors
|
Online |
Week
8
|
Pagerank
Algorithm |
Online |
Week
9 |
Clustering
Social Networks
Graph Partitioning
|
Online |
Week
10
|
Adwards
Problem
Balance Algorithm |
Online |
Week
11 |
Recommendation
Systems |
Online |
Week
12 |
Dimensionality
Reduction
|
Online |
Important Considerations:
Late assignments are not accepted. Assignments submissions are
handled electronically (i.e., through cuLearn) and there is no
"grace period" with respect to a deadline. Technical problems do not
exempt you from this requirement. You are advised to:
• periodically upload you progress (e.g. upload your progress at
least daily)
• attempt to submit your final submission at least one hour in
advance of the
due date and time.
Undergraduate Academic Advisor:
The Undergraduate Advisor for the School of Computer Science is
available in Room 5302C HP; by telephone at 520-2600, ext. 4364; or
by email at undergraduate_advisor@scs.carleton.ca. The
undergraduate advisor can assist with information about
prerequisites and preclusions, course substitutions/equivalencies,
understanding your academic audit and the remaining requirements for
graduation. The undergraduate advisor will also refer students to
appropriate resources such as the Science Student Success Centre,
Learning Support Services and Writing Tutorial Services.
University Policies
We follow all the rules and regulations set by Carleton
University, Dean of Science, and the School of Computer Science
regarding accommodating students with any kind of need(s). Please
consult with the appropriate authorities to see how you can be
accommodated and please follow their procedures. For
information about Carleton's academic year, including registration
and withdrawal dates, see Carleton's Academic Calendar.
Following is a standard list of recommendations that we have been
advised to provide you with respect to whom to contact for the
appropriate action(s):
Pregnancy Obligation. Please contact your instructor with
any requests for academic accommodation during the first two weeks
of class, or as soon as possible after the need for accommodation is
known to exist. For more details, visit Equity Services.
Religious Obligation. Please contact your instructor with
any requests for academic accommodation during the first two weeks
of class, or as soon as possible after the need for accommodation is
known to exist. For more details, visit Equity Services.
Academic Accommodations for Students with Disabilities If you
have a documented disability requiring academic accommodations in
this course, please contact the Paul Menton Centre for Students with
Disabilities (PMC) at 613-520-6608 or pmc@carleton.ca for
a formal evaluation or contact your PMC coordinator to send your
instructor your Letter of Accommodation at the beginning of the
term. You must also contact the PMC no later than two weeks before
the first in-class scheduled test or exam requiring accommodation
(if applicable). After requesting accommodation from PMC, meet with
your instructor as soon as possible to ensure accommodation
arrangements are made. For more details, visit the Paul
Menton Centre website.
Survivors of Sexual Violence. As a community, Carleton
University is committed to maintaining a positive learning, working
and living environment where sexual violence will not be tolerated,
and survivors are supported through academic accommodations as per
Carleton's Sexual Violence Policy. For more information about the
services available at the university and to obtain information about
sexual violence and/or support, visit:
carleton.ca/sexual-violence-support.
Accommodation for Student Activities. Carleton University
recognizes the substantial benefits, both to the individual student
and for the university, that result from a student participating in
activities beyond the classroom experience. Reasonable accommodation
must be provided to students who compete or perform at the national
or international level. Please contact your instructor with any
requests for academic accommodation during the first two weeks of
class, or as soon as possible after the need for accommodation is
known to exist. For more details, see the policy.
Student Academic Integrity Policy. Every student should
be familiar with the Carleton University student academic integrity
policy. A student found in violation of academic integrity standards
may be awarded penalties which range from a reprimand to receiving a
grade of F in the course or even being expelled from the
program or University. Examples of punishable offences include:
plagiarism and unauthorized co-operation or collaboration.
Plagiarism. As defined by Senate, "plagiarism is
presenting, whether intentional or not, the ideas, expression of
ideas or work of others as one's own". Such reported offences will
be reviewed by the office of the Dean of Science.
Unauthorized Co-operation or Collaboration. Senate policy
states that "to ensure fairness and equity in assessment of term
work, students shall not co-operate or collaborate in the completion
of an academic assignment, in whole or in part, when the instructor
has indicated that the assignment is to be completed on an
individual basis". For this course, the following holds:
- Students are NOT allowed to collaborate on
assignments. Please avoid using search engines to look for
answers etc. Just a word of caution - in theoretically oriented
courses, it is important to come up with your own ideas for the
proof/an algorithm/a contradiction/etc. Sometimes these are like
logical puzzles - if somebody tells you a solution then they are
trivial and hard part is to come up with a solution. What we
want to learn is how to solve them ourselves.
- Past experience has shown conclusively that those who do not
put adequate effort into the assignments do not learn the
material and have a probability near 1 of doing poorly on the
exams/tests.
-
Penalties for
academic offences can be found on the ODS webpage:
https://science.carleton.ca/academic-integrity/.
Announcements:
- All communications will be done via forums in
cuLearn if it is for the interest for the whole class.
Discussion that are of private nature can be done by sending
me an e-mail.
- Material for each Week will likely be posted
in advance.
- I am planning to hold my office hours during
one of the lecture slots where you can ask any type of
questions and clarifications etc. I request the TA to hold
their office hour in the other time slot. If you need
additional office hours, we can try to accommodate.
- I am planning to record some of the contents
of this course in a series of short videos during Summer.
Authorities are not allowing us to access the office on a more
regular basis and they are understandably not able to provide
a clear timeline due to the unpredictable and evolving
situation of COVID-19, Unfortunately the video recordings will be noisy
due to the limitation of the location and equipment. I will try to provide ample references etc. for you
to do self reading and try to help you as much as possible
with extra discussion time etc. during the Fall Term.
- I am new to online lectures and there will
several bumps! I am still hoping that you will appreciate how
beautiful algorithmic techniques have influenced so many
modern applications.