This project was motivated by a desire to create an interface for exploring the rich relationships between courses at Cornell. Cornell currently offers a free public API endpoint that provides an enormous amount of information on courses each semester. However, programmatically exploring the relationships between courses through this public endpoint is quite difficult. For instance, prerequisite information is messy and semantically difficult to parse. This makes tasks such as building a system to plan out coursework and create flowcharts difficult and inefficient.
The use of a graph DB is motivated by this difficulty: it makes it possible to quickly find the path from the set of Course nodes a student has already taken to the courses they'd like to take, among other relationship-based tasks. The graph DB is also useful for representing other relationships, such as major, minor, and college requirements (these can be represented as nodes with "Requires" edges to the respective types of courses).
Building this was a great pipeline problem: it required a system capable of parsing just over 5,000 courses as efficiently as possible (thanks to some multi-threaded logic). I also intentionally exposed some of my DB through an API I built and deployed using Flask (in the be/endpoints folder) so that I could use it in a related user-facing project.
Currently, every node in the graph is a Course node, each of which holds the following properties: course name, level, description, course title, department, distributions, distribution groups (categories), and whether it counts as a valid liberal studies course.
A Course node may have a REQUIRES edge to another Course node: an edge from Course A to Course B indicates that Course B is a prerequisite of Course A. To account for equivalent prerequisite courses (e.g., CS 1112 vs. CS 1110), each REQUIRES edge has one property: group. All REQUIRES edges that share the same tail (source) node and the same group property are interchangeable.
For instance, say Course A requires Course B AND (Course C OR Course D). The DB will represent this with three REQUIRES edges from Course A to B, C, and D respectively. REQUIRES(A --> B) has property group = 1. REQUIRES(A --> C) and REQUIRES(A --> D) will both have property group = 2, indicating they are interchangeable.
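The group semantics above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the edge list is written out by hand as `(course, prerequisite, group)` tuples rather than read from the DB, and the function name `prerequisites_met` is a hypothetical helper. It treats edges in the same group as OR alternatives and distinct groups as AND conditions.

```python
from collections import defaultdict

# Each tuple is one REQUIRES edge: (course, prerequisite, group).
# Hand-written stand-in for edges that would normally come from the graph DB.
REQUIRES_EDGES = [
    ("Course A", "Course B", 1),
    ("Course A", "Course C", 2),
    ("Course A", "Course D", 2),
]


def prerequisites_met(course, taken, edges):
    """Return True if `taken` (a set) covers at least one course from every group."""
    groups = defaultdict(set)
    for source, prereq, group in edges:
        if source == course:
            groups[group].add(prereq)
    # Every group must be satisfied (AND); any member of a group will do (OR).
    return all(options & taken for options in groups.values())


ok = prerequisites_met("Course A", {"Course B", "Course D"}, REQUIRES_EDGES)
missing_b = prerequisites_met("Course A", {"Course C", "Course D"}, REQUIRES_EDGES)
```

Here `ok` is `True` because Course B satisfies group 1 and Course D satisfies group 2, while `missing_b` is `False` because nothing in group 1 has been taken. A course with no REQUIRES edges is treated as having no prerequisites.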
This representation creates new possibilities for systems that plot flowcharts of a student's course plans. For instance, say a freshman CS major at Cornell has taken all courses in a Set A, and a developer would like to build an agent that takes in the student's preferences for specializations (say they want to develop a deep intuition for distributed systems) and outputs a semester-by-semester course plan that meets their goals as quickly as possible while satisfying major/college requirements. This DB could then be exposed as a tool, allowing the agent to quickly follow paths to build the schedule(s) this student can take.
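As a rough illustration of the kind of traversal such a tool could perform, the sketch below computes the earliest semester in which each course becomes reachable, given the courses already taken. It is an assumption-laden simplification, not the project's implementation: edges are in-memory `(course, prerequisite, group)` tuples, the function name `earliest_semesters` is hypothetical, and it ignores credit limits, which semester a course is offered in, and student preferences.

```python
from collections import defaultdict


def earliest_semesters(edges, taken):
    """Map each not-yet-taken course to the first semester it could be taken."""
    groups = defaultdict(lambda: defaultdict(set))  # course -> group -> options
    courses = set()
    for source, prereq, group in edges:
        groups[source][group].add(prereq)
        courses.update((source, prereq))

    completed = set(taken)
    schedule = {}
    semester = 1
    while True:
        # A course unlocks once every one of its groups has a completed option.
        unlocked = {
            course
            for course in courses - completed
            if all(opts & completed for opts in groups.get(course, {}).values())
        }
        if not unlocked:
            break
        for course in unlocked:
            schedule[course] = semester
        completed |= unlocked
        semester += 1
    return schedule


# Hypothetical data: A requires B AND (C OR D); B requires E.
edges = [
    ("Course A", "Course B", 1),
    ("Course A", "Course C", 2),
    ("Course A", "Course D", 2),
    ("Course B", "Course E", 1),
]
plan = earliest_semesters(edges, taken={"Course C"})
```

With Course C already taken, Course E (and Course D, which has no prerequisites) are available in semester 1, Course B in semester 2, and Course A in semester 3. A real planner would prune courses the student does not need and add constraints on top of this layering.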
I am currently expanding the project to include a few additional, more interesting classes of nodes: Major, Minor, College, and Course Type (to potentially represent looser Major/College requirements, since an edge to every individual course is not necessarily appropriate for some requirements).
CS 3420, its two interchangeable prereqs, and a few of its properties:
All of the courses offered in the Spring 2026 semester that require CS 2110 at Cornell:
All courses that require CS 2800 (Foundations of Mathematics for Computing). Surprisingly, even a few non-CS courses require CS 2800, from linguistics courses to cross-listed ECON courses.