
Introduction to Graph Theory


By: ProgrammerE

January 2, 2017

A company hires you to cut costs on their infrastructure. This requires you to know what devices are part of that infrastructure. From there, you determine where new fiber optic cable should be installed to maximize the benefit to the organization. Finally, you discover that a router is underutilized and recommend reconfiguring some users to route through it. A job well done, thanks to graph theory.

A graph G is a set of vertices V together with a set of edges E, where E is a subset of the cross product of V with itself. The vertices may represent users, roles, devices, states, files, etc. The edges represent a relation between two vertices, such as being coworkers, having read access, a connection between two devices, a transition from one state to another, and so on. A vertex v is said to be incident to an edge e if there exists some vertex x such that e = (v, x) or e = (x, v). In this scenario, we say that the nodes v and x are adjacent. Note that if there existed some edge f and some node y such that f = (y, v) or f = (x, y), then we would say that the edges e and f are incident to each other (they share an endpoint).

A graph may be represented in many ways. Two of the most common representations are the adjacency list and the adjacency matrix. Let V = {A, B, C, D} and E = {(A, B), (B, A), (B, D), (D, D)}. The corresponding adjacency list is E = {A : <B>, B : <A, D>, C : <>, D : <D>}. In the adjacency matrix of a directed graph, the entry in the ith row and jth column is 1 if there is an edge from the ith vertex to the jth vertex, and 0 if there is not. For our graph G, the adjacency matrix is

   A  B  C  D
A  0  1  0  0
B  1  0  0  1
C  0  0  0  0
D  0  0  0  1
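To make the two representations concrete, here is a minimal Python sketch of the example graph G; the variable names are my own illustration, not part of the article or any particular library.

```python
# A minimal sketch of the example graph G with V = {A, B, C, D}
# and E = {(A, B), (B, A), (B, D), (D, D)}.

vertices = ["A", "B", "C", "D"]

# Adjacency list: each vertex maps to the list of vertices it points to.
adj_list = {
    "A": ["B"],
    "B": ["A", "D"],
    "C": [],
    "D": ["D"],
}

# Adjacency matrix: entry [i][j] is 1 if there is an edge from
# vertices[i] to vertices[j], and 0 otherwise.
adj_matrix = [
    [0, 1, 0, 0],  # A
    [1, 0, 0, 1],  # B
    [0, 0, 0, 0],  # C
    [0, 0, 0, 1],  # D
]

# The same edge query in both representations.
print("B" in adj_list["A"])   # True: scans A's neighbor list
print(adj_matrix[0][1] == 1)  # True: one direct lookup
```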

A graph is said to be undirected if for every edge e in E and for all nodes u and v in V, e = (u, v) implies that there exists an edge f in E with f = (v, u). This is equivalent to saying that the adjacency matrix equals its own transpose. Otherwise, the graph is said to be directed.

A walk is a tuple of edges (e1, e2, …, en) such that for all 0 < i < n, ei = (u, v) implies e(i+1) = (v, w) for some w (the destination of the ith edge becomes the source of the next edge). A cycle is a walk in which e1's source and en's destination are the same node. A graph is cyclic if it contains at least one cycle, and acyclic if it does not. Walks may or may not contain cycles; a path is a walk which does not contain any cycles. Two nodes are said to be connected if there exists a path between them. To count the walks of length 2 between every pair of nodes, I can multiply the adjacency matrix by itself. The resulting matrix is listed below.

   A  B  C  D
A  1  0  0  1
B  0  1  0  1
C  0  0  0  0
D  0  0  0  1

I can multiply the adjacency matrix by itself twice (computing its cube) to count all walks of length 3. The resulting matrix is listed below.

   A  B  C  D
A  0  1  0  1
B  1  0  0  2
C  0  0  0  0
D  0  0  0  1

Note that there are two walks of length 3 from B to D: ((B, D), (D, D), (D, D)) and ((B, A), (A, B), (B, D)). You can repeat this process of matrix multiplication to count walks of any length. To count walks of length at most 3, we need only add the matrices for walks of length 1, 2, and 3. The resulting matrix is listed below.

   A  B  C  D
A  1  2  0  2
B  2  1  0  4
C  0  0  0  0
D  0  0  0  3
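The matrix arithmetic above is easy to reproduce. Below is a small Python sketch, using plain lists and no libraries, that rebuilds the three matrices; the helper names are my own.

```python
# Counting walks by matrix multiplication, reproducing the matrices above.
# A[i][j] is the number of walks of length 1 (i.e. edges) from vertex i to j.

def mat_mult(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(x, y):
    """Add two square matrices entry by entry."""
    n = len(x)
    return [[x[i][j] + y[i][j] for j in range(n)] for i in range(n)]

A = [[0, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 1]]

A2 = mat_mult(A, A)                      # walks of length 2
A3 = mat_mult(A2, A)                     # walks of length 3
at_most_3 = mat_add(mat_add(A, A2), A3)  # walks of length 1, 2, or 3

print(A3[1][3])         # 2: the two walks of length 3 from B to D
print(at_most_3[1][3])  # 4: walks of length at most 3 from B to D
```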

In evaluating the various storage schemes and algorithms, we use two metrics: asymptotic time complexity and asymptotic space complexity. We express both with 'Big-Oh' notation, which describes how the cost grows in the worst case. Storing a graph in an adjacency list takes O(|V| + |E|) space, where |V| is the number of nodes and |E| is the number of edges. Storing the graph in an adjacency matrix takes O(|V|^2) space. Determining whether there is an edge from u to v takes O(|V|) time with an adjacency list, whereas it takes O(1) time with an adjacency matrix.

Although we usually search through commands or graphical user interfaces, searching is everywhere in computer security. Most Windows machines use Discretionary Access Control (DAC). If you are told to remove all confidential files from a user account, you need to traverse the file system looking for files marked "confidential." Likewise, looking for files containing passwords or secrets requires searching, and finding one or all of the devices on a network requires searching. In graph terms, starting from a node s, we want to find all the nodes we can reach by following one or more edges. There are two famous algorithms for searching a graph. Breadth First Search (BFS) uses a queue, which makes it explore the nodes near the start before the nodes far from it. Depth First Search (DFS), by contrast, uses a stack (or recursion) and pushes away from the start before backtracking to explore nearby nodes. Both are demonstrated in this YouTube video: https://www.youtube.com/watch?v=bIA8HEEUxZI. Both BFS and DFS run in O(|V| + |E|) time on an adjacency list (O(|V|^2) on an adjacency matrix), and beyond the graph representation itself each needs only O(|V|) extra space for its queue or stack and its visited set. As a bonus, BFS also gives us the shortest path (fewest edges) from the starting node s to every reachable node, making it an extremely useful searching algorithm; a minimal sketch is shown below.
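Here is a minimal BFS sketch over an adjacency list; it returns the fewest-edge distance from a start node to every node it can reach. The function name and output comments are my own illustration.

```python
from collections import deque

def bfs_distances(adj_list, start):
    """Return a dict mapping each reachable node to its fewest-edge
    distance from start, exploring nodes in breadth-first order."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()          # FIFO queue: nearest nodes first
        for neighbor in adj_list[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

# Using the example graph from earlier in the article:
adj_list = {"A": ["B"], "B": ["A", "D"], "C": [], "D": ["D"]}
print(bfs_distances(adj_list, "A"))   # {'A': 0, 'B': 1, 'D': 2}
```

Swapping the queue for a stack (and dropping the distance bookkeeping) turns the same loop into an iterative DFS.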
A weighted graph is an unweighted graph G together with a cost function c whose domain is the set of edges E of G. For the purposes of this article, we will assume all the edge weights are real numbers (i.e., c's codomain is the set of real numbers). As we saw in the previous paragraph, BFS can solve the shortest path problem in an unweighted graph, and that extends to graphs where all the weights are equal. However, many problems in both computer security and computer networking require finding the shortest path in a graph whose edge weights are not all the same. If you are penetration testing a network, each device costs you a certain amount of time to compromise. Since you want to be on the network for the least amount of time possible, it is worth finding which set of computers to compromise to reach your target. Notice the small trick applied here: because it takes c(v) time to compromise node v, we give every edge e = (u, v) the weight c(v), the cost of its destination. Routing a packet to a destination takes time, so we would also like to minimize the time a packet spends hopping from device to device. Alternatively, we may have a trust rating on different devices and be trying to send the packet through the most secure channel. Clearly, finding the shortest path is important, and there are several algorithms that solve slightly different versions of the problem with different time complexities.

If the cost function is strictly non-negative, we can find the shortest path from s to every other node using Dijkstra's algorithm. It uses a priority queue and operates like BFS, except that the node with the lowest tentative distance is checked first. Check out the following YouTube video for an in-depth explanation of Dijkstra's algorithm: https://www.youtube.com/watch?v=8Ls1RqHCOPw. With an adjacency matrix this takes O(|V|^2) time; with an adjacency list and a good priority queue, it takes O(|V|log|V| + |E|) time. The space complexity is O(|V|^2). A minimal sketch of Dijkstra's algorithm appears at the end of this shortest-path discussion.

If we allow negative edge weights, we can still find shortest paths from a source node using the Bellman-Ford algorithm. It can also detect negative-weight cycles, in which case no shortest path exists. It works by iterating |V| - 1 times over all the edges, looking for improvements. A good YouTube video explaining the algorithm can be found here: https://www.youtube.com/watch?v=-mOEd_3gTK0. The run time of Bellman-Ford is O(|V|^3) for adjacency matrices and O(|V|(|V| + |E|)) for adjacency lists; the space complexity is O(|V|^2).

If we want the shortest path between every pair of vertices, we can use the Floyd-Warshall algorithm, which keeps track of the path and the distance between every pair of nodes. Here is the corresponding YouTube video explaining it: https://www.youtube.com/watch?v=LwJdNfdLF9s. The run time of Floyd-Warshall is O(|V|^3) regardless of the storage choice, and the space complexity is O(|V|^2). If the graph has relatively few edges and no negative-weight cycles, we may use Johnson's algorithm to improve the runtime to O(|E||V|log|V|). See https://www.youtube.com/watch?v=ppDL4lDrP4o for more details.
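Continuing the penetration-testing example, here is a minimal sketch of Dijkstra's algorithm using Python's heapq module as the priority queue. The network, node names, and weights below are hypothetical, invented only to illustrate the idea.

```python
import heapq

def dijkstra(adj_list, start):
    """Return a dict of shortest distances from start.
    adj_list maps each node to a list of (neighbor, weight) pairs;
    all weights are assumed to be non-negative."""
    distances = {start: 0}
    heap = [(0, start)]                 # priority queue of (distance, node)
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue                    # stale entry: a shorter path is known
        for neighbor, weight in adj_list[node]:
            new_dist = dist + weight
            if new_dist < distances.get(neighbor, float("inf")):
                distances[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return distances

# Hypothetical network: edge weight = minutes to compromise the destination.
network = {
    "entry": [("web", 5), ("mail", 2)],
    "web":   [("db", 4)],
    "mail":  [("db", 9), ("web", 1)],
    "db":    [],
}
print(dijkstra(network, "entry"))  # {'entry': 0, 'web': 3, 'mail': 2, 'db': 7}
```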
A tree is a connected acyclic graph, that is, a connected graph with no cycles. A spanning tree is a tree which contains every vertex of our graph. If we have a weighted undirected graph, we can look for a minimum spanning tree: a spanning tree whose total edge weight is less than or equal to that of any other spanning tree. Finding a minimum spanning tree is especially important in networking. Broadcasts to all computers should use a spanning tree, and designing a network that connects all our devices with the least amount of cable requires a minimum spanning tree. We solve this problem with two algorithms. Prim's algorithm maintains two sets of vertices and moves vertices from one set to the other, always adding the cheapest edge that crosses between them. Kruskal's algorithm uses the disjoint-set data structure and iterates through the edges in increasing order of weight. Both algorithms are demonstrated in this YouTube video: https://www.youtube.com/watch?v=3fU0w9XZjAA. They take O(|V|^2) time using an adjacency matrix and O(|E|log|V|) time using an adjacency list, and both have a space complexity of O(|V|). Prim's algorithm works better on graphs with many edges, called dense graphs; Kruskal's algorithm works better on graphs with few edges, called sparse graphs.

Suppose you want to download a file from a server. Many packets will be sent from the server to your computer. Each wire carries a certain capacity c, and you want to maximize the flow between the server and your computer. Capacity has the same domain and codomain as our weights; unlike weights, however, capacity comes with another function, called the flow, which also shares the same domain and codomain. We solve this problem with the Ford-Fulkerson algorithm, but in practice we use the Edmonds-Karp algorithm, as its running time depends only on the number of vertices and the number of edges. The corresponding YouTube video can be found here: https://www.youtube.com/watch?v=SqGeM3FYkfo. The running time of this algorithm is O(|V||E|^2) and its space complexity is O(|V|^2).

All the algorithms we have looked at so far run in time polynomial in the size of their input. For several other graph problems, no polynomial-time algorithm is known, and whether one exists is the famous P = NP problem in computer science. Examples of graph problems which seem to become exponentially harder with the size of the graph include finding a Hamiltonian path, which visits each vertex exactly once, starting and ending at specified vertices. Follow this link for more information: https://www.youtube.com/watch?v=oTN53mXXayw. Finding a minimum vertex cover, a subset of the nodes such that every edge is incident to a node in the subset, is NP-hard. Find out more here: https://www.youtube.com/watch?v=m_dOtat56vY. Deciding whether a graph contains a clique of a given size, that is, a subset of vertices with an edge between every pair of nodes in the subset, is NP-complete. Find more information here: https://www.youtube.com/watch?v=_ZKPP5PrHOk. The tiny brute-force sketch below shows how quickly these problems get out of hand.
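The sketch below tries every subset of vertices to find a minimum vertex cover of the article's small example graph (ignoring edge direction). The function name is my own, and the approach is deliberately naive: it is fine for four vertices, but the number of subsets doubles with every vertex you add.

```python
from itertools import combinations

def minimum_vertex_cover(vertices, edges):
    """Brute force: try subsets of increasing size and return the first
    one that covers every edge. Exponential in the number of vertices."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            # A cover must touch at least one endpoint of every edge.
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return set(vertices)  # the full vertex set always covers everything

edges = [("A", "B"), ("B", "D"), ("D", "D")]
print(sorted(minimum_vertex_cover(["A", "B", "C", "D"], edges)))
# ['A', 'D']: one minimum cover of size 2
```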
I hope I have convinced you that graph theory is an important topic in networking and computer security. There are many other important topics in graph theory, such as Euler paths, graph coloring, and approximation algorithms, that would take far too long to cover here. To learn everything I have said here in detail, and a lot more, check out this book used in third-year math courses on graph theory: "Graph Theory", third edition, by A. Bondy and U. S. R. Murty, Springer. There are several ways to study graph theory. The reason I focus on running time is that you are not likely to implement these algorithms yourself, and even if you do, you can read up on the algorithm at that point and adapt it to your needs. However, when you call these functions on your computer, I believe it is extremely important to have an approximation of how long they will take and how much you will have to store. If you like my content, check out this similar article where I introduce mathematical logic: https://www.cybrary.it/0p3n/mathematical-introduction-logic-logic-provides-foundation-hacking/

Regards,

ProgrammerE