The Algorithms Design Manual, Combinatorial Problems

Contents:

Combinatorial Problems

Whats different?

Issues of Precision and Error. Floating point issues. Use both single and double precision, and think hard when they diverge.
Extensive Libraries of Code. There is no reason to not to use all thats already written.

Sorting

Input description: A set of \(n\) items.
Problem description: Arrange the items in increasing (or decreasing) order.

The most fundamental algorithmic problem in computer science.
First rule of algorithm design : "when in doubt, sort"
Questions to ask,
- How many keys will you be sorting? Use simple routines.
- Will there be duplicate keys in the data? Do you need sort to be stable?
- What do you know about your data?
  - Has the data already been partially sorted? Insertion sort would perform much better.
  - Do you know the distribution of the keys? A bucket or distribution sort would make sense.
  - Are your keys very long or hard to compare? It might make sense to use prefixes, or maybe use radix sort.
  - Is the range of possible keys very small? Use something like counting sort.
- Do I have to worry about disk accesses? Use external sorting, or use B-tree.
- How much time do you have to write and debug your routine?
If you want to implement your own quicksort,
- Use randomization,
- Median of three
- Leave small subarrays for insertion sort
- Do the smaller partition first. compiler optimizations using tail recursion might help.
Implementations : GNU sort, C++ STL sort and stable_sort, and many more.
Related : Dictionaries, searching, topological sorting

Searching

Input description: A set of \(n\) keys \(S\), and a query key \(q\).
Problem description: Where is \(q\) in \(S\)?

"Searching" is a word that means different things to different people.
We consider the task of searching for a key in a list, array or a tree.
Sequential search vs. Binary search.
Questions to ask,
- How much time can you spend programming? Binary search can be tricky to be made bug free.
- Are certain items accessed more often than other ones? Exploit there relative frequencies.
- Might access frequencies change over time? Use self-organizing structures like splay trees.
- Is the key close by? Use sequential search.
- Is my data structure sitting on external memory? Use B-trees.
- Can I guess where the key should be? Interpolation search
Implementations : C++ STL find and binary_search
Related : Dictionaries, sorting

Median and Selection

Input description: A set of \(n\) numbers or keys, and an integer \(k\).
Problem description: Find the key smaller than exactly \(k\) of the \(n\) keys.

Median finding is an essential problem in statistics, more robust than mean.
Special case of selection problem which arises in several applications,
- Filtering outlying elements : Remove 10% largest and smallest values.
- Identifying the most promising candidates : Select top 25%.
- Deciles and related divisions.
- Order statistics
Issues in median finding(arbitrary selection),
- How fast does it have to be? \(O(n\log n)\) or \(O(n)\) expected-time.
- What if you only get to see each element once? Define approximate deciles, or do random sampling, or mix both strategies.
- How fast can you find the mode?
Implementations : C++ STL nth_element
Related : Priority queues, sorting

Generating Permutations

Input description: An integer \(n\).
Problem description: Generate (1) all, or (2) a random, or (3) the next permutation of length \(n\).

Many algorithmic problems require seek the best way to order a set of objects, like travelling salesman, bandwidth, graph isomorphism.
A fundamental notion of order is required.
Two different paradigms : ranking/unranking and incremental change methods.
Define functions Rank/Unrank such that, \(p = Unrank(Rank(p), n)\).
- Sequencing permutations
- Generating random permutations
- Keep track of a set of permutations.
Incremental change algorithms can be tricky, but concise.
We can also save time by avoiding identical permutations if there are duplicate elements.
Implementations : C++ STL next_permutation and prev_permutation.
Related : Random-number generation, generating subsets, generating partitions.

Generating Subsets

Input description: An integer \(n\).
Problem description: Generate (1) all, or (2) a random, or (3) the next subset of the integers \(\{1, \cdots , n\}\).

Here, the order among the elements does not matter.
Many algorithmic problems seek the best subset : vertex cover, knapsack, set packing.
There are \(2^n\) different subsets of an \(n\)-element set. This is much less than the \(n!\) permutations.
It is a good idea to keep elements in subset sorted in a canonical order so as to speed up identity testing.
Options,
- Lexicographic order : surprisingly difficult to generate.
- Gray Code : Adjacent subsets differ by insertion/deletion of only one element.
- Binary counting : Use a bit-vector.
Useful in,
- K-subsets All subsets of length K.
- Strings
Related : Generating permutations, generating partitions

Generating Partitions

Input description: An integer \(n\).
Problem description: Generate (1) all, or (2) a random, or (3) the next integer or set partitions of length \(n\).

Integer partitions are multisets of nonzero integers that add up exactly to \(n\). Use in for eg. nuclear fision..
Set partitions divide the elements \(1, \ldots, n\) into nonempty subsets. Use in for eg. vertex/edge coloring and connected components.
Related : Generating permutations, generating subsets.

Generating Graphs

Input description: Parameters describing the desired graph, including the number of vertices \(n\), and the number of edges \(m\) or edge probability \(p\).
Problem description: Generate (1) all, or (2) a random, or (3) the next graph satisfying the parameters.

Useful for generating test data for programs.
A different application arises in network design. Adding redundancy and tolerance to vertex failures.
Questions to ask,
- Do I want labeled or unlabeled graphs?
- Do I want directed or undirected graphs?
How do you want to model randomness?
- Random edge generation
- Random edge selection
- Preferential attachment
Alternatively, we can generate organic graphs that can reflect relationships among real-world objects.
Other interesting graphs,
- Trees
- Fixed degree sequence graphs
Implementations : Stanford GraphBase, Combinatorica, and more resources.
Related : Generating permutations, graph isomorphism

Calendrical Calculations

Input description: A particular calendar date \(d\), specified by month, day, and year.
Problem description: Which day of the week did \(d\) fall on according to the given calendar system?

Important in business applications. More important for international applications dealing with different timezones and possibly different calender systems.
Complications arise with irregularity in years duration(leap years).
So it becomes pointless to implement them by hand.
Implementations : C++ Boost datetime.
Related : Arbitrary precision arithmetic, generating permutations

Job Scheduling

Input description: A directed acyclic graph \(G = (V,E)\), where vertices represent jobs and edge \((u,v)\) implies that task \(u\) must be completed before task \(v\).
Problem description: What schedule of tasks completes the job using the minimum amount of time or processors?

Devising a proper schedule to satisfy a set of constraints is fundamental to many applications.
Mapping tasks to processors is a critical aspect of any parallel-processing system.
Applications
- Topological sorting : precedence constraints.
- Bipartite matching : skill matching
- Vertex and edge coloring : avoid interference.
- Travelling salesman
- Eulerian Cycle
Interesting problems,
- Critical path
- Minimum completion time
- What is the tradeoff between the number of workers and completion time?
Can also be modeled using Integer-linear programming.
More variations,
- Assign jobs to identical machines to minimize the the total elapsed time.
- Tasks are provided with allowable start and required finish times.
Implementations : JOBSHOP (C), Tablix, LEKIN and more.
Related : Topological sorting, matching, vertex coloring, edge coloring, bin packing

Satisfiability

Input description: A set of clauses in conjunctive normal form.
Problem description: Is there a truth assignment to the Boolean variables such that every clause is simultaneously satisfied?

A primary application of testing hardware/software design on all the inputs.
The original NP-complete problem.
Issues,
- Is your formula the AND of ORs (CNF) or the OR of ANDs (DNF)? DNF can be solved easily, while CNF is NP-complete. We can use De Morgan's laws to convert CNF into DNF. However the translation itself could take exponential time.
- How big are your clauses? 3-SAT and above are NP-complete.
- Does it suffice to satisfy most of the clauses? This can make the problem easier.
Related : Constrained optimization, travelling salesman problem

Part 4 of the The Algorithms Design Manual series

Previous articles

Next articles