Sunday, May 26, 2013

Reading FASTA files with Scala

FASTA is a simple file format to store nucleotide or peptide sequences. A sequence in FASTA format starts with a greater-than sign followed by the sequence name (and possibly other information) and then several lines of nucleoide or amino acid letters. A multi-FASTA file contains multiple FASTA sequences. Here an example taken from wikipedia:

>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

It is a common task to read FASTA sequences from a file and I was aiming for an elegant/short implementation in Scala. Here is a recursive version I like:

  
  def readFasta(filename: String) = {
     import scala.io.Source
     def parse(lines: Iterator[String]): List[(String,String)] = {
       if(lines.isEmpty) return List()
       val name = lines.next.drop(1)
       val (seq,rest) = lines.span(_(0)!='>')
       (name, seq.mkString)::parse(rest)
     }
     parse(Source.fromFile(filename).getLines())
  }

It defines an internal function parse that takes an iterator over the lines in the file (generated by Source.fromFile(filename).getLines() and returns a list of tuples, where each tuple contains the sequence name and the string of nucleotide or amino acid letters.

The recursion stops when the iterator is empty (lines.isEmpty) otherwise a line is read, which must be the name and the first letter is dropped to get rid of the greater-than sign (lines.next.drop(1)). After the line with the name span(_(0)!='>') is used to take the following lines until the next sequence name is found and store them in seq. _(0)!='>' tests whether the first letter of the line is the greater sign. The remaining lines are stored in rest.

Actually, "storing" is the wrong word, since Scala beautifully returns two new iterators when span is called on an iterator. It is worth noting, that nowhere a list of the read lines is stored. It is turtles/iterators all the way through except for the last line ((name, seq.mkString)::parse(rest)), where the return list is created. (name, seq.mkString) builds a tuple with the sequence name and the the sequence letters - the latter is constructed by gluing all sequence lines together with mkString. This tuple is the head of the return list and the tail is build recursively by calling parse for the remaining lines.

Let's add three minor improvements. It will make the code less elegant but more robust. Firstly, lines could have leading or trailing white spaces. They shouldn't but hey, you know how it is. To play it save we add a to the part where the lines are read from the file (last line in code). Secondly, there could be empty lines that we filter out via filterNot(_.isEmpty()).

Thirdly, the line with the sequence name frequently contains additional annotation that is typically separated by the pipe "|" symbol. But there is really no solid standard and it could be in any format - and it will. We therefore define a regular expression to extract the name from the header line and tweak it if necessary. The expression """>(.+?)(\\|.+)?""".r has two groups. The first captures the name the second captures any annotation starting with the pipe "|" symbol, which needs to be escaped (because it functions as OR in Regex language). The final code looks like this:

  
  def readFasta(filename:String) = { 
     import scala.io.Source
     val header = """>(.+)(\\|.+)?""".r
     def parse(lines: Iterator[String]): List[(String,String)] = {
       if(lines.isEmpty) return List()
       val header(name,annotation) = lines.next
       val (seq,rest) = lines.span(_(0)!='>')
       (name, seq.mkString)::parse(rest)
     }
     parse(Source.fromFile(filename).getLines().map(_.trim).filterNot(_.isEmpty()))
  }    

While elegant the above solution has the problem of being recursive (specifically not even tail recursive), which will cause a stack overflow for files with large numbers of sequences. The following iterative implementation is ugly but I haven't found any really nice alternative. Take it as a temporary implementation ;)

  
  def readFasta(filename:String) = {
    val header = """>(.+)(\\|.+)?""".r
    var lines = Source.fromFile(filename).getLines().filterNot(_.isEmpty())
    var sequences = List[(String,String)]()
    while(lines.hasNext) {
      val header(name,annotation) = lines.next
      val (seq,rest) = lines.span(_(0)!='>')
      sequences = (name, seq.mkString)::sequences
      lines = rest
    }
    sequences.reverse
  }  

Also interesting is this solution that uses parser combinators to read FASTA sequences.

Alexandre Masselot has posted an interesting problem (see below). What if you want an iterator over the sequences instead of a list? We can easily modify the code above to achieve this by creating and returning an iterator when calling readFasta. An iterator needs a hasNext method, which essentially replaces the condition of the while loop used above and a next method, which contains the body of the while loop:

  
  def readFasta(filename: String) = 
    new Iterator[(String, Iterator[Char])] {
    val header = """>(.+)(\|.+)?""".r
    var lines = Source.fromFile(filename).getLines().filterNot(_.isEmpty())
    def hasNext = lines.hasNext
    def next = {
      val header(name, annotation) = lines.next
      val (seq, rest) = lines.span(_(0) != '>')
      lines = rest
      (name, seq.flatMap(_.iterator))
    }
  }

If you look at the code closely, you will notice on more change. I have replaced (name, seq.mkString) by (name, seq.flatMap(_.iterator)). This is where Scala shines. Instead of concatenating the lines to a sequence via mkString we can simply call flatMap(_.iterator) to convert lines to iterators over line characters and then concatenate the iterators via flatMap.

In practice, you will find an iterator over sequences frequently useful, but an iterator over sequence characters less so. Many computational evaluations of sequence data will require random access. If memory consumption is an issue, compressing the sequence is probably a better way. Also in many cases a set of sequence data will be related and storing only the differences to a reference sequence (e.g. the human reference genome) can be a very efficient representation of large sequence data.

Friday, May 24, 2013

Test if string has all unique characters

Here comes another typical question from coding interviews: "Test whether all characters within a string are unique". An inefficient solution with O(n^2) complexity is to loop over all characters of the string and test in a second, inner loop whether an equal character can be found:

  
def all_unique(str):
    for ch1 in str:
        for ch2 in str:
            if ch1==ch2:
                 return False
    return True 

This is obviously not what the interviewer is after. A simple and efficient solution is to convert the string to a set of characters and to compare the number of elements in the set with the length of the string:

 
def all_unique(str): 
    return len(set(str)) == len(str)   

Too easy and therefore the interviewer doesn't like it either ;-). What is wanted is a mapping of characters to a boolean/bit array that indicates which character has been found in the string so far:

  
def all_unique(str):
    test = [False]*256
    for ch in str:
        idx = ord(ch)
        if test[idx]: return False
        test[idx] = True
    return True 

This approach has linear time complexity but requires an array with the length of the size of the character set.

Implement a queue using two stacks

Implementing a queue using two stacks is a typical question for coding interviews. It is a bit artificial, at least I haven't come across any real need, but hey, maybe you need to implement a queue in FORTH. Apart from that it is a nice puzzle ;)

The key insight is that a stack reverses order (while a queue doesn't). A sequence of elements pushed on a stack comes back in reversed order when popped. Consequently, two stacks chained together will return elements in the same order, since reversed order reversed again is original order.

How to do it? We use an in-stack that we fill when an element is enqueued and the dequeue operation takes elements from an out-stack. If the out-stack is empty we pop all elements from the in-stack and push them onto the out-stack. Here is the implementation in Python:

  
class Queue(object):
    def __init__(self):
        self.instack = []
        self.outstack = []
    
    def enqueue(self,element):
        self.instack.append(element)
   
    def dequeue(self):
        if not self.outstack:
            while self.instack:
                self.outstack.append(self.instack.pop())
        return self.outstack.pop()    

Let's try it:

  
q = Queue()
for i in xrange(5):
    q.enqueue(i)
for i in xrange(5):
    print q.dequeue()
0 1 2 3 4

The complexity is amortized O(1) for enqueue and dequeue.

Tree iterator in Python

Sometimes you have a tree data structure and want to iterate over the nodes of the tree in breadth-first or depth-first order. This also seems to be a question that is occasionally posed in coding interviews. Since it is fun and especially easy to implement in Python let's do it. First things first. We design a Node class that represents the nodes of the tree.

  
class Node(object):

    def __init__(self, value, children=[]):
        self.value = value
        self.children = children

The constructor __init__ takes the value of the node and a list of child nodes that is empty by default. We can now construct a tree such as the following

  
root = Node(1, [Node(2, [Node(4), Node(5)]), Node(3)])

To be able to display the tree we add two method to the Node class. __str__ returns a string representation of a node and showTree performs a recursive, depth-first traversal of the tree and prints the node values indented by their hierarchical level.

  
class Node(object):

    def __init__(self, value, children=[]):
        self.value = value
        self.children = children

    def __str__(self):
        return str(self.value)

    def showTree(self, level=0):
        print "%s%s" % ("."*level, self)
        for node in self.children:
            node.showTree(level+2)

Running root.showTree() for the tree defined above produces the following output. 1 is the root node with two children 2 and 3, where node 2 has 4 and 5 as children.

1
..2
....4
....5
..3

Now, the iterator. There is two ways (actually there are more) to iterate over the nodes. Level by level (= breadth first) or the same way showTree works (depth first). Both are equally easy. The only difference is one implementation uses a stack while the other uses a queue. Let's start with breadth first. The following code implements an iterator class. As such it has to provide two methods: __iter__, which returns the class object itself and next, which returns the next node every time it is called.

  
class BreadthFirst(object):

    def __init__(self, node):
        self.queue = [node]

    def __iter__(self):
        return self

    def next(self):
        if not self.queue: raise StopIteration
        node = self.queue.pop(0)
        self.queue += node.children
        return node

The constructor takes a node and puts it into a list that we will treat as a queue. Therefore we name it "queue" and not "nodes" or even worse "list". What happens in the next method? When the queue is empty (not self.queue) a StopIteration exception is raised that stops the iterator. If the queue is not empty a node is dequeued (terrible word) via queue.pop(0) from the beginning of the list and the children of the node are added to the end of the list (enqueued). The iterator is then called as follows and returns the tree nodes in breadth-first order.

  
for node in BreadthFirst(root):
    print node,

1 3 2 5 4

Using lists as queues is not very efficient and there is a Queue class in Python, which is likely to be faster but the beauty lies in the fact that the implementation of the breadth-first iterator is mirror image of the depth-first iterator. As mentioned before, the only difference is that the list is now used as a stack:

  
class DepthFirst(object):

    def __init__(self, node):
        self.stack = [node]

    def __iter__(self):
        return self

    def next(self):
        if not self.stack: raise StopIteration
        node = self.stack.pop()
        self.stack += node.children
        return node

C'mon, admit it! This IS beautiful. Finally, using the iterator generates the following output

  
for node in DepthFirst(root):
    print node,
1 2 3 4 5

That's it.

Wednesday, May 22, 2013

Fun with Hopfield and Numpy

Hopfield networks are fun to play with and are very easily implemented in Python using the Numpy library. Simple as they are, they are the basis of modern machine learning techniques such as Deep Learning and programming models for quantum computers such as Adiabatic quantum computation.

We are going to use a Hopfield network for optical character recognition. This is obviously a toy example and there are much more better machine learning techniques for this kind of problem (e.g. deep learning) but it is a nice demonstration.

Among other things Hopfield networks work as autoassociative memories, allowing to store and to recall patterns, e.g. images of letters. We want to store two letters and then try to recognize noisy versions of those letters. In the following we first create images of the letters "A" and "Z" from a string representation:

  
A = """
.XXX.
X...X
XXXXX
X...X
X...X
"""

Z = """
XXXXX
...X.
..X..
.X...
XXXXX
"""

Hopfield networks operate on binary vectors and we therefore define a function to_pattern() that takes a letter image and transforms it into a bipolar vector by gluing all image rows together, and converting "X" to +1 and "." to -1.

  
def to_pattern(letter):
    from numpy import array
    return array([+1 if c=='X' else -1 for c in letter.replace('\n','')])

For instance, calling to_pattern(A) returns then

[-1  1  1  1 -1  1 -1 -1 -1  1  1  1  1  1  1  1 -1 -1 -1  1  1 -1 -1 -1  1]

To visualize a pattern as an image we write a function display(pattern) that reshapes the pattern vector to a matrix and displays it as an image using the pylab function imshow().

def display(pattern):
    from pylab import imshow, cm, show
    imshow(pattern.reshape((5,5)),cmap=cm.binary, interpolation='nearest')
    show()

Calling display(to_pattern(A)); display(to_pattern(Z)) produces the following pictures:

Having all the basic elements in place let's get serious. First thing to do is to create the Hopfield network from the given patterns. Training a Hopfield network is extremely simple and requires only computation of the outer products of all pattern vectors with each others and summation of the resulting matrices.

$$W = ∑↙{k}↖n \bo p_{k} \bo p^T_{k} $$

The resulting matrix W specifies the weights of the edges between the nodes of the network and the nodes represent the features of the patterns (= pixels in an image). There are variants but typically the weight matrix is normalized (division by the number of patterns) and diagonal elements are set to zero (subtraction of the identity matrix I), which leads to the following expression:

$$W = 1/n ∑↙{k}↖n \bo p_{k} \bo p^T_{k} - I$$

Let us organize all our patterns in a matrix (rows are patterns)

 
patterns = array([to_pattern(A), to_pattern(Z)])
and the implementation of the training formula is straight forward:

  
def train(patterns):
    from numpy import zeros, outer, diag_indices 
    r,c = patterns.shape
    W = zeros((c,c))
    for p in patterns:
        W = W + outer(p,p)
    W[diag_indices(c)] = 0
    return W/r

To recall a stored pattern from a Hopfield network the pattern itself (or a partial or nosy version of it) the dot product of the pattern vector and the weight matrix is computed. This produces a new pattern vector that is binarized and then fed back into the system:

$$\bo p^{(t+1)} = sgn( \bo W \bo p^{(t)})$$

The operation is repeated until the pattern vector becomes stable or after a specified number of iteration has been performed. We keep the implementation simple and stop after 5 steps, which is typically sufficient.

  
def recall(W, patterns, steps=5):
    from numpy import vectorize, dot
    sgn = vectorize(lambda x: -1 if x<0 else +1)
    for _ in xrange(steps):        
        patterns = sgn(dot(patterns,W))
    return patterns

Note that in contrast to the recall formula the code performs the recall on all pattern within the pattern matrix at once, while the formula describes the recall for an individual pattern vector. Also the pattern matrix contains the patterns a rows and therefore the dot product W*p in the formula becomes dot(patterns,W) in the code (Since W is symmetric, transposing W is not required).

To test the network I created noisy version of the training patterns by randomly flipping 10 bits. The image sequence below shows the resulting input patterns and the output patterns after the second and third iteration. As you see the original patterns have been recalled perfectly after three iterations (from left to right).

Patterns are stored as low energy states (attractors) within the system and the recall procedure describes the trajectory of an input pattern to its attractor. Each step reduces the energy and the energy for a pattern is defined as follows:

$$E(\bo p) = \bo p \bo W \bo p^T$$

Here is the corresponding code to compute the energy for the pattern in the pattern matrix.

  
def hopfield_energy(W, patterns):
    from numpy import array, dot
    return array([-0.5*dot(dot(p.T,W),p) for p in patterns])

A few final comments. The Hopfield network does not only store the original patterns but also their inverse versions and possibly linear combinations of patterns. Furthermore the capacity of the network is limited. For random patterns the network can store roughly 0.14*N pattern without errors (N = Number of network nodes). There are many variations of the Hopfield network with different training and recall procedures that give better performance but the beauty of the original Hopfield network is in its simple implementation but complex behavior.

Last but not least cudos to the developers of jqmath, the JavaScript module used here to display the mathematical expressions.

more reading

Hopfield network
Hopfield network
Weave a neural net with Python
Associative recall of memory without errors
Minimum Probability Flow Learning
Finite memory loading in hairy neurons
Error tolerant associative memory

Friday, May 17, 2013

Coding interview questions

In preparation for a coding interview I collected numerous coding questions from the internet. Since they might be useful for other job applicants as well, I post them here.

The question are not sorted in any way, of varying difficulty and there are probably some duplicates. I have fixed only a few typographical errors but the questions are largely as I found them. Don't hold me responsible for bad spelling or grammar ;-)

Two books that I also found quite useful for my preparation were "Cracking the Coding Interview" by Gayle Laakmann and "Programming Interviews Exposed" by John Mongan. Well, here comes the list of questions:



Find a string in a very long string - a needle in a haystack.

Given a binary search tree and an integer k, describe the most efficient way to locate two nodes of the tree that sum to k.

Please describe a data structure that accomplishes the functions INSERT and FIND-MEDIAN in the best possible asymptotic time.

If you have one billion numbers and one hundred computers, what is the best way to locate the median of the numbers?

Say you are implementing exponentiation for a certain processor. How would you minimize the number of multiplications required for this operation?

You are given a grid, with points on the intersections (think a map of streets, people are standing on random corners). Write code to calculate the point on the grid that is the shortest distance from every point on the grid.

Write a function that finds the median of a set of three numbers, also find the Big O. Can it be done with only 2 comparisons, or do you need 3? .

Design and describe a system/application that will most efficiently produce a report of the top 1 million Google search requests.

Design and write a Sudoku solver.

How to computer the longest increasing sequence in an array with only one extra variable.

Implement core functionality of a skip list.

Given an array of different integers, print out all the inversion pairs. (i.e, a[i] > a[j] where i < j).

Given a 2D boolean array. Return the length of the biggest square where all elements inside it is true.

Get the two highest number in a binary search tree.

Find the maximum sum possible in an array of positive integers by selecting the elements in such a way that no two elements are next to each other.

Same as above but another one is taken out again. Find the two missing numbers.

Given a set of strings, a number 'n', and a function that takes a string and gives back a score, find the n largest scored strings in the set.

Given two numbers m and n, write a method to return the first number r that is divisible by both (e.g., the least common multiple.

You are given a text file too large to fit in memory and 3 strings A, B, C. For each string, you have a sorted array listing the positions of the string in the file (e.g., inverted indices). Find the smallest window containing the 3 strings. Order of appearance does not matter.

Given 2 numbers x and y, check if x is an integer power of y. For instance, x = 8, y = 2 would return true and x = 10 and y = 2 would return false since the first one is an integer power but the second isn't.

Suppose you have an arbitrarily connected graph with n nodes. Come up with an algorithm to identify each set of connected nodes (i.e. identify all the islands in the graph). What's the complexity? Can you find a solution in O(n log n)?

Implement a random number generator such that the random number generated is always in a particular range. Perform through time complexity analysis of it. How would you improve the solution.

Get the kth largest number from two sorted arrays.

Find median given two sorted arrays.

Write a function to convert a collection of strings into a single string and a function to convert it back.

Write a method which will remove any given character from a String.

Write a program to find all prime number up to a given numbers.

Write program for word-wrap which should work on any screen size.

Write code for Generate Random No in a range from min to max.

Describe the design and implementation of a Tic Tac Toe game.

Given a graph, find if it represents a tree.

You have a sorted array of real numbers drawn from the uniform distribution between 0 and 1. How can you quickly find a number in it?

If you have a ransom letter and magazines, can you construct the ransom letter from the words in the magazine.

Find the intersection of two integer lists.

Write a program that reads a file, randomizes it's lines and writes the output to another file.

Find the balance point in an array. (The index where the sum of the elements to the left it is the same as the sum of the elements to the right of it.) .

Implement iterator for a list of lists which may contain empty lists.

Given a list of sorted words, output the order of the letters used to sort them.

Given a base 10 number, print the hexidecimal (base 16) representation of that number.

Sort integer array in linear time.

Given a bunch of N random points in space, how would you draw a line such that N/2 of them were to the left of the line and N/2 to the right of it. Complexity.

In a sorted array, find the number closest to a given number .

Describe the algorithm for a depth-first graph traversal.

Given List of Strings. Return a list of lists of Strings where each to anagram Strings in the input list exist in the same list in the output list.

How would you calculate the number of 1's in a list of 10,000 integers.

Given a file consisting of names, what data structure would you use to valid if a name is in the list. What if now the we say a name is valid if the if it differs no more than one character against a name in the file?

Write a method that generates a random sequence of numbers of specific percentages.

If you have 10 bags of marbles with 10 marbles each and one bag has marbles that weigh differently than the others, how would you figure it out from one weighing.

Given a binary tree, print out the whole tree by levels.

Implement a function that takes a string with '?' marks and returns all possible values when '?' is replaced with '0' or '1.

Implement a hashmap.

Write function to compute n-th Fibonacci number? Both iterative and recursive?

How do you reverse a singly linked list?

How do you find depth of binary tree?

Write code to print Inorder traversal of a tree?

Print out all leaf node of a binary tree?

Write a program to sort numbers using quick sort?

Write a program to implement binary search algorithm.

In an array 1-100 exactly one number is duplicate how do you find it?

Given two arrays, 1,2,3,4,5 and 2,3,1,0,5 find which number is not present in the second array.

How will you determine two graphs are the same.

What is the strategy pattern in Java.

Find the number of connected components in a Graph.

Write code to find the second shortest path between two given nodes in an undirected graph.

Write a function to return if a number is a palindrome (eg, 113848311) .

Reverse the word order in a string.

Reverse bits of a digit.

Write a method that finds depth of a (non-balanced) binary tree.

Convert decimal number 99 to base 7.

Define an algorithm that converts a string to an integer without using a built in method like int() .

Write a program to find depth of binary search tree without using recursion.

Write an iterator over multiple collections.

Most phones now have full keyboards. Before there there three letters mapped to a number button. Describe how you would go about implementing spelling and word suggestions as people type.

Given a dictionary of English words, find sets of anagrams. For instance, “pots”, “stop”, and “tops” are all anagrams of one another because each can be found by permuting the letters of the others.

Although Quicksort uses only O(log n) stack space on the average, it can use linear space in the worst case. Explain why, then modify the program to use only logarithmic space in the worst case.

Print all permutation of String both iterative and Recursive way.

Write code to check a String is palindrome or not?

Write code to check whether a number is power of two or not?

Design an algorithm to find the frequency of occurrence of a word in an article?

Large data set, in memory sorting with limited RAM.

Implement a special type of integer iterator. It will be a wrapper over a regular integer iterator, except that this iterator will only iterate over negative numbers. Show how you'd write the next() and hasNext() functions to work with the next negative integer instead of just the next integer.

Sorting in general with big-O complexity, followed by partition sort (quick sort), followed by finding the k-th largest element without full sorting.

Implement peek() and pop() from java iterator(). for example [1,2,3,4,5], peek() = 1, pop() = 1, peek() = 2, peek() = 2, pop() = 2 .

Mutable, non mutable classes in Java.

Declaring constants in Java.

Count the number of set bits for an 32 bits integer. How to improve the process for the second time use if the memory is unlimited.

What is the complexity of Quicksort? (mean and worst case).

Find the minimum window in a string which contains a list of characters .

Implement a set class with the following method: get(), put(), getRandom() .

Given a graph node, write a code to return a copy of the sub-graph starting with that node.

Create a Java method that clones a DAG (Directed Acyclic Graph).

Describe preorder, inorder, postorder.

What's the difference between a hashtable and a hashmap (in Java).

Explain the difference between Array Lists, Linked Lists, Vectors, Hash Maps, (from Java's JDK) etc. and when one choice is better of another.

Delete an element from a singly linked list in Java.

Out of 10 coins, one weighs less then the others. You have a scale. How can you determine which one weighs less in 3 weighs? Now how would you do it if you didn't know if the odd coin weighs less or more?

Addition of 2 big numbers (represented by 2 int arrays, assuming each integer can only be 0-9) .

Compute all the intersections of two sets of segments in a line.

Given a set of cities, each with a given population, select randomly a city with a probability that is proportional to the population.

You have a collection of html documents grouped by a source web site . You need to count frequencies of bi-grams across all documents and present it in a sorted order (may be return top N frequent bi-grams). How would you approach it? How to filter out non-significant bi-grams? How to distribute and merge? What data structures to use to keep counters?

Write code to shift a string with rotation [a, b, c << 2 = 2 c, a, b].

Convert a one dimensional array of Strings containing [a,b,c, d,e ...z] to 2 dimensional array of a2[aa,bb,cc] .

Write a string splitting function that does not copy the string.

Partition an array in such way zeros to be moved on the left side of the array, other numbers on the right side of the array. Extra storage not allowed, only in-place.

Write an algorithm to test if n is power of 2.

How to replace a string in a templatized string where each placeholder starts with a specific delimiter, you are given a map that maps the placeholders to associated values.

Ref to the game, Boggle, given a 3x3 grid of letters represented as a char array, generate all possible word combinations. The legal character sequences are horizontally or vertically adjacent chars. So in [A, B, C; D, E, F; G, H, I], ABCFI is legal but DFH is illegal.

Write the code to determine if a 2D point is inside a 2D closed polygon.

Implement a blocking queue in Java.

Implement heap-based priority queues to run as quickly as possible; at what values of n are they faster than sequential structures.

Algorithm for converting uniformly distributed random numbers into a logarithmic distribution.

Given a picture of a skip list data structure, implement the algorithm to maintain skip lists (add, search, delete).

Base class has one method takes argument as Object and inherited class overrides it to pass String as parameter.If called with argument of Object whcih method will be called.

Design and implement a garbage collection algorithm in any programming language. also needed to give out run time analysis.

you have a sequence where each number is a multiple of 2 or 5 (so: 2^i * 5^j). he gave the beginning of the sequence as 1,2,3,4,5,8,10,16... and asked me to find an algorithm to calculate the next number in the sequence.

Write code to check whether every element of an array has been visited (you are given a starting position and each position stores a pointer to a subsequent position).

Sample uniformly on a circle centered at zero with radius 1.

Write algorithm to compute a Log to the base 2 of a number (integral results no need for floating point). Solution should not assume a particular size of integer .

Implement a base 3 adder which takes two strings as input and returns a string .

Quad tree bitmap: describe data structure and an algorithm.

For a n-node tree, you are given a left leaf node pointer and right leaf node pointer. Write a function to find the number of nodes between the leaves pointed by the left and the right pointer.

You have a simple search consisting of search terms that are OR'ed together, ie: Cats AND Dogs. You have a set of documents possibly containing those terms and you have the positions of those terms in each document. How would you search for Cats AND Dogs.

You're given a string, and you want to split it into as few strings as possible such that each string is a palindrome.

What are the issues with floating point calculation? For example, 1.0f + e-30, is there any issue with the statement.

Given a sorted array A[1..n] with n integers, and an integer t, find all pairs (x,y) of elements in A such that x+y is smaller than t.

You are given a huge stream of geo coordinates. Unordered. Return list of objects in a specified radius from some point X.

Write classes representing expression tree for a simple calculator. You care only about constants and basic binary operators. Write function evaluating expressions.

Suppose you have a set objects, identified by a unique name. How do you store them so that you can retrieve them easily?

Given a tree and leaf node, write the code to make the leaf node as root of tree.

Find the longest word in a dictionary, such that the word can be built one character at a time, and all substrings are words in the dictionary. A new character can be added anywhere.

You have a n number of cities. Lets say city 1 has some information that needs to be sent to all other n-1 cities using minimal cost. Cost between each pair of cities is given. any number of cities can transmit the information once they receive the information but the overall total cost should be minimum.

Write a function to modify a singly linked list, appending copies of the first N-1 elements to the end in reverse order. For example, ['A', 'B', 'C', 'D'] becomes ['A', 'B', 'C', 'D', 'C', 'B', 'A'].

Given an array of structs: write a function that finds equal values in all the structs in the array.

How to find the max number of a shift buffer queue. For instance, there is an array like 5, 6,8,11,1,2, how to find the max number, which is 11 in this example.

Given inputs from Google Search, you have K chunks. Each chunk is individually alphabetically ordered (apple, banana, cat) ... (*apple, *banan, *cat). You want to merge all chunks into a single list. How would you do it? What limitations are there to your approach?

What's the difference between finally, final and finalize in Java.

Can an abstract class be instantiated in Java? And does it have a constructor.

If we had a list of n nodes, what are the maximum number of edges there can be for a directed acyclic graph.

Write an algorithm for integer multiplication.

Design a class to serialize / deserialze a graph.

Describe a method to sift through dozens of emails and rank them based on 'importance'.

Given a list of integers of at least length 7, print the average of each consecutive 7 number long subsequence .

Write a memory manager oriented towards millions of very small allocations (less than 10 bytes).

Design an in memory word processor. Implement cut, copy, paste, formatting, recoverability.

What's the difference between a non-exclusive lock and an exclusive lock.

Generalized tower of Hanoi problem (n pegs).

5 stacks of coins that looks identical. each stack has 5 coins each. 4 of those stacks weigh the same, 1 stack has coins that weigh exactly 1 pound less. using a digital scale once, find out the position of the stack that is less weight.

Given standard 3x4 telephone touch pad, give all number combinations of letters you can make using only the chess "rook" move (ie, L shape: 3 down, 1 over). I think it was all 7 digit combinations. Letters are standard phone letters (2 button is "ABC", etc). * and # not valid.

Implement a program to play the battleship game.

Serialize and deserialize a collection of strings into a single one.

Design an iterator for a collection of collections in java. The iterator should hide the nesting, allowing you to iterate all of the elements belonging to all of the collections as if you were working with a single collection.

Millions of cache entries, each with a unique key, stored in several servers. how to sort all entries whose key falls within a specified range using Map Reduce? What if one server has a very small range of keys (say 1 to 100) and another server has a very large range of keys (say 100 to 1000000)? .

Rotate a one-dimensional vector of n elements left by i positions. For instance, with n=8 and i=3, the vector abcdefg is rotated to defghabc. Simple code uses an n-element intermediate vector to do the job in n steps. Can you rotate the vector in time proportional to n using only a few dozen extra bytes of storage?

Write functions for the following date problems: given two dates, compute the number of days between them; given a date, return it’s day of the week; given a month and year, produce a calendar of the month as an array of characters.

Build the fastest possible complete function to generate a sorted array of random integers without duplicates. (You need feel constrained to use any prescribed interface for set representation).

How do you sort Java object using Comparator?

Was asked how I would implement division without using division or mod operators. The answer should return in the form quotient r remainder.

Write a program to implement blocking queue in Java?

Write a program for producer-consumer problem?

Estimate the total storage size of GMAIL.

How to add a counter to www.google.com to track the billionth user.

You have a 64bit integer counter set to 0.

How long it will take to overflow the counter given that you are incrementing it at 4Ghz speed.

Quickly estimate 2^64 without using a pen/papar.

Given an integer, its digits are stored in the array, the very last digit is the very last element of the array. Write a function that adds one to this integer.

Given a large network of computers, each keeping log files of visited urls, find the top ten of the most visited urls. The result list must be exact, and the maps are too large to transmit over the network (especially sending all of them to a central server or using MapReduce directly, is not allowed.

A specific implementation of B-Tree when looked from one side.

You are given a sorted list of disjoint intervals and an interval, e.g. [(1, 5), (10, 15), (20, 25)] and (12, 27). Your task is to merge them into a sorted list of disjoint intervals: [(1, 5), (10, 27)].

Given a picture of pixels represented with RGB values, convert it to 256-colors (each pixel represented with an index from a 256-sized table holding the actual colors) .

Implement a counter with a method to retrieve the number of increments in the last 60 seconds.

How many ways can you make change given the following denominations? ie, if numCentsToMake is 6 and denominations is [25, 10, 5, 1], then it should return 2: either a nickel and a penny or 6 pennies. int numWaysToMakeChange(int numCentsToMake, int[] denominations) .

Implement a stack that pops out the most frequently added item.

Discuss and implementation of flattening a binary search tree into string representations.

How would you go about testing a distributed system such as Gmail, before releasing it to the public. How would you simulate realistic server load ?

We have m slots for ads and n ads, each ads will have different revenue on different slot, design an algorithm to find out the best fit (find m ads in n ads and order them so that they can make max money,.

Given a word that consists of two words such as "aman" give all the combination of valid two words. You are given a dictionary to test against. Example: aman--> a man, am an. watermelon--> water melo.

How to design a search engine? If each document contains a set of keywords, and is associated with a numeric attribute, how to build indices?

how many ways you can write n as N summation of other numbers 5 = 1+ 2 +2 = 2+ 2 + 1 = 1 + 3 +1 = 1 + 4 etc.

If a user types in a n digit number on the telephone, how do you write a function to deduce if the number constitutes a valid word. For example, if the user enters 123, then can a valid word be made out of (a/b/c) + (d/e/f) + (g/h/i) .

How to select a random sample of size K from a stream of numbers.

Implement a fast exponentiation function.

Write a function to calculate the angle between hour and minute hand on a clock.

Write a program to compute if one string is a rotation of another.

Given a set of integers from 1 to 100, find out the number of combination that sum up to 100. Each integers can only be used once.

You need to write a function to climb n steps you can climb either 1 step at a time or 2 steps a time, write a function to return number of ways to climb a ladder with n step.

How do you find middle element of a linked list in single pass. How do you find 3rd element from last in single pass.

How would you determine if someone has won a game of tic-tac-toe on a board of any size?

Given an array of integers which is circularly sorted, how do you find a given integer.

Find a loop in a linked list. How long is the loop.

Algorithm to get Index of a permutation when all permutations arranged in lexicographical order and its complexity.

Given two binary search trees, write function which tells if two such trees are the same – (i.e. same info in the nodes, same branching to left and right at every node.

Come out with an algorithm for getting the column number provided the column name in a excel sheet and vice versa. Excel has a naming convention of A,B..Z,AA,AB,AC..ZZ,AAA... This had to be converted to the column numbers. A will be 1 and AA will 27. Also the algorithm to find the name provided column number.

Write a function that computes the intersection of two arrays. The arrays are sorted. Then, what if one array is really larger than the other array.

Find the common numbers in two text files.

Given a list of integer e.g. (1,2,4,5,6,7,8...). Find a pair with a given sum.

Find 3 numbers in an array adding up to a given sum S.

In a sorted matrix (sorted by rows and columns), find an element.

Find kth largest element in unsorted array.

Given the pre-order and in-order traversing result of a binary tree, write a function to rebuild the tree.

Given two strings. Remove the letters in order from the first string that are in the second.

Create a function to return the angle between the clock hands if given the hours and minutes.

Giving a windows size K and an array of size N, find the minimum of each window as it slides through the array.

Write code for Fibonacci algorithm (iterative or recursive) and explain what's the performance.

For a given maze, how to write a function to check if there's a way from the top-left grid to bottom-right grid with obstacles in the middle.

Given an array of numbers, there is one number that has a duplicate. How would you find the number.

Find element in circular sorted array.

You are climbing a stair case. Each time you can either make 1 step or 2 steps. The staircase has n steps. In how many distinct ways can you climb the staircase.

Remove a node from a singly linked list without knowing the head node. All you have is the node itself.

Given a list of strings return the substring representing the longest common prefix .

Find longest common substring.

Write a priority queue<E> with three methods. Insert(E Element, int priority). E Pop(), update(E Element, int new_priority).

Describe what happens when making a google search, specifically with client and server .

Implement a binary tree iterator class .

Given search terms, find the minimum window containing all the terms in a string .

How would you find the power set of a set of numbers .

How would you implement a stack to achieve constant time for "push", "pop" and "find mininum" operations.

You are given a Binary-Search-tree. Find duplicated values.

How to reverse the words in a sentence in place.

If you have a computer with no division operator, how do you do it .

How to find two missing integers in an unsorted array.

You are given a number of coins, how can you generate the probability of 1/3, 2/3, etc.

How would you sort an array of one millions integers.

Given a grid that contains rectangles, write a function that will return all the rectangles that overlap with each other.

You have all of the prices for a given stock for the next year. You can buy once and sell once in that year. How do you determine when to buy and sell to maximize your profit?

Write an algorithm to sort and merge 2 LARGE data streams on a system where storage is essentially unlimited (but slow) and RAM is limited.

Find the median of a very large dataset distributed over several network nodes.

Given unsorted sequence of billions of numbers that cannot all fit in memory at the same time, find the median of these values.

Implement a method that matches an entire string with star wildcard pattern, e.g. returns true for ("*ogle", "Google"), but false for ("fragile*", "agile") - without using regular expression language support.

How to count frequencies of characters in a string. What if the string is too huge? When is it reasonable to distribute across machines? How to find the most frequent character in the distributed scenario with a minimum data exchange between machines?

Write a program for finding the kth-smallest element in the array x[0...n-1] in O(n) expected time. Your algorithm may permute the elements of x.

You are given two sets of integers, sizes M and N with M < N. Perform inner equal join on these two sets (i.e., find intersection of the two lists). How to perform it if both the lists are in files and the available memory is of size K < M < N.

You are a parking lot attendant in a lot that has one open spot, and you want to move the cars from their original positions into a new arrangement. Create a program that will print out instructions on how to move the cars most efficiently.

Given a sequential file that contains at most four billion 32-bit integers in random order, find a 32-bit integer that isn’t in the file (and there must be at least one missing...why?). How would you solve this with unlimited main memory? How would you solve it if you could use several external files but only a few bytes of main memory?

Given a file containing at most ten million 7-digit integers with no duplicates. What is an efficient way to print these numbers in ascending order using just 1.5Mb RAM and reading the data just once? What are the consequences of only having 1Mb of RAM and no other storage? How would your answer change if duplicates were permitted?

N pots, each with some number of gold coins, are arranged in a line. You are playing a game against another player. You take turns picking a pot of gold. You may pick a pot from either end of the line, remove the pot, and keep the gold pieces. The player with the most gold at the end wins.

Transmute one word to another by the shortest path of 1-letter changes through a path of actual words.

Create a cache with fast look up that only stores the N most recently accessed items (LRU policy). O(1) time complexity is expected.

Find the smallest window/substring in an string that contains all letters in a given string.

Write a function to find out longest palindrome in a given string.