Saturday, December 4, 2010

Call for papers and notable conferences

Unisys India's Cloud 20/20 Version 2.0 is an online technical paper contest.
The topics in focus are:
1)     Automation for Cloud Computing
2)     Virtualization (server, storage, networks)
3)     Application Development for the Cloud
4)     Moving Workloads from Datacenter to the Cloud
5)     Consumerization of IT
6)     Cloud Computing for Airports – Solutions and Benefits
The contest is open to:
1) Post-graduate students focusing on computer science and related branches
2) Pre-final and final year engineering students (BE/B.Tech in CS, IT and related branches)
The site says that students who submit papers could have career opportunities at Unisys.
The last date for submission of abstracts is 12th December.

The Computer Society of India’s International Conference series on Software Engineering (CONSEG)-11 has the theme “Software Quality - the Road Ahead”.
The conference will have both invited and contributed papers. World leaders in this area will deliver keynote talks on advanced topics. These will be combined with a plenary talk by Microsoft and invited talks by stalwarts from Indian IT companies. Contributed papers will also be presented as talks and posters.
Some notable speakers are:
Alain Abran, Professor, University of Quebec
  A Maturity Model of Software Product Quality
Bill Curtis, SVP and Chief Scientist, CAST
  Measuring and Managing the Non-Functional, Internal Quality of IT Software: The Next Wave Hitting Application Customers and Suppliers
Dan Galorath, CEO, Galorath Consulting
  More Successful Projects With Viable Estimates: A 10 Step Estimation Process With Emphasis on Parametric Estimation
K S Trivedi, Professor, Duke University
  Software Aging and Rejuvenation
Murali Chemuturi, Author and Consultant
  Requirements Management



Tuesday, November 16, 2010

Microsoft interview and written round

The written round for MS had 5 questions:
1) A simple question to debug given linked-list code for a given node. Remember: don't rewrite the whole code; point out the bugs and write the correct code for them.
2) The 2nd question was to find the output of a given code; it was the recursive LCS problem.
3) The 3rd was to write code to find an element in a rotated array.
4) The 4th was to write test cases for a client-server setup,
   e.g. the client sends the request GET file1.txt and the server responds by sending the file.
5) Design a mobile application to allow a blogging experience on the phone.
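
For the 3rd question, here is a minimal sketch of how I would search a rotated sorted array, assuming the array holds distinct values and was sorted in ascending order before rotation (the function name `search_rotated` is my own, not from the paper):

```python
def search_rotated(nums, target):
    """Return the index of target in a rotated ascending list of
    distinct values, or -1 if it is not present."""
    lo, hi = 0, len(nums) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if nums[mid] == target:
            return mid
        if nums[lo] <= nums[mid]:
            # The left half [lo..mid] is sorted.
            if nums[lo] <= target < nums[mid]:
                hi = mid - 1
            else:
                lo = mid + 1
        else:
            # Otherwise the right half [mid..hi] is sorted.
            if nums[mid] < target <= nums[hi]:
                lo = mid + 1
            else:
                hi = mid - 1
    return -1
```

At every step one of the two halves is sorted, so a plain range check tells which half to keep, and the search stays O(log n).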

 It's always better to ask them questions to clarify, as I did for the 5th question.

Interviews:
1st interview
 1) Project question: my project was on anti-spamming, so he asked about ways to do SEO.
 2) Two sorted arrays are given: the first has size > (M+N-1) with M elements filled in, the second has N elements. Return the merged sorted array, and "test it".
 3) Program Manager question: ways to improve Ranchi, where my client is a Municipal Corporation Commissioner.
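
A rough sketch of the merge question, assuming the first array is a Python list whose first M slots hold the sorted values and whose remaining slots are free (the name `merge_into` and the `None` placeholders are my own choices for illustration):

```python
def merge_into(a, m, b):
    """Merge sorted list b into a, where a[:m] is sorted and a has
    at least len(b) free slots after index m - 1. Works in place."""
    n = len(b)
    if len(a) < m + n:
        raise ValueError("first array is too small to hold the merge")
    i, j, k = m - 1, n - 1, m + n - 1
    # Fill from the back so no unread element of a is overwritten.
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[k] = a[i]
            i -= 1
        else:
            a[k] = b[j]
            j -= 1
        k -= 1
    return a[:m + n]
```

For the "test it" part I would try an empty second array, an empty first array (M = 0), duplicates across both arrays, and the case where every element of one array is smaller than every element of the other.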

2nd interview
 1) Print an array in spiral order
 2) Write code to check whether two rectangles overlap: simple rectangles, with sides parallel to the x or y axis
 3) Iterative tree traversal
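
Here is a small sketch of these three questions. The representations are my assumptions: a rectangle is a tuple (x1, y1, x2, y2) with x1 < x2 and y1 < y2, rectangles that only touch at an edge do not count as overlapping, and a tree node is a tuple (left, value, right) with None for a missing child.

```python
def spiral_order(matrix):
    """Return the elements of a 2-D list in clockwise spiral order."""
    result = []
    if not matrix or not matrix[0]:
        return result
    top, bottom = 0, len(matrix) - 1
    left, right = 0, len(matrix[0]) - 1
    while top <= bottom and left <= right:
        for c in range(left, right + 1):
            result.append(matrix[top][c])
        for r in range(top + 1, bottom + 1):
            result.append(matrix[r][right])
        # Only walk back if the ring has more than one row and column.
        if top < bottom and left < right:
            for c in range(right - 1, left - 1, -1):
                result.append(matrix[bottom][c])
            for r in range(bottom - 1, top, -1):
                result.append(matrix[r][left])
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return result


def rectangles_overlap(r1, r2):
    """True if two axis-parallel rectangles share some interior area."""
    ax1, ay1, ax2, ay2 = r1
    bx1, by1, bx2, by2 = r2
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def inorder_iterative(root):
    """In-order traversal using an explicit stack instead of recursion."""
    out, stack, node = [], [], root
    while stack or node is not None:
        while node is not None:
            stack.append(node)
            node = node[0]      # go left as far as possible
        node = stack.pop()
        out.append(node[1])     # visit the node
        node = node[2]          # then move to its right subtree
    return out
```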

3rd interview
 1) Project again: Anti-Spamming
 2) Given an array of size M*N, wherever a 0 is present, say A[i,j] is 0, set all elements of row i and all elements of column j to 0. Do this for all the 0s present in the original array, and "test it".
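
A minimal sketch of the zeroing question, assuming the array is a list of equal-length rows. The point to watch is that only the 0s of the original array count, so the rows and columns are recorded first and zeroed afterwards (the function name is mine):

```python
def zero_rows_and_cols(matrix):
    """Zero every row and column that contains a 0 in the original matrix."""
    # First pass: remember which rows and columns hold an original 0.
    zero_rows = {i for i, row in enumerate(matrix) if 0 in row}
    zero_cols = {j for row in matrix for j, v in enumerate(row) if v == 0}
    # Second pass: clear them, so newly written 0s never spread further.
    for i, row in enumerate(matrix):
        for j in range(len(row)):
            if i in zero_rows or j in zero_cols:
                row[j] = 0
    return matrix
```

This uses O(M+N) extra space. Test cases I would try: no zeros at all, a zero in a corner, several zeros in the same row, and a 1x1 array.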

4th interview
 1) Implement stricmp (case-insensitive strcmp)
 2) A 100 KB block of memory is given. You have to allocate queues from this memory such that
  - the number of queues is dynamic
  - the size of each queue is dynamic
  - 90% of the space is used efficiently
  Make a class for the queue with the behavior:
      getByte(char c) :: get a byte from memory and put char c there
      putByte() :: send a byte back to memory
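
For the stricmp question, here is a sketch of the comparison logic in Python rather than C. It assumes ASCII-only case folding and returns a negative, zero or positive number like the C function; the helper `_lower` is my own.

```python
def _lower(ch):
    """Fold an ASCII upper-case letter to lower case; leave the rest alone."""
    return chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch


def stricmp(a, b):
    """Case-insensitive comparison: <0 if a < b, 0 if equal, >0 if a > b."""
    for ca, cb in zip(a, b):
        la, lb = _lower(ca), _lower(cb)
        if la != lb:
            return ord(la) - ord(lb)
    # All shared characters match, so the shorter string sorts first.
    return len(a) - len(b)
```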

  I was finally selected for the Developer profile.

Sunday, October 24, 2010

Google Scribe : An autocompletion service

Google Scribe was launched this September. Its beta looks deficient but promising. I had already had the notion of this kind of text autocompletion. Imagine typing a document while the autocompletion service saves you time. I am trying to extend the same idea to Rhyme-Seeker, "a ranked autocompletion of poetry". I saw a similar idea at ISI for creative language use. I believe the technology behind Scribe could be extended to achieve the task.
       Google Scribe seems to be a straightforward application of web n-gram language models wrapped in an AJAX interface. Some of its mistakes demonstrate the drawbacks of not utilizing long-range word dependencies and topical context. It also adds numerous options for searching.
  Google states, “Scribe’s suggestions indicate correct or popular phrases to use”. Most probably, as I found when I experimented with it, Google Scribe doesn't personalize the autocompletion yet. It would be amazing to see Scribe enhanced and added to Google Docs.
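
To make the n-gram idea concrete, here is a toy bigram autocompleter. This is only my sketch of the general technique, not how Scribe is actually built; the corpus and the names `train_bigrams` and `suggest` are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for every word, which words follow it in the corpus."""
    model = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def suggest(model, text, k=3):
    """Suggest up to k next words, ranked by how often they followed
    the last word typed. Only the last word is used as context."""
    words = text.lower().split()
    if not words:
        return []
    candidates = model.get(words[-1], Counter())
    return [word for word, _ in candidates.most_common(k)]


corpus = ["thank you very much", "thank you for coming", "thank god"]
model = train_bigrams(corpus)
top = suggest(model, "Thank", k=2)   # ranked by frequency after "thank"
```

Because `suggest` looks only at the last word, it has exactly the weakness mentioned above: no long-range dependencies and no topical context.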

The new Lucene Based Search Architecture of Twitter

Twitter launched a new backend search architecture a few weeks ago. The previous system was based on the original Summize search system that Twitter acquired in 2008, which in turn was built on MySQL. That system became difficult to scale to such a large volume of data. So they turned to a new, modern search architecture based on a highly efficient inverted index instead of a relational database, which is the obvious choice for indexing large data for quick search.
They did a lot of tuning to improve on the current version of Lucene. Some of the highlights of the changes include:
  • significantly improved garbage collection performance
  • lock-free data structures and algorithms
  • posting lists that are traversable in reverse order
  • efficient early query termination
We can hope to see these contributions in Lucene soon.
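
To show why reverse-traversable posting lists and early termination matter for tweets, here is a toy inverted index. It is my own sketch of the idea, not Twitter's or Lucene's code; the class `TinyIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> list of doc ids in arrival order."""

    def __init__(self):
        self.postings = defaultdict(list)
        self.docs = []

    def add(self, text):
        """Index a new document; ids grow with time, so lists stay sorted."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term, k=2):
        """Return the k newest documents containing term."""
        hits = []
        # Walk the posting list backwards: newest documents come first.
        for doc_id in reversed(self.postings.get(term.lower(), [])):
            hits.append(doc_id)
            if len(hits) == k:
                break  # early termination: older postings are never read
        return hits
```

Since new tweets are simply appended, reading the list from the end gives the freshest results first, and the query can stop as soon as it has enough hits.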

According to the blog, the new search system was designed to handle over 1,000 TPS (tweets/sec) and 12,000 QPS (queries/sec), which is over 1 billion queries per day. Besides the challenging query volume, the data needs to be available quickly: a tweet has to be searchable in less than 10 seconds.
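
A quick check of that arithmetic, assuming the quoted rates were sustained for a full 24-hour day:

```python
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds

queries_per_day = 12_000 * SECONDS_PER_DAY
tweets_per_day = 1_000 * SECONDS_PER_DAY

print(f"{queries_per_day:,} queries/day")  # 1,036,800,000
print(f"{tweets_per_day:,} tweets/day")    # 86,400,000
```

So 12,000 QPS does come to a little over 1 billion queries a day.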
The main benefit to users is that the new system is much more scalable and can support an index twice as large as that of the previous versions, which means that you can search for tweets further back in time.

Friday, August 20, 2010

Tejas Networks Software Written and Interview

The written part of Tejas had 4 sections:
1- General aptitude (10 questions)
2- C aptitude
3- General Comp. Sc.: Networking, OS, compiler
4- Coding section

Time: 2 hrs
Each section had its own cut-off

4- The coding section consisted of 4 questions:
*Find the maximum height/depth of a binary tree
*There are n stations numbered from 1 to n, stored in an array.
The distance between any two stations is given as:
D[i][j]=0, if no path exists between the stations i and j
D[i][j]=n, if the distance between two stations i and j is given to be n
The stations are arranged in an ordered linear sequence, like S1,S2,S3......Sn
*Given a tree, store it in a file and retrieve it back from the file
*A file is given with many 0s stored contiguously. Store it in another file using the minimum amount of space, such that the original file can be recreated from the new file. A hint was given to use the lseek function (a Unix function).
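
Here is a sketch of the two tree questions, assuming a binary tree with integer values. The `Node` class, the pre-order layout and the "#" marker for a missing child are my own choices; the token list can be written to a file joined by spaces and split again on reading.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def max_depth(node):
    """Depth counted in nodes: an empty tree has depth 0."""
    if node is None:
        return 0
    return 1 + max(max_depth(node.left), max_depth(node.right))


def serialize(node):
    """Pre-order token list; '#' marks a missing child."""
    if node is None:
        return ["#"]
    return [str(node.value)] + serialize(node.left) + serialize(node.right)


def deserialize(tokens):
    """Rebuild the tree from the tokens produced by serialize."""
    it = iter(tokens)

    def build():
        tok = next(it)
        if tok == "#":
            return None
        node = Node(int(tok))
        node.left = build()
        node.right = build()
        return node

    return build()


root = Node(1, Node(2, Node(4)), Node(3))
tokens = serialize(root)          # e.g. " ".join(tokens) goes into the file
restored = deserialize(tokens)
```

Storing the "#" markers is what makes the shape recoverable from a single traversal.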

3- General Comp. Sc.: Networking, OS, compiler
*Write down the Java Socket APIs
*What is deadlock? How do you prevent it?
*What is IPC? Give some examples
*There were two questions based on the linker and loader: how the compiler converts HLL to machine code, where it is stored, etc.
*Difference between IP and MAC, the two addresses
2- C aptitude (10 questions)
This section was easy compared to the other sections
*Two questions were on pointers: a 2-D array was declared and various values (of arr, of arr[1], etc.) were asked
*A question was asked on function pointers: a function pointer had a function pointer as its argument. The question asked to identify the argument and return types.
*Virtual
*

1- General aptitude (10 questions)
*The number 45 is broken into four parts x, y, z and w, such that x+2, y-2, z*2 and w/2 all come to the same value. Find the four parts.
*Identify the next in the sequence: Z,O,T,T,F,F,S,S,E,N
*In puzzle world, one dozen pears cost Rs. 16, guavas cost Rs. 20 and grapes cost Rs. 24. How much do mangoes cost?
*A car has speeds of 72, 64 and 56 downhill, on the plain and uphill respectively. A guy travels in the car from point A to point B in 4 hrs and from point B to point A in 4 hrs 40 min. What is the distance between A and B?
*A cube 9 27, how many coloured 0, 1, 2, 3
*A spider has 8 legs and no wings, a grasshopper has 1 pair of wings and 4 legs, and a dragonfly has 2 pairs of wings and 6 legs. I see 118 legs and 20 pairs of wings. How many spiders, grasshoppers and dragonflies are there?
*ABCDE are the digits, in order, of a 5-digit number; if it is multiplied by 4 it becomes the number EDCBA. Find the number.
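
Two of these puzzles are easy to brute-force. The sketch below assumes the classic form of the 45 puzzle (x+2 = y-2 = z*2 = w/2, with whole-number parts), which is my reading of the question; the function names are mine.

```python
def split_45():
    """Find x, y, z, w with x + y + z + w = 45 and
    x + 2 == y - 2 == z * 2 == w / 2 (call that common value k)."""
    for k in range(2, 46, 2):          # k must be even so that z = k / 2 is whole
        x, y, z, w = k - 2, k + 2, k // 2, 2 * k
        if x + y + z + w == 45:
            return (x, y, z, w)
    return None


def reversed_by_four():
    """All 5-digit numbers ABCDE with ABCDE * 4 == EDCBA."""
    # n * 4 must still have 5 digits, so n is below 25000.
    return [n for n in range(10000, 25000) if str(n * 4) == str(n)[::-1]]


print(split_45())          # (8, 12, 5, 20)
print(reversed_by_four())  # [21978]
```

The check by hand: 8+2 = 12-2 = 5*2 = 20/2 = 10, and 21978 * 4 = 87912.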


Interview Round
*Tell me about yourself
*Tell me about any of your projects
*Given the string "My name is Sujeet", list all possible ways to convert it to "Sujeet is name My".
*Given a linked list, give as many ways as you can to reverse it.
*Difference between a thread and a process
*If you have a 32-bit system, what is the maximum possible size of virtual memory it can support (later asked: virtual memory at a given instant only)? What will happen if I put 6 GB of RAM in this system?
*Implement malloc in C
*Segmentation faults and the direction of stack growth came up during the discussion
*Given a network connecting many points, how would you find out whether it has any disconnected points?
*Give any implementation of a thread
*A pattern was given and I was asked to think aloud about how to approach it
*Why is MAC required, other than for security reasons?
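
A short sketch of two of these questions: one way of reversing the words of a string, and a breadth-first search to find disconnected points in a network. I am assuming an undirected network given as a list of points and a list of links, and that "disconnected" means not reachable from the first point; the function names are mine.

```python
from collections import deque


def reverse_words(sentence):
    """'My name is Sujeet' -> 'Sujeet is name My'."""
    return " ".join(reversed(sentence.split()))


def disconnected_points(points, links):
    """Return the set of points that cannot be reached from points[0]."""
    if not points:
        return set()
    adjacency = {p: set() for p in points}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {points[0]}
    queue = deque([points[0]])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return set(points) - seen
```

Other ways to reverse the words include reversing the whole string and then each word in place, or pushing the words on a stack. For the network, a depth-first search or a union-find structure would work just as well.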

Friday, July 23, 2010

Search Quality at Yandex

Yandex is the most-used search engine in Russia. Here are some of its statistics and information:
 
Russian Search Market
- Yandex has 60+% market share
- It's all about attention to the small details of search

A Yandex overview
- started in 1997
- No. 7 search engine in the world by number of queries
- 150 million queries per day

Variety of Markets
- 15 countries with the Cyrillic alphabet
- 77 regions in Russia
-> different cultures, standards of living and average incomes, for example: Moscow, Magadan
-> large semi-autonomous ethnic groups (Tatar, Chechen, Bashkir)
-> neighbouring bilingual markets

Geo-specific queries
- Relevant result sets vary significantly across regions and countries

pFound
- a probabilistic measure of user satisfaction
- optimization goal at Yandex since 2007
- similar to ERR (Chapelle 2009) --> hopefully someone can fill in the exact formula
- pFound, pBreak, pRel
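
I did not note the formula down from the talk. The sketch below follows the cascade-style definition of pFound as I understand it from Yandex's public descriptions, so please treat it as an assumption rather than the exact formula from the slides: the user looks at results top-down, stops at position i with probability pRel[i] (satisfied), and otherwise gives up with probability pBreak before moving on. The default pBreak = 0.15 is also my assumption.

```python
def pfound(p_rel, p_break=0.15):
    """Probability that the user finds a relevant result in the ranking.

    p_rel   -- relevance probabilities of the results, top to bottom
    p_break -- probability of abandoning the list after a non-relevant result
    """
    p_look = 1.0   # probability that the user looks at the current position
    total = 0.0
    for rel in p_rel:
        total += p_look * rel
        # To reach the next position the user must be unsatisfied
        # and must not have given up.
        p_look *= (1.0 - rel) * (1.0 - p_break)
    return total
```

Like ERR, this rewards putting a highly relevant document first, since everything below it is discounted by the chance that the user never gets that far.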

Geo-specific Ranking
query -> query + user's region
- may need to build a specific formula for countries/region because of the variance and missing/lacking features in some of them.

Alternatives in Regionalization
- separate local indices or a unified index with geo-coded pages
- one query or region specific query
- query based local intent detection vs. results based local intent detection
- single ranking function vs. co-ranking and re-ranking of local results
- train one formula or train many formulas on local pools

Why use MLR?
Machine learning as a conveyor
- Some query classes require specific ranking
- many features

MatrixNet
A learning method
- boosted decision tree, "oblivious" trees.
- optimize for pFound
- solve regression tasks, train classifiers
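
To illustrate what "oblivious" means here: in an oblivious tree every node on the same level tests the same feature against the same threshold, so a tree of depth d is just d comparisons that form the index of one of 2^d leaves. The code below is my own sketch of evaluating such trees in a boosted sum; it is not MatrixNet's implementation, and the names and numbers are made up.

```python
def oblivious_tree_predict(splits, leaf_values, features):
    """Evaluate one oblivious tree.

    splits      -- one (feature_index, threshold) pair per level
    leaf_values -- 2 ** len(splits) leaf outputs
    features    -- feature vector of the document being scored
    """
    index = 0
    for feature_index, threshold in splits:
        bit = 1 if features[feature_index] > threshold else 0
        index = (index << 1) | bit   # each level contributes one bit
    return leaf_values[index]


def ensemble_predict(trees, features):
    """A boosted model is the sum of the outputs of all its trees."""
    return sum(oblivious_tree_predict(s, v, features) for s, v in trees)


tree_a = ([(0, 0.5), (1, 10.0)], [0.0, 0.1, 0.2, 0.3])
tree_b = ([(1, 5.0)], [-0.05, 0.05])
score = ensemble_predict([tree_a, tree_b], [0.7, 12.0])
```

Because there is no branching on the path taken, evaluation is a tight loop of comparisons, which helps when the model has a very large number of trees.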

Complexity of ranking formulas
20 bytes - 2006
14 kb - 2008
220 kb - 2009
120 MB - 2010

A sequence of more and more complex rankers
- pruning with the static rank (static features)
- use of simple dynamic features (such as BM25)
- complex formula that uses all the features available
- potentially up to millions of matrices/trees for the very top documents
- see Cambazoglu, 2010, early exit optimization

Geo-dependent queries: pFound
- a big jump in 2009 in Quality
- 3x more local results than the #2 player in Russia

Lessons
- MLR is the only way to do regional search: it gives us the possibility of tuning many geo-specific models at the same time.

Challenges
Complexity of the models is increasing rapidly
-> don't fit into memory!

MLR in its current setting does not fit time-specific queries well
-> features of the fresh content are very sparse and temporal

Opacity of results of the MLR
- The backside of ML

Number of features grows faster than the number of judgments
-> hard to train ranking

Learning from clicks and user behavior is hard
Tens of GB of data per day!

Yandex and IR
- Participation and Support
- Yandex MLR at IR context

Microsoft Releases Learning to Rank Datasets

Microsoft Research announced that it is releasing a new MS LTR dataset.
After the recent Learning to Rank contest held by Yahoo!, Microsoft has presented a similar dataset. Here are the statistics of both datasets:

Yahoo!: The first dataset had:
29,921 queries
744,692 URLs
519 features
Microsoft: Released two large-scale datasets for research on learning to rank: MSLR-WEB30k with more than 30,000 queries, and a random sample of it, MSLR-WEB10K, with 10,000 queries. 136 features have been extracted for each query-URL pair. The dataset is a retired one. What makes this quite interesting is that the features have been released.
You can see the feature list.

See also the Y! LTR datasets.