eBay

Title: 1. Seller Performance Prediction Seller Standards

Sellers/CS are complaining that client does not give them a heads-up if a seller's performance is going south. The idea is to predict a seller's performance and provide actionable insights to a seller if we see a strong likelihood that their performance is about to be impacted in the near future. The intern will help in increasing precision and recall of the machine learned models used to predict seller performance. The models will also help identify what variables are causing the seller's performance to drop and how they could improve. As part of this project the intern will closely work with CS ops in sharing these insights with our managed seller accounts.

Skills: Java/J2EE/C++ and Oracle/MySQL (Eclipse, ClearCase/Git)



Title: 2. Seller Dashboard 2.0 - Test New Concept Seller Standards

Today's Seller Dashboard is a one-size-fits-all application that serves as a report card on a seller's performance. The dashboard does not provide actionable insights to a seller on how they can improve their performance. The intern will help build a new prototype that can be tested in our Garden and that addresses some of the above concerns. The prototype will be built using Raptor. The intern will work in short 2-week cycles to make quick changes to the new dashboard test variants and release them to the Garden. The intern will work closely with Product and UED to incorporate, test and analyze seller engagement with the new dashboard.

Skills: JSP, JSTL, JAVASCRIPT, AJAX, JQUERY, CSS, HTML5



Title: 3. Trading-API - Improved Regression Suite Checkout Platform

Today we have about 1.2M hits/day to GetOrders and 12M hits/day to GetItemTransactions. There is no existing regression suite for the Trading APIs to monitor the quality of the code we roll to production. Our team has developed a basic version for testing Trading APIs, and more time needs to be invested to solidify the testing suite. This can be a joint QE/PD/DTS initiative. The intern will work closely with PD, QE and DTS to add more test cases and enhance the regression suite.

Skills: Java/J2EE



Title: 4. Complete End2End Visualization, monitoring and alerting tool for MMP Checkout Platform

MMP consists of multiple services, BES consumers, batch jobs, and web applications. All of these need to interact and collaborate with each other correctly for the system to function. We have an existing PI DB Query Tool at http://px-wv00-fc85007.phx.client.com:8080/admin/jsp/pidashboard/PIDashboardIndex.jsp. The tool retrieves data from different components. Adding visualization to the data, plus intelligence to highlight inconsistencies among systems, can be super powerful for triaging and can be leveraged by EWatch and CS. The intern will work with a mentor to add visualization to the data and add logic to check for data inconsistency.

Skills: LAMP stack or Java/J2EE



Title: 5. Using Hadoop for Real-Time Data Validation Checkout Platform

We have SOA-based distributed components, and data validation is a powerful way to proactively identify quality issues in our systems. In Phase 1 of Payment Intermediation, we used the data warehouse as our source of data for validation. The data warehouse has a lag in data availability and cannot give real-time validation. The proposal is to use Hadoop for real-time data validation. The intern(s) will work with a mentor to provide a framework for real-time data validation.

Skills: Java/Hadoop



Title: 6. Vertex POC on Cloud using MySQL Checkout Platform

This POC would set up the Vertex tax engine on the cloud with MySQL to increase the scalability and availability of the tax engine.

Skills: Java/J2EE



Title: 7. Shipping data Shipping

The intern will identify key dimensions of shipping data: quantities of shipped items, time to ship, size/weight, and cost. The data will be mined to establish baselines for the entire data set. Then we’ll look into various anomalies: location-specific (e.g. some states had an unusual drop in shipping quantities) or temporal (e.g. there is a general unusual drop in the average price of items shipped). The biggest anomalies will be studied, and the intern will try to dig deep into their root causes. Some of them will be noise, some will be related to external events (a snowstorm, for example), and some will be related to live site issues. The intern will then experiment with potential uses of these findings; for example, we might be able to derive good leading indicators of upcoming live site issues. The intern will also work with various data visualization technologies to make his/her observations easy to present.
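
As a rough illustration of the baseline-then-anomaly idea, the sketch below flags points that sit far from a median baseline using a modified z-score (median absolute deviation). This is one possible technique, not a method prescribed by the project; the function name, the 3.5 threshold and the per-state counts are all made up for the example.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return (index, value, score) for points far from the median baseline.

    score is the modified z-score: 0.6745 * (value - median) / MAD,
    where MAD is the median absolute deviation. The threshold is an
    assumed default, not a value taken from the project description.
    """
    baseline = median(values)
    mad = median(abs(v - baseline) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    anomalies = []
    for index, value in enumerate(values):
        score = 0.6745 * (value - baseline) / mad
        if abs(score) >= threshold:
            anomalies.append((index, value, score))
    return anomalies


# Made-up weekly shipped-item counts per state, for illustration only.
weekly_shipments = {
    "CA": [100, 102, 98, 101, 99, 100, 40],
    "NY": [80, 81, 79, 80, 82, 78, 80],
}

for state, counts in weekly_shipments.items():
    for week, count, score in find_anomalies(counts):
        print(f"{state} week {week}: {count} shipped (score {score:.1f})")
```

The median-based baseline is used here because a single large drop would otherwise distort a mean/standard-deviation baseline. Deciding whether a flagged point is noise, an external event or a live site issue would still be the manual root-cause step described above.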

Skills: Data mining, machine learning, databases, Java.



Title: 8. Performance Benchmark numbers. Core CartService

Core CartService is going to be the checkout platform for various clients like Mobile, SBC, eCAS, etc. CCS is the interaction hub between these clients and other services like OMS, OPMS, Shipping, etc. Having performance benchmark numbers from our Hudson/Jenkins nightly build is a necessity. The intern will help us evaluate and set up open source performance testing tools like OpenSTA, JMeter, etc. on our nightly CI setup.

Skills: JAVA, ANT, DATABASE



Title: 9. RESTFUL Pricing Service on GINGER Core CartService

Our architecture for MMP needs a pricing service. Our internal SOA team released Ginger 1.0 last week, built on open standards and REST APIs. We need to use this platform to develop a prototype of a REST-based pricing service. Right now most of the pricing logic is tightly integrated with Core CartService; the intern can help decouple it and provide it as a separate REST-based pricing service.

Skills: JAVA, J2EE, WebServices, XML. JAX-RS - optional.



10. Preparation for open source of eRules Fraud

eRules, while designed to be a framework and as standalone as possible, has dependencies on certain V3 libraries and client services: the rules publish service is based on client FTS, the rule builder has certain service consumer dependencies, and rules are persisted through client RulesNewHost. The goal is to clean up these dependencies so that eRules can be shipped as a standalone product on eBox. As part of the project, also scan the source code and remove client-specific or client-identifying documentation, and remove dependencies on commercial software such as ClearCase. As a result, eRules should be buildable and deployable outside the client environment.

Skills: Java, Eclipse RCP, OSGi



11. Scalability of Elvis development via EDE plugin Fraud

Elvis application code (client, server, RBO variables) follows known patterns. To reduce development time, improve code consistency, and reduce bugs and complexity, development of these components needs to be automated. One way is to develop an EDE (Eclipse) plugin that prompts for a set of known configurations and generates the implementation, or at least a skeleton that guides developers in filling in the missing details. The outcome of this project is that broader teams can develop Elvis applications without specialized knowledge.

Skills: Java, Eclipse Plugin



12. Fraud monitoring and response system Fraud

Improve and consolidate the various monitoring and reports available today to get a holistic view of health, including both infrastructure and business metrics. For example, in the current Elvis filter hits monitoring, noise prevents us from reacting quickly to real issues. Combining multiple dimensions of data points (filter volume, traffic, fraud trend) can help filter out the noise, and overlaying multiple metrics may help quickly identify root causes and impact. The intern may use machine learning to gain insight into the clustering of problems vs. symptoms.

Skills: Machine learning, Analytical skills, Web development



13. Fraud: self-service triage portal Fraud & eBP/Resolutions

We are collecting a massive number of data points around fraud decisions. This data can be used to troubleshoot and to answer pressing questions. Build tools and derive an understanding of the data so that questions can be answered automatically. For example, for a high-GMV ATO item: did detection flag the item, was the item delayed, how did the item gain conversion, etc. Up to now, it has been a manual process for engineers to search CAL, query the production database, and interpret the results. The goal is to streamline these techniques so that business users can arrive at the same conclusions without relying heavily on PD.

Skills: Analytical skills, Web development



14. Buyer sentiment root cause analysis. Feedback - Trust

Mine feedback text to look for root causes of pleasure/displeasure. This project would go deeper than typical keyword-based analytics. Feedback text is very short and is not grammatically well formed. Separate out generic feedback like 'A++++++' and bubble up the key issues that result in customer delight. Handle issues like sarcasm in buyer comments and other indicators of disengagement. Once the sentiment is reliably mined, it could be used in the Seller Dashboard to encourage good behaviors. It could also be used as variables in the Fraud/Risk models.

Skills: Text Mining, NLP, Machine Learning, one or more programming languages like Java or Python, a distributed computing environment like Hadoop



15. client Forums Mining Forum

Mine client Discussion Board forums to highlight the issues faced by the client community. Run topic modeling to extract key themes. Quantify the distress in each theme and create a dashboard to be used internally by execs and business leads to track the pain points.

Skills: Text Mining, Topic Modeling.



16. Photos-Trust Photos

Correlate the image quality and authenticity of images with Trust/BBE. Buyers are confused and misled by the use of professionally created stock photos when the item condition is 'Used'. Some unscrupulous sellers copy images from the original manufacturers' websites and use them in client listings. This can mislead the buyer when the condition is not 'New'. Another aspect of this project is to mine the Exif metadata available in the images to extract attributes like date/time, location, camera type, etc. and correlate them with other attributes to understand the discrepancies.

Skills: Image Mining, Machine Learning



17. This project has 2 objectives.

Most of our existing models use NN as the ML algorithm. We have compared different algorithms in the past, but not for a long time. Will other techniques perform better? What are the characteristics of the different algorithms when they are applied to our problems? The first objective is to understand the performance and characteristics of the various algorithms against real-world problems. The second objective is to provide better interpretation of our model output. We are constantly asked why we did not catch a certain fraud or risky event. Since we are using a nonlinear model (NN), we don’t have a good way to answer that today. Providing an answer might involve building models using other techniques that are easier to interpret. This is closely related to the first objective.

Skills: Machine Learning Algorithms



18. Hadoop Prediction environment Platform

This project would turn an intern loose to play within client's 900-node Hadoop cluster. Whether by developing a simple model end-to-end within the environment or by porting more mature model logic into the map-reduce framework, this project would expose the intern to a world-class-size Hadoop cluster and the computing and predictive power that comes with it.

Skills: Distributed computing, parallel cluster environments, cross-language familiarity



19. Model Automation Learning Platform Model

This intern project is to develop an end-to-end system that gives all models a self-learning automation ability with minimal manual involvement. In this system, any given model will be automatically retrained to get its parameters completely refreshed once a month, and the end user can easily access monitoring information comparing model performance.

Skills: Basic Machine Learning, Java/python, Scripting. Some Front end and Visualization skills



20. Rules Messaging eRules

Building on learnings from qualitative research on the effectiveness of changes to the messages delivered by rules when a customer is flagged for a policy violation, the project would involve better understanding how we are impacting our customers. The main questions I would like to get some understanding of are:

1. Current capabilities of the message editing product, and ways to improve turnaround time from the analyst's perspective.

2. Ways to improve message integration into site flows. It is suboptimal now: customers can click submit multiple times, get multiple warnings, and see no clear next steps.

3. Measure customer activity metrics test vs control (or pre/post).

Skills: Data Mining, Product Dev, User research, business analytics



21. Generic Service Stub server Checkout/Payments

Build a generic stub server that can be used to stub out any service. MMP has a lot of chained service dependencies. This server can be used to stub out the dependent services so that each component can be tested independently.

Skills: Java



22. Data MMP

As part of Data++, one of the goals was to make data available to all teams (Trust Science, PD, ...) so that they can analyze it and make meaningful decisions. As part of this project, we will standardize the vocabulary around MMP and build a FAQ (Frequently Asked Queries) system with intelligent search, metadata management and query execution so that these teams can easily find data. In addition, visualization will be provided for picking and choosing the data and viewing it appropriately.

Skills: NoSQL databases, Hadoop, Teradata, JQuery, “Big Data”, Lucene, Java



23. Visual Representation of Entity Resolution outcomes CIMS (Customer Identity Management) & eBP/Resolutions

Entity resolution is a process applied to client MP accounts to understand which customer owns which accounts (customer and entity are synonyms in this context). It involves applying rules to pairs of accounts, plus transitivity (if A & B belong to the same customer and B & C belong to the same customer, then A & C also belong to the same customer). Build a critical capability of the system: explaining why accounts X & Y were found to belong to the same customer.

This will include:

  • improvements to ER process logging
  • a function to find all rules that were applied and resulted in placing accounts under an entity
  • a really big plus and challenging feature: for accounts X & Y, find a path in a graph of applied rules that connects X & Y under the same entity (customer)
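
The path-finding feature in the last bullet can be sketched as a breadth-first search over a graph whose nodes are accounts and whose edges are the rules that matched a pair of accounts. This is only a minimal illustration: the rule names, account IDs and function names below are hypothetical, and the real ER process may store its matches very differently.

```python
from collections import defaultdict, deque


def build_graph(matches):
    """Build an undirected graph from (account_a, account_b, rule) matches."""
    graph = defaultdict(list)
    for a, b, rule in matches:
        graph[a].append((b, rule))
        graph[b].append((a, rule))
    return graph


def explain_link(graph, start, goal):
    """Return the shortest chain of (account, rule, account) hops, or None.

    Breadth-first search guarantees the chain with the fewest rule hops.
    """
    if start == goal:
        return []
    previous = {start: None}  # account -> (parent account, rule used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor, rule in graph.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = (node, rule)
            if neighbor == goal:
                # Walk back from the goal to rebuild the chain of hops.
                path = []
                current = goal
                while previous[current] is not None:
                    parent, used_rule = previous[current]
                    path.append((parent, used_rule, current))
                    current = parent
                path.reverse()
                return path
            queue.append(neighbor)
    return None  # accounts are not under the same entity


# Hypothetical rule matches, for illustration only.
matches = [
    ("A", "B", "same_phone"),
    ("B", "C", "same_address"),
    ("D", "E", "same_email"),
]
graph = build_graph(matches)
for hop in explain_link(graph, "A", "C"):
    print(hop)
```

Here A and C are linked only through transitivity, so the explanation is the two-hop chain through B. A pair such as A and D returns `None` because no chain of applied rules connects them.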



24. Shipping feedback Shipping

Mine feedback text for shipping-related words. Identify how much feedback is about shipping. Detect the sentiment of the feedback and see how big a role shipping plays in it. Cluster spikes in shipping feedback sentiment by buyer or seller location and by time. Correlate them with shipping performance. Develop a prototype of a system that detects spikes of unfair shipping-related feedback caused by external events, e.g. snowstorms or strikes.

Skills: Data mining, machine learning, databases, Java



25. How much shipping quality costs Shipping

Find out how much shipping-related features really cost. For example, if someone offers an item with one-week delivery, how much more can they charge compared with an identical item with two-week delivery? If someone provides a return policy, how much more can they charge than sellers of identical items without returns? How much more can domestic sellers charge than international ones?

Skills: Data mining, machine learning, databases, Java



26. Use of Distributed caching to improve Maestro decision performance Maestro & Limits

Some of the data facts used in MDS change infrequently (e.g. a seller's standards level changes a few times a month). A distributed caching scheme could be leveraged to improve decision response time for such data facts, which would be immensely helpful in keeping the timeout rate predictable and within threshold. We also need a clean-up/recovery mechanism for when a decision evaluation turns out to be incorrect due to stale cache info.
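
To make the caching idea concrete, here is a minimal in-process sketch of a time-to-live cache with explicit invalidation. It is a stand-in for illustration only: a real solution would sit on a distributed cache, and the class name, the loader callback and the 60-second TTL are assumptions rather than anything specified for MDS.

```python
import time


class TtlCache:
    """Cache slowly changing facts for a fixed time-to-live (TTL).

    `loader` is a hypothetical callback that fetches the authoritative
    value (e.g. from the database) on a miss or after expiry.
    """

    def __init__(self, ttl_seconds, loader, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._loader = loader
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # fresh hit: no call to the slow source
        value = self._loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. after a decision was found to use stale data."""
        self._entries.pop(key, None)


# Illustration with a fake loader that counts how often it is called.
calls = []

def load_standards_level(seller_id):
    calls.append(seller_id)
    return "above-standard"

cache = TtlCache(ttl_seconds=60, loader=load_standards_level)
cache.get("seller-1")
cache.get("seller-1")  # served from cache, loader not called again
print(len(calls))
```

The `invalidate` method corresponds to the clean-up half of the requirement: when a decision is found to have used stale info, the affected key is dropped so the next evaluation reloads it. How such incorrect decisions are detected and re-evaluated is outside this sketch.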

Skills: Membase, MongoDB, databases, Java, BES, Rule Engines



27. Maestro Portal Maestro & Limits

A portal to show decision service metadata: the list of decisions, data points used, checkpoints and their correlations. Mock evaluation of a decision also needs to be supported.

Skills: Spring MVC / Raptor, JQuery, PHP, Database



28. Identify Claim arrival pattern to reduce loss Maestro & Limits

Use data mining to predict the claim arrival pattern so that money can be held longer to offset losses. Existing methodologies available within client should be leveraged through close collaboration with teams such as Trust Sciences, Search Sciences, Shipping, and Checkout, who have done a great deal of work to solve problems in their respective areas.

Skills: Data mining, machine learning, databases, Java, Hadoop, Teradata



29. Signin Pool Error Analysis and Reduction TIDES

The Signin pool logs a large number of errors in CAL. The fact that the functionality works despite this means these may not be errors and we are unnecessarily logging them as such. We should analyze each of these errors and classify them into different buckets (error, warning, info, or do not log). After that, we can change the code to conform to the buckets. This will help the intern learn to use CAL and CAL-related tools, the Signin code base and general logging strategy.

Skills: Java, Logging Strategy



30. IAF Policy Management System TIDES

We need an automated way to manage Security Token Policies. Currently any change in policy requires a code rollout, and consumers have to wait for the rollout for the policy change to take effect.

Skills: Java, Database, XML/XSD, WS-Security Standards



31. IAF/AuthnAuth Token Detail Console TIDES

Given a token (IAF/AnA), the console should give the details of the token.

Skills: Java

Joe Martin Jerry Louis Anand Bahety/Neb Pesic/Farhang Kassaei



32. Signin Code change to support better Monitoring and Alert TIDES

Build a dashboard for the Signin pool where we can monitor various things (errors, service calls, etc.). Building the dashboard is not the primary concern; rather, it is changing the Signin code to add meaningful events so that monitoring is easy.

Skills: Java, Logging Strategy



33. Aegis/OREO/xAuth Enhancements TIDES

There are various tasks in these areas that need help.

Skills: Java



34. Real time scalable Client Identity TIDES

We have a lot of unstructured data related to Client Identity, such as KG data, browser parameters, and browser parameters-machine-user activity mappings, which today is scattered across various tables. The current data storage format is not scalable for real-time analysis. Store the data in a more scalable format directly from the app layer and make it available for internal and external data analysis.

Skills: HBase, NoSQL



35. Shipping Labels Shipping

Ability to recover from third-party failures. Pre-generate labels for the next 48 hours and use them if there are outages or glitches with the third party. Expire the labels if they are not used during this interval.

Skills: Java



36. Tracking Shipping

Send item shipped and delivery confirmation notifications to mobile devices. Today buyers are anxious about when they will get their item. Users subscribe to the notifications and get notified when the item is shipped or "out for delivery".

Skills: Java/Mobile



37. Shipping cost Shipping

Ability to dynamically create (based on historical data) and surface delivery estimates for international users. Today we show delivery estimates as "varies" to all international buyers. Mine the international tracking and carrier data and create a framework that can show these estimates by country/region. When GSP launches, this will be a great feature to add.

Skills: Java



38. Shipping (Search) Shipping

Create a prototype search results widget (mobile) that can surface inventory that is closer to the user and cheaper (time and cost actually go together).

Skills: Java/Mobile



39. Shipping (Eligibility) Shipping

Create an eligibility rule framework for Shipping. We have many features today (site, seller, shipping history, category) and have rebuilt the business logic each time. Consolidate and create a standard framework for enabling future shipping features.

Skills: Java



40. Shipping Dashboard Shipping

Create a dashboard of shipping performance (time and cost) for the seller. Today we spend a lot of time with the DW team on every new initiative to determine the target seller segments.

Skills: Java/C# or any visual tool



41. Hiring Management Tool MMP Org

A tool to manage the candidate pipeline, resume screening, interview panel assembly, feedback collection, decision making and interviewer measurement.

Skills: Java/CSS/HTML5/JS



42. Corticon Alternative Maestro

Use an open source alternative to import Corticon tools and create a parallel implementation. This will be used to benchmark Corticon and will also serve as a backup plan in case we are unable to use Corticon (price hikes by them, spending cuts by us) or as leverage in the upcoming Corticon negotiation.

Skills: Java



43. Reliable Computing Framework MMP Org

A lightweight Java framework for executing managed code. The framework will support the notions of fail-open, fail-retry, fail-async recovery, etc. It will be used to manage critical code in places such as payment or checkout to make sure revenue is not lost when a non-critical system fails, or when a task can be retried later. This is basically a very lightweight version of transaction management middleware.

Skills: Java
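
The three failure policies named above can be sketched in a few lines. The project itself calls for Java; this sketch uses Python only for brevity, and the function names, default attempt count and deferred-task list are assumptions made for the example rather than part of any existing framework.

```python
import time


def run_fail_open(task, fallback):
    """Fail-open: if a non-critical task raises, continue with a fallback."""
    try:
        return task()
    except Exception:
        return fallback


def run_with_retry(task, attempts=3, delay_seconds=0.0):
    """Fail-retry: re-run the task up to `attempts` times, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)


def run_or_defer(task, deferred):
    """Fail-async: on failure, park the task so a later pass can retry it."""
    try:
        return task()
    except Exception:
        deferred.append(task)
        return None


# Hypothetical non-critical call that is currently failing.
def fetch_recommendations():
    raise RuntimeError("recommendation service is down")

# Checkout keeps going with an empty list instead of failing the order.
print(run_fail_open(fetch_recommendations, fallback=[]))
```

In a fuller design, the deferred list would be a durable queue drained by a background worker, and each policy would be chosen per call site depending on whether the wrapped system is critical to revenue.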



44. Address Verification via "Street Identity" initiative of OIX MMP Org, TIDES

Implement a prototype to interface with the "Street Identity" address verification system being built by Google and Verizon as part of OIX (Open Identity Exchange). This is supposed to be a more accurate and cheaper alternative to the traditional address verification services we use at client.

Skills: Java



45. Optimized allocation and routing for Shipped by or Fulfilled by client programs MMP Org

GSP and eCP assume a "single warehouse" operation, which is true for the pilot. Assuming we are successful and move toward a larger Shipped by client (GSP) or Fulfilled by client (eCP) program, we will need to deal with multiple warehouses, each with certain capabilities/constraints, and optimize the allocation of seller inventory to each warehouse. This is a project to create a prototype of the model and algorithms.

Skills: Java, Machine Learning, Linear programming



46. Replace Proprietary client SOA framework for IAF with client Open Source SOA framework TIDES

IAF (the core STS service for client and PayPal) uses the client SOA framework. This gives IAF a lot of indirect and non-ideal dependencies on client-specific components, which add to the size and complexity of IAF and prevent us from using it as a template for isolated projects. We need to replace the proprietary SOA framework with the open source SOA version (Turmeric) so that we can get rid of unnecessary dependencies and create a true example of a real-world isolated project.

Skills: Java, Open Source



47. Create a Technical Debt Application for MMP MMP Org

We need a tool that manages our technical debt portfolio and the life cycle of individual debts. JIRA is not a proper tool for this. (Will: We should consider leveraging Sonar. http://www.sonarsource.org/)

Skills: Java, HTML5, CSS, JS