2014-11-24 Hashing vs Indexing

Hashing: the ultimate rundown
 
 
Hashing is the conversion of a string into a shorter value or key that stands for the original.
 
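Put simply, hashing is a data management technique that maps a large amount of data onto a smaller table so that the data can be stored there.
When data is stored or looked up, a hash function is used so that it can be found efficiently in roughly constant time. That is what hashing provides.
As a result, the data is not stored sequentially but is spread evenly across the table, and when stored data is looked up the hash function gives its location directly, so the search is fast.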
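
As a rough illustration of the idea above, here is a minimal Python sketch. The table size, the `simple_hash` function, and the `put`/`get` helpers are all assumptions made up for this example, and collisions are deliberately ignored.

```python
TABLE_SIZE = 8  # hypothetical table size, chosen only for illustration


def simple_hash(key: str, table_size: int = TABLE_SIZE) -> int:
    """Map a string to a slot number in the range [0, table_size)."""
    total = 0
    for ch in key:
        # Polynomial rolling hash: mix each character into the running total.
        total = (total * 31 + ord(ch)) % table_size
    return total


# The table is a fixed-size list; each slot holds at most one (key, value) pair.
table = [None] * TABLE_SIZE


def put(key: str, value: str) -> None:
    # Collisions simply overwrite the slot in this sketch.
    table[simple_hash(key)] = (key, value)


def get(key: str):
    # Jump straight to the computed slot instead of scanning the whole table.
    entry = table[simple_hash(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None


put("apple", "red")
print(get("apple"))   # red
print(get("banana"))  # None
```

The lookup never walks the table: the hash function turns the key into a slot number, and only that slot is inspected.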
 
Hashing is a specific case of indexing:

Indexing is a general name for a process of partitioning intended to speed up data look-ups. Indexing can partition the data set based on the value of a field or a combination of fields. It can also partition the data set based on the value of a function, called a hash function, computed from the data in a field or a combination of fields. In this specific case, indexing is called data hashing.
 
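An index is, simply put, a separate object used to make searches fast.
 e.g. KakaoTalk's initial-consonant search
There are many kinds of index, such as B-tree and hash, but today we will look at the hash index. Given the value to search for, a hash index runs it through a hash function and finds the bucket that contains the desired key.
*A bucket is the space that holds information such as each index key and the address of its record.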
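
The bucket idea can be sketched as follows. This is only an assumption-laden toy: the records, the bucket count, and the names `bucket_of` and `find` are hypothetical, and a list position stands in for a record address.

```python
NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(key: int) -> int:
    """Hash function: pick a bucket number from the key."""
    return key % NUM_BUCKETS


# The "table": a list position plays the role of a record address.
records = [
    {"id": 17, "name": "Kim"},
    {"id": 42, "name": "Lee"},
    {"id": 21, "name": "Park"},
]

# The index: one bucket per hash value, each holding (key, record address) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]
for address, record in enumerate(records):
    buckets[bucket_of(record["id"])].append((record["id"], address))


def find(key: int):
    """Hash the key, then search only the matching bucket."""
    for stored_key, address in buckets[bucket_of(key)]:
        if stored_key == key:
            return records[address]
    return None


print(find(21))  # {'id': 21, 'name': 'Park'}
print(find(99))  # None
```

Keys 17 and 21 land in the same bucket here, which is why each bucket stores the key next to the address: the key is compared again after the bucket is found.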
 

 
 
What is indexing?
Indexing is a way of sorting a number of records on multiple fields. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing Binary Searches to be performed on it.
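
A minimal sketch of such an index in Python, assuming made-up rows and the hypothetical names `name_index` and `lookup`: the index stores (field value, row position) pairs, is kept sorted, and is searched with the standard `bisect` module.

```python
from bisect import bisect_left

# Hypothetical table rows, stored in no particular order.
rows = [
    {"name": "Smith", "dept": "Sales"},
    {"name": "Abernathy", "dept": "Legal"},
    {"name": "Moore", "dept": "Support"},
]

# Index on "name": (field value, pointer to the row), sorted by field value.
name_index = sorted((row["name"], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in name_index]


def lookup(name: str):
    """Binary search on the sorted index, then follow the pointer to the row."""
    i = bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        return rows[name_index[i][1]]
    return None


print(lookup("Moore"))  # {'name': 'Moore', 'dept': 'Support'}
print(lookup("Jones"))  # None
```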
 
What is hashing?
Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value.
 
A hash is a sort of index: it can be used to locate a record based on a key, but it doesn't preserve any order of records. Based on a hash, one can't iterate to the succeeding or preceding element. That is, however, what an index does (in the context of databases).
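
To make the difference concrete, here is a small sketch with hypothetical keys. Both structures can find a record by key, but only the ordered one can step to the next key without looking at everything.

```python
from bisect import bisect_right

# Hypothetical keys, used only for illustration.
sorted_keys = [10, 20, 30, 40]                        # ordered index: keys kept sorted
hash_index = {k: f"record-{k}" for k in sorted_keys}  # hash index: key -> record


def successor_sorted(key: int):
    """Next larger key via binary search on the ordered index."""
    i = bisect_right(sorted_keys, key)
    return sorted_keys[i] if i < len(sorted_keys) else None


def successor_hashed(key: int):
    """A hash index keeps no order, so every key has to be examined."""
    larger = [k for k in hash_index if k > key]
    return min(larger) if larger else None


print(hash_index[20])        # record-20
print(successor_sorted(20))  # 30
print(successor_hashed(20))  # 30, but only after scanning all keys
```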
 
Because finding an item with the short hashed key is faster than finding it with the original value, hashing is used to index and retrieve items in a database.
 
Example.
key   value
7864 Abernathy, Sara 
9802 Epperdingle, Roscoe 
1990 Moore, Wilfred 
8822 Smith, David 
 
It is faster to find a match across four digits, each with only 10 possible values, than across a value of unpredictable length in which each character has 26 possible values.
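
The example table can be tried out directly. The sketch below stores it in a Python dict (itself a hash table) and compares the sizes of the two search spaces. The five-letter name length is an assumption picked only to have something to compare against, since real names vary in length.

```python
# The key/value pairs from the example table above.
people = {
    7864: "Abernathy, Sara",
    9802: "Epperdingle, Roscoe",
    1990: "Moore, Wilfred",
    8822: "Smith, David",
}

print(people[1990])  # Moore, Wilfred

# Four decimal digits, each with 10 possible values (0-9).
digit_key_space = 10 ** 4
# Hypothetical comparison: names of exactly 5 letters, 26 possibilities each.
five_letter_name_space = 26 ** 5

print(digit_key_space)         # 10000
print(five_letter_name_space)  # 11881376
```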
 
The hashing algorithm is called the hash function. Besides fast data retrieval, hashing is also used to encrypt and decrypt digital signatures.
 
The ideal hash function
 Each record in the key set maps 1:1 to a record in the bucket address set (with duplicates avoided as far as possible).
 Collisions should be few.
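
One way to check how close a hash function comes to this ideal is to count collisions for a given key set. The sketch below is an assumption-based illustration: `count_collisions` is a made-up helper, the keys are the four from the example table, and the hash function is plain modulo.

```python
from collections import Counter


def count_collisions(keys, num_buckets: int) -> int:
    """Number of keys that land in a bucket already taken by another key."""
    used = Counter(key % num_buckets for key in keys)
    return sum(n - 1 for n in used.values() if n > 1)


keys = [7864, 9802, 1990, 8822]

print(count_collisions(keys, 10))  # 1  (9802 and 8822 both map to bucket 2)
print(count_collisions(keys, 13))  # 0  (every key gets its own bucket)
```

With 13 buckets the mapping happens to be one-to-one for these four keys. A different key set could collide again, so the result says something about this key set only.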
 
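2.1. Perfect Hashing
- Perfect hashing refers to the simple uniform hashing that is mentioned later as a good hashing technique. It is the ideal form of hashing in which, when distinct key values are assigned addresses by the hash function, the addresses never overlap. Of course, this is only possible with a one-to-one mapping, so it is an ideal.
2.2. Conventional Hashing
- The number of data items is already known, so the address range in which the data will be stored is fixed in advance to match that number. That is, the required memory size must be estimated and allocated beforehand.
2.3. Dynamic Hashing
- The problem with conventional hashing is that the number of data items has to be known before any data is entered, so that memory can be allocated in advance. Since the amount of data generally grows over time, if a wrong estimate lets the data exceed the allocated memory, a new memory size has to be chosen and everything hashed again, which wastes time and resources. Dynamic hashing emerged to adapt to this growth and shrinkage of data: it is a hashing technique that changes the memory size dynamically.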
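
As a rough sketch of a table that adapts its size to the data, here is a chained hash table that doubles its bucket array once it gets crowded. This is a simplified stand-in, not a full dynamic hashing scheme: schemes such as extendible hashing or linear hashing grow incrementally and avoid the full rehash done here. The class name, the load threshold, and the modulo hash are all assumptions.

```python
class GrowingHashTable:
    """Chained hash table that doubles its bucket array when it gets crowded."""

    def __init__(self, initial_buckets: int = 2, max_load: float = 1.0):
        self.buckets = [[] for _ in range(initial_buckets)]
        self.count = 0
        self.max_load = max_load  # hypothetical threshold: entries per bucket

    def _slot(self, key: int, num_buckets: int) -> int:
        return key % num_buckets

    def put(self, key: int, value: str) -> None:
        bucket = self.buckets[self._slot(key, len(self.buckets))]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._grow()

    def _grow(self) -> None:
        # Simplification: rehash every entry into a bucket array twice as large.
        entries = [entry for bucket in self.buckets for entry in bucket]
        new_size = len(self.buckets) * 2
        self.buckets = [[] for _ in range(new_size)]
        for key, value in entries:
            self.buckets[self._slot(key, new_size)].append((key, value))

    def get(self, key: int):
        for stored_key, value in self.buckets[self._slot(key, len(self.buckets))]:
            if stored_key == key:
                return value
        return None


grown = GrowingHashTable()
for k in range(5):
    grown.put(k, f"value-{k}")

print(len(grown.buckets))  # 8
print(grown.get(3))        # value-3
```

The table starts with 2 buckets and grows to 4 and then 8 as the five keys arrive, without the caller ever stating the data size up front.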
 
 
 


To state the conclusion first, it is as shown in the table below.

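I clipped this article because it seemed meaningful that its index does not just compare salaries but takes a comprehensive view that includes living costs and quality of life.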

How Much Do Software Engineers Really Make in Each City?
Published Jul 18, 2017. Last updated Nov 07, 2017.

What is the best city in the world for a software engineer? San Francisco? New York?

From a salary standpoint, Silicon Valley is the clear winner, boasting an average income of $110,554 per year, according to data from Glassdoor. But even with six-figure salaries, many developers are finding it difficult to afford the sky-high rent in the Bay Area.

But if not the Valley, then where should developers go? It is not enough to only consider nominal income — high living costs can eat away at your earning power.

For a more complete analysis, we compared the real earnings of software engineers in 43 cities across the globe to find where they would have the most purchasing power. Real earnings were calculated as follows:

Real Earnings = Income - Taxes - Social Security - Living Costs - Rent
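
As a quick worked example of this formula in Python: every number below is hypothetical and is not taken from the article's data set. All amounts are annual.

```python
def real_earnings(income, taxes, social_security, living_costs, rent):
    """Real Earnings = Income - Taxes - Social Security - Living Costs - Rent."""
    return income - taxes - social_security - living_costs - rent


# Hypothetical annual figures in USD, for illustration only.
example = real_earnings(
    income=100_000,
    taxes=25_000,
    social_security=7_000,
    living_costs=12_000,
    rent=24_000,
)
print(example)  # 32000
```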

Quick Summary of Our Results

[Figure: software engineers real salary]

Seattle is the clear winner, with wages close to those in Silicon Valley but with significantly lower rent costs. Also, cities in the United States ranked higher than international cities across the board, with few exceptions. Interestingly, San Jose (our proxy for Silicon Valley) still ranked 3rd of 21 cities in the U.S., whereas San Francisco came in at 19th, again due to the difference in rent.

On the East Coast, New York and Washington, D.C. fared even worse, taking the last two spots respectively. Phoenix took second, and Austin and Houston rounded out the top 5. Internationally, Tel Aviv, cities in Canada, and Berlin are our recommendations. Our data suggest you avoid London, Singapore, and China, contrary to what one might expect.

Methodology

Previously, Glassdoor listed the “25 Best Paying Cities for Software Engineers” in the U.S. and calculated a “Real Adjusted Salary” by scaling salary with a cost-of-living factor. Their model ranks San Jose and San Francisco as number 2 and 3 on their list respectively, contrary to anecdotal evidence. It is worth noting that they indicate that San Jose has a higher cost of living than San Francisco, while our data will show you why it’s quite the opposite.

What does "scaling income with cost of living" mean? It is a way to put all cities with different costs of living on equal footing. Here is Glassdoor's formula:

Scaled Income = ((average cost of living)/(actual cost of living)) * Base Income
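
For a feel of how this scaling behaves, here is a small Python sketch. The cost-of-living values are hypothetical index numbers (average = 100), not Glassdoor's figures.

```python
def scaled_income(base_income, actual_cost_of_living, average_cost_of_living):
    """Scaled Income = (average cost of living / actual cost of living) * Base Income."""
    return (average_cost_of_living / actual_cost_of_living) * base_income


# Hypothetical: a city 25% more expensive than average, base income 100,000 USD.
expensive_city = scaled_income(100_000, actual_cost_of_living=125, average_cost_of_living=100)
print(round(expensive_city))  # 80000
```

A city with above-average costs has its income scaled down, and a cheaper-than-average city has it scaled up.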

But scaling income with cost of living is not a very accurate way to compare across cities. People just want to know how much money will be in their pocket at the end of each year.

Instead of scaling income like Glassdoor, we used our real earnings formula to calculate the earnings of an average software engineer living alone in the city, and used that as a ranking metric. This leads to vastly different results.

Real Earnings = Income - Taxes - Social Security - Living Costs - Rent

Since tax takes out a significant portion of each paycheck and varies regionally, we used the same Glassdoor base-salary numbers to calculate the after-tax income for each city, and then subtracted average annual living and rent costs, based on data from Numbeo. You don’t want to move to a city with few job openings, so we chose cities with a large number of job listings, and expanded our scope globally.

U.S. Cities: Findings

[Figure: United States with higher salaries]

Cities in the United States beat out nearly all international cities, so to simplify things we separated the two into different rankings. From now on, we will refer to non-U.S. cities as “international” cities.

Most assume that although wages are lower abroad, lower living costs are enough to compensate. With a lower base-salary, your real earnings will be lower, but even as a ratio of living costs, wages abroad typically cannot compete with those in the United States. For more information, see our “Affordability” section below.

Pack your bags for Seattle

Our data confirm the findings of similar reports, suggesting that Seattle is the best place for a software engineer right now. Companies like Amazon and Microsoft raise the average wage to a value close to Silicon Valley’s; however, lower rent means more money in your pocket each month.

[Figure: US real earnings vs US job openings]

Phoenix, Austin, and Houston all seem to be good choices as well, with real earnings above $30,000. These markets are still growing though, nearing the bottom of our list for number of job openings. Raleigh, North Carolina comes in at 6th just below Houston, but again, the job market there is relatively small.


Rent is more influential than cost of living

From the chart above, you may have noticed the disparity between San Jose, at 3rd, and San Francisco, at 19th. These two cities are often grouped into the same category of “Bay Area”. Why do regions with similar markets and geographic proximity have such different real earnings?

Our data suggest salary and cost of living are nearly identical, but cheaper rent in San Jose makes the difference.

[Figure: US rent and cost of living]

Living costs do not vary dramatically across cities, but rents do. For example, compare Phoenix, in second place with an average rent of $972 per month, to San Francisco, in nineteenth with an average rent of $3,272 per month.
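
To put the two rent figures quoted above on the same annual scale as the real earnings formula, the gap can be worked out in a couple of lines of Python. Only the two monthly rents come from the text. Multiplying by 12 assumes the rent is constant over the year.

```python
# Average monthly rents in USD, as quoted in the text above.
phoenix_rent = 972
san_francisco_rent = 3272

# Extra rent paid per year in San Francisco compared with Phoenix.
annual_rent_gap = (san_francisco_rent - phoenix_rent) * 12
print(annual_rent_gap)  # 27600
```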

Washington D.C., New York, and Boston have the largest software engineer job markets outside the Bay Area, but sky-high rents push them to the bottom of the list. Washington D.C. has the largest number of job openings in our analysis, but a below-average salary and high costs make it the worst choice among large U.S. cities.

International Cities: Findings

Check out Tel Aviv, and Don’t Forget Canada

Beyond the cities in the U.S., we wanted to know more about international cities as well. We chose the cities mentioned in similar reports as either established or “emerging” tech hubs and ran the numbers for them. Keep in mind that almost none of the cities listed below beat their U.S. counterparts in both real earnings and number of job openings.

[Figure: best international cities for software engineers]

Oslo tops the list in terms of earning power, but we did not mention it earlier because the job market is the smallest among the 43 cities we analyzed in this report. Compare its 106 job offerings to 22,554 in New York City.

Your best bet is Tel Aviv. It is a fairly mature tech hub, with the second highest real earnings and a reasonably sized tech market. Furthermore, there are tax breaks available for new immigrants to the country, which are not included in our tax calculation.

Canada makes a good showing as well. Toronto, Montreal, and Vancouver take spots 3, 4, and 5 on our list for real earnings. Also, the job markets of all three cities are larger than Tel Aviv's. Berlin, often mentioned as a strong European tech hub, is not far behind in 6th.

Bangalore is worth a mention as well. It has one of the largest numbers of software engineer job openings in the world, more than San Francisco and San Jose and behind only New York and Washington D.C. In contrast, it has the lowest average pre-tax income in U.S. dollar terms of all the cities we considered.

Costs are low abroad, but so are salaries

In most cases, the living costs internationally are much lower than in the United States, but not so low that they compensate for the drop in salary.

[Figure: top international cities for software engineers]

London’s real earnings are below average internationally and, in turn, much lower than those of cities in the United States, according to our data. Another city that may surprise you is Beijing. Many people are excited about the technological innovation there, and local startups have preferential access to a huge, yet unique, domestic market. However, our data suggest that, in strictly financial terms, there are better options.

Singapore's and Hong Kong’s real earnings values are near zero. Warsaw's and Moscow’s are even lower, to the point that we calculate a negative value for real earnings. This means that if you are a software engineer making an average salary in one of these cities, you cannot afford to rent a single apartment in the city center.

Other Factors to Consider

Affordability

Lower wages internationally usually mean lower earnings, even with lower living costs. Perhaps it is more important to consider income as a ratio to expenses. We calculated an “affordability” ratio for international cities, which is after-tax income divided by expenses, then indexed it to San Francisco. Thus, the “affordability” of San Francisco is defined as 100, and a value of 150 means the city in question is 50% more “affordable” than San Francisco.

Income Multiple = (After-tax Income)/(Expenses)

Index Value of City A = 100 * (Income Multiple of City A)/(Income Multiple of SF)
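
The two formulas can be chained as in the Python sketch below. The income and expense figures are hypothetical, picked so that the result lands on the value of 150 used as an example in the text. They are not the article's data.

```python
def income_multiple(after_tax_income, expenses):
    """Income Multiple = After-tax Income / Expenses."""
    return after_tax_income / expenses


def affordability_index(city_multiple, sf_multiple):
    """Index Value = 100 * (Income Multiple of the city) / (Income Multiple of SF)."""
    return 100 * city_multiple / sf_multiple


# Hypothetical annual figures in USD, for illustration only.
sf = income_multiple(after_tax_income=70_000, expenses=56_000)      # 1.25
city_a = income_multiple(after_tax_income=60_000, expenses=32_000)  # 1.875

print(affordability_index(sf, sf))      # 100.0
print(affordability_index(city_a, sf))  # 150.0
```

City A earns less than San Francisco in this made-up case, yet scores 150 because its expenses are lower in proportion.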

[Figure: most affordable cities for developers]

For the most part, this gives a similar ranking to real earnings. More interesting is the placement of San Francisco. While San Francisco is not particularly affordable (most U.S. cities have values above 100), about half of international cities have a lower ratio. So even considering income as a ratio to expenses, international cities struggle to compete.


Quality of Life: To give some non-financial context for each city, we cite Numbeo’s Quality of Life Index, which is a weighted index based on several factors:

- cost of living and purchasing power
- affordability of housing
- pollution (including air, water, etc.)
- crime rates
- health system quality
- traffic (commute times)

This number is not indexed, so it is just a means for comparison. Since there is less of a stark contrast between U.S. and international cities, we included all global cities below except for Baltimore and Detroit, which did not have sufficient data to calculate the index.

[Figure: cities for engineers by quality of life]

Many international cities exceed the largest cities in the U.S. Consider, for example, that 13 of 22 international cities exceed New York. Apart from Melbourne, the other 10 of the top 11 spots all go to smaller American cities. Take these results as you will; some have mentioned that the index might weigh pollution too heavily.

Caveats: Our model assumes that you are a software engineer living alone in the city with an average software engineer salary. Although your personal “real earnings” may vary widely depending on job role, marital status, and living situation, we find it reasonable to use this as a proxy. For example, we assume that if a software engineer is paid more in New York, then a web developer will also be paid higher in New York. This is simply a way for us to compare “apples to apples” across cities.

Secondly, we are not tax professionals, and it is possible we made a mistake (respect to you accountants out there: international tax code is quite a doozy). We assume you are a resident of the country for the full tax year. To get the real earnings, we deducted income tax, local social security payments, and mandatory insurance contributions. We assume you are not being doubly taxed by your home country if you are not a citizen of the country in question.

Finally, inherent biases may arise from our data sets. Salary and job openings data are from Glassdoor, cost of living and rent are from Numbeo, and tax calculations are from the IRS and KPMG. Notably, there was less salary and job data for international cities, so in some cases we used other sources. For example, we used Chinese websites to find data for Beijing and Shanghai. Also, it is possible there are fewer job listings for international cities simply because Glassdoor is based in the United States.

The raw data we used can be found here.

Conclusion

Here, it behooves us to say that there are many other factors for software engineers to consider when choosing a city to live in, but we hope this may be a starting point for your decision. Sometimes the cities perceived to be the most “exciting” or to have the most “opportunity” are in fact the worst choices in terms of real earnings.

From a financial standpoint, it is easier to draw conclusions. Overall, the United States is the best country to live in for a software engineer. Consider Seattle, and avoid New York and D.C. Pick San Jose over San Francisco, and do not forget about Phoenix, Austin, Houston, and Raleigh. If you must go abroad, look at Tel Aviv, Canada, and Berlin, but avoid London, Singapore, and China.

