OK – we now have the full data file for all regions. Firstly a big thank you to Xarthisius who has worked hard to implement the solution, and has singlehandedly kept the scraper working as much as possible. I also want to thank everyone who participated in the effort. Over 500 people helped, and of those people, more than 30 people scraped at least 1000 case numbers.
So – what have we got for that effort? Well, I am going to make the data file public so that people can see the data. It is important to understand that we will have to continue to scrape over the coming months in order to capture progress, because we will want to see ongoing changes to the data. However, we won’t have to check every number from now on – so there is no need for people to put so much effort in. We don’t have to keep checking case numbers already shown to be holes, and we don’t have to check certain statuses like refused or issued. Furthermore, we will restrict the checking to only those case numbers that are current. So, if people can just come in and scrape a few numbers each day, that will be enough to capture all we need to see and for the data to remain fresh.
Now – what do we see from this data? Well, this will be a long post because I have to explain some concepts and show how it relates to our data. Some of this is basic understanding – but let’s go over it anyway – just to make sure everyone understands.
The first thing to understand is the draw process. The draw is actually regionalized. Case numbers are assigned in numeric order within each region – all starting at 1. So, there is a case number 2018AF1, and there is also 2018EU1, 2018AS1 and so on. So, we know that based on the country of chargeability, entries are allocated into one of the six regions, and each of these regions has its own draw process in effect. I tend to discuss the five large regions (AF, EU, AS, SA and OC). North America (2018NAXXXXX) is so small there are only 9 or 10 case numbers. If you are concerned about NA region – contact me directly.
So – let’s say you are Australian by birth. When you enter, you have the same chance of selection as everyone in the OC region. Some countries get more selectees simply because those countries have more entries – so if country A has twice as many selectees as country B, it means they had twice as many entries. Some countries have MASSIVE numbers of entries, and that is probably due to “agents” publicizing the lottery and either facilitating the entries OR even generating entries programmatically – sometimes without the knowledge of the supposed entrant, but more of that later. The draw system is random and each entry has the same chance as any other entry in the region, so if a single country submits 30% of the entries for the region as a whole, that country will get 30% of the selectees.
OK, having explained that, let’s talk about holes. A hole is when a case number has no selectee. So for example if you check the following cases 2018AF1, 2018AF2 and 2018AF3 you will see that 2018AF2 doesn’t exist. That is a hole.
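For anyone working with the data file, here is a rough Python sketch of how holes could be picked out of a list of case numbers. It assumes case numbers look like the ones in this post (year, two-letter region, then the number), that the list covers a single region, and the helper names `parse_case` and `find_holes` are made up purely for illustration.

```python
import re

# Year, two-letter region code, then the number (leading zeros allowed).
CASE_RE = re.compile(r"^(\d{4})([A-Z]{2})(\d+)$")


def parse_case(case):
    """Split a case number such as '2018AF3' into (year, region, number)."""
    match = CASE_RE.match(case)
    if match is None:
        raise ValueError(f"not a recognizable case number: {case!r}")
    year, region, number = match.groups()
    return int(year), region, int(number)


def find_holes(found_cases):
    """Return the numbers missing between 1 and the highest case found.

    Assumes every case in found_cases belongs to the same region.
    """
    numbers = {parse_case(case)[2] for case in found_cases}
    return [n for n in range(1, max(numbers) + 1) if n not in numbers]


print(find_holes(["2018AF1", "2018AF3"]))  # [2]
```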
Holes come from two sources – the first source is cases that are disqualified during the draw process for various reasons such as an improper photo, or duplicate entries. Secondly holes come from countries that are limited during the draw process. Let me explain that second type of hole.
If you look at EU selectees by country as announced by the State Department (in the July 2017 visa bulletin), you can see there are 5 countries that have around 4500 selectees (Uzbekistan, Ukraine, Russia, Turkey and Albania). That number (4500) is also seen in other regions – so it is obvious it is a type of artificial limit placed on the country during the draw process. I have been seeing this same phenomenon for several years – and this results in a stepped reduction in “density”. Density is the number of real cases per N case numbers. So – let’s imagine we took cases from 2018AF1 to 2018AF100. If there were 40 holes we would have a density of 60% in that range of 100 case numbers. Why does that happen? I can explain it this way:
For simplicity let’s ignore derivatives for a moment.
Let’s imagine a region that has only 3 countries, and instead of 4500 being the limit, let’s assume the limit is 5000.
Country A has 100k entries, Country B has 1 million entries, and Country C has 4 million entries.
The chance of being selected in that region is 1%. The numbers are picked at random, so the countries would get the following distribution. Each time country A gets a selectee, country B would have 10, and country C would get 40.
So country A would get 1000 selectees and those selectees would be spread out across the whole case number range.
Country B should get 10,000 selectees, but because of the limit, they only get 5000. However, ALL their 5000 are in the first half of the number range.
Country C should get 40,000 selectees, but because of the limit, they only get 5000. However, ALL their 5000 would be in the first 12.5% of the total case number range.
If the case numbers went up to 50000 (which takes into account some number of disqualification holes), we would see density reduction at 6250 (country C), then at 25,000 (country B).
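To make the arithmetic in that three-country example concrete, here is a rough Python sketch. It is only a toy model of the example above, not how KCC actually runs the draw. The function names (`limit_cutoffs`, `active_share`) are made up for illustration, and the density figures are relative to the start of the range and ignore disqualification holes.

```python
def limit_cutoffs(entries, rate, limit, max_case_number):
    """Case number at which each limited country runs out of selectees."""
    cutoffs = {}
    for country, n in entries.items():
        expected = n * rate  # selectees the country would get with no limit
        if expected > limit:
            # The capped selectees all sit in the first limit/expected of the range.
            cutoffs[country] = max_case_number * limit / expected
    return cutoffs


def active_share(entries, cutoffs, case_number):
    """Share of entries from countries still receiving cases at case_number."""
    total = sum(entries.values())
    active = sum(n for c, n in entries.items()
                 if case_number <= cutoffs.get(c, float("inf")))
    return active / total


# The imaginary region from the example: 1% selection chance, limit of 5000.
entries = {"A": 100_000, "B": 1_000_000, "C": 4_000_000}
cutoffs = limit_cutoffs(entries, rate=0.01, limit=5000, max_case_number=50_000)

for country, cutoff in sorted(cutoffs.items(), key=lambda kv: kv[1]):
    print(f"Country {country} runs out at about case number {cutoff:.0f}")

for n in (1_000, 10_000, 30_000):
    print(f"Relative density near {n}: {active_share(entries, cutoffs, n):.1%}")
```

In this toy model the relative density steps down from 100% to roughly 22% once country C runs out, and then to about 2% once country B runs out. Real regions have many more countries, so the steps are much less dramatic.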
So – as you can see, the density is very important. Why? Well, in the last VB, the max case number made current for EU went from 8200 to 10700. Looking at the data, that 2500 number increase included 1991 real cases. The rest of the numbers were holes. So – because the density decreases after 10700, to get the same number of cases (1991) would now require an increase to 2018EU14333 – that is 3633 case numbers. Now – in reality, the way the VB progress is determined is more complicated than that simple formula – but it makes the point clear. As density decreases, VB progress can increase. And three regions (AF, AS and EU) have density decreases because of draw limited countries.
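The simple formula behind those numbers can be sketched in a few lines of Python. The density after 10700 is not quoted directly in this post, so the 0.548 below is an assumption backed out from the figures above (1991 cases spread over 3633 case numbers), and the function names are just for illustration.

```python
def density(real_cases, case_numbers):
    """Real cases per case number in a range (1.0 means no holes)."""
    return real_cases / case_numbers


def numbers_needed(target_cases, dens):
    """Approximate count of case numbers needed to cover target_cases."""
    return round(target_cases / dens)


before = density(1991, 2500)   # the 8200 to 10700 range
after = 0.548                  # ASSUMPTION: implied by 1991 cases in 3633 numbers

step = numbers_needed(1991, after)
print(f"Density before 10700: {before:.1%}")
print(f"Case numbers needed for 1991 cases: {step}")
print(f"Next cutoff under the simple formula: {10700 + step}")
```

As I said, real VB progress is not decided by this formula alone, so treat the result as an illustration of the density effect rather than a prediction.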
By the way – this 4500 number and the concept of limiting countries during the draw is only loosely related to a rule you may have heard of that no single country can receive more than 7% of available visas (globally). That rule is correct and real, BUT limiting a country to 4500 selectees is NOT how that 7% rule is enforced.
OK – so now I am going to show the density charts for each region and highlight on the charts when there are country limited density drops. The following charts all express density as the number of cases per 100 case numbers.
So – from the above charts and explanation you should be able to understand that because density decreases for the three large regions, those regions can see an acceleration of VB progress in later months of the DV year. I cannot pick exactly which countries cause which density drops in all regions, other than the very clear drop in Asia region which pinpoints the limitation of Nepal and Iran (at about 2018AS7100).
By the way, the file also gives us the maximum case numbers we have found in each region. Those case numbers are:
AF – 52581
EU – 39695
AS – 13396
OC – 2500
SA – 2457
There is a chance that there might be a few cases above our max case numbers. The scraper was programmed to assume the cutoff had been reached once it had found nothing but holes for over 100 numbers. So – if you have a case number higher than this data suggests, let me know.
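To show why a few high cases could be missed, here is a minimal Python sketch of that stopping rule. It is not the actual scraper code. `is_real_case` is a hypothetical stand-in for whatever lookup tells you a case number exists, and the sample numbers are invented.

```python
def find_max_case(is_real_case, max_gap=100, start=1):
    """Walk upward until more than max_gap consecutive holes are seen.

    Returns the highest real case number found before giving up.
    """
    last_real = 0
    gap = 0  # consecutive holes since the last real case
    n = start
    while gap <= max_gap:
        if is_real_case(n):
            last_real = n
            gap = 0
        else:
            gap += 1
        n += 1
    return last_real


# Invented data: case 300 sits beyond a run of more than 100 holes.
real_cases = {1, 3, 50, 120, 300}

print(find_max_case(lambda n: n in real_cases))               # 120
print(find_max_case(lambda n: n in real_cases, max_gap=200))  # 300
```

With the default gap of 100 the walk stops after case 120 and never sees case 300, which is exactly the kind of case I would like to hear about.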
Xarthisius has created a tool to present all this data in a very useful format. You can download the data in csv format, get nice graphs showing the counts of all the status types for each region, and each country. It really is a nice tool!
As far as VB progress goes, as I have already detailed there is going to be a slightly increased acceleration over the numbers I had suggested in my VB progress explained post. That post was written (in early October) with a focus on EU region. Rather than the 2500 increase I saw for the March interviews, I think we will instead see something around 3500 increase (to about 14000/14400). Once the next density decrease is hit, progress can move faster still in later months. However, please read my earlier post to get a reminder about VB progress NOT being the perfect predictor for final cutoffs. That will take more analysis, and must consider response rate, issued rate, derivative growth rate and so on. EU is easier to predict because none of the countries are being restricted during the VB as they are in AF and AS region. AF and AS are NOT so easy to predict.
AF sees a number of density decreases which will each help accelerate VB progress BUT the restricted countries (Egypt and Ethiopia) make the calculation more difficult. Over the coming days we will see the next VB, and I will try and come up with predictions after that – at least in broad terms. I would like to hear from Egyptians to determine the max Egypt case numbers assigned. Egypt enjoys a high success rate and a high number of selectees, so it is possible that not all Egyptian cases will be interviewed, BUT I don’t know the max case number for Egypt so I have no way to guess where the cutoff might come.
For Asia, the travel ban is a BIG influence, BUT we really don’t know how KCC will handle the ban from this point forward NOR do we know whether the ban will stay in place. So – whilst we know that Nepal runs out of selectees by 7100, we can expect slow progress until that point. It is somewhat surprising that Nepal has so many selectees concentrated in the first 7100. That does make me think we will see a limit hit for Nepal at a number lower than 7000. This is because of the 7% limit I mentioned earlier in the article. Like Egypt, Nepal enjoys a high success rate and a high number of selectees – so Nepalese cases above 6500 must be considered somewhat at risk.
So – it still remains a case of wait and see for a lot of people. I will spend some time on analysis of the data in the coming days and will publish any findings. I hope this article has explained some things for you.
If you want more detail about the concepts discussed here, you can read these posts