I am very proud of the community we have here. People all over the world are solving captchas for the benefit of all of us. I’m humbled and impressed to be connected to you all in that way.

It’s important that we finish the initial scrape of data as soon as possible. We want to get the data as a “point in time”. The initial results show for the first time the highest case numbers in each region, and show the holes that affect the “density” of cases within a region. Some people won’t understand what I mean by density, but I have explained this previously. For DV2016 I published a series of articles that explained that in detail. You can start to read those articles here.

In simple terms though, if you were case number 1000 in a given region, you might think there are 999 cases in front of you. That will not be the case, since there are “holes”, which are case numbers without any real case attached. These are cases that were removed before the winners were announced, either because the cases were disqualified (duplicate entries, invalid country selections, etc.) or because the country was stopped when there were already “enough” winners from that specific country. That makes more sense when you understand the draw process, which you can read about here.
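To make the “holes” idea concrete, here is a minimal sketch that counts how many real cases sit in front of a given case number. The function name `cases_ahead` and the sample numbers are made up for illustration; they are not taken from the scraped CEAC data or from any tool used in the scrape.

```python
from bisect import bisect_left


def cases_ahead(real_case_numbers, my_number):
    """Count real cases whose case number is lower than my_number.

    real_case_numbers is assumed to hold only the numbers that have a
    real case attached; every number missing from it is a hole.
    """
    ordered = sorted(set(real_case_numbers))
    return bisect_left(ordered, my_number)


# Made-up example: numbers 1 to 1000 where every third number is a hole.
real = [n for n in range(1, 1001) if n % 3 != 0]
print(cases_ahead(real, 1000))  # far fewer than 999
```

With that made-up pattern, case number 1000 has 666 real cases in front of it rather than 999.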

Now, as I mentioned in my last blog post, I was hoping I would wake up this morning and find that AF region was completed. It is. So I am going to show you the chart from Xarthisius that shows all the cases for AF region. I will also show you the same view from DV2018 and DV2015 to show the differences. I have data going back several years, but the charts that Xarthisius has produced illustrate the data very well.

So – first of all, here is the chart for DV2019 AF region.

DV2019 AF region CEAC data

So – what does that show? Well, first of all we can see there are no case numbers above 2019AF48XXX. So – if you have a number in that final range you now know you are at the back of the line. But another important finding is the holes. The holes are in blue on the graph above. You can see the holes rate starts at about 30 to 35% on average, meaning that each block of one thousand case numbers only has around 650 to 700 actual cases. The holes rate increases after about 20000, which happens as countries become “limited”. These are countries such as Ghana, Egypt, Ethiopia and DR Congo. Remember I am talking about countries limited for SELECTEES – this does NOT necessarily mean they will be limited for visas to be issued.
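The holes rate per thousand case numbers can be worked out along these lines. This is only a sketch: the function name, the assumption that case numbers start at 1, and the sample data are mine, and this is not necessarily how the charts from Xarthisius are produced.

```python
from collections import Counter


def holes_rate_by_block(real_case_numbers, max_case_number, block_size=1000):
    """Return {first number of block: share of numbers that are holes}.

    Assumes case numbers run from 1 to max_case_number and that
    real_case_numbers contains only numbers with a real case attached.
    """
    counts = Counter((n - 1) // block_size for n in set(real_case_numbers))
    rates = {}
    for block in range((max_case_number - 1) // block_size + 1):
        start = block * block_size + 1
        end = min(start + block_size - 1, max_case_number)
        size = end - start + 1
        rates[start] = (size - counts.get(block, 0)) / size
    return rates


# Made-up data: 700 real cases in 1-1000, then 400 real cases in 1001-2000.
sample = [n for n in range(1, 1001) if n % 10 >= 3]
sample += [n for n in range(1001, 2001) if n % 10 >= 6]
for start, rate in holes_rate_by_block(sample, 2000).items():
    print(f"{start}-{start + 999}: {rate:.0%} holes")
```

In the made-up data the first block is 30% holes (700 real cases) and the second is 60% holes (400 real cases), which is the kind of step you would expect once countries become limited.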

Now, let’s look at the contrast with DV2018 and DV2015.

DV2018 AF region CEAC data
DV2015 AF region CEAC data

In the other charts (which are available on Xarthisius’ site along with the full data set for download) we can see that the starting holes rate in previous years was closer to 20%. That is the case for several years (DV2018, DV2017, DV2016, DV2015 and DV2014 at least). So – there has been a sudden increase in the holes rate, meaning a sudden increase in cases disqualified before the winners were announced. That is very significant. I cannot be certain why that would be the case, but I can speculate on the following possibilities:

  • Increased information leading to better techniques to disqualify duplicate entries.
  • Entries being disqualified for re-using photos from previous years (that was threatened/warned about in DV2018 but not implemented in DV2018).
  • Possible targeting of entries from certain countries known for high levels of fraud (Ghana springs to mind).

As I said, I cannot be certain what happened, but it is clear that some different technique was used this year. Now – this does not affect you if you have a case number, since you got through that initial screening, but it is interesting nonetheless.

The DV2019 chart also shows less obvious country drops. In the other two charts you can more clearly identify sudden drops in real case density, each indicating a precise point where a country stopped getting selectees. That is harder to pinpoint on the DV2019 chart. There is a reduction starting at around case number 20000, but it is more of a gentle slope toward the final density, where around 60 to 65% of the case numbers are holes in the upper case number ranges.

The DV2019 data also shows the cases that have already been scheduled or issued. Once we have all the data we will be able to see those issued numbers and see how many cases are on AP (administrative processing) or get refused. All very valuable information.

As we get the same data for other regions we will know more about the density in general. I hope you can see how valuable this data is. Obviously the CEAC data is interesting to each individual case, but by looking at all the data together we can get some valuable insights.

Thanks once again to the scrapers – keep scraping!!! If you haven’t yet scraped many numbers, please use this update as encouragement to give us a few hours of your time.