Now that 2NLs are being sent, we need to update the CEAC data. Xarthisius has made more numbers available to scrape, based on the cases that are current. You can help with the scraping effort by reading the instructions at this link.

I also want to explain a bit about the scraping effort and what we will need to do.

First, a recap of what the CEAC data means.

Between now and the end of the year the CEAC data will be constantly updated by the embassies. When cases are waiting to be scheduled, CEAC shows “at NVC”. That status simply means the case is waiting. Once people have submitted their DS260s AND their documents AND are current (ALL THREE!), they can be scheduled. The 2NLs are sent and for a short time (a few days) we see “in Transit” as the status in CEAC. Once the embassy “receives” the file (electronically), they update the status to READY. As cases are then interviewed, the embassy updates cases to issued, refused or AP. There can also be “date updates” (with no status change) which may or may not be important. We cannot know why those date updates occurred, so my advice is always to ignore them. There is no magic number of these updates that means a good thing or a bad thing.

OK, so what can we learn from CEAC? Well, obviously we learn how many visas are issued, on AP or refused. Tracking that progress is important for knowing when the visas will run out. We can also see derivative numbers, which are important for analysis, and we can see embassy distribution (to infer where the applicants are charged, and so on). Another big thing we can see is the number of cases scheduled for interview. That helps us understand the capacity of the embassies and helps us estimate VB progress. That is because KCC will make more or fewer cases current based on embassy capacity across each region, and their overall processing capacity.
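For anyone who wants to play with the numbers themselves, here is a rough sketch of the kind of tally described above. It is only an illustration, not the actual analysis code. The rows are made-up sample data, the status spellings and post names are my assumptions rather than the exact values in the CEAC file, and treating every status other than "At NVC" as "scheduled" is also an assumption.

```python
from collections import Counter

# Made-up sample rows for illustration only: (case_id, status, post).
# Status spellings and post names are assumptions, not real CEAC values.
SAMPLE_ROWS = [
    ("case-1", "Issued", "POST_A"),
    ("case-2", "Ready", "POST_A"),
    ("case-3", "AP", "POST_B"),
    ("case-4", "At NVC", "POST_B"),
    ("case-5", "Refused", "POST_A"),
    ("case-6", "In Transit", "POST_B"),
    ("case-7", "Ready", "POST_B"),
]

# Assumption: any status other than "At NVC" means the case has been
# scheduled (2NL sent) or has already been interviewed.
SCHEDULED = {"In Transit", "Ready", "Issued", "Refused", "AP"}


def summarize(rows):
    """Return (cases per status, scheduled cases per post)."""
    by_status = Counter(status for _, status, _ in rows)
    by_post = Counter(post for _, status, post in rows if status in SCHEDULED)
    return by_status, by_post


by_status, by_post = summarize(SAMPLE_ROWS)
for status, count in sorted(by_status.items()):
    print(f"{status}: {count}")
for post, count in sorted(by_post.items()):
    print(f"{post}: {count} scheduled")
```

The per-status counts show progress toward the visa limit, and the per-post counts of scheduled cases give a feel for embassy capacity.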

Now, our AMAZING scrapers (please take a look at the hall of fame to see some awesome effort) have been doing the heavy work up to now. We do have a way to pull the data in a completely automated way BUT it would be somewhat expensive to continue that way. I get a little ad revenue from this site which pays toward my hosting costs, but it isn’t enough to cover everything. So we may need to proceed with a mix of human scraping and automated scraping. Thankfully we do not have to scrape the entire population every time. We only need to scrape the current cases not already marked as holes, issued or refused. So if we share the workload between lots of people the work can get done quickly. Ideally we would want to refresh the data twice a month. Please feel free to comment on whether some human scraping is OK with you all, or whether we should instead find a way to fund automatic scraping (by donations for that purpose, for example).
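To make the "we don't have to scrape everything" point concrete, here is a minimal sketch of how the list of cases to refresh could be worked out and shared between volunteers. This is not how the real scraping tool works. The function names, the status spellings and the idea of passing in a single cut-off number are all my own assumptions for the example.

```python
# Assumption: these are the labels used for cases that never need re-checking.
DONE = {"Hole", "Issued", "Refused"}


def cases_to_rescrape(known_status, max_current):
    """List case numbers 1..max_current that still need a fresh look.

    known_status maps case number -> last known status (hypothetical format).
    Cases we have never seen are included, since we know nothing about them.
    """
    return [
        number
        for number in range(1, max_current + 1)
        if known_status.get(number) not in DONE
    ]


def split_work(cases, volunteers):
    """Split the list into contiguous chunks, one per volunteer."""
    if volunteers < 1:
        raise ValueError("need at least one volunteer")
    if not cases:
        return []
    size = -(-len(cases) // volunteers)  # ceiling division
    return [cases[i:i + size] for i in range(0, len(cases), size)]


# Made-up example: 8 current case numbers, some already finished.
known = {1: "Issued", 2: "Hole", 3: "Ready", 4: "At NVC", 6: "Refused", 7: "AP"}
todo = cases_to_rescrape(known, max_current=8)
print(todo)
print(split_work(todo, volunteers=2))
```

In this made-up example, cases 1, 2 and 6 are skipped because they are already issued, a hole or refused, and the remaining five cases are split into two chunks.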

Hope that helps people understand what the data is used for and how everyone can help. As ever, a BIG thanks to Xarthisius and to all those who have helped with the scraping.